ILLUSTRATION
Z Charts example

The following table contains sales data for a sole trader who runs a domestic service operation.
The table is based on monthly sales figures, given in columns 2 and 3. The 1998 figures are included to help us calculate the Moving Annual Total (MAT).

The cumulative totals for 1999 (column 4) are calculated by adding the current month's sales figure to the previous month's cumulative total. In January there is only one figure to enter: this is the start of the year, and total sales at the end of the first month stood at £1990. During February 1999, sales of £2260 were made. This figure is added to the January total to give a cumulative total of £1990 + £2260 = £4250. The other figures in this column are derived in the same way.

The Moving Annual Totals (MAT) are found by totalling the monthly sales figures over a twelve-month period. As each new month's sales figure becomes known, it is added to the MAT and the figure for the corresponding month of the previous year is dropped. So the MAT at the end of December 1998 is the total of the monthly sales from the start of January 1998 to the end of December 1998, which comes to £18 220. The MAT at the end of January 1999 is the total of the monthly sales from the start of February 1998 to the end of January 1999. You can get this figure by subtracting the January 1998 sales figure from £18 220 and adding the January 1999 figure: 18 220 - 1680 + 1990 = 18 530.

Notice that the cumulative total and the MAT for December 1999 must come to the same figure, because both cover the twelve months from January 1999 to December 1999.

The data in columns 3, 4 and 5 of the table can now be built into a graph, to enable the fluctuations to be more easily noticed.

[Figure: Z chart of the data in columns 3, 4 and 5]

Cross-section data example

UK Labour Market Participation of Ethnic Groups (1996)

The UK workforce comprised the following ethnic minority proportions:
Using the table giving data for Labour Market Participation rates of Ethnic Minority Groups, we can calculate that Black Caribbean workers in 1996 accounted for 19.4% of the total ethnic minority workforce in the UK.

Extension to percentages in cross-sectional data material

If we had incomplete information, for instance if the total number of people in all categories was unknown, we could still find it, as long as we knew the total number of Black Caribbean workers in the UK and what percentage of the total ethnic minority working population this comprised.

Let the total ethnic minority working population in the UK be $x$. Then:

$19.4\% \times x = 320\,000$

This is the same as:

$0.194x = 320\,000$

$x = 320\,000 / 0.194 \approx 1\,649\,485$

$x \approx 1\,650\,000$ (rounded to the nearest ten thousand)

Ratios: A Worked Example

Alice, Ben and Charlie enter into a partnership. Profits from the partnership are to be divided amongst the partners in proportion to their capital. In a year when the profit amounts to £30 000, how much does each receive?

Ratio of Alice's, Ben's and Charlie's capital = 5 : 3 : 4

There are 12 parts in total.

Alice will receive 5/12 x 30 000 = £12 500
Ben will receive 3/12 x 30 000 = £7 500
Charlie will receive 4/12 x 30 000 = £10 000

(Confirm the answer by adding these figures: 12 500 + 7 500 + 10 000 = 30 000)

Exponents: A Worked Example

Remember that we are illustrating exponents (see the explanation of exponents if you are not sure) by using $a$, $m$ and $n$, where $m = 4$ and $n = 2$.

$a^m \cdot a^n = a^{m+n}$

$a^4 \cdot a^2 = (a \cdot a \cdot a \cdot a)(a \cdot a) = a \cdot a \cdot a \cdot a \cdot a \cdot a = a^6 = a^{4+2}$

Sequence of Operations: A Worked Example

$4 \times 4^2 - (2 + 1 \times 2^2)^2 / 3 = 4 \times 16 - (2 + 4)^2 / 3 = 64 - 36 / 3 = 64 - 12 = 52$

Central Tendency: A Worked Example

Table 1 contains information on the levels of unemployment in the UK between 1992 and 1996.
Table 1: ILO unemployment rate 1992 (Q2) - 1996 (Q2) : UK: All: Aged 16 and over: %: SA

Taking this data as our source, the modal value, or the most frequently occurring number, can best be observed by ordering the data, as follows:
Table 2: 1992 - 1996 unemployment rates (%) placed in value order.

You can see that there are actually 5 modal values: 8.3, 8.7, 10.0, 10.3, and 10.4. This does not convey much useful information to us as students of labour market patterns in the UK.

The median value is also best viewed from the ordered data. You can see that the middle value of the ordered data is 9.7, as there are eight observations above and eight below this point.

The mean is probably the best central measure because there are no outlying observations (see the 'Explanation' section).

X-bar (the mean unemployment rate in the UK 1992 - 1996) = 161.2 / 17
Mean = 9.5 (correct to 2 s.f.)

Weighted Average: A Worked Example

The Weighted Average was described in the 'Explanation' section (see the explanation of indices if you are not sure). It is a useful way of calculating the average when you have grouped data, such as in the following example:
Table 3: Cars per household, %, 1996, UK.

In Table 3, above, the percentage of households owning 0 to 4 cars is shown. To find the average number of cars per household, we multiply each possible number of cars by the percentage in that category (the frequency). Column 3 shows this calculation, where Xi is the number of cars and fi is the percentage in that category. The average is the sum of this column, divided by the total frequency (100, as the data is in percentages).

131 / 100 = 1.31

The formula for the weighted average is as follows:

$\bar{X} = \dfrac{\sum f_i X_i}{\sum f_i}$

Indices: A Worked Example

An index is a means of comparing something that is numerically measurable over time, to quantify the changes that have occurred. Indices use a base year as the point of comparison. If we consider the performance of a company's share price over a year, we could find that the price dipped at the midpoint of the year before rallying at the year end. This might be expressed as follows:

Index of share price for Company X (Jan = 100)
To show how easy it is to change the base with indexed data, let's re-base this company's share performance to December.
Now, in order to work out the value of January's and June's indexed performance, we perform the following calculation:

January = 100/123 x 100 = 81.3
June = 84/123 x 100 = 68.3

So, the new indexed share price table appears as follows:
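The re-basing arithmetic can also be sketched in a few lines of Python. This is only an illustration: the function name `rebase` and the dictionary layout are assumptions made for the example, and the index values (January 100, June 84, December 123) are the ones quoted in the calculation above.

```python
def rebase(index_values, new_base):
    """Re-express every index value relative to a new base period (= 100)."""
    base_value = index_values[new_base]
    return {period: round(value / base_value * 100, 1)
            for period, value in index_values.items()}


# Share price index with January = 100, using the figures quoted in the text.
share_index = {"Jan": 100, "Jun": 84, "Dec": 123}

# Re-base the series so that December = 100.
rebased = rebase(share_index, "Dec")
for period, value in rebased.items():
    print(period, value)
```

Dividing every value by the new base value and multiplying by 100 leaves the relative movements between periods unchanged; only the reference point moves.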
Why not try the worksheet on index numbers to check you understand this?

Measures of Dispersion: Worked Examples

To illustrate the concepts of the range, the mean absolute deviation and the standard deviation, we will continue to use the data in Table 1 above.

The range, as we saw in the explanation section, is simply the difference between the lowest and highest observations. The data in Table 1 run from 8.3 to 10.6, so the range, that is the difference between the highest and lowest unemployment rates in the UK between 1992 and 1996, is 10.6 - 8.3 = 2.3.

In order to calculate the mean absolute deviation (MAD), we must construct another table from which we will be able to read off this statistic.

Table 4: Unemployment Rates in the UK
Source: National Statistics

In Table 4 above, column 3 gives the difference between each observed value and the mean. Note that we have already calculated the mean earlier. Column 4 states the values when the sign is ignored. These "absolute" values are summed at the foot of this column. This figure is divided by the number of observations, which in this case is 17:

MAD = 12.3 / 17 = 0.72 (correct to 2 s.f.)

To interpret this statistic we consider its size. A larger value implies a larger dispersion.

Standard deviation

The standard deviation can be calculated from the data in Table 4 as well. Column 5 gives the squared value of the difference between the observed value and the mean. These values are summed at the foot of the column. This sum is divided by the number of observations (17) to give the variance, and the square root of the variance gives the standard deviation.

$s^2 = 11.54 / 17 = 0.68$

$s = \sqrt{0.68} = 0.82$

Again, the larger the value, the larger the deviation. The standard deviation carries the same units of measurement as the original data, so here the standard deviation = 0.82 percentage points.

What does this statistic tell us? This is a hard question to answer. On its own, the standard deviation is useful mainly as a means of comparison with other standard deviations, bearing in mind that, because the deviations are squared, the larger deviations within the data carry more weight. Elsewhere in TimeWeb we will be using standard deviations for more advanced statistical work, but for now we must merely say that the s.d. is the most frequently used measure of dispersion and is used when more formal statistical testing is required.

Coefficient of Variation: A Worked Example

As you can see in the 'Explanation' part of this section, the coefficient of variation is a summary measure used to give an indication of the amount of variability present in the data. It is calculated by expressing the standard deviation as a percentage of the mean.
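As a rough sketch, the dispersion measures in this section can be reproduced from the column totals quoted in the text (17 observations, summed absolute deviations of 12.3, summed squared deviations of 11.54, mean of 9.5). The function names are assumptions made for the illustration, and the individual observations are not used because only the totals are quoted.

```python
import math


def mean_absolute_deviation(sum_abs_dev, n):
    """MAD: the summed absolute deviations divided by the number of observations."""
    return sum_abs_dev / n


def standard_deviation(sum_sq_dev, n):
    """Square root of the variance (summed squared deviations divided by n)."""
    return math.sqrt(sum_sq_dev / n)


def coefficient_of_variation(std_dev, mean):
    """The standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


n = 17  # number of observations quoted in the text
mad = mean_absolute_deviation(12.3, n)
std = standard_deviation(11.54, n)

# The text works with rounded figures, so the rounded s.d. is used here too.
cv = coefficient_of_variation(round(std, 2), 9.5)

print(round(mad, 2), round(std, 2), round(cv, 1))
```

Note that dividing by n, as the text does, gives the population form of the standard deviation; a sample estimate would divide by n - 1 instead.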
Using the above example:

Coefficient of Variation = 0.82 / 9.5 x 100 = 8.6%

This statistic would be used to compare against other datasets, so that a judgement can be made about the amount of variability in one dataset as against another.

Coefficient of Skewness

As discussed in the 'Explanation' section, this summary statistic indicates the tendency for values in a dataset to bunch at one end of a distribution. Using the second formula given and applying it to the unemployment data given above, we can calculate that:

Coefficient of skewness within the data for UK unemployment 1992 - 96 = 3(9.5 - 9.7) / 0.82 = -0.73

Moving Averages: A Worked Example

As we saw in the 'Explanation' section, a moving average allows us to "smooth" out a series of data so that the underlying movement over time can be seen. Using the new car registrations data contained in Table 5 below, we can illustrate how to construct a moving average for this highly seasonal data.

Table 5: UK New Car Registrations 1994 (Q4) - 1999 (Q3)
Source: National Statistics

The first decision is a matter of judgement: over what period should the average be calculated? The longer the period, the smoother the series becomes, as each individual piece of information becomes less significant on its own. The drawback of averaging over a long period is that changes in the underlying trend are not picked up quickly. With this new car data, we want to remove the seasonality from the data whilst still seeing the trend year-on-year. Averaging over four consecutive quarters should enable us to achieve this.

Having decided this, the first step is to average the first four observations. This gives us the moving average for the mid-point of these four observations. In this example this is:

(400.6 + 613.9 + 511.7 + 748.3) / 4 = 568.6

The next step is to move the average along the dataset, by dropping the first observation and including the next period's data, then averaging over the four observations again:

(613.9 + 511.7 + 748.3 + 432.7) / 4 = 576.65

These calculations are shown in Table 6 below:
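The "average four, then slide along by one" procedure can be sketched as a short function. This is a minimal illustration rather than the method used to build Table 6: the name `moving_average` is an assumption, and only the five observations quoted in the text are used.

```python
def moving_average(values, window):
    """Simple moving averages of `values`, one per full window of `window` periods."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# The first five quarterly observations quoted in the text (1994 Q4 onwards).
registrations = [400.6, 613.9, 511.7, 748.3, 432.7]

# A four-quarter window: each average drops the oldest quarter and adds the next.
averages = moving_average(registrations, 4)
for value in averages:
    print(round(value, 2))
```

With five observations and a window of four there are only two averages; a longer series would produce one average for every position the window can occupy.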
Why not try the worksheet on moving averages to see how well you understand this?

Sampling Methods and Survey Types

One of the world's best-known polling organisations, Gallup, says that one of the questions it is most frequently asked by Americans is why they have never been interviewed for a survey. In an adult population of almost two hundred million, Americans express scepticism about the scientific reliability of sampling. In particular, they do not believe that a survey of 1500 - 2000 people can represent the views of all citizens. Gallup's sampling principle is that a sample made up of a small proportion of the whole population can represent the opinions of all the people, provided that the sample is properly selected.
Sampling: Further examples
Correlation between variables

Let's start by looking at how a scatter diagram can illustrate these relationships:
Normal Distribution Curve illustration

The chart below illustrates a normally distributed population. You will notice that the curve conforms to the characteristics outlined in the explanation section: the most frequent value is at the centre; there is symmetry about the central value; there is diminishing frequency as you move away from the centre.

A line is drawn from each of the two points of inflexion (one on either side of the mean) to the X-axis. The distance from that point to the mean point on the X-axis is equal to the standard deviation. Four separate areas are now identifiable from the chart:
Area A shows the area between the mean and one standard deviation above the mean.
Area B shows the area between the mean and one standard deviation below the mean.
Area C indicates the area to the right of one standard deviation above the mean.
Area D indicates the area to the left of one standard deviation below the mean.

Because the normal curve is symmetrical, Area A equals Area B. Areas C and D are also equal. The total of A, B, C and D equals the total area under the curve, or the entire population.

Mathematical calculations show that in any normal distribution, approximately 68% of all observations fall within one standard deviation (SD) of the mean (Areas A plus B). So, about 34% of observations lie between the mean and one standard deviation above the mean (Area A) and 34% lie between the mean and one standard deviation below the mean (Area B). By subtraction, we can tell that in a normal distribution 32% of the observations fall outside one standard deviation, 16% on either side (16% in Area C and 16% in Area D).

Let's now put this into the language of probability. In any normal distribution, there is a 0.68 probability that a particular value will fall within one standard deviation of the mean; there is approximately a 0.34 probability that a value will lie between the mean and one SD above the mean (Area A) and a 0.34 probability that a value will lie between the mean and one SD below the mean (Area B). Also, there is a 0.16 probability that a particular value will lie more than one SD above the mean (Area C) and a 0.16 probability that the value will lie more than one SD below the mean (Area D).

Using this knowledge, we can re-draw our normal curve chart, now putting in six separate areas:
The vertical lines from the curve to the X-axis represent the mean (at the centre) and distances of one and two SDs on either side of the mean. Areas A and B have the same characteristics as in the first chart, each being equal and each containing approximately 34% of all the values in the normal distribution. Areas C and D are also equal and are defined by the vertical lines indicating one and two SDs from the mean (on either side). Each of these areas contains approximately 13.5% of all the values in the normal distribution. Areas E and F, at the extreme ends of the curve, are defined by the vertical lines indicating two SDs from the mean and the tail ends of the distribution. Each of these areas contains approximately 2.5% of all the values. In other words, in a normal distribution, about 5% of a population will be beyond two SDs: 2.5% above the mean and 2.5% below.

Let's restate this information in the language of probability:
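The rounded percentages quoted for each area can be checked numerically. The sketch below uses the standard identity that the probability of a normal value lying within k standard deviations of the mean is erf(k / sqrt(2)); the function and variable names are assumptions made for the illustration.

```python
import math


def prob_within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


within_one = prob_within(1)                    # Areas A + B
within_two = prob_within(2)                    # Areas A + B + C + D

area_a = within_one / 2                        # mean to one SD above (same as Area B)
area_c = (within_two - within_one) / 2         # between one and two SDs on one side
area_e = (1 - within_two) / 2                  # beyond two SDs on one side

print(f"A = {area_a:.4f}, C = {area_c:.4f}, E = {area_e:.4f}")
```

The exact tail area beyond two SDs is a little under the rounded 2.5% quoted in the text; the round 2.5% figure corresponds more closely to 1.96 SDs.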
Why not try the 'What samples tell us' worksheet to check that you understand this?

Random Sampling

Random sampling is usually the preferred method of sampling, because it involves no built-in bias. This method requires that a list of every member of the population is available. There are times when this will be impossible, for instance when an entire national or regional population is involved, or if you are studying the whole population of small businesses in the UK. In these cases, the simple random sampling method outlined below will not be appropriate.

In a simple random sample, with a list of the entire population being studied, the sampler gives a number to every item on the list and selects the sample by using a random number generator or a table of random numbers.
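Where a random number generator is used instead of a printed table, the selection might look like the following sketch. The population size of 1000 and the sample size of 30 are illustrative assumptions, as are the function name and the fixed seed (used only so that the run is repeatable).

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement from a full population list."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Give every item on the list a four-digit number: 0001, 0002, ..., 1000.
population = [f"{i:04d}" for i in range(1, 1001)]

# Select 30 distinct items; every item has the same chance of being chosen.
sample = simple_random_sample(population, 30, seed=1)
print(sorted(sample))
```

`random.Random.sample` selects without replacement, so no item can appear in the sample twice.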
Imagine you want to study all the cars being stored in a warehousing complex, but you don't have the time or other resources to deal with them all. You might decide to work with a sample of 30 cars out of a total warehouse population of 1000. So, you begin by assigning a number to every member of the total population. As the largest number you need (1000) has four digits, every car in the warehouse is given a four digit number, beginning with 0001, 0002, 0003 and so on, up to 1000. You look at your list of random numbers, which looks like the following:
You begin the selection by pointing (with your eyes closed) to an area in the table. Imagine you point to line 10 (the lines are numbered down the left-hand side of the table). The first possible four digit number between 0001 and 1000 is 0177. Notice that as the table contains five digit numbers, it's acceptable to start by taking the fifth digit of the first number in line 10. The second four digit number is 0568. You would continue down the table, gathering four digit numbers until you had collected thirty numbers between 0001 and 1000. Each of these would represent one car in the warehouse, chosen at random to form a sample of thirty cars.

There is less bias in this selection method because every member of the population has an equal chance of being selected and represented in the sample. You have made no attempt to organise the population into sections, so the selection process is free from your direction.

Probability

Jacques Bernoulli was the first to suggest what is now known as the 'law of large numbers', which is based on his work on probability. Imagine that you have a container that holds thousands of pebbles; you don't know how many there are, nor do you know that of the 5000 pebbles, 3000 are white and 2000 are black. The ratio of white to black pebbles is therefore 3:2. Bernoulli asked how many pebbles you would need to draw from the container before you could make an estimate of the actual ratio of white to black pebbles. Of course you would begin to get a fairly clear idea pretty soon, as you picked out a pebble, noted its colour and then replaced it in the container. But the key to the theorem is whether or not you can repeat the experiment over and over until it's ten, or one hundred, times more probable that the 3:2 ratio exists. Bernoulli states that this is the case: the more experiments are carried out, the more likely it is that the estimated ratio will get close to the true ratio.
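Bernoulli's pebble experiment can be imitated with a small simulation. This is a sketch under stated assumptions: each draw is modelled as an independent trial with probability 3000/5000 of white (which is what drawing with replacement amounts to), and the function name, the seed and the chosen numbers of draws are illustrative.

```python
import random


def estimate_white_fraction(n_draws, white=3000, black=2000, seed=0):
    """Estimate the share of white pebbles from n_draws draws with replacement."""
    rng = random.Random(seed)
    p_white = white / (white + black)
    # Each draw is white with probability p_white; the pebble is then put back.
    whites_seen = sum(rng.random() < p_white for _ in range(n_draws))
    return whites_seen / n_draws


# The true share of white pebbles is 0.6 (a 3:2 ratio of white to black).
for n in (10, 100, 1000, 10000, 100000):
    print(n, estimate_white_fraction(n))
```

Individual runs with few draws can land well away from 0.6; it is only as the number of draws grows that estimates far from the true share become increasingly unlikely.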
Time series

To identify trends in time series data, other than by drawing a trend curve onto a graph freehand, there are two common measures used: