Worked Examples of Sample Means and Degrees of Freedom [TimeWeb]

Sampling and Statistics


Degrees of Freedom example

An explanation of degrees of freedom is available if you are unsure what the term means.

A random sample of 16 light bulbs is selected from a larger batch. The mean lifetime of the sample is 1450 hours and the estimated standard deviation (SD) is 80 hours. Estimate the population mean at the 95% confidence level.

SE of the sample mean = 80 / √16
= 80 / 4 = 20 hours

Number of degrees of freedom = 16 - 1
= 15

The critical t value (read from the t distribution tables) for a 95% confidence level with 15 degrees of freedom = 2.13
So the margin of error = 2.13 × 20 ≈ 43 hours
So we can be 95% confident that μ (the population mean) lies within the range

1450 +/- 43 = 1407 to 1493 hours.
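The same arithmetic can be sketched in Python. This is a minimal sketch using only the standard library, which has no t distribution, so the critical value 2.131 (the tabulated value that the text rounds to 2.13) is hard-coded as an assumption. With SciPy installed, scipy.stats.t.ppf(0.975, 15) would supply it instead.

```python
import math

# Figures from the light bulb example.
sample_mean = 1450.0  # hours
sample_sd = 80.0      # estimated SD, hours
n = 16

# Assumption: two-sided 95% critical t value for 15 degrees of freedom,
# copied from t tables because the standard library has no t distribution.
T_CRIT_95_DF15 = 2.131

standard_error = sample_sd / math.sqrt(n)          # 80 / 4 = 20 hours
degrees_of_freedom = n - 1                          # 15
margin_of_error = T_CRIT_95_DF15 * standard_error   # about 42.6 hours

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"SE = {standard_error:.1f} hours, df = {degrees_of_freedom}")
print(f"95% confidence interval: {lower:.0f} to {upper:.0f} hours")
```

Rounded to whole hours, this reproduces the 1407 to 1493 hour range above.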



 

Example of a collection of sample means (s-means)

Assume that we can properly identify a sample from a large population that we are interested in studying, by using random, quota or stratified sampling techniques outlined earlier.

We are interested in collecting a representative sample of a large population: for instance, people in the workforce who are aged under eighteen. Let's say we want to find out how many hours per week this group works on average.

Imagine that we sample a group of thirty people under the age of eighteen who are in some form of paid work. We have a group of numbers that represent the number of hours worked in a week by each of the thirty people in our sample. We can then calculate the mean of this sample, either by adding up all the values and dividing by the total number in the sample, or by entering the values into an Excel worksheet and letting it do the calculation.

Now, suppose that in our desire to produce as representative a sample as possible within the time and cost constraints of our project, we continue to draw samples of thirty people under the age of eighteen. We use the same random process as with our first sample and make sure that we do not include in the samples anyone who was part of the earlier samples.

What we have produced is a collection of sample means, one for each of the samples we have drawn from the population. These are quite likely to be fairly close to each other in value, but there will be some differences. In other words, the collection of means from the various samples taken will have a frequency distribution (with a mean value, a median, a variance and a standard deviation).
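The process can be imitated with a small simulation. This sketch assumes a hypothetical population of 3,000 weekly-hours values generated at random (not real survey data) and draws ten non-overlapping samples of thirty, so nobody appears in more than one sample. The sample means come out close to each other but not identical, and their spread is much smaller than the spread of the individual values.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: weekly paid hours for 3,000 under-eighteens.
# Simulated for illustration only; negative draws are clipped to zero.
population = [max(0.0, random.gauss(6.0, 3.0)) for _ in range(3000)]

SAMPLE_SIZE = 30
NUM_SAMPLES = 10

# Draw all 300 people at once without replacement, then split them into
# groups of thirty, so no person appears in more than one sample.
drawn = random.sample(population, SAMPLE_SIZE * NUM_SAMPLES)
samples = [drawn[i:i + SAMPLE_SIZE] for i in range(0, len(drawn), SAMPLE_SIZE)]

sample_means = [statistics.mean(s) for s in samples]
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i}: S-Mean = {m:.2f} hours")

print(f"Mean of the S-Means: {statistics.mean(sample_means):.2f} hours")
print(f"SD of the S-Means:   {statistics.stdev(sample_means):.2f} hours")
print(f"SD of individuals:   {statistics.pstdev(population):.2f} hours")
```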

Let's suppose that the following table represents the ten samples and their means:

Sample 1: S-Mean = 6.25 hours
Sample 2: S-Mean = 6.50 hours
Sample 3: S-Mean = 6.00 hours
Sample 4: S-Mean = 7.75 hours
Sample 5: S-Mean = 4.50 hours
Sample 6: S-Mean = 8.00 hours
Sample 7: S-Mean = 3.50 hours
Sample 8: S-Mean = 9.25 hours
Sample 9: S-Mean = 4.75 hours
Sample 10: S-Mean = 6.50 hours

Remember that each of these S-Means is the average for a sample of thirty under-eighteens who carry out some form of paid work. The list of numbers will have a mean value and a standard deviation. Can you enter these into an Excel worksheet to calculate these values?

[Figure: Excel screenshot]

The mean value in this case is 6.3 hours and the (sample) standard deviation is 1.74 hours. Check that you get the same result using a spreadsheet package.
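The same check can be done with Python's standard statistics module, as a minimal sketch. The 1.74 figure is the sample standard deviation, which divides by n - 1 (statistics.stdev, or STDEV.S in Excel); dividing by n instead (statistics.pstdev, or STDEV.P) gives a slightly smaller value.

```python
import statistics

# The ten sample means (hours per week) from the table above.
s_means = [6.25, 6.50, 6.00, 7.75, 4.50, 8.00, 3.50, 9.25, 4.75, 6.50]

mean_of_means = statistics.mean(s_means)
sample_sd = statistics.stdev(s_means)       # divides by n - 1
population_sd = statistics.pstdev(s_means)  # divides by n

print(f"Mean: {mean_of_means:.2f} hours")            # Mean: 6.30 hours
print(f"Sample SD: {sample_sd:.2f} hours")           # Sample SD: 1.74 hours
print(f"Population SD: {population_sd:.2f} hours")   # Population SD: 1.65 hours
```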

Of course, if we kept collecting samples like the ten in this example, eventually we would have sampled the entire population (as long as we made sure that no under-eighteen appeared in more than one sample). Because every sample contains the same number of people, the average of all our sample means would then equal the average for the whole population, since taken together our samples would make up the whole population.
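A quick simulation illustrates this. The sketch assumes a hypothetical population of 3,000 values (randomly generated, not real data), which divides exactly into 100 samples of thirty. After randomly allocating everyone to exactly one sample, the mean of the 100 sample means matches the population mean, apart from floating-point rounding. The match relies on all samples being the same size.

```python
import random
import statistics

random.seed(2)

# Hypothetical population of 3,000 weekly-hours values (simulated).
# 3,000 divides exactly into samples of 30.
population = [round(random.uniform(0, 15), 2) for _ in range(3000)]

SAMPLE_SIZE = 30
shuffled = population[:]   # copy, so the original list is left unchanged
random.shuffle(shuffled)   # random allocation of people to samples

# Split the whole population into non-overlapping, equal-sized samples.
samples = [shuffled[i:i + SAMPLE_SIZE] for i in range(0, len(shuffled), SAMPLE_SIZE)]

mean_of_sample_means = statistics.mean(statistics.mean(s) for s in samples)
population_mean = statistics.mean(population)

print(len(samples), "samples of", SAMPLE_SIZE)
print(f"Mean of sample means: {mean_of_sample_means:.6f}")
print(f"Population mean:      {population_mean:.6f}")
```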

In practice, we don't have the time or the money to conduct such a huge sampling task and in most cases, we don't have to.

A worksheet on 'what samples tell us' is available.

  • Standard Errors and Increasing Sample Size

    As we have seen, we can take a single sample of 30 or more items and draw conclusions about the large population from which it is drawn.

    As we find out more about the standard error, we can notice other interesting details that should aid our understanding of statistics in practice.

    Firstly, the width of the confidence interval depends on the size of the standard error. So, if we can minimise the standard error, we can narrow the range of values at each confidence level, thus producing more precise conclusions.

    We calculate the standard error by dividing the standard deviation of the sample by the square root of the number of observations in the sample. Because of the square root, increasing the number in the sample has only a modest effect on the size of the standard error.

    You may ask how much you would have to increase the sample size to have any significant impact. Increasing the sample size does narrow the range of results, but the sample size has to be increased so dramatically that the extra cost and time often make it impractical.

    This can be illustrated by the following example: if we were studying a sample of 100 students and their exam performance, and the standard deviation of the marks was, say, 14, then we could calculate the standard error by dividing the standard deviation by the square root of the number in the sample. So, 14 divided by the square root of 100, or 14 divided by 10 = 1.4.

    This means that in estimating the confidence intervals for the entire population of students, we use the figure of 1.4 marks as the basis of the intervals to calculate the ranges for .68, .95 and .99 probability.

    We might think that we ought to try to reduce the range in order to get a more precise result. We could increase the sample size, in order to increase the size of its square root and therefore reduce the size of the standard error.

    But, because we are dealing with the square root of the number in the sample, we find that to have any significant impact on the standard error, we would have to increase the sample size considerably.

    So in the example given above, we were studying a sample of 100 students and found a standard error of 1.4 by dividing the standard deviation of the sample (14) by the square root of 100 (10). If we wanted to halve the standard error (to 0.7), we would have to divide 14 by 20. To do this we would have to sample 400 students, as the square root of 400 is 20.

    Do you see the relationship here between sample size and the size of the standard error? In order to halve the standard error, we have to make the sample four times as large as it was originally.

    What we should learn from this is that in many cases it is not worth the effort of increasing the sample size to achieve more precise results. Bear in mind that the most time-consuming part of the analysis is collecting the sample data, so it is usually more efficient to keep the sample relatively small (as long as it contains at least 30 items) and to focus our efforts on gathering the best sample we can. This means, of course, ensuring that our sample is as free as possible from bias.
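The square-root relationship in the exam example can be checked numerically. This minimal sketch uses the standard deviation of 14 marks from the text; the function name standard_error is our own, not from any library. Each step quadruples the sample size, and the standard error halves each time, from 1.4 to 0.7 to 0.35 marks.

```python
import math

SD = 14.0  # standard deviation of the exam marks in the example

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: the SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Each step multiplies the sample size by four.
for n in (100, 400, 1600):
    print(f"n = {n:5d}   SE = {standard_error(SD, n):.2f} marks")
```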

  • Review of confidence interval analysis of a population from a single sample.

    It may be wise here to review the steps we should take in making generalisations, within confidence levels, about an entire population from a single sample:

    1. Firstly, we select a sampling strategy, which usually means a random sample, and select our sample, making sure that we have at least 30 observations within it.

    2. Then we collect the information from the sample and process it (using Excel or a similar spreadsheet package) to find the mean of the sample and the standard error of that mean.

    3. Finally, we draw conclusions at the different confidence levels: 68% for a range within plus or minus 1 standard error of the mean of the sample; 95% for a range within plus or minus 2 standard errors (more precisely 1.96); and 99% for a range within plus or minus about 2.6 standard errors (more precisely 2.58; plus or minus 3 standard errors corresponds to roughly 99.7%).
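Steps 2 and 3 can be sketched as a single function. This is a minimal sketch under stated assumptions: the name confidence_intervals is hypothetical, it applies the normal approximation that the text uses for samples of at least 30, and it takes exact normal multipliers from statistics.NormalDist (about 1.96 for 95% and 2.58 for 99%) rather than the rounded values 2 and 3. The usage example runs on simulated exam marks, not real data.

```python
import math
import random
import statistics
from statistics import NormalDist

def confidence_intervals(data, levels=(0.68, 0.95, 0.99)):
    """Return {level: (low, high)} for the population mean from one sample.

    Assumes the normal approximation, which the text applies to samples
    of at least 30 observations.
    """
    if len(data) < 30:
        raise ValueError("this normal approximation assumes at least 30 observations")
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    intervals = {}
    for level in levels:
        # Two-sided multiplier: about 0.99 for 68%, 1.96 for 95%, 2.58 for 99%.
        z = NormalDist().inv_cdf(0.5 + level / 2)
        intervals[level] = (mean - z * se, mean + z * se)
    return intervals

if __name__ == "__main__":
    random.seed(3)
    # Hypothetical exam marks for 100 students (simulated, not real data).
    marks = [random.gauss(60, 14) for _ in range(100)]
    for level, (low, high) in confidence_intervals(marks).items():
        print(f"{level:.0%}: {low:.1f} to {high:.1f} marks")
```

For samples smaller than 30, the light bulb example at the start of this page shows the t-based alternative.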
