ILLUSTRATION
Significance Tests

Let's look at testing a claim of the first type (see the analysis of the types of claim in the explanation section). Imagine a call centre dealing with enquiries from customers who hold insurance policies with a particular financial institution. Suppose the staff report that, due to frequent computer system faults, their work is held up for an average of 100 minutes per month. As a general manager, what can you conclude from a study of 75 staff which shows an average downtime of 85 minutes, with a standard deviation of 45 minutes?

The Null Hypothesis here is that there is no difference between the claim made by the call centre staff and the results of the survey. In other words, although the average downtime in the sample is lower than the staff's claim, the difference between that sample average and their claim is not significant. The Alternative Hypothesis is that there is a difference and that, therefore, the staff's claim is not valid. What we now need to establish is the probability that the results in the sample come from the same population as that in the staff's claim.

What you have to do first is to calculate the standard error of the sample. You ought to remember that this tells you the standard deviation in the normal curve of the averages of all the samples you could take. The standard error is arrived at using the following formula:

Standard Error = Standard Deviation of the sample divided by the square root of the number in the sample.

In this example, the Standard Error is 45 divided by the square root of 75, which is 45 divided by 8.66, or 5.20 minutes.

Now, the workers claim that the average downtime is 100 minutes. The sample gave the result for the same measurement as 85 minutes. The difference between the sample mean and the mean in the staff claim is 15 minutes. So how significant is this difference? The standard error of 5.20 is the standard deviation in the normal curve of sample means.
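
The standard error calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `standard_error` is a hypothetical helper chosen for illustration, and the figures are the ones quoted for the call centre study.

```python
import math

def standard_error(sample_sd, n):
    """Standard deviation of the sample divided by the square root of the sample size."""
    return sample_sd / math.sqrt(n)

# Call centre study: standard deviation 45 minutes, 75 staff in the sample.
se = standard_error(45, 75)
print(f"Standard error: {se:.2f} minutes")  # prints "Standard error: 5.20 minutes"
```
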
So the difference of 15 minutes is equivalent to 15 divided by 5.20, or 2.88 standard errors. Put another way, the difference of 15 minutes has a z-score of 2.88 in the normal curve of sample means (strictly -2.88, since the sample mean lies below the claimed mean).

A z-score of 2.88 lies beyond two standard deviations. And we know that approximately 95 per cent of all normally distributed results will occur within 2 standard deviations of the mean (i.e., for z-scores between -2 and +2). What's more, we know that of the 5 per cent of the results which lie beyond 2 standard deviations, 2.5 per cent will be above the mean and 2.5 per cent will be below the mean. So, in this case, we know that the sample lies in an area with only a 2.5 per cent probability (p = .025): that is, the area to the left of the mean, more than 2 standard deviations away from it.

This means, in effect, that there is only a 2.5 per cent chance (p = .025) that our sample comes from the same distribution as that described in the staff's claim. Therefore, we can say with approximately 95 per cent certainty that the sample we took is not from the same population as that put forward in the staff's claim. So, finally, we can say that the Null Hypothesis is not valid, that the Alternative Hypothesis is accepted and that the staff's claim is false.

How many hours does it take (before you have) to change a light bulb?

A manufacturer of light bulbs makes the claim that the average life of their product is 75 hours. We take a random sample of the product (50 light bulbs) and discover that the average life of this sample is 69 hours, with a standard deviation of 15 hours. Are we in a position to say that the manufacturer's claim is unfounded, or does the evidence of the sample test uphold the claim?

The Null Hypothesis here is that the sample we have taken comes from the same population of light bulbs as those described in the manufacturer's claim.
The Alternative Hypothesis is that the sample indicates that the population of bulbs that provided the sample is different from the one described and that, therefore, the manufacturer's claim is false.

You calculate the standard error of the mean, which is 15 divided by the square root of 50, or 15 divided by 7.1, which equals 2.1 hours.

So now we can look at the manufacturer's claim, which is that the mean life of their light bulbs is 75 hours. How confident can we be that the sample result of 69 hours challenges that claim? To estimate that probability we need to find out how far an average of 69 hours is from the mean life claimed by the manufacturer. That value is 6 hours below the manufacturer's stated mean, or 6 units to the left of the manufacturer's mean on the normal curve. What is the z-score for this -6 hours? It is -6 divided by 2.1, or about -2.86, which is lower than -2. We know that a z-score of -2 or lower (a score falling 2 or more standard deviations to the left of the mean) is illustrated on the normal curve by an area that cuts off 2.5 per cent of values from the rest of the curve. So, the mean of our sample falls in the tail end of the normal curve representing all the sample means, in an area representing frequency values of 2.5 per cent.

To summarise, there is a 2.5 per cent probability (or p = .025) that the Null Hypothesis is correct and that the manufacturer's claim is true. There is a 97.5 per cent probability (p = .975) that the Alternative Hypothesis is correct and that the claim is false. It may help to put this statement another way: the risk of dismissing the Null Hypothesis is 2.5 per cent (p = .025). If we do dismiss the Null Hypothesis (and the manufacturer's claim), we have a .025 chance of being wrong.

Is that a risk worth taking? That will depend on how certain we want to be before making the decision. In other words, we will need to set a level of significance.
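
The z-score reasoning in the two examples can be sketched in Python as follows. The helper names `z_score` and `lower_tail_area` are hypothetical, and the sketch assumes, as the text does, that sample means follow a normal curve. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed z-score table.

```python
import math
from statistics import NormalDist

def z_score(sample_mean, claimed_mean, sample_sd, n):
    """Distance of the sample mean from the claimed mean, in standard errors."""
    se = sample_sd / math.sqrt(n)
    return (sample_mean - claimed_mean) / se

def lower_tail_area(z):
    """Proportion of the standard normal curve lying to the left of z."""
    return NormalDist().cdf(z)

# Call centre: claimed mean 100 minutes, sample mean 85, sd 45, n = 75.
z_call = z_score(85, 100, 45, 75)
# Light bulbs: claimed mean 75 hours, sample mean 69, sd 15, n = 50.
z_bulb = z_score(69, 75, 15, 50)

for label, z in (("call centre", z_call), ("light bulbs", z_bulb)):
    print(f"{label}: z = {z:.2f}, area below z = {lower_tail_area(z):.4f}")

# The round figure of 2.5 per cent corresponds to the area beyond z = -2.
print(f"area below z = -2: {lower_tail_area(-2):.4f}")
```

Because the code does not round the standard error before dividing, the z-scores it reports can differ in the second decimal place from the hand calculations (for instance, about -2.89 rather than 2.88 for the call centre). Note also that the 2.5 per cent figure is the approximate area beyond 2 standard deviations; the exact area beyond each computed z-score is smaller than that.
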

Level of Significance

The level of risk we are willing to take when accepting or rejecting the Null Hypothesis depends upon the level of significance we wish to set. This means deciding upon the size of the risk we are prepared to take, or how stringent we want our test to be. Statisticians set arbitrary limits of .05 or .01; a significance level of .05 for rejecting the Null Hypothesis is not as stringent as a significance level of .01. The level of significance will usually be set out in the specifications of the test. How you apply these levels of significance requires a little more explanation, given in some of the examples below.

The level of significance we set shows how careful we are being in judging the Null Hypothesis. If the limit is .05, that means that if there is only a 5 per cent or less chance of being wrong, then we shall accept that as decisive. In such a case, our result from the light bulb example, which gave a probability of .025 that the Null Hypothesis is correct, is significant. At this level we can dismiss the manufacturer's claim, because .025 is less than .05. The significance level indicates the cut-off point, below which we should not accept a claim with such a low probability.

We might, though, have set a level of .01 probability as our significance point. This means we will not reject a claim unless its probability is 1 per cent or lower. Put another way, we won't reject a claim until we are 99 per cent or more certain that it is false. In that case, the result of our test of the light bulbs (p = .025) is above our significance level and we would not reject the claim.

Let's go over this point again. The analysis of the sample reveals that there is only a .025 probability that the population from which our sample was drawn is the same population as that described in the manufacturer's claim.
If we are confident enough at a probability of .95 (or 95 per cent), then we will accept this result as significant and assert that our sample establishes that we are .95 confident that the manufacturer's claim is false. If, however, the level of significance is set at .01, then we will reject the Alternative Hypothesis and accept the Null Hypothesis (the manufacturer's claim). This is because our result of .025 is not below the level of significance we have set.

Remember that if we decide that we are not going to take the risk of dismissing the manufacturer's claim because we are not sufficiently certain, we have, in effect, accepted the Null Hypothesis and rejected the Alternative Hypothesis. But that does not mean that we have clearly "proved" the manufacturer's claim. After all, we have established that the probability that the manufacturer's claim is right is only .025, or 25 cases out of 1000. Being careful, we would accept the manufacturer's claim but reserve judgment on the case in general.

One Tail and Two Tail Tests of Significance

Once we know the z-score for the difference between the sample mean and the mean established in the claim, we know whether or not to reject the Null Hypothesis. A z-score of between -2 and +2 tells us that the sample mean lies within 95 per cent of the overall population of sample means. A z-score between -3 and +3 gives us approximately 99 per cent certainty. Put another way, a z-score outside the range between -2 and +2 tells us that there is a less than .05 probability (or 5 chances in 100) that the sample mean comes from the general population. And a z-score beyond the range -3 to +3 indicates less than a .01 probability (or 1 chance in 100) that the sample mean comes from the general population.

Elsewhere in TimeWeb, we have stressed that these z-score figures are approximations.
The accurate figures are as follows: For a .05 level of significance, the exact z-score which marks the cut-off point is 1.64 (that is, -1.64 for a test in the lower tail of the curve, or +1.64 for a test in the upper tail). For a .01 level of significance the exact z-scores which mark the cut-off points are -2.33 and +2.33. Any result beyond the cut-off point we have selected for our level of significance will require us to reject the Null Hypothesis. Z-scores for other levels of significance can be read off the z-score table.
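
As an alternative to reading the table, cut-off z-scores can be computed directly. The sketch below is illustrative only: `one_tail_cutoff` and `two_tail_cutoff` are hypothetical helper names, and the calculation relies on `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later). The two-tail version splits the significance level equally between both ends of the curve.

```python
from statistics import NormalDist

def one_tail_cutoff(alpha):
    """z-score leaving a proportion alpha in a single tail of the normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

def two_tail_cutoff(alpha):
    """z-score leaving alpha / 2 in each of the two tails of the normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: "
          f"one tail +/-{one_tail_cutoff(alpha):.2f}, "
          f"two tail +/-{two_tail_cutoff(alpha):.2f}")
```

For a lower-tail test the one-tail cut-off is used with a negative sign; for an upper-tail test it is used with a positive sign.
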
There is help available on using a z-score table that you may want to have a look at.

Let's look at what these exact z-scores mean: A z-score of +2.33 marks the point separating 99 per cent of the distribution from the 1 per cent in the tail of the curve to the right of that score (above the mean). A z-score of -2.33 similarly marks the point separating 99 per cent of the distribution from the 1 per cent in the tail of the curve to the left of that score (below the mean). A z-score of +1.64 indicates the point in the normal curve separating 95 per cent of the distribution from the 5 per cent at the extreme right-hand end.

These particular z-scores (1.64 and 2.33) are useful when we are interested only in whether our sample mean is different in one direction (larger or smaller) from the mean in the claim. For example, in investigating the manufacturer's claim, we were not interested in whether, on average, the light bulbs last longer than claimed. We wanted to know whether the light bulbs failed to live up to specifications. In terms of the normal curve, we were interested only in one half of the distribution of the sample means, the lower half. We wanted to know, with 95 per cent certainty, whether that claim was true or not. To get such 95 per cent certainty, we needed to locate the line which separates the area representing 95 per cent of the normal curve from the lower 5 per cent. That line is given by the z-score of -1.64. If we wanted 99 per cent certainty, we would need the z-score which indicates the line separating the lowest 1 per cent from the rest of the distribution; that z-score is -2.33.

Similarly, in dealing with the call centre staff's claim about work downtime, the manager is not interested in whether or not the system breakdowns involve more downtime than they claim. The manager wants to find the point where there is 95 per cent certainty that the mean of the sample is less than the mean established in the claim.
Once again, the relevant point for 95 per cent certainty is given by a z-score of -1.64.

Such tests, in which we are interested only in one direction in the curve, are called one-tail tests. In them, the Alternative Hypothesis will involve a statement with the phrase "is less than" or "is more than," but not both.

In some tests, though, we are concerned with whether the sample mean is above or below the mean in the claim. So we are interested in both ends of the distribution, those above and those below the mean in the claim. Such tests are called two-tail tests. They involve an Alternative Hypothesis with a phrase "is greater or less than" or "is different from." In such a test, the z-scores which establish the significance limits are, for .05 probability, -1.96 and +1.96 and, for .01 probability, -2.58 and +2.58.

The mathematical procedures for analyzing one-tail and two-tail tests are the same. The difference comes in the particular z-scores we use to establish different levels of significance.

Type I and Type II Errors

In any test of claims like the light bulb or staff downtime examples we've looked at, there is usually a risk that we may be rejecting a true hypothesis; that is, the probability that we may be wrong is greater than 0. Rejecting a Null Hypothesis when it is true is called a Type I Error. On the other hand, we may decline the risk and accept the Null Hypothesis when it is, in fact, false. This is called a Type II Error.

The level of significance we set in deciding whether to accept or reject the Null Hypothesis depends upon which of these two errors we most wish to avoid. If there are very serious consequences in a Type I error, then we should seek to minimize the risk by setting the level of significance at .01. Such a stringent level means that we are more likely to accept the Null Hypothesis than we are at the .05 level, since the relevant z-score will have to be more than 3 rather than more than 2 (or, using the exact figures, more than 2.58 rather than more than 1.96).
But the less risk we are willing to take (in order to minimize Type I errors), the more likely we are to fall into Type II errors. Setting very strict limits for rejecting the Null Hypothesis will increase the chances that we are accepting one that is false.

This point about Type I and Type II errors is a reminder that the sorts of statistical tests we are applying do not "prove" anything once and for all. The tests are, in effect, a technical device to determine whether a specific claim (a hypothesis) meets a given standard (a level of certainty). There will always be some risk, however slight, that the conclusion we draw from a statistical test of a particular hypothesis is wrong. So, by showing that a hypothesis has passed a particular statistical test, we do not prove the hypothesis beyond all doubt. Passing the test does, however, tell researchers that there may very well be something in the claim. By the same logic, rejecting a hypothesis does not disprove it for all time. Statistical analysis can only indicate that the claim has failed to meet a given level of certainty.

This point also brings out how, by apparently manipulating statistics, one can seem both to "prove" and to "disprove" a particular claim in a single test. You should be able to see that at the .05 level of significance you could reject the Null Hypothesis and accept the Alternative Hypothesis, while at the same time at a .01 level of significance you may have to reserve judgement and accept the Null Hypothesis. In other words, interpreting the conclusions of such a statistical analysis requires us to know the confidence level at which the claim is made and to be very careful about accepting statistical results without such knowledge.

Why not have a look at the review worksheet on significance tests to test your understanding?
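
The link between the level of significance and Type I errors can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not part of the original examples: it assumes the manufacturer's claim is true and that bulb lifetimes are normally distributed with a mean of 75 hours and a standard deviation of 15 hours, then counts how often a lower-tail test on a sample of 50 bulbs would wrongly reject the claim. The function name `type_i_error_rate`, the number of trials and the random seed are all arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def type_i_error_rate(alpha, trials=4000, n=50, mu=75.0, sigma=15.0, seed=1):
    """Fraction of samples drawn under a TRUE claim that a lower-tail test rejects."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(alpha)  # negative cut-off in the lower tail
    rejections = 0
    for _ in range(trials):
        # Assumed population: normal with the claimed mean, so the claim is true.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        se = stdev(sample) / math.sqrt(n)
        z = (mean(sample) - mu) / se
        if z < cutoff:
            rejections += 1  # a true claim has been rejected: a Type I error
    return rejections / trials

rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
print(f"Type I error rate at the .05 level: {rate_05:.3f}")
print(f"Type I error rate at the .01 level: {rate_01:.3f}")
```

Under these assumptions the proportion of wrongly rejected samples should come out close to the chosen significance level, and it is never higher at the .01 level than at the .05 level. The simulation says nothing about Type II errors, which would require assuming a specific false claim.
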