Sunday, January 24, 2010

I Am Confident That I Am 95% Confused!

You recently completed an analysis of DC resistance data that shows that the distribution is centered around 49.94 Ohm with a standard deviation of 1.96 Ohm. The JMP output also includes a 95% confidence interval for the mean DC resistance equal to [49.31; 50.57]. This looks like good news: you can now report to your customer that there is a 95% chance that the average DC resistance is between 49.31 and 50.57 Ohm, and therefore you are meeting the specified target of 50 Ohm.
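For reference, the reported interval can be reproduced from the summary statistics alone. The following is a minimal sketch, not the JMP calculation itself. It assumes a sample size of 40 (the value used in the simulation described later in this post) and the usual t-based interval for a mean. The critical value 2.0227 is the 0.975 quantile of Student's t distribution with 39 degrees of freedom, hardcoded so that only the Python standard library is needed; the function name `t_interval` is hypothetical.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval for the mean: mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics from the post; n = 40 is an assumption (see the lead-in).
T_CRIT_39 = 2.0227  # 0.975 quantile of Student's t with 39 degrees of freedom
lower, upper = t_interval(49.94, 1.96, 40, T_CRIT_39)
print(f"95% CI for the mean: [{lower:.2f}; {upper:.2f}]")  # [49.31; 50.57]
```

Under those assumptions the half-width is about 0.63 Ohm, which matches the interval in the JMP output.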

Before you talk to the customer you decide to check with your local statistician, to be sure about the claim you are going to make. She tells you that we cannot really say that there is a 95% chance the mean DC resistance is between 49.31 and 50.57 Ohm. You see, she says, this is a long-run type of statement, in the sense that if you were to construct many such intervals, on average 95 out of 100 would contain the true DC resistance mean. You leave her office totally confused because in your mind these two statements sound the same.

Imagine yourself sitting at a poker table. Depending on your imagination you can be sitting at a table at the Bellagio's Poker Room in Vegas, or at your friend's house on a Thursday night. For a standard 52-card deck, before the cards are dealt, you have about a 42% chance of being dealt a five-card hand with "one pair". However, once the hand is dealt you either have "one pair" or you don't. In other words, the frequency of "one pair" showing up in a hand is about 42%. Frequency here means that if you play poker on a regular basis then, on average, in 100 games played you expect to get a hand with "one pair" in about 42 of those games.
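The 42% figure can be checked by counting hands. Here is a short sketch, assuming a standard 52-card deck and five-card hands dealt without replacement; the variable names are illustrative only.

```python
from math import comb

# Number of distinct 5-card hands from a standard 52-card deck.
total_hands = comb(52, 5)

# One pair: choose the rank of the pair and 2 of its 4 suits,
# then 3 other distinct ranks, each in any of its 4 suits.
one_pair_hands = 13 * comb(4, 2) * comb(12, 3) * 4**3

p_one_pair = one_pair_hands / total_hands
print(f"P(one pair) = {one_pair_hands}/{total_hands} = {p_one_pair:.4f}")
# P(one pair) = 1098240/2598960 = 0.4226
```

This probability describes the dealing procedure before any cards are on the table; it says nothing about whether a particular hand already dealt contains a pair.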

The same is true for a confidence interval. Before you generate a 95% confidence interval for the mean, there is a 95% chance that the interval will contain the true mean value, but once the interval is generated, [49.31; 50.57] for example, the true mean value is either in the interval or outside it. And the fact is that we really don't know if the true mean value is in the interval, because we don't know what the true mean value is! It is just like getting a poker hand without being able to turn the cards over to see if you got the "one pair". All we have is the confidence that, on average, 95% of the intervals will in fact contain the true mean value. The confidence is a statement, not about a given interval, but about the procedure that is used to generate the interval.

Simulations can help us visualize and understand the meaning of statistical confidence. The video below shows a simulation that generates one hundred 95% confidence intervals for the mean. In the simulation we mimic the DC resistance data in the histogram above by using a sample size of 40 observations from a population with true mean = 50 and true standard deviation = 2. For a 95% confidence interval for the mean we expect, on average, that 95% of the intervals will contain the true value of 50. In the simulation, the intervals that do not contain 50 are colored red. You can see that for each new batch of one hundred intervals, sometimes 93% of the intervals contain 50, other times 97%, 95%, or 98%, but on average 95% of them do contain the true population mean of 50.
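A simulation along these lines can be sketched in a few lines of Python. This is not the JMP script behind the video, only an illustration under stated assumptions: the population is taken to be normal with mean 50 and standard deviation 2, each interval is the usual t-based interval from a sample of 40, and the critical value 2.0227 (the 0.975 quantile of Student's t with 39 degrees of freedom) is hardcoded. The function names, the seed, and the choice of 200 batches are all hypothetical.

```python
import random
import statistics

TRUE_MEAN, TRUE_SD, N = 50.0, 2.0, 40
T_CRIT_39 = 2.0227  # 0.975 quantile of Student's t, 39 degrees of freedom

def interval_covers(rng):
    """Draw one sample of size N; report whether its 95% CI contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]  # normality is assumed
    mean = statistics.fmean(sample)
    half_width = T_CRIT_39 * statistics.stdev(sample) / N ** 0.5
    return mean - half_width <= TRUE_MEAN <= mean + half_width

def coverage(rng, n_intervals=100):
    """Fraction of n_intervals freshly generated intervals that contain the true mean."""
    return sum(interval_covers(rng) for _ in range(n_intervals)) / n_intervals

rng = random.Random(2010)  # arbitrary seed, used only for reproducibility
batches = [coverage(rng) for _ in range(200)]  # 200 batches of 100 intervals each
print("coverage in the first five batches:", batches[:5])
print("average coverage over all batches:", round(statistics.fmean(batches), 3))
```

Each batch of one hundred intervals gives a coverage that wanders around 95%, while the average over many batches settles close to 95%. That long-run average is the property of the procedure that the confidence level describes.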

I hope this helps dispel some of the confusion regarding the meaning of statistical confidence. You can find more details about the meaning of statistical confidence and statistical intervals in Chapter 2 of our book, or in the white paper Statistical Intervals: Confidence, Prediction, Enclosure.
