Sunday, January 24, 2010

I Am Confident That I Am 95% Confused!

You recently completed an analysis of DC resistance data that shows that the distribution is centered around 49.94 Ohm with a standard deviation of 1.96 Ohm. The JMP output also includes a 95% confidence interval for the mean DC resistance equal to [49.31; 50.57]. This is good news because you can now report to your customer that there is a 95% chance that the average DC resistance is between 49.31 and 50.57 Ohm, and therefore you are meeting the specified target of 50 Ohm.
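For reference, the interval quoted above can be reproduced from the summary statistics. This is only a sketch: the sample size of 40 and the t critical value of 2.0227 (97.5th percentile, 39 degrees of freedom) are assumptions, and the helper name `t_confidence_interval` is made up for illustration.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Assumptions: n = 40 observations; 2.0227 is a looked-up t critical value
# (97.5th percentile, 39 degrees of freedom), not computed here.
lower, upper = t_confidence_interval(mean=49.94, sd=1.96, n=40, t_crit=2.0227)
print(f"95% CI for the mean: [{lower:.2f}; {upper:.2f}]")  # [49.31; 50.57]
```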

Before you talk to the customer you decide to check with your local statistician, to be sure about the claim you are going to make. She tells you that we cannot really say that there is a 95% chance that the mean DC resistance is between 49.31 and 50.57 Ohm. You see, she says, this is a long-run type of statement, in the sense that if you were to construct many such intervals then, on average, 95 out of 100 would contain the true DC resistance mean. You leave her office totally confused because in your mind these two statements sound the same.

Imagine yourself sitting at a poker table. Depending on your imagination you can be sitting at a table at the Bellagio's Poker Room in Vegas, or at your friend's house on a Thursday night. For a 52-card deck, before the draw, you have about a 42% chance of being dealt a five-card hand with "one pair". However, once the hand is dealt you either have "one pair" or you don't. In other words, the frequency of "one pair" showing up in a hand is about 42%. Frequency here means that if you play poker on a regular basis then, on average, in 100 games played you expect to get a hand with "one pair" in about 42 of those games.
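The 42% figure can be checked by counting hands. The sketch below assumes a standard 52-card deck and a five-card hand; the function name is just for illustration.

```python
from math import comb

def one_pair_probability():
    """Probability that a 5-card hand from a 52-card deck is exactly one pair."""
    # Rank of the pair (13 ways), 2 of its 4 suits, then 3 other distinct
    # ranks out of the remaining 12, each in any of 4 suits.
    favorable = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3
    total = comb(52, 5)
    return favorable / total

p = one_pair_probability()
print(f"P(one pair) = {p:.4f}")  # 0.4226
```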

The same is true for a confidence interval. Before you generate a 95% confidence interval for the mean, there is a 95% chance that the interval will contain the true mean value, but once the interval is generated, [49.31; 50.57] for example, the true mean value is either in the interval or outside it. And the fact is that we really don't know if the true mean value is in the interval because we don't know what the true mean value is! It is just like getting a poker hand without being able to turn the cards to see if you got the "one pair". All we have is the confidence that, on average, 95% of the intervals will in fact contain the true mean value. The confidence is a statement, not about a given interval, but about the procedure that is used to generate the interval.

Simulations can help us visualize and understand the meaning of statistical confidence. The video below shows a simulation that generates one hundred 95% confidence intervals for the mean. In the simulation we mimic the DC resistance data in the histogram above by using a sample size of 40 observations, from a population with true mean = 50 and true standard deviation = 2. For a 95% confidence interval for the mean we expect, on average, that 95% of the intervals will contain the true value of 50. In the simulation those intervals that do not contain 50 are colored red. You can see that for each new set of samples in the simulation sometimes 93% of the intervals contain 50, other times 97%, 95%, or 98%, but on average 95% of them do contain the true population mean of 50.
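If you want to try this yourself without JMP, a rough Python sketch of the same kind of simulation follows. It is not the script behind the video: the seed, the number of batches, and the hard-coded t critical value (2.0227 for 39 degrees of freedom) are all assumptions made for the example.

```python
import random
import statistics

T_CRIT_39 = 2.0227  # assumed t critical value: 97.5th percentile, 39 d.f.

def coverage_of_one_batch(rng, n_intervals=100, n=40, mu=50.0, sigma=2.0):
    """Count how many of n_intervals 95% CIs contain the true mean mu."""
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = T_CRIT_39 * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits

rng = random.Random(2010)  # fixed seed so the run is repeatable
batches = [coverage_of_one_batch(rng) for _ in range(200)]
print("first five batches (intervals out of 100 containing 50):", batches[:5])
print("average coverage over all batches (%):", statistics.fmean(batches))
```

Each batch plays the role of one run of the video: the count bounces around from batch to batch, while the average over many batches settles near 95.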

I hope this helps dispel some of the confusion regarding the meaning of statistical confidence. You can find more details about the meaning of statistical confidence and statistical intervals in Chapter 2 of our book, or in the white paper Statistical Intervals: Confidence, Prediction, Enclosure.

Tuesday, January 12, 2010

Today Was a Good "Webinar" Day

We did it! Brenda and I, sponsored by and with the help of the SAS Press team, gave our first webinar today. The SAS Press team said we reached a 50% attendance rate which, according to webinar attendance rate statistics, is pretty good. I must confess that it took a little bit of getting used to. When I give talks or teach I'm always in front of a live audience, and I pay close attention to the participants and their body language for cues as to how my delivery is going, whether they are understanding the material, or whether they need a break. In a webinar you pretty much talk to your computer screen without any visual or audio feedback from the audience.

For the webinar we used a semiconductor industry example involving the qualification of a temperature-controlled vertical furnace used for thin film deposition on wafers. The goal of the qualification was to show that the average thickness of the silicon dioxide layer, a key fitness-for-use parameter, meets the target value of 90 Angstrom, and to predict how much product will be outside the 90 ± 3 Å specifications.

We walked the participants through a 7-Step Method that includes clearly stating the questions or uncertainties to be answered, translating those into statistical hypotheses that can be tested with data, and the different aspects of data collection, analysis, interpretation of results, and recommendations (more details in Chapter 4 of our book). We featured JMP's Distribution and Control Chart platforms, as well as the Formula Editor to predict the expected yield loss using a normal distribution. Several interesting questions were raised by the participants, including what the meaning of confidence level is, what a good Cpk value is, how we predict yield loss with respect to specifications, and what the value is of changing the specifications rather than centering the process. Great topics for future posts!
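As a rough illustration of the yield loss prediction, here is a small Python sketch rather than the JMP formula we used in the webinar. The specification limits come from the 90 ± 3 Å specifications; the process mean and standard deviation are hypothetical values chosen only for the example, and the function names are made up.

```python
from statistics import NormalDist

def predicted_yield_loss(mean, sd, lsl, usl):
    """Fraction of a normal distribution falling outside [lsl, usl]."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(lsl) + (1.0 - dist.cdf(usl))

def cpk(mean, sd, lsl, usl):
    """Usual Cpk definition: distance to the nearest spec limit over 3 sigma."""
    return min(usl - mean, mean - lsl) / (3.0 * sd)

LSL, USL = 87.0, 93.0          # from the 90 +/- 3 Angstrom specifications
mean, sd = 90.5, 1.0           # hypothetical process estimates, not webinar data

loss = predicted_yield_loss(mean, sd, LSL, USL)
print(f"predicted yield loss: {loss:.4%}")
print(f"Cpk: {cpk(mean, sd, LSL, USL):.2f}")
```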

Today was a good day. We had the opportunity to deliver a well attended webinar and, to top it all off, the SAS Press team told us that our book, Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide, just won the 2009-2010 Society for Technical Communications Distinguished award. For this, we are thankful to the judges, our readers, and the JMP and SAS Press teams. We are also very grateful to those of you who were in attendance today for giving us the chance to try this out.

Monday, January 4, 2010

Is a Control Chart Enough to Evaluate Process Stability?

A control or process behavior chart is commonly used to determine if the output for a process is in a "state of statistical control", i.e., it is stable or predictable. A fun exercise is to generate random noise, plot it on a control chart and then ask users to interpret what they see. The range of answers is as diverse as asking someone to interpret the meaning behind a surrealist painting by Salvador Dalí. As a case in point, take a look at the control chart below and determine if the output of this process is stable or not.

I suppose a few of you would recognize this as white noise, while others may see some interesting patterns. What about those 2 points that are close to the control limits? Is there more variation in the first half of the series than the second half? Is there a shift in the process mean in the second half of the series? Is there a cycle?

How can we take some of the subjectivity out of interpreting control charts? Western Electric rules are often recommended for assessing process stability. Certainly, they are more reliable than merely eyeballing the chart ourselves, since we humans tend to see patterns when there are none, and they can provide us with important insights about our data. For instance, the same data is shown below with 4 runs tests turned on. We see that we have two violations of the runs tests. Test 2 detects a shift in the process mean by looking for at least 8 points in a row falling on the same side of the center line, while Test 5 flags when at least 2 out of 3 successive points fall on the same side of, and more than 2 sigma units away from, the center line (Zone A or beyond). Does this mean our process output is unstable?

Remember, this data represents random noise. Some of you may be surprised that there are any violations in runs rules, but these are what we call 'false alarms'. Yes, even random data will occasionally violate runs rules with some expected frequency. False alarms add to the complexity of identifying truly unstable processes. Once again, how can we take some of the subjectivity out of interpreting control charts?
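To get a feel for how often pure noise trips these rules, the sketch below works out the chance that one given window of points violates Test 2 or Test 5. It assumes independent, normally distributed points with a known center line and sigma; a long series contains many overlapping windows, so the chance of at least one false alarm somewhere on the chart is larger than these per-window numbers.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal; assumes independent, normal data

# Test 2: 8 given points all above, or all below, the center line.
p_test2 = 2 * 0.5 ** 8

# Test 5: at least 2 of 3 given points beyond 2 sigma on the same side.
p_tail = 1.0 - z.cdf(2.0)  # chance that one point is above +2 sigma
p_test5 = 2 * (3 * p_tail ** 2 * (1 - p_tail) + p_tail ** 3)

print(f"Test 2, one window of 8 points: {p_test2:.2%}")
print(f"Test 5, one window of 3 points: {p_test5:.2%}")
```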

Method 1 and Method 2 to the rescue! In José's last post, he described 3 ways of computing the standard deviation. Recall that Method 1 uses all of the data to calculate a global estimate of the standard deviation using the formula for the sample standard deviation. Method 2, however, uses a local estimate of variation by averaging the subgroup ranges, or in this case the moving ranges, and dividing the overall range average by the scaling factor d2. When the process is stable, these two estimates will be close in value, and the ratio of their squared values (SR ratio) will be close to 1. If our process is unstable, then the standard deviation estimate from Method 1 will most likely be larger than the estimate from Method 2, and the ratio of their squared values will be greater than 1.
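Here is a minimal Python sketch of the two estimates and their ratio for individual measurements. It assumes moving ranges of 2 consecutive points, for which the usual d2 value is 1.128; the function names and the simulated series are made up for the example and are not the data from the charts.

```python
import random
import statistics

D2_MOVING_RANGE = 1.128  # d2 scaling factor for ranges of 2 consecutive points

def sigma_method1(values):
    """Global estimate: the ordinary sample standard deviation."""
    return statistics.stdev(values)

def sigma_method2(values):
    """Local estimate: average moving range divided by d2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(moving_ranges) / D2_MOVING_RANGE

def sr_ratio(values):
    """Ratio of the squared Method 1 and Method 2 estimates."""
    return (sigma_method1(values) / sigma_method2(values)) ** 2

rng = random.Random(42)  # fixed seed so the run is repeatable
noise = [rng.gauss(50, 2) for _ in range(200)]
# Same kind of noise, but with the mean shifted by 6 units halfway through.
shifted = [rng.gauss(50, 2) for _ in range(100)] + [rng.gauss(56, 2) for _ in range(100)]

print(f"SR ratio, stable noise:  {sr_ratio(noise):.2f}")
print(f"SR ratio, shifted mean:  {sr_ratio(shifted):.2f}")
```

The shift inflates the global estimate but barely touches the moving ranges, which is why the ratio moves away from 1.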

For the random data in the control chart shown above, the SR ratio = 1.67²/1.62² = 1.06, which is close to 1, suggesting a stable process, that is, one in a state of statistical control. As a counterpoint, let's calculate the SR ratio for the control chart shown in my last post, which is reproduced below. The SR ratio = 2.35²/0.44² = 28.52, which is far larger than 1. This suggests an unstable process; however, in this case, it is due to the inappropriate control limits for this data.

The SR ratio is a very useful statistic to complement the visual assessment of the stability of a process. It also provides a consistent metric for classifying a process as stable or unstable and, in conjunction with the Cpk, can be used to assess the health of a process (more in a future post). For the two examples shown, it was easy to interpret the SR ratios of 1.06 and 28.52, which represent the two extremes of stability and instability. But what happens if we obtain an SR ratio of 1.5 or 2? Is it close to 1 or not? For these situations, we need to obtain the p-value for the SR ratio and determine if it is statistically significant at a given significance level. To learn more about the SR ratio and other stability assessment criteria, see the paper I co-authored with Professor George Runger, Quantitative Assessment to Evaluate Process Stability.