Stat Insights: ±3

Tuesday, November 3, 2009

Practical Significance Always Wins Out

Engineers and scientists are the most pragmatic people I know when it comes to analyzing data and extracting key information with the statistical tools at hand. It is this pragmatism that often leads me to recommend an equivalence test, in place of the more common test of significance, for comparing one population mean to a standard value k. Think about how a Student's t-test plays out in an analysis to test the hypotheses Null: μ = 50 ohm vs. Alternative: μ ≠ 50 ohm. If we reject the null hypothesis in favor of the alternative, then we say that we have a statistically significant result. Once this is established, the next question is: how far off is the mean from the target value of 50 ohm? In some cases this difference is small, say 0.05 ohm, and of no practical consequence.
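To make the test of significance concrete, here is a minimal Python sketch of the two-sided one-sample t-test using only the standard library. The helper names (t_pdf, t_cdf, one_sample_t_test) are hypothetical, the Student's t CDF is approximated by Simpson's-rule integration of the density rather than taken from a statistics package, and the resistance data are simulated, not measured. It illustrates the point above: with a large sample and little noise, an offset of 0.05 ohm can come out statistically significant even though it may not matter in practice.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """Approximate P(T <= x) as 0.5 +/- the integral of the density from
    0 to |x|, using composite Simpson's rule with step size <= 0.01."""
    a = abs(x)
    steps = 2 * max(1000, math.ceil(a * 50))  # even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = min(total * h / 3, 0.5)  # guard against rounding just above 0.5
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(data, mu0):
    """Two-sided test of H0: mu = mu0 vs. H1: mu != mu0."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t = (mean - mu0) / se
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    return mean, t, p


# Hypothetical data: 500 cables whose true mean resistance is 50.05 ohm.
rng = random.Random(1)
resistances = [rng.gauss(50.05, 0.2) for _ in range(500)]
mean, t, p = one_sample_t_test(resistances, 50.0)
print(f"mean = {mean:.3f} ohm, t = {t:.2f}, p = {p:.2g}")
```

A very small p-value here says only that the mean is not exactly 50 ohm; it says nothing about whether a 0.05 ohm difference matters.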

The other possible outcome of this test of significance is that we do not reject the null hypothesis. Although we can never prove that μ = 50 ohm, we sometimes behave as if we had, and assume that the mean is no different from our standard value of 50 ohm. The natural question that usually arises is, "Can I say that the average resistance = 50 ohm?", to which I reply, "Not really."

My secret weapon for combining statistical and practical significance in one fell swoop is the equivalence test. Equivalence tests allow us to demonstrate that our mean is equivalent to a standard value within a stated bound. For instance, we can show that the average DC resistance of a cable is 50 ohm within ± 0.25 ohm. This is accomplished with two one-sided t-tests (TOST), one at each equivalence bound, and we must reject both null hypotheses simultaneously to conclude equivalence. The two sets of hypotheses are:

a) H0: μ ≤ 49.75 vs. H1: μ > 49.75 and
b) H0: μ ≥ 50.25 vs. H1: μ < 50.25.
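The TOST logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not JMP's implementation: the function name tost, the Simpson's-rule approximation to the t CDF, and the evenly spaced example readings are all hypothetical. Each bound gets its own one-sided p-value, and equivalence is concluded only when the larger of the two is below α.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """Approximate P(T <= x) with composite Simpson's rule (step <= 0.01)."""
    a = abs(x)
    steps = 2 * max(1000, math.ceil(a * 50))  # even number of intervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = min(total * h / 3, 0.5)
    return 0.5 + area if x >= 0 else 0.5 - area


def tost(data, low, high, alpha=0.05):
    """Two one-sided t-tests for equivalence of the mean to [low, high].

    (a) H0: mu <= low  vs. H1: mu > low   -> upper-tail p-value
    (b) H0: mu >= high vs. H1: mu < high  -> lower-tail p-value
    Equivalence is concluded only if BOTH null hypotheses are rejected.
    """
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    p_low = 1 - t_cdf((mean - low) / se, n - 1)
    p_high = t_cdf((mean - high) / se, n - 1)
    return p_low, p_high, max(p_low, p_high) < alpha


# Hypothetical readings: 50 values cycling through 49.8 ... 50.2 ohm.
readings = [50.0 + 0.1 * (i % 5 - 2) for i in range(50)]
for bound in (0.01, 0.6):
    p_lo, p_hi, equivalent = tost(readings, 50 - bound, 50 + bound)
    print(f"+/- {bound}: p = {p_lo:.3g}, {p_hi:.3g}, equivalent = {equivalent}")
```

With these made-up readings, the tight ± 0.01 ohm bound fails and the ± 0.6 ohm bound passes; the output is not meant to reproduce the JMP results discussed here, which came from different data.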

The equivalence test output for this scenario is shown below. Notice that, at the 5% level of significance, neither p-value from the two one-sided t-tests is statistically significant; therefore, we have NOT shown that our mean is 50 ± 0.25 ohm. But why not? The test-retest error of our measurement device is 0.2 ohm, which is close to the equivalence bound of 0.25 ohm. As a general rule, the equivalence bound should be larger than the test-retest error.

Let's look at one more example using these data, this time to show that our mean is equivalent to 50 ohm within ± 0.6 ohm. We have chosen our equivalence bound to be 3 times the measurement error of 0.2 ohm. The JMP output below now shows that, at the 5% level of significance, both p-values from the two one-sided t-tests are statistically significant. Therefore, we have shown that the average resistance is equivalent to 50 ohm within the stated bounds of 49.4 and 50.6 ohm.

To learn more about comparing average performance to a standard, and about one-sample equivalence tests, see Chapter 4 of our book, Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide.

Sunday, October 4, 2009

3 Is The Magic Number

I'm sure that I am about to date myself here, but who remembers Schoolhouse Rock from the 1970s? One of my favorite songs was 'Three is a Magic Number', which Jack Johnson later adapted in his song '3R's' from the Curious George soundtrack. I wonder whether Bob Dorough was thinking about statistics when he came up with that song. Certainly, 3 seems to have special significance in a couple of important areas of engineering. For instance, in Statistical Process Control (SPC), the upper and lower control limits are typically placed 3 standard deviations on either side of the center line. And when fitness-for-use information is unknown, some may set specification limits for key attributes of a product, component, or raw material based on process capability, using the formula mean ± 3×(standard deviation).

For a normal distribution, we expect 99.73% of the population to lie within ± 3×(standard deviation) of the mean. In fact, for many distributions most of the population lies within ± 3×(standard deviation) of the mean, hence the "magic" of the number 3. For control charts, Walter Shewhart's use of 3 as the multiplier is well justified because it provides a good balance between chasing down false alarms and missing signals due to assignable causes. However, when it comes to setting specification limits, the interval mean ± 3×(standard deviation), computed from sample estimates, may not contain 99.73% of the population unless the sample size is very large.
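Both halves of that argument can be checked numerically. The Python sketch below uses the standard library's statistics.NormalDist to confirm the 99.73% figure, then runs a small simulation that draws samples from a standard normal population, forms mean ± 3×(standard deviation) from each sample, and records how often that interval actually contains at least 99.73% of the population. The function names, the sample size of 10, the seed, and the number of trials are arbitrary choices for illustration.

```python
import random
import statistics
from statistics import NormalDist

population = NormalDist(mu=0.0, sigma=1.0)

# Proportion of a normal population within mu +/- 3 sigma (known parameters).
coverage_3sigma = population.cdf(3) - population.cdf(-3)
print(f"{coverage_3sigma:.4f}")  # 0.9973


def sample_interval_coverage(n, rng):
    """Draw n values, form mean +/- 3s from the sample, and return the true
    proportion of the population that the interval contains."""
    sample = [rng.gauss(population.mean, population.stdev) for _ in range(n)]
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return population.cdf(m + 3 * s) - population.cdf(m - 3 * s)


def fraction_meeting_target(n, trials=2000, target=0.9973, seed=1):
    """Fraction of simulated mean +/- 3s intervals that cover >= target."""
    rng = random.Random(seed)
    hits = sum(sample_interval_coverage(n, rng) >= target for _ in range(trials))
    return hits / trials


print(fraction_meeting_target(10))
```

With n = 10, fewer than half of the simulated intervals reach 99.73% coverage, partly because, for small samples, s falls below σ more often than above it, and partly because the sample mean is itself off center.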

Using "3" to set specification limits assumes that we know the true population mean and standard deviation without error. In practice, we almost never know the true population parameters, and we must estimate them from a random, representative, and usually small sample. Luckily for us, there is a statistical interval, called a tolerance interval, that accounts for the sample size and for the uncertainty in the estimates of the mean and standard deviation, and it is well suited for setting specification limits. The interval has the form mean ± k×(standard deviation), where k is a function of the confidence level, the sample size, and the proportion of the population we want the interval to contain (99.73% for the equivalent of a ± 3×(standard deviation) interval).
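As a sketch of how k depends on those inputs, the Python below uses Howe's approximation for the two-sided normal tolerance factor, with the Wilson-Hilferty approximation standing in for the chi-square quantile so that only the standard library is needed. The function names are hypothetical, and JMP may compute k differently (for example, exactly rather than by approximation), so results can differ slightly.

```python
import math
import statistics
from statistics import NormalDist


def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to the q-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(q)
    c = 2 / (9 * df)
    return df * (1 - c + z * math.sqrt(c)) ** 3


def tolerance_k(n, proportion=0.9973, confidence=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k."""
    z = NormalDist().inv_cdf((1 + proportion) / 2)
    chi2 = chi2_quantile(1 - confidence, n - 1)  # lower-tail chi-square value
    return z * math.sqrt((n - 1) * (1 + 1 / n) / chi2)


def tolerance_interval(data, proportion=0.9973, confidence=0.95):
    """Return (lower, upper) = mean -/+ k * (sample standard deviation)."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    k = tolerance_k(len(data), proportion, confidence)
    return mean - k * s, mean + k * s


for n in (10, 40, 1000):
    print(n, round(tolerance_k(n), 2))
```

For n = 40 this approximation gives k of roughly 3.7 rather than 3, and k shrinks toward 3 only as the sample size grows large.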

Consider an example using 40 resistance measurements taken from 40 cables. The JMP output for a tolerance interval that contains 99.73% of the population indicates that, with 95% confidence, we expect 99.73% of the resistance measurements to be between 42.60 ohm and 57.28 ohm. These values, rather than mean ± 3×(standard deviation), should be used to set our lower and upper specification limits.

To learn more about tolerance intervals, see Statistical Intervals.