
Tuesday, November 3, 2009

Practical Significance Always Wins Out

Engineers and scientists are the most pragmatic people I know when it comes to analyzing data and extracting key information with the statistical tools they have at hand. It is this level of pragmatism that often leads me to recommend equivalence tests for comparing one population mean to a standard value k, in place of the more common test of significance. Think about how a Student's t-test plays out in an analysis to test the hypotheses Null: μ = 50 ohm vs. Alternative: μ ≠ 50 ohm. If we reject the null hypothesis in favor of the alternative, then we say that we have a statistically significant result. Once this is established, the next question is how far the mean is off from the target value of 50. In some cases, this difference is small, say 0.05 ohm, and is of no practical consequence.
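To make that two-step reading concrete, here is a minimal Python sketch (the data are simulated purely for illustration; they are not the cable measurements discussed below):

    import numpy as np
    from scipy import stats

    # Hypothetical sample of cable resistance measurements (ohm),
    # centered just slightly off the 50 ohm target.
    rng = np.random.default_rng(2009)
    resistance = rng.normal(loc=50.05, scale=0.2, size=40)

    # Two-sided one-sample t-test of Null: mu = 50 vs. Alternative: mu != 50.
    t_stat, p_value = stats.ttest_1samp(resistance, popmean=50.0)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    # Statistical significance alone does not tell us whether the difference
    # matters; the estimated offset from target answers the practical question.
    print(f"estimated offset = {resistance.mean() - 50.0:.3f} ohm")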

The other possible outcome for this test of significance is that we do not reject the null hypothesis and, although we can never prove that μ = 50 ohm, we sometimes behave as if we had proved it and assume that the mean is no different from our standard value of 50. The natural question that arises is usually, "can I say that the average resistance = 50 ohm?", to which I reply "not really".

My secret weapon for combining statistical and practical significance in one fell swoop is the Equivalence Test. Equivalence tests allow us to prove that our mean is equivalent to a standard value within a stated bound. For instance, we can prove that the average DC resistance of a cable is 50 ohm within ± 0.25 ohm. This is accomplished by using two one-sided t-tests (TOST), one at each equivalence bound, and we must simultaneously reject both sets of hypotheses to conclude equivalence. These two sets of hypotheses are:

a) H0: μ ≤ 49.75 vs. H1: μ > 49.75 and
b) H0: μ ≥ 50.25 vs. H1: μ < 50.25.
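As a hedged illustration of the mechanics, here is my own minimal Python sketch with simulated data (the function name tost_one_sample is mine, and this is not the blog's JMP dataset; JMP's equivalence test performs these same two one-sided tests):

    import numpy as np
    from scipy import stats

    def tost_one_sample(x, target, delta):
        """Two one-sided t-tests (TOST) for equivalence to target within +/- delta.

        Tests a) H0: mu <= target - delta vs. H1: mu > target - delta, and
              b) H0: mu >= target + delta vs. H1: mu < target + delta.
        Both p-values must be significant to conclude equivalence.
        """
        n = len(x)
        mean = np.mean(x)
        se = np.std(x, ddof=1) / np.sqrt(n)
        df = n - 1
        t_a = (mean - (target - delta)) / se  # test a), reject for large t
        t_b = (mean - (target + delta)) / se  # test b), reject for small t
        p_a = stats.t.sf(t_a, df)             # upper-tail p-value for test a)
        p_b = stats.t.cdf(t_b, df)            # lower-tail p-value for test b)
        return p_a, p_b

    # Hypothetical resistance measurements (ohm); not the blog's data.
    rng = np.random.default_rng(2009)
    resistance = rng.normal(loc=50.05, scale=1.2, size=40)

    p_a, p_b = tost_one_sample(resistance, target=50.0, delta=0.25)
    print(f"p_a = {p_a:.4f}, p_b = {p_b:.4f}")  # equivalence needs BOTH < 0.05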

The equivalence test output for this scenario is shown below. Notice that, at the 5% level of significance, neither p-value for the 2 one-sided t-tests is statistically significant and, therefore, we have NOT shown that our mean is 50 ± 0.25 ohm. But why not? The test-retest error for our measurement device is 0.2 ohm, which is close to the equivalence bound of 0.25 ohm. As a general rule, the equivalence bound should be larger than the test-retest error.

[JMP one-sample equivalence test output: neither one-sided p-value is significant at the 5% level, so equivalence within ± 0.25 ohm is not demonstrated]
Let's look at one more example using this data to show that our mean is equivalent to 50 ohm within ± 0.6 ohm. We have chosen our equivalence bound to be 3 times the measurement error of 0.2 ohm. The JMP output below now shows that, at the 5% level of significance, both p-values from the 2 one-sided t-tests are statistically significant. Therefore, we have shown equivalence of the average resistance within the stated bounds of 49.4 and 50.6 ohm and, therefore, performance equivalent to 50 ohm.

[JMP one-sample equivalence test output: both one-sided p-values are significant at the 5% level, demonstrating equivalence within ± 0.6 ohm]
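Using the tost_one_sample sketch from earlier in this post (again with its simulated data, not the JMP table), the wider bound is checked the same way:

    # Same hypothetical data and function as in the TOST sketch above,
    # now with the bound widened to 3 x the 0.2 ohm test-retest error.
    p_a, p_b = tost_one_sample(resistance, target=50.0, delta=0.6)
    print(f"p_a = {p_a:.4f}, p_b = {p_b:.4f}")  # equivalence needs BOTH < 0.05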


To learn more about comparing average performance to a standard, and one-sample equivalence tests, see Chapter 4 of our book, Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide.


Sunday, October 25, 2009

Different or Equivalent?

When we show that the results of our study are "statistically significant", we feel that the study was worth the effort and that we have met our objectives. This is because the current meaning of the word "significant" implies that something is important or consequential; unfortunately, that was not its intended meaning. (See John Cook's blog "The Endeavor" for a nice post on the origin of “statistically significant”.)

Let's say we need to make a claim about the average DC resistance of a certain type of cable we manufacture. We set up the null hypothesis μ = 50 Ohm vs. the alternative hypothesis μ ≠ 50 Ohm, and measure the resistance of 40 such cables. If the one-sample t-test, based on the sample of 40 cables, is statistically significant, we can claim that the average DC resistance is different from 50 Ohm. Our claim does not imply that this difference is of any practical importance (that depends on the size of the difference), just that the average DC resistance is not 50 Ohm. A test of significance is a test of difference. This is the operational definition given to the term "statistical significance" by Sir Ronald Fisher in his 1925 book Statistical Methods for Research Workers: “Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first” (emphasis mine).

What if we do not reject the null hypothesis μ = 50 Ohm? Although tests of significance are set up to demonstrate difference, not equality, we sometimes take this lack of evidence as evidence that the average DC resistance is in fact 50 Ohm. This is because in practice we encounter situations where we need to demonstrate to a customer, or a government agency, that the average DC resistance is "close" to 50 Ohm. In the context of significance testing, what we need to do is swap the null and alternative hypotheses and test for equivalence within a given bound; i.e., test H0: |μ − 50 Ohm| ≥ δ vs. H1: |μ − 50 Ohm| < δ, where δ is a small number. In the next post Brenda discusses how a test of equivalence is a great way of combining statistical with practical significance.
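As a final hedged sketch (simulated data and my own variable names, not a definitive implementation), it is a standard fact that the TOST at level α passes exactly when the (1 − 2α) confidence interval for μ lies entirely inside (50 − δ, 50 + δ), which gives a second way to carry out the test:

    import numpy as np
    from scipy import stats

    # Hypothetical sample of 40 cable resistance measurements (Ohm).
    rng = np.random.default_rng(2009)
    resistance = rng.normal(loc=50.05, scale=1.2, size=40)

    alpha, target, delta = 0.05, 50.0, 0.6
    n = len(resistance)
    mean = resistance.mean()
    se = resistance.std(ddof=1) / np.sqrt(n)

    # TOST at level alpha passes exactly when the (1 - 2*alpha) confidence
    # interval for mu is contained in (target - delta, target + delta).
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)
    ci_low, ci_high = mean - t_crit * se, mean + t_crit * se
    equivalent = (target - delta) < ci_low and ci_high < (target + delta)
    print(f"90% CI = ({ci_low:.3f}, {ci_high:.3f}); equivalent: {equivalent}")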