Stat Insights: December 2009

Tuesday, December 15, 2009

SPC and ANOVA, What's the Connection?

The plot below shows 3 subgroups of size 8 for each of two different processes. For Process 1 the 3 subgroups look similar, while for Process 2 subgroup 2 has lower readings than subgroups 1 and 3.

Data from Dr. Donald J. Wheeler's SPC Workbook (1994).

Three Estimates of Standard Deviation

For each process, there are three ways we can obtain an estimate of the standard deviation of the population that generated the data. Method 1 consists of computing a global estimate of the standard deviation using all 8×3 = 24 observations. By this method, the standard deviation of Process 2 is almost twice as large as the standard deviation of Process 1.

In Method 2 we first calculate the range of each of the 3 subgroups, compute the average of the 3 ranges, and then compute an estimate of the standard deviation using Rbar/d2, where d2 is a correction factor that depends on the subgroup size. For subgroups of size 8, d2 = 2.847. This is the local estimate from an R chart that is used to compute the control limits for an Xbar chart.

Since for each process the 3 subgroups have the same ranges (5, 5, and 3), they have the same Rbar = 4.3333, giving the same estimate of standard deviation, 4.3333/2.847 = 1.5221.
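The Method 2 arithmetic can be reproduced in a few lines of Python. This is only a sketch: the function name sigma_from_ranges is made up for illustration, and d2 = 2.847 is hard-coded for subgroups of size 8, as quoted above.

```python
from statistics import mean

D2_N8 = 2.847  # d2 for subgroups of size 8, as quoted in the text


def sigma_from_ranges(ranges, d2=D2_N8):
    """Local (within-subgroup) estimate of sigma: Rbar / d2."""
    rbar = mean(ranges)  # average of the subgroup ranges
    return rbar / d2


# Subgroup ranges reported for both processes
sigma_within = sigma_from_ranges([5, 5, 3])
print(round(sigma_within, 4))  # 1.5221
```

Because only the ranges enter the calculation, any shift in the subgroup averages leaves this estimate unchanged, which is why both processes get the same value.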

Finally, for Method 3 we first compute the standard deviation of the 3 subgroup averages, and then scale up the resulting standard deviation by the square root of the number of observations per subgroup, √8 = 2.8284. For Process 1 the estimate is given by 0.5774×√8 = 1.6331, while for Process 2 it is 3×√8 = 8.485.
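The scaling step follows from the fact that the standard deviation of an average of n observations is sigma/√n, so multiplying by √n brings it back to the scale of individual readings. A minimal Python sketch, assuming only the two standard deviations of subgroup averages quoted above (the function name sigma_from_averages is hypothetical):

```python
import math


def sigma_from_averages(sd_of_averages, subgroup_size):
    """Between-subgroup estimate of sigma: SD of the averages times sqrt(n)."""
    return sd_of_averages * math.sqrt(subgroup_size)


n = 8  # observations per subgroup
sigma_between_p1 = sigma_from_averages(0.5774, n)  # Process 1
sigma_between_p2 = sigma_from_averages(3.0, n)     # Process 2
print(round(sigma_between_p1, 4), round(sigma_between_p2, 4))
```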

The table below shows the Method 1, 2, and 3 standard deviation estimates for Processes 1 and 2. Readers familiar with ANalysis Of VAriance (ANOVA) will recognize Method 2 as the estimate based on the within sum-of-squares, while Method 3 is the estimate coming from the between sum-of-squares.

You can quickly see that for Process 1 all 3 estimates are similar in magnitude. This is a consequence of Process 1 being stable or in a state of statistical control. Process 2, on the other hand, is out-of-control and therefore the 3 estimates are quite different.

In SPC an R chart answers the question "Is the within subgroup variation consistent across subgroups?", while the XBar chart answers the question "Allowing for the amount of variation within subgroups, are there detectable differences between the subgroup averages?" In an ANOVA the signal-to-noise ratio, the F ratio, is a function of Method 3/Method 2, and signals are detected whenever the F ratio is statistically significant. As you can see, there is a one-to-one correspondence between an XBar-R chart and the one-way ANOVA.
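To make the signal-to-noise idea concrete, the sketch below squares the ratio Method 3/Method 2 for each process, using only numbers quoted in this post. It is a rough analogue rather than the exact ANOVA F ratio, since a true F ratio uses the pooled within-subgroup variance and not a range-based estimate; the function name approx_f_ratio is hypothetical.

```python
import math

sigma_within = 4.3333 / 2.847  # Method 2 estimate, the same for both processes


def approx_f_ratio(sd_of_averages, subgroup_size, sigma_within):
    """Squared ratio of the between (Method 3) to the within (Method 2) estimate."""
    sigma_between = sd_of_averages * math.sqrt(subgroup_size)
    return (sigma_between / sigma_within) ** 2


f_p1 = approx_f_ratio(0.5774, 8, sigma_within)  # Process 1: close to 1
f_p2 = approx_f_ratio(3.0, 8, sigma_within)     # Process 2: far above 1
print(round(f_p1, 2), round(f_p2, 2))
```

A ratio near 1 means the subgroup averages vary about as much as the within-subgroup noise predicts, while a ratio far above 1 is the kind of signal an XBar chart would flag.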

A process that is in a state of statistical control is a process with no signals from the ANOVA point of view.

In an upcoming post Brenda will talk about how we can use Method 1 and Method 2 to evaluate process stability.

Monday, December 7, 2009

JMP Summary Statistics Without The Statistics

One of my favorite, and most used, JMP commands is the Summary command within the Tables menu (Tables > Summary). The Summary command can generate several summary statistics (Mean, Std. Dev., Min, Max, etc.) for the continuous variables in your data table according to the different levels of grouping (classification) variables. But did you know that you can use just the Group variable list in the Summary dialog without requesting any summary statistics?

To illustrate, the Cars sample table, from the JMP Sample Library, contains 352 observations from trials in which stock automobiles are crashed into a wall at 35 MPH with dummies in the driver and front passenger seats. The sample table also contains several classification variables, including Make, Number of Doors, and Size.

I was curious to know how many different brands were used in the study. We can answer this question by selecting Tables > Summary, placing the variable Make in the Group area of the Summary dialog, and clicking OK.

The resulting table contains a list of the unique makes that were used in the study along with the number of observations belonging to each make. There were 37 different brands used in the study, with 42 Chevrolet cars and only 2 BMWs. Another (very) nice feature is that the summary table is linked to the active data table, the source table, so clicking on 'Row 6: Make = Chevrolet' selects the corresponding 42 rows in the source table where Make = Chevrolet.

You can now select the Tables > Subset command to create a subset table with only the Chevrolet observations. This is very handy if you have a table with thousands of observations and you need to create subset tables according to the levels of one classification variable, or the combinations of levels of several classification variables.

What if you want to add summary statistics to one of these summary tables? No need to go back to the Tables > Summary menu. Just click the contextual menu (red triangle) in the columns area, in the upper left-hand corner of the summary table, and select Add Statistics Column. This brings up the Summary dialog for you to select the variable, or variables, and the summary statistics you want.

If you use pivot tables in Excel to summarize your data, I encourage you to try the powerful data manipulation tools in the Tables menu of JMP, including the Tabulate platform, which is a fully drag-and-drop interface for creating summary tables.