Stat Insights: February 2010

Monday, February 22, 2010

Old Dog, New Tricks

The statistics profession has gotten some good hype over the past year. In the summer the New York Times published "For Today's Graduate, Just One Word: Statistics". The article discusses "the new breed of statisticians. . ." who ". . . use powerful computers and sophisticated mathematical models to hunt for meaningful patterns and insights in vast troves of data." Some of these statisticians can earn a whopping six-figure salary in their first year after graduating, and they get to analyze data from areas which include ". . . sensor signals, surveillance tapes, social network chatter, public records and more." And I have to agree with the chief economist at Google, Hal Varian: the job does sound kind of "sexy".

As a 20-year veteran supporting engineers and scientists in the industrial sector, I feel a little left behind when I read such articles or peruse the job openings section of journals and see the type of statisticians being recruited, month after month and year after year. It is interesting to see how statistics, and statisticians, have adapted to the changing world around us. During the technology boom in the 1980s, the industrial statistician was king (or queen). I consider myself privileged to have actually worked in the semiconductor industry during its boom, where the need to look for patterns and signals in vast amounts of data was, and still is, commonplace. If you were pursuing a statistics degree in the past decade, you would have been foolish not to consider specializing in biostatistics. With the explosion of direct-to-consumer marketing of drugs, drug companies needed these types of statisticians to design and analyze clinical trials to determine the efficacy and safety of the drugs. As we look to the past 5 years or so, we see a new hybrid of statistician, one that combines statistics, mathematics, and computer science to better deal with digital data in a variety of areas, such as finance, web traffic, and marketing.

As I already mentioned, it is hard not to want to "jump ship" to be part of the latest exciting surge of statistics to come along. That is, until I get a dose of reality which brings me back to center. I guess troves of data also require droves of statisticians and data analysts. If you go on to read the New York Times article mentioned above, you will see that these new super statisticians may work in a group with 250 other data analysts, all hoping for that big breakthrough mathematical/statistical algorithm that will better predict consumer behavior or web traffic patterns.

I should consider myself lucky that I actually get to interact with the engineers and scientists who run the experiments that I have designed and take action on the outcomes of the analyses that I have presented. Unfortunately, the days when the industrial statistician reigned supreme are long past. But luckily, there is still enough manufacturing in the United States to keep the few of us who remain busy and, even though I'm an old dog, I can still learn some new tricks!

Tuesday, February 2, 2010

What Kind of Trouble Are You In?

Well, I guess that depends on what you have done and more importantly if you have gotten caught! Many of you are probably wondering what this has to do with statistics. In his book, The Six Sigma Practitioner's Guide to Data Analysis (2005), Wheeler aptly describes the nature of "trouble" as it relates to the stability (predictability) and capability (product conformance) of manufacturing processes. Using these two dimensions, a process can be in one of four states:

1. No Trouble: Conforming product (capable) & predictable process (stable)
2. Process Trouble: Conforming product (capable) & unpredictable process (unstable)
3. Product Trouble: Nonconforming product (incapable) & predictable process (stable)
4. Double Trouble: Nonconforming product (incapable) & unpredictable process (unstable).
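The four states above amount to a two-by-two lookup on stability and capability. Here is a minimal sketch of that lookup; the function name `classify_process` and its boolean arguments are my own labels for illustration and do not come from Wheeler's book.

```python
def classify_process(stable: bool, capable: bool) -> str:
    """Map a stability/capability assessment to one of the four states."""
    states = {
        (True, True): "No Trouble",
        (False, True): "Process Trouble",
        (True, False): "Product Trouble",
        (False, False): "Double Trouble",
    }
    return states[(stable, capable)]


# A predictable process that makes nonconforming product:
print(classify_process(stable=True, capable=False))  # prints "Product Trouble"
```

How you arrive at the two yes/no answers is the real work, of course; the lookup only names the result.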

Hopefully, most of you have experienced a process that is in the 'No Trouble' zone, which is also referred to as the 'ideal state'. The focus for these processes should be on maintaining and sustaining a state of stability and capability. A process which is in 'Process Trouble' is unstable, but is producing product which is within specification limits most of the time. That is, measurements may be outside the control limits but within the specification limits. Unless this type of process requires heroic feats by operators and engineers to make conforming product, its instability most likely goes undetected because we have not yet gotten caught by a yield bust. Moving on, a process that is in 'Product Trouble' can be thought of as a process that is predictably "bad". In other words, because it is stable, the process is doing the best that it can in its current state, but its best performance results in a consistent amount of nonconforming product. While nonconforming product is undesirable, if the losses are consistent, the job of planning and logistics will be much easier. Finally, the 'Double Trouble' process is both unstable and incapable, which can result in a big headache for the business and for those supporting it!

In order to determine the state of your process, you will first need to determine the key output attributes and measurements and then assess their process stability and capability. Recall from my last post, "Is a Control Chart Enough to Evaluate Process Stability", that the stability of a process can be determined by using a control chart and looking for nonrandom patterns or trends in the data and for unusual points that plot outside of the control limits. In addition, the SR ratio can be added to provide a more objective assessment of the process stability, with a stable process producing an SR ratio close to 1 and an unstable process resulting in an SR ratio > 1. The control chart below shows the Tensile strength data presented in my post, "Why Are My Control Limits So Narrow?", with the control limits adjusted for the large batch-to-batch variation. There are no points out of control for this process and the SR ratio = 2.27² / 2.245² = 1.02, indicating a stable process parameter.
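As a rough check of the arithmetic above, the SR ratio can be computed as a ratio of two variances. The sketch below assumes that 2.27 is the overall standard deviation and 2.245 is the short-term estimate behind the control limits; the function and argument names are made up for illustration.

```python
def stability_ratio(sigma_overall: float, sigma_short_term: float) -> float:
    """Ratio of the overall variance to the short-term variance."""
    if sigma_short_term <= 0:
        raise ValueError("sigma_short_term must be positive")
    return (sigma_overall / sigma_short_term) ** 2


# Standard deviations quoted for the Tensile strength data
sr = stability_ratio(2.27, 2.245)
print(f"SR ratio = {sr:.2f}")  # prints "SR ratio = 1.02"
```

The code deliberately stops at the ratio itself. How close to 1 counts as "close" is a judgment this sketch does not try to automate.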

How do we assess the capability of a process? This can be done by evaluating the process capability index, C_pk, and determining if it meets our stated goal of "at least 1". For the Tensile data, suppose our specification limits for any individual measurement are LSL = 45 and USL = 61, with a target value of 53. From the output below, we see that C_pk = 0.832 with a 95% confidence interval of (0.713, 0.951). Since the upper bound of our confidence interval is less than 1, we have not shown that we can meet our goal of "at least 1". Therefore, by this definition, we would assess this process parameter as incapable.
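The decision rule used above can be written down in a few lines. This is only a sketch: the "incapable" branch follows the reasoning in this post, while the "capable" and "inconclusive" branches are my own assumption about how one might handle the other cases, and the function name is hypothetical.

```python
def assess_capability(ci_lower: float, ci_upper: float, goal: float = 1.0) -> str:
    """Compare a C_pk confidence interval against a capability goal."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_upper < goal:
        # Even the optimistic end of the interval misses the goal.
        return "incapable"
    if ci_lower >= goal:
        # Assumption: the whole interval at or above the goal counts as capable.
        return "capable"
    # Assumption: an interval that straddles the goal is left undecided.
    return "inconclusive"


# 95% confidence interval for C_pk from the Tensile data
print(assess_capability(0.713, 0.951))  # prints "incapable"
```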

Based upon the results from the process stability assessment (stable) and the process capability assessment (incapable), this process parameter falls in the "Product Trouble" zone. In other words, our process is predictably "bad" and makes out-of-spec product on a regular basis. If we recenter the average Tensile Strength closer to the target value of 53, then we can achieve a C_pk = 1.137, as is shown by C_p in the output above, and possibly achieve the "ideal state" or "no trouble" zone.
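To see why recentering lifts C_pk up to C_p, here is a small sketch using the usual textbook definitions, C_p = (USL - LSL) / (6σ) and C_pk = min(USL - mean, mean - LSL) / (3σ). The value of σ is not given in this post, so the code back-calculates it from the reported C_p of 1.137. That is an assumption for illustration only and may not match the sigma estimate the software actually used.

```python
def cp(sigma: float, lsl: float, usl: float) -> float:
    """Potential capability: spec width relative to the 6-sigma process spread."""
    return (usl - lsl) / (6 * sigma)


def cpk(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Actual capability: distance from the mean to the nearest spec limit."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


LSL, USL, TARGET = 45.0, 61.0, 53.0
# Assumption: sigma back-calculated from the reported C_p, not taken from the data.
SIGMA = (USL - LSL) / (6 * 1.137)

# With the mean sitting on the target (the midpoint of the specs), C_pk equals C_p.
print(round(cpk(TARGET, SIGMA, LSL, USL), 3))  # prints 1.137
```

Any mean away from the midpoint gives a C_pk below C_p, which is why the off-center process reports 0.832 even though its spread would fit within the specification limits.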

Periodically conducting these assessments to understand the state of your processes is advisable. There is no reason to wait until you are in "double trouble" to pay attention to the health of your processes because, by then, one of two scenarios has probably occurred. Either your customer was the recipient of bad product and informed you of the problem, or you discovered a rash of bad product through an unexpected yield bust at final inspection. Yes, you got caught! In either event, working through these types of process upsets is draining to the business and potentially dissatisfying to the customer. Remember, ignoring an unstable and incapable process will eventually catch up with you.