Sunday, September 27, 2009

Data - Model = Noise

A model is just a recipe for transforming data into noise.

We are used to thinking of a statistical model as a representation of our data that can be used to describe its behavior or to predict future values. We fit a statistical model in the hope that it does a good job of extracting the signals in our data. In other words, the goodness of a statistical model can be evaluated by how well it leaves behind "just noise".

How good is the model at transforming our data into noise? After the model is fit, the Residuals = Data - Model should behave like white noise; that is, they should have no predominant signals left in them. Graphical residual analysis provides a way for us to verify our assumptions about the model, and to make sure that no predominant signals are left in the residuals. It allows us to evaluate the model's lack of fit.
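To make the idea concrete, here is a minimal sketch in Python (not JMP, and not the calibration study mentioned in this post). It uses simulated data, and all names, coefficients, and the noise level are assumptions chosen for illustration. A straight line is fit to two data sets that share the same noise: one truly linear, and one with a small quadratic term the line cannot capture. The lag-1 autocorrelation of the residuals is used here as one simple numeric stand-in for what a residual plot would show; it is not a substitute for looking at the plot.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals_and_r2(x, y):
    """Return (Data - Model) for a straight-line model, plus R-square."""
    intercept, slope = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mean_y = sum(y) / len(y)
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return resid, 1 - ss_res / ss_tot

def lag1_autocorr(resid):
    """Lag-1 autocorrelation; near 0 for white noise, near 1 for smooth trends."""
    n = len(resid)
    mean_r = sum(resid) / n
    num = sum((resid[i] - mean_r) * (resid[i + 1] - mean_r) for i in range(n - 1))
    den = sum((r - mean_r) ** 2 for r in resid)
    return num / den

# Simulated data (assumed values, for illustration only).
rng = random.Random(0)
x = [0.5 * i for i in range(41)]                 # 0.0, 0.5, ..., 20.0
noise = [rng.gauss(0, 0.2) for _ in x]
y_linear = [2 + 3 * xi + e for xi, e in zip(x, noise)]
y_curved = [yi + 0.02 * xi ** 2 for xi, yi in zip(x, y_linear)]  # hidden curvature

resid_linear, r2_linear = residuals_and_r2(x, y_linear)
resid_curved, r2_curved = residuals_and_r2(x, y_curved)
lag1_linear = lag1_autocorr(resid_linear)
lag1_curved = lag1_autocorr(resid_curved)

print(f"linear data: R-square = {r2_linear:.4f}, lag-1 autocorrelation = {lag1_linear:.2f}")
print(f"curved data: R-square = {r2_curved:.4f}, lag-1 autocorrelation = {lag1_curved:.2f}")
```

In both cases the R-square is very close to 1, yet the residuals from the curved data are strongly autocorrelated: the model has not turned that data into noise. Plotting `resid_curved` against `x` would show the leftover signal as a U-shaped pattern.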

In my next post I will show a calibration curve study in which residual plots helped discover an unaccounted-for signal even though the R-Square was almost 100%.

Wednesday, September 23, 2009

Statistically Driven Insights

In the past 50 years statistics has contributed to many insights and developments in engineering and science, and in this era of massive volumes of data it continues to play a significant role. It is unfortunate that most students in these fields are either not exposed to statistical methods early in their careers, or are put off by the belief that statistics is confusing and irrelevant to the practical problems they encounter.

In this blog we hope to share our reflections, lessons learned, and JMP tricks for using statistics in engineering and science. We will do this in a way that makes statistics more palatable, even exciting, and we will show how statistics can help spark "aha" moments that lead to new hypotheses and discoveries. We will do our best to stay away from examples that are not relevant, and concentrate on cases found in a variety of industries, such as automotive, chemical, semiconductor, and pharmaceutical, to name a few.

This is an exciting time for us. The launch of this blog coincides with the publication of our book: Analyzing and Interpreting Continuous Data Using JMP: A Step-by-Step Guide. This book is based on the knowledge we have gained over our many years collaborating with, and learning from, engineers and scientists. In an upcoming post we will share more details about this book.

As Prof. George E.P. Box, FRS, one of the most original statistical minds of all time, so aptly put it: "Discovering the unexpected is more important than confirming the known". We strongly believe that statistics is a catalyst for "discovering the unexpected", and for generating knowledge within the framework of the scientific method. Thanks for taking the time to embark on this ‘Statistical Insights’ journey with us.

Brenda and José