Sunday, September 27, 2009

Data - Model = Noise

A model is just a recipe for transforming data into noise.

We are used to thinking of a statistical model as a representation of our data that can be used to describe its behavior or to predict future values. We fit a statistical model hoping that it does a good job of extracting the signals in our data. In other words, we can judge a statistical model by how well it leaves behind "just noise".

How good is the model at transforming our data into noise? After the model is fit, the Residuals = Data - Model should behave like white noise, that is, have no predominant signals left in them. Graphical residual analysis gives us a way to verify our assumptions about the model and to make sure that no predominant signals remain in the residuals. Residual plots allow us to evaluate the model's lack of fit.
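To make the idea concrete, here is a minimal sketch in Python. It is not the calibration study mentioned in this post: the data are synthetic (a slightly curved response invented for illustration), and the helper names `fit_line`, `r_squared` and `lag1_autocorrelation` are made up for this example. It fits a straight line, computes Residuals = Data - Model, and then checks whether the residuals look like noise. The lag-1 autocorrelation is used here only as a simple numeric stand-in for looking at a residual plot.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(y, residuals):
    """Fraction of the variation in y explained by the model."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(e ** 2 for e in residuals)
    return 1.0 - ss_res / ss_tot


def lag1_autocorrelation(residuals):
    """Correlation between neighboring residuals.

    Close to 0 for white noise; far from 0 when a signal is left over.
    """
    mean_e = sum(residuals) / len(residuals)
    centered = [e - mean_e for e in residuals]
    num = sum(c0 * c1 for c0, c1 in zip(centered, centered[1:]))
    den = sum(c ** 2 for c in centered)
    return num / den


# Synthetic data (an assumption for illustration): a mostly linear
# response with a small quadratic term plus a little random noise.
rng = random.Random(0)
x = list(range(21))
y = [2.0 + 3.0 * xi + 0.05 * xi ** 2 + rng.gauss(0.0, 0.2) for xi in x]

# Model: a straight line, which ignores the curvature on purpose.
a, b = fit_line(x, y)
model = [a + b * xi for xi in x]

# Residuals = Data - Model
residuals = [yi - mi for yi, mi in zip(y, model)]

r2 = r_squared(y, residuals)
rho1 = lag1_autocorrelation(residuals)

print(f"R-Square: {r2:.4f}")
print(f"Lag-1 autocorrelation of residuals: {rho1:.2f}")
print("Residuals at start, middle, end:",
      [round(residuals[i], 2) for i in (0, 10, 20)])
```

In this sketch the R-Square is very close to 1, yet the residuals are positive at both ends of the range and negative in the middle. That U shape is the curvature the straight line failed to extract, so this model did not turn the data into noise.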

In my next post I will show a calibration curve study in which residual plots helped discover an unaccounted-for signal even though the R-Square was almost 100%.
