Monday, May 10, 2010

Statistics Is the New Grammar

That is how Clive Thompson ends his article Do You Speak Statistics? in the May issue of Wired magazine. I like this sentence; it makes you think of statistics in a different way. Grammar has to do with language, with syntax, or "the principles or rules of an art, science, or technique" (Merriam-Webster). Statistics being the "New Grammar" implies a language that we may not totally understand, or even be aware of: a new way of looking at, and interpreting, the world around us. Through several examples, Thompson makes the point that statistics is crucial for public life.

…our inability to grasp statistics — and the mother of it all, probability — makes us believe stupid things. (Clive Thompson)

Back in the late 1980s, Prof. John Allen Paulos wrote the book Innumeracy: Mathematical Illiteracy and Its Consequences, using a term, innumeracy, coined by Prof. Douglas R. Hofstadter to denote a "person's inability to make sense of numbers" (Thompson quotes Prof. Allen, but the article does not mention innumeracy). In my post Lack of Statistical Reasoning I wrote about Prof. Pinker's observation that lack of statistical reasoning is the most important scientific concept that lay people fail to understand. When someone asks me what I do for a living, I tell them that I help people "make sense of data", and when I collaborate with, and teach, engineers and scientists, I help them realize what may seem obvious:

variation exists in everything we do;
understanding and reducing variation is key for success.

Thompson argues that "thinking statistically is tricky". Perhaps, but Statistical Thinking starts with the realization that, as Six-Sigma practitioners know full well, variation is everywhere.
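
To make that concrete, here is a minimal sketch (in Python, using NumPy; the reference weight, the number of readings, and the measurement spread are all made up for illustration) of variation in repeated measurements of the very same thing:

import numpy as np

rng = np.random.default_rng(2010)

# Ten readings of the same 100-gram reference weight.
# The true value never changes, yet the readings do: that is variation.
readings = 100 + rng.normal(loc=0.0, scale=0.4, size=10)

print("readings:", np.round(readings, 2))
print("mean:    ", round(readings.mean(), 2))
print("std dev: ", round(readings.std(ddof=1), 2))

# "Reducing variation" means shrinking that standard deviation,
# for example with a better gauge or a more consistent procedure.

Summarizing the readings with a mean and a standard deviation, rather than a single number, is the first small step of thinking statistically.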

A lot of controversy was generated by the hacking of emails from the Climatic Research Unit (CRU) at the University of East Anglia. The university set up a Scientific Appraisal Panel to "assess the integrity of the research published by the Climatic Research Unit in the light of various external assertions". The first conclusion of the report is that there is "no evidence of any deliberate scientific malpractice in any of the work of the Climatic Research Unit". From a statistics point of view, it is very interesting to read the second conclusion:

We cannot help remarking that it is very surprising that research in an area that depends so heavily on statistical methods has not been carried out in close collaboration with professional statisticians.

The panel's opinion is that the work of the scientists doing climate research "is fundamentally statistical", echoing Thompson's argument.

A few years ago Gregory F. Treverton, director of the RAND Center for Global Risk and Security, wrote an interesting article, Risks and Riddles, in which he made a wonderful distinction between puzzles and mysteries: "Puzzles can be solved; they have answers", whereas "A mystery cannot be answered; it can only be framed". But the connection he made between puzzles, mysteries, and information is just as compelling:

Puzzle-solving is frustrated by a lack of information.
By contrast, mysteries often grow out of too much information. (Gregory F. Treverton)

There is so much information these days that another Wired magazine writer, Chris Anderson, calls it the Petabyte Age. A petabyte is a lot of data: a quadrillion bytes (10^15), or the equivalent of about 13 years of HD-TV video. Google handles so much information that in 2008 it was processing over 20 petabytes of data per day. In this data deluge, how do we know what to keep (signal) and what to throw away (noise)? That is why statistics is the new grammar.
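
As a quick sanity check of that "13 years" figure, here is a back-of-the-envelope calculation (a sketch in Python; the 19.4 Mbit/s bit rate is my assumption, roughly that of broadcast HD television, since the article does not say which rate it used):

# Rough check of "a petabyte is about 13 years of HD-TV video".
petabyte_bytes = 10**15                  # 1 PB = 10^15 bytes
hd_bitrate_bps = 19.4e6                  # assumed broadcast-HD bit rate, ~19.4 Mbit/s
hd_bytes_per_sec = hd_bitrate_bps / 8    # convert bits per second to bytes per second

seconds = petabyte_bytes / hd_bytes_per_sec
years = seconds / (365.25 * 24 * 3600)
print(round(years, 1), "years")          # prints 13.1, i.e. roughly 13 years

At that assumed bit rate, the arithmetic does land on about 13 years for a single petabyte.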



