
Monday, May 10, 2010

Statistics Is the New Grammar

That is how Clive Thompson ends his article Do You Speak Statistics? in the May issue of Wired magazine. I like this sentence; it makes you think of statistics in a different way. Grammar has to do with language, with syntax, or "the principles or rules of an art, science, or technique" (Merriam-Webster). Statistics being the "New Grammar" implies a language that we may not totally understand, or even be aware of: a new way of looking at, and interpreting, the world around us. Through several examples Thompson makes the point that statistics is crucial for public life.

…our inability to grasp statistics — and the mother of it all, probability — makes us believe stupid things. (Clive Thompson)

Back in the late 1980s Prof. John Allen Paulos wrote the book Innumeracy: Mathematical Illiteracy and Its Consequences, built around the term innumeracy, coined by Prof. Douglas R. Hofstadter to denote a person's inability to make sense of numbers (Thompson quotes Prof. Paulos but does not mention innumeracy). In my post Lack of Statistical Reasoning I wrote about Prof. Pinker's observation that statistical reasoning is the most important scientific concept that lay people fail to understand. When someone asks me what I do for a living I tell them that I help people "make sense of data", and when I collaborate with, and teach, engineers and scientists I help them realize what may seem obvious:

  • Variation exists in everything we do, and
  • Understanding and reducing variation is key for success.

Thompson argues that "thinking statistically is tricky". Perhaps, but Statistical Thinking starts with the realization that, as Six Sigma practitioners know full well, variation is everywhere.

A lot of controversy was generated by the hacking of emails from the Climatic Research Unit (CRU) at the University of East Anglia. The university set up a Scientific Appraisal Panel to "assess the integrity of the research published by the Climatic Research Unit in the light of various external assertions". The first conclusion of the report is that there was "no evidence of any deliberate scientific malpractice in any of the work of the Climatic Research Unit". From a statistics point of view it is very interesting to read the second conclusion:

We cannot help remarking that it is very surprising that research in an area that depends so heavily on statistical methods has not been carried out in close collaboration with professional statisticians.

The panel's opinion is that the work of the scientists doing climate research "is fundamentally statistical", echoing Thompson's argument.

A few years ago Gregory F. Treverton, Director of the RAND Center for Global Risk and Security, wrote an interesting article, Risks and Riddles, in which he made a wonderful distinction between puzzles and mysteries: "Puzzles can be solved; they have answers", whereas "A mystery cannot be answered; it can only be framed". But the connection he made between puzzles, mysteries, and information is compelling:

Puzzle-solving is frustrated by a lack of information.
By contrast, mysteries often grow out of too much information. (Gregory F. Treverton)

There is so much information these days that another Wired magazine writer, Chris Anderson, calls this the Petabyte Age. A petabyte is a lot of data: a quadrillion (10¹⁵) bytes, or the equivalent of about 13 years of HD-TV video. Google handles so much information that in 2008 it was processing over 20 petabytes of data per day. In this data deluge, how do we know what to keep (signal) and what to throw away (noise)? That is why statistics is the new grammar.
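As a quick back-of-the-envelope check of the "13 years of HD-TV video" equivalence (a sketch in Python; the numbers are mine, not Anderson's), a petabyte spread over 13 years of continuous video implies a bitrate of roughly 19 Mbit/s, which is indeed in the range of broadcast HD:

    # Sketch: what bitrate does "1 petabyte = 13 years of HD-TV video" imply?
    petabyte = 10**15                                  # bytes
    seconds = 13 * 365 * 24 * 3600                     # ~13 years of continuous video
    bytes_per_second = petabyte / seconds
    print(f"{bytes_per_second / 1e6:.1f} MB/s")        # ~2.4 MB/s
    print(f"{bytes_per_second * 8 / 1e6:.1f} Mbit/s")  # ~19.5 Mbit/s, a typical HD bitrate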




Friday, November 20, 2009

Lack of Statistical Reasoning

In the Sunday Book Review's Up Front: Steven Pinker section of the New York Times, it was interesting to read Malcolm Gladwell's comment on "getting a master's degree in statistics" in order "to break into journalism today". This has been a great year for statistics, considering the comment earlier this year by Google's chief economist, Hal Varian: “I keep saying that the sexy job in the next 10 years will be statisticians”, and the Wall Street Journal's The Best and Worst Jobs survey, which ranks Mathematician number 1 and Statistician number 3.

What really caught my attention in Sunday's Up Front was the remark by Prof. Steven Pinker, who wrote the review of Gladwell's new book What the Dog Saw, when he was asked "what is the most important scientific concept that lay people fail to understand". He said: “Statistical reasoning. A difficulty in grasping probability underlies fallacies from medical quackery and stock-market scams to misinterpreting sex differences and the theory of evolution.”

I agree with him, but I believe it is not only lay people who lack statistical reasoning; as scientists and engineers we sometimes forget about Statistical Thinking. Statistical Thinking is a philosophy of learning and action that recognizes that:

  • All work occurs in a system of interconnected processes,
  • Variation exists in all processes, and
  • Understanding and reducing variation is key for success.

Globalization and a focus on environmental issues are helping us to "think globally", or look at systems rather than individual processes. When it comes to realizing that variation exists in everything we do, however, we lose sight of it, as if we were in a "physics lab where there is no friction". We may believe that if we do things in "exactly" the same way, we'll get the same result. Process engineers know first hand that doing things "exactly" the same way is a challenge because of variation in raw materials, equipment, methods, operators, environmental conditions, etc. They understand the need for operating "on target with minimum variation". Understanding and minimizing variation brings about consistency and more "elbow room" to move within specifications, and makes it possible to achieve six sigma levels of quality.
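To make "on target with minimum variation" concrete, here is a minimal sketch in Python (with a made-up target and specification limits, not data from any real process) that simulates such a process and computes the capability index Cpk; a Cpk of about 2 is what six sigma levels of quality are usually taken to mean:

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical process: on target at 100, specification limits 94 to 106.
    lsl, usl, target = 94.0, 106.0, 100.0
    x = rng.normal(loc=target, scale=1.0, size=500)   # centered, standard deviation 1

    mean, sd = x.mean(), x.std(ddof=1)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)      # distance to nearest limit, in 3-sigma units
    print(f"mean = {mean:.2f}, sd = {sd:.2f}, Cpk = {cpk:.2f}")   # Cpk close to 2

Shrink the standard deviation, or drift off target, and Cpk drops accordingly: the index rewards exactly the combination of centering and low variation the engineers are after.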

This understanding of variation is key in other disciplines as well. I am waiting for the day when financial reports do not just compare a given metric with the previous year's, but use process behavior (control) charts to show how the metric is distributed over time, giving us a picture of its trends and of its variation, and helping us not to confuse the signals with the noise.
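In the meantime, computing the natural process limits for such a chart takes only a few lines. Below is a sketch of an individuals (XmR) chart on made-up monthly figures, using the standard 2.66 scaling of the average moving range:

    import numpy as np

    # Hypothetical monthly figures (any time-ordered metric would do).
    x = np.array([102, 98, 105, 101, 97, 103, 99, 104, 100, 96, 106, 102], dtype=float)

    center = x.mean()
    mr_bar = np.abs(np.diff(x)).mean()   # average moving range

    # Natural process limits for an individuals (XmR) chart.
    ucl = center + 2.66 * mr_bar
    lcl = center - 2.66 * mr_bar

    print(f"center = {center:.1f}, UCL = {ucl:.1f}, LCL = {lcl:.1f}")
    for i, xi in enumerate(x, start=1):
        flag = "signal" if (xi > ucl or xi < lcl) else ""
        print(f"month {i:2d}: {xi:6.1f} {flag}")

Any month that falls outside those limits is a signal worth investigating; everything inside is most likely routine variation, and reacting to it month by month is just chasing noise.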