Tuesday, November 23, 2010

Visualizing Data with Bubble Plots

Bubble plots are a great way of displaying three or more variables in an X-Y scatter plot, and they are a useful diagnostic tool for detecting outliers and influential observations in both logistic regression (Hosmer and Lemeshow used them in their 1989 book Applied Logistic Regression) and multiple linear regression (What If Einstein Had JMP). New technologies have made it possible to animate the bubbles according to a given variable, such as time, as Hans Rosling masterfully demonstrated in his talk New Insights on Poverty at the March 2007 TED conference.

Today, Nathan Yau posted an entry on his data visualization blog, FlowingData, on How to Make Bubble Charts using R. He gives 5 steps (6 if you count step 0), with the corresponding R code, to create a static bubble plot of the 2008 US burglary rate vs. murder rate for each of the 50 states, where the size of each red bubble represents the state's population.

(Figure: Nathan Yau's bubble chart, from http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/)

Let me show you how easy it is to create static and dynamic bubble plots in JMP. The 2008 crime rate data is available at http://datasets.flowingdata.com/crimeRatesByState2008.csv and can be conveniently read into JMP version 9 using File > Internet Open, as shown below.
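For readers working outside JMP, here is a minimal Python sketch of the same first step using only the standard library. It assumes the file is comma-separated, has a header row, names its state column `state`, and has numeric values in every other column; those column names and the helper names `parse_crime_rates` and `load_crime_rates` are assumptions for illustration, and the URL may no longer be online.

```python
import csv
import io
import urllib.request

# URL quoted in the post; it may no longer be reachable.
CRIME_URL = "http://datasets.flowingdata.com/crimeRatesByState2008.csv"


def parse_crime_rates(text, delimiter=","):
    """Parse the crime-rate table into a list of dicts, one per row.

    Assumption: the header has a text column named "state"; every other
    column (rates per 100,000, population) is converted to float.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row = {"state": record["state"].strip()}
        for name, value in record.items():
            if name != "state":
                row[name] = float(value)
        rows.append(row)
    return rows


def load_crime_rates(url=CRIME_URL, delimiter=","):
    """Download the file (needs network access) and parse it."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_crime_rates(text, delimiter)
```

If the file turns out to be tab-separated, passing `delimiter="\t"` is the only change needed.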

To create a bubble plot, we select Graph > Bubble Plot to bring up the Bubble Plot launch panel. Here we select Burglary as Y, Murder as X, and Population as Sizes. The bubbles in Nathan's plot are red and labeled with the state name; to color and label our bubbles, we select State for both ID and Coloring. These selections are shown below.

Once you click OK, our multicolored bubble plot appears (I have modified the axes to match Nathan's plot). We quickly see that Louisiana and Maryland have the highest murder rates and similar population sizes, and that North Carolina has the highest burglary rate.

In Step 3 Nathan shows how to size the bubbles so that each bubble's area, rather than its radius, is proportional to the state population, by computing the radius from the desired area. In JMP's bubble plot, a slider below the X-axis dynamically controls the size of the bubbles: move it to the right for larger bubbles or to the left for smaller ones. Very easy; no code required.
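The arithmetic behind area-based sizing is short: a circle's area is πr², so making the area proportional to a value v means r = √(v/π). The Python sketch below (the helper name `bubble_radii` and its `max_radius` parameter are hypothetical) computes such radii and then multiplies them all by one common factor. That single factor stands in for a size slider; this is an assumption about how such a slider behaves, not a description of JMP's internals. Because every radius is rescaled by the same amount, the area ratios between bubbles are preserved.

```python
import math


def bubble_radii(values, max_radius=1.0):
    """Return radii whose circle areas are proportional to ``values``.

    area = pi * r**2, so r = sqrt(value / pi). All radii are then scaled by
    one common factor so the largest bubble has radius ``max_radius``
    (a stand-in for a size slider). Assumes all values are positive.
    """
    raw = [math.sqrt(v / math.pi) for v in values]
    largest = max(raw)
    return [max_radius * r / largest for r in raw]
```

For example, a state with four times the population of another gets a bubble with twice the radius and therefore four times the area; raising `max_radius` enlarges every bubble without changing that 4:1 area ratio.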

The static bubble plot above is a snapshot of the crime rates in 2008. What if we want to visualize how the burglary and murder rates changed over the years? In JMP, a time variable can be used to animate the bubbles. We use the same Bubble Plot selections as before, but now we add Year as the Time variable.
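Conceptually, animating by a Time variable means splitting the rows into one frame per time value and drawing the frames in order. Here is a minimal Python sketch of that grouping step, assuming each row is a dict with a hypothetical `year` key (the multi-year dataset used for the animation is not included in the post); `frames_by_time` is likewise an illustrative name.

```python
from collections import defaultdict


def frames_by_time(rows, time_key="year"):
    """Group rows into animation frames keyed by the time variable.

    Each frame holds every row (one bubble per state) sharing a single
    value of ``time_key``; frames are returned in increasing time order,
    the order an animation would step through them.
    """
    frames = defaultdict(list)
    for row in rows:
        frames[row[time_key]].append(row)
    return [(t, frames[t]) for t in sorted(frames)]
```

Nothing in the grouping is specific to years: any column can serve as `time_key`, which is what makes it possible to animate by other variables as well.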

Several stories now emerge from this dynamic plot. Around 1976 Nevada starts to move away from the rest of the states, with both high burglary and high murder rates, reaching a maximum around 1980 and returning to California and Florida levels by 1984. Around 1989 the murder rate in Louisiana starts to increase, reaching 20 per 100,000 by 1993 and staying between 15 and 20 per 100,000 through 1997, while its burglary rate remains fairly constant. We can also see that the crime rates for North Dakota are consistently low, and that by 1999 all the states seem to form a more cohesive group moving towards the lower left corner.

Bubble plots can also be animated by variables other than time. I have used the dynamic bubble plot to show how the relationship between material degradation and time changes from linear to nonlinear as a function of temperature. In the video below you can see that as the temperature increases from 9°C to 50°C the material degrades faster, and that at the higher temperatures, 40°C and 50°C, the degradation is nonlinear. This is a nice visual that conveys the message without the need to show the model equations.

With JMP's static and dynamic bubble plots you can easily display up to 6 variables (X, Y, ID, Sizes, Coloring, and Time; seven using ID2) in the 2-dimensional space of a scatter plot. What an efficient way of visualizing data!
