Monday, November 29, 2010

Visualizing Change with Bubble Plots

In my last post I showed some of the features of JMP's Bubble Plot platform, using the 2008 US crime rate data from Nathan Yau's How to Make Bubble Charts to create a static bubble plot, and the 1973 to 1999 US crime rate data to generate a dynamic bubble plot. Describing the dynamic bubble plot, I wrote:

Around 1976 Nevada starts to move away from the rest of the states, with both high burglary and high murder rates, reaching a maximum around 1980, and returning to California and Florida levels by 1984. Around 1989 the murder rate in Louisiana starts to increase, reaching 20 per 100,000 by 1993 and staying between 15 and 20 per 100,000 all the way up to 1997, with a fairly constant burglary rate. We can also see that the crime rates for North Dakota are consistently low.

You can see these stories unfold in the animation, but after it is over our brains tend to forget the path a bubble took: the sequence of steps that led to its final position. Fortunately for us, the Bubble Plot platform has an option, Trail Lines, that can help our brains visualize motion. This option can be accessed from the bubble plot contextual menu:


Let's select the bubbles for Nevada, Louisiana, and North Dakota. When you run the animation, a trail follows the motion of each of these bubbles. By the end of the sequence, in 1999, the plot shows the paths taken by these three states.
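Conceptually, a trail line is just the ordered list of positions a bubble has occupied up to the current frame. The Python sketch below illustrates that idea; it is not JMP's implementation. It assumes the data are held in a dictionary mapping each state to `{year: (murder rate, burglary rate)}`, and the function name `trails_up_to` is hypothetical.

```python
from typing import Dict, List, Tuple

# A bubble position: (murder rate, burglary rate), both per 100,000 people.
Point = Tuple[float, float]


def trails_up_to(
    series: Dict[str, Dict[int, Point]],
    selected: List[str],
    year: int,
) -> Dict[str, List[Point]]:
    """For each selected state, return the positions it has visited from its
    first year up to and including `year`, in chronological order.
    (Hypothetical helper illustrating what a trail line connects.)"""
    trails: Dict[str, List[Point]] = {}
    for state in selected:
        by_year = series.get(state, {})
        trails[state] = [by_year[y] for y in sorted(by_year) if y <= year]
    return trails
```

Calling it with the final year (1999 here) yields the complete paths drawn at the end of the animation; calling it with an earlier year yields the partial trail visible at that point.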


Now we can clearly see Nevada, the green trail, shooting up to the upper right (high burglary and murder rates) and then coming back. Note how Louisiana (blue line) moves horizontally to the right (higher murder rate) without changing much in the vertical direction (burglary rate). North Dakota's path (yellow line) is a short zigzag that stays around a burglary rate of 435 per 100,000 and a murder rate of 1.18 per 100,000.

In Visualizing Change, data visualization expert Stephen Few discusses four meaningful characteristics of change through time: magnitude, shape, velocity, and direction. These four characteristics are easier to see when Trail Bubbles are used in addition to Trail Lines. The plot below shows the Trail Lines and Trail Bubbles for Louisiana, Nevada, and North Dakota. To help the eye, I've added labels for the starting year of 1973.


The magnitude of change can be assessed from the distance between bubble locations. For Nevada, between 1973 and 1980, you can see big changes in the burglary rate, from about 2000 to 3000 per 100,000, and in the murder rate, from 12 to 20 per 100,000. By 1999 Nevada's burglary rate had been cut in half, to 1000 per 100,000. The shape of change is given by the overall shape traced by the bubbles, while the direction and velocity of change can be seen in the trend of the trails and in how far a bubble moves from one year to the next. For Nevada, the shape of change is somewhat concave, with rapid changes (big jumps from one bubble to the next), trending upward and then downward along the 45° diagonal.
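The same reading of magnitude, velocity, and direction can be expressed numerically. The Python sketch below is an illustration of that idea, not something JMP computes for you; all function names are hypothetical. Because murder rates are in the tens and burglary rates in the thousands, it assumes each axis difference is divided by that axis's visible span, so that step sizes roughly match what the eye sees on the plot.

```python
import math
from typing import List, Tuple

# One bubble position per year: (murder rate, burglary rate), per 100,000 people.
Point = Tuple[float, float]


def _scaled_steps(path: List[Point], x_span: float, y_span: float) -> List[Tuple[float, float]]:
    """Year-to-year displacements, each axis divided by its (assumed) visible span."""
    return [
        ((b[0] - a[0]) / x_span, (b[1] - a[1]) / y_span)
        for a, b in zip(path, path[1:])
    ]


def step_lengths(path: List[Point], x_span: float = 1.0, y_span: float = 1.0) -> List[float]:
    """Distance between consecutive bubbles: a proxy for the velocity of change."""
    return [math.hypot(dx, dy) for dx, dy in _scaled_steps(path, x_span, y_span)]


def step_directions(path: List[Point], x_span: float = 1.0, y_span: float = 1.0) -> List[float]:
    """Direction of each step in degrees: 0 = right (higher murder rate),
    90 = up (higher burglary rate), 45 = along the diagonal."""
    return [math.degrees(math.atan2(dy, dx)) for dx, dy in _scaled_steps(path, x_span, y_span)]


def net_change(path: List[Point]) -> Tuple[float, float]:
    """Magnitude of change from the first to the last bubble, per axis, in raw units."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return (x1 - x0, y1 - y0)
```

For example, a two-point path built from the rounded Nevada values quoted above, `[(12, 2000), (20, 3000)]`, gives `net_change(...) == (8, 1000)`: 8 more murders and 1000 more burglaries per 100,000.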

Louisiana's burglary rate did not change much (vertical changes), but its murder rate went up to 20 per 100,000 and ended at 10 per 100,000, lower than where it started (horizontal changes). The changes did not seem to occur rapidly, because the distance between consecutive bubbles is small. As we saw before, North Dakota did not change much. Its shape is a circle with a small radius; i.e., there were neither big nor rapid changes in either murder or burglary rates (the last bubble is almost where it started).

A JMP bubble plot, with line and bubbles trails, can really change the way you visualize change. Go ahead, give it a try.


Tuesday, November 23, 2010

Visualizing Data with Bubble Plots

Bubble plots are a great way of displaying three or more variables using an X-Y scatter plot, and are a useful diagnostic tool for detecting outliers and influential observations in both logistic regression (Hosmer and Lemeshow used them in their 1989 book Applied Logistic Regression) and multiple linear regression (What If Einstein Had JMP). New technologies have made it possible to animate the bubbles according to a given variable, such as time, as Hans Rosling masterfully demonstrated in his talk New Insights on Poverty at the March 2007 TED conference.

Today, Nathan Yau posted an entry in his data visualization blog, FlowingData, on How to Make Bubble Charts using R. He gives 5 steps (6 if you count step 0), and the corresponding R code, to create a static bubble plot that shows the 2008 US burglary rate vs. murder rate for each of the 50 states, with red bubbles sized by state population.


(From http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/)

Let me show you how easy it is to create static and dynamic bubble plots in JMP. The 2008 crime rate data is available at http://datasets.flowingdata.com/crimeRatesByState2008.csv, and can be conveniently read into JMP version 9 using File>Internet Open, as shown below.
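For readers who prefer scripting, here is a Python sketch (not JMP, and not Nathan's R code) that downloads and parses the same file using only the standard library. It assumes the file has a header row containing columns named `state`, `murder`, `burglary`, and `population`; check the actual header before relying on it. The function names are hypothetical, and the URL may no longer be online.

```python
import csv
import io
import urllib.request
from typing import Dict, List

URL = "http://datasets.flowingdata.com/crimeRatesByState2008.csv"


def parse_crime_csv(text: str) -> List[Dict[str, object]]:
    """Parse CSV text, keeping only the columns used in the bubble plot.
    Assumes columns named state, murder, burglary, and population."""
    rows: List[Dict[str, object]] = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "state": record["state"],
            "murder": float(record["murder"]),
            "burglary": float(record["burglary"]),
            "population": int(float(record["population"])),
        })
    return rows


def load_crime_data(url: str = URL) -> List[Dict[str, object]]:
    """Download and parse the file (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_crime_csv(response.read().decode("utf-8"))
```

Keeping the parser separate from the download makes it easy to test on a local copy of the file.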


To create a bubble plot we select Graph>Bubble Plot to bring up the Bubble Plot launch panel. Here we select Burglary as the Y, Murder as the X, and Population as the Sizes. The bubbles in Nathan's plot are red and are labeled with the state name. To color and label the bubbles, we select State for both ID and Coloring. These selections are shown below.


Once you click OK, our multicolored bubble plot appears (I have modified the axes to match Nathan's plot). We quickly see that Louisiana and Maryland have the highest murder rates and similar population sizes, and that North Carolina has the highest burglary rate.


In Step 3 Nathan shows how to size the bubbles properly, by computing each radius from the square root of the population so that the area of a bubble, rather than its radius, is proportional to the population. Below the X-axis in JMP's bubble plot there is a slider to dynamically control the size of the bubbles. You can just move the slider to the right for larger bubbles, or to the left to decrease their size. Very easy; no code required.
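Why the square root matters: if the radius were proportional to the population, a state with four times the population would get a bubble with sixteen times the area. The Python sketch below (an illustration, not Nathan's R code or JMP's internals) computes area-proportional radii. The function name is hypothetical, and the `max_radius` parameter is an assumed stand-in for the size slider; it assumes non-negative values with at least one positive.

```python
import math
from typing import List


def radii_for_areas(values: List[float], max_radius: float = 1.0) -> List[float]:
    """Radii such that each circle's AREA is proportional to its value.

    r = sqrt(value / pi) makes pi * r**2 == value. The radii are then
    rescaled so the largest bubble has radius `max_radius`; multiplying
    every radius by the same factor keeps the areas proportional.
    """
    raw = [math.sqrt(v / math.pi) for v in values]
    largest = max(raw)
    return [max_radius * r / largest for r in raw]
```

With populations of 100 and 400, the radii come out in a 1:2 ratio, so the areas are in the correct 1:4 ratio.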

The static bubble plot above is a snapshot of the crime rates in 2008. What if we want to visualize how the burglary and murder rates changed over the years? In JMP, a time variable can be used to animate the bubbles. We use the same Bubble Plot selections as before but now we add Year as the Time variable.
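Conceptually, assigning a Time variable splits the rows into frames, one per distinct time value, and the animation steps through those frames in order. The Python sketch below illustrates that grouping; it is not how JMP is implemented. The row layout `(state, year, murder, burglary, population)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (state, year, murder, burglary, population)
Row = Tuple[str, int, float, float, int]


def frames_by_time(rows: Iterable[Row]) -> List[Tuple[int, List[Row]]]:
    """Group rows by year and return the frames in chronological order.
    Each frame holds the bubbles shown at that step of the animation;
    rows keep their original order within a frame."""
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return sorted(groups.items())
```

Looping over the returned list and drawing each frame's bubbles in turn produces the animation; the same grouping works for any ordered variable, not just years.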


Several stories now emerge from this dynamic plot. Around 1976 Nevada starts to move away from the rest of the states, with both high burglary and high murder rates, reaching a maximum around 1980, and returning to California and Florida levels by 1984. Around 1989 the murder rate in Louisiana starts to increase, reaching 20 per 100,000 by 1993 and staying between 15 and 20 per 100,000 all the way up to 1997, with a fairly constant burglary rate. We can also see that the crime rates for North Dakota are consistently low, and that by 1999 all the states seem to form a more cohesive group moving towards the lower left corner.

Bubble plots can also be animated by variables other than time. I have used the dynamic bubble plot to show how the relationship between material degradation and time changes from linear to nonlinear as a function of temperature. In the video below you can see that as the temperature increases from 9°C to 50°C the material degrades faster, and that at the higher temperatures, 40°C and 50°C, the degradation is nonlinear. This is a nice visual that helps convey the message without the need to show the model equations.


With JMP's static and dynamic bubble plots you can easily display up to six variables (seven using ID2) in the two-dimensional space of a scatter plot. What an efficient way of visualizing data!