Yesterday John Gruber linked to an infographic, “Unboxing the iPad Data” by John Kumahara and Johnathan Bonnell. In terms of graphic design it’s visually pleasing, but it falls short in a few areas and highlights common challenges in designing infographics. These are problems that occur all the time in visualizations, so let’s see what we can learn from this example.
I know it’s easy to misinterpret criticism, so I want to emphasize that these are comments about this particular graphic, not about the authors’ skills or ability.
People understand numbers. So when you are evaluating a visualization one of the most important questions is whether the graphic is better than just the numbers. The most successful visualizations show trends, outliers, and insights that a table of numbers wouldn’t.
In some places “Unboxing the iPad Data” clearly shows the numbers of interest. In others, it obscures them for the sake of design.
The fact that Apple sold 300,000 devices and 1,000,000 applications in the first weekend is a big deal—so these should be big numbers. Instead you have to read the fine print to see that 1 actually means 1 million.
Equally large are numbers that few people care about, like the number of respondents or the specifics of the Likert scale used.
When the numbers speak for themselves, rely on them without decoration. Clearly show what is important.
Certain aspects of color are a subjective matter. One designer thinks this shade of red is the right choice; another thinks it’s ugly. But there is a science to the perception of color. We know that some colors catch people’s eyes more than others. I would argue that these pie charts would be more readily perceptible if the colors were swapped.
The intense saturation of the light blue makes it look like it is highlighting something. Here the portion of interest is the small white wedge representing 15%, but the white is overpowered by the blue.
(There is the separate question of whether these pie charts help us understand the difference between the 8% and 15% range represented in the left-most column. The small pie charts are attractive, but does this small multiples grid of pie charts help the viewer understand this dataset better than a table of these numbers alone?)
A similar issue affects the bar chart. Here the viewer must compare between likely (represented by white) and unlikely (represented by blue). Again, the blue stands out and draws the user’s attention.
A minor detail in the bar chart is the orientation of the text. In the U.S., we are more comfortable turning our heads right to read things instead of left. Think of a bookshelf—how do you turn your head to read the books’ titles? My preference (preference!) would be to rotate the labels on this bar chart 180°.
Designers must be careful that their infographics accurately depict meaningful information. Here, for example, we see that the peak rate of tweets about the iPad was 26,668 in an hour.
The depiction juxtaposes this number against a timeline that suggests the peak occurred between 11:00am and 12:00pm. If this is the case, then the segment should be labeled so that the viewer can learn this readily. On other hand, if we don’t know the time of the peak, then this illustration is misleading because it implies a fact where there is ambiguity.
The segment of this infographic that depicts the cost of apps for the iPhone and iPad is less clear still.
The accompanying text reads:
The other notable difference between the iPad and the iPhone, are the app prices. The average price of the initial iPad apps ran around $4.99 (according to Mobclix) while the iPhone apps averaged a steady $1.99.
I’ve looked at this pie chart for some time and I can’t figure out what it is showing. The ratio of average app price to the total average app price? Even if that were the case, 5/7 is 71% and this chart is split into 60% and 40% segments.
There are a variety of visual variables a designer can use to encode a given set of data. Among these are length, position, angle, area, color, lightness, and others. Some of these are better suited to certain kinds of data than others, and some are more readily perceptible than others (see Cleveland and McGill’s “Graphical Perception” for extensive details and experiments).
Look at this area comparison of percentage of iPads sold. Before we even consider the accuracy, look at these two circles and ask yourself how much bigger is circle B than circle A? Go ahead, just type your guess in the box.
The circle on the right is times bigger than the one on the left.
The area of the circle on the left is 1075 pixels (d = 37, r = 18.5, A = 1075 px) and the circle on the right is 7390 pixels (d = 97, r = 48.5, A = 7390). That’s 6.8 times bigger. You said x.x, so you were off by x.%
People are much better at comparing length and position than comparing area. This is a very common mistake (one I’ve made myself). Before you represent a variable with area, you should consider that you may be handicapping people’s ability to compare the data.
Area is hard to understand, but it’s hard for designers to get it right as well. Consider the legend for this map:
Are these circles the right size? Let’s construct a table and find out:
|Nominal Size||Diameter||Radius||Area||× 1%|
The rightmost column shows the area of the circle compared with the area of the 1% circle. It turns out that the area of the 20% circle is 53 times bigger than the area of the 1% circle—more than 2.5 times bigger than it should be. Comparing areas is hard; it’s harder with an inaccurate legend. The difficulty of accurately representing area is another reason to avoid using it.
Maps are such a common form of visualization that we use them even when they are not helpful. On top of that, they’re hard. Maps’ static nature make it hard to show important dimensions of data.
The size of maps is fixed, which can be at odds with what your are trying to communicate. In the map shown above much of the interesting data is crammed into the northeast because that’s where those states are located. Meanwhile, a bunch of sparsely populated states in the northwest of the map use up space without communicating anything.
Data on maps is constrained by realities the map itself cannot express. There’s no convenient way to show population data. Does California have a giant circle because Californians buy more iPads than their counterparts across the country? Or is it because California has more people than any other state?
Here the map isn’t clearly better than just the numbers. A table listing the sales numbers for each state, possibly along with the per capita sales per state, would express the point with greater clarity and more succinctly. In the end, that’s the goal, right?