"Unboxing the iPad Data," Deconstructed

10 April 2010
2:11 AM


Yesterday John Gruber linked to an infographic, “Unboxing the iPad Data” by John Kumahara and Johnathan Bonnell. In terms of graphic design it’s visually pleasing, but it falls short in a few areas and highlights common challenges in designing infographics. These are problems that occur all the time in visualizations, so let’s see what we can learn from this example.

I know it’s easy to misinterpret criticism, so I want to emphasize that these are comments about this particular graphic, not about the authors’ skills or ability.

Numbers

People understand numbers. So when you are evaluating a visualization, one of the most important questions is whether the graphic is better than just the numbers. The most successful visualizations show trends, outliers, and insights that a table of numbers wouldn’t.

In some places “Unboxing the iPad Data” clearly shows the numbers of interest. In others, it obscures them for the sake of design.

Display of numbers: 3,122 apps in the store, 300,000 devices sold, 1 million apps sold

The fact that Apple sold 300,000 devices and 1,000,000 applications in the first weekend is a big deal—so these should be big numbers. Instead you have to read the fine print to see that 1 actually means 1 million.

Equally large are numbers that few people care about, like the number of respondents or the specifics of the Likert scale used.

7 point scale, 2,176 audience polled

When the numbers speak for themselves, rely on them without decoration. Clearly show what is important.

Colors

Certain aspects of color are a subjective matter. One designer thinks this shade of red is the right choice; another thinks it’s ugly. But there is a science to the perception of color. We know that some colors catch people’s eyes more than others. I would argue that these pie charts would be more readily perceptible if the colors were swapped.

Small pie chart

The intense saturation of the light blue makes it look like it is highlighting something. Here the portion of interest is the small white wedge representing 15%, but the white is overpowered by the blue.

(There is the separate question of whether these pie charts help us understand the differences across the 8% to 15% range represented in the leftmost column. The small pie charts are attractive, but does this small-multiples grid of pie charts help the viewer understand this dataset better than a table of these numbers alone?)

A similar issue affects the bar chart. Here the viewer must compare likely (represented by white) with unlikely (represented by blue). Again, the blue stands out and draws the viewer’s attention.

Number of tweets about the iPad

A minor detail in the bar chart is the orientation of the text. In the U.S., we are more comfortable turning our heads right to read things instead of left. Think of a bookshelf—how do you turn your head to read the books’ titles? My preference (preference!) would be to rotate the labels on this bar chart 180°.

Meaning

Designers must be careful that their infographics accurately depict meaningful information. Here, for example, we see that the peak rate of tweets about the iPad was 26,668 in an hour.

Number of tweets about the iPad

The depiction juxtaposes this number against a timeline that suggests the peak occurred between 11:00am and 12:00pm. If this is the case, then the segment should be labeled so that the viewer can learn this readily. On the other hand, if we don’t know the time of the peak, then this illustration is misleading because it implies a fact where there is ambiguity.

The segment of this infographic that depicts the cost of apps for the iPhone and iPad is less clear still.

The accompanying text reads:

The other notable difference between the iPad and the iPhone, are the app prices. The average price of the initial iPad apps ran around $4.99 (according to Mobclix) while the iPhone apps averaged a steady $1.99.

I’ve looked at this pie chart for some time and I can’t figure out what it is showing. The iPad’s average app price as a share of the two average prices combined? Even if that were the case, $4.99 out of $6.98 is about 5/7, or 71%, and this chart is split into 60% and 40% segments.

Area

There are a variety of visual variables a designer can use to encode a given set of data. Among these are length, position, angle, area, color, lightness, and others. Some of these are better suited to certain kinds of data than others, and some are more readily perceptible than others (see Cleveland and McGill’s “Graphical Perception” for extensive details and experiments).

Sales map

Look at this area comparison of the percentage of iPads sold. Before we even consider the accuracy, look at these two circles and ask yourself: how much bigger is circle B than circle A? Go ahead, make a guess.

Two circles


The area of the circle on the left is 1075 square pixels (d = 37, r = 18.5, A = 1075 px²) and the circle on the right is 7390 square pixels (d = 97, r = 48.5, A = 7390 px²). That’s nearly 6.9 times bigger.
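To check the arithmetic, here is a small Python sketch using the diameters measured above. Because area grows with the square of the diameter, the ratio is simply (97/37)²: a circle a bit more than two and a half times as wide covers close to seven times the area.

```python
import math

def circle_area(diameter: float) -> float:
    """Area in square pixels of a circle whose diameter is given in pixels."""
    radius = diameter / 2
    return math.pi * radius ** 2

area_a = circle_area(37)  # circle A, on the left
area_b = circle_area(97)  # circle B, on the right

print(round(area_a), round(area_b))  # 1075 7390
print(round(area_b / area_a, 2))     # 6.87
```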

People are much better at comparing length and position than comparing area. This is a very common mistake (one I’ve made myself). Before you represent a variable with area, you should consider that you may be handicapping people’s ability to compare the data.

Area is hard for viewers to compare, and it’s hard for designers to get right as well. Consider the legend for this map:

Area Legend

Are these circles the right size? Let’s construct a table and find out:

Nominal size   Diameter (px)   Radius (px)   Area (px²)   Area relative to 1% circle
1%             22              11            380          1×
5%             70              35            3848         10×
10%            102             51            8171         21.5×
20%            160             80            20106        53×
The rightmost column shows the area of the circle compared with the area of the 1% circle. It turns out that the area of the 20% circle is 53 times bigger than the area of the 1% circle—more than 2.5 times bigger than it should be. Comparing areas is hard; it’s harder with an inaccurate legend. The difficulty of accurately representing area is another reason to avoid using it.
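The same check can be run over the whole legend. This Python sketch takes the measured diameters from the table, reports each circle’s area as a multiple of the 1% circle, and computes the diameter each circle would need if area were proportional to the value. Since area scales with the square of the diameter, the correct diameter scales with the square root of the value. The function names are illustrative.

```python
import math

def circle_area(diameter: float) -> float:
    """Area in square pixels of a circle with the given diameter in pixels."""
    return math.pi * (diameter / 2) ** 2

def check_legend(legend: dict[int, float]) -> list[tuple[int, float, float]]:
    """For each legend entry, return (percent, actual area ratio, correct diameter).

    Ratios and corrected diameters are relative to the smallest entry. If area
    is proportional to the value, diameter must scale with sqrt(value).
    """
    base_pct = min(legend)
    base_d = legend[base_pct]
    rows = []
    for pct in sorted(legend):
        ratio = circle_area(legend[pct]) / circle_area(base_d)
        correct_d = base_d * math.sqrt(pct / base_pct)
        rows.append((pct, ratio, correct_d))
    return rows

# Diameters (px) measured from the map legend.
legend = {1: 22, 5: 70, 10: 102, 20: 160}
for pct, ratio, correct_d in check_legend(legend):
    print(f"{pct:>2}%: {ratio:4.1f}x the 1% area "
          f"(should be {pct}x); correct diameter ≈ {correct_d:.0f}px")
```

Anchored on the 22-pixel 1% circle, the 20% circle would need a diameter of about 98 pixels rather than 160.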

Maps

Maps are such a common form of visualization that we use them even when they are not helpful. On top of that, they’re hard to do well. Their static nature makes it hard to show important dimensions of data.

The size of maps is fixed, which can be at odds with what you are trying to communicate. In the map shown above, much of the interesting data is crammed into the northeast because that’s where those states are located. Meanwhile, a bunch of sparsely populated states in the northwest of the map use up space without communicating anything.

Data on maps is constrained by realities the map itself cannot express. There’s no convenient way to show population data. Does California have a giant circle because Californians buy more iPads than their counterparts across the country? Or is it because California has more people than any other state?

Here the map isn’t clearly better than just the numbers. A table listing the sales numbers for each state, possibly along with per capita sales, would make the point more clearly and more succinctly. In the end, that’s the goal, right?

Greasemonkey & jQuery

23 March 2010
4:13 PM


I taught a class this past fall where we used Greasemonkey as a tool for building prototypes of new browser functionality. We also used jQuery extensively to make JavaScript in the browser simpler. There are some tricks to getting the two to work together that I’ve been meaning to share.

Loading jQuery in a Greasemonkey script

Simply loading jQuery within Greasemonkey hasn’t always been straightforward. I think they should just bundle jQuery with the Greasemonkey package since just about everyone is using it already, but that’s a separate issue.

The old way to include jQuery in your Greasemonkey scripts was to add a <script> tag to every page you visit linking to the jQuery library. Although this works, it’s a clunky solution. It introduces timing issues: since you can’t be sure when the script will load, you have to poll continually, checking to see if jQuery has downloaded yet. Once you detect that jQuery is available, you can run your script. This approach also might clobber other JavaScript running on the main page (like the Prototype library) before you have a chance to call jQuery.noConflict.

Fortunately, later versions of Greasemonkey made this hack unnecessary by adding the @require attribute to scripts. When you install a Greasemonkey script that has an @require statement, the listed resource is downloaded once and included as if it were pasted directly into your script. Unfortunately, there’s a small stumbling block for people testing their scripts: @require is only processed when a script is installed. This means that if you create a new Greasemonkey script in Firefox, add the @require line, and save your file, nothing will happen. You have to uninstall your script and reinstall it. Once you do this, you can use jQuery all you want in your script. Except for…

Sather Tower × 4

10 March 2010
3:34 PM

Four photos of Sather Tower
Clockwise from top-left: A foggy afternoon on Feb. 18; from the main staircase of South Hall on Feb. 24; nighttime on Mar. 1; and the west facade before sunset on Feb. 25.

Sather Tower, also called the Campanile, is at the center of UC Berkeley’s campus. It’s right across from South Hall, where the School of Information is housed, and where I spent most of my time. These are four pictures I took of Sather Tower in a two-week period.

Conversation with an Anthropomorphized Twiki

25 February 2010
7:08 PM


This semester I’m taking a class on user interface design and development. One common technique we’ve covered is think-aloud studies, where you get a user to explain what they’re doing aloud as they do it. This gives you some insight into what the user is thinking about your interface, and why they choose the options they do.

For this class, we post our assignments to a class wiki run on Twiki. Twiki is an adequate piece of software, but it has some rough edges. I got frustrated trying to attach images to my assignment, so I wrote a think-aloud dialogue of my interaction with the software.

Insert/edit image to Twiki page
This is the popup dialog to insert an image into the current page. You have to enter the image’s URL or select an image that you uploaded before getting to this page.

Ryan: Great, I’m just going to upload some pictures to my page. I’ll just click the insert picture button.
Twiki: Sure thing. What’s the URL for the picture that you want to insert?
Ryan: Uhh…I don’t have one. I mean it doesn’t have a URL. Err…not yet?
Twiki: No problem. You can select a picture that you’ve already uploaded.
Ryan: I haven’t uploaded any pictures.
Twiki: (silence)
Ryan: So where do I go to upload a picture?
Twiki: (silence)
Ryan: Can I do it anywhere on this page?
Twiki: You can attach pictures to a page…
Ryan: Great—that’s what I want to do.
Twiki: …but not when you are editing the page.
Ryan: OK, so I’ll save this page and come back later to finish editing after I upload my pictures.
Twiki: Just click the attach button.
Ryan: OK, so now I can upload my pictures on this page.
Twiki: Picture.
Ryan: Picture?
Twiki: Why would you want to upload more than one at a time?
Ryan: Oh, I don’t know, I just have a few I want to put on this page.
Twiki: One. Picture.
Ryan: Fine, I’ll upload a picture and click Attach and then repeat the process for each picture that I have.
Twiki: Good.
Ryan: This one picture that I uploaded, though. It’s 2000 pixels, which is kind of large to display in a browser window. Could you do something about that?
Twiki: That seems like something you should have thought about before you uploaded it.
Ryan: I mean, can’t you resize it or something to make it the right size for my wiki page?
Twiki: No.
Ryan: OK, hold on, I’ll be right back. I have to go resize these images by hand.
Ryan: Back! Now I’ll upload these three…oh…right…Attach, upload, save, attach, upload, save, attach, upload, save.
Twiki: Good work.
Ryan: Now let’s put these images into my wiki page.
Twiki: Image.
Ryan: Image?
Twiki: Well, you have to do it one at a time.
Ryan: (sigh) You’re the wiki for a class on usability?
Twiki: Apparently so.
Ryan: Is this some kind of joke?
Twiki: No.
Ryan: This is ridiculous. I couldn’t design a less usable wiki if I sat down and tried to.
Twiki: Now you’re just exaggerating.
Ryan: And don’t even get me started on your “WYSIWYG” editor. It’s like you vomit on my text whenever I’m not looking.
Twiki: Please be reasonable.
Ryan: I AM BEING REASONABLE. And what kind of name is “Twiki”? Even the pronunciation is unusable! I’ve been speaking English for decades and I can’t decide how to use your name. Tee-wiki. Twick-ee.
Twiki: It’s my name.
Ryan: It’s absurd.
Twiki: You just wish I was Wikipedia.
Ryan: No, I don’t want—
Twiki: You wish I had millions of contributors and video and embedded SVGs and links everywhere.
Ryan: Really I just want you to be easier to—
Twiki: (sobbing) WHY CAN’T YOU JUST LOVE ME FOR WHAT I AM?
Ryan: I’m sorry. I didn’t mean to—
Twiki: You can be so cruel.
Ryan: I’m sorry. I’m sorry. I’m sorry. (softly, not realizing this is a bad time) It’s just that, on a lot of other wikis—not that I want you to be exactly like them—but it’s just that I could probably have uploaded my pictures in the time it took me to write this. But with you…
Twiki: (muffled sobs)
Ryan: …
Twiki: …
Ryan: You know what, forget I said anything. I’ll just upload the photos.

Is Online First-come, First-served Fair?

18 February 2010
10:28 AM


President Clinton is speaking at UC Berkeley next week. Tickets were available to students who filled out an online form starting at 7:00am this morning. Predictably, at 6:59am the website offering tickets looked like this:

Four different server errors

Judging by messages on Twitter, hundreds of other students had the same experience. At 7:39am, tickets were sold out.

Despite being awake and online at the right time, I never even saw the form. I’m not upset or tremendously disappointed, but this experience started me thinking about the process of distribution. This is the fourth or fifth time I’ve participated in an online offer in the last year, and they always leave me with the sense that they’re not fair. In this situation I think fair means that every person who is eligible to receive a ticket and wants one has an equal chance to receive one. 1

The issue is that this system relies on first-come, first-served (FCFS) rationing for a scarce resource. FCFS is pretty well understood in the physical world. We know that people can only be in one place at a time: each person gets a single spot in line. Whoever shows up first, or camps out overnight, will get to see the movie first. Incidentally, these practices are exactly why organizers want to move to an online system. Real-world FCFS means large, unwieldy crowds and groups of students who set up tents on campus.

An online version of first-come, first-served solves these problems: when people vie for a resource online (on the Internet, or calling by phone), they don’t form massive crowds outside your door or camp on your porch. Without the restrictions of the physical world, however, FCFS is different online. I can get 10 places in line just by loading a page in my browser 10 times. If I am handy with programming, I might be able to get hundreds of places in line by writing a script to connect on my behalf. If I have a slow Internet connection, or I’m positioned at a certain point on the network, I might never be able to get in line because the server is too busy. In short, online FCFS has a chance at being fair only if your servers can handle the load and you can prevent people from taking multiple places in line.

A natural next question might be how you can prepare your servers for the demand created by an event like this. People who know a lot more than I do about IT might suggest things like caching, replicated servers, faster machines, and more memory. But this misses the real problem: online FCFS is not a good approximation of fair.

A better alternative is an online lottery. Instead of awarding tickets based on whoever arrives first, you provide a window during which anybody can obtain a lottery number. Then you randomly select people using a computer and give them tickets. This system solves the same problems that online FCFS does: there are no large crowds and no camping out. It also solves some of the problems with FCFS. Since you can get a lottery number at any time during the window, network traffic will be spread out across this time. And since awards are based on random chance,2 a lottery is more fair because every ticket holder has an equal chance.

Of course, there are some similar problems. If the window to get a lottery number isn’t long enough, not everyone who wants a ticket will have a chance to get one, which isn’t fair. Also, the problem of one person taking multiple places in line is now replaced with the problem of one person holding multiple lottery numbers. People will find ways to get multiple lottery numbers, increasing their odds of being selected. But I don’t think that ballot stuffing is a much greater problem than having multiple spots in line. With a lottery system you can deal with ballot stuffing on your own time instead of in real time.

In the controlled circumstances of the Clinton ticket giveaway, stuffing is nearly impossible. Each student has a unique student ID number which can be checked to ensure that each person gets only one lottery number. And since each student can only obtain a single ticket for this event for personal use (confirmed with photo ID when claiming the ticket), people cannot use their friends to get extra lottery numbers.
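Here is a minimal Python sketch of such a lottery, assuming the signup form records a student ID with each submission (the function name and the IDs are hypothetical, not Berkeley’s actual system). Duplicate submissions collapse to one entry per ID, and the draw runs after the window closes, so nothing about the outcome depends on who reached the server first.

```python
import random
from typing import Optional

def run_lottery(entries: list[str], num_tickets: int,
                seed: Optional[int] = None) -> list[str]:
    """Draw ticket winners uniformly at random, one chance per student ID.

    `entries` holds every student ID submitted during the signup window,
    duplicates included (for example, someone who submitted the form twice).
    """
    # Collapse duplicates so each student holds exactly one lottery number.
    # Sorting makes a seeded draw reproducible regardless of submission order.
    unique_ids = sorted(set(entries))
    if num_tickets >= len(unique_ids):
        return unique_ids  # enough tickets for everyone who entered
    rng = random.Random(seed)  # pseudorandom, which is random enough here
    return rng.sample(unique_ids, num_tickets)

# Hypothetical student IDs; "s002" submitted twice but gets one entry.
entries = ["s001", "s002", "s002", "s003", "s004"]
print(run_lottery(entries, 2, seed=7))
```

Because the draw happens offline, any cleanup of duplicate or suspicious entries can happen before it runs, rather than while the server is under load.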

The only downside to a lottery is that it requires more work. With online FCFS you can just tell people to come, watch your server buckle under the load, and eventually give away all the tickets. Online FCFS is also fine if you’re not interested in fair. I don’t mean this in a pejorative sense. If you’re selling concert tickets, you get paid no matter who buys them; there’s no obvious incentive for fair distribution.3

Given the detailed instructions for distributing tickets, it seems like Berkeley does want to be fair about the process. Next time, I suggest they consider a lottery.


1 Of course there are other definitions for fair. If you understand fair in a different sense (like “following a defined process for distribution”) then this system may well be fair. I am using fair as equitable because I think that is a popular understanding of fair.

2 Computers only generate pseudorandom numbers, but these are random enough for our needs.

3 There might be external repercussions for distribution using a process that isn’t perceived as fair. The New Yorker ran an article last summer, “The Price of the Ticket,” addressing this issue as it applies to Ticketmaster and fishy ticket-selling practices.

A Nook of My Own

25 January 2010
12:47 AM


This past Christmas my future in-laws gave me a lovely gift: the nook, Barnes & Noble’s new electronic reader. I’ve decided to use the nook for a semester-long experiment, using it for as much of my reading as possible.

Even with the imminent arrival of the Apple Canvas, I’m excited about testing the nook for a few reasons. First, I’m hoping that reading on a reflective screen like the nook’s will be easier on my eyes. Every semester I have at least a few hundred pages of reading, nearly all of which is only available online. The nook gives me an alternative to printing everything out or reading on my computer screen, which is hard on my eyes.

Second, I think that having a device just for reading will make it easier to stay focused on the task at hand (command-tab is my worst enemy). Some people think that single-purpose devices are too limited to become popular. I think they’re right that multipurpose devices will probably be more popular, but a smaller market doesn’t mean the nook (or Kindle) can’t be a commercial success.

While I’m testing the nook over the next few months I’ll write up my thoughts. Here’s my first batch.

Out of the Box

nook out of the box
The nook ships with a message on its screen which lasts indefinitely because of the E Ink technology.

The nook arrives in an attractive clear plastic box that feels quite hefty. When I pulled off the cardboard cover from the box there was a message welcoming me to the nook. This was a nifty touch that actually tricked me: I thought the message was printed on a clear film that I would have to pull off the screen, but it was displayed on the screen itself. One of the advantages of the E Ink technology is that it remains on the screen indefinitely without consuming any power, so they display this message at the factory and it stays there until you see it. I had to fumble with the box for a while to get it open and remove the nook. The packaging is somewhat excessive, as indicated by the two pages of instructions on how to unpack your nook. If I need seven steps to get your product out of the box, you packed it wrong.

Hello World! from Arduino

3 September 2009
11:54 PM


I’m taking an exciting class this semester called Theory and Practice of Tangible User Interfaces. Today was our first lab class where we got our box of inputs and outputs, a breadboard, and an Arduino microcontroller. With these tools I have a way to sense and control things in the physical world with a normal programming environment, which is a step towards the “tangible” interfaces from the class’s name.

TUI, as students call the class, culminates in final projects that in past years have been a real showcase of creativity: the bubblegum sequencer, Jug Hero, blowing virtual bubbles, and others. I haven’t had my great, innovative idea yet, but since we just got our kits today, I have a bit of time.

Now the main task is to get working with the new hardware and development environment. The language you use to program the Arduino board, also called Arduino, is based on C, which is a lower-level language than I have been using recently. (I spent the summer writing JavaScript and Ruby, both of which are very high-level languages.) Anyway, typically the first task in any programming language is getting the computer to display the message “Hello world”.

My Physical Computing textbook explains:

Anybody who has learned how to use a couple of different computer systems or programming languages will tell you that the hardest part is getting a computer to do anything at all. … In software, it’s traditional to prove your mastery of any environment by getting your program to say “Hello World!” The “Hello World!” message of the microcontroller is a blinking LED. Once you get the microcontroller to blink an LED, it’s all downhill from there.

But a simple blinking LED didn’t seem like an appropriate start, especially since the newer Arduino boards are programmed at the factory to blink any connected LED. Once I finished fumbling with my breadboard and resistors and finally plugged in all the parts, my light started blinking automatically without me doing any programming at all. I decided that a better start would be to have my light blink “Hello world” in Morse code.

Like almost everyone, I know only enough Morse code to do SOS, and it turns out I was doing even that wrong. Whenever I tap out SOS, I put longer delays between dashes than dots. That’s wrong—the delay between symbols that form a letter is supposed to be constant. The rules for timing are systematic:

International Morse code is composed of five elements:

  • short mark, dot or ‘dit’ (·) — one unit long
  • longer mark, dash or ‘dah’ (-) — three units long
  • intra-character gap (between the dots and dashes within a character) — one unit long
  • short gap (between letters) — three units long
  • medium gap (between words) — seven units long
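The timing rules above map directly onto a list of on/off durations. This Python sketch (an illustration, not the actual Arduino program described here) builds that schedule for a message; on an Arduino, each pair would become a digitalWrite() call followed by delay() for that many units. Only the letters needed for “Hello world” and SOS are included.

```python
# Morse timing, in units (one unit is the length of a dot).
DOT, DASH = 1, 3
INTRA_CHAR_GAP, LETTER_GAP, WORD_GAP = 1, 3, 7

# International Morse code, limited to the letters in "Hello world" and "SOS".
MORSE = {
    "D": "-..", "E": ".", "H": "....", "L": ".-..",
    "O": "---", "R": ".-.", "S": "...", "W": ".--",
}

def morse_schedule(message: str) -> list[tuple[bool, int]]:
    """Return (led_on, units) pairs that blink `message` in Morse code."""
    schedule: list[tuple[bool, int]] = []
    for w, word in enumerate(message.upper().split()):
        if w > 0:
            schedule.append((False, WORD_GAP))        # between words
        for c, letter in enumerate(word):
            if c > 0:
                schedule.append((False, LETTER_GAP))  # between letters
            for s, symbol in enumerate(MORSE[letter]):
                if s > 0:
                    schedule.append((False, INTRA_CHAR_GAP))  # within a letter
                schedule.append((True, DOT if symbol == "." else DASH))
    return schedule

def render(schedule: list[tuple[bool, int]]) -> str:
    """Draw a schedule as text: '=' for each unit the LED is on, ' ' when off."""
    return "".join(("=" if on else " ") * units for on, units in schedule)

print(render(morse_schedule("SOS")))  # = = =   === === ===   = = =
```

Rendering SOS this way makes the rule visible: the gap inside a letter is a single unit whether it sits between dots or between dashes.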

I copied this timing information and the codes for all the letters from Wikipedia and implemented them as my first Arduino program. It took a bit longer than I expected because I’m not that familiar with C. Debugging was also tricky because I realized I had no idea what the Morse code for “Hello world” was supposed to look like! While I was debugging, I had to use SOS to see if everything was working properly.

Here’s what the end result looks like:

If you want to see the code behind my Morse, here’s the source. It’s messier than I would like because I couldn’t get sizeof() to operate as I expected.

What is bad design?

21 May 2009
11:10 AM


Bad design is ugly, it is frustrating, it doesn’t create joy, it is not well considered, but above all bad design wastes time. This isn’t to say that design is just about saving time: Good design can make you slow down, it can make you think, it can be about things besides efficiency. Good design makes you think as much as you should, but no more.

USPS Form 3849
This form is about the size of an index card.

I’m writing after a particular encounter with time-wasting bad design via the U.S. Postal Service’s form 3849. When a package cannot be delivered, the mail carrier leaves an orange slip that lets you request redelivery or pick up the package at the post office.

My girlfriend wasn’t home when the postman came earlier this week, so I went to pick up a package for her. She signed item 2 on the form, which is labeled “Sign here to authorize redelivery or to authorize an agent to sign for you.” I went to the post office, waited an hour, presented the signed slip, and was turned away: my name didn’t appear in the space for the agent’s name. “What?” I asked, “what space for the agent’s name?” “Right here,” the clerk told me, “it clearly says to write the agent’s name right here.”

Location of agent's name

She was right. The words on the form clearly indicate that the agent’s name should be entered in the upper-right corner. The form’s visual language, however, contradicts these instructions. Once you follow the numbered list and sign in section 2, it doesn’t look like there is anything else you have to do. The space for the agent’s name is so tiny that a glance at the form would never suggest that you have to write anything there. This is bad design. Not only does it waste time because a person has to think too much about filling it out (look at the instructions in section 1, for example, which tell you to fill out section 3, then come back to section 2), but it also wastes time when people wait in line with an improper form.

In my case, it wasted an hour. That’s bad design.

Live Blogging the I School Masters Presentations

14 May 2009
4:05 PM


3:58. As the culmination of two years of study at the School of Information, master’s students present final projects that represent a significant body of work building on what they learned in the classroom. Next year I’ll be giving one of these presentations, but this year I’m just along for the ride.

The first step is a lightning round where each group of students gets two minutes to present an overview of its project. The projects are separated into three tracks.

Love to chat, but it’s time to get started…

Missing Tweets

24 April 2009
5:41 PM


While looking at Twitter last week I noticed that updates were missing from my user timeline. According to Twitter, I didn’t post anything between March 20 and April 9. Through a little investigative coding (and some help from Twistory), I found that my updates were still part of the Twitter system, but not associated with my user timeline. For instance, I posted this message on April 1, but if you look at that part of my user timeline, it doesn’t appear.

Twitter has closed the support ticket on this issue, but my problem isn’t solved. I opened my own support ticket, but the only advice I’ve received so far has been about changing my bio.

Since Twitter is such a large service, fixing this problem probably isn’t a high priority if it only affects a couple of people. But it’s hard to know if it affects you unless you feel like going through all your messages and counting. Since computers are pretty decent when it comes to counting, I built Missing Tweets to help people check if they are missing any updates. The script works on the assumption that Twitter still maintains the correct total count for a user’s updates. This is true in my case: Twitter says I have made 993 updates, but only 955 show up in my user timeline. That means that 38 are missing.

Most people I have tested this script on aren’t missing any tweets, but a few are. Test it on yourself, and let Twitter know if you are missing tweets.

This doesn’t work if you have more than 3200 tweets. You can still use Missing Tweets even if your account is protected. Your browser will prompt you for your username and password. This communication is happening directly between your computer and Twitter; my site isn’t involved in any way.
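At its core, the comparison described above is a subtraction. This Python sketch assumes you have already fetched the account’s reported update count and the IDs of every update on the user timeline (fetching them is left out, and the function name is hypothetical, not the Missing Tweets source). It also encodes the 3200-tweet limit noted above, past which a full comparison isn’t possible.

```python
from typing import Iterable, Optional

TIMELINE_LIMIT = 3200  # the check doesn't work past this many tweets, per the note above

def count_missing(reported_total: int, timeline_ids: Iterable[int]) -> Optional[int]:
    """Return how many updates are absent from a user timeline.

    Trusts `reported_total` (the account's update count) and compares it with
    the IDs actually returned by the timeline. Returns None for accounts past
    the timeline limit, where a full comparison isn't possible.
    """
    if reported_total > TIMELINE_LIMIT:
        return None
    shown = len(set(timeline_ids))  # de-duplicate in case fetched pages overlap
    return reported_total - shown

# The numbers from this post: 993 updates reported, 955 on the timeline.
print(count_missing(993, range(955)))  # 38
```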

Update (2009-05-02): All of my tweets have been reconnected with my user timeline. I’m not missing any more tweets, but you can still check if you are.