When small is more

We’ve done a few critique/redesigns of graphics on the site, but now it’s time to shine that sometimes unflattering light back on ourselves. While going through some materials, I came across a graphic much like this one.

The chart is clean, with axes lightened so the data is in the foreground, and the series directly labelled. Unfortunately, it is not very effective at conveying much beyond “There is lots of pillaging.” If we look closely we may also see a slight upward trend in Thieving and in revenues overall, but insights beyond that are obscured by our chart choice.

The problem with stacked charts is that only the first series, in this case Thieving, and the total of all series are clearly displayed. Everything else is distorted by the shifting baseline of the series beneath it. If you want to note patterns in the individual series, stacked charts are inadequate.

Don’t be afraid of small things
The solution in this case is simply to make a small chart for each series. Often called “small multiples,” these charts reveal the patterns within each series and let us compare between them. To give a sense of the proportions of the different series, we can add another chart specifically for that purpose. What pops out now is a rather interesting change in Plundering that was previously hidden.
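A minimal sketch of the small-multiples idea in matplotlib, using the three series names from the chart above (the figures themselves are invented, since the real data isn’t in the post):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Invented revenue figures standing in for the real data
years = list(range(2005, 2013))
series = {
    "Thieving":   [40, 42, 45, 44, 48, 50, 53, 55],
    "Pillaging":  [90, 92, 91, 93, 92, 94, 93, 95],
    "Plundering": [20, 21, 22, 30, 38, 37, 39, 41],
}

# One small chart per series, sharing the y-axis so panels are comparable
fig, axes = plt.subplots(1, len(series), sharey=True, figsize=(9, 2.5))
for ax, (name, values) in zip(axes, series.items()):
    ax.plot(years, values, color="steelblue")
    ax.set_title(name, fontsize=10)
    for spine in ("top", "right"):
        ax.spines[spine].set_visible(False)  # lighten the frame

fig.savefig("small_multiples.png", dpi=150)
```

Sharing the y-axis is what makes the comparison honest: every series sits on the same fixed baseline instead of the shifting baseline of a stacked chart.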

Don’t be afraid to make your charts smaller to communicate a bigger, more complete message.

Posted in Chart Redesign, Visualization | 3 Comments

Salvaging the Pie

The poor, maligned 3D pie chart. He is so popular among the common folk, but put him next to his peers and his vacant stare betrays (not entirely unfounded) feelings of insecurity and inadequacy. Sometimes the only way to address such feelings is to let go of your inhibitions and do something unexpected. He has value hidden away, we’re sure of it. And so, for the third installment in our Data Looks Better Naked series, we are recommending that the 3D pie do what the bar chart and table have done before him: start stripping to see what he might be concealing.

Devour the Pie

There are a ton of articles out there explaining the disadvantages of pie charts, which is why they rarely turn up in our work. My good friend Dan is fond of saying, “You have three pie charts to use in your lifetime, so choose them wisely.” However, this article by Bruce Gabrielle makes some decent counterpoints worth considering (except the second point; pie charts are terrible at trends). So if you’ve got a pie chart, consider whether something else would be a better solution. Or you could just embrace it.

The slides for those interested:


Posted in Chart Redesign, Fun, Visualization | 7 Comments

Breathing City

Inspired by John Nelson’s breathing earth and Conveyal’s aggregate-disser post, I wondered if I could make a breathing city. Manhattan looks somewhat lung-like, so it seemed natural. Should be a fun, quick project. How naive I was.

Search and Recover
Conveyal had already gathered the data I would need to do a dot density plot, so it should be easy to find using their post as a starting point. But wait: they didn’t share links to the source files, and they didn’t respond to my email. Google should solve that… hours of surfing later, I find what I’m looking for in four different places: population, employment, land use, building footprints.

Excellent, now just run it all through Conveyal’s conveniently open source tool. Except it’s written in Java, so let’s install the Java SDK. Oh, and it has several library dependencies. Finding… installing… finding… (several hours later)… installing… not working. Clearly I know far too little about Java to get this going.

Python Wrestling
We use Python at Darkhorse, and learning some geographic libraries could be useful. Let’s use the code for the racial dot map project as a starting point for creating a Python version of Conveyal’s disser tool. I just need several new libraries, which translates to more hours finding, installing, re-installing, searching, uninstalling, removing, copying, and installing again, because computers are petty and vindictive. Then finally shapely and osgeo are working… yay!

Baby Steps
Now we just take several hours to learn how not to use these libraries. Eventually we stumble across one or two things that work, then a couple more, and then we crawl toward a messy program that does what someone else has already done, but at least I speak this one’s language.
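Conveyal’s disser does much more than this, but the core step of a dot density plot is simple: scatter one dot per person uniformly inside each block’s polygon. Here is a toy version of just that step, with a hand-rolled ray-casting test standing in for shapely’s `contains` (this is illustrative code, not Conveyal’s):

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def scatter_dots(poly, count, seed=0):
    """Place `count` random points inside `poly` by rejection sampling its bounding box."""
    rng = random.Random(seed)
    xs, ys = zip(*poly)
    dots = []
    while len(dots) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            dots.append((x, y))
    return dots

# One dot per resident of a fictional triangular block
block = [(0, 0), (4, 0), (2, 3)]
dots = scatter_dots(block, count=100)
```

Rejection sampling is wasteful for long skinny polygons (like Manhattan blocks can be), which is one reason the real libraries are worth the installation pain.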

Combine CSVs with Shapes
What? Excel stopped allowing you to edit and save DBF files. When did that happen? This tool is a bit buggy, but it brings that feature back.

QGIS forces antialiasing
You can’t turn it off. If you want to create single-pixel markers, it just won’t let you color them properly; I tried for far too long (if you know how, please let me know). Good thing Excel is a poor man’s GIS. (Yes, every one of the frames was made in Excel.)

Find More Data
So now I can make a plot that doesn’t breathe. But I want to show change over the typical workday, and I’m gonna need more data for that. Several searches and false starts later, we find work-related activity percentages by time of day. Manhattan probably has a different profile than the US average, but close enough.

Find even more data
But each dot is a person, so I can’t just have them flicking on and off randomly to match the time-of-day percentages. They need to go to work for a while and then come home for a while, so I need to give each dot a schedule. Maybe this is overkill, but I can’t stop now. More searching finds us a rough hours-of-work distribution.

Solver time
Now I’m forced to assign schedules to the ~1.5 million people living in Manhattan and the ~2 million people working in Manhattan. But the sum of those schedules needs to resemble the hours-of-work distribution and the percentage at work for each hour of the day. Time to break out Excel’s Solver engine. With it we can create ~200 schedules with probabilities to match those profiles, then distribute them to each of our people.
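Once Solver has produced the schedule mix, handing a schedule to each dot is just weighted random sampling. A sketch with a tiny, made-up set of schedules and probabilities (the real project used ~200 fitted by Solver); a schedule here is a 24-tuple where 1 means at work that hour:

```python
import random

# Hypothetical output of the Solver step
schedules = [
    tuple(1 if 9 <= h < 17 else 0 for h in range(24)),   # classic 9-to-5
    tuple(1 if 17 <= h < 23 else 0 for h in range(24)),  # evening shift
    tuple(0 for _ in range(24)),                         # not working today
]
probabilities = [0.55, 0.15, 0.30]

def assign_schedules(n_people, schedules, probabilities, seed=0):
    """Give each person one schedule, drawn with the Solver-fitted probabilities."""
    rng = random.Random(seed)
    return rng.choices(schedules, weights=probabilities, k=n_people)

people = assign_schedules(10_000, schedules, probabilities)

# Summing each hour across everyone recovers the at-work percentage
at_work_at_noon = sum(p[12] for p in people) / len(people)
```

That aggregate, hour by hour, is exactly the percent-at-work profile the Solver step was asked to match, which is what makes the map breathe on schedule rather than flicker.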

Data is done
Finally we have data for each of 24 hours for both home and work. We’re making some huge simplifying assumptions (e.g. Manhattan’s work profile is the same as the rest of the US, when people aren’t at work they are at home, there are only 200 possible ways to spend your day, when we build this someone will want to see it) but we have a reasonable data set.

Now we just make some maps and push pixels around on the screen until they look good, then painstakingly create 24 versions to string together for the animation. Eric Fisher has some great tips for making and coloring dot density plots in this post. Then we’ll add a bar chart, no an area chart, no a line chart, wait… I’ve got it, a mesmerizing heart rate monitor looking thingy to go with our breathing theme. Nice.

See, it is super easy and takes almost no time at all to create something like this, as long as your definitions of “super easy” and “no time” are flexible enough to include difficult and time-consuming.


Note: A previous version of this GIF had the orange work line in the ‘heart rate’ chart incorrectly shifted one hour. This has since been corrected.

Posted in Analytics, Visualization | 21 Comments

Clear Off the Table

We received a lot of attention for our Data Looks Better Naked post. People got bored on Christmas Eve, some interesting searches for Star Trek somehow landed them on our page, and now their charts look better.

The principles outlined in that article aren’t just for charts, though. You can apply them to your data tables with similar improvements in readability and aesthetics. To paraphrase Edward Tufte, too often when we create a data table, we imprison our data behind a wall of grid lines. Instead we can let the data itself form the structure that aids readability by making better use of alignment and whitespace.

In the GIF below we start with a table formatted similarly to one of Excel’s many styling options, which, much like the chart styles, do nothing to improve the table. Progressive deletions and some reorganization deliver a clearer and more compelling picture.

As with charts, rather than dressing up our data we should be stripping it down. For more information on table design, you can read Chapter 8 of Stephen Few’s Show Me the Numbers. My apologies to any true fans of ’80s wrestling: the stats below, much like the ring rivalries, are entirely fabricated.

The slide deck for viewing at your own pace:

Posted in Chart Redesign, Visualization | 15 Comments

The Uniform Distribution

The goaltenders from my youth – Bill Ranford, Andy Moog, Grant Fuhr – all had jersey numbers in the low thirties. And most of the goalies I can think of now have numbers in the low thirties. This got me wondering: how are numbers distributed by position across the major sports leagues? What traditions, rules, and preferences do they reveal?

So, after some Python scraping and Excel manipulating, we find ourselves with a paradox: uniform distributions that aren’t uniform distributions. Download them for yourself in tower or poster format.


So what can we see in the numbers? Well, Jackie Robinson’s league-wide retired #42 is quite clear, while the NHL’s only league-wide retired number, #99, is not nearly as apparent, given how few numbers in the 90s there are to begin with and its position at the end of the spectrum.

The NFL’s arcane and rigid numbering system also shows up quite clearly. Something I was completely unaware of until going through this exercise.

Plus there is an interesting parallel between soccer and hockey when it comes to picking numbers for the defence, where both seem to prefer single digits that are not #1. Both leagues also like to give goalies #1, though you can see the NHL’s three most popular goalie numbers are #30, #31, and #35, which I’m pleased to say are Bill Ranford, Grant Fuhr, and Andy Moog’s numbers respectively. I’d like to imagine this is the league’s current players paying homage to the heroes of my youth, though it more likely has to do with tradition.

So click through and let us know what you discover in the graphics.



Posted in Fun, Visualization | 2 Comments

The Five Faces of Analytics

The chasm between Business and IT is well documented and has existed since the first punch-card mainframe dimmed the lights of MIT to solve the ballistic trajectory of WWII munitions.  Since that time, great leaps in data collection, storage, connectivity, and processing power have made IT infrastructure ubiquitous.  You’re not even in the game if you don’t have an IT group.


But the productivity gains have not kept pace with the investments.  The newest hype-driver, analytics, claims that it will finally deliver the goods.  But can it?

Some studies suggest that analytics projects have an 80% failure rate.  That’s abysmal.  In the next few articles, we’ll look at reasons for this failure.  We’ll start by describing the roles, then talk about process, and finally identify the characteristics of world-class analytics.

A helpful starting point is to imagine your analytics dream team.  Who would you hire, and what would their roles be?  I suggest that there are five distinct job descriptions:

Data Steward – this skillset is alive and well in most organizations.  Almost everyone has a data warehouse, talks about the ETL process, and has had discussions around the business rules of cleaning up and storing their data.  The data steward will use tools such as SQL Server, MySQL, Oracle, and if she’s a superstar, she’ll dabble in Python and web scraping and know the difference between Hadoop and MapReduce.

Analytic Explorer – this skillset is a tough one to find.  It requires math, statistics, and modeling along with a healthy dose of creativity and skepticism. These are the people who can spin straw into gold or write tomorrow’s news today.  His job is to explore your data, combine it with sources outside the firewall, and distill it down to insights that will support your most critical decisions.  He’ll use tools such as Excel, R, MATLAB, ArcInfo, SAS, Tableau, and SPSS.  If he’s a superstar, he’ll know all about Bayes, Optimization, and the difference between precision, accuracy, and skill.

Information Artist – This is the role of a creative.  Her goal is to sell the results to the decision-maker.  And the lack of emphasis on this skillset is one of the reasons analytics is such a failure (and why Apple is such a success).  Edward Tufte – the godfather of data visualization – speculates that the lack of good data design caused the Challenger space shuttle tragedy.  Think of this person as being as crucial as your sales force.  In fact, that is their job – to sell the right answer.  Excel and PowerPoint can suffice, although the more skilled will use a variety of tools from Google Earth to Adobe Illustrator to D3.  If she’s a superstar, she’ll be as comfortable talking about the math behind the visuals as she is talking about the psychology behind her design.

Automator – If the Explorer finds the path through the dark forest to the fountain of youth, and the visualizer designs a beautiful bottle for the elixir, then the Automator turns that path into an eight-lane highway and builds a factory to bottle that stuff as soon as it comes out of the ground.  His job is to operationalize the work of the Explorer and Visualizer.  He makes sure that results are timely and fast.  He adds scale. He might use traditional coding methods like C# or .NET, or he might fiddle with Ruby or Objective-C. Or he might even be the guru of Business Objects, Microstrategy, or D3.

The Champion – The champion stands with one foot in the land of “gut feel”, and the other planted firmly in the side of “evidence”.  She can speak the language of the geeks, and translate it to that of the battle-hardened general.  She believes strongly in data-driven decision making, but also recognizes the value of deep domain experience.  She’s tireless in her efforts to sculpt the processes of the organization to support analytics. She aims to harvest the brightest insights from the sharp young analysts and the cleverest hacks from the wily old veterans.  Her focus is adoption, and if she’s a superstar, she’ll make you believe that this analytics thing was your idea in the first place.

So that’s your dream team: a steward, an explorer, an artist, an automator and a champion.

But there’s a problem.  This team rarely exists in the wild.  Most companies hire the Data Steward, and then try to do the rest through a major software implementation.  Unfortunately, the software is not meant to explore and discover.  And it was designed by engineers who don’t understand the psychology of data visuals.  It’s like expecting your bookkeeper to be your CFO.  Sure they can both do accounting, but you won’t be happy with the results.

In other instances, organizations will try to shoehorn engineers into the roles “in their spare time”.  Again, with neither the training nor the time to explore the data or design the results, they’re doomed to fail.  These skillsets are distinct, and they shouldn’t be ignored.

So what’s the canny company to do?  If you’re extremely lucky, you’ll find the unicorn of the 21st century known as the Data Scientist, pay her a quarter million, and watch the magic happen. (A data scientist can do all five roles.) Or you can try to develop these skills in-house.  Or you can hire contractors – perhaps engaging a consulting firm to take on the Explorer or Visualizer roles for a time.  Or you can outsource the whole thing.

What’s important is that you recognize that each of these roles is necessary.  Neither software nor “Dave in Engineering” can replace them. Happy hunting.


Posted in Analytics | Comments Off

Data looks better naked

Edward Tufte introduced the concept of data-ink in his 1983 classic The Visual Display of Quantitative Information. In it he states “Data-ink is the non-erasable core of the graphic, the non-redundant ink arranged in response to variation in the numbers represented” (emphasis mine). Tufte asserts that in displaying data we should remove all non-data-ink and redundant data-ink, within reason, to increase the data-ink-ratio and create a sound graphical design.

Stephen Few convincingly argues that some redundancy is often more effective, and we agree; however, most graphics don’t struggle with understatement. In fact, most contain a stunning amount of excess ink (or pixels). Rather than dressing our data up, we should be stripping it down.

To illustrate how less ink is more effective, attractive, and impactful, we put together this animated GIF. In it we start with a chart similar to what we’ve seen in many presentations and vastly improve it with progressive deletions and no additions.
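Those deletions translate directly into code. A sketch in matplotlib, with invented data, of the kinds of elements such a cleanup typically removes:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Invented data standing in for the before/after chart
labels = ["Q1", "Q2", "Q3", "Q4"]
values = [23, 31, 27, 35]

fig, ax = plt.subplots()
bars = ax.bar(labels, values, color="#4a4a4a")

# Progressive deletions: borders, gridlines, tick marks, y-axis
for spine in ax.spines.values():
    spine.set_visible(False)
ax.grid(False)
ax.tick_params(left=False, bottom=False)
ax.set_yticks([])

# With the axis gone, label the bars directly instead
for bar, v in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width() / 2, v + 0.5, str(v), ha="center")

fig.savefig("stripped.png", dpi=150)
```

Note that every step except the last is a removal; the direct labels are the one piece of data-ink added back once the axis they replace is gone.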

And here is the slide deck if you want to go at your own pace.

The next time you are trying to improve a chart, consider what you can take away rather than what you can add.

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
– Antoine de Saint-Exupéry

Posted in Chart Redesign, Visualization | 7 Comments

A Simple Tool

A while back, one of our employees was buying a house and journeyed out into the vast interwebs to do some mortgage calculations. His trek was long, winding, and far less fruitful than hoped. In the end, he had to create a custom spreadsheet because the tools available were too unwieldy or inflexible to answer his simple questions. He suggested we might like to take this on as an internal visualization project sometime in the future. Well, the future has arrived, and it is simple.

We are not in the financing and loans business, but this application provides us the opportunity to demonstrate some simple design principles. Please play with A Better Mortgage and Loan Calculator and read on.

Simple Questions

Most calculators make answering your question needlessly complex. They ask for information that helps them, not you, or use dense industry jargon that you must decipher. You shouldn’t need to know the difference between interest term and amortization period. Eliminate those roadblocks and you eliminate frustration.

We started by investigating what questions the typical borrower has. In the end we boiled it down to three:
How much can I afford to borrow?
What will my payments be?
How long will it take me to pay it down?

Give us an interest rate and answer two of those questions and we can answer the third for you.
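That symmetry can be sketched in a few lines with the standard annuity formulas (the function names are ours, and this uses simple monthly compounding; a real Canadian mortgage calculator would also handle semi-annual compounding):

```python
from math import log

def monthly_payment(principal, annual_rate, years):
    """Payment that amortizes `principal` over `years` at the given annual rate."""
    i = annual_rate / 12  # monthly rate
    n = years * 12        # number of payments
    return principal * i / (1 - (1 + i) ** -n)

def affordable_principal(payment, annual_rate, years):
    """How much you can borrow given the payment you can afford."""
    i = annual_rate / 12
    n = years * 12
    return payment * (1 - (1 + i) ** -n) / i

def years_to_repay(principal, payment, annual_rate):
    """How long it takes to pay the loan down to zero."""
    i = annual_rate / 12
    return -log(1 - principal * i / payment) / log(1 + i) / 12

# Any two answers pin down the third
pmt = monthly_payment(300_000, 0.05, 25)
assert abs(affordable_principal(pmt, 0.05, 25) - 300_000) < 1
assert abs(years_to_repay(300_000, pmt, 0.05) - 25) < 0.01
```

Each function is just the same annuity equation solved for a different variable, which is why a calculator built this way can let you pick whichever question you care about.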

Simple Hierarchy

What is most important must appear most important. Most calculators make little distinction between the essential, the secondary, and the unnecessary. Everything is similar in size and color, positioned with little care for how you will attend to the information. Below, for example, every number is bold and small, so it takes some time to find the one you’re looking for. Not a lot of time, but why put even the slightest strain on you when we have the near-infinite flexibility of HTML at our bidding?

We use size, color, and position to help guide you through the process. A big title at the top tells you what question we’re answering and lets you choose alternate questions. Inputs feature clearly next, but even more prominent is the big, bright answer to your question. Secondary charts and figures follow to provide additional context.

Simple Visuals

Visuals don’t have to be complex or flashy. In this case, simple visuals make a more effective display. Below is a particularly bad chart. With all the effort spent on its three dimensions, color gradients, and drop shadows, they forgot to ask whether it was easy to read and understand. It shouldn’t be hard to tell whether the first green bar is at $200,000 or $250,000, but it is here.

The visuals we developed didn’t require a new charting paradigm; they are simple charts where the clutter has been eliminated and labels and interactivity added, giving an elegant and coherent view of the data.

Simple Interaction

The formula for calculating some of this can look complex, certainly more complex than your typical home buyer wants to think about.

But by a computer’s standards this is a very simple thing to calculate. Even your phone can perform this calculation thousands of times in a second. So instead of making you press the dreaded submit button for an answer, we just show you the answer anytime you change anything, making feedback immediate and the experience smooth.

This instant reaction allows you to quickly see how changes in your assumptions will impact payment amounts, interest paid, or time to pay back. We haven’t just given one answer; we’ve provided the context of all the other answers of interest. For more on this type of interaction read Bret Victor’s thoughts on Explorable Explanations.

Simple is worth it


“Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.”
― Steve Jobs

Moving houses isn’t moving mountains, but working hard to make even a calculator simple leads to satisfied, engaged users and creators.



Posted in Visualization | Comments Off

Bubbles, Bricks and Tukey’s Tics

In Stephen Few’s latest newsletter he proposed a new method for displaying geospatial data. Not long after, based on feedback from Andy Cotgreave and others, he expressed his disappointment with the breadth of applications for which the technique seems useful. What Few has done, though, is generate new interest and discussion around expanding and improving our methods for plotting data on maps.

Stephen worked with Panopticon to produce this comparison tool, but it is limited to three shapes. I wanted to compare more, so I put something together. Below are several of the outputs for comparison.

First we’ll start with the bubbles that bricks were intended to replace. Seeing overall geographic patterns doesn’t require much effort, but distinguishing between any two points of similar size is difficult, especially among the bigger circles. However, when things begin to overlap, geographic patterns remain visible and the locations of the points can still be determined.
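One detail every symbol map in this comparison shares: magnitudes must map to the symbol’s area, not its radius, or large values get wildly exaggerated. A quick sketch of that scaling rule (the values are invented):

```python
from math import sqrt, pi

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius for an area-proportional bubble: area scales with value,
    so radius scales with the square root of value."""
    return max_radius * sqrt(value / max_value)

# Invented city populations
values = [100, 400, 1600]
radii = [bubble_radius(v, max(values)) for v in values]
areas = [pi * r * r for r in radii]
# Quadrupling the value only doubles the radius,
# but the circle's area still quadruples, as it should.
```

Scaling the radius directly instead would make the 1600 bubble look 256 times bigger than the 100 bubble rather than 16, which is part of why similar-sized bubbles are hard to compare in the first place.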

With bricks we can see overall patterns and distinguish between any two points quite easily, but when things have considerable overlap the locations of points become lost, and geographic patterns are harder to make out. Bricks also have the disadvantage of breaking things into groups of nine, which is awkward in our base-10 world.

I thought I’d try out pie slices, which seem to my eye a little less effective than bricks at quickly distinguishing magnitudes, but slightly easier than bricks at pinpointing the geographic location each is associated with. They can also be partitioned into groups of 10 or any other appropriate amount. They are, however, similarly poor when they begin to overlap.

Triangles seem comparable to bricks but, like bricks, have the disadvantage of adding up to nine and being difficult to differentiate and pinpoint when overlapped.

I also tried Tukey Tallying which is clearly not suited for this task in any way.

One of the most promising results of the discussion Few initiated was the latest iteration of Francis Gagnon’s concentric circles technique. Despite Few being rather dismissive of their value on his discussion board, I think they improve on bubbles without any of the drawbacks that Stephen’s bricks create. They may not be as preattentively recognizable, but I’m not sure that is much of a disadvantage for most purposes. I recommend reading these three posts where Francis outlines his thoughts at each iteration of their creation.

One further option I created, inspired by Gagnon’s circles and this image, is what I call ingrowing circles. Differentiating between two circles in this manner seems easier to my eye than with standard bubbles, but not as effective as Gagnon’s circles. Overlapping and locating the points is easier than with bricks, and there is no limitation on how many segmentations can be applied. Certain data sets or design aesthetics might find this method useful, though the concentric circles technique seems the most effective for the general case.


While Stephen Few went for wholesale change with a completely new methodology, I think the most progress was made with the incremental improvements of Francis Gagnon’s concentric circles.  What are your thoughts on these and other alternative methods?  Let me know if you have a new and creative option to throw into the mix.

Posted in Visualization | 6 Comments

Interactive NHL Visualization

The NHL Playoffs are under way and [insert favourite team name here] fever has probably hit your town (unless, like us, your team has made your heart sad). About a year ago we put together a static visualization of payroll rankings in the NHL; now we’ve extended it to a full interactive piece that lets you explore more than just payroll’s impact on the sport.

Interactivity in a data visualization shouldn’t just mean tooltips when you click on something; it can be so much more powerful than that. Interactivity can provide whole new perspectives, facilitate more comparisons, and encourage exploration and discovery. This is readily apparent when you compare the static version of our graphic to this new interactive one.

In this piece we’ve learned interesting facts about last year’s LA Kings, the 2007 Anaheim Ducks and more. If we look specifically at Detroit we can see how they consistently make the playoffs (often the finals) and rank near the top of every metric we provided.

You can also see the dramatic effect the introduction of a salary cap had on the gap between the highest and lowest paid teams in the league.

I encourage you to explore this visualization and let us know what stories you discover in the data.

Posted in Fun, Visualization | 2 Comments