Breathing City

Inspired by John Nelson’s breathing earth and Conveyal’s aggregate-disser post, I wondered if I could make a breathing city. Manhattan looks somewhat lung-like, so it seemed natural. Should be a fun, quick project. How naive I was.

Search and Recover
Conveyal had already gathered the data I would need to do a dot density plot, so it should be easy to find it using their post as a starting point. But wait, they didn’t share links to the source files, and they didn’t respond to my email. Google should solve that… hours of surfing later, I find what I’m looking for in four different places: population, employment, land use, building footprints.

Java
Excellent, now just run it all through Conveyal’s conveniently open source tool. Except it’s written in Java, so let’s install the Java SDK. Oh, and it has several library dependencies. Finding… installing… finding… (several hours later)… installing… not working. Clearly I know far too little about Java to get this going.

Python Wrestling
We use Python at Darkhorse, and learning some geographic libraries could be useful. Let’s use the code for the racial dot map project as a starting point for creating a Python version of Conveyal’s disser tool. I just need several new libraries, which translates to more hours finding, installing, re-installing, searching, uninstalling, removing, copying, and installing again, because computers are petty and vindictive. Then finally shapely and osgeo are working… yay!

Baby Steps
Now we just take several hours to learn how not to use these libraries and eventually we stumble across one or two things that work, then a couple more, and then we crawl toward a messy program that does what someone else has already done, but at least I speak this one’s language.
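For anyone curious what the core of a dot density script looks like, here is a minimal sketch, not the actual program from this project. It swaps shapely for a plain ray-casting test so it runs with nothing but the standard library (with shapely, the same check would be `Polygon(...).contains(Point(x, y))`). The block shape and resident count are made up for illustration.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def scatter_dots(polygon, count, rng=None):
    """Rejection-sample `count` random points inside `polygon`."""
    rng = rng or random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    dots = []
    while len(dots) < count:
        # Draw from the bounding box, keep only points inside the shape
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots

# Hypothetical L-shaped census block with 25 residents (one dot each)
block = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
dots = scatter_dots(block, 25, random.Random(42))
```

Rejection sampling is wasteful for long, skinny shapes, but it is hard to get wrong, which counts for a lot at this stage.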

Combine CSVs with Shapes
What? Excel stopped allowing you to edit and save DBF files, when did that happen? This tool is a bit buggy, but it brings that feature back.

QGIS forces antialiasing
You can’t turn it off. If you want to create single pixel markers, it just won’t let you color them properly. I tried for far too long (if you know how, please let me know). Good thing Excel is a poor man’s GIS. (Yep, every one of the frames was made in Excel.)

Find More Data
So now I can make a plot that doesn’t breathe. But I want to show change in the typical workday. I’m gonna need more data for that. Several searches and false starts later we find work-related activity percentages by time of day. Manhattan probably has a different profile than the US average, but close enough.

Find even more data
But each dot is a person, so I can’t just have them flicking on and off randomly to match the time of day percentages. They need to go to work for a while then come home for a while, so I need to give each dot a schedule. Maybe this is overkill, but I can’t stop now. More searching finds us a rough hours of work distribution.

Solver time
Now I’m forced to assign schedules to the ~1.5 million people living in Manhattan and the ~2 million people working in Manhattan. But the sum of those schedules needs to resemble the hours-of-work distribution and the percentage at work for each hour of the day. Time to break out Excel’s solver engine. With it we can create ~200 schedules with probabilities to match those profiles. Then we can distribute them to each of our people.
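The real fitting was done in Excel’s Solver, but the idea translates to a few lines of Python. This is a rough sketch under my own assumptions: the three schedules and their probabilities are placeholders rather than the fitted values, and the function names are invented for illustration. It shows the quantity a solver would minimise against the hourly profile and how schedules could then be dealt out to the dots.

```python
import random

# Hypothetical candidate schedules: (start hour, hours worked, probability).
# The post fitted roughly 200 of these; the three below are made up.
schedules = [
    (9, 8, 0.6),   # 9:00 to 17:00
    (7, 10, 0.3),  # 7:00 to 17:00
    (22, 8, 0.1),  # night shift, wraps past midnight
]

def at_work(start, hours, hour):
    """True if a schedule starting at `start` for `hours` covers `hour`."""
    return (hour - start) % 24 < hours

def share_at_work(schedules):
    """Fraction of people at work in each of the 24 hours."""
    return [
        sum(p for start, hours, p in schedules if at_work(start, hours, hour))
        for hour in range(24)
    ]

def squared_error(schedules, target):
    """What a solver would minimise: the gap to the target hourly profile."""
    return sum((s - t) ** 2 for s, t in zip(share_at_work(schedules), target))

def assign(schedules, n_people, rng):
    """Give each person (dot) the index of one schedule, drawn by probability."""
    weights = [p for _, _, p in schedules]
    return rng.choices(range(len(schedules)), weights=weights, k=n_people)

profile = share_at_work(schedules)
people = assign(schedules, 1000, random.Random(0))
```

Matching the hours-of-work distribution as well would mean adding a second error term over the `hours` column and minimising the sum of both.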

Data is done
Finally we have data for each of 24 hours for both home and work. We’re making some huge simplifying assumptions (e.g. Manhattan’s work profile is the same as the rest of the US, when people aren’t at work they are at home, there are only 200 possible ways to spend your day, when we build this someone will want to see it) but we have a reasonable data set.

Design
Now we just make some maps and push pixels around on the screen until they look good, then painstakingly create 24 versions to string together for the animation. Eric Fischer has some great tips for making and coloring dot density plots in this post. Then we’ll add a bar chart, no an area chart, no a line chart, wait… I’ve got it, a mesmerizing heart rate monitor looking thingy to go with our breathing theme. Nice.

Relief
See, it is super easy and takes almost no time at all to create something like this, as long as your definitions of “super easy” and “no time” are flexible enough to include difficult and time-consuming.

Posted in Analytics, Visualization | 21 Comments

Clear Off the Table

We received a lot of attention for our Data Looks Better Naked post. People got bored on Christmas Eve and some interesting searches for Star Trek somehow landed them on our page. Now their charts look better.

The principles outlined in that article aren’t just for charts, though. You can apply them to your data tables with similar improvements in readability and aesthetics. To paraphrase Edward Tufte, too often when we create a data table, we imprison our data behind a wall of grid lines. Instead we can let the data itself form the structure that aids readability by making better use of alignment and whitespace.

In the gif below we start with a table formatted similarly to one of Excel’s many styling options which, much like the chart styles, do nothing to improve the table. Progressive deletions and some reorganization deliver a clearer and more compelling picture.

As with charts, rather than dressing up our data we should be stripping it down. For more information on table design, you can read Chapter 8 of Stephen Few’s Show Me the Numbers. My apologies to any true fans of ’80s wrestling: the stats below, much like the ring rivalries, are entirely fabricated.

The slide deck for viewing at your own pace:

Posted in Chart Redesign, Visualization | 15 Comments

The Uniform Distribution

The goaltenders from my youth – Bill Ranford, Andy Moog, Grant Fuhr – all had jersey numbers in the low thirties. And most of the goalies I can think of now have numbers in the low thirties. This got me wondering, how are jersey numbers distributed by position across the major sports leagues? What traditions, rules, and preferences do they reveal?

So, after some Python scraping and Excel manipulating we find ourselves with a paradox: uniform distributions that aren’t uniform distributions. Download them for yourself in tower or poster format.

So what can we see in the numbers? Well, Jackie Robinson’s league-wide retired #42 is quite clear, while the NHL’s only league-wide retired number, #99, is not nearly as apparent given the thin number of 90s to begin with and its position at the end of the spectrum. 

The NFL’s arcane and rigid numbering system also shows up quite clearly, something I was completely unaware of until going through this exercise.

Plus there is an interesting parallel between soccer and hockey when it comes to picking numbers for the defence, where both seem to prefer single digits that are not #1. Both leagues also like to give goalies the #1, though you can see the NHL’s three most popular goalie numbers are #30, #31, and #35, which I’m pleased to say are Bill Ranford, Grant Fuhr, and Andy Moog’s numbers respectively. I’d like to imagine this is the league’s current players paying homage to the heroes of my youth, though it more likely has to do with tradition.

So click through and let us know what you discover in the graphics.

Posted in Fun, Visualization | 2 Comments

The Five Faces of Analytics

The chasm between Business and IT is well documented and has existed since the first punch-card mainframe dimmed the lights of MIT to solve the ballistic trajectory of WWII munitions.  Since that time, great leaps in data collection, storage, connectivity, and processing power have made IT infrastructure ubiquitous.  You’re not even in the game if you don’t have an IT group.

But the productivity gains have not kept pace with the investments.  The newest hype-driver, analytics, claims that it will finally deliver the goods.  But can it?

Some studies suggest that analytics projects have an 80% failure rate.  That’s abysmal.  In the next few articles, we’ll look at reasons for this failure.  We’ll start by describing the roles, then talk about process, and finally identify the characteristics of world-class analytics.

A helpful starting point is to imagine your analytics dream team.  Who would you hire, and what would their roles be?  I suggest that there are five distinct job descriptions:

Data Steward – this skillset is alive and well in most organizations.  Almost everyone has a data warehouse, talks about the ETL process, and has had discussions around the business rules of cleaning up and storing their data.  The data steward will use tools such as SQL Server, MySQL, Oracle, and if she’s a superstar, she’ll dabble in Python and web scraping and know the difference between Hadoop and MapReduce.

Analytic Explorer – this skillset is a tough one to find.  It requires math, statistics, and modeling along with a healthy dose of creativity and skepticism. These are the people who can spin straw into gold or write tomorrow’s news today.  His job is to explore your data, combine it with sources outside the firewall, and distill it down to insights that will support your most critical decisions.  He’ll use tools such as Excel, R, MATLAB, ArcInfo, SAS, Tableau, and SPSS.  If he’s a superstar, he’ll know all about Bayes, Optimization, and the difference between precision, accuracy, and skill.

Information Artist – This is the role of a creative.  Her goal is to sell the results to the decision-maker.  And the lack of emphasis on this skillset is one of the reasons analytics is such a failure (and why Apple is such a success).  Edward Tufte – the godfather of data visualization – argues that poor data design contributed to the Challenger space shuttle tragedy.  Think of this person as being as crucial as your sales force.  In fact, that is their job – to sell the right answer.  Excel and PowerPoint can suffice, although the more skilled will use a variety of tools from Google Earth to Adobe Illustrator to D3.  If she’s a superstar, she’ll be as comfortable talking about the math behind the visuals as she is talking about the psychology behind her design.

Automator – If the Explorer finds the path through the dark forest to the fountain of youth, and the Artist designs a beautiful bottle for the elixir, then the Automator turns that path into an eight-lane highway and builds a factory to bottle that stuff as soon as it comes out of the ground.  His job is to operationalize the work of the Explorer and Artist.  He makes sure that results are timely and fast.  He adds scale. He might use traditional coding methods like C# or .NET or he might fiddle with Ruby or Objective C. Or he might even be the guru of Business Objects, Microstrategy, or D3.

The Champion – The champion stands with one foot in the land of “gut feel”, and the other planted firmly on the side of “evidence”.  She can speak the language of the geeks, and translate it to that of the battle-hardened general.  She believes strongly in data-driven decision making, but also recognizes the value of deep domain experience.  She’s tireless in her efforts to sculpt the processes of the organization to support analytics. She aims to harvest the brightest insights from the sharp young analysts and the cleverest hacks from the wily old veterans.  Her focus is adoption, and if she’s a superstar, she’ll make you believe that this analytics thing was your idea in the first place.

So that’s your dream team: a steward, an explorer, an artist, an automator and a champion.

But there’s a problem.  This team rarely exists in the wild.  Most companies hire the Data Steward, and then try to do the rest through a major software implementation.  Unfortunately, the software is not meant to explore and discover.  And it was designed by engineers who don’t understand the psychology of data visuals.  It’s like expecting your bookkeeper to be your CFO.  Sure they can both do accounting, but you won’t be happy with the results.

In other instances, organizations will try to shoehorn engineers into the roles “in their spare time”.  Again, with neither the training nor the time to explore the data or design the results, they’re doomed to fail.  These skillsets are distinct, and they shouldn’t be ignored.

So what’s the canny company to do?  If you’re extremely lucky, you’ll find the unicorn of the 21st century known as the Data Scientist, pay her a quarter million, and watch the magic happen. (A data scientist can do all five roles.) Or you can try to develop these skills in-house.  Or you can hire contractors – perhaps engaging a consulting firm to take on the Explorer or Artist roles for a time.  Or you can outsource the whole thing.

What’s important is that you recognize that each of these roles is necessary.  Neither software nor “Dave in Engineering” can replace them. Happy hunting.

Posted in Analytics

Data looks better naked

Edward Tufte introduced the concept of data-ink in his 1983 classic The Visual Display of Quantitative Information. In it he states “Data-ink is the non-erasable core of the graphic, the non-redundant ink arranged in response to variation in the numbers represented” (emphasis mine). Tufte asserts that in displaying data we should remove all non-data-ink and redundant data-ink, within reason, to increase the data-ink ratio and create a sound graphical design.

Stephen Few convincingly argues that some redundancy is often more effective, and we agree; however, most graphics don’t struggle with understatement. In fact, most contain a stunning amount of excess ink (or pixels). Rather than dressing our data up we should be stripping it down.

To illustrate how less ink is more effective, attractive and impactful, we put together this animated gif. In it we start with a chart, similar to what we’ve seen in many presentations, and vastly improve it with progressive deletions and no additions.

And here is the slide deck if you want to go at your own pace.

The next time you are trying to improve a chart, consider what you can take away rather than what you can add.

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
– Antoine de Saint-Exupéry

Posted in Chart Redesign, Visualization | 7 Comments

A Simple Tool

A while back one of our employees was buying a house and journeyed out into the vast interwebs to do some mortgage calculation. His trek was long, winding and far less fruitful than hoped. In the end, he had to create a custom spreadsheet because the tools available were too unwieldy or inflexible to answer his simple questions. He suggested we might like to take this on as an internal visualization project sometime in the future. Well the future has arrived and it is simple.

We are not in the financing and loans business, but this application provides us the opportunity to demonstrate some simple design principles. Please play with A Better Mortgage and Loan Calculator and read on.

Simple Questions

Most calculators make answering your question needlessly complex. They ask for information that helps them and not you, or use dense industry jargon that you must decipher. You shouldn’t need to know the difference between interest term and amortization period. Eliminate those roadblocks and you eliminate frustration.

We started by investigating what questions the typical borrower has. In the end we boiled it down to three:
How much can I afford to borrow?
What will my payments be?
How long will it take me to pay it down?

Give us an interest rate and answer two of those questions and we can answer the third for you.
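To make the “two out of three” idea concrete, here is a minimal sketch of how each answer can be derived from the other two. It assumes a fixed rate compounded once per payment period and a standard level-payment loan; the function names are mine, not the calculator’s, and real mortgage products (Canadian semi-annual compounding, for instance) need a rate conversion first.

```python
import math

def payment(principal, rate, periods):
    """What will my payments be? `rate` is the interest rate per period."""
    if rate == 0:
        return principal / periods
    return principal * rate / (1 - (1 + rate) ** -periods)

def affordable_principal(pay, rate, periods):
    """How much can I afford to borrow?"""
    if rate == 0:
        return pay * periods
    return pay * (1 - (1 + rate) ** -periods) / rate

def periods_to_repay(principal, pay, rate):
    """How long will it take me to pay it down? Returns a number of periods."""
    if rate == 0:
        return principal / pay
    if pay <= principal * rate:
        raise ValueError("payment never covers the interest")
    return -math.log(1 - principal * rate / pay) / math.log(1 + rate)

# Hypothetical loan: $300,000 at 5% a year, paid monthly over 25 years
monthly_rate = 0.05 / 12
monthly_payment = payment(300_000, monthly_rate, 25 * 12)
```

Feeding `monthly_payment` back into the other two functions returns the original principal and the original 300 months, which is a handy sanity check.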

Simple Hierarchy

What is most important must appear most important. Most calculators make little distinction between the essential, the secondary, and the unnecessary. Everything is similar in size and color, positioned with little care for how you will attend to the information. Below, for example, every number is bold and small, so it takes some time to find the one you’re looking for. Not a lot of time, but why put even the slightest strain on you when we have the near infinite flexibility of HTML at our bidding?

We use size, color and position to help guide you through the process. A big title at the top tells you what question we’re answering and lets you choose alternate questions. Highlighted inputs feature clearly next, but even more prominent is the big, bright answer to your question.  Secondary charts and figures follow to provide additional context.

Simple Visuals

Visuals don’t have to be complex or flashy. In this case simple visuals make a more effective display. Below is a particularly bad chart. With all the effort spent on its 3 dimensions, color gradients and drop shadows, they forgot to ask whether it was easy to read and understand. It shouldn’t be hard to distinguish whether the first green bar is at $200,000 or $250,000, but it is here.

The visuals we developed didn’t require a new charting paradigm; they are simple charts where the clutter has been eliminated and labels and interactivity added to give an elegant and coherent view of the data.

Simple Interaction

The formula for calculating some of this can look complex, certainly more complex than your typical home buyer wants to think about.
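The formula image from the original post is not reproduced here. Assuming it showed the standard level-payment (annuity) relationship, the payment $P$ on a loan of $L$ at a per-period rate $r$ over $n$ periods, and the same relationship solved for $n$, would be:

```latex
P = \frac{L\,r}{1 - (1 + r)^{-n}}
\qquad\Longleftrightarrow\qquad
n = -\frac{\ln\!\left(1 - \dfrac{L\,r}{P}\right)}{\ln(1 + r)}
```

The symbols $P$, $L$, $r$ and $n$ are my own labels, and the second form only holds when $P > L\,r$, that is, when the payment more than covers the interest.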

But by a computer’s standards this is a very simple thing to calculate. Even your phone can perform this calculation thousands of times in a second. So instead of making you press the dreaded submit button for an answer, we just show you the answer anytime you change anything, making feedback immediate and the experience smooth.

This instant reaction allows you to quickly see how changes in your assumptions will impact payment amounts, interest paid, or time to pay back. We haven’t just given one answer; we’ve provided the context of all the other answers of interest. For more on this type of interaction read Bret Victor’s thoughts on Explorable Explanations.

Simple is worth it

“Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.”
― Steve Jobs

Moving houses isn’t moving mountains, but working hard to make even a calculator simple leads to satisfied, engaged users and creators.

Posted in Visualization

Bubbles, Bricks and Tukey’s Tics

In Stephen Few’s latest newsletter he proposed a new method for displaying geospatial data. Not long after, based on feedback from Andy Cotgreave and others, he expressed his disappointment with the breadth of applications for which the technique seems useful. What Few has done, though, is generate new interest and discussion for expanding and improving our methods for plotting data on maps.

Stephen worked with Panopticon to produce this comparison tool, but it is limited to three shapes. I wanted to compare more, so I put something together. Below are several of the outputs for comparison.

First we’ll start with the bubbles that bricks were intended to replace. Seeing overall geographic patterns doesn’t require much effort, but distinguishing between any two points of similar size is difficult especially in the bigger circles. However, when things begin to overlap geographic patterns remain visible and locations of the points can still be determined.

With bricks we can see overall patterns, and distinguish between any two points quite easily, but when things have considerable overlap the location of points becomes lost, and geographic patterns are harder to make out. Bricks also have the disadvantage of breaking things into groups of nine which is awkward in our base 10 world.

I thought I’d try out pie slices, which seem to my eye a little less effective at quickly distinguishing magnitudes than bricks, but slightly easier than bricks at pinpointing the geographic location they are associated with. They can also be partitioned into groups of 10 or any other appropriate amount. They are, however, similarly poor when they begin to overlap.

Triangles seem comparable to bricks but, like bricks, have the disadvantage of adding up to nine and being difficult to differentiate and pinpoint when overlapped.

I also tried Tukey Tallying which is clearly not suited for this task in any way.

One of the most promising results of the discussion Few initiated was the latest iteration of Francis Gagnon’s concentric circles technique. Despite Few being rather dismissive of their value in his discussion board, I think they improve bubbles without any of the drawbacks that Stephen’s bricks create. They may not be as preattentively recognizable, but I’m not sure that is much of a disadvantage for most purposes. I recommend reading these three posts where Francis outlines his thoughts at each iteration of their creation.

One further option I created, inspired by Gagnon’s circles and this image, is what I call ingrowing circles. Differentiating between two different circles in this manner seems easier to my eye than standard bubbles, but not as effective as Gagnon’s circles. Overlapping and locating the points is easier than bricks and there is no limitation on how many segmentations can be applied. Certain data sets or design aesthetics might find this method useful, though the concentric circles technique seems the most effective for the general case.

While Stephen Few went for wholesale change with a completely new methodology, I think the most progress was made with the incremental improvements of Francis Gagnon’s concentric circles.  What are your thoughts on these and other alternative methods?  Let me know if you have a new and creative option to throw into the mix.

Posted in Visualization | 6 Comments

Interactive NHL Visualization

The NHL Playoffs are under way and [insert favourite team name here] fever has probably hit your town (unless, like us, your team has made your heart sad). About a year ago we put together a static visualization of payroll rankings in the NHL; now we’ve extended it to a full interactive piece that lets you explore more than just payroll’s impact on the sport.

Interactivity in a data visualization shouldn’t just mean tooltips when you click on something; it can be so much more powerful than that. Interactivity can provide whole new perspectives, facilitate more comparisons, and encourage exploration and discovery. This is readily apparent when you compare the static version of our graphic to this new interactive.

In this piece we’ve learned interesting facts about last year’s LA Kings, the 2007 Anaheim Ducks and more. If we look specifically at Detroit we can see how they consistently make the playoffs (often the finals) and rank near the top of every metric we provided.

You can also see the dramatic effect the introduction of a salary cap had on the gap between the highest and lowest paid teams in the league.

I encourage you to explore this visualization and let us know what stories you discover in the data.

Posted in Fun, Visualization | 2 Comments

Exceltron Annual Report (or Pretty Counts)

One of Edward Tufte’s most quotable quotes is “if the statistics are boring, then you’ve got the wrong numbers.” I wholeheartedly agree with this sentiment but good visualization is not just a matter of putting the right statistics in front of the audience. Aesthetics matter, and they matter a lot. Most people, whether they realize it or not, place a disproportionate amount of value on the look of things. People want, even need, to be engaged and good design is essential in ensuring they are.

Studies show that more attractive people have higher incomes. This doesn’t mean more attractive people are more effective in their positions, it means their attractiveness is valued and people are willing to pay for it. The same appears to be true for data visualization. Take, for instance, the OECD’s announcement in January of the winner of their Global Data Visualization Contest.

The winning entry is visually stunning: well balanced, excellent colors, smooth animation. The problem is it is incredibly difficult to decode. Even with the legend right next to the visualization and a good amount of time playing with it, I was not able to gain any significant insight.

Now look at the entry to which they gave an honorable mention.

It doesn’t take long to learn how to interpret its display, all of the countries can be compared at a glance, and it is relatively clear what we are comparing and how to compare it.  Interaction lets us filter so we can further our understanding, rather than just flipping us to another country as in the winner’s.

The contest guidelines stated the visualization “should encourage comparison across the countries, and should reveal the individual statistics that go into these indicators.” So why then did the contest judges choose the winner? Not because it is more insightful to look at, but (I think) because it’s more enjoyable to look at. The “striking visual design, which draws users into exploring their piece” was mentioned specifically.  These judging results indicate an unfortunately high weighting given to aesthetics.

What we can take from this is that people highly value style, even to their own detriment. That means we need to value style but execute without causing detriment. We need to find a way to ensure that we not only deliver data in a meaningful way but also seek to deliver it in a compelling, engaging style. Function must come first, but if we ignore Form we risk being ignored.

To that end, I went through a little exercise to show that you don’t need specialized tools to execute good design; in fact, you can create “striking visual design” in something as ubiquitous as Excel, which most of us use regularly. Nicholas Felton’s Feltron Annual Reports are a beautiful example of information design for the quantified self.  If you haven’t browsed them before, I encourage you to go and have a look.  His elegant compositions turn what could be mundane statistics about someone we’ve never met into an engaging story about human actions and interactions.

Can we take a page out of Nicholas Felton’s book and make it happen in Excel?  Well, here is what happens if you use Excel’s default charts to show some of his data.

Charts like these, while easy to read and understand, would not have garnered Mr. Felton much attention or interest never mind renown.  But you can see below just how close we can get in applying his design to the charts and layouts. If we had some money to buy new fonts, we could get even closer.

You can download the workbook to scroll around a full page and see how it is accomplished in Excel, though you will need the fantastic and free League Gothic font installed on your computer for some of the numbers to look as they should.  I even threw random numbers in some of the charts so you can press F9 and watch them dance.

Good design isn’t a matter of pushing a button, or selecting a theme.  It takes effort to learn the language of graphic design and time to apply its principles.  But that effort and time is well worth it when it garners a more engaged and enlightened viewer.

If the statistics are boring then you’ve got the wrong numbers. If no one is paying attention then maybe you need to work on your visuals.

Posted in Visualization | 1 Comment

Cursory Dashboard Principles

Last week I received an email from Bernard Lebelle, who is apparently updating his book on dashboarding in Excel and will be including screen shots of the three winning tools from Chandoo’s Excel Salary Contest.   You can download my submission here. Mr. Lebelle’s book is in French so I can’t speak to its quality, but my hope is that it will espouse good dashboard design principles that make knowledge transfer and decision making easier.

I was asked to send “a few lines presenting the design principles behind your creation.”  I’m not sure how much of my response will make the book, and given the book is in French, I may never know, so I thought I’d post it here.  The points are by no means exhaustive, or even prioritized, but all are worth considering the next time you are putting together a dashboard or analytic tool.

The data should be the star of the show.  There are no distracting extra elements like clip art, background images or grids (chart junk as Tufte would call it).

Don’t distort the data. All the bar chart axes start at zero, so comparing bar lengths is meaningful. No exploding pies, no 3D graphs.

Use color judiciously to highlight and provide meaning.  There is only one color other than black/grey/white and it is used to show the currently selected data and set it apart from what it is compared to.  This consistent and restrained color usage draws attention where it should be directed and gives the dashboard a professional look.

Alignment is important.  The elements are all aligned to a grid to give it a pleasing, organized layout and an easy to follow flow.

Provide context. In this case the context of the whole data set is always available for comparison against the selected data set.

Put it on one screen. Information seen together on one screen allows you to identify relationships and make connections.  You can pack a lot of information in one screen without losing meaning or depth.

Keep interaction simple and elegant. This dashboard has only one interaction: filtering by region and/or position.  It is easy to understand and achieve, with all the instruction necessary summed up in a single sentence.

Make it responsive.  Response to user interactions should be immediate (or at least feel immediate).  If people have to wait for the computer they will be frustrated with the tool and/or they will spend less time gathering insight and more time staring at the “calculating %”.

To see another quality entry to the contest which, sadly, didn’t fare as well as I’d hoped in the online voting system, you can view Ben Jones’ write-up about his submission.

Posted in Analytics, Visualization | 1 Comment