Recently the Alberta government hosted Apps for Alberta - a competition using the province’s open data. Being an Alberta-based data visualization firm, we felt encouraged, perhaps even duty-bound, to enter. So we did. We managed to pull together a couple of submissions, the first of which is a look at high school grades in the province.
In an ideal world, you start by asking: What question am I helping to answer? Then you go out and locate the most appropriate data available to help you answer that question and, because it is an ideal world, exactly the data you need is already there. It’s squeaky clean and accessible, just waiting to be visualized.
In the real world, data availability is often the limiting factor. You therefore start with the data you have and try to discover what questions it might help answer. If any of the question/answer combinations you discover end up engaging your curiosity, then you have a candidate for a visualization.
Such was the case with this competition. We pored over the available Alberta data sets looking for something that was interesting (engagement value) and relevant (societal value). Eventually we landed on the high school grades data.
Next we needed to define our audience. It’s easy to fall into the trap of defining an overly broad audience (e.g. the General Public). I mean, it’s on the Internet, right? The whole world is your audience. But there is a paradox: the narrower your focus, the more engaging your application. We therefore defined our target user as “parents of children about to enter high school”. The application will answer some questions well rather than many questions poorly.
It’s also important to recognize the limitations of the data. Our audience's question might be "What is the best school for my child?" Our data can only answer "How do schools' grades compare?" There are a host of education quality concerns that aren't addressed by our data (e.g. the socio-economic status of the students, opportunities outside the classroom, etc.). And there are likely biases in the data collection (e.g. teaching to the test, cheating, small sample-size schools, or inconsistent grading criteria). But what we do have (average grade by school and subject) is still valuable.
As an aside, our visualization doesn't make claims it cannot possibly deliver on. Headlines like "Find the Best School in Alberta" or "Where Do Schools Suck the Most?" may pull in more hits on Reddit, but would be a disservice to the users of this tool and the teachers and institutions reported on. The socio-economic factors likely dominate any straight school-to-school comparison. Nevertheless, a more nuanced exploration reveals some interesting gems.
Students across the province seem to struggle with English more than the other subjects. Or perhaps English grading standards are higher.
Private schools are over-represented among the higher ranks (as you'd expect), but there are still some private schools for which grades are not a primary focus.
Our most interesting finding comes from looking at intra-school grading. While most schools have fairly consistent scores across subjects, certain institutions appear to excel in a specific subject. By comparing within a school, we are naturally accounting for most of the socio-economic impact. So if my daughter wants to go into engineering, Henry Wise Wood might be an excellent choice with its strong Mathematics program. If I thought my son was destined for politics, I might point him toward St. Francis Xavier and its exceptional Social Studies program.
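For the curious, here is a minimal sketch of how a within-school comparison like this could be computed. The column names (school, subject, avg_grade) and the toy numbers are assumptions for illustration, not the actual layout of the Alberta data set; the idea is simply to subtract each school's own average so a subject's score reflects relative strength inside that school.

import pandas as pd

# Hypothetical layout: one row per school/subject, holding that school's
# average grade in that subject. Names and values are illustrative only.
grades = pd.DataFrame({
    "school":    ["A", "A", "A", "B", "B", "B"],
    "subject":   ["English", "Mathematics", "Social Studies"] * 2,
    "avg_grade": [62.0, 70.5, 68.0, 71.0, 69.5, 74.0],
})

# Province-wide average by subject (weighting by enrolment would be better
# if class sizes were available).
province_avg = grades.groupby("subject")["avg_grade"].mean()
print(province_avg.sort_values())

# Within-school comparison: subtracting each school's own mean factors out
# much of the school-level (e.g. socio-economic) effect, leaving a measure
# of which subjects a school is relatively strong in.
grades["school_mean"] = grades.groupby("school")["avg_grade"].transform("mean")
grades["relative_strength"] = grades["avg_grade"] - grades["school_mean"]
print(grades.sort_values("relative_strength", ascending=False))

Sorting on that relative-strength column is the kind of view that surfaces a school with an unusually strong Mathematics or Social Studies program, independent of its overall ranking.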
Play around with the tool and feel free to share your findings and feedback in the comments below.