The Home Office recently opened up some of its data, mostly in the form of PDF reports and Excel spreadsheets. Right after, I went on holiday and offline (!) for a week, so I set myself the task of putting together some visualisations of the data using two client-side visualisation libraries that I liked the look of:
- the JavaScript InfoVis Toolkit (JIT)
- jQuery sparklines, which I think look simply gorgeous and which follow the jQuery tradition of being incredibly easy to put on a page
In short, each solution is an HTML page whose rdfQuery code pulls in static RDF/XML files and queries them to produce the particular formats that the two client-side libraries require.
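The real pages do this in the browser with rdfQuery. As a rough, language-neutral illustration of the same pipeline (this is not rdfQuery's API), here is a minimal Python sketch under simplifying assumptions: the RDF/XML uses only flat `rdf:Description` elements, the `ex:` vocabulary and the numbers are made up, and the "query" is a single triple pattern. It ends with a comma-separated list of numbers, which is the kind of input a sparkline takes.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative RDF/XML; the ex: vocabulary and the values are hypothetical.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/year/2006">
    <ex:year>2006</ex:year>
    <ex:count>120</ex:count>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/year/2007">
    <ex:year>2007</ex:year>
    <ex:count>95</ex:count>
  </rdf:Description>
</rdf:RDF>"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) from flat rdf:Description elements.

    Only literal text and rdf:resource objects are handled; real RDF/XML
    allows many more forms (typed nodes, nesting, rdf:parseType, ...).
    """
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join into a full URI.
            ns, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource", prop.text)
            yield subject, ns + local, obj

def select(triple_list, predicate):
    """A one-pattern 'query': map each subject to its object for one predicate."""
    return {s: o for s, p, o in triple_list if p == predicate}

def sparkline_values(rdf_xml):
    """Return the counts, ordered by year, as a comma-separated string."""
    ts = list(triples(rdf_xml))
    years = select(ts, "http://example.org/ns#year")
    counts = select(ts, "http://example.org/ns#count")
    ordered = sorted(years, key=lambda s: int(years[s]))
    return ",".join(counts[s] for s in ordered)

print(sparkline_values(SAMPLE))  # 120,95
```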
The first one I’m going to talk about is a visualisation of types of offences using JIT. There’s a screenshot below to give you a flavour, but you’d be better off actually visiting the page because it’s interactive: mousing over and clicking on the labels enables you to navigate around the hierarchy.
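As far as I know, JIT's tree-style visualisations read a JSON tree in which every node has `id`, `name`, `data` and `children` properties. A minimal Python sketch of getting a hierarchy into that shape from (parent, child) pairs might look like this; the category labels are hypothetical placeholders, not the real Home Office offence types.

```python
import json

# Placeholder (parent, child) pairs; None marks the root. These labels are
# hypothetical stand-ins, not the actual Home Office offence categories.
PAIRS = [
    (None, "All offences"),
    ("All offences", "Category A"),
    ("All offences", "Category B"),
    ("Category A", "Subcategory A1"),
    ("Category A", "Subcategory A2"),
]

def build_tree(pairs):
    """Nest (parent, child) pairs into JIT-style nodes: id, name, data, children."""
    nodes = {}

    def node(label):
        # Create each node only once, so ids stay unique across the tree.
        return nodes.setdefault(
            label, {"id": label, "name": label, "data": {}, "children": []}
        )

    root = None
    for parent, child in pairs:
        if parent is None:
            root = node(child)
        else:
            node(parent)["children"].append(node(child))
    return root

print(json.dumps(build_tree(PAIRS), indent=2))
```

Using the label as the `id` only works because the labels here are unique; with real data, a URI for each category would be a safer id.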
- Bob DuCharme giving an overview of the semantic web
- Leigh Dodds talking about publishing linked data
- Andy Seaborne talking about SPARQL
- Duncan Hall talking about creating ontologies
It’s not really XML, I suppose, but it’s certainly a bunch of interesting and timely topics. I particularly hope that we’ll get some public sector people in the room so that we can discuss some of the challenges and opportunities in that area.
I’ll start with the problem. To create the graphs I showed in my last post, I wanted to split MPs into groups based on their party affiliation. Ideally, I wanted the Google Visualisation query to look like:
select mp, additionalCosts, totalTravel, totalBasic where party = 'Conservative' order by totalClaim desc limit 25
because it is easy to understand, and a developer can write it without having to know any magic URIs.
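On the wire, the Google Visualisation data source protocol carries a query like this in the `tq` request parameter (in the browser, `google.visualization.Query` adds it for you when you call `setQuery`). A small Python sketch of what the request URL ends up looking like; the data source URL is a hypothetical placeholder.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical data source URL; substitute whatever service answers
# Google Visualisation queries over the SPARQL data.
DATA_SOURCE = "https://example.org/gviz/mp-expenses"

def query_url(base, tq):
    """Attach a Google Visualisation query as the tq request parameter."""
    return base + "?" + urlencode({"tq": tq})

tq = ("select mp, additionalCosts, totalTravel, totalBasic "
      "where party = 'Conservative' order by totalClaim desc limit 25")
url = query_url(DATA_SOURCE, tq)
print(url)

# Decoding the parameter gives back exactly the string the developer wrote.
print(parse_qs(urlparse(url).query)["tq"][0] == tq)  # True
```

The point of keeping the query readable is that the developer only ever writes the plain string; the encoding is mechanical.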
The party affiliation for an MP is given in the RDF supplied within the Talis store as a pointer to one of the resources:
Now, if you visit http://dbpedia.org/resource/Conservative_Party_(UK), you’ll see precious few properties, and none of them gives you access to the string ‘Conservative’. If you look at http://dbpedia.org/resource/Liberal_Democrats, you’ll see plenty of properties, one of which is dbpprop:partyName. But querying on dbpprop:partyName within the Talis data store returns nothing, because that information hasn’t been imported into the particular store that this SPARQL query runs against.
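One possible workaround, sketched here under assumptions rather than as a description of what the data source actually does, is for the service that translates the friendly query to keep a hand-maintained table from short party names to DBpedia URIs, so that `party = 'Conservative'` can become a URI comparison in SPARQL. Only the two URIs mentioned above are filled in; the dictionary keys and the `party_filter` helper are hypothetical.

```python
# Hypothetical hand-maintained mapping from the short names a developer would
# type to the DBpedia resources used in the data. Other parties would need
# their URIs checked against the store before being added.
PARTY_URIS = {
    "Conservative": "http://dbpedia.org/resource/Conservative_Party_(UK)",
    "Liberal Democrats": "http://dbpedia.org/resource/Liberal_Democrats",
}

def party_filter(variable, party_name):
    """Return a SPARQL FILTER comparing ?variable with the party's URI."""
    try:
        uri = PARTY_URIS[party_name]
    except KeyError:
        raise ValueError(f"no URI known for party {party_name!r}") from None
    return f"FILTER(?{variable} = <{uri}>)"

print(party_filter("party", "Conservative"))
# FILTER(?party = <http://dbpedia.org/resource/Conservative_Party_(UK)>)
```

The obvious cost is that someone has to maintain the table, which is exactly the kind of burden a `partyName`-style property in the store would remove.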
My last post showed a visualisation of the Guardian’s MPs’ expenses data, ported into a Talis triplestore. Here’s a screenshot of another one (follow the link for the interactive version). The files used to create it are attached to this post.
There are several frustrating things about creating these visualisations. I want to discuss them because I think they point to some lessons about what data publishers and members of the semantic web community should do to make this kind of thing easy. The first thing I want to talk about is datatyping.
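To see why datatyping matters: a literal with no datatype is just a string, so sorting on it (in SPARQL `ORDER BY` or in the client) compares characters rather than amounts. A small Python sketch with made-up amounts, using the `value`/`datatype` keys of the SPARQL JSON results format:

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "decimal", XSD + "integer", XSD + "double", XSD + "float"}

# Made-up claim amounts, as they might come back from a SPARQL query:
# once as plain literals and once tagged with xsd:decimal.
plain = [{"value": "950.00"}, {"value": "12000.50"}, {"value": "3100.25"}]
typed = [dict(b, datatype=XSD + "decimal") for b in plain]

def literal_value(binding):
    """Convert a literal to a Python value, trusting its datatype if present."""
    if binding.get("datatype") in NUMERIC:
        return Decimal(binding["value"])
    return binding["value"]  # no datatype: all we can do is treat it as text

print(sorted(literal_value(b) for b in plain))
# ['12000.50', '3100.25', '950.00']
print(sorted(literal_value(b) for b in typed))
# [Decimal('950.00'), Decimal('3100.25'), Decimal('12000.50')]
```

Without the datatype, the consumer either gets the wrong order or has to guess which columns are numbers.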
During Guardian Hack Day 2, Leigh ported the Guardian’s MPs’ expenses data into Talis. Most wonderfully, this provides a SPARQL endpoint that can be used to query the data. I thought I’d try the same approach I blogged about recently: using a SPARQL query as a Data Source for a Google Visualisation of the MPs’ expenses data.
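The core of that approach is turning SPARQL results into the table structure Google Visualisation expects. A minimal sketch, assuming the endpoint returns the standard SPARQL JSON results format and that the target is the JSON object form accepted by the `google.visualization.DataTable` constructor (`cols` with `id`/`label`/`type`, `rows` with `c` cells holding `v`). It does not cover the rest of the data source wire protocol, and the sample MP names are placeholders.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "decimal", XSD + "integer", XSD + "double", XSD + "float"}

def to_datatable(results):
    """Convert SPARQL JSON results into a Google Visualisation DataTable literal.

    A column is typed 'number' only if every bound value in it carries a
    numeric datatype; otherwise it becomes a 'string' column.
    """
    variables = results["head"]["vars"]
    bindings = results["results"]["bindings"]
    cols = []
    for var in variables:
        values = [b[var] for b in bindings if var in b]
        numeric = bool(values) and all(v.get("datatype") in NUMERIC for v in values)
        cols.append({"id": var, "label": var, "type": "number" if numeric else "string"})
    rows = []
    for b in bindings:
        cells = []
        for var, col in zip(variables, cols):
            if var not in b:
                cells.append({"v": None})  # unbound variable -> empty cell
            elif col["type"] == "number":
                cells.append({"v": float(b[var]["value"])})
            else:
                cells.append({"v": b[var]["value"]})
        rows.append({"c": cells})
    return {"cols": cols, "rows": rows}

# Placeholder results in SPARQL JSON format; not real expenses data.
sample = {
    "head": {"vars": ["mp", "totalClaim"]},
    "results": {"bindings": [
        {"mp": {"type": "literal", "value": "MP One"},
         "totalClaim": {"type": "literal", "datatype": XSD + "decimal", "value": "1500.5"}},
        {"mp": {"type": "literal", "value": "MP Two"}},
    ]},
}
print(json.dumps(to_datatable(sample)))
```

Deciding column types from the datatypes is where untyped literals hurt: without them, every column here would come out as `string`.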
To cut to the chase, here’s a screenshot of the result (follow the link for the interactive version):