visualisation

SPARQL & Visualisation Frustrations: Aggregation and Projection

Today, I’m going to moan about the lack of features in SPARQL that are necessary to do many kinds of data analysis and visualisation. Going from raw data, held in RDF, to data like

  • the average traffic flow along the M5
  • the total amount claimed by each MP
  • the number of corporate insolvency notices published each day

cannot be done with SPARQL on its own. These calculations involve aggregation, grouping and projection, which are planned for SPARQL vNext but are not here yet (at least, not in any standard way or in every triplestore).
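Until the store can aggregate for you, the grouping has to happen in client code once the query results come back. Here is a minimal sketch in Python of the "count per day" case, assuming the query has returned one row per notice with its publication date. The field names `notice` and `date` and the sample rows are made up for illustration; they are not taken from any real store.

```python
from collections import Counter

# Hypothetical query results: one row per insolvency notice.
# The keys "notice" and "date" are assumed names, not a real schema.
rows = [
    {"notice": "n1", "date": "2008-05-01"},
    {"notice": "n2", "date": "2008-05-01"},
    {"notice": "n3", "date": "2008-05-02"},
]


def count_per_day(rows):
    """Group rows by date and count them, i.e. a client-side GROUP BY / COUNT."""
    return Counter(row["date"] for row in rows)


notices_per_day = count_per_day(rows)
```

The same pattern, with a running sum in place of a count, would cover totals such as the amount claimed by each MP.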

Here’s the pretty graph to illustrate today’s rant:

Corporate insolvency notices per day from the London Gazette since 1st May 2008, averaged over 20 days
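The 20-day averaging in a graph like this is another calculation that has to be done outside the query. A minimal sketch, assuming a trailing average (the caption doesn't say whether the window is trailing or centred) over a list of daily counts that is already in date order:

```python
def moving_average(values, window=20):
    """Trailing mean over up to `window` values ending at each position.

    Early positions, where fewer than `window` values exist, are averaged
    over the values seen so far.
    """
    averaged = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running_total -= values[i - window]
        averaged.append(running_total / min(i + 1, window))
    return averaged
```

For example, `moving_average([1, 2, 3, 4], window=2)` smooths each value with the one before it.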

More Crime

I wrote previously about a visualisation using Home Office data to navigate around categories of offences. The second interesting set of data from the Home Office that I found, tucked away in a small link on a page about Crime Reduction Toolkits, was a spreadsheet of recorded crime statistics between 1898 and the present day. Each column is a different category of offence (I won’t say class because they don’t map onto the Classes from the spreadsheet of notifiable offences).

This time I wanted to try out the jQuery sparklines plug-in to illustrate how crime notifications have changed over time. The resulting page is available at http://www.jenitennison.com/visualisation/crime.html; here’s a screenshot for Bigamy:

Summary statistics for rate of Bigamy within the UK

Offence Hierarchy Visualisation with rdfQuery and JIT

The Home Office recently opened up some of its data, mostly in the form of PDF reports and Excel spreadsheets. Right after, I went on holiday and offline (!) for a week, so I set myself the task of putting together some visualisations of the data using two client-side visualisation libraries that I liked the look of:

  • jQuery sparklines, which I think look simply gorgeous and which follow the jQuery tradition of being incredibly easy to put on a page
  • the JavaScript InfoVis Toolkit (JIT) which can be used to create some very attractive and interactive visualisations for hierarchical information

As a quick summary, I ended up with solutions that use an HTML page with rdfQuery code that pulls in static RDF/XML files and performs queries on them to create the particular formats that the two client-side libraries require.

The first one I’m going to talk about is a visualisation of types of offences using JIT. There’s a screenshot below to give you a flavour, but you’d be better off actually visiting the page because it’s interactive: mousing over and clicking on the labels enables you to navigate around the hierarchy.

Visualisation of Criminal Damage offences
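The core of turning query results into something a tree visualisation can read is building a nested structure from flat parent/child statements. The page itself does this in the browser with rdfQuery; what follows is only a sketch of the idea in Python. The offence identifiers and labels are invented, and the `id`/`name`/`data`/`children` node shape is my understanding of the JSON tree format JIT expects, so check it against the JIT documentation.

```python
def build_tree(labels, parent_of, root):
    """Turn flat child -> parent links into a nested tree of nodes."""
    # Invert the child -> parent mapping into parent -> [children].
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    def node(identifier):
        return {
            "id": identifier,
            "name": labels.get(identifier, identifier),
            "data": {},
            "children": [node(c) for c in sorted(children.get(identifier, []))],
        }

    return node(root)


# Hypothetical hierarchy: identifiers and labels are made up for illustration.
labels = {
    "offences": "All Offences",
    "criminal-damage": "Criminal Damage",
    "arson": "Arson",
}
parent_of = {"criminal-damage": "offences", "arson": "criminal-damage"}

tree = build_tree(labels, parent_of, "offences")
```

The resulting dictionary can be serialised with `json.dumps(tree)` and handed to the client-side library.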

SPARQL & Visualisation Frustrations: Linked Data

I’ll start with the problem. To create the graphs I showed in my last post, I wanted to split MPs into groups based on their party affiliation. Ideally, I wanted the Google Visualisation query to look like:

select mp, additionalCosts, totalTravel, totalBasic 
where party = 'Conservative' 
order by totalClaim desc 
limit 25

because this is reasonably easy to understand and for a developer to create without having to know any magic URIs.

The party affiliation for an MP is given in the RDF supplied within the Talis store as a pointer to one of the resources:

  • http://dbpedia.org/resource/Labour_Party_(UK)
  • http://dbpedia.org/resource/Conservative_Party_(UK)
  • http://dbpedia.org/resource/Liberal_Democrats

Now, if you visit http://dbpedia.org/resource/Conservative_Party_(UK) then you’ll see precious few properties and none of them give you access to the string ‘Conservative’. If you look at http://dbpedia.org/resource/Liberal_Democrats, you’ll see plenty of properties, one of which is dbpprop:partyName. But trying to query on dbpprop:partyName within the Talis data store gives me nothing, because that information hasn’t been imported into the particular store that this SPARQL query is running on.
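One blunt workaround, when the store doesn't hold the labels, is to keep the mapping from friendly names to URIs in the client. This is a sketch of that idea only, not a description of what the store or any library provides. 'Conservative' is the string from the query above; the other two labels are my own guesses at sensible names.

```python
# Hand-maintained lookup from friendly party names to the DBpedia URIs
# used in the store. "Labour" and "Liberal Democrat" are assumed labels.
PARTY_URIS = {
    "Labour": "http://dbpedia.org/resource/Labour_Party_(UK)",
    "Conservative": "http://dbpedia.org/resource/Conservative_Party_(UK)",
    "Liberal Democrat": "http://dbpedia.org/resource/Liberal_Democrats",
}


def party_uri(label):
    """Return the URI for a party label, or raise if the label is unknown."""
    try:
        return PARTY_URIS[label]
    except KeyError:
        raise ValueError(f"No URI known for party {label!r}") from None
```

The obvious cost is that the table has to be kept in step with the data by hand, which is exactly the kind of magic-URI knowledge the friendly query was meant to avoid.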

Map Visualisation of MPs’ Travel Expenses

During Guardian Hack Day 2, Leigh ported the Guardian’s MPs’ expenses data into Talis. Most wonderfully, this gives a SPARQL endpoint that can be used to query the data. I thought I’d try to use the same approach as I blogged about recently, using a SPARQL query as a Data Source for a Google Visualisation of the MPs’ expenses data.

To cut to the chase, here’s a screenshot of the result (follow the link for the more interactive version):

Map of travel expenses for the 100 MPs with the lowest majorities
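The heart of using a SPARQL query as a Data Source is reshaping the endpoint's results into the table structure that Google Visualisation reads. Here is a minimal sketch in Python, assuming results in the standard SPARQL JSON results format and producing the `cols`/`rows` shape of a Google DataTable. The variable names and figures in the sample are invented, and a real data source would also need to wrap the table in the response envelope that the Google wire protocol expects, which is left out here.

```python
def to_datatable(sparql_json, numeric_vars=()):
    """Convert SPARQL JSON results into a Google DataTable-style dict."""
    variables = sparql_json["head"]["vars"]
    cols = [
        {"id": v, "label": v, "type": "number" if v in numeric_vars else "string"}
        for v in variables
    ]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        cells = []
        for v in variables:
            if v not in binding:
                # Unbound variable: emit an empty cell.
                cells.append({"v": None})
            elif v in numeric_vars:
                cells.append({"v": float(binding[v]["value"])})
            else:
                cells.append({"v": binding[v]["value"]})
        rows.append({"c": cells})
    return {"cols": cols, "rows": rows}


# Invented sample results; "mp" and "totalTravel" are illustrative names.
sample = {
    "head": {"vars": ["mp", "totalTravel"]},
    "results": {
        "bindings": [
            {
                "mp": {"type": "literal", "value": "A. Member"},
                "totalTravel": {"type": "literal", "value": "1234.5"},
            },
            {"mp": {"type": "literal", "value": "B. Member"}},
        ]
    },
}

table = to_datatable(sample, numeric_vars={"totalTravel"})
```

Which variables count as numeric is passed in explicitly here to keep the sketch short; reading the `datatype` on each binding would be a more robust choice.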

Creating Google Visualisations of Linked Data

Update: For the people who couldn’t read the post because the graph didn’t have 0 as its x-axis minimum, here is the version of the graph that does. I haven’t removed the other version, since doing so would make the comments confusing and I think it’s interesting to compare the two.

London Borough Life Expectancy Bar Chart with Y-Axis Minimum at 0

There are idealists who immediately see the publication of Open Data as a Good Thing, and leap up and down (metaphorically or physically) shouting “Raw Data Now”. There are also a whole bunch of people who need to “see the shiny”. They need to understand why publishing Open Data is a Good Thing, and most particularly what the benefit is going to be to them.

This is understandable. Publishers bear the cost of the development of URI schemes, XML formats, RDF ontologies, and other infrastructure for serving data, and the ongoing maintenance cost of domain resolution, bandwidth usage and user support. Even publishers with a public-service remit (who may not need to see monetary payback) need to be convinced that there will be some kind of return on the investment.

One result of making data available is that it enables you and others to easily construct nice visualisations over the data, and maybe spot useful patterns within it. This is particularly useful for public sector information because it can provide feedback on how effective a particular policy has been or where more resources need to be spent.

So I thought it would be worthwhile trying to explore how to create visualisations of some data, starting with the London Borough data that I’ve published using Talis.
