sparql

SPARQL & Visualisation Frustrations: Aggregation and Projection

Today, I’m going to moan about the lack of features in SPARQL that are necessary to do many kinds of data analysis and visualisation. Going from raw data, held in RDF, to data like

  • the average traffic flow along the M5
  • the total amount claimed by each MP
  • the number of corporate insolvency notices published each day

cannot be done with SPARQL on its own. These calculations involve aggregation, grouping and projection, which are planned for SPARQL vNext but are not here yet (at least, not in any standard way or in every triplestore).
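Until aggregation arrives in the query language, the grouping has to happen after the query returns. Here is a minimal Python sketch of that post-processing, assuming the SPARQL results have already been flattened into plain dictionaries. The row shapes, field names and sample values are all made up for illustration; they are not taken from any real store.

```python
from collections import Counter, defaultdict

# Hypothetical rows, shaped like flattened SPARQL SELECT results.
notices = [
    {"notice": "n1", "date": "2008-05-01"},
    {"notice": "n2", "date": "2008-05-01"},
    {"notice": "n3", "date": "2008-05-02"},
]
claims = [
    {"mp": "MP A", "amount": 100.0},
    {"mp": "MP B", "amount": 50.0},
    {"mp": "MP A", "amount": 25.5},
]


def count_by(rows, key):
    """Count rows per distinct value of `key` (a COUNT with GROUP BY)."""
    return dict(Counter(row[key] for row in rows))


def sum_by(rows, key, value):
    """Total `value` per distinct value of `key` (a SUM with GROUP BY)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


notices_per_day = count_by(notices, "date")
total_per_mp = sum_by(claims, "mp", "amount")
```

The obvious cost is that every raw row has to be shipped to the client before it can be counted or summed, which is exactly the work an aggregate query would avoid.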

Here’s the pretty graph to illustrate today’s rant:

Corporate insolvency notices per day from the London Gazette since 1st May 2008, averaged over 20 days
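The 20-day averaging in that graph is another calculation that has to be done outside the query. A rough sketch of one way to do it, assuming a trailing window (the caption does not say whether the window is trailing or centred) and a list of daily counts in date order with zeros filled in for days without notices:

```python
def trailing_average(counts, window=20):
    """Average each value with up to `window - 1` values before it.

    The first few results use a shorter window, because fewer than
    `window` values are available at the start of the series.
    """
    averaged = []
    running = 0.0
    for i, count in enumerate(counts):
        running += count
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running -= counts[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged


# Small window so the result is easy to check by hand.
smoothed = trailing_average([2, 4, 6, 8], window=2)
```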

Semantic Technologies at the XML Summer School

I posted before about the joys of the XML Summer School: the learning, the punting, the drinking! Now the Semantic Technologies track has been fleshed out to include:

  • Bob DuCharme giving an overview on the semantic web
  • Leigh Dodds talking about publishing linked data
  • Andy Seaborne talking about SPARQL
  • Duncan Hall talking about creating ontologies

It’s not really XML, I suppose, but it’s certainly a bunch of interesting and timely topics. I particularly hope that we’ll get some public sector people in the room so that we can discuss some of the challenges and opportunities in that area.

SPARQL & Visualisation Frustrations: Linked Data

I’ll start with the problem. To create the graphs I showed in my last post, I wanted to split MPs into groups based on their party affiliation. Ideally, I wanted the Google Visualisation query to look like:

select mp, additionalCosts, totalTravel, totalBasic 
where party = 'Conservative' 
order by totalClaim desc 
limit 25

because this is reasonably easy to understand and for a developer to create without having to know any magic URIs.

The party affiliation for an MP is given in the RDF supplied within the Talis store as a pointer to one of the resources:

  • http://dbpedia.org/resource/Labour_Party_(UK)
  • http://dbpedia.org/resource/Conservative_Party_(UK)
  • http://dbpedia.org/resource/Liberal_Democrats

Now, if you visit http://dbpedia.org/resource/Conservative_Party_(UK) then you’ll see precious few properties and none of them give you access to the string ‘Conservative’. If you look at http://dbpedia.org/resource/Liberal_Democrats, you’ll see plenty of properties, one of which is dbpprop:partyName. But trying to query on dbpprop:partyName within the Talis data store gives me nothing, because that information hasn’t been imported into the particular store that this SPARQL query is running on.
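One blunt way round a missing label is to keep a small lookup table on the client and attach a readable name to each row before it reaches the visualisation. This is only a sketch of that idea, not necessarily what was done for these graphs. The row shape, the `partyLabel` field and the "Other" fallback are assumptions, and of the labels only 'Conservative' comes from the query above.

```python
# Hand-maintained labels for the party URIs used in the store.
# Only "Conservative" appears in the query above; the other two
# labels are assumed for illustration.
PARTY_LABELS = {
    "http://dbpedia.org/resource/Labour_Party_(UK)": "Labour",
    "http://dbpedia.org/resource/Conservative_Party_(UK)": "Conservative",
    "http://dbpedia.org/resource/Liberal_Democrats": "Liberal Democrat",
}


def add_party_label(rows, uri_key="party", label_key="partyLabel"):
    """Return copies of `rows` with a readable party label added."""
    return [
        {**row, label_key: PARTY_LABELS.get(row[uri_key], "Other")}
        for row in rows
    ]


# Hypothetical rows, shaped like flattened SPARQL SELECT results.
rows = [
    {"mp": "MP A", "party": "http://dbpedia.org/resource/Conservative_Party_(UK)"},
    {"mp": "MP B", "party": "http://example.org/some-other-party"},
]
labelled = add_party_label(rows)
```

It works, but it means hard-coding exactly the magic URIs that a developer should not need to know, which is the frustration in the first place.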

SPARQL & Visualisation Frustrations: RDF Datatyping

My last post showed a visualisation of the Guardian’s MPs’ expenses data, ported into a Talis triplestore. Here’s a screenshot of another one (follow the link for the interactive version). The files that are used to create it are attached to this post.

Graphs of highest 25 expense claims in each party

There are several things that are frustrating about creating these visualisations, which I want to discuss because I think they lead to some lessons about what data publishers and members of the semantic web community should do to make these things easy. The first thing I want to talk about is datatyping.
