rdf

Semantic Technologies at the XML Summer School

I posted before about the joys of the XML Summer School: the learning, the punting, the drinking! Now the Semantic Technologies track has been fleshed out to include:

  • Bob DuCharme giving an overview on the semantic web
  • Leigh Dodds talking about publishing linked data
  • Andy Seaborne talking about SPARQL
  • Duncan Hall talking about creating ontologies

It’s not really XML, I suppose, but it’s certainly a bunch of interesting and timely topics. I particularly hope that we’ll get some public sector people in the room so that we can discuss some of the challenges and opportunities in that area.

SPARQL & Visualisation Frustrations: Linked Data

I’ll start with the problem. To create the graphs I showed in my last post, I wanted to split MPs into groups based on their party affiliation. Ideally, I wanted the Google Visualisation query to look like:

select mp, additionalCosts, totalTravel, totalBasic 
where party = 'Conservative' 
order by totalClaim desc 
limit 25

because this is reasonably easy to understand and for a developer to create without having to know any magic URIs.
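For illustration, a Google Visualisation data source normally receives a query like this URL-encoded in the tq request parameter. The Python sketch below shows only that encoding step. The data source URL is made up, and it assumes a data source that really does expose a party column, which is exactly what turns out to be hard.

```python
from urllib.parse import urlencode

# Hypothetical data source URL, used purely for illustration.
DATA_SOURCE = "http://example.org/mp-expenses/datasource"

# The query from the post, joined into a single string.
QUERY = (
    "select mp, additionalCosts, totalTravel, totalBasic "
    "where party = 'Conservative' "
    "order by totalClaim desc "
    "limit 25"
)


def data_source_url(base, query):
    """Return the data source URL with the query URL-encoded in the tq parameter."""
    return base + "?" + urlencode({"tq": query})


url = data_source_url(DATA_SOURCE, QUERY)
print(url)
```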

The party affiliation for an MP is given in the RDF supplied within the Talis store as a pointer to one of the resources:

  • http://dbpedia.org/resource/Labour_Party_(UK)
  • http://dbpedia.org/resource/Conservative_Party_(UK)
  • http://dbpedia.org/resource/Liberal_Democrats

Now, if you visit http://dbpedia.org/resource/Conservative_Party_(UK) then you’ll see precious few properties and none of them give you access to the string ‘Conservative’. If you look at http://dbpedia.org/resource/Liberal_Democrats, you’ll see plenty of properties, one of which is dbpprop:partyName. But trying to query on dbpprop:partyName within the Talis data store gives me nothing, because that information hasn’t been imported into the particular store that this SPARQL query is running on.
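One possible workaround, sketched here in Python, is to keep the mapping from short party names to the DBpedia URIs on the client side and filter on the URI instead. The three URIs are the ones listed above. The short names used as keys and the helper party_uri are illustrative choices, not anything defined in the store.

```python
# Short party names mapped to the DBpedia resources used in the store.
# The keys are illustrative labels, not values found in the data.
PARTY_URIS = {
    "Labour": "http://dbpedia.org/resource/Labour_Party_(UK)",
    "Conservative": "http://dbpedia.org/resource/Conservative_Party_(UK)",
    "Liberal Democrat": "http://dbpedia.org/resource/Liberal_Democrats",
}


def party_uri(name):
    """Look up the DBpedia URI for a short party name (case-insensitive)."""
    lookup = {key.lower(): uri for key, uri in PARTY_URIS.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown party: {name!r}") from None


print(party_uri("Conservative"))
```

This hides the magic URIs from whoever writes the query, but it does not remove them: someone still has to maintain the table by hand.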

SPARQL & Visualisation Frustrations: RDF Datatyping

My last post showed a visualisation of the Guardian’s MPs’ Expenses data, ported into a Talis triplestore. Here’s a screenshot of another one (follow the link for the interactive version). The files that are used to create it are attached to this post.

Graphs of highest 25 expense claims in each party

There are several things that are frustrating about creating these visualisations, which I want to discuss because I think they lead to some lessons about what data publishers and members of the semantic web community should do to make these things easy. The first thing I want to talk about is datatyping.

Map Visualisation of MPs Travel Expenses

During Guardian Hack Day 2, Leigh ported the Guardian’s MPs’ Expenses data into Talis. Most wonderfully, this gives a SPARQL endpoint that can be used to query the data. I thought I’d try to use the same approach as I blogged about recently, using a SPARQL query as a Data Source for a Google Visualisation of the MPs’ expenses data.

To cut to the chase, here’s a screenshot of the result (follow the link for the more interactive version):

Map of travel expenses for the 100 MPs with the lowest majorities

The Real Deal: data.gov.uk

I’m sure that you’ve noticed that my recent posts have been somewhat obsessed with publishing and using public sector information. It’s because I’ve somehow been sucked into the work going on within the UK government, with Tim Berners-Lee and Nigel Shadbolt advising, to publish its data as linked data.

My recent blog posts about publishing data using Talis have actually been a front for much more complex work that I’ve been doing with a different data set.

Creating Google Visualisations of Linked Data

Update: For the people who couldn’t read the post because the graph didn’t have 0 as its x-axis minimum, here is the version of the graph that does. I haven’t removed the other version, since doing so would make the comments confusing and I think it’s interesting to compare the two.

London Borough Life Expectancy Bar Chart with Y-Axis Minimum at 0

There are idealists who immediately see the publication of Open Data as a Good Thing, and leap up and down (metaphorically or physically) shouting “Raw Data Now”. There are also a whole bunch of people who need to “see the shiny”. They need to understand why publishing Open Data is a Good Thing, and most particularly what the benefit is going to be to them.

This is understandable. Publishers bear the cost of the development of URI schemes, XML formats, RDF ontologies, and other infrastructure for serving data, and the ongoing maintenance cost of domain resolution, bandwidth usage and user support. Even publishers with a public-service remit (who may not need to see monetary payback) need to be convinced that there will be some kind of return on the investment.

One result of making data available is that it enables you and others to easily construct nice visualisations over the data, and maybe spot useful patterns within it. This is particularly useful for public sector information because it can provide feedback on how effective a particular policy has been or where more resources need to be spent.

So I thought it would be worthwhile trying to explore how to create visualisations of some data, starting with the London Borough data that I’ve published using Talis.

Versioning URIs

Yesterday I went along to a workshop on developing URI guidelines for the UK public sector. Because of the current drive to get more UK public sector information online, and the fact that we have Tim Berners-Lee on board, there’s a growing recognition of the fact that we need URIs for the real-world and conceptual things that we talk about in the public sector: schools, roads, hospitals, services, councils, and so on.

One of the particular points of contention at the meeting was whether URIs for non-information resources (i.e. for real-world and conceptual things) should contain dates or version numbers, or not.

Publishing Linked Data on the Talis Platform, Part 3

This is the third in a series of posts about using the Talis Platform as a back end for serving linked data. In the first part, I showed how to add data to a store. In the second post, I showed how to use some PHP scripts to publish the data as Linked Data, at the URLs you use as your identifiers.

In this post, I’m going to begin the process of exposing the data in a way that makes it easy to locate and reuse. One of the biggest lessons I learned after the initial publication of the London Gazette data as RDFa is that the publication of data and metadata about individual items is not enough. To make the data usable, you have to make it discoverable. To make it discoverable there must be an entry point from which you can locate the data. One kind of easy entry point is a list.

In the case of the data about London Boroughs that I’ve been using, there aren’t currently any links to the data, so there is no way to discover it aside from me telling you the URI template (http://www.jenitennison.com/data/id/london-borough/{name}, where name is hyphenated and in lowercase) and you knowing the name of a London Borough that you want to look up. Discovery via a URI template that I have to tell you about relies on out-of-band information, and contradicts the RESTful tenet of “hypertext as the engine of application state”.
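To make the out-of-band step concrete, here is a minimal Python sketch of what a client has to do to expand that template by hand. The template is the one given above. The function name borough_uri and the exact hyphenation rule (any run of non-alphanumeric characters becomes one hyphen) are assumptions, since all the template says is "hyphenated and in lowercase".

```python
import re

# The URI template quoted in the post.
URI_TEMPLATE = "http://www.jenitennison.com/data/id/london-borough/{name}"


def borough_uri(borough_name):
    """Expand the URI template: lowercase the name and hyphenate it."""
    # Assumption: any run of characters other than letters and digits
    # becomes a single hyphen, and leading/trailing hyphens are dropped.
    slug = re.sub(r"[^a-z0-9]+", "-", borough_name.lower()).strip("-")
    return URI_TEMPLATE.format(name=slug)


print(borough_uri("Tower Hamlets"))
```

Every client would need to reimplement this rule and guess at the edge cases, which is part of why a linked entry point is preferable.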

Instead, I need to offer an entry point from which you can follow links (or fill in forms) to discover information about the various London Boroughs. Since I’m dealing with a small set of information here, I’m going to do this in the straightforward way of having http://www.jenitennison.com/data/london-borough contain a brief description of each of the known London Boroughs, including (obviously) a link to the URI for the London Borough, from which you can get more information.

Publishing Linked Data on the Talis Platform, Part 2

In my last post, I showed how to add data to a Talis store. In this post, I’m going to show how you can use the Talis Platform as a back end for a Linked Data view on the RDF you added to it.

As you’ll see, the great thing about this method is that it only takes a couple of PHP files and an .htaccess file on a server. Assuming that you’ve got a web server that supports PHP, it’s an approach you can use without installing anything. The code I’ve written is pretty generic and should be widely applicable; feel free to reuse and adapt it.

Publishing Linked Data on the Talis Platform

I was at OpenTech a couple of weekends ago, and heard a lot of great talks. I particularly enjoyed the one by Simon Willison in which he talked about the Guardian Data Blog. Essentially, the data collected by the journalists at the Guardian, that form the basis of their pretty visualisations and so forth, gets published in Google Spreadsheets.

Looking through the data blog today, I saw that the Greater London Authority have similarly released their data using Google Spreadsheets.

Now Google Spreadsheets are just fine — they’re easy for end-users to use and it’s not hard for data nerds to extract data from them. They have real advantages for publishing because they are quick and easy to set up.

But take a look through the page listing the tables of data and you can see that many of them are about the same areas. The Guardian Data Blog have actually created a new spreadsheet that pulls together that information. Even with the aggregated data, in Google Spreadsheets there’s no way to address the data held in each table about Sutton (say).

Now, a few months ago, Talis announced the Talis Connected Commons, which enables anyone to publish public domain data using the Talis Platform for free. It turns out that it’s really easy to publish addressable data using the Talis Platform as a host.
