The Real Deal: data.gov.uk

Jul 26, 2009

I’m sure that you’ve noticed that my recent posts have been somewhat obsessed with publishing and using public sector information. It’s because I’ve somehow been sucked into the work going on within the UK government, with Tim Berners-Lee and Nigel Shadbolt advising, to publish its data as linked data.

My recent blog posts about publishing data using Talis have actually been a front for much more complex work that I’ve been doing with a different data set.

Opaque URIs != Unreadable URIs

Jul 25, 2009

I’ve been talking about URIs a lot recently. One of the things that has bothered me about some of the conversations is the conflation of the concepts of “opaque URIs” and “non-human-readable URIs”. This is my argument for keeping the concepts separate.

The opacity of URIs is an important axiom in web architecture. It states that web applications must not try to pick apart URIs in order to work out information from them. Applications must not, for example, use the fact that a URI has .html at the end to infer that it resolves to an HTML document. It’s closely related to hypertext as the engine of application state, in that web applications should not construct opaque URIs themselves either: the URIs must be discovered through links and the submission of forms.
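To make the .html point concrete, here is a minimal sketch in Python of what respecting opacity might look like on the client side. The function name and the sample headers are hypothetical; the idea is only that the media type comes from what the server says in its response, not from the shape of the URI.

```python
def media_type_from_headers(headers):
    """Return the media type declared in the response headers, or None.

    The URI itself is never inspected: a URI ending in .html may
    perfectly well resolve to RDF/XML, JSON or anything else.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    return None


# Hypothetical response headers for a URI that happens to end in ".html".
headers = {"Content-Type": "application/rdf+xml; charset=utf-8"}
print(media_type_from_headers(headers))  # application/rdf+xml
```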

But this has nothing to do with readability or hackability, both of which are extremely important for human users. Readable URIs help human users understand something about the resource that the URI is pointing to. Hackable URIs (by which I mean ones that people might manipulate by altering or removing portions of the path or query) enable human users to locate other resources that they might be interested in.
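As an illustration of what “hacking” a URI means, the following Python sketch lists the URIs a curious person might try by removing trailing path segments one at a time. The function name and the example.org URI are made up for the example, and nothing guarantees that any of the candidates resolve. It models what a human does in the address bar; it is not something a well-behaved application should rely on.

```python
from urllib.parse import urlsplit, urlunsplit


def hack_upwards(uri):
    """List the URIs obtained by repeatedly removing the last path segment."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()
        path = "/" + "/".join(segments)
        # Query and fragment are dropped, as a person trimming a URI would.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for candidate in hack_upwards("http://example.org/data/london-borough/camden"):
    print(candidate)
# http://example.org/data/london-borough
# http://example.org/data
# http://example.org/
```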

Creating Google Visualisations of Linked Data

Jul 23, 2009

Update: For the people who couldn’t read the post because the graph didn’t have 0 as its x-axis minimum, here is the version of the graph that does. I haven’t removed the other version, since doing so would make the comments confusing and I think it’s interesting to compare the two.

London Borough Life Expectancy Bar Chart with Y-Axis Minimum at 0

There are idealists who immediately see the publication of Open Data as a Good Thing, and leap up and down (metaphorically or physically) shouting “Raw Data Now”. There are also a whole bunch of people who need to “see the shiny”. They need to understand why publishing Open Data is a Good Thing, and most particularly what the benefit is going to be to them.

This is understandable. Publishers bear the cost of the development of URI schemes, XML formats, RDF ontologies, and other infrastructure for serving data, and the ongoing maintenance cost of domain resolution, bandwidth usage and user support. Even publishers with a public-service remit (who may not need to see monetary payback) need to be convinced that there will be some kind of return on the investment.

One result of making data available is that it enables you and others to easily construct nice visualisations over the data, and maybe spot useful patterns within it. This is particularly useful for public sector information because it can provide feedback on how effective a particular policy has been or where more resources need to be spent.

So I thought it would be worthwhile trying to explore how to create visualisations of some data, starting with the London Borough data that I’ve published using Talis.

Versioning URIs

Jul 22, 2009

Yesterday I went along to a workshop on developing URI guidelines for the UK public sector. Because of the current drive to get more UK public sector information online, and the fact that we have Tim Berners-Lee on board, there’s a growing recognition of the fact that we need URIs for the real-world and conceptual things that we talk about in the public sector: schools, roads, hospitals, services, councils, and so on.

One of the particular points of contention at the meeting was whether or not URIs for non-information resources (i.e. for real-world and conceptual things) should contain dates or version numbers.

Publishing Linked Data on the Talis Platform, Part 3

Jul 21, 2009

This is the third in a series of posts about using the Talis Platform as a back end for serving linked data. In the first part, I showed how to add data to a store. In the second post, I showed how to use some PHP scripts to publish the data as Linked Data, at the URLs you use as your identifiers.

In this post, I’m going to begin the process of exposing the data in a way that makes it easy to locate and reuse. One of the biggest lessons I learned after the initial publication of the London Gazette data as RDFa is that the publication of data and metadata about individual items is not enough. To make the data usable, you have to make it discoverable. To make it discoverable there must be an entry point from which you can locate the data. One kind of easy entry point is a list.

In the case of the data about London Boroughs that I’ve been using, there aren’t currently any links to the data, so there is no way to discover it aside from me telling you the URI template (http://www.jenitennison.com/data/id/london-borough/{name}, where name is hyphenated and in lowercase) and you knowing the name of a London Borough that you want to look up. Discovering resources through a URI template that I have told you about relies on out-of-band information, which contradicts the RESTful tenet of “hypertext as the engine of application state”.
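For reference, expanding that template might look like the Python sketch below. The template is the one quoted above; the function name is hypothetical, and the exact slug rules are an assumption (lowercase the name and join words with hyphens; names containing apostrophes or other punctuation are not handled). The fact that a client has to know all of this in advance is exactly the out-of-band problem.

```python
# URI template quoted in the post.
TEMPLATE = "http://www.jenitennison.com/data/id/london-borough/{name}"


def borough_uri(borough_name):
    """Build a borough URI from its name.

    Assumption: "hyphenated and in lowercase" means lowercasing the name
    and replacing runs of whitespace with single hyphens.
    """
    slug = "-".join(borough_name.lower().split())
    return TEMPLATE.format(name=slug)


print(borough_uri("Tower Hamlets"))
# http://www.jenitennison.com/data/id/london-borough/tower-hamlets
```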

Instead, I need to offer an entry point from which you can follow links (or fill in forms) to discover information about the various London Boroughs. Since I’m dealing with a small set of information here, I’m going to do this in the straightforward way of having http://www.jenitennison.com/data/london-borough contain a brief description of each of the known London Boroughs, including (obviously) a link to the URI for the London Borough, from which you can get more information.
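From the client’s side, discovery through an entry point could be sketched as follows in Python. The sample markup is invented purely for illustration (the real listing might equally be served as RDF, and the class and function names are hypothetical); the point is that the client collects the borough URIs from links in the listing rather than constructing them.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover(listing_html):
    """Return the linked URIs found in an entry-point listing."""
    collector = LinkCollector()
    collector.feed(listing_html)
    return collector.links


# Invented sample of what the entry point might contain.
SAMPLE_LISTING = """
<ul>
  <li><a href="http://www.jenitennison.com/data/id/london-borough/camden">Camden</a></li>
  <li><a href="http://www.jenitennison.com/data/id/london-borough/hackney">Hackney</a></li>
</ul>
"""

for uri in discover(SAMPLE_LISTING):
    print(uri)
```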