Priorities for RDF

Nov 28, 2010

A couple of weeks ago I gave a talk at the TPAC Plenary Day about why RDF hasn’t had the uptake that it might and what could be done about it.

I felt quite uncomfortable about doing this for many reasons. The predominant one is that I’m well aware that the world is made by the people who turn up. It is far, far easier to snipe from the sidelines than it is to put in the effort to attend telcons and face-to-face meetings, to engage on mailing lists, to write specifications and implementations and tutorials.

On the other hand, what I hope is that the perspective of someone who is outside that process, someone who tries to understand and interpret and use the results of that process, might be valuable. And so I aimed to provide that honestly.

In that spirit, I’m going to put my stake in the ground and say that there are three areas where I think W3C should be concentrating its efforts:

  1. standardising (something like) TriG – Turtle plus named graphs
  2. standardising an API for the RDF data model
  3. standardising a path language for RDF that can be used by that API and others for easy access

and that it should specifically not put its efforts into standardising another syntax for RDF based on JSON.

Government Should Do its Own Data Homework

Sep 26, 2010

I’ve been reflecting a little since OpenTech on the relationship between the developer community and government.

Let me set out my perspective first. My goal is to help ensure that the public sector publishes reusable data in the long term.

To do that, data publication needs to be sustainable. It needs to be embedded within the day-to-day activity of the public sector, something that seems as natural as the generation of PDF reports seems today. It also needs to be useful. It needs to be easy for anyone to understand and reuse the data, with minimal effort. It cannot be the case, long term, that you need to be an expert hacker to reuse government data.

To get there, we need to work towards a virtuous cycle in which the public sector is rewarded for publishing useful data well. The reward may come from financial savings, from increasing data quality, from better delivery of its remit, or simply from kudos. It doesn’t matter how, but there needs to be some reward, or it just won’t happen.

Over the last few years, government has had to be persuaded that it’s a good idea to release its data at all. The message from the developer community has been “give us your data and we’ll show you what we can do with it!” Through hack days and various similar activities, developers have excited, wowed and dazzled officials and politicians, opening their eyes to what could be done. Through sustained argument and political pressure, developers have set out the economic and moral case that releasing data not only could, but should happen.

They have been incredibly successful. We have data.gov.uk, open data from Ordnance Survey, strong commitments to open data within the Coalition Agreement, and the Public Sector Transparency Board who are now applying that pressure, with authority, at the heart of government.

My perception is that the argument that government should open up its data has basically been won. The questions within the public sector are now about how, not whether. And as a result, in this changed environment, I’m growing slightly uneasy about the core developer message of “give us your data and we’ll show you what we can do with it!”

There are two things about that message that concern me. First, it implies government is doing it all wrong. Second, it implies that government doesn’t need to do any better, because the developer community can take up all the slack and fill in all the gaps. It’s like getting fed up with a child struggling with their homework, and saying “oh, just give it here and I’ll do it!” It’s a narrative that simultaneously undermines the best efforts of those within government and removes from them the motivation and opportunity to learn to do better.

On Standards

Sep 19, 2010

I’m beginning to think that ‘to recommend’ is an irregular verb like those that appeared every so often in Yes, Minister:

Bernard: It’s one of those irregular verbs, isn’t it: I have an independent mind; you are an eccentric; he is round the twist.

Something like: I recommend, you tell people what to do, he engages in premature standardisation.

Hosting Gridworks Instances

Sep 19, 2010

I’ve written previously about how wonderful Freebase Gridworks (shortly to be “Google Refine”) is for cleaning and converting data. Within the UK public sector, there are two big barriers to its use, however:

  1. Public sector workers typically can’t install software on their computers.
  2. They’re also typically stuck with IE7 (or even, if they’re really unlucky, IE6).

We’ve got around the first of these issues by installing Gridworks as a hosted (password-protected) instance on http://source.data.gov.uk/gridworks. Now, this isn’t perfect of course: Gridworks wasn’t designed to be used as a shared instance, so it doesn’t have support for multiple users operating on the same project at the same time, let alone things like user accounts or access control. So we’re operating on trust here – hoping that people won’t delete or edit each other’s projects – but it’s worth the risk.

It’s also not particularly pretty in that the links that Gridworks uses all assume that it’s running at the root of a web server. Fortunately, source.data.gov.uk doesn’t need to have a home page, so it’s possible to have Gridworks available at the root (although in hope of something better in the future, I’ve made the main point of entry /gridworks).

I got this working by installing Gridworks normally on the server and using Apache as a proxy, with the following configuration:

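# Gridworks support (needs mod_rewrite, mod_proxy and mod_proxy_http)
RewriteEngine On   # omit if already enabled for this virtual host
# Send the bare root to the /gridworks entry point
RewriteRule "^/$" "/gridworks" [R,L]
# Strip the /gridworks prefix and proxy to the local Gridworks instance
RewriteRule "^/gridworks(.*)$" "http://localhost:3333$1" [P,L]
# Gridworks links assume the server root, so proxy every other path too
RewriteRule "^/(.*)$" "http://localhost:3333/$1" [P,L]
ProxyPass /gridworks/ http://localhost:3333/
# Fix up Location headers in redirects coming back from Gridworks
ProxyPassReverse /gridworks/ http://localhost:3333/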

That’s it.
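
For anyone trying to follow what those three RewriteRule lines do to a request, here is a rough Python sketch of the routing they describe. It is only an approximation for illustration: the function name route is made up, it ignores query strings and the ProxyPass directives, and it is not how Apache evaluates rules internally.

```python
import re
from typing import Tuple

# Backend address taken from the Apache configuration above.
BACKEND = "http://localhost:3333"


def route(path: str) -> Tuple[str, str]:
    """Approximate which RewriteRule fires for a request path.

    Returns ("redirect", target) or ("proxy", backend_url).
    Hypothetical helper, for illustration only.
    """
    # Rule 1: the bare root is redirected to the /gridworks entry point.
    if re.fullmatch(r"/", path):
        return ("redirect", "/gridworks")

    # Rule 2: anything under /gridworks is proxied with the prefix stripped.
    m = re.match(r"^/gridworks(.*)$", path)
    if m:
        return ("proxy", BACKEND + m.group(1))

    # Rule 3: every other path is proxied unchanged, because Gridworks
    # generates links that assume it lives at the server root.
    m = re.match(r"^/(.*)$", path)
    if m:
        return ("proxy", BACKEND + "/" + m.group(1))

    return ("none", path)


if __name__ == "__main__":
    for p in ["/", "/gridworks", "/gridworks/scripts/x.js", "/scripts/x.js"]:
        print(p, "->", route(p))
```

The third rule is the reason the instance effectively owns the whole of source.data.gov.uk: a root-relative link such as /scripts/x.js ends up at the same backend URL as /gridworks/scripts/x.js.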

The IE7 problem will take a bit longer to solve, I imagine.

Using Freebase Gridworks to Create Linked Data

Aug 22, 2010

When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:

  • how do we choose what URIs to use for things?
  • how do we choose what vocabularies to use?
  • how do we handle changing data?
  • how do we tell people how the data was created?
  • how do we publish it?
  • how will other people know about it?

and, of course:

  • how do we create it?