Government Should Do its Own Data Homework

Sep 26, 2010

I’ve been reflecting a little since OpenTech on the relationship between the developer community and government.

Let me set out my perspective first. My goal is to help ensure that the public sector publishes reusable data in the long term.

To do that, data publication needs to be sustainable. It needs to be embedded within the day-to-day activity of the public sector, something that seems as natural as the generation of PDF reports seems today. It also needs to be useful. It needs to be easy for anyone to understand and reuse the data, with minimal effort. It cannot be the case, long term, that you need to be an expert hacker to reuse government data.

To get there, we need to work towards a virtuous cycle in which the public sector is rewarded for publishing useful data well. The reward may come from financial savings, from increasing data quality, from better delivery of its remit, or simply from kudos. It doesn’t matter how, but there needs to be some reward, or it just won’t happen.

Over the last few years, government has had to be persuaded that it’s a good idea to release its data at all. The message from the developer community has been “give us your data and we’ll show you what we can do with it!” Through hack days and various similar activities, developers have excited, wowed and dazzled officials and politicians, opening their eyes to what could be done. Through sustained argument and political pressure, developers have set out the economic and moral case that releasing data not only could, but should happen.

They have been incredibly successful. We have open data from Ordnance Survey, strong commitments to open data within the Coalition Agreement, and the Public Sector Transparency Board who are now applying that pressure, with authority, at the heart of government.

My perception is that the argument that government should open up its data has basically been won. The questions within the public sector are now about how, not whether. And as a result, in this changed environment, I’m growing slightly uneasy about the core developer message of “give us your data and we’ll show you what we can do with it!”

There are two things about that message that concern me. First, it implies government is doing it all wrong. Second, it implies that government doesn’t need to do any better, because the developer community can take up all the slack and fill in all the gaps. It’s like getting fed up with a child struggling with their homework, and saying “oh, just give it here and I’ll do it!” It’s a narrative that simultaneously undermines the best efforts of those within government and removes from them the motivation and opportunity to learn to do better.

On Standards

Sep 19, 2010

I’m beginning to think that ‘to recommend’ is an irregular verb like those that appeared every so often in Yes, Minister:

Bernard: It’s one of those irregular verbs, isn’t it: I have an independent mind; you are an eccentric; he is round the twist.

Something like: I recommend, you tell people what to do, he engages in premature standardisation.

Hosting Gridworks Instances

Sep 19, 2010

I’ve written previously about how wonderful Freebase Gridworks (shortly to be “Google Refine”) is for cleaning and converting data. Within the UK public sector, there are two big barriers to its use, however:

  1. Public sector workers typically can’t install software on their computers.
  2. They’re also typically stuck with IE7 (or even, if they’re really unlucky, IE6).

We’ve got around the first of these issues by installing Gridworks as a hosted (password-protected) instance on Now, this isn’t perfect of course: Gridworks wasn’t designed to be used as a shared instance, so it doesn’t have support for multiple users operating on the same project at the same time, let alone things like user accounts or access control. So we’re operating on trust here – hoping that people won’t delete or edit each other’s projects – but it’s worth the risk.

It’s also not particularly pretty in that the links that Gridworks uses all assume that it’s running at the root of a web server. Fortunately, doesn’t need to have a home page, so it’s possible to have Gridworks available at the root (although in hope of something better in the future, I’ve made the main point of entry /gridworks).

I got this working by installing Gridworks normally on the server and using Apache as a proxy, with the following configuration:

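# Gridworks support
# Needs mod_rewrite, mod_proxy and mod_proxy_http to be loaded
RewriteEngine On
# Send the bare root to the /gridworks entry point
RewriteRule "^/$" "/gridworks" [R,L]
# Strip the /gridworks prefix and proxy to the local Gridworks instance
RewriteRule "^/gridworks(.*)$" "http://localhost:3333$1" [P,L]
# Gridworks links assume the server root, so proxy every other path too
RewriteRule "^/(.*)$" "http://localhost:3333/$1" [P,L]
ProxyPass /gridworks/ http://localhost:3333/
ProxyPassReverse /gridworks/ http://localhost:3333/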

That’s it.
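
For anyone wanting to check how those rules route a request before touching a live server, here is a minimal Python sketch that mirrors the three RewriteRule patterns above. It is only an approximation: the route function and the sample paths are hypothetical names made up for illustration, and it ignores query strings and the ProxyPass lines, so treat it as a way of reasoning about the patterns rather than a model of Apache itself.

```python
import re

# Same backend address as in the Apache configuration above
BACKEND = "http://localhost:3333"

# (pattern, substitution, action), in the same order as the RewriteRule lines
RULES = [
    (re.compile(r"^/$"), "/gridworks", "redirect"),
    (re.compile(r"^/gridworks(.*)$"), BACKEND + r"\1", "proxy"),
    (re.compile(r"^/(.*)$"), BACKEND + r"/\1", "proxy"),
]


def route(path):
    """Return (action, target) for the first matching rule.

    Stopping at the first match imitates the [L] flag; this is a
    hypothetical helper, not part of Apache or Gridworks.
    """
    for pattern, substitution, action in RULES:
        match = pattern.match(path)
        if match:
            return action, match.expand(substitution)
    return None


if __name__ == "__main__":
    # Sample paths are invented for illustration only
    for path in ["/", "/gridworks", "/gridworks/some/page", "/styles/main.css"]:
        print(path, "->", route(path))
```

Running it shows the root being redirected to /gridworks, paths under /gridworks being proxied with the prefix removed, and everything else being proxied unchanged, which is what lets Gridworks keep assuming it lives at the root.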

The IE7 problem will take a bit longer to solve, I imagine.

Using Freebase Gridworks to Create Linked Data

Aug 22, 2010

When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:

  • how do we choose what URIs to use for things?
  • how do we choose what vocabularies to use?
  • how do we handle changing data?
  • how do we tell people how the data was created?
  • how do we publish it?
  • how will other people know about it?

and, of course:

  • how do we create it?

Credit Where it's Due

Aug 14, 2010

I’m aware I’ve been quiet for the past few months. This isn’t because nothing interesting has been going on – rather the opposite. It’s been difficult to get a chance to sit down and write about the work I’ve been doing, when actually doing the work has been taking up so much time.

Most of my time has been spent on the new website and its underlying API. There’s so much to say about this project that I hardly know where to start, so I’ll just try to do an overview and we can take it from there. Let me know what you’re interested in.