datagovuk

Government Should Do its Own Data Homework

I’ve been reflecting a little since OpenTech on the relationship between the developer community and government.

Let me set out my perspective first. My goal is to help ensure that the public sector publishes reusable data in the long term.

To do that, data publication needs to be sustainable. It needs to be embedded within the day-to-day activity of the public sector, something that seems as natural as the generation of PDF reports seems today. It also needs to be useful. It needs to be easy for anyone to understand and reuse the data, with minimal effort. It cannot be the case, long term, that you need to be an expert hacker to reuse government data.

To get there, we need to work towards a virtuous cycle in which the public sector is rewarded for publishing useful data well. The reward may come from financial savings, from increasing data quality, from better delivery of its remit, or simply from kudos. It doesn’t matter how, but there needs to be some reward, or it just won’t happen.

Over the last few years, government has had to be persuaded that it’s a good idea to release its data at all. The message from the developer community has been “give us your data and we’ll show you what we can do with it!” Through hack days and various similar activities, developers have excited, wowed and dazzled officials and politicians, opening their eyes to what could be done. Through sustained argument and political pressure, developers have set out the economic and moral case that releasing data not only could, but should happen.

They have been incredibly successful. We have data.gov.uk, open data from Ordnance Survey, strong commitments to open data within the Coalition Agreement, and the Public Sector Transparency Board who are now applying that pressure, with authority, at the heart of government.

My perception is that the argument that government should open up its data has basically been won. The questions within the public sector are now about how, not whether. And as a result, in this changed environment, I’m growing slightly uneasy about the core developer message of “give us your data and we’ll show you what we can do with it!”

There are two things about that message that concern me. First, it implies government is doing it all wrong. Second, it implies that government doesn’t need to do any better, because the developer community can take up all the slack and fill in all the gaps. It’s like getting fed up with a child struggling with their homework, and saying “oh, just give it here and I’ll do it!” It’s a narrative that simultaneously undermines the best efforts of those within government and removes from them the motivation and opportunity to learn to do better.

Hosting Gridworks Instances

I’ve written previously about how wonderful Freebase Gridworks (shortly to be “Google Refine”) is for cleaning and converting data. Within the UK public sector, there are two big barriers to its use, however:

  1. Public sector workers typically can’t install software on their computers.
  2. They’re also typically stuck with IE7 (or even, if they’re really unlucky, IE6).

On Standards

I’m beginning to think that ‘to recommend’ is an irregular verb like those that appeared every so often in Yes, Minister:

Bernard: It’s one of those irregular verbs, isn’t it: I have an independent mind; you are an eccentric; he is round the twist.

Something like: I recommend, you tell people what to do, he engages in premature standardisation.

Using Freebase Gridworks to Create Linked Data

When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:

  • how do we choose what URIs to use for things?
  • how do we choose what vocabularies to use?
  • how do we handle changing data?
  • how do we tell people how the data was created?
  • how do we publish it?
  • how will other people know about it?

and, of course:

  • how do we create it?

Why Linked Data for data.gov.uk?

data.gov.uk was finally launched to the public last week (still in beta, but a more public beta than the one it’s been in for the last few months). It’s a great step forward, and everyone involved should be proud of both the amount of data that’s been made available and the website itself, which (unlike a lot of UK government IT) was developed rapidly and at low cost by a small team using open source software.

This is a first step on a long road.
