Updated to include some of Arto Bendiken’s recommendations.
This post is a response to Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge. In it, he asks for documentation of the following steps:
- Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.
- Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
- Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
- Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
- Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
I’ve been told so many times how RDF sucks for mainstream developers that it was the main point of my TPAC talk late last year. I think that this is a great motivating challenge for improving not only the documentation of how to use RDF stores and libraries but also their general installability and usability for developers.
Anyway, I thought I’d try to get as far as I could to see just how bad things really are. I am on Mac OS X, and I’m going to use Ruby (although I don’t really know it all that well, so please forgive my mistakes). I’ll breeze on through as if everything is hunky dory, but there are some caveats at the end.
There are a couple of additional things that I think are worth drawing out.
I got a little bit of pushback on my previous blog post for suggesting that W3C should standardise an API for RDF. (I’m talking here about a programming-interface-kind-of-API to enable developers to extract information out of an RDF document rather than a website-API to enable them to access RDF data in the first place.)
I just wanted to talk about a couple of real-life scenarios that make me want a standard RDF API:
A couple of weeks ago I did a talk at the TPAC Plenary Day about why RDF hasn’t had the uptake that it might and what could be done about it.
I felt quite uncomfortable about doing this for many reasons. The predominant one is that I’m well aware that the world is made by the people who turn up. It is far far easier to snipe from the sidelines than it is to put in the effort to attend telcons and face-to-face meetings, to engage on mailing lists, to write specifications and implementations and tutorials.
On the other hand, what I hope is that the perspective of someone who is outside that process, someone who tries to understand and interpret and use the results of that process, might be valuable. And so I aimed to provide that honestly.
In that spirit, I’m going to put my stake in the ground and say that there are three areas where I think W3C should be concentrating its efforts, and that it should specifically not put its efforts into standardising another syntax for RDF based on JSON.
I’ve been reflecting a little since OpenTech on the relationship between the developer community and government.
Let me set out my perspective first. My goal is to help ensure that the public sector publishes reusable data in the long term.
To do that, data publication needs to be sustainable. It needs to be embedded within the day-to-day activity of the public sector, something that seems as natural as the generation of PDF reports seems today. It also needs to be useful. It needs to be easy for anyone to understand and reuse the data, with minimal effort. It cannot be the case, long term, that you need to be an expert hacker to reuse government data.
To get there, we need to work towards a virtuous cycle in which the public sector is rewarded for publishing useful data well. The reward may come from financial savings, from increasing data quality, from better delivery of its remit, or simply from kudos. It doesn’t matter how, but there needs to be some reward, or it just won’t happen.
Over the last few years, government has had to be persuaded that it’s a good idea to release its data at all. The message from the developer community has been “give us your data and we’ll show you what we can do with it!” Through hack days and various similar activities, developers have excited, wowed and dazzled officials and politicians, opening their eyes to what could be done. Through sustained argument and political pressure, developers have set out the economic and moral case that releasing data not only could, but should happen.
They have been incredibly successful. We have data.gov.uk, open data from Ordnance Survey, strong commitments to open data within the Coalition Agreement, and the Public Sector Transparency Board who are now applying that pressure, with authority, at the heart of government.
My perception is that the argument that government should open up its data has basically been won. The questions within the public sector are now about how, not whether. And as a result, in this changed environment, I’m growing slightly uneasy about the core developer message of “give us your data and we’ll show you what we can do with it!”
There are two things about that message that concern me. First, it implies government is doing it all wrong. Second, it implies that government doesn’t need to do any better, because the developer community can take up all the slack and fill in all the gaps. It’s like getting fed up with a child struggling with their homework, and saying “oh, just give it here and I’ll do it!” It’s a narrative that simultaneously undermines the best efforts of those within government and removes from them the motivation and opportunity to learn to do better.