Those readers who have followed the TAG or public-lod mailing lists over the last couple of weeks cannot have failed to notice a large number of posts on a theme that recurs on roughly a nine-monthly cycle within these communities: httpRange-14.
The reason for this particular recurrence was a Call for Change Proposals on the resolution. The TAG meets on Monday, and discussion of this issue is one of the first items on our agenda. These are my thoughts going in to that discussion.
If you’ve hung around in linked data circles for any amount of time, you’ll probably have come across the httpRange-14 issue. This was an issue placed before the W3C TAG years and years ago which has become a permathread on semantic web and linked data mailing lists. The basic question (or my interpretation of it) is:
Given that URIs can sometimes be used to name things that aren’t on the web (eg the novel Moby Dick) and sometimes things that are (eg the Wikipedia page about Moby Dick), how can you tell, for a given URI, how it’s being used so that you can work out what a statement (say, about its author) means?
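For context, the existing resolution (the one the Call for Change Proposals is about) answers the question through HTTP status codes: a 2xx response means the URI identifies an information resource, while a 303 redirect means it might identify anything at all. Here's a minimal Python sketch of that rule of thumb; the URI and the HTTP-only handling are purely illustrative:

```python
# Rule-of-thumb reading of the httpRange-14 resolution, as a sketch:
#   2xx response  -> the URI identifies an information resource (a web document)
#   303 See Other -> the URI may identify anything at all; the redirect target
#                    is where to look for a description of it
# HTTP-only and no redirect following; the URI below is purely illustrative.
import http.client
from urllib.parse import urlparse

def kind_of_thing(uri):
    parts = urlparse(uri)
    connection = http.client.HTTPConnection(parts.netloc)
    connection.request("HEAD", parts.path or "/")
    status = connection.getresponse().status
    connection.close()
    if 200 <= status < 300:
        return "an information resource, e.g. a web page"
    if status == 303:
        return "possibly something that isn't on the web; follow the redirect for a description"
    return "no conclusion to draw from a %d response" % status

print(kind_of_thing("http://dbpedia.org/resource/Moby-Dick"))
```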
As you may know, I accepted an appointment to the W3C’s Technical Architecture Group earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the Stata Center at MIT. As you can tell from the agenda (which was in fact revised as we went along) it was a packed three days.
What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. In no way should any of what I write here be judged as revealing the official opinion of the TAG; it's just me saying what I think. And I'm not going to go into anything in depth, because these are all incredibly gnarly and contentious topics and I'd not only be here all year but also end up in a tar pit.
My previous post talked about how to install 4store as a triplestore and use the Ruby library RDF.rb to process RDF retrieved from that store. This was a response to Richard Pope's Linked Data/RDF/SPARQL Documentation Challenge, which asks for documentation of how to install a triplestore, load data into it, retrieve data from it using SPARQL, and access the results as native structures using Ruby, Python or PHP.
I quite enjoyed writing the last one, so I thought I’d try again. As before, I am on Mac OS X, but this time I’m going to use Python, which I have not programmed in before. I like a challenge. You might not like the results!
Updated to include some of Arto Bendiken's recommendations.
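To give a flavour of where this is heading, here's a rough sketch of the load-and-query steps in Python, assuming a local 4store HTTP server listening on port 8080; the endpoint paths, the graph name and the scrap of Turtle are illustrative assumptions rather than anything 4store mandates:

```python
# A minimal sketch of the challenge steps against a local 4store HTTP server.
# Assumptions: the server (e.g. started with `4s-httpd <kbname>`) is on port 8080,
# accepts Turtle PUT at /data/<graph-uri>, and answers SPARQL at /sparql/.
import json
import urllib.parse
import urllib.request

STORE = "http://localhost:8080"
GRAPH = "http://example.org/books"   # illustrative named graph

# Load some data into the store by PUTting Turtle at /data/<graph-uri>
turtle = """
@prefix dct: <http://purl.org/dc/terms/> .
<http://example.org/books/moby-dick> dct:title "Moby Dick" ;
    dct:creator "Herman Melville" .
"""
put = urllib.request.Request(
    STORE + "/data/" + urllib.parse.quote(GRAPH, safe=""),
    data=turtle.encode("utf-8"),
    headers={"Content-Type": "application/x-turtle"},
    method="PUT",
)
urllib.request.urlopen(put)

# Query it back out with SPARQL, asking for JSON results
query = "SELECT ?book ?title WHERE { ?book <http://purl.org/dc/terms/title> ?title }"
get = urllib.request.Request(
    STORE + "/sparql/?" + urllib.parse.urlencode({"query": query}),
    headers={"Accept": "application/sparql-results+json"},
)
response = urllib.request.urlopen(get)

# Convert the SPARQL JSON results into a plain Python dictionary
results = json.loads(response.read().decode("utf-8"))
titles = {
    row["book"]["value"]: row["title"]["value"]
    for row in results["results"]["bindings"]
}
print(titles)
```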
This post is a response to Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge. In it, he asks for documentation of the following steps:
- Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.
- Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
- Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
- Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
- Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
I’ve been told so many times how RDF sucks for mainstream developers that it was the main point of my TPAC talk late last year. I think that this is a great motivating challenge for improving not only the documentation of how to use RDF stores and libraries but also their general installability and usability for developers.
Anyway, I thought I’d try to get as far as I could to see just how bad things really are. I am on Mac OS X, and I’m going to use Ruby (although I don’t really know it all that well, so please forgive my mistakes). I’ll breeze on through as if everything is hunky dory, but there are some caveats at the end.
A couple of weeks ago I did a talk at the TPAC Plenary Day about why RDF hasn’t had the uptake that it might have, and what could be done about it.
I felt quite uncomfortable about doing this for many reasons. The predominant one is that I’m well aware that the world is made by the people who turn up. It is far far easier to snipe from the sidelines than it is to put in the effort to attend telcons and face-to-face meetings, to engage on mailing lists, to write specifications and implementations and tutorials.
On the other hand, what I hope is that the perspective of someone who is outside that process, someone who tries to understand and interpret and use the results of that process, might be valuable. And so I aimed to provide that honestly.
In that spirit, I’m going to put my stake in the ground and say that there are three areas where I think W3C should be concentrating its efforts:
and that it should specifically not put its efforts into standardising another syntax for RDF based on JSON.
When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:
and, of course:
One of the biggest selling points of linked data is that it’s supposed to facilitate web-scale distributed publication of data. Just as with the human web, anyone can publish data at their local site without having to go through any kind of central authority.
Just as with the human web, convergence on particular sets of URIs for particular kinds of things can happen in an evolutionary way: in a blog post I might point to Amazon when I want to talk about a particular book, Wikipedia to define the concepts I mention, people’s blogs or twitter streams when I mention them.
And with everyone using the same terms to talk about the same things, there’s the prospect of being able to easily pull together information from completely different sources to find connections and patterns that we’d never have found otherwise.
What’s been very unclear to me is how this distributed publication of data can be married with the use of SPARQL for querying. After all, SPARQL doesn’t (in its present form) support federated search, so to use SPARQL over all this distributed linked data, it sounds like you really need a central triplestore that contains everything you might want to query.
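To make that tension concrete: at small scale the “central store” can simply be a local graph that you fill by fetching the documents you care about and then query in one place. A hedged sketch using rdflib (the library and the DBpedia URIs are illustrative choices of mine, not something the pattern requires):

```python
# One crude way to get "everything you might want to query" into one place:
# fetch a handful of independently published linked-data documents into a
# single local graph and run the SPARQL there.
from rdflib import Graph

sources = [
    "http://dbpedia.org/data/Moby-Dick.rdf",
    "http://dbpedia.org/data/Herman_Melville.rdf",
]

graph = Graph()
for source in sources:
    graph.parse(source)   # each document is published at its own site...

# ...but the query only works because we've centralised the triples locally
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?work WHERE {
  ?work dbo:author <http://dbpedia.org/resource/Herman_Melville> .
}
"""
for row in graph.query(query):
    print(row.work)
```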
This post is an attempt to explore this tension, between distributed publication and centralised query, and to try to find a pattern that we might use within the UK government (and potentially more widely, of course) to publish and expose linked data in a queryable way. It’s a bit sketchy, and I’d welcome comments.
As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as XML and a text-based format, that are supported by existing tool chains.
In contrast, if there is existing RDF in that domain area, it’s usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user than on those of the professionals in the area.
As you probably know, I’ve been working quite a lot recently on the UK government’s use of linked data, and in particular on providing guidance for people who want to publish their data as linked data. One of the things that we need to provide guidance about is how to publish linked data that changes over time. I’ve touched on this topic before but things have progressed now to the stage where we have to make some real, practical, recommendations.