Getting Started with RDF and SPARQL Using Sesame and Python

Jan 25, 2011

My previous post talked about how to install 4store as a triplestore and use the Ruby library RDF.rb to process RDF extracted from that store. This was a response to Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge, which asks for documentation of how to install a triplestore, load data into it, retrieve it using SPARQL and access the results as native structures using Ruby, Python or PHP.

I quite enjoyed writing the last one, so I thought I’d try again. As before, I am on Mac OS X, but this time I’m going to use Python, which I have not programmed in before. I like a challenge. You might not like the results!

Getting Started with RDF and SPARQL Using 4store and RDF.rb

Jan 15, 2011

Updated to include some of Arto Bendiken’s recommendations.

This post is a response to Richard Pope’s Linked Data/RDF/SPARQL Documentation Challenge. In it, he asks for documentation of the following steps:

  • Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.
  • Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.
  • Programmatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.
  • Programmatically retrieve data from the datastore with SPARQL using either PHP, Ruby or Python.
  • Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).
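To make the last step concrete, here is a minimal sketch of turning a SPARQL result set into Python dictionaries. It assumes the store returns results in the W3C SPARQL Query Results JSON Format; the sample response and the helper name bindings_to_dicts are made up for illustration and do not come from any particular store or library.

```python
import json

# A hand-written sample in the SPARQL Query Results JSON Format.
# The data is invented for illustration, not fetched from a real store.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["person", "name"]},
  "results": {
    "bindings": [
      {
        "person": {"type": "uri", "value": "http://example.org/alice"},
        "name": {"type": "literal", "value": "Alice", "xml:lang": "en"}
      },
      {
        "person": {"type": "uri", "value": "http://example.org/bob"}
      }
    ]
  }
}
"""


def bindings_to_dicts(response_text):
    """Flatten each result row into a {variable: value} dictionary.

    Variables that are unbound in a row are mapped to None.
    (Hypothetical helper, not part of any library.)
    """
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


rows = bindings_to_dicts(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

This throws away the term type and language tag, which is fine for a quick script but probably not what you want if you need to tell URIs from literals.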

I’ve been told so many times how RDF sucks for mainstream developers that it was the main point of my TPAC talk late last year. I think that this is a great motivating challenge for improving not only the documentation of how to use RDF stores and libraries but also their general installability and usability for developers.

Anyway, I thought I’d try to get as far as I could to see just how bad things really are. I am on Mac OS X, and I’m going to use Ruby (although I don’t really know it all that well, so please forgive my mistakes). I’ll breeze on through as if everything is hunky dory, but there are some caveats at the end.

URL design

Jan 1, 2011

Kyle Neath’s post on URL design (go read it) reflects a lot of the thinking that we went through in the design of the legislation.gov.uk URIs and the linked data API as used within data.gov.uk.

I found the section about HTML5’s History interface particularly interesting. We haven’t started using AJAX within legislation.gov.uk yet, but when we do, we will want to ensure that the different views these pages provide have distinct URIs, so that they remain bookmarkable and sharable. This is progressive enhancement applied to web applications at a deeper level than CSS and JavaScript.

There are a couple of additional things that I think are worth drawing out.

Standardising an RDF API

Dec 4, 2010

I got a little bit of pushback on my previous blog post for suggesting that W3C should standardise an API for RDF. (I’m talking here about a programming-interface-kind-of-API to enable developers to extract information out of an RDF document rather than a website-API to enable them to access RDF data in the first place.)

I just wanted to talk about a couple of actual real-life scenarios that make me want a standard RDF API:

  1. TimBL wants an RDFa parser for Tabulator. There are a few RDFa parsers in JavaScript; he chooses to use rdfQuery’s. Tabulator works on top of its own datastore, which has its own interface for inserting data. rdfQuery’s RDFa parser works on top of its own datastore, which has a different interface for inserting data. To use rdfQuery, TimBL has to either rewrite some of rdfQuery’s internal code to call the methods that insert data into Tabulator’s datastore, or rewrite some of Tabulator’s internal code to call the methods that query rdfQuery’s datastore. The lack of a standard API for RDF has made it harder for TimBL to reuse my code.

  2. I’m working on Puelia, which needs to both parse and generate RDF in various ways and uses Moriarty to do so. I am editing the code to create triples in an in-memory RDF graph. I want to add a triple with a literal value. I have no idea how to do so, because I haven’t used Moriarty before, so I have to hunt through its documentation to find the add_literal_triple() function. The lack of a standard API for RDF has made it harder for me to use the library. If I ever wanted to switch to using some other PHP RDF library, such as EasyRDF or Graphite, for whatever reason, I would have to rewrite substantial parts of Puelia to use the functions provided by that library. The lack of a standard API for RDF has made Puelia less modular and adaptable.
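The first scenario can be sketched in a few lines. This is only an illustration of the idea, under the assumption that "a standard API" boils down to an agreed method for adding a triple. TripleSink, ListStore, IndexedStore and parse_lines are all hypothetical names; none of them corresponds to Tabulator, rdfQuery or any W3C draft.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Triple = Tuple[str, str, str]


class TripleSink(Protocol):
    """Hypothetical shared interface: anything that can accept a triple."""

    def add(self, subject: str, predicate: str, obj: str) -> None: ...


class ListStore:
    """Stand-in for one library's datastore (keeps a flat list)."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))


class IndexedStore:
    """Stand-in for another library's datastore (indexes by subject)."""

    def __init__(self) -> None:
        self.by_subject: Dict[str, List[Tuple[str, str]]] = {}

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.by_subject.setdefault(subject, []).append((predicate, obj))


def parse_lines(lines: Iterable[str], sink: TripleSink) -> int:
    """Toy 'parser': one whitespace-separated triple per line.

    It only knows about the shared add() method, so it can feed
    either store without being rewritten.
    """
    count = 0
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3:
            sink.add(parts[0], parts[1], parts[2])
            count += 1
    return count


data = ["_:puelia rdfs:label Puelia", "_:puelia rdf:type ex:Project"]
list_store = ListStore()
indexed_store = IndexedStore()
parsed_into_list = parse_lines(data, list_store)
parsed_into_index = parse_lines(data, indexed_store)
print(list_store.triples)
print(indexed_store.by_subject)
```

The parser here never changes; only the store passed to it does. That is the kind of pluggability that the two scenarios above are missing.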

For all that the W3C XML DOM seems to be universally reviled as an API for querying and creating XML, it and SAX mean that people can write XSLT and XProc processors (etc) without writing their own XML parser. They mean that whatever programming language I find myself writing code in, I know that I’ll be able to use getElementsByTagName() to get hold of elements with a particular name. They mean that XML parsers have a reason to improve over time, because applications can easily switch to better parsers when they come along. DOM and SAX provide a foundation, a level of standardisation and pluggability, that improves the XML landscape as a whole.
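As a small illustration of that portability, here is the same DOM method in Python, using the standard library's xml.dom.minidom module. The XML document is invented for the example.

```python
from xml.dom.minidom import parseString

# A made-up document; any well-formed XML would do.
doc = parseString(
    "<library>"
    "<book><title>A</title></book>"
    "<book><title>B</title></book>"
    "</library>"
)

# getElementsByTagName() is the same method name the W3C DOM defines,
# so this line would look familiar in most other DOM implementations.
titles = [node.firstChild.data for node in doc.getElementsByTagName("title")]
print(titles)
```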

Of course sometimes components need tighter integration in order to achieve performance benefits; that’s a modularity/performance judgement on the part of the developer of the application. And of course there are better object model APIs for XML than the W3C XML DOM around. But better APIs are almost always programming-language or library specific; they are better simply because cross-platform APIs like DOM and SAX cannot take full advantage of the idioms of a particular programming language or style.

Now regarding the W3C’s involvement in creating such a standard, the argument seems to be “W3C created the horror that is the XML DOM and therefore every API specification that comes out of the W3C will be horrendous”.

I think sometimes that W3C is seen as a kind of monolithic organisation that exists over there, with secret committees whose work takes place out of the public eye until they deign to let us mere mortals read the results of their machinations. And who then fend off all comments and criticism in order to protect their lovingly crafted (but completely impractical) specifications.

What this overlooks is that the standards organisation merely provides the framework and administrative support within which groups who are interested in creating a standard can come together. The existing RDFa Working Group’s meetings are documented and discussion takes place in public and is open to all. I’m sure this will continue in the RDF Core Working Group when it is set up.

It will happen anyway. There is already work going on with the W3C to create a standard RDFa API, out of which, so I am told, will arise a Working Draft of an RDF API. From the looks of the most recent Working Draft I will be able to add a literal triple to a DataStore using something like

$store->add(
  $store->createTriple(
    $store->createBlankNode('puelia'),
    $store->createIRI('http://www.w3.org/2000/01/rdf-schema#label'),
    $store->createPlainLiteral('Puelia', 'en')
  )
);

(compared to

$graph->add_literal_triple('_:puelia', 'http://www.w3.org/2000/01/rdf-schema#label', 'Puelia', 'en');

in Moriarty). So OK, it needs a bit of work. But these are early days, and from the looks of the editor’s draft it’s likely to change quite rapidly.

W3C’s standardisation is what we make it; wherever it is done, it is a self-fulfilling prophecy that an API will not be suited to its purpose if the people who would benefit from implementing and using that API don’t get involved in its design. And to be clear, I am talking to myself more than anyone.

Priorities for RDF

Nov 28, 2010

A couple of weeks ago I did a talk at the TPAC Plenary Day about why RDF hasn’t had the uptake that it might and what could be done about it.

I felt quite uncomfortable about doing this for many reasons. The predominant one is that I’m well aware that the world is made by the people who turn up. It is far, far easier to snipe from the sidelines than it is to put in the effort to attend telcons and face-to-face meetings, to engage on mailing lists, to write specifications and implementations and tutorials.

On the other hand, what I hope is that the perspective of someone who is outside that process, someone who tries to understand and interpret and use the results of that process, might be valuable. And so I aimed to provide that honestly.

In that spirit, I’m going to put my stake in the ground and say that there are three areas where I think W3C should be concentrating its efforts:

  1. standardising (something like) TriG – Turtle plus named graphs
  2. standardising an API for the RDF data model
  3. standardising a path language for RDF that can be used by that API and others for easy access

and that it should specifically not put its efforts into standardising another syntax for RDF based on JSON.
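To give a rough feel for how the first and third items might fit together, here is a toy sketch: triples tagged with a named graph (so, quads), plus a function that follows a chain of predicates. The "/" path syntax, the follow function and the ex: data are all assumptions made for illustration. This is not TriG and not any proposed path language.

```python
from typing import List, Optional, Set, Tuple

# (graph, subject, predicate, object); the data is invented.
Quad = Tuple[str, str, str, str]

QUADS: List[Quad] = [
    ("ex:graph1", "ex:puelia", "ex:maintainer", "ex:alice"),
    ("ex:graph1", "ex:alice", "ex:name", "Alice"),
    ("ex:graph2", "ex:alice", "ex:name", "Alice B."),
]


def follow(quads: List[Quad], start: str, path: str,
           graph: Optional[str] = None) -> Set[str]:
    """Follow a '/'-separated chain of predicates from a start node.

    If graph is given, only quads in that named graph are considered.
    (Hypothetical helper with a made-up path syntax.)
    """
    current = {start}
    for predicate in path.split("/"):
        current = {
            o
            for (g, s, p, o) in quads
            if s in current and p == predicate and (graph is None or g == graph)
        }
    return current


all_names = follow(QUADS, "ex:puelia", "ex:maintainer/ex:name")
graph1_names = follow(QUADS, "ex:puelia", "ex:maintainer/ex:name", graph="ex:graph1")
print(sorted(all_names))
print(sorted(graph1_names))
```

The point is only that a short path expression replaces a hand-written loop over triples, and that named graphs give you a way to say whose statements you want to follow.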