This post was imported from my old Drupal blog. To see the full thing, including comments, it's best to visit the Internet Archive.
In my last post, I showed how to add data to a Talis store. In this post, I’m going to show how you can use the Talis Platform as a back end for a Linked Data view on the RDF you added to it.
As you’ll see, the great thing about this method is that it only takes a couple of PHP files and an .htaccess file on a server. Assuming that you’ve got a web server that supports PHP, it’s an approach you can use without installing anything. The code I’ve written is pretty generic and should be widely applicable; feel free to reuse and adapt it.
One of the principles of Linked Data is that if you make a GET request to a URI that’s used as an identifier within an RDF triple, you’ll get back some useful information about that resource. I’ve created URIs like http://www.jenitennison.com/data/id/london-borough/barnet and added triples to Talis about those resources, but I haven’t yet put anything in place so that requesting http://www.jenitennison.com/data/id/london-borough/barnet actually provides a useful response. So how do I do that?
Well, it’s easy enough with a bit of PHP to do the forwarding. (By the way, this is the first bit of PHP I’ve ever done, so feel free to point out all the glaring problems with it; I’d love to learn.)
Now, the URI http://www.jenitennison.com/data/id/london-borough/barnet is a URI that I’ve made up for a London Borough, and obviously when you request that URI you’re not actually going to get the London Borough delivered to you through your computer screen. Instead, based on Cool URIs for the Semantic Web, I want to respond with either a 303 See Other redirection to a document resource describing the borough, or a 404 Not Found to say that it doesn’t exist.
Note that I don’t just want to blindly respond with a 303 See Other. If someone requests http://www.jenitennison.com/data/id/london-borough/rubbish I want to tell them that the London Borough of ‘Rubbish’ doesn’t exist. If I redirected them to a document URI which then 404’ed, it would imply that the London Borough of ‘Rubbish’ exists, but that we have no information about it. So I can’t use a simple URL rewrite; I have to check for the resource’s presence first.
Existence Tests
The first task, then, is to test whether the resource exists. To do that, I can execute an ASK request on the SPARQL endpoint that Talis provides for the store. The ASK request simply looks like:
ASK { <http://www.jenitennison.com/data/id/london-borough/barnet> ?p ?v . }
which asks if there are any triples at all that have that URI as their subject. I request the JSON response using the output=json parameter. The JSON looks like:
{"head":{},"boolean":true}
if the store holds any triples about the borough and:
{"head":{},"boolean":false}
if it doesn’t. The URI for the request looks like:
http://api.talis.com/stores/rdfquery-dev1/services/sparql?query=ASK+%7B+%3Chttp%3A%2F%2Fwww.jenitennison.com%2Fdata%2Fid%2Flondon-borough%2Fbarnet%3E+%3Fp+%3Fv+.+%7D
which looks pretty horrendous when you write it out but is easy enough to construct with PHP. Here’s the exists() function, which does the test based on the server host name used in the request and a path that’s passed in.
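$store = 'rdfquery-dev1';

// Returns true if the store holds any triples about the resource whose
// identifier URI has the path $idUri (e.g. /data/id/london-borough/barnet).
function exists($idUri) {
    global $store;
    $host = $_SERVER['HTTP_HOST'];
    $id = "http://$host$idUri";
    $sparql = "ASK { <$id> ?p ?v . }";
    $params = array('query' => $sparql, 'output' => 'json');
    $query = http_build_query($params);
    $request = "http://api.talis.com/stores/$store/services/sparql?$query";
    // file_get_contents() takes a boolean use_include_path as its second
    // argument, not a mode string, so only the URL is passed here
    $resource = file_get_contents($request);
    if ($resource === false) {
        return false; // request failed: don't claim the resource exists
    }
    // Decode the JSON rather than searching the text for "false"
    $result = json_decode($resource, true);
    return isset($result['boolean']) && $result['boolean'] === true;
}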
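The two halves of that test, building the request URL and reading the JSON answer, can also be pulled apart so that each can be checked without touching the network. This is only a minimal sketch: askUrl() and askResult() are hypothetical names of my own, not part of utils.php, and the character check is an assumption about what is worth rejecting (the characters that SPARQL doesn't allow between < and >), so that an odd request path can't change the meaning of the query.

```php
<?php
// Hypothetical helpers, not part of the original utils.php.

// Build the URL of an ASK request about $id against a store's SPARQL service.
// Returns null if $id contains characters that SPARQL forbids inside <...>,
// so that a crafted request path cannot break out of the IRI in the query.
function askUrl($store, $id) {
    if (preg_match('/[<>"{}|^`\\\\\x00-\x20]/', $id)) {
        return null;
    }
    $sparql = "ASK { <$id> ?p ?v . }";
    // Pass '&' explicitly so the result doesn't depend on arg_separator.output
    $query = http_build_query(array('query' => $sparql, 'output' => 'json'), '', '&');
    return "http://api.talis.com/stores/$store/services/sparql?$query";
}

// Interpret the JSON body of an ASK response.
// Anything other than a literal {"boolean": true} counts as "doesn't exist".
function askResult($json) {
    $data = json_decode($json, true);
    return is_array($data) && isset($data['boolean']) && $data['boolean'] === true;
}
```

With these, askUrl('rdfquery-dev1', 'http://www.jenitennison.com/data/id/london-borough/barnet') gives the long request URI shown earlier with &output=json on the end, and askResult() turns the two JSON responses above into true and false respectively.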
Handling Identifier URIs
With that function in utils.php, it’s pretty easy to create an id.php that does the redirection that I need to do. For my purposes, I’m using /id/ in all the URIs that identify abstract resources, and removing it for the document URIs that describe them. So the URI for the abstract resource http://www.jenitennison.com/data/id/london-borough/barnet will redirect to the document resource http://www.jenitennison.com/data/london-borough/barnet. Here’s id.php:
<?php
include "utils.php";

$idUri = $_SERVER['REQUEST_URI'];
if (exists($idUri)) {
    $docUri = str_replace('/id/', '/', $idUri);
    header("Location: $docUri", true, 303);
} else {
    error(404);
}
?>
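Two details of that mapping are worth knowing about: str_replace() replaces every /id/ in the string, not just the first, and REQUEST_URI includes any query string on the request. Here is a minimal sketch of a stricter mapping, assuming a hypothetical helper called docUriFor() (my name, not something in utils.php) that only looks at the path and only removes the first /id/ segment.

```php
<?php
// Hypothetical helper, not part of the original id.php or utils.php.
// Maps the path of an identifier URI to the path of its document URI by
// removing the first "/id/" segment. Returns null if there isn't one.
function docUriFor($idUri) {
    // Drop any query string; only the path identifies the resource
    $path = parse_url($idUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $pos = strpos($path, '/id/');
    if ($pos === false) {
        return null;
    }
    // Replace just this one occurrence of "/id/" with "/"
    return substr_replace($path, '/', $pos, strlen('/id/'));
}
```

So docUriFor('/data/id/london-borough/barnet') gives /data/london-borough/barnet, and a path with a second /id/ further along keeps that second one intact.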
The error() function is also in utils.php and looks like:
function error() {
header("HTTP/1.1 404 Not Found");
echo <<<EOF
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>404 Not Found</h1>
<p>No such resource</p>
</body>
</html>
EOF;
}
I now have id.php, which checks for the presence of triples about the requested resource and responds with either a 404 Not Found or a 303 See Other. Now I need to invoke id.php whenever someone requests an identifier URI like http://www.jenitennison.com/data/id/london-borough/barnet. To do this, I put id.php in the /data directory within my webserver’s documents and added a .htaccess file that looks like:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^id/([^/]+)/(.+)$ id.php [L]
</IfModule>
This says that any requests that aren’t for existing files or directories and that start with id should be handed to id.php (an internal rewrite, not an HTTP redirect). Since id.php picks up on the original request URI, I don’t need to pass anything extra into it by way of query parameters and what have you.
To make this .htaccess file work, you have to have mod_rewrite enabled and have AllowOverride include FileInfo (in httpd.conf). My ISP allows this, but the Apache installation on my Mac doesn’t, and Apache generally doesn’t out of the box, so you may need to do a bit of fiddling with configuration files.
Now, requesting http://www.jenitennison.com/data/id/london-borough/barnet redirects me with a 303 See Other to http://www.jenitennison.com/data/london-borough/barnet, while requesting http://www.jenitennison.com/data/id/london-borough/rubbish gives me a 404 Not Found response.
Handling Document URIs
The next stage is supporting the document URIs like http://www.jenitennison.com/data/london-borough/barnet. For them, I need to actually get the data about the resource out of the Talis Platform. Fortunately, there’s a really easy way of doing that using a simple request on the metabox like:
http://api.talis.com/stores/rdfquery-dev1/meta?about=http%3A%2F%2Fwww.jenitennison.com%2Fdata%2Fid%2Flondon-borough%2Fbarnet&output=rdf
In other words, you pass the URI of the resource that you’re interested in as the value of the about parameter to the metabox store URI of http://api.talis.com/stores/{store}/meta?about={resource}&output=rdf. This gives you back some RDF/XML. For the particular request above, the RDF/XML looks like:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.jenitennison.com/ontology/data#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
  <rdf:Description rdf:about="http://www.jenitennison.com/data/id/london-borough/barnet">
    <rdfs:label>Barnet</rdfs:label>
    <rdf:type rdf:resource="http://www.jenitennison.com/ontology/data#LondonBorough"/>
    <j.0:maleLifeExpectancy rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">79.5</j.0:maleLifeExpectancy>
    <j.0:femaleLifeExpectancy rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">83.6</j.0:femaleLifeExpectancy>
  </rdf:Description>
</rdf:RDF>
Now I don’t know about you, but this RDF/XML really makes me cringe. It’s very obviously RDF, and it has a horrible j.0 prefix that no one would ever actually write if they were creating it in an editor. Readability matters, even for data that’s aimed at computers. If I’m going to use RDF/XML, I’d really like it to be sensible XML as well as being RDF (and Leigh Dodds has given some good guidelines about how to do it).
But of course since it’s XML it’s amenable to a spot of transformation. So it’s not hard to transform RDF/XML like the above into something like this (the example here is the result for Sutton rather than Barnet):
<LondonBorough xmlns="http://www.jenitennison.com/ontology/data#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    rdf:about="http://www.jenitennison.com/data/id/london-borough/sutton">
  <rdfs:label>Sutton</rdfs:label>
  <femaleLifeExpectancy rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">82.6</femaleLifeExpectancy>
  <maleLifeExpectancy rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">78.7</maleLifeExpectancy>
</LondonBorough>
which is a little more acceptable. Talis offers a transformation service at:
http://api.talis.com/tx
It only supports XSLT 1.0. (There’s also the W3C XSLT 2.0 Service based on Saxon, but I get the impression they don’t like people to use it in anger.)
Anyway, each Talis store contains a contentbox as well as a metabox. The metabox holds the RDF/XML, and the contentbox can hold anything you like. I can put the XSLT stylesheet (tidyRDF.xsl) into my store’s contentbox using the command:
curl -X PUT -H "Content-type: application/xslt+xml" --digest -u username:password --data-binary @tidyRDF.xsl http://api.talis.com/stores/rdfquery-dev1/items/tidyRDF.xsl
which then makes it accessible at:
http://api.talis.com/stores/rdfquery-dev1/items/tidyRDF.xsl
(I could also use my own server of course, but if Talis are offering free hosting, why not?…)
And that means that I can get the RDF/XML associated with http://www.jenitennison.com/data/london-borough/barnet and transform it into some decent XML using a horrendous double-escaped URI that I’m not going to replicate here. The proxy.php script does this all nicely behind the scenes:
<?php
include "utils.php";

$docUri = $_SERVER['REQUEST_URI'];
$dir = dirname($_SERVER['SCRIPT_NAME']);
$path = substr($docUri, strlen($dir));
$idUri = "$dir/id$path";
if (exists($idUri)) {
    $domain = $_SERVER['HTTP_HOST'];
    // URL for the RDF
    $id = "http://$domain$idUri";
    $params = array('about' => $id, 'output' => 'rdf');
    $query = http_build_query($params);
    $rdfURL = "http://api.talis.com/stores/$store/meta?$query";
    // URL for the transformation
    $params = array('xml-uri' => $rdfURL,
                    'xsl-uri' => "http://api.talis.com/stores/$store/items/tidyRDF.xsl");
    $query = http_build_query($params);
    $txURL = "http://api.talis.com/tx?$query";
    $resource = fopen($txURL, 'rb');
    header("Content-Type: application/rdf+xml");
    header("Content-Location: $docUri.rdf");
    fpassthru($resource);
    return;
} else {
    error(404);
}
?>
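To see where the double escaping comes from, it can help to build the transformation URL on its own. This is just a sketch under my own assumptions: txUrl() is a hypothetical helper (it isn't in utils.php), and it only constructs the URL in the same way as proxy.php without making any request.

```php
<?php
// Hypothetical helper, not part of the original proxy.php or utils.php.
// Builds the URL that asks the Talis transformation service to apply a
// stylesheet from the store's contentbox to the RDF/XML describing $id.
function txUrl($store, $id, $stylesheet) {
    // First level of escaping: $id becomes the value of the about parameter
    $rdfUrl = "http://api.talis.com/stores/$store/meta?"
        . http_build_query(array('about' => $id, 'output' => 'rdf'), '', '&');
    $xslUrl = "http://api.talis.com/stores/$store/items/$stylesheet";
    // Second level: the whole metabox URL becomes the value of xml-uri,
    // so the %XX sequences inside it are escaped again (% becomes %25)
    return 'http://api.talis.com/tx?'
        . http_build_query(array('xml-uri' => $rdfUrl, 'xsl-uri' => $xslUrl), '', '&');
}

echo txUrl('rdfquery-dev1',
           'http://www.jenitennison.com/data/id/london-borough/barnet',
           'tidyRDF.xsl'), "\n";
```

In the printed URL, the colon in the resource URI shows up as %253A: escaped once to %3A inside the metabox URL, and then again when that URL is itself passed as a parameter.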
With proxy.php in the /data directory on my server, I need a slight tweak to the .htaccess to make sure that all non-id requests go to it:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^id/(.+)$ id.php [L]
RewriteCond %{REQUEST_URI} !\.php
RewriteRule ^(.+)$ proxy.php [L]
</IfModule>
And Bob, as they say, is your uncle.
Requests to identifier URIs redirect to document URIs. Requests to document URIs return relevant RDF/XML for the resource. Have a look at http://www.jenitennison.com/data/id/london-borough/barnet for example.
Updated: fixed the link in the final paragraph so it actually pointed to the right location. Duh.