<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>uri</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/48</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Creating Linked Data - Part II: Defining URIs</title>
 <link>http://www.jenitennison.com/blog/node/136</link>
 <description>&lt;p&gt;This is the second instalment in a series of posts about how to create linked data from existing data sets, using traffic count data as an example. In the last instalment, I talked about &lt;a href=&quot;http://www.jenitennison.com/blog/node/135&quot;&gt;analysing and modelling data&lt;/a&gt;. This instalment discusses the creation of URIs for the various &lt;em&gt;things&lt;/em&gt; that have been identified within the model.&lt;/p&gt;

&lt;p&gt;This part of the process is the same as it would be if you were simply creating a RESTful API for a website. The principle is that everything has a URI, and if you resolve that URI you get information about the thing.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;For the data.gov.uk site, we now have some &lt;a href=&quot;http://www.cabinetoffice.gov.uk/media/308995/public_sector_uri.pdf&quot;&gt;guidelines about the design of URIs for the UK public sector&lt;/a&gt;. Basically, URIs for &lt;em&gt;things&lt;/em&gt; should look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://{sector}.data.gov.uk/id/{type of thing}/{thing identifier}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There&amp;#8217;ll be plenty of examples in what follows.&lt;/p&gt;
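
&lt;p&gt;As a minimal Python sketch (the helper name &lt;code&gt;thing_uri&lt;/code&gt; is hypothetical, not part of the guidelines), filling in that template could look like this. It percent-encodes the identifier, on the assumption that an identifier should always occupy exactly one path segment:&lt;/p&gt;

```python
from urllib.parse import quote


def thing_uri(sector, thing_type, identifier):
    """Fill in http://{sector}.data.gov.uk/id/{type of thing}/{thing identifier}.

    Hypothetical helper: the identifier is percent-encoded, including any '/',
    so that it always stays within a single path segment.
    """
    return 'http://%s.data.gov.uk/id/%s/%s' % (
        sector, thing_type, quote(str(identifier), safe=''))


print(thing_uri('transport', 'traffic-count-point', 4))
```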

&lt;h2&gt;Areas&lt;/h2&gt;

&lt;p&gt;Some of the things that we&amp;#8217;ve identified as being part of the traffic count dataset already have centrally-defined identifiers. As part of other data.gov.uk work, we&amp;#8217;ve defined URIs for administrative areas like countries, regions, local authority districts and local authorities. The templates for these URIs are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://statistics.data.gov.uk/id/country/{ONS code}
http://statistics.data.gov.uk/id/government-office-region/{ONS code}
http://statistics.data.gov.uk/id/local-authority-district/{ONS code}
http://statistics.data.gov.uk/id/local-authority/{ONS code}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can use these identifiers directly for the regions, districts and local authorities. But there&amp;#8217;s a problem with the country URI: we don&amp;#8217;t have the ONS code for the country, only the name of the country. Fortunately, we&amp;#8217;ve also defined URIs with this pattern:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://statistics.data.gov.uk/id/country?name={country name}
http://statistics.data.gov.uk/id/government-office-region?name={region name}
http://statistics.data.gov.uk/id/local-authority-district?name={district name}
http://statistics.data.gov.uk/id/local-authority?name={authority name}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;so in this situation we can use the name-based country URI and we&amp;#8217;ll get redirected to the canonical, code-based URI.&lt;/p&gt;
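
&lt;p&gt;Constructing one of these name-based URIs is a matter of query-string encoding. In this Python sketch the function name is hypothetical, and encoding spaces as &lt;code&gt;%20&lt;/code&gt; rather than &lt;code&gt;+&lt;/code&gt; is a choice made here, not something the guidelines specify; following the redirect to the code-based URI is left to the server and client:&lt;/p&gt;

```python
from urllib.parse import quote, urlencode


def name_based_uri(area_type, name):
    """Build e.g. http://statistics.data.gov.uk/id/country?name=England.

    Resolving this URI is expected to redirect to the canonical,
    code-based URI; this function only constructs it.
    """
    query = urlencode({'name': name}, quote_via=quote)
    return 'http://statistics.data.gov.uk/id/%s?%s' % (area_type, query)


print(name_based_uri('country', 'England'))
print(name_based_uri('government-office-region', 'North West'))
```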

&lt;p&gt;In this dataset, local authorities actually have two codes: the ONS code and a DfT code. I can well imagine that other datasets from the Department for Transport will reference only the DfT code, so it&amp;#8217;s a good idea to create URIs based on these codes too; later on, we can state that the two identifiers mean exactly the same thing.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/local-authority-district/{DfT code}
http://transport.data.gov.uk/id/local-authority/{DfT code}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So given the record:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&quot;England&quot;,&quot;North West&quot;,&quot;B&quot;,4315.00,&quot;00BZ&quot;,&quot;St.Helens Metropolitan Borough Council&quot;,
4,&quot;U&quot;,,&quot;Unclassified Urban&quot;,,
,352100,398200,
7/6/2001 00:00:00,&quot;N&quot;,7,1,0,5,1,0,0,0,0,0,0,0,0,6
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;the URIs we&amp;#8217;ve defined so far are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://statistics.data.gov.uk/id/country?name=England
http://statistics.data.gov.uk/id/government-office-region/B
http://statistics.data.gov.uk/id/local-authority-district/00BZ
http://statistics.data.gov.uk/id/local-authority/00BZ
http://transport.data.gov.uk/id/local-authority-district/4315
http://transport.data.gov.uk/id/local-authority/4315
&lt;/code&gt;&lt;/pre&gt;
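
&lt;p&gt;Pulling those URIs out of the record can be sketched in Python with the standard &lt;code&gt;csv&lt;/code&gt; module. The field positions are read off the sample record above and are an assumption about the file layout, as is the function name:&lt;/p&gt;

```python
import csv
from urllib.parse import quote

# The sample record above, joined back onto a single line.
RECORD = (
    '"England","North West","B",4315.00,"00BZ",'
    '"St.Helens Metropolitan Borough Council",4,"U",,"Unclassified Urban",,'
    ',352100,398200,7/6/2001 00:00:00,"N",7,1,0,5,1,0,0,0,0,0,0,0,0,6'
)

STATS = 'http://statistics.data.gov.uk/id/'
TRANSPORT = 'http://transport.data.gov.uk/id/'


def area_uris(fields):
    """Return the area URIs for one parsed record.

    Assumed positions: 0 country name, 2 region ONS code,
    3 DfT code (e.g. '4315.00'), 4 ONS district/authority code.
    """
    country, _, region_code, dft_code, ons_code = fields[:5]
    dft = str(int(float(dft_code)))  # '4315.00' becomes '4315'
    return [
        STATS + 'country?name=' + quote(country),
        STATS + 'government-office-region/' + region_code,
        STATS + 'local-authority-district/' + ons_code,
        STATS + 'local-authority/' + ons_code,
        TRANSPORT + 'local-authority-district/' + dft,
        TRANSPORT + 'local-authority/' + dft,
    ]


fields = next(csv.reader([RECORD]))
for uri in area_uris(fields):
    print(uri)
```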

&lt;h2&gt;Roads&lt;/h2&gt;

&lt;p&gt;Now we&amp;#8217;re on to things that aren&amp;#8217;t already defined, starting with roads. If a road has a number, the obvious thing to use is that road number; something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/road/{road number}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/road/B3178
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If there isn&amp;#8217;t a road number, we&amp;#8217;ll have to construct a URI. Since each count point is on one particular road, we can use the identifier of the count point to identify the road, so:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/road/{class}-{count point number}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/road/U-4
&lt;/code&gt;&lt;/pre&gt;
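
&lt;p&gt;The choice between the two forms can be sketched as a small Python function. It assumes that a missing road number shows up as an empty field (as it appears to in the sample record); the function name is hypothetical:&lt;/p&gt;

```python
BASE = 'http://transport.data.gov.uk/id/road/'


def road_uri(road_number, road_class, count_point):
    """Use the road number if there is one; otherwise fall back to
    {class}-{count point number}, since each count point is on one road."""
    number = road_number.strip()
    if number:
        return BASE + number
    return BASE + '%s-%s' % (road_class, count_point)


print(road_uri('', 'U', 4))
```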

&lt;h2&gt;Count Points&lt;/h2&gt;

&lt;p&gt;Count points can be identified through their number, so it makes sense to use that in the URI:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/{count point number}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/4
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Counts&lt;/h2&gt;

&lt;p&gt;The counts themselves don&amp;#8217;t have their own identifiers, but they can be identified through a combination of the count point that they&amp;#8217;re associated with, the direction of travel of the traffic that&amp;#8217;s being counted, and the date and time that the count is made. So we can create a URI that combines these things. To aid hackability, I&amp;#8217;m going to build on top of the traffic count point URI that we&amp;#8217;ve already defined:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/{count point number}/direction/{direction}/hour/{time}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/4/direction/N/hour/2001-06-07T07:00:00
&lt;/code&gt;&lt;/pre&gt;
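
&lt;p&gt;Building that URI from the CSV fields mostly means turning the day/month/year date and the hour column into an ISO 8601 date-time. A Python sketch, with a hypothetical function name and the date format assumed from the sample record:&lt;/p&gt;

```python
from datetime import datetime


def count_uri(count_point, direction, date_text, hour):
    """Build .../traffic-count-point/{n}/direction/{d}/hour/{time}.

    date_text is in the day/month/year form seen in the CSV
    (e.g. '7/6/2001 00:00:00'); hour is the hour column (e.g. 7).
    """
    day = datetime.strptime(date_text.split()[0], '%d/%m/%Y')
    start = day.replace(hour=int(hour))
    return ('http://transport.data.gov.uk/id/traffic-count-point/%s'
            '/direction/%s/hour/%s' % (count_point, direction, start.isoformat()))


print(count_uri(4, 'N', '7/6/2001 00:00:00', 7))
```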

&lt;h2&gt;Observations&lt;/h2&gt;

&lt;p&gt;Again, observations build on top of the counts by adding a vehicle type to the mix, so we can construct URIs that reflect that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/{count point number}/direction/{direction}/hour/{time}/type/{vehicle type}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/4/direction/N/hour/2001-06-07T07:00:00/type/motor-vehicle
&lt;/code&gt;&lt;/pre&gt;
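
&lt;p&gt;In code, an observation URI is simply the count URI with one more step appended, which is what keeps it hackable: trimming the last two path segments gets you back to the count. A Python sketch (the function name is hypothetical):&lt;/p&gt;

```python
def observation_uri(count_uri, vehicle_type):
    """Append /type/{vehicle type} to a count URI."""
    return '%s/type/%s' % (count_uri.rstrip('/'), vehicle_type)


count = ('http://transport.data.gov.uk/id/traffic-count-point/4'
         '/direction/N/hour/2001-06-07T07:00:00')
obs = observation_uri(count, 'motor-vehicle')
print(obs)
# Dropping the final two segments leads back to the count.
print(obs.rsplit('/', 2)[0] == count)
```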

&lt;h2&gt;Road Categories&lt;/h2&gt;

&lt;p&gt;Road categories are a bit different from the kinds of things that we&amp;#8217;ve been talking about so far: they are concepts. For these URIs we use a slightly different pattern from the URIs above: &lt;code&gt;/def/&lt;/code&gt; rather than &lt;code&gt;/id/&lt;/code&gt;. For road categories we can use:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/road-category/{category}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/road-category/motorway
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Vehicle Types&lt;/h2&gt;

&lt;p&gt;Vehicle types are also concepts, so they have similar URIs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/vehicle-category/{type}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/vehicle-category/HGVa5
&lt;/code&gt;&lt;/pre&gt;
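
&lt;p&gt;Concept URIs follow the same approach, with &lt;code&gt;/def/&lt;/code&gt; in place of &lt;code&gt;/id/&lt;/code&gt;. In this Python sketch, the rule for turning a road category label such as &amp;#8220;Unclassified Urban&amp;#8221; into a path segment (lower case, hyphens for spaces) is an assumption, and vehicle type codes such as &lt;code&gt;HGVa5&lt;/code&gt; are used as they are:&lt;/p&gt;

```python
DEF = 'http://transport.data.gov.uk/def/'


def slug(label):
    """Assumed slug rule: lower-case the label and join words with hyphens."""
    return '-'.join(label.lower().split())


def road_category_uri(label):
    return DEF + 'road-category/' + slug(label)


def vehicle_category_uri(code):
    # Codes like 'HGVa5' are already compact identifiers, so keep them as-is.
    return DEF + 'vehicle-category/' + code


print(road_category_uri('Motorway'))
print(road_category_uri('Unclassified Urban'))
print(vehicle_category_uri('HGVa5'))
```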

&lt;h2&gt;Cardinal Directions&lt;/h2&gt;

&lt;p&gt;Cardinal directions are also concepts, but really they are global concepts, not specific to transport, or even to the UK. So it feels a bit strange to use URIs for them that imply that they somehow belong to data.gov.uk.&lt;/p&gt;

&lt;p&gt;Fortunately, for this kind of general concept we can use URIs defined by &lt;a href=&quot;http://dbpedia.org&quot;&gt;DBPedia&lt;/a&gt;. DBPedia is a linked data view on Wikipedia, so it has URIs for everything that Wikipedia has a page about, making it an excellent general purpose resource. The relevant URIs for the cardinal directions are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://dbpedia.org/resource/North
http://dbpedia.org/resource/South
http://dbpedia.org/resource/East
http://dbpedia.org/resource/West
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;so that&amp;#8217;s what we&amp;#8217;ll use.&lt;/p&gt;
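
&lt;p&gt;Mapping the direction codes in the data to these URIs is then a simple lookup. In this Python sketch, only &lt;code&gt;N&lt;/code&gt; actually appears in the sample record; the other single-letter codes are assumed:&lt;/p&gt;

```python
DBPEDIA = 'http://dbpedia.org/resource/'

# Assumed codes: the sample record only shows 'N'.
DIRECTIONS = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}


def direction_uri(code):
    """Map a one-letter direction code to a DBPedia URI.

    An unknown code raises KeyError rather than producing a guessed URI.
    """
    return DBPEDIA + DIRECTIONS[code.strip().upper()]


print(direction_uri('N'))
```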

&lt;h2&gt;Dates, Times and Periods&lt;/h2&gt;

&lt;p&gt;For dates, times and periods, we can use the URIs provided by another general-purpose linked data resource: &lt;a href=&quot;http://www.placetime.com/&quot;&gt;placetime.com&lt;/a&gt;. URIs for instants have the pattern:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://placetime.com/instant/gregorian/{dateTime}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;while periods have the pattern:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://placetime.com/interval/gregorian/{dateTime}/{duration}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So the hour from 7-8am on 7th June 2001 would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://placetime.com/interval/gregorian/2001-06-07T07:00:00/PT1H
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and the year 2001 would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://placetime.com/interval/gregorian/2001-01-01T00:00:00/P1Y
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The thing is that the latter isn&amp;#8217;t particularly approachable. Calendar years are used all over the place, so it would be nice to have a set of URIs for them that we use consistently. Again, DBPedia provides URIs for every year, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://dbpedia.org/resource/2001
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;so where we need to refer to a calendar year, it would be good to reuse that.&lt;/p&gt;
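
&lt;p&gt;Putting those two choices together, here is a Python sketch (the function names are hypothetical) that builds placetime.com URIs for instants and intervals, but returns the DBPedia year URI when an interval is exactly one calendar year. Durations are passed as ISO 8601 strings such as &lt;code&gt;PT1H&lt;/code&gt; or &lt;code&gt;P1Y&lt;/code&gt;:&lt;/p&gt;

```python
from datetime import datetime

PLACETIME = 'http://placetime.com/'


def instant_uri(moment):
    return PLACETIME + 'instant/gregorian/' + moment.isoformat()


def interval_uri(start, duration):
    """duration is an ISO 8601 duration string, e.g. 'PT1H' or 'P1Y'."""
    return PLACETIME + 'interval/gregorian/%s/%s' % (start.isoformat(), duration)


def period_uri(start, duration):
    """Prefer the DBPedia URI when the period is exactly one calendar year."""
    if duration == 'P1Y' and start == datetime(start.year, 1, 1):
        return 'http://dbpedia.org/resource/%d' % start.year
    return interval_uri(start, duration)


print(period_uri(datetime(2001, 6, 7, 7), 'PT1H'))
print(period_uri(datetime(2001, 1, 1), 'P1Y'))
```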

&lt;hr /&gt;

&lt;p&gt;And that completes the sets of URIs that we need for this data. Stay tuned.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/136#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <pubDate>Sun, 22 Nov 2009 17:23:34 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">136 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The Real Deal: data.gov.uk</title>
 <link>http://www.jenitennison.com/blog/node/115</link>
 <description>&lt;p&gt;I&amp;#8217;m sure that you&amp;#8217;ve noticed that my recent posts have been somewhat obsessed with publishing and using public sector information. It&amp;#8217;s because I&amp;#8217;ve somehow been sucked into the work going on within the UK government, &lt;a href=&quot;http://blogs.cabinetoffice.gov.uk/digitalengagement/post/2009/06/09/Data-So-what-happens-now.aspx&quot;&gt;with Tim Berners-Lee and Nigel Shadbolt advising&lt;/a&gt;, to publish its data as linked data.&lt;/p&gt;

&lt;p&gt;My &lt;a href=&quot;http://www.jenitennison.com/blog/node/109&quot;&gt;recent&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/110&quot;&gt;blog&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/111&quot;&gt;posts&lt;/a&gt; about publishing data using &lt;a href=&quot;http://www.talis.com/platform/&quot;&gt;Talis&lt;/a&gt; have actually been a front for much more complex work that I&amp;#8217;ve been doing with a different data set.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;As an early demonstration of how existing government data sets might be turned into linked data, a few weeks ago I was given a CSV file containing road traffic counts: the raw data behind the &lt;a href=&quot;http://www.dft.gov.uk/matrix/&quot;&gt;traffic flow information&lt;/a&gt; available on the Department for Transport website. The data is really interesting and ripe for visualisations and analysis. For each hour of particular days each year, at particular points on many roads within the UK, the Department for Transport measures the number of bicycles, motorbikes, cars, vans, buses and HGVs of various types that roll past in each direction. The data contains information about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the count of each of the various classes of traffic that pass the point in a particular direction on a particular hour of a particular day&lt;/li&gt;
&lt;li&gt;the points at which these measurements were taken&lt;/li&gt;
&lt;li&gt;the roads on which the points are situated&lt;/li&gt;
&lt;li&gt;the areas in which the points are situated&lt;/li&gt;
&lt;li&gt;the local authority that is in charge of these areas&lt;/li&gt;
&lt;li&gt;the region that the area is in&lt;/li&gt;
&lt;li&gt;the country that the region is in &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenge was to turn the 386MB CSV file into linked data. The result is up for you to look at; a good starting point is &lt;a href=&quot;http://geo.data.gov.uk/0/country&quot;&gt;http://geo.data.gov.uk/0/country&lt;/a&gt;. Just follow the links from there.&lt;/p&gt;

&lt;p&gt;After a few false starts and missteps, this is the process I went through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tidied the CSV file so that it could be processed using awk. That meant replacing the commas that acted as delimiters with &lt;code&gt;|&lt;/code&gt; characters, and removing a couple of stray ^M (carriage return) characters that had crept into the file.&lt;/li&gt;
&lt;li&gt;Examined the data and came up with an informal ontology and prototype URI scheme.&lt;/li&gt;
&lt;li&gt;Created a bunch of awk scripts to extract different data from the files and create RDF/XML from it.&lt;/li&gt;
&lt;li&gt;Ran the scripts to create RDF/XML.&lt;/li&gt;
&lt;li&gt;Uploaded the data into a Talis store.&lt;/li&gt;
&lt;li&gt;Wrote the PHP needed to serve the data and set it up on a proxy server.&lt;/li&gt;
&lt;/ol&gt;
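
&lt;p&gt;The first step can be sketched in Python (an illustration only, not the tooling actually used). It assumes double quotes are used only to wrap field values, and it keeps them, because the awk scripts later in this post strip the first and last character of quoted fields themselves:&lt;/p&gt;

```python
def tidy_line(line):
    """Turn delimiter commas into '|' and drop stray carriage returns (^M).

    Commas inside double-quoted values are left alone; the quotes are kept.
    """
    out = []
    in_quotes = False
    for ch in line:
        if ch == '\r':
            continue  # the stray ^M characters
        if ch == '"':
            in_quotes = not in_quotes
        out.append('|' if ch == ',' and not in_quotes else ch)
    return ''.join(out)


print(tidy_line('"England","North West","B",4315.00\r\n'))
```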

&lt;p&gt;Some of this has been covered by my recent posts, so I&amp;#8217;m just going to talk about a few of these steps in a bit more detail.&lt;/p&gt;

&lt;p&gt;First, the URIs. Frankly, they&amp;#8217;re an experiment to see how it plays. The templates are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;countries: &lt;code&gt;http://geo.data.gov.uk/0/id/country/{name}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&gt;http://geo.data.gov.uk/0/id/country/england&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;regions: &lt;code&gt;http://geo.data.gov.uk/0/id/region/{name}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/region/north-west&quot;&gt;http://geo.data.gov.uk/0/id/region/north-west&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;areas: &lt;code&gt;http://geo.data.gov.uk/0/id/area/{ONS code}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/area/00KA&quot;&gt;http://geo.data.gov.uk/0/id/area/00KA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;local authorities: &lt;code&gt;http://local-government.data.gov.uk/0/id/local-authority/{ONS code for area}&lt;/code&gt;, eg &lt;a href=&quot;http://local-government.data.gov.uk/0/id/local-authority/00KA&quot;&gt;http://local-government.data.gov.uk/0/id/local-authority/00KA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;roads: &lt;code&gt;http://transport.data.gov.uk/0/id/road/{name}&lt;/code&gt; or &lt;code&gt;http://transport.data.gov.uk/0/id/road/U-{random number}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/road/M5&quot;&gt;http://transport.data.gov.uk/0/id/road/M5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;traffic count points: &lt;code&gt;http://transport.data.gov.uk/0/id/traffic-count-point/{number}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/traffic-count-point/36195&quot;&gt;http://transport.data.gov.uk/0/id/traffic-count-point/36195&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;traffic counts: &lt;code&gt;http://transport.data.gov.uk/0/id/traffic-count/{point number}/{direction}/{date}/{hour}/{traffic type}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/traffic-count/4/N/2008-06-05/08:00:00/HGVr2&quot;&gt;http://transport.data.gov.uk/0/id/traffic-count/4/N/2008-06-05/08:00:00/HGVr2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The subdomains are one way of subdividing the vast set of public sector information into vague categories that might be handled by different departments, without using the (highly changeable) department names in the URI. The &lt;code&gt;/0&lt;/code&gt; portion of each URI is a version number: these URIs are experimental and liable to be unsupported in the future so they&amp;#8217;re marked with a version 0. The &lt;code&gt;/id&lt;/code&gt; portion of each URI indicates that these are URIs for non-information resources; the response is a &lt;code&gt;303 See Other&lt;/code&gt; redirect to the same URIs but without the &lt;code&gt;/id&lt;/code&gt;.&lt;/p&gt;
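
&lt;p&gt;The 303 behaviour can be illustrated with a minimal Python WSGI application. This is only a sketch of the idea (the real site uses the PHP and &lt;code&gt;.htaccess&lt;/code&gt; setup described later in this post); the function name and the 404 fallback are assumptions:&lt;/p&gt;

```python
def see_other(environ, start_response):
    """Answer requests for /0/id/... with 303 See Other.

    The Location is the same URI with the '/id' segment removed.
    Anything else gets a 404 in this sketch.
    """
    path = environ.get('PATH_INFO', '')
    prefix = '/0/id/'
    if path.startswith(prefix):
        host = environ.get('HTTP_HOST', 'transport.data.gov.uk')
        location = 'http://%s/0/%s' % (host, path[len(prefix):])
        start_response('303 See Other', [('Location', location)])
        return [b'']
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']
```

&lt;p&gt;To try it locally, &lt;code&gt;wsgiref.simple_server.make_server('', 8000, see_other).serve_forever()&lt;/code&gt; would serve it on port 8000.&lt;/p&gt;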

&lt;p&gt;After the &lt;code&gt;/id&lt;/code&gt;, the URIs follow a common pattern of naming a class of resource, followed by an appropriate identifier for that resource. The identifiers themselves are designed to be unique, &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;unlikely to change&lt;/a&gt;, and &lt;a href=&quot;http://www.jenitennison.com/blog/node/114&quot;&gt;human readable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The ontologies, well, actually they don&amp;#8217;t exist as yet except in my head. It&amp;#8217;s been more important to make the data available than to provide ontologies for it. Triplestores and SPARQL queries work without ontologies; indeed you have to go out of your way to find applications that actually reason with them. Like schemas for XML documents, they&amp;#8217;re not absolutely essential, but useful for documentation purposes and &lt;em&gt;potentially&lt;/em&gt; useful for applications.&lt;/p&gt;

&lt;p&gt;There are, though, a couple of &lt;a href=&quot;http://www.w3.org/2004/02/skos/&quot;&gt;SKOS&lt;/a&gt; schemes for categorising roads and vehicle types. These are available via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;http://transport.data.gov.uk/0/category/road&lt;/li&gt;
&lt;li&gt;http://transport.data.gov.uk/0/category/vehicle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They were informed by the &lt;a href=&quot;http://www.cbrd.co.uk/roadsfaq/&quot;&gt;British Roads FAQ&lt;/a&gt; and the &lt;a href=&quot;http://www.dft.gov.uk/matrix/forms/definitions.aspx&quot;&gt;data definitions from the Department for Transport&lt;/a&gt;. I heartily recommend a read; it&amp;#8217;s scintillating stuff!&lt;/p&gt;

&lt;p&gt;Anyway, with this size of file, and the kind of processing that needed to be done with it, the simple XSLT that I talked about &lt;a href=&quot;http://www.jenitennison.com/blog/node/109&quot;&gt;previously&lt;/a&gt; for extracting data out of CSV files just wasn&amp;#8217;t going to cut it. Awk, on the other hand, is designed for this kind of processing. Most of the RDF/XML could be generated by collecting unique values from the file. For example, to generate the RDF/XML for the regions I used:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN { 
  FS = &quot;|&quot;;
  print &quot;&amp;lt;rdf:RDF xmlns:rdf=\&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#\&quot;&quot;;
  print &quot;  xmlns:rdfs=\&quot;http://www.w3.org/2000/01/rdf-schema#\&quot;&quot;;
  print &quot;  xmlns:g=\&quot;http://geo.data.gov.uk/0/ontology/geo#\&quot;&amp;gt;&quot;;
}
FNR &amp;gt; 1 {
  countries[$2] = substr($1, 2, length($1) - 2);
  regions[$2] = substr($2, 2, length($2) - 2);
  codes[$2] = substr($3, 2, length($3) - 2);
}
END { 
  for (region in regions) {
    country = countries[region];
    name = regions[region];
    code = codes[region];
    path = tolower(name);
    gsub(&quot; &quot;, &quot;-&quot;, path);
    print &quot;&amp;lt;g:Region rdf:about=\&quot;http://geo.data.gov.uk/0/id/region/&quot; path &quot;\&quot;&amp;gt;&quot;;
    print &quot;  &amp;lt;rdfs:label&amp;gt;&quot; name &quot;&amp;lt;/rdfs:label&amp;gt;&quot;;
    print &quot;  &amp;lt;g:isInCountry&amp;gt;&quot;;
    print &quot;    &amp;lt;g:Country rdf:about=\&quot;http://geo.data.gov.uk/0/id/country/&quot; tolower(country) &quot;\&quot;&amp;gt;&quot;;
    print &quot;      &amp;lt;g:hasRegion rdf:resource=\&quot;http://geo.data.gov.uk/0/id/region/&quot; path &quot;\&quot; /&amp;gt;&quot;;
    print &quot;    &amp;lt;/g:Country&amp;gt;&quot;;
    print &quot;  &amp;lt;/g:isInCountry&amp;gt;&quot;;
    if (code != &quot;&quot;) {
      print &quot;  &amp;lt;g:ONScode rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#NCName\&quot;&amp;gt;&quot; code &quot;&amp;lt;/g:ONScode&amp;gt;&quot;;
    }
    print &quot;&amp;lt;/g:Region&amp;gt;&quot;;
  }
  print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot;; 
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This generated RDF/XML that looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
  xmlns:rdfs=&quot;http://www.w3.org/2000/01/rdf-schema#&quot;
  xmlns:g=&quot;http://geo.data.gov.uk/0/ontology/geo#&quot;&amp;gt;
&amp;lt;g:Region rdf:about=&quot;http://geo.data.gov.uk/0/id/region/london&quot;&amp;gt;
  &amp;lt;rdfs:label&amp;gt;London&amp;lt;/rdfs:label&amp;gt;
  &amp;lt;g:isInCountry&amp;gt;
    &amp;lt;g:Country rdf:about=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&amp;gt;
      &amp;lt;g:hasRegion rdf:resource=&quot;http://geo.data.gov.uk/0/id/region/london&quot; /&amp;gt;
    &amp;lt;/g:Country&amp;gt;
  &amp;lt;/g:isInCountry&amp;gt;
  &amp;lt;g:ONScode rdf:datatype=&quot;http://www.w3.org/2001/XMLSchema#NCName&quot;&amp;gt;H&amp;lt;/g:ONScode&amp;gt;
&amp;lt;/g:Region&amp;gt;
&amp;lt;g:Region rdf:about=&quot;http://geo.data.gov.uk/0/id/region/yorkshire-and-the-humber&quot;&amp;gt;
  &amp;lt;rdfs:label&amp;gt;Yorkshire and The Humber&amp;lt;/rdfs:label&amp;gt;
  &amp;lt;g:isInCountry&amp;gt;
    &amp;lt;g:Country rdf:about=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&amp;gt;
      &amp;lt;g:hasRegion rdf:resource=&quot;http://geo.data.gov.uk/0/id/region/yorkshire-and-the-humber&quot; /&amp;gt;
    &amp;lt;/g:Country&amp;gt;
  &amp;lt;/g:isInCountry&amp;gt;
  &amp;lt;g:ONScode rdf:datatype=&quot;http://www.w3.org/2001/XMLSchema#NCName&quot;&amp;gt;D&amp;lt;/g:ONScode&amp;gt;
&amp;lt;/g:Region&amp;gt;
...
&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
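
&lt;p&gt;For comparison, the unique-value collection in that awk script might be sketched in Python as follows. It returns plain dictionaries rather than writing RDF/XML, and the function and key names are hypothetical; like the awk, it expects pipe-delimited lines with quoted values and skips the header row:&lt;/p&gt;

```python
def collect_regions(lines):
    """Collect unique regions from pipe-delimited lines whose values are
    wrapped in double quotes, skipping the header row (awk's FNR > 1)."""
    regions = {}
    for line in lines[1:]:
        fields = [f.strip('"') for f in line.rstrip('\r\n').split('|')]
        country, region, code = fields[0], fields[1], fields[2]
        path = region.lower().replace(' ', '-')  # awk: tolower + gsub
        regions[path] = {
            'uri': 'http://geo.data.gov.uk/0/id/region/' + path,
            'label': region,
            'country': 'http://geo.data.gov.uk/0/id/country/' + country.lower(),
            'ons_code': code or None,
        }
    return regions
```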

&lt;p&gt;In other cases, I needed to split the generated RDF/XML into several files, because uploads to Talis of more than about 2MB fail. The traffic count point RDF/XML needed to be split into 13 separate files. The traffic counts themselves&amp;#8230; well, I haven&amp;#8217;t managed to upload them all yet, but to give you an idea, the 2008 data alone generated 1800 RDF/XML files, each about 1.6MB in size and each taking about a minute to upload. What&amp;#8217;s there now is all the 2008 data, plus the overall motor vehicle counts from all the years. More will be added gradually.&lt;/p&gt;

&lt;p&gt;The awk script that generates the count data in separate files is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN { 
  FS = &quot;|&quot;;
  fileCount = 0;
  countCount = 99999;
  curlFile = &quot;traffic-counts.curl.sh&quot;;
}
FNR &amp;gt; 1 &amp;amp;&amp;amp; $15 ~ /\/2008 / {
  countCount += 1;
  if (countCount &amp;gt; 200) {
    if (fileCount != 0) {
      print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot; &amp;gt; fileName; 
      close(fileName);
    }
    countCount = 0;
    fileCount += 1;
    fileName = &quot;traffic-counts/traffic-counts.&quot; fileCount &quot;.rdf&quot;;
    print &quot;creating&quot;, fileName;
    print &quot;echo loading&quot;, fileName &amp;gt; curlFile;
    print &quot;curl -H \&quot;Content-type: application/rdf+xml\&quot; -o progress.txt --digest -u username:password --data-binary @&quot; fileName &quot; http://api.talis.com/stores/transport/meta&quot; &amp;gt; curlFile;

    print &quot;&amp;lt;?xml version=\&quot;1.0\&quot; encoding=\&quot;ASCII\&quot;?&amp;gt;&quot; &amp;gt; fileName;
    print &quot;&amp;lt;rdf:RDF xmlns:rdf=\&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:rdfs=\&quot;http://www.w3.org/2000/01/rdf-schema#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:xsd=\&quot;http://www.w3.org/2001/XMLSchema#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:t=\&quot;http://transport.data.gov.uk/0/ontology/traffic#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xml:base=\&quot;http://transport.data.gov.uk/0/id/traffic-count/\&quot;&amp;gt;&quot; &amp;gt; fileName;
  }

  cp = $7;
  date = $15;
  direction = substr($16, 2, length($16) - 2);
  split(date, dateFields, &quot; &quot;);
  date = dateFields[1];
  split(date, dateFields, &quot;/&quot;);
  date = sprintf(&quot;%04d-%02d-%02d&quot;, dateFields[3], dateFields[2], dateFields[1]);
  hour = sprintf(&quot;%02d:00:00&quot;, $17);
  base = &quot;http://transport.data.gov.uk/0/id/traffic-count/&quot; cp &quot;/&quot; direction &quot;/&quot; date &quot;/&quot; hour;

  cycles = $18;
  motorbikes = $19;
  ...

  print &quot;&amp;lt;t:Count rdf:about=\&quot;&quot; base &quot;/cycle\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;t:CountPoint rdf:about=\&quot;http://transport.data.gov.uk/0/id/traffic-count-point/&quot; cp &quot;\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;      &amp;lt;t:count rdf:resource=\&quot;&quot; base &quot;/cycle\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;/t:CountPoint&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;/t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:hour rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#dateTime\&quot;&amp;gt;&quot; date &quot;T&quot; hour &quot;&amp;lt;/t:hour&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:direction&amp;gt;&quot; direction &quot;&amp;lt;/t:direction&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:category rdf:resource=\&quot;http://transport.data.gov.uk/0/category/bicycle\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;rdf:value  rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#integer\&quot;&amp;gt;&quot; cycles &quot;&amp;lt;/rdf:value&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;/t:Count&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;t:Count rdf:about=\&quot;&quot; base &quot;/motorbike\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;t:CountPoint rdf:about=\&quot;http://transport.data.gov.uk/0/id/traffic-count-point/&quot; cp &quot;\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;      &amp;lt;t:count rdf:resource=\&quot;&quot; base &quot;/motorbike\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;/t:CountPoint&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;/t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:hour rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#dateTime\&quot;&amp;gt;&quot; date &quot;T&quot; hour &quot;&amp;lt;/t:hour&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:direction&amp;gt;&quot; direction &quot;&amp;lt;/t:direction&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:category rdf:resource=\&quot;http://transport.data.gov.uk/0/category/motorbike\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;rdf:value  rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#integer\&quot;&amp;gt;&quot; motorbikes &quot;&amp;lt;/rdf:value&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;/t:Count&amp;gt;&quot; &amp;gt; fileName;
  ...
}
END {
  print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot; &amp;gt; fileName; 
  close(fileName);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This also generates a shell script containing the curl commands that upload the files.&lt;/p&gt;
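
&lt;p&gt;The batching and upload-script logic can be sketched in Python as well. The batch size of 200 records mirrors the counter in the awk script; the function names are hypothetical, and &lt;code&gt;username:password&lt;/code&gt; is the same placeholder as in the script:&lt;/p&gt;

```python
from itertools import islice

STORE = 'http://api.talis.com/stores/transport/meta'


def batches(records, size=200):
    """Yield successive lists of at most size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def curl_lines(file_name):
    """Shell lines that upload one RDF/XML file, as the awk script writes them."""
    return [
        'echo loading ' + file_name,
        'curl -H "Content-type: application/rdf+xml" -o progress.txt '
        '--digest -u username:password --data-binary @' + file_name + ' ' + STORE,
    ]


for n, batch in enumerate(batches(range(450)), start=1):
    name = 'traffic-counts/traffic-counts.%d.rdf' % n
    print(len(batch), curl_lines(name)[0])
```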

&lt;p&gt;The original data contained easting/northing information about each point, whereas latitude/longitude is generally easier to use for mapping. So I extracted the eastings/northings and used the &lt;a href=&quot;http://gps.ordnancesurvey.co.uk/convert.asp&quot;&gt;free (Windows only) software available via the Ordnance Survey&lt;/a&gt; to turn them into latitude/longitude. (There is a &lt;a href=&quot;http://gps.ordnancesurvey.co.uk/convertbatch.asp?location=0&quot;&gt;web service&lt;/a&gt; that does the same, but it only handles 200 coordinates at a time.) I then converted the results into decimals, turned them into RDF, and uploaded them.&lt;/p&gt;
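
&lt;p&gt;The conversion to decimals is simple arithmetic. A Python sketch, assuming the converter&amp;#8217;s output is in degrees, minutes and seconds with a hemisphere letter (the function name is hypothetical, and the grid-to-latitude/longitude transformation itself is not shown):&lt;/p&gt;

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of 'N', 'S', 'E', 'W'; south and west are negative.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ('S', 'W') else value


print(dms_to_decimal(2, 30, 0, 'W'))
```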

&lt;p&gt;The PHP scripts that serve the data as linked data are exactly what I&amp;#8217;ve &lt;a href=&quot;http://www.jenitennison.com/blog/node/111&quot;&gt;shown before&lt;/a&gt;. I amended the &lt;code&gt;.htaccess&lt;/code&gt; file to rewrite each request to the appropriate PHP script, like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;IfModule mod_rewrite.c&amp;gt;
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d

  RewriteRule ^id/(.+)$  id.php [L]

  RewriteCond %{REQUEST_URI} !\.php
  RewriteRule ^([^/]+)(/.+)? $1.php$2 [L,QSA]
&amp;lt;/IfModule&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
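
&lt;p&gt;To make the two rules easier to follow, here is a Python approximation of where they send a request path (relative to the directory containing &lt;code&gt;.htaccess&lt;/code&gt;). It is a single-pass sketch that ignores the file and directory checks, not a faithful model of mod_rewrite:&lt;/p&gt;

```python
import re


def route(path):
    """Approximate which script the rewrite rules hand a path to."""
    if re.match(r'id/(.+)$', path):
        return 'id.php'
    if '.php' in path:
        return path  # RewriteCond %{REQUEST_URI} !\.php leaves these alone
    m = re.match(r'([^/]+)(/.+)?', path)
    if m is None:
        return path
    # $1.php$2: the first segment names the script, the rest is path info.
    return m.group(1) + '.php' + (m.group(2) or '')


print(route('region/north-west'))
```

&lt;p&gt;So a request for &lt;code&gt;region/north-west&lt;/code&gt; is handled by &lt;code&gt;region.php&lt;/code&gt; with &lt;code&gt;/north-west&lt;/code&gt; as extra path information, while anything under &lt;code&gt;id/&lt;/code&gt; goes to &lt;code&gt;id.php&lt;/code&gt;.&lt;/p&gt;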

&lt;p&gt;and created PHP scripts for each of the types of data being published. For example, &lt;code&gt;region.php&lt;/code&gt; is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
  include &quot;utils.php&quot;;
  proxy(&#039;http://geo.data.gov.uk/0/ontology/geo#Region&#039;, 50);
?&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And there we have it. Linked traffic count data on the web.&lt;/p&gt;

&lt;p&gt;(And because this is all published through Talis, there&amp;#8217;s also a &lt;a href=&quot;http://api.talis.com/stores/transport/services/sparql&quot;&gt;SPARQL endpoint&lt;/a&gt; that you could use to run queries and &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;create visualisations&lt;/a&gt;. Knock yourself out.)&lt;/p&gt;

&lt;p&gt;Please take a look and comment on what we&amp;#8217;ve done. What&amp;#8217;s your opinion of the URI scheme? Is it useful to be able to access the data as linked data? Which other formats would you like to see?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/115#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/47">Talis</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <pubDate>Sun, 26 Jul 2009 16:38:54 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">115 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Opaque URIs != Unreadable URIs</title>
 <link>http://www.jenitennison.com/blog/node/114</link>
 <description>&lt;p&gt;I&amp;#8217;ve been &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;talking about URIs&lt;/a&gt; a lot recently. One of the things that has bothered me about some of the conversations is the conflation of the concepts of &amp;#8220;opaque URIs&amp;#8221; and &amp;#8220;non-human-readable URIs&amp;#8221;. This is my argument for keeping the concepts separate.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.w3.org/DesignIssues/Axioms.html#opaque&quot;&gt;opacity of URIs&lt;/a&gt; is an important axiom in web architecture. It states that web applications must not try to pick apart URIs in order to work out information from them. Applications must not, for example, use the fact that a URI has &lt;code&gt;.html&lt;/code&gt; at the end to infer that it resolves to an HTML document. It&amp;#8217;s closely related to &lt;a href=&quot;http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven&quot;&gt;hypertext as the engine of application state&lt;/a&gt;, in that web applications should not construct URIs themselves either: they must discover them by following links and submitting forms.&lt;/p&gt;
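
&lt;p&gt;To make the &lt;code&gt;.html&lt;/code&gt; example concrete: an application that respects opacity decides what it has received from the response&amp;#8217;s &lt;code&gt;Content-Type&lt;/code&gt; header, not from the shape of the URI. A simplified Python sketch (the function names are hypothetical, and the header parsing ignores quoted parameters):&lt;/p&gt;

```python
def media_type(content_type):
    """Return the bare media type from a Content-Type header value,
    e.g. 'text/html; charset=utf-8' gives 'text/html'."""
    return content_type.split(';', 1)[0].strip().lower()


def is_html(content_type):
    # Decide from what the server says, not from how the URI happens to end.
    return media_type(content_type) in ('text/html', 'application/xhtml+xml')


print(is_html('text/html; charset=utf-8'))
```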

&lt;p&gt;But this has nothing to do with readability or hackability, both of which are &lt;a href=&quot;http://www.useit.com/alertbox/990321.html&quot;&gt;extremely important for human users&lt;/a&gt;. Readable URIs help human users understand something about the resource that the URI is pointing to. Hackable URIs (by which I mean ones that people might manipulate by altering or removing portions of the path or query) enable human users to locate other resources that they might be interested in.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Before I go further, a couple of caveats:&lt;/p&gt;

&lt;p&gt;I am not saying that every URI must contain a natural-language identifier. Take the URI for a school, which could include any of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the name of the school&lt;/li&gt;
&lt;li&gt;the unique reference number for the school&lt;/li&gt;
&lt;li&gt;the record number for the school in the database that is being published on the web&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the name of the school, as &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;I&amp;#8217;ve discussed&lt;/a&gt;, is probably a bad idea because of its lack of longevity. Using the record number for the school within the particular database that&amp;#8217;s being published is entirely non-human-readable because there is simply no way of finding out what that would be for a given school. The unique reference number for the school, on the other hand, may be an obscure series of digits, but it is a meaningful one which renders the URI readable and hackable.&lt;/p&gt;

&lt;p&gt;There are also times when uniquely identifying a resource using natural identifiers within the URI leads to incredibly long and complex URIs, in which case the &amp;#8216;human readable&amp;#8217; version isn&amp;#8217;t actually human readable. Introducing non-human-readable components is then the only option.&lt;/p&gt;

&lt;p&gt;Back to my argument:&lt;/p&gt;

&lt;p&gt;Why should URIs support humans doing things that applications must not? Because humans are intelligent. When humans hack a URI, they are aware that they are making a guess, taking a chance and might or might not end up at something useful. If they get a 404, or even more importantly if they get to information about something that they weren&amp;#8217;t expecting, they are intelligent enough to recognise that the chance they took didn&amp;#8217;t pay off. Applications aren&amp;#8217;t intelligent. They can&amp;#8217;t tell the difference between a right guess and a wrong guess, so it&amp;#8217;s best not to let them guess at all.&lt;/p&gt;

&lt;p&gt;Let me give an example. Let&amp;#8217;s say that I&amp;#8217;m creating a URI for a particular house. Here are two possible URIs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/house/NG9_3HZ/4
http://id.example.org/house/0aef0218
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first is readable and hackable. A human could change the house number or the postcode. They could remove the house number and expect a list of houses within the postcode. The second is not readable or hackable: there is no way to know what you would get if you changed the identifier within the URI.&lt;/p&gt;

&lt;p&gt;Now it is true that an application accessing a site that used URIs like the first could create those URIs programmatically, whereas it couldn&amp;#8217;t (perhaps) create a URI like the second. But if it did create the URIs programmatically, that would be the fault of the application, not the fault of the URI.&lt;/p&gt;

&lt;p&gt;As publishers, it is our responsibility to provide humans with URIs that are meaningful and hackable, and to provide applications with the means of creating or identifying these URIs through forms and links. But it is not our responsibility to prevent applications from doing things that they should not do by deliberately obfuscating our URIs.&lt;/p&gt;
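
&lt;p&gt;The &amp;#8220;forms&amp;#8221; half of that responsibility can be sketched in a few lines of Python. This assumes a hypothetical house search form advertised by the publisher, with an action URI and two field names; the application builds its request the way a browser handles a &lt;code&gt;GET&lt;/code&gt; form, so it never needs to know that house URIs happen to look like &lt;code&gt;/house/{postcode}/{number}&lt;/code&gt;.&lt;/p&gt;

```python
from urllib.parse import urlencode


def submit_get_form(form, values):
    """Build the request URI for a form whose method is GET.

    This mirrors what a browser does with such a form: it appends the
    named field values to the form's action URI as a query string. The
    client relies only on what the publisher advertised in the form,
    not on any knowledge of how the publisher's URIs are laid out.
    """
    unknown = set(values) - set(form["fields"])
    if unknown:
        raise ValueError("fields not offered by the form: %s" % sorted(unknown))
    return form["action"] + "?" + urlencode(values)


# Hypothetical form, discovered by following a link on the publisher's
# site; the action URI and field names are made up for this sketch.
house_search = {
    "action": "http://id.example.org/house/search",
    "fields": ["postcode", "number"],
}
print(submit_get_form(house_search, {"postcode": "NG9 3HZ", "number": "4"}))
```

&lt;p&gt;The server can then answer that request with a redirect to the house&amp;#8217;s URI, such as &lt;code&gt;http://id.example.org/house/NG9_3HZ/4&lt;/code&gt;, so the readable URI is discovered rather than constructed.&lt;/p&gt;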
</description>
 <comments>http://www.jenitennison.com/blog/node/114#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Sat, 25 Jul 2009 21:41:34 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">114 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Versioning URIs</title>
 <link>http://www.jenitennison.com/blog/node/112</link>
 <description>&lt;p&gt;Yesterday I went along to a workshop on developing URI guidelines for the UK public sector. Because of the current drive to get more UK public sector information online, and the fact that &lt;a href=&quot;http://blogs.cabinetoffice.gov.uk/digitalengagement/post/2009/06/09/Data-So-what-happens-now.aspx&quot;&gt;we have Tim Berners-Lee on board&lt;/a&gt;, there&amp;#8217;s a growing recognition of the fact that we need URIs for the real-world and conceptual things that we talk about in the public sector: schools, roads, hospitals, services, councils, and so on.&lt;/p&gt;

&lt;p&gt;One of the particular points of contention at the meeting was whether URIs for non-information resources (ie for real-world and conceptual things) should contain dates or version numbers, or not.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Let&amp;#8217;s get some of the argument out of the way first. We are not talking about documents here. Documents will almost always have multiple versions, and if you care at all about maintaining a historical record you will want to refer to the previous version of a document. So dates or version numbers within URIs that refer to documents are often a really good idea. Even better if you have one URI &lt;em&gt;without&lt;/em&gt; a date that consistently redirects (through a &lt;code&gt;307 Temporary Redirect&lt;/code&gt;) to the current version of the document.&lt;/p&gt;

&lt;p&gt;Documents (that people read) are just one form of &lt;strong&gt;&amp;#8220;information resource&amp;#8221;&lt;/strong&gt;: things that are information and therefore can be transmitted electronically. Other things in the world are &lt;strong&gt;&amp;#8220;non-information resources&amp;#8221;&lt;/strong&gt;: things that are more than simple information and therefore cannot be transmitted electronically, such as schools, roads, hospitals and so on. A lot of things that we want to talk about (make RDF assertions about) are non-information resources. We give them URIs to name them, so that we can talk about them unambiguously, and we give them HTTP URIs so that we have a way of finding information resources (documents) that give us information &lt;em&gt;about&lt;/em&gt; them.&lt;/p&gt;

&lt;p&gt;Does the information that you get when you resolve a non-information resource URI change? Absolutely. A request for a non-information resource URI receives a &lt;code&gt;303 See Other&lt;/code&gt; response that redirects to an information resource (probably without a version number), which itself redirects (&lt;code&gt;307 Temporary Redirect&lt;/code&gt;) to a URI for a particular version of information about the resource. For example, an identifier that means a particular school, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/education/school/78
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;can 303 redirect to the current version of a document that contains information about that school, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.org/education/school/78
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which will 307 redirect to a particular version of information about that school, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.example.org/education/school/78/2008-09-01
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The date is in the URI for the information resource (the information about the school), and therefore it doesn&amp;#8217;t need to be in the URI for the non-information resource (the school).&lt;/p&gt;
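
&lt;p&gt;The chain of responses described above can be simulated in a short Python sketch. The &lt;code&gt;REDIRECTS&lt;/code&gt; table is an assumption that stands in for a real server; only the status codes and the three URIs come from the example.&lt;/p&gt;

```python
# Hypothetical server behaviour for the three example URIs: each entry
# maps a request URI to the (status, Location) pair the server returns.
REDIRECTS = {
    "http://id.example.org/education/school/78":
        (303, "http://www.example.org/education/school/78"),
    "http://www.example.org/education/school/78":
        (307, "http://www.example.org/education/school/78/2008-09-01"),
}


def resolve(uri, max_hops=5):
    """Follow redirects from uri and return the (status, uri) of each hop.

    A URI with no entry in REDIRECTS is treated as a document that the
    server returns directly with 200 OK.
    """
    hops = []
    for _ in range(max_hops):
        if uri not in REDIRECTS:
            hops.append((200, uri))
            return hops
        status, location = REDIRECTS[uri]
        hops.append((status, uri))
        uri = location
    raise RuntimeError("too many redirects")


for status, uri in resolve("http://id.example.org/education/school/78"):
    print(status, uri)
# 303 http://id.example.org/education/school/78
# 307 http://www.example.org/education/school/78
# 200 http://www.example.org/education/school/78/2008-09-01
```

&lt;p&gt;When new information about the school is published, only the target of the 307 needs to change (to a document with a later date); the school&amp;#8217;s own URI and its 303 redirect stay exactly as they were.&lt;/p&gt;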

&lt;p&gt;OK, but say that the identifier for a school changes over time. Let&amp;#8217;s say that you&amp;#8217;ve designed your URIs for schools like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/school/bracknell-forest/broadmoor-primary
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and the name of the school changes. Now the above identifier isn&amp;#8217;t applicable any more, and any RDF statements out there on the web that have used this identifier are now talking about something that no longer exists. How do you deal with this?&lt;/p&gt;

&lt;p&gt;Well, the first rule is that &lt;strong&gt;non-information resource URIs must not include information that is likely to change&lt;/strong&gt;. That&amp;#8217;s why a lot of URIs contain numbers rather than names. So we shouldn&amp;#8217;t have included the name of the school in the URI? OK, we&amp;#8217;ll use a number instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/school/bracknell-forest/78
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Hang on. Bracknell Forest is a council, and councils have been known to change: their boundaries move (which could mean that a school moves to a different council), they are renamed, they are merged, or&amp;#8230; well, there are lots of things that could happen to a council. So in the face of all these possibilities, and given that we no longer need the council name to disambiguate the school name (because we have a number instead), we can employ a second rule: &lt;strong&gt;non-information resource URIs must not include unnecessary hierarchy&lt;/strong&gt;. We can eliminate part of the path and still identify the school:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/school/78
&lt;/code&gt;&lt;/pre&gt;
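
&lt;p&gt;A minimal Python sketch of the two rules so far, assuming a hypothetical &lt;code&gt;School&lt;/code&gt; record with a number, a name and a council: the minting function reads only the number, so renaming the school or moving it to another council leaves its URI untouched. The second record&amp;#8217;s name and council are invented for the illustration.&lt;/p&gt;

```python
from dataclasses import dataclass

BASE = "http://id.example.org"


@dataclass
class School:
    number: int   # stable identifier for the life of the school
    name: str     # likely to change, so kept out of the URI
    council: str  # likely to change too (boundaries move, councils merge)


def school_uri(school):
    """Mint the URI for a school from its number alone.

    Nothing likely to change goes into the URI, and there is no council
    level in the path because the number is unique without it.
    """
    return "%s/school/%d" % (BASE, school.number)


before = School(78, "Broadmoor Primary", "Bracknell Forest")
renamed = School(78, "Broadmoor Community Primary", "Another Council")
print(school_uri(before))                         # http://id.example.org/school/78
print(school_uri(before) == school_uri(renamed))  # True
```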

&lt;p&gt;And so we come to the final thing that could change: &amp;#8220;school&amp;#8221;. Now surely, you might say, the concept of a school cannot change. And maybe you&amp;#8217;re right, maybe it won&amp;#8217;t. On the other hand, in the UK we have in the past had things called &lt;a href=&quot;http://en.wikipedia.org/wiki/Polytechnic_(United_Kingdom)&quot;&gt;polytechnics&lt;/a&gt;, which are now known as universities, so the types of educational establishments that we have do change over time.&lt;/p&gt;

&lt;p&gt;We could do a bunch of things to help prevent a conceptual change like this from requiring a change to the URI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we keep the number of concepts named within the URI to a minimum (eg don&amp;#8217;t have both &amp;#8216;education&amp;#8217; and &amp;#8216;school&amp;#8217;)&lt;/li&gt;
&lt;li&gt;we use wide terms rather than narrow terms (eg use a generic &amp;#8216;school&amp;#8217; rather than having separate &amp;#8216;grammar-school&amp;#8217;, &amp;#8216;primary-school&amp;#8217; and so on)&lt;/li&gt;
&lt;li&gt;we could change the term &amp;#8216;school&amp;#8217; to a code (eg use &amp;#8216;C3X0&amp;#8217; instead of &amp;#8216;school&amp;#8217;), but I don&amp;#8217;t think this will help: you&amp;#8217;ll still have problems if &amp;#8216;C3X0&amp;#8217; and &amp;#8216;F9R2&amp;#8217; mean the same thing in the future, whatever they&amp;#8217;re called.&lt;/li&gt;
&lt;li&gt;we could eliminate the concept term from the URI altogether, and label everything under one flat naming scheme, using something that has billions and billions of possible combinations. I know, a UUID! No, I&amp;#8217;m not serious.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And so we come to the question of versioning the URIs themselves. This is what Tim Berners-Lee says in &lt;a href=&quot;http://www.w3.org/Provider/Style/URI&quot;&gt;Cool URIs don&amp;#8217;t change&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I&amp;#8217;ll go into this danger in more detail as it is one of the more difficult things to avoid. Typically, topics end up in URIs when you classify your documents according to a breakdown of the work you are doing. That breakdown will change. Names for areas will change. At W3C we wanted to change &amp;#8220;MarkUp&amp;#8221; to &amp;#8220;Markup&amp;#8221; and then to &amp;#8220;HTML&amp;#8221; to reflect the actual content of the section. Also, beware that this is often a flat name space. In 100 years are you sure you won&amp;#8217;t want to reuse anything? We wanted to reuse &amp;#8220;History&amp;#8221; and &amp;#8220;Stylesheets&amp;#8221; for example in our short life.&lt;/p&gt;
  
  &lt;p&gt;This is a tempting way of organizing a web site - and indeed a tempting way of organizing anything, including the whole web. It is a great medium term solution but has serious drawbacks in the long term&lt;/p&gt;
  
  &lt;p&gt;Part of the reasons for this lie in the philosophy of meaning. every term in the language it a potential clustering subject, and each person can have a different idea of what it means. Because the relationships between subjects are web-like rather than tree-like, even for people who agree on a web may pick a different tree representation. These are my (oft repeated) general comments on the dangers of hierarchical classification as a general solution.&lt;/p&gt;
  
  &lt;p&gt;Effectively, when you use a topic name in a URI you are binding yourself to some classification. You may in the future prefer a different one. Then, the URI will be liable to break.&lt;/p&gt;
  
  &lt;p&gt;A reason for using a topic area as part of the URI is that responsibility for sub-parts of a URI space is typically delegated, and then you need a name for the organizational body - the subdivision or group or whatever - which has responsibility for that sub-space. This is binding your URIs to the organizational structure. It is typically safe only when protected by a date further up the URI (to the left of it): 1998/pics can be taken to mean for your server &amp;#8220;what we meant in 1998 by pics&amp;#8221;, rather than &amp;#8220;what in 1998 we did with what we now refer to as pics.&amp;#8221;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let&amp;#8217;s spell out the danger with some examples. Let&amp;#8217;s say that in 20 years&amp;#8217; time, nurseries and primary schools merge into &amp;#8216;schools&amp;#8217; and secondary schools, sixth-form colleges and universities merge into &amp;#8216;academies&amp;#8217;. A particular primary school currently known as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/school/78
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;will continue to be known by that URI. A particular university currently known as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/university/307
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;is now known as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/academy/79
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To support these changes, we have to set up some &lt;code&gt;301 Moved Permanently&lt;/code&gt; redirects: &lt;code&gt;http://id.example.org/university/307&lt;/code&gt; has to redirect to &lt;code&gt;http://id.example.org/academy/79&lt;/code&gt;. The RDF returned for the new URIs has to include &lt;code&gt;owl:sameAs&lt;/code&gt; triples that link them back to the old ones, to indicate that they identify the same institution:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
&amp;lt;http://id.example.org/academy/79&amp;gt; owl:sameAs &amp;lt;http://id.example.org/university/307&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Alternatively, this equivalence could be &lt;a href=&quot;http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/&quot;&gt;derived from the 301 response&lt;/a&gt; itself.&lt;/p&gt;

&lt;p&gt;Similar changes may or may not happen within the RDF hosted elsewhere that talks about these institutions. Since it can be discovered that they are identical, there&amp;#8217;s no real reason for anyone to start using the new URIs unless they want to.&lt;/p&gt;
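
&lt;p&gt;Deriving the &lt;code&gt;owl:sameAs&lt;/code&gt; statements from the redirects is mechanical, as this Python sketch suggests. The &lt;code&gt;MOVED_PERMANENTLY&lt;/code&gt; table is hypothetical, and triples are plain tuples rather than objects from an RDF library; &lt;code&gt;current_uri&lt;/code&gt; is what a consumer who does want to switch to the new URIs might use.&lt;/p&gt;

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical table of the 301 Moved Permanently redirects set up when
# universities became academies; it maps each old URI to its new URI.
MOVED_PERMANENTLY = {
    "http://id.example.org/university/307": "http://id.example.org/academy/79",
}


def same_as_triples(moves):
    """Yield (subject, predicate, object) triples linking new URIs to old ones."""
    for old, new in moves.items():
        yield (new, OWL_SAME_AS, old)


def current_uri(uri, moves):
    """Follow permanent redirects to the URI now used for the same thing."""
    seen = set()
    while uri in moves:
        if uri in seen:
            raise ValueError("redirect loop at " + uri)
        seen.add(uri)
        uri = moves[uri]
    return uri


for triple in same_as_triples(MOVED_PERMANENTLY):
    print(triple)
print(current_uri("http://id.example.org/university/307", MOVED_PERMANENTLY))
```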

&lt;p&gt;Then 30 years later, the government of the time decides to create a new kind of institution, which it calls a &amp;#8216;university&amp;#8217;. The university of 50 years hence isn&amp;#8217;t actually the same as the &amp;#8216;university&amp;#8217; as we mean it today (these are virtual meeting places for independent researchers, each centred on a particular topic of study rather than a physical location), but they need URIs. And since they are called &amp;#8216;university&amp;#8217;, that is the name that should be used in the URI. Now someone mints the URI:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/university/307
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But disaster! This University 307 is not at all the same as the old University 307, now known as Academy 79. The same URI has been used for two different things. Redirections halt, graphs are smushed, distinctions are lost and fallacies haunt the web.&lt;/p&gt;

&lt;p&gt;TimBL&amp;#8217;s solution to this possibility is for every URI that includes a topic to include the year in which the topic was minted. So we would have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/2009/school/78
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;that remains the same, and then:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/2009/university/307
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;redirecting to:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/2029/academy/79
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and the introduction of:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/2059/university/307
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which can be guaranteed to be distinct from &lt;code&gt;http://id.example.org/2009/university/307&lt;/code&gt;.&lt;/p&gt;
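
&lt;p&gt;A small Python sketch of the dated scheme, using the example URIs; &lt;code&gt;dated_uri&lt;/code&gt; and the redirect table are illustrative names, not anything from TimBL&amp;#8217;s note. Because the year is part of every URI, reusing &amp;#8216;university&amp;#8217; and even the number 307 in 2059 cannot collide with the 2009 URI.&lt;/p&gt;

```python
BASE = "http://id.example.org"


def dated_uri(year, topic, number):
    """Mint a URI whose topic is qualified by the year the URI was minted."""
    return "%s/%d/%s/%d" % (BASE, year, topic, number)


old_university = dated_uri(2009, "university", 307)
new_university = dated_uri(2059, "university", 307)

# The 301 redirect set up in 2029, when universities became academies.
moved_permanently = {old_university: dated_uri(2029, "academy", 79)}

print(old_university == new_university)     # False: the years keep them apart
print(moved_permanently[old_university])    # http://id.example.org/2029/academy/79
print(new_university in moved_permanently)  # False: 2059/university/307 is a fresh name
```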

&lt;p&gt;This, to me, is the crux of the argument for including a version inside the URIs that you use for non-information resources. It means that you can reuse old terms with new meanings within URIs without breaking the web.&lt;/p&gt;

&lt;p&gt;On the other hand, many people, myself among them, really dislike the use of years or version numbers within URIs for non-information resources (unless, I should say, they are used as part of the identification of the resource). I think there are three main reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they are additional cruft that adds to the length of a URI but provides no information about the thing being identified&lt;/li&gt;
&lt;li&gt;they can give a misleading impression about the relevance of a concept; for example &lt;a href=&quot;http://xmlns.com/foaf/spec/&quot;&gt;FOAF&lt;/a&gt; is stuck at version 0.1 (&lt;code&gt;http://xmlns.com/foaf/0.1/&lt;/code&gt;) despite being widely used, while &lt;code&gt;http://www.w3.org/1998/Math/MathML&lt;/code&gt; is feeling distinctly old (in internet time) despite being under active development&lt;/li&gt;
&lt;li&gt;they lead to a proliferation of URIs and create additional work for people who want to keep their URIs up to date, even when the concepts themselves don&amp;#8217;t change (such as for the primary school&amp;#8217;s URI above)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, the likelihood of a term being reused with a different meaning seems low enough that the cost (in readability, understandability and maintainability) of supporting URIs that contain versions or years doesn&amp;#8217;t seem worthwhile. We can keep the likelihood low by using terms that are unlikely to change their meaning (particularly avoiding those that have more than one meaning) and by disambiguating them (for example by using &amp;#8216;train-station&amp;#8217; rather than just &amp;#8216;station&amp;#8217;).&lt;/p&gt;

&lt;p&gt;There is also, perhaps, a middle way here that can keep the majority of URIs clean without leading to overlapping names. That&amp;#8217;s to start with a URI scheme that does not include a version number or year, and only to start introducing them when it becomes necessary due to the reuse of previous terms. In the example above, in 2059 we might have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://id.example.org/school/78
http://id.example.org/academy/79
http://id.example.org/university2.0/307
&lt;/code&gt;&lt;/pre&gt;
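
&lt;p&gt;The middle way can be sketched as a registry that the publisher consults whenever it mints a URI. &lt;code&gt;TermRegistry&lt;/code&gt; and the meaning descriptions are hypothetical, the registry lives only in memory here (a real one would need to be stored permanently), and the &lt;code&gt;2.0&lt;/code&gt; suffix just follows the &lt;code&gt;university2.0&lt;/code&gt; example above. The key property is that the registry is only ever appended to, so a meaning never loses the segment it was first given.&lt;/p&gt;

```python
class TermRegistry:
    """Decides which path segment to use for each kind of thing.

    The first meaning given to a term gets the bare term. If the term is
    later reused with a different meaning, that meaning gets a version
    suffix, so URIs minted for the earlier meaning are never reused.
    """

    def __init__(self):
        self._meanings = {}  # term: list of meanings, oldest first

    def segment(self, term, meaning):
        meanings = self._meanings.setdefault(term, [])
        if meaning not in meanings:
            meanings.append(meaning)  # append only: earlier meanings keep their position
        version = meanings.index(meaning) + 1
        return term if version == 1 else "%s%d.0" % (term, version)


registry = TermRegistry()
print(registry.segment("school", "school, as understood in 2009"))          # school
print(registry.segment("university", "university, as understood in 2009"))  # university
print(registry.segment("university", "virtual meeting place for researchers"))  # university2.0
print("http://id.example.org/%s/307" % registry.segment(
    "university", "virtual meeting place for researchers"))  # http://id.example.org/university2.0/307
```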

&lt;p&gt;In other words, we make a decision now that our future selves will have to act upon. All we have to worry about is our future selves caring as much about persisting historical URIs as we do about persisting our current ones.&lt;/p&gt;

&lt;p&gt;What do you think? Should versioning be avoided in URIs at all costs, or always be included just in case? Are there other arguments for or against including versions or years in URIs? What other design considerations are there that help prevent changes to URIs over (long periods of) time?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/112#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Wed, 22 Jul 2009 23:16:20 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">112 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>
