This post was imported from my old Drupal blog. To see the full thing, including comments, it's best to visit the Internet Archive.
This is the second instalment in a series of posts about how to create linked data from existing data sets, using traffic count data as an example. In the last instalment, I talked about analysing and modelling data. This instalment discusses the creation of URIs for the various things that have been identified within the model.
This part of the process is the same as what you’d do if you were simply creating a RESTful API for a website. The principle is that everything has a URI, and if you resolve that URI you get information about the thing.
For the data.gov.uk site, we now have some guidelines about the design of URIs for the UK public sector. Basically, URIs for things should look like:
http://{sector}.data.gov.uk/id/{type of thing}/{thing identifier}
There’ll be plenty of examples in what follows.
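As a rough illustration, the template above can be filled in with a small helper. This is only a sketch in Python: the function name `thing_uri` and the decision to percent-encode the identifier are my own assumptions, not something the guidelines prescribe.

```python
from urllib.parse import quote


def thing_uri(sector: str, thing_type: str, identifier: str) -> str:
    """Fill in http://{sector}.data.gov.uk/id/{type of thing}/{thing identifier}."""
    # Assumption: percent-encode the identifier so it stays a single path segment.
    safe_id = quote(str(identifier), safe="")
    return f"http://{sector}.data.gov.uk/id/{thing_type}/{safe_id}"


print(thing_uri("transport", "road", "B3178"))
# http://transport.data.gov.uk/id/road/B3178
```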
Areas
Some of the things that we’ve identified as being part of the traffic count dataset already have centrally-defined identifiers. As part of other data.gov.uk work, we’ve defined URIs for administrative areas like countries, regions, local authority districts and local authorities. The templates for these URIs are:
http://statistics.data.gov.uk/id/country/{ONS code}
http://statistics.data.gov.uk/id/government-office-region/{ONS code}
http://statistics.data.gov.uk/id/local-authority-district/{ONS code}
http://statistics.data.gov.uk/id/local-authority/{ONS code}
We can use these identifiers directly for the regions, districts and local authorities. But there’s a problem with the country URI: we don’t have the ONS code for the country, only the name of the country. Fortunately, we’ve also defined URIs with this pattern:
http://statistics.data.gov.uk/id/country?name={country name}
http://statistics.data.gov.uk/id/government-office-region?name={region name}
http://statistics.data.gov.uk/id/local-authority-district?name={district name}
http://statistics.data.gov.uk/id/local-authority?name={authority name}
so in this situation we can use the name-based country URI and we’ll get redirected to the canonical, code-based URI.
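Here is a minimal sketch of building one of these name-based URIs. It only constructs the URI; it doesn't resolve it or follow the redirect. The helper name `area_uri_by_name` is hypothetical, and encoding spaces as `%20` (rather than `+`) is an assumption on my part.

```python
from urllib.parse import quote, urlencode

STATISTICS_BASE = "http://statistics.data.gov.uk/id"


def area_uri_by_name(area_type: str, name: str) -> str:
    """Build a name-based URI such as .../id/country?name=England."""
    # Assumption: spaces in names are sent as %20 rather than "+".
    query = urlencode({"name": name}, quote_via=quote)
    return f"{STATISTICS_BASE}/{area_type}?{query}"


print(area_uri_by_name("country", "England"))
# http://statistics.data.gov.uk/id/country?name=England
```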
Local authorities actually have two codes within the dataset that we have: the ONS code and a DfT code. I can well imagine that other datasets from the Department for Transport will only reference the DfT code, so it’s a good idea to create URIs that are based on these codes; later on, we can state that the two identifiers actually mean exactly the same thing.
http://transport.data.gov.uk/id/local-authority-district/{DfT code}
http://transport.data.gov.uk/id/local-authority/{DfT code}
So given the record:
"England","North West","B",4315.00,"00BZ","St.Helens Metropolitan Borough Council",
4,"U",,"Unclassified Urban",,
,352100,398200,
7/6/2001 00:00:00,"N",7,1,0,5,1,0,0,0,0,0,0,0,0,6
the URIs we’ve defined so far are:
http://statistics.data.gov.uk/id/country?name=England
http://statistics.data.gov.uk/id/government-office-region/B
http://statistics.data.gov.uk/id/local-authority-district/00BZ
http://statistics.data.gov.uk/id/local-authority/00BZ
http://transport.data.gov.uk/id/local-authority-district/4315
http://transport.data.gov.uk/id/local-authority/4315
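To make the mapping from record to URIs concrete, here is a sketch that derives the six URIs above from the first line of the sample record. The column positions are read off that one sample, so treat them as an assumption; `area_uris` is a name I've made up for illustration.

```python
import csv
from urllib.parse import quote

STATISTICS_BASE = "http://statistics.data.gov.uk/id"
TRANSPORT_BASE = "http://transport.data.gov.uk/id"

# First line of the sample record shown above.
RECORD = '"England","North West","B",4315.00,"00BZ","St.Helens Metropolitan Borough Council",'


def area_uris(record_line: str) -> list[str]:
    """Return the area URIs for one record, in the order listed in the post."""
    row = next(csv.reader([record_line]))
    # Assumed column order: country, region name, region code, DfT code, ONS code.
    country, _region_name, region_code, dft_code, ons_code = row[:5]
    # The DfT code appears as 4315.00 in the data; drop the decimal part.
    dft = str(int(float(dft_code)))
    return [
        f"{STATISTICS_BASE}/country?name={quote(country)}",
        f"{STATISTICS_BASE}/government-office-region/{region_code}",
        f"{STATISTICS_BASE}/local-authority-district/{ons_code}",
        f"{STATISTICS_BASE}/local-authority/{ons_code}",
        f"{TRANSPORT_BASE}/local-authority-district/{dft}",
        f"{TRANSPORT_BASE}/local-authority/{dft}",
    ]


for uri in area_uris(RECORD):
    print(uri)
```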
Roads
Now we’re onto things that aren’t defined already. First is roads. If there’s a road number, the obvious thing to use is that road number; something like:
http://transport.data.gov.uk/id/road/{road number}
For example:
http://transport.data.gov.uk/id/road/B3178
If there isn’t a road number, we’ll have to construct a URI. Since each count point is on one particular road, we can use the identifier of the count point to identify the road, so:
http://transport.data.gov.uk/id/road/{class}-{count point number}
For example:
http://transport.data.gov.uk/id/road/U-4
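The two cases can be folded into one function, sketched below. The name `road_uri` and its parameters are hypothetical; the only logic is the fallback described above, where a missing road number means we use the class and count point number instead.

```python
from typing import Optional

ROAD_BASE = "http://transport.data.gov.uk/id/road"


def road_uri(road_number: Optional[str], road_class: str, count_point: int) -> str:
    """Use the road number when there is one, else {class}-{count point number}."""
    if road_number:
        return f"{ROAD_BASE}/{road_number}"
    # No road number: fall back to the class plus the count point number.
    return f"{ROAD_BASE}/{road_class}-{count_point}"


# The sample record has count point 4, class "U" and an empty road number.
print(road_uri("", "U", 4))
# http://transport.data.gov.uk/id/road/U-4
```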
Count Points
Count points can be identified through their number, so it makes sense to use that in the URI:
http://transport.data.gov.uk/id/traffic-count-point/{count point number}
For example:
http://transport.data.gov.uk/id/traffic-count-point/4
Counts
The counts themselves don’t have their own identifiers, but they can be identified through a combination of the count point that they’re associated with, the direction of travel of the traffic that’s being counted, and the date and time that the count is made. So we can create a URI that combines these things. To aid hackability, I’m going to build on top of the traffic count point URI that we’ve already defined:
http://transport.data.gov.uk/id/traffic-count-point/{count point number}/direction/{direction}/hour/{time}
For example:
http://transport.data.gov.uk/id/traffic-count-point/4/direction/N/hour/2001-06-07T07:00:00
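As a sketch of how the time part might be derived: the sample record carries the date as `7/6/2001 00:00:00` and the hour as a separate field (`7`), so the two need combining. I'm assuming the date is day/month/year; `count_uri` is a made-up name.

```python
from datetime import datetime

COUNT_POINT_BASE = "http://transport.data.gov.uk/id/traffic-count-point"


def count_uri(count_point: int, direction: str, date_text: str, hour: int) -> str:
    """Combine count point, direction and start-of-hour into a count URI."""
    # Assumption: dates in the data are day/month/year with a midnight time.
    day = datetime.strptime(date_text, "%d/%m/%Y %H:%M:%S")
    start = day.replace(hour=hour)
    stamp = start.isoformat(timespec="seconds")
    return f"{COUNT_POINT_BASE}/{count_point}/direction/{direction}/hour/{stamp}"


print(count_uri(4, "N", "7/6/2001 00:00:00", 7))
# http://transport.data.gov.uk/id/traffic-count-point/4/direction/N/hour/2001-06-07T07:00:00
```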
Observations
Again, observations build on top of the counts by adding a vehicle type to the mix, so we can construct URIs that reflect that:
http://transport.data.gov.uk/id/traffic-count-point/{count point number}/direction/{direction}/hour/{time}/type/{vehicle type}
For example:
http://transport.data.gov.uk/id/traffic-count-point/4/direction/N/hour/2001-06-07T07:00:00/type/motor-vehicle
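One consequence of layering the URIs like this is that you can get from an observation back to its count and its count point just by chopping off the end of the URI. A small sketch of that, with the hypothetical helper `parent_uris`:

```python
def parent_uris(observation_uri: str) -> tuple[str, str]:
    """Return (count URI, count point URI) for an observation URI."""
    # Dropping /type/{vehicle type} gives the count.
    count_uri = observation_uri.rsplit("/type/", 1)[0]
    # Dropping /direction/... as well gives the count point.
    count_point_uri = count_uri.split("/direction/", 1)[0]
    return count_uri, count_point_uri


observation = (
    "http://transport.data.gov.uk/id/traffic-count-point/4"
    "/direction/N/hour/2001-06-07T07:00:00/type/motor-vehicle"
)
count, count_point = parent_uris(observation)
print(count)
print(count_point)
```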
Road Categories
Road categories are a bit different from the kinds of things that we’ve been talking about so far: they are concepts. For these URIs we use a slightly different pattern from the URIs above: /def/ rather than /id/. For road categories we can use:
http://transport.data.gov.uk/def/road-category/{category}
For example:
http://transport.data.gov.uk/def/road-category/motorway
Vehicle Types
Vehicle types are also concepts, so they have similar URIs:
http://transport.data.gov.uk/def/vehicle-category/{type}
For example:
http://transport.data.gov.uk/def/vehicle-category/HGVa5
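Both kinds of concept URI follow the same shape, so a single helper covers them. This is a sketch only; `concept_uri` is a hypothetical name, and it takes the category or type code as given rather than guessing how a label would be turned into one.

```python
TRANSPORT_DEF = "http://transport.data.gov.uk/def"


def concept_uri(scheme: str, code: str) -> str:
    """Build a /def/ URI for a concept, e.g. a road or vehicle category."""
    return f"{TRANSPORT_DEF}/{scheme}/{code}"


print(concept_uri("road-category", "motorway"))
# http://transport.data.gov.uk/def/road-category/motorway
print(concept_uri("vehicle-category", "HGVa5"))
# http://transport.data.gov.uk/def/vehicle-category/HGVa5
```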
Cardinal Directions
Cardinal directions are also concepts, but really they are global concepts, not specific to transport, or even to the UK. So it feels a bit strange to use URIs for them that imply that they somehow belong to data.gov.uk.
Fortunately, for this kind of general concept we can use URIs defined by DBpedia. DBpedia is a linked data view on Wikipedia, so it has URIs for everything that Wikipedia has a page about, making it an excellent general-purpose resource. The relevant URIs for the cardinal directions are:
http://dbpedia.org/resource/North
http://dbpedia.org/resource/South
http://dbpedia.org/resource/East
http://dbpedia.org/resource/West
so that’s what we’ll use.
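In the data the direction is a single letter (the sample record has `N`), so we need a lookup from letter to URI. A sketch follows; I'm assuming the other three codes are `S`, `E` and `W`, which the one sample record doesn't confirm.

```python
DBPEDIA = "http://dbpedia.org/resource"

# Assumption: the dataset uses N, S, E and W as its direction codes.
DIRECTION_URIS = {
    "N": f"{DBPEDIA}/North",
    "S": f"{DBPEDIA}/South",
    "E": f"{DBPEDIA}/East",
    "W": f"{DBPEDIA}/West",
}


def direction_uri(code: str) -> str:
    """Map a direction code from the data onto its cardinal direction URI."""
    try:
        return DIRECTION_URIS[code.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown direction code: {code!r}") from None


print(direction_uri("N"))
# http://dbpedia.org/resource/North
```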
Dates, Times and Periods
For dates, times and periods, we can use the URIs provided by another general-purpose linked data resource: placetime.com. URIs for instants have the pattern:
http://placetime.com/instant/gregorian/{dateTime}
while periods have the pattern:
http://placetime.com/interval/gregorian/{dateTime}/{duration}
So the hour from 7-8am on 7th June 2001 would be:
http://placetime.com/interval/gregorian/2001-06-07T07:00:00/PT1H
and the year 2001 would be:
http://placetime.com/interval/gregorian/2001-01-01T00:00:00/P1Y
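The two patterns are easy to generate from a start time and an ISO 8601 duration string. The sketch below assumes naive datetimes formatted to whole seconds; `instant_uri` and `interval_uri` are names I've invented for the example.

```python
from datetime import datetime

PLACETIME = "http://placetime.com"


def instant_uri(moment: datetime) -> str:
    """URI for a single instant."""
    return f"{PLACETIME}/instant/gregorian/{moment.isoformat(timespec='seconds')}"


def interval_uri(start: datetime, duration: str) -> str:
    """URI for a period: start time plus an ISO 8601 duration such as PT1H."""
    return f"{PLACETIME}/interval/gregorian/{start.isoformat(timespec='seconds')}/{duration}"


print(interval_uri(datetime(2001, 6, 7, 7), "PT1H"))
# http://placetime.com/interval/gregorian/2001-06-07T07:00:00/PT1H
print(interval_uri(datetime(2001, 1, 1), "P1Y"))
# http://placetime.com/interval/gregorian/2001-01-01T00:00:00/P1Y
```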
The thing is that the latter isn’t particularly approachable. Calendar years are used all over the place, so it would be nice to have a set of URIs for them that we use consistently. Again, DBpedia provides URIs for every year, such as:
http://dbpedia.org/resource/2001
so where we need to refer to a calendar year, it would be good to reuse that.
And that completes the sets of URIs that we need for this data. Stay tuned.