Creating Linked Data - Part IV: Developing RDF Schemas

Nov 26, 2009

This is the fourth instalment in a series about turning an existing dataset into some linked data. I’ve previously talked about analysis and modelling, defining URIs and defining concept schemes. In this instalment, we’ll look at developing a schema in which we define the classes, properties and datatypes that we want to use in the RDF that describes the things in our dataset.

We’ll start by writing out some RDF for our record, using Turtle here for readability, and use unprefixed names to indicate classes, properties and datatypes, just so we can see what we need. Then we’ll see how those requirements match up to existing vocabularies and ontologies that we can reuse. Anything that’s left over we’re going to have to put in our own vocabulary. We’ll call this

http://transport.data.gov.uk/def/traffic/

All the classes, properties and datatypes that we define will eventually use that namespace.

Let’s focus on this record; I find it easiest to use an actual example rather than talk in the abstract:

"England","South West","K",1115.00,"18","Devon County Council",
13,"B3178",,"B Urban","Salterton Road",
"Salterton Road, EAST OF DINAN WAY, EXMOUTH",302600,81984,
8/10/2001 00:00:00,"E",17,2,2,400,5,41,0,2,0,0,0,0,2,450
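
To keep track of which value ends up where, here is a minimal Python sketch that splits the record into fields and rebuilds the traffic count URI used later in this post. The field positions are inferred from this one example rather than from a published column list, the date is assumed to be day-first, and the vehicle column names are the headings listed in the previous instalment, so treat all of these as assumptions.

```python
import csv
from datetime import datetime

# The example record, joined back into the single CSV line it occupies in the data.
RECORD = (
    '"England","South West","K",1115.00,"18","Devon County Council",'
    '13,"B3178",,"B Urban","Salterton Road",'
    '"Salterton Road, EAST OF DINAN WAY, EXMOUTH",302600,81984,'
    '8/10/2001 00:00:00,"E",17,2,2,400,5,41,0,2,0,0,0,0,2,450'
)

# Vehicle column headings from the previous instalment; assumed to be the last 13 fields.
VEHICLE_COLUMNS = [
    "Pedal cycles", "Two wheeled motor vehicles", "Cars and taxis",
    "Buses and coaches", "Light vans", "HGVr2", "HGVr3", "HGVr4+",
    "HGVa3/4", "HGVa5", "HGVa6", "All HGV", "All motor vehicles",
]

fields = next(csv.reader([RECORD]))
counts = dict(zip(VEHICLE_COLUMNS, map(int, fields[-13:])))

# Assumed positions: 6 = count point, 14 = date, 15 = direction, 16 = hour.
day = datetime.strptime(fields[14], "%d/%m/%Y %H:%M:%S")
start = day.replace(hour=int(fields[16]))
count_uri = (
    f"http://transport.data.gov.uk/id/traffic-count-point/{fields[6]}"
    f"/direction/{fields[15]}/hour/{start.isoformat()}"
)

print(count_uri)
print(counts["Pedal cycles"])
```

Under those assumptions the two pedal cycles counted here are the value that ends up in the bicycle observation at the end of this post.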

We’ll put this into RDF bit by bit.

Areas

First, let’s look at the areas and local authorities. The kind of RDF that we want to have looks like:

<http://statistics.data.gov.uk/id/country?name=England>
  a :Country ;
  :name "England"@en .

<http://statistics.data.gov.uk/id/government-office-region/K>
  a :GovernmentOfficeRegion ;
  :name "South West"@en ;
  :code "K"^^:ONScode ;
  :containedBy <http://statistics.data.gov.uk/id/country?name=England> .

<http://statistics.data.gov.uk/id/local-authority-district/18>
  a :LocalAuthorityDistrict ;
  :code "18"^^:ONScode ;
  :code "1115"^^:DfTLAcode ;
  :localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  :containedBy <http://statistics.data.gov.uk/id/country?name=England> ;
  :containedBy <http://statistics.data.gov.uk/id/government-office-region/K> .

<http://transport.data.gov.uk/id/local-authority-district/1115>
  :sameAs <http://statistics.data.gov.uk/id/local-authority-district/18> .

<http://statistics.data.gov.uk/id/local-authority/18>
  a :LocalAuthority ;
  :name "Devon County Council"@en ;
  :code "18"^^:ONScode ;
  :code "1115"^^:DfTLAcode ;
  :localAuthorityDistrict <http://statistics.data.gov.uk/id/local-authority-district/18> .

<http://transport.data.gov.uk/id/local-authority/1115>
  :sameAs <http://statistics.data.gov.uk/id/local-authority/18> .

To work out what we need to put in our schema, we should first look at what existing vocabularies there are that could help. These areas are already defined elsewhere, so we can just use the same vocabulary for countries, regions, local authority districts and local authorities as is used there. The vocabularies that are useful here are:

  • http://statistics.data.gov.uk/def/administrative-geography/ which defines classes and properties related to administrative areas and local authorities (as described by the Office for National Statistics)
  • http://data.ordnancesurvey.co.uk/ontology/admingeo/ which also defines classes and properties related to administrative areas (as described by the Ordnance Survey)
  • http://data.ordnancesurvey.co.uk/ontology/spatialrelations/, also developed by John Goodwin at the Ordnance Survey, which defines spatial relationships between areas

There are other commonly used vocabularies that it’s helpful to know about:

  • RDFS is designed for representing RDF schemas, but it has a few general-purpose properties that are good to know, namely rdfs:label (the label for a thing) and rdfs:comment (a comment or description about the thing).
  • SKOS is designed for representing concept schemes, but again it has a few properties that can be used with any set of linked data, in particular skos:prefLabel (the preferred label for a thing), skos:altLabel (an alternative label for a thing) and skos:notation (a code for the thing).
  • OWL is designed for representing ontologies, but it has one very important property that you should know about – owl:sameAs – which is used to link two things that are the same thing.
  • XML Schema datatypes can be used within RDF, which is useful for things like dates, times, integers and so on.
  • For our purposes here, OWL-Time is going to prove useful, as it has a bunch of properties that are used to represent instants and durations.

If we look through the RDF above, the only thing that isn’t covered by these vocabularies is the DfTLAcode datatype. If we use the http://transport.data.gov.uk/def/traffic/ namespace, there’s not really any need to indicate that this is a transport-related code, so we can just call it LAcode. Let’s define that datatype:

<http://transport.data.gov.uk/def/traffic/LAcode>
  a rdfs:Datatype ;
  rdfs:label "Local Authority Code"@en .

That’s it. Now here’s the Turtle for the areas with the relevant namespaces added, and property names changed where appropriate:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix area: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix space: <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .

<http://statistics.data.gov.uk/id/country?name=England>
  a area:Country ;
  rdfs:label "England"@en .

<http://statistics.data.gov.uk/id/government-office-region/K>
  a admingeo:GovernmentOfficeRegion ;
  rdfs:label "South West"@en ;
  skos:notation "K"^^area:StandardCode ;
  area:country <http://statistics.data.gov.uk/id/country?name=England> .

<http://statistics.data.gov.uk/id/local-authority-district/18>
  a area:LocalAuthorityDistrict ;
  skos:notation "18"^^area:StandardCode ;
  skos:notation "1115"^^traffic:LAcode ;
  area:localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  area:country <http://statistics.data.gov.uk/id/country?name=England> ;
  area:region <http://statistics.data.gov.uk/id/government-office-region/K> .

<http://transport.data.gov.uk/id/local-authority-district/1115>
  owl:sameAs <http://statistics.data.gov.uk/id/local-authority-district/18> .

<http://statistics.data.gov.uk/id/local-authority/18>
  a area:LocalAuthority ;
  rdfs:label "Devon County Council"@en ;
  skos:notation "18"^^area:StandardCode ;
  skos:notation "1115"^^traffic:LAcode ;
  area:coverage <http://statistics.data.gov.uk/id/local-authority-district/18> .

<http://transport.data.gov.uk/id/local-authority/1115>
  owl:sameAs <http://statistics.data.gov.uk/id/local-authority/18> .

Roads

Here’s the kind of RDF we want to create for roads:

<http://transport.data.gov.uk/id/road/B3178>
  a :Road ;
  :code "B3178"^^:RoadNumber .

Obviously, we need a class for roads:

<http://transport.data.gov.uk/def/traffic/Road>
  a rdfs:Class ;
  rdfs:label "Road"@en .

Wherever there’s a code, I like to reuse skos:notation. But it’s important to define a datatype for the values used with that notation because (as we saw with local authorities above) there may be several different coding schemes that apply to the same Thing, and we need to be able to distinguish between them in case they clash. So:

<http://transport.data.gov.uk/def/traffic/RoadNumber>
  a rdfs:Datatype ;
  rdfs:label "Road Number"@en .

That’s all we have to define for roads; now the RDF can look like:

@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://transport.data.gov.uk/id/road/B3178>
  a traffic:Road ;
  skos:notation "B3178"^^traffic:RoadNumber .

Count Points

On to count points. Here’s the sketch of the RDF we want to create:

<http://transport.data.gov.uk/id/traffic-count-point/13>
  a :TrafficCountPoint ;
  :description "Salterton Road, EAST OF DINAN WAY, EXMOUTH"@en ;
  :code "13"^^:CountPointNumber ;
  :road <http://transport.data.gov.uk/id/road/B3178> ;
  :roadName "Salterton Road"@en ;
  :roadCategory 
    <http://transport.data.gov.uk/def/road-category/b> ,
    <http://transport.data.gov.uk/def/road-category/urban> ;
  :easting 302600 ;
  :northing 81984 ;
  :localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  :localAuthorityDistrict <http://statistics.data.gov.uk/id/local-authority-district/18> .

Of these, the description could be done with rdfs:comment. The code can be held by a skos:notation (provided we define a datatype for the count point number):

<http://transport.data.gov.uk/def/traffic/CountPointNumber>
  a rdfs:Datatype ;
  rdfs:label "Traffic Count Point Number"@en .

Properties for easting and northing are actually defined by the OS’s spatial relations ontology (although unfortunately neither the ontology nor the properties are currently resolvable; the only way you’d know this is by looking at their use in the conversion of the edubase data). Links to local authorities and local authority districts can be done using the ONS-based administrative geography ontology, which again can currently only be guessed at by looking at the online data.

That leaves us with a traffic:CountPoint class (no point calling it TrafficCountPoint if the namespace provides sufficient disambiguation):

<http://transport.data.gov.uk/def/traffic/CountPoint>
  a rdfs:Class ;
  rdfs:label "Traffic Count Point"@en .

A road property to point to a road:

<http://transport.data.gov.uk/def/traffic/road>
  a rdf:Property ;
  rdfs:label "road"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/Road> .

Note that properties are by convention named with a lowercase first letter, whereas classes are named with an uppercase first letter. It’s a good idea to follow that convention. Note also that I’ve defined an rdfs:range for this property, which means that anything that’s the object in an RDF statement that involves this property can be inferred to be a traffic:Road.
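
To make the effect of rdfs:range concrete, here is a minimal Python sketch of that single inference rule. Triples are plain tuples of prefixed names, the shortened subject and object names are made up for brevity, and apply_range is a hypothetical helper rather than part of any RDF library; a real reasoner does far more than this.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range(triples):
    """Return the rdf:type triples entailed by the rdfs:range declarations."""
    # Map each property to its declared range (one range per property, for simplicity).
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    # Every object of a statement using such a property gets the range as its type.
    return {(o, RDF_TYPE, ranges[p]) for s, p, o in triples if p in ranges}


schema = [("traffic:road", RDFS_RANGE, "traffic:Road")]
data = [("count-point/13", "traffic:road", "road/B3178")]

inferred = apply_range(schema + data)
print(inferred)
```

Note that the range does not reject anything: whatever appears as the object of traffic:road is simply concluded to be a road, so a mistaken link produces a mistaken type rather than an error.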

We need a road name property to give the name of the road at the count point:

<http://transport.data.gov.uk/def/traffic/roadName>
  a rdf:Property ;
  rdfs:label "road name"@en .

We also need a road category property to point to the categor(ies) of the road at the count point:

<http://transport.data.gov.uk/def/traffic/roadCategory>
  a rdf:Property ;
  rdfs:label "road category"@en .

You’ll remember that we defined different road categories using SKOS, such that each road category is a skos:Concept. But to give a range to the traffic:roadCategory property, we need to create a class for all the things that are categories of road. These are all skos:Concepts, and we can indicate that through an rdfs:subClassOf property:

<http://transport.data.gov.uk/def/traffic/RoadCategory>
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label "Road Category"@en .

use this as the range of the traffic:roadCategory property:

<http://transport.data.gov.uk/def/traffic/roadCategory>
  a rdf:Property ;
  rdfs:label "road category"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/RoadCategory> .

and amend the concept scheme we created to include references to this new class, for example:

<motorway> a traffic:RoadCategory ;
  skos:prefLabel "Motorway"@en ;
  skos:broader <major> ;
  skos:scopeNote "Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph."@en ;
  skos:inScheme <> .

So here is the RDF with the relevant properties properly defined:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix area: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix space: <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/> .
@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .

<http://transport.data.gov.uk/id/traffic-count-point/13>
  a traffic:CountPoint ;
  rdfs:comment "Salterton Road, EAST OF DINAN WAY, EXMOUTH"@en ;
  skos:notation "13"^^traffic:CountPointNumber ;
  traffic:road <http://transport.data.gov.uk/id/road/B3178> ;
  traffic:roadName "Salterton Road"@en ;
  traffic:roadCategory 
    <http://transport.data.gov.uk/def/road-category/b> ,
    <http://transport.data.gov.uk/def/road-category/urban> ;
  space:easting 302600 ;
  space:northing 81984 ;
  area:localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  area:district <http://statistics.data.gov.uk/id/local-authority-district/18> .

Traffic Counts

On to traffic counts. The un-namespaced RDF should look like:

<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00>
  a :TrafficCount ;
  :countPoint <http://transport.data.gov.uk/id/traffic-count-point/13> ;
  :direction <http://dbpedia.org/resource/East> ;
  :hour <http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H> .

So for that we need a class for traffic counts:

<http://transport.data.gov.uk/def/traffic/Count>
  a rdfs:Class ;
  rdfs:label "Traffic Count"@en .

a property that can link the traffic count to the count point where the count is taken:

<http://transport.data.gov.uk/def/traffic/countPoint>
  a rdf:Property ;
  rdfs:label "traffic count point"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/CountPoint> .

a property to link to the direction the traffic is flowing in (we can’t put a range on this one because the DBPedia resources we’re using don’t have a common type):

<http://transport.data.gov.uk/def/traffic/direction>
  a rdf:Property ;
  rdfs:label "traffic direction"@en .

and finally a property to link to the hour during which the measurement was taken. This last one is a very common thing to need to do, so we’d imagine that there might be an existing property defined somewhere that we could use. SDMX, which includes a standard for representing statistical information in XML, defines a REF_PERIOD field which would seem to suit our purposes, but we don’t yet have a proper mapping of SDMX into RDF (I’ve had an initial cut, but it needs some input from statisticians).

So for now, we’ll use a specific property in our own namespace; we can always indicate that it’s a sub-property of a future SDMX property at a later date. I’m going to call it countHour and give it a domain of traffic:Count to indicate that the property has a pretty specific use for providing the count for an hour. We could just give its range as a generic time:Interval, but the kind of hours that are traffic count hours are kinda special intervals: they’re obviously an hour long, but are also restricted to start and end on the hour, cover an hour between 7am and 7pm, and don’t occur in winter. So it feels like we should have a special kind of interval for that purpose:

<http://transport.data.gov.uk/def/traffic/countHour>
  a rdf:Property ;
  rdfs:label "hour of count"@en ;
  rdfs:domain <http://transport.data.gov.uk/def/traffic/Count> ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/CountHour> .

<http://transport.data.gov.uk/def/traffic/CountHour>
  a rdfs:Class ;
  rdfs:subClassOf time:Interval ;
  rdfs:label "Count Hour"@en .
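
The restrictions that make a count hour special can be written down as a small check. This is only a sketch in Python: the function names are hypothetical, the "not in winter" restriction is left out because the months it covers aren't spelled out here, and the trailing Z simply copies the URI pattern used in the examples.

```python
from datetime import datetime, timedelta


def is_count_hour(start: datetime, end: datetime) -> bool:
    """Check the restrictions described above, apart from the seasonal one."""
    on_the_hour = start.minute == 0 and start.second == 0 and start.microsecond == 0
    one_hour_long = end - start == timedelta(hours=1)
    # "Between 7am and 7pm": the hour may start at 07:00 at the earliest, 18:00 at the latest.
    in_daytime = 7 <= start.hour < 19
    return on_the_hour and one_hour_long and in_daytime


def count_hour_uri(start: datetime) -> str:
    """Build an interval URI following the pattern used in this post."""
    return f"http://placetime.com/interval/gregorian/{start:%Y-%m-%dT%H:%M:%S}Z/PT1H"


start = datetime(2001, 10, 8, 17)
print(is_count_hour(start, start + timedelta(hours=1)))
print(count_hour_uri(start))
```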

All those properties were in the traffic namespace, so here’s the RDF with it added:

@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .

<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00>
  a traffic:Count ;
  traffic:countPoint <http://transport.data.gov.uk/id/traffic-count-point/13> ;
  traffic:direction <http://dbpedia.org/resource/East> ;
  traffic:countHour <http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H> .

Cardinal Directions

As I discussed in the last instalment, we’re not actually going to mint URIs for cardinal directions, but that doesn’t mean we can’t make statements about them in the RDF we generate. As I’ll discuss in more depth in the next instalment, it’s always good to provide a label at the very least:

<http://dbpedia.org/resource/East>
  rdfs:label "East"@en .

Intervals and Instants

Let’s look now at the RDF we want to generate about the hour during which the count was taken. As I’ve said above, these hours are a special kind of interval, and we’ve already created a class for them. I also discussed earlier that the things about this interval that are really useful for the purposes of querying are the year during which the count was taken and the hour at which it was taken, so we should pull out at least those pieces of information. Time-based data can be represented in RDF using the OWL-Time ontology.

Unfortunately, expressing time very specifically gets verbose. This is what the statements we want to make look like using OWL-Time:

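@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .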

<http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H>
  a traffic:CountHour ;
  rdfs:label "8 Oct 2001, 17:00-18:00"@en ;
  time:hasBeginning <http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z> ;
  time:hasEnd <http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z> ;
  time:hasDurationDescription _:OneHour ;
  time:intervalDuring <http://dbpedia.org/resource/2001> .

_:OneHour a time:DurationDescription ;
  rdfs:label "one hour"@en ;
  time:years 0 ;
  time:months 0 ;
  time:days 0 ;
  time:hours 1 ;
  time:minutes 0 ;
  time:seconds 0 .

<http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z>
  a time:Instant ;
  rdfs:label "8 Oct 2001, 17:00"@en ;
  time:inXSDDateTime "2001-10-08T17:00:00Z"^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year "2001"^^xsd:gYear ;
    time:month "--10"^^xsd:gMonth ;
    time:day "---08"^^xsd:gDay ;
    time:hour 17 ;
  ] .

<http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z>
  a time:Instant ;
  rdfs:label "8 Oct 2001, 18:00"@en ;
  time:inXSDDateTime "2001-10-08T18:00:00Z"^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year "2001"^^xsd:gYear ;
    time:month "--10"^^xsd:gMonth ;
    time:day "---08"^^xsd:gDay ;
    time:hour 18 ;
  ] .

<http://dbpedia.org/resource/2001>
  a time:Interval ;
  rdfs:label "2001" ;
  rdf:value "2001"^^xsd:gYear ;
  time:intervalEquals <http://placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y> .
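
Since every instant follows the same shape, the repetitive parts are easy to generate. Here is a minimal Python sketch that produces the Turtle for one instant from a datetime; the function names are hypothetical, the placetime.com URI pattern is copied from the examples above, and the time is assumed to be UTC.

```python
from datetime import datetime


def instant_uri(dt: datetime) -> str:
    return f"http://placetime.com/instant/gregorian/{dt:%Y-%m-%dT%H:%M:%S}Z"


def date_time_description(dt: datetime) -> dict:
    """Map OWL-Time properties to Turtle literals, using the XML Schema lexical forms."""
    return {
        "time:year": f'"{dt:%Y}"^^xsd:gYear',
        "time:month": f'"--{dt:%m}"^^xsd:gMonth',  # gMonth needs two leading hyphens
        "time:day": f'"---{dt:%d}"^^xsd:gDay',     # gDay needs three
        "time:hour": str(dt.hour),
    }


def instant_turtle(dt: datetime) -> str:
    lines = [
        f"<{instant_uri(dt)}>",
        "  a time:Instant ;",
        f'  time:inXSDDateTime "{dt:%Y-%m-%dT%H:%M:%S}Z"^^xsd:dateTime ;',
        "  time:inDateTime [",
        "    a time:DateTimeDescription ;",
        "    time:unitType time:unitHour ;",
    ]
    for prop, value in date_time_description(dt).items():
        lines.append(f"    {prop} {value} ;")
    lines.append("  ] .")
    return "\n".join(lines)


print(instant_turtle(datetime(2001, 10, 8, 17)))
```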

Observations

Finally we’re on to the observations themselves. The un-namespaced RDF looks like:

<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle>
  a :Observation ;
  :count <http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00> ;
  :vehicleType <http://transport.data.gov.uk/def/vehicle/bicycle> ;
  :value 2 .

The SCOVO vocabulary exists to represent statistical information like this. In SCOVO, observations are called scovo:Items, the value of the statistical measure itself (the count in this case) should be held in the rdf:value property, and any other properties should be sub-properties of scovo:dimension, which has a range of scovo:Dimension.

To fit in with SCOVO, then, we need to have the pointer to the count that this observation belongs to as a property that is a sub-property of scovo:dimension:

<http://transport.data.gov.uk/def/traffic/count>
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label "count"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/Count> .

We might be tempted to indicate that the type of thing pointed to by the traffic:count property is a subclass of scovo:Dimension, but this is unnecessary and probably untrue: there might exist some traffic counts that aren’t dimensions, and the ones that are linked to by the traffic:count property can be inferred to be dimensions anyway.
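
That inference takes two steps: the sub-property relationship first, then the range of the super-property. Here is a minimal Python sketch of just those two rules, assuming scovo:dimension declares scovo:Dimension as the type of the things it points to. Triples are plain tuples, the short names obs and count are stand-ins for the long URIs, and infer is a hypothetical helper, not a real reasoner.

```python
def infer(triples):
    """Apply the rdfs:subPropertyOf and rdfs:range rules until nothing new appears."""
    sub_props = {(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"}
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))            # statement also holds for the super-property
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, "rdf:type", cls))   # object gets the declared range as its type
        changed = not new <= facts
        facts |= new
    return facts - set(triples)


schema = [
    ("traffic:count", "rdfs:subPropertyOf", "scovo:dimension"),
    ("traffic:count", "rdfs:range", "traffic:Count"),
    ("scovo:dimension", "rdfs:range", "scovo:Dimension"),
]
data = [("obs", "traffic:count", "count")]

for triple in sorted(infer(schema + data)):
    print(triple)
```

Only the count that an observation actually points to picks up the scovo:Dimension type; a traffic count that no observation links to stays a plain traffic:Count.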

Similarly, the property that provides the pointer to the vehicle type should be a sub-property of scovo:dimension and we need a class for those various vehicle types in order to restrict the range of that property:

<http://transport.data.gov.uk/def/traffic/vehicleType>
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label "vehicle type"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/VehicleType> .

<http://transport.data.gov.uk/def/traffic/VehicleType>
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label "Vehicle Type"@en .

Of course all the concepts that we created for the vehicle types need to be designated as instances of this new traffic:VehicleType class:

<bicycle> a traffic:VehicleType ;
  ... .

So, the RDF with the proper namespaces is:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .

<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle>
  a scovo:Item ;
  traffic:count <http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00> ;
  traffic:vehicleType <http://transport.data.gov.uk/def/vehicle/bicycle> ;
  rdf:value 2 .

That concludes our initial walkthrough of the data to create a vocabulary. I’ve duplicated the schema and the example data below so that it’s all in one place. But it’s not quite done. In the next instalment, I’ll look at adding some finishing touches that make the RDF easier to use.


Schema

This is the full schema. It contains just six classes, eight properties and three datatypes at the moment, so it’s pretty small as vocabularies go. We’ve been able to reuse a lot of classes, properties and datatypes that have already been defined elsewhere in the RDF itself, so this vocabulary is pretty focused on just what we need to describe traffic counts.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix time: <http://www.w3.org/2006/time#> .

# Classes #

<http://transport.data.gov.uk/def/traffic/Road>
  a rdfs:Class ;
  rdfs:label "Road"@en .

<http://transport.data.gov.uk/def/traffic/CountPoint>
  a rdfs:Class ;
  rdfs:label "Traffic Count Point"@en .

<http://transport.data.gov.uk/def/traffic/Count>
  a rdfs:Class ;
  rdfs:label "Traffic Count"@en .

<http://transport.data.gov.uk/def/traffic/RoadCategory>
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label "Road Category"@en .

<http://transport.data.gov.uk/def/traffic/CountHour>
  a rdfs:Class ;
  rdfs:subClassOf time:Interval ;
  rdfs:label "Count Hour"@en .

<http://transport.data.gov.uk/def/traffic/VehicleType>
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label "Vehicle Type"@en .

# Properties #

<http://transport.data.gov.uk/def/traffic/road>
  a rdf:Property ;
  rdfs:label "road"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/Road> .

<http://transport.data.gov.uk/def/traffic/roadName>
  a rdf:Property ;
  rdfs:label "road name"@en .

<http://transport.data.gov.uk/def/traffic/countPoint>
  a rdf:Property ;
  rdfs:label "traffic count point"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/CountPoint> .

<http://transport.data.gov.uk/def/traffic/count>
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label "count"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/Count> .

<http://transport.data.gov.uk/def/traffic/roadCategory>
  a rdf:Property ;
  rdfs:label "road category"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/RoadCategory> .

<http://transport.data.gov.uk/def/traffic/direction>
  a rdf:Property ;
  rdfs:label "traffic direction"@en .

<http://transport.data.gov.uk/def/traffic/countHour>
  a rdf:Property ;
  rdfs:label "hour of count"@en ;
  rdfs:domain <http://transport.data.gov.uk/def/traffic/Count> ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/CountHour> .

<http://transport.data.gov.uk/def/traffic/vehicleType>
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label "vehicle type"@en ;
  rdfs:range <http://transport.data.gov.uk/def/traffic/VehicleType> .

# Datatypes #

<http://transport.data.gov.uk/def/traffic/LAcode>
  a rdfs:Datatype ;
  rdfs:label "Local Authority Code"@en .

<http://transport.data.gov.uk/def/traffic/RoadNumber>
  a rdfs:Datatype ;
  rdfs:label "Road Number"@en .

<http://transport.data.gov.uk/def/traffic/CountPointNumber>
  a rdfs:Datatype ;
  rdfs:label "Traffic Count Point Number"@en .

RDF Data

Here’s a sample set of data. It looks like rather a lot to simply describe the number of bicycles at a particular point on a road (and it doesn’t even include the SKOS concept schemes that we did last time), but (a) it all provides valuable context for that measurement and (b) most of it will be reused by a lot of other measurements.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix scovo: <http://purl.org/NET/scovo#> .
@prefix area: <http://statistics.data.gov.uk/def/administrative-geography/> .
@prefix admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/> .
@prefix space: <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/> .
@prefix traffic: <http://transport.data.gov.uk/def/traffic/> .

<http://statistics.data.gov.uk/id/country?name=England>
  a area:Country ;
  rdfs:label "England"@en .
  
<http://statistics.data.gov.uk/id/government-office-region/K>
  a admingeo:GovernmentOfficeRegion ;
  rdfs:label "South West"@en ;
  skos:notation "K"^^area:StandardCode ;
  area:country <http://statistics.data.gov.uk/id/country?name=England> .
  
<http://statistics.data.gov.uk/id/local-authority-district/18>
  a area:LocalAuthorityDistrict ;
  skos:notation "18"^^area:StandardCode ;
  skos:notation "1115"^^traffic:LAcode ;
  area:localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  area:country <http://statistics.data.gov.uk/id/country?name=England> ;
  area:region <http://statistics.data.gov.uk/id/government-office-region/K> .
  
<http://transport.data.gov.uk/id/local-authority-district/1115>
  owl:sameAs <http://statistics.data.gov.uk/id/local-authority-district/18> .
  
<http://statistics.data.gov.uk/id/local-authority/18>
  a area:LocalAuthority ;
  rdfs:label "Devon County Council"@en ;
  skos:notation "18"^^area:StandardCode ;
  skos:notation "1115"^^traffic:LAcode ;
  area:coverage <http://statistics.data.gov.uk/id/local-authority-district/18> .
  
<http://transport.data.gov.uk/id/local-authority/1115>
  owl:sameAs <http://statistics.data.gov.uk/id/local-authority/18> .

<http://transport.data.gov.uk/id/road/B3178>
  a traffic:Road ;
  skos:notation "B3178"^^traffic:RoadNumber .
  
<http://transport.data.gov.uk/id/traffic-count-point/13>
  a traffic:CountPoint ;
  rdfs:comment "Salterton Road, EAST OF DINAN WAY, EXMOUTH"@en ;
  skos:notation "13"^^traffic:CountPointNumber ;
  traffic:road <http://transport.data.gov.uk/id/road/B3178> ;
  traffic:roadName "Salterton Road"@en ;
  traffic:roadCategory 
    <http://transport.data.gov.uk/def/road-category/b> ,
    <http://transport.data.gov.uk/def/road-category/urban> ;
  space:easting 302600 ;
  space:northing 81984 ;
  area:localAuthority <http://statistics.data.gov.uk/id/local-authority/18> ;
  area:district <http://statistics.data.gov.uk/id/local-authority-district/18> .

<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00>
  a traffic:Count ;
  traffic:countPoint <http://transport.data.gov.uk/id/traffic-count-point/13> ;
  traffic:direction <http://dbpedia.org/resource/East> ;
  traffic:countHour <http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H> .

<http://dbpedia.org/resource/East>
  rdfs:label "East"@en .

<http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H>
  a traffic:CountHour ;
  rdfs:label "8 Oct 2001, 17:00-18:00"@en ;
  time:hasBeginning <http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z> ;
  time:hasEnd <http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z> ;
  time:hasDurationDescription _:OneHour ;
  time:intervalDuring <http://dbpedia.org/resource/2001> .
  
_:OneHour a time:DurationDescription ;
  rdfs:label "one hour"@en ;
  time:years 0 ;
  time:months 0 ;
  time:days 0 ;
  time:hours 1 ;
  time:minutes 0 ;
  time:seconds 0 .
  
<http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z>
  a time:Instant ;
  rdfs:label "8 Oct 2001, 17:00"@en ;
  time:inXSDDateTime "2001-10-08T17:00:00Z"^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year "2001"^^xsd:gYear ;
    time:month "--10"^^xsd:gMonth ;
    time:day "---08"^^xsd:gDay ;
    time:hour 17 ;
  ] .
  
<http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z>
  a time:Instant ;
  rdfs:label "8 Oct 2001, 18:00"@en ;
  time:inXSDDateTime "2001-10-08T18:00:00Z"^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year "2001"^^xsd:gYear ;
    time:month "--10"^^xsd:gMonth ;
    time:day "---08"^^xsd:gDay ;
    time:hour 18 ;
  ] .
  
<http://dbpedia.org/resource/2001>
  a time:Interval ;
  rdfs:label "2001" ;
  rdf:value "2001"^^xsd:gYear ;
  time:intervalEquals <http://placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y> .
  
<http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle>
  a scovo:Item ;
  traffic:count <http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00> ;
  traffic:vehicleType <http://transport.data.gov.uk/def/vehicle/bicycle> ;
  rdf:value 2 .    

Creating Linked Data - Part III: Defining Concept Schemes

Nov 22, 2009

This is the third instalment in a series that I’m writing about turning data into linked data. I’m using traffic count data as the example, since that’s a dataset that I’m currently working on. In the last two instalments, I talked about analysing and modelling the data and about designing URIs for the things in that model.

Within the model, there are three sets of things that are concepts:

  • road categories
  • vehicle types
  • cardinal directions

As I discussed last time, cardinal directions have URIs defined within DBPedia which are good enough for our purposes. The categorisation of roads and vehicles, on the other hand, is something specific to UK transport data, so they are up to us to define.

There’s a really useful RDF vocabulary called SKOS which is designed precisely for defining the kind of concept schemes that we want to use here. SKOS provides classes for concepts, concept schemes and collections (groupings of concepts within a scheme), and properties for linking them and providing labels, codes, definitions and so forth. Many of the SKOS properties can be used outside concept schemes – for example skos:prefLabel can be used anywhere you want to indicate the preferred label for a thing – so it’s good to get to know them.

Vehicle Types

Before we dive into RDF, let’s take some time to understand the classification that we need to model. We’re modelling vehicle types because counts are made of each different type of vehicle passing a traffic count point over a particular hour. Within the CSV data, the relevant column headings are:

  • Pedal cycles
  • Two wheeled motor vehicles
  • Cars and taxis
  • Buses and coaches
  • Light vans
  • HGVr2
  • HGVr3
  • HGVr4+
  • HGVa3/4
  • HGVa5
  • HGVa6
  • All HGV
  • All motor vehicles

These classifications are detailed in the Department for Transport documentation of the dataset. It’s clear that it’s not a flat classification, but can be arranged into a hierarchy as follows:

+- Pedal cycles
+- All motor vehicles
   +- Two wheeled motor vehicles
   +- Cars and taxis
   +- Buses and coaches
   +- Light vans
   +- All HGV
      +- Rigid HGV
      |  +- HGVr2
      |  +- HGVr3
      |  +- HGVr4+
      +- Articulated HGV
         +- HGVa3/4
         +- HGVa5
         +- HGVa6

So all we have to do is define that in SKOS. We’ve already decided that the URIs will look like:

http://transport.data.gov.uk/def/vehicle-category/{type}

so for URI-hackability reasons we’ll call the concept scheme:

http://transport.data.gov.uk/def/vehicle-category/

It’s probably easiest to just show what the concept scheme looks like. This is in Turtle.

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@base <http://transport.data.gov.uk/def/vehicle-category/> .

<> a skos:ConceptScheme ;
  skos:prefLabel "Vehicle Types"@en ;
  skos:hasTopConcept <bicycle> ;
  skos:hasTopConcept <motor-vehicle> .
...
<motor-vehicle> a skos:Concept ;
  skos:prefLabel "Motor Vehicle"@en ;
  skos:topConceptOf <> ;
  skos:narrower <motorbike> ;
  skos:narrower <car> ;
  skos:narrower <bus> ;
  skos:narrower <van> ;
  skos:narrower <HGV> .
...
<HGV> a skos:Concept ;
  skos:prefLabel "Heavy Goods Vehicle"@en ;
  skos:altLabel "HGV"@en ;
  skos:definition "Goods vehicles over 3,500 kgs gross vehicle weight."@en ;
  skos:scopeNote "Includes tractors (without trailers), road rollers, box vans and similar large vans. A two axle motor tractive unit without trailer is also included."@en ;
  skos:broader <motor-vehicle> ;
  skos:narrower <HGVr> ;
  skos:narrower <HGVa> ;
  skos:inScheme <> .
...

The properties shown here are:

  • skos:prefLabel - the preferred label for something; there can only be one in any given language
  • skos:altLabel - an alternative label for the thing; there can be any number
  • skos:definition - provides a definition of the term
  • skos:scopeNote - provides information about the scope of the term (eg what’s included or excluded)
  • skos:broader/skos:narrower - link together concepts into a hierarchy
  • skos:hasTopConcept/skos:topConceptOf - links together the concept schemes and the concepts at the top of the concept hierarchy defined within the scheme
  • skos:inScheme - points from a concept to the concept scheme it’s defined in; it’s necessary to use either this or skos:topConceptOf on every skos:Concept, otherwise it’s not clear which concept scheme they belong to
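
The snippet above elides the concepts below <HGV>. As a rough sketch of how the next level down might look, here is one possible definition of the rigid HGV branch. The identifiers <HGVr2>, <HGVr3> and <HGVr4> and all the labels are my assumptions for illustration (the labels simply reuse the names from the hierarchy and the CSV column headings); only <HGVr> and <HGV> appear in the scheme shown above.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@base <http://transport.data.gov.uk/def/vehicle-category/> .

# Assumption: child identifiers and labels are illustrative, not the published ones.
<HGVr> a skos:Concept ;
  skos:prefLabel "Rigid HGV"@en ;
  skos:broader <HGV> ;
  skos:narrower <HGVr2> ;
  skos:narrower <HGVr3> ;
  skos:narrower <HGVr4> ;
  skos:inScheme <> .

# A leaf concept: it has a broader concept but no narrower ones.
<HGVr2> a skos:Concept ;
  skos:prefLabel "HGVr2"@en ;
  skos:broader <HGVr> ;
  skos:inScheme <> .
```

Each concept below the top level carries skos:inScheme rather than skos:topConceptOf, following the same pattern as <HGV>.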

Note that in the RDF I’ve assigned every string a language (English). That’s good practice when values are textual; a Welsh translation could be provided for each one as well, for example.
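
To illustrate, a concept can carry one preferred label per language. This is only a sketch: the English label for <bus> is assumed (the snippet above doesn’t show it), and the Welsh label is my own addition rather than something taken from the dataset.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@base <http://transport.data.gov.uk/def/vehicle-category/> .

# Assumption: both labels are illustrative; one skos:prefLabel per language tag.
<bus> skos:prefLabel "Bus"@en ;
  skos:prefLabel "Bws"@cy .
```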

Road Categories

Road categories are also described within the documentation for this dataset. The hierarchy is shown in the documentation as:

+- Major Roads
|  +- Motorways
|  |  +- Trunk
|  |  +- Principal
|  +- A Roads
|     +- Trunk
|     |  +- Urban
|     |  +- Rural
|     +- Principal
|        +- Urban
|        +- Rural
+- Minor Roads
   +- B Roads
   |  +- Urban
   |  +- Rural
   +- C Roads
   |  +- Urban
   |  +- Rural
   +- Unclassified Roads
      +- Urban
      +- Rural

But this is actually the result of three sets of overlapping concepts:

  • roads by classification (major/minor, motorway/A/B/C/unclassified)
  • roads by locale (urban/rural)
  • major roads by maintenance responsibility (trunk/principal)

These kinds of subdivisions of concepts can be managed in SKOS through skos:Collections, which group together concepts without being broader than those concepts. Here’s a snippet from the concept scheme that shows how this works.

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@base <http://transport.data.gov.uk/def/road-category/> .

<> a skos:ConceptScheme ;
  skos:prefLabel "Road Categories"@en ;
  skos:hasTopConcept <major> ;
  skos:hasTopConcept <minor> ;
  skos:hasTopConcept <urban> ;
  skos:hasTopConcept <rural> .

<classification> a skos:Collection ;
  skos:prefLabel "Road by Classification"@en ;
  skos:member <major> ;
  skos:member <minor> .

<maintenance> a skos:Collection ;
  skos:prefLabel "Major Road by Maintenance Responsibility"@en ;
  skos:member <principal> ;
  skos:member <trunk> .

<major> a skos:Concept ;
  skos:prefLabel "Major Road"@en ;
  skos:altLabel "Major"@en ;
  skos:scopeNote "Includes motorways and A roads. These roads usually have high traffic flows and are often the main arteries to major destinations."@en ;
  skos:narrower <motorway> ;
  skos:narrower <a> ;
  skos:narrower <principal> ;
  skos:narrower <trunk> ;
  skos:topConceptOf <> .

<motorway> a skos:Concept ;
  skos:prefLabel "Motorway"@en ;
  skos:broader <major> ;
  skos:scopeNote "Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph."@en ;
  skos:inScheme <> .
...
<trunk> a skos:Concept ;
  skos:prefLabel "Trunk Road"@en ;
  skos:altLabel "Trunk"@en ;
  skos:scopeNote "Most motorways and many of the long distance rural A roads are trunk roads."@en ;
  skos:note "The responsibility for the maintenance of trunk roads lies with the Secretary of State and they are managed by the Highways Agency in England, the National Assembly of Wales in Wales and the Scottish Executive in Scotland (National Through Routes)."@en ;
  skos:broader <major> ;
  skos:inScheme <> .
...

In a hierarchy, these multiple overlapping concepts can be shown as:

+- <Road by Classification>
|  +- Major Road
|  |  +- <Major Road by Classification>
|  |  |  +- Motorway
|  |  |  +- A Road
|  |  +- <Major Road by Maintenance Responsibility>
|  |     +- Principal Road
|  |     +- Trunk Road
|  +- Minor Road
|     +- B Road
|     +- C Road
|     +- Unclassified Road
+- <Road by Locale>
   +- Urban Road
   +- Rural Road
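
Two of the collections in this hierarchy, “Road by Locale” and “Major Road by Classification”, aren’t in the snippet above. A minimal sketch of how they might be defined follows; the identifiers <locale> and <major-classification> are assumptions for illustration, while the member concepts are the ones already used in the scheme.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@base <http://transport.data.gov.uk/def/road-category/> .

# Assumption: the collection identifiers are illustrative, not the published ones.
<locale> a skos:Collection ;
  skos:prefLabel "Road by Locale"@en ;
  skos:member <urban> ;
  skos:member <rural> .

<major-classification> a skos:Collection ;
  skos:prefLabel "Major Road by Classification"@en ;
  skos:member <motorway> ;
  skos:member <a> .
```

The collections only group their members. The skos:broader and skos:narrower links still run directly between concepts, so <motorway> remains narrower than <major> whichever collection it is listed in.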

That’s our concept schemes done. Next it will be time to turn to defining a vocabulary for the particular things that we want to describe from this dataset.

Creating Linked Data - Part II: Defining URIs

Nov 22, 2009

This is the second instalment in a series of posts about how to create linked data from existing data sets, using traffic count data as an example. In the last instalment, I talked about analysing and modelling data. This instalment discusses the creation of URIs for the various things that have been identified within the model.

This part of the process is the same as what you’d do if you were simply creating a RESTful API to a website. The principle is that everything has a URI, and if you resolve that URI you get information about the thing.

Creating Linked Data - Part I: Analysing and Modelling

Nov 22, 2009

One of the goals of the government’s Data Project is to equip the people who own data with the capability to publish it as linked data. There’s an overwhelming amount of work to do here from providing tool support to changing a culture that makes it hard to publish data. But we can start by taking some baby steps that simply explain what’s involved in turning existing data into linked data.

I’m currently reworking the traffic count linked data that I first transformed back in September, and I thought it would be helpful to talk through that process for several reasons:

  • to give people using the traffic count data more insight into how it fits together
  • so that other people can follow it as they transform their own data
  • so that tool providers can spot some of the places where tools might help

Rather than creating one massive blog post, I’m going to break it down into several steps. These are:

  1. analysing and modelling
  2. defining URIs
  3. defining concept schemes
  4. defining classes, properties and datatypes
  5. adding finishing touches

This is the first instalment.

Publishing Information About Inward Links

Nov 8, 2009

In the Linked Data world, we talk a lot about having URIs that are identifiers for things, and making them HTTP URIs so that they can be dereferenced and people can find more information about those things.

This raises the question of “what information should you publish?” Let’s make this concrete by using a real example: UK Legislation, which TSO is publishing for OPSI as Linked Data.

UK Legislation now has a set of URIs that are explicitly intended to be used as unique identifiers for items of legislation and parts, sections, subsections and so on within them. If you request one of these URIs and ask for RDF/XML, you will get some information about that bit of legislation, such as:

  • bibliographic metadata such as its title, publisher, created date and so on
  • links to other related sections or items of legislation
  • links to particular versions of that bit of legislation

So we provide some basic information, and the links we know about, ie those within UK Legislation.

It turns out that lots of things aside from UK Legislation reference legislation, and that when you publish information about them it’s helpful to be able to point to the relevant legislation. For example:

  • the Home Office relate offences to sections of legislation that state that a particular activity is illegal and has a certain maximum penalty
  • local authorities are bound to provide certain services by law, so there’s a natural pointer from the definition of a service to that law
  • administrative areas such as counties and local authorities are defined by law, so when the Ordnance Survey publish information about those areas, it helps to point to the law in which their names are legally defined as the authority on which their statements are based
  • the publication of notices posted within the London Gazette is enforced by legislation, and the text of the notices usually indicates which piece of legislation caused the notice to be published

These are all inward pointers. As we publish information about UK Legislation, we won’t know about all these links to the information we publish. But people who access information about UK Legislation might well want to know about those links. Wouldn’t it be useful to know – given an item of legislation – what it makes illegal, what it compels local authorities to do, which administrative areas it defines, which notices it has caused to be published?

We were discussing the same issue the other day in respect of spatial objects. The Ordnance Survey, or other organisations peddling spatial data, may define spatial objects, but other people define the things that those spatial objects represent, such as schools, roads, parks and so on. It’s obviously useful to go from a school to the spatial objects that represent its buildings, but it would also be useful to go from a spatial object that is a school building to the school.

So what should we, as publishers, do about the inward links (that we know about)? When we publish information about something should we also try to publish information about the things that (we know) reference that thing? I think the answer’s “yes,” at the very least in any human-readable access we give to the information. And from that come two further thoughts:

  • If you are publishing data with outward links, it would be a good idea to provide feeds or other mechanisms that enable people to pull in basic information about the things that you’re publishing that link to something they’re publishing. SPARQL queries would do, but something a bit less general purpose and more approachable – I’m thinking a URL like http://example.org/links?url=http://example.net/linked/resource – would be better.

  • Information from another source is going to have different provenance/trust etc characteristics than the primary information you publish. That needs to be clearly indicated somehow; sounds to me like a requirement for named graphs.
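
To make the first of these thoughts a little more concrete, here is a sketch of the kind of basic information such a links URL might return, written in Turtle. Everything in it is hypothetical: the example.org resource, its label, and the choice of dct:references as the linking property are assumptions for illustration, not a description of any existing service.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical response to
# http://example.org/links?url=http://example.net/linked/resource
# listing a resource published at example.org that points at the given URL.
<http://example.org/id/thing/1>
  rdfs:label "An example thing that references the resource"@en ;
  dct:references <http://example.net/linked/resource> .
```

A publisher pulling this in would still need to record where the statements came from, which is where the second thought comes in.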