<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>rdf</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/31</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Microdata + RDF</title>
 <link>http://www.jenitennison.com/blog/node/162</link>
 <description>&lt;p&gt;As part of the ongoing discussion about how to reconcile RDFa and microdata (if at all), &lt;a href=&quot;http://webr3.org/blog/&quot;&gt;Nathan Rixham&lt;/a&gt; has put together a suggested &lt;a href=&quot;http://www.w3.org/wiki/Microdata_RDFa_Merge&quot;&gt;Microdata RDFa Merge&lt;/a&gt; which brings together parts of &lt;a href=&quot;http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html&quot;&gt;microdata&lt;/a&gt; and parts of &lt;a href=&quot;http://www.w3.org/TR/rdfa-core/&quot;&gt;RDFa&lt;/a&gt;, creating a completely new set of attributes, but a parsing model that more or less follows microdata&amp;#8217;s.&lt;/p&gt;

&lt;p&gt;Here I want to put forward another possibility for the debate. I should say that this is just some noodling on my part as a way of exploring options, not any kind of official position on behalf of the W3C or the TAG or any other body that you might associate me with, nor even a decided position on my part.&lt;/p&gt;

&lt;h2&gt;Simplifying RDFa&lt;/h2&gt;

&lt;p&gt;As &lt;a href=&quot;http://www.jenitennison.com/blog/node/103&quot;&gt;I&amp;#8217;ve said before&lt;/a&gt;, RDFa, in my experience, is complicated not primarily because of the whole namespaces/CURIEs issue but because its processing model tries to be too clever. RDFa was designed to largely fit in with existing markup and turn it into embedded data &amp;#8220;just&amp;#8221; by adding a few attributes here and there. Thus a simple image like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;photo1.jpg&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;is first marked up to indicate that it&amp;#8217;s an image:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;photo1.jpg&quot; typeof=&quot;foaf:Image&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;then to provide its license:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;photo1.jpg&quot; typeof=&quot;foaf:Image&quot;
  rel=&quot;license&quot; resource=&quot;http://creativecommons.org/licenses/by/2.0/&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and finally to add a title:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;photo1.jpg&quot; typeof=&quot;foaf:Image&quot;
  rel=&quot;license&quot; resource=&quot;http://creativecommons.org/licenses/by/2.0/&quot;
  property=&quot;dc:title&quot; content=&quot;A Pretty Picture&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;all by adding attributes to the one &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; element. The trouble with this approach is that the rules about how statements are made become extremely complex: they depend on context (e.g. what other attributes are present, what the parent element has on it, what content the element has) and apply defaults in ways that are hard to remember.&lt;/p&gt;

&lt;p&gt;Even having written an RDFa parser, having written code to mark up documents with RDFa, having &lt;em&gt;taught&lt;/em&gt; it, I still cannot write RDFa past a trivial example and be 100% sure that it will produce what I was aiming to produce.&lt;/p&gt;

&lt;p&gt;If we want to really simplify RDFa, rather than make cosmetic changes, we need to address this complexity. That would certainly mean backwards-incompatible changes, such as dropping the use of particular attributes and revising the way the processing model works, such that future RDFa processors couldn&amp;#8217;t be used on RDFa 1.0. There are two possible ways of approaching this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;retaining some backwards compatibility, and aiming for a simplified subset of RDFa 1.0 such that RDFa 1.0 processors would still get the intended triples out of data marked up with RDFa 1.1&lt;/li&gt;
&lt;li&gt;dropping backwards compatibility entirely and using completely different attributes, essentially creating a new language&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I do not know which of these routes is the best one to take.&lt;/p&gt;

&lt;p&gt;My instinct is that the first will be hard to do. For example, there are already certain simplifications in RDFa 1.1 &amp;#8212; such as assuming an element with no &lt;code&gt;datatype&lt;/code&gt; attribute is giving a string value rather than looking to see if there are any non-text-nodes in the content of the element &amp;#8212; which lead to markup that will not be processed correctly by RDFa 1.0 processors. Perhaps that could be addressed by rewriting history: creating an RDFa 1.0 Second Edition that includes any changes that are needed to make a simple subset viable.&lt;/p&gt;

&lt;p&gt;What I want to explore here is what the second route &amp;#8212; using entirely different attributes from those currently used in RDFa 1.0 &amp;#8212; might mean. I think that in this case the substantial difference between microdata and this new language would be support for that much-derided requirement: decentralised extensibility.&lt;/p&gt;

&lt;h2&gt;Adding Decentralised Extensibility to Microdata&lt;/h2&gt;

&lt;p&gt;As I discussed &lt;a href=&quot;http://www.jenitennison.com/blog/node/161&quot;&gt;earlier in the week&lt;/a&gt;, microdata is simply not designed for use in a web where publishers might want to use multiple vocabularies to mark up the same thing for different consumers. This focus is very probably the right one for the majority of uses, where publishers address single consumers or everyone has standardised on a single vocabulary. It&amp;#8217;s certainly an assumption that keeps the markup simple.&lt;/p&gt;

&lt;p&gt;However, there is a larger data web out there. It&amp;#8217;s not just browsers and search engines that might look for and process data embedded within a page. Unlike with HTML, those few, large consumers don&amp;#8217;t have to understand a particular vocabulary for other consumers to get valuable information from it. If you operate in a world of multiple consumers with different requirements, you need decentralised extensibility. And support for decentralised extensibility is RDF&amp;#8217;s niche as a data model, its unique selling point.&lt;/p&gt;

&lt;p&gt;Given that a new language would have to use a different processing model from RDFa 1.0, I would suggest that it simply uses microdata&amp;#8217;s as a starting point. Using attributes from RDFa 1.0 would only cause conflicts with RDFa 1.0 processors. Microdata processing is there, already defined, already implemented. It isn&amp;#8217;t going to go away. And you know, &lt;em&gt;it&amp;#8217;s pretty good&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The &amp;#8216;new language&amp;#8217; would then be not so much a &amp;#8216;new language&amp;#8217; as an enhancement of something that already exists. It would be a set of additions that augment the data that is generated from normal microdata processing with a few extra features that are useful in a world where there are multiple vocabularies for the same domain, where publishers have to provide data to multiple consumers, where an RDF view of data is useful. Call it microdata+RDF.&lt;/p&gt;

&lt;p&gt;So what would we need to add? Well, there are three things, I think, that make microdata hard to use in a decentralised world, and make it hard to generate good RDF from microdata markup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;lack of support for multiple types&lt;/li&gt;
&lt;li&gt;scoping of properties by type&lt;/li&gt;
&lt;li&gt;lack of datatypes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We would need to find a way to add these for use within the RDF extracted from the microdata markup such that a basic microdata parser would still generate the same JSON, and such that microdata&amp;#8217;s DOM API would work as specified in the microdata spec. So we can&amp;#8217;t change the types of values that are possible in microdata&amp;#8217;s attributes or how they&amp;#8217;re interpreted in the DOM API.&lt;/p&gt;

&lt;h3&gt;Multiple Types&lt;/h3&gt;

&lt;p&gt;Because of the restriction I just mentioned about not touching microdata itself, we can&amp;#8217;t simply make &lt;code&gt;itemtype&lt;/code&gt; take multiple URLs. We could rely on &lt;code&gt;itemprop=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&quot;&lt;/code&gt; as a mechanism for providing types for use by RDF processors, but I think that the type of a thing is such a fundamental property that it makes sense to have a dedicated attribute.&lt;/p&gt;

&lt;p&gt;I suggest &lt;code&gt;itemclass&lt;/code&gt;. It would only be allowed on elements with an &lt;code&gt;itemscope&lt;/code&gt; attribute and would take a space-separated set of values in exactly the same way as the &lt;code&gt;itemprop&lt;/code&gt; attribute. The values would be turned into URIs in the same way as for the &lt;code&gt;itemprop&lt;/code&gt; attribute, which I&amp;#8217;ll describe below.&lt;/p&gt;

&lt;p&gt;Microdata+RDF would add a method to the existing microdata DOM API to enable people to access items by class rather than their single type. So:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;document . getItemsByClass( classes )
Returns a NodeList of the elements in the Document that create items, that are not 
part of other items, and that have one or more of the types or classes given in the 
argument.

The classes argument is interpreted as a space-separated list of classes.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that for simplicity, because they are interpreted in the same way within the RDF model, this returns items whose &lt;code&gt;itemtype&lt;/code&gt; is listed in the argument list of classes as well as those whose &lt;code&gt;itemclass&lt;/code&gt; is listed.&lt;/p&gt;
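
&lt;p&gt;As an illustration only, the selection rule for &lt;code&gt;getItemsByClass&lt;/code&gt; can be sketched in Python over a simplified item model. The &lt;code&gt;Item&lt;/code&gt; class and &lt;code&gt;get_items_by_class&lt;/code&gt; function are hypothetical names rather than part of any DOM API; the sketch returns a plain list instead of a &lt;code&gt;NodeList&lt;/code&gt; and assumes class values have already been turned into absolute URLs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for an element that has an itemscope attribute.
    name: str
    itemtype: str = ''
    itemclass: str = ''     # raw, space-separated attribute value
    top_level: bool = True  # False if the item is a property value of another item


def get_items_by_class(items, classes):
    # Interpret the argument as a space-separated list of classes.
    wanted = set(classes.split())
    matches = []
    for item in items:
        # Only items that are not part of other items are returned.
        if not item.top_level:
            continue
        # itemtype and itemclass values are treated alike, as in the RDF model.
        have = set(item.itemclass.split())
        if item.itemtype:
            have.add(item.itemtype)
        if have.intersection(wanted):
            matches.append(item)
    return matches


items = [
    Item('oscon', 'http://schema.org/Event',
         'http://microformats.org/profile/hcalendar#vevent'),
    Item('portland', 'http://schema.org/Place', top_level=False),
]
found = get_items_by_class(items, 'http://microformats.org/profile/hcalendar#vevent')
print([item.name for item in found])  # ['oscon']
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the nested &lt;code&gt;Place&lt;/code&gt; item is part of another item, it is never returned, even when its own type is requested.&lt;/p&gt;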

&lt;p&gt;Within the DOM API, the &lt;code&gt;itemClass&lt;/code&gt; IDL attribute on HTML elements would reflect the &lt;code&gt;itemclass&lt;/code&gt; attribute.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;itemclass&lt;/code&gt; attribute would be ignored for the purpose of creating JSON from microdata, and only be used when creating RDF.&lt;/p&gt;

&lt;p&gt;An example would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;li class=&quot;conference&quot; itemscope itemid=&quot;/2011/oscon/&quot;
    itemtype=&quot;http://schema.org/Event&quot;
    itemclass=&quot;http://microformats.org/profile/hcalendar#vevent /vocab/Conference&quot;&amp;gt;
  ...
&amp;lt;/li&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The JSON generated from this would look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;type&quot;: &quot;http://schema.org/Event&quot;,
  &quot;id&quot;: &quot;http://lanyrd.com/2011/oscon/&quot;,
  &quot;properties&quot;: {}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The RDF would look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://lanyrd.com/2011/oscon/&amp;gt;
  a &amp;lt;http://schema.org/Event&amp;gt; ,
    &amp;lt;http://microformats.org/profile/hcalendar#vevent&amp;gt; ,
    &amp;lt;http://lanyrd.com/vocab/Conference&amp;gt; ;
  .
&lt;/code&gt;&lt;/pre&gt;
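
&lt;p&gt;A minimal Python sketch of how the two views of the same item diverge might look like this. The dictionary layout, the function names &lt;code&gt;to_json&lt;/code&gt; and &lt;code&gt;rdf_types&lt;/code&gt;, and the document URL &lt;code&gt;http://lanyrd.com/2011/&lt;/code&gt; are all assumptions for illustration. Class values are resolved with &lt;code&gt;urljoin&lt;/code&gt;, which covers the absolute and relative URLs in this example but not short names.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import urljoin


def to_json(item, base):
    # The JSON mapping ignores itemclass, so only itemtype survives.
    return {
        'type': item['itemtype'],
        'id': urljoin(base, item['itemid']),
        'properties': {},
    }


def rdf_types(item, base):
    # In the RDF view, itemtype and each itemclass value become rdf:type objects.
    types = [item['itemtype']] if item.get('itemtype') else []
    for token in item.get('itemclass', '').split():
        # urljoin leaves absolute URLs alone and resolves relative ones.
        types.append(urljoin(base, token))
    return types


base = 'http://lanyrd.com/2011/'  # assumed document URL
oscon = {
    'itemid': '/2011/oscon/',
    'itemtype': 'http://schema.org/Event',
    'itemclass': 'http://microformats.org/profile/hcalendar#vevent /vocab/Conference',
}
print(to_json(oscon, base))
print(rdf_types(oscon, base))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first call reproduces the JSON above, with no trace of &lt;code&gt;itemclass&lt;/code&gt;; the second yields the three &lt;code&gt;rdf:type&lt;/code&gt; values shown in the RDF.&lt;/p&gt;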

&lt;h3&gt;Disambiguating Properties&lt;/h3&gt;

&lt;p&gt;To work with the RDF model, properties have to have URIs. We need to have a way of easily creating the URIs for the short-name properties without people changing their existing microdata markup.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Note: I&amp;#8217;ve substantially revised this section following discussion with &lt;a href=&quot;http://blog.foolip.org/&quot;&gt;Philip Jägenstedt&lt;/a&gt;. Old text is struck through, new text underlined.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The way that this is done in RDFa 1.1 is through a &lt;code&gt;vocab&lt;/code&gt; attribute, which provides a URI prefix that is concatenated to any short-name properties or types. &lt;strike&gt;We could use the same approach here, but call the attribute &lt;code&gt;itemvocab&lt;/code&gt; to fit in with the general method of naming attributes in microdata.&lt;/strike&gt; &lt;u&gt;Using this with microdata would, however, be tedious for users, and it would be easy for the &lt;code&gt;itemtype&lt;/code&gt; and &lt;code&gt;itemvocab&lt;/code&gt; to get out of sync in weird ways.&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;strike&gt;&lt;code&gt;itemvocab&lt;/code&gt; would only be allowed on elements with an &lt;code&gt;itemscope&lt;/code&gt;. The scope of &lt;code&gt;itemvocab&lt;/code&gt; would be limited to the item itself, so that it&amp;#8217;s not forgotten when it&amp;#8217;s needed, particularly in copy-and-paste scenarios. However, to make it easier to use I think it should probably be given a default value if it isn&amp;#8217;t present, as follows:&lt;/strike&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Instead, the vocabulary for the properties could be identified as follows:&lt;/u&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;set &lt;em&gt;vocab&lt;/em&gt; to the &lt;code&gt;itemtype&lt;/code&gt; of the item if it is present, or to the URL of the document if not&lt;/li&gt;
&lt;li&gt;use a substring of &lt;em&gt;vocab&lt;/em&gt;:
&lt;ol&gt;&lt;li&gt;if &lt;em&gt;vocab&lt;/em&gt; contains a &lt;code&gt;#&lt;/code&gt;, the substring of &lt;em&gt;vocab&lt;/em&gt; up to and including the &lt;code&gt;#&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;otherwise, the substring of &lt;em&gt;vocab&lt;/em&gt; up to and including its final &lt;code&gt;/&lt;/code&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;
&lt;/ol&gt;
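
&lt;p&gt;The two steps above translate fairly directly into code. The following Python sketch is one reading of them, not a reference implementation: &lt;code&gt;item_vocab&lt;/code&gt; and &lt;code&gt;expand_token&lt;/code&gt; are hypothetical names, the document URL is assumed, and treating tokens that start with &lt;code&gt;/&lt;/code&gt; as relative URLs is an assumption drawn from the &lt;code&gt;/vocab/Conference&lt;/code&gt; example rather than a stated rule.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import urljoin


def item_vocab(itemtype, document_url):
    # Step 1: use the itemtype if there is one, otherwise the document URL.
    vocab = itemtype or document_url
    # Step 2a: if it contains '#', keep everything up to and including the '#'.
    if '#' in vocab:
        return vocab[:vocab.index('#') + 1]
    # Step 2b: otherwise keep everything up to and including the final '/'.
    # (Assumes a hierarchical URL, so that a '/' is present.)
    return vocab[:vocab.rindex('/') + 1]


def expand_token(token, vocab, document_url):
    # Tokens containing ':' are already absolute URLs, as in microdata.
    if ':' in token:
        return token
    # Assumption: tokens beginning with '/' are relative URLs resolved against
    # the document, matching the /vocab/Conference example.
    if token.startswith('/'):
        return urljoin(document_url, token)
    # Short names are appended to the item vocabulary.
    return vocab + token


doc = 'http://lanyrd.com/2011/'  # assumed document URL
vocab = item_vocab('http://schema.org/Event', doc)
print(vocab)                                    # http://schema.org/
print(expand_token('SocialEvent', vocab, doc))  # http://schema.org/SocialEvent
print(item_vocab('http://microformats.org/profile/hcalendar#vevent', doc))
# http://microformats.org/profile/hcalendar#
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under these rules a nested item with an &lt;code&gt;itemtype&lt;/code&gt; of &lt;code&gt;http://schema.org/Place&lt;/code&gt; also gets &lt;code&gt;http://schema.org/&lt;/code&gt; as its vocabulary.&lt;/p&gt;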

&lt;p&gt;For example, if you have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;li class=&quot;conference&quot; itemscope itemid=&quot;/2011/oscon/&quot;
    itemtype=&quot;http://schema.org/Event&quot;
    itemclass=&quot;http://microformats.org/profile/hcalendar#vevent /vocab/Conference&quot;&amp;gt;
  ...
&amp;lt;/li&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;then &lt;strike&gt;&lt;code&gt;itemvocab&lt;/code&gt;&lt;/strike&gt; &lt;u&gt;the item vocabulary&lt;/u&gt; would default to &lt;code&gt;http://schema.org/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strike&gt;There could be an extra restriction that if &lt;code&gt;itemtype&lt;/code&gt; is specified, &lt;code&gt;itemvocab&lt;/code&gt; must be in the same domain as that type; that could help prevent the weird situation where in the generated RDF the properties would be interpreted as being in a completely different vocabulary from the &lt;code&gt;itemtype&lt;/code&gt;.&lt;/strike&gt;&lt;/p&gt;

&lt;p&gt;&lt;strike&gt;Within the DOM API, the &lt;code&gt;itemVocab&lt;/code&gt; IDL attribute on HTML elements would reflect the &lt;code&gt;itemvocab&lt;/code&gt; attribute.&lt;/strike&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Note: the following example has been altered in place.&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;For example, take the following markup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;li class=&quot;conference&quot; itemscope itemid=&quot;/2011/oscon/&quot;
    itemtype=&quot;http://schema.org/Event&quot; 
    itemclass=&quot;SocialEvent BusinessEvent EducationEvent&quot;&amp;gt;
  &amp;lt;h3&amp;gt;
    &amp;lt;a itemprop=&quot;url&quot; href=&quot;/2011/oscon/&quot;&amp;gt;
      &amp;lt;span itemprop=&quot;name&quot;&amp;gt;OSCON 2011&amp;lt;/span&amp;gt;
    &amp;lt;/a&amp;gt;
  &amp;lt;/h3&amp;gt;
  &amp;lt;p itemprop=&quot;location&quot; itemscope itemid=&quot;/places/portland/&quot;
     itemtype=&quot;http://schema.org/Place&quot;&amp;gt;
    &amp;lt;span itemprop=&quot;name&quot;&amp;gt;&amp;lt;a href=&quot;/places/usa/&quot;&amp;gt;United States&amp;lt;/a&amp;gt; / &amp;lt;a itemprop=&quot;url&quot; href=&quot;/places/portland/&quot;&amp;gt;Portland&amp;lt;/a&amp;gt;&amp;lt;/span&amp;gt;
  &amp;lt;/p&amp;gt;
  &amp;lt;p class=&quot;date&quot;&amp;gt;
    &amp;lt;time itemprop=&quot;startDate&quot; datetime=&quot;2011-07-25&quot;&amp;gt;25th&amp;lt;/time&amp;gt;–
    &amp;lt;time itemprop=&quot;endDate&quot; datetime=&quot;2011-07-29&quot;&amp;gt;29th July 2011&amp;lt;/time&amp;gt;
  &amp;lt;/p&amp;gt;
  ...
&amp;lt;/li&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The vocabulary for the &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; element defaults to &lt;code&gt;http://schema.org/&lt;/code&gt; based on the value of the &lt;code&gt;itemtype&lt;/code&gt;. The short-named properties and classes within that item are turned into URIs by prepending &lt;code&gt;http://schema.org/&lt;/code&gt; to their names. Likewise, the nested &lt;code&gt;http://schema.org/Place&lt;/code&gt; item gets the vocabulary &lt;code&gt;http://schema.org/&lt;/code&gt; (its &lt;code&gt;itemtype&lt;/code&gt; truncated after the final &lt;code&gt;/&lt;/code&gt;), so its properties are prepended with the same prefix. The resulting RDF would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix s: &amp;lt;http://schema.org/&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

&amp;lt;/2011/oscon/&amp;gt;
  a s:Event ,
    s:SocialEvent ,
    s:BusinessEvent ,
    s:EducationEvent ;
  s:url &amp;lt;http://lanyrd.com/2011/oscon/&amp;gt; ;
  s:name &quot;OSCON 2011&quot; ;
  s:location &amp;lt;/places/portland/&amp;gt; ;
  s:startDate &quot;2011-07-25&quot;^^xsd:date ;
  s:endDate &quot;2011-07-29&quot;^^xsd:date ;
  .

&amp;lt;/places/portland/&amp;gt;
  a s:Place ;
  s:url &amp;lt;http://lanyrd.com/places/portland/&amp;gt; ;
  s:name &quot;United States / Portland&quot; ;
  .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note: see below for how the values are created in this example.&lt;/p&gt;

&lt;p&gt;The JSON would be just the same as from a standard microdata processor; there&amp;#8217;s no mapping to URIs for that output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;type&quot;: &quot;http://schema.org/Event&quot;,
  &quot;id&quot;: &quot;http://lanyrd.com/2011/oscon/&quot;,
  &quot;properties&quot;: {
    &quot;url&quot;: [
      &quot;http://lanyrd.com/2011/oscon/&quot;
    ],
    &quot;name&quot;: [
      &quot;OSCON 2011&quot;
    ],
    &quot;location&quot;: [
      {
        &quot;type&quot;: &quot;http://schema.org/Place&quot;,
        &quot;id&quot;: &quot;http://lanyrd.com/places/portland/&quot;,
        &quot;properties&quot;: {
          &quot;name&quot;: [
            &quot;United States / Portland&quot;
          ],
          &quot;url&quot;: [
            &quot;http://lanyrd.com/places/portland/&quot;
          ]
        }
      }
    ],
    &quot;startDate&quot;: [
      &quot;2011-07-25&quot;
    ],
    &quot;endDate&quot;: [
      &quot;2011-07-29&quot;
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Adding Datatypes&lt;/h3&gt;

&lt;p&gt;How to manage datatypes in RDF generated from microdata is something where the best approach is not at all clear to me. A couple of years ago I talked about some &lt;a href=&quot;http://www.jenitennison.com/blog/node/120&quot;&gt;frustrations with RDF datatyping&lt;/a&gt;, and datatypes in RDF still frustrate me by being hard to use in sensible ways throughout the RDF toolchain. Nevertheless, it&amp;#8217;s what we have. &lt;/p&gt;

&lt;p&gt;The possibilities I can see for microdata+RDF are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Use plain literals for everything, including URIs, equivalent to using strings as microdata does. This makes things simple for the publisher and keeps the markup in the page clean, but makes it difficult for consumers who are using RDF toolchains: they will &lt;em&gt;usually&lt;/em&gt; have to do some kind of processing of the RDF generated from microdata+RDF to add appropriate datatypes to the values. There are two issues with this approach:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;I have a feeling that microdata+RDF processors will make up their own rules to add datatypes to the data extracted from a page (using rules like those described below and/or sniffing of values and/or using information from known built-in vocabularies), in an effort to add value for their users. But if different processors do that in different ways, we have an interoperability problem.&lt;/li&gt;
&lt;li&gt;In some vocabularies, the datatype of a value is not derivable from the property. The most important/common example of this is &lt;a href=&quot;http://www.w3.org/TR/skos-reference/#notations&quot;&gt;&lt;code&gt;skos:notation&lt;/code&gt;&lt;/a&gt;, which uses values with different datatypes to supply different identifiers from different identification schemes for a given concept.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign datatypes based on the element type in the HTML. If the property value has come from a URL attribute, assume that it&amp;#8217;s a resource rather than a literal; if the element is a &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element, work out the datatype based on the syntax of the &lt;code&gt;datetime&lt;/code&gt; attribute; otherwise assume it&amp;#8217;s a string and give it a language in the case that one is specified. This gives some information but leads to a somewhat strange situation where you can mark up something as a date/time but not as a number.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supplement the processing described in 2. with some basic datatype sniffing. Basically, if the value looks like a number or a boolean value then assign it a numeric or boolean datatype based on its syntax. This could reuse the &lt;a href=&quot;http://www.w3.org/TeamSubmission/turtle/#literal&quot;&gt;rules for recognising different literals from Turtle&lt;/a&gt;. This wouldn&amp;#8217;t be perfect; in particular, it would guess that strings consisting purely of digits, such as zip codes, were numbers. I&amp;#8217;m inclined not to go down this path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supplement the processing described in 2. with an &lt;code&gt;itemvaltype&lt;/code&gt; attribute that takes a token from the list of &lt;a href=&quot;http://www.w3.org/TR/xmlschema-2/#built-in-datatypes&quot;&gt;built-in XML Schema Datatypes&lt;/a&gt; or the token &amp;#8216;&lt;code&gt;literal&lt;/code&gt;&amp;#8217;. The &amp;#8216;&lt;code&gt;literal&lt;/code&gt;&amp;#8217; token would be used to override the normal processing of URL attributes in the case where those really should be literals rather than resources. In this design, it would be easy to create literals using one of the most common datatypes, but not possible to use datatypes that are specific to a given vocabulary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supplement the processing described in 4. by allowing the &lt;code&gt;itemvaltype&lt;/code&gt; to take either a token or a URL. The thing I don&amp;#8217;t like about this design is that the token would be interpreted as being within the XML Schema Datatypes vocabulary rather than the item vocabulary (used for tokens in &lt;code&gt;itemprop&lt;/code&gt; and &lt;code&gt;itemclass&lt;/code&gt;). This seems like it might turn into a source of confusion, but if we went the other way and had &lt;code&gt;itemvaltype&lt;/code&gt; interpreted based on the item vocabulary, it would be harder to give a value the more common datatypes such as numbers and boolean values.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
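
&lt;p&gt;Options 2 and 3 can be compared with a small Python sketch. The regular expressions are simplified assumptions: the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; patterns are much looser than the HTML date and time grammar, and the sniffing patterns are modelled on the integer, decimal, double and boolean literal forms in Turtle. The &lt;code&gt;classify&lt;/code&gt; function is a hypothetical name, and language tags on plain literals are left out.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

XSD = 'http://www.w3.org/2001/XMLSchema#'

# Option 2: simplified patterns for the datetime attribute of a time element.
# These are assumptions, far looser than the full HTML date/time grammar.
TIME_PATTERNS = [
    (re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}'), 'date'),
    (re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}[T ][0-9]{2}:[0-9]{2}'
                r'(:[0-9]{2}(\.[0-9]+)?)?(Z|[+-][0-9]{2}:?[0-9]{2})?'), 'dateTime'),
    (re.compile(r'[0-9]{2}:[0-9]{2}(:[0-9]{2}(\.[0-9]+)?)?'), 'time'),
    (re.compile(r'[0-9]{4}-[0-9]{2}'), 'gYearMonth'),
    (re.compile(r'[0-9]{4}'), 'gYear'),
]

# Option 3: sniffing modelled on Turtle's numeric and boolean literal forms.
EXPONENT = r'[eE][+-]?[0-9]+'
SNIFF_PATTERNS = [
    (re.compile(r'[+-]?[0-9]+'), 'integer'),
    (re.compile(r'[+-]?[0-9]*\.[0-9]+'), 'decimal'),
    (re.compile(r'[+-]?([0-9]+\.[0-9]*' + EXPONENT + r'|\.[0-9]+' + EXPONENT
                + r'|[0-9]+' + EXPONENT + ')'), 'double'),
    (re.compile(r'true|false'), 'boolean'),
]


def classify(element, value, from_url_attribute=False, sniff=False):
    # Returns ('resource', None), or ('literal', datatype URI or None for a string).
    if from_url_attribute:
        return ('resource', None)
    if element == 'time':
        patterns = TIME_PATTERNS
    elif sniff:
        patterns = SNIFF_PATTERNS
    else:
        patterns = []
    for pattern, name in patterns:
        if pattern.fullmatch(value):
            return ('literal', XSD + name)
    return ('literal', None)


print(classify('time', '2011-07-25'))
# ('literal', 'http://www.w3.org/2001/XMLSchema#date')
print(classify('span', '02134', sniff=True))
# ('literal', 'http://www.w3.org/2001/XMLSchema#integer')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With sniffing turned on, a zip code such as &lt;code&gt;02134&lt;/code&gt; is classified as an &lt;code&gt;xsd:integer&lt;/code&gt;, which is exactly the weakness of option 3.&lt;/p&gt;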

&lt;p&gt;My inclination, somewhat reluctantly as it&amp;#8217;s the most complex, would be to use the last of these, because it provides for decentralised extensibility of datatypes, and support for decentralised extensibility is the core aim of these extensions. In other words, have an &lt;code&gt;itemvaltype&lt;/code&gt; attribute that can hold either a token, which must be one of &lt;code&gt;literal&lt;/code&gt; or the local name of an XML Schema datatype, or a URL. On a &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element, this would default to the appropriate type based on the syntax of the value of the &lt;code&gt;datetime&lt;/code&gt; attribute.&lt;/p&gt;
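
&lt;p&gt;Resolving an &lt;code&gt;itemvaltype&lt;/code&gt; value under this design might be sketched in Python as follows. The function name &lt;code&gt;resolve_valtype&lt;/code&gt; and its &lt;code&gt;default&lt;/code&gt; parameter (standing in for the type a &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element would derive from its &lt;code&gt;datetime&lt;/code&gt;) are hypothetical, and treating any value containing a colon as a URL is an assumption borrowed from how microdata recognises absolute URLs in &lt;code&gt;itemprop&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;XSD = 'http://www.w3.org/2001/XMLSchema#'

# Built-in datatype names from XML Schema Part 2.
XSD_BUILTINS = set('''
    string boolean decimal float double duration dateTime time date
    gYearMonth gYear gMonthDay gDay gMonth hexBinary base64Binary anyURI
    QName NOTATION normalizedString token language NMTOKEN NMTOKENS Name
    NCName ID IDREF IDREFS ENTITY ENTITIES integer nonPositiveInteger
    negativeInteger long int short byte nonNegativeInteger unsignedLong
    unsignedInt unsignedShort unsignedByte positiveInteger
'''.split())


def resolve_valtype(itemvaltype, default=None):
    # default stands in for the type a time element would get from its datetime.
    value = (itemvaltype or '').strip()
    if not value:
        return default
    if ':' in value:
        # A URL: a datatype defined by some vocabulary, used as-is.
        return value
    if value == 'literal':
        # Forces a literal, even for a value taken from a URL attribute.
        return 'literal'
    if value in XSD_BUILTINS:
        return XSD + value
    raise ValueError('unrecognised itemvaltype token: ' + value)


print(resolve_valtype('integer'))                  # http://www.w3.org/2001/XMLSchema#integer
print(resolve_valtype('http://schema.org/Price'))  # http://schema.org/Price
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unknown tokens raise an error in this sketch; that is an arbitrary choice, and silently ignoring them would be equally plausible.&lt;/p&gt;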

&lt;p&gt;To be conformant, the &lt;code&gt;itemvaltype&lt;/code&gt; would have to be an allowed value type for the properties given in &lt;code&gt;itemprop&lt;/code&gt; and the value of the property must be a legal value for the datatype. (In keeping with the style of the microdata specification, the mechanisms for working out what value types are allowed and what the legal values are for non-XML Schema datatypes would be left undefined &amp;#8212; a consuming application would look at the definition of the vocabulary.)&lt;/p&gt;
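
&lt;p&gt;The conformance condition on values could be checked along these lines for XML Schema datatypes, while other datatypes stay undefined. In this Python sketch, &lt;code&gt;is_legal_value&lt;/code&gt; is a hypothetical name and the lexical patterns cover only four datatypes, without range checks.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

XSD = 'http://www.w3.org/2001/XMLSchema#'

# Simplified lexical patterns for a few XML Schema datatypes (assumption:
# no range checks, so for example a month of 13 is not rejected).
LEXICAL_PATTERNS = {
    XSD + 'integer': re.compile(r'[+-]?[0-9]+'),
    XSD + 'decimal': re.compile(r'[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)'),
    XSD + 'boolean': re.compile(r'true|false|1|0'),
    XSD + 'date': re.compile(r'-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?'),
}


def is_legal_value(value, datatype):
    # True or False for datatypes with a known pattern; None where the check
    # is left to the definition of the vocabulary.
    pattern = LEXICAL_PATTERNS.get(datatype)
    if pattern is None:
        return None
    # These datatypes collapse whitespace, so surrounding spaces are ignored.
    return pattern.fullmatch(value.strip()) is not None


print(is_legal_value('1938', XSD + 'integer'))           # True
print(is_legal_value('$35', XSD + 'integer'))            # False
print(is_legal_value('$35', 'http://schema.org/Price'))  # None
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for a datatype such as &lt;code&gt;http://schema.org/Price&lt;/code&gt; reflects the idea that its legal values are defined by the vocabulary, not by the processor.&lt;/p&gt;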

&lt;p&gt;Within the DOM API, the &lt;code&gt;itemValType&lt;/code&gt; IDL attribute on HTML elements would reflect the &lt;code&gt;itemvaltype&lt;/code&gt; attribute. The value of &lt;code&gt;itemvaltype&lt;/code&gt; &lt;em&gt;wouldn&amp;#8217;t&lt;/em&gt; change the types of the values returned by &lt;code&gt;element.itemValue&lt;/code&gt; or in the JSON mapping from microdata; it would purely be used when generating RDF from that data.&lt;/p&gt;

&lt;p&gt;For example, if someone started with some markup like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div itemscope itemtype=&quot;http://schema.org/AggregateOffer&quot;&amp;gt;
  Priced from: &amp;lt;span itemprop=&quot;lowPrice&quot;&amp;gt;$35&amp;lt;/span&amp;gt;
  &amp;lt;span itemprop=&quot;offerCount&quot;&amp;gt;1938&amp;lt;/span&amp;gt; tickets left
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;it might be supplemented with some type information like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div itemscope itemtype=&quot;http://schema.org/AggregateOffer&quot;&amp;gt;
  Priced from: &amp;lt;span itemprop=&quot;lowPrice&quot; itemvaltype=&quot;http://schema.org/Price&quot;&amp;gt;$35&amp;lt;/span&amp;gt;
  &amp;lt;span itemprop=&quot;offerCount&quot; itemvaltype=&quot;integer&quot;&amp;gt;1938&amp;lt;/span&amp;gt; tickets left
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which would generate RDF like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix s: &amp;lt;http://schema.org/&amp;gt; .

[] a s:AggregateOffer ;
  s:lowPrice &quot;$35&quot;^^s:Price ;
  s:offerCount 1938 ;
  .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Note: Here I&amp;#8217;m assuming that schema.org defines a &lt;code&gt;http://schema.org/Price&lt;/code&gt; datatype which includes a currency and a number. They don&amp;#8217;t currently.)&lt;/p&gt;

&lt;p&gt;The JSON would still be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;type&quot;: &quot;http://schema.org/AggregateOffer&quot;,
  &quot;properties&quot;: {
    &quot;lowPrice&quot;: [
      &quot;$35&quot;
    ],
    &quot;offerCount&quot;: [
      &quot;1938&quot;
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Non-Additions&lt;/h3&gt;

&lt;p&gt;When I wrote a couple of years ago about &lt;a href=&quot;http://www.jenitennison.com/blog/node/103&quot;&gt;what microdata can&amp;#8217;t do&lt;/a&gt;, one of the things that I identified was not being able to express XML Literals. Having thought about this more, what&amp;#8217;s actually missing isn&amp;#8217;t to do with RDF, but is the ability to use the &lt;a href=&quot;http://www.whatwg.org/specs/web-apps/current-work/multipage/content-models.html#innerhtml&quot;&gt;&lt;code&gt;innerHTML&lt;/code&gt;&lt;/a&gt; of an element to provide a value for a property rather than its &lt;a href=&quot;http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#textcontent&quot;&gt;&lt;code&gt;textContent&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For example, the description of an event might run over several paragraphs, or even in a single paragraph include other markup such as emphasised text, ruby markup, or links to additional information. People who are working from the DOM API can capture this information when they need it by getting the &lt;code&gt;innerHTML&lt;/code&gt; of the element rather than its &lt;code&gt;itemValue&lt;/code&gt;, but in the JSON mapping, the value is always the &lt;code&gt;itemValue&lt;/code&gt; &amp;#8212; the text content of the element.&lt;/p&gt;

&lt;p&gt;So this is a general limitation of microdata, a consequence of its simplicity. I&amp;#8217;d argue that we shouldn&amp;#8217;t add any special handling to plug this hole at the microdata+RDF level. If it turns out that having values that contain markup is useful then it will be added to microdata, and the microdata+RDF mapping would then be extended to create &lt;code&gt;rdf:XMLLiteral&lt;/code&gt;s or HTML literals (for which there is no defined datatype in RDF at the moment) for such values.&lt;/p&gt;

&lt;p&gt;Similarly, I haven&amp;#8217;t said anything in this post about providing machine-readable values to override the text content of an element. There is &lt;a href=&quot;http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240&quot;&gt;an open bug&lt;/a&gt; about whether and how that capability might be added to HTML/microdata. I happen to think that it&amp;#8217;s useful, but that utility isn&amp;#8217;t limited to RDF processing. Whichever route is chosen there, I think it&amp;#8217;s important to keep the property values used by basic microdata and microdata+RDF aligned.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;To summarise, one direction that we could take in aligning microdata and RDFa would be to define an extension to microdata to add support for decentralised extensibility and the RDF data model. I think that would entail adding attributes such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;itemclass&lt;/code&gt; to make it easy to define multiple types for an item&lt;/li&gt;
&lt;li&gt;default processing, based on the &lt;code&gt;itemtype&lt;/code&gt;, to provide nice mappings for short-name properties into URIs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;itemvaltype&lt;/code&gt; and some default processing to assign datatypes to values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For publishers and consumers, a single language with optional extensions greatly simplifies the use of embedded data. Property names don&amp;#8217;t have to be repeated, and publishers don&amp;#8217;t have to perform balancing acts between different processing models.&lt;/p&gt;

&lt;p&gt;RDFa proponents get a syntax that can be used to generate a natural RDF model against which they can build RDF-oriented APIs and map to other formats such as JSON-LD.&lt;/p&gt;

&lt;p&gt;For microdata proponents, this approach doesn&amp;#8217;t pollute microdata with requirements that they see as superfluous, and doesn&amp;#8217;t change the behaviour of core microdata processors. Browsers, search engines and other consumers can continue to use the JSON output and only those who really want to support RDF need to do so.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m sure that there are things that I&amp;#8217;ve missed in my outline above, issues that I haven&amp;#8217;t thought of. But if there is to be any kind of convergence between microdata/RDFa, this layered approach seems to me to be the kind of convergence that is most likely to eventually result in one language for embedding data in HTML rather than two or three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note: if you prefer to comment on Google+, please add your comment to &lt;a href=&quot;https://plus.google.com/u/0/112095156983892490612/posts/aUqGQSLzDPv&quot;&gt;my announcement post there&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/162#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Sun, 31 Jul 2011 19:55:44 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">162 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>My Experience of Web Standards</title>
 <link>http://www.jenitennison.com/blog/node/160</link>
 <description>&lt;p&gt;One of the things that&amp;#8217;s been niggling at the back of my mind since the &lt;a href=&quot;http://schema.org&quot;&gt;schema.org&lt;/a&gt; announcement is how small a role search engine results play in the wider data sharing efforts that I&amp;#8217;m more familiar with in my work on &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&amp;#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;My day job (the one I actually get paid for) is web development. The site I spend most of my time and effort on is &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;. This deals with complex content (UK legislation) that has to be presented in multiple formats (users love PDFs of legislation). Our aim is to make the data as reusable as possible by third parties through good, RESTful, web architecture, and we want to use open standards and open source technologies as part of the &lt;a href=&quot;http://www.cabinetoffice.gov.uk/resource-library/open-source-open-standards-and-re-use-government-action-plan&quot;&gt;UK government&amp;#8217;s general strategy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;legislation.gov.uk is not a global website like Amazon or eBay, but it&amp;#8217;s not small either: it covers 60,000 changing items of legislation, providing point-in-time views for many of them, and with more added every day. It&amp;#8217;s one of the top ten most used UK Government websites, with 2 million visits (about 10-12 million page views) each month and typically about 120 requests/second during the active times of the day. Legislation might sound like a highly specialist interest, but if you &lt;a href=&quot;http://twitter.com/search/legislation.gov.uk&quot;&gt;search for legislation.gov.uk on Twitter&lt;/a&gt; you&amp;#8217;ll see it being referenced over and over by people who want to share what the law says.&lt;/p&gt;

&lt;p&gt;I do not by any means claim that my experience is representative of the wider web. I know that there are large numbers of sites that deal only in data, not documents, and certainly not documents with the kind of rich semantic structure that legislation has. I offer the following discussion as a data point, partly because I can&amp;#8217;t quite believe that legislation.gov.uk is &lt;em&gt;completely&lt;/em&gt; unique in its requirements and partly because obviously my perspective on a bunch of issues arises from this experience.&lt;/p&gt;

&lt;h2&gt;Technology Stacks&lt;/h2&gt;

&lt;p&gt;Legislation items are complex, semi-structured documents. Their natural fit is XML (well, that&amp;#8217;s not quite true &amp;#8212; their natural fit would be something that allowed overlapping markup &amp;#8212; but XML is the closest that we have). So we store it in XML in a native XML database and we use an XML toolset to query it (XQuery) and transform it (XSLT) into various formats including rendering it as PDF (through XSL-FO).&lt;/p&gt;

&lt;p&gt;Our next step for the development of the site involves looking at legislative effects. These form a graph: one item of legislation affects other items of legislation which may in turn affect other items and so on. There are all sorts of other links between items of legislation in terms of commencements, conferred powers and so on. Particularly because we already have well-thought-through URIs for legislation, the natural fit is to use RDF to represent this graph. We already offer a SPARQL endpoint for accessing some aspects of our data; over the next few months we expect to expand and develop it, and to use it both as a layer under the website and as a service exposed to reusers, in much the same way as we use the XML database.&lt;/p&gt;
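
&lt;p&gt;To make the shape of that graph concrete, here is a small Python sketch (not code from legislation.gov.uk) that follows &amp;#8220;affects&amp;#8221; links transitively using a breadth-first search. The item names and links are hypothetical; in practice these links would presumably be queried from the RDF rather than held in an in-memory dictionary.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import deque

# Hypothetical "affects" links between items of legislation; the names
# are made up for illustration and are not real legislation.gov.uk URIs.
AFFECTS = {
    "act-A": ["act-B", "act-C"],
    "act-B": ["act-D"],
    "act-C": ["act-D"],
    "act-D": [],
}


def transitively_affected(graph, start):
    """Return every item reachable from start by following affects links.

    Breadth-first search: the seen set means each item is queued once,
    even if two items both affect it or the links happen to form a cycle.
    """
    seen = set()
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for target in graph.get(item, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# act-A affects act-B and act-C directly, and act-D through both of them
print(sorted(transitively_affected(AFFECTS, "act-A")))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point of the sketch is simply that answering &amp;#8220;what does this item ultimately affect?&amp;#8221; means walking a graph, which is the kind of question RDF and SPARQL are designed for.&lt;/p&gt;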

&lt;p&gt;As a government site, we have fairly strict limits on what we can do within our web pages: we have to make sure that they&amp;#8217;re accessible by everyone who wants to view them. We aren&amp;#8217;t able to use technologies that are only available in the latest browsers, but that&amp;#8217;s OK because with the kind of content we deal with, we don&amp;#8217;t have to do anything fancy anyway. So we use pretty basic HTML and CSS and JavaScript, because that&amp;#8217;s how you deliver content to end-users on the web (as well as exposing the underlying XML and RDF, to enable others to reuse the data).&lt;/p&gt;

&lt;p&gt;In other words, we use three web stacks for delivering legislation.gov.uk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML stack, which is great for single-source publishing of documents that have more semantic structures than those supported by HTML&lt;/li&gt;
&lt;li&gt;the RDF stack, which is well-suited for metadata about things that are identified by URIs&lt;/li&gt;
&lt;li&gt;the HTML stack, which is absolutely necessary for delivering human-accessible content on the web&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What bemuses me, because of this experience, is that sometimes it appears that the narrative around these technologies is framed in terms of an exclusive choice between them. For example, &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;@mattur asked&lt;/a&gt;:&lt;/p&gt;

&lt;p style=&quot;text-align:center;&quot;&gt;
  &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;&lt;img src=&quot;/blog/files/mattur-tweet.jpg&quot; alt=&quot;@gimsieke @JeniT how may TAG members believe RDF(a) and X(HT)ML are way forward? How many think they aren&#039;t?&quot; /&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;It is as if, if you use XML you &lt;em&gt;cannot&lt;/em&gt; appreciate the utility of error-handling in HTML; or if you use RDF you &lt;em&gt;cannot&lt;/em&gt; understand the need to represent documents in XML; or if you want to utilise HTML fully, you &lt;em&gt;cannot&lt;/em&gt; adopt RDF&amp;#8217;s view of data on the web. That&amp;#8217;s simply not my experience. They each have their role on the web; supporting the use of one does not necessitate rejecting the use of the others.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s interesting that some of the standards that are most reviled are those that arise at the intersections, where it appears that one technology is trying to encroach on the space of another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XHTML at the border of XML and HTML&lt;/li&gt;
&lt;li&gt;RDF/XML at the border of RDF and XML&lt;/li&gt;
&lt;li&gt;RDFa at the border of all three&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the same time, within legislation.gov.uk, we publish XHTML (because it&amp;#8217;s the natural output from an XML toolchain) and create and process RDF/XML (because it gives us access to that data from within the XML toolchain). We use a small bit of RDFa in the XHTML to indicate the rights under which our information is available and, although we don&amp;#8217;t yet, are thinking about using RDFa to mark up non-document semantics within our XML (to enable the XML markup to focus on the document structures that it&amp;#8217;s good at). For all their imperfections, these intersection technologies are useful for managing cross-overs; the problems arise when they overstep their remit and people start to think that &lt;em&gt;all&lt;/em&gt; HTML must be XHTML or &lt;em&gt;all&lt;/em&gt; XML must be RDF/XML or &lt;em&gt;all&lt;/em&gt; RDF must be RDFa.&lt;/p&gt;

&lt;h2&gt;Sharing Scenarios&lt;/h2&gt;

&lt;p&gt;The second thing that I wanted to explore is the experience from legislation.gov.uk of what it&amp;#8217;s like to be a publisher who actively wants to share their data. We need to operate simultaneously at three levels in our data sharing efforts.&lt;/p&gt;

&lt;h3&gt;Large-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The first targets for our data sharing efforts are the search engines. Obviously we&amp;#8217;re not selling anything, but we want people to be able to locate legislation easily when they want it, and we want people who have done the search to be able to see some information about the legislation so that they know that they&amp;#8217;ve located the right item.&lt;/p&gt;

&lt;p&gt;This is large-scale consumer (search engine) driven data sharing, typified by schema.org and Facebook&amp;#8217;s &lt;a href=&quot;http://developers.facebook.com/docs/opengraph/&quot;&gt;Open Graph Protocol&lt;/a&gt; (OGP). There are a few very big data consumers (Google, Microsoft, Yahoo!, Facebook etc) who need to consume data from large numbers of data providers. These consumers obviously can&amp;#8217;t understand &lt;em&gt;everything&lt;/em&gt;, so they determine and document what syntaxes and vocabularies they &lt;em&gt;do&lt;/em&gt; understand and expect publishers to follow.&lt;/p&gt;

&lt;p&gt;The benefits that publishers get from a particular consumer determine which syntax/vocabulary they use; publishers who are particularly keen to show up prettily within search results will target schema.org whereas those who want to be sharable within Facebook will target OGP. Many publishers will want to target both. There is probably a driver towards eventual convergence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;publishers might push back about inserting two lots of very similar data in their pages&lt;/li&gt;
&lt;li&gt;consumers might want to include data from publishers who haven&amp;#8217;t specifically targeted them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;although there&amp;#8217;s likely to be a period where they coexist, much as there was for VHS and Betamax (and &lt;a href=&quot;http://en.wikipedia.org/wiki/Video_2000&quot;&gt;V2000&lt;/a&gt;, I know, dad) during the early days of video players.&lt;/p&gt;

&lt;p&gt;As &lt;a href=&quot;http://www.jenitennison.com/blog/node/157&quot;&gt;I discussed previously&lt;/a&gt;, these large-scale consumers will be driven by the data that they find in the wild, in all its messy variety. They get relatively little benefit directly from using a generic &lt;em&gt;syntax&lt;/em&gt;, as they are really interested in only a few, pretty generic, &lt;em&gt;vocabularies&lt;/em&gt; for which they have hardwired processing. Indirectly, adopting a generic syntax has benefits in that publishers might find it easier to find tools that enable them to generate it, tutorials about how to use it, and feel that they aren&amp;#8217;t being quite as locked in to something proprietary. However, rejecting data that isn&amp;#8217;t marked up properly using that syntax has no benefit for consumers except in so far as it makes them feel that they are being good community members. &lt;/p&gt;

&lt;p&gt;This is the pattern we see with schema.org (which accepts microdata but, based on its documentation, won&amp;#8217;t reject data that isn&amp;#8217;t fully compliant with it) and with OGP (which accepts a subset of RDFa but doesn&amp;#8217;t reject data that hasn&amp;#8217;t got prefixes properly bound, for example).&lt;/p&gt;

&lt;p&gt;Another point to mention is that there is very little trust in this scenario. The communication between consumers and publishers is very limited, and the consumers will want to protect themselves against accidental or malicious errors that are evident in mismatches between explicit metadata and that which is parsed from the visible content of the page.&lt;/p&gt;

&lt;p&gt;The parallels to HTML and browser vendors are very strong in this type of data sharing.&lt;/p&gt;

&lt;h3&gt;Small-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;A second type of data sharing is again driven by consumers, but this time at a much smaller and more specialised scale. For legislation.gov.uk, these are services such as &lt;a href=&quot;http://www.glin.gov/&quot;&gt;GLIN&lt;/a&gt;, which is a global legislation registry. Other examples are the recent work that we&amp;#8217;ve done to publish &lt;a href=&quot;http://data.gov.uk/organogram&quot;&gt;UK Government organograms&lt;/a&gt; or &lt;a href=&quot;http://countculture.wordpress.com/&quot;&gt;Chris Taggart&lt;/a&gt;&amp;#8217;s &lt;a href=&quot;http://openelectiondata.org/&quot;&gt;Open Election Data&lt;/a&gt; project. In these cases, there&amp;#8217;s a single, relatively small and specialised consumer and a small number of publishers which are closely coordinated together.&lt;/p&gt;

&lt;p&gt;As in the large-scale case, the consumer ultimately determines the syntax/vocabulary that it recognises, and communicates that to the publishers. However, small-scale consumers typically have close coordination with the publishers, which has a number of side-effects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consumers may be better able both to apply pressure to publishers and to help them get their markup right&lt;/li&gt;
&lt;li&gt;publishers have the opportunity to feed back directly to the consumer any suggestions that they have about changes to the syntax/vocabulary&lt;/li&gt;
&lt;li&gt;publishers are likely to gain an immediate and tangible benefit from their cooperation, such as visualisations of their data that they otherwise wouldn&amp;#8217;t have seen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another noteworthy point about small-scale consumers is that they&amp;#8217;re unlikely to have the engineering capability to create a custom parser for a particular syntax, but will instead want to use something off-the-shelf to extract data from pages and into their own backend systems. This, coupled with the closer coordination with publishers, means that they&amp;#8217;re much more likely to stick to a specification, assuming that the off-the-shelf tools do.&lt;/p&gt;

&lt;h3&gt;Publisher-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The final type of data sharing is driven by publishers. At legislation.gov.uk, we&amp;#8217;re motivated to make our data available for reuse for transparency/accountability reasons (to help citizens understand the law), efficiency reasons (to help parliament and government departments to publish new legislation better) and economic reasons (to foster innovation in legal publishing). We don&amp;#8217;t have any individual consumers in mind when we publish our data, but have found that simply by publishing it well, we foster reuse.&lt;/p&gt;

&lt;p&gt;In this case, we as publishers are highly motivated to ensure that the data we publish is easily parsed with something off-the-shelf, since that lowers the barrier for potential consumers. Publishers like us are very likely to have unique, specialised, content and need to use a vocabulary that fits closely to our internal data structures as this lowers implementation cost. Consumers can also trust publishers like us: we simply have no motivation to lie in the data that we provide for reuse.&lt;/p&gt;

&lt;h2&gt;Mixed Markup&lt;/h2&gt;

&lt;p&gt;As I&amp;#8217;ve outlined above, publishers like legislation.gov.uk need to target several potential consumers at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-scale consumers such as search engines&lt;/li&gt;
&lt;li&gt;small-scale consumers that provide us with a useful service&lt;/li&gt;
&lt;li&gt;specialist consumers that are interested specifically in our data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot use a single vocabulary for all these different purposes. (Well, we could write our own vocabulary and describe mappings to other vocabularies using RDFS, but search engines wouldn&amp;#8217;t read it.)&lt;/p&gt;

&lt;p&gt;We must therefore use a mix of vocabularies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generic vocabularies about things that search engines care about&lt;/li&gt;
&lt;li&gt;specialised vocabularies for particular small consumers&lt;/li&gt;
&lt;li&gt;site-specific vocabularies for sharing our unique data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It&amp;#8217;s repetitive, but it&amp;#8217;s manageable so long as we have a syntax that enables us to say that an item of legislation is a &lt;code&gt;http://schema.org/CreativeWork&lt;/code&gt; and a &lt;code&gt;http://purl.org/dc/dcmitype/Text&lt;/code&gt; and a &lt;code&gt;http://www.legislation.gov.uk/def/legislation/Legislation&lt;/code&gt; and allows us to give multiple properties the same value.&lt;/p&gt;
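
&lt;p&gt;In entity-property-value terms, what such a syntax needs to express is just a handful of statements about one subject. This Python sketch writes them out as plain tuples. The item URI and title are hypothetical, and pairing &lt;code&gt;http://schema.org/name&lt;/code&gt; with &lt;code&gt;http://purl.org/dc/terms/title&lt;/code&gt; is only an illustrative choice of two properties that share a value.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The three classes mentioned above
TYPES = [
    "http://schema.org/CreativeWork",
    "http://purl.org/dc/dcmitype/Text",
    "http://www.legislation.gov.uk/def/legislation/Legislation",
]

# Two properties from different vocabularies that take the same value
# (an illustrative choice, not a recommendation)
TITLE_PROPERTIES = [
    "http://schema.org/name",
    "http://purl.org/dc/terms/title",
]


def describe(item, title):
    """Return (subject, predicate, object) triples describing one item."""
    triples = [(item, RDF_TYPE, cls) for cls in TYPES]
    triples += [(item, prop, title) for prop in TITLE_PROPERTIES]
    return triples


# Hypothetical item URI and title
for triple in describe("http://www.legislation.gov.uk/id/example/2011/1",
                       "Example Act 2011"):
    print(triple)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whatever the attribute syntax, a parser should end up with these five triples; the repetition lives in the markup, not in the data.&lt;/p&gt;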

&lt;p&gt;The way things are going at the moment, we might well end up having to use multiple &lt;em&gt;syntaxes&lt;/em&gt; on the same page, as some consumers understand microdata, others consume RDFa, and still others will parse microformats. This leads to more repetition: adding &lt;code&gt;itemprop&lt;/code&gt; for microdata, &lt;code&gt;property&lt;/code&gt; for RDFa and specialised &lt;code&gt;class&lt;/code&gt; attributes for microformats. But worse (much worse), each of the syntaxes uses a different parsing model to create an entity-property-value structure, so not only do we have to learn substantially different markup patterns but our pages quickly become some kind of hideous polyglot mess trying to balance between them.&lt;/p&gt;

&lt;h2&gt;Looking Forward&lt;/h2&gt;

&lt;p&gt;As I said at the start, I&amp;#8217;m fairly sure that my experience at legislation.gov.uk isn&amp;#8217;t representative of the wider web, but I don&amp;#8217;t have a clear idea about just how unrepresentative it is, in terms of technology use or motivations around data sharing. When I read my Twitter stream or blogs, there&amp;#8217;s a massive sampling bias, both in terms of who I follow and what I read, and in terms of who talks about what they&amp;#8217;re doing. (I&amp;#8217;m reminded of &lt;a href=&quot;http://www.codinghorror.com/blog/&quot;&gt;Jeff Atwood&lt;/a&gt;&amp;#8217;s post on the &lt;a href=&quot;http://www.codinghorror.com/blog/2007/11/the-two-types-of-programmers.html&quot;&gt;Two Types of Programmers&lt;/a&gt;: the vast majority of web developers don&amp;#8217;t make a noise about what they do.)&lt;/p&gt;

&lt;p&gt;Taking part in web standardisation today often feels like being part of an ongoing cold war between distinct camps rather than a community working towards common aims. The underlying question seems to be &amp;#8220;whose side are you on?&amp;#8221; Every decision and activity is cast as a victory or defeat. Time is wasted on attack and defence, or on raking over past slights and stupidities, rather than on progress. Valid criticism from outside a group cannot be listened to for fear of giving ground, and cannot be voiced within a group, where it seems like betrayal.&lt;/p&gt;

&lt;p&gt;It is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Realistic_conflict_theory#The_Robbers_Cave_Experiment&quot;&gt;Robbers Cave Experiment&lt;/a&gt; played out in web standards. As a psychologist, I find it fascinating. As a developer, and particularly one who doesn&amp;#8217;t self-identify with any single group, it is frustrating. As a TAG member, trying to work for the longer-term good of the web, it is worrying, because situations of intergroup conflict lead to &lt;a href=&quot;http://en.wikipedia.org/wiki/Groupthink&quot;&gt;groupthink&lt;/a&gt; and non-optimal solutions.&lt;/p&gt;

&lt;p&gt;As I described above, a non-optimal outcome seems to be the most likely result of the particular microdata vs RDFa conflict for us at legislation.gov.uk. While I know we are not generally representative, I believe that it will be similarly bad for other developers: publishers, consumers and tool implementers.&lt;/p&gt;

&lt;p&gt;This is a problem for all who want to foster data sharing on the web using open standards; it is not one that any one group can fix on their own. It&amp;#8217;s my hope that a balanced task force of individuals with a variety of experience and backgrounds can provide a focus for us all to work together to solve it. If we can&amp;#8217;t, then we have let our prejudice and bias overcome our judgement.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/160#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Sun, 24 Jul 2011 16:24:00 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">160 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Getting Started with RDF and SPARQL Using Sesame and Python</title>
 <link>http://www.jenitennison.com/blog/node/153</link>
 <description>&lt;p&gt;My &lt;a href=&quot;http://www.jenitennison.com/blog/node/152&quot;&gt;previous post&lt;/a&gt; talked about how to install &lt;a href=&quot;http://4store.org/&quot;&gt;4store&lt;/a&gt; as a triplestore, and use the Ruby library &lt;a href=&quot;http://rdf.rubyforge.org/&quot;&gt;RDF.rb&lt;/a&gt; in order to process RDF extracted from that store. This was a response to Richard Pope&amp;#8217;s &lt;a href=&quot;http://memespring.co.uk/2011/01/linked-data-rdfsparql-documentation-challenge/&quot;&gt;Linked Data/RDF/SPARQL Documentation Challenge&lt;/a&gt; which asks for documentation of how to install a triplestore, load data into it, retrieve it using SPARQL and access the results as native structures using Ruby, Python or PHP.&lt;/p&gt;

&lt;p&gt;I quite enjoyed writing the last one, so I thought I&amp;#8217;d try again. As before, I am on Mac OS X, but this time I&amp;#8217;m going to use Python, which I have not programmed in before. I like a challenge. You might not like the results!&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Sesame&lt;/h2&gt;

&lt;p&gt;This time, I&amp;#8217;m going to use &lt;a href=&quot;http://www.openrdf.org/&quot;&gt;Sesame&lt;/a&gt;, as I was told by &lt;a href=&quot;http://twitter.com/johnlsheridan&quot;&gt;John Sheridan&lt;/a&gt; that it was so easy to install that even he, a civil servant, could do it!&lt;/p&gt;

&lt;p&gt;Sesame needs a Java servlet container. I&amp;#8217;m using &lt;a href=&quot;http://tomcat.apache.org/&quot;&gt;Tomcat&lt;/a&gt; because I have some experience with it, but you could use something like &lt;a href=&quot;http://jetty.codehaus.org/jetty/&quot;&gt;Jetty&lt;/a&gt; if you prefer. I had a bit of trouble getting Tomcat 6 to install, but that might just have been because it has a lot of dependencies and I wasn&amp;#8217;t patient enough. It might be worth upgrading your existing ports and getting verbose output so you know there&amp;#8217;s activity as you install Tomcat:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo port upgrade outdated
$ sudo port -v install tomcat6
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This installs Tomcat 6 in &lt;code&gt;/opt/local/share/java/tomcat6&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;While that&amp;#8217;s happening, get Sesame from its &lt;a href=&quot;http://sourceforge.net/projects/sesame/files/Sesame%202/&quot;&gt;download page&lt;/a&gt;. I got hold of &lt;code&gt;openrdf-sesame-2.3.2-sdk.tar.gz&lt;/code&gt;. The files we actually need are the &lt;code&gt;.war&lt;/code&gt;s so we can just extract them and put them in the &lt;code&gt;webapps&lt;/code&gt; directory within Tomcat:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ tar -zxvf openrdf-sesame-2.3.2-sdk.tar.gz openrdf-sesame-2.3.2/war/*.war
$ sudo cp openrdf-sesame-2.3.2/war/*.war /opt/local/share/java/tomcat6/webapps/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then start up Tomcat:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo /opt/local/share/java/tomcat6/bin/tomcatctl start
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All being well, you should see Tomcat doing some initial setup:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;conf_setup.sh: file conf/catalina.policy is missing; copying conf/catalina.policy.sample to its place.
conf_setup.sh: file conf/catalina.properties is missing; copying conf/catalina.properties.sample to its place.
conf_setup.sh: file conf/server.xml is missing; copying conf/server.xml.sample to its place.
conf_setup.sh: file conf/tomcat-users.xml is missing; copying conf/tomcat-users.xml.sample to its place.
conf_setup.sh: file conf/web.xml is missing; copying conf/web.xml.sample to its place.
conf_setup.sh: file conf/setenv.local is missing; copying conf/setenv.local.sample to its place.
Starting Tomcat.... started. (pid 20064)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now have a look at &lt;code&gt;http://localhost:8080/openrdf-sesame&lt;/code&gt;. If you&amp;#8217;re like me, you&amp;#8217;ll get some error messages because the user that Tomcat is running under (&lt;code&gt;www&lt;/code&gt;) isn&amp;#8217;t able to create or write to a logging directory that it wants to create, in my case &lt;code&gt;/Users/Jeni/Library/Application Support/Aduna/OpenRDF Sesame/logs&lt;/code&gt;. This turns out to be partly caused by permissions issues and partly caused by the spaces in the filename. To get around it, create a data directory for Aduna that doesn&amp;#8217;t have spaces in the filename, and change its ownership to &lt;code&gt;www&lt;/code&gt;. In my case, I chose &lt;code&gt;/opt/local/var/aduna&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo mkdir -p /opt/local/var/aduna
$ sudo chown www:www /opt/local/var/aduna
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then edit Tomcat&amp;#8217;s &lt;code&gt;setenv.local&lt;/code&gt; file which in my environment is at &lt;code&gt;/opt/local/share/java/tomcat6/conf&lt;/code&gt; and add a line that sets the &lt;code&gt;info.aduna.platform.appdata.basedir&lt;/code&gt; to that directory, like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;export JAVA_OPTS=&#039;-Dinfo.aduna.platform.appdata.basedir=/opt/local/var/aduna&#039;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Restart Tomcat:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo /opt/local/share/java/tomcat6/bin/tomcatctl restart
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then navigate again to &lt;a href=&quot;http://localhost:8080/openrdf-sesame&quot;&gt;http://localhost:8080/openrdf-sesame&lt;/a&gt; and you should see the Welcome page:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-welcome.jpg&quot; /&gt;
&lt;/p&gt;
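
&lt;p&gt;If you would rather check from a script that Sesame is up, the Sesame 2 HTTP protocol exposes a &lt;code&gt;protocol&lt;/code&gt; resource that returns the protocol version as plain text. This is a minimal sketch using only the standard library; it assumes the default Tomcat port and the &lt;code&gt;openrdf-sesame&lt;/code&gt; path used above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.6, as used in this post

# Assumed location of the Sesame web application
SESAME = "http://localhost:8080/openrdf-sesame"


def protocol_url(base=SESAME):
    """URL of the resource that reports the server's protocol version."""
    return base.rstrip("/") + "/protocol"


def protocol_version(base=SESAME):
    """Fetch the protocol version as a string (raises if Sesame is down)."""
    response = urlopen(protocol_url(base))
    try:
        return response.read().decode("utf-8").strip()
    finally:
        response.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Calling &lt;code&gt;protocol_version()&lt;/code&gt; should return a short version number; an exception such as &lt;code&gt;URLError&lt;/code&gt; means that Tomcat or Sesame is not running.&lt;/p&gt;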

&lt;p&gt;As you can see, this recommends using the Workbench for managing the repositories. If you open that, at &lt;a href=&quot;http://localhost:8080/openrdf-workbench&quot;&gt;http://localhost:8080/openrdf-workbench&lt;/a&gt;, you&amp;#8217;ll see its home page:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-home.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;We&amp;#8217;ll use this Workbench to create a new repository for our data, which I&amp;#8217;ll call &lt;code&gt;reference&lt;/code&gt;. Click on &lt;code&gt;New Repository&lt;/code&gt; from the left hand navigation and fill in the form. I&amp;#8217;m just going to use the default in-memory RDF store because I&amp;#8217;m only using a little data; the other options (using MySQL or PostgreSQL stores) would be useful if I were creating something more permanent. See &lt;a href=&quot;http://www.openrdf.org/doc/sesame2/users/ch07.html#section-rdbms-store-config&quot;&gt;the Sesame User Guide&lt;/a&gt; for information about those.&lt;/p&gt;

&lt;p&gt;So fill in the form to create a new repository with the id &lt;code&gt;reference&lt;/code&gt; and whatever title you like:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-new-repository.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Next&lt;/code&gt; and there will be a couple more options to select; I just used the default for these:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-new-repository-2.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Create&lt;/code&gt; and you will see a summary of the new repository that you&amp;#8217;ve created:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-new-repository-3.jpg&quot; /&gt;
&lt;/p&gt;
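
&lt;p&gt;The same list of repositories is available from Sesame&amp;#8217;s REST API at &lt;code&gt;/openrdf-sesame/repositories&lt;/code&gt;, returned as a SPARQL results table with an &lt;code&gt;id&lt;/code&gt; column for each repository. Here is a sketch that checks for the new repository from Python. It assumes that your Sesame installation will return SPARQL JSON results when asked for &lt;code&gt;application/sparql-results+json&lt;/code&gt;; if not, request &lt;code&gt;application/sparql-results+xml&lt;/code&gt; and parse that instead.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.6

SESAME = "http://localhost:8080/openrdf-sesame"
SPARQL_JSON = "application/sparql-results+json"


def parse_bindings(text):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    results = json.loads(text)
    rows = []
    for binding in results["results"]["bindings"]:
        # Keep only the value of each cell, dropping its type information
        rows.append(dict((name, cell["value"]) for name, cell in binding.items()))
    return rows


def get_table(url):
    """GET url, asking for SPARQL JSON results, and return the rows."""
    response = urlopen(Request(url, headers={"Accept": SPARQL_JSON}))
    try:
        return parse_bindings(response.read().decode("utf-8"))
    finally:
        response.close()


def repository_ids(base=SESAME):
    """IDs of the repositories that the server knows about."""
    return [row["id"] for row in get_table(base.rstrip("/") + "/repositories")]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With Tomcat running, the list returned by &lt;code&gt;repository_ids()&lt;/code&gt; should now include &lt;code&gt;reference&lt;/code&gt;.&lt;/p&gt;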

&lt;h2&gt;Loading Data&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;m going to use the same data as I did before:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;a href=&quot;http://source.data.gov.uk/data/reference/organogram-co/2010-10-31/index.rdf&quot;&gt;http://source.data.gov.uk/data/reference/organogram-co/2010-10-31/index.rdf&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can add data to a Sesame repository in a browser through the Workbench by uploading a file, pointing Sesame at a URL or pasting in some RDF that you want to load. There are also Java bindings for adding data to Sesame. But neither of those is any good to us, as we need programmatic access from Python.&lt;/p&gt;

&lt;p&gt;So we will use the &lt;a href=&quot;http://www.openrdf.org/doc/sesame2/system/ch08.html#d0e304&quot;&gt;HTTP method&lt;/a&gt;. I want to add some statements to the &lt;code&gt;reference&lt;/code&gt; repository in the graph (what Sesame calls &amp;#8220;context&amp;#8221;) &lt;code&gt;http://source.data.gov.uk/data/reference/organogram-co/2010-06-30&lt;/code&gt;, which amounts to an HTTP PUT on the repository&amp;#8217;s statements with that context.&lt;/p&gt;

&lt;p&gt;Now I don&amp;#8217;t know much at all about Python, but it looks as though the built-in library &lt;code&gt;urllib2&lt;/code&gt; doesn&amp;#8217;t support &lt;code&gt;PUT&lt;/code&gt; and there&amp;#8217;s a better HTTP library available in &lt;a href=&quot;http://code.google.com/p/httplib2/&quot;&gt;&lt;code&gt;httplib2&lt;/code&gt;&lt;/a&gt;. MacPorts provides &lt;code&gt;httplib2&lt;/code&gt; packages for several versions of Python. As far as I can tell, though, the package for rdflib (which we&amp;#8217;ll use later) is only available for Python 2.6, so we&amp;#8217;ll go for &lt;code&gt;py26-httplib2&lt;/code&gt;, which will also bring in Python 2.6 if you don&amp;#8217;t already have it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo port install py26-httplib2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After running this, if you want to actually use it you will need to install the &lt;code&gt;python_select&lt;/code&gt; port and choose Python 2.6:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo port install python_select
$ sudo python_select python26
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then open up another Terminal window or tab (because the change won&amp;#8217;t have affected your old one) and check what version of Python you&amp;#8217;re running:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python --version
Python 2.6.6
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the &lt;code&gt;httplib2&lt;/code&gt; library in place, it&amp;#8217;s time for a Python script (&lt;code&gt;load-rdf-into-sesame.py&lt;/code&gt;) to do the PUTting:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import urllib
import httplib2

repository = &#039;reference&#039;
graph      = &#039;http://source.data.gov.uk/data/reference/organogram-co/2010-06-30&#039;
filename   = &#039;/Users/Jeni/Downloads/index.rdf&#039;

print &quot;Loading %s into %s in Sesame&quot; % (filename, graph)

# Sesame expects the context URI wrapped in angle brackets
params = { &#039;context&#039;: &#039;&amp;lt;&#039; + graph + &#039;&amp;gt;&#039; }
endpoint = &quot;http://localhost:8080/openrdf-sesame/repositories/%s/statements?%s&quot; % (repository, urllib.urlencode(params))

# Read the RDF/XML file, closing it again once it has been read
with open(filename, &#039;rb&#039;) as f:
    data = f.read()

# PUT replaces any statements already in that context with the file&#039;s contents
(response, content) = httplib2.Http().request(endpoint, &#039;PUT&#039;, body=data, headers={ &#039;content-type&#039;: &#039;application/rdf+xml&#039; })
print &quot;Response %s&quot; % response.status
print content
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run the script from the command line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python load-rdf-into-sesame.py
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you should just get:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Loading /Users/Jeni/Downloads/index.rdf into http://source.data.gov.uk/data/reference/organogram-co/2010-06-30 in Sesame
Response 204
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which isn&amp;#8217;t particularly helpful (well, the &lt;code&gt;204&lt;/code&gt; response tells us it worked), but you can then check &lt;a href=&quot;http://localhost:8080/openrdf-workbench/repositories/reference/contexts&quot;&gt;http://localhost:8080/openrdf-workbench/repositories/reference/contexts&lt;/a&gt; and you should see that there is a new context of &lt;code&gt;http://source.data.gov.uk/data/reference/organogram-co/2010-06-30&lt;/code&gt;:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-contexts.jpg&quot; /&gt;
&lt;/p&gt;
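
&lt;p&gt;You can make the same check from Python rather than in the browser: the repository&amp;#8217;s &lt;code&gt;contexts&lt;/code&gt; resource lists the context URIs in a &lt;code&gt;contextID&lt;/code&gt; column. As with the repository list, this sketch assumes that the server honours a request for SPARQL JSON results.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.6

REPOSITORY = "http://localhost:8080/openrdf-sesame/repositories/reference"


def context_ids(results_text):
    """Extract the contextID values from a SPARQL JSON results document."""
    results = json.loads(results_text)
    return [b["contextID"]["value"] for b in results["results"]["bindings"]]


def fetch_contexts(repository=REPOSITORY):
    """GET the repository's list of contexts (named graphs)."""
    request = Request(repository.rstrip("/") + "/contexts",
                      headers={"Accept": "application/sparql-results+json"})
    response = urlopen(request)
    try:
        return context_ids(response.read().decode("utf-8"))
    finally:
        response.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After running &lt;code&gt;load-rdf-into-sesame.py&lt;/code&gt;, the graph URI used in that script should appear in the list returned by &lt;code&gt;fetch_contexts()&lt;/code&gt;.&lt;/p&gt;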

&lt;p&gt;Click on the context and it will take you to a list of (some of) the triples in that graph:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-explore-context.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;One of the nice things about Sesame is that the Workbench provides you with ways of exploring the data that you have loaded. On the left navigation bar there are ways of listing the types of the entities described in the data:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-explore-types.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;from which you can find instances of that type, for example of &lt;code&gt;org:Organization&lt;/code&gt;:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-explore-organization.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;and then the statements about a particular instance, for example DirectGov:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-explore-directgov.jpg&quot; /&gt;
&lt;/p&gt;

&lt;h2&gt;Running a Query&lt;/h2&gt;

&lt;p&gt;On to running a query directly. This is done on Sesame in exactly the same way as it was done on 4store in my last walkthrough: by HTTP POSTing a query to the SPARQL endpoint. Sesame&amp;#8217;s page for testing queries on the &lt;code&gt;reference&lt;/code&gt; repository is at &lt;a href=&quot;http://localhost:8080/openrdf-workbench/repositories/reference/query&quot;&gt;http://localhost:8080/openrdf-workbench/repositories/reference/query&lt;/a&gt;, and we&amp;#8217;ll use the same basic query as last time, which lists the types of things described within the data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type 
WHERE { 
  ?thing a ?type .
} 
ORDER BY ?type
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Paste that into the textarea that&amp;#8217;s provided on &lt;a href=&quot;http://localhost:8080/openrdf-workbench/repositories/reference/query&quot;&gt;http://localhost:8080/openrdf-workbench/repositories/reference/query&lt;/a&gt; so it looks like:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-query.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;and you get an HTML page:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-query-result.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s nice for humans, but not so good for computers. When we request the results of this query programmatically, we&amp;#8217;ll want to make sure that we specifically ask for the query results in either &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-XMLres/&quot;&gt;XML&lt;/a&gt; or &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-json-res/&quot;&gt;JSON&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I went the XML route last time, so let&amp;#8217;s mix it up a bit and process the JSON results of a SPARQL query this time, as JSON is really easy to work with using Python&amp;#8217;s &lt;code&gt;json&lt;/code&gt; module. So, we need to &lt;code&gt;POST&lt;/code&gt; the query, ensuring that we set the &lt;code&gt;Accept&lt;/code&gt; header to &lt;code&gt;application/sparql-results+json&lt;/code&gt;, and then parse the results as JSON. Here is &lt;a href=&quot;/blog/files/find-rdf-types.py&quot;&gt;&lt;code&gt;find-rdf-types.py&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import urllib
import httplib2
import json

query = &#039;SELECT DISTINCT ?type WHERE { ?thing a ?type . } ORDER BY ?type&#039;
repository = &#039;reference&#039;
endpoint = &quot;http://localhost:8080/openrdf-sesame/repositories/%s&quot; % (repository)

print &quot;POSTing SPARQL query to %s&quot; % (endpoint)
params = { &#039;query&#039;: query }
headers = { 
  &#039;content-type&#039;: &#039;application/x-www-form-urlencoded&#039;, 
  &#039;accept&#039;: &#039;application/sparql-results+json&#039; 
}
(response, content) = httplib2.Http().request(endpoint, &#039;POST&#039;, urllib.urlencode(params), headers=headers)

print &quot;Response %s&quot; % response.status
results = json.loads(content)
print &quot;\n&quot;.join([result[&#039;type&#039;][&#039;value&#039;] for result in results[&#039;results&#039;][&#039;bindings&#039;]])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python find-rdf-types.py
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you get:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;POSTing SPARQL query to http://localhost:8080/openrdf-sesame/repositories/reference
Response 200
http://purl.org/linked-data/cube#DataSet
http://purl.org/linked-data/cube#DataStructureDefinition
http://purl.org/linked-data/cube#Observation
http://purl.org/net/opmv/ns#Artifact
http://purl.org/net/opmv/ns#Process
http://purl.org/net/opmv/types/google-refine#OperationDescription
http://purl.org/net/opmv/types/google-refine#Process
http://purl.org/net/opmv/types/google-refine#Project
http://rdfs.org/ns/void#Dataset
http://reference.data.gov.uk/def/central-government/AssistantParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/CivilServicePost
http://reference.data.gov.uk/def/central-government/Department
http://reference.data.gov.uk/def/central-government/DeputyDirector
http://reference.data.gov.uk/def/central-government/DeputyParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/Director
http://reference.data.gov.uk/def/central-government/DirectorGeneral
http://reference.data.gov.uk/def/central-government/ParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/PermanentSecretary
http://reference.data.gov.uk/def/central-government/PublicBody
http://reference.data.gov.uk/def/central-government/SeniorAssistantParliamentaryCounsel
http://reference.data.gov.uk/def/intervals/CalendarDay
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/ns/org#Organization
http://www.w3.org/ns/org#OrganizationalUnit
http://xmlns.com/foaf/0.1/Person
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is the same set of types as that given through the HTML browse interface. Note that the JSON results themselves look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  &quot;head&quot;: {
    &quot;vars&quot;: [ &quot;type&quot; ]
  }, 
  &quot;results&quot;: {
    &quot;bindings&quot;: [
      {
        &quot;type&quot;: { &quot;type&quot;: &quot;uri&quot;, &quot;value&quot;: &quot;http:\/\/purl.org\/linked-data\/cube#DataSet&quot; }
      }, 
      {
        &quot;type&quot;: { &quot;type&quot;: &quot;uri&quot;, &quot;value&quot;: &quot;http:\/\/purl.org\/linked-data\/cube#DataStructureDefinition&quot; }
      }, 
      {
        &quot;type&quot;: { &quot;type&quot;: &quot;uri&quot;, &quot;value&quot;: &quot;http:\/\/purl.org\/linked-data\/cube#Observation&quot; }
      }, 
      ...
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

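&lt;p&gt;Each item within the &lt;code&gt;bindings&lt;/code&gt; array is one result row: it maps each variable bound in that row to an object giving the kind of value (such as &lt;code&gt;uri&lt;/code&gt; or &lt;code&gt;literal&lt;/code&gt;) and the value itself. This closely mirrors the structure of the XML format.&lt;/p&gt;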
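
&lt;p&gt;For use elsewhere in a program, it can be handy to flatten these bindings into one plain Python dictionary per row, keeping only the values. The following is a minimal sketch rather than part of the original script: &lt;code&gt;bindings_to_dicts&lt;/code&gt; is a hypothetical helper, variables that are unbound in a row are simply left out of that row&amp;#8217;s dictionary, and the sample input is a truncated copy of the results shown above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json


def bindings_to_dicts(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables that are unbound in a given row are left out of that row's
    dict, so use row.get('var') for optional variables.
    """
    variables = results['head']['vars']
    rows = []
    for binding in results['results']['bindings']:
        row = {}
        for var in variables:
            if var in binding:
                row[var] = binding[var]['value']
        rows.append(row)
    return rows

# A truncated copy of the results above, for illustration
sample = json.loads('''{
  "head": {"vars": ["type"]},
  "results": {"bindings": [
    {"type": {"type": "uri", "value": "http://purl.org/linked-data/cube#DataSet"}},
    {"type": {"type": "uri", "value": "http://purl.org/linked-data/cube#Observation"}}
  ]}
}''')

for row in bindings_to_dicts(sample):
    print(row['type'])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the sample above, this prints the two type URIs, one per line.&lt;/p&gt;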

&lt;h2&gt;Processing RDF&lt;/h2&gt;

&lt;p&gt;Now we get onto the part of this where we look at specific libraries for RDF support in Python. The most popular library is &lt;a href=&quot;http://www.rdflib.net/&quot;&gt;rdflib&lt;/a&gt;, which you can install using MacPorts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo port install py26-rdflib
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The SPARQL query we&amp;#8217;ll try this time is a CONSTRUCT query, which returns RDF, rather than a SELECT query, whose results, as we&amp;#8217;ve seen, can be returned as either XML or JSON. For example, try the query:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

CONSTRUCT {
  ?person 
    a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
} WHERE { 
  ?person a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This gets all the information in the data about the individuals for whom names have been supplied in the data, as RDF. Again, Sesame will display this as HTML when you try doing it, but you can choose a different format from the drop-down menu at the top of the Query Result display:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/sesame-workbench-query-result-rdf.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;When you&amp;#8217;re not accessing Sesame through a browser, by default it serves up its results in &lt;a href=&quot;http://www4.wiwiss.fu-berlin.de/bizer/TriG/Spec/&quot;&gt;TriG format&lt;/a&gt;, which isn&amp;#8217;t particularly appropriate for the results of CONSTRUCT queries as we don&amp;#8217;t need multiple graphs. We&amp;#8217;ll request &lt;a href=&quot;http://www.w3.org/TR/rdf-testcases/#ntriples&quot;&gt;N-Triples&lt;/a&gt; as that&amp;#8217;s something rdflib can understand. Sesame 2 uses the content type &lt;code&gt;text/plain&lt;/code&gt; for N-Triples, so we can request this type by setting the &lt;code&gt;Accept&lt;/code&gt; header:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;params = { &#039;query&#039;: query }
headers = { 
  &#039;content-type&#039;: &#039;application/x-www-form-urlencoded&#039;, 
  &#039;accept&#039;: &#039;text/plain&#039; 
}
(response, content) = httplib2.Http().request(endpoint, &#039;POST&#039;, urllib.urlencode(params), headers=headers)
&lt;/code&gt;&lt;/pre&gt;
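
&lt;p&gt;Because the SELECT and CONSTRUCT scripts differ only in the &lt;code&gt;Accept&lt;/code&gt; header, the request-building code could be shared between them. A minimal sketch, assuming the media types used in this walkthrough (&lt;code&gt;text/plain&lt;/code&gt; is what Sesame 2 uses for N-Triples; other stores may expect something else); the names &lt;code&gt;ACCEPT&lt;/code&gt; and &lt;code&gt;sparql_post_args&lt;/code&gt; are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.6, as used in this post

# Media types used in this walkthrough; text/plain is what Sesame 2
# uses for N-Triples, which other stores may not recognise
ACCEPT = {
    'json': 'application/sparql-results+json',
    'xml': 'application/sparql-results+xml',
    'ntriples': 'text/plain',
    'rdfxml': 'application/rdf+xml',
}


def sparql_post_args(query, result_format):
    """Return (body, headers) for POSTing a SPARQL query as an HTML form."""
    body = urlencode({'query': query})
    headers = {
        'content-type': 'application/x-www-form-urlencoded',
        'accept': ACCEPT[result_format],
    }
    return body, headers

# Usage with httplib2, as in the scripts here:
# body, headers = sparql_post_args(query, 'ntriples')
# (response, content) = httplib2.Http().request(endpoint, 'POST', body, headers=headers)
&lt;/code&gt;&lt;/pre&gt;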

&lt;p&gt;We then need to parse this N-Triples response into a &lt;a href=&quot;http://www.rdflib.net/rdflib-2.4.0/html/public/rdflib.Graph.Graph-class.html&quot;&gt;&lt;code&gt;rdflib.Graph&lt;/code&gt;&lt;/a&gt; object:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;graph = rdflib.ConjunctiveGraph()
graph.parse(rdflib.StringInputSource(content), format=&quot;nt&quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We then need to get information out of that graph, which rdflib isn&amp;#8217;t particularly good at. So let&amp;#8217;s use &lt;a href=&quot;http://www.openvest.com/trac/wiki/RDFAlchemy&quot;&gt;RDFAlchemy&lt;/a&gt; instead. That can be installed using &lt;a href=&quot;http://packages.python.org/distribute/easy_install.html&quot;&gt;easy_install&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo easy_install-2.6 rdfalchemy
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;RDFAlchemy can be used to map RDF graphs onto Python object structures in a fairly straightforward manner. Basically, you define the namespaces of the vocabularies that you want to use, then some classes for the kinds of things that you have in the data, with properties that map onto properties in the RDF. Then you set &lt;code&gt;rdfSubject.db&lt;/code&gt; to the source of the data (which can be an rdflib Graph as above) and you can query it. Here&amp;#8217;s an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FOAF = rdflib.Namespace(&#039;http://xmlns.com/foaf/0.1/&#039;)
RDF = rdflib.Namespace(&#039;http://www.w3.org/1999/02/22-rdf-syntax-ns#&#039;)

class Person(rdfalchemy.rdfSubject):
  rdf_type = FOAF.Person
  name = rdfalchemy.rdfSingle(FOAF.name)
  mbox = rdfalchemy.rdfSingle(FOAF.mbox)

rdfalchemy.rdfSubject.db = graph
stott = Person.get_by(name=&#039;Andrew Stott&#039;)
print &quot;Andrew Stott&#039;s email address: %s&quot; % stott.mbox.n3()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;RDFAlchemy adds &lt;code&gt;get_by()&lt;/code&gt; and &lt;code&gt;filter_by()&lt;/code&gt; methods to the classes that you define: &lt;code&gt;get_by()&lt;/code&gt; gets a single item that matches the query, while &lt;code&gt;filter_by()&lt;/code&gt; gets all the matching items.&lt;/p&gt;
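
&lt;p&gt;To make the mapping idea concrete without any RDF libraries, here is a toy sketch of what a class like &lt;code&gt;Person&lt;/code&gt; does conceptually: a descriptor looks up the object of a predicate for the current subject, and &lt;code&gt;get_by()&lt;/code&gt; and &lt;code&gt;filter_by()&lt;/code&gt; scan the subjects that have the class&amp;#8217;s &lt;code&gt;rdf_type&lt;/code&gt;. This is &lt;em&gt;not&lt;/em&gt; how RDFAlchemy is implemented (it works against an rdflib graph rather than a list of tuples); the names &lt;code&gt;Single&lt;/code&gt; and &lt;code&gt;Subject&lt;/code&gt; and the example subject URI are hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FOAF = 'http://xmlns.com/foaf/0.1/'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'


class Single(object):
    """Descriptor: the first object of a predicate for this subject, or None."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself
            return self
        for s, p, o in owner.db:
            if s == instance.subject and p == self.predicate:
                return o
        return None


class Subject(object):
    db = []          # the data: a list of (subject, predicate, object) tuples
    rdf_type = None  # the class URI that instances are typed with

    def __init__(self, subject):
        self.subject = subject

    @classmethod
    def filter_by(cls, **criteria):
        """Yield each subject of rdf_type whose properties match all criteria."""
        for s, p, o in cls.db:
            if p == RDF_TYPE and o == cls.rdf_type:
                item = cls(s)
                if all(getattr(item, k) == v for k, v in criteria.items()):
                    yield item

    @classmethod
    def get_by(cls, **criteria):
        """Return the first matching instance, or None if nothing matches."""
        for item in cls.filter_by(**criteria):
            return item
        return None


class Person(Subject):
    rdf_type = FOAF + 'Person'
    name = Single(FOAF + 'name')
    mbox = Single(FOAF + 'mbox')


# Hypothetical subject URI; name and mailbox as in the output further down
person = 'http://example.org/id/person/1'
Subject.db = [
    (person, RDF_TYPE, FOAF + 'Person'),
    (person, FOAF + 'name', 'Andrew Stott'),
    (person, FOAF + 'mbox', 'mailto:andrew.stott@cabinet-office.gsi.gov.uk'),
]

stott = Person.get_by(name='Andrew Stott')
print(stott.mbox)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here the mailbox is stored as a plain string, so it prints directly without any equivalent of the &lt;code&gt;.n3()&lt;/code&gt; call used in the real script.&lt;/p&gt;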

&lt;p&gt;The full script for &lt;a href=&quot;/blog/files/get-names-and-emails.py&quot;&gt;&amp;#8216;get-names-and-emails.py&amp;#8217;&lt;/a&gt; is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import urllib
import httplib2
import rdflib
import rdfalchemy

query = &quot;&quot;&quot;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

CONSTRUCT {
  ?person
    a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
} WHERE {
  ?person a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
}&quot;&quot;&quot;
repository = &#039;reference&#039;
endpoint = &quot;http://localhost:8080/openrdf-sesame/repositories/%s&quot; % repository

print &quot;POSTing SPARQL query to %s&quot; % endpoint
params = { &#039;query&#039;: query }
headers = { 
  &#039;content-type&#039;: &#039;application/x-www-form-urlencoded&#039;, 
  &#039;accept&#039;: &#039;text/plain&#039; 
}
(response, content) = httplib2.Http().request(endpoint, &#039;POST&#039;, urllib.urlencode(params), headers=headers)
print &quot;Response %s&quot; % response.status

graph = rdflib.ConjunctiveGraph()
graph.parse(rdflib.StringInputSource(content), format=&quot;nt&quot;)

print &quot;Loaded %d triples&quot; % len(graph)

FOAF = rdflib.Namespace(&#039;http://xmlns.com/foaf/0.1/&#039;)
RDF = rdflib.Namespace(&#039;http://www.w3.org/1999/02/22-rdf-syntax-ns#&#039;)

class Person(rdfalchemy.rdfSubject):
  rdf_type = FOAF.Person
  name = rdfalchemy.rdfSingle(FOAF.name)
  mbox = rdfalchemy.rdfSingle(FOAF.mbox)

rdfalchemy.rdfSubject.db = graph
stott = Person.get_by(name=&#039;Andrew Stott&#039;)
print &quot;Andrew Stott&#039;s email address: %s&quot; % stott.mbox.n3()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run this script with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python get-names-and-emails.py
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you get the result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;No handlers could be found for logger &quot;rdflib.Literal&quot;
POSTing SPARQL query to http://localhost:8080/openrdf-sesame/repositories/reference
Response 200
Loaded 459 triples
Andrew Stott&#039;s email address: &amp;lt;mailto:andrew.stott@cabinet-office.gsi.gov.uk&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first line is apparently a &lt;a href=&quot;http://groups.google.com/group/rdfalchemy-dev/browse_thread/thread/44a94ec27c4c0b85&quot;&gt;side-effect of rdflib/RDFAlchemy weirdness&lt;/a&gt; which you don&amp;#8217;t need to worry about. The rest shows that the search was successful. The call to &lt;code&gt;.n3()&lt;/code&gt; on the email address is only necessary because the address is a resource rather than a literal, and so doesn&amp;#8217;t otherwise get converted to a particularly readable string.&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;So there you have it, another walkthrough of setting up a local triplestore, loading in data and accessing that data programmatically using SPARQL queries, this time using Sesame and Python rather than 4store and Ruby.&lt;/p&gt;

&lt;p&gt;This walkthrough took me a fair bit longer to do than the previous one, for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I&amp;#8217;ve done almost no previous programming with Python (as you can probably tell), so every little thing took ages to work out &amp;#8212; you know you&amp;#8217;re in trouble when you&amp;#8217;re Googling for string concatenation code! I&amp;#8217;m very happy to accept corrections and improvements, which I&amp;#8217;ll include in the above.&lt;/li&gt;
&lt;li&gt;I spent a lot of time faffing around with different Python versions, opting for the latest and then finding that the libraries that I wanted to use weren&amp;#8217;t available for that version and so on. I eventually ended up with Python 2.6; the code above may or may not work with any other versions.&lt;/li&gt;
&lt;li&gt;Setting up Sesame 2 was pretty frustrating: first Tomcat wouldn&amp;#8217;t work, then Jetty wouldn&amp;#8217;t work, and finally I did get Tomcat working and then had the issue with the log directory, as I described above. Once I&amp;#8217;d changed the data directory things worked very smoothly.&lt;/li&gt;
&lt;li&gt;I thought rdflib was going to be enough to work with RDF in Python, but really it isn&amp;#8217;t (if you want to get data &lt;em&gt;out&lt;/em&gt; as well as put data &lt;em&gt;in&lt;/em&gt;), so I had to find something else.&lt;/li&gt;
&lt;li&gt;The documentation for rdflib and RDFAlchemy isn&amp;#8217;t as comprehensive as the documentation for RDF.rb, especially if you&amp;#8217;re not familiar with Python, so it took me a bit longer to work out how to do things with those particular libraries.&lt;/li&gt;
&lt;li&gt;I took a lot more screenshots!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again, I haven&amp;#8217;t followed Richard&amp;#8217;s steps to the letter; in particular I haven&amp;#8217;t used a package to get data out of (or into) Sesame: I&amp;#8217;ve just done it through HTTP calls. I did it this way deliberately because I think it&amp;#8217;s a really important feature of triplestores that you can query them through a common interface: SPARQL. It means that you can take the Python query code here and use it against 4store or another triplestore with only a change to the value of the endpoint variable, and similarly take the Ruby code from my previous walkthrough and use it against Sesame. Your code is not tied to a particular implementation or API; you &amp;#8220;only&amp;#8221; have to learn SPARQL and you&amp;#8217;re away.&lt;/p&gt;
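
&lt;p&gt;As a rough illustration of that point, the request below is built with nothing but Python&amp;#8217;s standard library and is parameterised only by the endpoint. It is a sketch: the endpoint URLs are the ones used in this and the previous walkthrough, &lt;code&gt;select_request&lt;/code&gt; is a hypothetical name, and it assumes the store can return JSON results (ask for &lt;code&gt;application/sparql-results+xml&lt;/code&gt; instead if not):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;try:
    from urllib.request import Request  # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request  # Python 2
    from urllib import urlencode

# Endpoints from the Sesame and 4store walkthroughs
ENDPOINTS = {
    'sesame': 'http://localhost:8080/openrdf-sesame/repositories/reference',
    '4store': 'http://localhost:8000/sparql/',
}

QUERY = 'SELECT DISTINCT ?type WHERE { ?thing a ?type . } ORDER BY ?type'


def select_request(endpoint, query):
    """Build (without sending) a form-encoded SPARQL POST asking for JSON."""
    body = urlencode({'query': query}).encode('ascii')
    return Request(endpoint, data=body, headers={
        'Content-Type': 'application/x-www-form-urlencoded',
        'Accept': 'application/sparql-results+json',
    })

# Switching stores only means picking another endpoint; the request would be
# sent with urllib's urlopen(), e.g. urlopen(select_request(ENDPOINTS['4store'], QUERY))
&lt;/code&gt;&lt;/pre&gt;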

&lt;p&gt;If you prefer something a little more tightly bound, however, RDFAlchemy does have some targeted &lt;a href=&quot;http://www.openvest.com/trac/wiki/RDFAlchemy#Sesame&quot;&gt;Sesame support&lt;/a&gt;, as does &lt;a href=&quot;http://rdf.rubyforge.org/sesame/&quot;&gt;RDF.rb&lt;/a&gt; for that matter. These can help with the management of the data within the repository as well as querying it.&lt;/p&gt;

&lt;p&gt;Another thing that&amp;#8217;s worth pointing out is that 4store and Sesame have completely different (HTTP-based) interfaces for getting data into stores, and that rdflib/RDFAlchemy and RDF.rb have completely different ways of loading data into in-memory graphs, querying it, and getting information from the results, quite aside from the obvious language-based differences that you&amp;#8217;d expect.&lt;/p&gt;

&lt;p&gt;On the SPARQL side, there are some efforts within the W3C to define a &lt;a href=&quot;http://www.w3.org/TR/sparql11-http-rdf-update/&quot;&gt;uniform HTTP protocol for managing RDF graphs&lt;/a&gt; and of course there&amp;#8217;s &lt;a href=&quot;http://www.w3.org/TR/sparql11-update/&quot;&gt;SPARQL 1.1 Update&lt;/a&gt;. There are glimmers of hope for a &lt;a href=&quot;http://www.w3.org/QA/2010/12/new_rdf_working_group_rdfjson.html&quot;&gt;standard RDF API&lt;/a&gt;, as &lt;a href=&quot;http://www.jenitennison.com/blog/node/150&quot;&gt;I&amp;#8217;ve argued for recently&lt;/a&gt;, but I gather that this effort will be focused on client-side developers, i.e. that it is really a standard RDF API &lt;em&gt;for JavaScript&lt;/em&gt;, which I think is a wasted opportunity: I would have been faster in this task if I&amp;#8217;d been able to use familiar methods, and I wouldn&amp;#8217;t have been so dependent on the documentation provided by the author of a particular library.&lt;/p&gt;

&lt;p&gt;Anyway, hopefully my tramping this path will make it easier for those who follow.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/153#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/65">python</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/66">rdfalchemy</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/64">rdflib</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/67">sesame</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/51">sparql</category>
 <enclosure url="http://www.jenitennison.com/blog/files/load-rdf-into-sesame.py.txt" length="615" type="text/plain" />
 <pubDate>Tue, 25 Jan 2011 17:27:24 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">153 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Getting Started with RDF and SPARQL Using 4store and RDF.rb</title>
 <link>http://www.jenitennison.com/blog/node/152</link>
 <description>&lt;p&gt;&lt;strong&gt;Updated&lt;/strong&gt; to include some of &lt;a href=&quot;http://www.jenitennison.com/blog/node/152#comment-10579&quot;&gt;Arto Bendicken&amp;#8217;s recommendations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post is a response to Richard Pope&amp;#8217;s &lt;a href=&quot;http://memespring.co.uk/2011/01/linked-data-rdfsparql-documentation-challenge/&quot;&gt;Linked Data/RDF/SPARQL Documentation Challenge&lt;/a&gt;. In it, he asks for documentation of the following steps:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;ul&gt;
  &lt;li&gt;Install an RDF store from a package management system on a computer running either Apple’s OSX or Ubuntu Desktop.&lt;/li&gt;
  &lt;li&gt;Install a code library (again from a package management system) for talking to the RDF store in either PHP, Ruby or Python.&lt;/li&gt;
  &lt;li&gt;Programatically load some real-world data into the RDF datastore using either PHP, Ruby or Python.&lt;/li&gt;
  &lt;li&gt;Programatically retrieve data from the datastore with SPARQL using using either PHP, Ruby or Python.&lt;/li&gt;
  &lt;li&gt;Convert retrieved data into an object or datatype that can be used by the chosen programming language (e.g. a Python dictionary).&lt;/li&gt;
  &lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;I&amp;#8217;ve been told so many times how RDF sucks for mainstream developers that it was the main point of my &lt;a href=&quot;http://www.w3.org/2010/11/TPAC/RDF-SW-velocity.pdf&quot;&gt;TPAC talk&lt;/a&gt; late last year. I think that this is a great motivating challenge for improving not only the documentation of how to use RDF stores and libraries but also their general installability and usability for developers.&lt;/p&gt;

&lt;p&gt;Anyway, I thought I&amp;#8217;d try to get as far as I could to see just how bad things really are. I am on Mac OS X, and I&amp;#8217;m going to use Ruby (although I don&amp;#8217;t really know it all that well, so please forgive my mistakes). I&amp;#8217;ll breeze on through as if everything is hunky dory, but there are some caveats at the end.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;4store&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;m going to use &lt;a href=&quot;http://4store.org&quot;&gt;4store&lt;/a&gt; because it&amp;#8217;s really easy to install on the Mac. If you want to install it on Ubuntu, &lt;a href=&quot;http://blog.dbtune.org/post/2009/08/14/4Store-stuff&quot;&gt;there&amp;#8217;s a package available&lt;/a&gt;. For a Mac, it&amp;#8217;s a matter of going to the &lt;a href=&quot;http://4store.org/download/macosx/&quot;&gt;list of Mac downloads&lt;/a&gt;, downloading the most recent version, opening the &lt;code&gt;.dmg&lt;/code&gt; and installing the 4store application by dragging it into your Applications folder.&lt;/p&gt;

&lt;p&gt;When you run the 4store application you get a command line prompt. To set up and start a triplestore called &amp;#8216;reference&amp;#8217; with a SPARQL endpoint running on port 8000, type the following commands:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ 4s-backend-setup reference
$ 4s-backend reference
$ 4s-httpd -p 8000 reference
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you then navigate to &lt;a href=&quot;http://localhost:8000/&quot;&gt;http://localhost:8000/&lt;/a&gt; you should see the following:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/4store-homepage.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Don&amp;#8217;t let the title &amp;#8216;Not found&amp;#8217; put you off. The fact you get a response means that it&amp;#8217;s working.&lt;/p&gt;

&lt;h2&gt;Loading Data&lt;/h2&gt;

&lt;p&gt;First, find some data to load. A good place to find government RDF data, for example, is &lt;a href=&quot;http://source.data.gov.uk/data/&quot;&gt;http://source.data.gov.uk/data/&lt;/a&gt;. I downloaded&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;a href=&quot;http://source.data.gov.uk/data/reference/organogram-co/2010-10-31/index.rdf&quot;&gt;http://source.data.gov.uk/data/reference/organogram-co/2010-10-31/index.rdf&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are several ways of &lt;a href=&quot;http://4store.org/trac/wiki/ImportData&quot;&gt;importing data into 4store using the command line&lt;/a&gt;. Yves Raimond has created a &lt;a href=&quot;https://github.com/moustaki/4store-ruby&quot;&gt;Ruby gem&lt;/a&gt; for doing so programmatically. There&amp;#8217;s also &lt;a href=&quot;https://github.com/fumi/rdf-4store&quot;&gt;rdf-4store&lt;/a&gt; from Fumihiro Kato which ties into the &lt;a href=&quot;http://rdf.rubyforge.org/&quot;&gt;RDF.rb&lt;/a&gt; library which I&amp;#8217;ll use later on.&lt;/p&gt;

&lt;p&gt;However, if you use the &lt;a href=&quot;http://4store.org/trac/wiki/SparqlServer&quot;&gt;SPARQL server&lt;/a&gt; then it&amp;#8217;s just an HTTP PUT call, which of course you can do in any language you like (every language has support for making HTTP requests, right?) without the need to install any store-specific packages. Still, since we&amp;#8217;ll be doing a lot of HTTP requests, it&amp;#8217;s useful to have a library that can make them simple. There are &lt;a href=&quot;http://ruby-toolbox.com/categories/http_clients.html&quot;&gt;plenty to choose from for Ruby&lt;/a&gt;. I chose &lt;a href=&quot;https://github.com/archiloque/rest-client&quot;&gt;rest-client&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo gem install rest-client
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With that, I wrote the following little Ruby script called &lt;a href=&quot;/blog/files/load-data-into-4store_0.rb&quot;&gt;&amp;#8216;load-data-into-4store.rb&amp;#8217;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env ruby
require &#039;rubygems&#039;
require &#039;rest_client&#039;

filename = &#039;/Users/Jeni/Downloads/index.rdf&#039;
graph    = &#039;http://source.data.gov.uk/data/reference/organogram-co/2010-06-30&#039;
endpoint = &#039;http://localhost:8000/data/&#039;

puts &quot;Loading #{filename} into #{graph} in 4store&quot;
response = RestClient.put endpoint + graph, File.read(filename), :content_type =&amp;gt; &#039;application/rdf+xml&#039;
puts &quot;Response #{response.code}: 
#{response.to_str}&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run the script from the command line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby load-data-into-4store.rb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you should get the response:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Loading /Users/Jeni/Downloads/index.rdf into http://source.data.gov.uk/data/reference/organogram-co/2010-06-30 in 4store
Response 201: 
&amp;lt;!DOCTYPE HTML PUBLIC &quot;-//IETF//DTD HTML 2.0//EN&quot;&amp;gt;
&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;201 imported successfully&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;&amp;lt;h1&amp;gt;201 imported successfully&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;This is a 4store SPARQL server.&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;4store v1.0.5&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can then check &lt;a href=&quot;http://localhost:8000/status/size/&quot;&gt;http://localhost:8000/status/size/&lt;/a&gt; and you should see that there are now some triples in the store:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/4store-size.jpg&quot; /&gt;
&lt;/p&gt;
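
&lt;p&gt;Nothing in that upload is specific to Ruby. For comparison, here is a minimal sketch of the same PUT request built with Python&amp;#8217;s standard library; the helper name &lt;code&gt;put_graph_request&lt;/code&gt; is hypothetical, the empty body stands in for the file contents, and the request is only constructed here, not sent:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2


def put_graph_request(endpoint, graph, data, content_type='application/rdf+xml'):
    """Build a PUT of RDF data to the graph's URL under 4store's /data/ endpoint."""
    request = Request(endpoint + graph, data=data,
                      headers={'Content-Type': content_type})
    # urllib sends POST whenever data is supplied, so override the method
    request.get_method = lambda: 'PUT'
    return request


req = put_graph_request('http://localhost:8000/data/',
                        'http://source.data.gov.uk/data/reference/organogram-co/2010-06-30',
                        b'')  # in practice: open(filename, 'rb').read()
&lt;/code&gt;&lt;/pre&gt;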

&lt;h2&gt;Running a Query&lt;/h2&gt;

&lt;p&gt;The next step is to query that data using SPARQL. Running SPARQL queries is just a matter of HTTP POSTing a query to the SPARQL endpoint. 4store provides a page that you can use to test out queries at &lt;a href=&quot;http://localhost:8000/test/&quot;&gt;http://localhost:8000/test/&lt;/a&gt;, so perhaps we should try that before diving into the Ruby code. An easy query to start with is one that returns a list of the types of things that are described within the data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type 
WHERE { 
  ?thing a ?type .
} 
ORDER BY ?type
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Paste that into the textarea that&amp;#8217;s provided on &lt;a href=&quot;http://localhost:8000/test/&quot;&gt;http://localhost:8000/test/&lt;/a&gt; so it looks like:&lt;/p&gt;

&lt;p style=&quot;text-align: center&quot;&gt;
  &lt;img src=&quot;/blog/files/4store-test-query.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;and you get some XML:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&quot;1.0&quot;?&amp;gt;
&amp;lt;sparql xmlns=&quot;http://www.w3.org/2005/sparql-results#&quot;&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;variable name=&quot;type&quot;/&amp;gt;
  &amp;lt;/head&amp;gt;
  &amp;lt;results&amp;gt;
    &amp;lt;result&amp;gt;
      &amp;lt;binding name=&quot;type&quot;&amp;gt;&amp;lt;uri&amp;gt;http://purl.org/linked-data/cube#DataSet&amp;lt;/uri&amp;gt;&amp;lt;/binding&amp;gt;
    &amp;lt;/result&amp;gt;
    &amp;lt;result&amp;gt;
      &amp;lt;binding name=&quot;type&quot;&amp;gt;&amp;lt;uri&amp;gt;http://purl.org/linked-data/cube#DataStructureDefinition&amp;lt;/uri&amp;gt;&amp;lt;/binding&amp;gt;
    &amp;lt;/result&amp;gt;
    ...
  &amp;lt;/results&amp;gt;
&amp;lt;/sparql&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;SELECT queries like this one (which are the most common kind of query to run to simply extract data) return &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-XMLres/&quot;&gt;SPARQL Query Results XML Format&lt;/a&gt; by default, so there&amp;#8217;s no need to get hold of a specialised library for processing the results: you just need something to process XML.&lt;/p&gt;

&lt;p&gt;For Ruby, I&amp;#8217;m choosing &lt;a href=&quot;http://nokogiri.org/&quot;&gt;Nokogiri&lt;/a&gt; as I&amp;#8217;ve heard good things about it. To install:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo port install libxml2 libxslt
$ sudo gem install nokogiri
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So now we just need a script that will run this query, process the results as XML, and do something with them. Call it &lt;a href=&quot;/blog/files/find-rdf-types_0.rb&quot;&gt;&amp;#8216;find-rdf-types.rb&amp;#8217;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env ruby
require &#039;rubygems&#039;
require &#039;rest_client&#039;
require &#039;nokogiri&#039;

query = &#039;SELECT DISTINCT ?type WHERE { ?thing a ?type . } ORDER BY ?type&#039;
endpoint = &#039;http://localhost:8000/sparql/&#039;

puts &quot;POSTing SPARQL query to #{endpoint}&quot;
response = RestClient.post endpoint, :query =&amp;gt; query
puts &quot;Response #{response.code}&quot;
xml = Nokogiri::XML(response.to_str)

xml.xpath(&#039;//sparql:binding[@name = &quot;type&quot;]/sparql:uri&#039;, &#039;sparql&#039; =&amp;gt; &#039;http://www.w3.org/2005/sparql-results#&#039;).each do |type|
  puts type.content
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby find-rdf-types.rb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you get:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;POSTing SPARQL query to http://localhost:8000/sparql/
Response 200
http://purl.org/linked-data/cube#DataSet
http://purl.org/linked-data/cube#DataStructureDefinition
http://purl.org/linked-data/cube#Observation
http://purl.org/net/opmv/ns#Artifact
http://purl.org/net/opmv/ns#Process
http://purl.org/net/opmv/types/google-refine#OperationDescription
http://purl.org/net/opmv/types/google-refine#Process
http://purl.org/net/opmv/types/google-refine#Project
http://rdfs.org/ns/void#Dataset
http://reference.data.gov.uk/def/central-government/AssistantParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/CivilServicePost
http://reference.data.gov.uk/def/central-government/Department
http://reference.data.gov.uk/def/central-government/DeputyDirector
http://reference.data.gov.uk/def/central-government/DeputyParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/Director
http://reference.data.gov.uk/def/central-government/DirectorGeneral
http://reference.data.gov.uk/def/central-government/ParliamentaryCounsel
http://reference.data.gov.uk/def/central-government/PermanentSecretary
http://reference.data.gov.uk/def/central-government/PublicBody
http://reference.data.gov.uk/def/central-government/SeniorAssistantParliamentaryCounsel
http://reference.data.gov.uk/def/intervals/CalendarDay
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/ns/org#Organization
http://www.w3.org/ns/org#OrganizationalUnit
http://xmlns.com/foaf/0.1/Person
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So we can see that the dataset includes statistical data using the &lt;a href=&quot;http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html&quot;&gt;data cube&lt;/a&gt; vocabulary, provenance information using &lt;a href=&quot;http://code.google.com/p/opmv/&quot;&gt;OPMV (Open Provenance Model Vocabulary)&lt;/a&gt;, some information about organisations using &lt;a href=&quot;http://www.epimorphics.com/public/vocabulary/org.html&quot;&gt;org&lt;/a&gt;, some data.gov.uk-specific vocabulary, and information about people using &lt;a href=&quot;http://xmlns.com/foaf/spec/&quot;&gt;FOAF&lt;/a&gt;.&lt;/p&gt;
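
&lt;p&gt;If you would rather avoid XPath altogether, SPARQL endpoints can often return SELECT results as JSON instead. The sketch below assumes that the endpoint supports the &lt;code&gt;application/sparql-results+json&lt;/code&gt; media type (check your triplestore&amp;#8217;s documentation), and parses a hard-coded sample response with Ruby&amp;#8217;s standard &lt;code&gt;json&lt;/code&gt; library so that it runs without a live endpoint. The &lt;code&gt;binding_values&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require &#039;json&#039;

# A hard-coded sample in the SPARQL JSON results format. In practice it would
# be the body of a response requested with something like
#   RestClient.post endpoint, { :query =&amp;gt; query }, :accept =&amp;gt; &#039;application/sparql-results+json&#039;
# (assuming the endpoint supports that media type).
sample = &amp;lt;&amp;lt;-JSON
{
  &quot;head&quot;: { &quot;vars&quot;: [ &quot;type&quot; ] },
  &quot;results&quot;: {
    &quot;bindings&quot;: [
      { &quot;type&quot;: { &quot;type&quot;: &quot;uri&quot;, &quot;value&quot;: &quot;http://xmlns.com/foaf/0.1/Person&quot; } },
      { &quot;type&quot;: { &quot;type&quot;: &quot;uri&quot;, &quot;value&quot;: &quot;http://www.w3.org/ns/org#Organization&quot; } }
    ]
  }
}
JSON

# Hypothetical helper: collect the values bound to one variable,
# skipping rows where that variable is unbound.
def binding_values(results, var)
  results[&#039;results&#039;][&#039;bindings&#039;].map { |row| row[var] &amp;amp;&amp;amp; row[var][&#039;value&#039;] }.compact
end

binding_values(JSON.parse(sample), &#039;type&#039;).each { |type| puts type }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The JSON form avoids namespaces entirely: each row is a hash keyed by variable name, and each bound value carries its own &lt;code&gt;type&lt;/code&gt; (URI, literal or blank node).&lt;/p&gt;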

&lt;h2&gt;Processing RDF&lt;/h2&gt;

&lt;p&gt;Sometimes it can be useful to get non-tabular data out of SPARQL. At that point, rather than using SELECT queries, you will want to use a CONSTRUCT query, which creates RDF. For example, try the query:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

CONSTRUCT {
  ?person 
    a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
} WHERE { 
  ?person a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This retrieves, as RDF, every triple whose subject is a &lt;code&gt;foaf:Person&lt;/code&gt; that has been given a &lt;code&gt;foaf:name&lt;/code&gt;.&lt;/p&gt;
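
&lt;p&gt;Conceptually, a CONSTRUCT query matches its WHERE pattern just as a SELECT query would, then substitutes each solution into the template and collects the resulting triples into a single graph, where duplicates collapse because a graph is a set of triples. The plain-Ruby sketch below illustrates that idea with two hypothetical solutions and terms written as strings; it is not how 4store actually evaluates CONSTRUCT.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require &#039;set&#039;

# Hypothetical solutions, as a SELECT over the WHERE clause might return them.
solutions = [
  { :person =&amp;gt; &#039;_:p1&#039;, :name =&amp;gt; &#039;&quot;Jo Bloggs&quot;&#039;,
    :prop =&amp;gt; &#039;foaf:mbox&#039;, :value =&amp;gt; &#039;&amp;lt;mailto:jo@example.org&amp;gt;&#039; },
  { :person =&amp;gt; &#039;_:p1&#039;, :name =&amp;gt; &#039;&quot;Jo Bloggs&quot;&#039;,
    :prop =&amp;gt; &#039;foaf:name&#039;, :value =&amp;gt; &#039;&quot;Jo Bloggs&quot;&#039; }
]

# The template: symbols are variables, strings are fixed terms
# (rdf:type is what the Turtle keyword &quot;a&quot; abbreviates).
template = [
  [:person, &#039;rdf:type&#039;, &#039;foaf:Person&#039;],
  [:person, &#039;foaf:name&#039;, :name],
  [:person, :prop, :value]
]

# Substitute every solution into every template triple; the Set drops duplicates.
graph = Set.new
solutions.each do |solution|
  template.each do |pattern|
    graph &amp;lt;&amp;lt; pattern.map { |term| term.is_a?(Symbol) ? solution[term] : term }
  end
end

graph.each { |triple| puts triple.join(&#039; &#039;) + &#039; .&#039; }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Six triples are substituted here, but only three distinct ones remain, because several of them repeat across the two solutions.&lt;/p&gt;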

&lt;p&gt;Although the response is RDF/XML, you definitely &lt;em&gt;do not&lt;/em&gt; want to process it as XML. Instead, you need a proper RDF library. Fortunately, there&amp;#8217;s a good one for Ruby in &lt;a href=&quot;http://rdf.rubyforge.org/&quot;&gt;RDF.rb&lt;/a&gt;. You can install it and a bunch of extra plugins that make it easy to deal with RDF in all its guises using:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo gem install linkeddata
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This lets us pick out an appropriate parser based on the &lt;code&gt;Content-Type&lt;/code&gt; of the response, and load the results of the SPARQL query into an in-memory &lt;a href=&quot;http://rdf.rubyforge.org/RDF/Graph.html&quot;&gt;&lt;code&gt;RDF::Graph&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;response = RestClient.post endpoint, :query =&amp;gt; query
content_type = response.headers[:content_type][/^[^ ;]+/]
puts &quot;Response #{response.code} type #{content_type}&quot;

graph = RDF::Graph.new
graph &amp;lt;&amp;lt; RDF::Reader.for(:content_type =&amp;gt; content_type).new(response.to_str)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can perform subsequent queries over that graph, for example to extract just the names and email addresses and put them into a hash:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;query = RDF::Query.new({
  :person =&amp;gt; {
    RDF.type  =&amp;gt; FOAF.Person,
    FOAF.name =&amp;gt; :name,
    FOAF.mbox =&amp;gt; :email,
  }
})

people = {}
query.execute(graph).each do |person|
  people[person.name.to_s] = person.email.to_s
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&amp;#8217;s worth noting that the &lt;code&gt;FOAF&lt;/code&gt; vocabulary can be referred to without qualification because the script includes the &lt;code&gt;RDF&lt;/code&gt; module (&lt;code&gt;include RDF&lt;/code&gt;), and that the values you get back from a query are RDF terms: URIs, or literals that may have a datatype or language. In the above code I&amp;#8217;ve converted them into strings before putting them into the Ruby hash.&lt;/p&gt;
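
&lt;p&gt;The &lt;code&gt;[/^[^ ;]+/]&lt;/code&gt; expression used above when choosing a reader keeps only the media type from a header such as &lt;code&gt;application/rdf+xml; charset=utf-8&lt;/code&gt;, dropping any parameters. The sketch below shows that step on its own, together with a lookup table from media types to format names for cases where you would rather pick a reader explicitly. The table and &lt;code&gt;format_for&lt;/code&gt; are hypothetical illustrations, not part of RDF.rb.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical table mapping media types to format names
# (N-Triples was commonly served as text/plain).
FORMATS = {
  &#039;application/rdf+xml&#039; =&amp;gt; :rdfxml,
  &#039;text/turtle&#039;         =&amp;gt; :turtle,
  &#039;text/plain&#039;          =&amp;gt; :ntriples
}

# Strip parameters such as &quot;; charset=utf-8&quot; and normalise case before lookup.
def format_for(header)
  media_type = header[/^[^ ;]+/].to_s.downcase
  FORMATS.fetch(media_type) { raise ArgumentError, &quot;unsupported media type #{media_type}&quot; }
end

puts format_for(&#039;application/rdf+xml; charset=utf-8&#039;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Raising an error for unknown types makes an unexpected response (an HTML error page, say) fail loudly rather than being fed to the wrong parser.&lt;/p&gt;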

&lt;p&gt;The full script for &lt;a href=&quot;/blog/files/get-names-and-emails_0.rb&quot;&gt;&amp;#8216;get-names-and-emails.rb&amp;#8217;&lt;/a&gt; is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env ruby
require &#039;rubygems&#039;
require &#039;rest_client&#039;
require &#039;linkeddata&#039;

include RDF

query = &quot;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

CONSTRUCT {
  ?person 
    a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
} WHERE { 
  ?person a foaf:Person ;
    foaf:name ?name ;
    ?prop ?value .
}&quot;
endpoint = &#039;http://localhost:8000/sparql/&#039;

puts &quot;POSTing SPARQL query to #{endpoint}&quot;
response = RestClient.post endpoint, :query =&amp;gt; query
content_type = response.headers[:content_type][/^[^ ;]+/]
puts &quot;Response #{response.code} type #{content_type}&quot;

graph = RDF::Graph.new
graph &amp;lt;&amp;lt; RDF::Reader.for(:content_type =&amp;gt; content_type).new(response.to_str)

puts &quot;\nLoaded #{graph.count} triples\n&quot;

query = RDF::Query.new({
  :person =&amp;gt; {
    RDF.type  =&amp;gt; FOAF.Person,
    FOAF.name =&amp;gt; :name,
    FOAF.mbox =&amp;gt; :email,
  }
})

people = {}
query.execute(graph).each do |person|
  people[person.name.to_s] = person.email.to_s
end
puts &quot;\nCreating directory of #{people.length} people&quot;

stott_email = people[&#039;Andrew Stott&#039;]
puts &quot;\nAndrew Stott&#039;s email address: #{stott_email}&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run this script with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ruby get-names-and-emails.rb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you get the result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;POSTing SPARQL query to http://localhost:8000/sparql/
Response 200 type application/rdf+xml

Loaded 459 triples

Creating directory of 75 people

Andrew Stott&#039;s email address: mailto:andrew.stott@cabinet-office.gsi.gov.uk
&lt;/code&gt;&lt;/pre&gt;
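
&lt;p&gt;The address comes back with a &lt;code&gt;mailto:&lt;/code&gt; prefix because &lt;code&gt;foaf:mbox&lt;/code&gt; points to a mailbox URI rather than holding a plain string. If you want the bare address, a small helper (hypothetical, not part of RDF.rb) can strip the scheme once the value has been converted to a string:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical helper: turn a foaf:mbox URI string into a bare email address.
# Strings without a mailto: scheme are returned unchanged.
def bare_address(mbox)
  mbox.to_s.sub(/\Amailto:/i, &#039;&#039;)
end

puts bare_address(&#039;mailto:andrew.stott@cabinet-office.gsi.gov.uk&#039;)
&lt;/code&gt;&lt;/pre&gt;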

&lt;h2&gt;Conclusions and Caveats&lt;/h2&gt;

&lt;p&gt;So there you have it, a walkthrough of setting up a local triplestore, loading in data and accessing that data programmatically using SPARQL queries.&lt;/p&gt;

&lt;p&gt;Now for some caveats. First, you&amp;#8217;re bound to have noticed that I haven&amp;#8217;t followed Richard&amp;#8217;s steps to the letter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;4store wasn&amp;#8217;t installed from a package management system. The only packaged triplestore I could locate on &lt;a href=&quot;http://www.macports.org/&quot;&gt;MacPorts&lt;/a&gt; was &lt;a href=&quot;http://virtuoso.openlinksw.com/&quot;&gt;Virtuoso&lt;/a&gt; (which I&amp;#8217;ll come to in a second). I hope that 4store&amp;#8217;s installation is simple enough for this slight deviation from the rules not to matter.&lt;/li&gt;
&lt;li&gt;I didn&amp;#8217;t install a package specifically for talking to 4store in order to load the data; I just used HTTP requests. There are &lt;a href=&quot;http://4store.org/trac/wiki/ClientLibraries&quot;&gt;client libraries&lt;/a&gt; for 4store, but I figure that the HTTP requests are easy enough, and the resulting code more portable to other environments, so I prefer not to use them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Second, there are a couple of dead ends that I went down that I haven&amp;#8217;t written up in the above:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I did spend some time yesterday evening trying to get &lt;a href=&quot;http://virtuoso.openlinksw.com/&quot;&gt;Virtuoso&lt;/a&gt; set up. I managed to get it installed, but loading data into it seemed to require some magic which I couldn&amp;#8217;t figure out. So I went to bed instead.&lt;/li&gt;
&lt;li&gt;I tried to install and use &lt;a href=&quot;http://rdf.rubyforge.org/raptor/&quot;&gt;rdf-raptor&lt;/a&gt; in order to parse the RDF/XML that naturally comes out of 4store CONSTRUCT queries, but got a &lt;code&gt;Could not open library &#039;libraptor&#039;&lt;/code&gt; error. I couldn&amp;#8217;t find an immediate fix for that, so decided to keep things simple instead and just use plain RDF.rb.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Third, I want to reiterate that there may be better ways of using 4store, rest_client, Nokogiri and RDF.rb, as well as Ruby generally, than those shown above. I don&amp;#8217;t claim to be an expert in any of these technologies. If you have suggestions and corrections, I&amp;#8217;d encourage you to add a comment and I&amp;#8217;ll incorporate them in the text to improve it.&lt;/p&gt;

&lt;p&gt;Finally, some general points, because the strong binding of &amp;#8216;linked data&amp;#8217; and &amp;#8216;SPARQL&amp;#8217; in Richard&amp;#8217;s post bothers me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It&amp;#8217;s not necessary to have a SPARQL endpoint when publishing linked data, nor to run your own triplestore. If you already have a website, you are probably better off generating N-Triples or RDF/XML or Turtle in the same way as you generate HTML or XML or JSON.&lt;/li&gt;
&lt;li&gt;It&amp;#8217;s not necessary to learn SPARQL to access and use linked data: the whole point is that the data in linked data is available through HTTP access in standard (RDF-based) formats, so you can scrape them using a follow-your-nose approach and store the results however you like.&lt;/li&gt;
&lt;/ul&gt;
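
&lt;p&gt;As an illustration of the first point, the sketch below produces N-Triples from application data in much the same way a page template produces HTML. The base URI, the record and the &lt;code&gt;ntriples_literal&lt;/code&gt; helper are hypothetical; the escaping aims to follow the N-Triples rules (backslash escapes for quotes, backslashes and line breaks, and &lt;code&gt;\uXXXX&lt;/code&gt; escapes for anything outside printable ASCII), but check it against the specification before relying on it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Escape a value as an N-Triples literal: quotes, backslashes and control
# characters get backslash escapes, and anything outside printable ASCII
# becomes a \uXXXX (or \UXXXXXXXX) escape.
def ntriples_literal(value)
  escaped = value.to_s.each_char.map do |c|
    case c
    when &#039;\\&#039; then &#039;\\\\&#039;
    when &#039;&quot;&#039;  then &#039;\\&quot;&#039;
    when &quot;\n&quot; then &#039;\\n&#039;
    when &quot;\r&quot; then &#039;\\r&#039;
    when &quot;\t&quot; then &#039;\\t&#039;
    else
      cp = c.ord
      if cp &amp;lt; 0x20 || cp &amp;gt; 0x7E
        cp &amp;gt; 0xFFFF ? format(&#039;\\U%08X&#039;, cp) : format(&#039;\\u%04X&#039;, cp)
      else
        c
      end
    end
  end
  &#039;&quot;&#039; + escaped.join + &#039;&quot;&#039;
end

BASE     = &#039;http://example.org/id/&#039;   # hypothetical base URI
FOAF_NS  = &#039;http://xmlns.com/foaf/0.1/&#039;
RDF_TYPE = &#039;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&#039;

# Hypothetical application records, as you might pass to an HTML template.
people = [{ :id =&amp;gt; &#039;jo&#039;, :name =&amp;gt; &#039;Jo &quot;Example&quot; Bloggs&#039; }]

people.each do |person|
  subject = &quot;&amp;lt;#{BASE}person/#{person[:id]}&amp;gt;&quot;
  puts &quot;#{subject} &amp;lt;#{RDF_TYPE}&amp;gt; &amp;lt;#{FOAF_NS}Person&amp;gt; .&quot;
  puts &quot;#{subject} &amp;lt;#{FOAF_NS}name&amp;gt; #{ntriples_literal(person[:name])} .&quot;
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because N-Triples is one triple per line with no prefixes, it is about the easiest RDF format to generate by hand; the escaping of literals is the only fiddly part.&lt;/p&gt;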

&lt;p&gt;Having said the above, if you&amp;#8217;re collecting linked data from multiple sources with unpredictable content and want to query across it, having a local triplestore is very useful.&lt;/p&gt;

&lt;p&gt;I also want to point out that within the &lt;a href=&quot;http://data.gov.uk/linked-data&quot;&gt;linked data we&amp;#8217;ve published on data.gov.uk&lt;/a&gt;, we&amp;#8217;ve made a big effort to make the data available in multiple formats such as JSON, XML and CSV, and through a RESTful, URI-parameter-driven API, precisely to lower the barrier for developers who want to use that information but understandably don&amp;#8217;t want to take the time or make the effort to learn the linked data technologies that underlie the sites. For those who do, the RDF/XML and Turtle are there as well, and the SPARQL queries that are used to create each page are available to look at, tweak, and learn from. Our hope is that the &lt;a href=&quot;http://code.google.com/p/linked-data-api/&quot;&gt;linked data API&lt;/a&gt; that provides access to lists of &lt;a href=&quot;http://education.data.gov.uk/doc/school&quot;&gt;schools&lt;/a&gt;, &lt;a href=&quot;http://reference.data.gov.uk/doc/department&quot;&gt;departments&lt;/a&gt; and &lt;a href=&quot;http://transport.data.gov.uk/doc/station&quot;&gt;railway stations&lt;/a&gt; can make the linked data learning curve a little less steep.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/152#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/61">4store</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/62">rdf.rb</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/63">ruby</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/51">sparql</category>
 <enclosure url="http://www.jenitennison.com/blog/files/load-rdf-into-4store_0.rb" length="437" type="text/x-ruby-script" />
 <pubDate>Sat, 15 Jan 2011 19:17:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">152 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Standardising an RDF API</title>
 <link>http://www.jenitennison.com/blog/node/150</link>
 <description>&lt;p&gt;I got a little bit of &lt;a href=&quot;http://twitter.com/vambenepe/status/9097914244669440&quot;&gt;pushback&lt;/a&gt; on my &lt;a href=&quot;http://www.jenitennison.com/blog/node/149&quot;&gt;previous blog post&lt;/a&gt; for suggesting that W3C should standardise an API for RDF. (I&amp;#8217;m talking here about a programming-interface-kind-of-API to enable developers to extract information out of an RDF document rather than a website-API to enable them to access RDF data in the first place.)&lt;/p&gt;

&lt;p&gt;I just wanted to talk about a couple of actual real-life scenarios that make me want a standard RDF API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&quot;http://www.w3.org/People/Berners-Lee/&quot;&gt;TimBL&lt;/a&gt; wants an &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot;&gt;RDFa&lt;/a&gt; parser for &lt;a href=&quot;http://www.w3.org/2005/ajar/tab&quot;&gt;Tabulator&lt;/a&gt;. There are a few RDFa parsers in Javascript; he chooses to use &lt;a href=&quot;http://code.google.com/p/rdfquery/&quot;&gt;rdfQuery&amp;#8217;s&lt;/a&gt;. Tabulator works on top of its own datastore, which has its own interface for inserting data. rdfQuery&amp;#8217;s RDFa parser works on top of its own datastore, which has a different interface for inserting data. To use rdfQuery&amp;#8217;s parser, TimBL has to either rewrite some of rdfQuery&amp;#8217;s internal code to call the methods that insert data into Tabulator&amp;#8217;s datastore, or rewrite some of Tabulator&amp;#8217;s internal code to call the methods that query rdfQuery&amp;#8217;s datastore. &lt;strong&gt;The lack of a standard API for RDF has made it harder for TimBL to reuse my code.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I&amp;#8217;m working on &lt;a href=&quot;http://code.google.com/p/puelia-php/&quot;&gt;Puelia&lt;/a&gt;, which needs to both parse and generate RDF in various ways and uses &lt;a href=&quot;http://code.google.com/p/moriarty/&quot;&gt;Moriarty&lt;/a&gt; to do so. I am editing the code to create triples in an in-memory RDF graph. I want to add a triple with a literal value. I have no idea how to do so, because I haven&amp;#8217;t used Moriarty before, so I have to hunt through its documentation to find the &lt;a href=&quot;http://code.google.com/p/moriarty/wiki/SimpleGraph#add_literal_triple&quot;&gt;&lt;code&gt;add_literal_triple()&lt;/code&gt;&lt;/a&gt; function. &lt;strong&gt;The lack of a standard API for RDF has made it harder for me to use the library.&lt;/strong&gt; If I ever wanted to switch to using some other PHP RDF library, such as &lt;a href=&quot;http://www.aelius.com/njh/easyrdf/&quot;&gt;EasyRDF&lt;/a&gt; or &lt;a href=&quot;http://graphite.ecs.soton.ac.uk/&quot;&gt;Graphite&lt;/a&gt;, for whatever reason, I would have to rewrite substantial parts of Puelia to use the functions provided by that library. &lt;strong&gt;The lack of a standard API for RDF has made Puelia less modular and adaptable.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
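
&lt;p&gt;To make the modularity point concrete, here is a hypothetical Ruby sketch in which a parser relies only on its store responding to a single &lt;code&gt;add(subject, predicate, object)&lt;/code&gt; method. Any store implementing that one method, whichever library it came from, can receive the parser&amp;#8217;s output. All the class and method names are invented for illustration and are not taken from rdfQuery, Tabulator or Moriarty.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A toy parser that emits triples to whatever store it is given.
# It depends only on the store responding to add(subject, predicate, object).
class ToyParser
  def initialize(store)
    @store = store
  end

  # Parses lines of the (hypothetical) form &quot;subject predicate object&quot;.
  def parse(text)
    text.each_line do |line|
      s, p, o = line.split(&#039; &#039;, 3).map(&amp;amp;:strip)
      @store.add(s, p, o) if o
    end
    @store
  end
end

# Two unrelated stores that happen to share the same interface.
class ArrayStore
  attr_reader :triples
  def initialize; @triples = []; end
  def add(s, p, o); @triples &amp;lt;&amp;lt; [s, p, o]; end
end

class IndexedStore
  attr_reader :by_subject
  def initialize; @by_subject = Hash.new { |h, k| h[k] = [] }; end
  def add(s, p, o); @by_subject[s] &amp;lt;&amp;lt; [p, o]; end
end

data = &quot;ex:a ex:name Alpha\nex:a ex:type ex:Thing\n&quot;
puts ToyParser.new(ArrayStore.new).parse(data).triples.length
puts ToyParser.new(IndexedStore.new).parse(data).by_subject[&#039;ex:a&#039;].length
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Neither store had to be written with the parser in mind; agreeing on the one method is what lets the pieces be swapped.&lt;/p&gt;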

&lt;p&gt;For all that the &lt;a href=&quot;http://www.w3.org/DOM/&quot;&gt;W3C XML DOM&lt;/a&gt; seems to be universally reviled as an API for querying and creating XML, it and &lt;a href=&quot;http://www.saxproject.org/&quot;&gt;SAX&lt;/a&gt; mean that people can write XSLT and XProc processors (etc) without writing their own XML parser. They mean that whatever programming language I find myself writing code in, I know that I&amp;#8217;ll be able to use &lt;code&gt;getElementsByTagName()&lt;/code&gt; to get hold of elements with a particular name. They mean that XML parsers have a reason to improve over time, because applications can easily switch to better parsers when they come along. DOM and SAX provide a foundation, a level of standardisation and pluggability, that improves the XML landscape as a whole.&lt;/p&gt;

&lt;p&gt;Of course sometimes components need tighter integration in order to achieve performance benefits; that&amp;#8217;s a modularity/performance judgement on the part of the developer of the application. And of course there are better object model APIs for XML than the W3C XML DOM around. But better APIs are almost always programming-language or library specific; they are better simply because cross-platform APIs like DOM and SAX cannot take full advantage of the idioms of a particular programming language or style.&lt;/p&gt;

&lt;p&gt;Now regarding the W3C&amp;#8217;s involvement in creating such a standard, the argument seems to be &amp;#8220;W3C created the horror that is the XML DOM and therefore every API specification that comes out of the W3C will be horrendous&amp;#8221;. &lt;/p&gt;

&lt;p&gt;I think sometimes that W3C is seen as a kind of monolithic organisation that exists &lt;em&gt;over there&lt;/em&gt;, with secret committees whose work takes place out of public eyes until they deign to let us mere mortals read the results of their machinations. And who then fend off all comments and criticism in order to protect their lovingly crafted (but completely impractical) specifications.&lt;/p&gt;

&lt;p&gt;What this overlooks is that the standards organisation merely provides the framework and administrative support within which groups who are interested in creating a standard can come together. The existing &lt;a href=&quot;http://www.w3.org/2010/02/rdfa/&quot;&gt;RDFa Working Group&amp;#8217;s&lt;/a&gt; &lt;a href=&quot;http://www.w3.org/2010/02/rdfa/wiki/Meetings&quot;&gt;meetings are documented&lt;/a&gt; and &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-rdfa-wg/&quot;&gt;discussion takes place in public&lt;/a&gt; and is open to all. I&amp;#8217;m sure this will continue in the RDF Core Working Group when it is set up.&lt;/p&gt;

&lt;p&gt;It will happen anyway. There is &lt;em&gt;already&lt;/em&gt; work going on within the W3C to create a &lt;a href=&quot;http://www.w3.org/TR/rdfa-api/#the-rdf-interfaces&quot;&gt;standard RDFa API&lt;/a&gt;, out of which, &lt;a href=&quot;http://www.jenitennison.com/blog/node/149#comment-10515&quot;&gt;so I am told&lt;/a&gt;, will arise a Working Draft of an RDF API. From the looks of &lt;a href=&quot;http://www.w3.org/TR/2010/WD-rdfa-api-20100923/&quot;&gt;the most recent Working Draft&lt;/a&gt;, I will be able to add a literal triple to a &lt;a href=&quot;http://www.w3.org/TR/2010/WD-rdfa-api-20100923/#data-store&quot;&gt;DataStore&lt;/a&gt; using something like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$store-&amp;gt;add(
  $store-&amp;gt;createTriple(
    $store-&amp;gt;createBlankNode(&#039;puelia&#039;),
    $store-&amp;gt;createIRI(&#039;http://www.w3.org/2000/01/rdf-schema#label&#039;),
    $store-&amp;gt;createPlainLiteral(&#039;Puelia&#039;, &#039;en&#039;)
  )
);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(compared to&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$graph-&amp;gt;add_literal_triple(&#039;_:puelia&#039;, &#039;http://www.w3.org/2000/01/rdf-schema#label&#039;, &#039;Puelia&#039;, &#039;en&#039;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;in Moriarty). So OK, it needs a bit of work. But these are early days, and from the looks of the &lt;a href=&quot;http://webr3.org/_pvt/rdfa-api&quot;&gt;editor&amp;#8217;s draft&lt;/a&gt; it&amp;#8217;s likely to change quite rapidly.&lt;/p&gt;
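
&lt;p&gt;One reason the verbose form need not be a problem is that a library can layer convenience methods over a standard core. The Ruby sketch below is loosely modelled on the factory names in the Working Draft (&lt;code&gt;createTriple&lt;/code&gt;, &lt;code&gt;createBlankNode&lt;/code&gt;, &lt;code&gt;createIRI&lt;/code&gt;, &lt;code&gt;createPlainLiteral&lt;/code&gt;), rendered in Ruby&amp;#8217;s snake_case style; the classes and the &lt;code&gt;add_literal_triple&lt;/code&gt; wrapper are hypothetical and are not the draft&amp;#8217;s actual interface definitions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Simple value types for RDF terms and triples.
IRI          = Struct.new(:value)
BlankNode    = Struct.new(:value)
PlainLiteral = Struct.new(:value, :language)
Triple       = Struct.new(:subject, :property, :object)

# A minimal store with factory methods, loosely following the draft&#039;s naming.
class DataStore
  attr_reader :triples

  def initialize
    @triples = []
  end

  def create_iri(iri); IRI.new(iri); end
  def create_blank_node(label = nil); BlankNode.new(label); end
  def create_plain_literal(value, lang = nil); PlainLiteral.new(value, lang); end
  def create_triple(s, p, o); Triple.new(s, p, o); end

  # Graphs are sets, so an identical triple is only stored once.
  def add(triple)
    @triples &amp;lt;&amp;lt; triple unless @triples.include?(triple)
    self
  end

  # Convenience layer in the style of Moriarty&#039;s add_literal_triple:
  # &#039;_:&#039;-prefixed strings become blank nodes, anything else an IRI.
  def add_literal_triple(s, p, value, lang = nil)
    subject = s.start_with?(&#039;_:&#039;) ? create_blank_node(s[2..-1]) : create_iri(s)
    add(create_triple(subject, create_iri(p), create_plain_literal(value, lang)))
  end
end

store = DataStore.new
store.add(store.create_triple(store.create_blank_node(&#039;puelia&#039;),
                              store.create_iri(&#039;http://www.w3.org/2000/01/rdf-schema#label&#039;),
                              store.create_plain_literal(&#039;Puelia&#039;, &#039;en&#039;)))
store.add_literal_triple(&#039;_:puelia&#039;, &#039;http://www.w3.org/2000/01/rdf-schema#label&#039;, &#039;Puelia&#039;, &#039;en&#039;)
puts store.triples.length
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both calls describe the same triple, so the store ends up holding just one; the wrapper only changes how much typing it takes.&lt;/p&gt;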

&lt;p&gt;W3C&amp;#8217;s standardisation is what we make it; wherever it is done, it is a self-fulfilling prophecy that an API will not be suited to its purpose if the people who would benefit from implementing and using that API don&amp;#8217;t get involved in its design. And to be clear, I am talking to myself more than anyone.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/150#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sat, 04 Dec 2010 08:05:43 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">150 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Priorities for RDF</title>
 <link>http://www.jenitennison.com/blog/node/149</link>
 <description>&lt;p&gt;A couple of weeks ago I did a talk at the &lt;a href=&quot;http://www.w3.org/2010/11/TPAC/PlenaryAgenda&quot;&gt;TPAC Plenary Day&lt;/a&gt; about why RDF hasn&amp;#8217;t had the uptake that it might and what could be done about it.&lt;/p&gt;

&lt;p&gt;I felt quite uncomfortable about doing this for many reasons. The predominant one is that I&amp;#8217;m well aware that the world is made by the people who turn up. It is far, far easier to snipe from the sidelines than it is to put in the effort to attend telcons and face-to-face meetings, to engage on mailing lists, to write specifications and implementations and tutorials.&lt;/p&gt;

&lt;p&gt;On the other hand, what I hope is that the perspective of someone who is outside that process, someone who tries to understand and interpret and &lt;em&gt;use&lt;/em&gt; the results of that process, might be valuable. And so I aimed to provide that honestly.&lt;/p&gt;

&lt;p&gt;In that spirit, I&amp;#8217;m going to put my stake in the ground and say that there are three areas where I think W3C should be concentrating its efforts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;standardising (something like) TriG &amp;#8212; Turtle plus named graphs&lt;/li&gt;
&lt;li&gt;standardising an API for the RDF data model&lt;/li&gt;
&lt;li&gt;standardising a path language for RDF that can be used by that API and others for easy access&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;and that it should specifically &lt;em&gt;not&lt;/em&gt; put its efforts into standardising another syntax for RDF based on JSON.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Format Standardisation&lt;/h2&gt;

&lt;p&gt;The first point is that I think we need to decide on a single recommended format for RDF. &lt;/p&gt;

&lt;p&gt;Fundamentally, unlike XML or JSON, RDF is defined first and foremost as a model rather than as a syntax. That means it can be expressed in a number of syntaxes, the most common of which are &lt;a href=&quot;http://www.w3.org/TR/REC-rdf-syntax/&quot;&gt;RDF/XML&lt;/a&gt;, &lt;a href=&quot;http://www.w3.org/TeamSubmission/turtle/&quot;&gt;Turtle&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/TR/rdf-testcases/#ntriples&quot;&gt;N-Triples&lt;/a&gt; though of course there&amp;#8217;s also &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot;&gt;RDFa&lt;/a&gt;, &lt;a href=&quot;http://n2.talis.com/wiki/RDF_JSON_Specification&quot;&gt;RDF/JSON&lt;/a&gt;, &lt;a href=&quot;http://json-ld.org/&quot;&gt;JSON-LD&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/DesignIssues/Notation3&quot;&gt;N3&lt;/a&gt; and if you start factoring in named graphs you can add &lt;a href=&quot;http://www.hpl.hp.com/techreports/2004/HPL-2004-56.html&quot;&gt;TriX&lt;/a&gt;, &lt;a href=&quot;http://www4.wiwiss.fu-berlin.de/bizer/TriG/&quot;&gt;TriG&lt;/a&gt; and &lt;a href=&quot;http://sw.deri.org/2008/07/n-quads/&quot;&gt;N-Quads&lt;/a&gt; to the list.&lt;/p&gt;

&lt;p&gt;Except for a few corner cases, it would be perfectly possible to express the same RDF model in any of these syntaxes. Why is this so bad? Surely having choice is a good thing, because publishers can choose an option that fits with their workflows? And aren&amp;#8217;t all these formats generated automatically anyway, such that the same data can be provided in many ways with no overhead?&lt;/p&gt;

&lt;p&gt;Well, no, there are actually two ways in which &lt;strong&gt;having multiple syntaxes makes adoption harder&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;publishers aren&amp;#8217;t always generating data automatically&lt;/strong&gt;; in a number of cases (which I think and hope will grow) RDF data is being generated just like CSV files are, as static documents which are simply published in the same way as other static documents. In these cases, publishers either have to do the research and make a decision about which format to use, or produce the data in multiple formats. This is a particular challenge when people aren&amp;#8217;t convinced they want to generate RDF anyway.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;toolsets have to handle producing or consuming multiple formats&lt;/strong&gt;. That means more code, more testing and more maintenance on both the production and consumption sides of the equation, all of which raise the implementation burden.&lt;/p&gt;

&lt;p&gt;Of course it&amp;#8217;s natural that, during the early stages of a technology&amp;#8217;s adoption, we should see a variety of patterns of use: there need to be innovations and experiments so that we can find what works and what doesn&amp;#8217;t. But as that technology matures, we need to start bedding down some basics. There need to be agreed foundations that, &lt;em&gt;even if imperfect&lt;/em&gt;, are solid enough for the majority of us to build upon. And we need to exercise some self-restraint to concentrate on doing that building rather than revisiting those decisions.&lt;/p&gt;

&lt;p&gt;We have a number of years of experience now about what formats are easy to understand, to pass around, to create and to process. It is time, I think, to pick one, to get it standardised, to deprecate others and to provide a much cleaner and clearer picture to publishers and consumers.&lt;/p&gt;

&lt;p&gt;Of the formats that we have, the one that fits best with the RDF data model and is simplest for humans to understand is Turtle. But it needs to support named graphs, so that it&amp;#8217;s possible to share the &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-query/#rdfDataset&quot;&gt;RDF datasets&lt;/a&gt; that are exposed within a SPARQL endpoint, which is why I say W3C should standardise something like TriG.&lt;/p&gt;
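
&lt;p&gt;To show what named-graph support adds, here is a small sketch that writes a dataset, modelled as a hypothetical Ruby hash from graph names to lists of triples, in a TriG-like layout: each named graph&amp;#8217;s triples appear in braces after the graph name, and the default graph&amp;#8217;s triples appear in braces without one. Terms are assumed to be written already in N-Triples form, and no prefixes or other abbreviations are used.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Serialise a dataset to a TriG-like document. Keys are graph IRIs
# (nil for the default graph); values are triples whose terms are
# already in N-Triples form.
def to_trig(dataset)
  dataset.map do |graph, triples|
    body   = triples.map { |s, p, o| &quot;  #{s} #{p} #{o} .&quot; }.join(&quot;\n&quot;)
    header = graph ? &quot;&amp;lt;#{graph}&amp;gt; &quot; : &#039;&#039;
    &quot;#{header}{\n#{body}\n}&quot;
  end.join(&quot;\n\n&quot;)
end

# Hypothetical dataset: one default-graph triple and one named graph.
dataset = {
  nil =&amp;gt; [[&#039;&amp;lt;http://example.org/dataset&amp;gt;&#039;, &#039;&amp;lt;http://purl.org/dc/terms/title&amp;gt;&#039;, &#039;&quot;Example&quot;&#039;]],
  &#039;http://example.org/graph/people&#039; =&amp;gt; [
    [&#039;&amp;lt;http://example.org/id/person/1&amp;gt;&#039;, &#039;&amp;lt;http://xmlns.com/foaf/0.1/name&amp;gt;&#039;, &#039;&quot;Jo Bloggs&quot;&#039;]
  ]
}

puts to_trig(dataset)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point is that the graph name travels with its triples, which plain Turtle has no way to express.&lt;/p&gt;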

&lt;h2&gt;RDF APIs&lt;/h2&gt;

&lt;p&gt;The second point is to work on standardising the APIs that are available for developers who work with RDF. Why standardise APIs? Because it would make accessing RDF easier and more predictable for web developers, who often work across multiple languages and platforms. Developers don&amp;#8217;t really care about syntax &amp;#8212; although having something readable is useful for debugging &amp;#8212; they care about the way in which they get to interact with in-memory structures that hold the data.&lt;/p&gt;

&lt;p&gt;RDF needs an API that exposes its internal model (of literals and resources and triples and graphs and datasets) in a way that isn&amp;#8217;t too onerous for people to use. There are lots and lots of RDF APIs about, within the various parsers that are available for different platforms; the only one that&amp;#8217;s approaching a standard is the one embedded within the &lt;a href=&quot;http://www.w3.org/TR/rdfa-api/&quot;&gt;RDFa API specification&lt;/a&gt;. I would like to see that disentangled from RDFa and for it, or something like it, to gain traction amongst the writers of RDF libraries such as the &lt;a href=&quot;http://librdf.org/&quot;&gt;Redland RDF libraries&lt;/a&gt;, &lt;a href=&quot;http://www.rdflib.net/&quot;&gt;RDFLib&lt;/a&gt;, &lt;a href=&quot;http://code.google.com/p/moriarty/&quot;&gt;Moriarty&lt;/a&gt;, &lt;a href=&quot;https://github.com/tommorris/reddy&quot;&gt;Reddy&lt;/a&gt;, &lt;a href=&quot;http://code.google.com/p/rdfquery/&quot;&gt;rdfQuery&lt;/a&gt; and so on and on.&lt;/p&gt;

&lt;p&gt;But &lt;strong&gt;having an API for RDF&amp;#8217;s data model is not enough&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I think there is a lot that we can learn from XML&amp;#8217;s experience here. James Clark&amp;#8217;s recent blog post about &lt;a href=&quot;http://blog.jclark.com/2010/11/xml-vs-web_24.html&quot;&gt;XML and the web&lt;/a&gt; describes what it&amp;#8217;s like for developers working with XML compared to JSON:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The fundamental problem is the mismatch between programming language data structures and the XML element/attribute data model of elements. This leaves the developer with three choices, all unappetising:&lt;/p&gt;
  
  &lt;ul&gt;
  &lt;li&gt;live with an inconvenient element/attribute representation of the data;&lt;/li&gt;
  &lt;li&gt;descend into XML Schema hell in the company of your favourite data binding tool;&lt;/li&gt;
  &lt;li&gt;write reams of code to convert the XML into a convenient data structure.&lt;/li&gt;
  &lt;/ul&gt;
  
  &lt;p&gt;By contrast with JSON, especially with a dynamic programming language, you can get a reasonable in-memory representation just by calling a library function.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So JSON is popular because accessing information within the JSON is really easy. And that&amp;#8217;s for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;it&amp;#8217;s parsed with a single simple function call in a common library&lt;/li&gt;
&lt;li&gt;the result of parsing is simple to navigate; typically you can do so using native methods such as dot-notation paths&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first of these is a simple matter of winning hearts and minds. The second is the important one: it&amp;#8217;s easy to use because the underlying JSON model fits neatly onto the object-oriented programming paradigm that most developers use.&lt;/p&gt;
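
&lt;p&gt;In Ruby, for example, both properties show up in a couple of lines: a single call to the standard library&amp;#8217;s &lt;code&gt;JSON.parse&lt;/code&gt; yields nested hashes and arrays, and passing &lt;code&gt;:object_class =&amp;gt; OpenStruct&lt;/code&gt; even gives dot-notation access. The document being parsed is a made-up example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require &#039;json&#039;
require &#039;ostruct&#039;

text = &#039;{&quot;name&quot;: &quot;Example School&quot;, &quot;address&quot;: {&quot;town&quot;: &quot;Exampleton&quot;}, &quot;phases&quot;: [&quot;primary&quot;, &quot;secondary&quot;]}&#039;

# One call gives native hashes and arrays...
data = JSON.parse(text)
puts data[&#039;address&#039;][&#039;town&#039;]

# ...and, with :object_class, objects that support dot notation.
school = JSON.parse(text, :object_class =&amp;gt; OpenStruct)
puts school.address.town
puts school.phases.length
&lt;/code&gt;&lt;/pre&gt;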

&lt;p&gt;XML isn&amp;#8217;t so popular among web developers because its underlying model doesn&amp;#8217;t fit well into most programming languages: it has attributes and mixed content and a whole bunch of other things that don&amp;#8217;t map straightforwardly onto objects-with-properties. Navigating through XML (or HTML) structures using a DOM is tedious and automatic binding mostly doesn&amp;#8217;t work.&lt;/p&gt;

&lt;p&gt;What about RDF? On the face of it, RDF is a good fit with object-oriented models; they both follow a basic entity-attribute-value approach. However, there are (at least) three things in RDF that do not fit with the object-oriented model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Properties in RDF are identified through URIs rather than simple names (i.e. names containing only letters, digits and underscores). Some programming languages, such as Javascript, let you have properties that aren&amp;#8217;t simple names, but you then have to access them through the relatively clunky &lt;code&gt;[]&lt;/code&gt; notation rather than dot-notation paths. Properties are first-class objects in RDF with things like labels and ranges and inverses; fitting with standard programming languages here means using &lt;a href=&quot;http://en.wikipedia.org/wiki/Reflection_(computer_science)&quot;&gt;reflection&lt;/a&gt; and having the ability to annotate fields, and everything gets a bit mind-bending.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Values in RDF often have datatypes or languages associated with them, and the set of datatypes that you can use is completely extensible (and of course datatypes are first-class objects, with their own properties, too). This wouldn&amp;#8217;t be so bad except that making every value an object means comparisons with basic strings or numbers won&amp;#8217;t generally work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In RDF, a property can have more than one value (an unordered bag of values, if you like) or it can have a value that is a &lt;code&gt;rdf:List&lt;/code&gt; (an ordered sequence of values); it can even have many values which are &lt;code&gt;rdf:List&lt;/code&gt;s. On the other hand, the object-oriented model generally supports values that are arrays (and of course you can have arrays within arrays), which are always ordered. So there is always a choice to be made when mapping from an object-oriented model to RDF, about whether the values should be at the same level or be &lt;code&gt;rdf:List&lt;/code&gt;s.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
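
&lt;p&gt;All three mismatches appear even in a deliberately naive, hypothetical plain-Ruby representation of a resource (no RDF library involved): the resource is a hash keyed by property URI, each value is a literal object carrying an optional datatype or language, and every property maps to a list of values.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A literal carries its lexical value plus an optional datatype or language.
Literal = Struct.new(:value, :datatype, :language)

FOAF_NAME = &#039;http://xmlns.com/foaf/0.1/name&#039;
XSD_INT   = &#039;http://www.w3.org/2001/XMLSchema#integer&#039;
EX_AGE    = &#039;http://example.org/ns#age&#039;   # hypothetical property

person = {
  # 1. Properties are URIs, so there is no person.name; use hash-style access.
  # 3. A property may have several values, so each maps to an array
  #    (whose order means nothing here).
  FOAF_NAME =&amp;gt; [Literal.new(&#039;Jo Bloggs&#039;, nil, &#039;en&#039;), Literal.new(&#039;J. Bloggs&#039;, nil, nil)],
  EX_AGE    =&amp;gt; [Literal.new(&#039;42&#039;, XSD_INT, nil)]
}

name = person[FOAF_NAME].first
# 2. The value is an object, so comparing it with a plain string fails...
puts name == &#039;Jo Bloggs&#039;          # false
# ...and you have to unwrap it (and decide what to do with the language tag).
puts name.value == &#039;Jo Bloggs&#039;    # true

# Typed values need converting according to their datatype.
age = person[EX_AGE].first
puts age.datatype == XSD_INT ? Integer(age.value) + 1 : age.value
&lt;/code&gt;&lt;/pre&gt;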

&lt;p&gt;In other words, just as with XML, &lt;strong&gt;there is no straightforward mapping from RDF to an object structure that developers can immediately use&lt;/strong&gt;. That doesn&amp;#8217;t stop us trying, of course:&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s the approach that we use within the &lt;a href=&quot;http://code.google.com/p/linked-data-api/&quot;&gt;linked data API&lt;/a&gt; and elsewhere, which is to make it easy for data publishers to create simple JSON versions of the RDF they publish. A website-specific configuration file determines what the mapping looks like. URIs of properties get turned into readable names (and provide the map back to the original property URI, so that it&amp;#8217;s possible to get more information about the property). Datatypes and languages are ignored by default (but mapped onto structured values if so configured). And the distinction between properties-with-multiple-values and properties-whose-value-is-a-List is ignored. We purposefully lose some of RDF&amp;#8217;s expressivity and power in order to gain usability. You can see the result in action at &lt;a href=&quot;http://education.data.gov.uk/doc/school&quot;&gt;http://education.data.gov.uk/doc/school&lt;/a&gt; and &lt;a href=&quot;http://education.data.gov.uk/doc/department&quot;&gt;http://education.data.gov.uk/doc/department&lt;/a&gt; for example.&lt;/p&gt;
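
&lt;p&gt;The following is a rough, hypothetical sketch of that kind of publisher-side mapping; it is not the linked data API&amp;#8217;s actual configuration format or code. A configuration hash gives short names for property URIs (unconfigured properties keep their full URI), literals are reduced to their lexical value, and properties with several values become JSON arrays.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require &#039;json&#039;

Literal = Struct.new(:value, :datatype, :language)

# Hypothetical configuration: property URI =&amp;gt; short name.
SHORT_NAMES = {
  &#039;http://www.w3.org/2000/01/rdf-schema#label&#039; =&amp;gt; &#039;label&#039;,
  &#039;http://xmlns.com/foaf/0.1/homepage&#039;         =&amp;gt; &#039;homepage&#039;
}

# Map one resource (property URI =&amp;gt; array of values) to a simple hash.
def simplify(resource)
  resource.each_with_object({}) do |(property, values), out|
    key = SHORT_NAMES.fetch(property, property)
    # Datatypes and languages are dropped; only the lexical value is kept.
    plain = values.map { |v| v.is_a?(Literal) ? v.value : v }
    out[key] = plain.length == 1 ? plain.first : plain
  end
end

school = {
  &#039;http://www.w3.org/2000/01/rdf-schema#label&#039; =&amp;gt; [Literal.new(&#039;Example School&#039;, nil, &#039;en&#039;)],
  &#039;http://xmlns.com/foaf/0.1/homepage&#039;         =&amp;gt; [&#039;http://school.example.org/&#039;]
}

puts JSON.generate(simplify(school))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the &lt;code&gt;en&lt;/code&gt; language tag does not survive the mapping: that is exactly the trade of expressivity for usability described above.&lt;/p&gt;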

&lt;p&gt;There&amp;#8217;s the approach that Nathan has taken within &lt;a href=&quot;https://github.com/webr3/js3&quot;&gt;js3&lt;/a&gt; which is to create libraries that, with a bit of work on the client side, give a way of mapping RDF into an object-oriented structure which is easy to manipulate (or vice versa, create RDF from OO structures). It&amp;#8217;s the same basic principle as helping publishers to generate JSON, but the interpretation and mapping is done by the client rather than the publisher. The work that Nathan&amp;#8217;s done to manage this in Javascript is very impressive; I don&amp;#8217;t know whether the same approach can be mapped to other languages.&lt;/p&gt;

&lt;p&gt;But as James intimated, data binding is hairy and scary. &lt;strong&gt;Mappings between different data models are always imperfect, lossy, sensitive to what seem like small changes and therefore hard to maintain&lt;/strong&gt;. I remember nodding along as Mike Kay talked about this at the XML Summer School in relation to the use of XML: the horrors of working with systems in which there are three-way maps between relational, object-oriented and XML structures, and the relief that comes with working with an XML-only architecture. I suspect this same observation is one of the drivers behind the growth of JSON databases.&lt;/p&gt;

&lt;p&gt;On the other hand, you know, perhaps RDF is close enough to the object-oriented model that it won&amp;#8217;t be so bad. Perhaps we could find a way to standardise on a method of configuring applications that do the mapping, such as defining short names for properties, describing how to handle objects with datatypes and languages and so on. We have &lt;a href=&quot;http://code.google.com/p/linked-data-api/wiki/JSONFormats&quot;&gt;a body of experience&lt;/a&gt; that can be learnt from, including the ones above, and perhaps it can be tied into the &lt;a href=&quot;http://www.w3.org/TR/r2rml/&quot;&gt;RDB-to-RDF&lt;/a&gt; work too. The biggest challenge, I suspect, will be to create something round-trippable.&lt;/p&gt;

&lt;h2&gt;Path Languages&lt;/h2&gt;

&lt;p&gt;The other option that James didn&amp;#8217;t mention but that I touched on in my TPAC talk is to learn from how working with HTML and XML has been made easier in libraries such as &lt;a href=&quot;http://jquery.org/&quot;&gt;jQuery&lt;/a&gt; or &lt;a href=&quot;http://hpricot.com/&quot;&gt;hpricot&lt;/a&gt;. These libraries still allow the HTML and XML to be accessed through a DOM, rather than mapping HTML or XML into object structures, but &lt;strong&gt;make the lives of developers simpler by supporting querying of the HTML/XML using path languages that are &lt;em&gt;designed&lt;/em&gt; to be used to query those kinds of structures&lt;/strong&gt;. For HTML, that&amp;#8217;s CSS; for XML that&amp;#8217;s XPath. (It&amp;#8217;s the same approach as is used for strings: we use regular expressions for many operations rather than working with them at the character level.) Path languages work over the native model; all that&amp;#8217;s offered in the library are functions that take strings (holding the path language) and return objects or values as appropriate.&lt;/p&gt;

&lt;p&gt;I don&amp;#8217;t know exactly what it looks like, and it might already be out there (the world moves fast and I know I&amp;#8217;m not aware of everything), but what I think we need is a path language for navigating around RDF, probably based on &lt;a href=&quot;http://www.w3.org/TR/sparql11-query/#propertypaths&quot;&gt;SPARQL property paths&lt;/a&gt; or the &lt;a href=&quot;http://www.w3.org/2005/04/fresnel-info/fsl/&quot;&gt;FRESNEL selector language&lt;/a&gt; and an API (or APIs) that uses it. For example, something that lets developers use code like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;graph.find(&quot;*[foaf:nick = &#039;webr3&#039;]/foaf:name&quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to pick values out of an in-memory graph. In my opinion, something like this would be much more likely to bring benefits than a data binding approach.&lt;/p&gt;
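
&lt;p&gt;To give a flavour of how small the core of such an API could be, here is a minimal Python sketch of a hypothetical &lt;code&gt;find&lt;/code&gt; function. It supports only a tiny path syntax in the style of the example: a first step that selects subjects (&lt;code&gt;*&lt;/code&gt; for any), further &lt;code&gt;/&lt;/code&gt;-separated steps that follow properties, and an optional &lt;code&gt;[property = &#039;literal&#039;]&lt;/code&gt; test on any step. The syntax, the sample data and the function are invented for illustration; this is neither SPARQL property paths nor the FRESNEL selector language.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

# Sample data with invented nodes; literals and nodes are both plain strings here.
GRAPH = {
    ("_:alice", "foaf:nick", "ali"),
    ("_:alice", "foaf:name", "Alice Smith"),
    ("_:bob", "foaf:nick", "bobby"),
    ("_:bob", "foaf:name", "Bob Jones"),
    ("_:bob", "foaf:knows", "_:alice"),
}

# One step: "*" or a name, optionally followed by a [prop = 'literal'] test.
STEP = re.compile(r"(\*|[\w:]+)(?:\[([\w:]+) = '([^']*)'\])?")

def find(graph, path):
    """Evaluate a tiny, hypothetical path language over a set of triples.

    The first step selects subjects ("*" for any); each later step follows
    a property ("*" for any) to its values. Steps are split on "/", so
    literals in tests must not contain "/".
    """
    nodes = None  # None means no step has been evaluated yet
    for step in path.split("/"):
        match = STEP.fullmatch(step)
        if match is None:
            raise ValueError("unsupported path step: " + step)
        name, test_prop, test_value = match.groups()
        if nodes is None:
            candidates = {s for s, p, o in graph if name in ("*", s)}
        else:
            candidates = {o for s, p, o in graph if s in nodes and name in ("*", p)}
        # Keep only the nodes that pass the optional test.
        nodes = {n for n in candidates
                 if test_prop is None or (n, test_prop, test_value) in graph}
    return sorted(nodes)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this data, &lt;code&gt;find(GRAPH, &quot;*[foaf:nick = &#039;ali&#039;]/foaf:name&quot;)&lt;/code&gt; picks out Alice&amp;#8217;s name, and &lt;code&gt;find(GRAPH, &quot;_:bob/foaf:knows/foaf:name&quot;)&lt;/code&gt; follows two properties from a known node. The developer writes a short path string against the RDF model itself rather than walking triples by hand.&lt;/p&gt;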

&lt;h2&gt;Why Not RDF in JSON?&lt;/h2&gt;

&lt;p&gt;What I&amp;#8217;ve tried to explain above is firstly that we already have too many syntaxes for RDF, and secondly that the main barrier to developers using RDF is the way in which they are forced to interact with that RDF once they have hold of it, not the syntax itself. The syntax that we use for RDF really doesn&amp;#8217;t matter, because developers interact with the in-memory dataset, not directly with the syntax.&lt;/p&gt;

&lt;p&gt;Nathan&amp;#8217;s recent post on &lt;a href=&quot;http://webr3.org/blog/linked-data/opening-linked-data/&quot;&gt;Opening Linked Data&lt;/a&gt;, which is worth reading in its entirety, captures the essence of the issue:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;You can&amp;#8217;t shoe horn RDF in to JSON, no matter how hard you try - well, you can, but you loose all the benefits of JSON in the first place, because the data is RDF, triples and not objects, rdf nodes and not simple values&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words, &lt;strong&gt;using JSON as the basis for an RDF syntax doesn&amp;#8217;t actually win you anything in terms of the ease of processing of that RDF&lt;/strong&gt;. In fact, I&amp;#8217;ll go further and say it has exactly the same bad qualities as RDF/XML.&lt;/p&gt;

&lt;p&gt;One of the bad things about RDF/XML is that it misleads people into thinking they can use normal XML tooling to process RDF, but XML tooling exposes the XML tree, not the RDF graph that they need. It&amp;#8217;s good enough in some circumstances, of course, but it&amp;#8217;s not working with RDF as RDF. Similarly, just because you&amp;#8217;re using XML tools doesn&amp;#8217;t mean RDF/XML is easy to generate; you&amp;#8217;re much more likely to produce correct RDF/XML by serialising an in-memory graph, just as generating XML through string manipulation is harder work than it first appears.&lt;/p&gt;

&lt;p&gt;In exactly the same way, I think that a JSON-based syntax for RDF will mislead developers into thinking that they can interpret and generate that JSON like they can normal JSON, and interact with it at that level, when this simply isn&amp;#8217;t the case.&lt;/p&gt;
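
&lt;p&gt;A small Python sketch shows the problem. It uses two invented JSON encodings of the same two triples: one keyed by subject and then property, one a flat array of triples. Code written against the shape of one encoding fails on the other even though the RDF is identical. Neither encoding says whether a string is a literal or a URI, either, so a real format would need still more structure.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

# Two hypothetical JSON encodings of the same RDF (both formats are invented).
KEYED = json.loads("""
{"http://example.org/doc": {
    "http://purl.org/dc/terms/title": ["Example"],
    "http://purl.org/dc/terms/creator": ["http://example.org/jeni"]}}
""")
FLAT = json.loads("""
[["http://example.org/doc", "http://purl.org/dc/terms/title", "Example"],
 ["http://example.org/doc", "http://purl.org/dc/terms/creator", "http://example.org/jeni"]]
""")

def triples_from_keyed(doc):
    """Read the subject-keyed encoding back into a set of triples."""
    return {(s, p, o) for s, props in doc.items()
            for p, objs in props.items() for o in objs}

def triples_from_flat(doc):
    """Read the flat array-of-triples encoding back into a set of triples."""
    return {tuple(t) for t in doc}

# Plain JSON access works against one shape only...
title = KEYED["http://example.org/doc"]["http://purl.org/dc/terms/title"][0]
# ...(FLAT["http://example.org/doc"] would raise a TypeError), whereas both
# encodings give the same graph once they are read as triples.
same_graph = triples_from_keyed(KEYED) == triples_from_flat(FLAT)
&lt;/code&gt;&lt;/pre&gt;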

&lt;p&gt;The only advantage that I can see for a JSON-based RDF syntax is equivalent to the only advantage of RDF/XML: it is easier to store for people who use JSON databases, just as RDF/XML is easier to store for people who use XML databases. I am not sure that benefit is worth the cost of an additional RDF syntax; isn&amp;#8217;t RDF better stored in a triplestore?&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;So to reiterate, as far as I&amp;#8217;m concerned, W3C and the RDF community should be concentrating on a syntax for RDF that doesn&amp;#8217;t come saddled with those kinds of assumptions, which I think is Turtle + graphs; something like TriG. They should be concentrating on developing a standard API for RDF access that has a chance of adoption among the developers of RDF libraries, and on working out what parts of SPARQL and FRESNEL could be used to create a path language that could be reused in several contexts, including within such an API. And these should be done in preference to an RDF syntax in JSON, which doesn&amp;#8217;t solve the core problems and in fact just adds another syntax to the mix.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/149#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/60">tpac2010</category>
 <pubDate>Sun, 28 Nov 2010 21:44:52 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">149 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Translating Existing Models to RDF</title>
 <link>http://www.jenitennison.com/blog/node/142</link>
 <description>&lt;p&gt;As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as in XML and a text-based form, that are supported by existing tool chains.&lt;/p&gt;

&lt;p&gt;In contrast, if there is existing RDF in that domain area, it&amp;#8217;s usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user rather than the professionals in the area.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;To give an example, the international statistics community uses &lt;a href=&quot;http://sdmx.org&quot;&gt;SDMX&lt;/a&gt; for representing and exchanging statistics (and a lot more besides; it&amp;#8217;s a huge standard). SDMX includes a well-thought-through model for statistical datasets and the observations within them, as well as standard concepts for things like gender, age, unit multipliers and so on. By comparison, &lt;a href=&quot;http://sw.joanneum.at/scovo/schema.html&quot;&gt;SCOVO&lt;/a&gt;, the main RDF model for representing statistics, barely scratches the surface.&lt;/p&gt;

&lt;p&gt;This isn&amp;#8217;t the only example: the &lt;a href=&quot;http://inspire.jrc.ec.europa.eu/&quot;&gt;INSPIRE Directive&lt;/a&gt; defines how geographic information must be made available. &lt;a href=&quot;http://www.gigateway.org.uk/metadata/standards.html&quot;&gt;GEMINI&lt;/a&gt; defines the kind of geospatial metadata that that community cares about. The &lt;a href=&quot;http://openprovenance.org/&quot;&gt;Open Provenance Model&lt;/a&gt; is the result of many contributors from multiple fields, and again has a number of serialisations.&lt;/p&gt;

&lt;p&gt;You could view this as a challenge: experts in their domains already have models and serialisations for the data that they care about; how can we persuade them to adopt an RDF model and serialisations instead?&lt;/p&gt;

&lt;p&gt;But that&amp;#8217;s totally the wrong question. Linked data doesn&amp;#8217;t, can&amp;#8217;t and won&amp;#8217;t replace existing ways of handling data. But it has got some interesting features that can bring great benefit to people who want to publish their data, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;web-scale addresses&lt;/strong&gt; &amp;#8212; being able to name and refer to things like individual observations in a statistical hypercube, a particular road junction, or the particular process that led to something being created&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;annotation&lt;/strong&gt; &amp;#8212; the ability to record metadata about everything that you can name, which is everything!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;distributed publication&lt;/strong&gt; &amp;#8212; enabling multiple publishers to control the publication of their data without having to upload it to a central location&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;links&lt;/strong&gt; &amp;#8212; the joining of information to other information, providing more context, supporting more queries and reducing the requirement for duplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is really about how to enable people to reap these benefits. Because HTTP-based addressing and typed linkage are usually hard to introduce into existing formats, the answer is generally to publish data using an RDF-based model alongside existing formats. This might be done by generating an RDF-based format (such as RDF/XML or Turtle) as an alternative to the standard XML or HTML, accessible via content negotiation, or by providing a &lt;a href=&quot;http://www.w3.org/TR/grddl/&quot;&gt;GRDDL&lt;/a&gt; transformation that maps an XML format into RDF/XML.&lt;/p&gt;
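
&lt;p&gt;Content negotiation is what lets one URI serve both the existing format and its RDF alternative. As a rough illustration, here is a minimal Python sketch of how a server might choose a format from an &lt;code&gt;Accept&lt;/code&gt; header. The list of available formats is an assumption, and the sketch ignores media type parameters other than &lt;code&gt;q&lt;/code&gt;, so it is not a complete implementation of HTTP negotiation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical set of formats a publisher offers for one resource.
AVAILABLE = ["text/html", "application/xml", "application/rdf+xml", "text/turtle"]

def parse_accept(header):
    """Map each media range in an Accept header to its q-value (default 1)."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        prefs[fields[0]] = q
    return prefs

def negotiate(header):
    """Pick the available format the client prefers, or None if none is acceptable."""
    prefs = parse_accept(header)

    def quality(media_type):
        # The most specific matching range wins: type/subtype, then type/*, then */*.
        major = media_type.split("/")[0]
        for media_range in (media_type, major + "/*", "*/*"):
            if media_range in prefs:
                return prefs[media_range]
        return 0.0

    best = max(AVAILABLE, key=quality)  # ties go to the earlier entry
    return best if quality(best) else None
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, a client sending &lt;code&gt;Accept: application/rdf+xml;q=0.9, text/turtle;q=0.5&lt;/code&gt; gets RDF/XML, while a browser-like &lt;code&gt;*/*&lt;/code&gt; gets the HTML listed first.&lt;/p&gt;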

&lt;p&gt;Either way, the underlying model needs to be mapped into RDF. We&amp;#8217;re furthest down this road with &lt;a href=&quot;http://groups.google.com/group/publishing-statistical-data&quot;&gt;statistical data&lt;/a&gt;. I wanted to explore here what it might look like for the Open Provenance Model, building on lessons learned from the statistical domain.&lt;/p&gt;

&lt;h2&gt;Open Provenance Model&lt;/h2&gt;

&lt;p&gt;The Open Provenance Model talks about three main &lt;strong&gt;nodes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;artifacts&lt;/strong&gt;, which are the things that are produced or used by processes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;processes&lt;/strong&gt;, which are actions that are performed using or producing artifacts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;agents&lt;/strong&gt;, which are the people or systems that perform actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and five kinds of &lt;strong&gt;edges&lt;/strong&gt; that can be defined between them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;process A &lt;strong&gt;used&lt;/strong&gt; artifact B&lt;/li&gt;
&lt;li&gt;artifact A &lt;strong&gt;was generated by&lt;/strong&gt; process B&lt;/li&gt;
&lt;li&gt;process A &lt;strong&gt;was controlled by&lt;/strong&gt; agent B&lt;/li&gt;
&lt;li&gt;process A &lt;strong&gt;was triggered by&lt;/strong&gt; process B&lt;/li&gt;
&lt;li&gt;artifact A &lt;strong&gt;was derived from&lt;/strong&gt; artifact B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then things start getting more complicated. OPM indicates that each artifact and agent plays a different &lt;strong&gt;role&lt;/strong&gt; when it is used by, generated by or controls a process. What&amp;#8217;s more, each artifact and agent might be involved in the process at different &lt;strong&gt;times&lt;/strong&gt; (though timing information is optional within OPM). And a given provenance graph may contain several &lt;strong&gt;accounts&lt;/strong&gt; of how artifacts, processes and agents fit together.&lt;/p&gt;

&lt;h2&gt;Existing Mapping to RDF&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://openprovenance.org/model/opm.owl&quot;&gt;OWL ontology for OPM&lt;/a&gt; is a very literal mapping of OPM into RDF. Each of the types of nodes is a separate class, and each of the types of edges is a separate class. Because edges are classes rather than properties, it introduces a lot of n-ary relationships. Take a really simple example of an XML file being transformed into HTML using XSLT. With the OPM ontology, the RDF would look something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation a opm:Process .
&amp;lt;doc.html&amp;gt; a opm:Artifact .
&amp;lt;doc.xml&amp;gt; a opm:Artifact .
&amp;lt;doc.xsl&amp;gt; a opm:Artifact .
_:processor a opm:Agent .
_:Jeni a opm:Agent .

_:sourceLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xml&amp;gt; ;
  opm:role xslt:source .

_:stylesheetLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; ;
  opm:role xslt:stylesheet .

_:resultLink a opm:WasGeneratedBy ;
  opm:effect &amp;lt;doc.html&amp;gt; ;
  opm:cause _:transformation ;
  opm:role xslt:result .

_:processorLink a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:processor ;
  opm:role xslt:processor .

_:userLink a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:Jeni ;
  opm:role xslt:user .

_:derivation a opm:WasDerivedFrom ;
  opm:effect &amp;lt;doc.html&amp;gt; ;
  opm:cause &amp;lt;doc.xml&amp;gt; .

xslt:source a opm:Role ;
  opm:value &quot;source&quot; .

xslt:stylesheet a opm:Role ;
  opm:value &quot;stylesheet&quot; .

xslt:result a opm:Role ;
  opm:value &quot;result&quot; .

xslt:processor a opm:Role ;
  opm:value &quot;processor&quot; .

xslt:user a opm:Role ;
  opm:value &quot;user&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To give you an idea of what this mapping means, if I wanted to work out who created &lt;code&gt;doc.html&lt;/code&gt;, I would have to do a query like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ?who
WHERE {
  ?generatedBy
    opm:effect &amp;lt;doc.html&amp;gt; ;
    opm:role xslt:result ;
    opm:cause ?transformation .
  ?controlledBy
    opm:effect ?transformation ;
    opm:role xslt:user ;
    opm:cause ?who .
}
&lt;/code&gt;&lt;/pre&gt;
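
&lt;p&gt;To see how much work that query does, here is a minimal Python sketch that answers the same question over a subset of the triples above, held as string tuples (with &lt;code&gt;a&lt;/code&gt; written as &lt;code&gt;rdf:type&lt;/code&gt; and relative URIs as plain strings). The &lt;code&gt;edges&lt;/code&gt;, &lt;code&gt;value&lt;/code&gt; and &lt;code&gt;who_created&lt;/code&gt; helpers are hypothetical. Each n-ary edge is a node of its own, so the answer needs one join from the result edge to the process and another from the process to the user edge.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A subset of the n-ary OPM description above, as (subject, predicate, object).
TRIPLES = {
    ("_:resultLink", "rdf:type", "opm:WasGeneratedBy"),
    ("_:resultLink", "opm:effect", "doc.html"),
    ("_:resultLink", "opm:cause", "_:transformation"),
    ("_:resultLink", "opm:role", "xslt:result"),
    ("_:processorLink", "rdf:type", "opm:WasControlledBy"),
    ("_:processorLink", "opm:effect", "_:transformation"),
    ("_:processorLink", "opm:cause", "_:processor"),
    ("_:processorLink", "opm:role", "xslt:processor"),
    ("_:userLink", "rdf:type", "opm:WasControlledBy"),
    ("_:userLink", "opm:effect", "_:transformation"),
    ("_:userLink", "opm:cause", "_:Jeni"),
    ("_:userLink", "opm:role", "xslt:user"),
}

def edges(triples, edge_type, **props):
    """Edge nodes of the given type whose opm: properties have the given values."""
    return {
        edge
        for edge, p, o in triples
        if p == "rdf:type" and o == edge_type
        and all((edge, "opm:" + k, v) in triples for k, v in props.items())
    }

def value(triples, subject, prop):
    """Some value of a property; each edge here has exactly one."""
    return next(o for s, p, o in triples if s == subject and p == prop)

def who_created(triples, artifact):
    """Follow the result edge to the process, then the process to the user edge."""
    creators = set()
    for gen in edges(triples, "opm:WasGeneratedBy", effect=artifact, role="xslt:result"):
        process = value(triples, gen, "opm:cause")
        for ctl in edges(triples, "opm:WasControlledBy", effect=process, role="xslt:user"):
            creators.add(value(triples, ctl, "opm:cause"))
    return sorted(creators)
&lt;/code&gt;&lt;/pre&gt;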

&lt;h2&gt;Some Observations&lt;/h2&gt;

&lt;p&gt;There are two things that I want to pull out about the RDF mapping described above.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it&amp;#8217;s incredibly literal; every entity type within the model is mapped onto an RDF class, including the edges, the roles and the accounts (which I didn&amp;#8217;t show above)&lt;/li&gt;
&lt;li&gt;it doesn&amp;#8217;t reuse any existing vocabularies, even when they might help (such as for the &amp;#8216;value&amp;#8217; of a role, which is really a label)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reminds me of mappings of object-oriented or relational data models into each other or into XML, which often result in a god-awful mess and people swearing that technology X is goddamned ugly.&lt;/p&gt;

&lt;p&gt;The fact is that elegant uses of each modelling paradigm &amp;#8212; ones that are easy to understand and efficient to query &amp;#8212; always take advantage of the unique features of that paradigm. For example, good XML vocabularies take advantage of the distinctions between attributes and elements, of nesting and hierarchies, and of the ability to hold mixed content.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s the same with RDF. There are four features of RDF that I think good vocabularies will take suitable advantage of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;existing vocabularies&lt;/li&gt;
&lt;li&gt;inheritance&lt;/li&gt;
&lt;li&gt;shortcuts and reasoning&lt;/li&gt;
&lt;li&gt;named graphs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reusing existing vocabularies&lt;/strong&gt; takes advantage of the ease of bringing together diverse domains within RDF, and it makes data more reusable. For example, an OPM mapping that encourages the reuse of FOAF for people and organisations saves the developers of the OPM RDF vocabulary the time and effort they would otherwise have spent modelling the details of agents, and it means that any agents that are described within the description of a piece of provenance are automatically available as agents in the wider FOAF cloud. The same goes for using DOAP to describe software.&lt;/p&gt;

&lt;p&gt;By reusing vocabularies, the data isn&amp;#8217;t isolated any more, locked within a single context designed for a single use. This is a huge benefit of the linked data approach and it makes sense to leverage it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using inheritance&lt;/strong&gt; means creating general purpose classes and properties and encouraging other people to use &lt;code&gt;rdfs:subClassOf&lt;/code&gt; or &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; to specialise them according to their own requirements. Within OPM, the different roles that artifacts and agents might play in a process are a natural fit with either sub-properties or sub-classes, depending on how the edges in the model are represented. For example, rather than&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; ;
  opm:role xslt:stylesheet .

xslt:stylesheet a opm:Role ;
  opm:value &quot;stylesheet&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;you could generate data that looked like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Stylesheet ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;xslt:Stylesheet&lt;/code&gt; is defined as a subclass of &lt;code&gt;opm:Used&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Inheritance is a basic form of &lt;strong&gt;reasoning&lt;/strong&gt;. In the case of the subclass relationship outlined above, the reasoning is that anything that is a &lt;code&gt;xslt:Stylesheet&lt;/code&gt; is also a &lt;code&gt;opm:Used&lt;/code&gt;, and thus:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Stylesheet .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;implies&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Used .
&lt;/code&gt;&lt;/pre&gt;
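
&lt;p&gt;Applying that kind of rule mechanically is simple. The following minimal Python sketch implements only the RDFS subclass rule, over triples held as string tuples with &lt;code&gt;a&lt;/code&gt; written as &lt;code&gt;rdf:type&lt;/code&gt;, and repeats until nothing new is derived so that chains of subclasses are followed. The &lt;code&gt;infer_types&lt;/code&gt; function and the &lt;code&gt;eg:Edge&lt;/code&gt; superclass are invented for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def infer_types(triples):
    """Close a set of triples under the RDFS rule:
    x rdf:type C and C rdfs:subClassOf D imply x rdf:type D."""
    triples = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for x, p, c in triples if p == "rdf:type"
            for c2, q, d in triples if q == "rdfs:subClassOf" and c2 == c
        }
        if new.issubset(triples):
            return triples  # nothing new: a fixed point has been reached
        triples |= new

TRIPLES = {
    ("_:stylesheetLink", "rdf:type", "xslt:Stylesheet"),
    ("xslt:Stylesheet", "rdfs:subClassOf", "opm:Used"),
    ("opm:Used", "rdfs:subClassOf", "eg:Edge"),  # invented superclass
}
&lt;/code&gt;&lt;/pre&gt;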

&lt;p&gt;Taking the scenario where you&amp;#8217;re doing native linked data publishing &amp;#8212; storing data in a triplestore and then publishing it out from there &amp;#8212; you have two choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can store just the basic data, and let the applications retrieving it carry out whatever reasoning is necessary to derive the information they need; this limits the size of the triplestore, but can place a large burden on people using it &amp;#8212; either they have to be very familiar with the exact choices made in modelling the basic data, or they have to construct complex SPARQL queries that take account of the fact that the data might be modelled in many different ways&lt;/li&gt;
&lt;li&gt;you can store not only the basic data but also anything that can be derived from it; this increases the number of triples you have to store, but means that people can query it without having to perform any reasoning themselves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter is obviously the more user-friendly approach. (And a triplestore could make it easy by understanding and applying schemas, ontologies and rules as data is loaded in.)&lt;/p&gt;
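
&lt;p&gt;As a rough illustration of what such a triplestore might do, here is a minimal Python sketch of a store that applies &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; as data is loaded, so that later lookups need no reasoning. The &lt;code&gt;MaterialisingStore&lt;/code&gt; class is hypothetical, and recomputing the closure on every load is the simplest possible strategy rather than an efficient one.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class MaterialisingStore:
    """Keeps loaded triples plus everything derivable via rdfs:subPropertyOf."""

    def __init__(self):
        self.triples = set()

    def load(self, triples):
        self.triples |= set(triples)
        # Rule: s p o and p rdfs:subPropertyOf q imply s q o; repeat to a fixed point.
        while True:
            new = {
                (s, q, o)
                for s, p, o in self.triples
                for p2, r, q in self.triples
                if r == "rdfs:subPropertyOf" and p2 == p
            }
            if new.issubset(self.triples):
                break
            self.triples |= new

    def objects(self, subject, prop):
        """A plain lookup: no reasoning happens at query time."""
        return sorted(o for s, p, o in self.triples if s == subject and p == prop)


store = MaterialisingStore()
store.load([("xslt:user", "rdfs:subPropertyOf", "opm:controlledBy")])
store.load([("_:transformation", "xslt:user", "_:Jeni")])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the inferred &lt;code&gt;opm:controlledBy&lt;/code&gt; triple is stored at load time, a client asking for &lt;code&gt;opm:controlledBy&lt;/code&gt; finds &lt;code&gt;_:Jeni&lt;/code&gt; without knowing anything about &lt;code&gt;xslt:user&lt;/code&gt;; the cost is the extra triples held in the store.&lt;/p&gt;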

&lt;p&gt;To take a more complex example, provenance could be modelled in a much more direct way, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;doc.html&amp;gt; a opm:Artifact ;
  opm:derivedFrom &amp;lt;doc.xml&amp;gt; ;
  opm:generatedBy [
    xslt:source &amp;lt;doc.xml&amp;gt; ;
    xslt:stylesheet &amp;lt;doc.xsl&amp;gt; ;
    xslt:processor _:processor ;
    xslt:user _:Jeni ;
  ] .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;xslt:source&lt;/code&gt; and &lt;code&gt;xslt:stylesheet&lt;/code&gt; are sub-properties of a property called &lt;code&gt;opm:used&lt;/code&gt;, and &lt;code&gt;xslt:processor&lt;/code&gt; and &lt;code&gt;xslt:user&lt;/code&gt; are sub-properties of &lt;code&gt;opm:controlledBy&lt;/code&gt;. This removes the n-ary relationships, which (given the use of inheritance to represent roles) are only actually needed if the model needs to capture the timing of the involvement of particular artifacts or agents within a process, and makes the provenance information much easier to query than before:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ?who
WHERE {
  &amp;lt;doc.html&amp;gt; opm:generatedBy ?transformation .
  ?transformation xslt:user ?who .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But what if we also want to support the more complex, n-ary-relation-based models? We would need to assert, somehow, a rule that said that the presence of a &lt;code&gt;opm:controlledBy&lt;/code&gt; relationship from a process to an agent was equivalent to having a &lt;code&gt;opm:WasControlledBy&lt;/code&gt; instance with a &lt;code&gt;opm:cause&lt;/code&gt; pointing to the agent and an &lt;code&gt;opm:effect&lt;/code&gt; pointing to the process. Combine this with &lt;code&gt;xslt:user&lt;/code&gt; being a sub-property of &lt;code&gt;opm:controlledBy&lt;/code&gt; and you have the statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation xslt:user _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;implying:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation opm:controlledBy _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which in turn implies:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[] a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same reasoning could be applied in the opposite direction, of course. Part of the definition of the use of OPM in RDF could be that the presence of a &lt;code&gt;opm:WasControlledBy&lt;/code&gt; with a &lt;code&gt;opm:cause&lt;/code&gt; pointing to an agent and &lt;code&gt;opm:effect&lt;/code&gt; pointing to a process implies a &lt;code&gt;opm:controlledBy&lt;/code&gt; link between the &lt;code&gt;opm:effect&lt;/code&gt; and the &lt;code&gt;opm:cause&lt;/code&gt;. Whichever was used in the initial modelling of the data, the same query could be used to query the data (accepting some loss of precision along the way, but if you&amp;#8217;re not interested in timing information then why should you suffer the cost of querying through n-ary relations?).&lt;/p&gt;
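
&lt;p&gt;Here is a minimal Python sketch of both rules, assuming triples held as string tuples. The function names are hypothetical, and the forward rule mints blank node labels of the form &lt;code&gt;_:edgeN&lt;/code&gt;, assuming nothing else in the data uses them. Going from the n-ary form to direct links drops the role and any timing information, which is the loss of precision mentioned above; going the other way has to invent a node for each edge.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def nary_pairs(triples):
    """(process, agent) pairs stated by opm:WasControlledBy edge nodes."""
    edges = {s for s, p, o in triples if p == "rdf:type" and o == "opm:WasControlledBy"}
    return {
        (effect, cause)
        for edge in edges
        for s1, p1, effect in triples if s1 == edge and p1 == "opm:effect"
        for s2, p2, cause in triples if s2 == edge and p2 == "opm:cause"
    }

def from_nary(triples):
    """Add a direct opm:controlledBy link for every opm:WasControlledBy edge."""
    return set(triples) | {(e, "opm:controlledBy", c) for e, c in nary_pairs(triples)}

def to_nary(triples):
    """Add an opm:WasControlledBy edge node for every direct opm:controlledBy link."""
    out = set(triples)
    existing = nary_pairs(triples)
    links = sorted((s, o) for s, p, o in triples if p == "opm:controlledBy")
    n = 0
    for process, agent in links:
        if (process, agent) in existing:
            continue  # an equivalent n-ary edge is already there
        n += 1
        edge = "_:edge" + str(n)  # assumes this label is not already in use
        out |= {
            (edge, "rdf:type", "opm:WasControlledBy"),
            (edge, "opm:effect", process),
            (edge, "opm:cause", agent),
        }
    return out
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running &lt;code&gt;from_nary&lt;/code&gt; and then &lt;code&gt;to_nary&lt;/code&gt; over a mixed dataset gives every process and controlling agent pair in both forms, so a query written against either form finds the same pairs.&lt;/p&gt;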

&lt;p&gt;The final thing that I mentioned above that mappings from existing models to RDF should take advantage of is &lt;strong&gt;named graphs&lt;/strong&gt;. In OPM, the obvious way that named graphs could play a role is in providing support for the different &lt;em&gt;accounts&lt;/em&gt; of provenance. Separate named graphs could be used to represent separate accounts, referencing the same artifacts, agents and processes where appropriate. Individually, the graphs can remain simple; together, you have the full power of OPM.&lt;/p&gt;
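
&lt;p&gt;A minimal Python sketch of that idea follows, assuming quads of (subject, predicate, object, graph) and two invented account names. Each account remains a simple graph on its own, a query can look at one account or at the union of all of them, and the same nodes can appear in several accounts. The helper names are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Quads: (subject, predicate, object, graph). The account names are invented.
QUADS = {
    ("doc.html", "opm:generatedBy", "_:transformation", "eg:briefAccount"),
    ("doc.html", "opm:generatedBy", "_:transformation", "eg:detailedAccount"),
    ("_:transformation", "xslt:user", "_:Jeni", "eg:detailedAccount"),
    ("_:transformation", "xslt:stylesheet", "doc.xsl", "eg:detailedAccount"),
}

def account(quads, *names):
    """Triples asserted in the named accounts, or in every account if none are named."""
    return {(s, p, o) for s, p, o, g in quads if not names or g in names}

def accounts_asserting(quads, triple):
    """The accounts in which a particular triple appears."""
    return sorted(g for s, p, o, g in quads if (s, p, o) == triple)
&lt;/code&gt;&lt;/pre&gt;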

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Modelling is a complex design activity, and you&amp;#8217;re best off avoiding doing it if you can. That means reusing conceptual models that have been built up for a domain as much as possible and reusing existing vocabularies wherever you can. But you can&amp;#8217;t and shouldn&amp;#8217;t try to avoid doing design when mapping from a conceptual model to a particular modelling paradigm such as a relational, object-oriented, XML or RDF model.&lt;/p&gt;

&lt;p&gt;If you&amp;#8217;re mapping to RDF, remember to take advantage of what it&amp;#8217;s good at such as web-scale addressing and extensibility, and always bear in mind how easy or difficult your data will be to query. There is no point publishing linked data if it is unusable.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/142#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/57">modelling</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/58">provenance</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sat, 13 Mar 2010 20:35:46 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">142 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Creating Linked Data - Part V: Finishing Touches</title>
 <link>http://www.jenitennison.com/blog/node/139</link>
 <description>&lt;p&gt;This is the fifth part in this series about creating linked data. I&amp;#8217;ve talked previously about &lt;a href=&quot;http://www.jenitennison.com/blog/node/135&quot;&gt;analysis and modelling&lt;/a&gt;, &lt;a href=&quot;http://www.jenitennison.com/blog/node/136&quot;&gt;defining URIs&lt;/a&gt;, &lt;a href=&quot;http://www.jenitennison.com/blog/node/137&quot;&gt;defining concept schemes&lt;/a&gt; and &lt;a href=&quot;http://www.jenitennison.com/blog/node/138&quot;&gt;defining a vocabulary&lt;/a&gt;. In this instalment I&amp;#8217;ll talk about the finishing touches that can make linked data easier to browse, query, locate and trust.&lt;/p&gt;

&lt;p&gt;Note that we don&amp;#8217;t &lt;em&gt;have&lt;/em&gt; to do any of these things; they&amp;#8217;re not part of the core data. We shouldn&amp;#8217;t beat ourselves up if we don&amp;#8217;t have time to do them right now, because we can always add them later, and it might be that you just don&amp;#8217;t agree that they should be done. But many of them don&amp;#8217;t take a lot of time and can enhance the user&amp;#8217;s experience of the data.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Labels&lt;/h2&gt;

&lt;p&gt;Every resource should have a label, even blank nodes. Adding labels makes it easier for people to generate HTML views from the data. Sometimes we have resources that have an obvious label (like the name of a local authority); at other times, the label needs to be constructed based on the other information that&amp;#8217;s available about the resource.&lt;/p&gt;

&lt;p&gt;I talked in the last instalment about &lt;code&gt;skos:prefLabel&lt;/code&gt; (preferred label), &lt;code&gt;skos:altLabel&lt;/code&gt; (alternative label) and &lt;code&gt;rdfs:label&lt;/code&gt;. Technically, &lt;code&gt;skos:prefLabel&lt;/code&gt; and &lt;code&gt;skos:altLabel&lt;/code&gt; are sub-properties of &lt;code&gt;rdfs:label&lt;/code&gt;, which means that if a resource has a &lt;code&gt;skos:prefLabel&lt;/code&gt; it also has a &lt;code&gt;rdfs:label&lt;/code&gt; with that value. However, drawing that conclusion requires either built-in knowledge of SKOS or the ability both to fetch the SKOS ontology automatically and to reason with it, which is feasible (this is one of the advantages of RDF, after all) but adds an extra hurdle for people wanting to use your data.&lt;/p&gt;

&lt;p&gt;So it&amp;#8217;s best to give everything a &lt;code&gt;rdfs:label&lt;/code&gt;, even things that already have a &lt;code&gt;skos:prefLabel&lt;/code&gt; or &lt;code&gt;skos:altLabel&lt;/code&gt;. It&amp;#8217;s also good to try to imagine that label in the context of having no other information about the thing that it&amp;#8217;s labelling, such as in the title of a page. For example, if you&amp;#8217;re looking at the observation &lt;code&gt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&lt;/code&gt; in the context of a traffic count, it may seem sensible to label it just &amp;#8220;bicycle&amp;#8221; (as I did in the first iteration of turning this traffic count data into RDF). But without that context, it makes no sense. Better to label it &amp;#8220;Bicycles - 8 Oct 2001 17:00-18:00 - East - Salterton Road, EAST OF DINAN WAY, EXMOUTH&amp;#8221; and provide an even more descriptive &lt;code&gt;rdfs:comment&lt;/code&gt; like &amp;#8220;Number of bicycles counted travelling East at Salterton Road, EAST OF DINAN WAY, EXMOUTH on 8 October 2001 between 17:00 and 18:00.&amp;#8221;&lt;/p&gt;
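
&lt;p&gt;Labels like these can be generated from the same information that goes into the observation&amp;#8217;s URI. Here is a minimal Python sketch; the lookup tables and function names are assumptions, and only the values used in the example above come from the actual data.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from datetime import datetime, timedelta

# Hypothetical lookup tables for the codes that appear in the observation URIs.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
VEHICLE_PLURALS = {"bicycle": "Bicycles"}

def observation_label(vehicle, start, direction, location):
    """A short rdfs:label that still makes sense with no other context."""
    end = start + timedelta(hours=1)
    when = f"{start.day} {start:%b %Y} {start:%H:%M}-{end:%H:%M}"
    return " - ".join([VEHICLE_PLURALS[vehicle], when, DIRECTIONS[direction], location])

def observation_comment(vehicle, start, direction, location):
    """A longer rdfs:comment describing the same observation."""
    end = start + timedelta(hours=1)
    return (f"Number of {VEHICLE_PLURALS[vehicle].lower()} counted travelling "
            f"{DIRECTIONS[direction]} at {location} on {start.day} {start:%B %Y} "
            f"between {start:%H:%M} and {end:%H:%M}.")

START = datetime(2001, 10, 8, 17, 0)
WHERE = "Salterton Road, EAST OF DINAN WAY, EXMOUTH"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Calling &lt;code&gt;observation_label(&quot;bicycle&quot;, START, &quot;E&quot;, WHERE)&lt;/code&gt; reproduces the label suggested above. The month names come from &lt;code&gt;strftime&lt;/code&gt;, so they follow the process locale; they are English under the default C locale.&lt;/p&gt;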

&lt;h2&gt;Datasets&lt;/h2&gt;

&lt;p&gt;There are two kinds of datasets that are applicable to this particular &amp;#8230;err&amp;#8230; set of data &amp;#8230; and that we should describe within the RDF. They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;datasets that are sets of statistical data items (such as the observations in the traffic count data); these are best described using &lt;a href=&quot;http://sw.joanneum.at/scovo/schema.html&quot;&gt;SCOVO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;datasets that are general descriptions of particular sets of linked data (such as roads or local authorities); these are best described using &lt;a href=&quot;http://semanticweb.org/wiki/VoiD&quot;&gt;voiD&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both kinds of datasets can be identified for UK government data using URIs in the form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://{sector}.data.gov.uk/set/{dataset}/
&lt;/code&gt;&lt;/pre&gt;
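
&lt;p&gt;Filling in that pattern is simple string formatting. In this minimal Python sketch the &lt;code&gt;dataset_uri&lt;/code&gt; function is hypothetical; the sector and dataset values in the example call are the ones used for the traffic count data in this post.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from urllib.parse import quote

def dataset_uri(sector, dataset):
    """Fill in http://{sector}.data.gov.uk/set/{dataset}/, keeping the trailing slash."""
    # "/" stays unescaped so that a dataset name can have several segments.
    return f"http://{sector}.data.gov.uk/set/{quote(dataset, safe='/')}/"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, &lt;code&gt;dataset_uri(&quot;transport&quot;, &quot;traffic-count/2001-2008&quot;)&lt;/code&gt; gives &lt;code&gt;http://transport.data.gov.uk/set/traffic-count/2001-2008/&lt;/code&gt;.&lt;/p&gt;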

&lt;h3&gt;SCOVO Datasets&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;scovo:Item&lt;/code&gt; should be part of a &lt;code&gt;scovo:Dataset&lt;/code&gt;, associated through the &lt;code&gt;scovo:dataset&lt;/code&gt; property (with &lt;code&gt;scovo:datasetOf&lt;/code&gt; as its reverse). A &lt;code&gt;scovo:Dataset&lt;/code&gt; is pretty simple: all you really need to do is give it an identifier and, of course, a label. In this case, something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/set/traffic-count/2001-2008/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is an identifier that the various &lt;code&gt;scovo:Item&lt;/code&gt;s should use to indicate where the data comes from:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  a scovo:Item ;
  scovo:dataset &amp;lt;http://transport.data.gov.uk/set/traffic-count/2001-2008/&amp;gt; ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  traffic:vehicleType &amp;lt;http://transport.data.gov.uk/def/vehicle/bicycle&amp;gt; ;
  rdf:value 2 .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&amp;#8217;s also an identifier that we can attach some metadata to. Obviously it needs a label, but we can also attach other metadata, such as the &lt;a href=&quot;http://www.jenitennison.com/blog/node/133&quot;&gt;provenance of the dataset&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/set/traffic-count/2001-2008/&amp;gt;
  a scovo:Dataset ;
  a prv:DataItem ;
  rdfs:label &quot;Traffic counts between 2001 and 2008&quot;@en ;
  prv:createdBy [
    a prv:DataCreation ;
    prv:performedAt ... ;
    prv:performedBy ... ;
    prv:usedData ... ;
    prv:usedGuideline ... ;
  ] .
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;VoiD Datasets&lt;/h3&gt;

&lt;p&gt;VoiD is designed to be used to describe sets of linked data, their contents, their provenance and their relationships with each other. There are many ways of dividing up the data that we&amp;#8217;ve been looking at into datasets. We can start with a simple example: the dataset containing linked data about countries:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://statistics.data.gov.uk/set/country/&amp;gt;
  a void:Dataset ;
  rdfs:label &quot;Countries&quot;@en ;
  foaf:homepage &amp;lt;http://statistics.data.gov.uk/set/country&amp;gt; ;
  dct:subject &amp;lt;http://dbpedia.org/resource/Country&amp;gt; ;
  cc:license [
    a cc:License ;
    rdfs:label &quot;data.gov.uk Licence&quot;@en ;
    foaf:homepage &amp;lt;http://data.hmg.gov.uk/terms-privacy&amp;gt; ;
    cc:permits cc:DerivativeWorks, cc:Distribution, cc:Reproduction ;
    cc:requires cc:Attribution ;
  ] ;
  void:exampleResource &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  void:sparqlEndpoint &amp;lt;http://services.data.gov.uk/statistics/sparql&amp;gt; ;
  void:uriRegexPattern &quot;http://statistics\\.data\\.gov\\.uk/id/country\\?name=.+&quot;^^xsd:string ;
  void:vocabulary &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; ;
  void:vocabulary &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This provides a link to a home page for the dataset, which should contain information about the dataset itself. (Accessing the URI for the dataset should also redirect users to this home page.) I&amp;#8217;ve used the same URI as the dataset URI but without the slash at the end. (This is probably too subtle a difference between URIs; we don&amp;#8217;t currently have official guidance for URIs for documents-about-datasets or documents-about-definitions.)&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;void:exampleResource&lt;/code&gt; property can be used to point to resources that can act as starting points for exploring the data, and the &lt;code&gt;void:sparqlEndpoint&lt;/code&gt; property points at a SPARQL endpoint that can be used for deeper querying. The &lt;code&gt;void:uriRegexPattern&lt;/code&gt; property provides a regular expression for the URIs that are used to identify the resources that the dataset is about. &lt;code&gt;void:vocabulary&lt;/code&gt; points to the vocabularies that the dataset uses.&lt;/p&gt;
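
&lt;p&gt;One way a consumer might use &lt;code&gt;void:uriRegexPattern&lt;/code&gt; is to work out which dataset a URI belongs to. The Python sketch below assumes that the whole URI has to match. A regular-expression search would also accept URIs that merely contain a match. The sketch also escapes metacharacters such as &lt;code&gt;?&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt; so that they match literally. An unescaped &lt;code&gt;?&lt;/code&gt; would instead make the preceding &lt;code&gt;y&lt;/code&gt; optional.&lt;/p&gt;

```python
import re

# Dataset URI -> regular expression for the URIs of the things it describes.
# Dots and "?" are escaped so that they match literally.
URI_PATTERNS = {
    "http://statistics.data.gov.uk/set/country/":
        r"http://statistics\.data\.gov\.uk/id/country\?name=.+",
    "http://transport.data.gov.uk/set/traffic-count/":
        r"http://transport\.data\.gov\.uk/id/traffic-count-point/[0-9]+"
        r"/direction/[NSEW]/hour/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:00:00",
}


def datasets_for(uri: str) -> list[str]:
    """Datasets whose pattern matches the whole of the given URI."""
    return [dataset for dataset, pattern in URI_PATTERNS.items()
            if re.fullmatch(pattern, uri)]


print(datasets_for("http://statistics.data.gov.uk/id/country?name=England"))
```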

&lt;p&gt;Various &lt;a href=&quot;http://dublincore.org/documents/dcmi-terms/&quot;&gt;Dublin Core&lt;/a&gt; properties can be used to provide metadata about the dataset, such as its subject matter. The &lt;a href=&quot;http://creativecommons.org/ns&quot;&gt;Creative Commons schema&lt;/a&gt; provides a way of indicating the licence that the dataset is made available under, which is essential information to enable reuse. (I&amp;#8217;ve derived some RDF about the licence here from the one &lt;a href=&quot;http://data.hmg.gov.uk/terms-privacy&quot;&gt;described on the data.hmg.gov.uk pages&lt;/a&gt;; there should be an official version some time soon.)&lt;/p&gt;

&lt;p&gt;The country data that we can produce from this traffic count dataset is a &lt;em&gt;subset&lt;/em&gt; of the dataset of all countries. We can indicate this through a &lt;code&gt;void:subset&lt;/code&gt; relationship:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://statistics.data.gov.uk/set/country/&amp;gt;
  ...
  void:subset [
    a void:Dataset ;
    a prv:DataItem ;
    rdfs:label &quot;Country data from the DfT traffic count dataset 2001-2008&quot;@en ;
    prv:createdBy [
      a prv:DataCreation ;
      prv:performedAt ... ;
      prv:performedBy ... ;
      prv:usedData ... ;
      prv:usedGuideline ... ;
    ] ;
  ] .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The other kind of subset that we should describe is the linkset: a dataset that contains links between two other datasets. The country dataset doesn&amp;#8217;t (currently) contain any links to other datasets, but the traffic count dataset does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/set/traffic-count/&amp;gt;
  a void:Dataset ;
  rdfs:label &quot;Traffic Counts&quot;@en ;
  foaf:homepage &amp;lt;http://transport.data.gov.uk/set/traffic-count&amp;gt; ;
  dct:subject &amp;lt;http://dbpedia.org/resource/Traffic&amp;gt; ;
  dct:subject &amp;lt;http://dbpedia.org/resource/Counting&amp;gt; ;
  cc:license [
    a cc:License ;
    rdfs:label &quot;data.gov.uk Licence&quot;@en ;
    foaf:homepage &amp;lt;http://data.hmg.gov.uk/terms-privacy&amp;gt; ;
    cc:permits cc:DerivativeWorks, cc:Distribution, cc:Reproduction ;
    cc:requires cc:Attribution ;
  ] ;
  void:exampleResource &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  void:uriRegexPattern &quot;http://transport\\.data\\.gov\\.uk/id/traffic-count-point/[0-9]+/direction/[NSEW]/hour/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:00:00&quot; ;
  void:sparqlEndpoint &amp;lt;http://services.data.gov.uk/transport/sparql&amp;gt; ;
  void:vocabulary &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; ;
  void:subset [
    a void:Dataset ;
    rdfs:label &quot;Traffic Counts from the DfT traffic count dataset 2001-2008&quot;@en ;
    prv:createdBy ...
  ] ;
  void:subset [
    a void:Linkset ;
    rdfs:label &quot;Traffic count / count point links&quot;@en ;
    rdfs:comment &quot;Links from a traffic count to the count point at which the count was taken.&quot;@en ;
    void:subjectsTarget &amp;lt;http://transport.data.gov.uk/set/traffic-count/&amp;gt; ;
    void:linkPredicate &amp;lt;http://transport.data.gov.uk/def/traffic/countPoint&amp;gt; ;
    void:objectsTarget &amp;lt;http://transport.data.gov.uk/set/traffic-count-point/&amp;gt; ;
  ] ;
  void:subset [
    a void:Linkset ;
    rdfs:label &quot;Traffic count / cardinal direction&quot;@en ;
    rdfs:comment &quot;Links from a traffic count to the direction in which the traffic was going.&quot;@en ;
    void:subjectsTarget &amp;lt;http://transport.data.gov.uk/set/traffic-count/&amp;gt; ;
    void:linkPredicate &amp;lt;http://transport.data.gov.uk/def/traffic/direction&amp;gt; ;
    void:objectsTarget &amp;lt;http://dbpedia.org/void/Dataset&amp;gt; ;
  ] ;
  void:subset [
    a void:Linkset ;
    rdfs:label &quot;Traffic count / hour&quot;@en ;
    rdfs:comment &quot;Links from a traffic count to the hour when the traffic was being monitored.&quot;@en ;
    void:subjectsTarget &amp;lt;http://transport.data.gov.uk/set/traffic-count/&amp;gt; ;
    void:linkPredicate &amp;lt;http://transport.data.gov.uk/def/traffic/countHour&amp;gt; ;
    void:objectsTarget [
      a void:Dataset ;
      rdfs:label &quot;URIs for places and times&quot;@en ;
      foaf:homepage &amp;lt;http://placetime.com/&amp;gt; ;
    ] ;
  ] .
&lt;/code&gt;&lt;/pre&gt;
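
&lt;p&gt;To make the idea of a linkset more concrete, here is a Python sketch that picks out the triples belonging to a linkset. These are the triples that use its &lt;code&gt;void:linkPredicate&lt;/code&gt;, whose subject belongs to the subjects target, and whose object belongs to the objects target. Deciding dataset membership with URI patterns is an assumption made for the example. The resulting count is the kind of figure that could be published with &lt;code&gt;void:triples&lt;/code&gt;.&lt;/p&gt;

```python
import re

TRAFFIC = "http://transport.data.gov.uk/def/traffic/"
COUNT = ("http://transport.data.gov.uk/id/traffic-count-point/13"
         "/direction/E/hour/2001-10-08T17:00:00")


# Assumed membership tests for the subjects and objects targets.
def is_count(uri: str) -> bool:
    return re.fullmatch(r"http://transport\.data\.gov\.uk/id/traffic-count-point"
                        r"/[0-9]+/direction/[NSEW]/hour/.+", uri) is not None


def is_count_point(uri: str) -> bool:
    return re.fullmatch(r"http://transport\.data\.gov\.uk/id/traffic-count-point"
                        r"/[0-9]+", uri) is not None


def is_dbpedia(uri: str) -> bool:
    return uri.startswith("http://dbpedia.org/resource/")


def linkset(triples, predicate, in_subjects, in_objects):
    """Triples using the link predicate, from the subjects target to the objects target."""
    return [(s, p, o) for s, p, o in triples
            if p == predicate and in_subjects(s) and in_objects(o)]


triples = [
    (COUNT, TRAFFIC + "countPoint", "http://transport.data.gov.uk/id/traffic-count-point/13"),
    (COUNT, TRAFFIC + "direction", "http://dbpedia.org/resource/East"),
    (COUNT, TRAFFIC + "countHour",
     "http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H"),
]

points = linkset(triples, TRAFFIC + "countPoint", is_count, is_count_point)
directions = linkset(triples, TRAFFIC + "direction", is_count, is_dbpedia)
print(f"void:triples {len(points)}")  # size of the count / count point linkset
```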

&lt;p&gt;&lt;code&gt;scovo:Dataset&lt;/code&gt;s are often subsets of &lt;code&gt;void:Dataset&lt;/code&gt;s. In the case of the traffic count data, the observations described by the &lt;code&gt;scovo:Dataset&lt;/code&gt; above are a subset of the &lt;code&gt;void:Dataset&lt;/code&gt; that is the set of &lt;em&gt;all&lt;/em&gt; such observations (including ones from other years).&lt;/p&gt;

&lt;h2&gt;Derivable Data&lt;/h2&gt;

&lt;p&gt;The discussion about &lt;code&gt;rdfs:label&lt;/code&gt; above touched on another set of information that should be included within the RDF data we produce: data that is automatically derivable from the data we provide. There are three main reasons for including derivable data within what we publish:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Given the current adoption of RDF-aware technologies, the consumers of our data are pretty unlikely to be able to (or to want to) use schemas, ontologies or rule sets to help them to reason over the data and draw conclusions. The consumers of this data &lt;em&gt;might&lt;/em&gt; include semantic search engines and people scraping the data into their own triplestores, but they&amp;#8217;re far more likely to be developers who really don&amp;#8217;t care about RDF at all. It would be a shame to publish the data and then have no one use it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Computing derivable data once saves overall effort. We calculate it once, centrally, so the people using the data don&amp;#8217;t have to spend processing time doing it themselves. (There&amp;#8217;s a classic time/space trade-off here, of course; the downside of including data that isn&amp;#8217;t strictly necessary is that the documents will end up larger.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If we provide information that people are likely to need within the document that they get when they request a given resource, they&amp;#8217;re less likely to need to resort to (harder to construct and more intensive to process) SPARQL queries to get what they need.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The overriding principle that we can use to help us decide what to include is to consider what we would like to see if we visited a page about the particular thing.&lt;/p&gt;

&lt;p&gt;How we manage to provide the derived data depends on how we publish the data. I&amp;#8217;m not talking here about how to do the publishing, but rather about what the consumers of the data should expect to see eventually. So, for example, if we publish the data as static files then we&amp;#8217;re going to have to include all this data in those files. If we generate the RDF dynamically, we just have to make sure that the generated RDF includes the derived data; we might be able to set up rules in a triplestore, or a transformation of the data that it naturally produces, to include the derivable data.&lt;/p&gt;

&lt;h3&gt;Superclasses and Super-properties&lt;/h3&gt;

&lt;p&gt;One set of derived data is the data inferred from the superclasses and super-properties defined in the RDF vocabularies we use. Basically, if a resource has a type that is a subclass of another type, then the resource should have that superclass as a type as well. Similarly, if a triple uses a property that has a super-property, then there ought also to be a triple that links the subject and object of the original triple through the super-property.&lt;/p&gt;

&lt;p&gt;To understand when it&amp;#8217;s important to include this kind of derived data, we need to be aware of the kind of applications that will use the data. Some applications will be targeting just this dataset about traffic counts, and will be written to use whatever properties and classes that we&amp;#8217;ve made available. Other applications will be targeted at specific vocabularies at a more general-purpose level. There might be applications that can be used to visualise SKOS hierarchies as a tree, for example, or applications that can plot any &lt;code&gt;geo:lat&lt;/code&gt;/&lt;code&gt;geo:long&lt;/code&gt; coordinates on a map, or any OWL-Time intervals and instants on a timeline. Still other applications, such as viewers like Tabulator, will be used with any old RDF. We need to provide enough information to make the data easily usable by these more generic applications.&lt;/p&gt;

&lt;p&gt;As an example, in the last instalment we introduced classes for &lt;code&gt;traffic:VehicleType&lt;/code&gt; and &lt;code&gt;traffic:RoadCategory&lt;/code&gt; which were subclasses of &lt;code&gt;skos:Concept&lt;/code&gt;. If we want generic SKOS visualisers to be able to display the vehicle type and road category concept schemes, we should try to make it easy for them to work out which things are concepts, by indicating that they are concepts as well. Bearing in mind what I&amp;#8217;ve said above about labels, that means that the original RDF:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;motorway&amp;gt; a traffic:RoadCategory ;
  skos:prefLabel &quot;Motorway&quot;@en ;
  skos:broader &amp;lt;major&amp;gt; ;
  skos:scopeNote &quot;Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph.&quot;@en ;
  skos:inScheme &amp;lt;&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;should include a reference to &lt;code&gt;skos:Concept&lt;/code&gt; and a &lt;code&gt;rdfs:label&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;motorway&amp;gt; a traffic:RoadCategory ;
  a skos:Concept ;
  rdfs:label &quot;Motorway&quot;@en ;
  skos:prefLabel &quot;Motorway&quot;@en ;
  skos:broader &amp;lt;major&amp;gt; ;
  skos:scopeNote &quot;Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph.&quot;@en ;
  skos:inScheme &amp;lt;&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that I haven&amp;#8217;t included the results of &lt;em&gt;all&lt;/em&gt; the reasoning that we could anticipate. The property &lt;code&gt;skos:scopeNote&lt;/code&gt; is a sub-property of &lt;code&gt;skos:note&lt;/code&gt;, for example, but I haven&amp;#8217;t included a &lt;code&gt;skos:note&lt;/code&gt; explicitly because any SKOS-aware processor should have that kind of knowledge built in. The rule of thumb is that &lt;strong&gt;if the result of the reasoning involves a resource from another vocabulary, then we should include it&lt;/strong&gt;.&lt;/p&gt;
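
&lt;p&gt;The rule of thumb can be sketched in Python. This sketch assumes terms are written as prefixed names and treats the prefix as the vocabulary. It follows &lt;code&gt;rdfs:subClassOf&lt;/code&gt; and &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; transitively. It keeps an inferred class or property only when that term comes from a different vocabulary than the term it was inferred from. The hierarchy holds just the relationships discussed here; in SKOS, &lt;code&gt;skos:prefLabel&lt;/code&gt; is a sub-property of &lt;code&gt;rdfs:label&lt;/code&gt;.&lt;/p&gt;

```python
# Sketch only: the prefix of a prefixed name stands in for its vocabulary,
# and these hierarchies contain just the relationships discussed above.
SUBCLASS_OF = {"traffic:RoadCategory": {"skos:Concept"}}
SUBPROPERTY_OF = {
    "skos:prefLabel": {"rdfs:label"},
    "skos:scopeNote": {"skos:note"},
}


def vocab(term: str) -> str:
    """The prefix of a prefixed name."""
    return term.split(":", 1)[0]


def ancestors(term: str, hierarchy: dict[str, set[str]]) -> set[str]:
    """All direct and indirect super-terms of term (transitive closure)."""
    found: set[str] = set()
    todo = list(hierarchy.get(term, set()))
    while todo:
        parent = todo.pop()
        if parent not in found:
            found.add(parent)
            todo.extend(hierarchy.get(parent, set()))
    return found


def add_derived(triples: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Add inferred types and super-property triples from other vocabularies."""
    result = list(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            extra = [("rdf:type", c) for c in sorted(ancestors(o, SUBCLASS_OF))
                     if vocab(c) != vocab(o)]
        else:
            extra = [(q, o) for q in sorted(ancestors(p, SUBPROPERTY_OF))
                     if vocab(q) != vocab(p)]
        for new_p, new_o in extra:
            if (s, new_p, new_o) not in result:
                result.append((s, new_p, new_o))
    return result


motorway = [
    ("motorway", "rdf:type", "traffic:RoadCategory"),
    ("motorway", "skos:prefLabel", "Motorway"),
    ("motorway", "skos:scopeNote", "Major roads often used for long distance travel."),
]

for triple in add_derived(motorway):
    print(triple)
```

&lt;p&gt;The sketch adds &lt;code&gt;skos:Concept&lt;/code&gt; and &lt;code&gt;rdfs:label&lt;/code&gt; but not &lt;code&gt;skos:note&lt;/code&gt;. That is because &lt;code&gt;skos:scopeNote&lt;/code&gt; and &lt;code&gt;skos:note&lt;/code&gt; are in the same vocabulary.&lt;/p&gt;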

&lt;h3&gt;Derivable Values&lt;/h3&gt;

&lt;p&gt;There are other kinds of derivable data in this dataset. In particular, it has eastings and northings, but no latitudes and longitudes. When there&amp;#8217;s useful derivable data, especially when it&amp;#8217;s not trivial to derive, it makes sense to make it available explicitly; otherwise everyone else will have to go to the effort of deriving it themselves.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;ve already done this with the information about the hours of the traffic counts, by pulling out the year and hour of the count rather than leaving them tucked away within an &lt;code&gt;xsd:dateTime&lt;/code&gt; literal. The same should be true of the eastings and northings. For small numbers of values, you can use the &lt;a href=&quot;http://gps.ordnancesurvey.co.uk/convert.asp&quot;&gt;Ordnance Survey&amp;#8217;s online converter&lt;/a&gt;; for larger numbers of values you can download the (Windows-only and very dated) software or try one of the various converters you can find with a &lt;a href=&quot;http://www.google.com/search?q=easting+northing+latitude+longitude+conversion+UK&quot;&gt;Google search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Latitudes and longitudes for points should, of course, be expressed using the &lt;code&gt;geo:lat&lt;/code&gt; and &lt;code&gt;geo:long&lt;/code&gt; properties from the &lt;a href=&quot;http://www.w3.org/2003/01/geo/&quot;&gt;http://www.w3.org/2003/01/geo/wgs84_pos#&lt;/a&gt; vocabulary.&lt;/p&gt;
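
&lt;p&gt;If you would rather do the conversion yourself, the following Python sketch implements the inverse Transverse Mercator formulas published by the Ordnance Survey. It stops at latitude and longitude on the OSGB36 datum. It does not apply the Helmert transformation to WGS84, which is the datum that &lt;code&gt;geo:lat&lt;/code&gt; and &lt;code&gt;geo:long&lt;/code&gt; assume. Its results may therefore be out by roughly 100 metres. For published data, the OS tools or a library such as pyproj are a safer choice.&lt;/p&gt;

```python
from math import cos, degrees, radians, sin, sqrt, tan

# National Grid constants (Airy 1830 ellipsoid, OSGB36 datum).
A, B = 6377563.396, 6356256.909          # semi-major and semi-minor axes (m)
F0 = 0.9996012717                        # scale factor on the central meridian
LAT0, LON0 = radians(49), radians(-2)    # true origin
E0, N0 = 400000.0, -100000.0             # false origin (m)
E2 = 1 - (B * B) / (A * A)               # eccentricity squared
N_RATIO = (A - B) / (A + B)


def meridional_arc(lat: float) -> float:
    """Distance along the meridian from the true origin's latitude to lat."""
    n, dl, sl = N_RATIO, lat - LAT0, lat + LAT0
    return B * F0 * (
        (1 + n + 5 / 4 * n**2 + 5 / 4 * n**3) * dl
        - (3 * n + 3 * n**2 + 21 / 8 * n**3) * sin(dl) * cos(sl)
        + (15 / 8 * n**2 + 15 / 8 * n**3) * sin(2 * dl) * cos(2 * sl)
        - 35 / 24 * n**3 * sin(3 * dl) * cos(3 * sl)
    )


def grid_to_latlon(easting: float, northing: float) -> tuple[float, float]:
    """Return (lat, long) in decimal degrees on OSGB36 (not WGS84)."""
    lat, m = LAT0, 0.0
    # Refine the latitude until the meridional arc matches the northing (0.01 mm).
    while abs(northing - N0 - m) >= 1e-5:
        lat += (northing - N0 - m) / (A * F0)
        m = meridional_arc(lat)
    s2 = sin(lat) ** 2
    nu = A * F0 / sqrt(1 - E2 * s2)
    rho = A * F0 * (1 - E2) / (1 - E2 * s2) ** 1.5
    eta2 = nu / rho - 1
    t, sec = tan(lat), 1 / cos(lat)
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t**2 + eta2 - 9 * t**2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t**2 + 45 * t**4)
    x = sec / nu
    xi = sec / (6 * nu**3) * (nu / rho + 2 * t**2)
    xii = sec / (120 * nu**5) * (5 + 28 * t**2 + 24 * t**4)
    xiia = sec / (5040 * nu**7) * (61 + 662 * t**2 + 1320 * t**4 + 720 * t**6)
    de = easting - E0
    lat_out = lat - vii * de**2 + viii * de**4 - ix * de**6
    lon_out = LON0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7
    return degrees(lat_out), degrees(lon_out)


lat, lon = grid_to_latlon(302600, 81984)  # count point 13
print(f"geo:lat {lat:.4f} ;")
print(f"geo:long {lon:.4f} ;")
```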

&lt;h3&gt;Inverses&lt;/h3&gt;

&lt;p&gt;Statements in RDF link two things. For example, you can view the statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;as saying that traffic count point 13 is on the B3178 &lt;em&gt;and&lt;/em&gt; that the B3178 has a count point on it that is traffic count point 13.&lt;/p&gt;

&lt;p&gt;So when creating a query about, or a representation of, the road, it&amp;#8217;s always possible to include the &amp;#8216;backward links&amp;#8217;: the statements in which the road features as an &lt;em&gt;object&lt;/em&gt; as well as those in which it features as a &lt;em&gt;subject&lt;/em&gt;. This has led some people to argue that &lt;a href=&quot;http://dowhatimean.net/2006/06/an-rdf-design-pattern-inverse-property-labels&quot;&gt;relationships should only be defined in one direction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Personally, I don&amp;#8217;t agree, for two reasons.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Although it&amp;#8217;s &lt;em&gt;possible&lt;/em&gt; to create queries and representations that include backward links, in practice that often doesn&amp;#8217;t happen. Behaviour differs between triplestores, but the result of a SPARQL &lt;code&gt;DESCRIBE&lt;/code&gt; query commonly includes only triples in which the thing being described is the subject, not the object. Also, when constructing queries, it seems more natural to always &amp;#8220;travel forward&amp;#8221; through the graph. For example:&lt;/p&gt;

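&lt;pre&gt;&lt;code&gt;PREFIX area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt;
PREFIX traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt;
SELECT ?count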
WHERE {
  ?point
    area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/43UC&amp;gt; ;
    traffic:count ?count .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;rather than:&lt;/p&gt;

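&lt;pre&gt;&lt;code&gt;PREFIX area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt;
PREFIX traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt;
SELECT ?count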
WHERE {
  ?point
    area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/43UC&amp;gt; .
  ?count
    traffic:countPoint ?point .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So although it introduces redundancy, I think that including inverse relationships in RDF aids usability and navigability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sometimes both directions of a relationship contain meaningful information. For example, it&amp;#8217;s not enough to include a &lt;code&gt;gen:mother&lt;/code&gt; relationship from a person to their mother because the implied reverse relationship is simply that the person is a child of their mother &amp;#8212; you need to include a &lt;code&gt;gen:son&lt;/code&gt; or &lt;code&gt;gen:daughter&lt;/code&gt; relationship as well to tell which type of child.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So in this dataset, I&amp;#8217;m going to include inverse relationships where appropriate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;from countries to regions&lt;/li&gt;
&lt;li&gt;from regions to local authority districts&lt;/li&gt;
&lt;li&gt;from roads to count points&lt;/li&gt;
&lt;li&gt;from count points to counts&lt;/li&gt;
&lt;li&gt;from counts to observations&lt;/li&gt;
&lt;/ul&gt;
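
&lt;p&gt;Materialising these inverses is mechanical once the pairs of properties are known. In the Python sketch below, each rule is keyed on the class of the subject as well as the property. This is necessary because the same property (&lt;code&gt;traffic:countPoint&lt;/code&gt;, say) can point at count points from more than one kind of resource. The rules and the short identifiers standing in for URIs are assumptions for illustration. The &lt;code&gt;describe&lt;/code&gt; function mimics a forward-only &lt;code&gt;DESCRIBE&lt;/code&gt;.&lt;/p&gt;

```python
# (class of subject, forward property, inverse property), one per bullet above.
INVERSE_RULES = [
    ("admingeo:GovernmentOfficeRegion", "area:country", "area:region"),
    ("area:LocalAuthorityDistrict", "area:region", "area:district"),
    ("traffic:CountPoint", "traffic:road", "traffic:countPoint"),
    ("traffic:Count", "traffic:countPoint", "traffic:count"),
    ("scovo:Item", "traffic:count", "traffic:observation"),
]


def add_inverses(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """For each (s, p, o) matching a rule for s's class, also add (o, inverse, s)."""
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    result = set(triples)
    for s, p, o in triples:
        for cls, forward, inverse in INVERSE_RULES:
            if p == forward and (s, cls) in types:
                result.add((o, inverse, s))
    return result


def describe(resource: str, triples) -> list[tuple[str, str, str]]:
    """Forward-only description: just the triples with the resource as subject."""
    return sorted(t for t in triples if t[0] == resource)


data = {
    ("cp13", "rdf:type", "traffic:CountPoint"),
    ("cp13", "traffic:road", "B3178"),
    ("count13E", "rdf:type", "traffic:Count"),
    ("count13E", "traffic:countPoint", "cp13"),
}

print(describe("B3178", data))                # []
print(describe("B3178", add_inverses(data)))  # [('B3178', 'traffic:countPoint', 'cp13')]
```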

&lt;h3&gt;Shortcuts&lt;/h3&gt;

&lt;p&gt;Another thing that can aid the navigability of a set of RDF data is to provide &amp;#8220;shortcuts&amp;#8221;. For example, at the moment we have links that say which country a region belongs to and which region a local authority district belongs to, but we don&amp;#8217;t have a link that says which country a local authority district belongs to. These kinds of links can make it easier to navigate through (and to query) a dataset, so they can be worth adding as long as they don&amp;#8217;t clutter up the data too much.&lt;/p&gt;

&lt;p&gt;Just think of what you&amp;#8217;d like to know about a particular &lt;em&gt;thing&lt;/em&gt; when you visit its page. If you&amp;#8217;re looking at transport in a local authority district, it would be useful to know which region and country it belongs to, and which roads and traffic count points it contains. But a list of all the counts and observations that have been taken at those count points would be too much.&lt;/p&gt;

&lt;p&gt;For this dataset, I&amp;#8217;m going to add shortcuts from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;countries to local authority districts (and vice versa)&lt;/li&gt;
&lt;li&gt;count points to regions and countries&lt;/li&gt;
&lt;li&gt;roads to local authority districts (and vice versa)&lt;/li&gt;
&lt;li&gt;roads to regions and countries&lt;/li&gt;
&lt;li&gt;roads to road categories and road names&lt;/li&gt;
&lt;li&gt;roads to counts (and vice versa)&lt;/li&gt;
&lt;li&gt;observations to count points, roads, directions and count hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all judgement calls &amp;#8212; there are no hard and fast rules &amp;#8212; and as you can see I&amp;#8217;m not adding inverses everywhere here because to do so would lead to unnecessarily large sets of RDF in some cases.&lt;/p&gt;
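
&lt;p&gt;Many of these shortcuts can be derived by composing two existing links. If X is linked to Y by one property and Y is linked to Z by another, we add a direct link from X to Z. The Python sketch below applies such chain rules until nothing new is added. As a result, a count point picks up its country via its district and region. The chain rules and identifiers are assumptions modelled on the list above. They are applied to forward links only. Otherwise, a rule such as district-then-region would also link a region to itself through its inverse &lt;code&gt;area:district&lt;/code&gt; links.&lt;/p&gt;

```python
# (first link, second link, shortcut): X -first-> Y -second-> Z gives X -shortcut-> Z
CHAINS = [
    ("area:region", "area:country", "area:country"),
    ("area:district", "area:region", "area:region"),
    ("traffic:count", "traffic:countPoint", "traffic:countPoint"),
]


def add_shortcuts(triples, chains):
    """Apply chain rules repeatedly until no new triples are produced."""
    result = set(triples)
    while True:
        new = set()
        for x, p, y in result:
            for first, second, shortcut in chains:
                if p != first:
                    continue
                for y2, q, z in result:
                    if y2 == y and q == second:
                        new.add((x, shortcut, z))
        new -= result
        if not new:
            return result
        result |= new


data = {
    ("district18", "area:region", "regionK"),
    ("regionK", "area:country", "England"),
    ("cp13", "area:district", "district18"),
    ("count13E", "traffic:countPoint", "cp13"),
    ("obs-bicycle", "traffic:count", "count13E"),
}

for triple in sorted(add_shortcuts(data, CHAINS) - data):
    print(triple)
```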

&lt;hr /&gt;

&lt;p&gt;That&amp;#8217;s the end of this instalment. I had been intending to make this the final one, but there are a couple of things still left to talk about: the publication of RDF, and the supplementary documents that we need to provide (including RDF about those supplementary documents). I&amp;#8217;ve also had a request to talk about OWL ontologies, so I&amp;#8217;ll probably do that, and there are things to say about how to manage things changing over time. So this may end up being an eight-part series!&lt;/p&gt;

&lt;p&gt;To bring things up to date, here is the RDF with all the extra derived information added:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
@prefix geo: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt; .
@prefix scovo: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; .
@prefix admingeo: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/admingeo/&amp;gt; .
@prefix space: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt;
  a area:Country ;
  rdfs:label &quot;England&quot;@en ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt;
  a admingeo:GovernmentOfficeRegion ;
  rdfs:label &quot;South West&quot;@en ;
  skos:notation &quot;K&quot;^^area:StandardCode ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt;
  a area:LocalAuthorityDistrict ;
  rdfs:label &quot;Devon&quot;@en ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority-district/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  a area:LocalAuthority ;
  rdfs:label &quot;Devon County Council&quot;@en ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:coverage &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt;
  a traffic:Road ;
  rdfs:label &quot;B3178&quot; ;
  rdfs:label &quot;Salterton Road&quot;@en ;
  skos:prefLabel &quot;B3178&quot; ;
  skos:altLabel &quot;Salterton Road&quot;@en ;
  skos:notation &quot;B3178&quot;^^traffic:RoadNumber ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; ;
  traffic:roadCategory 
    &amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; ,
    &amp;lt;http://transport.data.gov.uk/def/road-category/urban&amp;gt; ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  a traffic:CountPoint ;
  rdfs:label &quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  rdfs:comment &quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  skos:notation &quot;13&quot;^^traffic:CountPointNumber ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;
  traffic:roadName &quot;Salterton Road&quot;@en ;
  traffic:roadCategory 
    &amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; ,
    &amp;lt;http://transport.data.gov.uk/def/road-category/urban&amp;gt; ;
  space:easting 302600 ;
  space:northing 81984 ;
  geo:lat 50.6294 ;
  geo:long -3.3784 ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a traffic:Count ;
  rdfs:label &quot;8 Oct 2001 17:00-18:00 - East - Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;
  traffic:direction &amp;lt;http://dbpedia.org/resource/East&amp;gt; ;
  traffic:countHour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; ;
  traffic:observation &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt; .

&amp;lt;http://dbpedia.org/resource/East&amp;gt;
  rdfs:label &quot;East&quot;@en .

&amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt;
  a traffic:CountHour ;
  rdfs:label &quot;8 Oct 2001, 17:00-18:00&quot;@en ;
  time:hasBeginning &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt; ;
  time:hasEnd &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt; ;
  time:hasDurationDescription _:OneHour ;
  time:intervalDuring &amp;lt;http://dbpedia.org/resource/2001&amp;gt; .

_:OneHour a time:DurationDescription ;
  rdfs:label &quot;one hour&quot;@en ;
  time:years 0 ;
  time:months 0 ;
  time:days 0 ;
  time:hours 1 ;
  time:minutes 0 ;
  time:seconds 0 .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 17:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T17:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 17 ;
  ] .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 18:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T18:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 18 ;
  ] .

&amp;lt;http://dbpedia.org/resource/2001&amp;gt;
  a time:Interval ;
  rdfs:label &quot;2001&quot; ;
  rdf:value &quot;2001&quot;^^xsd:gYear ;
  time:intervalEquals &amp;lt;http://placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  a scovo:Item ;
  rdfs:label &quot;8 Oct 2001 17:00-18:00 - East - Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
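  scovo:dataset &amp;lt;http://transport.data.gov.uk/set/traffic-count/2001-2008/&amp;gt; ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;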
  traffic:direction &amp;lt;http://dbpedia.org/resource/East&amp;gt; ;
  traffic:countHour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; ;
  traffic:vehicleType &amp;lt;http://transport.data.gov.uk/def/vehicle/bicycle&amp;gt; ;
  rdf:value 2 .
&lt;/code&gt;&lt;/pre&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/139#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sat, 05 Dec 2009 08:50:28 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">139 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Creating Linked Data - Part IV: Developing RDF Schemas</title>
 <link>http://www.jenitennison.com/blog/node/138</link>
 <description>&lt;p&gt;This is the fourth instalment in a series about turning an existing dataset into some linked data. I&amp;#8217;ve previously talked about &lt;a href=&quot;http://www.jenitennison.com/blog/node/135&quot;&gt;analysis and modelling&lt;/a&gt;, &lt;a href=&quot;http://www.jenitennison.com/blog/node/136&quot;&gt;defining URIs&lt;/a&gt; and &lt;a href=&quot;http://www.jenitennison.com/blog/node/137&quot;&gt;defining concept schemes&lt;/a&gt;. In this instalment, we&amp;#8217;ll look at developing a schema in which we define the classes, properties and datatypes that we want to use in the RDF that describes the &lt;em&gt;things&lt;/em&gt; in our dataset.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;ll start by writing out some RDF for our record, using Turtle here for readability. We&amp;#8217;ll use names with an empty prefix (such as &lt;code&gt;:Country&lt;/code&gt;) to indicate classes, properties and datatypes, just so we can see what we need. Then we&amp;#8217;ll see how those requirements match up to existing vocabularies and ontologies that we can reuse. Anything that&amp;#8217;s left over will have to go in our own vocabulary, which we&amp;#8217;ll call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/traffic/
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All the classes, properties and datatypes that we define will eventually use that namespace.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s focus on this record; I find it easiest to use an actual example rather than talk in the abstract:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&quot;England&quot;,&quot;South West&quot;,&quot;K&quot;,1115.00,&quot;18&quot;,&quot;Devon County Council&quot;,
13,&quot;B3178&quot;,,&quot;B Urban&quot;,&quot;Salterton Road&quot;,
&quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;,302600,81984,
8/10/2001 00:00:00,&quot;E&quot;,17,2,2,400,5,41,0,2,0,0,0,0,2,450
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&amp;#8217;ll put this into RDF bit by bit.&lt;/p&gt;

&lt;h2&gt;Areas&lt;/h2&gt;

&lt;p&gt;First, let&amp;#8217;s look at the areas and local authorities. The kind of RDF that we want to have looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt;
  a :Country ;
  :name &quot;England&quot;@en .

&amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt;
  a :GovernmentOfficeRegion ;
  :name &quot;South West&quot;@en ;
  :code &quot;K&quot;^^:ONScode ;
  :containedBy &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt;
  a :LocalAuthorityDistrict ;
  :code &quot;18&quot;^^:ONScode ;
  :code &quot;1115&quot;^^:DfTLAcode ;
  :localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  :containedBy &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  :containedBy &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority-district/1115&amp;gt;
  :sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  a :LocalAuthority ;
  :name &quot;Devon County Council&quot;@en ;
  :code &quot;18&quot;^^:ONScode ;
  :code &quot;1115&quot;^^:DfTLAcode ;
  :localAuthorityDistrict &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority/1115&amp;gt;
  :sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To work out what we need to put in our schema, we should first look at what existing vocabularies there are that could help. These areas are already defined elsewhere, so we can just use the same vocabulary for countries, regions, local authority districts and local authorities as is used there. The vocabularies that are useful here are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http://statistics.data.gov.uk/def/administrative-geography/&lt;/code&gt; which defines classes and properties related to administrative areas and local authorities (as described by the &lt;a href=&quot;http://www.statistics.gov.uk/&quot;&gt;Office for National Statistics&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://data.ordnancesurvey.co.uk/ontology/admingeo/&lt;/code&gt; which also defines classes and properties related to administrative areas (as described by the &lt;a href=&quot;http://www.ordnancesurvey.co.uk/&quot;&gt;Ordnance Survey&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/&lt;/code&gt;, also developed by John Goodwin at the Ordnance Survey, which defines spatial relationships between areas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are other commonly used vocabularies that it&amp;#8217;s helpful to know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDFS is designed for representing RDF schemas, but it has a few general-purpose properties that are good to know, namely &lt;code&gt;rdfs:label&lt;/code&gt; (the label for a thing) and &lt;code&gt;rdfs:comment&lt;/code&gt; (a comment or description about the thing).&lt;/li&gt;
&lt;li&gt;SKOS is designed for representing concept schemes, but again it has a few properties that can be used with any set of linked data, in particular &lt;code&gt;skos:prefLabel&lt;/code&gt; (the preferred label for a thing), &lt;code&gt;skos:altLabel&lt;/code&gt; (an alternative label for a thing) and &lt;code&gt;skos:notation&lt;/code&gt; (a code for the thing).&lt;/li&gt;
&lt;li&gt;OWL is designed for representing ontologies, but it has one very important property that you should know about &amp;#8212; &lt;code&gt;owl:sameAs&lt;/code&gt; &amp;#8212; which is used to link two things that are the same thing.&lt;/li&gt;
&lt;li&gt;XML Schema datatypes can be used within RDF, which is useful for things like dates, times, integers and so on.&lt;/li&gt;
&lt;li&gt;For our purposes here, OWL-Time is going to prove useful, as it has a bunch of properties that are used to represent instants and durations.&lt;/li&gt;
&lt;/ul&gt;
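&lt;p&gt;For reference, these are the namespace URIs usually bound to those prefixes. The prefix names themselves are only conventions, since Turtle lets you choose any prefix. The single statement at the end illustrates a general-purpose property in use, with values taken from the record above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . # rdfs:label, rdfs:comment
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; . # skos:prefLabel, skos:altLabel, skos:notation
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; . # owl:sameAs
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; . # datatypes for dates, times, integers
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; . # OWL-Time instants, intervals and durations

# a human-readable label attached with an RDFS property
&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  rdfs:label &quot;Devon County Council&quot;@en .
&lt;/code&gt;&lt;/pre&gt;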

&lt;p&gt;If we look through the RDF above, the only thing that &lt;em&gt;isn&amp;#8217;t&lt;/em&gt; covered by these vocabularies is the &lt;code&gt;DfTLAcode&lt;/code&gt; datatype. If we use the &lt;code&gt;http://transport.data.gov.uk/def/traffic/&lt;/code&gt; namespace, there&amp;#8217;s not really any need to indicate that this is a transport-related code, so we can just call it &lt;code&gt;LAcode&lt;/code&gt;. Let&amp;#8217;s define that datatype:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/LAcode&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Local Authority Code&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s it. Now here&amp;#8217;s the Turtle for the areas with the relevant namespaces added, and property names changed where appropriate:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; .
@prefix space: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/&amp;gt; .
@prefix admingeo: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/admingeo/&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt;
  a area:Country ;
  rdfs:label &quot;England&quot;@en .

&amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt;
  a admingeo:GovernmentOfficeRegion ;
  rdfs:label &quot;South West&quot;@en ;
  skos:notation &quot;K&quot;^^area:StandardCode ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt;
  a area:LocalAuthorityDistrict ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority-district/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  a area:LocalAuthority ;
  rdfs:label &quot;Devon County Council&quot;@en ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:coverage &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Roads&lt;/h2&gt;

&lt;p&gt;Here&amp;#8217;s the kind of RDF we want to create for roads:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt;
  a :Road ;
  :code &quot;B3178&quot;^^:RoadNumber .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Obviously, we need a class for roads:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/Road&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Road&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Wherever there&amp;#8217;s a code, I like to reuse &lt;code&gt;skos:notation&lt;/code&gt;. But it&amp;#8217;s important to define a datatype for the values used with that notation because (as we saw with local authorities above) several different coding schemes may apply to the same thing, and we need to be able to tell them apart in case they clash. So:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/RoadNumber&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Road Number&quot;@en .
&lt;/code&gt;&lt;/pre&gt;
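&lt;p&gt;To see why the datatype matters, here&amp;#8217;s a sketch of two things whose codes share the lexical form &amp;#8220;18&amp;#8221; in different schemes. The second resource is hypothetical and is invented only to illustrate a clash. The two literals have different datatypes, so they are different RDF terms. A query looking for &lt;code&gt;&quot;18&quot;^^area:StandardCode&lt;/code&gt; therefore won&amp;#8217;t match the other one. Two untyped &lt;code&gt;&quot;18&quot;&lt;/code&gt; notations, by contrast, would be indistinguishable:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

# the ONS code for Devon County Council, as in the data above
&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  skos:notation &quot;18&quot;^^area:StandardCode .

# hypothetical resource whose DfT local authority code happens to be 18
&amp;lt;http://example.org/hypothetical-authority&amp;gt;
  skos:notation &quot;18&quot;^^traffic:LAcode .
&lt;/code&gt;&lt;/pre&gt;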

&lt;p&gt;That&amp;#8217;s all we have to define for roads; now the RDF can look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt;
  a traffic:Road ;
  skos:notation &quot;B3178&quot;^^traffic:RoadNumber .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Count Points&lt;/h2&gt;

&lt;p&gt;On to count points. Here&amp;#8217;s the sketch of the RDF we want to create:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  a :TrafficCountPoint ;
  :description &quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  :code &quot;13&quot;^^:CountPointNumber ;
  :road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;
  :roadName &quot;Salterton Road&quot;@en ;
  :roadCategory 
    &amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; ,
    &amp;lt;http://transport.data.gov.uk/def/road-category/urban&amp;gt; ;
  :easting 302600 ;
  :northing 81984 ;
  :localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  :localAuthorityDistrict &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of these, the description could be done with &lt;code&gt;rdfs:comment&lt;/code&gt;. The code can be held by a &lt;code&gt;skos:notation&lt;/code&gt; (provided we define a datatype for the count point number):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/CountPointNumber&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Traffic Count Point Number&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Properties for easting and northing are actually defined by the OS&amp;#8217;s spatial relations ontology. Unfortunately, neither the ontology nor the properties are currently resolvable; the only way you&amp;#8217;d know about them is by looking at how they are used in the conversion of the edubase data. Links to local authorities and local authority districts can be made using the ONS-based administrative geography ontology, whose terms, again, can currently only be guessed at by looking at the online data.&lt;/p&gt;

&lt;p&gt;That leaves us with a &lt;code&gt;traffic:CountPoint&lt;/code&gt; class (no point calling it &lt;code&gt;TrafficCountPoint&lt;/code&gt; if the namespace provides sufficient disambiguation):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/CountPoint&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Traffic Count Point&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A road property to point to a road:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/road&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/Road&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that properties are by convention named with a lowercase first letter, whereas classes are named with an uppercase first letter. It&amp;#8217;s a good idea to follow that convention. Note also that I&amp;#8217;ve defined an &lt;code&gt;rdfs:range&lt;/code&gt; for this property. This means that anything that&amp;#8217;s the &lt;em&gt;object&lt;/em&gt; of an RDF statement using this property is a &lt;code&gt;traffic:Road&lt;/code&gt;, and an RDFS reasoner will infer that type even where the data doesn&amp;#8217;t state it.&lt;/p&gt;
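&lt;p&gt;In RDFS, a range works by inference rather than validation. A reasoner doesn&amp;#8217;t reject a statement whose object isn&amp;#8217;t already known to be a &lt;code&gt;traffic:Road&lt;/code&gt;; it concludes that the object is one. Here is a minimal sketch using the count point and road from our record:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

# the schema statement defined above
traffic:road rdfs:range traffic:Road .

# asserted in the data
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; .

# inferred by an RDFS reasoner (entailment rule rdfs3), not asserted
&amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; a traffic:Road .
&lt;/code&gt;&lt;/pre&gt;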

&lt;p&gt;We need a road name property to give the name of the road at the count point.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/roadName&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road name&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We also need a road category property to point to the category or categories of the road at the count point:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/roadCategory&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road category&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#8217;ll remember that we defined different road categories using SKOS, such that each road category is a &lt;code&gt;skos:Concept&lt;/code&gt;. But to give a range to the &lt;code&gt;traffic:roadCategory&lt;/code&gt; property, we need to create a class for all the things that are categories of road. These are all &lt;code&gt;skos:Concept&lt;/code&gt;s, and we can indicate that through an &lt;code&gt;rdfs:subClassOf&lt;/code&gt; property:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/RoadCategory&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label &quot;Road Category&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can then use this as the range of the &lt;code&gt;traffic:roadCategory&lt;/code&gt; property:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/roadCategory&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road category&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/RoadCategory&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and amend the concept scheme we created to include references to this new class, for example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;motorway&amp;gt; a traffic:RoadCategory ;
  skos:prefLabel &quot;Motorway&quot;@en ;
  skos:broader &amp;lt;major&amp;gt; ;
  skos:scopeNote &quot;Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph.&quot;@en ;
  skos:inScheme &amp;lt;&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;
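&lt;p&gt;Declaring &lt;code&gt;traffic:RoadCategory&lt;/code&gt; as a subclass of &lt;code&gt;skos:Concept&lt;/code&gt; has a further consequence. An RDFS reasoner can treat anything typed as a road category as a SKOS concept, even where that isn&amp;#8217;t stated directly. The following sketch assumes the B road category has been amended in the same way as the motorway example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

# the schema statement defined above
traffic:RoadCategory rdfs:subClassOf skos:Concept .

# asserted in the amended concept scheme (assumed for this sketch)
&amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; a traffic:RoadCategory .

# inferred by an RDFS reasoner (entailment rule rdfs9), not asserted
&amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; a skos:Concept .
&lt;/code&gt;&lt;/pre&gt;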

&lt;p&gt;So here is the RDF with the relevant properties properly defined:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; .
@prefix space: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  a traffic:CountPoint ;
  rdfs:comment &quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  skos:notation &quot;13&quot;^^traffic:CountPointNumber ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;
  traffic:roadName &quot;Salterton Road&quot;@en ;
  traffic:roadCategory 
    &amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; ,
    &amp;lt;http://transport.data.gov.uk/def/road-category/urban&amp;gt; ;
  space:easting 302600 ;
  space:northing 81984 ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Traffic Counts&lt;/h2&gt;

&lt;p&gt;On to traffic counts. The un-namespaced RDF should look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a :TrafficCount ;
  :countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  :direction &amp;lt;http://dbpedia.org/resource/East&amp;gt; ;
  :hour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So for that we need a class for traffic counts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Traffic Count&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;a property that links the traffic count to the count point where the count is taken:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/countPoint&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;traffic count point&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/CountPoint&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;a property to link to the direction the traffic is flowing in (we can&amp;#8217;t put a range on this one because the DBpedia resources we&amp;#8217;re using don&amp;#8217;t have a common type):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/direction&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;traffic direction&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and finally a property to link to the hour during which the measurement was taken. This last one is a very common thing to need to do, so we&amp;#8217;d imagine that there might be an existing property defined somewhere that we could use. &lt;a href=&quot;http://sdmx.org/&quot;&gt;SDMX&lt;/a&gt;, which includes a standard for representing statistical information in XML, defines a &lt;code&gt;REF_PERIOD&lt;/code&gt; field which would seem to suit our purposes, but we don&amp;#8217;t yet have a proper mapping of SDMX into RDF (I&amp;#8217;ve had an initial cut, but it needs some input from statisticians).&lt;/p&gt;

&lt;p&gt;So for now, we&amp;#8217;ll use a specific property in our own namespace; we can always indicate that it&amp;#8217;s a sub-property of a future SDMX property at a later date. I&amp;#8217;m going to call it &lt;code&gt;countHour&lt;/code&gt; and give it a domain of &lt;code&gt;traffic:Count&lt;/code&gt;. This indicates that the property has a pretty specific use: giving the hour during which a count was taken. We could just give its range as a generic &lt;code&gt;time:Interval&lt;/code&gt;, but traffic count hours are kinda special intervals. They&amp;#8217;re obviously an hour long, but they also start and end on the hour, fall between 7am and 7pm, and don&amp;#8217;t occur in winter. So it feels like we should have a special kind of interval for that purpose:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/countHour&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;hour of count&quot;@en ;
  rdfs:domain &amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt; ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/CountHour&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/CountHour&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf time:Interval ;
  rdfs:label &quot;Count Hour&quot;@en .
&lt;/code&gt;&lt;/pre&gt;
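&lt;p&gt;Together, the domain, the range and the subclass declaration let a reasoner fill in several types from a single &lt;code&gt;traffic:countHour&lt;/code&gt; statement. Here is a sketch using the count and hour from our record; the inferred triples are shown only for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

# asserted
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  traffic:countHour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; .

# inferred from the rdfs:domain of traffic:countHour (rule rdfs2)
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a traffic:Count .

# inferred from its rdfs:range (rule rdfs3), then from
# traffic:CountHour rdfs:subClassOf time:Interval (rule rdfs9)
&amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt;
  a traffic:CountHour , time:Interval .
&lt;/code&gt;&lt;/pre&gt;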

&lt;p&gt;All those properties were in the traffic namespace, so here&amp;#8217;s the RDF with it added:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a traffic:Count ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:direction &amp;lt;http://dbpedia.org/resource/East&amp;gt; ;
  traffic:countHour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Cardinal Directions&lt;/h2&gt;

&lt;p&gt;As I discussed in the last instalment, we&amp;#8217;re not actually going to mint URIs for cardinal directions, but that doesn&amp;#8217;t mean we can&amp;#8217;t make statements about them in the RDF we generate. As I&amp;#8217;ll discuss in more depth in the next instalment, it&amp;#8217;s always good to provide a label at the very least:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://dbpedia.org/resource/East&amp;gt;
  rdfs:label &quot;East&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Intervals and Instants&lt;/h2&gt;

&lt;p&gt;Let&amp;#8217;s look now at the RDF we want to generate about the hour during which the count was taken. As I&amp;#8217;ve said above, these hours are a special kind of interval, and we&amp;#8217;ve already created a class for them. I also discussed earlier that the things about this interval that are really useful for the purposes of querying are the year during which the count was taken and the hour at which it was taken, so we should pull out at least those pieces of information. Time-based data can be represented in RDF using the &lt;a href=&quot;http://www.w3.org/2006/time&quot;&gt;OWL-Time ontology&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unfortunately, expressing time very specifically gets rather verbose. This is what the statements we want to make look like using OWL-Time:&lt;/p&gt;

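&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .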

&amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt;
  a traffic:CountHour ;
  rdfs:label &quot;8 Oct 2001, 17:00-18:00&quot;@en ;
  time:hasBeginning &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt; ;
  time:hasEnd &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt; ;
  time:hasDurationDescription _:OneHour ;
  time:intervalDuring &amp;lt;http://dbpedia.org/resource/2001&amp;gt; .

_:OneHour a time:DurationDescription ;
  rdfs:label &quot;one hour&quot;@en ;
  time:years 0 ;
  time:months 0 ;
  time:days 0 ;
  time:hours 1 ;
  time:minutes 0 ;
  time:seconds 0 .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 17:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T17:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 17 ;
  ] .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 18:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T18:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 18 ;
  ] .

&amp;lt;http://dbpedia.org/resource/2001&amp;gt;
  a time:Interval ;
  rdfs:label &quot;2001&quot; ;
  rdf:value &quot;2001&quot;^^xsd:gYear ;
  time:intervalEquals &amp;lt;http://placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Observations&lt;/h2&gt;

&lt;p&gt;Finally we&amp;#8217;re on to the observations themselves. The un-namespaced RDF looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  a :Observation ;
  :count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  :vehicleType &amp;lt;http://transport.data.gov.uk/def/vehicle/bicycle&amp;gt; ;
  :value 2 .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;a href=&quot;http://purl.org/NET/scovo&quot;&gt;SCOVO&lt;/a&gt; vocabulary exists to represent statistical information like this. In SCOVO, observations are called &lt;code&gt;scovo:Item&lt;/code&gt;s, and the value of the statistical measure itself (the count in this case) should be held in the &lt;code&gt;rdf:value&lt;/code&gt; property. Any other properties should be sub-properties of &lt;code&gt;scovo:dimension&lt;/code&gt;, which has a range of &lt;code&gt;scovo:Dimension&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To fit in with SCOVO, then, we need to have the pointer to the count that this observation belongs to as a property that is a sub-property of &lt;code&gt;scovo:dimension&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/count&amp;gt;
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label &quot;count&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We might be tempted to indicate that the type of thing pointed to by the &lt;code&gt;traffic:count&lt;/code&gt; property is a subclass of &lt;code&gt;scovo:Dimension&lt;/code&gt;, but this is unnecessary and probably untrue. There might exist some traffic counts that &lt;em&gt;aren&amp;#8217;t&lt;/em&gt; dimensions, and the ones that are linked to by the &lt;code&gt;traffic:count&lt;/code&gt; property can be inferred to be dimensions anyway.&lt;/p&gt;
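&lt;p&gt;The inference works in two steps. This sketch assumes that &lt;code&gt;scovo:dimension&lt;/code&gt; has &lt;code&gt;scovo:Dimension&lt;/code&gt; as its range, which is how I understand SCOVO to define it. The observation and count URIs come from our record:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix scovo: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

# asserted
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; .

# step 1, inferred because traffic:count rdfs:subPropertyOf scovo:dimension (rule rdfs7)
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  scovo:dimension &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; .

# step 2, inferred from the assumed range of scovo:dimension (rule rdfs3)
&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a scovo:Dimension .
&lt;/code&gt;&lt;/pre&gt;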

&lt;p&gt;Similarly, the property that provides the pointer to the vehicle type should be a sub-property of &lt;code&gt;scovo:dimension&lt;/code&gt; and we need a class for those various vehicle types in order to restrict the range of that property:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://transport.data.gov.uk/def/traffic/vehicleType&amp;gt;
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label &quot;vehicle type&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/VehicleType&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/VehicleType&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label &quot;Vehicle Type&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course all the concepts that we created for the vehicle types need to be designated as instances of this new &lt;code&gt;traffic:VehicleType&lt;/code&gt; class:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bicycle&amp;gt; a traffic:VehicleType ;
  ... .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, the RDF with the proper namespaces is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix scovo: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  a scovo:Item ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  traffic:vehicleType &amp;lt;http://transport.data.gov.uk/def/vehicle/bicycle&amp;gt; ;
  rdf:value 2 .
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;p&gt;That concludes our initial walkthrough of the data to create a vocabulary. I&amp;#8217;ve duplicated the schema and the example data below so that it&amp;#8217;s all in one place. But it&amp;#8217;s not quite done. In the next instalment, I&amp;#8217;ll look at adding some finishing touches that make the RDF easier to use.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2&gt;Schema&lt;/h2&gt;

&lt;p&gt;This is the full schema. It contains just six classes, eight properties and three datatypes at the moment, so it&amp;#8217;s pretty small as vocabularies go. In the RDF itself we&amp;#8217;ve been able to reuse a lot of classes, properties and datatypes that have already been defined elsewhere, so this vocabulary is pretty focused on just what we need to describe traffic counts.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix scovo: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .

# Classes #

&amp;lt;http://transport.data.gov.uk/def/traffic/Road&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Road&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/CountPoint&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Traffic Count Point&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt;
  a rdfs:Class ;
  rdfs:label &quot;Traffic Count&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/RoadCategory&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label &quot;Road Category&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/CountHour&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf time:Interval ;
  rdfs:label &quot;Count Hour&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/VehicleType&amp;gt;
  a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label &quot;Vehicle Type&quot;@en .

# Properties #

&amp;lt;http://transport.data.gov.uk/def/traffic/road&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/Road&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/roadName&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road name&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/countPoint&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;traffic count point&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/CountPoint&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/count&amp;gt;
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label &quot;count&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/roadCategory&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;road category&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/RoadCategory&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/direction&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;traffic direction&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/countHour&amp;gt;
  a rdf:Property ;
  rdfs:label &quot;hour of count&quot;@en ;
  rdfs:domain &amp;lt;http://transport.data.gov.uk/def/traffic/Count&amp;gt; ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/CountHour&amp;gt; .

&amp;lt;http://transport.data.gov.uk/def/traffic/vehicleType&amp;gt;
  a rdf:Property ;
  rdfs:subPropertyOf scovo:dimension ;
  rdfs:label &quot;vehicle type&quot;@en ;
  rdfs:range &amp;lt;http://transport.data.gov.uk/def/traffic/VehicleType&amp;gt; .

# Datatypes #

&amp;lt;http://transport.data.gov.uk/def/traffic/LAcode&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Local Authority Code&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/RoadNumber&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Road Number&quot;@en .

&amp;lt;http://transport.data.gov.uk/def/traffic/CountPointNumber&amp;gt;
  a rdfs:Datatype ;
  rdfs:label &quot;Traffic Count Point Number&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;h2&gt;RDF Data&lt;/h2&gt;

&lt;p&gt;Here&amp;#8217;s a sample set of data. It looks like rather a lot to simply describe the number of bicycles at a particular point on a road (and it doesn&amp;#8217;t even include the SKOS concept schemes that we did last time), but (a) it all provides valuable context for that measurement and (b) most of it will be reused by a lot of other measurements.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
@prefix scovo: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix area: &amp;lt;http://statistics.data.gov.uk/def/administrative-geography/&amp;gt; .
@prefix admingeo: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/admingeo/&amp;gt; .
@prefix space: &amp;lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/&amp;gt; .
@prefix traffic: &amp;lt;http://transport.data.gov.uk/def/traffic/&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt;
  a area:Country ;
  rdfs:label &quot;England&quot;@en .

&amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt;
  a admingeo:GovernmentOfficeRegion ;
  rdfs:label &quot;South West&quot;@en ;
  skos:notation &quot;K&quot;^^area:StandardCode ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt;
  a area:LocalAuthorityDistrict ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:country &amp;lt;http://statistics.data.gov.uk/id/country?name=England&amp;gt; ;
  area:region &amp;lt;http://statistics.data.gov.uk/id/government-office-region/K&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority-district/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt;
  a area:LocalAuthority ;
  rdfs:label &quot;Devon County Council&quot;@en ;
  skos:notation &quot;18&quot;^^area:StandardCode ;
  skos:notation &quot;1115&quot;^^traffic:LAcode ;
  area:coverage &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/local-authority/1115&amp;gt;
  owl:sameAs &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt;
  a traffic:Road ;
  skos:notation &quot;B3178&quot;^^traffic:RoadNumber .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt;
  a traffic:CountPoint ;
  rdfs:comment &quot;Salterton Road, EAST OF DINAN WAY, EXMOUTH&quot;@en ;
  skos:notation &quot;13&quot;^^traffic:CountPointNumber ;
  traffic:road &amp;lt;http://transport.data.gov.uk/id/road/B3178&amp;gt; ;
  traffic:roadName &quot;Salterton Road&quot;@en ;
  traffic:roadCategory 
    &amp;lt;http://transport.data.gov.uk/def/road-category/b&amp;gt; ,
    &amp;lt;http://transport.data.gov.uk/def/road-category/urban&amp;gt; ;
  space:easting 302600 ;
  space:northing 81984 ;
  area:localAuthority &amp;lt;http://statistics.data.gov.uk/id/local-authority/18&amp;gt; ;
  area:district &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/18&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt;
  a traffic:Count ;
  traffic:countPoint &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13&amp;gt; ;
  traffic:direction &amp;lt;http://dbpedia.org/resource/East&amp;gt; ;
  traffic:countHour &amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt; .

&amp;lt;http://dbpedia.org/resource/East&amp;gt;
  rdfs:label &quot;East&quot;@en .

&amp;lt;http://placetime.com/interval/gregorian/2001-10-08T17:00:00Z/PT1H&amp;gt;
  a traffic:CountHour ;
  rdfs:label &quot;8 Oct 2001, 17:00-18:00&quot;@en ;
  time:hasBeginning &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt; ;
  time:hasEnd &amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt; ;
  time:hasDurationDescription _:OneHour ;
  time:intervalDuring &amp;lt;http://dbpedia.org/resource/2001&amp;gt; .

_:OneHour a time:DurationDescription ;
  rdfs:label &quot;one hour&quot;@en ;
  time:years 0 ;
  time:months 0 ;
  time:days 0 ;
  time:hours 1 ;
  time:minutes 0 ;
  time:seconds 0 .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T17:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 17:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T17:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 17 ;
  ] .

&amp;lt;http://placetime.com/instant/gregorian/2001-10-08T18:00:00Z&amp;gt;
  a time:Instant ;
  rdfs:label &quot;8 Oct 2001, 18:00&quot;@en ;
  time:inXSDDateTime &quot;2001-10-08T18:00:00Z&quot;^^xsd:dateTime ;
  time:inDateTime [
    a time:DateTimeDescription ;
    time:unitType time:unitHour ;
    time:year &quot;2001&quot;^^xsd:gYear ;
    time:month &quot;--10&quot;^^xsd:gMonth ;
    time:day &quot;---08&quot;^^xsd:gDay ;
    time:hour 18 ;
  ] .

&amp;lt;http://dbpedia.org/resource/2001&amp;gt;
  a time:Interval ;
  rdfs:label &quot;2001&quot; ;
  rdf:value &quot;2001&quot;^^xsd:gYear ;
  time:intervalEquals &amp;lt;http://placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y&amp;gt; .

&amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00/type/bicycle&amp;gt;
  a scovo:Item ;
  traffic:count &amp;lt;http://transport.data.gov.uk/id/traffic-count-point/13/direction/E/hour/2001-10-08T17:00:00&amp;gt; ;
  traffic:vehicleType &amp;lt;http://transport.data.gov.uk/def/vehicle/bicycle&amp;gt; ;
  rdf:value 2 .
&lt;/code&gt;&lt;/pre&gt;
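&lt;p&gt;Most of these URIs follow patterns, and that is what makes the data reusable. A second measurement for the same count point, direction and hour needs only a new item URI, because the count, the interval and its instants already exist. The Python sketch below assumes the patterns visible in the sample hold for every count. The function names are hypothetical, and only the East direction mapping actually appears in the data.&lt;/p&gt;

```python
from datetime import datetime, timedelta

COUNT_POINT_BASE = "http://transport.data.gov.uk/id/traffic-count-point/"
PLACETIME = "http://placetime.com/"
# Only the direction used in the sample data; others would be added alongside.
DIRECTIONS = {"E": "http://dbpedia.org/resource/East"}


def count_uri(point, direction, hour):
    """URI of the count at a point, in one direction, for one hour."""
    return f"{COUNT_POINT_BASE}{point}/direction/{direction}/hour/{hour:%Y-%m-%dT%H:%M:%S}"


def item_uri(point, direction, hour, vehicle_type):
    """URI of one scovo:Item: a single vehicle type within a count."""
    return f"{count_uri(point, direction, hour)}/type/{vehicle_type}"


def hour_interval(hour):
    """Placetime URIs for the one-hour interval and its two bounding instants.

    `hour` is a naive datetime treated as UTC, hence the trailing Z.
    """
    end = hour + timedelta(hours=1)
    start_stamp = f"{hour:%Y-%m-%dT%H:%M:%S}Z"
    end_stamp = f"{end:%Y-%m-%dT%H:%M:%S}Z"
    return (
        f"{PLACETIME}interval/gregorian/{start_stamp}/PT1H",
        f"{PLACETIME}instant/gregorian/{start_stamp}",
        f"{PLACETIME}instant/gregorian/{end_stamp}",
    )


def hour_label(hour):
    """English label in the style used for traffic:CountHour in the sample."""
    end = hour + timedelta(hours=1)
    return f"{hour.day} {hour:%b %Y}, {hour:%H:%M}-{end:%H:%M}"


hour = datetime(2001, 10, 8, 17)
print(item_uri(13, "E", hour, "bicycle"))
print(DIRECTIONS["E"])
print(hour_interval(hour))
print(hour_label(hour))
```

&lt;p&gt;Treating the hour as UTC mirrors the &lt;code&gt;Z&lt;/code&gt; suffix used in the sample data&amp;#8217;s Placetime URIs.&lt;/p&gt;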
</description>
 <comments>http://www.jenitennison.com/blog/node/138#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Thu, 26 Nov 2009 10:35:32 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">138 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Creating Linked Data - Part III: Defining Concept Schemes</title>
 <link>http://www.jenitennison.com/blog/node/137</link>
 <description>&lt;p&gt;This is the third instalment in a series that I&amp;#8217;m writing about turning data into linked data. I&amp;#8217;m using traffic count data as the example, since that&amp;#8217;s a dataset that I&amp;#8217;m currently working on. In the last two instalments, I talked about &lt;a href=&quot;http://www.jenitennison.com/blog/node/135&quot;&gt;analysing and modelling the data&lt;/a&gt; and about &lt;a href=&quot;http://www.jenitennison.com/blog/node/136&quot;&gt;designing URIs&lt;/a&gt; for the &lt;em&gt;things&lt;/em&gt; in that model.&lt;/p&gt;

&lt;p&gt;Within the model, there are three sets of things that are &lt;strong&gt;concepts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;road categories&lt;/li&gt;
&lt;li&gt;vehicle types&lt;/li&gt;
&lt;li&gt;cardinal directions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I discussed last time, cardinal directions already have URIs defined within DBpedia, which are good enough for our purposes. The categorisation of roads and vehicles, on the other hand, is specific to UK transport data, so it&amp;#8217;s up to us to define it.&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s a really useful RDF vocabulary called &lt;a href=&quot;http://www.w3.org/TR/skos-primer/&quot;&gt;SKOS&lt;/a&gt; which is designed precisely for defining the kind of concept schemes that we want to use here. SKOS provides classes for concepts, concept schemes and collections (groupings of concepts within a scheme), and properties for linking them and providing labels, codes, definitions and so forth. Many of the SKOS properties can be used outside concept schemes &amp;#8212; for example &lt;code&gt;skos:prefLabel&lt;/code&gt; can be used anywhere you want to indicate the preferred label for a thing &amp;#8212; so it&amp;#8217;s good to get to know them.&lt;/p&gt;

&lt;h2&gt;Vehicle Types&lt;/h2&gt;

&lt;p&gt;Before we dive into RDF, let&amp;#8217;s take some time to understand the classification that we need to model. We&amp;#8217;re modelling vehicle types because counts are made of each different type of vehicle passing a traffic count point over a particular hour. Within the CSV data, the relevant column headings are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Pedal cycles&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Two wheeled motor vehicles&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Cars and taxis&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Buses and coaches&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Light vans&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVr2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVr3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVr4+&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVa3/4&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVa5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HGVa6&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;All HGV&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;All motor vehicles&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These classifications are detailed in the &lt;a href=&quot;http://www.dft.gov.uk/matrix/forms/definitions.aspx&quot;&gt;Department for Transport documentation of the dataset&lt;/a&gt;. It&amp;#8217;s clear that it&amp;#8217;s not a flat classification, but can be arranged into a hierarchy as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+- Pedal cycles
+- All motor vehicles
   +- Two wheeled motor vehicles
   +- Cars and taxis
   +- Buses and coaches
   +- Light vans
   +- All HGV
      +- Rigid HGV
      |  +- HGVr2
      |  +- HGVr3
      |  +- HGVr4+
      +- Articulated HGV
         +- HGVa3/4
         +- HGVa5
         +- HGVa6
&lt;/code&gt;&lt;/pre&gt;
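&lt;p&gt;Before writing any RDF, it can help to check the shape of the hierarchy. The Python sketch below records only the child-to-parent (&lt;code&gt;skos:broader&lt;/code&gt;) links. It keys them by the CSV headings plus the two grouping levels that have no column of their own. It then derives direct and transitive narrower concepts from those links; the transitive version is roughly what &lt;code&gt;skos:narrowerTransitive&lt;/code&gt; is meant to capture.&lt;/p&gt;

```python
# Child -> parent (skos:broader) links; None marks a top concept.
# "Rigid HGV" and "Articulated HGV" are grouping levels with no CSV column.
BROADER = {
    "Pedal cycles": None,
    "All motor vehicles": None,
    "Two wheeled motor vehicles": "All motor vehicles",
    "Cars and taxis": "All motor vehicles",
    "Buses and coaches": "All motor vehicles",
    "Light vans": "All motor vehicles",
    "All HGV": "All motor vehicles",
    "Rigid HGV": "All HGV",
    "HGVr2": "Rigid HGV",
    "HGVr3": "Rigid HGV",
    "HGVr4+": "Rigid HGV",
    "Articulated HGV": "All HGV",
    "HGVa3/4": "Articulated HGV",
    "HGVa5": "Articulated HGV",
    "HGVa6": "Articulated HGV",
}


def top_concepts():
    """Concepts with no broader concept."""
    return {c for c, parent in BROADER.items() if parent is None}


def narrower(concept):
    """Direct narrower concepts: the inverse of skos:broader."""
    return {child for child, parent in BROADER.items() if parent == concept}


def narrower_transitive(concept):
    """Every concept below `concept`, at any depth."""
    found = set()
    stack = [concept]
    while stack:
        for child in narrower(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(top_concepts()))
print(sorted(narrower_transitive("All HGV")))
```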

&lt;p&gt;So all we have to do is define that in SKOS. We&amp;#8217;ve already decided that the URIs will look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/vehicle-category/{type}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;so for URI-hackability reasons we&amp;#8217;ll call the concept scheme:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://transport.data.gov.uk/def/vehicle-category/
&lt;/code&gt;&lt;/pre&gt;
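&lt;p&gt;The Python sketch below turns CSV headings into concept URIs under that pattern. The slugs are an assumption, read off the Turtle excerpt that follows: &lt;code&gt;bicycle&lt;/code&gt;, &lt;code&gt;motor-vehicle&lt;/code&gt;, &lt;code&gt;motorbike&lt;/code&gt;, &lt;code&gt;car&lt;/code&gt;, &lt;code&gt;bus&lt;/code&gt;, &lt;code&gt;van&lt;/code&gt; and &lt;code&gt;HGV&lt;/code&gt;. The excerpt shows no slugs for the individual HGVr/HGVa columns, so the sketch leaves them unmapped rather than guessing. The hypothetical &lt;code&gt;scheme_of&lt;/code&gt; helper shows the URI-hackability: drop the last path segment and you reach the scheme.&lt;/p&gt;

```python
SCHEME = "http://transport.data.gov.uk/def/vehicle-category/"

# Assumed heading -> slug mapping; HGVr2, HGVa5 etc. are deliberately absent.
SLUGS = {
    "Pedal cycles": "bicycle",
    "All motor vehicles": "motor-vehicle",
    "Two wheeled motor vehicles": "motorbike",
    "Cars and taxis": "car",
    "Buses and coaches": "bus",
    "Light vans": "van",
    "All HGV": "HGV",
}


def concept_uri(heading):
    """Concept URI for a CSV heading, or None if no slug is known for it."""
    slug = SLUGS.get(heading)
    return SCHEME + slug if slug else None


def scheme_of(uri):
    """URI-hacking: drop the final path segment to reach the concept scheme."""
    return uri.rsplit("/", 1)[0] + "/"


headings = ["Pedal cycles", "Cars and taxis", "HGVr2"]
mapped = {h: concept_uri(h) for h in headings}
unmapped = [h for h, uri in mapped.items() if uri is None]
print(mapped)
print("no slug yet for:", unmapped)
```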

&lt;p&gt;It&amp;#8217;s probably easiest to just show what the concept scheme looks like. This is in &lt;a href=&quot;http://www.w3.org/TeamSubmission/turtle/&quot;&gt;Turtle&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@base &amp;lt;http://transport.data.gov.uk/def/vehicle-category/&amp;gt; .

&amp;lt;&amp;gt; a skos:ConceptScheme ;
  skos:prefLabel &quot;Vehicle Types&quot;@en ;
  skos:hasTopConcept &amp;lt;bicycle&amp;gt; ;
  skos:hasTopConcept &amp;lt;motor-vehicle&amp;gt; .
...
&amp;lt;motor-vehicle&amp;gt; a skos:Concept ;
  skos:prefLabel &quot;Motor Vehicle&quot;@en ;
  skos:topConceptOf &amp;lt;&amp;gt; ;
  skos:narrower &amp;lt;motorbike&amp;gt; ;
  skos:narrower &amp;lt;car&amp;gt; ;
  skos:narrower &amp;lt;bus&amp;gt; ;
  skos:narrower &amp;lt;van&amp;gt; ;
  skos:narrower &amp;lt;HGV&amp;gt; .
...
&amp;lt;HGV&amp;gt; a skos:Concept ;
  skos:prefLabel &quot;Heavy Goods Vehicle&quot;@en ;
  skos:altLabel &quot;HGV&quot;@en ;
  skos:definition &quot;Goods vehicles over 3,500 kgs gross vehicle weight.&quot;@en ;
  skos:scopeNote &quot;Includes tractors (without trailers), road rollers, box vans and similar large vans. A two axle motor tractive unit without trailer is also included.&quot;@en ;
  skos:broader &amp;lt;motor-vehicle&amp;gt; ;
  skos:narrower &amp;lt;HGVr&amp;gt; ;
  skos:narrower &amp;lt;HGVa&amp;gt; ;
  skos:inScheme &amp;lt;&amp;gt; .
...
&lt;/code&gt;&lt;/pre&gt;
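&lt;p&gt;The relative IRIs in that Turtle resolve against &lt;code&gt;@base&lt;/code&gt;. So &lt;code&gt;&amp;lt;&amp;gt;&lt;/code&gt; is the scheme itself and &lt;code&gt;&amp;lt;HGV&amp;gt;&lt;/code&gt; is the HGV concept. Python&amp;#8217;s &lt;code&gt;urllib.parse.urljoin&lt;/code&gt; follows the same RFC 3986 resolution for simple cases like these. That makes it a quick way to check the resolution, and to see why the trailing slash on the base matters:&lt;/p&gt;

```python
from urllib.parse import urljoin

BASE = "http://transport.data.gov.uk/def/vehicle-category/"

for relative in ["", "bicycle", "motor-vehicle", "HGV"]:
    print(repr(relative), "resolves to", urljoin(BASE, relative))

# Without the trailing slash, the last path segment is replaced instead,
# so "HGV" would land outside the vehicle-category space.
print(urljoin(BASE.rstrip("/"), "HGV"))
```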

&lt;p&gt;The properties shown here are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;skos:prefLabel&lt;/code&gt; - the preferred label for something; there can only be one in any given language&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:altLabel&lt;/code&gt; - an alternative label for the thing; there can be any number&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:definition&lt;/code&gt; - provides a definition of the term&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:scopeNote&lt;/code&gt; - provides information about the scope of the term (eg what&amp;#8217;s included or excluded)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:broader&lt;/code&gt;/&lt;code&gt;skos:narrower&lt;/code&gt; - link concepts together into a hierarchy&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:hasTopConcept&lt;/code&gt;/&lt;code&gt;skos:topConceptOf&lt;/code&gt; - link a concept scheme to the concepts at the top of the hierarchy it defines&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skos:inScheme&lt;/code&gt; - points from a concept to the concept scheme it&amp;#8217;s defined in; every &lt;code&gt;skos:Concept&lt;/code&gt; needs either this or &lt;code&gt;skos:topConceptOf&lt;/code&gt;, otherwise it isn&amp;#8217;t clear which concept scheme it belongs to&lt;/li&gt;
&lt;/ul&gt;
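&lt;p&gt;Two of the rules in that list are easy to check mechanically: at most one &lt;code&gt;skos:prefLabel&lt;/code&gt; per language, and &lt;code&gt;skos:inScheme&lt;/code&gt; or &lt;code&gt;skos:topConceptOf&lt;/code&gt; on every concept. The Python sketch below works over plain (subject, predicate, object) tuples, with labels as (text, language) pairs. The &lt;code&gt;van&lt;/code&gt; entry is a hypothetical, deliberately broken example, not part of the real scheme.&lt;/p&gt;

```python
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
V = "http://transport.data.gov.uk/def/vehicle-category/"


def duplicate_pref_labels(triples):
    """(subject, language) pairs that have more than one skos:prefLabel."""
    counts = Counter(
        (s, o[1]) for s, p, o in triples if p == SKOS + "prefLabel"
    )
    return {key for key, n in counts.items() if n > 1}


def concepts_without_scheme(triples):
    """Concepts with neither skos:inScheme nor skos:topConceptOf."""
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o == SKOS + "Concept"}
    placed = {s for s, p, o in triples if p in (SKOS + "inScheme", SKOS + "topConceptOf")}
    return concepts - placed


triples = [
    (V + "HGV", RDF_TYPE, SKOS + "Concept"),
    (V + "HGV", SKOS + "prefLabel", ("Heavy Goods Vehicle", "en")),
    (V + "HGV", SKOS + "altLabel", ("HGV", "en")),
    (V + "HGV", SKOS + "inScheme", V),
    (V + "motor-vehicle", RDF_TYPE, SKOS + "Concept"),
    (V + "motor-vehicle", SKOS + "prefLabel", ("Motor Vehicle", "en")),
    (V + "motor-vehicle", SKOS + "topConceptOf", V),
    # Hypothetical, deliberately broken entry: two English prefLabels, no scheme.
    (V + "van", RDF_TYPE, SKOS + "Concept"),
    (V + "van", SKOS + "prefLabel", ("Light Van", "en")),
    (V + "van", SKOS + "prefLabel", ("Van", "en")),
]

print(duplicate_pref_labels(triples))
print(concepts_without_scheme(triples))
```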

&lt;p&gt;Note that in the RDF I&amp;#8217;ve assigned every string a language (English). That&amp;#8217;s good practice when values are textual; a Welsh translation could be provided for each one as well, for example.&lt;/p&gt;
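&lt;p&gt;Language tags let an application pick the label that suits its reader. The minimal Python sketch below uses exact (case-insensitive) tag matching rather than full BCP 47 matching. The Welsh string is a placeholder, not a translation.&lt;/p&gt;

```python
def pick_label(labels, preferred):
    """Return the text of the first label whose tag matches a preferred language.

    Falls back to the first label if nothing matches, or None if there are none.
    """
    for lang in preferred:
        for text, tag in labels:
            if tag.lower() == lang.lower():
                return text
    return labels[0][0] if labels else None


labels = [("Heavy Goods Vehicle", "en"), ("[Welsh label]", "cy")]
print(pick_label(labels, ["cy", "en"]))
print(pick_label(labels, ["fr", "en"]))
```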

&lt;h2&gt;Road Categories&lt;/h2&gt;

&lt;p&gt;Road categories are also described within the documentation for this dataset. The hierarchy is shown in the documentation as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+- Major Roads
|  +- Motorways
|  |  +- Trunk
|  |  +- Principal
|  +- A Roads
|     +- Trunk
|     |  +- Urban
|     |  +- Rural
|     +- Principal
|        +- Urban
|        +- Rural
+- Minor Roads
   +- B Roads
   |  +- Urban
   |  +- Rural
   +- C Roads
   |  +- Urban
   |  +- Rural
   +- Unclassified Roads
      +- Urban
      +- Rural
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But this is actually the result of three sets of overlapping concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roads by classification (major/minor, motorway/A/B/C/unclassified)&lt;/li&gt;
&lt;li&gt;roads by locale (urban/rural)&lt;/li&gt;
&lt;li&gt;major roads by maintenance responsibility (trunk/principal)&lt;/li&gt;
&lt;/ul&gt;
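&lt;p&gt;Because the facets are independent, each road can be given one concept per facet that applies to it, rather than a single leaf of the documentation&amp;#8217;s tree. The Python sketch below assumes the slugs &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt; and &lt;code&gt;unclassified&lt;/code&gt;; the others appear in the concept scheme excerpt that follows. It leaves out broader concepts such as &lt;code&gt;major&lt;/code&gt; and &lt;code&gt;minor&lt;/code&gt`, because they follow from the hierarchy.&lt;/p&gt;

```python
ROAD_CATEGORY = "http://transport.data.gov.uk/def/road-category/"
MAJOR = {"motorway", "a"}
MINOR = {"b", "c", "unclassified"}  # assumed slugs
LOCALES = {"urban", "rural"}
MAINTENANCE = {"trunk", "principal"}


def road_categories(classification, locale=None, maintenance=None):
    """Most specific category URIs for one road: one concept per applicable facet."""
    if classification not in MAJOR | MINOR:
        raise ValueError(f"unknown classification: {classification}")
    if locale is not None and locale not in LOCALES:
        raise ValueError(f"unknown locale: {locale}")
    if maintenance is not None:
        # Maintenance responsibility is a facet of major roads only.
        if classification not in MAJOR:
            raise ValueError("maintenance responsibility only applies to major roads")
        if maintenance not in MAINTENANCE:
            raise ValueError(f"unknown maintenance responsibility: {maintenance}")
    slugs = [classification] + [f for f in (locale, maintenance) if f is not None]
    return [ROAD_CATEGORY + slug for slug in slugs]


# A B road in an urban area gets two categories, one from each facet.
print(road_categories("b", "urban"))
print(road_categories("a", "rural", "trunk"))
```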

&lt;p&gt;These kinds of subdivisions of concepts can be managed in SKOS through &lt;code&gt;skos:Collection&lt;/code&gt;s, which group together concepts without being broader than those concepts. Here&amp;#8217;s a snippet from the concept scheme that shows how this works.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@base &amp;lt;http://transport.data.gov.uk/def/road-category/&amp;gt; .

&amp;lt;&amp;gt; a skos:ConceptScheme ;
  skos:prefLabel &quot;Road Categories&quot;@en ;
  skos:hasTopConcept &amp;lt;major&amp;gt; ;
  skos:hasTopConcept &amp;lt;minor&amp;gt; ;
  skos:hasTopConcept &amp;lt;urban&amp;gt; ;
  skos:hasTopConcept &amp;lt;rural&amp;gt; .

&amp;lt;classification&amp;gt; a skos:Collection ;
  skos:prefLabel &quot;Road by Classification&quot;@en ;
  skos:member &amp;lt;major&amp;gt; ;
  skos:member &amp;lt;minor&amp;gt; .

&amp;lt;maintenance&amp;gt; a skos:Collection ;
  skos:prefLabel &quot;Major Road by Maintenance Responsibility&quot;@en ;
  skos:member &amp;lt;principal&amp;gt; ;
  skos:member &amp;lt;trunk&amp;gt; .

&amp;lt;major&amp;gt; a skos:Concept ;
  skos:prefLabel &quot;Major Road&quot;@en ;
  skos:altLabel &quot;Major&quot;@en ;
  skos:scopeNote &quot;Include motorways and A roads. These roads usually have high traffic flows and are often the main arteries to major destinations.&quot;@en ;
  skos:narrower &amp;lt;motorway&amp;gt; ;
  skos:narrower &amp;lt;a&amp;gt; ;
  skos:narrower &amp;lt;principal&amp;gt; ;
  skos:narrower &amp;lt;trunk&amp;gt; ;
  skos:topConceptOf &amp;lt;&amp;gt; .

&amp;lt;motorway&amp;gt; a skos:Concept ;
  skos:prefLabel &quot;Motorway&quot;@en ;
  skos:broader &amp;lt;major&amp;gt; ;
  skos:scopeNote &quot;Major roads often used for long distance travel. They are usually three or more lanes in each direction and generally have the maximum speed limit of 70mph.&quot;@en ;
  skos:inScheme &amp;lt;&amp;gt; .
...
&amp;lt;trunk&amp;gt; a skos:Concept ;
  skos:prefLabel &quot;Trunk Road&quot;@en ;
  skos:altLabel &quot;Trunk&quot;@en ;
  skos:scopeNote &quot;Most motorways and many of the long distance rural A roads are trunk roads.&quot;@en ;
  skos:note &quot;The responsibility for the maintenance of trunk roads lies with the Secretary of State and they are managed by the Highways Agency in England, the National Assembly of Wales in Wales and the Scottish Executive in Scotland (National Through Routes).&quot;@en ;
  skos:broader &amp;lt;major&amp;gt; ;
  skos:inScheme &amp;lt;&amp;gt; .
...
&lt;/code&gt;&lt;/pre&gt;
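&lt;p&gt;Since &lt;code&gt;skos:narrower&lt;/code&gt; is the inverse of &lt;code&gt;skos:broader&lt;/code&gt;, a hand-written scheme that states both directions ought to agree. The Python sketch below checks this using the pairs visible in the excerpt above. &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;principal&lt;/code&gt; are reported as missing only because their definitions are elided from the excerpt.&lt;/p&gt;

```python
# Pairs copied from the excerpt above, as (concept, related concept).
NARROWER = {("major", "motorway"), ("major", "a"), ("major", "principal"), ("major", "trunk")}
BROADER = {("motorway", "major"), ("trunk", "major")}


def missing_inverses(narrower, broader):
    """Pairs stated in one direction only, grouped by the direction that is missing."""
    implied_broader = {(child, parent) for parent, child in narrower}
    implied_narrower = {(parent, child) for child, parent in broader}
    return {
        "broader": implied_broader - broader,
        "narrower": implied_narrower - narrower,
    }


for direction, pairs in missing_inverses(NARROWER, BROADER).items():
    for subject, obj in sorted(pairs):
        print(f"missing: {subject} skos:{direction} {obj}")
```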

&lt;p&gt;In a hierarchy, these multiple overlapping concepts can be shown as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+- &amp;lt;Road by Classification&amp;gt;
|  +- Major Road
|  |  +- &amp;lt;Major Road by Classification&amp;gt;
|  |  |  +- Motorway
|  |  |  +- A Road
|  |  +- &amp;lt;Major Road by Maintenance Responsibility&amp;gt;
|  |     +- Principal Road
|  |     +- Trunk Road
|  +- Minor Road
|     +- B Road
|     +- C Road
|     +- Unclassified Road
+- &amp;lt;Road by Locale&amp;gt;
   +- Urban Road
   +- Rural Road
&lt;/code&gt;&lt;/pre&gt;
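&lt;p&gt;In SKOS a &lt;code&gt;skos:Collection&lt;/code&gt; is disjoint from &lt;code&gt;skos:Concept&lt;/code&gt;. The bracketed nodes above therefore never take part in &lt;code&gt;skos:broader&lt;/code&gt;/&lt;code&gt;skos:narrower&lt;/code&gt;; they only group a concept&amp;#8217;s narrower concepts for display. The Python sketch below derives that grouping. &lt;code&gt;major-classification&lt;/code&gt; is a hypothetical slug for the &amp;#8220;Major Road by Classification&amp;#8221; collection, which the excerpt doesn&amp;#8217;t define.&lt;/p&gt;

```python
NARROWER = {"major": ["motorway", "a", "principal", "trunk"]}
COLLECTIONS = {
    "major-classification": ["motorway", "a"],  # hypothetical slug
    "maintenance": ["principal", "trunk"],
}


def grouped_narrower(concept):
    """Map collection name to the narrower concepts of `concept` in that collection.

    Narrower concepts that belong to no collection are grouped under None.
    """
    children = NARROWER.get(concept, [])
    groups = {}
    for name, members in COLLECTIONS.items():
        chosen = [c for c in children if c in members]
        if chosen:
            groups[name] = chosen
    leftovers = [c for c in children if not any(c in m for m in COLLECTIONS.values())]
    if leftovers:
        groups[None] = leftovers
    return groups


print(grouped_narrower("major"))
```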

&lt;p&gt;That&amp;#8217;s our concept schemes done. Next it will be time to turn to defining a vocabulary for the particular &lt;em&gt;things&lt;/em&gt; that we want to describe from this dataset.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/137#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/53">skos</category>
 <pubDate>Sun, 22 Nov 2009 21:04:41 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">137 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

