<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Jeni's Musings</title>
  <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog"/>
  <link rel="self" type="application/atom+xml" href="http://www.jenitennison.com/blog/atom/feed"/>
  <id>http://www.jenitennison.com/blog/atom/feed</id>
  <updated>2012-05-19T11:56:50+01:00</updated>
  <entry>
    <title>A Fair Web</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/173" />
    <id>http://www.jenitennison.com/blog/node/173</id>
    <published>2013-02-03T23:38:00+00:00</published>
    <updated>2013-02-24T13:10:00+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="web" />
    <summary type="html"><![CDATA[<p>The new membership of the W3C&#8217;s Technical Architecture Group (TAG), and some of the recent <a href="http://lists.w3.org/Archives/Public/www-tag/2013Jan/">discussions on the TAG list about polyglot markup</a>, have made me think about what the TAG should stand for and the role the TAG should play.</p>

<p>Fundamentally, the web is for everyone, whatever gender, whatever race, whatever sexual orientation, whatever visual or mental ability and so on. The web community should fight to keep the web open to all. And it should try to be a community that is open to all.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>The new membership of the W3C&#8217;s Technical Architecture Group (TAG), and some of the recent <a href="http://lists.w3.org/Archives/Public/www-tag/2013Jan/">discussions on the TAG list about polyglot markup</a>, have made me think about what the TAG should stand for and the role the TAG should play.</p>

<p>Fundamentally, the web is for everyone, whatever gender, whatever race, whatever sexual orientation, whatever visual or mental ability and so on. The web community should fight to keep the web open to all. And it should try to be a community that is open to all.</p>

<p>With <a href="http://www.bbc.co.uk/news/uk-18407568">same-sex marriages shortly being voted on by the UK Parliament</a>, I have been struck, reading the recent threads about polyglot, how similar the arguments against polyglot seem to those used against homosexuality:</p>

<ul>
<li>claiming that there is no use for polyglot, in the face of those who say they have use for it, is similar to denying that homosexuality exists, in the face of people saying that they are homosexual</li>
<li>stating that you can see no use for polyglot and therefore no one else should use it is similar to saying that since you are heterosexual, everyone else must be too</li>
<li>claiming that creating a Recommendation that describes polyglot will make people use it is similar to saying that talking about homosexuality will make people gay</li>
<li>saying that you don&#8217;t want to implement polyglot in a validator or editor is similar to being a priest who declines to marry gay people</li>
</ul>

<p>By this comparison, those who argue that polyglot must be the only output anyone generates also have an analogy: someone arguing that all churches must marry only gay people.</p>

<p>I want the web community to be a fair and good society. To me the question about whether there should be a polyglot Recommendation is just the latest example of a need to ensure that our community is equitable.</p>

<p>The web is for everyone, whatever technology stack they use. The reason we have standards is to enable people to make their own choices about what they do in the privacy of their own servers. We don&#8217;t have to use the same libraries, so long as we implement the same standards. It doesn&#8217;t matter whether you programme with Ruby or PHP or C# or Scala or XQuery: you can build a web application because we have the standards of HTTP, HTML, CSS, JavaScript and so on.</p>

<p>Just as in wider society, we need to find compromises that balance the needs and desires of different constituencies. We need to balance the rights that everyone has to code as they wish against the rights that everyone has to have a web that works. We need to make sure that the quiet voices are heard, and support the equal rights of those who tread the less worn paths.</p>

<p>When there are conflicts between technologies, developers necessarily think of them in terms of which <em>they</em> would use, given their experience, expertise, environment and so on. I think the TAG needs to judge technologies in a different way. We have to consider the extent to which standardising their use disrupts the fabric of the web and prevents others from operating as they wish to. And, because we want a web that is fair and open and free, if there is no or minimal risk to the fabric of the web, and it does not overly constrain how others act, I believe we should err on the side of supporting diversity.</p>

<p>Making and expressing these judgements is just one of the things that I hope the newly formed TAG will manage to do better.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Open Data Business Models</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/172" />
    <id>http://www.jenitennison.com/blog/node/172</id>
    <published>2012-08-20T21:31:30+01:00</published>
    <updated>2013-02-03T23:41:22+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="opendata" />
    <summary type="html"><![CDATA[<p>As you may have seen, I have been appointed to the <a href="http://www.cabinetoffice.gov.uk/content/open-data-user-group">Cabinet Office&#8217;s Open Data User Group</a> which had its <a href="http://www.cabinetoffice.gov.uk/news/open-data-user-group-help-unleash-potential-open-data">first meeting in July</a>. The <a href="http://data.gov.uk/search/apachesolr_search?filters=tid%3A12001">minutes and slides are available</a>.</p>

<p>The purpose of the group is to act as an &#8220;intelligent customer&#8221; to the government on the release of open data. This is a bit of a misnomer, as the word &#8220;customer&#8221; implies that the group will in some way <em>buy</em> data that should be made open, which it&#8217;s unlikely to do. Perhaps &#8220;intelligent consumer&#8221; would be more appropriate: our task is to advise the government about which data should be opened up, and (if the commitment has already been made to open it) which should be opened first or how access to it could be improved.</p>

<p>One of the tasks that we face, particularly for datasets that are currently being sold by government (mostly from the <a href="http://www.bis.gov.uk/policies/shareholderexecutive/structure/portfolio-unit/public-data-group">Public Data Group</a>: Met Office, Ordnance Survey, Land Registry and Companies House), is making a strong economic argument for opening up data. To do that, it&#8217;s useful to understand two things:</p>

<ul>
<li>the ways in which open data can be used in the wider economy, to aid innovation, growth and thereby lift the country out of the economic doldrums</li>
<li>the business models that are being used by open data publishers to support open data releases, to illustrate the benefits that they can bring to publishers themselves</li>
</ul>
    ]]></summary>
    <content type="html"><![CDATA[<p>As you may have seen, I have been appointed to the <a href="http://www.cabinetoffice.gov.uk/content/open-data-user-group">Cabinet Office&#8217;s Open Data User Group</a> which had its <a href="http://www.cabinetoffice.gov.uk/news/open-data-user-group-help-unleash-potential-open-data">first meeting in July</a>. The <a href="http://data.gov.uk/search/apachesolr_search?filters=tid%3A12001">minutes and slides are available</a>.</p>

<p>The purpose of the group is to act as an &#8220;intelligent customer&#8221; to the government on the release of open data. This is a bit of a misnomer, as the word &#8220;customer&#8221; implies that the group will in some way <em>buy</em> data that should be made open, which it&#8217;s unlikely to do. Perhaps &#8220;intelligent consumer&#8221; would be more appropriate: our task is to advise the government about which data should be opened up, and (if the commitment has already been made to open it) which should be opened first or how access to it could be improved.</p>

<p>One of the tasks that we face, particularly for datasets that are currently being sold by government (mostly from the <a href="http://www.bis.gov.uk/policies/shareholderexecutive/structure/portfolio-unit/public-data-group">Public Data Group</a>: Met Office, Ordnance Survey, Land Registry and Companies House), is making a strong economic argument for opening up data. To do that, it&#8217;s useful to understand two things:</p>

<ul>
<li>the ways in which open data can be used in the wider economy, to aid innovation, growth and thereby lift the country out of the economic doldrums</li>
<li>the business models that are being used by open data publishers to support open data releases, to illustrate the benefits that they can bring to publishers themselves</li>
</ul>

<!--break-->

<p>I find business cases for data publishers much more compelling than examples of how open data can be used. For a start, I don&#8217;t think it&#8217;s possible to predict how open data will be used or what that will mean in terms of economic or societal impact: the wide world into which it&#8217;s released is just too complex to know. Sure, there&#8217;s the development of &#8220;Apps&#8221;, but there are many more hidden uses for open data within businesses, which data publishers and those external to the process are unlikely to ever be aware of. There are also impacts on vendors of support products like analysis and visualisation software, and on society as a whole when you have better informed people and organisations.</p>

<p>As well as wider impact being hard to measure, I don&#8217;t think anyone is likely to publish open data <em>well</em> if they don&#8217;t have some motivation that is a lot nearer to home than &#8220;helping grow the economy&#8221;. To be useful for reusers, open data needs to be structured, supported, timely, accurate and so on. To be useful for business, data releases need to be reliable and sustained over time: data can go out of date very rapidly and single releases are rarely interesting. So open data publishers need to have business models that enable their data-publishing activities to be self-sustaining and preferably improving.</p>

<p>Below I&#8217;ve described several such models, many of which are based on the business models that currently exist around open source. I&#8217;d be really interested to hear of any other business models that you know of for open data, and in particular to hear about examples where any of these models have been employed successfully.</p>

<h2>Cost Avoidance</h2>

<p>One argument I&#8217;ve heard made about government open data is that releasing it can help organisations avoid the costs of Freedom of Information requests. This probably applies only to data that is likely to be requested (perhaps it is published annually and was requested last year) or has a very low publishing cost. Organisations that have a high FOI spend with lots of successful requests may find that they can lower that FOI spend by proactively releasing data (and making it easy to find).</p>

<p>There are other, less direct, costs that organisations might avoid by releasing data. Most obviously in the public sector, there is political cost in not toeing the <a href="http://www.cabinetoffice.gov.uk/resource-library/open-data-white-paper-unleashing-potential">Open Data line</a>.</p>

<h2>Sponsorship</h2>

<p>The reverse of cost avoidance is finding sponsors for open data publication. If there are people who strongly believe that a particular dataset should be open and available to all, they may be prepared to sponsor its publication (which isn&#8217;t the same as licensing it; the consequence is that the data is open for all, not just for those who pay). As I understand it, some data that has been opened up by government, such as that from Ordnance Survey, has essentially been opened up through the sponsorship of the Treasury: they have paid for the data to be made open, for their own reasons (to do with belief that it will enable the economy to grow).</p>

<p>How could you persuade others to sponsor opening up data? If it&#8217;s something that they would otherwise license, perhaps they are in a better place to face any disruption that will come from the data being freely available than their competitors. Perhaps, if it&#8217;s the type of dataset that is hard to close up again after it has been made open, they might gamble that it would lower their long term costs. Perhaps they sell analysis or visualisation products that they know those who use the data will find useful, and so getting the data available widely will aid their business.</p>

<h2>Freemium</h2>

<p>The freemium model has been used with some success for web-based services; it might also work for open data. Under this model, an organisation would publish open data in a basic form &#8212; perhaps with some limitations on formats and throttling of API calls &#8212; and offer advanced access to those who are willing to pay.</p>

<p>There are many ways in which open data can be made more useful than static publication of spreadsheets or a basic API; under a freemium model some of these enhancements would only be offered to those who pay for it:</p>

<ul>
<li>availability of different machine-readable formats</li>
<li>unconstrained numbers of API calls</li>
<li>more sophisticated querying</li>
<li>access to data dumps rather than through an API (or vice versa)</li>
<li>provision of feeds of changes to the data</li>
<li>enhancement of the data with additional information</li>
<li>early access to data</li>
<li>provision of data on DVDs or hard disks rather than over the net</li>
</ul>

<h2>Dual Licensing</h2>

<p>Data publishers could provide data under an open license for certain purposes, and under a closed license for others. This technique has worked for some open source products. The &#8216;certain purposes&#8217; might not be simply &#8216;non-commercial&#8217;: publishers could still encourage start-up use of the data by charging based on the size or revenue of the organisation. Or the license could state that the data can be used in products but cannot be used in further &#8220;added value&#8221; data feeds without being licensed (this is roughly equivalent to dual-licensing with a share-alike license). This is the model used by <a href="http://opencorporates.com/info/licence">OpenCorporates</a>.</p>

<h2>Support and Services</h2>

<p>Offering support and services is a business model which seems to work well for companies built around open source. In the open data world, data publishers could offer paid packages with:</p>

<ul>
<li>guarantees on data availability</li>
<li>prioritisation on bug fixes (both in data and its provision) for paying customers</li>
<li>timely help for customers using the data</li>
<li>services around data visualisation, analysis and mashing with other data</li>
</ul>

<p>These kinds of services still tend to be coupled with licenses in the data world, whereas in open source they have been successfully disentangled.</p>

<h2>Charging for Changes</h2>

<p>In some cases, individuals or organisations are obliged to provide information to public bodies (which have a statutory duty to collect it), so that it is available within government and more generally in society. These public bodies can (and sometimes do) charge the providers of that information &#8220;administration costs&#8221;. Examples of this are Companies House information, the Gazettes, Land Registrations, VAT Registrations and so on.</p>

<p>In these cases, those who supply the information to the register are bound to by law, so it would be possible to charge them whatever it took to support providing the data as open data. Indeed, supplying the data as open data is likely to increase its usage (both within government and more widely), and therefore the political pressure to retain the registry and thereby maintain its longevity.</p>

<h2>Increasing Quality through Participation</h2>

<p>The <a href="http://www.nationalarchives.gov.uk/news/732.htm">model that we are using at legislation.gov.uk</a> is based on increasing the quality of the data that we have to publish &#8212; bringing the statute book up to date &#8212; by enlisting the help of other parties who would benefit from having an up-to-date open statute book. Because otherwise this information is very costly to get hold of, there are any number of potential contributors, including publishers, lawyers, academics, and government itself.</p>

<p>This model doesn&#8217;t entirely cover the costs of opening up data: contributors aren&#8217;t generally paying money to be involved, but donating effort to maintaining the published data. Even so, it&#8217;s a very useful one for organisations that have an obligation to publish information but lack the resources to do it well.</p>

<h2>Supporting Primary Business</h2>

<p>The final business model that I have seen being used is where releasing open data naturally supports the primary business goal of the organisation. The best example of this is around the Barclays Cycle Hire in London (or Boris Bikes as we call them for some reason), where releasing open data about the bikes drives the development of Apps that make it easier for potential customers to use the scheme, thus bringing in revenue to the core business.</p>

<p>Another example is the recent release of data about <a href="http://www.guardian.co.uk/football/blog/2012/aug/16/manchester-city-player-statistics">Manchester City football players</a> which, they hope, will lead people to create better ways of measuring player performance, which they will then be able to take advantage of. (And if it means that they are being talked about in the blogosphere, so much the better.)</p>

<p>I&#8217;d also place under this category situations where the organisation that publishes the data ends up improving its own use of its data by using the third-party tools that are created because the data is open and available. The kind of thing that I&#8217;m thinking about here is how MPs (reportedly) use <a href="http://theyworkforyou.com">TheyWorkForYou.com</a>. There&#8217;s great opportunity, I think, for the public sector to create a market place for tools that enable it to work more efficiently, by opening up its data.</p>

<p>In cases where organisations are releasing data to support their primary business, they may even find it worthwhile backing up such releases with hackdays and competitions and so on in order to drive the initial creation of some products.</p>

<h2>Discussion</h2>

<p>I have listed here business models for open data that I have either seen being used or think could viably be used by organisations, particularly public sector organisations. There may be other business models, or examples, that I don&#8217;t know about, and as I said at the start, I&#8217;d really value your suggestions for more.</p>

<p>One thing that I wanted to touch on was about motivations for data publishers. Although, as I&#8217;ve said, I think it&#8217;s going to be very hard to measure or predict the impact of particular datasets in terms of how they are used in the wider world, it&#8217;s fairly obvious that high quality data, supplied in a timely and consistent fashion, is going to be easier to use and more accurate than low quality data, supplied as and when, using different formats and coding schemes within each release. In other words, it seems likely that data that is published well will lead to greater usage and thus the better economic outcomes on which much of the open data argument is based.</p>

<p>The different business models above provide different incentives for data publishers. In fact, only the last two include any incentive to publish data <em>well</em>: when publishing data to support your primary business, the whole point is to make it easily reusable; when you are supporting data publication by eliciting contributions from others, they are more likely to contribute if the released data is useful for them.</p>

<p>In other cases, the incentive for the data publisher is towards doing the least amount of work so that they can retain as much money as possible, or sometimes (as in the Freemium or Support/Service models) to make the data hard to use unless you are a paying customer. Of course that doesn&#8217;t mean that organisations using these models will deliberately restrict the utility of the data that they publish &#8212; public sector employees tend to be more motivated towards doing the right thing than making a profit &#8212; but they will have a lot less incentive to invest in making the data easy to use than those employing either of the latter two models.</p>

<p>Another aspect of the business models above that&#8217;s worth thinking about is knowing who is using your data. Many data releases have been done in a &#8220;fire and forget&#8221; mode, using either cost avoidance or sponsorship as a model, which has the advantage of being a cheap method of releasing data. But having a rudimentary idea of who is using data and what they are using it for helps you to understand where there are gaps in your provision, be it in formats, coding schemes, timeliness and so on. Many of the other business models on the list above (as well as the &#8216;selling licenses&#8217; business model of closed data) put you in direct contact with at least some of those using the data, which helps you improve provision in the right direction. In particular, getting people to participate to help improve data quality is <a href="http://www.opendataimpacts.net/engagement/">good open data engagement</a> and something that I think most data releases should aim for.</p>

<p>Which brings me back to ODUG. The best indications that we have about what data will prove useful to people are the data that people are currently using and the data that they tell us they want. If you&#8217;re a potential or actual reuser of public sector information, ODUG will shortly be providing routes to tell us about what you&#8217;d like to have access to, but in the meantime do <a href="mailto:jeni@jenitennison.com?subject=ODUG">get in touch with me</a> if you have any requests.</p>
    ]]></content>
  </entry>
  <entry>
    <title>RDF Chimera</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/171" />
    <id>http://www.jenitennison.com/blog/node/171</id>
    <published>2012-06-30T21:42:44+01:00</published>
    <updated>2012-08-20T21:25:52+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="xml" />
    <category term="json" />
    <category term="rdf" />
    <summary type="html"><![CDATA[<p>During <a href="http://www.slideshare.net/JeniT/collisions-chimera-and-consonance-in-web-content">my keynote at XML Prague</a> (the <a href="http://www.youtube.com/watch?v=0K_CAiVyqTQ">video</a> might make more sense than the slides on their own; there are notes on the slides but Slideshare doesn&#8217;t do well with Keynote), I talked about how the disadvantages of using chimeras created from two formats with different underlying models are seldom outweighed by the advantages. RDF/XML gets knocked so frequently it&#8217;s not even much fun to do it any more, but I&#8217;ve <a href="http://www.jenitennison.com/blog/node/149">applied the same arguments</a> to <a href="http://json-ld.org/">JSON-LD</a> in the past. My argument was that RDF, XML, JSON and HTML should each be used individually for their strengths rather than trying to find a middle ground that rarely satisfies anyone.</p>

<p><a href="http://blog.ldodds.com/2012/06/12/principled-use-of-rdfxml/">Leigh Dodds&#8217; post on principled use of RDF/XML</a> makes the point that RDF/XML can be useful when it is used in a regular, principled way. And in fact, I am using RDF/XML extensively in my work on <a href="http://www.nationalarchives.gov.uk/documents/expert_participation_press_release.pdf">Expert Participation</a> for <a href="http://digital.cabinetoffice.gov.uk/2012/03/30/putting-apis-first-legislation-gov-uk/">legislation.gov.uk</a>, though slightly differently from how Leigh describes. What I want to explore in this post is when and how it makes sense to use RDF/XML and how that might translate into usage of JSON versions of RDF. The key point I want to make is that RDF chimera are <em>roads</em>, not <em>destinations</em>, and when you&#8217;re choosing a road you have to think about the destination you&#8217;re aiming for.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>During <a href="http://www.slideshare.net/JeniT/collisions-chimera-and-consonance-in-web-content">my keynote at XML Prague</a> (the <a href="http://www.youtube.com/watch?v=0K_CAiVyqTQ">video</a> might make more sense than the slides on their own; there are notes on the slides but Slideshare doesn&#8217;t do well with Keynote), I talked about how the disadvantages of using chimeras created from two formats with different underlying models are seldom outweighed by the advantages. RDF/XML gets knocked so frequently it&#8217;s not even much fun to do it any more, but I&#8217;ve <a href="http://www.jenitennison.com/blog/node/149">applied the same arguments</a> to <a href="http://json-ld.org/">JSON-LD</a> in the past. My argument was that RDF, XML, JSON and HTML should each be used individually for their strengths rather than trying to find a middle ground that rarely satisfies anyone.</p>

<p><a href="http://blog.ldodds.com/2012/06/12/principled-use-of-rdfxml/">Leigh Dodds&#8217; post on principled use of RDF/XML</a> makes the point that RDF/XML can be useful when it is used in a regular, principled way. And in fact, I am using RDF/XML extensively in my work on <a href="http://www.nationalarchives.gov.uk/documents/expert_participation_press_release.pdf">Expert Participation</a> for <a href="http://digital.cabinetoffice.gov.uk/2012/03/30/putting-apis-first-legislation-gov-uk/">legislation.gov.uk</a>, though slightly differently from how Leigh describes. What I want to explore in this post is when and how it makes sense to use RDF/XML and how that might translate into usage of JSON versions of RDF. The key point I want to make is that RDF chimera are <em>roads</em>, not <em>destinations</em>, and when you&#8217;re choosing a road you have to think about the destination you&#8217;re aiming for.</p>

<!--break-->

<h2>My Destination</h2>

<p>Within legislation.gov.uk, our primary content is stored as XML (because it&#8217;s documents) and our primary toolchain is therefore based on XML: we use XSLT, XSL-FO and <a href="http://wiki.orbeon.com/forms/doc/developer-guide/xml-pipeline-language-xpl">Orbeon pipelines</a> (for historic reasons; today I&#8217;d try to find something similar that supported <a href="http://www.w3.org/TR/xproc/">XProc</a>) to create the various views of legislation we have on the site.</p>

<p>We store RDF content in a triplestore, and query it with SPARQL, because these are technologies designed for easy and efficient storage and querying of RDF, but when it comes to integrating that data into HTML pages, it makes sense to reuse the same core pipelines and processing tooling as we have been using thus far. If we&#8217;d been using Ruby on Rails we&#8217;d have wanted to use Ruby, if Django then Python, if Node.js then JavaScript &#8212; the point is that once we&#8217;ve extracted the data we need, we need to get it into a format that we can easily process with the tooling we are using.</p>

<h2>Using SPARQL Results</h2>

<p>One of the nice things about SPARQL is that you can use it over HTTP and get the results of a standard <a href="http://www.w3.org/TR/sparql11-query/#select"><code>SELECT</code></a> SPARQL query in <a href="http://www.w3.org/TR/rdf-sparql-XMLres/">XML</a>, <a href="http://www.w3.org/TR/sparql11-results-json/">JSON</a> or <a href="http://www.w3.org/TR/sparql11-results-csv-tsv/">CSV/TSV</a>. Making HTTP requests is easy across programming languages, and every environment is going to be able to handle at least one of the three possible output formats. So for us, requesting an XML result to a <code>SELECT</code> query fits right into our processing pipeline. Well, to be honest, writing paths like </p>

<pre><code>sparql:result[sparql:binding[@name = 'process']/sparql:uri = $update]/sparql:binding[@name = 'count']/sparql:literal
</code></pre>

<p>is a little irritating, but if I cared enough it would be easy to transform the SPARQL results XML format into something I could more directly query through XPath.</p>
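
<p>For anyone working outside XSLT, a minimal Python sketch of that kind of flattening might look like the following. It turns each <code>sparql:result</code> into a dictionary keyed by variable name. The function names, the example.org URIs and the sample data are all invented for illustration, and the results document is built in memory rather than fetched from a store.</p>

<pre><code>import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = "http://www.w3.org/2005/sparql-results#"

def q(tag):
    """Qualify a local name with the SPARQL results namespace."""
    return "{%s}%s" % (SPARQL_NS, tag)

def flatten_results(root):
    """Turn each sparql:result into a dict mapping variable name to value."""
    rows = []
    for result in root.iter(q("result")):
        row = {}
        for binding in result.findall(q("binding")):
            # Each binding holds a single child: uri, literal or bnode.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows

def sample_results():
    """Build a tiny results document from hypothetical data."""
    root = ET.Element(q("sparql"))
    results = ET.SubElement(root, q("results"))
    data = [("http://example.org/process/update", "3"),
            ("http://example.org/process/review", "1")]
    for process, count in data:
        result = ET.SubElement(results, q("result"))
        process_binding = ET.SubElement(result, q("binding"), name="process")
        ET.SubElement(process_binding, q("uri")).text = process
        count_binding = ET.SubElement(result, q("binding"), name="count")
        ET.SubElement(count_binding, q("literal")).text = count
    return root

rows = flatten_results(sample_results())
# Index the counts by process URI so that a lookup replaces the long path.
counts = {row["process"]: int(row["count"]) for row in rows}
print(counts["http://example.org/process/update"])
</code></pre>

<p>This sketch throws away the distinction between URIs, literals and blank nodes, which is fine for simple lookups like this one but would need more care if datatypes or languages mattered.</p>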

<p>What I&#8217;ve found, however, is that it is usually a lot more flexible and often more efficient to use a <code>CONSTRUCT</code> or <code>DESCRIBE</code> SPARQL query to extract a subgraph from the store to process. At that point, you need a format that can represent an RDF graph, and you need your processing chain to be able to process an RDF graph as an RDF graph. I&#8217;ll come on to other languages in a second, but sticking with XML: if your processing toolchain is XML-based, you need it in XML.</p>
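
<p>Asking for a subgraph is still just an HTTP request. As a rough sketch, assuming a store that supports the SPARQL protocol over GET, a request for the result of a <code>CONSTRUCT</code> query could be built like this in Python. The endpoint URL and the <code>construct_request</code> function are hypothetical, and the request is only built here, not sent.</p>

<pre><code>from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: substitute the SPARQL endpoint of your own store.
ENDPOINT = "http://example.org/sparql"

def construct_request(query, accept="application/rdf+xml"):
    """Build (but do not send) a SPARQL protocol GET request for an RDF graph."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # The Accept header asks the store for a particular graph serialisation.
    return Request(url, headers={"Accept": accept})

QUERY = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 100"
request = construct_request(QUERY)
print(request.full_url)
# Passing the request to urllib.request.urlopen would send it to the store.
</code></pre>

<p>Which serialisations a store will actually return depends on the store, so the media type is left as a parameter.</p>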

<p>There are actually two choices here: RDF/XML or <a href="http://www.w3.org/2004/03/trix/">TriX</a>. TriX is fine, but it suffers similarly to the SPARQL XML results format when you try to use it with XPath: you have to do things like</p>

<pre><code>$task[trix:uri[2] = $task:assignedTo]/trix:uri[3]
</code></pre>

<p>where <code>$task</code> is a sequence of <code>trix:triple</code> elements and <code>$task:assignedTo</code> is the URI of a property that I&#8217;m interested in, to work out who is assigned to a particular task in the system I&#8217;m building. Of course that gnarliness could be hidden behind a function call, but it means using functions <em>everywhere</em>. Using RDF/XML on the other hand can be somewhat easier and use natural XML paths <strong>but only if it is normalised</strong>.</p>
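
<p>To make the function-call point about TriX concrete, here is a minimal Python sketch of such a helper. It relies on each TriX triple having exactly three children, in subject, predicate, object order. The <code>property_values</code> and <code>make_triple</code> names, the example.org URIs and the sample triples are assumptions made for illustration.</p>

<pre><code>import xml.etree.ElementTree as ET

# Namespace used by TriX documents.
TRIX_NS = "http://www.w3.org/2004/03/trix/trix-1/"
# Hypothetical property URI, standing in for $task:assignedTo.
ASSIGNED_TO = "http://example.org/task#assignedTo"

def t(tag):
    """Qualify a local name with the TriX namespace."""
    return "{%s}%s" % (TRIX_NS, tag)

def property_values(triples, predicate):
    """Return the object of every triple whose predicate URI matches."""
    # Children of a triple are subject, predicate and object, in that order.
    return [triple[2].text for triple in triples if triple[1].text == predicate]

def make_triple(graph, subject, predicate, obj):
    """Append a triple made of three URIs to the graph (sample data only)."""
    triple = ET.SubElement(graph, t("triple"))
    for value in (subject, predicate, obj):
        ET.SubElement(triple, t("uri")).text = value
    return triple

graph = ET.Element(t("graph"))
make_triple(graph, "http://example.org/task/1", ASSIGNED_TO,
            "http://example.org/user/jeni")
make_triple(graph, "http://example.org/task/1",
            "http://example.org/task#status", "http://example.org/status/open")

task = graph.findall(t("triple"))
print(property_values(task, ASSIGNED_TO))
</code></pre>

<p>The helper hides the positional lookup, but every property access still has to go through it, which is the cost described above.</p>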

<h2>Normalised RDF/XML for XML/XSLT Processing</h2>

<p>The biggest problem with RDF/XML for use with XML tooling is that as a format it is too damned flexible. There are multiple ways of representing everything: resource types can be indicated by the name of the resource element or through an explicit <code>rdf:type</code> child element; literal property values can be held as attributes or elements; resource property values can be referenced or nested. All these different options for representing the same information make the format a complete sod to work with.</p>

<p>This is the thrust of Leigh&#8217;s post. The question is then: how do you normalise RDF/XML into a regular format that you can process? In Leigh&#8217;s post, he advises <a href="http://blog.ldodds.com/2012/06/12/principled-use-of-rdfxml/">&#8220;Use all of the [RDF/XML] shortcuts&#8221;</a>, which he describes as basically:</p>

<ul>
<li>using a single root resource element rather than a flat structure with a <code>rdf:RDF</code> wrapper</li>
<li>using the element name of a resource element to indicate the type of the element</li>
<li>using a single resource element to represent each resource</li>
</ul>

<p>(I note these aren&#8217;t <em>all</em> the RDF/XML shortcuts: you can use attributes for literal properties for example. It looks like Leigh instead chooses the regularity of having all properties be represented as elements in the XML.)</p>

<p>This normalisation routine gives you XML that is very close to &#8220;natural&#8221; XML: it will only have a few <code>rdf:about</code> and <code>rdf:resource</code> attributes here and there to give away that it can be processed as RDF/XML. However, if you are starting with RDF, it only gives you a nice, non-repetitive, regular structure if:</p>

<ul>
<li>it&#8217;s easy to identify a root resource within which everything is nested within a given graph</li>
<li>resources only have one type</li>
<li>you don&#8217;t get resources used in more than one place in the graph</li>
</ul>

<p>For the legislation.gov.uk Expert Participation work, I needed to have a normalisation routine that would work across graphs which included resources with multiple types and that were highly repetitive. I also needed a normalisation routine that I could use without it having to guess what the appropriate root node would be, or me having to feed in that decision or artificially create a root node for each graph (which would have just been extra work). The normalisation that I&#8217;ve found that works for me is instead:</p>

<ul>
<li>always use a <code>rdf:RDF</code> wrapper</li>
<li>all resource elements are directly within the <code>rdf:RDF</code> wrapper, including blank nodes</li>
<li>all resource elements are represented by a <code>rdf:Description</code> element</li>
<li>there&#8217;s one resource element per resource</li>
<li>all properties are represented by elements</li>
</ul>

<p>There are disadvantages to this algorithm: if you want to find all the <code>leg:Legislation</code> items in the RDF, for example, then you need to do something like <code>//rdf:Description[rdf:type/@rdf:resource = $leg:Legislation]</code> with an appropriately set <code>$leg:Legislation</code> variable, whereas under Leigh&#8217;s scheme you would do <code>//leg:Legislation</code>. The thing is that in the data that I&#8217;m dealing with, <code>leg:Legislation</code> resources might also be <code>leg:UnitedKingdomPublicGeneralAct</code>s or <code>leg:CommencementOrder</code>s: I can&#8217;t know at normalisation time which of the various types of a given resource is the one that I&#8217;ll want to easily be able to query over, and the cost of changing my mind later on would be quite high.</p>

<p>Similarly, not having nesting means that I can&#8217;t write simple paths like <code>$task/task:assignedTo/sioc:User/sioc:name</code> which I would have been able to do under Leigh&#8217;s suggested normalisation. What I do instead is define a couple of keys that index the descriptions by their <code>rdf:about</code> or <code>rdf:nodeID</code> attributes</p>

<pre><code>&lt;xsl:key name="descriptions" match="rdf:Description" use="@rdf:about" /&gt;
&lt;xsl:key name="nodeID" match="rdf:Description" use="@rdf:nodeID" /&gt;
</code></pre>

<p>and a function that makes it easy to traverse through properties:</p>

<pre><code>&lt;xsl:function name="rdf:get" as="element()*"&gt;
  &lt;xsl:param name="descriptions" as="element()*" /&gt;
  &lt;xsl:param name="propertyChain" as="xs:QName+" /&gt;
  &lt;xsl:variable name="properties" as="element()*" select="$descriptions/*[node-name(.) = $propertyChain[1]]" /&gt;
  &lt;xsl:variable name="values" as="element()*"&gt;
    &lt;xsl:for-each select="$properties"&gt;
      &lt;xsl:choose&gt;
        &lt;xsl:when test="@rdf:resource"&gt;
          &lt;xsl:sequence select="key('descriptions', @rdf:resource, root())" /&gt;
        &lt;/xsl:when&gt;
        &lt;xsl:when test="@rdf:nodeID"&gt;
          &lt;xsl:sequence select="key('nodeID', @rdf:nodeID, root())" /&gt;
        &lt;/xsl:when&gt;
        &lt;xsl:when test="*[@rdf:about or @rdf:nodeID]"&gt;
          &lt;xsl:sequence select="*" /&gt;
        &lt;/xsl:when&gt;
        &lt;xsl:otherwise&gt;
          &lt;xsl:sequence select="." /&gt;
        &lt;/xsl:otherwise&gt;
      &lt;/xsl:choose&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:variable&gt;
  &lt;xsl:variable name="values" as="element()*" select="$values | $values" /&gt;
  &lt;xsl:choose&gt;
    &lt;xsl:when test="exists($values) and count($propertyChain) &gt; 1"&gt;
      &lt;xsl:sequence select="rdf:get($values, subsequence($propertyChain, 2))" /&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:otherwise&gt;
      &lt;xsl:sequence select="$values" /&gt;
    &lt;/xsl:otherwise&gt;
  &lt;/xsl:choose&gt;
&lt;/xsl:function&gt;
</code></pre>

<p>The normalisation I describe coupled with this bit of utility coding means that I can just use a simple path to get to the value of a (literal) property like this: <code>$task/task:assignedAt</code> and the <code>rdf:get()</code> function when I want to navigate to another resource: <code>$task/rdf:get(., xs:QName('task:assignedTo'))/sioc:name</code>. In XSLT 3.0 I think I&#8217;d be able to define some funky functional functions and write things like <code>$task($task:assignedTo)($sioc:name)</code>, but we&#8217;re not quite there yet.</p>

<h2>Another Destination</h2>

<p>What I wanted to bring out is that for all its failings RDF/XML is a useful format, but to process RDF/XML using XML tooling, you need to normalise it into a regular structure that retains the semantics of the underlying RDF graph. I thought it was interesting that Leigh suggested a different normalisation algorithm from the one I use: working through it, it&#8217;s clear there are different advantages and disadvantages of the different algorithms, both in terms of how easy they are to apply to given data and how easy the results are to process.</p>

<p>But the other thing I wonder is whether Leigh&#8217;s destination was different from mine. The goal of producing a natural-looking XML format, and one that you would want to validate using RELAX-NG, implies that Leigh was going along the RDF/XML road in another direction: basically scattering some RDF attributes around an XML format so that it could be interpreted by RDF tools. I&#8217;m sure that this was the reason RDF/XML was made so flexible to begin with: if you can take some naturalistic XML and just add attributes to turn it into RDF then (the argument goes) you get the best of both worlds.</p>

<p>But this is what I&#8217;m really sceptical about. XML formats tend not to fit into the &#8220;striped syntax&#8221; of resource/property/resource/property required by RDF/XML: they miss out resource elements or they group them in wrappers, or they put references in attributes. To get an XML format to be compliant RDF/XML you really have to design it that way from the start, and this goes to the root of my objection to chimera: when you do that, you stop using XML in the way it&#8217;s naturally used, and you lose (some of) its advantages. It&#8217;s like putting yourself in a straitjacket. It seems a lot wiser to me to let XML be XML, and run it through a transformation to create an RDF format (TriX or RDF/XML are both easy targets here) when you want to extract the data within it as RDF.</p>

<h2>JSON and RDF</h2>

<p>And so I turn again to JSON and RDF, because the same arguments apply. If my destination were using RDF data in a non-XML-based toolchain, I think that I would have a very similar experience to the one described above. Getting the results of a <code>SELECT</code> SPARQL query as JSON would be great, but eventually I&#8217;d need to handle an RDF graph as an RDF graph and I&#8217;d want to be able to manipulate it in my programming language of choice using the idioms of that programming language.</p>

<p>There are RDF processing libraries for most programming languages, of course, that make it easy to load RDF graphs into a data structure that you can then query and process &#8220;naturally&#8221;. If you&#8217;re using one of these libraries, then the format of the data that you get back from a SPARQL query doesn&#8217;t really matter, so long as it can be loaded into that data structure through the library.</p>

<p>Does a JSON format for RDF help here? Well, if there isn&#8217;t a library for a given programming language, or you don&#8217;t like the API that any of them give you, then a JSON format for RDF is a format that you will be able to load (because every programming language supports JSON) and manipulate.</p>

<p>If you&#8217;re in this world, you need a regular structure that enables you to do that processing in a regular way. You know the format that you&#8217;re dealing with, so putting URIs into variables isn&#8217;t an issue (there&#8217;s no need for shorthands in the syntax itself). You want a flat and predictable structure which you can query into easily to follow links to information about other resources in the graph. You can of course use JSON-LD in this way. It might look something like:</p>

<pre><code>{
  "@graph": [{
    "@id": "http://www.legislation.gov.uk/id/task/research/effects/uksi/2011/1901",
    "http://www.legislation.gov.uk/def/legislation/assignedTo": [{
      "@id": "http://www.legislation.gov.uk/id/user/tso.co.uk/jeni.tennison"
    }]
  }, {
    "@id": "http://www.legislation.gov.uk/id/user/tso.co.uk/jeni.tennison",
    "http://rdfs.org/sioc/ns#name": [{
      "@value": "Jeni Tennison"
    }]
  }]
}
</code></pre>

<p>This isn&#8217;t great for navigation, at least in JavaScript (you really need to build a hash table to get quickly from the object ids to the details of the objects themselves), but once you have that you can do things like:</p>

<pre><code>objects[task[leg.assignedTo][0]["@id"]][sioc.name][0]["@value"]
</code></pre>

<p>which isn&#8217;t too bad. It gets nastier when you start to have multiple values for properties and want to navigate through them, but people who use these kinds of object structures are used to that.</p>
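
<p>As a rough sketch of that hash table and of navigating through it, something like the following would do. It assumes the link from the task to the user is written as an <code>@id</code> object and that every property value is an array, as in JSON-LD&#8217;s expanded form. The names <code>indexById</code>, <code>follow</code> and the two constants are made up for illustration; they are not part of JSON-LD or of any library.</p>

<pre><code>// Hypothetical constants holding the property URIs used in the example graph.
const LEG_ASSIGNED_TO = "http://www.legislation.gov.uk/def/legislation/assignedTo";
const SIOC_NAME = "http://rdfs.org/sioc/ns#name";
const TASK_ID = "http://www.legislation.gov.uk/id/task/research/effects/uksi/2011/1901";
const USER_ID = "http://www.legislation.gov.uk/id/user/tso.co.uk/jeni.tennison";

// A small graph in the flat shape described above (illustrative data).
const doc = { "@graph": [
  { "@id": TASK_ID, [LEG_ASSIGNED_TO]: [{ "@id": USER_ID }] },
  { "@id": USER_ID, [SIOC_NAME]: [{ "@value": "Jeni Tennison" }] }
] };

// Build the hash table: one entry per node, keyed by its @id.
function indexById(graph) {
  const objects = {};
  graph.forEach(function (node) {
    objects[node["@id"]] = node;
  });
  return objects;
}

// Follow a chain of property URIs from a starting node, collecting every
// value reached. Links are resolved through the index; literals are
// returned as plain values. Multiple values are all followed.
function follow(objects, start, chain) {
  let current = [start];
  chain.forEach(function (property) {
    const next = [];
    current.forEach(function (node) {
      (node[property] || []).forEach(function (value) {
        if ("@id" in value) {
          if (objects[value["@id"]]) { next.push(objects[value["@id"]]); }
        } else {
          next.push(value["@value"]);
        }
      });
    });
    current = next;
  });
  return current;
}

const objects = indexById(doc["@graph"]);
const names = follow(objects, objects[TASK_ID], [LEG_ASSIGNED_TO, SIOC_NAME]);
</code></pre>

<p>Here <code>names</code> ends up as an array holding the single string "Jeni Tennison". Returning an array every time is a deliberate choice in this sketch: it keeps the multiple-value case from needing special handling, at the cost of always having to unpack the result.</p>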

<p>What about going the other direction? One of the JSON-LD raisons d&#8217;être is to provide a quick and easy annotation route for adding RDF semantics on top of existing JSON formats. Just as with RDF/XML used in this way, I&#8217;m really not convinced that the majority of existing JSON formats are going to be easily coercible into JSON that can be processed into sensible RDF through the JSON-LD processing rules, nor that JSON designed from scratch to be JSON-LD compatible will have the advantages of &#8220;natural&#8221; JSON. I can see JSON-LD as an easy-to-generate target format for people wanting to extract RDF from JSON, though given how people generate HTML and XML they might just stick with string manipulation and generate N-triples or N-quads.</p>

<h2>Thoughts?</h2>

<p>What do you think? Are there advantages in RDF chimeras like RDF/XML and JSON-LD <em>as destinations</em> that I&#8217;m just not seeing? Are there other ways of normalising them that make them easy to process as XML or JSON?</p>
    ]]></content>
  </entry>
  <entry>
    <title>Using &quot;Punning&quot; to Answer httpRange-14</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/170" />
    <id>http://www.jenitennison.com/blog/node/170</id>
    <published>2012-05-11T21:11:43+01:00</published>
    <updated>2012-06-26T20:53:22+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="rdf" />
    <category term="rest" />
    <category term="tag" />
    <summary type="html"><![CDATA[<p>As part of the TAG&#8217;s work on httpRange-14, <a href="http://mumble.net/~jar/">Jonathan Rees</a> has assessed how a variety of <a href="http://www.w3.org/wiki/HTTPURIUseCases">use cases</a> could be met by various <a href="http://www.w3.org/wiki/TagIssue57Responses">proposals</a> put before the TAG. The results of the assessment are a <a href="http://www.w3.org/wiki/HTTPURIUseCaseMatrix">matrix</a> which shows that &#8220;punning&#8221; is the most promising method, unique in not failing on either <a href="http://www.w3.org/wiki/HTTPURIUseCases#J.29_Naive_linked_data_on_hosting_service">ease of use (use case J)</a> or <a href="http://www.w3.org/wiki/HTTPURIUseCases#M.29_HTTP_consistency">HTTP consistency (use case M)</a>.</p>

<p>In normal use, &#8220;punning&#8221; is about making jokes based around a word that has two meanings. In this context, &#8220;punning&#8221; is about using the same URI to mean two (or more) different things. It&#8217;s most commonly used as a term of art in <a href="http://techwiki.openstructs.org/index.php/Metamodeling_in_Domain_Ontologies">OWL</a> but normal people don&#8217;t need to worry particularly about that use. Here I&#8217;ll explore what that might actually mean as an approach to the httpRange-14 issue.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>As part of the TAG&#8217;s work on httpRange-14, <a href="http://mumble.net/~jar/">Jonathan Rees</a> has assessed how a variety of <a href="http://www.w3.org/wiki/HTTPURIUseCases">use cases</a> could be met by various <a href="http://www.w3.org/wiki/TagIssue57Responses">proposals</a> put before the TAG. The results of the assessment are a <a href="http://www.w3.org/wiki/HTTPURIUseCaseMatrix">matrix</a> which shows that &#8220;punning&#8221; is the most promising method, unique in not failing on either <a href="http://www.w3.org/wiki/HTTPURIUseCases#J.29_Naive_linked_data_on_hosting_service">ease of use (use case J)</a> or <a href="http://www.w3.org/wiki/HTTPURIUseCases#M.29_HTTP_consistency">HTTP consistency (use case M)</a>.</p>

<p>In normal use, &#8220;punning&#8221; is about making jokes based around a word that has two meanings. In this context, &#8220;punning&#8221; is about using the same URI to mean two (or more) different things. It&#8217;s most commonly used as a term of art in <a href="http://techwiki.openstructs.org/index.php/Metamodeling_in_Domain_Ontologies">OWL</a> but normal people don&#8217;t need to worry particularly about that use. Here I&#8217;ll explore what that might actually mean as an approach to the httpRange-14 issue.</p>

<!--break-->

<p><em>Note: The material here is a summary of what I think is the best way forward following various discussions within and outside the <a href="http://www.w3.org/2001/tag/">TAG</a>, in particular with Jonathan, Henry Thompson and TimBL. Not all these people agree with or endorse the approach described here, but neither do all the ideas in this post originate from me.</em></p>

<h2>Background</h2>

<p>Five recent things have made me more convinced than ever that the TAG must either provide some direction to the community, and soon, or get out of the way.</p>

<ol>
<li><p>The <a href="https://www.w3.org/2012/ldp/charter">proposed Linked Data Platform Working Group charter</a> and the <a href="http://www.w3.org/Submission/ldbp/">Submission that is the main input to the group</a> specifically bring together linked data and REST, and the only mention of <code>303</code> redirections so far is to do with paging.</p></li>
<li><p>A recent thread on the <a href="http://lists.w3.org/Archives/Public/public-vocabs/2012Apr/">W3C public-vocabs mailing list</a> raised the question of <a href="http://lists.w3.org/Archives/Public/public-vocabs/2012Apr/0041.html">whether to embed schema.org markup about the page itself within a given page, or only about the thing that the page is about</a>. I wonder how many pages are being described as <code>schema:WebPage</code> as well as things like <code>schema:Organization</code>, and how people choose which class to use.</p></li>
<li><p>The initial version of Dan&#8217;s <a href="http://www.w3.org/wiki/WebSchemas/ExternalEnumerations">proposal for handling external enumerations within schema.org</a> talked about minting new URIs in the <code>ext.schema.org</code> domain specifically to proxy existing URIs so that they can be guaranteed to provide the right (<code>303</code>) HTTP response. I can see the reasoning (persuading people to use <code>303</code> redirections is difficult) but it would be frustrating if the end result were a centralisation of the URI space.</p></li>
<li><p>Talking with my colleague John Sheridan about updating the UK government&#8217;s guidance on <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/designing-URI-sets-uk-public-sector.pdf">Designing URI Sets for the UK Public Sector</a>, I really don&#8217;t know what to advise. Should the guidance continue to be to use <code>303</code> redirections, when I know from experience that these can be <a href="http://lists.w3.org/Archives/Public/public-lod/2012Mar/0424.html">impractically slow</a>? Should it change to recommend using hash URIs to identify things?</p></li>
<li><p>The very first message on the <a href="http://www.w3.org/community/opentag/">Technical Architecture Community Group</a> was in part about <a href="http://lists.w3.org/Archives/Public/public-opentag/2012Apr/0000.html">how to identify people with URIs</a>.</p></li>
</ol>

<p>Of course the httpRange-14 issue has been running for so long now that I&#8217;d estimate that currently 80% of the discussion about it is meta-discussion about whether it needs to be discussed, how much it should be discussed, how to raise the quality of the discussion, how anyone who discusses it is a time-wasting idiot and should just shut up and so on. It&#8217;s a terrible destructive cycle: the more this goes on, the higher the proportion of time being spent on the meta-discussion, and the longer any discussion takes.</p>

<p>But I believe that we can get to a point where we don&#8217;t have to discuss it any more (except to reminisce about what a waste of time it was), and I believe that the only way to get to that point is for the TAG to push through and provide a practical way forward.</p>

<h2>Terminology</h2>

<p>Let&#8217;s start off with some terminology. The basic scenario is a three-way interaction between three agents:</p>

<ul>
<li>a <strong>supplier</strong> who manages the information that is accessible at a URI; it&#8217;s worth noting that the supplier for a particular URI might change over time and what exactly is provided at a URI is controlled by multiple parties, as there may be many service providers involved in routing the resolution of a URI, others in constructing what&#8217;s shown on the page served from the origin web server, and still others who transform that content en route to the consumer</li>
<li>a <strong>third-party publisher</strong> who publishes some information about a URI; unlike the supplier, they generally have no control and incomplete knowledge about the information available at a particular URI, its stability over time or consistency across representations</li>
<li>a <strong>consumer</strong>, typically an application of some kind, who has discovered the information published by the supplier or third-party publisher and wants to do something with it</li>
</ul>

<p>It&#8217;s also useful to include terms for three things that are passed around or referenced during the interaction, which are defined with the <a href="http://tools.ietf.org/html/rfc3986">URI specification</a> and <a href="http://tools.ietf.org/wg/httpbis/">HTTPbis</a>. (The <a href="http://tools.ietf.org/html/rfc2616">current HTTP specification, RFC 2616</a> is also of interest, of course, but HTTPbis is a better reflection of current practical use of HTTP, and is close to complete, at which point it will replace RFC 2616.)</p>

<ul>
<li>a <strong>URI</strong> which is a string of characters matching the syntax in the URI specification that <strong>identifies</strong> a resource; we only care about <code>http:</code> URIs here, although similar considerations may apply to URIs that use other schemes</li>
<li>a <strong>resource</strong> which is identified by the URI; the debate over whether HTTP constrains the nature of the resource is at the heart of some discussions around httpRange-14; here, as in the URI specification and HTTPbis, a resource could be anything</li>
<li><strong>representations</strong> which are media-typed sequences of bytes (often characters) encoded within the response to an HTTP <code>GET</code> request on a given URI; per HTTPbis, the response to a <code>GET</code> request contains a representation of the current state of the resource identified by the URI</li>
</ul>

<h2>Content and Meaning</h2>

<p>Now I will introduce a few new terms that aren&#8217;t used in the URI specification or HTTPbis but which are useful for discussion.</p>

<ul>
<li>The <strong>content</strong> located by a <code>http:</code> URI is whatever core information a consumer interacts with through the HTTP interface provided by the server for that URI. This is the information that is common across all the representations that are returned through a <code>GET</code> on a given URI (through content-negotiated variants). We can say that the <code>http:</code> URI <strong>locates</strong> the content of the resource that it identifies, because you can get hold of the content of a resource by performing an HTTP <code>GET</code> request.</li>
<li>The <strong>sense</strong> referred to by a <code>http:</code> URI is a social construct that arises from the properties associated with the URI by publishers and the way that these invoke action in consumers. We can say that a <code>http:</code> URI <strong>refers to</strong> a sense. While you can <code>GET</code> content from a URI, determining its sense can only be achieved by examining the way in which the URI is used within data published on the web. The sense referred to by a URI might vary in different contexts, but equally a single sense may emerge in the use of the URI. One sense referred to by a URI may be its content, and for some types of information that may be the only sense referred to by the URI by anyone.</li>
</ul>

<p>Here is a diagram that shows how these different terms hang together.</p>

<p style="text-align: center;">
<img src="/blog/files/punning.png" />
</p>

<p>For example, take the URI <code>http://www.amazon.com/gp/product/B004TRXX7C</code>. The <em>content</em> located by this URI is the core information in the web pages we <code>GET</code> from the URI. The <em>sense</em> referred to by the URI could be the same as the <em>content</em>, or it could be the novel Moby Dick, or the particular Kindle edition of the book. We can&#8217;t tell from any interaction at the level of the HTTP protocol what the <em>sense</em> of the resource is: that information has to come from the application level.</p>

<p>Like the meaning of a word, the <em>sense</em> that a URI refers to is a social understanding which emerges from use of the URI across the web, and a given URI may be used to refer to different <em>senses</em> in different sources of information or over time. Consumers interpret the information that uses a URI and is made available to them on the web in order to draw conclusions and perform a task. Different consumers will have different levels of trust in the particular interpretation of the URI that a given publisher provides; in particular, the information published by the supplier of the URI might be given a higher weight than that from third-party publishers. Tools like <a href="http://sig.ma/">sig.ma</a> illustrate how information can be combined from multiple sources with different weights, by associating metadata about the location of the data with the data itself; unpicking commonalities between groups of sources may help to work out the different <em>senses</em> referred to by these different sources.</p>

<p>The <em>content</em> located by a URI is more concrete, and is important because certain classes of application may infer something about what they can do with the <em>content</em> found at a given URI based on information published about the URI. The canonical example of this is a consumer that searches the web for public-domain pages, based on information published about the licensing of those pages, and displays a portion of one each day within a feed. This application can&#8217;t work properly if it doesn&#8217;t know what actual content is public domain. In the Amazon example above, if there is a statement saying <code>http://www.amazon.com/gp/product/B004TRXX7C</code> is public domain (referring to the novel &#8220;Moby Dick&#8221;, which is one possible <em>sense</em> referred to by the URI, one on which the copyright has expired), a consumer that assumes the URI is being used to locate the <em>content</em> at that URI will assume that the representation retrieved through a <code>GET</code> on that URI is public domain. The consumer might pick out the first major paragraph from the HTML page for display, but that first paragraph is actually an editorial review that is marked with a separate copyright which therefore shouldn&#8217;t be displayed in a feed of public-domain content.</p>

<p>Of course interpreting assertions made by publishers about particular <em>content</em> can be just as complex as interpreting statements about a particular <em>sense</em> of a URI, especially when those assertions come from a third party. The <em>content</em> located by a URI may change over time, and potentially in dramatic ways if the domain name of the URI changes hands, so any statements that a consumer discovers about <em>content</em> need to be assessed by that consumer in that context. That said, while the validity or truthfulness of a given piece of information about <em>content</em> may be variable, the bytes that are located through a URI by a consumer are tangible and discoverable in a way that the <em>sense</em> referred to by a URI can never be.</p>

<p>The core disagreements around httpRange-14 arise from whether you view the <em>content</em> located by a hash-less <code>http:</code> URI which provides a successful (<code>2XX</code>) response to be the only valid <em>sense</em> referred to by that URI, or whether you think they could be different things, and if you think they can be different things then which of those you think the URI identifies when it is used.</p>

<h2>Current State</h2>

<p>The URI and HTTP specifications talk only about URIs identifying resources and being able to make requests using URIs to get a representation of the resource; they do not talk about <em>senses</em> or <em>content</em>.</p>

<p>The httpRange-14 decision is based on a design where if you can successfully <code>GET</code> a representation using a URI (ie you receive a <code>200 OK</code> response) then the <em>sense</em> referred to by the URI is the <em>content</em> located by that URI. This is pictured in the diagram below.</p>

<p style="text-align: center;">
<img src="/blog/files/equal-content-sense.png" />
</p>

<p>Sometimes, a server simply doesn&#8217;t store the <em>content</em> for something that it wishes to provide information about (for example, the Amazon website doesn&#8217;t store the content of Moby Dick), and sometimes the <em>sense</em> that a publisher wishes to confer on a URI is such that no <em>content</em> can be transmitted over the wire, such as a Person. In these cases, under this design, the server cannot give a <code>200 OK</code> response because it does not have the <em>content</em> for the URI. There are then two patterns a publisher can use to assign a <code>http:</code> URI to something in these cases.</p>

<p>One pattern is to use a <code>303 See Other</code> redirection to a URI whose <em>content</em> (which is the only <em>sense</em> of the URI in this design, remember) <strong>describes</strong> the original URI. This is pictured below.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-303.png" />
</p>
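
<p>From the consumer&#8217;s side, the design so far amounts to a small decision on the response status. The following is only a sketch of that decision: the function name and the shape of the object it returns are invented for illustration, and it takes the status and <code>Location</code> header as arguments rather than making a real request.</p>

<pre><code>// Interpret the response to a GET on a URI under the design described above.
// uri: the URI that was requested
// status: the HTTP status code of the response
// location: the value of the Location header, if any (may be relative)
function interpretResponse(uri, status, location) {
  if (Math.floor(status / 100) === 2) {
    // Successful response: the sense of the URI is the content it locates.
    return { uri: uri, refersTo: "content", describedBy: null };
  }
  if (status === 303) {
    // See Other: the URI refers to something that is described by the
    // content located at the (resolved) Location URI.
    return {
      uri: uri,
      refersTo: "something described elsewhere",
      describedBy: new URL(location, uri).href
    };
  }
  // Any other response says nothing about what the URI refers to.
  return { uri: uri, refersTo: "unknown", describedBy: null };
}

// Made-up example.org URIs, purely for illustration.
const page = interpretResponse("http://example.org/doc/moby-dick", 200, null);
const book = interpretResponse("http://example.org/id/moby-dick", 303, "/doc/moby-dick");
</code></pre>

<p>In this sketch <code>page.refersTo</code> is "content", while <code>book.describedBy</code> is <code>http://example.org/doc/moby-dick</code>, the URI whose content describes the book.</p>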

<p>The second pattern is to use a hash URI. This gives a very similar pattern, as you can see from the diagram below; the only difference is that instead of following a <code>303 See Other</code> redirection to get from one URI to the other, you can use URI parsing: you chop off the fragment part of the URI and perform a <code>GET</code> on the resulting URI to get a description. Often this leads to several hash URIs being described by the same <em>content</em>, as illustrated here:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-hash-uris.png" />
</p>
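
<p>The URI-parsing step is easy to sketch with the standard <code>URL</code> object available in browsers and Node.js. The function name <code>describingUri</code> and the example.org URIs are made up for illustration.</p>

<pre><code>// Chop the fragment off a hash URI to find the URI whose content
// describes it.
function describingUri(hashUri) {
  const parsed = new URL(hashUri);
  parsed.hash = "";  // removes the fragment, including the # itself
  return parsed.href;
}

// Several hash URIs can share one describing document.
const alice = describingUri("http://example.org/people#alice");
const bob = describingUri("http://example.org/people#bob");
</code></pre>

<p>Both <code>alice</code> and <code>bob</code> come out as <code>http://example.org/people</code>. Note that this only tells a consumer where to look: it says nothing about what the fragment itself identifies.</p>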

<p>However, hash URIs are also used to identify fragments within pages, which are bits of <em>content</em>. For hash URIs that identify fragments of a page, the picture looks more like this:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-fragment.png" />
</p>

<p>A consumer can&#8217;t tell just from looking at a hash URI whether it identifies a fragment of content or is being used to refer to something described by the content located by its base URI: the consumer has to make a request and understand the fragment identifier as it applies to the media type of the representation that it gets back. Also, if there&#8217;s any content negotiation going on, even if the fragment identifier doesn&#8217;t make sense for the media type, or doesn&#8217;t locate a fragment within the representation that the consumer retrieves, it might still locate a fragment of content within a different representation of the resource from the one the consumer has retrieved.</p>

<p>In any case, common usage of hash-less <code>http:</code> URIs differs from this model of <em>content</em> and <em>sense</em> being one and the same for all URIs that give a successful response. Often URIs are used in data formats such as JSON, XML, RDF or HTML where the <em>content</em> and the <em>sense</em> of the URI are different things. For example, on Flickr the <em>content</em> located through the URI <code>http://www.flickr.com/photos/45701084@N08/7051652969/</code> is a landing page that provides a bunch of information about an image, but the data within the page includes a statement about its license:</p>

<pre><code>&lt;http://www.flickr.com/photos/45701084@N08/7051652969/&gt;
  cc:license &lt;http://creativecommons.org/licenses/by/2.0/deed.en&gt; ;
  .
</code></pre>

<p>in which that same URI refers to the photograph itself: the <em>sense</em> referred to by the URI. The only way that you can tell this is by being a human: reading the page and the context in which the text describing the license is used.</p>

<p>This mismatch between the design specified by RFC 2616 and the httpRange-14 decision, and practice on the web today, results in arguments back and forth with people saying, in essence, that the resource a URI identifies is the <em>sense</em> conferred by the URI&#8217;s supplier, or that a URI should always be taken as identifying the <em>content</em> of the resource, and then discussions about how to signal to an application that in particular cases the supplier really does mean the URI to identify some <em>content</em> or really does mean the URI to identify a particular <em>sense</em>, and if so which <em>sense</em> is being referred to.</p>

<h2>Punning</h2>

<p>&#8220;Punning&#8221; approaches attempt to cut through these disagreements by saying that the context in which the URI is used determines whether it is locating <em>content</em> or referring to a <em>sense</em>.</p>

<p>If we look at some <a href="http://ogp.me/">Open Graph Protocol (OGP)</a> statements on <code>http://www.imdb.com/title/tt1334573/</code>, we see:</p>

<pre><code>&lt;meta property="og:url" content="http://www.imdb.com/title/tt1334573/" /&gt;
&lt;meta property="og:title" content="Moby Dick (TV Series 2010)"/&gt;
&lt;meta property="og:type" content="video.tv_show"/&gt;
&lt;meta property="og:image" content="http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif"/&gt;
&lt;meta property="og:site_name" content="IMDb"/&gt;
&lt;meta property="fb:app_id" content="115109575169727"/&gt;
</code></pre>

<p>Some of these properties &#8212; the url, title, type and image &#8212; are about the Moby Dick TV Series &#8212; the <em>sense</em> referred to by the URI <code>http://www.imdb.com/title/tt1334573/</code>. Others &#8212; the site name and Facebook application id &#8212; are about the <em>content</em> located by the URI. The properties that are provided by this data are all related to the same URI, but they aren&#8217;t all properties of the same thing. In natural language we might say:</p>

<ul>
<li>a URL of the thing described by the page is <code>http://www.imdb.com/title/tt1334573/</code></li>
<li>a title of the thing described by the page is &#8220;Moby Dick (TV Series 2010)&#8221;</li>
<li>a type of the thing described by the page is &#8220;video.tv_show&#8221;</li>
<li>an image of the thing described by the page is the content at <code>http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif</code></li>
<li>the site of the page is IMDb</li>
<li>the Facebook application to use with the page has the identifier 115109575169727</li>
</ul>

<p>The property itself determines whether it applies to the <em>content</em> located by the URI (the page) or a <em>sense</em> referred to by the URI (in this case, the thing the page describes). Here&#8217;s a diagram that shows the distinction:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-moby-dick.png" />
</p>

<h3>Defining URI Usage</h3>

<p>The way in which a URI is interpreted (as referring to a <em>sense</em> or locating <em>content</em>) depends on where it is used. In XML, for example, an <code>xmlns:*</code> attribute contains a URI; when this is a hash-less <code>http:</code> URI, it refers to an XML namespace (a <em>sense</em> of that URI): it doesn&#8217;t matter what <em>content</em> you <code>GET</code> from dereferencing the URI, or even if it can be dereferenced at all. On the other hand, the <code>href</code> attribute on an <code>xi:include</code> element defined by <a href="http://www.w3.org/TR/xinclude/">XInclude</a> is used to locate some <em>content</em> to be included within the referring XML.</p>

<p>It is really up to the format in which data is encoded to determine how the URI should be interpreted: as locating some <em>content</em> or referring to a <em>sense</em>. As with interpreting any information with which it&#8217;s presented, an application that needs to work out which is meant might use:</p>

<ul>
<li>built-in knowledge (eg an application might know that the <code>og:title</code> property is always about the <em>sense</em> referred to by the subject URI, based on documentation about the property [this is essentially the same as if the information were embedded within a schema, but without the implication that every application must download and interpret a schema every time it happens across a property])</li>
<li>information encoded within a schema (eg a schema might classify the <code>og:title</code> property as a <code>PropertyWhoseSubjectIsTheSubstanceOfAResource</code>)</li>
<li>a default for the format of the data (eg given OGP uses RDFa, RDF could specify that by default URIs refer to a <em>sense</em>, and therefore barring other information to the contrary, properties cannot be assumed to be about the page itself)</li>
<li>a default for the web (eg we might say that barring overriding information, all hash-less <code>http:</code> URIs are assumed to locate <em>content</em>, as this is consistent with the current definition of HTTP in RFC 2616)</li>
</ul>
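<p>As a minimal sketch of how an application might combine these sources, the Python below consults them from most to least specific. The ordering, the <code>interpret</code> function and its parameter names are all assumptions made for illustration; nothing here is defined by OGP, RDF or HTTP.</p>

<pre><code>def interpret(prop, builtin, schema, format_default=None, web_default="content"):
    """Return 'sense' or 'content' for the subject URI of a property.

    Assumed lookup order (an assumption of this sketch): built-in
    knowledge, then schema information, then a default for the data
    format, and finally a default for the web as a whole.
    """
    if prop in builtin:
        return builtin[prop]
    if prop in schema:
        return schema[prop]
    if format_default is not None:
        return format_default
    return web_default


# Hypothetical knowledge held by one application.
builtin = {"og:title": "sense"}
schema = {"og:site_name": "content"}

print(interpret("og:title", builtin, schema, format_default="sense"))
print(interpret("og:site_name", builtin, schema, format_default="sense"))
print(interpret("ex:unknown", builtin, schema, format_default="sense"))
print(interpret("ex:unknown", builtin, schema))
</code></pre>

<p>The last two calls show the defaults at work: an unknown property falls back to the format default when one is given, and otherwise to the web-wide default.</p>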

<p>Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies. There are four possibilities for a given URI:</p>

<ol>
<li>the URI is being used to locate some <em>content</em></li>
<li>the URI is being used to refer to a <em>sense</em></li>
<li>the URI is being used to identify either <em>content</em> or <em>sense</em> but it&#8217;s not specified which</li>
<li>the URI is being used to both locate <em>content</em> and refer to a <em>sense</em> (ie a property applies equally to both)</li>
</ol>
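<p>These four possibilities could be modelled in an application along the following lines. This is only a sketch: the <code>UriUse</code> enumeration and the two helper functions are hypothetical names, not part of any specification.</p>

<pre><code>from enum import Enum


class UriUse(Enum):
    """The four ways a vocabulary might say a URI is being used."""
    CONTENT = "locates content"
    SENSE = "refers to a sense"
    UNSPECIFIED = "content or sense, but not specified which"
    BOTH = "locates content and refers to a sense"


def applies_to_content(use):
    """True only when the statement is known to be about the content."""
    return use in (UriUse.CONTENT, UriUse.BOTH)


def applies_to_sense(use):
    """True only when the statement is known to be about the sense."""
    return use in (UriUse.SENSE, UriUse.BOTH)


print(applies_to_content(UriUse.BOTH))
print(applies_to_content(UriUse.UNSPECIFIED))
</code></pre>

<p>Treating the unspecified case as neither is a deliberately cautious choice in this sketch; a consumer could equally apply a default instead.</p>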

<h3>Equality</h3>

<p>Now let&#8217;s consider what happens when there is more information available about something, but it uses a different URI. The page <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code> is about the same TV series as <code>http://www.imdb.com/title/tt1334573/</code>. Imagine that the Wikipedia page similarly published the information it holds using OGP. It might contain:</p>

<pre><code>&lt;meta property="og:url" content="https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)" /&gt;
&lt;meta property="og:title" content="Moby Dick (miniseries)"/&gt;
&lt;meta property="og:type" content="video.tv_show"/&gt;
&lt;meta property="og:site_name" content="Wikipedia"/&gt;
</code></pre>

<p>The two pages describe the same thing: the <em>sense</em> referred to by the two URIs is the same. However, the <em>content</em> of the two pages is different. If you simply smushed the properties together, ignoring the fact that some properties apply to the <em>content</em> and others to the <em>sense</em> of the resource, you&#8217;d get some data that wasn&#8217;t quite right:</p>

<pre><code>{
  url: [
    'http://www.imdb.com/title/tt1334573/',
    'https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)'
  ],
  title: [
    'Moby Dick (TV Series 2010)',
    'Moby Dick (miniseries)'
  ],
  type: [
    'video.tv_show'
  ],
  image: [
    'http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif'
  ],
  site_name: [
    'IMDb',
    'Wikipedia'
  ],
  app_id: [
    '115109575169727'
  ]
}
</code></pre>

<p>Having two URLs, two titles and so on is fine, but having two site names doesn&#8217;t make sense: the <code>og:site_name</code> property is related to the <em>content</em> located by the URI, and the <em>content</em> is different for the two URIs. This is illustrated below.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-equality.png" />
</p>
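<p>One way to avoid this, sketched here in Python, is to pool only the properties that are about the <em>sense</em> and keep the properties about the <em>content</em> attached to the URI they came from. The <code>SENSE_PROPERTIES</code> set and the <code>merge_descriptions</code> function are hypothetical, and the classification of properties is the one assumed in this post rather than anything OGP defines.</p>

<pre><code># Hypothetical set of properties whose subject is the sense of the URI.
SENSE_PROPERTIES = {"url", "title", "type", "image"}


def merge_descriptions(pages):
    """Merge pages that share a sense without mixing up their content.

    pages maps each URI to the properties found on that page. Sense
    properties are pooled; content properties stay keyed by their URI.
    """
    sense = {}
    content = {}
    for uri, properties in pages.items():
        for name, value in properties.items():
            if name in SENSE_PROPERTIES:
                values = sense.setdefault(name, [])
                if value not in values:
                    values.append(value)
            else:
                content.setdefault(uri, {})[name] = value
    return {"sense": sense, "content": content}


pages = {
    "http://www.imdb.com/title/tt1334573/": {
        "url": "http://www.imdb.com/title/tt1334573/",
        "title": "Moby Dick (TV Series 2010)",
        "type": "video.tv_show",
        "site_name": "IMDb",
        "app_id": "115109575169727",
    },
    "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)": {
        "url": "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)",
        "title": "Moby Dick (miniseries)",
        "type": "video.tv_show",
        "site_name": "Wikipedia",
    },
}

merged = merge_descriptions(pages)
print(merged["sense"]["title"])
print(merged["content"]["http://www.imdb.com/title/tt1334573/"]["site_name"])
</code></pre>

<p>The TV series ends up with two titles and one type, while each page keeps its own single site name.</p>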

<p>Conversely, imagine a situation in which there is a single document on a web server that is served up from the two URIs:</p>

<pre><code>http://example.org/gender/male
http://example.org/gender/female
</code></pre>

<p>In this case, the <em>content</em> located by the two URIs is exactly the same, but the <em>sense</em> referred to by the two URIs is different: one refers to the gender &#8216;male&#8217; and the other to the gender &#8216;female&#8217;, as illustrated here.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-equality2.png" />
</p>

<p>So there are three types of equality that we have to be concerned with:</p>

<ul>
<li>equality between the desired <em>sense</em> and <em>content</em> of a single URI, as described earlier</li>
<li>equality between the <em>senses</em> referred to by different URIs</li>
<li>equality between the <em>content</em> located by different URIs</li>
</ul>

<p>In general, equality between the resources identified by two URIs is a controversial thing to assert, because different contexts may refer to different <em>senses</em> of a particular URI, some of which may be equal to a <em>sense</em> referred to by another URI and some not. As with any statement made about URIs, the source of statements about equality must be considered.</p>

<p>Formats that wish to make assertions about equality between resources should provide ways of saying that the <em>sense</em> referred to by two URIs is the same without implying that the <em>content</em> located at those two URIs is the same, and vice versa, as well as a way of asserting that the <em>sense</em> and <em>content</em> of a single URI are equal. What these properties are (that is, how exactly these kinds of equality are asserted in a given format) is up to the format, but it&#8217;s important that the properties are kept distinct to enable people to articulate the full range of equality relationships between resources.</p>
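<p>To illustrate what keeping these relationships distinct might look like inside a consuming application, here is a small Python sketch. The <code>EqualityStore</code> class and its method names are invented for this example, and for brevity it records only direct assertions without computing transitive closure.</p>

<pre><code>class EqualityStore:
    """Record the three kinds of equality as separate relations."""

    def __init__(self):
        self.same_sense = set()        # unordered pairs of URIs
        self.same_content = set()      # unordered pairs of URIs
        self.sense_is_content = set()  # single URIs

    def assert_same_sense(self, uri_a, uri_b):
        # A frozenset is unordered, so the assertion is symmetric.
        self.same_sense.add(frozenset((uri_a, uri_b)))

    def assert_same_content(self, uri_a, uri_b):
        self.same_content.add(frozenset((uri_a, uri_b)))

    def assert_sense_is_content(self, uri):
        self.sense_is_content.add(uri)

    def has_same_sense(self, uri_a, uri_b):
        return uri_a == uri_b or frozenset((uri_a, uri_b)) in self.same_sense

    def has_same_content(self, uri_a, uri_b):
        return uri_a == uri_b or frozenset((uri_a, uri_b)) in self.same_content


imdb = "http://www.imdb.com/title/tt1334573/"
wiki = "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)"
male = "http://example.org/gender/male"
female = "http://example.org/gender/female"

store = EqualityStore()
store.assert_same_sense(imdb, wiki)       # same TV series, different pages
store.assert_same_content(male, female)   # same document, different genders

print(store.has_same_sense(imdb, wiki), store.has_same_content(imdb, wiki))
print(store.has_same_sense(male, female), store.has_same_content(male, female))
</code></pre>

<p>Because the relations are stored separately, asserting one kind of equality never implies another.</p>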

<h2>Implications for Linked Data</h2>

<p>I have tried to keep the description above neutral in terms of technology choice, because I believe that the issue of how to interpret URIs within data is common across all languages that use URIs. However, as I&#8217;ve discussed previously, linked data is particularly affected by these issues both because URIs form a central part of the way it works as a data format and because culturally the community tries very hard to adhere to &#8220;good web architectural practice&#8221; in the hope that this will confer long-term benefits.</p>

<p>For that reason, I&#8217;ll look at what I think the impacts are on linked data practice of using the &#8220;punning&#8221; approach that I&#8217;ve described above.</p>

<h3>RDF</h3>

<p>The definition of RDF is currently in flux, as <a href="http://www.w3.org/TR/rdf11-concepts/">RDF 1.1</a> is developed, so now is a good time to consider its use of URIs.</p>

<p>RDF itself is not particularly concerned with what URIs identify: it is simply a model that can be used to associate properties between &#8220;resources&#8221;, where in the RDF context this term means anything that can be the subject or object of an RDF statement, including literals. (RDF&#8217;s use of the term &#8220;resource&#8221; is not the same as that used in the URI specification or HTTPbis.) The only real limitation in <a href="http://www.w3.org/TR/rdf-concepts/">RDF 1.0 Concepts</a> is that a <a href="http://www.w3.org/TR/rdf-concepts/#section-fragID">hash URI identifies something described by the RDF/XML representation retrieved when the URI is resolved</a>. In the current Editor&#8217;s Draft of RDF 1.1, <a href="http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-fragID">the same section</a> is less specific about what such a fragment might denote. So if the language and concepts above were adopted, RDF 1.1 should be more careful in its use of terminology, and attempt to be consistent with the URI and HTTP specifications, but I don&#8217;t think anything fundamental needs to change in the core semantics of RDF.</p>

<h3>Vocabulary Designers</h3>

<p>Under the &#8220;punning&#8221; approach, the property used within an RDF statement determines how the URI given as its subject or object should be interpreted. A consumer that discovers a URI by looking at the properties associated with it needs to be able to tell from the properties themselves whether or not it can associate those properties with the particular <em>content</em> that it locates by requesting the URI.</p>

<p>Some properties have a defined domain or range that precludes the property from being used to annotate <em>content</em>. For example, the <code>foaf:nick</code> property has a domain of <code>foaf:Person</code>, and a <code>foaf:Person</code> cannot be a web page. Given this domain, an application can tell that the URI <code>http://www.jenitennison.com/</code> used in a statement such as:</p>

<pre><code>&lt;http://www.jenitennison.com/&gt;
  foaf:nick "JeniT" ;
  .
</code></pre>

<p>cannot be being used to locate the <em>content</em> of <code>http://www.jenitennison.com/</code>, even if <code>http://www.jenitennison.com/</code> responds with a <code>200 OK</code> response.</p>

<p>Note that this inference doesn&#8217;t work the other way around. The property <code>cc:license</code> has a domain of a <code>cc:Work</code> but without additional information about the property an application could not infer that in a statement such as</p>

<pre><code>&lt;http://www.amazon.com/gp/product/B004TRXX7C&gt;
  cc:license &lt;http://creativecommons.org/publicdomain/mark/1.0/&gt; ;
  .
</code></pre>

<p>the URI <code>http://www.amazon.com/gp/product/B004TRXX7C</code> was being used to locate the <em>content</em> of <code>http://www.amazon.com/gp/product/B004TRXX7C</code>: it could equally be being used to refer to some <em>sense</em> of the resource (for example the novel Moby Dick).</p>

<p>To support &#8220;punning&#8221;, therefore, RDF vocabulary designers would need to have additional properties that could be applied to RDF Properties to indicate how their subject (and object where applicable) should be interpreted. For example, the Creative Commons vocabulary might include (warning: made up property names and instances):</p>

<pre><code>cc:license
  rdfs:subjectUri rdf:sense ;
  rdfs:objectUri rdf:content ;
  .
</code></pre>

<p>with the implication that URIs used as the subject of <code>cc:license</code> should be understood as referring to the <em>sense</em> of the URI, while those used as the object of <code>cc:license</code> should be understood as referring to the <em>content</em> retrieved from the URI.</p>

<p>Even if properties like <code>rdfs:subjectUri</code> or <code>rdfs:objectUri</code> are defined, there are going to be RDF properties for which the interpretation of subject and/or object URIs isn&#8217;t specified, and thus consumers of RDF content need to have a default interpretation. What that should be is, I think, a matter for the RDF community to decide.</p>

<h3>Inference</h3>

<p>The major difficulties with the &#8220;punning&#8221; approach and the current use of RDF come when reasoning is used across RDF statements in which the same URI is used in different ways, particularly with properties where the interpretation of the subject and/or object isn&#8217;t specified.</p>

<p>For example, if a consumer finds the following triples at <code>http://www.imdb.com/title/tt1334573/</code>:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  og:url "http://www.imdb.com/title/tt1334573/" ;
  og:title "Moby Dick (TV Series 2010)" ;
  og:type "video.tv_show" ;
  og:image "http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif" ;
  og:site_name "IMDb" ;
  fb:app_id "115109575169727" ;
  .
</code></pre>

<p>and the following triples at <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>:</p>

<pre><code>&lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt;
  og:url "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)" ;
  og:title "Moby Dick (miniseries)" ;
  og:type "video.tv_show" ;
  og:site_name "Wikipedia" ;
  .
</code></pre>

<p>and then the assertion:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  owl:sameAs &lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt; ;
  .
</code></pre>

<p>then the result of inference will be that all the statements made about <code>http://www.imdb.com/title/tt1334573/</code> apply equally to <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>, which is not the case.</p>

<p>To enable publishers to make assertions about equality of <em>sense</em> and equality of <em>content</em> separately, we will need new relationships. For example:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  owl:sameSenseAs &lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt; ;
  .
</code></pre>

<p>would only support the inference that those properties whose subject is the <em>sense</em> of <code>http://www.imdb.com/title/tt1334573/</code> apply equally to the <em>sense</em> of <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>. An <code>owl:sameContentAs</code> property could similarly assert equality between the <em>content</em> of two URIs.</p>

<p>The impact is not limited to reasoning with <code>owl:sameAs</code>: all inference in <a href="http://www.w3.org/TR/rdf-schema/">RDFS</a> and <a href="http://www.w3.org/TR/owl2-overview/">OWL</a> is based on the assumption that a single URI identifies a single entity. This works in situations where the RDF over which inferences are being made is all trusted (for example if it is all made available by the same publisher), and a lot of current use of OWL is precisely in these kinds of closed environments. The same inferences can be made even with information gleaned from the web at large, if that information is selected carefully.</p>

<p>Another approach, where publishers have mixed properties about the <em>sense</em> referred to by a URI and those about the <em>content</em> located by a URI, is to pre-process those RDF statements to create separate (blank node) RDF resources. For example, if <code>og:url</code>, <code>og:title</code>, <code>og:type</code> and <code>og:image</code> are defined to have a subject that refers to the <em>sense</em> of the URI, and <code>og:site_name</code> and <code>fb:app_id</code> to have a subject that locates the <em>content</em> of the URI, the statements about <code>http://www.imdb.com/title/tt1334573/</code> above could be translated into:</p>

<pre><code>_:imdbSubstance
  rdf:senseUri "http://www.imdb.com/title/tt1334573/" ;
  og:url "http://www.imdb.com/title/tt1334573/" ;
  og:title "Moby Dick (TV Series 2010)" ;
  og:type "video.tv_show" ;
  og:image "http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif" ;
  .

_:imdbContent
  rdf:contentUri "http://www.imdb.com/title/tt1334573/" ;
  og:site_name "IMDb" ;
  fb:app_id "115109575169727" ;
  .
</code></pre>

<p>Here, the (putative) <code>rdf:senseUri</code> property is an inverse functional property that provides the URI for which the individual is a <em>sense</em>, and the <code>rdf:contentUri</code> property is an inverse functional property that provides the URI for which the individual is the <em>content</em>.</p>

<p>This separation would then allow existing inference to take place on the separate entities.</p>
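<p>The pre-processing step could be sketched roughly as follows. This Python is illustrative only: <code>split_statements</code> is a hypothetical helper, the two property sets simply restate the classification assumed above, and <code>rdf:senseUri</code> and <code>rdf:contentUri</code> remain made-up properties.</p>

<pre><code># Hypothetical classification of properties, taken from the example above.
SENSE_PROPERTIES = {"og:url", "og:title", "og:type", "og:image"}
CONTENT_PROPERTIES = {"og:site_name", "fb:app_id"}


def split_statements(uri, statements):
    """Split (property, value) pairs about uri into separate nodes.

    Returns a dict with a sense node and a content node, each starting
    with the putative rdf:senseUri or rdf:contentUri statement, plus a
    list of statements whose interpretation is not specified.
    """
    result = {
        "sense": [("rdf:senseUri", uri)],
        "content": [("rdf:contentUri", uri)],
        "unspecified": [],
    }
    for prop, value in statements:
        if prop in SENSE_PROPERTIES:
            result["sense"].append((prop, value))
        elif prop in CONTENT_PROPERTIES:
            result["content"].append((prop, value))
        else:
            result["unspecified"].append((prop, value))
    return result


imdb = "http://www.imdb.com/title/tt1334573/"
nodes = split_statements(imdb, [
    ("og:title", "Moby Dick (TV Series 2010)"),
    ("og:type", "video.tv_show"),
    ("og:site_name", "IMDb"),
    ("fb:app_id", "115109575169727"),
])
for prop, value in nodes["content"]:
    print(prop, value)
</code></pre>

<p>Statements that fall into the unspecified list are exactly those for which a consumer would need a default interpretation.</p>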

<h3>Publishers</h3>

<p>There are many advantages offered by the &#8220;punning&#8221; approach for linked data publishers:</p>

<ul>
<li>it supplies an easy on-ramp for suppliers who want to annotate their pages with HTML data such as RDFa and microdata: suppliers can use URIs that they already support to refer to things other than documents, if they choose to, which means all they need to do is add metadata to their pages (as they are currently using OGP and schema.org)</li>
<li>suppliers do not have to have access to server configuration in order to promote the use of particular URIs to mean things that do not have <em>content</em> (such as people or organisations)</li>
<li>publishers can copy and paste URIs from the location bar of their browsers (a familiar activity for people who wish to provide a pointer to something) rather than inspecting pages for a recommended URI to be used to refer to a particular <em>sense</em></li>
<li>organisations such as schema.org can easily recommend the reuse of URIs published by other people, such as Wikipedia, without requiring those publishers to alter their server configuration or requiring developers that use schema.org markup to add fragment identifiers to their URIs</li>
<li>explicit <code>describedby</code> and <code>describes</code> links can be made between URIs rather than using an HTTP status code where necessary; these can be incorporated directly in data and do not require a network connection to be discovered</li>
</ul>

<h3>Provenance</h3>

<p>The &#8220;punning&#8221; approach that I&#8217;ve described here has at its core the recognition that different consumers will trust different sources of information to different levels. Knowledge of the provenance of a particular source of information is one way in which consumers can work out what to trust and how to resolve conflicts between sources.</p>

<p>The work of the <a href="http://www.w3.org/2011/prov">Provenance Working Group</a> is important here both in identifying the provenance of particular <em>content</em> located at a given URI and in providing a vocabulary for describing the processing that a consumer performs to retrieve and process that <em>content</em> in order to extract data from it (for example, the time of the retrieval and the HTTP headers used may lead to the consumer receiving different content; the particular version of software used may lead to different information being gleaned from that content).</p>

<h3>Linked Data Platform</h3>

<p>Questions about what URIs actually identify within RDF only become an issue when the URIs are resolved, that is, when RDF is used within linked data. The new <a href="http://www.w3.org/2012/ldp/">Linked Data Platform Working Group</a> is a great opportunity to standardise around these practices, in collaboration with the other relevant working groups.</p>

<h2>Final Thoughts</h2>

<p>People use the terms &#8220;resource&#8221;, &#8220;identifies&#8221; and &#8220;representation&#8221; both within specifications and in common parlance as if there is a shared understanding of what they mean, when in fact different people use the terms in subtly but meaningfully different ways. This would be fine, except that the different understandings lead to different assumptions and engineering decisions, and friction for developers trying to build applications that publish and consume data whose assumptions differ.</p>

<p>We need to find a way forward that, even if not everyone&#8217;s ideal, is realistic, explicable and palatable. The &#8220;punning&#8221; approach that I&#8217;ve described above might not be it, but the analysis that Jonathan&#8217;s done of the various proposals and use cases suggests to me that it&#8217;s the closest we have. The main questions I have are:</p>

<ul>
<li>what use cases cannot be satisfied using this approach?</li>
<li>what specifications would have to change if this approach were adopted, and would it be realistic to make those changes?</li>
<li>what existing applications would break if this approach were adopted, and how might that breakage be mitigated?</li>
</ul>

<p>At the very least, I hope that the vocabulary I&#8217;ve laid out in this post might be helpful in further discussions.</p>

<p>Of course any other comments are most welcome.</p>
    ]]></content>
  </entry>
  <entry>
    <title>UK Open Standards Consultation</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/169" />
    <id>http://www.jenitennison.com/blog/node/169</id>
    <published>2012-04-14T23:44:51+01:00</published>
    <updated>2012-06-26T20:53:36+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="opendata" />
    <summary type="html"><![CDATA[<p>Over the last few months, the UK Government has been running a <a href="http://consultation.cabinetoffice.gov.uk/openstandards/">consultation on its Open Standards policy</a>. The outcome of this consultation is incredibly important not only for organisations and individuals who want to work with government but also because of its potential knock-on effects on the publication of Open Data and the use of Open Source software within public sector organisations.</p>

<p>Unsurprisingly, Microsoft, Qualcomm and other organisations who have a vested interest in keeping the UK Government locked in to their products are <a href="http://www.computerweekly.com/blogs/public-sector/2012/04/proprietary-lobby-triumphs-in.html">responding vociferously to the consultation</a>. They risk not only losing business to smaller enterprises within the UK but also, if the policy is successfully adopted here, in other countries in Europe and internationally that follow suit.</p>

<p>If we want our Government to be Open &#8212; to use Open Standards, to publish Open Data, to adopt Open Source &#8212; then we must respond to this consultation in numbers.</p>

<p>There are three things that you can do:</p>

<ol>
<li><strong>Respond to the consultation</strong> &#8212; made even easier by <a href="http://open.squarecows.com/">this response form</a> developed by Ric Harvey</li>
<li><strong>Attend the <a href="http://consultation.cabinetoffice.gov.uk/openstandards/events/">events</a></strong> &#8212; these seem pretty full now, but try to get in if you can</li>
<li><strong>Spread the message</strong> &#8212; blog and tweet and write to raise awareness of the importance and impact that this consultation could have</li>
</ol>
    ]]></summary>
    <content type="html"><![CDATA[<p>Over the last few months, the UK Government has been running a <a href="http://consultation.cabinetoffice.gov.uk/openstandards/">consultation on its Open Standards policy</a>. The outcome of this consultation is incredibly important not only for organisations and individuals who want to work with government but also because of its potential knock-on effects on the publication of Open Data and the use of Open Source software within public sector organisations.</p>

<p>Unsurprisingly, Microsoft, Qualcomm and other organisations who have a vested interest in keeping the UK Government locked in to their products are <a href="http://www.computerweekly.com/blogs/public-sector/2012/04/proprietary-lobby-triumphs-in.html">responding vociferously to the consultation</a>. They risk not only losing business to smaller enterprises within the UK but also, if the policy is successfully adopted here, in other countries in Europe and internationally that follow suit.</p>

<p>If we want our Government to be Open &#8212; to use Open Standards, to publish Open Data, to adopt Open Source &#8212; then we must respond to this consultation in numbers.</p>

<p>There are three things that you can do:</p>

<ol>
<li><strong>Respond to the consultation</strong> &#8212; made even easier by <a href="http://open.squarecows.com/">this response form</a> developed by Ric Harvey</li>
<li><strong>Attend the <a href="http://consultation.cabinetoffice.gov.uk/openstandards/events/">events</a></strong> &#8212; these seem pretty full now, but try to get in if you can</li>
<li><strong>Spread the message</strong> &#8212; blog and tweet and write to raise awareness of the importance and impact that this consultation could have</li>
</ol>

<!--break-->

<p>The consultation is quite long and there are a lot of questions to answer. In the hope of making this easier for everyone, I&#8217;m publishing my response below. Please consider these responses public domain, and feel free to copy as much or as little as you like from them (though I recommend you omit the parts that are about my individual experience and substitute them with your own).</p>

<p>For extra background, read:</p>

<ul>
<li><a href="http://blogs.computerworlduk.com/open-enterprise/2012/04/of-microsoft-netscape-patents-and-open-standards/index.htm">Of Microsoft, Netscape, Patents and Open Standards</a> by Glyn Moody</li>
<li><a href="http://digital.cabinetoffice.gov.uk/2012/04/12/are-open-standards-a-closed-barrier/">Are open standards a closed barrier?</a> by Linda Humphries</li>
<li><a href="http://dev.squarecows.com/2012/04/10/open-standards-at-risk/">Open Standards at risk</a> by Ric Harvey</li>
</ul>

<h2>Criteria for Open Standards</h2>

<h3>1. How does this definition of open standard compare to your view of what makes a standard &#8216;open&#8217;?</h3>

<p>The definition in the consultation closely matches my view of what makes a standard open. The important factors are:</p>

<ul>
<li>a documented, open process which enables participation not just from implementers but also from users of the standard, and that provides ongoing maintenance and development of the standard</li>
<li>publication of the standard such that anyone can read it</li>
<li>a royalty-free and non-discriminatory license such that anyone can implement the standard without cost</li>
</ul>

<p>The one factor that does not match my view is the availability of multiple independent implementations of the standard. In some cases it may be that market pressures mean there are not currently multiple good implementations of an otherwise Open Standard, or several but only for one particular platform. Limiting the definition of Open Standards to only those with multiple cross-platform implementations is probably too constraining.</p>

<p>There are two examples from my work that demonstrate why the availability of multiple implementations should not be a factor.</p>

<p>First, one of the Open Standards that I use is XSLT, which for years has been dominated by a single implementation &#8212; Saxon &#8212; giving customers no real choice. Nevertheless, because it is an Open Standard, Saxon has had a lot of pressure to be completely conformant with that standard, and in the past year a number of other implementations have been started that can compete with it on different platforms, so the presence of a single implementation has proven to be a short-term issue.</p>

<p>Second, in some new technology areas such as Linked Data, there may be only single implementations simply because the area and the Open Standards on which they are built are not yet very mature. As the use of the technology grows, so do the number of implementations and their adherence to the standard. Thus the number and quality of implementations do develop over time; government should concentrate on long-term adoption rather than short-term availability.</p>

<h3>2. What will the Government be inhibited from doing if this definition of open standards is adopted for software interoperability, data and document formats across central government?</h3>

<p>In some specialist areas, there may not be existing Open Standards available, or it may be that the Open Standards that are available do not match the specialist needs of the UK government. Where an Open Standard is imminent but not yet fully standardised or implemented, waiting for standardisation or implementation could delay government IT projects. However, the measures suggested in the rest of the proposed policy, including government involvement in standards creation and allowing for selecting other standards where there is no available Open Standard, mitigate this risk.</p>

<p>The other mitigating factor is that where an Open Standard doesn&#8217;t exist, it is usually possible to build new work on top of Open Standards. In my experience working on legislation.gov.uk, even though good appropriate Open Standards for UK legislation weren&#8217;t available, we have been able to work using the underlying Open Standards such as XML, RDF and HTTP, so the gap between necessary custom work and Open Standards is minimised.</p>

<p>The definition of Open Standards may also prevent the Government from entering into contracts with companies which do not adopt Open Standards. It may hinder exchange of information with outside organisations that use software that doesn&#8217;t support Open Standards. These may be problems in the short term but over the longer term, an Open Standards policy will move the supplier market and other organisations towards Open Standards more generally.</p>

<h3>3. For businesses attempting to break into the government IT market, would this policy make things easier or more difficult – does it help to level the playing field?</h3>

<p>The policy does help to level the playing field for two main reasons:</p>

<ul>
<li>using Open Standards reduces the cost of switching to new suppliers, because the new suppliers do not have to spend a lot of time reverse engineering existing processing and data; this means new suppliers can make more competitive bids</li>
<li>Open Standards are often implemented within Open Source Software, which has low or no cost; this helps smaller businesses because it lowers the cost of their entering the market</li>
</ul>

<h3>4. How would mandating open standards for use in government IT for software interoperability, data and document formats affect your organisation?</h3>

<p>I work as an independent contractor who specialises in a variety of Open Web Standards. For me as a contractor, the adoption of Open Standards within government increases the potential opportunities for me to work on government projects.</p>

<p>I am currently contracted to work on the delivery of legislation.gov.uk and The National Archives&#8217; Expert Participation Programme. This work is already built substantially on Open Standards. One of the main pain points in this work has been that the government organisations that provide data such as Bills, new Statutory Instruments or Tables of Effects for legislation.gov.uk are using proprietary technologies to do so, and converting from those proprietary data formats, or getting users to save in an open data format, can be both hard to do initially and difficult to maintain as new versions of software are rolled out. An Open Standards policy within government would greatly reduce the cost involved in those conversion processes and increase the ease of use for government users.</p>

<p>I am also a member of the Technical Architecture Group within the World Wide Web Consortium (W3C), which is the main standards body for Web Standards. From that perspective, the Open Standards policy would increase UK government, and UK government supplier, involvement in the development of Open Standards within the W3C, which can only improve the quality of the standards and the life of the organisation as a whole.</p>

<h3>5. What effect would this policy have on improving value for money in the provision of government services?</h3>

<p>This policy would greatly increase value for money in the provision of government services. Adopting Open Standards often means that the basic building blocks for a service can be selected off the shelf, and then fitted together and customised, rather than a proprietary solution being built from scratch. This can reduce the cost of providing the service as a whole and improve the service as development effort can be directed at the unique features of the service.</p>

<h3>6. Would this policy support innovation, competition and choice in delivery of government services?</h3>

<p>The policy as written is well-framed to support innovation, competition and choice across the market in the medium and long term, in areas which are beneficial to the UK Government and to the rest of the economy.</p>

<p>Adopting Open Standards focuses innovation on novel areas (which are not currently covered by existing standards) and on providing better quality services (which may mean better performance, better user experience and so on). It also encourages innovation in public, and this greater exposure brings with it higher quality and a better focus on user requirements. The only innovation it prevents is that whose purpose is to lock the government in to individual suppliers, such as closed standards that largely repeat existing work but can only be implemented by one supplier.</p>

<p>As part of the work that I did on Linked Data for data.gov.uk, several colleagues and I worked on new RDF vocabularies such as <a href="http://www.w3.org/TR/vocab-org/">org</a>, <a href="http://www.w3.org/TR/vocab-data-cube/">Data Cube</a> and <a href="http://purl.org/net/opmv/ns">OPMV</a> and processing standards such as the <a href="https://code.google.com/p/linked-data-api/">Linked Data API</a>. We did this in the open, and it was all based on Open Standards: that approach did not prevent us from doing new and innovative things. The results of that innovation were then taken forward by the wider community, made more rigorous and better suited to a wider audience, and are resulting in Open Standards from W3C. In addition, my colleagues have built new products that integrate that work, for example in <a href="http://kasabi.com/">Kasabi</a>.</p>

<p>Adopting Open Standards focuses competition on the quality of service that is offered rather than on winning a single competition that will lock the government in to contracts for many years to come. It prevents supplier complacency: when the government can move easily to a new supplier, suppliers have to provide continuous improvements over the lifetime of a contract, because they cannot be guaranteed to win the next one simply because they are the only ones who have implementations that can process the data. Competition is thus focused into areas that matter to the customer, including cost, rather than areas that matter to the supplier.</p>

<p>I was involved tangentially in The Stationery Office&#8217;s (TSO) bid during the recent re-procurement of legislation services by The National Archives. Because legislation.gov.uk was built on Open Standards, TSO&#8217;s bid had to be based on quality of service, on continuing innovation over the lifetime of the contract, and on low cost of delivery.</p>

<p>Adopting Open Standards increases choice for the UK Government because it opens up competition to suppliers who would not otherwise be able to compete (as in Question 3). Of course some companies may not currently use Open Standards, and under this policy the UK Government would not be able to choose them as suppliers in the short term, but it is unlikely that companies would continue their use of closed standards in the long term, if they wish to compete for UK Government contracts.</p>

<p>Put another way, adopting Open Standards means companies are innovating and competing on the <strong>right</strong> things: on things that are important to the UK Government.</p>

<h3>7. In what way do software copyright licences and standards patent licences interact to support or prevent interoperability?</h3>

<p>It is possible to have an Open Standard implemented by software that is public domain or completely closed or anything in between: Open Standards do not necessarily lead to free software. On the other hand, standards patent licenses reduce the ability of developers to produce free (open licensed) software because they need to make enough money to pay to use the license. Thus standards patent licenses limit the number and type of implementations of a standard, effectively limiting the market to those with enough capital to enter it.</p>

<p>The fewer implementations of a standard, the less pressure there is on those implementations to be interoperable, because there is a known and limited number of other implementations with which they need to interoperate. The greater the market of implementations, the greater the drive for interoperability because it becomes increasingly likely that they will have to interoperate with each other.</p>

<p>For example, I have often had to move code that I have written based on an Open Standard from one implementation and use it in another. If it doesn&#8217;t work, I can work out which implementation is correct (by looking at the standard) and report the error to the implementation developers, which helps them improve the interoperability of their product. The ability to move between implementations is vital to ensure interoperability between them, and the broader the market the more likely that is to happen.</p>

<h3>8. How could adopting (Fair) Reasonable and Non Discriminatory ((F)RAND) standards deliver a level playing field for open source and proprietary software solution providers?</h3>

<p>Adopting (F)RAND standards does not deliver a level playing field across providers, because it limits the ability of open source providers to enter the market, as they have to recoup licensing cost. Only royalty-free licenses provide a completely level playing field across providers.</p>

<h3>9. Does selecting open standards which are compatible with a free or open source software licence exclude certain suppliers or products?</h3>

<p>Selecting Open Standards necessarily excludes those products that only use closed standards, and suppliers that only offer those products. In the short term, suppliers who have built their products and business models around closed standards and lock-in will be excluded.</p>

<p>However, there is no requirement for companies that adopt Open Standards to license their products using a free or open source software license. While there are often free or open source products built around Open Standards, these are only competitive when they are at the same quality as those with closed licenses.</p>

<p>An example from legislation.gov.uk is that our early development used eXist, an XML database that is available under an open source software license and implements the Open Standards of XML and XQuery. It became clear that (at that time) eXist did not support the level of use that we needed, in its performance and its scalability. We therefore instead adopted MarkLogic, which uses the same Open Standards but is not available under an open source software license. This demonstrates how companies which adopt Open Standards can still offer competitive value even within a market with open source implementations.</p>

<p>Another interesting point to draw from this is that the presence of an open source implementation of an Open Standard enabled us to prototype and experiment using that implementation, knowing that should we need better performance and so on we would be able to move all our code to another interoperable implementation. If MarkLogic had implemented a custom method of querying XML, committing to paying for it early in the process would have been too high a risk. So the use of Open Standards effectively helped MarkLogic win that business.</p>

<h3>10. Does a promise of non-assertion of a patent when used in open source software alleviate concerns relating to patents and royalty charging?</h3>

<p>I would personally be very wary of implementing a standard which had a promise of non-assertion of a patent, and standards made available under those terms would seem more risky and costly to implement because any such promise would have to be checked by a lawyer and would likely constrain my future actions and the use of the software in other environments.</p>

<h3>11. Should a different rationale be applied when purchasing off-the-shelf software solutions than is applied when purchasing bespoke solutions?</h3>

<p>The same policy should be applied both to off-the-shelf and bespoke solutions, particularly as the difference between these is not at all clear cut: off-the-shelf solutions are often customised, and bespoke solutions built from off-the-shelf products. Whatever the type of solution, the crucial point is that it needs to interoperate with other products, using Open Standards.</p>

<h3>12. In terms of standards for software interoperability, data and document formats, is there a need for the Government to engage with or provide funding for specific committees/bodies?</h3>

<p>The Government should engage with those standards bodies that work on standards that the Government uses, so that it can shape the development of those standards and highlight new areas where standards work is needed to satisfy the UK Government&#8217;s requirements. In my own area of web standards, the main one is the W3C. In fact, given the UK Government&#8217;s transparency, open data and &#8220;digital by default&#8221; policies, all of which require the use of W3C standards, the UK Government is under-represented within W3C, with only two agencies (Ordnance Survey (OS) and The National Archives (TNA)) being members. Other public-sector organisations which make heavy use of W3C standards, such as the BBC, have made a business decision to become more engaged in W3C activities in order to shape and influence them.</p>

<p>Membership of W3C and other standards bodies is particularly important where standards development impacts on the ability to achieve the goals of particular organisations. For example, TNA are taking particular interest in the provenance work being done at W3C; the Government Digital Service should be participating in the development of standards in web design and applications; the Office for National Statistics should be taking an interest in the development of the Data Cube vocabulary for statistical information and so on.</p>

<h3>13. Are there any other policy options which would meet the described outcomes more effectively?</h3>

<p>I believe that the Open Standards policy described in the consultation is the best way to achieve lower cost, higher interoperability, reduced lock-in, increased innovation and competition in the right areas and to level the playing field both for open source software and for small and medium enterprises who wish to compete for government contracts.</p>

<h2>Open Standards Mandation</h2>

<h3>1. What criteria should the Government consider when deciding whether it is appropriate to mandate particular standards?</h3>

<p>The only time the Government should mandate a particular Open Standard for IT is when two competing Open Standards that offer equivalent functionality are being adopted by different parts of Government, and poor interoperability between them imposes a clear and apparent cost. In most cases, central government should avoid mandating a particular Open Standard, but instead let individual public-sector organisations select an appropriate Open Standard based on their own requirements.</p>

<p>Government should mandate a particular Open Standard where:</p>

<ul>
<li>there are competing Open Standards that cover the given functionality and</li>
<li>interoperability or conversion between these standards is lossy or difficult and</li>
<li>there are multiple organisations within government with which interoperability involving the standard is required</li>
</ul>

<h3>2. What effect would mandating particular open standards have on improving value for money in the provision of government services?</h3>

<p>A central government authority mandating particular Open Standards could reduce value for money in the provision of government services, because that central authority is unlikely to fully understand the requirements of the particular service in detail. Public sector bodies who are actually procuring solutions should select particular Open Standards as part of the procurement process.</p>

<p>Where mandating particular Open Standards could improve value for money is if there are competing Open Standards and different public sector bodies are likely to adopt different ones. In those cases, there may be interoperability problems between the standards which make it more costly to provide services, and mandating the adoption of a particular Open Standard could help.</p>

<h3>3. Are there any legal or procurement barriers to mandating specific open standards in the UK Government&#8217;s IT?</h3>

<p>I do not know.</p>

<h3>4. Could mandation of competing open standards for the same function deliver interoperable software and information at reduced cost?</h3>

<p>It is unclear what this question means.</p>

<p>Mandating both of two competing Open Standards may increase the availability of interoperable software, but it will raise the cost of implementation and therefore of software, because implementing two standards takes more development effort than implementing one.</p>

<p>Mandating a single Open Standard from two competing standards may increase interoperability but comes with a possible risk and cost. It may be that two competing standards, while similar, have different target audiences and capabilities. If one is mandated, those public sector bodies whose requirements fit more closely with the other will suffer increased cost in trying to use the first standard to fit their requirements. In addition, the standards may evolve over time such that they either diverge in functionality or increase in interoperability (so that mandation is no longer necessary) or such that the original judgement about which to mandate no longer applies.</p>

<h3>5. Could mandation of open standards promote anti-competitive behaviour in public procurement?</h3>

<p>Anti-competitive behaviour arises when the number of potential competitors is artificially restricted due to the terms of the procurement. Whether mandating a particular Open Standard promotes anti-competitive behaviour depends on two factors.</p>

<p>First, it depends on the nature of the Open Standard and its implementations. Some Open Standards are small and easy to implement while others are large and complex to implement. Some have open source implementations across a number of platforms, others have few implementations, available under restricted terms or on restricted platforms. In the short term, mandating an Open Standard that is costly for a company to implement and for which there is no easily available implementation is going to favour those companies who have already implemented the Open Standard. In the longer term, by their nature, implementations of Open Standards tend to become more widely available, and companies are likely to invest in implementing them if they are industry standards. Crucially, because the standards are Open, companies are freely able to read them and can implement them without royalty payments, so in the long term mandating them is not anti-competitive.</p>

<p>Secondly, the reason for the mandation of a particular Open Standard should be clear within the procurement exercise and a particular Open Standard should not be mandated unless there are clear reasons for that mandation. As an example from my own experience, there are two Open Standards for embedding metadata within HTML pages: RDFa and microdata. Either could be used to provide largely the same functionality so there would generally not be a need to mandate one or the other during procurement, but there should be a requirement to show how the standard selected by the supplier would be used to achieve the aims of the system.</p>

<h3>6. How would mandation of specific open standards for government IT software interoperability, data and document formats affect your organisation/business?</h3>

<p>Mandating specific Open Standards could limit the approaches that I would be able to take in my work, which could mean that I was less able to select an appropriate technology based on the requirements of the system. Any technology selection requires balancing a large set of requirements, both in terms of functionality and in terms of performance, reliability, scalability and so on. As a developer, I and my direct customers have the clearest understanding of these requirements and their relative importance within the system. These are complex choices, and they should not be made by central authorities who do not understand the details.</p>

<p>It is much more important to me to have a clear understanding that Open Standards should be used wherever possible and which Open Standards are being used within the organisations that interact with the systems that I am responsible for, so that the systems I build interoperate with them more smoothly.</p>

<h3>7. How should the Government best deal with the issue of change relating to legacy systems or incompatible updates to existing open standards?</h3>

<p>Good Open Standards provide clear statements about both forwards and backwards compatibility and, as they are developed through open participation, the extent of changes and the reasons for them are usually clear, which makes the process of working out what needs to be upgraded easier than with closed standards. For example, I was involved in the development of the second version of XSLT, and those of us in the Working Group spent substantial time ensuring that the impacts on existing users of XSLT were not too great and were thoroughly documented.</p>

<p>When adopting a particular Open Standard within a system, the Government or their suppliers should assess the impacts of future changes in the standard on the system and should be involved to an appropriate level in the development of the standard to ensure that it continues to meet requirements.</p>

<h3>8. What should trigger the review of an open standard that has already been mandated?</h3>

<p>The Government should continuously work with its suppliers during the contract term and with potential suppliers during procurement to assess the best Open Standard to use. Even when a given standard itself doesn&#8217;t change, the wider IT environment may alter: there may be more or fewer implementations over time, alternative standards, or a change in interoperating standards used by other organisations. Thus there are no particular trigger points at which an Open Standard should be reviewed, though re-procurement will naturally cause a re-exploration of a system&#8217;s environment and the best approach to its implementation.</p>

<h3>9. How should the Government strike a balance between nurturing innovation and conforming to standards?</h3>

<p>Good Open Standards have built-in extensibility points which provide the scope for implementer innovation while providing general interoperability. In the best cases these extensions gradually become standardised themselves: this is what has happened with XSLT, for example. Thus, conforming to standards does not prevent innovation; instead it focuses innovation on user requirements, and in particular on improving the quality of implementation. The Government&#8217;s Open Standards policy should ensure that when standards are selected they provide broad interoperability while giving scope for extension, and this is particularly important if the Government mandates particular Open Standards for general use.</p>

<h3>10. How should the Government confirm that a solution claiming conformity to a standard is interoperable in practice?</h3>

<p>Within the W3C, new standards must have an associated test suite, usually constructed both by the Working Group who develop the standard and from the test suites created by individual implementers of the standard. Running an implementation against such a test suite makes it possible to empirically test whether there is conformance to the standard. The results of running a given implementation against such a test suite are often available, for example see <a href="http://rdfa.info/earl-reports/">the RDFa test suite results</a>, but the Government could also ask the solution provider for evidence of interoperability in the form of test suite results.</p>

<h3>11. Are there any other policy options which would meet the objective more effectively?</h3>

<p>A general Government policy to use Open Standards will in most cases naturally lead to the best Open Standard being adopted across the public sector without Government needing to mandate the use of specific Open Standards. Systems should be procured on the basis of their ability to interoperate with those other systems with which they need to work; in such an environment the easiest approach for suppliers will be to adopt the standards used by other systems.</p>

<p>We have seen this happen within legislation.gov.uk, where we store and publish legislation using a particular XML vocabulary. Various parliaments and government departments draft legislation which is published on the site, and they have traditionally done so based on the particular requirements of the drafters themselves (for example, the UK parliament uses FrameMaker while government departments use Word). We have to cater for these different formats, and convert them into the standard that we use within legislation.gov.uk. However, as these authoring systems are being re-procured, the ability to produce the XML vocabulary that we use within legislation.gov.uk has become part of the requirements on future authoring systems, because it gives greater fidelity between authoring and eventual publication. Thus we are naturally moving towards a more interoperable environment without any central mandation of the standard and without overriding the local requirements of particular authors.</p>

<h2>International Alignment</h2>

<h3>1. Is the proposed UK policy compatible with European policies, directives and regulations (existing or planned) such as the European Interoperability Framework version 2.0 and the reform proposal for European Standardisation?</h3>

<p>I do not know.</p>

<h3>2. Will the open standards policy be beneficial or detrimental for innovation and competition in the UK and Europe?</h3>

<p>The UK&#8217;s Open Standards policy will be beneficial to innovation and competition in the UK because it levels the playing field for a wider set of providers and focuses innovation and competition on the requirements of the users of software rather than the suppliers. These benefits apply as well to Europe as to the UK, and the successful adoption of an Open Standards policy within the UK will naturally aid the adoption of similar policies in other countries within Europe and internationally.</p>

<h3>3. Are there any other policy options which would meet the objectives described in this consultation paper more effectively?</h3>

<p>The one part of the international alignment policy that needs to be rethought is the preference for international standards over local standards. While in general the wider the adoption of a given standard, the better, it is not always the case that international standards provide greater benefits than local ones.</p>

<p>Looking at providing data about legislation, for example, different jurisdictions have very different ways of identifying, creating, revising and formatting legislation. Any international standards for legislation are highly unlikely to take into account the complexities and special cases that are specific to UK legislation (such as the use of regnal years for identifying older items). A local standard can be better tailored to the requirements of the locality, which may be incredibly important in reducing implementation cost and improving data fidelity.</p>

<p>I would therefore recommend a policy approach that emphasised interoperability with international standards without requiring their wholesale adoption, particularly where there are specific local requirements that are not met by the international standard.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Content and Descriptions of Web Resources</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/168" />
    <id>http://www.jenitennison.com/blog/node/168</id>
    <published>2012-03-31T22:27:08+01:00</published>
    <updated>2012-06-26T20:53:54+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="linked data" />
    <category term="rest" />
    <category term="tag" />
    <summary type="html"><![CDATA[<p>Those readers who have followed the <a href="http://lists.w3.org/Archives/Public/www-tag/">TAG</a> or <a href="http://lists.w3.org/Archives/Public/public-lod/">public-lod</a> mailing lists over the last couple of weeks cannot have failed to notice a large number of posts on a theme that recurs on roughly a 9-monthly cycle within these communities: <a href="http://www.w3.org/2001/tag/group/track/issues/14">httpRange-14</a>.</p>

<p>The reason for this particular recurrence was a <a href="http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html">Call for Change Proposals</a> on the resolution. The TAG meets on Monday, and discussion of this issue is one of the first items on <a href="http://www.w3.org/2001/tag/2012/04/02-agenda">our agenda</a>. These are my thoughts going in to that discussion.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>Those readers who have followed the <a href="http://lists.w3.org/Archives/Public/www-tag/">TAG</a> or <a href="http://lists.w3.org/Archives/Public/public-lod/">public-lod</a> mailing lists over the last couple of weeks cannot have failed to notice a large number of posts on a theme that recurs on roughly a 9-monthly cycle within these communities: <a href="http://www.w3.org/2001/tag/group/track/issues/14">httpRange-14</a>.</p>

<p>The reason for this particular recurrence was a <a href="http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html">Call for Change Proposals</a> on the resolution. The TAG meets on Monday, and discussion of this issue is one of the first items on <a href="http://www.w3.org/2001/tag/2012/04/02-agenda">our agenda</a>. These are my thoughts going in to that discussion.</p>

<!--break-->

<h2>The Questions</h2>

<p>The recent discussion on the lists has, I think, helped to refine the questions that lie at the core of the httpRange-14 issue. They are:</p>

<ol>
<li>When you get a successful response from a URI, does that response, by definition, include the <em>content</em> of the resource identified by the URI?</li>
<li>How can you discover a <em>description</em> of the resource identified by a URI?</li>
</ol>

<p>Knowing whether the response to a URI provides the content of the resource identified by that URI is important because when you have data about the thing identified by a URI, such as its author or the license that it is provided under, you need to know what information is actually being referred to so that you can tell what information you can reuse and whom you have to attribute. </p>

<p>For example, the <a href="http://www.gov.uk/">GOV.UK</a> website has a license at the bottom of each page:</p>

<pre><code>&lt;p&gt;
  Much of the information on this website is available for reuse under the 
  &lt;a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/" 
     rel="licence"&gt;Open Government Licence&lt;/a&gt;
&lt;/p&gt;
</code></pre>

<p>Seeing this, an application that knows the <a href="http://www.nationalarchives.gov.uk/id/open-government-licence/">Open Government Licence</a> enables free reuse can tell that it can lift content out of the page and use it on its own site. An application could automatically scrape out and republish the first paragraph of those <a href="https://www.gov.uk/government/news-and-speeches">news stories</a> provided on this site and any others that were published under this license.</p>
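
<p>As a rough illustration of the kind of application described above, the following Python sketch looks for links whose <code>rel</code> attribute marks them as a licence and checks them against a list of licences known to permit reuse. The class name, the function name and the <code>REUSABLE_LICENCES</code> allow-list are assumptions made for this example; a real application would need a fuller list and would have to decide how much of the page the licence actually covers.</p>

```python
from html.parser import HTMLParser


class LicenceLinkFinder(HTMLParser):
    """Collect the href of every a or link element whose rel marks a licence."""

    def __init__(self):
        super().__init__()
        self.licences = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens; accept both spellings
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "licence" in rel_tokens or "license" in rel_tokens:
            href = attributes.get("href")
            if href:
                self.licences.append(href)


# Hypothetical allow-list of licences this application treats as permitting reuse
REUSABLE_LICENCES = {
    "http://www.nationalarchives.gov.uk/doc/open-government-licence/",
}


def page_is_reusable(html_text):
    """Return True if the page links to at least one licence on the allow-list."""
    finder = LicenceLinkFinder()
    finder.feed(html_text)
    return any(href in REUSABLE_LICENCES for href in finder.licences)
```

<p>Note that this only establishes that the page points at a licence. It says nothing about whether the licence applies to the page itself or to something the page describes, which is exactly the question at issue in the rest of this post.</p>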

<h2>The Conflict</h2>

<p>There are vocal disagreements, particularly about the first of the two questions I outlined above. What&#8217;s become clear to me is that the arguments stem from a difference in world view about what kind of resources are available on the web.</p>

<h3>Web of Data</h3>

<p>Under the <em>web of data</em> view, the web consists of data, and all the resources on the web are <em>information resources</em>, defined as those <a href="http://www.w3.org/TR/webarch/#def-information-resource">resources whose essential characteristics can be conveyed in a message</a>. Data, in other words.</p>

<p>URIs can still be used to name other resources, which are not on the web either because they are not information resources (such as a Person) or because they are not available yet (such as unscanned books). Under this world view, however, giving a successful HTTP response for such a resource is simply wrong, because these resources aren&#8217;t on the web.</p>

<p>The problem that this world view therefore needs to address is how to create URIs to identify resources that aren&#8217;t on the web. There are two answers:</p>

<h4>Hash URIs</h4>

<p>Hash URIs have the benefit that there is a direct relationship between the hash URI which identifies the resource and a resource on the web that describes it. An HTTP client naturally strips the fragment identifier from the URI in order to make the request to a server, which then delivers the description of the resource.</p>
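
<p>The fragment stripping can be seen with Python&#8217;s standard <code>urllib.parse.urldefrag</code>. This is only a sketch: the <code>example.org</code> URI and the function name are made up for illustration.</p>

```python
from urllib.parse import urldefrag


def split_hash_uri(uri):
    """Return (document URI to request, fragment naming the thing)."""
    document_uri, fragment = urldefrag(uri)
    return document_uri, fragment


# Hypothetical hash URI naming a person; the part before the hash is the
# document on the web that describes that person
thing_uri = "http://example.org/people#jeni"
document_uri, fragment = split_hash_uri(thing_uri)
print(document_uri)  # http://example.org/people
print(fragment)      # jeni
```

<p>Only <code>document_uri</code> is ever sent to the server, so the server never sees, and never has to answer for, the URI of the person.</p>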

<h4>303 Redirections</h4>

<p>If you identify a resource that isn&#8217;t on the web using an HTTP URI that is not a hash URI, you cannot get a successful response back because the resource you have asked for is, by definition in this world view, not on the web. The workaround is for the publisher to use the <a href="http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-19#section-7.3.4">303 See Other</a> status code to point from the resource that you requested to its description on the web. (This is the essence of <a href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html">the httpRange-14 resolution</a>.)</p>
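
<p>To make the pattern concrete, here is a minimal Python sketch of a publisher answering requests this way and a client following the redirect. No real server is involved: the paths, the two lookup tables and the function names are all hypothetical, and the only thing taken from the resolution is the use of the 303 status code to point at a description.</p>

```python
# Hypothetical mapping from URIs of things that are not on the web
# to the URIs of documents that describe them
DESCRIPTIONS = {
    "/id/person/jeni": "/doc/person/jeni",
}

# Hypothetical documents that are on the web
DOCUMENTS = {
    "/doc/person/jeni": "A description of the person.",
}


def respond(path):
    """Return (status, headers, body) for a GET request on path."""
    if path in DESCRIPTIONS:
        # The thing itself cannot be sent in a message, so point at its description
        return 303, {"Location": DESCRIPTIONS[path]}, ""
    if path in DOCUMENTS:
        # An information resource: the body is its content
        return 200, {"Content-Type": "text/plain"}, DOCUMENTS[path]
    return 404, {}, ""


def fetch(path, max_redirects=5):
    """Follow 303s as a client would and say what kind of body came back."""
    redirected = False
    for _ in range(max_redirects):
        status, headers, body = respond(path)
        if status == 303:
            path, redirected = headers["Location"], True
            continue
        if status != 200:
            return status, None, body
        # Arriving via a 303 means the body describes the requested resource;
        # a direct 200 means the body is the content of the requested resource
        kind = "description" if redirected else "content"
        return status, kind, body
    raise RuntimeError("too many redirects")
```

<p>Calling <code>fetch("/id/person/jeni")</code> costs two round trips and reports a description, while <code>fetch("/doc/person/jeni")</code> costs one and reports content. The client can tell the two cases apart purely from the status codes it saw.</p>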

<h3>Web of Things</h3>

<p>Under the <em>web of things</em> view, the web consists of things, and resources on the web could be anything: documents, people, films, teapots and so on. When a client makes an HTTP request for a resource, the response must reflect the state of the resource, but that state could be its content (if it&#8217;s an information resource) or it could be a description of the resource.</p>

<p>Under this world view, giving a successful HTTP response for a resource that isn&#8217;t an information resource is absolutely fine: the description of the resource is still a reflection of its state.</p>

<p>The problem that needs to be addressed when you have this world view becomes apparent when you think back to the licensing example above. Given that an application knows that the resource identified by a given URI can be reused, how does it know whether the representation of that resource is reusable? It could be that the identified resource (for example an out-of-copyright book) has an open license, but that the representation of the resource holds only a description of that resource (some metadata about the book), and that description has a much more restrictive license. Or vice versa.</p>

<p>So to address this use case, you need some other mechanism to enable an application to tell that the representation is the content of the resource, rather than merely a description of it.</p>

<h2>The Current State</h2>

<p>To generalise, the linked data community operates within the <em>web of data</em> world view and the larger web community operates within the <em>web of things</em> world view.</p>

<p>What is happening increasingly, however, is that these two world views are rubbing up against each other, and while both are internally coherent, switching between the world views causes not only a cognitive disconnect for developers but practical problems when transforming or moving data published under one world view into the other world view.</p>

<p>In addition, publication of data on the web through APIs is growing all the time, particularly <a href="https://en.wikipedia.org/wiki/Representational_State_Transfer">REST APIs</a> supporting the <a href="https://en.wikipedia.org/wiki/HATEOAS">Hypertext as the Engine of Application State (HATEOAS) principle</a>. As we share more data on the web, and we use URIs in our APIs, the questions of what those URIs mean, and of how we associate licenses and provenance information with data, will only become more important.</p>

<p>We have an obligation, therefore, to reflect on the experience from the linked data community over the last few years and how that experience might spread to the larger web community.</p>

<p>Discussions within the linked data community over the httpRange-14 resolution centre on two problems that people have encountered:</p>

<ol>
<li>the <em>web of data</em> vs <em>web of things</em> disconnect hurting adoption</li>
<li>practical aspects of responding to requests with 303 redirections such as
<ul><li>round-trip delays, particularly as 303 responses can&#8217;t be cached in HTTP/1.1</li>
<li>inability of people to use 303s without server admin access</li></ul></li>
</ol>

<h2>The Social Context</h2>

<p>A bit of a side-point here. I think that the questions I posed at the start of this post are general questions about web architecture, so it puzzles me that the only people who seem to really care about them, and who debate them endlessly, are the linked data community. This is partly because the linked data community use URIs extensively to identify the things about which they provide data, but I think it&#8217;s also about the fundamental attitude of those within the community, which was characterised in a <a href="http://lists.w3.org/Archives/Public/public-lod/2012Mar/0185.html">recent post</a> by <a href="http://www.seme4.com/who-we-are/profile/hugh-glaser/">Hugh Glaser</a>:</p>

<blockquote>
  <p>Personally, I never did agree with the solution [to httpRange-14], but have always aimed to carry out the implications of it in the systems I construct.</p>
  
  <p>This is for two reasons:<br>
  a) as a member of a small community, it is destructive to do otherwise;<br>
  b) as a professional engineer, my ethical obligations require me to do so.</p>
  
  <p>It is this second, the ethical obligations that are the most significant.<br>
  I should not digress from the standards, or even Best Practice, in my work.</p>
</blockquote>

<p>The linked data community is jam-packed with people who feel an ethical obligation to adhere to standards and best practices. We try to do what we are told is the Right Thing by individuals and standards organisations even when we don&#8217;t agree that it is the Right Thing and even if it turns out to be impractical.</p>

<p>In the larger web community, people who don&#8217;t agree with a standard or best practice, or who find it too impractical to implement, simply ignore it. There is no need to endlessly debate something that you can just ignore. And the httpRange-14 resolution is ignorable by the larger web community because so far it has had very little impact on any implementations at all, let alone widely-deployed implementations that work over the non-linked-data web.</p>

<h2>The Choices</h2>

<p>Going into the TAG meeting about this on Monday, the main decision that I see is whether to continue to assume a <em>web of data</em> world view. In the <em>web of data</em> world view, a successful response to a request for a URI cannot be a description of a resource that isn&#8217;t on the web, whereas in the <em>web of things</em> world view that is fine. Personally, I would prefer to design around the <em>web of things</em> world view as I think this would ease some of the disconnects between linked data and the wider web, but there are others on the TAG who adhere strongly to the <em>web of data</em> view, so I think that change is unlikely.</p>

<p>If we stick with the <em>web of data</em> view, the main issues are how to alleviate the current practical difficulties that people are encountering with its implementation and explanation. I think there are three measures that would help:</p>

<ol>
<li><p>Determine a conventional syntax for fragment identifiers that are used to identify things that are not on the web, as opposed to fragments of content. I&#8217;m thinking something like hash-bang URIs: using a character after the hash character that just gives a quick indication that the fragment identifier is being used in a special way, to refer to something that isn&#8217;t on the web rather than a fragment of a document, for example <code>#*</code>.</p></li>
<li><p>Change to recommending a single best practice of using hash URIs for resources that aren&#8217;t on the web, and in particular recommending having a one-to-one correspondence between resources on the web and those not on the web, using one particular conventional hash URI. For example, <code>http://www.whitehouse.gov/#*</code> would identify the resource that <code>http://www.whitehouse.gov/</code> is about: the White House. This ensures that new publishers of data won&#8217;t run into the problems with publishing using 303 redirections, because they won&#8217;t use that method of publication. It also removes choice, which helps adopters who can otherwise get overwhelmed with options and the trade-offs between them.</p></li>
<li><p>Allow publishers who are currently using 303 redirections to publish descriptions of resources identified using non-hash URIs to switch to providing a representation using a 200 status code, along with a method of indicating that the representation is the <em>description</em> of the resource rather than its <em>content</em>. This indicator could be:</p>

<ul><li>a new HTTP header or status code (though I&#8217;d prefer not)</li>
<li>a Link: header with a particular relationship (eg &#8216;describedby&#8217;)</li>
<li>a statement embedded in the response itself (eg a <code>&lt;link rel="describedby"&gt;</code> element in HTML)</li></ul></li>
</ol>

<p>If we did move to a <em>web of things</em> view, the main question would be how to provide an indicator that the representation of a particular resource is the content of that resource as opposed to being a description. It would help ease the transition if this were a natural consequence of the current pattern of publication on httpRange-14-compliant sites, so for example, you&#8217;d want to consider the representation of a resource the content of the resource if you got to it:</p>

<ul>
<li>when retrieving a hash URI, if it was the part of the URI before the hash</li>
<li>when following a 303 See Other redirection, if it was the target of the redirection</li>
<li>when following a &#8216;describedby&#8217; link, if it was the target of the link</li>
</ul>

<p>as well as if there was an explicit indicator within the representation that said the resource was an information resource.</p>
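
<p>Those rules could be sketched roughly as follows. This is only an illustration under stated assumptions: the route labels, the <code>is_content</code> function and its parameters are all hypothetical names, and a real client would have to derive the route from the requests it actually made.</p>

<pre><code># Hypothetical labels for how a client arrived at a representation.
# Arriving by any of these routes means the representation is read as
# the content of the resource that was finally retrieved.
CONTENT_ROUTES = {
    "hash-stripped",       # retrieved the part of a hash URI before the hash
    "303-target",          # followed a 303 See Other redirection
    "describedby-target",  # followed a 'describedby' link
}


def is_content(route=None, declares_information_resource=False):
    """Return True if the representation should be read as content."""
    return declares_information_resource or route in CONTENT_ROUTES


print(is_content(route="303-target"))                  # True
print(is_content(route="direct"))                      # False
print(is_content(declares_information_resource=True))  # True
</code></pre>

<p>Anything reached by another route, and without an explicit indicator, would be left undetermined: it might be content or it might be a description.</p>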

<p>Whichever decisions are made, I would personally like to see them accompanied by the concrete requirements on client behaviour that arise from these different publication practices, for example enabling a reuser to associate a license with a particular piece of content, or a crawler to create RDF statements about URIs encountered on the web. That would bring the decisions down to earth and make them less ignorable.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Precious Snowflakes</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/167" />
    <id>http://www.jenitennison.com/blog/node/167</id>
    <published>2012-03-10T11:56:36+00:00</published>
    <updated>2012-05-19T11:55:47+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="betagovuk" />
    <category term="gds" />
    <category term="web" />
    <summary type="html"><![CDATA[<p><em>Disclaimer: As usual, this post contains my personal opinion and does not reflect that of any organisation with which you might associate me.</em></p>

<p>The other day, I had a lovely conversation with some folks from the BBC about some of their future plans. In the course of the conversation, <a href="http://smethur.st/">Michael Smethurst</a> spoke about his frustration when dealing with people involved with particular <a href="http://www.bbc.co.uk/programmes">programmes at the BBC</a>, where every single one of them thinks their programme is a &#8220;precious snowflake&#8221;, completely unique, that simply can&#8217;t be treated in the same way as all the other programmes described on the site.</p>

<p>Michael&#8217;s point, of course, is that TV programmes have a hell of a lot of similarities with each other. They all have episodes and cast members and may have trailers or be available on iPlayer. When the BBC models them in the same way, they gain enormous efficiencies in their ability to store and access information about programmes: they can reuse code, share content between programmes, and perform analyses over the aggregated data set. It&#8217;s great for users too: they get the same fantastic user experience no matter which programme they are viewing information about, and can apply the experience they gain when navigating pages about one programme when they need to find information about another.</p>

<p>The ability to classify and categorise, to bring order to what seems like chaos, to create a model of the world, is one of the things that marks humans out from animals. We can look at a hundred people, with different colour hair and skin; different height and build; smiling, talking, crying, and still call them all Person because the essential characteristics that govern how we interact with them are the same.</p>

<p>But if there&#8217;s one thing that the last five long, hard years working with legislation has taught me, it&#8217;s that in any vaguely interesting domain, this search for order will always fall down in the face of reality.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p><em>Disclaimer: As usual, this post contains my personal opinion and does not reflect that of any organisation with which you might associate me.</em></p>

<p>The other day, I had a lovely conversation with some folks from the BBC about some of their future plans. In the course of the conversation, <a href="http://smethur.st/">Michael Smethurst</a> spoke about his frustration when dealing with people involved with particular <a href="http://www.bbc.co.uk/programmes">programmes at the BBC</a>, where every single one of them thinks their programme is a &#8220;precious snowflake&#8221;, completely unique, that simply can&#8217;t be treated in the same way as all the other programmes described on the site.</p>

<p>Michael&#8217;s point, of course, is that TV programmes have a hell of a lot of similarities with each other. They all have episodes and cast members and may have trailers or be available on iPlayer. When the BBC models them in the same way, they gain enormous efficiencies in their ability to store and access information about programmes: they can reuse code, share content between programmes, and perform analyses over the aggregated data set. It&#8217;s great for users too: they get the same fantastic user experience no matter which programme they are viewing information about, and can apply the experience they gain when navigating pages about one programme when they need to find information about another.</p>

<p>The ability to classify and categorise, to bring order to what seems like chaos, to create a model of the world, is one of the things that marks humans out from animals. We can look at a hundred people, with different colour hair and skin; different height and build; smiling, talking, crying, and still call them all Person because the essential characteristics that govern how we interact with them are the same.</p>

<p>But if there&#8217;s one thing that the last five long, hard years working with legislation has taught me, it&#8217;s that in any vaguely interesting domain, this search for order will always fall down in the face of reality.</p>

<!--break-->

<p>Surely, I thought in my naive early days, every piece of legislation is uniquely identified through its type, calendar year, and number? Not so! There are six items for which this is not the case, because prior to 1963 legislation was numbered based on the year of reign of the monarch rather than the calendar year.</p>

<p>Surely the year that is used to number legislation is dependent on the date it is made and written into law? Not so! Sometimes departments forget to register legislation they make until the following year, so it is numbered the year after it&#8217;s made.</p>

<p>Surely an item of legislation can only make changes to legislation from the day it is written into law? Not so! There is, rarely, legislation that rewrites history: that says other legislation should always have had different content to that which was originally written.</p>

<p>It has come to the point where I never (hah!) make any statements about legislation of the form &#8220;X never happens&#8221; or &#8220;Y is always true&#8221; because there is always, <em>always</em>, an exception.</p>

<p>What this has taught me, as a developer, is the power and necessity of escape hatches. For example, templating languages that provide a method of escaping to code are so much more valuable than those that do not. Similarly, I favour strongly, in the technologies that I use, the ability to extend a common data structure, be it through <code>data-*</code> attributes in HTML, through generic elements such as <code>&lt;span&gt;</code> and <code>&lt;div&gt;</code> or through the essentially open-ended nature of RDF as a data model.</p>

<p>It has also given me a very different view of the world to Michael. Because when you accept that there are always exceptions, you do not see snowflakes as merely crystals of water, but as exceptional, beautiful and, yes, immensely precious.</p>

<p>And this is why I love the web. The web does not force every site to have the same structure or the same look and feel. It does not insist on consistency; it has space for every quirk. And it proves beyond all doubt that it is possible for all these precious snowflakes to exist in a single, global, interlinked information system in which people manage to find not only the information that they need, but also community and connection with each other.</p>

<h2>Inside Government</h2>

<p>So it is with these eyes that I look at the new <a href="https://www.gov.uk/government">Inside Government</a> pages on the <a href="https://www.gov.uk/">gov.uk site</a> and am frankly horrified. Because we&#8217;re not just talking about BBC programmes here, but about <a href="https://www.gov.uk/government/organisations">powerful institutions</a>, many of them decades if not centuries old, that lie at the very heart of government and how our nation is run. And each of them is relegated to a subfolder of a subfolder of a subfolder, their unique histories and approaches and goals expressed through three pictures on a carousel.</p>

<p>It feels like some kind of Orwellian nightmare: the <a href="http://digital.cabinetoffice.gov.uk/2012/01/31/this-is-why-we-are-here/">relentless focus on user needs</a> leading to a future of identikit pages, with no individuality, no character, no clue that behind these pages &#8212; which, remember, under the <a href="https://www.gov.uk/government/policies/launching-the-single-domain">Single Government Domain policy</a> become the single authoritative view, <em>the</em> site that represents the department on the web &#8212; is a living and breathing institution that manages hugely important parts of our lives. A future in which what each department says and the way that it says it is governed through the <a href="http://digital.cabinetoffice.gov.uk/">Government Digital Service (GDS)</a>, in <a href="http://www.cabinetoffice.gov.uk/">Cabinet Office</a>, the hand of the prime minister. And <a href="http://digital.cabinetoffice.gov.uk/2012/03/07/does-local-government-need-a-local-government-digital-service/">now we&#8217;re talking about local government too</a>?</p>

<p>Let us just look at one example. Last September, <a href="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">William Hague gave a speech</a> in which he described the hollowing out of the <a href="http://www.fco.gov.uk/">Foreign and Commonwealth Office (FCO)</a> by the previous government, a process that scrapped its language school, closed embassies and destroyed its library. He said:</p>

<blockquote cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">
Strong institutions are necessary in civil society, to encourage participation and keep in check an overmighty State; they are necessary to our judiciary and Parliament so that the law is upheld and the making of it respected; but they are also necessary within the State, a point tragically overlooked by those Prime Ministers who have created and abolished departments on a fancy or a whim, destroying as they did so the pride and continuity of thousands of public servants while rendering government incomprehensible to the average citizen. The whole country should know what the Foreign and Commonwealth Office is and what it does, and all those interested in foreign policy at home or abroad should see it as a centre of excellence with which they aspire to be associated.
</blockquote>

<p>For most UK citizens, the only point of access to the Foreign and Commonwealth Office is its website: they will not visit <a href="http://www.fco.gov.uk/en/about-us/our-history/our-buildings/buildings-in-uk/king-charles-street/">King Charles Street</a>, nor any of the <a href="http://www.fco.gov.uk/en/travel-and-living-abroad/find-an-embassy/">UK&#8217;s embassies</a>. The department&#8217;s web presence is the only way that it makes itself, and its unique role, comprehensible to the average citizen, the only method of letting the whole country know what the FCO is and what it does. And they have content that is completely unique to them: a <a href="http://www.fco.gov.uk/en/treaties/search">database of Treaties</a>, a hugely rich set of information on <a href="http://www.fco.gov.uk/en/travel-and-living-abroad/">travel and living abroad</a> and a <a href="http://www.fco.gov.uk/en/about-us/our-history/">wealth of historical information about the Foreign Office</a>. This simply doesn&#8217;t fit in a model of a department as a set of Ministers, Policies, Publications and so on. And if it doesn&#8217;t fit, will it simply be excluded, lost from its website like its language school, its embassies, its library?</p>

<p>I could have picked any government department here &#8212; each one has its unique characteristics and content &#8212; but Hague articulates the case around FCO so well. His message is not the expression of a simple conservative impulse to resist change and preserve the status quo, but about maintaining the integrity of an institution&#8217;s identity and independence <q cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">to encourage participation and keep in check an overmighty State</q>. If we believe in Open Government, Open Democracy and the power of the web to enhance civic engagement then we must, surely, enable each of these institutions to have their own independent voice on the web.</p>

<p>I am reminded of the <a href="http://xkcd.com/">XKCD</a>:</p>

<p style="text-align: center;">
<a href="http://xkcd.com/773/"><img src="http://imgs.xkcd.com/comics/university_website.png" /></a>
</p>

<p>The two sides of this Venn diagram illustrate two approaches to building a website for an organisation. On the left is the website as an expression of the identity of the institution, on the right the website as a means of satisfying the reason the user originally visited the site. My argument is not that the right side of this diagram is unimportant &#8212; in fact I believe it is absolutely essential &#8212; but that an institution&#8217;s website must cover the entire space: it must provide a mechanism for self-expression as well as catering for its users&#8217; requirements. To enhance civic engagement, we do not need to simply answer the query that led the user to the site, but to encourage and lead them on to see more about the institution that has provided the answer.</p>

<p>It is only the institution itself that knows the self it wants to express, and because the real world is complex and organisations are unique, that self will not fit into any model that we devise. News, Policies, Consultations &#8212; of course these are all important to all departments, but they are the tip of an iceberg. Look at the space that <a href="http://www.fco.gov.uk/en/about-us/our-history/">FCO devotes to its history on its website</a>: this shouts to the world the kind of reliable, solid and flexible organisation that they are and want to remain. Compare how <a href="http://www.decc.gov.uk/en/content/cms/statistics/statistics.aspx">DECC devotes space to statistics</a>, emphasising its adherence to transparency and evidence-based policy. Self-expression is so much more than changing logos or backgrounds, more than having different content on an About page, it is about making space for the things that are important to <em>you</em>.</p>

<p>&#8220;But but but!&#8221; I know the arguments. We must cut costs, stop the uncontrolled proliferation of government websites; we must improve the quality of the government&#8217;s presence on the web, present a unified view, make it easy for users to locate content without knowing where to look. The vision we see expressed through Inside Government is but the natural conclusion, the end of that slippery slope. But it is the great <a href="https://en.wikipedia.org/wiki/Slippery_slope_fallacy">slippery slope fallacy</a> that everything must be taken to its natural conclusion, that because 750 websites is too many, one is enough.</p>

<p>Possibly the biggest irony of the gov.uk beta is that while it is delivering a Single Government Domain &#8212; everything is to be found under <code>www.gov.uk</code> &#8212; it does not seem to address the core reason stated for providing it. In <a href="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">Martha Lane Fox&#8217;s letter to Francis Maude</a>, which kicked off this whole endeavour, she said:</p>

<blockquote cite="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">
Government publishes millions of pages on the Web, via hundreds of different websites. Most of these sites are still run as silos within departments. This fragmentation leads to significant duplication of functions and technology, and means the overall user experience is highly inconsistent.
</blockquote>

<p>Try <a href="https://www.gov.uk/search?q=Single+Government+Domain">searching for &#8216;Single Government Domain&#8217; on the main gov.uk site</a> or for <a href="https://www.gov.uk/government/search?q=driving+test+centre">driving test centres on Inside Government</a>. The searches do not work (though Inside Government does give you a link that enables you to search the other silo). The result pages are completely different in feel except for the top and bottom banners. The page on <a href="https://www.gov.uk/arrest-imprison-abroad">Arrest and Imprisonment Abroad</a> mentions but does not link to the <a href="https://www.gov.uk/government/organisations/foreign-and-commonwealth-office">Foreign and Commonwealth Office&#8217;s page</a>. Yes, yes, I know that it&#8217;s still beta, but these things lie at the heart of the stated rationale for a Single Government Domain: is this the extent of the consistency and integration that we are aiming for?</p>

<p>Yes, it is, because the Single Government Domain policy was never truly about either of these things. Read Martha Lane Fox&#8217;s letter again carefully (my emphasis):</p>

<blockquote cite="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">
<strong>No10 feel</strong> it is preferable to go from 750 top level website domains (eg www.cabinetoffice.gov.uk) to a single top level website domain for all of central government.
</blockquote>

<p>The Single Government Domain policy, indeed GDS itself, is about control. It is &#8220;<a href="http://digital.cabinetoffice.gov.uk/about/">we will do it for you</a>&#8221;, not &#8220;we will help you do it&#8221;. It is about managing the output of institutions that might <q cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">keep in check an overmighty State</q>. It is anti-web and it is anti-democracy and I cannot remain quiet about it any longer.</p>

<p>To my friends at GDS: I respect and admire you all. You are incredibly talented and able to do amazing things. You have behind you a level of financial and political support the like of which most civil servants will never see. I know you have joined GDS not just to do work that you love but to do good for the country. This is my plea to you: find a way to avoid this vision. Nurture the exceptions. Give institutions their voice. Treat them as precious snowflakes.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Microdata and RDFa Living Together in Harmony</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/165" />
    <id>http://www.jenitennison.com/blog/node/165</id>
    <published>2011-08-20T17:39:11+01:00</published>
    <updated>2012-05-19T11:55:24+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>

<!--break-->

<p>Please treat this as a draft on which I&#8217;d welcome comments. I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHATWG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. Plus the specs are changing all the time. I have considered here only the syntax of the two languages, not features such as DOM APIs or drag-and-drop support, where there are also clear differences.</p>

<p>Please add comments if there are things that I&#8217;ve missed or got wrong, or just to have your say.</p>

<p><a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> and <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> have both proved invaluable for testing &#8212; thank you to both for making these services available. I heartily recommend them.</p>

<h2>Mapping Rules</h2>

<p>The first problem is how to judge equivalence when microdata and RDFa have different data models. Microdata essentially uses a JSON data model: there are objects (items) with properties that have values that are strings, other objects, or arrays of strings or objects or both. RDFa naturally uses an RDF data model: there are resources with properties that have values that are literals (of some datatype or with a language) or other resources.</p>

<p>Underlying both is the same basic entity-attribute-value pattern, but there are various mismatches between the models that make some mappings more complicated than others, or in other cases mean that information is necessarily lost on conversion.</p>

<p>In performing the analysis, I&#8217;ve tried to map microdata into sensible RDF and then match that RDF output using RDFa, and to map RDFa into sensible microdata+JSON and then match that microdata+JSON using microdata. The microdata-to-RDF mapping rules that I&#8217;ve followed are basically those outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>. To create microdata JSON from RDFa, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item.</p>
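
<p>As a rough illustration of the RDFa-to-microdata rule (the person and URIs here are invented, and this is one reading of the rule rather than a formal definition), RDFa such as:</p>

<pre><code>&lt;div about="http://example.org/people/alice#me" typeof="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span property="http://xmlns.com/foaf/0.1/name"&gt;Alice&lt;/span&gt;
  &lt;a rel="http://xmlns.com/foaf/0.1/homepage" href="http://example.org/"&gt;home page&lt;/a&gt;
&lt;/div&gt;
</code></pre>

<p>would be matched by microdata in which the <code>http://xmlns.com/foaf/0.1/</code> namespace taken from the type has been stripped from the property names, and in which the home page, having no properties of its own, is a plain URL value rather than a nested item:</p>

<pre><code>&lt;div itemscope itemid="http://example.org/people/alice#me" itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span itemprop="name"&gt;Alice&lt;/span&gt;
  &lt;a itemprop="homepage" href="http://example.org/"&gt;home page&lt;/a&gt;
&lt;/div&gt;
</code></pre>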

<p>These rules need to be formalised, obviously, but the basics above work well enough for the examples from the specs.</p>

<h2>Mismatched Features</h2>

<p>The following features are problematic when mapping from microdata to RDFa or vice versa. I&#8217;ve ordered them roughly from problems that could be addressed relatively easily by changing one or other specification to those where the necessary changes would be difficult to make in the specs. In the latter cases, publishers and consumers need to be aware of the issue so that they can make an educated choice about how they proceed.</p>

<h3>Local Property Names</h3>

<p>Many of the microdata examples involve items with no type and local property names. I&#8217;ve assumed in the analysis below that this generates properties whose URI is based on the document in which they are found, but this is not a helpful solution for data sharing: if a whole site uses short property names across its pages, those properties really need to be recognised as being the same across the site for any kind of useful processing to occur.</p>

<p>What microdata actually creates here is a global namespace, shared by everyone, specifically for embedded data. There are three things that could be done at different levels here:</p>

<ol>
<li><p>In a mapping from microdata to RDF, any short property names on items that don&#8217;t have a type could be assigned to a global namespace (eg <code>http://w3.org/ns/global/</code>). Of course there will be clashes in semantics within this namespace, but that is true in microdata generally and not having to create a new namespace makes the initial experimentation easier for those starting with embedded data. The W3C (or whoever operates the namespace) could operate a wiki at that location that would operate as an informal registry for the property names.</p></li>
<li><p>HTML+RDFa could change to use this global namespace as the default vocabulary URI (rather than not having one). This would make it a little easier for people to convert microdata to RDFa: if they don&#8217;t use types for their items, there would then be no need for a <code>vocab</code> attribute to be added to the HTML. It also makes it possible to use RDFa in a basic, lightweight way, which might help people get started with it.</p></li>
<li><p>Publishers can be advised to use <code>itemtype</code> within their microdata, reusing existing classes or creating their own, if they want to ensure that the embedded data within their pages isn&#8217;t misinterpreted by global consumers.</p></li>
</ol>
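
<p>To make the third option concrete, here is a small sketch (the <code>http://example.org/vocab/Book</code> type is hypothetical). An untyped item leaves every consumer to guess what <code>title</code> means:</p>

<pre><code>&lt;div itemscope&gt;
  &lt;span itemprop="title"&gt;Interim Report&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>whereas adding an <code>itemtype</code> scopes the same short name to a particular vocabulary:</p>

<pre><code>&lt;div itemscope itemtype="http://example.org/vocab/Book"&gt;
  &lt;span itemprop="title"&gt;Interim Report&lt;/span&gt;
&lt;/div&gt;
</code></pre>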

<h3>Interpretation of <code>&lt;time&gt;</code> Element&#8217;s <code>datetime</code> Attribute</h3>

<p>Interpreting the <code>datetime</code> attribute of the <code>&lt;time&gt;</code> element to supply a value, rather than repeating that value in a <code>content</code> attribute, is <a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a> on RDFa, and hopefully RDFa will be changed to use that value (or the content of the element if there is no <code>datetime</code> attribute), add a seconds component if necessary, and work out an appropriate date/time datatype for it based on its syntax.</p>
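
<p>A sketch of the difference, assuming the <code>dc:</code> and <code>xsd:</code> prefixes are built in and using an illustrative <code>created</code> property: in microdata the <code>datetime</code> attribute supplies the value on its own,</p>

<pre><code>&lt;time itemprop="created" datetime="2009-03-17"&gt;17 March 2009&lt;/time&gt;
</code></pre>

<p>while RDFa currently needs the value repeated in <code>content</code>, along with an explicit datatype:</p>

<pre><code>&lt;time property="dc:created" content="2009-03-17" datatype="xsd:date"
  datetime="2009-03-17"&gt;17 March 2009&lt;/time&gt;
</code></pre>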

<h3>Content Overrides</h3>

<p>In RDFa, publishers can provide a machine-readable version of the content of an element (or even an entirely different value) using the <code>content</code> attribute. This can only be done for date/times in microdata. The ability to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240">annotate non-date/time content with machine-readable values</a> is a current issue on HTML5. Resolving this in favour of providing such annotation would make using RDFa and microdata in concert, or converting between them, easier, particularly if HTML5 uses the attribute <code>content</code> or RDFa adopts whatever attribute is introduced to HTML5.</p>
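
<p>For example (with a hypothetical <code>pageCount</code> property), RDFa can attach a machine-readable value directly to the human-readable text:</p>

<pre><code>&lt;span property="http://example.org/vocab/pageCount" content="320"&gt;three hundred and twenty pages&lt;/span&gt;
</code></pre>

<p>One possible workaround in microdata, which relies on the <code>&lt;meta&gt;</code>-in-flow-content support discussed in the next section, is to leave the visible text unannotated and carry the value separately:</p>

<pre><code>&lt;span&gt;three hundred and twenty pages &lt;meta itemprop="pageCount" content="320"&gt;&lt;/span&gt;
</code></pre>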

<h3><code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> Elements in Flow Content</h3>

<p>The ability to <a href="http://dev.w3.org/html5/md/Overview.html#content-models">use <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements in flow content</a> is only supported in microdata: it&#8217;s support that&#8217;s added by the microdata specification (in the Editor&#8217;s Draft since May 31st; the text allowing this didn&#8217;t make it into the Last Call version of the spec), in which it&#8217;s limited to <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements with an <code>itemprop</code> attribute. </p>

<p>It would be possible for the RDFa specification to similarly make the statement that <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements are allowed in flow content as long as they have particular attributes. This would ease the transition between the two formats, and works a lot better than empty <code>&lt;span&gt;</code> elements which crop up fairly commonly in RDFa content.</p>

<p>(One oddity here is that because date/time values have to be on a <code>&lt;time&gt;</code> element in microdata, publishers cannot replace empty <code>&lt;time&gt;</code> elements with <code>&lt;meta&gt;</code> elements as they might an empty <code>&lt;span&gt;</code>.)</p>

<h3>Identifiers without Types</h3>

<p>Many of the RDFa examples are of resources that have a URI identifier but for which no type is supplied. Microdata, on the other hand, states that <code>itemid</code> is only allowed on elements that also have an <code>itemtype</code> (and an <code>itemscope</code>). The reason given is because the <code>itemid</code> needs to be interpreted based on the <code>itemtype</code>. This would be understandable if it held a string, but given that the <code>itemid</code> provides a URI it seems a bit strange. Perhaps it&#8217;s an attempt to avoid the whole <a href="http://www.jenitennison.com/blog/node/159">httpRange-14 / ambiguity in URIs issue</a>.</p>

<p>If this restriction remains, the advice to RDFa users who might want to convert to microdata at a future date would be to always provide a type for your (non-blank-node) resources. It may be useful to define a <code>http://w3.org/ns/global/Thing</code> within the vocabulary that I propose above, given that the URI for <code>rdfs:Resource</code> is long and hard to recall.</p>
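
<p>As a sketch of what that advice means in practice (the person is invented, and <code>http://w3.org/ns/global/Thing</code> is only the class proposed above, not something that exists), RDFa can describe an identified but untyped resource:</p>

<pre><code>&lt;div about="http://example.org/people/alice#me"&gt;
  &lt;span property="http://xmlns.com/foaf/0.1/name"&gt;Alice&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>but the equivalent microdata needs some type before <code>itemid</code> is allowed:</p>

<pre><code>&lt;div itemscope itemid="http://example.org/people/alice#me" itemtype="http://w3.org/ns/global/Thing"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Alice&lt;/span&gt;
&lt;/div&gt;
</code></pre>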

<h3>Built-in Prefixes</h3>

<p>The built-in <a href="http://www.w3.org/profile/rdfa-1.1">profile for RDFa</a> defines a number of prefixes for vocabularies that are either coined by the W3C or coined elsewhere but in common use on the web. This, coupled with <code>vocab</code> and the ability to directly use URIs in the relevant attributes, means that declaring prefixes within the document is increasingly unnecessary in RDFa.</p>

<p>In contrast, using existing vocabularies, even popular ones, within microdata is relatively difficult, particularly when vocabularies are mixed on the same item.</p>
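
<p>A small comparison, assuming <code>foaf:</code> and <code>dc:</code> are among the built-in RDFa prefixes. Mixing two vocabularies on one resource is compact in RDFa:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="foaf:Image"&gt;
  &lt;span property="dc:title"&gt;Picture of Mark&lt;/span&gt;
  &lt;span property="dc:creator"&gt;Mark Birbeck&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>whereas in microdata every property that doesn&#8217;t belong to the item&#8217;s own vocabulary has to be spelled out as a full URI:</p>

<pre><code>&lt;div itemscope itemid="photo1.jpg" itemtype="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span itemprop="http://purl.org/dc/terms/title"&gt;Picture of Mark&lt;/span&gt;
  &lt;span itemprop="http://purl.org/dc/terms/creator"&gt;Mark Birbeck&lt;/span&gt;
&lt;/div&gt;
</code></pre>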

<p>Most useful for publishers would be if both RDFa and microdata recognised the same set of prefixes. This would reduce the size of microdata created from existing RDFa content as well as making it easier to move between the languages. At the very least, it would be good to have <code>rdf:</code>, <code>rdfs:</code>, <code>xsd:</code> and <code>xhv:</code> built into both.</p>

<p>The list of popular vocabularies is likely to change over time; for example a prefix for the schema.org vocabulary might be useful at some point in the near future. The problem is that publishers and consumers need to be synchronised in their use of prefixes: it&#8217;s no good for a publisher to use the prefix <code>sch:</code> if there might be processors for the page that don&#8217;t recognise it. Equally, consumers shouldn&#8217;t be reliant on a network connection to retrieve the latest set of prefix mappings in order to parse the page. It&#8217;s not clear to me how best to manage this evolution, but even a fixed set of prefixes at the point the specs reach Recommendation is more usable than spelling out URIs all the time.</p>

<h3>Literals Including Markup</h3>

<p>RDFa supports literals that include markup (the <code>innerHTML</code> of an element) as well as those that don&#8217;t (the <code>textContent</code> of an element), whereas microdata only supports creating values from particular attributes or the <code>textContent</code> of the element. This makes it hard to create embedded microdata that includes values which contain things like mathematical or chemical formulae, ruby text, or multiple paragraphs.</p>

<p>A solution would be for microdata to introduce an <code>itemhtml</code> (or something) attribute that, when present, indicates that the value of the property should include markup. There is a current issue on microdata to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468">support HTML values</a>.</p>
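
<p>To illustrate with a hypothetical <code>formula</code> property: in RDFa, asking for an XML literal keeps the subscript markup in the value,</p>

<pre><code>&lt;div about="#water"&gt;
  &lt;span property="http://example.org/vocab/formula" datatype="rdf:XMLLiteral"&gt;H&lt;sub&gt;2&lt;/sub&gt;O&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>while the microdata equivalent takes only the <code>textContent</code>, so the value comes out as the plain string <code>H2O</code>:</p>

<pre><code>&lt;div itemscope itemid="#water" itemtype="http://example.org/vocab/Compound"&gt;
  &lt;span itemprop="formula"&gt;H&lt;sub&gt;2&lt;/sub&gt;O&lt;/span&gt;
&lt;/div&gt;
</code></pre>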

<h3>Itemref</h3>

<p>RDFa can support a subset of <code>itemref</code>&#8217;s functionality, namely to have properties defined elsewhere in a document be associated with a given resource. What it doesn&#8217;t support is the sharing of properties defined in one place by two or more resources.</p>

<p>RDFa could add such support by adding an attribute that mirrors <code>itemref</code> (eg <code>ref</code>, I guess), with the referenced element being processed using the <a href="http://www.w3.org/TR/rdfa-core/#evaluation-context">evaluation context</a> inherited by the referencing element (which means that attributes such as <code>vocab</code> would sometimes have a scope that wasn&#8217;t based on the document tree). This would make it easier to tackle the use case for <code>itemref</code> using RDFa as well as making it easier to move between or mix RDFa and microdata.</p>
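
<p>The sharing case looks something like this in microdata (an invented example): both items pick up the same <code>license</code> property from a paragraph that sits outside either of them.</p>

<pre><code>&lt;p id="licence"&gt;Licence:
  &lt;a itemprop="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;CC BY 2.0&lt;/a&gt;
&lt;/p&gt;

&lt;div itemscope itemref="licence"&gt;
  &lt;span itemprop="title"&gt;Photo one&lt;/span&gt;
&lt;/div&gt;

&lt;div itemscope itemref="licence"&gt;
  &lt;span itemprop="title"&gt;Photo two&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>In RDFa as it stands, the licence link would have to be repeated for each photo.</p>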

<h3>Lists</h3>

<p>It is easy for microdata to represent a property with a list of values, and really really hard to do the same in RDFa. This is in part because RDF views lists as resources rather than a distinct data type, and in part because RDFa hasn&#8217;t added any syntax sugar to make creating <code>rdf:List</code> resources easy. Adding some syntax sugar for lists would make life a lot easier for anyone using RDFa, but especially if they are adapting existing microdata content to RDFa.</p>
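
<p>A sketch of the gap, using a made-up <code>Playlist</code> vocabulary at <code>http://example.org/vocab/</code>. In microdata, repeating the property is enough, and the values come out in document order:</p>

<pre><code>&lt;ol itemscope itemtype="http://example.org/vocab/Playlist"&gt;
  &lt;li itemprop="track"&gt;First song&lt;/li&gt;
  &lt;li itemprop="track"&gt;Second song&lt;/li&gt;
&lt;/ol&gt;
</code></pre>

<p>To get an ordered <code>rdf:List</code> out of RDFa, the list structure has to be built by hand with nested &#8220;hanging rel&#8221; elements, which should look roughly like this:</p>

<pre><code>&lt;div vocab="http://example.org/vocab/" typeof="Playlist"&gt;
  &lt;div rel="track"&gt;
    &lt;div typeof="rdf:List"&gt;
      &lt;span property="rdf:first"&gt;First song&lt;/span&gt;
      &lt;div rel="rdf:rest"&gt;
        &lt;div typeof="rdf:List"&gt;
          &lt;span property="rdf:first"&gt;Second song&lt;/span&gt;
          &lt;span rel="rdf:rest" resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/span&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>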

<h3>Datatypes</h3>

<p>Microdata assumes that consumers will convert values to appropriate datatypes based on the property (which they understand) as a separate process after microdata processing, whereas RDFa supports the use of a <code>datatype</code> attribute to explicitly indicate the datatype of each value. This mismatch means that information is lost when RDFa is converted to microdata, and has to be added when microdata is converted to RDFa.</p>

<p>Bringing the languages completely into sync would mean either microdata adding a facility to support (at least some) datatypes, or deprecating the <code>datatype</code> attribute in RDFa. Alternatively, this may simply be an area where the differences in behaviour between the two specifications don&#8217;t matter because the data models that they use are distinct anyway.</p>
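
<p>For instance, with a hypothetical <code>height</code> property, RDFa can say that the value is a decimal:</p>

<pre><code>&lt;span property="http://example.org/vocab/height" datatype="xsd:decimal"&gt;1.75&lt;/span&gt;
</code></pre>

<p>whereas the microdata version yields just the string <code>1.75</code>, and it is up to a consumer that knows the vocabulary to treat it as a number:</p>

<pre><code>&lt;span itemprop="height"&gt;1.75&lt;/span&gt;
</code></pre>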

<h3>Languages</h3>

<p>Languages are similar to datatypes, in that RDF (and hence RDFa) supports annotating strings with the language that they are in whereas microdata doesn&#8217;t within its core data model or its JSON serialisation. However, the elements that represent properties within the HTML, used within the DOM API access to microdata, will have a language.</p>

<p>It may be that in practice consumers need to base their microdata processing on the DOM API rather than the core microdata data model or JSON extracted through a standalone process, and thus pick up the language from the property elements; I don&#8217;t know. In any case, the microdata JSON serialisation, used for drag-and-drop, is lossy and could be extended to include the language of each value when available, at fairly substantial complexity cost.</p>

<p>For publishers, it doesn&#8217;t much matter either way; if they are dealing with multi-lingual text they will want to include a <code>lang</code> attribute in the HTML anyway, regardless of the impact on embedded data.</p>

<h3>Multiple Types</h3>

<p>RDFa supports having multiple types named in the <code>typeof</code> attribute whereas microdata only supports one type per item. In any mapping from RDFa to microdata, publishers have to choose which type is the primary type for the item and move the others to be expressed via <code>rdf:type</code> properties. Consumers who want to support publishers who might not choose their type as the primary type have to detect items that have the type they are interested in within the <code>rdf:type</code> property as well as those which have the type as the main type. Given that the <code>rdf:type</code> URI is long and (naturally) associated with RDF, it might be better to define a property such as <code>http://w3.org/ns/global/type</code> for this use.</p>

<p>Microdata could be extended to allow multiple values in the <code>itemtype</code> attribute, with the first being used to interpret any properties that aren&#8217;t full URIs. This would make it easier both for consumers to detect when a type they were interested in was used and for publishers to use RDFa and microdata in tandem or move between them.</p>
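
<p>As an illustration of the current workaround (the resource is invented, and <code>foaf:</code> is assumed to be a built-in prefix), an RDFa resource with two types:</p>

<pre><code>&lt;div about="#me" typeof="foaf:Person http://schema.org/Person"&gt;
  &lt;span property="foaf:name"&gt;Alice&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>has to pick one primary type in microdata and demote the other to an <code>rdf:type</code> property:</p>

<pre><code>&lt;div itemscope itemid="#me" itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" href="http://schema.org/Person"&gt;
  &lt;span itemprop="name"&gt;Alice&lt;/span&gt;
&lt;/div&gt;
</code></pre>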

<h3>The <code>src</code> Attribute</h3>

<p>RDFa and microdata interpret the <code>src</code> attribute in opposite ways. In RDFa, it provides the identifier for a new resource (equivalent to <code>itemid</code> in microdata); in microdata, it provides a URL value of a property on elements that support it (equivalent to <code>resource</code> or <code>href</code> in RDFa).</p>

<p>RDFa interprets <code>src</code> in this way to make it easier to make assertions about an image, but it&#8217;s of limited effect as even in RDFa it&#8217;s only possible to make three such assertions (through the <code>typeof</code>, <code>rel</code> and <code>property</code> attributes). So, for example, you can specify the type of the image, link to its license and give the name of its creator, with:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"&gt;
</code></pre>

<p>but this won&#8217;t help you if you <em>also</em> want to give the title for the image and when it was created (say). At that point, the microdata and RDFa start to look similar:</p>

<pre><code>&lt;div itemscope itemid="photo1.jpg" itemtype="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;link itemprop="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;
  &lt;time itemprop="http://purl.org/dc/terms/created" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg" typeof="foaf:Image"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"&gt;
&lt;/div&gt;
</code></pre>

<p>and really, to make the markup consistent, you may as well not use the <code>src</code> of the image at all in the RDFa either:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span rel="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>So it&#8217;s not clear to me that interpreting the <code>src</code> attribute as the subject of triples offers such a huge advantage that it&#8217;s worth the inconvenience that it brings for the simple things, such as having to use:</p>

<pre><code>&lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
</code></pre>

<p>rather than:</p>

<pre><code>&lt;img property="image" src="google-logo.png" alt="Google"&gt;
</code></pre>

<h3>Link relations</h3>

<p>This isn&#8217;t so much a clash between RDFa and microdata as between the interpretation that RDFa has for the <code>rel</code> attribute and that specified in HTML.</p>

<p>The built-in <code>rel</code> values in HTML are a bit of a mix. Some of them, like <code>alternate</code>, <code>prev</code> and <code>next</code> encode relationships between the document in which the link appears and another document. Others, such as <code>bookmark</code> and <code>help</code>, create relationships between the context in which the link is found and the referenced document. Still others, like <code>nofollow</code>, <code>noreferrer</code> and <code>prefetch</code>, are really instructions to the client about how to manage the act of traversing the link.</p>

<p>It doesn&#8217;t seem semantically correct to automatically create relationships based on the built-in HTML <code>rel</code> values, unless you are deliberately trying to extract <a href="http://lin-clark.com/blog/two-meanings-semantics-html5"><em>document</em> semantics</a> from the page. This is a problem for RDFa, which reuses the <code>rel</code> attribute to provide property values for the embedded <em>data</em>.</p>

<p>One thing that could be done would be for RDFa to consistently use the <code>property</code> attribute everywhere rather than the <code>rel</code> attribute. This would not only ease the overloading but also reduce the confusion for users, who currently have to work out which attribute to use based on whether the value is a resource or a literal.</p>

<h2>Possible Subset of RDFa</h2>

<p>When mapping from microdata to RDFa, the only attributes that are really needed are:</p>

<ul>
<li><code>vocab</code> to define a vocabulary for the types and properties within its scope (not technically necessary, but keeps the markup simple compared to spelling out URIs for everything)</li>
<li><code>typeof</code> to define the type of a resource or indicate a new blank node</li>
<li><code>about</code> to provide a URI for a resource or a local identifier for a blank node</li>
<li><code>property</code> and <code>rel</code> to define property names (though see above for discussion about dropping <code>rel</code>)</li>
<li><code>href</code>, <code>src</code> and <code>content</code> to provide values (and <code>datetime</code> assuming that is supported)</li>
</ul>

<p>In the mappings in the analysis below, I did also use the <code>resource</code> attribute, but only to create a reference to a blank node that was described elsewhere, when replicating the functionality of <code>itemref</code>. If RDFa were to enable <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> in content in the same way as microdata, <code>resource</code> functionality could be replicated using <code>&lt;link&gt;</code>; as it is, you can get away with using an empty <code>&lt;a&gt;</code> element.</p>

<p>Similarly, I only used <code>datatype</code> when providing a datatype for date/time values, something that could be done automatically by RDFa. But this isn&#8217;t surprising given that microdata doesn&#8217;t support datatypes at all and the examples I was using for the mapping were from the microdata specification.</p>

<p>There was no need for:</p>

<ul>
<li><code>prefix</code> which defines prefixes to simplify references to properties and classes; this is hardly surprising as few of the microdata examples involved mixing namespaces, but it&#8217;s notable that the built-in prefixes of <code>rdf:</code> and <code>xsd:</code> were useful</li>
<li><code>profile</code> which is a pointer to an external document that defines a set of terms; this is being dropped from RDFa in any case</li>
</ul>

<p>I also kept to a simplified version of the syntax in which each property element only provided one value. This subset is basically:</p>

<ul>
<li>resource elements can have <code>about</code> (equivalent to <code>itemid</code>) and <code>typeof</code> (equivalent to <code>itemtype</code>) attributes on them</li>
<li>property elements can have <code>property</code> or <code>rel</code> (equivalent to <code>itemprop</code>), and a value-providing attribute on them such as <code>href</code> or <code>content</code></li>
<li>no element is both a resource element and a property element; to provide a property whose value is a resource, nest the resource element within the property element (using &#8220;hanging rel&#8221; processing)</li>
<li>no property element should provide more than one value for a property; in particular, a &#8220;hanging rel&#8221; should only have a single resource element child</li>
</ul>

<p>This simplified profile of RDFa is fairly easy to remember and maps easily to and from microdata: most attributes can be simply renamed; the only attribute that needs to be moved as well as renamed is the &#8220;hanging rel&#8221;, which moves onto the resource element and is renamed to <code>itemprop</code>. Note that it also means avoiding using the <code>src</code> attribute to encode embedded data.</p>
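
<p>A sketch of that renaming, using a hypothetical vocabulary at <code>http://example.org/vocab/</code>. RDFa written in the simplified profile:</p>

<pre><code>&lt;div vocab="http://example.org/vocab/" typeof="Person"&gt;
  &lt;span property="name"&gt;Alice&lt;/span&gt;
  &lt;div rel="address"&gt;
    &lt;div typeof="Address"&gt;
      &lt;span property="city"&gt;London&lt;/span&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>becomes the following microdata, where the &#8220;hanging rel&#8221; has moved down onto the nested item as its <code>itemprop</code>:</p>

<pre><code>&lt;div itemscope itemtype="http://example.org/vocab/Person"&gt;
  &lt;span itemprop="name"&gt;Alice&lt;/span&gt;
  &lt;div&gt;
    &lt;div itemprop="address" itemscope itemtype="http://example.org/vocab/Address"&gt;
      &lt;span itemprop="city"&gt;London&lt;/span&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>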

<p>In addition to sticking to this subset of attributes, developers might be advised that using HTML link relations may lead to clashes with browser or search engine interpretation of the links in the page.</p>

<h2>Possible Subset of Microdata</h2>

<p>Microdata is pretty minimalistic already. The only feature that developers need to be warned about is <code>itemref</code>, which has no RDFa equivalent at the moment.</p>

<h2>Guidelines for Vocabulary Authors</h2>

<p>There are several guidelines that come out of this comparison for people putting together vocabularies that aim to be usable in both RDFa and microdata:</p>

<ul>
<li>The classes in the vocabulary should be distinct, or subclasses created with any relevant combinations of superclasses, so that publishers don&#8217;t have to assign more than one type to an item/resource. This restriction helps with using the vocabulary with microdata, which assumes that every item has a single type.</li>
<li>Provide explicit classes for everything which you anticipate might be given an identifier, as microdata doesn&#8217;t (currently) enable items to have an identifier without also having a type.</li>
<li>Put classes and properties in the same namespace, but do not name classes and properties with the same local name; while this doesn&#8217;t matter in microdata because the properties are interpreted relative to the class, standard conversions to RDF will create a class and a property with the same URI. URIs are case-sensitive, so a simple way of ensuring that there aren&#8217;t clashes is to follow the usual RDF convention of beginning class names with an upper-case letter and property names with a lower-case letter.</li>
<li>Avoid property names that contain dots, as these aren&#8217;t allowed in non-URI property names in microdata.</li>
<li>Ensure that properties either only expect one type of value or expect values whose type can be sniffed based on the syntax of the value. If publishers use microdata, they will not be able to indicate the type of a value through the markup.</li>
<li>Be aware that consumers of microdata using your vocabulary will have to use the DOM API to identify the language used in any strings, and that language information won&#8217;t be carried through the standard microdata JSON serialisation (used by drag-and-drop, for example). If you anticipate multi-lingual use of your vocabulary, you may want to define a <code>MultiLingual</code> class with <code>value</code> and <code>language</code> properties that people can use as nested items. (It may be useful for this class and properties to be defined in the proposed &#8216;global&#8217; W3C namespace so that it can be used anywhere.) If you know what languages will be used then provide separate properties for each language (eg for UK legislation I know the languages are English and Welsh, so in a vocabulary for UK legislation I could have <code>title-en</code> and <code>title-cy</code> properties).</li>
<li>To make markup cleaner, only reuse properties from other vocabularies on your classes if they have built-in prefixes (eg unless <code>rdfs:</code> is built-in to microdata as well as RDFa, don&#8217;t use <code>rdfs:label</code> to provide a label, but create your own <code>label</code> property). On the other hand, do reuse classes from other vocabularies if you don&#8217;t need to add any specialised properties to them. Note that avoiding reuse has the unfortunate side-effect of not enabling processors that understand these other vocabularies to process your data.</li>
<li>Avoid having properties whose values need to be retrieved in order, as these are hard to represent in RDFa. Instead, use properties with distinct names when position is important. (Yes, I know this sucks.)</li>
</ul>
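
<p>As a sketch of the <code>MultiLingual</code> suggestion above (the <code>Book</code> type is hypothetical, and the <code>http://w3.org/ns/global/</code> namespace is only the one proposed earlier in this post, not something that exists), the nested items might look like:</p>

<pre><code>&lt;div itemscope itemtype="http://example.org/vocab/Book"&gt;
  &lt;div itemprop="title" itemscope itemtype="http://w3.org/ns/global/MultiLingual"&gt;
    &lt;span itemprop="value" lang="en"&gt;The Little Prince&lt;/span&gt;
    &lt;meta itemprop="language" content="en"&gt;
  &lt;/div&gt;
  &lt;div itemprop="title" itemscope itemtype="http://w3.org/ns/global/MultiLingual"&gt;
    &lt;span itemprop="value" lang="fr"&gt;Le Petit Prince&lt;/span&gt;
    &lt;meta itemprop="language" content="fr"&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>The language then travels as an ordinary property value, so it survives the JSON serialisation, at the cost of more verbose markup.</p>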

<h2>Choosing Between Microdata and RDFa</h2>

<p>The choices developers make between microdata and RDFa will, I suspect, be largely dictated by what their consumers/toolsets/publishers will support. Nevertheless, there are some features that are better supported by one or other format and might therefore sway developers one way or another:</p>

<ul>
<li><strong>multi-lingual embedded data</strong> is better supported in RDF than microdata+JSON</li>
<li><strong>explicit datatypes for values</strong> can be provided by RDFa but not microdata</li>
<li><strong>resources with multiple types</strong> are a lot easier to describe in RDFa</li>
<li><strong>property values that include markup</strong> are a lot easier to write in RDFa</li>
<li><strong>mixed vocabulary use</strong> is a bit easier in RDFa than in microdata</li>
<li><strong>HTML5 link relations</strong> may be misinterpreted by RDFa processors</li>
<li><strong>properties with list values</strong> are much easier to support in microdata</li>
<li><strong>common content</strong> adopted by multiple entities is much easier in microdata</li>
</ul>

<h2>Final Words</h2>

<p>I have no doubt that developers would be better off if there were only one recommended way of embedding data in HTML (so long as it met their requirements of course). But realistically that is, and always has been, a long shot, given the entrenched positions of the microdata and RDFa communities.</p>

<p>Regardless, there are lessons that RDFa and microdata could learn from each other, and changes to both languages that would help developers use them on their own, switch between them and mix them in the same document. I expect and welcome debate about the viability and effectiveness of the changes and guidelines that I&#8217;ve suggested here.</p>

<p>Investigating those lessons, documenting those changes and generating those guidelines was something that I had hoped the microdata/RDFa task force would be able to do. The other question to ask, given the argument that there shouldn&#8217;t be a task force at all if it&#8217;s not going to be able to bring the languages together, is whether this kind of analysis is worthwhile, and worth publishing as something more official than a blog post.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Mapping RDFa to Microdata</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/164" />
    <id>http://www.jenitennison.com/blog/node/164</id>
    <published>2011-08-20T17:38:38+01:00</published>
    <updated>2012-05-19T11:56:29+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata.  In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and aim to create that JSON with the microdata.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata.  In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and aim to create that JSON with the microdata.</p>

<!--break-->

<p>To create the microdata JSON, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item. Other than that I hope the mapping will be obvious; I&#8217;ll point out where it involves a loss of information. I&#8217;m assuming that the document is at <code>http://example.org/</code> throughout.</p>
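
<p>As a rough illustration of the naming rule described above, the following Python sketch derives a namespace from an item&#8217;s first type and strips it from matching property URIs. The function names, and the choice to cut the type URI at its last <code>#</code> or <code>/</code>, are assumptions made for illustration; neither specification defines this mapping.</p>

```python
def namespace_of(type_uri):
    """Return the type URI up to and including its last '#' or '/'."""
    cut = max(type_uri.rfind("#"), type_uri.rfind("/"))
    return type_uri[: cut + 1]


def short_name(property_uri, type_uri=None):
    """Strip the namespace of the item's first type from a property URI where possible."""
    if type_uri:
        namespace = namespace_of(type_uri)
        if namespace and property_uri.startswith(namespace):
            return property_uri[len(namespace):]
    # No type, or the property lives in another namespace: keep the full URI.
    return property_uri


ical_summary = short_name(
    "http://www.w3.org/2002/12/cal/ical#summary",
    "http://www.w3.org/2002/12/cal/ical#Vevent",
)
```

<p>Properties from other vocabularies, and properties on items without a type, keep their full URIs under this sketch.</p>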

<p>I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHAT WG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the RDFa specification plus one additional example from the wild. I haven&#8217;t included examples that don&#8217;t illustrate anything new, so there are some that are skipped. Other examples would be welcome.</p>

<h2>Page Metadata</h2>

<blockquote>
  <p>When parsing begins, the current subject will be the IRI of the document being parsed, or a value as set by a Host Language-provided mechanism (e.g., the base element in (X)HTML). This means that by default any metadata found in the document will concern the document itself:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this document that we&#8217;d want to create is (<strong>note: invalid example</strong>):</p>

<pre><code>{ "items": [
  {
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>and we&#8217;d want to create it with (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>However, this is not valid according to the microdata specification. In microdata, <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#attr-itemid">only items that have types are allowed to have identifiers</a>. Rather than losing the identifier, we&#8217;ll add a type; I&#8217;m going to use <code>rdfs:Resource</code>. It&#8217;s not the nicest of URIs to type, but it&#8217;s got something close to the correct semantics. So we&#8217;ll aim for the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>which means we need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
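
<p>The typing workaround used above can be sketched as a small normalisation step over the target JSON. This is only an illustration: the <code>ensure_typed</code> name is hypothetical, and falling back to <code>rdfs:Resource</code> is the choice made in this post rather than anything the microdata specification requires.</p>

```python
RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"


def ensure_typed(item):
    """Return a copy of the item that has a type whenever it has an id."""
    if "id" in item and "type" not in item:
        # Microdata only allows identifiers on typed items, so add a generic type.
        typed = {"type": RDFS_RESOURCE}
        typed.update(item)
        return typed
    return dict(item)


page = ensure_typed({
    "id": "http://example.org/",
    "properties": {"http://purl.org/dc/terms/creator": ["Jo"]},
})
```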

<p>Notes:</p>

<ul>
<li>The <code>itemscope</code> is necessary for the page to be recognised as containing any data at all.</li>
<li>The <code>itemid</code> can&#8217;t just be empty: the <code>.</code> is the shortest URI you can use to reference the page itself.</li>
<li>I put the <code>itemscope</code>, <code>itemtype</code> and <code>itemid</code> on the <code>&lt;head&gt;</code> element rather than the <code>&lt;html&gt;</code> element so that they wouldn&#8217;t be inherited into the <code>&lt;body&gt;</code>: it seems to make sense for any data within the <code>&lt;head&gt;</code> to be about the page itself.</li>
<li>The <code>foaf:</code> and <code>dc:</code> prefixes are built-in to RDFa, so it&#8217;s easy for people to use classes and properties in those common vocabularies without having to remember their full URI. In microdata, that URI and the one for the <code>rdfs:Resource</code> class have to be spelled out in full.</li>
</ul>
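
<p>To see how the relative references in this example resolve, here is a minimal check using Python&#8217;s standard <code>urllib.parse.urljoin</code>, assuming (as throughout this post) that the document lives at <code>http://example.org/</code>. The variable names are only illustrative.</p>

```python
from urllib.parse import urljoin

DOCUMENT_URI = "http://example.org/"  # the document location assumed in this post

# itemid="." resolves to the page itself.
page_id = urljoin(DOCUMENT_URI, ".")

# href="#bbq" resolves to a fragment within the page.
primary_topic = urljoin(DOCUMENT_URI, "#bbq")

print(page_id)
print(primary_topic)
```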

<h2>Base URI</h2>

<blockquote>
  <p>In (X)HTML the value of base may change the initial value of current subject:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This changes the id of the item generated:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.org/jo/blog" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://www.example.org/jo/blog#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

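<p>In the microdata, the <code>itemid</code> is resolved against the base URI set by the <code>&lt;base&gt;</code> element. Note that <code>.</code> no longer identifies the right resource here: resolved against <code>http://www.example.org/jo/blog</code> it gives <code>http://www.example.org/jo/</code>, so the <code>itemid</code> has to repeat the final path segment (<code>blog</code>) instead:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="blog"&gt;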
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<h2>Explicit Subjects / ItemIds</h2>

<blockquote>
  <p>To illustrate how this affects the statements, note in this markup how the properties inside the (X)HTML body element become part of a new calendar event object, rather than referring to the document as they do in the head of the document:</p>

<pre><code>&lt;html prefix="cal: http://www.w3.org/2002/12/cal/ical#"&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p about="#bbq" typeof="cal:Vevent"&gt;
      I'm holding
      &lt;span property="cal:summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;span property="cal:dtstart" content="2015-09-16T16:00:00-05:00" 
            datatype="xsd:dateTime"&gt;
        September 16th at 4pm
      &lt;/span&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>In microdata JSON, this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  } ,
  {
    "type": "http://www.w3.org/2002/12/cal/ical#Vevent" ,
    "id": "http://example.org/#bbq" ,
    "properties": {
      "summary": [ "one last summer barbecue" ] ,
      "dtstart": [ "2015-09-16T16:00:00-05:00" ]
    }
  }
]}
</code></pre>

<p>Note that this mapping loses the fact that the value of the <code>dtstart</code> property is a date-and-time. Processors of this JSON are expected to know that the <code>dtstart</code> property takes a date/time value and would have to sniff the value to work out that it&#8217;s a date-and-time rather than a date.</p>
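
<p>The kind of value sniffing described above might look something like the following Python sketch. The <code>sniff_kind</code> name and the regular expressions are assumptions for illustration; they only check the shape of the string and do not validate it as a real calendar date or time.</p>

```python
import re

# Shape checks only: these do not reject impossible dates such as month 13.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?"
)


def sniff_kind(value):
    """Guess whether a microdata JSON string is a date, a date-and-time or plain text."""
    if DATE.fullmatch(value):
        return "date"
    if DATE_TIME.fullmatch(value):
        return "dateTime"
    return "string"


dtstart_kind = sniff_kind("2015-09-16T16:00:00-05:00")
```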

<p>In-browser microdata processors can identify the value as a date/time value because the property element itself is accessed through the <code>element.properties</code> IDL attribute; processors that work with this DOM API can tell that it&#8217;s a <code>&lt;time&gt;</code> element, get hold of the date/time itself and access the content of the element for the human-readable representation used on the page. However, this information isn&#8217;t part of the core <a href="http://www.w3.org/TR/microdata/#the-microdata-model">microdata data model</a>.</p>

<p>To create this JSON from microdata you need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p itemscope itemid="#bbq" itemtype="http://www.w3.org/2002/12/cal/ical#Vevent"&gt;
      I'm holding
      &lt;span itemprop="summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;time itemprop="dtstart" datetime="2015-09-16T16:00:00-05:00"&gt;
        September 16th at 4pm
      &lt;/time&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are no prefix definitions in microdata, so the type has to be spelled out in full. However, with the mapping I&#8217;m assuming from RDFa to microdata JSON, properties in the same namespace as the item&#8217;s type don&#8217;t have to be: they can use short names.</li>
<li>The <code>itemscope</code> has to be added despite the <code>&lt;p&gt;</code> element having both an <code>itemid</code> and an <code>itemtype</code>; if the <code>itemscope</code> is forgotten, the item isn&#8217;t recognised.</li>
<li>The original <code>&lt;span&gt;</code> element has to be changed to a <code>&lt;time&gt;</code> element because it isn&#8217;t conformant microdata for a date/time value to be supplied by any other element.</li>
</ul>
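
<p>Spelling names out in full amounts to expanding each prefixed name against a prefix table, which a converter could do along these lines. This is a sketch: the <code>expand</code> function is hypothetical, and the table holds only prefixes whose URIs appear in this post, not the full set that RDFa predefines.</p>

```python
# Prefix URIs as used in the examples in this post.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
    "cal": "http://www.w3.org/2002/12/cal/ical#",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'cal:Vevent' to a full URI."""
    prefix, separator, reference = name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + reference
    # Unknown prefix (or already a full URI): leave the name untouched.
    return name


event_type = expand("cal:Vevent")
```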

<h2>Items from the <code>src</code> Attribute</h2>

<blockquote>
  <p>If @about is not present, then @src is next in priority order, for setting the subject of a statement. A typical use would be to indicate the licensing type of an image:</p>

<pre><code>&lt;img src="photo1.jpg" rel="license" 
     resource="http://creativecommons.org/licenses/by/2.0/" /&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ]
    }
  }
]}
</code></pre>

<p>The <code>src</code> attribute in microdata is only used for a value, so creating the microdata about the image means a wrapper <code>&lt;span&gt;</code> element and a separate <code>&lt;link&gt;</code> element:</p>

<pre><code> &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
   &lt;img src="photo1.jpg" /&gt;
   &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
 &lt;/span&gt;
</code></pre>

<p>Note:</p>

<ul>
<li>The <code>license</code> property is part of the built-in set of link relationships in HTML, but there is no easy way to refer to it from microdata; it has to be spelled out as a full URL.</li>
</ul>

<h2>Additional Properties for Images</h2>

<blockquote>
  <p>Since there is no difference between @src and @about, then the information expressed in the last example in the section on @about (the creator of an image), could be expressed as follows:</p>

<pre><code>&lt;img src="photo1.jpg"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"
/&gt;
</code></pre>
</blockquote>

<p>This is a simple additional property in the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

<p>which can be created through a <code>&lt;meta&gt;</code> element within the <code>&lt;span&gt;</code>:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
  &lt;img src="photo1.jpg" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
&lt;/span&gt;
</code></pre>

<h2>Nested Images</h2>

<blockquote>
  <p>Since normal chaining rules will apply, the image IRI can also be used to complete hanging triples:</p>

<pre><code>&lt;div about="http://www.blogger.com/profile/1109404" rel="foaf:img"&gt;
  &lt;img src="photo1.jpg"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"
  /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.blogger.com/profile/1109404" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/img": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://example.org/photo1.jpg" ,
        "properties": {
          "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
          "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>The microdata to generate this is:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://www.blogger.com/profile/1109404"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/img" 
        itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
        itemid="photo1.jpg"&gt;
    &lt;img src="photo1.jpg" /&gt;
    &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The big gotcha in this conversion is that in microdata, the <code>foaf:img</code> property has to be moved onto the item that is a value of that property; there&#8217;s no equivalent to the &#8220;hanging rel&#8221; processing in RDFa. A disadvantage of this is that anyone copying-and-pasting the <code>&lt;span&gt;</code> element to embed the same information about the image within their own page will have the <code>itemprop</code> attribute carried along with the image, into a context where the <code>foaf:img</code> property might not be relevant.</li>
</ul>

<h2>Types with Blank Nodes</h2>

<blockquote>
  <p>For example, an author may wish to create markup for a person using the FOAF vocabulary, but without having a clear identifier for the item:</p>

<pre><code>&lt;div typeof="foaf:Person"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="foaf:givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>Now we have an explicit type, we can create microdata JSON that uses short names:</p>

<pre><code>{ "items": [
  {
    "type": "http://xmlns.com/foaf/0.1/Person" ,
    "properties": {
      "name": [ "Albert Einstein" ] ,
      "givenName": [ "Albert" ]
    }
  }
]}
</code></pre>

<p>This can be generated with the microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span itemprop="givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>which is nice and simple.</p>

<h2>Inherited Subject</h2>

<blockquote>
  <p>The most usual way that an inherited subject might get set would be when the parent statement has an object that is a resource. Returning to the earlier example, in which the long name for the German_Empire was added, the following markup was used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire"
    property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [ "http://dbpedia.org/resource/German_Empire" ]
    }
  } , {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/German_Empire" ,
    "properties": {
      "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
    }
  }
]}
</code></pre>

<p>Note that this microdata JSON could only be generated syntactically from the RDFa, not via RDF, because going via RDF would make it impossible to know whether to give the <code>dbp:birthPlace</code> property a string (which is a URI) value or a nested item. We&#8217;ll see the alternative version of the microdata JSON in the next example.</p>

<p>To create this microdata JSON, we need:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;link itemprop="http://dbpedia.org/property/birthPlace" href="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
        itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>I&#8217;ve had to change two elements here: the <code>&lt;span&gt;</code> for the date of birth has become a <code>&lt;time&gt;</code> element as the value of the property is a date, and the <code>&lt;div&gt;</code> for the birth place has become a <code>&lt;link&gt;</code> element because the value of that property is a URL.</li>
<li>I&#8217;ve also had to add a nested <code>&lt;span&gt;</code> element as it&#8217;s not possible in microdata to have a single element describe both an item and a property for that item as it is in RDFa.</li>
</ul>

<blockquote>
  <p>In an earlier illustration the subject and object for the German Empire were connected by removing the @resource, relying on the @about to set the object:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire"
      property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>While this generates the same RDF as the previous example, the microdata JSON that it generates should probably be different: this time, the item for the German Empire is nested within the item for Albert Einstein:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://dbpedia.org/resource/German_Empire" ,
        "properties": {
          "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
        }
      }]
    }
  }
]}
</code></pre>
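
<p>To illustrate why the flat and nested JSON structures cannot be told apart once they have passed through RDF, the Python sketch below flattens microdata JSON into a set of triples. It is a deliberate simplification and not the official microdata-to-RDF algorithm: the <code>to_triples</code> function is hypothetical, URI strings and literal strings are treated alike, and property names are assumed to be full URIs already. Only the birth place and long name properties are included.</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"
DBP = "http://dbpedia.org/property/"
EINSTEIN = "http://dbpedia.org/resource/Albert_Einstein"
EMPIRE = "http://dbpedia.org/resource/German_Empire"


def to_triples(items):
    """Flatten microdata JSON items into a set of (subject, property, value) triples."""
    triples = set()
    blank_labels = []

    def visit(item):
        subject = item.get("id")
        if subject is None:
            # Items without an id get a generated blank node label.
            subject = "_:b" + str(len(blank_labels))
            blank_labels.append(subject)
        if "type" in item:
            triples.add((subject, RDF_TYPE, item["type"]))
        for name, values in item.get("properties", {}).items():
            for value in values:
                # A nested item adds its own triples and is referenced by its subject.
                target = visit(value) if isinstance(value, dict) else value
                triples.add((subject, name, target))
        return subject

    for item in items:
        visit(item)
    return triples


empire = {
    "type": RDFS_RESOURCE,
    "id": EMPIRE,
    "properties": {DBP + "conventionalLongName": ["the German Empire"]},
}

# Birth place as a URI string, with the German Empire as a separate top-level item.
flat = [
    {"type": RDFS_RESOURCE, "id": EINSTEIN, "properties": {DBP + "birthPlace": [EMPIRE]}},
    empire,
]

# Birth place as a nested item.
nested = [
    {"type": RDFS_RESOURCE, "id": EINSTEIN, "properties": {DBP + "birthPlace": [empire]}},
]

same_triples = to_triples(flat) == to_triples(nested)
```

<p>Both structures flatten to the same four triples, so a converter that only sees the RDF has nothing to tell it which JSON shape was intended.</p>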

<p>To create this, the microdata needs to look like:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;div itemprop="http://dbpedia.org/property/birthPlace" 
       itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
       itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Note that while this looks quite similar to the RDFa version, in fact the <code>itemid</code> attribute that holds the URI for the German Empire is on a different element from the <code>about</code> attribute in the RDFa.</p>

<p>The third RDFa example around this same content is:</p>

<blockquote>
  <p>but it is also possible for authors to achieve the same effect by removing the @about and leaving the @resource:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should lead to the same microdata JSON, so I won&#8217;t bother repeating the microdata. What&#8217;s interesting is that this pattern, in which the wrapper element carries both the property (<code>rel</code>) and the identifier for the item that is the value of that property (<code>resource</code>), is a lot closer to the microdata pattern for expressing nested items. The big distinction is that in microdata the <code>itemtype</code> also resides on that element, whereas if you tried adding a <code>typeof</code> attribute to the inner <code>&lt;div&gt;</code> in RDFa, you&#8217;d end up with a new blank node.</p>

<h2>Anonymous Nested Resources</h2>

<blockquote>
  <p>However, an author could just as easily say that Spinoza influenced something by the name of Albert Einstein, that was born on March 14th, 1879:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;div&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>which again means moving an attribute in microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced"
       itemscope&gt;
    &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>It is generally harder to move to microdata from RDFa when the RDFa has an element that both provides a subject and provides a property.</p>

<p>The RDFa spec provides a couple of additional methods of marking up the same content to give exactly the same RDF (and microdata JSON):</p>

<blockquote>
  <p>Note that the div is superfluous, and an RDFa Processor will create the intermediate object even if the element is removed:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
&lt;/div&gt;
</code></pre>
  
  <p>An alternative pattern is to keep the div and move the @rel onto it:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
  
  <p>From the point of view of the markup, this latter layout is to be preferred, since it draws attention to the &#8216;hanging rel&#8217;. But from the point of view of an RDFa Processor, all of these permutations need to be supported.</p>
</blockquote>

<p>Interestingly, it&#8217;s this latter permutation that is the one that&#8217;s closest to the microdata method of expressing the data, though as we will see in the next section, the &#8220;hanging rel&#8221; is not exactly equivalent to the <code>itemprop</code> on the wrapper element.</p>

<h2>Hanging Rels</h2>

<blockquote>
  <p>Note that each occurrence of @about will complete any incomplete triples. For example, to mark up the fact that Albert Einstein had a residence both in the German Empire and Switzerland, an author need only specify one @rel value that is then used with multiple @about values:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein" rel="dbp-owl:residence"&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The data embedded here gives two values for the <code>dbp-owl:residence</code> property:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://dbpedia.org/ontology/residence": [
        "http://dbpedia.org/resource/German_Empire" ,
        "http://dbpedia.org/resource/Switzerland"
      ]
    }
  }
]}
</code></pre>

<p>In microdata, the <code>itemprop</code> attribute has to appear on both the nested elements to make it clear that they both provide values for that property:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence" 
       href="http://dbpedia.org/resource/German_Empire" /&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence"
       href="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>

<p>The next example illustrates this with nested items rather than strings:</p>

<blockquote>
  <p>To illustrate, to indicate that Spinoza influenced both Einstein and Schopenhauer, the following markup could be used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
    &lt;/div&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1788-02-22&lt;/span&gt;
    &lt;/div&gt;          
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }, {
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Arthur Schopenhauer" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1788-02-22" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>In this case, the <code>itemprop</code> that is equivalent to the RDFa <code>rel</code> has to move down onto the elements representing the items:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;/div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1788-02-22&lt;/time&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>The wrapper <code>&lt;div&gt;</code> around both items isn&#8217;t necessary; I&#8217;ve left it to stay as close to the markup of the original RDFa as possible.</p>

<h2>Implicit Resources</h2>

<blockquote>
  <p>Triples are also &#8216;completed&#8217; if any one of @property, @rel or @rev are present. However, unlike the situation when @about or @typeof are present, all predicates are attached to one bnode:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp-owl:residence"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
    &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>To be equivalent to the RDF generated from this markup, the microdata JSON would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
          "http://dbpedia.org/ontology/residence": [
            "http://dbpedia.org/resource/German_Empire" ,
            "http://dbpedia.org/resource/Switzerland"
          ]
        }
      }]
    }
  }
]}
</code></pre>

<p>Microdata is a lot more explicit about when items get created, and consequently requires a bit more markup:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced" itemscope&gt;
    &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;div&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence" 
           href="http://dbpedia.org/resource/German_Empire" /&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence"
           href="http://dbpedia.org/resource/Switzerland" /&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
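
<p>For reference, the RDF that the RDFa at the start of this section is meant to generate can be sketched in Turtle roughly as follows. The prefix bindings are an assumption, inferred from the full URIs used in the JSON above rather than taken from any declarations in the RDFa snippet:</p>

<pre><code>@prefix dbp:     &lt;http://dbpedia.org/property/&gt; .
@prefix dbp-owl: &lt;http://dbpedia.org/ontology/&gt; .
@prefix foaf:    &lt;http://xmlns.com/foaf/0.1/&gt; .
@prefix xsd:     &lt;http://www.w3.org/2001/XMLSchema#&gt; .

# all three predicates are attached to one blank node
&lt;http://dbpedia.org/resource/Baruch_Spinoza&gt; dbp-owl:influenced [
  foaf:name "Albert Einstein" ;
  dbp:dateOfBirth "1879-03-14"^^xsd:date ;
  dbp-owl:residence
    &lt;http://dbpedia.org/resource/German_Empire&gt; ,
    &lt;http://dbpedia.org/resource/Switzerland&gt;
] .
</code></pre>

<p>The single pair of square brackets is the blank node that microdata needs an explicit <code>itemscope</code> to create.</p>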

<h2>Overriding Text Content</h2>

<blockquote>
  <p>The value of @content is given precedence over any element content, so the following would give exactly the same triple as shown above:</p>

<pre><code>&lt;span about="http://internet-apps.blogspot.com/"
      property="dc:creator" content="Mark Birbeck"&gt;John Doe&lt;/span&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata should generate the JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://internet-apps.blogspot.com/" ,
    "properties": {
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

<p>Only the <code>&lt;time&gt;</code> element and links override the content of an element in microdata. So a mirror of this example needs a separate element:</p>

<pre><code>  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
        itemid="http://internet-apps.blogspot.com/"&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
    John Doe
  &lt;/span&gt;
</code></pre>

<h2>Language Support</h2>

<blockquote>
  <p>In RDFa the Host Language may provide a mechanism for setting the language tag. In XHTML+RDFa [XHTML-RDFA], for example, the XML language attribute @xml:lang or the attribute @lang is used to add this information, whether the plain literal is designated by @content, or by the inline text of the element:</p>

<pre><code>&lt;meta about="http://example.org/node"
  property="ex:property" xml:lang="fr" content="chat" /&gt;
</code></pre>
</blockquote>

<p>Like the datatype of a value, the language of a value isn&#8217;t captured by the microdata data model or the JSON representation of that data model. So the fact that &#8216;chat&#8217; is French is lost:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/node" ,
    "properties": {
      "http://example.org/property": [ "chat" ]
    }
  }
]}
</code></pre>

<p>The equivalent microdata is thus:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
      itemid="http://example.org/node"&gt;
  &lt;meta itemprop="ex:property" xml:lang="fr" content="chat" /&gt;
&lt;/span&gt;
</code></pre>

<p>with the language only accessible if you are using the DOM to process the microdata.</p>
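
<p>By way of contrast, here is a sketch of the triple that the RDFa <code>&lt;meta&gt;</code> example quoted above would be expected to produce, with the language tag retained. The <code>ex:</code> binding is an assumption inferred from the <code>http://example.org/property</code> URI used in the JSON:</p>

<pre><code>@prefix ex: &lt;http://example.org/&gt; .

# the literal carries its language tag
&lt;http://example.org/node&gt; ex:property "chat"@fr .
</code></pre>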

<h2>Literals that Include Markup</h2>

<blockquote>
  <p>RDFa therefore supports the use of normal markup to express XML literals, by using @datatype:</p>

<pre><code>&lt;h2 property="dc:title" datatype="rdf:XMLLiteral"&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
&lt;/h2&gt;
</code></pre>
</blockquote>

<p>The <code>datatype="rdf:XMLLiteral"</code> acts like a flag to indicate that the serialised content of the element (<code>innerHTML</code>) needs to be used as the value of the property, rather than the <code>textContent</code>. The resulting value, which includes markup, can be expressed in microdata JSON as follows:</p>

<pre><code>{ "http://purl.org/dc/terms/title": "E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time" }
</code></pre>

<p>There&#8217;s no way to generate this in microdata except by repeating the escaped version of the content in a <code>content</code> attribute:</p>

<pre><code>&lt;h2&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
  &lt;meta itemprop="http://purl.org/dc/terms/title"
    content="E = mc&amp;lt;sup&gt;2&amp;lt;/sup&gt;: The Most Urgent Problem of Our Time" /&gt;
&lt;/h2&gt;
</code></pre>

<p>This is hardly ideal. It&#8217;s tedious enough with a short string like this one; for larger amounts of content, such as a long description of an event, it would be far worse.</p>

<h2>The <code>resource</code> Attribute</h2>

<blockquote>
  <p>RDFa provides the @resource attribute as a way to set the object of statements. This is particularly useful when referring to resources that are not themselves navigable links:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote about="#q1" rel="dc:source" resource="urn:ISBN:0140449132" &gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This should produce:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.com/candp.xhtml#q1" ,
    "properties": {
      "http://purl.org/dc/terms/source": [ "urn:ISBN:0140449132" ]
    }
  }
]}
</code></pre>

<p>which is expressed through:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="#q1"&gt;
      &lt;link itemprop="http://purl.org/dc/terms/source" href="urn:ISBN:0140449132" /&gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>The property and value have to be moved onto a nested <code>&lt;link&gt;</code>, but this is a more extensible pattern than the RDFa method as it enables other properties to be expressed in the same way.</p>
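
<p>Both the RDFa and the microdata versions are aiming at the same single statement, which can be sketched in Turtle as below. The <code>dc:</code> binding is an assumption, chosen to match the <code>http://purl.org/dc/terms/source</code> URI in the JSON above:</p>

<pre><code>@prefix dc: &lt;http://purl.org/dc/terms/&gt; .

# the object is a resource (a URN), not a literal
&lt;http://www.example.com/candp.xhtml#q1&gt; dc:source &lt;urn:ISBN:0140449132&gt; .
</code></pre>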

<h2>Multiple Types</h2>

<p>This last example comes from the wild rather than being an example in the specification. At <a href="http://bitmunk.com/browse">http://bitmunk.com/browse</a> we find:</p>

<pre><code>&lt;span about="http://bitmunk.com/about#service" 
      typeof="vcard:VCard commerce:Business gr:BusinessEntity" 
      property="rdfs:label vcard:fn"&gt;Bitmunk&lt;/span&gt;
</code></pre>

<p>This shows the use of multiple types and of multiple properties with the same value, because the pages are attempting to use multiple vocabularies that cover the same domain (organisations) to different depths. In the equivalent microdata, we have to choose one of the types; I&#8217;m going to assume that it should just use the first one from the <code>typeof</code> attribute:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2006/vcard/ns#VCard" ,
    "id": "http://bitmunk.com/about#service" ,
    "properties": {
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
        "http://purl.org/commerce#Business" ,
        "http://purl.org/goodrelations/v1#BusinessEntity"
      ] ,
      "http://www.w3.org/2000/01/rdf-schema#label": [ "Bitmunk" ] ,
      "fn": [ "Bitmunk" ]
    }
  }
]}
</code></pre>

<p>The microdata equivalent is:</p>

<pre><code>&lt;span itemscope itemid="http://bitmunk.com/about#service" 
      itemtype="http://www.w3.org/2006/vcard/ns#VCard"&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/commerce#Business" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/goodrelations/v1#BusinessEntity" /&gt;
  &lt;span itemprop="http://www.w3.org/2000/01/rdf-schema#label fn"&gt;Bitmunk&lt;/span&gt;
&lt;/span&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>Technically, the RDFa doesn&#8217;t place any ordering on the three classes, but I&#8217;m picking the first for the purpose of the microdata conversion. The other classes are harder to get at in the JSON: they have to be referenced via the <code>rdf:type</code> microdata property rather than the <code>type</code> JSON property. Consumers that are on the lookout for items of the type <code>gr:BusinessEntity</code> wouldn&#8217;t spot these items.</li>
</ul>
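
<p>For comparison, the RDF expressed by the original RDFa can be sketched in Turtle as follows, with all three classes given equal standing. The prefix bindings are assumptions inferred from the full URIs used in the JSON above; the page itself may declare them differently:</p>

<pre><code>@prefix vcard:    &lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix commerce: &lt;http://purl.org/commerce#&gt; .
@prefix gr:       &lt;http://purl.org/goodrelations/v1#&gt; .
@prefix rdfs:     &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .

&lt;http://bitmunk.com/about#service&gt;
  a vcard:VCard , commerce:Business , gr:BusinessEntity ;
  rdfs:label "Bitmunk" ;
  vcard:fn "Bitmunk" .
</code></pre>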
    ]]></content>
  </entry>
  <entry>
    <title>Mapping Microdata to RDFa</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/163" />
    <id>http://www.jenitennison.com/blog/node/163</id>
    <published>2011-08-20T17:35:28+01:00</published>
    <updated>2012-05-19T11:56:50+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>

<!--break-->

<p>I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHATWG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a>, but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the microdata specification (most of them are in both versions, the only exceptions being those that use the vCard vocabulary). I haven&#8217;t included examples that don&#8217;t illustrate anything new, so there are some that are skipped. Other examples would be welcome.</p>

<h2>Unidentified Items / Blank Node Subjects</h2>

<blockquote>
  <p>Here there are two items, each of which has the property &#8220;name&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The first challenge in mapping this into RDFa is that the properties are tokens rather than URIs and neither of the items has a type. What I&#8217;ll assume here is that the <code>name</code> properties are local to the document itself and thus the equivalent RDF is:</p>

<pre><code>[ &lt;#name&gt; "Elizabeth" ] .
[ &lt;#name&gt; "Daniel" ] .
</code></pre>

<p>This can be achieved in RDFa through either:</p>

<pre><code>&lt;div vocab="#" about="_:elizabeth"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" about="_:daniel"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The <code>vocab="#"</code> sets the vocabulary to the location of the current document (plus an empty fragment identifier); this URI is then concatenated to the property token (<code>name</code>) to create a URI that is unique to the document. In a document such as this it would make sense to put the <code>vocab="#"</code> attribute on the <code>&lt;html&gt;</code> element rather than on every single item.</li>
<li>With no type in sight, blank nodes can be created either through an empty <code>typeof</code> attribute or through an <code>about</code> attribute whose value starts with <code>_:</code>. The latter has the advantage of providing an identifier for the blank node that can be used elsewhere in the document, but the former is shorter so will be used where possible in the remaining examples of this post.</li>
</ul>

<h2>Values from the <code>src</code> Attribute</h2>

<p>The next example introduces the use of the <code>src</code> attribute to set the value of the property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;image&#8221;, whose value is a URL:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;img itemprop="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should probably be mapped to the RDF:</p>

<pre><code>[ &lt;#image&gt; &lt;google-logo.png&gt; ] .
</code></pre>

<p>The difficulty with this is that in RDFa, the <code>src</code> attribute is used for the <em>subject</em> of a statement (equivalent to a microdata item) rather than the <em>object</em> (equivalent to a microdata value). So we have two choices for equivalent RDFa. One is to use a similar pattern to that used above, but introduce a wrapper element that provides the property:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Another is to provide what would normally be an <em>object</em> through a <code>resource</code> attribute and then use a <code>rev</code> attribute (rather than the usual <code>rel</code> attribute) to reverse the relationship:</p>

<pre><code>&lt;div vocab="#"&gt;
 &lt;img resource="_:thing" rev="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>

<p>This has three disadvantages over the first option:</p>

<ul>
<li>the <code>resource</code> attribute that creates the item is on the <code>&lt;img&gt;</code> element rather than on the wrapper <code>&lt;div&gt;</code> which makes it hard to create other properties for that item</li>
<li>we have to use a <code>rev</code> attribute, reversing the normal flow of relationships; I (at least) find this hard to figure out when there&#8217;s not a <code>rel</code> attribute as well</li>
<li><ins>we have to make up an id for the blank node we want to generate</ins></li>
</ul>

<p>I&#8217;ll note that it took me five or six failed attempts to generate the above options. If I hadn&#8217;t had the <a href="http://rdf.greggkellogg.net/distiller">RDF Distiller</a> to test with, I would have got it wrong. <del>Note that at least through the RDF Distiller, to be recognised, the <code>resource</code> attribute has to have an (empty) value &#8212; it is not enough for it to simply be present, unlike with the <code>typeof</code> attribute.</del> <ins>Note that the <code>resource</code> attribute has to explicitly point to a blank node to create a blank node rather than having the property be associated with the document in which this appears.</ins></p>

<h2>Values from the <code>datetime</code> Attribute</h2>

<p>The next example illustrates the use of the <code>&lt;time&gt;</code> element to provide a date/time value for a property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;birthday&#8221;, whose value is a date:</p>

<pre><code>&lt;div itemscope&gt;
 I was born on &lt;time itemprop="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>
</blockquote>

<p>I&#8217;m assuming this should map to the RDF:</p>

<pre><code>[ &lt;#birthday&gt; "2009-05-10"^^&lt;http://www.w3.org/2001/XMLSchema#date&gt; ]
</code></pre>

<p>There is an open issue (<a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a>) about this on RDFa, which currently requires the use of the <code>content</code> attribute to provide the value as follows:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" content="2009-05-10" datatype="xsd:date" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Note that the <code>xsd:</code> prefix is built-in within RDFa so there&#8217;s no need for any declaration for it, which makes it fairly easy to specify the standard date/time datatypes.</p>

<p>If ISSUE-97 were resolved nicely it would be possible to instead do:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>To make this work, RDFa processors would have to look at the syntax of the <code>datetime</code> attribute to work out what datatype the value should be matched to.</li>
<li>The syntax permitted in the <code>datetime</code> attribute isn&#8217;t exactly the same as that permitted by the XML Schema <code>time</code> and <code>dateTime</code> types usually used in RDF (and XML), in that the seconds component is optional within HTML. The resolution to ISSUE-97 will need to take this into account. Otherwise, anyone mapping from microdata to RDFa manually will need to ensure that the <code>content</code> attribute includes the seconds component.</li>
</ul>
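
<p>To make the point about seconds concrete, here is a sketch in Turtle. The <code>start</code> property and the timestamp are hypothetical, invented purely for illustration; the assumption is an HTML value of <code>datetime="2009-05-10T09:30Z"</code>, which omits the seconds:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .

# the xsd:dateTime lexical form requires the seconds component,
# so ":00" has to be supplied when the literal is created
[ &lt;#start&gt; "2009-05-10T09:30:00Z"^^xsd:dateTime ] .
</code></pre>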

<h2>Nested Items / Object Properties</h2>

<blockquote>
  <p>In this example, the outer item represents a person, and the inner one represents a band:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span itemprop="band" itemscope&gt; &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt; (&lt;span itemprop="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>
  
  <p>The outer item here has two properties, &#8220;name&#8221; and &#8220;band&#8221;. The &#8220;name&#8221; is &#8220;Amanda&#8221;, and the &#8220;band&#8221; is an item in its own right, with two properties, &#8220;name&#8221; and &#8220;size&#8221;. The &#8220;name&#8221; of the band is &#8220;Jazz Band&#8221;, and the &#8220;size&#8221; is &#8220;12&#8221;.</p>
</blockquote>

<p>The equivalent RDF for this example would be:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
]
</code></pre>

<p>Note that the <code>size</code> property is just a plain literal value; unlike with date/times, there&#8217;s no way to tell from the microdata that the value is a number.</p>

<p>In RDFa this could be done with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>This follows the microdata fairly closely but note that the nested resource doesn&#8217;t need an empty <code>typeof</code> attribute: it&#8217;s only the top-level items that do. It might be easier, for consistency and extensibility, to always include an explicit nested element (with an empty <code>typeof</code> attribute in this case) to represent the nested resource:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt;&lt;span typeof&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>The other thing that people have to watch out for is that because the value of the <code>band</code> property is a resource rather than a literal, we have to use the <code>rel</code> attribute rather than the <code>property</code> attribute as we do elsewhere.</p>

<h2>Itemref</h2>

<blockquote>
  <p>This example is the same as the previous one, but all the properties are separated from their items:</p>

<pre><code>&lt;div itemscope id="amanda" itemref="a b"&gt;&lt;/div&gt;
&lt;p id="a"&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div id="b" itemprop="band" itemscope itemref="c"&gt;&lt;/div&gt;
&lt;div id="c"&gt;
 &lt;p&gt;Band: &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span itemprop="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should create the same RDF as the previous example:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
]
</code></pre>

<p>Changing the markup as little as possible, the RDFa equivalent is:</p>

<pre><code>&lt;div id="amanda"&gt;&lt;/div&gt;
&lt;p vocab="#" about="_:amanda"&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div vocab="#" about="_:amanda" rel="band" resource="_:c"&gt;&lt;/div&gt;
&lt;div vocab="#" about="_:c"&gt;
 &lt;p&gt;Band: &lt;span property="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span property="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>In microdata, the <code>itemref</code> attribute is a method of an item adopting name/value pairs described in a separate location within the page. In RDFa, the equivalent is to say that the name/value pairs are all related to the same resource by consistently referring to the resource as the subject of the statements. In the above case, there are two blank nodes labelled <code>_:amanda</code> and <code>_:c</code>, and the <code>about</code> attribute is used on the same elements that provide the properties (or a wrapper element) to indicate the identity of the subject of the statements.</p>

<p>Notes:</p>

<ul>
<li>The <code>resource</code> attribute has to be used to indicate the blank node for the band.</li>
<li>As before, the <code>rel</code> attribute has to be used for the <code>band</code> property, rather than the <code>property</code> attribute, because the object of the statement is a resource. The rule is that if you&#8217;re using <code>resource</code>, you should use <code>rel</code>. (I used <code>property</code> erroneously the first time I tried to write this mapping. I will never learn.)</li>
</ul>
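
<p>Written out with the blank node labels made explicit, the statements that the RDFa above is intended to generate can be sketched in Turtle as follows. The labels simply reuse those in the markup; a processor is free to rename them, and the relative property URIs resolve against the document&#8217;s own location:</p>

<pre><code>_:amanda &lt;#name&gt; "Amanda" ;
         &lt;#band&gt; _:c .

_:c &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12" .
</code></pre>

<p>This is the same graph as the nested form shown earlier; only the way the blank nodes are written differs.</p>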

<p>There is another example of <code>itemref</code> in use later in the microdata specification:</p>

<blockquote>
<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;
   &lt;figcaption itemprop="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;
   &lt;figcaption itemprop="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p id="licenses"&gt;All images licensed under the &lt;a itemprop="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This is equivalent to the RDF:</p>

<pre><code>[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/house.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The house I found." ;
] .
[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/mailbox.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The mailbox." ;
] .
</code></pre>

<p>Note that the <code>license</code> property is adopted by both the items in the microdata. In this particular example, the two items have the same type, and thus the <code>license</code> property has the same meaning in each item. It&#8217;s also possible for <code>itemref</code> to be used on two items that have different types, pointing to the same element, in which case the shared properties defined within that element could mean different things for the two items.</p>

<p>There is no way that I am aware of within RDFa to support shared use of portions of content. There could be a rough equivalent that would work in the case where the shared properties had the same semantics if RDFa allowed the <code>about</code> attribute to take multiple values (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body vocab="http://n.whatwg.org/"&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure about="_:house" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure about="_:mailbox" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p about="_:house _:mailbox"&gt;All images licensed under the &lt;a rel="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>but this wouldn&#8217;t support the possibility of the same property having different semantics (and therefore different URIs) for the separate resources.</p>
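
<p>For comparison, here is a sketch of an approximation that stays within existing RDFa: the <code>license</code> statement is repeated, once per subject, inside the same <code>&lt;body vocab="http://n.whatwg.org/"&gt;</code> as above. The second, empty <code>&lt;a&gt;</code> element is an assumption about how to carry the repeated statement without changing the visible text. This is repetition rather than genuine sharing of content.</p>

<pre><code>&lt;footer&gt;
 &lt;p&gt;All images licensed under the &lt;a about="_:house" rel="license"
 href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
 license&lt;/a&gt;.&lt;a about="_:mailbox" rel="license"
 href="http://www.opensource.org/licenses/mit-license.php"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/footer&gt;
</code></pre>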

<p>It&#8217;s also worth noting that the mapping to RDF that I&#8217;m assuming results, in this example, in <code>http://n.whatwg.org/work</code> being both a class and a property. The creators of RDF vocabularies tend to name classes with an uppercase initial letter and properties with a lowercase initial letter, and thus avoid these kinds of clashes. Vocabulary designers who are mindful of mappings to RDF may want to take the same approach.</p>
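
<p>As an illustration only, if the vocabulary followed that convention the mailbox item might map to something like the following. The class name <code>Work</code> is hypothetical and is not a term used in the examples above; the property <code>work</code> is unchanged.</p>

<pre><code>[
  a &lt;http://n.whatwg.org/Work&gt; ;   # hypothetical class name, uppercase initial
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/mailbox.jpeg&gt; ;   # property, lowercase initial
  &lt;http://n.whatwg.org/title&gt; "The mailbox." ;
] .
</code></pre>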

<h2>Multiple Values</h2>

<blockquote>
  <p>This example describes an ice cream, with two flavors:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li itemprop="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li itemprop="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>
  
  <p>This thus results in an item with two properties, both &#8220;flavor&#8221;, having the values &#8220;Lemon sorbet&#8221; and &#8220;Apricot sorbet&#8221;.</p>
</blockquote>

<p>This example highlights one of the real nightmares of RDF: lists. In microdata, the order of the values &#8216;Lemon sorbet&#8217; and &#8216;Apricot sorbet&#8217; is naturally retained. There are three possible mappings to RDF.</p>

<h3>Creating Multiple Statements</h3>

<p>If the order of the flavours of ice cream in this example doesn&#8217;t actually matter, the equivalent RDF is:</p>

<pre><code>[ &lt;#flavor&gt; "Lemon sorbet" , "Apricot sorbet" ]
</code></pre>

<p>which is equivalent to:</p>

<pre><code>[ &lt;#flavor&gt; "Apricot sorbet" , "Lemon sorbet" ]
</code></pre>

<p>In this case, the RDFa is straightforward:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li property="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li property="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>

<p>It&#8217;s surprising how often order doesn&#8217;t actually matter when a property has multiple values, in which case this mapping is quite sufficient. But I&#8217;m absolutely not going to pretend that order is never important&#8230;</p>

<h3>Creating an <code>rdf:Seq</code></h3>

<p>If the order of the flavours does matter, there are two ways of representing that order using RDF. The first is to use an <code>rdf:Seq</code> resource. This was the original method of representing lists in RDF and is very natural to do in RDF/XML, but it has largely fallen out of favour, replaced by the second method, which I&#8217;ll describe below.</p>

<p>Using the <code>rdf:Seq</code> method, the equivalent RDF for the microdata would be:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[
  &lt;#flavor&gt; [
    a rdf:Seq ;
    rdf:_1 "Lemon sorbet" ;
    rdf:_2 "Apricot sorbet"
  ]
]
</code></pre>

<p>which can be generated with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:Seq"&gt;
   &lt;li property="rdf:_1"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li property="rdf:_2"&gt;Apricot sorbet&lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are various other ways in which the namespace for the <code>rdf:Seq</code> could be created, but since the <code>rdf:</code> prefix is built-in to RDFa 1.1, it seems easier to use that than anything that explicitly writes out the full (ugly) RDF namespace.</li>
<li>The <code>&lt;div&gt;</code> wrapper for the <code>&lt;ul&gt;</code> is needed in the same way as the wrapper <code>&lt;span&gt;</code> element was needed in the <code>&lt;img&gt;</code> example above. Whereas in microdata, the property element also describes the value of that property, in RDFa when the object of a statement is a resource the description of that resource is nested inside the property element (in a similar way to RDF/XML).</li>
</ul>
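
<p>As a sketch of one of those other ways, the prefix can be declared explicitly with the RDFa 1.1 <code>prefix</code> attribute. This should produce the same statements as the example above; it only trades the built-in mapping for an explicit one:</p>

<pre><code>&lt;div vocab="#" prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:Seq"&gt;
   &lt;li property="rdf:_1"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li property="rdf:_2"&gt;Apricot sorbet&lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>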

<h3>Creating an <code>rdf:List</code></h3>

<p>The current recommended way to create a list in RDF is to use an <code>rdf:List</code> resource. This essentially uses a <a href="http://en.wikipedia.org/wiki/Linked_list">linked list</a> model to represent lists, with the <code>rdf:first</code> item of a list being a value and the <code>rdf:rest</code> being either another <code>rdf:List</code> or <code>rdf:nil</code>. Spelled out, the RDF would look like:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[ 
  &lt;#flavor&gt; [
    a rdf:List ;
    rdf:first "Lemon sorbet" ;
    rdf:rest [
      rdf:first "Apricot sorbet" ;
      rdf:rest rdf:nil
    ]
  ]
]
</code></pre>

<p>but of course Turtle lets you write it:</p>

<pre><code>[] &lt;#flavor&gt; ( "Lemon sorbet" "Apricot sorbet" ) .
</code></pre>

<p>Unfortunately, RDFa has no such syntax sugar. Which means:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Yep, horrific. Verbose and easy to get wrong, and that&#8217;s just for two items. If a third is added, the pattern is to add an <code>about</code> attribute on the middle items of the list so that the <code>rdf:rest</code> property which covers the next item in the list can be assigned to it. For example:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span about="_:2" typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
    &lt;/span&gt;
   &lt;/li&gt;
   &lt;li about="_:2" rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Raspberry sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>I&#8217;ve used an empty <code>&lt;a&gt;</code> element with a <code>href</code> attribute to point to the <code>rdf:nil</code> resource. An alternative would be to use the <code>resource</code> attribute, which would have the advantage of not having to spell out the full URI for <code>rdf:nil</code>, but I&#8217;m trying to stick to using as few attributes as possible.</li>
<li>Using an empty <code>&lt;a&gt;</code> element for a link isn&#8217;t ideal; it would be neater to use a <code>&lt;link&gt;</code> element, but these aren&#8217;t allowed in flow content within HTML5 (<code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> are only permitted within the microdata specification, and then only if they have an <code>itemprop</code> attribute). The RDFa specification could likewise allow them.</li>
</ul>
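
<p>For comparison, here is a sketch of the two-item list using the <code>resource</code> attribute for the end of the list. It relies on <code>resource</code> accepting a CURIE in RDFa 1.1, so that <code>rdf:nil</code> can be written in its short form; the empty <code>&lt;span&gt;</code> is just one possible element to carry it.</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
     &lt;span rel="rdf:rest" resource="rdf:nil"&gt;&lt;/span&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>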

<h2>Multiple Properties Sharing a Value</h2>

<blockquote>
  <p>Here we see an item with two properties, &#8220;favorite-color&#8221; and &#8220;favorite-fruit&#8221;, both set to the value &#8220;orange&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;span itemprop="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should map to the RDF:</p>

<pre><code>[
  &lt;#favorite-color&gt; "orange" ;
  &lt;#favorite-fruit&gt; "orange"
]
</code></pre>

<p>Like <code>itemprop</code>, <code>property</code> can take multiple values, so the RDFa equivalent is simply:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span property="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<h2>Types</h2>

<blockquote>
  <p>Here, the item&#8217;s type is &#8220;http://example.org/animals#cat&#8221;:</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
  
  <p>In this example the &#8220;http://example.org/animals#cat&#8221; item has three properties, a &#8220;name&#8221; (&#8220;Hedral&#8221;), a &#8220;desc&#8221; (&#8220;Hedral is&#8230;&#8221;), and an &#8220;img&#8221; (&#8220;hedral.jpeg&#8221;).</p>
</blockquote>

<p>I&#8217;ll assume that this should be mapped to the RDF:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
]
</code></pre>

<p>In this case, the <code>vocab</code> can be set to <code>http://example.org/animals#</code> and both the <code>typeof</code> and the various <code>property</code> and <code>rel</code> attributes will use that as the basis for their identifying URIs:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;div rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/div&gt;
&lt;/section&gt;
</code></pre>

<h2>Global Identifiers</h2>

<blockquote>
  <p>Here, an item is talking about a particular book:</p>

<pre><code>&lt;dl itemscope
    itemtype="http://vocab.example.net/book"
    itemid="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd itemprop="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd itemprop="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time itemprop="pubdate" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>
</blockquote>

<p>Here, the item has an identifier so, unlike the previous examples, the subject of the statements in the RDF is no longer a blank node:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
&lt;urn:isbn:0-330-34032-8&gt;
  a &lt;http://vocab.example.net/book&gt; ;
  &lt;http://vocab.example.net/title&gt; "The Reality Dysfunction\n " ;
  &lt;http://vocab.example.net/author&gt; "Peter F. Hamilton\n " ;
  &lt;http://vocab.example.net/pubdate&gt; "1996-01-26"^^xsd:date ;
  .
</code></pre>

<p>In RDFa, the subject is provided using the <code>about</code> attribute:</p>

<pre><code>&lt;dl vocab="http://vocab.example.net/"
    typeof="book"
    about="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd property="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd property="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time property="pubdate" content="1996-01-26" datatype="xsd:date" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>

<h2>Global Property Names</h2>

<blockquote>
  <p>Here, an item is an &#8220;http://example.org/animals#cat&#8221;, and most of the properties have names that are words defined in the context of that type. There are also a few additional properties whose names come from other vocabularies.</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 itemprop="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 itemprop="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
</blockquote>

<p>The RDF equivalent to this is:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.com/fn&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.com/color&gt; "black" , "white" ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
]
</code></pre>

<p>To create this, we need the RDFa:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 property="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 property="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;span rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/span&gt;
&lt;/section&gt;
</code></pre>

<h2>Link Relations</h2>

<blockquote>
  <p>Here is an example of a page that uses the vEvent vocabulary to mark up an event:</p>

<pre><code>&lt;body itemscope itemtype="http://microformats.org/profile/hcalendar#vevent"&gt;
 ...
 &lt;h1 itemprop="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
 ...
 &lt;time itemprop="dtstart" datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
 (until &lt;time itemprop="dtend" datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
 ...
 &lt;a href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
    rel="bookmark" itemprop="url"&gt;Link to this page&lt;/a&gt;
 ...
 &lt;p&gt;Location: &lt;span itemprop="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
 ...
 &lt;p&gt;&lt;input type=button value="Add to Calendar"
           onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
 ...
 &lt;meta itemprop="description" content="via livebrum.co.uk"&gt;
&lt;/body&gt;
</code></pre>
</blockquote>

<p>This example is interesting because it contains, in the natural markup of the page, a <code>rel</code> attribute with the value <a href="http://www.w3.org/TR/html5/links.html#link-type-bookmark"><code>bookmark</code></a>, which is used for links that go to the page or section of the page within which the link is found. In this case, it&#8217;s the page. The RDF that should be generated from the page is:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
[
  a &lt;http://microformats.org/profile/hcalendar#vevent&gt; ;
  &lt;http://microformats.org/profile/hcalendar#summary&gt; "Bluesday Tuesday: Money Road" ;
  &lt;http://microformats.org/profile/hcalendar#dtstart&gt; "2009-05-05T19:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#dtend&gt; "2009-05-05T21:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#url&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  &lt;http://microformats.org/profile/hcalendar#location&gt; "The RoadHouse" ;
  &lt;http://microformats.org/profile/hcalendar#description&gt; "via livebrum.co.uk"
] .
</code></pre>

<p>The following statement could legitimately be generated as well:</p>

<pre><code>&lt;&gt; 
  &lt;http://www.w3.org/1999/xhtml/vocab#bookmark&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  .
</code></pre>

<p>but the item representing the event should definitely not have the <code>http://www.w3.org/1999/xhtml/vocab#bookmark</code> property.</p>

<p>Achieving this without significantly changing the HTML markup is problematic in RDFa because RDFa uses the <code>rel</code> attribute to provide properties for the resources that it describes within the page, overloading its standard use in HTML which is to describe properties of the page or sections within the page. The following involves the least amount of repetition:</p>

<pre><code>&lt;body vocab="http://microformats.org/profile/hcalendar#"&gt;
 &lt;div typeof="vevent"&gt;
  ...
  &lt;h1 property="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
  ...
  &lt;time property="dtstart" content="2009-05-05T19:00:00Z" datatype="xsd:dateTime" 
        datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
  (until &lt;time property="dtend" content="2009-05-05T21:00:00Z" datatype="xsd:dateTime" 
               datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
  ...
  &lt;a rel="url" href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"&gt;&lt;/a&gt;
  &lt;a about href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
     rel="bookmark"&gt;Link to this page&lt;/a&gt;
  ...
  &lt;p&gt;Location: &lt;span property="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
  ...
  &lt;p&gt;&lt;input type=button value="Add to Calendar"
            onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
  ...
  &lt;span property="description" content="via livebrum.co.uk"&gt;&lt;/span&gt;
 &lt;/div&gt;
&lt;/body&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>In the above, the <code>typeof</code> attribute has been moved onto a wrapper <code>&lt;div&gt;</code> that encompasses the entirety of the page because if it resides on the <code>&lt;body&gt;</code> element, it&#8217;s assumed to apply to the document itself rather than a blank node. An alternative mapping would use <code>about="_:event"</code> to create a blank node for the event.</li>
<li>There&#8217;s no way to avoid creating a statement for the <code>rel="bookmark"</code> link, so the best we can do is make sure that the statement is accurate and relates the current document to the provided URI. Unfortunately, that means creating a separate element for the <code>url</code> property, repeating that URL within the page, and adding an empty <code>about</code> attribute to the original link. Here I&#8217;ve used an empty <code>&lt;a&gt;</code> element to express the <code>url</code> relationship; a <code>&lt;link&gt;</code> element would do the same job if it were allowed in flow content.</li>
<li>The <code>&lt;meta&gt;</code> element in the original has been mapped to an empty <code>&lt;span&gt;</code> element as it isn&#8217;t allowed in flow content without an <code>itemprop</code> attribute.</li>
</ul>
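
<p>The alternative mapping mentioned in the first note might be sketched as follows. It assumes a processor in which an explicit <code>about</code> on the <code>&lt;body&gt;</code> element takes precedence over the default document subject, so that <code>typeof</code> applies to the <code>_:event</code> blank node. Only a few of the properties are shown; the rest would be unchanged.</p>

<pre><code>&lt;body vocab="http://microformats.org/profile/hcalendar#"
      about="_:event" typeof="vevent"&gt;
 ...
 &lt;h1 property="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
 ...
 &lt;p&gt;Location: &lt;span property="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
 ...
&lt;/body&gt;
</code></pre>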
    ]]></content>
  </entry>
</feed>
