<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Jeni's Musings</title>
  <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog"/>
  <link rel="self" type="application/atom+xml" href="http://www.jenitennison.com/blog/atom/feed"/>
  <id>http://www.jenitennison.com/blog/atom/feed</id>
  <updated>2011-07-26T18:18:44+01:00</updated>
  <entry>
    <title>Using &quot;Punning&quot; to Answer httpRange-14</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/170" />
    <id>http://www.jenitennison.com/blog/node/170</id>
    <published>2012-05-11T21:11:43+01:00</published>
    <updated>2012-05-11T21:11:43+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="rdf" />
    <category term="rest" />
    <category term="tag" />
    <summary type="html"><![CDATA[<p>As part of the TAG&#8217;s work on httpRange-14, <a href="http://mumble.net/~jar/">Jonathan Rees</a> has assessed how a variety of <a href="http://www.w3.org/wiki/HTTPURIUseCases">use cases</a> could be met by various <a href="http://www.w3.org/wiki/TagIssue57Responses">proposals</a> put before the TAG. The results of the assessment are a <a href="http://www.w3.org/wiki/HTTPURIUseCaseMatrix">matrix</a> which shows that &#8220;punning&#8221; is the most promising method, unique in not failing on either <a href="http://www.w3.org/wiki/HTTPURIUseCases#J.29_Naive_linked_data_on_hosting_service">ease of use (use case J)</a> or <a href="http://www.w3.org/wiki/HTTPURIUseCases#M.29_HTTP_consistency">HTTP consistency (use case M)</a>.</p>

<p>In normal use, &#8220;punning&#8221; is about making jokes based around a word that has two meanings. In this context, &#8220;punning&#8221; is about using the same URI to mean two (or more) different things. It&#8217;s most commonly used as a term of art in <a href="http://techwiki.openstructs.org/index.php/Metamodeling_in_Domain_Ontologies">OWL</a> but normal people don&#8217;t need to worry particularly about that use. Here I&#8217;ll explore what that might actually mean as an approach to the httpRange-14 issue.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>As part of the TAG&#8217;s work on httpRange-14, <a href="http://mumble.net/~jar/">Jonathan Rees</a> has assessed how a variety of <a href="http://www.w3.org/wiki/HTTPURIUseCases">use cases</a> could be met by various <a href="http://www.w3.org/wiki/TagIssue57Responses">proposals</a> put before the TAG. The results of the assessment are a <a href="http://www.w3.org/wiki/HTTPURIUseCaseMatrix">matrix</a> which shows that &#8220;punning&#8221; is the most promising method, unique in not failing on either <a href="http://www.w3.org/wiki/HTTPURIUseCases#J.29_Naive_linked_data_on_hosting_service">ease of use (use case J)</a> or <a href="http://www.w3.org/wiki/HTTPURIUseCases#M.29_HTTP_consistency">HTTP consistency (use case M)</a>.</p>

<p>In normal use, &#8220;punning&#8221; is about making jokes based around a word that has two meanings. In this context, &#8220;punning&#8221; is about using the same URI to mean two (or more) different things. It&#8217;s most commonly used as a term of art in <a href="http://techwiki.openstructs.org/index.php/Metamodeling_in_Domain_Ontologies">OWL</a> but normal people don&#8217;t need to worry particularly about that use. Here I&#8217;ll explore what that might actually mean as an approach to the httpRange-14 issue.</p>

<!--break-->

<p><em>Note: The material here is a summary of what I think is the best way forward following various discussions within and outside the <a href="http://www.w3.org/2001/tag/">TAG</a>, in particular with Jonathan, Henry Thompson and TimBL. Not all these people agree with or endorse the approach described here, but neither do all the ideas in this post originate from me.</em></p>

<h2>Background</h2>

<p>Five things have recently made me more convinced than ever that the TAG must either provide some direction to the community, and soon, or get out of the way.</p>

<ol>
<li><p>The <a href="https://www.w3.org/2012/ldp/charter">proposed Linked Data Platform Working Group charter</a> and the <a href="http://www.w3.org/Submission/ldbp/">Submission that is the main input to the group</a> specifically bring together linked data and REST, and the only mention of <code>303</code> redirections so far is to do with paging.</p></li>
<li><p>A recent thread on the <a href="http://lists.w3.org/Archives/Public/public-vocabs/2012Apr/">W3C public-vocabs mailing list</a> raised the question of <a href="http://lists.w3.org/Archives/Public/public-vocabs/2012Apr/0041.html">whether to embed schema.org markup about the page itself within a given page, or only about the thing that the page is about</a>. I wonder how many pages are being described as <code>schema:WebPage</code> as well as things like <code>schema:Organization</code>, and how people choose which class to use.</p></li>
<li><p>The initial version of Dan&#8217;s <a href="http://www.w3.org/wiki/WebSchemas/ExternalEnumerations">proposal for handling external enumerations within schema.org</a> talked about minting new URIs in the <code>ext.schema.org</code> domain specifically to proxy existing URIs so that they can be guaranteed to provide the right (<code>303</code>) HTTP response. I can see the reasoning (persuading people to use <code>303</code> redirections is difficult) but it would be frustrating if the end result were a centralisation of the URI space.</p></li>
<li><p>Talking with my colleague John Sheridan about updating the UK government&#8217;s guidance on <a href="http://www.cabinetoffice.gov.uk/sites/default/files/resources/designing-URI-sets-uk-public-sector.pdf">Designing URI Sets for the UK Public Sector</a>, I really don&#8217;t know what to advise. Should the guidance continue to be to use <code>303</code> redirections, when I know from experience that these can be <a href="http://lists.w3.org/Archives/Public/public-lod/2012Mar/0424.html">impractically slow</a>? Should it change to recommend using hash URIs to identify things?</p></li>
<li><p>The very first message on the <a href="http://www.w3.org/community/opentag/">Technical Architecture Community Group</a> was in part about <a href="http://lists.w3.org/Archives/Public/public-opentag/2012Apr/0000.html">how to identify people with URIs</a>.</p></li>
</ol>

<p>Of course the httpRange-14 issue has been running for so long now that I&#8217;d estimate that currently 80% of the discussion about it is meta-discussion about whether it needs to be discussed, how much it should be discussed, how to raise the quality of the discussion, how anyone who discusses it is a time-wasting idiot and should just shut up, and so on. It&#8217;s a terrible destructive cycle: the more this goes on, the higher the proportion of time being spent on the meta-discussion, and the longer any discussion takes.</p>

<p>But I believe that we can get to a point where we don&#8217;t have to discuss it any more (except to reminisce about what a waste of time it was), and I believe that the only way to get to that point is for the TAG to push through and provide a practical way forward.</p>

<h2>Terminology</h2>

<p>Let&#8217;s start off with some terminology. The basic scenario is an interaction between three agents:</p>

<ul>
<li>a <strong>supplier</strong> who manages the information that is accessible at a URI; it&#8217;s worth noting that the supplier for a particular URI might change over time, and that what exactly is provided at a URI is controlled by multiple parties: there may be many service providers involved in routing the resolution of a URI, others in constructing what&#8217;s shown on the page served from the origin web server, and still others who transform that content en route to the consumer</li>
<li>a <strong>third-party publisher</strong> who publishes some information about a URI; unlike the supplier, they generally have no control over, and only incomplete knowledge of, the information available at a particular URI, its stability over time or its consistency across representations</li>
<li>a <strong>consumer</strong>, typically an application of some kind, who has discovered the information published by the supplier or third-party publisher and wants to do something with it</li>
</ul>

<p>It&#8217;s also useful to include terms for three things that are passed around or referenced during the interaction, which are defined in the <a href="http://tools.ietf.org/html/rfc3986">URI specification</a> and <a href="http://tools.ietf.org/wg/httpbis/">HTTPbis</a>. (The <a href="http://tools.ietf.org/html/rfc2616">current HTTP specification, RFC 2616</a>, is also of interest, of course, but HTTPbis is a better reflection of current practical use of HTTP; it is close to complete, and once finished it will replace RFC 2616.)</p>

<ul>
<li>a <strong>URI</strong> which is a string of characters matching the syntax in the URI specification that <strong>identifies</strong> a resource; we only care about <code>http:</code> URIs here, although similar considerations may apply to URIs that use other schemes</li>
<li>a <strong>resource</strong> which is identified by the URI; the debate over whether HTTP constrains the nature of the resource is at the heart of some discussions around httpRange-14; here, as in the URI specification and HTTPbis, a resource could be anything</li>
<li><strong>representations</strong> which are media-typed sequences of bytes (often characters) encoded within the response to an HTTP <code>GET</code> request on a given URI; per HTTPbis, the response to a <code>GET</code> request contains a representation of the current state of the resource identified by the URI</li>
</ul>

<h2>Content and Meaning</h2>

<p>Now I will introduce a few new terms that aren&#8217;t used in the URI specification or HTTPbis but which are useful for discussion.</p>

<ul>
<li>The <strong>content</strong> located by a <code>http:</code> URI is whatever core information a consumer interacts with through the HTTP interface provided by the server for that URI. This is the information that is common across all the representations that are returned through a <code>GET</code> on a given URI (through content-negotiated variants). We can say that the <code>http:</code> URI <strong>locates</strong> the content of the resource that it identifies, because you can get hold of the content of a resource by performing an HTTP <code>GET</code> request.</li>
<li>The <strong>sense</strong> referred to by a <code>http:</code> URI is a social construct that arises from the properties associated with the URI by publishers and the way that these invoke action in consumers. We can say that a <code>http:</code> URI <strong>refers to</strong> a sense. While you can <code>GET</code> content from a URI, determining its sense can only be achieved by examining the way in which the URI is used within data published on the web. The sense referred to by a URI might vary in different contexts, but equally a single sense may emerge in the use of the URI. One sense referred to by a URI may be its content, and for some types of information that may be the only sense referred to by the URI by anyone.</li>
</ul>

<p>Here is a diagram that shows how these different terms hang together.</p>

<p style="text-align: center;">
<img src="/blog/files/punning.png" />
</p>

<p>For example, take the URI <code>http://www.amazon.com/gp/product/B004TRXX7C</code>. The <em>content</em> located by this URI is the core information in the web pages we <code>GET</code> from the URI. The <em>sense</em> referred to by the URI could be the same as the <em>content</em>, or it could be the novel Moby Dick, or the particular Kindle edition of the book. We can&#8217;t tell from any interaction at the level of the HTTP protocol what the <em>sense</em> of the resource is: that information has to come from the application level.</p>

<p>Like the meaning of a word, the <em>sense</em> that a URI refers to is a social understanding which emerges from use of the URI across the web, and a given URI may be used to refer to different <em>senses</em> in different sources of information or over time. Consumers interpret the information that uses a URI and is made available to them on the web in order to draw conclusions and perform a task. Different consumers will have different levels of trust in the particular interpretation of the URI that a given publisher provides; in particular, the information published by the supplier of the URI might be given a higher weight than that from third-party publishers. Tools like <a href="http://sig.ma/">sig.ma</a> illustrate how information can be combined from multiple sources with different weights, by associating metadata about the location of the data with the data itself; unpicking commonalities between groups of sources may help to work out the different <em>senses</em> referred to by these different sources.</p>

<p>The <em>content</em> located by a URI is more concrete, and is important because certain classes of application may infer something about what they can do with the <em>content</em> found at a given URI based on information published about the URI. The canonical example of this is a consumer that searches the web for public-domain pages, based on information published about the licensing of those pages, and displays a portion of one each day within a feed. This application can&#8217;t work properly if it doesn&#8217;t know what actual content is public domain. In the Amazon example above, if there is a statement saying <code>http://www.amazon.com/gp/product/B004TRXX7C</code> is public domain (referring to the novel &#8220;Moby Dick&#8221;, which is one possible <em>sense</em> referred to by the URI, and one on which the copyright has expired), a consumer that assumes the URI is being used to locate the <em>content</em> at that URI will assume that the representation retrieved through a <code>GET</code> on that URI is public domain. The consumer might pick out the first major paragraph from the HTML page for display, but that first paragraph is actually an editorial review that is marked with a separate copyright and therefore shouldn&#8217;t be displayed in a feed of public-domain content.</p>
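<p>The difference between a naive consumer and a more careful one can be sketched in a few lines of Python. Everything here is a hypothetical illustration. The statement tuples, the <code>USAGE</code> table and the function names are assumptions made for the example and are not part of any real feed or vocabulary. <code>USAGE</code> records whether a property&#8217;s subject URI locates <em>content</em> or refers to a <em>sense</em>.</p>

<pre><code># Hypothetical statements as (subject URI, property, object) tuples.
PUBLIC_DOMAIN = "http://creativecommons.org/publicdomain/mark/1.0/"
STATEMENTS = [
    ("http://www.amazon.com/gp/product/B004TRXX7C", "cc:license", PUBLIC_DOMAIN),
]

# Hypothetical knowledge of how each property's subject URI is used: here the
# publisher meant the novel (a sense), not the web page (its content).
USAGE = {"cc:license": "sense"}


def naive_public_domain_pages(statements):
    """Treat every subject URI as locating content: the mistake described above."""
    return [s for s, p, o in statements if p == "cc:license" and o == PUBLIC_DOMAIN]


def careful_public_domain_pages(statements, usage):
    """Only trust a license for fetched content if the property is about content."""
    return [
        s
        for s, p, o in statements
        if p == "cc:license" and o == PUBLIC_DOMAIN and usage.get(p) == "content"
    ]
</code></pre>

<p>With this data, the naive consumer would put the Amazon page into its public-domain feed. The careful one would not, because the license is recorded as applying to a <em>sense</em> of the URI rather than to what a <code>GET</code> returns.</p>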

<p>Of course interpreting assertions made by publishers about particular <em>content</em> can be just as complex as interpreting statements about a particular <em>sense</em> of a URI, especially when those assertions come from a third party. The <em>content</em> located by a URI may change over time, and potentially in dramatic ways if the domain name of the URI changes hands, so any statements that a consumer discovers about <em>content</em> need to be assessed by the consumer in that context. That said, while the validity or truthfulness of a given piece of information about <em>content</em> may be variable, the bytes that are located through a URI by a consumer are tangible and discoverable in a way that the <em>sense</em> referred to by a URI can never be.</p>

<p>The core disagreements around httpRange-14 arise from whether you view the <em>content</em> located by a hash-less <code>http:</code> URI which provides a successful (<code>2XX</code>) response to be the only valid <em>sense</em> referred to by that URI, or whether you think they could be different things, and if you think they can be different things then which of those you think the URI identifies when it is used.</p>

<h2>Current State</h2>

<p>The URI and HTTP specifications talk only about URIs identifying resources and about using URIs to make requests that return a representation of the resource; they do not talk about <em>senses</em> or <em>content</em>.</p>

<p>The httpRange-14 decision is based on a design where if you can successfully <code>GET</code> a representation using a URI (ie you receive a <code>200 OK</code> response) then the <em>sense</em> referred to by the URI is the <em>content</em> located by that URI. This is pictured in the diagram below.</p>

<p style="text-align: center;">
<img src="/blog/files/equal-content-sense.png" />
</p>

<p>Sometimes, a server simply doesn&#8217;t store the <em>content</em> for something that it wishes to provide information about (for example, the Amazon website doesn&#8217;t store the content of Moby Dick), and sometimes the <em>sense</em> that a publisher wishes to confer on a URI is such that no <em>content</em> can be transmitted over the wire, such as a Person. In these cases, under this design, the server cannot give a <code>200 OK</code> response because it does not have the <em>content</em> for the URI. There are then two patterns a publisher can use to assign a <code>http:</code> URI to something in these cases.</p>

<p>One pattern is to use a <code>303 See Other</code> redirection to a URI whose <em>content</em> (which is the only <em>sense</em> of the URI in this design, remember) <strong>describes</strong> the original URI. This is pictured below.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-303.png" />
</p>

<p>The second pattern is to use a hash URI. This gives a very similar pattern, as you can see from the diagram below; the only difference is that instead of following a <code>303 See Other</code> redirection to get from one URI to the other, you can use URI parsing: you chop off the fragment part of the URI and perform a <code>GET</code> on the resulting URI to get a description. Often this leads to several hash URIs being described by the same <em>content</em>, as illustrated here:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-hash-uris.png" />
</p>
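<p>The two patterns can be made concrete with a minimal Python sketch of how a consumer might find the URI whose <em>content</em> describes a given URI. It works under stated assumptions. It makes no real HTTP requests and instead looks up responses in a hypothetical in-memory table, <code>RESPONSES</code>. The <code>example.org</code> URIs and the function name <code>description_uri</code> are made up for illustration. The only real library call is <code>urllib.parse.urldefrag</code>, which splits off the fragment.</p>

<pre><code>from urllib.parse import urldefrag

# Hypothetical, hard-coded responses standing in for real GET requests:
# each URI maps to (status code, Location header or None).
RESPONSES = {
    "http://example.org/id/alice": (303, "http://example.org/doc/alice"),
    "http://example.org/doc/alice": (200, None),
    "http://example.org/people": (200, None),
}


def description_uri(uri):
    """Return the URI whose content describes `uri`, or None if `uri` locates content."""
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI pattern: chop off the fragment; the consumer would then GET the base URI.
        return base
    status, location = RESPONSES[uri]
    if status == 303:
        # 303 pattern: follow the See Other redirection to a describing document.
        return location
    # A 200 response: under the httpRange-14 decision the URI locates its own content.
    return None
</code></pre>

<p>For example, <code>description_uri("http://example.org/people#alice")</code> returns <code>http://example.org/people</code> without any request being made. By contrast, <code>http://example.org/id/alice</code> needs a (simulated) round trip to discover <code>http://example.org/doc/alice</code>.</p>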

<p>However, hash URIs are also used to identify fragments within pages, which are bits of <em>content</em>. For hash URIs that identify fragments of a page, the picture looks more like this:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-fragment.png" />
</p>

<p>A consumer can&#8217;t tell just from looking at a hash URI whether it identifies a fragment of content or is being used to refer to something described by the content located by its base URI: the consumer has to make a request and understand the fragment identifier as it applies to the media type of the representation that it gets back. Also, if there&#8217;s any content negotiation going on, then even if the fragment identifier doesn&#8217;t make sense for the media type, or doesn&#8217;t locate a fragment within the representation that the consumer retrieves, it might still locate a fragment of content within a different representation of the resource from the one the consumer has retrieved.</p>

<p>In any case, common usage of hash-less <code>http:</code> URIs differs from this model of <em>content</em> and <em>sense</em> being one and the same for all URIs that give a successful response. Often URIs are used in data formats such as JSON, XML, RDF or HTML where the <em>content</em> and the <em>sense</em> of the URI are different things. For example, on Flickr the <em>content</em> located through the URI <code>http://www.flickr.com/photos/45701084@N08/7051652969/</code> is a landing page that provides a bunch of information about an image, but the data within the page includes a statement about its license:</p>

<pre><code>&lt;http://www.flickr.com/photos/45701084@N08/7051652969/&gt;
  cc:license &lt;http://creativecommons.org/licenses/by/2.0/deed.en&gt; ;
  .
</code></pre>

<p>in which that same URI refers to the photograph itself: the <em>sense</em> referred to by the URI. The only way that you can tell this is by being a human: reading the page and the context in which the text describing the license is used.</p>

<p>This mismatch between the design specified by RFC 2616 and the httpRange-14 decision on the one hand, and practice on the web today on the other, results in arguments back and forth. Some people say, in essence, that the resource a URI identifies is the <em>sense</em> conferred by the URI&#8217;s supplier; others say that a URI should always be taken as identifying the <em>content</em> of the resource. Then come discussions about how to signal to an application that in particular cases the supplier really does mean the URI to identify some <em>content</em>, or really does mean the URI to identify a particular <em>sense</em>, and if so which <em>sense</em> is being referred to.</p>

<h2>Punning</h2>

<p>&#8220;Punning&#8221; approaches attempt to cut through these disagreements by saying that the context in which the URI is used determines whether it is locating <em>content</em> or referring to a <em>sense</em>.</p>

<p>If we look at some <a href="http://ogp.me/">Open Graph Protocol (OGP)</a> statements on <code>http://www.imdb.com/title/tt1334573/</code>, we see:</p>

<pre><code>&lt;meta property="og:url" content="http://www.imdb.com/title/tt1334573/" /&gt;
&lt;meta property="og:title" content="Moby Dick (TV Series 2010)"/&gt;
&lt;meta property="og:type" content="video.tv_show"/&gt;
&lt;meta property="og:image" content="http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif"/&gt;
&lt;meta property="og:site_name" content="IMDb"/&gt;
&lt;meta property="fb:app_id" content="115109575169727"/&gt;
</code></pre>

<p>Some of these properties &#8212; the url, title, type and image &#8212; are about the Moby Dick TV Series &#8212; the <em>sense</em> referred to by the URI <code>http://www.imdb.com/title/tt1334573/</code>. Others &#8212; the site name and Facebook application id &#8212; are about the <em>content</em> located by the URI. The properties that are provided by this data are all related to the same URI, but they aren&#8217;t all properties of the same thing. In natural language we might say:</p>

<ul>
<li>a URL of the thing described by the page is <code>http://www.imdb.com/title/tt1334573/</code></li>
<li>a title of the thing described by the page is &#8220;Moby Dick (TV Series 2010)&#8221;</li>
<li>a type of the thing described by the page is &#8220;video.tv_show&#8221;</li>
<li>an image of the thing described by the page is the content at <code>http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif</code></li>
<li>the site of the page is IMDb</li>
<li>the Facebook application to use with the page has the identifier 115109575169727</li>
</ul>

<p>The property itself determines whether it applies to the <em>content</em> located by the URI (the page) or a <em>sense</em> referred to by the URI (in this case, the thing the page describes). Here&#8217;s a diagram that shows the distinction:</p>

<p style="text-align: center;">
<img src="/blog/files/punning-moby-dick.png" />
</p>
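<p>A consumer that knows which OGP properties apply to which aspect of the resource could split the IMDb statements into two groups. The classification below simply follows the natural-language list above. It is an assumption for illustration, since OGP itself does not publish such a table. The function name <code>split_by_aspect</code> is hypothetical.</p>

<pre><code># Which aspect of the resource each property describes, following the list above.
# This mapping is an assumption for illustration; OGP does not define it.
PROPERTY_ASPECT = {
    "og:url": "sense",
    "og:title": "sense",
    "og:type": "sense",
    "og:image": "sense",
    "og:site_name": "content",
    "fb:app_id": "content",
}

IMDB_META = [
    ("og:url", "http://www.imdb.com/title/tt1334573/"),
    ("og:title", "Moby Dick (TV Series 2010)"),
    ("og:type", "video.tv_show"),
    ("og:image", "http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif"),
    ("og:site_name", "IMDb"),
    ("fb:app_id", "115109575169727"),
]


def split_by_aspect(meta):
    """Group (property, value) pairs by whether they describe the content or the sense."""
    groups = {"content": {}, "sense": {}}
    for prop, value in meta:
        groups[PROPERTY_ASPECT[prop]][prop] = value
    return groups
</code></pre>

<p>Calling <code>split_by_aspect(IMDB_META)</code> puts the site name and application id under <code>"content"</code> and the other four properties under <code>"sense"</code>.</p>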

<h3>Defining URI Usage</h3>

<p>The way in which a URI is interpreted &#8212; as referring to a <em>sense</em> or locating <em>content</em> &#8212; is dependent on where it is used. In XML, for example, an <code>xmlns:*</code> attribute contains a URI; when it is a hash-less <code>http:</code> URI, it refers to an XML namespace (a <em>sense</em> of that URI): it doesn&#8217;t matter what <em>content</em> you <code>GET</code> from dereferencing the URI, or even if it can be dereferenced at all. On the other hand, the <code>href</code> attribute on an <code>xi:include</code> element defined by <a href="http://www.w3.org/TR/xinclude/">XInclude</a> is used to locate some <em>content</em> to be included within the referring XML.</p>

<p>It is really up to the format in which data is encoded to determine how the URI should be interpreted: as locating some <em>content</em> or referring to a <em>sense</em>. As with interpreting any information with which it&#8217;s presented, an application that needs to work out which is meant might use:</p>

<ul>
<li>built-in knowledge (eg an application might know that the <code>og:title</code> property is always about the <em>sense</em> referred to by the subject URI, based on documentation about the property [this is essentially the same as if the information were embedded within a schema, but without the implication that every application must download and interpret a schema every time it happens across a property])</li>
<li>information encoded within a schema (eg a schema might classify the <code>og:title</code> property as a <code>PropertyWhoseSubjectIsTheSubstanceOfAResource</code>)</li>
<li>a default for the format of the data (eg given OGP uses RDFa, RDF could specify that by default URIs refer to a <em>sense</em>, and therefore barring other information to the contrary, properties cannot be assumed to be about the page itself)</li>
<li>a default for the web (eg we might say that barring overriding information, all hash-less <code>http:</code> URIs are assumed to locate <em>content</em>, as this is consistent with the current definition of HTTP in RFC 2616)</li>
</ul>
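<p>One way an application might combine these sources is to consult them in order, from most to least specific, and take the first answer it finds. This Python sketch assumes that ordering. Every table in it (<code>BUILT_IN</code>, the optional schema dictionary, <code>FORMAT_DEFAULTS</code> and <code>WEB_DEFAULT</code>) is hypothetical and mirrors the examples in the list rather than any real specification.</p>

<pre><code># Hypothetical knowledge sources, from most to least specific.
BUILT_IN = {"og:title": "sense"}     # documentation the application was written against
FORMAT_DEFAULTS = {"rdfa": "sense"}  # e.g. if RDF said URIs refer to a sense by default
WEB_DEFAULT = "content"              # e.g. hash-less http: URIs locate content


def interpret_subject(prop, data_format, schema=None):
    """Decide whether a property's subject URI locates content or refers to a sense."""
    if prop in BUILT_IN:
        return BUILT_IN[prop]
    if schema and prop in schema:
        return schema[prop]
    if data_format in FORMAT_DEFAULTS:
        return FORMAT_DEFAULTS[data_format]
    return WEB_DEFAULT
</code></pre>

<p>In this sketch a format default takes precedence over the web-wide default. That is one plausible design choice; the opposite ordering would be equally easy to express.</p>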

<p>Thus an implication of this approach is that the people who define languages and vocabularies must specify what aspect of a resource a URI used in a particular way identifies. There are four possibilities for a given URI:</p>

<ol>
<li>the URI is being used to locate some <em>content</em></li>
<li>the URI is being used to refer to a <em>sense</em></li>
<li>the URI is being used to identify either <em>content</em> or <em>sense</em> but it&#8217;s not specified which</li>
<li>the URI is being used to both locate <em>content</em> and refer to a <em>sense</em> (ie a property applies equally to both)</li>
</ol>
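<p>A consumer might record these four possibilities as an enumeration and use it to decide whether a statement can be attached to the <em>content</em> it retrieves from the URI. The names <code>UriUsage</code> and <code>may_annotate_fetched_content</code> are hypothetical. This is a sketch, not part of any specification.</p>

<pre><code>from enum import Enum


class UriUsage(Enum):
    """The four ways a language or vocabulary might say a URI is being used."""
    CONTENT = "locates content"
    SENSE = "refers to a sense"
    UNSPECIFIED = "content or sense, not specified"
    BOTH = "locates content and refers to a sense"


def may_annotate_fetched_content(usage):
    """Can a consumer attach the property to what it GETs from the URI?

    Returns True or False when the usage settles the question, and None when
    the consumer has to fall back on other information or a default.
    """
    if usage in (UriUsage.CONTENT, UriUsage.BOTH):
        return True
    if usage is UriUsage.SENSE:
        return False
    return None
</code></pre>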

<h3>Equality</h3>

<p>Now let&#8217;s consider what happens when there is more information available about something, but it uses a different URI. The page <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code> is about the same TV series as <code>http://www.imdb.com/title/tt1334573/</code>. Imagine that Wikipedia similarly made the information it holds available using OGP. It might contain:</p>

<pre><code>&lt;meta property="og:url" content="https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)" /&gt;
&lt;meta property="og:title" content="Moby Dick (miniseries)"/&gt;
&lt;meta property="og:type" content="video.tv_show"/&gt;
&lt;meta property="og:site_name" content="Wikipedia"/&gt;
</code></pre>

<p>The two pages describe the same thing: the <em>sense</em> referred to by the two URIs is the same. However, the <em>content</em> of the two pages is different. If you simply smushed the properties together, ignoring the fact that some properties apply to the <em>content</em> and others to the <em>sense</em> of the resource, you&#8217;d get some data that wasn&#8217;t quite right:</p>

<pre><code>{
  url: [
    'http://www.imdb.com/title/tt1334573/',
    'https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)'
  ],
  title: [
    'Moby Dick (TV Series 2010)',
    'Moby Dick (miniseries)'
  ],
  type: [
    'video.tv_show'
  ],
  image: [
    'http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif'
  ],
  site_name: [
    'IMDb',
    'Wikipedia'
  ],
  app_id: [
    '115109575169727'
  ]
}
</code></pre>

<p>Having two URLs, two titles and so on is fine, but having two site names doesn&#8217;t make sense: the <code>og:site_name</code> property is related to the <em>content</em> located by the URI, and the <em>content</em> is different for the two URIs. This is illustrated below.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-equality.png" />
</p>
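<p>One way to avoid this is to pool only the properties that describe the shared <em>sense</em>. Properties that describe <em>content</em> stay attached to the URI whose content they describe. The Python sketch below assumes a hand-written classification of OGP properties, since OGP does not define one. The function name <code>merge_same_sense</code> is hypothetical, and <code>og:image</code> is left out for brevity.</p>

<pre><code># Assumed classification: these OGP properties describe the content of a page.
CONTENT_PROPERTIES = {"og:site_name", "fb:app_id"}

IMDB = {
    "og:url": "http://www.imdb.com/title/tt1334573/",
    "og:title": "Moby Dick (TV Series 2010)",
    "og:type": "video.tv_show",
    "og:site_name": "IMDb",
    "fb:app_id": "115109575169727",
}
WIKIPEDIA = {
    "og:url": "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)",
    "og:title": "Moby Dick (miniseries)",
    "og:type": "video.tv_show",
    "og:site_name": "Wikipedia",
}


def merge_same_sense(pages):
    """Merge pages known to share a sense without mixing up their content.

    `pages` maps each URI to its property dictionary. Sense properties are
    pooled into sets; content properties stay keyed by the URI they came from.
    """
    sense, content = {}, {}
    for uri, props in pages.items():
        for prop, value in props.items():
            if prop in CONTENT_PROPERTIES:
                content.setdefault(uri, {})[prop] = value
            else:
                sense.setdefault(prop, set()).add(value)
    return sense, content
</code></pre>

<p>Merging the two pages this way gives two titles and two URLs for the TV series. Each site name stays attached to the page it describes.</p>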

<p>Conversely, imagine a situation in which there is a single document on a web server that is served up from the two URIs:</p>

<pre><code>http://example.org/gender/male
http://example.org/gender/female
</code></pre>

<p>In this case, the <em>content</em> located by the two URIs is exactly the same, but the <em>sense</em> referred to by the two URIs is different: one refers to the gender &#8216;male&#8217; and the other to the gender &#8216;female&#8217;, as illustrated here.</p>

<p style="text-align: center;">
<img src="/blog/files/punning-equality2.png" />
</p>

<p>So there are three types of equality that we have to be concerned with:</p>

<ul>
<li>equality between the desired <em>sense</em> and <em>content</em> of a single URI, as described earlier</li>
<li>equality between the <em>senses</em> referred to by different URIs</li>
<li>equality between the <em>content</em> located by different URIs</li>
</ul>

<p>In general, equality between the resources identified by two URIs is a controversial thing to assert, because different contexts may refer to different <em>senses</em> of a particular URI, some of which may be equal to a <em>sense</em> referred to by another URI and some not. Like any statement made about URIs, the source of statements about equality must be considered.</p>

<p>Formats that wish to make assertions about equality between resources should provide ways of saying that the <em>sense</em> referred to by two URIs is the same without implying that the <em>content</em> located at those two URIs is the same (and vice versa), and of asserting that the <em>sense</em> and <em>content</em> of a single URI are equal. What these properties are &#8212; how exactly these kinds of equality are asserted in a given format &#8212; is up to the format, but it&#8217;s important that the properties are kept distinct to enable people to articulate the full range of equality relationships between resources.</p>
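<p>This small Python sketch keeps the three kinds of equality as separate relations, so asserting one never implies another. The relation names <code>same_sense</code>, <code>same_content</code> and <code>sense_is_content</code> are made up for the example and do not come from any existing format. The URI <code>http://example.org/about</code> is likewise hypothetical. The other URIs are the Moby Dick pages and the gender URIs discussed above.</p>

<pre><code># Three independent, hypothetical equality relations.
same_sense = set()        # unordered pairs of URIs that refer to the same sense
same_content = set()      # unordered pairs of URIs that locate the same content
sense_is_content = set()  # single URIs whose sense is their own content


def assert_equal(relation, a, b):
    relation.add(frozenset((a, b)))


def holds(relation, a, b):
    return frozenset((a, b)) in relation


IMDB = "http://www.imdb.com/title/tt1334573/"
WIKIPEDIA = "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)"
MALE = "http://example.org/gender/male"
FEMALE = "http://example.org/gender/female"
ABOUT = "http://example.org/about"  # hypothetical plain web page

assert_equal(same_sense, IMDB, WIKIPEDIA)  # same TV series, different pages
assert_equal(same_content, MALE, FEMALE)   # same document, different genders
sense_is_content.add(ABOUT)                # the page is just the page
</code></pre>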

<h2>Implications for Linked Data</h2>

<p>I have tried to keep the description above neutral in terms of technology choice, because I believe that the issue of how to interpret URIs within data is common across all languages that use URIs. However, as I&#8217;ve discussed previously, linked data is particularly affected by these issues both because URIs form a central part of the way it works as a data format and because culturally the community tries very hard to adhere to &#8220;good web architectural practice&#8221; in the hope that this will confer long-term benefits.</p>

<p>For that reason, I&#8217;ll look at what I think the impacts are on linked data practice of using the &#8220;punning&#8221; approach that I&#8217;ve described above.</p>

<h3>RDF</h3>

<p>The definition of RDF is currently in flux, as <a href="http://www.w3.org/TR/rdf11-concepts/">RDF 1.1</a> is developed, so now is a good time to consider its use of URIs.</p>

<p>RDF itself is not particularly concerned with what URIs identify: it is simply a model that can be used to associate properties between &#8220;resources&#8221;, where in the RDF context this term means anything that can be the subject or object of an RDF statement, including literals. (RDF&#8217;s use of the term &#8220;resource&#8221; is not the same as that used in the URI specification or HTTPbis.) The only real limitation in <a href="http://www.w3.org/TR/rdf-concepts/">RDF 1.0 Concepts</a> is that a <a href="http://www.w3.org/TR/rdf-concepts/#section-fragID">hash URI identifies something described by the RDF/XML representation retrieved when the URI is resolved</a>. In the current Editor&#8217;s Draft of RDF 1.1, <a href="http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-fragID">the same section</a> is less specific about what such a fragment might denote. So if the language and concepts above were adopted, RDF 1.1 should be more careful in its use of terminology, and attempt to be consistent with the URI and HTTP specifications, but I don&#8217;t think anything fundamental needs to change in the core semantics of RDF.</p>

<h3>Vocabulary Designers</h3>

<p>Under the &#8220;punning&#8221; approach, the property used within an RDF statement determines how the URI given as its subject or object should be interpreted. A consumer that discovers a URI by looking at the properties associated with it needs to be able to tell from the properties themselves whether it can associate those properties to particular <em>content</em> that it locates by requesting the URI or not.</p>

<p>Some properties have a defined domain or range that precludes the property from being used to annotate <em>content</em>. For example, the <code>foaf:nick</code> property has a domain of <code>foaf:Person</code>, and a <code>foaf:Person</code> cannot be a web page. Given this domain, an application can tell that the URI <code>http://www.jenitennison.com/</code> used in a statement such as:</p>

<pre><code>&lt;http://www.jenitennison.com/&gt;
  foaf:nick "JeniT" ;
  .
</code></pre>

<p>cannot be intended to locate the <em>content</em> of <code>http://www.jenitennison.com/</code>, even if <code>http://www.jenitennison.com/</code> responds with <code>200 OK</code>.</p>

<p>Note that this inference doesn&#8217;t work the other way around. The property <code>cc:license</code> has the domain <code>cc:Work</code>, and a web page can be a <code>cc:Work</code> just as a novel can, so without additional information about the property an application could not infer that in a statement such as</p>

<pre><code>&lt;http://www.amazon.com/gp/product/B004TRXX7C&gt;
  cc:license &lt;http://creativecommons.org/publicdomain/mark/1.0/&gt; ;
  .
</code></pre>

<p>the URI <code>http://www.amazon.com/gp/product/B004TRXX7C</code> was being used to locate the <em>content</em> of <code>http://www.amazon.com/gp/product/B004TRXX7C</code>: it could equally be referring to some <em>sense</em> of the resource (for example the novel Moby Dick).</p>
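<p>The one-way nature of this check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the <code>DOMAINS</code> table and the <code>NOT_CONTENT_CLASSES</code> set are hypothetical stand-ins for information a consumer would read from the FOAF and Creative Commons vocabularies, and the function only ever answers &#8220;no&#8221; or &#8220;can&#8217;t tell&#8221;, never &#8220;yes&#8221;.</p>

<pre><code># Hypothetical tables: a real consumer would load these from the vocabularies.
DOMAINS = {
    "foaf:nick": "foaf:Person",
    "cc:license": "cc:Work",
}

# Classes assumed (as in the text) to exclude web pages.
NOT_CONTENT_CLASSES = {"foaf:Person"}


def subject_may_be_content(prop):
    """Return False if the domain of prop rules out its subject being the
    content located by the subject URI, or None if the domain cannot tell."""
    domain = DOMAINS.get(prop)
    if domain in NOT_CONTENT_CLASSES:
        return False
    # The domain does not exclude content, but it does not confirm it either.
    return None


print(subject_may_be_content("foaf:nick"))   # False
print(subject_may_be_content("cc:license"))  # None
</code></pre>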

<p>To support &#8220;punning&#8221;, therefore, RDF vocabulary designers would need additional properties that can be applied to RDF properties to indicate how their subject (and object, where applicable) should be interpreted. For example, the Creative Commons vocabulary might include (warning: made up property names and instances):</p>

<pre><code>cc:license
  rdfs:subjectUri rdf:sense ;
  rdfs:objectUri rdf:content ;
  .
</code></pre>

<p>with the implication that URIs used as the subject of <code>cc:license</code> should be understood as referring to the <em>sense</em> of the URI, while those used as the object of <code>cc:license</code> should be understood as referring to the <em>content</em> retrieved from the URI.</p>

<p>Even if properties like <code>rdfs:subjectUri</code> or <code>rdfs:objectUri</code> are defined, there are going to be RDF properties for which the interpretation of subject and/or object URIs isn&#8217;t specified, and thus consumers of RDF content need to have a default interpretation. What that should be is, I think, a matter for the RDF community to decide.</p>
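<p>A consumer honouring such annotations might look up the interpretation for each position of a statement and fall back to a default otherwise. In this hypothetical Python sketch, the <code>URI_INTERPRETATION</code> table stands in for the made-up <code>rdfs:subjectUri</code> and <code>rdfs:objectUri</code> annotations, and the default of <code>"sense"</code> is an arbitrary placeholder, since the choice of default is left open above.</p>

<pre><code># Hypothetical annotations mirroring the made-up rdfs:subjectUri and
# rdfs:objectUri properties: maps each property to its per-position reading.
URI_INTERPRETATION = {
    "cc:license": {"subject": "sense", "object": "content"},
    "foaf:nick": {"subject": "sense"},
}

# Placeholder only: the real default is for the RDF community to decide.
DEFAULT_INTERPRETATION = "sense"


def interpret(prop, position, default=DEFAULT_INTERPRETATION):
    """Say whether a URI in the given position ("subject" or "object") of a
    statement using prop refers to a sense or to content."""
    if position not in ("subject", "object"):
        raise ValueError("position must be 'subject' or 'object'")
    return URI_INTERPRETATION.get(prop, {}).get(position, default)


print(interpret("cc:license", "subject"))  # sense
print(interpret("cc:license", "object"))   # content
print(interpret("dc:creator", "subject"))  # sense, via the default
</code></pre>

<p>Passing a different <code>default</code> makes it easy to compare how the same data reads under each possible convention.</p>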

<h3>Inference</h3>

<p>The major difficulties in combining the &#8220;punning&#8221; approach with current uses of RDF arise when reasoning is applied across RDF statements in which the same URI is used in different ways, particularly with properties for which the interpretation of the subject and/or object isn&#8217;t specified.</p>

<p>For example, if a consumer finds the following triples at <code>http://www.imdb.com/title/tt1334573/</code>:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  og:url "http://www.imdb.com/title/tt1334573/" ;
  og:title "Moby Dick (TV Series 2010)" ;
  og:type "video.tv_show" ;
  og:image "http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif" ;
  og:site_name "IMDb" ;
  fb:app_id "115109575169727" ;
  .
</code></pre>

<p>and the following triples at <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>:</p>

<pre><code>&lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt;
  og:url "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)" ;
  og:title "Moby Dick (miniseries)" ;
  og:type "video.tv_show" ;
  og:site_name "Wikipedia" ;
  .
</code></pre>

<p>and then the assertion:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  owl:sameAs &lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt; ;
  .
</code></pre>

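<p>then the result of inference will be that all the statements made about <code>http://www.imdb.com/title/tt1334573/</code> apply equally to <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>, which is not the case: a reasoner would conclude, for example, that the Wikipedia page has the <code>og:site_name</code> &#8220;IMDb&#8221; and the <code>fb:app_id</code> &#8220;115109575169727&#8221;.</p>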
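<p>To make the problem concrete, here is a minimal Python sketch of a naive <code>owl:sameAs</code> rule applied to a subset of the triples above. It only substitutes URIs in the subject position and is not a full OWL reasoner; the triples are plain tuples of strings rather than objects from any RDF library.</p>

<pre><code>IMDB = "http://www.imdb.com/title/tt1334573/"
WIKI = "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)"

triples = {
    (IMDB, "og:title", "Moby Dick (TV Series 2010)"),
    (IMDB, "og:site_name", "IMDb"),
    (WIKI, "og:title", "Moby Dick (miniseries)"),
    (WIKI, "og:site_name", "Wikipedia"),
}


def apply_same_as(triples, a, b):
    """Naive owl:sameAs: copy every statement about a to b and vice versa,
    treating the two URIs as names for one and the same entity."""
    inferred = set(triples)
    for s, p, o in triples:
        if s == a:
            inferred.add((b, p, o))
        elif s == b:
            inferred.add((a, p, o))
    return inferred


closure = apply_same_as(triples, IMDB, WIKI)
print(sorted(o for s, p, o in closure if s == WIKI and p == "og:site_name"))
# ['IMDb', 'Wikipedia']
</code></pre>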

<p>To enable publishers to make assertions about equality of <em>sense</em> and equality of <em>content</em> separately, we will need new relationships. For example:</p>

<pre><code>&lt;http://www.imdb.com/title/tt1334573/&gt;
  owl:sameSenseAs &lt;https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)&gt; ;
  .
</code></pre>

<p>would license only the inference that those properties whose subject is the <em>sense</em> of <code>http://www.imdb.com/title/tt1334573/</code> apply equally to the <em>sense</em> of <code>https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)</code>. An <code>owl:sameContentAs</code> property could similarly assert equality between the <em>content</em> of two URIs.</p>
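<p>The difference from <code>owl:sameAs</code> can be sketched by restricting the copying to properties whose subject is interpreted as the <em>sense</em> of a URI. This is a hypothetical Python sketch: <code>owl:sameSenseAs</code> is not part of OWL, and the <code>SENSE_PROPERTIES</code> set simply assumes that <code>og:title</code> describes the sense while <code>og:site_name</code> describes the content.</p>

<pre><code>IMDB = "http://www.imdb.com/title/tt1334573/"
WIKI = "https://en.wikipedia.org/wiki/Moby_Dick_(miniseries)"

# Hypothetical: properties whose subject is the *sense* of a URI.
# og:site_name is assumed to describe the *content*, so it is not listed.
SENSE_PROPERTIES = {"og:url", "og:title", "og:type", "og:image"}

triples = {
    (IMDB, "og:title", "Moby Dick (TV Series 2010)"),
    (IMDB, "og:site_name", "IMDb"),
    (WIKI, "og:title", "Moby Dick (miniseries)"),
    (WIKI, "og:site_name", "Wikipedia"),
}


def apply_same_sense_as(triples, a, b):
    """Hypothetical owl:sameSenseAs: share only sense-level statements."""
    inferred = set(triples)
    for s, p, o in triples:
        if p not in SENSE_PROPERTIES:
            continue  # content-level statements stay with their own URI
        if s == a:
            inferred.add((b, p, o))
        elif s == b:
            inferred.add((a, p, o))
    return inferred


closure = apply_same_sense_as(triples, IMDB, WIKI)
print(sorted(o for s, p, o in closure if s == WIKI and p == "og:site_name"))
# ['Wikipedia']
print(sorted(o for s, p, o in closure if s == WIKI and p == "og:title"))
# ['Moby Dick (TV Series 2010)', 'Moby Dick (miniseries)']
</code></pre>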

<p>The impact is not limited to reasoning with <code>owl:sameAs</code>: all inference in <a href="http://www.w3.org/TR/rdf-schema/">RDFS</a> and <a href="http://www.w3.org/TR/owl2-overview/">OWL</a> is based on the assumption that a single URI identifies a single entity. This works in situations where the RDF over which inferences are being made is all trusted (for example if it is all made available by the same publisher), and a lot of current use of OWL is precisely in these kinds of closed environments. The same inferences can be made even with information gleaned from the web at large, if that information is selected carefully.</p>

<p>Another approach, for cases where publishers have mixed properties about the <em>sense</em> referred to by a URI with properties about the <em>content</em> located by that URI, is to pre-process those RDF statements to create separate (blank node) RDF resources. For example, if <code>og:url</code>, <code>og:title</code>, <code>og:type</code> and <code>og:image</code> are defined to have a subject that refers to the <em>sense</em> of the URI, and <code>og:site_name</code> and <code>fb:app_id</code> to have a subject that locates the <em>content</em> of the URI, the statements about <code>http://www.imdb.com/title/tt1334573/</code> above could be translated into:</p>

<pre><code>_:imdbSubstance
  rdf:senseUri "http://www.imdb.com/title/tt1334573/" ;
  og:url "http://www.imdb.com/title/tt1334573/" ;
  og:title "Moby Dick (TV Series 2010)" ;
  og:type "video.tv_show" ;
  og:image "http://i.media-imdb.com/images/SFc0774313bf9ccbfe22050c8bb4029e41/imdb-share-logo.gif" ;
  .

_:imdbContent
  rdf:contentUri "http://www.imdb.com/title/tt1334573/" ;
  og:site_name "IMDb" ;
  fb:app_id "115109575169727" ;
  .
</code></pre>

<p>Here, the (putative) <code>rdf:senseUri</code> property is an inverse functional property that provides the URI for which the individual is a <em>sense</em>, and the <code>rdf:contentUri</code> property is an inverse functional property that provides the URI for which the individual is the <em>content</em>.</p>

<p>This separation would then allow existing inference to take place on the separate entities.</p>
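<p>The pre-processing step can be sketched in Python as follows. It is a minimal sketch, not a defined algorithm: the property classification repeats the assumptions made in the example above, <code>rdf:senseUri</code> and <code>rdf:contentUri</code> are the putative properties just described, and triples are represented as plain tuples of strings.</p>

<pre><code># Hypothetical classification of each property's subject, following the
# assignments in the example; it is not defined by the Open Graph protocol.
SENSE_PROPERTIES = {"og:url", "og:title", "og:type", "og:image"}
CONTENT_PROPERTIES = {"og:site_name", "fb:app_id"}


def split_puns(triples):
    """Rewrite triples whose subject URI is punned into statements about
    separate blank nodes: one for the sense and one for the content."""
    nodes = {}   # maps (uri, kind) to a blank node label
    out = set()

    def node_for(uri, kind):
        if (uri, kind) not in nodes:
            label = "_:b%d" % (len(nodes) + 1)
            nodes[(uri, kind)] = label
            # Link the blank node back to the URI using the putative
            # rdf:senseUri / rdf:contentUri properties.
            link = "rdf:senseUri" if kind == "sense" else "rdf:contentUri"
            out.add((label, link, uri))
        return nodes[(uri, kind)]

    for s, p, o in sorted(triples):
        if p in SENSE_PROPERTIES:
            out.add((node_for(s, "sense"), p, o))
        elif p in CONTENT_PROPERTIES:
            out.add((node_for(s, "content"), p, o))
        else:
            out.add((s, p, o))  # unclassified properties are left alone
    return out


IMDB = "http://www.imdb.com/title/tt1334573/"
for triple in sorted(split_puns({
    (IMDB, "og:title", "Moby Dick (TV Series 2010)"),
    (IMDB, "og:site_name", "IMDb"),
    (IMDB, "fb:app_id", "115109575169727"),
})):
    print(triple)
</code></pre>

<p>The blank node labels are generated per call, so a fuller implementation would also need to avoid clashing with blank node labels already present in the data.</p>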

<h3>Publishers</h3>

<p>The &#8220;punning&#8221; approach offers many advantages for linked data publishers:</p>

<ul>
<li>it supplies an easy on-ramp for suppliers who want to annotate their pages with embedded data such as RDFa and microdata: suppliers can use URIs that they already support to refer to things other than documents, if they choose to, which means all they need to do is add metadata to their pages (as many already do with OGP and schema.org)</li>
<li>suppliers do not have to have access to server configuration in order to promote the use of particular URIs to mean things that do not have <em>content</em> (such as people or organisations)</li>
<li>publishers can copy and paste URIs from the location bar of their browsers (a familiar activity for people who wish to provide a pointer to something) rather than inspecting pages for a recommended URI to be used to refer to a particular <em>sense</em></li>
<li>organisations such as schema.org can easily recommend the reuse of URIs published by other people, such as Wikipedia, without requiring those publishers to alter their server configuration or requiring developers that use schema.org markup to add fragment identifiers to their URIs</li>
<li>where necessary, explicit <code>describedby</code> and <code>describes</code> links can be made between URIs rather than relying on an HTTP status code such as <code>303 See Other</code>; these can be incorporated directly in data and do not require a network connection to be discovered</li>
</ul>

<h3>Provenance</h3>

<p>The &#8220;punning&#8221; approach that I&#8217;ve described here has at its core the recognition that different consumers will trust different sources of information to different levels. Knowledge of the provenance of a particular source of information is one way in which consumers can work out what to trust and how to resolve conflicts between sources.</p>

<p>The work of the <a href="http://www.w3.org/2011/prov">Provenance Working Group</a> is important here both in identifying the provenance of particular <em>content</em> located at a given URI and in providing a vocabulary for describing the processing that a consumer performs to retrieve and process that <em>content</em> in order to extract data from it (for example, the time of the retrieval and the HTTP headers used may lead to the consumer receiving different content; the particular version of software used may lead to different information being gleaned from that content).</p>

<h3>Linked Data Platform</h3>

<p>Questions about what URIs actually identify within RDF only become pressing when the URIs are resolved, that is, when RDF is used within linked data. The new <a href="http://www.w3.org/2012/ldp/">Linked Data Platform Working Group</a> is a great opportunity to standardise around these practices, in collaboration with the other relevant working groups.</p>

<h2>Final Thoughts</h2>

<p>People use the terms &#8220;resource&#8221;, &#8220;identifies&#8221; and &#8220;representation&#8221; both within specifications and in common parlance as if there is a shared understanding of what they mean, when in fact different people use the terms in subtly but meaningfully different ways. This would be fine, except that the different understandings lead to different assumptions and engineering decisions, and friction for developers trying to build applications that publish and consume data whose assumptions differ.</p>

<p>We need to find a way forward that, even if not everyone&#8217;s ideal, is realistic, explicable and palatable. The &#8220;punning&#8221; approach that I&#8217;ve described above might not be it, but the analysis that Jonathan&#8217;s done of the various proposals and use cases suggests to me that it&#8217;s the closest we have. The main questions I have are:</p>

<ul>
<li>what use cases cannot be satisfied using this approach?</li>
<li>what specifications would have to change if this approach was adopted, and would it be realistic to make those changes?</li>
<li>what existing applications would break if this approach was adopted, and how might that breakage be mitigated?</li>
</ul>

<p>At the very least, I hope that the vocabulary I&#8217;ve laid out in this post might be helpful in further discussions.</p>

<p>Of course any other comments are most welcome.</p>
    ]]></content>
  </entry>
  <entry>
    <title>UK Open Standards Consultation</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/169" />
    <id>http://www.jenitennison.com/blog/node/169</id>
    <published>2012-04-14T23:44:51+01:00</published>
    <updated>2012-04-14T23:44:51+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="opendata" />
    <summary type="html"><![CDATA[<p>Over the last few months, the UK Government has been running a <a href="http://consultation.cabinetoffice.gov.uk/openstandards/">consultation on its Open Standards policy</a>. The outcome of this consultation is incredibly important not only for organisations and individuals who want to work with government but also because of its potential knock-on effects on the publication of Open Data and the use of Open Source software within public sector organisations.</p>

<p>Unsurprisingly, Microsoft, Qualcomm and other organisations that have a vested interest in keeping the UK Government locked into their products are <a href="http://www.computerweekly.com/blogs/public-sector/2012/04/proprietary-lobby-triumphs-in.html">responding vociferously to the consultation</a>. They risk losing business not only to smaller enterprises within the UK but also, if the policy is successfully adopted here, in other countries, in Europe and internationally, that follow suit.</p>

<p>If we want our Government to be Open &#8212; to use Open Standards, to publish Open Data, to adopt Open Source &#8212; then we must respond to this consultation in numbers.</p>

<p>There are three things that you can do:</p>

<ol>
<li><strong>Respond to the consultation</strong> &#8212; made even easier by <a href="http://open.squarecows.com/">this response form</a> developed by Ric Harvey</li>
<li><strong>Attend the <a href="http://consultation.cabinetoffice.gov.uk/openstandards/events/">events</a></strong> &#8212; these seem pretty full now, but try to get in if you can</li>
<li><strong>Spread the message</strong> &#8212; blog and tweet and write to raise awareness of the importance and impact that this consultation could have</li>
</ol>
    ]]></summary>
    <content type="html"><![CDATA[<p>Over the last few months, the UK Government has been running a <a href="http://consultation.cabinetoffice.gov.uk/openstandards/">consultation on its Open Standards policy</a>. The outcome of this consultation is incredibly important not only for organisations and individuals who want to work with government but also because of its potential knock-on effects on the publication of Open Data and the use of Open Source software within public sector organisations.</p>

<p>Unsurprisingly, Microsoft, Qualcomm and other organisations that have a vested interest in keeping the UK Government locked into their products are <a href="http://www.computerweekly.com/blogs/public-sector/2012/04/proprietary-lobby-triumphs-in.html">responding vociferously to the consultation</a>. They risk losing business not only to smaller enterprises within the UK but also, if the policy is successfully adopted here, in other countries, in Europe and internationally, that follow suit.</p>

<p>If we want our Government to be Open &#8212; to use Open Standards, to publish Open Data, to adopt Open Source &#8212; then we must respond to this consultation in numbers.</p>

<p>There are three things that you can do:</p>

<ol>
<li><strong>Respond to the consultation</strong> &#8212; made even easier by <a href="http://open.squarecows.com/">this response form</a> developed by Ric Harvey</li>
<li><strong>Attend the <a href="http://consultation.cabinetoffice.gov.uk/openstandards/events/">events</a></strong> &#8212; these seem pretty full now, but try to get in if you can</li>
<li><strong>Spread the message</strong> &#8212; blog and tweet and write to raise awareness of the importance and impact that this consultation could have</li>
</ol>

<!--break-->

<p>The consultation is quite long and there are a lot of questions to answer. In the hope of making this easier for everyone, I&#8217;m publishing my response below. Please consider these responses public domain, and feel free to copy as much or as little as you like from them (though I recommend you omit the parts that are about my individual experience and substitute them with your own).</p>

<p>For extra background, read:</p>

<ul>
<li><a href="http://blogs.computerworlduk.com/open-enterprise/2012/04/of-microsoft-netscape-patents-and-open-standards/index.htm">Of Microsoft, Netscape, Patents and Open Standards</a> by Glyn Moody</li>
<li><a href="http://digital.cabinetoffice.gov.uk/2012/04/12/are-open-standards-a-closed-barrier/">Are open standards a closed barrier?</a> by Linda Humphries</li>
<li><a href="http://dev.squarecows.com/2012/04/10/open-standards-at-risk/">Open Standards at risk</a> by Ric Harvey</li>
</ul>

<h2>Criteria for Open Standards</h2>

<h3>1. How does this definition of open standard compare to your view of what makes a standard &#8216;open&#8217;?</h3>

<p>The definition in the consultation closely matches my view of what makes a standard open. The important factors are:</p>

<ul>
<li>a documented, open process which enables participation not just from implementers but also from users of the standard, and that provides ongoing maintenance and development of the standard</li>
<li>publication of the standard such that anyone can read it</li>
<li>a royalty-free and non-discriminatory license such that anyone can implement the standard without cost</li>
</ul>

<p>The one factor that does not match my view is the availability of multiple independent implementations of the standard. In some cases it may be that market pressures mean there are not currently multiple good implementations of an otherwise Open Standard, or several but only for one particular platform. Limiting the definition of Open Standards to only those with multiple cross-platform implementations is probably too constraining.</p>

<p>There are two examples from my work that demonstrate why the availability of multiple implementations should not be a factor.</p>

<p>First, one of the Open Standards that I use is XSLT, which for years has been dominated by a single implementation &#8212; Saxon &#8212; giving customers no real choice. Nevertheless, because it is an Open Standard, Saxon has had a lot of pressure to be completely conformant with that standard, and in the past year a number of other implementations have been started that can compete with it on different platforms, so the presence of a single implementation has proven to be a short-term issue.</p>

<p>Second, in some new technology areas such as Linked Data, a standard may have only a single implementation simply because the area and the Open Standards on which it is built are not yet very mature. As the use of the technology grows, so do the number of implementations and their adherence to the standard. Thus the number and quality of implementations do develop over time; government should concentrate on long-term adoption rather than short-term availability.</p>

<h3>2. What will the Government be inhibited from doing if this definition of open standards is adopted for software interoperability, data and document formats across central government?</h3>

<p>In some specialist areas, there may not be existing Open Standards available, or it may be that the Open Standards that are available do not match the specialist needs of the UK government. Where an Open Standard is imminent but not yet fully standardised or implemented, waiting for standardisation or implementation could delay government IT projects. However, the measures suggested in the rest of the proposed policy, including government involvement in standards creation and allowing for selecting other standards where there is no available Open Standard, mitigate this risk.</p>

<p>The other mitigating factor is that where an Open Standard doesn&#8217;t exist, it is usually possible to build new work on top of Open Standards. In my experience working on legislation.gov.uk, even though good appropriate Open Standards for UK legislation weren&#8217;t available, we have been able to work using the underlying Open Standards such as XML, RDF and HTTP, so the gap between necessary custom work and Open Standards is minimised.</p>

<p>The definition of Open Standards may also prevent the Government from entering into contracts with companies which do not adopt Open Standards. It may hinder exchange of information with outside organisations that use software that doesn&#8217;t support Open Standards. These may be problems in the short term but over the longer term, an Open Standards policy will move the supplier market and other organisations towards Open Standards more generally.</p>

<h3>3. For businesses attempting to break into the government IT market, would this policy make things easier or more difficult – does it help to level the playing field?</h3>

<p>The policy does help to level the playing field for two main reasons:</p>

<ul>
<li>using Open Standards reduces the cost of switching to new suppliers, because the new suppliers do not have to spend a lot of time reverse engineering existing processing and data; this means new suppliers can make more competitive bids</li>
<li>Open Standards are often implemented within Open Source Software, which has low or no cost; this helps smaller businesses because it lowers the cost of their entering the market</li>
</ul>

<h3>4. How would mandating open standards for use in government IT for software interoperability, data and document formats affect your organisation?</h3>

<p>I work as an independent contractor who specialises in a variety of Open Web Standards. For me as a contractor, the adoption of Open Standards within government increases the potential opportunities for me to work on government projects.</p>

<p>I am currently contracted to work on the delivery of legislation.gov.uk and The National Archives&#8217; Expert Participation Programme. This work is already built substantially on Open Standards. One of the main pain points in this work has been that the government organisations that provide data such as Bills, new Statutory Instruments or Tables of Effects for legislation.gov.uk are using proprietary technologies to do so, and converting from those proprietary data formats, or getting users to save in an open data format, can be both hard to do initially and difficult to maintain as new versions of software are rolled out. An Open Standards policy within government would greatly reduce the cost involved in those conversion processes and increase the ease of use for government users.</p>

<p>I am also a member of the Technical Architecture Group within the World Wide Web Consortium (W3C), which is the main standards body for Web Standards. From that perspective, the Open Standards policy would increase UK government, and UK government supplier, involvement in the development of Open Standards within the W3C, which can only improve the quality of the standards and the life of the organisation as a whole.</p>

<h3>5. What effect would this policy have on improving value for money in the provision of government services?</h3>

<p>This policy would greatly increase value for money in the provision of government services. Adopting Open Standards often means that the basic building blocks for a service can be selected off the shelf, and then fitted together and customised, rather than a proprietary solution built from scratch. This can reduce the cost of providing the service as a whole and improve the service, as development effort can be directed to the unique features of the service.</p>

<h3>6. Would this policy support innovation, competition and choice in delivery of government services?</h3>

<p>The policy as written is well-framed to support innovation, competition and choice across the market in the medium and long term, in areas which are beneficial to the UK Government and to the rest of the economy.</p>

<p>Adopting Open Standards focuses innovation on novel areas (which are not currently covered by existing standards) and on providing better quality services (which may mean better performance, better user experience and so on). It also encourages innovation in public, and this greater exposure brings with it higher quality and a better focus on user requirements. The only innovation it prevents is that whose purpose is to lock the government in to individual suppliers, such as closed standards that largely repeat existing work but can only be implemented by one supplier.</p>

<p>As part of the work that I did on Linked Data for data.gov.uk, several colleagues and I worked on new RDF vocabularies such as <a href="http://www.w3.org/TR/vocab-org/">org</a>, <a href="http://www.w3.org/TR/vocab-data-cube/">Data Cube</a> and <a href="http://purl.org/net/opmv/ns">OPMV</a> and processing standards such as the <a href="https://code.google.com/p/linked-data-api/">Linked Data API</a>. We did this in the open, and it was all based on Open Standards: that approach did not prevent us from doing new and innovative things. The results of that innovation were then taken forward by the wider community, made more rigorous and better suited to a wider audience, and are resulting in Open Standards from W3C. In addition, my colleagues have built new products that integrate that work, for example in <a href="http://kasabi.com/">Kasabi</a>.</p>

<p>Adopting Open Standards focuses competition on the quality of service that is offered rather than on winning a single competition that will lock the government in to contracts for many years to come. It prevents supplier complacency: when the government can move easily to a new supplier, suppliers have to provide continuous improvements over the lifetime of a contract, because they cannot be guaranteed to win the next one simply because they are the only ones who have implementations that can process the data. Competition is thus focused into areas that matter to the customer, including cost, rather than areas that matter to the supplier.</p>

<p>I was involved tangentially in The Stationery Office&#8217;s (TSO) bid during the recent re-procurement of legislation services by The National Archives. Because legislation.gov.uk was built on Open Standards, TSO&#8217;s bid had to be based on quality of service, on continuing innovation over the lifetime of the contract, and on low cost of delivery.</p>

<p>Adopting Open Standards increases choice for the UK Government because it opens up competition to suppliers who would not otherwise be able to compete (as in Question 3). Of course some companies may not currently use Open Standards, and under this policy the UK Government would not be able to choose them as suppliers in the short term, but it is unlikely that companies would continue their use of closed standards in the long term, if they wish to compete for UK Government contracts.</p>

<p>Put another way, adopting Open Standards means companies are innovating and competing on the <strong>right</strong> things: on things that are important to the UK Government.</p>

<h3>7. In what way do software copyright licences and standards patent licences interact to support or prevent interoperability?</h3>

<p>It is possible to have an Open Standard implemented by software that is public domain, completely closed, or anything in between: Open Standards do not necessarily lead to free software. On the other hand, standards patent licenses reduce the ability of developers to produce free (open licensed) software, because they need to make enough money to pay the license fees. Thus standards patent licenses limit the number and type of implementations of a standard, effectively limiting the market to those with enough capital to enter it.</p>

<p>The fewer implementations of a standard, the less pressure there is on those implementations to be interoperable, because there is a known and limited number of other implementations with which they need to interoperate. The greater the market of implementations, the greater the drive for interoperability, because it becomes increasingly likely that they will have to interoperate with each other.</p>

<p>For example, I have often had to move code that I have written based on an Open Standard from one implementation and use it in another. If it doesn&#8217;t work, I can work out which implementation is correct (by looking at the standard) and report the error to the implementation developers, which helps them improve the interoperability of their product. The ability to move between implementations is vital to ensure interoperability between them, and the broader the market the more likely that is to happen.</p>

<h3>8. How could adopting (Fair) Reasonable and Non Discriminatory ((F)RAND) standards deliver a level playing field for open source and proprietary software solution providers?</h3>

<p>Adopting (F)RAND standards does not deliver a level playing field across providers, because it limits the ability of open source providers to enter the market, as they have to recoup licensing cost. Only royalty-free licenses provide a completely level playing field across providers.</p>

<h3>9. Does selecting open standards which are compatible with a free or open source software licence exclude certain suppliers or products?</h3>

<p>Selecting Open Standards necessarily excludes those products that only use closed standards, and suppliers that only offer those products. In the short term, suppliers who have built their products and business models around closed standards and lock-in will be excluded.</p>

<p>However, there is no requirement for companies that adopt Open Standards to license their products using a free or open source software license. While there are often free or open source products built around Open Standards, these are only competitive when they are of the same quality as those with closed licenses.</p>

<p>An example from legislation.gov.uk is that our early development used eXist, which is an XML database available under an open source software license which implemented the Open Standards of XML and XQuery. It became clear that (at that time) eXist did not support the level of use that we needed, in its performance and its scalability. We therefore instead adopted MarkLogic, which uses the same Open Standards but is not available under an open source software license. This demonstrates how companies which adopt Open Standards can still offer competitive value even within a market with open source implementations.</p>

<p>Another interesting point to draw from this is that the presence of an open source implementation of an Open Standard enabled us to prototype and experiment using that implementation, knowing that should we need better performance and so on we would be able to move all our code to another interoperable implementation. If MarkLogic had implemented a custom method of querying XML, committing to paying for it early in the process would have been too high a risk. So the use of Open Standards effectively helped MarkLogic win that business.</p>

<h3>10. Does a promise of non-assertion of a patent when used in open source software alleviate concerns relating to patents and royalty charging?</h3>

<p>I would personally be very wary of implementing a standard which had a promise of non-assertion of a patent, and standards made available under those terms would seem more risky and costly to implement because any such promise would have to be checked by a lawyer and would likely constrain my future actions and the use of the software in other environments.</p>

<h3>11. Should a different rationale be applied when purchasing off-the-shelf software solutions than is applied when purchasing bespoke solutions?</h3>

<p>The same policy should be applied both to off-the-shelf and bespoke solutions, particularly as the difference between these is not at all clear cut: off-the-shelf solutions are often customised, and bespoke solutions built from off-the-shelf products. Whatever the type of solution, the crucial point is that it needs to interoperate with other products, using Open Standards.</p>

<h3>12. In terms of standards for software interoperability, data and document formats, is there a need for the Government to engage with or provide funding for specific committees/bodies?</h3>

<p>The Government should engage with those standards bodies that work on standards that the Government uses, so that it can shape the development of those standards and highlight new areas where standards work is needed to satisfy the UK Government&#8217;s requirements. In my own area of web standards, the main one is the W3C. In fact, given the UK Government&#8217;s transparency, open data and &#8220;digital by default&#8221; policies, all of which require the use of W3C standards, the UK Government is under-represented within W3C, with only two agencies (Ordnance Survey (OS) and The National Archives (TNA)) being members. Other public-sector organisations which make heavy use of W3C standards, such as the BBC, have made a business decision to become more engaged in W3C activities in order to shape and influence them.</p>

<p>Membership of W3C and other standards bodies is particularly important where standards development affects the ability of particular organisations to achieve their goals. For example, TNA are taking a particular interest in the provenance work being done at W3C; the Government Digital Service should be participating in the development of standards for web design and applications; and the Office for National Statistics should be taking an interest in the development of the Data Cube vocabulary for statistical information.</p>

<h3>13. Are there any other policy options which would meet the described outcomes more effectively?</h3>

<p>I believe that the Open Standards policy described in the consultation is the best way to achieve lower cost, higher interoperability, reduced lock-in, increased innovation and competition in the right areas and to level the playing field both for open source software and for small and medium enterprises who wish to compete for government contracts.</p>

<h2>Open Standards Mandation</h2>

<h3>1. What criteria should the Government consider when deciding whether it is appropriate to mandate particular standards?</h3>

<p>The only time the Government should mandate a particular Open Standard for IT is when there is a clear and apparent cost arising from different parts of Government adopting two competing Open Standards that offer equivalent functionality but interoperate poorly. In most cases, central government should avoid mandating a particular Open Standard and instead let individual public-sector organisations select an appropriate Open Standard based on their own requirements.</p>

<p>Government should mandate a particular Open Standard where:</p>

<ul>
<li>there are competing Open Standards that cover the given functionality and</li>
<li>interoperability or conversion between these standards is lossy or difficult and</li>
<li>there are multiple organisations within government with which interoperability involving the standard is required</li>
</ul>

<h3>2. What effect would mandating particular open standards have on improving value for money in the provision of government services?</h3>

<p>A central government authority mandating particular Open Standards could reduce value for money in the provision of government services, because that central authority is unlikely to fully understand the requirements of the particular service in detail. Public sector bodies who are actually procuring solutions should select particular Open Standards as part of the procurement process.</p>

<p>Mandating particular Open Standards could improve value for money where there are competing Open Standards and different public sector bodies are likely to adopt different ones. In those cases, interoperability problems between the standards may make it more costly to provide services, and mandating the adoption of a particular Open Standard could help.</p>

<h3>3. Are there any legal or procurement barriers to mandating specific open standards in the UK Government&#8217;s IT?</h3>

<p>I do not know.</p>

<h3>4. Could mandation of competing open standards for the same function deliver interoperable software and information at reduced cost?</h3>

<p>It is unclear what this question means.</p>

<p>Mandating both of two competing Open Standards may increase the availability of interoperable software, but it will raise the cost of implementation and therefore of software, because implementing two standards takes more development effort than implementing one.</p>

<p>Mandating a single Open Standard from two competing standards may increase interoperability but comes with possible risks and costs. Two competing standards, while similar, may have different target audiences and capabilities. If one is mandated, those public sector bodies whose requirements fit the other more closely will incur extra cost in bending the first standard to fit their requirements. In addition, the standards may evolve over time: they may become more interoperable, so that mandation is no longer necessary, or diverge in functionality, so that the original judgement about which to mandate no longer applies.</p>

<h3>5. Could mandation of open standards promote anti-competitive behaviour in public procurement?</h3>

<p>Anti-competitive behaviour arises when the number of potential competitors is artificially restricted due to the terms of the procurement. Whether mandating a particular Open Standard promotes anti-competitive behaviour depends on two factors.</p>

<p>First, it depends on the nature of the Open Standard and its implementations. Some Open Standards are small and easy to implement while others are large and complex. Some have open source implementations across a number of platforms; others have few implementations, available under restricted terms or on restricted platforms. In the short term, mandating an Open Standard that is costly to implement and for which there is no easily available implementation will favour those companies that have already implemented it. In the longer term, implementations of Open Standards tend by their nature to become more widely available, and companies are likely to invest in implementing them if they are industry standards. Crucially, because the standards are Open, companies are free to read them and can implement them without royalty payments, so in the long term mandation is not anti-competitive.</p>

<p>Secondly, the reason for the mandation of a particular Open Standard should be clear within the procurement exercise and a particular Open Standard should not be mandated unless there are clear reasons for that mandation. As an example from my own experience, there are two Open Standards for embedding metadata within HTML pages: RDFa and microdata. Either could be used to provide largely the same functionality so there would generally not be a need to mandate one or the other during procurement, but there should be a requirement to show how the standard selected by the supplier would be used to achieve the aims of the system.</p>

<h3>6. How would mandation of specific open standards for government IT software interoperability, data and document formats affect your organisation/business?</h3>

<p>Mandating specific Open Standards could limit the approaches that I am able to take in my work, leaving me less able to select an appropriate technology based on the requirements of the system. Any technology selection requires balancing a large set of requirements, both in terms of functionality and in terms of performance, reliability, scalability and so on. I, as the developer, and my direct customers have the clearest understanding of these requirements and their relative importance within the system. These are complex choices, and they should not be made by central authorities who do not understand the details.</p>

<p>It is much more important to me to know both that Open Standards should be used wherever possible and which Open Standards are used within the organisations that interact with the systems I am responsible for, so that the systems I build interoperate with them more smoothly.</p>

<h3>7. How should the Government best deal with the issue of change relating to legacy systems or incompatible updates to existing open standards?</h3>

<p>Good Open Standards provide clear statements about both forwards and backwards compatibility and, because they are developed through open participation, the extent of changes and the reasons for them are usually clear. This makes it easier to work out what needs to be upgraded than with closed standards. For example, I was involved in the development of the second version of XSLT, and those of us in the Working Group spent substantial time ensuring that the impact on existing users of XSLT was not too great and was thoroughly documented.</p>

<p>When adopting a particular Open Standard within a system, the Government or their suppliers should assess the impacts of future changes in the standard on the system and should be involved to an appropriate level in the development of the standard to ensure that it continues to meet requirements.</p>

<h3>8. What should trigger the review of an open standard that has already been mandated?</h3>

<p>The Government should continuously work with its suppliers during the contract term and with potential suppliers during procurement to assess the best Open Standard to use. Even when a given standard itself doesn&#8217;t change, the wider IT environment may alter: there may be more or fewer implementations over time, alternative standards, or a change in interoperating standards used by other organisations. Thus there are no particular trigger points at which an Open Standard should be reviewed, though re-procurement will naturally cause a re-exploration of a system&#8217;s environment and the best approach to its implementation.</p>

<h3>9. How should the Government strike a balance between nurturing innovation and conforming to standards?</h3>

<p>Good Open Standards have built-in extensibility points which provide the scope for implementer innovation while providing general interoperability. In the best cases these extensions gradually become standardised themselves: this is what has happened with XSLT, for example. Thus, conforming to standards does not prevent innovation; instead it focuses innovation on user requirements, and in particular on improving the quality of implementation. The Government&#8217;s Open Standards policy should ensure that when standards are selected they provide broad interoperability while giving scope for extension, and this is particularly important if the Government mandates particular Open Standards for general use.</p>

<h3>10. How should the Government confirm that a solution claiming conformity to a standard is interoperable in practice?</h3>

<p>Within the W3C, new standards must have an associated test suite, usually built both by the Working Group that develops the standard and from the test suites created by individual implementers of the standard. Running an implementation against such a test suite makes it possible to test empirically whether it conforms to the standard. The results of running a given implementation against a test suite are often published (see, for example, <a href="http://rdfa.info/earl-reports/">the RDFa test suite results</a>), but the Government could also ask the solution provider for evidence of interoperability in the form of test suite results.</p>
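
<p>As a rough illustration of what such a check involves, here is a minimal Python sketch of a test harness. Everything in it is hypothetical: the test cases, the toy &#8220;standard&#8221; (trim whitespace and lower-case the input) and the two implementations. Real W3C test suites have their own manifests and report formats, and the outcome labels here only loosely echo the <code>passed</code>/<code>failed</code> outcomes used in EARL reports.</p>

<pre><code># Hypothetical test suite: each case has an id, an input and the output
# that the (imaginary) standard says a conforming implementation must produce.
TEST_SUITE = [
    ("t01", "  Open Standards ", "open standards"),
    ("t02", "RDFa", "rdfa"),
    ("t03", "", ""),
]


def run_suite(implementation, suite):
    """Run every case and record "passed" or "failed" for each test id."""
    report = {}
    for test_id, given, expected in suite:
        try:
            outcome = "passed" if implementation(given) == expected else "failed"
        except Exception:
            # An implementation that crashes on a test does not conform to it.
            outcome = "failed"
        report[test_id] = outcome
    return report


def summarise(report):
    """Count passes, as a supplier might when offering evidence of conformance."""
    passed = sum(1 for outcome in report.values() if outcome == "passed")
    return f"{passed}/{len(report)} tests passed"


def good(text):
    # Toy conforming implementation: trim whitespace and lower-case.
    return text.strip().lower()


def sloppy(text):
    # Toy non-conforming implementation: forgets to trim whitespace.
    return text.lower()
</code></pre>

<p>With these definitions, <code>summarise(run_suite(sloppy, TEST_SUITE))</code> returns <code>2/3 tests passed</code>: the results a supplier hands over are evidence that can be rerun and checked, rather than a bare claim of conformance.</p>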

<h3>11. Are there any other policy options which would meet the objective more effectively?</h3>

<p>A general Government policy to use Open Standards will in most cases naturally lead to the best Open Standard being adopted across the public sector without Government needing to mandate the use of specific Open Standards. Systems should be procured on the basis of their ability to interoperate with those other systems with which they need to work; in such an environment the easiest approach for suppliers will be to adopt the standards used by other systems.</p>

<p>We have seen this happen within legislation.gov.uk, where we store and publish legislation using a particular XML vocabulary. Various parliaments and government departments draft legislation that is published on the site, and they have traditionally chosen their formats based on the particular requirements of the drafters themselves (for example, the UK Parliament uses FrameMaker while government departments use Word). We have to cater for these different formats and convert them into the standard that we use within legislation.gov.uk. However, as these authoring systems are re-procured, the ability to produce the XML vocabulary that we use within legislation.gov.uk has become part of the requirements on future authoring systems, because it gives greater fidelity between authoring and eventual publication. Thus we are naturally moving towards a more interoperable environment without any central mandation of the standard and without overriding the local requirements of particular authors.</p>

<h2>International Alignment</h2>

<h3>1. Is the proposed UK policy compatible with European policies, directives and regulations (existing or planned) such as the European Interoperability Framework version 2.0 and the reform proposal for European Standardisation?</h3>

<p>I do not know.</p>

<h3>2. Will the open standards policy be beneficial or detrimental for innovation and competition in the UK and Europe?</h3>

<p>The UK&#8217;s Open Standards policy will be beneficial to innovation and competition in the UK because it levels the playing field for a wider set of providers and focuses innovation and competition on the requirements of the users of software rather than the suppliers. These benefits apply as well to Europe as to the UK, and the successful adoption of an Open Standards policy within the UK will naturally aid the adoption of similar policies in other countries within Europe and internationally.</p>

<h3>3. Are there any other policy options which would meet the objectives described in this consultation paper more effectively?</h3>

<p>The one part of the international alignment policy that needs to be rethought is the preference for international standards over local standards. While in general the wider the adoption of a given standard, the better, international standards do not always provide greater benefits than local ones.</p>

<p>Looking at providing data about legislation, for example, different jurisdictions have very different ways of identifying, creating, revising and formatting legislation. Any international standards for legislation are highly unlikely to take into account the complexities and special cases that are specific to UK legislation (such as the use of regnal years for identifying older items). A local standard can be better tailored to the requirements of the locality, which can be crucial in reducing implementation costs and preserving data fidelity.</p>

<p>I would therefore recommend a policy approach that emphasised interoperability with international standards without requiring their wholesale adoption, particularly where there are specific local requirements that are not met by the international standard.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Content and Descriptions of Web Resources</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/168" />
    <id>http://www.jenitennison.com/blog/node/168</id>
    <published>2012-03-31T22:27:08+01:00</published>
    <updated>2012-03-31T22:27:08+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="linked data" />
    <category term="rest" />
    <category term="tag" />
    <summary type="html"><![CDATA[<p>Those readers who have followed the <a href="http://lists.w3.org/Archives/Public/www-tag/">TAG</a> or <a href="http://lists.w3.org/Archives/Public/public-lod/">public-lod</a> mailing lists over the last couple of weeks cannot have failed to notice a large number of posts on a theme that recurs roughly every nine months within these communities: <a href="http://www.w3.org/2001/tag/group/track/issues/14">httpRange-14</a>.</p>

<p>The reason for this particular recurrence was a <a href="http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html">Call for Change Proposals</a> on the resolution. The TAG meets on Monday, and discussion of this issue is one of the first items on <a href="http://www.w3.org/2001/tag/2012/04/02-agenda">our agenda</a>. These are my thoughts going into that discussion.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>Those readers who have followed the <a href="http://lists.w3.org/Archives/Public/www-tag/">TAG</a> or <a href="http://lists.w3.org/Archives/Public/public-lod/">public-lod</a> mailing lists over the last couple of weeks cannot have failed to notice a large number of posts on a theme that recurs roughly every nine months within these communities: <a href="http://www.w3.org/2001/tag/group/track/issues/14">httpRange-14</a>.</p>

<p>The reason for this particular recurrence was a <a href="http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html">Call for Change Proposals</a> on the resolution. The TAG meets on Monday, and discussion of this issue is one of the first items on <a href="http://www.w3.org/2001/tag/2012/04/02-agenda">our agenda</a>. These are my thoughts going into that discussion.</p>

<!--break-->

<h2>The Questions</h2>

<p>The recent discussion on the lists has, I think, helped to refine the questions that lie at the core of the httpRange-14 issue. They are:</p>

<ol>
<li>When you get a successful response from a URI, does that response, by definition, include the <em>content</em> of the resource identified by the URI?</li>
<li>How can you discover a <em>description</em> of the resource identified by a URI?</li>
</ol>

<p>Knowing whether the response to a URI provides the content of the resource identified by that URI is important because when you have data about the thing identified by a URI, such as its author or the license that it is provided under, you need to know what information is actually being referred to so that you can tell what information you can reuse and whom you have to attribute. </p>

<p>For example, the <a href="http://www.gov.uk/">GOV.UK</a> website has a license at the bottom of each page:</p>

<pre><code>&lt;p&gt;
  Much of the information on this website is available for reuse under the 
  &lt;a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/" 
     rel="licence"&gt;Open Government Licence&lt;/a&gt;
&lt;/p&gt;
</code></pre>

<p>Seeing this, an application that knows the <a href="http://www.nationalarchives.gov.uk/id/open-government-licence/">Open Government Licence</a> permits free reuse can tell that it can lift content out of the page and use it on its own site. Such an application could automatically scrape and republish the first paragraph of the <a href="https://www.gov.uk/government/news-and-speeches">news stories</a> on this site, and on any other site published under this licence.</p>
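
<p>The first step such an application would take can be sketched in Python with the standard library&#8217;s <code>html.parser</code> module: collect the links whose <code>rel</code> names a licence and compare them with licences known to permit reuse. HTML defines the <code>license</code> link type, while the markup above uses the British spelling <code>licence</code>, so the sketch accepts both. The <code>REUSABLE_LICENCES</code> set and the function names are hypothetical, and the sketch deliberately ignores the harder question of which parts of the page the licence actually covers.</p>

<pre><code>from html.parser import HTMLParser

# Hypothetical: licences this application already knows permit free reuse.
REUSABLE_LICENCES = {
    "http://www.nationalarchives.gov.uk/doc/open-government-licence/",
}

# "license" is the HTML link type; "licence" matches the markup quoted above.
LICENCE_RELS = {"license", "licence"}


class LicenceFinder(HTMLParser):
    """Collect the href of every a or link element whose rel names a licence."""

    def __init__(self):
        super().__init__()
        self.licences = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated list; attribute values may be None.
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if href and LICENCE_RELS.intersection(rels):
            self.licences.append(href)


def page_is_reusable(html):
    """True if the page links to at least one licence known to permit reuse."""
    finder = LicenceFinder()
    finder.feed(html)
    finder.close()
    return any(href in REUSABLE_LICENCES for href in finder.licences)
</code></pre>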

<h2>The Conflict</h2>

<p>There are vocal disagreements, particularly about the first of the two questions I outlined above. What&#8217;s become clear to me is that the arguments stem from a difference in world view about what kind of resources are available on the web.</p>

<h3>Web of Data</h3>

<p>Under the <em>web of data</em> view, the web consists of data, and all the resources on the web are <em>information resources</em>, defined as those <a href="http://www.w3.org/TR/webarch/#def-information-resource">resources whose essential characteristics can be conveyed in a message</a>. Data, in other words.</p>

<p>URIs can still be used to name other resources, which are not on the web either because they are not information resources (such as a Person) or because they are not available yet (such as unscanned books). Under this world view, however, giving a successful HTTP response for such a resource is simply wrong, because these resources aren&#8217;t on the web.</p>

<p>The problem that this world view therefore needs to address is how to create URIs to identify resources that aren&#8217;t on the web. There are two answers:</p>

<h4>Hash URIs</h4>

<p>Hash URIs have the benefit that there is a direct relationship between the hash URI which identifies the resource and a resource on the web that describes it. An HTTP client naturally strips the fragment identifier from the URI in order to make the request to a server, which then delivers the description of the resource.</p>
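
<p>Python&#8217;s standard <code>urllib.parse.urldefrag</code> function shows the split a client makes. The URI below is a made-up example: the full hash URI names a person, while only the part before the hash is sent to the server, which responds with the document describing her.</p>

<pre><code>from urllib.parse import urldefrag

# Hypothetical hash URI naming a person, not a document.
thing = "http://example.org/people/alice#me"

# The fragment never goes over the wire: the client requests only the
# part before the hash, which identifies the describing document.
document, fragment = urldefrag(thing)
print(document)   # http://example.org/people/alice
print(fragment)   # me
</code></pre>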

<h4>303 Redirections</h4>

<p>If you identify a resource that isn&#8217;t on the web using an HTTP URI that is not a hash URI, you cannot get a successful response back because the resource you have asked for is, by definition in this world view, not on the web. The workaround is for the publisher to use the <a href="http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-19#section-7.3.4">303 See Other</a> status code to point from the resource that you requested to its description on the web. (This is the essence of <a href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html">the httpRange-14 resolution</a>.)</p>
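
<p>A minimal sketch of the publisher&#8217;s side of this pattern, written as a Python WSGI application so that it can be exercised without a network. The URI layout (<code>/id/</code> for things that are not on the web, <code>/doc/</code> for the documents describing them) is a hypothetical convention, not something the httpRange-14 resolution prescribes.</p>

<pre><code>from wsgiref.util import setup_testing_defaults

# Hypothetical layout: /id/NAME identifies a thing that is not on the web,
# /doc/NAME is the information resource that describes it.


def app(environ, start_response):
    path = environ["PATH_INFO"]
    if path.startswith("/id/"):
        # The thing has no representation: point at its description instead.
        location = "/doc/" + path[len("/id/"):]
        start_response("303 See Other", [("Location", location)])
        return [b""]
    if path.startswith("/doc/"):
        body = ("A description of " + path[len("/doc/"):]).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def get(path):
    """Call the app directly (no network); return status, headers and body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
</code></pre>

<p>Calling <code>get("/id/white-house")</code> returns a <code>303 See Other</code> status with a <code>Location</code> of <code>/doc/white-house</code>; the client then needs a second request to fetch the description itself, which is where the extra round trip comes from.</p>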

<h3>Web of Things</h3>

<p>Under the <em>web of things</em> view, the web consists of things, and resources on the web could be anything: documents, people, films, teapots and so on. When a client makes an HTTP request for a resource, the response must reflect the state of the resource, but that state could be its content (if it&#8217;s an information resource) or it could be a description of the resource.</p>

<p>Under this world view, giving a successful HTTP response for a resource that isn&#8217;t an information resource is absolutely fine: the description of the resource is still a reflection of its state.</p>

<p>The problem that needs to be addressed when you have this world view becomes apparent when you think back to the licensing example above. Given that an application knows that the resource identified by a given URI can be reused, how does it know whether the representation of that resource is reusable? It could be that the identified resource (for example an out-of-copyright book) has an open license, but that the representation of the resource holds only a description of that resource (some metadata about the book), and that description has a much more restrictive license. Or vice versa.</p>

<p>So to address this use case, you need some other mechanism to enable an application to tell that the representation is the content of the resource, rather than merely a description of it.</p>

<h2>The Current State</h2>

<p>To generalise, the linked data community operates within the <em>web of data</em> world view and the larger web community operates within the <em>web of things</em> world view.</p>

<p>What is happening increasingly, however, is that these two world views are rubbing up against each other, and while both are internally coherent, switching between the world views causes not only a cognitive disconnect for developers but practical problems when transforming or moving data published under one world view into the other world view.</p>

<p>In addition, publication of data on the web through APIs is growing all the time, particularly through <a href="https://en.wikipedia.org/wiki/Representational_State_Transfer">REST APIs</a> supporting the <a href="https://en.wikipedia.org/wiki/HATEOAS">Hypertext as the Engine of Application State (HATEOAS) principle</a>. As we share more data on the web and use URIs in our APIs, the questions of what those URIs mean and how we associate licenses and provenance information with data will only become more important.</p>

<p>We have an obligation, therefore, to reflect on the experience from the linked data community over the last few years and how that experience might spread to the larger web community.</p>

<p>Discussions within the linked data community over the httpRange-14 resolution centre on two problems that people have encountered:</p>

<ol>
<li>the <em>web of data</em> vs <em>web of things</em> disconnect hurting adoption</li>
<li>practical aspects of responding to requests with 303 redirections, such as:
<ul><li>round-trip delays, particularly as 303 responses can&#8217;t be cached under HTTP/1.1 (RFC 2616)</li>
<li>the inability of people without server admin access to use 303s</li></ul></li>
</ol>

<h2>The Social Context</h2>

<p>A bit of a side-point here. I think that the questions I posed at the start of this post are general questions about web architecture, so it puzzles me that the only people who seem to really care about them, and who debate them endlessly, are the linked data community. This is partly because the linked data community use URIs extensively to identify the things about which they provide data, but I think it&#8217;s also about the fundamental attitude of those within the community, which was characterised in a <a href="http://lists.w3.org/Archives/Public/public-lod/2012Mar/0185.html">recent post</a> by <a href="http://www.seme4.com/who-we-are/profile/hugh-glaser/">Hugh Glaser</a>:</p>

<blockquote>
  <p>Personally, I never did agree with the solution [to httpRange-14], but have always aimed to carry out the implications of it in the systems I construct.</p>
  
  <p>This is for two reasons:<br>
  a) as a member of a small community, it is destructive to do otherwise;<br>
  b) as a professional engineer, my ethical obligations require me to do so.</p>
  
  <p>It is this second, the ethical obligations that are the most significant.<br>
  I should not digress from the standards, or even Best Practice, in my work.</p>
</blockquote>

<p>The linked data community is jam-packed with people who feel an ethical obligation to adhere to standards and best practices. We try to do what we are told is the Right Thing by individuals and standards organisations even when we don&#8217;t agree that it is the Right Thing and even if it turns out to be impractical.</p>

<p>In the larger web community, people who don&#8217;t agree with a standard or best practice, or who find it too impractical to implement, simply ignore it. There is no need to endlessly debate something that you can just ignore. And the httpRange-14 resolution is ignorable by the larger web community because so far it has had very little impact on any implementations at all, let alone widely-deployed implementations that work over the non-linked-data web.</p>

<h2>The Choices</h2>

<p>Going into the TAG meeting about this on Monday, the main decision that I see is whether to continue to assume a <em>web of data</em> world view. In the <em>web of data</em> world view, it is impossible for a URI to return a description of a resource, whereas in the <em>web of things</em> world view it is fine. Personally, I would prefer to design around the <em>web of things</em> world view as I think this would ease some of the disconnects between linked data and the wider web, but there are others on the TAG who adhere strongly to the <em>web of data</em> view, so I think that change is unlikely.</p>

<p>If we stick with the <em>web of data</em> view, the main issues are how to alleviate the current practical difficulties that people are encountering with its implementation and explanation. I think there are three measures that would help:</p>

<ol>
<li><p>Determine a conventional syntax for fragment identifiers that are used to identify things that are not on the web, as opposed to fragments of content. I&#8217;m thinking something like hash-bang URIs: using a character after the hash character that just gives a quick indication that the fragment identifier is being used in a special way, to refer to something that isn&#8217;t on the web rather than a fragment of a document, for example <code>#*</code>.</p></li>
<li><p>Change to recommending a single best practice of using hash URIs for resources that aren&#8217;t on the web, and in particular recommending having a one-to-one correspondence between resources on the web and those not on the web, using one particular conventional hash URI. For example, <code>http://www.whitehouse.gov/#*</code> would identify the resource that <code>http://www.whitehouse.gov/</code> is about: the White House. This ensures that new publishers of data won&#8217;t run into the problems with publishing using 303 redirections, because they won&#8217;t use that method of publication. It also removes choice, which helps adopters who can otherwise get overwhelmed with options and the trade-offs between them.</p></li>
<li><p>Allow publishers who are currently using 303 redirections to publish descriptions of resources identified using non-hash URIs to switch to providing a representation using a 200 status code, along with a method of indicating that the representation is the <em>description</em> of the resource rather than its <em>content</em>. This indicator could be:</p>

<ul><li>a new HTTP header or status code (though I&#8217;d prefer not)</li>
<li>a Link: header with a particular relationship (eg &#8216;describedby&#8217;)</li>
<li>a statement embedded in the response itself (eg a <code>&lt;link rel="describedby"&gt;</code> element in HTML)</li></ul></li>
</ol>
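
<p>The hash URI convention in the first two measures is simple enough to sketch in a few lines of Python. The <code>#*</code> marker is only the example suggested above, not an agreed syntax, and the function names are hypothetical.</p>

<pre><code>from urllib.parse import urldefrag

# Hypothetical marker: a fragment starting with "*" refers to the thing a
# page is about, rather than to a part of the page's content.
THING_MARKER = "*"


def thing_uri(page_uri):
    """Mint the one conventional URI for the thing a page is about."""
    page, _ = urldefrag(page_uri)
    return page + "#" + THING_MARKER


def is_thing_uri(uri):
    """True if the URI follows the convention rather than naming a fragment."""
    _, fragment = urldefrag(uri)
    return fragment.startswith(THING_MARKER)


def describing_page(uri):
    """The page to fetch for a description: the URI with its fragment removed."""
    return urldefrag(uri)[0]
</code></pre>

<p>For instance, <code>thing_uri("http://www.whitehouse.gov/")</code> gives <code>http://www.whitehouse.gov/#*</code>, while <code>is_thing_uri</code> returns <code>False</code> for an ordinary fragment such as <code>#news</code>. Because each page has exactly one such URI, publishers need no redirections, and a client can tell from the URI alone that it refers to something that isn&#8217;t on the web.</p>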

<p>If we did move to a <em>web of things</em> view, the main question would be how to provide an indicator that the representation of a particular resource is the content of that resource as opposed to being a description. It would help ease transition if this was a natural consequence of the current pattern of publication on httpRange-14-compliant sites, so for example, you&#8217;d want to consider the representation of a resource the content of the resource if you got to it:</p>

<ul>
<li>when retrieving a hash URI, if it was the part of the URI before the hash</li>
<li>when following a 303 See Other redirection, if it was the target of the redirection</li>
<li>when following a &#8216;describedby&#8217; link, if it was the target of the link</li>
</ul>

<p>as well as if there was an explicit indicator within the representation that said the resource was an information resource.</p>
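
<p>Those rules can be written down as a small decision function. This is a hypothetical Python sketch of the <em>web of things</em> option, not an agreed algorithm: <code>Route</code> records how the client reached the resource whose representation it holds, and <code>is_content</code> says whether that representation may be treated as the resource&#8217;s content.</p>

<pre><code>from enum import Enum


class Route(Enum):
    """How a client reached the resource whose representation it now holds."""
    DIRECT = "requested the URI directly"
    HASH_STRIPPED = "the part of a hash URI before the hash"
    SEE_OTHER_TARGET = "the target of a 303 See Other redirection"
    DESCRIBEDBY_TARGET = "the target of a describedby link"


# Routes that, under the suggestion above, imply the representation is content.
CONTENT_ROUTES = {Route.HASH_STRIPPED, Route.SEE_OTHER_TARGET, Route.DESCRIBEDBY_TARGET}


def is_content(route, declares_information_resource=False):
    """True if the representation can be treated as the resource's content.

    False means "not known to be content": under the web of things view
    the representation may instead be a description of the resource.
    """
    return declares_information_resource or route in CONTENT_ROUTES
</code></pre>

<p>In the licensing example, a reuser would apply a licence attached to a resource to the representation it holds only when <code>is_content</code> returns <code>True</code>; a plain request with no indicator leaves the question open.</p>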

<p>Whichever decisions are made, I would personally like to see them expressed as concrete requirements on client behaviour for each of these publication practices, for example enabling a reuser to associate a license with a particular piece of content, or a crawler to create RDF statements about URIs it encounters on the web. That would bring the decisions down to earth and make them less ignorable.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Precious Snowflakes</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/167" />
    <id>http://www.jenitennison.com/blog/node/167</id>
    <published>2012-03-10T11:56:36+00:00</published>
    <updated>2012-03-10T11:56:36+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="betagovuk" />
    <category term="gds" />
    <category term="web" />
    <summary type="html"><![CDATA[<p><em>Disclaimer: As usual, this post contains my personal opinion and does not reflect that of any organisation with which you might associate me.</em></p>

<p>The other day, I had a lovely conversation with some folks from the BBC about some of their future plans. In the course of the conversation, <a href="http://smethur.st/">Michael Smethurst</a> spoke about his frustration when dealing with people involved with particular <a href="http://www.bbc.co.uk/programmes">programmes at the BBC</a>, where every single one of them thinks their programme is a &#8220;precious snowflake&#8221;, completely unique, that simply can&#8217;t be treated in the same way as all the other programmes described on the site.</p>

<p>Michael&#8217;s point, of course, is that TV programmes have a hell of a lot of similarities with each other. They all have episodes and cast members and may have trailers or be available on iPlayer. When the BBC models them in the same way, they gain enormous efficiencies in their ability to store and access information about programmes: they can reuse code, share content between programmes, and perform analyses over the aggregated data set. It&#8217;s great for users too: they get the same fantastic user experience no matter which programme they are viewing information about, and can apply the experience they gain when navigating pages about one programme when they need to find information about another.</p>

<p>The ability to classify and categorise, to bring order to what seems like chaos, to create a model of the world, is one of the things that marks humans out from other animals. We can look at a hundred people, with different colour hair and skin; different height and build; smiling, talking, crying, and still call them all Person because the essential characteristics that govern how we interact with them are the same.</p>

<p>But if there&#8217;s one thing that the last five long, hard years working with legislation has taught me, it&#8217;s that in any vaguely interesting domain, this search for order will always fall down in the face of reality.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p><em>Disclaimer: As usual, this post contains my personal opinion and does not reflect that of any organisation with which you might associate me.</em></p>

<p>The other day, I had a lovely conversation with some folks from the BBC about some of their future plans. In the course of the conversation, <a href="http://smethur.st/">Michael Smethurst</a> spoke about his frustration when dealing with people involved with particular <a href="http://www.bbc.co.uk/programmes">programmes at the BBC</a>, where every single one of them thinks their programme is a &#8220;precious snowflake&#8221;, completely unique, that simply can&#8217;t be treated in the same way as all the other programmes described on the site.</p>

<p>Michael&#8217;s point, of course, is that TV programmes have a hell of a lot of similarities with each other. They all have episodes and cast members and may have trailers or be available on iPlayer. When the BBC models them in the same way, they gain enormous efficiencies in their ability to store and access information about programmes: they can reuse code, share content between programmes, and perform analyses over the aggregated data set. It&#8217;s great for users too: they get the same fantastic user experience no matter which programme they are viewing information about, and can apply what they learn from navigating pages about one programme when they need to find information about another.</p>

<p>The ability to classify and categorise, to bring order to what seems like chaos, to create a model of the world, is one of the things that marks humans out from animals. We can look at a hundred people, with different colour hair and skin; different height and build; smiling, talking, crying, and still call them all Person because the essential characteristics that govern how we interact with them are the same.</p>

<p>But if there&#8217;s one thing that the last five long, hard years working with legislation has taught me, it&#8217;s that in any vaguely interesting domain, this search for order will always fall down in the face of reality.</p>

<!--break-->

<p>Surely, I thought in my naive early days, every piece of legislation is uniquely identified through its type, calendar year, and number? Not so! There are six items for which this is not the case, because prior to 1963 legislation was numbered based on the year of reign of the monarch rather than the calendar year.</p>

<p>Surely the year that is used to number legislation is dependent on the date it is made and written into law? Not so! Sometimes departments forget to register legislation they make until the following year, so it is numbered the year after it&#8217;s made.</p>

<p>Surely an item of legislation can only make changes to legislation from the day it is written into law? Not so! There is, rarely, legislation that rewrites history: that says other legislation should always have had different content to that which was originally written.</p>

<p>It has come to the point where I never (hah!) make any statements about legislation of the form &#8220;X never happens&#8221; or &#8220;Y is always true&#8221; because there is always, <em>always</em>, an exception.</p>

<p>What this has taught me, as a developer, is the power and necessity of escape hatches. For example, templating languages that provide a method of escaping to code are so much more valuable than those that do not. Similarly, I favour strongly, in the technologies that I use, the ability to extend a common data structure, be it through <code>data-*</code> attributes in HTML, through generic elements such as <code>&lt;span&gt;</code> and <code>&lt;div&gt;</code> or through the essentially open-ended nature of RDF as a data model.</p>

<p>It has also given me a very different view of the world to Michael. Because when you accept that there are always exceptions, you do not see snowflakes as merely crystals of water, but as exceptional, beautiful and, yes, immensely precious.</p>

<p>And this is why I love the web. The web does not force every site to have the same structure or the same look and feel. It does not insist on consistency; it has space for every quirk. And it proves beyond all doubt that it is possible for all these precious snowflakes to exist in a single, global, interlinked information system in which people manage to find not only the information that they need, but also community and connection with each other.</p>

<h2>Inside Government</h2>

<p>So it is with these eyes that I look at the new <a href="https://www.gov.uk/government">Inside Government</a> pages on the <a href="https://www.gov.uk/">gov.uk site</a> and am frankly horrified. Because we&#8217;re not just talking about BBC programmes here, but about <a href="https://www.gov.uk/government/organisations">powerful institutions</a>, many of them decades if not centuries old, that lie at the very heart of government and how our nation is run. And each of them is relegated to a subfolder of a subfolder of a subfolder, their unique histories and approaches and goals expressed through three pictures on a carousel.</p>

<p>It feels like some kind of Orwellian nightmare: the <a href="http://digital.cabinetoffice.gov.uk/2012/01/31/this-is-why-we-are-here/">relentless focus on user needs</a> leading to a future of identikit pages, with no individuality, no character, no clue that behind these pages &#8212; which, remember, under the <a href="https://www.gov.uk/government/policies/launching-the-single-domain">Single Government Domain policy</a> becomes the single authoritative view, <em>the</em> site that represents the department on the web &#8212; is a living and breathing institution that manages hugely important parts of our lives. A future in which what each department says and the way that it says it is governed through the <a href="http://digital.cabinetoffice.gov.uk/">Government Digital Service (GDS)</a>, in <a href="http://www.cabinetoffice.gov.uk/">Cabinet Office</a>, the hand of the prime minister. And <a href="http://digital.cabinetoffice.gov.uk/2012/03/07/does-local-government-need-a-local-government-digital-service/">now we&#8217;re talking about local government too</a>?</p>

<p>Let us just look at one example. Last September, <a href="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">William Hague gave a speech</a> in which he described the hollowing out of the <a href="http://www.fco.gov.uk/">Foreign and Commonwealth Office (FCO)</a> by the previous government, a process that scrapped its language school, closed embassies and destroyed its library. He said:</p>

<blockquote cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">
Strong institutions are necessary in civil society, to encourage participation and keep in check an overmighty State; they are necessary to our judiciary and Parliament so that the law is upheld and the making of it respected; but they are also necessary within the State, a point tragically overlooked by those Prime Ministers who have created and abolished departments on a fancy or a whim, destroying as they did so the pride and continuity of thousands of public servants while rendering government incomprehensible to the average citizen. The whole country should know what the Foreign and Commonwealth Office is and what it does, and all those interested in foreign policy at home or abroad should see it as a centre of excellence with which they aspire to be associated.
</blockquote>

<p>For most UK citizens, the only point of access to the Foreign and Commonwealth Office is its website: they will not visit <a href="http://www.fco.gov.uk/en/about-us/our-history/our-buildings/buildings-in-uk/king-charles-street/">King Charles Street</a>, nor any of the <a href="http://www.fco.gov.uk/en/travel-and-living-abroad/find-an-embassy/">UK&#8217;s embassies</a>. The department&#8217;s web presence is the only way that it makes itself, and its unique role, comprehensible to the average citizen, the only method of letting the whole country know what the FCO is and what it does. And they have content that is completely unique to them: a <a href="http://www.fco.gov.uk/en/treaties/search">database of Treaties</a>, a hugely rich set of information on <a href="http://www.fco.gov.uk/en/travel-and-living-abroad/">travel and living abroad</a> and a <a href="http://www.fco.gov.uk/en/about-us/our-history/">wealth of historical information about the Foreign Office</a>. This simply doesn&#8217;t fit in a model of a department as a set of Ministers, Policies, Publications and so on. And if it doesn&#8217;t fit, will it simply be excluded, lost from its website like its language school, its embassies, its library?</p>

<p>I could have picked any government department here &#8212; each one has its unique characteristics and content &#8212; but Hague articulates the case around FCO so well. His message is not the expression of a simple conservative impulse to resist change and preserve the status quo, but about maintaining the integrity of an institution&#8217;s identity and independence <q cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">to encourage participation and keep in check an overmighty State</q>. If we believe in Open Government, Open Democracy and the power of the web to enhance civic engagement then we must, surely, enable each of these institutions to have their own independent voice on the web.</p>

<p>I am reminded of this <a href="http://xkcd.com/">XKCD</a> comic:</p>

<p style="text-align: center;">
<a href="http://xkcd.com/773/"><img src="http://imgs.xkcd.com/comics/university_website.png" /></a>
</p>

<p>The two sides of this Venn diagram illustrate two approaches to building a website for an organisation. On the left is the website as an expression of the identity of the institution, on the right the website as a means of satisfying the reason the user originally visited the site. My argument is not that the right side of this diagram is unimportant &#8212; in fact I believe it is absolutely essential &#8212; but that an institution&#8217;s website must cover the entire space: it must provide a mechanism for self-expression as well as catering for its users&#8217; requirements. To enhance civic engagement, we must not simply answer the query that led the user to the site, but also encourage and lead them on to see more about the institution that has provided the answer.</p>

<p>It is only the institution itself that knows the self it wants to express, and because the real world is complex and organisations are unique, that self will not fit into any model that we devise. News, Policies, Consultations &#8212; of course these are all important to all departments, but they are the tip of an iceberg. Look at the space that <a href="http://www.fco.gov.uk/en/about-us/our-history/">FCO devotes to its history on its website</a>: this shouts to the world the kind of reliable, solid and flexible organisation that they are and want to remain. Compare how <a href="http://www.decc.gov.uk/en/content/cms/statistics/statistics.aspx">DECC devotes space to statistics</a>, emphasising its adherence to transparency and evidence-based policy. Self-expression is so much more than changing logos or backgrounds, more than having different content on an About page, it is about making space for the things that are important to <em>you</em>.</p>

<p>&#8220;But but but!&#8221; I know the arguments. We must cut costs, stop the uncontrolled proliferation of government websites; we must improve the quality of the government&#8217;s presence on the web, present a unified view, make it easy for users to locate content without knowing where to look. The vision we see expressed through Inside Government is but the natural conclusion, the end of that slippery slope. But it is the great <a href="https://en.wikipedia.org/wiki/Slippery_slope_fallacy">slippery slope fallacy</a> that everything must be taken to its natural conclusion, that because 750 websites is too many, one is enough.</p>

<p>Possibly the biggest irony of the gov.uk beta is that while it is delivering a Single Government Domain &#8212; everything is to be found under <code>www.gov.uk</code> &#8212; it does not seem to address the core reason stated for providing it. In <a href="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">Martha Lane Fox&#8217;s letter to Francis Maude</a>, which kicked off this whole endeavour, she said:</p>

<blockquote cite="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">
Government publishes millions of pages on the Web, via hundreds of different websites. Most of these sites are still run as silos within departments. This fragmentation leads to significant duplication of functions and technology, and means the overall user experience is highly inconsistent.
</blockquote>

<p>Try <a href="https://www.gov.uk/search?q=Single+Government+Domain">searching for &#8216;Single Government Domain&#8217; on the main gov.uk site</a> or for <a href="https://www.gov.uk/government/search?q=driving+test+centre">driving test centres on Inside Government</a>. The searches do not work (though Inside Government does give you a link that enables you to search the other silo). The result pages are completely different in feel except for the top and bottom banners. The page on <a href="https://www.gov.uk/arrest-imprison-abroad">Arrest and Imprisonment Abroad</a> mentions but does not link to the <a href="https://www.gov.uk/government/organisations/foreign-and-commonwealth-office">Foreign and Commonwealth Office&#8217;s page</a>. Yes, yes, I know that it&#8217;s still beta, but these things lie at the heart of the stated rationale for a Single Government Domain: is this the extent of the consistency and integration that we are aiming for?</p>

<p>Yes, it is, because the Single Government Domain policy was never truly about either of these things. Read Martha Lane Fox&#8217;s letter again carefully (my emphasis):</p>

<blockquote cite="https://whitehall-frontend-production.s3.amazonaws.com/system/uploads/attachment/file/745/Martha_Lane_Fox_s_letter_to_Francis_Maude_14th_Oct_2010.pdf">
<strong>No10 feel</strong> it is preferable to go from 750 top level website domains (eg www.cabinetoffice.gov.uk) to a single top level website domain for all of central government.
</blockquote>

<p>The Single Government Domain policy, indeed GDS itself, is about control. It is &#8220;<a href="http://digital.cabinetoffice.gov.uk/about/">we will do it for you</a>&#8221;, not &#8220;we will help you do it&#8221;. It is about managing the output of institutions that might <q cite="http://www.fco.gov.uk/en/news/latest-news/?view=Speech&amp;id=652930982">keep in check an overmighty State</q>. It is anti-web and it is anti-democracy and I cannot remain quiet about it any longer.</p>

<p>To my friends at GDS: I respect and admire you all. You are incredibly talented and able to do amazing things. You have behind you a level of financial and political support the like of which most civil servants will never see. I know you have joined GDS not just to do work that you love but to do good for the country. This is my plea to you: find a way to avoid this vision. Nurture the exceptions. Give institutions their voice. Treat them as precious snowflakes.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Microdata and RDFa Living Together in Harmony</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/165" />
    <id>http://www.jenitennison.com/blog/node/165</id>
    <published>2011-08-20T17:39:11+01:00</published>
    <updated>2011-08-24T22:01:01+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>

<!--break-->

<p>Please treat this as a draft on which I&#8217;d welcome comments. I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHAT WG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a>, but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. Plus, the specs are changing all the time. I have considered here only the syntax of the two languages, not features such as DOM APIs or drag-and-drop support, where there are also clear differences.</p>

<p>Please add comments if there are things that I&#8217;ve missed or got wrong, or just to have your say.</p>

<p><a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> and <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> have both proved invaluable for testing &#8212; thank you to both for making these services available. I heartily recommend them.</p>

<h2>Mapping Rules</h2>

<p>The first problem is how to judge equivalence when microdata and RDFa have different data models. Microdata essentially uses a JSON data model: there are objects (items) with properties that have values that are strings, other objects, or arrays of strings or objects or both. RDFa naturally uses an RDF data model: there are resources with properties that have values that are literals (of some datatype or with a language) or other resources.</p>
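<p>To make the mismatch concrete, here is a rough Python sketch (mine, not taken from either spec) of flattening a microdata JSON item into RDF-style triples. It assumes that property names are already absolute URIs, that nested items without an <code>id</code> become blank nodes, and that every string value is a literal. The <code>item_to_triples</code> function and the example data are hypothetical.</p>

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def item_to_triples(item, subject=None, counter=None):
    """Flatten a microdata JSON item into (subject, predicate, object) tuples.

    Assumptions: property names are already absolute URIs, nested items without
    an "id" get generated blank-node labels (_:b0, _:b1, ...), and every string
    value is treated as a literal (no URI/literal distinction is attempted).
    """
    if counter is None:
        counter = count()
    if subject is None:
        subject = item.get("id") or f"_:b{next(counter)}"
    triples = [(subject, RDF_TYPE, t) for t in item.get("type", [])]
    for prop, values in item.get("properties", {}).items():
        # Microdata always holds an array of values; RDF just repeats the
        # triple once per value, and an RDF graph keeps no order among them.
        for value in values:
            if isinstance(value, dict):
                child = value.get("id") or f"_:b{next(counter)}"
                triples.append((subject, prop, child))
                triples.extend(item_to_triples(value, child, counter))
            else:
                triples.append((subject, prop, value))
    return triples


example = {
    "type": ["http://schema.org/Person"],
    "properties": {
        "http://schema.org/name": ["Alice"],
        "http://schema.org/knows": [{"properties": {"http://schema.org/name": ["Bob"]}}],
    },
}
for triple in item_to_triples(example):
    print(triple)
```

<p>The values array, which microdata keeps in document order, becomes a set of repeated triples. This is one of the places where information is lost on conversion.</p>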

<p>Underlying both is the same basic entity-attribute-value pattern, but there are various mismatches between the models that make some mappings more complicated than others, or in other cases mean that information is necessarily lost on conversion.</p>

<p>In performing the analysis, I&#8217;ve tried to map microdata into sensible RDF and then match that RDF output using RDFa, and to map RDFa into sensible microdata+JSON and then match that microdata+JSON using microdata. The microdata-to-RDF mapping rules that I&#8217;ve followed are basically those outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>. To create microdata JSON from RDFa, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item.</p>
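<p>The two RDFa-to-microdata rules above can be sketched in Python roughly as follows. The namespace is taken from the first type and stripped from property URIs. A resource with no properties of its own becomes a plain URI string rather than a nested item. The in-memory <code>graph</code> shape and the function names are hypothetical stand-ins for real RDFa parser output. The sketch also adds a simple guard against cycles.</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace_of(type_uri):
    """Everything up to and including the last '#' or '/' of the type URI."""
    cut = max(type_uri.rfind("#"), type_uri.rfind("/"))
    return type_uri[: cut + 1]


def to_microdata(graph, subject, ancestors=()):
    """Build a microdata JSON item for `subject`.

    `graph` maps each subject to a list of (predicate, object, is_resource)
    tuples: a hypothetical in-memory stand-in for RDFa parser output.
    """
    statements = graph.get(subject, [])
    types = [o for p, o, _ in statements if p == RDF_TYPE]
    ns = namespace_of(types[0]) if types else None
    item = {"properties": {}}
    if types:
        item["type"] = types
        if not subject.startswith("_:"):
            item["id"] = subject  # microdata only allows itemid alongside itemtype
    ancestors = ancestors + (subject,)
    for p, o, is_resource in statements:
        if p == RDF_TYPE:
            continue
        # Strip the first type's namespace to get a short name where possible.
        name = p[len(ns):] if ns and p.startswith(ns) and len(p) > len(ns) else p
        if is_resource and graph.get(o) and o not in ancestors:
            value = to_microdata(graph, o, ancestors)  # has properties: nest it
        else:
            value = o  # literal, or a resource with no properties: plain string
        item["properties"].setdefault(name, []).append(value)
    return item


FOAF = "http://xmlns.com/foaf/0.1/"
graph = {
    "http://example.org/alice": [
        (RDF_TYPE, FOAF + "Person", True),
        (FOAF + "name", "Alice", False),
        (FOAF + "homepage", "http://alice.example.org/", True),
        (FOAF + "knows", "http://example.org/bob", True),
        ("http://purl.org/dc/terms/description", "A test person", False),
    ],
    "http://example.org/bob": [(FOAF + "name", "Bob", False)],
}
print(to_microdata(graph, "http://example.org/alice"))
```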

<p>These rules need to be formalised, obviously, but the basics above work well enough for the examples from the specs.</p>

<h2>Mismatched Features</h2>

<p>The following features are problematic when mapping from microdata to RDFa or vice versa. I&#8217;ve described them roughly in an order from things where it might be relatively easy to address the problem by changing one or other specification, to places where the necessary changes would be difficult to make in the specs, which means that publishers and consumers need to be aware of the issue so that they can make an educated choice about how they proceed.</p>

<h3>Local Property Names</h3>

<p>Many of the microdata examples involve items with no type and local property names. I&#8217;ve assumed in the analysis below that this generates properties whose URI is based on the document in which they are found, but this is not a helpful solution for data sharing: if a whole site uses short property names across its pages, those properties really need to be recognised as being the same across the site for any kind of useful processing to occur.</p>

<p>What microdata actually creates here is a global namespace, shared by everyone, specifically for embedded data. There are three things that could be done at different levels here:</p>

<ol>
<li><p>In a mapping from microdata to RDF, any short property names on items that don&#8217;t have a type could be assigned to a global namespace (eg <code>http://w3.org/ns/global/</code>). Of course there will be clashes in semantics within this namespace, but that is true in microdata generally and not having to create a new namespace makes the initial experimentation easier for those starting with embedded data. The W3C (or whoever owns the namespace) could host a wiki at that location to act as an informal registry for the property names.</p></li>
<li><p>HTML+RDFa could change to use this global namespace as the default vocabulary URI (rather than not having one). This would make it a little easier for people to convert microdata to RDFa: if they don&#8217;t use types for their items, there would then be no need for a <code>vocab</code> attribute to be added to the HTML. It also makes it possible to use RDFa in a basic, lightweight way, which might help people get started with it.</p></li>
<li><p>Publishers can be advised to use <code>itemtype</code> within their microdata, reusing existing classes or creating their own, if they want to ensure that the embedded data within their pages isn&#8217;t misinterpreted by global consumers.</p></li>
</ol>
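<p>The first of these options might look like this in a microdata-to-RDF mapper. This is a sketch under stated assumptions. <code>http://w3.org/ns/global/</code> is only the namespace proposed above. The rule that short names on typed items live in the first type&#8217;s namespace is my assumption, not something either spec defines.</p>

```python
GLOBAL_NS = "http://w3.org/ns/global/"  # the namespace proposed above (hypothetical)


def property_uri(name, item_types):
    """Map a microdata property name to a URI under the rules sketched above."""
    if ":" in name:
        return name  # treat names containing ':' as absolute URLs, used as-is
    if item_types:
        # Assumption: short names on typed items live in the first type's namespace.
        first = item_types[0]
        return first[: max(first.rfind("#"), first.rfind("/")) + 1] + name
    return GLOBAL_NS + name  # untyped item: fall back to the shared global namespace


print(property_uri("name", []))
print(property_uri("name", ["http://schema.org/Person"]))
```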

<h3>Interpretation of <code>&lt;time&gt;</code> Element&#8217;s <code>datetime</code> Attribute</h3>

<p>Interpreting the <code>datetime</code> attribute of the <code>&lt;time&gt;</code> element to supply a value, rather than repeating that value in a <code>content</code> attribute, is <a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a> on RDFa, and hopefully RDFa will be changed to use that value (or the content of the element if there is no <code>datetime</code> attribute), add a seconds component if necessary, and work out an appropriate date/time datatype for it based on its syntax.</p>
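<p>Here is a minimal Python sketch of the typing step: add a seconds component if necessary, then pick an XML Schema datatype from the shape of the value. It only recognises a handful of common forms (for example, it requires <code>T</code> as the date/time separator). Anything else is returned untyped. The function name and patterns are my own illustration.</p>

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
TZ = r"(?P<tz>Z|[+-]\d{2}:\d{2})?"
DATETIME = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})T(?P<hm>\d{2}:\d{2})(?P<sec>:\d{2}(?:\.\d+)?)?" + TZ)
TIME = re.compile(r"(?P<hm>\d{2}:\d{2})(?P<sec>:\d{2}(?:\.\d+)?)?" + TZ)
OTHER = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "date"),
    (re.compile(r"\d{4}-\d{2}"), "gYearMonth"),
    (re.compile(r"\d{4}"), "gYear"),
]


def datetime_literal(value):
    """Return (lexical form, datatype URI or None) for a datetime attribute value."""
    for pattern, datatype in ((DATETIME, "dateTime"), (TIME, "time")):
        m = pattern.fullmatch(value)
        if m:
            prefix = f"{m['date']}T" if datatype == "dateTime" else ""
            # xsd:dateTime and xsd:time need a seconds component, so add ":00".
            return f"{prefix}{m['hm']}{m['sec'] or ':00'}{m['tz'] or ''}", XSD + datatype
    for pattern, datatype in OTHER:
        if pattern.fullmatch(value):
            return value, XSD + datatype
    return value, None  # e.g. durations or free text: left untyped in this sketch


print(datetime_literal("2011-08-20T17:39"))
print(datetime_literal("2011-08-20"))
```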

<h3>Content Overrides</h3>

<p>In RDFa, publishers can provide a machine-readable version of the content of an element (or even an entirely different value) using the <code>content</code> attribute. This can only be done for date/times in microdata. The ability to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240">annotate non-date/time content with machine-readable values</a> is a current issue on HTML5. Resolving this in favour of providing such annotation would make using RDFa and microdata in concert, or converting between them, easier, particularly if HTML5 uses the attribute <code>content</code> or RDFa adopts whatever attribute is introduced to HTML5.</p>
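<p>As an illustration of the RDFa behaviour, here is a deliberately tiny Python sketch using the standard <code>html.parser</code> module. It returns the <code>content</code> attribute when it is present and the element&#8217;s text otherwise. It assumes every element is explicitly closed (no void elements such as <code>&lt;meta&gt;</code>) and ignores everything else RDFa does. The class name is hypothetical. A microdata processor would, as things stand, take &#8220;20th August&#8221; as the value of the first property.</p>

```python
from html.parser import HTMLParser


class PropertyValues(HTMLParser):
    """Collect (property, value) pairs where @content overrides the element text.

    Assumes every element is explicitly closed (no void elements).
    """

    def __init__(self):
        super().__init__()
        self.values = []
        self._stack = []  # one entry per open element: (property, content, text parts) or None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self._stack.append((a["property"], a.get("content"), []) if "property" in a else None)

    def handle_data(self, data):
        # Text counts towards every enclosing property element.
        for entry in self._stack:
            if entry is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return  # stray end tag: ignore
        entry = self._stack.pop()
        if entry is not None:
            prop, content, text = entry
            self.values.append((prop, content if content is not None else "".join(text)))


parser = PropertyValues()
parser.feed('<p><span property="dc:date" content="2011-08-20">20th August</span>'
            ' by <span property="dc:creator">Jeni <em>Tennison</em></span></p>')
parser.close()
print(parser.values)
```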

<h3><code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> Elements in Flow Content</h3>

<p>The ability to <a href="http://dev.w3.org/html5/md/Overview.html#content-models">use <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements in flow content</a> is only supported in microdata: it&#8217;s support that&#8217;s added by the microdata specification (in the Editor&#8217;s Draft since May 31st; the text allowing this didn&#8217;t make it into the Last Call version of the spec), in which it&#8217;s limited to <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements with an <code>itemprop</code> attribute. </p>

<p>It would be possible for the RDFa specification to similarly make the statement that <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements are allowed in flow content as long as they have particular attributes. This would ease the transition between the two formats, and works a lot better than empty <code>&lt;span&gt;</code> elements which crop up fairly commonly in RDFa content.</p>

<p>(One oddity here is that because date/time values have to be on a <code>&lt;time&gt;</code> element in microdata, publishers cannot replace empty <code>&lt;time&gt;</code> elements with <code>&lt;meta&gt;</code> elements as they might an empty <code>&lt;span&gt;</code>.)</p>

<h3>Identifiers without Types</h3>

<p>Many of the RDFa examples are of resources that have a URI identifier but for which no type is supplied. Microdata, on the other hand, states that <code>itemid</code> is only allowed on elements that also have an <code>itemtype</code> (and an <code>itemscope</code>). The reason given is that the <code>itemid</code> needs to be interpreted based on the <code>itemtype</code>. This would be understandable if it held a string, but given that the <code>itemid</code> provides a URI it seems a bit strange. Perhaps it&#8217;s an attempt to avoid the whole <a href="http://www.jenitennison.com/blog/node/159">httpRange-14 / ambiguity in URIs issue</a>.</p>

<p>If this restriction remains, the advice to RDFa users who might want to convert to microdata at a future date would be to always provide a type for your (non-blank-node) resources. It may be useful to define a <code>http://w3.org/ns/global/Thing</code> within the vocabulary that I propose above, given that the URI for <code>rdfs:Resource</code> is long and hard to recall.</p>
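<p>Here is a rough Python sketch of what a converter writing microdata from RDFa might do under that advice. Blank nodes get only <code>itemscope</code>. A URI-identified resource always gets an <code>itemtype</code>, falling back to the proposed (and so far hypothetical) <code>http://w3.org/ns/global/Thing</code> class, so that its <code>itemid</code> is allowed. The function name is illustrative.</p>

```python
GLOBAL_THING = "http://w3.org/ns/global/Thing"  # hypothetical class proposed above


def microdata_attrs(subject, types):
    """Attributes for the element describing `subject` when writing microdata."""
    attrs = {"itemscope": ""}
    if types:
        attrs["itemtype"] = " ".join(types)
    if not subject.startswith("_:"):
        # Microdata only allows itemid alongside itemscope and itemtype,
        # so a URI-identified resource with no type gets the fallback class.
        attrs.setdefault("itemtype", GLOBAL_THING)
        attrs["itemid"] = subject
    return attrs


print(microdata_attrs("http://example.org/a", []))
```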

<h3>Built-in Prefixes</h3>

<p>The built-in <a href="http://www.w3.org/profile/rdfa-1.1">profile for RDFa</a> defines a number of prefixes for vocabularies that are either coined by the W3C or coined elsewhere but in common use on the web. This, coupled with <code>vocab</code> and the ability to directly use URIs in the relevant attributes, means that declaring prefixes within the document is increasingly unnecessary in RDFa.</p>

<p>In contrast, using existing vocabularies, even popular ones, within microdata is relatively difficult, particularly when vocabularies are mixed on the same item.</p>

<p>Most useful for publishers would be if both RDFa and microdata recognised the same set of prefixes. This would reduce the size of microdata created from existing RDFa content as well as making it easier to move between the languages. At the very least, it would be good to have <code>rdf:</code>, <code>rdfs:</code>, <code>xsd:</code> and <code>xhv:</code> built into both.</p>

<p>The list of popular vocabularies is likely to change over time; for example a prefix for the schema.org vocabulary might be useful at some point in the near future. The problem is that publishers and consumers need to be synchronised in their use of prefixes: it&#8217;s no good for a publisher to use the prefix <code>sch:</code> if there might be processors for the page that don&#8217;t recognise it. Equally, consumers shouldn&#8217;t be reliant on a network connection to retrieve the latest set of prefix mappings in order to parse the page. It&#8217;s not clear to me how best to manage this evolution, but even a fixed set of prefixes at the point the specs reach Recommendation is more usable than spelling out URIs all the time.</p>
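<p>Here is a small Python sketch of prefix expansion against a shared built-in set, limited to the four prefixes suggested above. Prefixes declared in the document are layered on top. The function and constant names are my own. A full RDFa processor does considerably more (for example, handling <code>vocab</code> and terms).</p>

```python
BUILT_IN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "xhv": "http://www.w3.org/1999/xhtml/vocab#",
}


def expand(term, declared=None):
    """Expand a prefixed name using built-in prefixes plus any declared in the page."""
    prefixes = {**BUILT_IN_PREFIXES, **(declared or {})}
    prefix, sep, reference = term.partition(":")
    if sep and prefix in prefixes and not reference.startswith("//"):
        return prefixes[prefix] + reference
    return term  # unknown prefix or a full URI: leave untouched


print(expand("rdfs:label"))
print(expand("sch:name"))  # not built in, so a consumer cannot expand it
```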

<h3>Literals Including Markup</h3>

<p>RDFa supports literals that include markup (the <code>innerHTML</code> of an element) as well as those that don&#8217;t (the <code>textContent</code> of an element), whereas microdata only supports creating values from particular attributes or the <code>textContent</code> of the element. This makes it hard to create embedded microdata that includes values which contain things like mathematical or chemical formulae, ruby text, or multiple paragraphs.</p>

<p>A solution would be for microdata to introduce an <code>itemhtml</code> (or something) attribute that, when present, indicates that the value of the property should include markup. There is a current issue on microdata to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468">support HTML values</a>.</p>

<h3>Itemref</h3>

<p>RDFa can support a subset of <code>itemref</code>&#8217;s functionality, namely to have properties defined elsewhere in a document be associated with a given resource. What it doesn&#8217;t support is the sharing of properties defined in one place by two or more resources.</p>

<p>RDFa could add such support by adding an attribute that mirrors <code>itemref</code> (eg <code>ref</code>, I guess), with the referenced element being processed using the <a href="http://www.w3.org/TR/rdfa-core/#evaluation-context">evaluation context</a> inherited by the referencing element (which means that attributes such as <code>vocab</code> would sometimes have a scope that wasn&#8217;t based on the document tree). This would make it easier to tackle the use case for <code>itemref</code> using RDFa as well as making it easier to move between or mix RDFa and microdata.</p>

<h3>Lists</h3>

<p>It is easy for microdata to represent a property with a list of values, and really, really hard to do the same in RDFa. This is partly because RDF treats lists as resources rather than as a distinct data type, and partly because RDFa hasn&#8217;t added any syntax sugar to make creating <code>rdf:List</code> resources easy. Adding some syntax sugar for lists would make life a lot easier for anyone using RDFa, but especially for anyone adapting existing microdata content to RDFa.</p>
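<p>The difference is easiest to see by looking at what RDF actually needs. In microdata, an ordered list is just the values of a property in document order, such as <code>["Chop", "Fry", "Serve"]</code>; in RDF the same order has to be spelled out as a chain of <code>rdf:first</code>/<code>rdf:rest</code> cells ending in <code>rdf:nil</code>. The Python sketch below generates those triples. The subject and property URIs in the usage lines and the <code>_:item</code> blank node labels are made up for illustration; any list syntax sugar in RDFa would have to produce something equivalent.</p>

<pre><code>RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_triples(subject, prop, values):
    """Return the triples that encode values, in order, as an rdf:List."""
    triples = []
    rest = RDF + "nil"  # the empty list terminates every rdf:List
    # Build the chain back to front so each cell can point at the cell after it.
    for index in reversed(range(len(values))):
        cell = "_:item" + str(index)  # blank node label for this list cell
        triples.append((cell, RDF + "first", values[index]))
        triples.append((cell, RDF + "rest", rest))
        rest = cell
    triples.append((subject, prop, rest))  # attach the head of the list
    return triples


# Hypothetical subject and property URIs, for illustration only.
for triple in list_triples("http://example.org/#recipe",
                           "http://example.org/ns#steps",
                           ["Chop", "Fry", "Serve"]):
    print(triple)
</code></pre>

<p>Three values need seven triples and three blank nodes, which is why writing lists by hand in RDFa is so painful.</p>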

<h3>Datatypes</h3>

<p>Microdata assumes that consumers will convert values to appropriate datatypes based on the property (which they understand) as a separate process after microdata processing, whereas RDFa supports the use of a <code>datatype</code> attribute to explicitly indicate the datatype of each value. This mismatch means that information is lost when RDFa is converted to microdata, and has to be added when microdata is converted to RDFa.</p>

<p>Bringing the languages completely into sync would mean either microdata adding a facility to support (at least some) datatypes, or deprecating the <code>datatype</code> attribute in RDFa. Alternatively, this may simply be an area where the differences in behaviour between the two specifications don&#8217;t matter because the data models that they use are distinct anyway.</p>

<h3>Languages</h3>

<p>Languages are similar to datatypes, in that RDF (and hence RDFa) supports annotating strings with the language that they are in whereas microdata doesn&#8217;t within its core data model or its JSON serialisation. However, the elements that represent properties within the HTML, used within the DOM API access to microdata, will have a language.</p>

<p>It may be that in practice consumers need to base their microdata processing on the DOM API rather than on the core microdata data model or on JSON extracted through a standalone process, and thus pick up the language from the property elements; I don&#8217;t know. In any case, the microdata JSON serialisation, used for drag-and-drop, is lossy; it could be extended to include the language of each value when available, though at a fairly substantial cost in complexity.</p>

<p>For publishers, it doesn&#8217;t much matter either way; if they are dealing with multi-lingual text they will want to include a <code>lang</code> attribute in the HTML anyway, regardless of the impact on embedded data.</p>

<h3>Multiple Types</h3>

<p>RDFa supports having multiple types named in the <code>typeof</code> attribute whereas microdata only supports one type per item. In any mapping from RDFa to microdata, publishers have to choose which type is the primary type for the item and move the others to be expressed via <code>rdf:type</code> properties. Consumers who want to support publishers who might not choose their type as the primary type have to detect items that have the type they are interested in within the <code>rdf:type</code> property as well as those which have the type as the main type. Given that the <code>rdf:type</code> URI is long and (naturally) associated with RDF, it might be better to define a property such as <code>http://w3.org/ns/global/type</code> for this use.</p>
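<p>For consumers, the check this implies looks something like the following Python sketch, which works on items in the microdata JSON format used in these posts. The function name is hypothetical; it accepts <code>type</code> as either a single string (as in these examples) or a list, and it only looks at <code>rdf:type</code>, not at the <code>http://w3.org/ns/global/type</code> property suggested above, which is only a proposal.</p>

<pre><code>RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def has_type(item, type_uri):
    """True if a microdata JSON item has type_uri as its main type or via rdf:type."""
    main = item.get("type")
    main_types = main if isinstance(main, list) else [main]
    if type_uri in main_types:
        return True
    # Secondary types that the publisher moved into rdf:type properties
    return type_uri in item.get("properties", {}).get(RDF_TYPE, [])


photo = {
    "type": "http://xmlns.com/foaf/0.1/Image",
    "properties": {RDF_TYPE: ["http://purl.org/dc/dcmitype/StillImage"]},
}
print(has_type(photo, "http://purl.org/dc/dcmitype/StillImage"))  # True
</code></pre>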

<p>Microdata could be extended to allow multiple values in the <code>itemtype</code> attribute, with the first being used to interpret any properties that aren&#8217;t full URIs. This would make it easier for both consumers to detect when a type they were interested in was used and for publishers to use RDFa and microdata in tandem or move between them.</p>

<h3>The <code>src</code> Attribute</h3>

<p>RDFa and microdata interpret the <code>src</code> attribute in opposite ways. In RDFa, it provides the identifier for a new resource (equivalent to <code>itemid</code> in microdata); in microdata, it provides a URL value of a property on elements that support it (equivalent to <code>resource</code> or <code>href</code> in RDFa).</p>

<p>RDFa interprets <code>src</code> in this way to make it easier to make assertions about an image, but it&#8217;s of limited use as even in RDFa it&#8217;s only possible to make three such assertions (through the <code>typeof</code>, <code>rel</code> and <code>property</code> attributes). So, for example, you can specify the type of the image, link to its license and give the name of its creator, with:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"&gt;
</code></pre>

<p>but this won&#8217;t help you if you <em>also</em> want to give the title for the image and when it was created (say). At that point, the microdata and RDFa start to look similar:</p>

<pre><code>&lt;div itemscope itemid="photo1.jpg" itemtype="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;link itemprop="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;
  &lt;time itemprop="http://purl.org/dc/terms/created" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg" typeof="foaf:Image"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"&gt;
&lt;/div&gt;
</code></pre>

<p>and really, to make the markup consistent, you may as well not use the <code>src</code> of the image at all in the RDFa either:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span rel="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>So it&#8217;s not clear to me that interpreting the <code>src</code> attribute as the subject of triples offers such a huge advantage that it&#8217;s worth the inconvenience that it brings for the simple things, such as having to use:</p>

<pre><code>&lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
</code></pre>

<p>rather than:</p>

<pre><code>&lt;img property="image" src="google-logo.png" alt="Google"&gt;
</code></pre>

<h3>Link relations</h3>

<p>This isn&#8217;t so much a clash between RDFa and microdata as between the interpretation that RDFa has for the <code>rel</code> attribute and that specified in HTML.</p>

<p>The built-in <code>rel</code> values in HTML are a bit of a mix. Some of them, like <code>alternate</code>, <code>prev</code> and <code>next</code>, encode relationships between the document in which the link appears and another document. Others, such as <code>bookmark</code> and <code>help</code>, create relationships between the context in which the link is found and the referenced document. Still others, like <code>nofollow</code>, <code>noreferrer</code> and <code>prefetch</code>, are really instructions to the client about how to manage the act of traversing the link.</p>

<p>It doesn&#8217;t seem semantically correct to automatically create relationships based on the built-in HTML <code>rel</code> values, unless you are deliberately trying to extract <a href="http://lin-clark.com/blog/two-meanings-semantics-html5"><em>document</em> semantics</a> from the page. This is a problem for RDFa, which reuses the <code>rel</code> attribute to provide property values for the embedded <em>data</em>.</p>
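<p>A consumer that wants to keep document semantics and data apart has to separate HTML link types from data properties itself. As an illustration only, this Python sketch splits a <code>rel</code> value using just the link types named above; the real list in the HTML specification is longer, so the set here is incomplete, and the function name is hypothetical.</p>

<pre><code># Only the HTML link types named above; deliberately incomplete.
HTML_LINK_TYPES = {"alternate", "prev", "next", "bookmark", "help",
                   "nofollow", "noreferrer", "prefetch"}


def split_rel(rel):
    """Split a rel attribute into HTML link types and other (data) values."""
    html, data = [], []
    for token in rel.split():
        # HTML link types are case-insensitive; compare in lower case
        (html if token.lower() in HTML_LINK_TYPES else data).append(token)
    return html, data


print(split_rel("next http://xmlns.com/foaf/0.1/knows"))
</code></pre>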

<p>One thing that could be done would be for RDFa to consistently use the <code>property</code> attribute everywhere rather than the <code>rel</code> attribute. This would not only ease the overloading but also reduce the confusion for users, who currently have to work out which attribute to use based on whether the value is a resource or a literal.</p>

<h2>Possible Subset of RDFa</h2>

<p>When mapping from microdata to RDFa, the only attributes that are really needed are:</p>

<ul>
<li><code>vocab</code> to define a vocabulary for the types and properties within its scope (not technically necessary, but keeps the markup simple compared to spelling out URIs for everything)</li>
<li><code>typeof</code> to define the type of a resource or indicate a new blank node</li>
<li><code>about</code> to provide a URI for a resource or a local identifier for a blank node</li>
<li><code>property</code> and <code>rel</code> to define property names (though see above for discussion about dropping <code>rel</code>)</li>
<li><code>href</code>, <code>src</code> and <code>content</code> to provide values (and <code>datetime</code> assuming that is supported)</li>
</ul>

<p>In the mappings in the analysis below, I did also use the <code>resource</code> attribute, but only to create a reference to a blank node that was described elsewhere, when replicating the functionality of <code>itemref</code>. If RDFa were to enable <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> in content in the same way as microdata, <code>resource</code> functionality could be replicated using <code>&lt;link&gt;</code>; as it is, you can get away with using an empty <code>&lt;a&gt;</code> element.</p>

<p>Similarly, I only used <code>datatype</code> when providing a datatype for date/time values, something that could be done automatically by RDFa. But this isn&#8217;t surprising given that microdata doesn&#8217;t support datatypes at all and the examples I was using for the mapping were from the microdata specification.</p>

<p>There was no need for:</p>

<ul>
<li><code>prefix</code> which defines prefixes to simplify references to properties and classes; this is hardly surprising as few of the microdata examples involved mixing namespaces, but it&#8217;s notable that the built-in prefixes of <code>rdf:</code> and <code>xsd:</code> were useful</li>
<li><code>profile</code> which is a pointer to an external document that defines a set of terms; this is being dropped from RDFa in any case</li>
</ul>

<p>I also kept to a simplified version of the syntax in which each property element only provided one value. This subset is basically:</p>

<ul>
<li>resource elements can have <code>about</code> (equivalent to <code>itemid</code>) and <code>typeof</code> (equivalent to <code>itemtype</code>) attributes on them</li>
<li>property elements can have <code>property</code> or <code>rel</code> (equivalent to <code>itemprop</code>), and a value-providing attribute on them such as <code>href</code> or <code>content</code></li>
<li>no element is both a resource element and a property element; to provide a property whose value is a resource, nest the resource element within the property element (using &#8220;hanging rel&#8221; processing)</li>
<li>no property element should provide more than one value for a property; in particular, a &#8220;hanging rel&#8221; should only have a single resource element child</li>
</ul>

<p>This simplified profile of RDFa is fairly easy to remember and maps easily to and from microdata: most attributes can be simply renamed; the only attribute that needs to be moved as well as renamed is the &#8220;hanging rel&#8221;, which moves onto the resource element and is renamed to <code>itemprop</code>. Note that it also means avoiding using the <code>src</code> attribute to encode embedded data.</p>
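<p>Because the mapping is mostly renaming, it can be sketched in a few lines. The Python below works on a hypothetical minimal element structure (a dict with <code>tag</code>, <code>attrs</code> and <code>children</code>, ignoring text nodes) rather than a real DOM. It doesn&#8217;t handle <code>vocab</code> or expand prefixed names, both of which would need folding into full URIs. It also adds an <code>rdfs:Resource</code> type when an element has <code>about</code> but no <code>typeof</code>, since microdata only allows <code>itemid</code> on typed items.</p>

<pre><code>RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"

# Attributes in the simple RDFa subset and their microdata equivalents;
# href, src and content keep their names.
RENAME = {
    "about": "itemid",
    "typeof": "itemtype",
    "property": "itemprop",
    "rel": "itemprop",
}


def to_microdata(element):
    """Convert an element (and its descendants) from the simple RDFa subset.

    element is a dict with "tag", "attrs" (a dict) and "children" (a list of
    elements); text nodes are ignored in this sketch.
    """
    attrs = {RENAME.get(name, name): value for name, value in element["attrs"].items()}
    if "itemid" in attrs and "itemtype" not in attrs:
        # Microdata only allows itemid on items that have a type.
        attrs["itemtype"] = RDFS_RESOURCE
    if "itemtype" in attrs:
        attrs["itemscope"] = ""  # an item is only recognised with itemscope
    children = [to_microdata(child) for child in element["children"]]
    # A "hanging rel" provides no value itself: its itemprop moves onto the
    # single resource element nested inside it.
    hanging = "rel" in element["attrs"] and not ("href" in attrs or "src" in attrs)
    if hanging and len(children) == 1:
        children[0]["attrs"]["itemprop"] = attrs.pop("itemprop")
    return {"tag": element["tag"], "attrs": attrs, "children": children}


knows = {"tag": "div", "attrs": {"rel": "http://xmlns.com/foaf/0.1/knows"}, "children": [
    {"tag": "span",
     "attrs": {"about": "#bob", "typeof": "http://xmlns.com/foaf/0.1/Person"},
     "children": []},
]}
print(to_microdata(knows))
</code></pre>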

<p>In addition to sticking to this subset of attributes, developers might be advised that using HTML link relations may lead to clashes with browser or search engine interpretation of the links in the page.</p>

<h2>Possible Subset of Microdata</h2>

<p>Microdata is pretty minimalistic already. The only feature that developers need to be warned about is <code>itemref</code>, which has no RDFa equivalent at the moment.</p>

<h2>Guidelines for Vocabulary Authors</h2>

<p>There are several guidelines that come out of this comparison for people putting together vocabularies that aim to be usable in both RDFa and microdata:</p>

<ul>
<li>The classes in the vocabulary should be distinct, or subclasses created with any relevant combinations of superclasses, so that publishers don&#8217;t have to assign more than one type to an item/resource. This restriction helps with using the vocabulary with microdata, which assumes that every item has a single type.</li>
<li>Provide explicit classes for everything which you anticipate might be given an identifier, as microdata doesn&#8217;t (currently) enable items to have an identifier without also having a type.</li>
<li>Put classes and properties in the same namespace, but do not name classes and properties with the same local name; while this doesn&#8217;t matter in microdata because the properties are interpreted relative to the class, standard conversions to RDF will create a class and a property with the same URI. URIs are case-sensitive, so a simple way of ensuring that there aren&#8217;t clashes is to follow the usual RDF convention of beginning class names with an upper-case letter and property names with a lower-case letter.</li>
<li>Avoid property names that contain dots, as these aren&#8217;t allowed in non-URI property names in microdata.</li>
<li>Ensure that properties either only expect one type of value or expect values whose type can be sniffed based on the syntax of the value. If publishers use microdata, they will not be able to indicate the type of a value through the markup.</li>
<li>Be aware that consumers of microdata using your vocabulary will have to use the DOM API to identify the language used in any strings, and that language information won&#8217;t be carried through the standard microdata JSON serialisation (used by drag-and-drop, for example). If you anticipate multi-lingual use of your vocabulary, you may want to define a <code>MultiLingual</code> class with <code>value</code> and <code>language</code> properties that people can use as nested items. (It may be useful for this class and properties to be defined in the proposed &#8216;global&#8217; W3C namespace so that it can be used anywhere.) If you know what languages will be used then provide separate properties for each language (eg for UK legislation I know the languages are English and Welsh so on a vocabulary for UK Legislation I could have <code>title-en</code> and <code>title-cy</code> properties).</li>
<li>To make markup cleaner, only reuse properties from other vocabularies on your classes if they have built-in prefixes (eg unless <code>rdfs:</code> is built-in to microdata as well as RDFa, don&#8217;t use <code>rdfs:label</code> to provide a label, but create your own <code>label</code> property). On the other hand, do reuse classes from other vocabularies if you don&#8217;t need to add any specialised properties to them. Note that avoiding reuse has the unfortunate side-effect of not enabling processors that understand these other vocabularies to process your data.</li>
<li>Avoid having properties whose values need to be retrieved in order, as these are hard to represent in RDFa. Instead, use properties with distinct names when position is important. (Yes, I know this sucks.)</li>
</ul>
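<p>Some of these guidelines are mechanical enough to check automatically. As an illustration, this Python sketch (the function and its messages are hypothetical) checks lists of class and property local names for names used as both a class and a property, for the usual capitalisation conventions, and for dots in property names.</p>

<pre><code>def check_vocabulary(classes, properties):
    """Return warnings about class and property local names (hypothetical checker)."""
    warnings = []
    for name in sorted(set(classes).intersection(properties)):
        # Both would end up with the same URI after conversion to RDF.
        warnings.append(name + ": used as both a class and a property")
    for name in classes:
        if not name[:1].isupper():
            warnings.append(name + ": class names should start with an upper-case letter")
    for name in properties:
        if not name[:1].islower():
            warnings.append(name + ": property names should start with a lower-case letter")
        if "." in name:
            warnings.append(name + ": dots are not allowed in non-URI microdata property names")
    return warnings


print(check_vocabulary(["Person", "title"], ["title", "birth.date"]))
</code></pre>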

<h2>Choosing Between Microdata and RDFa</h2>

<p>The choices developers make between microdata and RDFa will, I suspect, be largely dictated by what their consumers/toolsets/publishers will support. Nevertheless, there are some features that are better supported by one or other format and might therefore sway developers one way or another:</p>

<ul>
<li><strong>multi-lingual embedded data</strong> is better supported in RDFa than in microdata+JSON</li>
<li><strong>explicit datatypes for values</strong> can be provided by RDFa but not microdata</li>
<li><strong>resources with multiple types</strong> are a lot easier to describe in RDFa</li>
<li><strong>property values that include markup</strong> are a lot easier to write in RDFa</li>
<li><strong>mixed vocabulary use</strong> is a bit easier in RDFa than in microdata</li>
<li><strong>HTML5 link relations</strong> may be misinterpreted by RDFa processors</li>
<li><strong>properties with list values</strong> are much easier to support in microdata</li>
<li><strong>common content</strong> adopted by multiple entities is much easier in microdata</li>
</ul>

<h2>Final Words</h2>

<p>I have no doubt that developers would be better off if there were only one recommended way of embedding data in HTML (so long as it met their requirements of course). But realistically that is, and always has been, a long shot, given the entrenched positions of the microdata and RDFa communities.</p>

<p>Regardless, there are lessons that RDFa and microdata could learn from each other, and changes to both languages that would help developers use them on their own, switch between them and mix them in the same document. I expect and welcome debate about the viability and effectiveness of the changes and guidelines that I&#8217;ve suggested here.</p>

<p>Investigating those lessons, documenting those changes and generating those guidelines was something that I had hoped the microdata/RDFa task force would be able to do. The other question to ask, given the argument that there shouldn&#8217;t be a task force at all if it&#8217;s not going to be able to bring the languages together, is whether this kind of analysis is worthwhile, and worth publishing as something more official than a blog post?</p>
    ]]></content>
  </entry>
  <entry>
    <title>Mapping RDFa to Microdata</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/164" />
    <id>http://www.jenitennison.com/blog/node/164</id>
    <published>2011-08-20T17:38:38+01:00</published>
    <updated>2011-08-20T17:38:38+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to in order to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata. In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and then to create that JSON from microdata.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to in order to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata. In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and then to create that JSON from microdata.</p>

<!--break-->

<p>To create the microdata JSON, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item. Other than that I hope the mapping will be obvious; I&#8217;ll point out where it involves a loss of information. I&#8217;m assuming that the document is at <code>http://example.org/</code> throughout.</p>
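<p>As a rough guide to what that rule means in practice, the following Python sketch maps a list of (subject, property, value) triples to microdata JSON. It rests on some assumptions that go beyond the rule as stated: the namespace of a type is taken to be everything up to the last <code>#</code> or <code>/</code>; the caller says which resources should be top-level items, and these are referenced elsewhere by their URI; any other resource that has properties is nested as an item; blank nodes (labels starting <code>_:</code>) get no <code>id</code>; literals and URIs are both plain strings; and there are no cycles among nested resources. Any types after the first are kept as <code>rdf:type</code> properties.</p>

<pre><code>import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace_of(uri):
    # Assumption: the namespace runs up to and including the last '#'
    # or '/', whichever comes later.
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1]


def to_microdata_json(triples, top_level):
    """Map (subject, property, value) triples to microdata JSON.

    top_level lists the subjects to emit as top-level items, in order.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    def item_for(subject):
        item = {}
        types = [o for p, o in by_subject[subject] if p == RDF_TYPE]
        ns = ""
        if types:
            item["type"] = types[0]
            ns = namespace_of(types[0])
        if not subject.startswith("_:"):
            item["id"] = subject  # blank nodes have no identifier
        properties = {}
        for p, o in by_subject[subject]:
            if p == RDF_TYPE and o == item.get("type"):
                continue  # the first type became the item's type
            # Strip the type's namespace to get a simple property name
            name = p[len(ns):] if ns and p.startswith(ns) else p
            if o in by_subject and o not in top_level:
                value = item_for(o)  # a resource with properties: nested item
            else:
                value = o  # a literal, a top-level item or a bare URI: a string
            properties.setdefault(name, []).append(value)
        item["properties"] = properties
        return item

    return {"items": [item_for(s) for s in top_level]}


ICAL = "http://www.w3.org/2002/12/cal/ical#"
example = [
    ("http://example.org/#bbq", RDF_TYPE, ICAL + "Vevent"),
    ("http://example.org/#bbq", ICAL + "summary", "one last summer barbecue"),
    ("http://example.org/#bbq", ICAL + "dtstart", "2015-09-16T16:00:00-05:00"),
]
result = to_microdata_json(example, ["http://example.org/#bbq"])
print(json.dumps(result, indent=2))
</code></pre>

<p>Deciding which resources are top-level items is the part a real converter would take from the structure of the markup, not from the triples alone.</p>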

<p>I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHAT WG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the RDFa specification plus one additional example from the wild. I haven&#8217;t included examples that don&#8217;t illustrate anything new, so some are skipped. Other examples would be welcome.</p>

<h2>Page Metadata</h2>

<blockquote>
  <p>When parsing begins, the current subject will be the IRI of the document being parsed, or a value as set by a Host Language-provided mechanism (e.g., the base element in (X)HTML). This means that by default any metadata found in the document will concern the document itself:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this document that we&#8217;d want to create is (<strong>note: invalid example</strong>):</p>

<pre><code>{ "items": [
  {
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>and we&#8217;d want to create it with (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>However, this is not valid according to the microdata specification. In microdata, <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#attr-itemid">only items that have types are allowed to have identifiers</a>. Rather than losing the identifier, we&#8217;ll add a type; I&#8217;m going to use <code>rdfs:Resource</code>. It&#8217;s not the nicest of URIs to type, but it&#8217;s got something close to the correct semantics. So we&#8217;ll aim for the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>which means we need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The <code>itemscope</code> is necessary for the page to be recognised as containing any data at all.</li>
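<li>The <code>itemid</code> can&#8217;t just be empty. Here, <code>.</code> is the shortest URI that references the page itself, because the page&#8217;s URI (<code>http://example.org/</code>) ends in a slash; for a page whose URI doesn&#8217;t, <code>.</code> would resolve to the containing directory instead.</li>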
<li>I put the <code>itemscope</code>, <code>itemtype</code> and <code>itemid</code> on the <code>&lt;head&gt;</code> element rather than the <code>&lt;html&gt;</code> element so that they wouldn&#8217;t be inherited into the <code>&lt;body&gt;</code>: it seems to make sense for any data within the <code>&lt;head&gt;</code> to be about the page itself.</li>
<li>The <code>foaf:</code> and <code>dc:</code> prefixes are built-in to RDFa, so it&#8217;s easy for people to use classes and properties in those common vocabularies without having to remember their full URI. In microdata, that URI and the one for the <code>rdfs:Resource</code> class have to be spelled out in full.</li>
</ul>

<h2>Base URI</h2>

<blockquote>
  <p>In (X)HTML the value of base may change the initial value of current subject:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This changes the id of the item generated:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.org/jo/blog" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://www.example.org/jo/blog#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

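<p>In the microdata, the <code>itemid</code> is resolved against the base URI set by the <code>&lt;base&gt;</code> element, but it can&#8217;t stay as <code>.</code>: resolved against <code>http://www.example.org/jo/blog</code>, <code>.</code> gives the containing directory, <code>http://www.example.org/jo/</code>. Using the last path segment, <code>blog</code>, gives the right identifier:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="blog"&gt;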
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
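<p>What a relative <code>itemid</code> resolves to is easy to check outside a browser. This Python sketch uses the standard library&#8217;s <code>urllib.parse.urljoin</code>, which resolves relative references in the RFC 3986 style, with the base URI from this example. It shows that <code>.</code> refers to the directory containing the base URI, and so only identifies the page itself when the page&#8217;s URI ends in a slash.</p>

<pre><code>from urllib.parse import urljoin

BASE = "http://www.example.org/jo/blog"  # the href of the base element

# "." resolves to the containing directory, not to the page itself...
print(urljoin(BASE, "."))     # http://www.example.org/jo/
# ...whereas the last path segment, or a bare fragment, stays on the page
print(urljoin(BASE, "blog"))  # http://www.example.org/jo/blog
print(urljoin(BASE, "#bbq"))  # http://www.example.org/jo/blog#bbq

# For a page at http://example.org/, "." does resolve to the page itself
print(urljoin("http://example.org/", "."))  # http://example.org/
</code></pre>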

<h2>Explicit Subjects / ItemIds</h2>

<blockquote>
  <p>To illustrate how this affects the statements, note in this markup how the properties inside the (X)HTML body element become part of a new calendar event object, rather than referring to the document as they do in the head of the document:</p>

<pre><code>&lt;html prefix="cal: http://www.w3.org/2002/12/cal/ical#"&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p about="#bbq" typeof="cal:Vevent"&gt;
      I'm holding
      &lt;span property="cal:summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;span property="cal:dtstart" content="2015-09-16T16:00:00-05:00" 
            datatype="xsd:dateTime"&gt;
        September 16th at 4pm
      &lt;/span&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>In microdata JSON, this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  } ,
  {
    "type": "http://www.w3.org/2002/12/cal/ical#Vevent" ,
    "id": "http://example.org/#bbq" ,
    "properties": {
      "summary": [ "one last summer barbecue" ] ,
      "dtstart": [ "2015-09-16T16:00:00-05:00" ]
    }
  }
]}
</code></pre>

<p>Note that this mapping loses the fact that the value of the <code>dtstart</code> property is a date-and-time. Processors of this JSON are expected to know that the <code>dtstart</code> property takes a date/time value and would have to sniff the value to work out that it&#8217;s a date-and-time rather than a date.</p>

<p>In-browser microdata processors can identify the value as a date/time value because the property element itself is accessed through the <code>element.properties</code> IDL attribute; processors that work with this DOM API can tell that it&#8217;s a <code>&lt;time&gt;</code> element, get hold of the date/time itself and access the content of the element for the human-readable representation used on the page. However, this information isn&#8217;t part of the core <a href="http://www.w3.org/TR/microdata/#the-microdata-model">microdata data model</a>.</p>
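<p>For consumers that only see the JSON, the sniffing might look something like this Python sketch. It is deliberately simple and the function name is hypothetical: it treats any value containing a <code>T</code> as an <code>xsd:dateTime</code> and anything else as an <code>xsd:date</code>, and relies on <code>datetime.fromisoformat</code> and <code>date.fromisoformat</code> (Python 3.7 and later; before Python 3.11 these don&#8217;t accept a trailing <code>Z</code>).</p>

<pre><code>from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def sniff_temporal(value):
    """Guess the datatype of a date/time string and parse it."""
    if "T" in value:
        # Date plus time, e.g. 2015-09-16T16:00:00-05:00
        return XSD + "dateTime", datetime.fromisoformat(value)
    # Date only, e.g. 2009-03-17
    return XSD + "date", date.fromisoformat(value)


print(sniff_temporal("2015-09-16T16:00:00-05:00")[0])  # http://www.w3.org/2001/XMLSchema#dateTime
print(sniff_temporal("2009-03-17")[0])  # http://www.w3.org/2001/XMLSchema#date
</code></pre>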

<p>To create this JSON from microdata you need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p itemscope itemid="#bbq" itemtype="http://www.w3.org/2002/12/cal/ical#Vevent"&gt;
      I'm holding
      &lt;span itemprop="summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;time itemprop="dtstart" datetime="2015-09-16T16:00:00-05:00"&gt;
        September 16th at 4pm
      &lt;/time&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are no prefix definitions in microdata, so the type has to be spelled out in full. However, with the mapping I&#8217;m assuming from RDFa to microdata JSON, properties in the same namespace as the item&#8217;s type don&#8217;t have to be: they can be given short names such as <code>summary</code> and <code>dtstart</code>.</li>
<li>The <code>itemscope</code> has to be added despite the <code>&lt;p&gt;</code> element having both an <code>itemid</code> and an <code>itemtype</code>; if the <code>itemscope</code> is forgotten, the item isn&#8217;t recognised.</li>
<li>The original <code>&lt;span&gt;</code> element has to be changed to a <code>&lt;time&gt;</code> element because it isn&#8217;t conformant microdata for a date/time value to be supplied by any other element.</li>
</ul>
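<p>The short-name convention I&#8217;m assuming can be expressed as a small expansion rule. The Python sketch below is hypothetical: the function names are illustrative. It also assumes that a vocabulary&#8217;s namespace is the type URL up to and including its last <code>#</code> or <code>/</code>. That assumption holds for the vocabularies used here but not necessarily for every vocabulary:</p>

<pre><code>def type_namespace(item_type):
    """Everything up to and including the last '#' or '/' of the type URL."""
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    return item_type[:cut + 1]


def expand_property(name, item_type=None):
    """Turn a microdata property name into a full property URL.

    Names that already contain ':' are treated as absolute URLs; short
    names are resolved against the namespace of the item's type.
    """
    if ":" in name:
        return name
    if item_type is None:
        raise ValueError("short property name %r needs an item type" % name)
    return type_namespace(item_type) + name
</code></pre>

<p>With the type <code>http://www.w3.org/2002/12/cal/ical#Vevent</code>, <code>summary</code> expands to <code>http://www.w3.org/2002/12/cal/ical#summary</code>, and a full URL such as the Dublin Core <code>creator</code> property is left alone.</p>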

<h2>Items from the <code>src</code> Attribute</h2>

<blockquote>
  <p>If @about is not present, then @src is next in priority order, for setting the subject of a statement. A typical use would be to indicate the licensing type of an image:</p>

<pre><code>&lt;img src="photo1.jpg" rel="license" 
     resource="http://creativecommons.org/licenses/by/2.0/" /&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ]
    }
  }
]}
</code></pre>

<p>The <code>src</code> attribute in microdata is only used to supply a property value, so creating microdata about the image means adding a wrapper <code>&lt;span&gt;</code> element and a separate <code>&lt;link&gt;</code> element:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
  &lt;img src="photo1.jpg" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
&lt;/span&gt;
</code></pre>

<p>Note:</p>

<ul>
<li>The <code>license</code> property is part of the built-in set of link relationships in HTML, but there is no easy way to refer to that property from microdata; it has to be spelled out as a full URL.</li>
</ul>

<h2>Additional Properties for Images</h2>

<blockquote>
  <p>Since there is no difference between @src and @about, then the information expressed in the last example in the section on @about (the creator of an image), could be expressed as follows:</p>

<pre><code>&lt;img src="photo1.jpg"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"
/&gt;
</code></pre>
</blockquote>

<p>This is a simple additional property in the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

<p>which can be created through a <code>&lt;meta&gt;</code> element within the <code>&lt;span&gt;</code>:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
  &lt;img src="photo1.jpg" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
&lt;/span&gt;
</code></pre>

<h2>Nested Images</h2>

<blockquote>
  <p>Since normal chaining rules will apply, the image IRI can also be used to complete hanging triples:</p>

<pre><code>&lt;div about="http://www.blogger.com/profile/1109404" rel="foaf:img"&gt;
  &lt;img src="photo1.jpg"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"
  /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.blogger.com/profile/1109404" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/img": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://example.org/photo1.jpg" ,
        "properties": {
          "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
          "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>The microdata to generate this is:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://www.blogger.com/profile/1109404"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/img" 
        itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
        itemid="photo1.jpg"&gt;
    &lt;img src="photo1.jpg" /&gt;
    &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The big gotcha in this conversion is that in microdata, the <code>foaf:img</code> property has to be moved onto the item that is a value of that property; there&#8217;s no equivalent to the &#8220;hanging rel&#8221; processing in RDFa. A disadvantage of this is that anyone copying and pasting the <code>&lt;span&gt;</code> element to embed the same information about the image within their own page will carry the <code>itemprop</code> attribute along with the image into a context where the <code>foaf:img</code> property might not be relevant.</li>
</ul>

<h2>Types with Blank Nodes</h2>

<blockquote>
  <p>For example, an author may wish to create markup for a person using the FOAF vocabulary, but without having a clear identifier for the item:</p>

<pre><code>&lt;div typeof="foaf:Person"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="foaf:givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>Now that we have an explicit type, we can create microdata JSON that uses short names:</p>

<pre><code>{ "items": [
  {
    "type": "http://xmlns.com/foaf/0.1/Person" ,
    "properties": {
      "name": [ "Albert Einstein" ] ,
      "givenName": [ "Albert" ]
    }
  }
]}
</code></pre>

<p>This can be generated with the microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span itemprop="givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>which is nice and simple.</p>
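<p>When the JSON is generated from RDFa, the converter has to decide which property URIs can be shortened. One plausible rule is to shorten a property URI only when it falls in the same namespace as the item&#8217;s type. The Python sketch below is hypothetical. It assumes, as a convention, that the namespace is the type URI up to and including its last <code>#</code> or <code>/</code>:</p>

<pre><code>def short_name(prop_url, item_type):
    """Shorten prop_url to a local name if it is in item_type's namespace.

    The namespace is assumed to be item_type up to its last '#' or '/'.
    Any other property URL is returned unchanged.
    """
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    namespace = item_type[:cut + 1]
    if not prop_url.startswith(namespace):
        return prop_url
    local = prop_url[len(namespace):]
    # Only shorten to a plain name that can't be mistaken for a URL
    if local and not any(c in local for c in "#/:"):
        return local
    return prop_url
</code></pre>

<p>Under this rule, <code>foaf:name</code> on a <code>foaf:Person</code> becomes <code>name</code>, while a property from another vocabulary, such as <code>dbp:dateOfBirth</code>, keeps its full URL.</p>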

<h2>Inherited Subject</h2>

<blockquote>
  <p>The most usual way that an inherited subject might get set would be when the parent statement has an object that is a resource. Returning to the earlier example, in which the long name for the German_Empire was added, the following markup was used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire"
    property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [ "http://dbpedia.org/resource/German_Empire" ]
    }
  } , {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/German_Empire" ,
    "properties": {
      "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
    }
  }
]}
</code></pre>

<p>Note that this microdata JSON could only be generated syntactically from the RDFa, not via RDF, because going via RDF would make it impossible to know whether to give the <code>dbp:birthPlace</code> property a string (which is a URI) value or a nested item. We&#8217;ll see the alternative version of the microdata JSON in the next example.</p>

<p>To create this microdata JSON, we need:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;link itemprop="http://dbpedia.org/property/birthPlace" href="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
        itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>I&#8217;ve had to change two elements here: the <code>&lt;span&gt;</code> for the date of birth has become a <code>&lt;time&gt;</code> element as the value of the property is a date, and the <code>&lt;div&gt;</code> for the birth place has become a <code>&lt;link&gt;</code> element because the value of that property is a URL.</li>
<li>I&#8217;ve also had to add a nested <code>&lt;span&gt;</code> element as it&#8217;s not possible in microdata to have a single element describe both an item and a property for that item as it is in RDFa.</li>
</ul>

<blockquote>
  <p>In an earlier illustration the subject and object for the German Empire were connected by removing the @resource, relying on the @about to set the object:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire"
      property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>While this generates the same RDF as the previous example, the microdata JSON that it generates should probably be different: this time, the item for the German Empire is nested within the item for Albert Einstein:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://dbpedia.org/resource/German_Empire" ,
        "properties": {
          "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>To create this, the microdata needs to look like:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;div itemprop="http://dbpedia.org/property/birthPlace" 
       itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
       itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Note that while this looks quite similar to the RDFa version, in fact the <code>itemid</code> attribute that holds the URI for the German Empire is on a different element from the <code>about</code> attribute in the RDFa.</p>
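<p>The two JSON shapes describe exactly the same triples. A converter that works from RDF alone therefore has to be told which shape to produce. The Python sketch below is hypothetical and simplified. It uses a <code>nest</code> flag to make that choice explicit, assumes the data has no cycles, and treats every object as a plain string, so it doesn&#8217;t distinguish literals from URIs:</p>

<pre><code>RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"


def items_from_triples(triples, nest=False):
    """Group (subject, predicate, object) triples into microdata JSON items.

    With nest=False every object stays a plain string, so a URI object is
    only a reference.  With nest=True, an object that is also a subject is
    embedded as a nested item and dropped from the top level.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)

    nested = set()
    if nest:
        nested = {o for _, _, o in triples if o in by_subject}

    def build(subject):
        props = {}
        for p, values in by_subject[subject].items():
            props[p] = [build(v) if v in nested else v for v in values]
        return {"type": RESOURCE, "id": subject, "properties": props}

    return {"items": [build(s) for s in by_subject if s not in nested]}
</code></pre>

<p>Fed the Einstein triples, <code>nest=False</code> gives the two top-level items of the previous example, and <code>nest=True</code> gives the single nested item above. Nothing in the RDF says which one the author intended.</p>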

<p>The third RDFa example around this same content is:</p>

<blockquote>
  <p>but it is also possible for authors to achieve the same effect by removing the @about and leaving the @resource:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should lead to the same microdata JSON, so I won&#8217;t bother repeating the microdata. What&#8217;s interesting is that this pattern, in which the wrapper element holds both the property (<code>rel</code>) and the identifier of the item that is the value of that property (<code>resource</code>), is a lot closer to the microdata pattern of expressing nested items. The big distinction is that in microdata the <code>itemtype</code> also resides on that element, whereas if you tried adding a <code>typeof</code> attribute to the inner <code>&lt;div&gt;</code> in RDFa, you&#8217;d end up with a new blank node.</p>

<h2>Anonymous Nested Resources</h2>

<blockquote>
  <p>However, an author could just as easily say that Spinoza influenced something by the name of Albert Einstein, that was born on March 14th, 1879:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;div&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>which again means moving an attribute in microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced"
       itemscope&gt;
    &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>It is generally harder to move to microdata from RDFa when the RDFa has an element that provides both a subject and a property.</p>

<p>The RDFa spec provides a couple of additional methods of marking up the same content to give exactly the same RDF (and microdata JSON):</p>

<blockquote>
  <p>Note that the div is superfluous, and an RDFa Processor will create the intermediate object even if the element is removed:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
&lt;/div&gt;
</code></pre>
  
  <p>An alternative pattern is to keep the div and move the @rel onto it:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
  
  <p>From the point of view of the markup, this latter layout is to be preferred, since it draws attention to the &#8216;hanging rel&#8217;. But from the point of view of an RDFa Processor, all of these permutations need to be supported.</p>
</blockquote>

<p>Interestingly, it&#8217;s this latter permutation that&#8217;s closest to the microdata method of expressing the data, though, as we will see in the next section, the &#8220;hanging rel&#8221; is not exactly equivalent to the <code>itemprop</code> on the wrapper element.</p>

<h2>Hanging Rels</h2>

<blockquote>
  <p>Note that each occurrence of @about will complete any incomplete triples. For example, to mark up the fact that Albert Einstein had a residence both in the German Empire and Switzerland, an author need only specify one @rel value that is then used with multiple @about values:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein" rel="dbp-owl:residence"&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The data embedded here gives two values for the <code>dbp-owl:residence</code> property:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://dbpedia.org/ontology/residence": [
        "http://dbpedia.org/resource/German_Empire" ,
        "http://dbpedia.org/resource/Switzerland"
      ]
    }
  }
]}
</code></pre>

<p>In microdata, the <code>itemprop</code> attribute has to appear on both the nested elements to make it clear that they both provide values for that property:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence" 
       href="http://dbpedia.org/resource/German_Empire" /&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence"
       href="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>
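<p>In the microdata data model, each <code>itemprop</code> contributes one value, and values for the same property accumulate in document order. That is why repeating the <code>itemprop</code> gives a two-element array in the JSON. An <code>itemprop</code> attribute can also hold several space-separated names, each of which receives the same value. The Python sketch below shows that grouping. It is hypothetical and works on <code>(itemprop, value)</code> pairs that are assumed to have been extracted from the page already:</p>

<pre><code>def collect_properties(pairs):
    """Build a microdata JSON "properties" object from (itemprop, value) pairs.

    pairs must be in document order.  An itemprop attribute is a set of
    space-separated names; each distinct name receives the same value.
    """
    properties = {}
    for itemprop, value in pairs:
        for name in dict.fromkeys(itemprop.split()):  # dedupe, keep order
            properties.setdefault(name, []).append(value)
    return properties
</code></pre>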

<p>The next example illustrates this with nested items rather than strings:</p>

<blockquote>
  <p>To illustrate, to indicate that Spinoza influenced both Einstein and Schopenhauer, the following markup could be used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
    &lt;/div&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1788-02-22&lt;/span&gt;
    &lt;/div&gt;          
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }, {
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Arthur Schopenhauer" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1788-02-22" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>In this case, the <code>itemprop</code> that is equivalent to the RDFa <code>rel</code> has to move down onto the elements representing the items:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;/div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1788-02-22&lt;/time&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>The wrapper <code>&lt;div&gt;</code> around both items isn&#8217;t necessary; I&#8217;ve left it in to stay as close as possible to the markup of the original RDFa.</p>

<h2>Implicit Resources</h2>

<blockquote>
  <p>Triples are also &#8216;completed&#8217; if any one of @property, @rel or @rev are present. However, unlike the situation when @about or @typeof are present, all predicates are attached to one bnode:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp-owl:residence"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
    &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>To be equivalent to the RDF generated from this markup, the microdata JSON would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
          "http://dbpedia.org/ontology/residence": [
            "http://dbpedia.org/resource/German_Empire" ,
            "http://dbpedia.org/resource/Switzerland"
          ]
        }
      }]
    }
  }
]}
</code></pre>

<p>Microdata is a lot more explicit about when items get created, and consequently requires a bit more markup:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced" itemscope&gt;
    &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;div&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence" 
           href="http://dbpedia.org/resource/German_Empire" /&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence"
           href="http://dbpedia.org/resource/Switzerland" /&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<h2>Overriding Text Content</h2>

<blockquote>
  <p>The value of @content is given precedence over any element content, so the following would give exactly the same triple as shown above:</p>

<pre><code>&lt;span about="http://internet-apps.blogspot.com/"
      property="dc:creator" content="Mark Birbeck"&gt;John Doe&lt;/span&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata should generate the JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://internet-apps.blogspot.com/" ,
    "properties": {
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

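<p>In microdata, only a handful of elements take their value from an attribute rather than from their content: <code>&lt;meta&gt;</code> (<code>content</code>), <code>&lt;time&gt;</code> (<code>datetime</code>), links such as <code>&lt;a&gt;</code> and <code>&lt;link&gt;</code> (<code>href</code>) and embedded content such as <code>&lt;img&gt;</code> (<code>src</code>). There&#8217;s no general equivalent of RDFa&#8217;s <code>content</code> attribute, so a mirror of this example needs a separate <code>&lt;meta&gt;</code> element:</p>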

<pre><code>  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
        itemid="http://internet-apps.blogspot.com/"&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
    John Doe
  &lt;/span&gt;
</code></pre>
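<p>Where a property&#8217;s value comes from can be sketched as a lookup on the element name. The Python sketch below is based on my reading of the microdata specification and is simplified. The function is hypothetical, takes attributes and text content that are assumed to be already parsed, and ignores details such as the <code>value</code> attribute of <code>&lt;data&gt;</code> and <code>&lt;meter&gt;</code> in later versions of the spec:</p>

<pre><code>from urllib.parse import urljoin

# Elements whose value is a URL taken from an attribute, not their content
URL_ATTRIBUTES = {
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
}


def property_value(tag, attrs, text_content, base=""):
    """Return the value a property element contributes, roughly as microdata does."""
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRIBUTES:
        url = attrs.get(URL_ATTRIBUTES[tag])
        # URL values are resolved against the document's base URL
        return "" if url is None else urljoin(base, url)
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    # Everything else, including a span with a content attribute, uses its text
    return text_content
</code></pre>

<p>Note that a <code>content</code> attribute on a <code>&lt;span&gt;</code> is simply ignored. That is why the <code>&lt;meta&gt;</code> element is needed.</p>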

<h2>Language Support</h2>

<blockquote>
  <p>In RDFa the Host Language may provide a mechanism for setting the language tag. In XHTML+RDFa [XHTML-RDFA], for example, the XML language attribute @xml:lang or the attribute @lang is used to add this information, whether the plain literal is designated by @content, or by the inline text of the element:</p>

<pre><code>&lt;meta about="http://example.org/node"
  property="ex:property" xml:lang="fr" content="chat" /&gt;
</code></pre>
</blockquote>

<p>Like the datatype of a value, the language of a value isn&#8217;t captured by the microdata data model or the JSON representation of that data model. So the fact that &#8216;chat&#8217; is French is lost:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/node" ,
    "properties": {
      "http://example.org/property": [ "chat" ]
    }
  }
]}
</code></pre>

<p>The equivalent microdata is thus:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
      itemid="http://example.org/node"&gt;
  &lt;meta itemprop="http://example.org/property" lang="fr" content="chat" /&gt;
&lt;/span&gt;
</code></pre>

<p>with the language only accessible if you are using the DOM to process the microdata.</p>
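<p>Given the markup, recovering the language means finding the nearest <code>lang</code> attribute on the property element or one of its ancestors. The sketch below does this with Python&#8217;s standard <code>html.parser</code>, using a hypothetical <code>LangTracker</code> class. It is a simplification: it ignores <code>xml:lang</code> and any default language set by HTTP headers, and it assumes reasonably well-nested markup:</p>

<pre><code>from html.parser import HTMLParser

# Elements that never have an end tag in HTML, so never open a new scope
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Record the in-scope language of each element carrying itemprop."""

    def __init__(self):
        super().__init__()
        self.lang_stack = [None]  # None means no language declared
        self.results = []         # (itemprop, tag, language) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A lang attribute on the element overrides the inherited language
        lang = attrs.get("lang", self.lang_stack[-1])
        if "itemprop" in attrs:
            self.results.append((attrs["itemprop"], tag, lang))
        if tag not in VOID_ELEMENTS:
            self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        # Keep the bottom entry so stray end tags can't empty the stack
        if tag not in VOID_ELEMENTS and self.lang_stack[1:]:
            self.lang_stack.pop()
</code></pre>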

<h2>Literals that Include Markup</h2>

<blockquote>
  <p>RDFa therefore supports the use of normal markup to express XML literals, by using @datatype:</p>

<pre><code>&lt;h2 property="dc:title" datatype="rdf:XMLLiteral"&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
&lt;/h2&gt;
</code></pre>
</blockquote>

<p>The <code>datatype="rdf:XMLLiteral"</code> acts as a flag indicating that the serialised content of the element (its <code>innerHTML</code>), rather than its <code>textContent</code>, should be used as the value of the property. The resulting value, which includes markup, can be expressed in microdata JSON as follows:</p>

<pre><code>{ "http://purl.org/dc/terms/title": [ "E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time" ] }
</code></pre>

<p>There&#8217;s no way to generate this in microdata except by repeating the escaped version of the content in a <code>content</code> attribute:</p>

<pre><code>&lt;h2&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
  &lt;meta itemprop="http://purl.org/dc/terms/title"
    content="E = mc&amp;lt;sup&gt;2&amp;lt;/sup&gt;: The Most Urgent Problem of Our Time" /&gt;
&lt;/h2&gt;
</code></pre>

<p>This is hardly ideal. It&#8217;s tedious enough with a short string like this one; for larger amounts of information, such as long descriptions of an event, it would be far more so.</p>
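<p>The sketch below makes the difference between the two values concrete, along with the extra escaping needed to copy the markup into a <code>content</code> attribute. It uses Python&#8217;s <code>xml.etree.ElementTree</code>. The function names are hypothetical, and the sketch assumes the heading parses as well-formed XML (as in XHTML), which ordinary HTML need not be:</p>

<pre><code>import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape as escape_text


def text_content(element):
    """All descendant text joined together, like the DOM's textContent."""
    return "".join(element.itertext())


def inner_markup(element):
    """The element's content serialised with its markup, like innerHTML."""
    parts = [escape_text(element.text or "")]
    for child in element:
        # Serialise each child without its tail text, then add the tail back
        tail, child.tail = child.tail, None
        parts.append(ET.tostring(child, encoding="unicode"))
        child.tail = tail
        parts.append(escape_text(tail or ""))
    return "".join(parts)


def content_attribute(element):
    """Escape the inner markup so it can sit inside a content="..." attribute."""
    return html.escape(inner_markup(element), quote=True)
</code></pre>

<p>For the heading above, <code>text_content</code> gives the plain <code>E = mc2: ...</code> string. <code>inner_markup</code> keeps the <code>sup</code> element. <code>content_attribute</code> gives the doubly escaped form that has to be written out by hand in the microdata.</p>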

<h2>The <code>resource</code> Attribute</h2>

<blockquote>
  <p>RDFa provides the @resource attribute as a way to set the object of statements. This is particularly useful when referring to resources that are not themselves navigable links:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote about="#q1" rel="dc:source" resource="urn:ISBN:0140449132" &gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This should produce:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.com/candp.xhtml#q1" ,
    "properties": {
      "http://purl.org/dc/terms/source": [ "urn:ISBN:0140449132" ]
    }
  }
]}
</code></pre>

<p>which is expressed through:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="#q1"&gt;
      &lt;link itemprop="http://purl.org/dc/terms/source" href="urn:ISBN:0140449132" /&gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>The property and value have to be moved onto a nested <code>&lt;link&gt;</code>, but this is a more extensible pattern than the RDFa method as it enables other properties to be expressed in the same way.</p>

<h2>Multiple Types</h2>

<p>This last example comes from the wild rather than being an example in the specification. At <a href="http://bitmunk.com/browse">http://bitmunk.com/browse</a> we find:</p>

<pre><code>&lt;span about="http://bitmunk.com/about#service" 
      typeof="vcard:VCard commerce:Business gr:BusinessEntity" 
      property="rdfs:label vcard:fn"&gt;Bitmunk&lt;/span&gt;
</code></pre>

<p>This shows the use of multiple types and of multiple properties with the same value, because the pages are attempting to use multiple vocabularies that cover the same domain (organisations) to different depths. In the equivalent microdata, we have to choose one of the types; I&#8217;m going to assume that it should just use the first one from the <code>typeof</code> attribute:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2006/vcard/ns#VCard" ,
    "id": "http://bitmunk.com/about#service" ,
    "properties": {
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
        "http://purl.org/commerce#Business" ,
        "http://purl.org/goodrelations/v1#BusinessEntity"
      ] ,
      "http://www.w3.org/2000/01/rdf-schema#label": [ "Bitmunk" ] ,
      "fn": [ "Bitmunk" ]
    }
  }
]}
</code></pre>

<p>The microdata equivalent is:</p>

<pre><code>&lt;span itemscope itemid="http://bitmunk.com/about#service" 
      itemtype="http://www.w3.org/2006/vcard/ns#VCard"&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/commerce#Business" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/goodrelations/v1#BusinessEntity" /&gt;
  &lt;span itemprop="http://www.w3.org/2000/01/rdf-schema#label fn"&gt;Bitmunk&lt;/span&gt;
&lt;/span&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>Technically, the RDFa doesn&#8217;t place any ordering on the three classes, but I&#8217;m picking the first for the purpose of the microdata conversion. The other classes are harder to get at in the JSON: they have to be referenced via the <code>rdf:type</code> microdata property rather than the <code>type</code> JSON property. Consumers that are on the lookout for items of the type <code>gr:BusinessEntity</code> wouldn&#8217;t spot these items.</li>
</ul>
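<p>To illustrate the last point, here is a minimal Python sketch of two hypothetical consumers of the JSON above (the function names are invented for this example). One checks only the <code>type</code> JSON property and so misses the <code>gr:BusinessEntity</code> item. The other also looks at the values of the <code>rdf:type</code> microdata property.</p>

<pre><code>import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GR_BUSINESS = "http://purl.org/goodrelations/v1#BusinessEntity"

# The microdata JSON from the example above.
DOC = """
{ "items": [
  {
    "type": "http://www.w3.org/2006/vcard/ns#VCard" ,
    "id": "http://bitmunk.com/about#service" ,
    "properties": {
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
        "http://purl.org/commerce#Business" ,
        "http://purl.org/goodrelations/v1#BusinessEntity"
      ] ,
      "http://www.w3.org/2000/01/rdf-schema#label": [ "Bitmunk" ] ,
      "fn": [ "Bitmunk" ]
    }
  }
]}
"""


def naive_ids_of_type(data, type_uri):
    """Consult only the JSON "type" property of each item."""
    return [item.get("id") for item in data["items"] if item.get("type") == type_uri]


def ids_of_type(data, type_uri):
    """Also consult any classes given through the rdf:type microdata property."""
    found = []
    for item in data["items"]:
        types = [item.get("type")] + item.get("properties", {}).get(RDF_TYPE, [])
        if type_uri in types:
            found.append(item.get("id"))
    return found


data = json.loads(DOC)
print(naive_ids_of_type(data, GR_BUSINESS))  # []
print(ids_of_type(data, GR_BUSINESS))  # ['http://bitmunk.com/about#service']
</code></pre>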
    ]]></content>
  </entry>
  <entry>
    <title>Mapping Microdata to RDFa</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/163" />
    <id>http://www.jenitennison.com/blog/node/163</id>
    <published>2011-08-20T17:35:28+01:00</published>
    <updated>2011-08-22T15:50:57+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>

<!--break-->

<p>I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHAT WG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the microdata specification (most of them are in both versions, the only exceptions being those that use the vCard vocabulary). I haven&#8217;t included examples that don&#8217;t illustrate anything new, so there are some that are skipped. Other examples would be welcome.</p>

<h2>Unidentified Items / Blank Node Subjects</h2>

<blockquote>
  <p>Here there are two items, each of which has the property &#8220;name&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The first challenge is to map this into RDFa because the properties are tokens rather than URIs and there is no type for either of the items. What I&#8217;ll assume here is that the <code>name</code> properties are local to the document itself and thus the equivalent RDF is:</p>

<pre><code>[ &lt;#name&gt; "Elizabeth" ] .
[ &lt;#name&gt; "Daniel" ] .
</code></pre>

<p>This can be achieved in RDFa through either:</p>

<pre><code>&lt;div vocab="#" about="_:elizabeth"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" about="_:daniel"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The <code>vocab="#"</code> sets the vocabulary to the location of the current document (plus an empty fragment identifier); this URI is then concatenated to the property token (<code>name</code>) to create a URI that is unique to the document. In a document such as this it would make sense to put the <code>vocab="#"</code> attribute on the <code>&lt;html&gt;</code> element rather than on every single item.</li>
<li>With no type in sight, blank nodes can be created either with an empty <code>typeof</code> attribute or with an <code>about</code> attribute whose value starts with <code>_:</code>. The latter has the advantage of providing an identifier for the blank node that can be used elsewhere in the document, but the former is shorter, so it will be used where possible in the remaining examples in this post.</li>
</ul>
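<p>The first note can be illustrated with a minimal Python sketch of how a property token becomes a URI. It assumes, as the examples in this post do, that an RDFa processor resolves <code>vocab="#"</code> against the URL of the document before appending the token. The function name and the document URL are hypothetical.</p>

<pre><code>from urllib.parse import urldefrag, urlsplit


def expand_term(term, vocab, document_url):
    """Append a property token to the vocabulary URI to give a full property URI.

    Sketch only: handles vocab="#" (the current document) and absolute vocab
    URIs. urljoin() is deliberately not used, because it can drop an empty
    trailing fragment such as the bare "#".
    """
    if vocab.startswith("#"):
        # Resolve against the document URL, ignoring any fragment it already has.
        document, _fragment = urldefrag(document_url)
        vocab_uri = document + vocab
    elif urlsplit(vocab).scheme:
        vocab_uri = vocab
    else:
        raise ValueError("relative vocab values other than '#' are not handled here")
    return vocab_uri + term


page = "http://example.org/people.html"  # hypothetical document URL
print(expand_term("name", "#", page))  # http://example.org/people.html#name
</code></pre>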

<h2>Values from the <code>src</code> Attribute</h2>

<p>The next example introduces the use of the <code>src</code> attribute to set the value of the property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;image&#8221;, whose value is a URL:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;img itemprop="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should probably be mapped to the RDF:</p>

<pre><code>[ &lt;#image&gt; &lt;google-logo.png&gt; ] .
</code></pre>

<p>The difficulty with this is that in RDFa, the <code>src</code> attribute is used for the <em>subject</em> of a statement (equivalent to a microdata item) rather than the <em>object</em> (equivalent to a microdata value). So we have two choices for equivalent RDFa. One is to use a similar pattern to that used above, but introduce a wrapper element that provides the property:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Another is to provide what would normally be the <em>object</em> through a <code>resource</code> attribute and then use a <code>rev</code> attribute (rather than the usual <code>rel</code> attribute) to reverse the relationship:</p>

<pre><code>&lt;div vocab="#"&gt;
 &lt;img resource="_:thing" rev="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>

<p>This has three disadvantages over the first option:</p>

<ul>
<li>the <code>resource</code> attribute that creates the item is on the <code>&lt;img&gt;</code> element rather than on the wrapper <code>&lt;div&gt;</code> which makes it hard to create other properties for that item</li>
<li>we have to use a <code>rev</code> attribute, reversing the normal flow of relationships; I (at least) find this hard to figure out when there&#8217;s not a <code>rel</code> attribute as well</li>
<li><ins>we have to make up an id for the blank node we want to generate</ins></li>
</ul>

<p>I&#8217;ll note that it took me five or six failed attempts to generate the above options. If I hadn&#8217;t had the <a href="http://rdf.greggkellogg.net/distiller">RDF Distiller</a> to test with, I would have got it wrong. <del>Note that at least through the RDF Distiller, to be recognised, the <code>resource</code> attribute has to have an (empty) value &#8212; it is not enough for it to simply be present, unlike with the <code>typeof</code> attribute.</del> <ins>Note that the <code>resource</code> attribute has to explicitly point to a blank node to create a blank node rather than having the property be associated with the document in which this appears.</ins></p>
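<p>Because <code>rev</code> is so easy to get backwards, it may help to see the two options side by side in a tiny Python sketch (the function and the blank node labels are hypothetical). Both produce a statement of the same shape, with a blank node as the subject and the image as the object. The second only gets there because <code>rev</code> swaps the subject and the object.</p>

<pre><code>def link_triple(subject, predicate, target, reverse=False):
    """Return the statement made by a rel (reverse=False) or rev (reverse=True) link.

    rel states (subject, predicate, target); rev states the same predicate
    the other way round, (target, predicate, subject).
    """
    if reverse:
        return (target, predicate, subject)
    return (subject, predicate, target)


# First option: the wrapper span's rel links the blank node to the image.
print(link_triple("_:item", "#image", "google-logo.png"))
# Second option: on the img, src gives the subject and resource the target,
# so rev is needed to point the statement from the blank node to the image.
print(link_triple("google-logo.png", "#image", "_:thing", reverse=True))
</code></pre>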

<h2>Values from the <code>datetime</code> Attribute</h2>

<p>The next example illustrates the use of the <code>&lt;time&gt;</code> element to provide a date/time value for a property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;birthday&#8221;, whose value is a date:</p>

<pre><code>&lt;div itemscope&gt;
 I was born on &lt;time itemprop="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>
</blockquote>

<p>I&#8217;m assuming this should map to the RDF:</p>

<pre><code>[ &lt;#birthday&gt; "2009-05-10"^^&lt;http://www.w3.org/2001/XMLSchema#date&gt; ]
</code></pre>

<p>RDFa currently requires the value to be provided through the <code>content</code> attribute, as follows (this is the subject of an open RDFa issue, <a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a>):</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" content="2009-05-10" datatype="xsd:date" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Note that the <code>xsd:</code> prefix is built into RDFa, so there&#8217;s no need to declare it, which makes it fairly easy to specify the standard date/time datatypes.</p>

<p>If ISSUE-97 were resolved nicely it would be possible to instead do:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>To make this work, RDFa processors would have to look at the syntax of the <code>datetime</code> attribute to work out what datatype the value should be matched to.</li>
<li>The syntax permitted in the <code>datetime</code> attribute isn&#8217;t exactly the same as that permitted by the XML Schema <code>time</code> and <code>dateTime</code> types usually used in RDF (and XML), in that the seconds component is optional within HTML. The resolution to ISSUE-97 will need to take this into account. Otherwise, anyone mapping from microdata to RDFa manually will need to ensure that the <code>content</code> attribute includes the seconds component.</li>
</ul>
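<p>To make the first note concrete, here is a hypothetical Python sketch of the check an RDFa processor might make. It recognises only a few common forms of the HTML syntax (a date; a time; a date and time separated by <code>T</code>, with an optional timezone), which is a simplification. It also pads a missing seconds component, as the second note says a manual mapping has to.</p>

<pre><code>import re

XSD = "http://www.w3.org/2001/XMLSchema#"

DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Groups: (everything up to HH:MM), (optional :SS and fraction),
# (optional timezone: Z, or +HH:MM / -HH:MM)
TIME = re.compile(r"^(\d{2}:\d{2})(:\d{2}(?:\.\d+)?)?(Z|[+-]\d{2}:\d{2})?$")
DATE_TIME = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})(:\d{2}(?:\.\d+)?)?(Z|[+-]\d{2}:\d{2})?$"
)


def typed_value(datetime_attr):
    """Guess (lexical form, datatype URI) from the syntax of a datetime attribute.

    Seconds are optional in HTML but required by xsd:time and xsd:dateTime,
    so ":00" is inserted when they are missing. Anything unrecognised is
    returned as a plain literal (datatype None).
    """
    if DATE.match(datetime_attr):
        return datetime_attr, XSD + "date"
    for pattern, local_name in ((DATE_TIME, "dateTime"), (TIME, "time")):
        match = pattern.match(datetime_attr)
        if match:
            head, seconds, zone = match.groups()
            lexical = head + (seconds or ":00") + (zone or "")
            return lexical, XSD + local_name
    return datetime_attr, None


print(typed_value("2009-05-10"))
print(typed_value("2009-05-05T19:00Z"))
</code></pre>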

<h2>Nested Items / Object Properties</h2>

<blockquote>
  <p>In this example, the outer item represents a person, and the inner one represents a band:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span itemprop="band" itemscope&gt; &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt; (&lt;span itemprop="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>
  
  <p>The outer item here has two properties, &#8220;name&#8221; and &#8220;band&#8221;. The &#8220;name&#8221; is &#8220;Amanda&#8221;, and the &#8220;band&#8221; is an item in its own right, with two properties, &#8220;name&#8221; and &#8220;size&#8221;. The &#8220;name&#8221; of the band is &#8220;Jazz Band&#8221;, and the &#8220;size&#8221; is &#8220;12&#8221;.</p>
</blockquote>

<p>The equivalent RDF for this example would be:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
] .
</code></pre>

<p>Note that the <code>size</code> property is just a plain literal value; unlike with date/times, there&#8217;s no way to tell from the microdata that the value is a number.</p>

<p>In RDFa this could be done with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>This follows the microdata fairly closely but note that the nested resource doesn&#8217;t need an empty <code>typeof</code> attribute: it&#8217;s only the top-level items that do. It might be easier, for consistency and extensibility, to always include an explicit nested element (with an empty <code>typeof</code> attribute in this case) to represent the nested resource:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt;&lt;span typeof&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>The other thing that people have to watch out for is that because the value of the <code>band</code> property is a resource rather than a literal, we have to use the <code>rel</code> attribute rather than the <code>property</code> attribute as we do elsewhere.</p>
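<p>The general shape of the mapping from nested items to RDF can be sketched in Python. This is a minimal illustration rather than the full microdata-to-RDF algorithm: it assumes, as in this post, that property tokens become URIs local to the document (shown here simply as <code>#name</code> and so on), and the dictionary layout and function name are invented for the example. Text values become literals, which is where RDFa uses <code>property</code>; nested items become resources, which is where it has to use <code>rel</code>.</p>

<pre><code>import itertools


def item_to_triples(item, triples, counter):
    """Append (subject, predicate, object) tuples describing a microdata-style item.

    An item is a dict with an optional "id" and a "properties" dict mapping each
    property token to a list of values; a value is a string or a nested item.
    Items without an "id" get a fresh blank node label. Returns the subject.
    """
    subject = item.get("id") or "_:b" + str(next(counter))
    for token, values in item["properties"].items():
        predicate = "#" + token  # assumption: tokens are local to the document
        for value in values:
            if isinstance(value, dict):
                # A nested item becomes a resource (rel in RDFa).
                obj = item_to_triples(value, triples, counter)
            else:
                # Text becomes a plain literal (property in RDFa).
                obj = '"' + value + '"'
            triples.append((subject, predicate, obj))
    return subject


amanda = {"properties": {
    "name": ["Amanda"],
    "band": [{"properties": {"name": ["Jazz Band"], "size": ["12"]}}],
}}
triples = []
item_to_triples(amanda, triples, itertools.count(1))
for triple in triples:
    print(triple)
</code></pre>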

<h2>Itemref</h2>

<blockquote>
  <p>This example is the same as the previous one, but all the properties are separated from their items:</p>

<pre><code>&lt;div itemscope id="amanda" itemref="a b"&gt;&lt;/div&gt;
&lt;p id="a"&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div id="b" itemprop="band" itemscope itemref="c"&gt;&lt;/div&gt;
&lt;div id="c"&gt;
 &lt;p&gt;Band: &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span itemprop="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should create the same RDF as the previous example:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
] .
</code></pre>

<p>changing the markup as little as possible. The RDFa equivalent is:</p>

<pre><code>&lt;div id="amanda"&gt;&lt;/div&gt;
&lt;p vocab="#" about="_:amanda"&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div vocab="#" about="_:amanda" rel="band" resource="_:c"&gt;&lt;/div&gt;
&lt;div vocab="#" about="_:c"&gt;
 &lt;p&gt;Band: &lt;span property="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span property="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>In microdata, the <code>itemref</code> attribute is a mechanism by which an item adopts name/value pairs described elsewhere in the page. In RDFa, the equivalent is to relate all of the name/value pairs to the same resource by consistently naming that resource as the subject of the statements. In the above case, there are two blank nodes labelled <code>_:amanda</code> and <code>_:c</code>, and the <code>about</code> attribute is used on the elements that provide the properties (or on a wrapper element) to indicate the subject of the statements.</p>

<p>Notes:</p>

<ul>
<li>The <code>resource</code> attribute has to be used to indicate the blank node for the band.</li>
<li>As before, the <code>rel</code> attribute has to be used for the <code>band</code> property, rather than the <code>property</code> attribute, because the object of the statement is a resource. The rule is that if you&#8217;re using <code>resource</code>, you should use <code>rel</code>. (I used <code>property</code> erroneously the first time I tried to write this mapping. I will never learn.)</li>
</ul>
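<p>The way microdata gathers an item&#8217;s properties, including those pulled in by <code>itemref</code>, can be sketched in Python. This is a simplified, hypothetical model (elements are plain dictionaries, and the function name is invented). It shows that in microdata it is the item that reaches out to the referenced elements, whereas in the RDFa above each separated fragment has to name its subject with <code>about</code>.</p>

<pre><code>def item_props(item, by_id):
    """Return the (name, value) pairs of an item, following itemref.

    Simplified from the microdata algorithm: elements are dicts, and the real
    algorithm's tree-order sorting and loop detection are left out.
    """
    # Start from the item's children plus every element named in itemref.
    pending = list(item.get("children", []))
    pending += [by_id[ref] for ref in item.get("itemref", "").split()]
    found = []
    while pending:
        element = pending.pop(0)
        if "itemprop" in element:
            if element.get("itemscope"):
                value = item_props(element, by_id)  # nested item
            else:
                value = element.get("text", "")
            for name in element["itemprop"].split():
                found.append((name, value))
        # Do not look inside nested items: their properties belong to them.
        if not element.get("itemscope"):
            pending.extend(element.get("children", []))
    return found


# The Amanda example, with each element reduced to a dict.
amanda = {"id": "amanda", "itemscope": True, "itemref": "a b"}
a = {"id": "a", "children": [{"itemprop": "name", "text": "Amanda"}]}
b = {"id": "b", "itemprop": "band", "itemscope": True, "itemref": "c"}
c = {"id": "c", "children": [
    {"children": [{"itemprop": "name", "text": "Jazz Band"}]},
    {"children": [{"itemprop": "size", "text": "12"}]},
]}
by_id = {element["id"]: element for element in (amanda, a, b, c)}
print(item_props(amanda, by_id))
</code></pre>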

<p>There is another example of <code>itemref</code> in use later in the microdata specification:</p>

<blockquote>
<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;
   &lt;figcaption itemprop="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;
   &lt;figcaption itemprop="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p id="licenses"&gt;All images licensed under the &lt;a itemprop="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This is equivalent to the RDF:</p>

<pre><code>[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/house.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The house I found." ;
] .
[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/mailbox.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The mailbox." ;
] .
</code></pre>

<p>Note that the <code>license</code> property is adopted by both the items in the microdata. In this particular example, the two items have the same type, and thus the <code>license</code> property has the same meaning in each item. It&#8217;s also possible for <code>itemref</code> to be used on two items that have different types, pointing to the same element, in which case the shared properties defined within that element could mean different things for the two items.</p>

<p>There is no way that I am aware of within RDFa to support shared use of portions of content. There could be a rough equivalent that would work in the case where the shared properties had the same semantics if RDFa allowed the <code>about</code> attribute to take multiple values (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body vocab="http://n.whatwg.org/"&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure about="_:house" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure about="_:mailbox" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p about="_:house _:mailbox"&gt;All images licensed under the &lt;a rel="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>but this wouldn&#8217;t support the possibility of the same property having different semantics (and therefore different URIs) for the separate resources.</p>

<p>It&#8217;s also worth noting that the mapping to RDF that I&#8217;m assuming results, in this example, in <code>http://n.whatwg.org/work</code> being both a class and a property. The creators of RDF vocabularies tend to name classes with an uppercase initial letter and properties with a lowercase initial letter, and thus avoid these kinds of clashes. Vocabulary designers who are mindful of mappings to RDF may want to take the same approach.</p>

<h2>Multiple Values</h2>

<blockquote>
  <p>This example describes an ice cream, with two flavors:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li itemprop="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li itemprop="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>
  
  <p>This thus results in an item with two properties, both &#8220;flavor&#8221;, having the values &#8220;Lemon sorbet&#8221; and &#8220;Apricot sorbet&#8221;.</p>
</blockquote>

<p>This example highlights one of the real nightmares of RDF: lists. In microdata, the order of the values &#8216;Lemon sorbet&#8217; and &#8216;Apricot sorbet&#8217; is naturally retained. There are three possible mappings to RDF.</p>

<h3>Creating Multiple Statements</h3>

<p>If the order of the flavours of ice cream in this example doesn&#8217;t actually matter, the equivalent RDF is:</p>

<pre><code>[ &lt;#flavor&gt; "Lemon sorbet" , "Apricot sorbet" ]
</code></pre>

<p>which is equivalent to:</p>

<pre><code>[ &lt;#flavor&gt; "Apricot sorbet" , "Lemon sorbet" ]
</code></pre>

<p>In this case, the RDFa is straightforward:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li property="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li property="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>

<p>It&#8217;s surprising how common it is that order doesn&#8217;t actually matter when there are multiple values for a property, such that this mapping is quite sufficient. But I&#8217;m absolutely not going to pretend that order is never important&#8230;</p>

<h3>Creating an <code>rdf:Seq</code></h3>

<p>If the order of the flavours does matter, there are two ways of representing that order using RDF. The first is to use an <code>rdf:Seq</code> resource. This method was the original method of representing lists in RDF and is very natural to do in RDF/XML, but has largely fallen out of favour for the second method which I&#8217;ll describe below.</p>

<p>Using the <code>rdf:Seq</code> method, the equivalent RDF for the microdata would be:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[
  &lt;#flavor&gt; [
    a rdf:Seq ;
    rdf:_1 "Lemon sorbet" ;
    rdf:_2 "Apricot sorbet"
  ]
] .
</code></pre>

<p>which can be generated with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:Seq"&gt;
   &lt;li property="rdf:_1"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li property="rdf:_2"&gt;Apricot sorbet&lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are various other ways in which the URI for <code>rdf:Seq</code> could be written, but since the <code>rdf:</code> prefix is built into RDFa 1.1, it seems easier to use that than anything that explicitly writes out the full (ugly) RDF namespace.</li>
<li>The <code>&lt;div&gt;</code> wrapper for the <code>&lt;ul&gt;</code> is needed in the same way as the wrapper <code>&lt;span&gt;</code> element was needed in the <code>&lt;img&gt;</code> example above. Whereas in microdata, the property element also describes the value of that property, in RDFa when the object of a statement is a resource the description of that resource is nested inside the property element (in a similar way to RDF/XML).</li>
</ul>
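<p>Generating the <code>rdf:Seq</code> form is mechanical: the value at position <em>n</em> is attached with the property <code>rdf:_<em>n</em></code>. A minimal Python sketch follows; the blank node label <code>_:flavors</code> and the function name are made up for the example, and for brevity it does not distinguish literals from resources.</p>

<pre><code>RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def seq_triples(seq_node, values):
    """Describe an ordered list of values as an rdf:Seq.

    The value at position n is attached with the property rdf:_n.
    """
    triples = [(seq_node, RDF + "type", RDF + "Seq")]
    for position, value in enumerate(values, start=1):
        triples.append((seq_node, RDF + "_" + str(position), value))
    return triples


for triple in seq_triples("_:flavors", ["Lemon sorbet", "Apricot sorbet"]):
    print(triple)
</code></pre>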

<h3>Creating an <code>rdf:List</code></h3>

<p>The currently recommended way to create a list in RDF is to use an <code>rdf:List</code> resource. This essentially uses a <a href="http://en.wikipedia.org/wiki/Linked_list">linked list</a> model to represent lists, with the <code>rdf:first</code> property of each list node giving a value and the <code>rdf:rest</code> property pointing to either another <code>rdf:List</code> or <code>rdf:nil</code>. Spelled out, the RDF would look like:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[ 
  &lt;#flavor&gt; [
    a rdf:List ;
    rdf:first "Lemon sorbet" ;
    rdf:rest [
      rdf:first "Apricot sorbet" ;
      rdf:rest rdf:nil
    ]
  ]
] .
</code></pre>

<p>but of course Turtle lets you write it:</p>

<pre><code>[] &lt;#flavor&gt; ( "Lemon sorbet" "Apricot sorbet" ) .
</code></pre>

<p>Unfortunately, RDFa has no such syntax sugar. Which means:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Yep, horrific. Verbose and easy to get wrong, and that&#8217;s just for two items. If a third is added, the pattern is to put an <code>about</code> attribute on each middle item of the list so that the <code>rdf:rest</code> property pointing to the next item can be attached to it. For example:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span about="_:2" typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
    &lt;/span&gt;
   &lt;/li&gt;
   &lt;li about="_:2" rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Raspberry sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Note:</p>

<ul>
<li>I&#8217;ve used an empty <code>&lt;a&gt;</code> element with a <code>href</code> attribute to point to the <code>rdf:nil</code> resource. An alternative would be to use the <code>resource</code> attribute, which would have the advantage of not having to spell out the full URI for <code>rdf:nil</code>, but I&#8217;m trying to stick to using as few attributes as possible.</li>
<li>Using an empty <code>&lt;a&gt;</code> element for a link isn&#8217;t ideal; it would be neater to use a <code>&lt;link&gt;</code> element, but these aren&#8217;t allowed in flow content within HTML5. (The microdata specification does permit <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> in flow content, but only if they have an <code>itemprop</code> attribute.) The RDFa specification could likewise allow them.</li>
</ul>
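<p>For comparison, the structure that this markup spells out by hand can be generated with a short Python sketch: for <em>n</em> values it produces 2<em>n</em> triples, one <code>rdf:first</code> and one <code>rdf:rest</code> per cell. The <code>_:l1</code>-style labels and the function name are assumptions of the sketch, and it leaves out the <code>rdf:type rdf:List</code> statements, just as Turtle&#8217;s parenthesised list syntax does.</p>

<pre><code>RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_triples(values, label="_:l"):
    """Encode values as an RDF collection (a linked list of blank nodes).

    Each cell has rdf:first (its value) and rdf:rest (the next cell, or
    rdf:nil after the last one). Cells are labelled _:l1, _:l2, and so on,
    so a middle cell can be referred to from elsewhere, as about="_:2" is
    in the markup above. Returns (head, triples); an empty list is rdf:nil.
    """
    if not values:
        return RDF + "nil", []
    cells = [label + str(i) for i in range(1, len(values) + 1)]
    triples = []
    for i, (cell, value) in enumerate(zip(cells, values)):
        following = cells[i + 1] if i + 1 != len(cells) else RDF + "nil"
        triples.append((cell, RDF + "first", value))
        triples.append((cell, RDF + "rest", following))
    return cells[0], triples


head, triples = list_triples(["Lemon sorbet", "Apricot sorbet", "Raspberry sorbet"])
for triple in triples:
    print(triple)
</code></pre>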

<h2>Multiple Properties Sharing a Value</h2>

<blockquote>
  <p>Here we see an item with two properties, &#8220;favorite-color&#8221; and &#8220;favorite-fruit&#8221;, both set to the value &#8220;orange&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;span itemprop="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should map to the RDF:</p>

<pre><code>[
  &lt;#favorite-color&gt; "orange" ;
  &lt;#favorite-fruit&gt; "orange"
] .
</code></pre>

<p>Like <code>itemprop</code>, <code>property</code> can take multiple values, so the RDFa equivalent is simply:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span property="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<h2>Types</h2>

<blockquote>
  <p>Here, the item&#8217;s type is &#8220;http://example.org/animals#cat&#8221;:</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
  
  <p>In this example the &#8220;http://example.org/animals#cat&#8221; item has three properties, a &#8220;name&#8221; (&#8220;Hedral&#8221;), a &#8220;desc&#8221; (&#8220;Hedral is&#8230;&#8221;), and an &#8220;img&#8221; (&#8220;hedral.jpeg&#8221;).</p>
</blockquote>

<p>I&#8217;ll assume that this should be mapped to the RDF:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
] .
</code></pre>

<p>In this case, the <code>vocab</code> can be set to <code>http://example.org/animals#</code>, and the <code>typeof</code> attribute and the various <code>property</code> and <code>rel</code> attributes will all use that as the basis for their identifying URIs:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;div rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/div&gt;
&lt;/section&gt;
</code></pre>

<h2>Global Identifiers</h2>

<blockquote>
  <p>Here, an item is talking about a particular book:</p>

<pre><code>&lt;dl itemscope
    itemtype="http://vocab.example.net/book"
    itemid="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd itemprop="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd itemprop="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time itemprop="pubdate" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>
</blockquote>

<p>Here, the item has an identifier, so, unlike in the previous examples, the subject of the statements in the RDF is no longer a blank node:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
&lt;urn:isbn:0-330-34032-8&gt;
  a &lt;http://vocab.example.net/book&gt; ;
  &lt;http://vocab.example.net/title&gt; "The Reality Dysfunction\n " ;
  &lt;http://vocab.example.net/author&gt; "Peter F. Hamilton\n " ;
  &lt;http://vocab.example.net/pubdate&gt; "1996-01-26"^^xsd:date ;
  .
</code></pre>

<p>In RDFa, the subject is provided using the <code>about</code> attribute:</p>

<pre><code>&lt;dl vocab="http://vocab.example.net/"
    typeof="book"
    about="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd property="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd property="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time property="pubdate" content="1996-01-26" datatype="xsd:date" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>

<h2>Global Property Names</h2>

<blockquote>
  <p>Here, an item is an &#8220;http://example.org/animals#cat&#8221;, and most of the properties have names that are words defined in the context of that type. There are also a few additional properties whose names come from other vocabularies.</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 itemprop="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 itemprop="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
</blockquote>

<p>The RDF equivalent to this is:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.com/fn&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.com/color&gt; "black" , "white" ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
] .
</code></pre>

<p>To create this, we need the following RDFa:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 property="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 property="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;span rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/span&gt;
&lt;/section&gt;
</code></pre>
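<p>The RDF above uses a simple rule for property names. A name that is already an absolute URL is used as it is. A short name is placed in the namespace of the item&#8217;s type, which is everything up to and including the <code>#</code>. The Python sketch below applies that rule to a plain list of (itemprop, value) pairs, which stand in for a parsed item. Three details are my own assumptions rather than anything taken from a specification: the fallback to the final <code>/</code> for types without a <code>#</code>, the colon test for absolute URLs, and the <code>_:item</code> blank node label.</p>

<pre><code>RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_namespace(itemtype):
    # Keep everything up to and including the "#" (http://example.org/animals#).
    if "#" in itemtype:
        return itemtype[: itemtype.index("#") + 1]
    # Assumption: types without a "#" use everything up to the final "/".
    return itemtype[: itemtype.rindex("/") + 1]


def property_uri(name, itemtype):
    # Assumption: a name containing a colon is already an absolute URL.
    if ":" in name:
        return name
    return type_namespace(itemtype) + name


def item_triples(itemtype, properties, itemid=None):
    # The itemid, if present, is the subject; otherwise use a blank node label.
    subject = itemid if itemid else "_:item"
    triples = [(subject, RDF_TYPE, itemtype)]
    for names, value in properties:
        # One itemprop attribute can carry several space-separated names.
        for name in names.split():
            triples.append((subject, property_uri(name, itemtype), value))
    return triples


hedral = item_triples(
    "http://example.org/animals#cat",
    [
        ("name http://example.com/fn", "Hedral"),
        ("http://example.com/color", "black"),
        ("http://example.com/color", "white"),
    ],
)
for triple in hedral:
    print(triple)
</code></pre>

<p>Given <code>itemid="urn:isbn:0-330-34032-8"</code> and the type <code>http://vocab.example.net/book</code>, the same function uses <code>urn:isbn:0-330-34032-8</code> as the subject and <code>http://vocab.example.net/title</code> as the property. This matches the Global Identifiers example.</p>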

<h2>Link Relations</h2>

<blockquote>
  <p>Here is an example of a page that uses the vEvent vocabulary to mark up an event:</p>

<pre><code>&lt;body itemscope itemtype="http://microformats.org/profile/hcalendar#vevent"&gt;
 ...
 &lt;h1 itemprop="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
 ...
 &lt;time itemprop="dtstart" datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
 (until &lt;time itemprop="dtend" datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
 ...
 &lt;a href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
    rel="bookmark" itemprop="url"&gt;Link to this page&lt;/a&gt;
 ...
 &lt;p&gt;Location: &lt;span itemprop="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
 ...
 &lt;p&gt;&lt;input type=button value="Add to Calendar"
           onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
 ...
 &lt;meta itemprop="description" content="via livebrum.co.uk"&gt;
&lt;/body&gt;
</code></pre>
</blockquote>

<p>This example is interesting because it contains, in the natural markup of the page, a <code>rel</code> attribute with the value <a href="http://www.w3.org/TR/html5/links.html#link-type-bookmark"><code>bookmark</code></a>, which is used for links that go to the page or section of the page within which the link is found. In this case, it&#8217;s the page. The RDF that should be generated from the page is:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .

[
  a &lt;http://microformats.org/profile/hcalendar#vevent&gt; ;
  &lt;http://microformats.org/profile/hcalendar#summary&gt; "Bluesday Tuesday: Money Road" ;
  &lt;http://microformats.org/profile/hcalendar#dtstart&gt; "2009-05-05T19:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#dtend&gt; "2009-05-05T21:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#url&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  &lt;http://microformats.org/profile/hcalendar#location&gt; "The RoadHouse" ;
  &lt;http://microformats.org/profile/hcalendar#description&gt; "via livebrum.co.uk"
] .
</code></pre>

<p>The following statement could legitimately be generated as well:</p>

<pre><code>&lt;&gt; 
  &lt;http://www.w3.org/1999/xhtml/vocab#bookmark&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  .
</code></pre>

<p>but the item representing the event should definitely not have the <code>http://www.w3.org/1999/xhtml/vocab#bookmark</code> property.</p>

<p>Achieving this without significantly changing the HTML markup is problematic in RDFa. RDFa uses the <code>rel</code> attribute to provide properties for the resources that it describes within the page. This overloads its standard use in HTML, which is to describe properties of the page or of sections within the page. The following involves the least amount of repetition:</p>

<pre><code>&lt;body vocab="http://microformats.org/profile/hcalendar#"&gt;
 &lt;div typeof="vevent"&gt;
  ...
  &lt;h1 property="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
  ...
  &lt;time property="dtstart" content="2009-05-05T19:00:00Z" datatype="xsd:dateTime" 
        datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
  (until &lt;time property="dtend" content="2009-05-05T21:00:00Z" datatype="xsd:dateTime" 
               datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
  ...
  &lt;a rel="url" href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"&gt;&lt;/a&gt;
  &lt;a about href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
     rel="bookmark"&gt;Link to this page&lt;/a&gt;
  ...
  &lt;p&gt;Location: &lt;span property="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
  ...
  &lt;p&gt;&lt;input type=button value="Add to Calendar"
            onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
  ...
  &lt;span property="description" content="via livebrum.co.uk"&gt;&lt;/span&gt;
 &lt;/div&gt;
&lt;/body&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>In the above, the <code>typeof</code> attribute has been moved onto a wrapper <code>&lt;div&gt;</code> that encompasses the entirety of the page because if it resides on the <code>&lt;body&gt;</code> element, it&#8217;s assumed to apply to the document itself rather than a blank node. An alternative mapping would use <code>about="_:event"</code> to create a blank node for the event.</li>
<li>There&#8217;s no way to avoid creating a statement for the <code>rel="bookmark"</code> link, so the best we can do is make sure that the statement is accurate and relates the current document to the provided URI. Unfortunately, that means creating a separate element for the <code>url</code> property, repeating that URL within the page, and adding an empty <code>about</code> attribute to the bookmark link. Here I&#8217;ve used an empty <code>&lt;a&gt;</code> element to express the <code>url</code> relationship; a <code>&lt;link&gt;</code> element would do the same job if it were allowed in flow content.</li>
<li>The <code>&lt;meta&gt;</code> element in the original has been mapped to an empty <code>&lt;span&gt;</code> element, because <code>&lt;meta&gt;</code> isn&#8217;t allowed in flow content without an <code>itemprop</code> attribute.</li>
</ul>
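<p>The distinction that the RDFa above has to work around can be stated compactly. Tokens in <code>rel</code> make statements about the document, using the XHTML vocabulary namespace. Tokens in <code>itemprop</code> make statements about the item. The Python sketch below only illustrates that split and does not reproduce any particular processor&#8217;s behaviour. Three parts of it are assumptions: the dictionary used for the link, the empty string standing for the current document, and the <code>_:event</code> label.</p>

<pre><code>XHV = "http://www.w3.org/1999/xhtml/vocab#"
HCAL = "http://microformats.org/profile/hcalendar#"


def link_triples(document, item, link):
    triples = []
    # rel tokens such as "bookmark" describe the document containing the link.
    for token in link.get("rel", "").split():
        triples.append((document, XHV + token, link["href"]))
    # itemprop tokens such as "url" describe the item (here, the event).
    for token in link.get("itemprop", "").split():
        triples.append((item, HCAL + token, link["href"]))
    return triples


link = {
    "href": "http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road",
    "rel": "bookmark",
    "itemprop": "url",
}
# An empty string stands for the current document, as an empty relative URI would.
for triple in link_triples("", "_:event", link):
    print(triple)
</code></pre>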
    ]]></content>
  </entry>
  <entry>
    <title>Microdata + RDF</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/162" />
    <id>http://www.jenitennison.com/blog/node/162</id>
    <published>2011-07-31T20:55:44+01:00</published>
    <updated>2011-08-02T09:49:38+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdf" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>As part of the ongoing discussion about how to reconcile RDFa and microdata (if at all), <a href="http://webr3.org/blog/">Nathan Rixham</a> has put together a suggested <a href="http://www.w3.org/wiki/Microdata_RDFa_Merge">Microdata RDFa Merge</a> which brings together parts of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">microdata</a> and parts of <a href="http://www.w3.org/TR/rdfa-core/">RDFa</a>, creating a completely new set of attributes, but a parsing model that more or less follows microdata&#8217;s.</p>

<p>I want here to put forward another possibility to the debate. I should say that this is just some noodling on my part as a way of exploring options, not any kind of official position on behalf of the W3C or the TAG or any other body that you might associate me with, nor even a decided position on my part.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>As part of the ongoing discussion about how to reconcile RDFa and microdata (if at all), <a href="http://webr3.org/blog/">Nathan Rixham</a> has put together a suggested <a href="http://www.w3.org/wiki/Microdata_RDFa_Merge">Microdata RDFa Merge</a> which brings together parts of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">microdata</a> and parts of <a href="http://www.w3.org/TR/rdfa-core/">RDFa</a>, creating a completely new set of attributes, but a parsing model that more or less follows microdata&#8217;s.</p>

<p>I want here to put forward another possibility to the debate. I should say that this is just some noodling on my part as a way of exploring options, not any kind of official position on behalf of the W3C or the TAG or any other body that you might associate me with, nor even a decided position on my part.</p>

<!--break-->

<h2>Simplifying RDFa</h2>

<p>As <a href="http://www.jenitennison.com/blog/node/103">I&#8217;ve said before</a>, RDFa, in my experience, is complicated not primarily because of the whole namespaces/CURIEs issue but because its processing model tries to be too clever. RDFa was designed to largely fit in with existing markup and turn it into embedded data &#8220;just&#8221; by adding a few attributes here and there. Thus a simple image like:</p>

<pre><code>&lt;img src="photo1.jpg"&gt;
</code></pre>

<p>is first marked up to indicate that it&#8217;s an image:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"&gt;
</code></pre>

<p>then to provide its license:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"&gt;
</code></pre>

<p>and finally to add a title:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:title" content="A Pretty Picture"&gt;
</code></pre>

<p>all by adding attributes to the one <code>&lt;img&gt;</code> element. The trouble with this approach is that the rules about how statements are made become extremely complex and dependent on context (e.g. what other attributes are present, what the parent element has on it, what content it has). They also apply defaults in ways that are hard to remember.</p>

<p>Even having written an RDFa parser, having written code to mark up documents with RDFa, having <em>taught</em> it, I still cannot write RDFa past a trivial example and be 100% sure that it will produce what I was aiming to produce.</p>

<p>If we want to really simplify RDFa, rather than make cosmetic changes, we need to address this complexity. That would certainly mean backwards-incompatible changes, such as dropping the use of particular attributes and revising the way the processing model works, such that future RDFa processors couldn&#8217;t be used on RDFa 1.0. There are two possible ways of approaching this:</p>

<ol>
<li>retaining some backwards compatibility, and aiming for a simplified subset of RDFa 1.0 such that an RDFa 1.0 processor will still get the intended triples out of data marked up with RDFa 1.1</li>
<li>dropping backwards compatibility entirely and using completely different attributes, essentially creating a new language</li>
</ol>

<p>I do not know which of these routes is the best one to take.</p>

<p>My instinct is that the first will be hard to do. For example, RDFa 1.1 already contains certain simplifications that lead to markup RDFa 1.0 processors will not process correctly. One is assuming that an element with no <code>datatype</code> attribute gives a string value, rather than checking whether its content contains any non-text nodes. Perhaps that could be addressed by rewriting history: creating an RDFa 1.0 Second Edition that includes any changes needed to make a simple subset viable.</p>

<p>What I want to explore here is what the second route &#8212; using entirely different attributes from those currently used in RDFa 1.0 &#8212; might mean. I think that in this case the substantial difference between microdata and this new language would be support for that much-derided requirement: decentralised extensibility.</p>

<h2>Adding Decentralised Extensibility to Microdata</h2>

<p>As I discussed <a href="http://www.jenitennison.com/blog/node/161">earlier in the week</a>, microdata is simply not designed for use in a web where publishers might want to use multiple vocabularies to mark up the same thing for different consumers. This focus is very probably the right one for the majority of uses, where publishers address single consumers or everyone has standardised on a single vocabulary. It&#8217;s certainly an assumption that keeps the markup simple.</p>

<p>However, there is a larger data web out there. It&#8217;s not just browsers and search engines that might look for and process data embedded within a page. Unlike with HTML, those few large consumers don&#8217;t have to understand a particular vocabulary for other consumers to get valuable information from it. If you operate in a world of multiple consumers with different requirements, you need decentralised extensibility. And support for decentralised extensibility is RDF&#8217;s niche as a data model, its unique selling point.</p>

<p>Given that a new language would have to use a different processing model from RDFa 1.0, I would suggest that it simply uses microdata&#8217;s as a starting point. Using attributes from RDFa 1.0 would only cause conflicts with RDFa 1.0 processors. Microdata processing is there, already defined, already implemented. It isn&#8217;t going to go away. And you know, <em>it&#8217;s pretty good</em>.</p>

<p>The &#8216;new language&#8217; would then be not so much a &#8216;new language&#8217; as an enhancement of something that already exists. It would be a set of additions that augment the data generated by normal microdata processing with a few extra features. These features are useful in a world where there are multiple vocabularies for the same domain, where publishers have to provide data to multiple consumers, and where an RDF view of data is useful. Call it microdata+RDF.</p>

<p>So what would we need to add? Well, there are three things, I think, that make microdata hard to use in a decentralised world, and make it hard to generate good RDF from microdata markup:</p>

<ol>
<li>lack of support for multiple types</li>
<li>scoping of properties by type</li>
<li>lack of datatypes</li>
</ol>

<p>We would need to find a way to add these for use within the RDF extracted from the microdata markup such that a basic microdata parser would still generate the same JSON, and such that microdata&#8217;s DOM API would work as specified in the microdata spec. So we can&#8217;t change the types of values that are possible in microdata&#8217;s attributes or how they&#8217;re interpreted in the DOM API.</p>

<h3>Multiple Types</h3>

<p>Because of the restriction I just mentioned against changing microdata itself, we can&#8217;t simply make <code>itemtype</code> take multiple URLs. We could rely on <code>itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"</code> as a mechanism for providing types for use by RDF processors. However, I think that the type of something is such a fundamental property that it makes sense to have a dedicated attribute.</p>

<p>I suggest <code>itemclass</code>. It would only be allowed on elements with an <code>itemscope</code> attribute and would take a space-separated set of values in exactly the same way as the <code>itemprop</code> attribute. The values would be turned into URIs in the same way as for the <code>itemprop</code> attribute, which I&#8217;ll describe below.</p>

<p>Microdata+RDF would add a method to the existing microdata DOM API to enable people to access items by class rather than their single type. So:</p>

<pre><code>document . getItemsByClass( classes )
Returns a NodeList of the elements in the Document that create items, that are not 
part of other items, and that have one or more of the types or classes given in the 
argument.

The classes argument is interpreted as a space-separated list of classes.
</code></pre>

<p>Note that for simplicity, because they are interpreted in the same way within the RDF model, this returns items whose <code>itemtype</code> is listed in the argument list of classes as well as those whose <code>itemclass</code> is listed.</p>
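<p>The matching rule can be sketched in Python. This is not the DOM method itself. It assumes that each item is a dictionary with a <code>top_level</code> flag, an <code>itemtype</code> string and an <code>itemclass</code> list of already-resolved URLs. That model is hypothetical and chosen only to show the logic. The sketch treats <code>itemtype</code> and <code>itemclass</code> alike, as the note above describes.</p>

<pre><code>def get_items_by_class(items, classes):
    # The argument is a space-separated list of class URLs.
    wanted = set(classes.split())
    matches = []
    for item in items:
        # Items that are part of other items are never returned.
        if not item["top_level"]:
            continue
        # The single itemtype counts as one more class.
        item_classes = set(item.get("itemclass", []))
        if item.get("itemtype"):
            item_classes.add(item["itemtype"])
        if not wanted.isdisjoint(item_classes):
            matches.append(item)
    return matches


oscon = {
    "top_level": True,
    "itemtype": "http://schema.org/Event",
    "itemclass": ["http://microformats.org/profile/hcalendar#vevent",
                  "http://lanyrd.com/vocab/Conference"],
}
portland = {"top_level": False, "itemtype": "http://schema.org/Place"}
items = [oscon, portland]

print(len(get_items_by_class(items, "http://microformats.org/profile/hcalendar#vevent")))
</code></pre>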

<p>Within the DOM API, the <code>itemClass</code> IDL attribute on HTML elements would reflect the <code>itemclass</code> attribute.</p>

<p>The <code>itemclass</code> attribute would be ignored for the purpose of creating JSON from microdata, and only be used when creating RDF.</p>

<p>An example would be:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event"
    itemclass="http://microformats.org/profile/hcalendar#vevent /vocab/Conference"&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>The JSON generated from this would look like:</p>

<pre><code>{
  "type": "http://schema.org/Event" ,
  "id": "http://lanyrd.com/2011/oscon/",
  "properties": {}
}
</code></pre>

<p>The RDF would look like:</p>

<pre><code>&lt;http://lanyrd.com/2011/oscon/&gt;
  a &lt;http://schema.org/Event&gt; ,
    &lt;http://microformats.org/profile/hcalendar#vevent&gt; ,
    &lt;http://lanyrd.com/vocab/Conference&gt; ;
  .
</code></pre>
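<p>The list of types in that RDF can be reproduced with a short Python sketch using <code>urllib.parse.urljoin</code>. It assumes the page lives at <code>http://lanyrd.com/</code>; any URL on that host would do. It treats every token as an absolute or relative URL, which is all this example needs, and it does not cover short names.</p>

<pre><code>from urllib.parse import urljoin


def item_types(base_url, itemtype, itemclass):
    # The single itemtype comes first, followed by the space-separated itemclass values.
    tokens = ([itemtype] if itemtype else []) + itemclass.split()
    # Absolute URLs come back unchanged; relative ones such as /vocab/Conference
    # are resolved against the document's URL.
    return [urljoin(base_url, token) for token in tokens]


base = "http://lanyrd.com/"  # assumed document URL
subject = urljoin(base, "/2011/oscon/")
types = item_types(
    base,
    "http://schema.org/Event",
    "http://microformats.org/profile/hcalendar#vevent /vocab/Conference",
)
print(subject)
for t in types:
    print(t)
</code></pre>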

<h3>Disambiguating Properties</h3>

<p>To work with the RDF model, properties have to have URIs. We need to have a way of easily creating the URIs for the short-name properties without people changing their existing microdata markup.</p>

<blockquote>
  <p>Note: I&#8217;ve substantially revised this section following discussion with <a href="http://blog.foolip.org/">Philip Jägenstedt</a>. Old text is struck through, new text underlined.</p>
</blockquote>

<p>The way that this is done in RDFa 1.1 is through a <code>vocab</code> attribute, which provides a URI prefix that is concatenated to any short-name properties or types. <strike>We could use the same approach here, but call the attribute <code>itemvocab</code> to fit in with the general method of naming attributes in microdata.</strike> <u>Using this with microdata would be tedious for users however, and it would be easy for the <code>itemtype</code> and <code>itemvocab</code> to get out of sync in weird ways.</u></p>

<p><strike><code>itemvocab</code> would only be allowed on elements with an <code>itemscope</code>. The scope of <code>itemvocab</code> would be limited to the item itself, so that it&#8217;s not forgotten when it&#8217;s needed, particularly in copy-and-paste scenarios. However, to make it easier to use I think it should probably be given a default value if it isn&#8217;t present, as follows:</strike></p>

<p><u>Instead, the vocabulary for the properties could be identified as follows:</u></p>

<ol>
<li>set <em>vocab</em> to the <code>itemtype</code> of the item if it is present, and the URL of the document if not</li>
<li>use a substring of <em>vocab</em>:
<ol><li>if <em>vocab</em> contains a <code>#</code>, the substring of <em>vocab</em> up to and including the <code>#</code></li>
<li>otherwise, the substring of <em>vocab</em> up to and including its final <code>/</code></li></ol></li>
</ol>

<p>For example, if you have:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event"
    itemclass="http://microformats.org/profile/hcalendar#vevent /vocab/Conference"&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>then <strike><code>itemvocab</code></strike> <u>the item vocabulary</u> would default to <code>http://schema.org/</code>.</p>
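<p>The two steps above are easy to express in code. The following Python sketch implements them and adds an <code>expand</code> function for turning tokens into URIs. The <code>expand</code> rule is my assumption for illustration, not part of the proposal as written: a token containing a <code>:</code> or <code>/</code> is a URL, and anything else is a short name in the item&#8217;s vocabulary.</p>

<pre><code>from urllib.parse import urljoin


def item_vocabulary(itemtype, document_url):
    # Step 1: use the itemtype if there is one, otherwise the document's URL.
    vocab = itemtype if itemtype else document_url
    # Step 2a: if it contains a "#", keep everything up to and including the "#".
    if "#" in vocab:
        return vocab[: vocab.index("#") + 1]
    # Step 2b: otherwise keep everything up to and including the final "/".
    return vocab[: vocab.rindex("/") + 1]


def expand(token, vocab, base_url):
    # Assumption: tokens that look like URLs are resolved against the document.
    if ":" in token or "/" in token:
        return urljoin(base_url, token)
    # Short names are appended to the item's vocabulary.
    return vocab + token


vocab = item_vocabulary("http://schema.org/Event", "http://lanyrd.com/")
print(vocab)
print(expand("startDate", vocab, "http://lanyrd.com/"))
print(expand("/vocab/Conference", vocab, "http://lanyrd.com/"))
</code></pre>

<p>Under this rule, a nested <code>http://schema.org/Place</code> item also gets the vocabulary <code>http://schema.org/</code>, because its type contains no <code>#</code>.</p>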

<p><strike>There could be an extra restriction that if <code>itemtype</code> is specified, <code>itemvocab</code> must be in the same domain as that type; that could help prevent the weird situation where in the generated RDF the properties would be interpreted as being in a completely different vocabulary from the <code>itemtype</code>.</strike></p>

<p><strike>Within the DOM API, the <code>itemVocab</code> IDL attribute on HTML elements would reflect the <code>itemvocab</code> attribute.</strike></p>

<p><u>Note: the following example has been altered in place.</u></p>

<p>For example, take the following markup:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event" 
    itemclass="SocialEvent BusinessEvent EducationEvent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope itemid="/places/portland/"
     itemtype="http://schema.org/Place"&gt;
    &lt;span itemprop="name"&gt;&lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;&lt;/span&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate" datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>The vocabulary for the <code>&lt;li&gt;</code> element defaults to <code>http://schema.org/</code> based on the value of the <code>itemtype</code>. The short-named properties and classes within that item are turned into URIs by prepending <code>http://schema.org/</code> to their names. The nested <code>http://schema.org/Place</code> item also gets the vocabulary <code>http://schema.org/</code>, since its type contains no <code>#</code>. Its properties are therefore prefixed in the same way. The resulting RDF would be:</p>

<pre><code>@prefix s: &lt;http://schema.org/&gt; .
@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .

&lt;/2011/oscon/&gt;
  a s:Event ,
    s:SocialEvent ,
    s:BusinessEvent ,
    s:EducationEvent ;
  s:url &lt;http://lanyrd.com/2011/oscon/&gt; ;
  s:name "OSCON 2011" ;
  s:location &lt;/places/portland/&gt; ;
  s:startDate "2011-07-25"^^xsd:date ;
  s:endDate "2011-07-29"^^xsd:date ;
  .

&lt;/places/portland/&gt;
  a s:Place ;
  s:url &lt;http://lanyrd.com/places/portland/&gt; ;
  s:name "United States / Portland" ;
  .
</code></pre>

<p>Note: see below for how the values are created in this example.</p>

<p>The JSON would be just the same as from a standard microdata processor; there&#8217;s no mapping to URIs for that output:</p>

<pre><code>{
  "type": "http://schema.org/Event",
  "id": "http://lanyrd.com/2011/oscon/",
  "properties": {
    "url": [
      "http://lanyrd.com/2011/oscon/"
    ],
    "name": [
      "OSCON 2011"
    ],
    "location": [
      {
        "type": "http://schema.org/Place",
        "id": "http://lanyrd.com/places/portland/",
        "properties": {
          "name": [
            "United States / Portland"
          ],
          "url": [
            "http://lanyrd.com/places/portland/"
          ]
        }
      }
    ],
    "startDate": [
      "2011-07-25"
    ],
    "endDate": [
      "2011-07-29"
    ]
  }
}
</code></pre>

<h3>Adding Datatypes</h3>

<p>The best way to manage datatypes in RDF generated from microdata is not at all clear to me. A couple of years ago I talked about some <a href="http://www.jenitennison.com/blog/node/120">frustrations with RDF datatyping</a>, and datatypes in RDF still frustrate me by being hard to use in sensible ways throughout the RDF toolchain. Nevertheless, it&#8217;s what we have.</p>

<p>The possibilities I can see for microdata+RDF are:</p>

<ol>
<li><p>Use plain literals for everything, including URIs, equivalent to using strings as microdata does. This makes things simple for the publisher and keeps the markup in the page clean, but makes it difficult for consumers who are using RDF toolchains: they will <em>usually</em> have to do some kind of processing of the RDF generated from microdata+RDF to add appropriate datatypes to the values. There are two issues with this approach:</p>

<ul><li>I have a feeling that microdata+RDF processors will make up their own rules to add datatypes to the data extracted from a page (using rules like those described below and/or sniffing of values and/or using information from known built-in vocabularies), in an effort to add value for their users. But if different processors do that in different ways, we have an interoperability problem.</li>
<li>In some vocabularies, the datatype of a value is not derivable from the property. The most important/common example of this is <a href="http://www.w3.org/TR/skos-reference/#notations"><code>skos:notation</code></a>, which uses values with different datatypes to supply different identifiers from different identification schemes for a given concept.</li></ul></li>
<li><p>Assign datatypes based on the element type in the HTML. If the property value has come from a URL attribute, assume that it&#8217;s a resource rather than a literal. If the element is a <code>&lt;time&gt;</code> element, work out the datatype based on the syntax of the <code>datetime</code> attribute. Otherwise, assume it&#8217;s a string and give it a language tag if one is specified. This gives some information but leads to a somewhat strange situation where you can mark up something as a date/time but not as a number.</p></li>
<li><p>Supplement the processing described in 2. with some basic datatype sniffing. Basically, if the value looks like a number or a boolean value, assign it a numeric or boolean datatype based on its syntax. This could reuse the <a href="http://www.w3.org/TeamSubmission/turtle/#literal">rules for recognising different literals from Turtle</a>. This wouldn&#8217;t be perfect; in particular, it would guess that strings consisting purely of digits, such as zip codes, were numbers. I&#8217;m inclined not to go down this path.</p></li>
<li><p>Supplement the processing described in 2. with an <code>itemvaltype</code> attribute that takes a token from the list of <a href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">built-in XML Schema Datatypes</a> or the token &#8216;<code>literal</code>&#8217;. The &#8216;<code>literal</code>&#8217; token would be used to override the normal processing of URL attributes in cases where the values really should be literals rather than resources. In this design, it would be easy to create literals using one of the most common datatypes, but not possible to use datatypes that are specific to a given vocabulary.</p></li>
<li><p>Supplement the processing described in 4. by allowing the <code>itemvaltype</code> to take either a token or a URL. What I don&#8217;t like about this design is that the token would be interpreted as being within the XML Schema Datatypes vocabulary rather than the item&#8217;s vocabulary (used for tokens in <code>itemprop</code> and <code>itemclass</code>). This might turn into a source of confusion. But if we went the other way and interpreted <code>itemvaltype</code> based on the item&#8217;s vocabulary, it would be harder to give a value one of the more common datatypes, such as numbers and boolean values.</p></li>
</ol>
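<p>The second option is mechanical enough to sketch in Python. The function below takes an element name and the value already extracted from it. It returns a (kind, value, datatype-or-language) tuple, a representation chosen only for illustration. The set of URL-valued elements reflects my reading of the microdata specification. The regular expressions deliberately accept only forms that are also valid XML Schema lexical forms. Anything else, such as a <code>datetime</code> without seconds, falls back to a plain string.</p>

<pre><code>import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Elements whose microdata values come from URL attributes (href, src or data).
URL_ELEMENTS = {"a", "area", "link", "audio", "embed", "iframe", "img",
                "source", "track", "video", "object"}

# (pattern, datatype) pairs, limited to forms that XML Schema also accepts.
TIME_PATTERNS = [
    (r"\d{4}-\d{2}-\d{2}", XSD + "date"),
    (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?", XSD + "dateTime"),
    (r"\d{2}:\d{2}:\d{2}(\.\d+)?", XSD + "time"),
]


def rdf_value(element, value, lang=None):
    # Values from URL attributes become resources rather than literals.
    if element in URL_ELEMENTS:
        return ("resource", value, None)
    # For time elements, choose a datatype from the syntax of the datetime value.
    if element == "time":
        for pattern, datatype in TIME_PATTERNS:
            if re.fullmatch(pattern, value):
                return ("literal", value, datatype)
    # Anything else is a string, with a language tag if one is in scope.
    return ("literal", value, lang)


print(rdf_value("time", "2011-07-25"))
print(rdf_value("a", "http://lanyrd.com/2011/oscon/"))
print(rdf_value("span", "1938"))
</code></pre>

<p>Note that <code>rdf_value("span", "1938")</code> still comes out as a plain string: under this option there is no way to say that a value is a number.</p>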

<p>My inclination, somewhat reluctantly as it&#8217;s the most complex, would be to use the last of these. It provides for decentralised extensibility of datatypes, and support for decentralised extensibility is the core aim of these extensions. In other words, have an <code>itemvaltype</code> attribute that can hold either a URL or a token. A token must be either <code>literal</code> or the local name of an XML Schema datatype. On a <code>&lt;time&gt;</code> element, this would default to the appropriate type based on the syntax of the value of the <code>datetime</code> attribute.</p>

<p>To be conformant, the <code>itemvaltype</code> would have to be an allowed value type for the properties given in <code>itemprop</code>, and the value of the property would have to be a legal value for the datatype. (In keeping with the style of the microdata specification, the mechanisms for working out which value types are allowed, and what the legal values are, for non-XML Schema datatypes would be left undefined; a consuming application would look at the definition of the vocabulary.)</p>

<p>Within the DOM API, the <code>itemValType</code> IDL attribute on HTML elements would reflect the <code>itemvaltype</code> attribute. The value of <code>itemvaltype</code> <em>wouldn&#8217;t</em> change the types of the values returned by <code>element.itemValue</code> or in the JSON mapping from microdata; it would purely be used when generating RDF from that data.</p>
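<p>Resolving the proposed <code>itemvaltype</code> value could look something like the Python sketch below. It rests on three assumptions. First, only a handful of the built-in XML Schema datatype names are listed. Second, a token that is neither <code>literal</code> nor one of those names, and that doesn&#8217;t look like a URL, is rejected with an error. Third, URL values are resolved against a hypothetical document URL.</p>

<pre><code>from urllib.parse import urljoin

XSD = "http://www.w3.org/2001/XMLSchema#"

# A few of the built-in XML Schema datatype names; the full list is longer.
XSD_TOKENS = {"string", "boolean", "decimal", "integer", "double", "float",
              "date", "dateTime", "time", "gYear", "anyURI"}


def resolve_valtype(itemvaltype, base_url):
    # "literal" forces a plain literal, even for values from URL attributes.
    if itemvaltype == "literal":
        return "literal"
    # Bare tokens are local names in the XML Schema namespace.
    if itemvaltype in XSD_TOKENS:
        return XSD + itemvaltype
    # Anything that looks like a URL is resolved against the document.
    if ":" in itemvaltype or "/" in itemvaltype:
        return urljoin(base_url, itemvaltype)
    raise ValueError("unrecognised itemvaltype token: " + itemvaltype)


base = "http://example.org/tickets"  # hypothetical document URL
print(resolve_valtype("integer", base))
print(resolve_valtype("http://schema.org/Price", base))
</code></pre>

<p>Treating an unknown bare token as an error, rather than guessing a namespace for it, is one way to avoid the confusion between the XML Schema and item vocabularies mentioned in the fifth option.</p>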

<p>For example, if someone started with some markup like:</p>

<pre><code>&lt;div itemscope itemtype="http://schema.org/AggregateOffer"&gt;
  Priced from: &lt;span itemprop="lowPrice"&gt;$35&lt;/span&gt;
  &lt;span itemprop="offerCount"&gt;1938&lt;/span&gt; tickets left
&lt;/div&gt;
</code></pre>

<p>it might be supplemented with some type information like:</p>

<pre><code>&lt;div itemscope itemtype="http://schema.org/AggregateOffer"&gt;
  Priced from: &lt;span itemprop="lowPrice" itemvaltype="http://schema.org/Price"&gt;$35&lt;/span&gt;
  &lt;span itemprop="offerCount" itemvaltype="integer"&gt;1938&lt;/span&gt; tickets left
&lt;/div&gt;
</code></pre>

<p>which would generate RDF like:</p>

<pre><code>@prefix s: &lt;http://schema.org/&gt; .

[] a s:AggregateOffer ;
  s:lowPrice "$35"^^s:Price ;
  s:offerCount 1938 ;
  .
</code></pre>

<p>(Note: Here I&#8217;m assuming that schema.org defines a <code>http://schema.org/Price</code> datatype which includes a currency and a number. They don&#8217;t currently.)</p>

<p>The JSON would still be:</p>

<pre><code>{
  "type": "http://schema.org/AggregateOffer",
  "properties": {
    "lowPrice": [
      "$35"
    ],
    "offerCount": [
      "1938"
    ]
  }
}
</code></pre>

<h3>Non-Additions</h3>

<p>When I wrote a couple of years ago about <a href="http://www.jenitennison.com/blog/node/103">what microdata can&#8217;t do</a>, one of the things that I identified was not being able to express XML Literals. Having thought about this more, what&#8217;s actually missing isn&#8217;t to do with RDF, but is the ability to use the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/content-models.html#innerhtml"><code>innerHTML</code></a> of an element to provide a value for a property rather than its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#textcontent"><code>textContent</code></a>.</p>

<p>For example, the description of an event might run over several paragraphs, or even in a single paragraph include other markup such as emphasised text, ruby markup, or links to additional information. People who are working from the DOM API can capture this information when they need it by getting the <code>innerHTML</code> of the element rather than its <code>itemValue</code>, but in the JSON mapping, the value is always the <code>itemValue</code> &#8212; the text content of the element.</p>

<p>So this is a general limitation of microdata, one that keeps it simple. I&#8217;d argue that we shouldn&#8217;t add any special handling to plug this hole at the microdata+RDF level. If it turns out that values containing markup are useful, support for them will be added to microdata. The microdata+RDF mapping would then be extended to create <code>rdf:XMLLiteral</code>s or HTML literals (for which there is no defined datatype in RDF at the moment) for such values.</p>

<p>Similarly, I haven&#8217;t said anything in this post about providing machine-readable values to override the text content of an element. There is <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240">an open bug</a> about whether and how that capability might be added to HTML/microdata. I happen to think that it&#8217;s useful, but that utility isn&#8217;t limited to RDF processing. Whichever route is chosen there, I think it&#8217;s important to keep the property values used by basic microdata and microdata+RDF aligned.</p>

<h2>Summary</h2>

<p>To summarise, one direction that we could take in aligning microdata and RDFa would be to define an extension to microdata to add support for decentralised extensibility and the RDF data model. I think that would entail adding attributes such as:</p>

<ul>
<li><code>itemclass</code> to make it easy to define multiple types for an item</li>
<li><code>itemvocab</code> and some default processing to provide nice mappings for short-name properties into URIs</li>
<li><code>itemvaltype</code> and some default processing to assign datatypes to values</li>
</ul>
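<p>To make the <code>itemvocab</code> idea a little more concrete, here is a minimal Python sketch of the kind of default processing it might involve. Both the attribute and the rule used here (append a short name to the vocabulary URI; leave names that already look like URIs alone) are assumptions for illustration, not anything specified. Items are modelled as plain dictionaries mapping property names to lists of values.</p>

```python
# Hypothetical sketch: 'itemvocab' is a proposed attribute, and this
# expansion rule (vocabulary URI + short name) is an assumption.

def expand_property(name, itemvocab):
    """Return a full URI for a property name, given an itemvocab value."""
    if ":" in name:
        # Simplification: treat any name containing a colon as a full URI.
        return name
    return itemvocab + name


def expand_item(properties, itemvocab):
    """Expand every key of a {name: [values]} property map."""
    return {expand_property(name, itemvocab): values
            for name, values in properties.items()}


item = {
    "name": ["OSCON 2011"],
    "http://purl.org/dc/terms/title": ["OSCON 2011"],
}
print(expand_item(item, "http://schema.org/"))
```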

<p>For publishers and consumers, a single language with optional extensions greatly simplifies the use of embedded data. Property names don&#8217;t have to be repeated, and publishers don&#8217;t have to balance the demands of different processing models.</p>

<p>RDFa proponents get a syntax that can be used to generate a natural RDF model against which they can build RDF-oriented APIs and map to other formats such as JSON-LD.</p>

<p>For microdata proponents, this approach doesn&#8217;t pollute microdata with requirements that they see as superfluous, and doesn&#8217;t change the behaviour of core microdata processors. Browsers, search engines and other consumers can continue to use the JSON output and only those who really want to support RDF need to do so.</p>

<p>I&#8217;m sure that there are things that I&#8217;ve missed in my outline above, issues that I haven&#8217;t thought of. But if there is to be any kind of convergence between microdata and RDFa, this layered approach seems to me to be the kind of convergence that is most likely to eventually result in one language for embedding data in HTML rather than two or three.</p>

<p><strong>Note: if you prefer to comment on Google+, please add your comment to <a href="https://plus.google.com/u/0/112095156983892490612/posts/aUqGQSLzDPv">my announcement post there</a></strong></p>
    ]]></content>
  </entry>
  <entry>
    <title>Using Multiple Vocabularies in Microdata</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/161" />
    <id>http://www.jenitennison.com/blog/node/161</id>
    <published>2011-07-28T09:25:21+01:00</published>
    <updated>2011-07-28T09:25:21+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="schema.org" />
    <summary type="html"><![CDATA[<p>I <a href="http://www.jenitennison.com/blog/node/160">wrote the other day</a> about how <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> needs to share data at three levels to satisfy its goals as a website:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>and the requirement to use multiple, incrementally more specialised, vocabularies to describe the same things as a result.</p>

<p>What I want to do here is explore how a publisher might handle this kind of situation using microdata. The ground has already been substantially covered by <a href="http://openspring.net/blog/2011/06/10/microdata-multiple-vocabularies">Stéphane Corlosquet</a>; what I do here is work through an example where the consumers are microdata&#8217;s primary targets &#8212; search engines and browsers &#8212; look at why it&#8217;s hard to fix this within microdata itself, and discuss how people who create vocabularies to be used with microdata might help publishers who find themselves in this situation by designing those vocabularies to be used together as well as on their own.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>I <a href="http://www.jenitennison.com/blog/node/160">wrote the other day</a> about how <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> needs to share data at three levels to satisfy its goals as a website:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>and the requirement to use multiple, incrementally more specialised, vocabularies to describe the same things as a result.</p>

<p>What I want to do here is explore how a publisher might handle this kind of situation using microdata. The ground has already been substantially covered by <a href="http://openspring.net/blog/2011/06/10/microdata-multiple-vocabularies">Stéphane Corlosquet</a>; what I do here is work through an example where the consumers are microdata&#8217;s primary targets &#8212; search engines and browsers &#8212; look at why it&#8217;s hard to fix this within microdata itself, and discuss how people who create vocabularies to be used with microdata might help publishers who find themselves in this situation by designing those vocabularies to be used together as well as on their own.</p>

<!--break-->

<h2>Use Case</h2>

<p>I&#8217;m going to use <a href="http://lanyrd.com/">Lanyrd</a> from <a href="http://simonwillison.net/">Simon Willison</a> and <a href="http://natbat.net/">Natalie Downe</a> as an example. Lanyrd is &#8220;the social conference directory&#8221;: it keeps track of conferences that you&#8217;re attending or speaking at, lets you know about ones that your friends (or at least the people you follow on Twitter) are going to, and provides a bunch of other useful facilities.</p>

<p>Lanyrd currently uses microformats to mark up events so that nice summaries appear within search engine results. Here&#8217;s a (slightly simplified for concision) example from the front page:</p>

<pre><code>&lt;li class="conference vevent"&gt;
  &lt;h3&gt;&lt;a href="/2011/oscon/" class="summary url"&gt;OSCON 2011&lt;/a&gt;&lt;/h3&gt;
  &lt;p class="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;abbr class="dtstart" title="2011-07-25"&gt;25th&lt;/abbr&gt;–
    &lt;abbr class="dtend"   title="2011-07-29"&gt;29th July 2011&lt;/abbr&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>This is easy enough to understand: OSCON 2011 has a URL of <a href="http://lanyrd.com/2011/oscon/"><code>http://lanyrd.com/2011/oscon/</code></a>, is located in Portland (US), starts on 25th July and ends on 29th July 2011.</p>

<p>Say that Lanyrd decided to switch to using <a href="http://www.schema.org/">schema.org</a> microdata. The markup would change to something like the following:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>A few notes, because there were some design decisions involved in this mapping:</p>

<ul>
<li>I&#8217;ve used a plain <a href="http://schema.org/Event"><code>http://schema.org/Event</code></a> because I wasn&#8217;t sure how to classify a conference &#8212; is it a <code>SocialEvent</code> or a <code>BusinessEvent</code> or an <code>EducationEvent</code>? Depends on the conference, I guess</li>
<li>I&#8217;ve assumed that the URIs for both the conference and its location are also item identifiers</li>
<li>I&#8217;ve changed the markup a bit to add <code>&lt;span&gt;</code> elements where necessary to get the desired data out, namely around the names of the conference and the place; I could have used separate <code>&lt;meta&gt;</code> or <code>&lt;link&gt;</code> elements instead but that would have meant repetition of data within the page</li>
</ul>

<p>All well and good.</p>

<p>Now let&#8217;s say that browsers start to support the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#vevent">vEvent vocabulary defined within the WHATWG HTML microdata specification</a> and offer some really nice functionality: because there&#8217;s a clear mapping to iCalendar, they enable users to drag an event from the browser to a calendar application, and have it create an entry within the calendar.</p>
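<p>As a rough illustration of what such browser support involves, here is a Python sketch that turns a vEvent item into iCalendar text. It is <em>not</em> the conversion algorithm in the WHATWG spec: the item is a dictionary loosely modelled on microdata&#8217;s JSON output, only a few properties are copied, dates are assumed to be plain dates, and the <code>PRODID</code> value is made up. It also omits properties such as <code>UID</code> and <code>DTSTAMP</code> that a strict iCalendar consumer would require.</p>

```python
# Sketch only: NOT the WHATWG "conversion to iCalendar" algorithm. It copies
# a handful of vEvent properties from a dict loosely modelled on microdata's
# JSON output, assumes dtstart/dtend are plain dates, and omits UID/DTSTAMP.

VEVENT = "http://microformats.org/profile/hcalendar#vevent"


def escape_text(value):
    """Escape an iCalendar TEXT value (backslash, semicolon, comma, newline)."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(item):
    """Return iCalendar text for a vEvent item, or None for other items."""
    if VEVENT not in item.get("type", []):
        return None
    props = item["properties"]
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//microdata sketch//EN",  # hypothetical PRODID
             "BEGIN:VEVENT"]
    for name in ("summary", "location"):
        if name in props:
            lines.append(name.upper() + ":" + escape_text(props[name][0]))
    if "url" in props:
        lines.append("URL:" + props["url"][0])
    for name in ("dtstart", "dtend"):
        if name in props:
            # iCalendar DATE values drop the hyphens: 2011-07-25 becomes 20110725
            date = props[name][0].replace("-", "")
            lines.append(name.upper() + ";VALUE=DATE:" + date)
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


oscon = {
    "type": [VEVENT],
    "properties": {
        "url": ["http://lanyrd.com/2011/oscon/"],
        "summary": ["OSCON 2011"],
        "location": ["United States / Portland"],
        "dtstart": ["2011-07-25"],
        "dtend": ["2011-07-29"],
    },
}
print(to_icalendar(oscon))
```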

<p>Say Lanyrd really want to take advantage of this. It means marking up their pages in something like a mix between the two examples we&#8217;ve looked at so far &#8212; microdata syntax but with the vEvent vocabulary (which is based on the hCalendar microformat vocabulary) rather than the schema.org vocabulary:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>But now Lanyrd have a dilemma. If they mark up their pages using the schema.org vocabulary, they can&#8217;t take advantage of the browser drag-and-drop support; if they mark up their pages using the vEvent vocabulary they won&#8217;t get their pages displaying nicely in search engine results. They can get the benefits from one consumer or the other but not both at the same time. What to do?</p>
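<p>The dilemma comes from consumers keying everything off <code>itemtype</code>. This small Python sketch (items modelled as dictionaries, consumer behaviour simplified to a single type check) shows that whichever vocabulary Lanyrd picks, one of the two consumers finds nothing.</p>

```python
# Sketch: two consumers that, like the search engine and browser in this
# example, only read items whose itemtype they recognise.

SCHEMA_EVENT = "http://schema.org/Event"
VEVENT = "http://microformats.org/profile/hcalendar#vevent"


def items_of_type(items, wanted):
    """Return the items whose type list includes the wanted type URI."""
    return [item for item in items if wanted in item["type"]]


def what_each_consumer_sees(items):
    """Count the events each kind of consumer would pick up from a page."""
    return {
        "search engine": len(items_of_type(items, SCHEMA_EVENT)),
        "browser": len(items_of_type(items, VEVENT)),
    }


schema_page = [{"type": [SCHEMA_EVENT], "properties": {"name": ["OSCON 2011"]}}]
vevent_page = [{"type": [VEVENT], "properties": {"summary": ["OSCON 2011"]}}]

print(what_each_consumer_sees(schema_page))  # {'search engine': 1, 'browser': 0}
print(what_each_consumer_sees(vevent_page))  # {'search engine': 0, 'browser': 1}
```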

<h2>Publisher Workarounds</h2>

<p>What could Lanyrd do to work around this problem?</p>

<h3>Different Syntaxes</h3>

<p>The first, eminently pragmatic workaround would be to use different syntaxes to encode the event information for the two different consumers. Since the search engines behind schema.org are likely to continue to understand microformats for the foreseeable future, Lanyrd could stick to their original microformat markup and just add similar microdata for browsers to pull out to create iCalendar data. The page would look like:</p>

<pre><code>&lt;li class="conference vevent" itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" class="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="summary" class="summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" class="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="dtstart" class="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="dtend"   class="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>In other words, they could handle their requirement by not using microdata for one of the vocabularies. I don&#8217;t think this is a particularly acceptable solution, given that schema.org specifically wants publishers to use microdata, but it would work.</p>

<h3>Repeated Data</h3>

<p>A second workaround that Lanyrd could use would be to have some shadow markup for the data targeted at schema.org; the visible event information in the page itself should still be marked up using the vEvent vocabulary because it gives an area of the page that users can drag and drop. The basic version of this would look like:</p>

<pre><code>&lt;li class="conference"&gt;
  &lt;!-- data for browsers --&gt;
  &lt;div itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
    &lt;h3&gt;
      &lt;a itemprop="url" href="/2011/oscon/"&gt;
        &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
      &lt;/a&gt;
    &lt;/h3&gt;
    &lt;p itemprop="location"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/p&gt;
    &lt;p class="date"&gt;
      &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
      &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
    &lt;/p&gt;
    ...
  &lt;/div&gt;

  &lt;!-- data for search engines --&gt;
  &lt;span itemscope itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
    &lt;link itemprop="url" href="/2011/oscon/"&gt;
    &lt;meta itemprop="name" content="OSCON 2011"&gt;
    &lt;span itemprop="location" itemscope 
          itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
      &lt;link itemprop="url" href="/places/portland/"&gt;
      &lt;meta itemprop="name" content="United States / Portland"&gt;
    &lt;/span&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;&lt;/time&gt;
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;&lt;/time&gt;
    ...
  &lt;/span&gt;
&lt;/li&gt;
</code></pre>

<p>Note: I&#8217;ve used empty <code>&lt;time&gt;</code> elements to mark up the dates for the conference in the schema.org shadow data because the microdata spec says &#8220;If a property&#8217;s value represents a date, time, or global date and time, the property must be specified using the <code>datetime</code> attribute of a <code>time</code> element.&#8221; They&#8217;re empty, though, so they won&#8217;t be displayed on the page.</p>

<p>There are a few issues with this workaround:</p>

<ul>
<li>it repeats content and thus bloats the page</li>
<li>in the microdata DOM API, there now appear to be two items when really there&#8217;s one conference; this might not be a big deal if scripts access items by type rather than just getting all the items</li>
<li>search engines might (wild speculation follows) be more suspicious of data that isn&#8217;t visible within the page; there&#8217;s no way for schema.org to know that the same data appears visibly elsewhere with equivalent markup</li>
</ul>
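<p>On the DOM API point: a script that does need to treat the two items as one conference could regroup them by <code>itemid</code>, assuming, as the markup above does, that both carry the same identifier. A minimal Python sketch over dictionary-modelled items:</p>

```python
from collections import defaultdict


def group_by_itemid(items):
    """Group top-level items that share an itemid.

    Items without an itemid are each left in a group of their own.
    """
    groups = defaultdict(list)
    loose = []
    for item in items:
        if item.get("id"):
            groups[item["id"]].append(item)
        else:
            loose.append([item])
    return list(groups.values()) + loose


page_items = [
    {"id": "http://lanyrd.com/2011/oscon/",
     "type": ["http://microformats.org/profile/hcalendar#vevent"]},
    {"id": "http://lanyrd.com/2011/oscon/",
     "type": ["http://schema.org/Event"]},
]
print(len(page_items), "items,", len(group_by_itemid(page_items)), "conference")
```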

<h3>Use <code>itemref</code></h3>

<p>A third possibility for Lanyrd would be to do something similar to the previous example but use the <code>itemref</code> attribute to point to any shared data. Unfortunately, in this case there&#8217;s only one property that&#8217;s actually shared (with the same semantics) between the two vocabularies &#8212; <code>url</code> &#8212; so using this technique doesn&#8217;t improve the markup all that much from the previous example:</p>

<pre><code>&lt;li class="conference"&gt;
  &lt;!-- data for browsers --&gt;
  &lt;div itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
    &lt;h3&gt;
      &lt;a id="oscon-url" itemprop="url" href="/2011/oscon/"&gt;
        &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
      &lt;/a&gt;
    &lt;/h3&gt;
    &lt;p itemprop="location"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/p&gt;
    &lt;p class="date"&gt;
      &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
      &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
    &lt;/p&gt;
    ...
  &lt;/div&gt;

  &lt;!-- data for search engines --&gt;
  &lt;span itemscope itemtype="http://schema.org/Event" itemid="/2011/oscon/"
    itemref="oscon-url"&gt;
    &lt;meta itemprop="name" content="OSCON 2011"&gt;
    &lt;span itemprop="location" itemscope 
          itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
      &lt;link itemprop="url" href="/places/portland/"&gt;
      &lt;meta itemprop="name" content="United States / Portland"&gt;
    &lt;/span&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;&lt;/time&gt;
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;&lt;/time&gt;
    ...
  &lt;/span&gt;
&lt;/li&gt;
</code></pre>

<p>In other situations, where there is more overlap in the property names used by the two types, there might be more advantage in this approach.</p>
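<p>To show what <code>itemref</code> buys here, this Python sketch merges an item&#8217;s own properties with those of the elements it references by <code>id</code>. It is a simplification of the microdata algorithm: the document is replaced by a hypothetical flat lookup from <code>id</code> to (<code>itemprop</code>, value), and descendant crawling and document ordering are skipped.</p>

```python
def properties_with_itemref(own_properties, itemref, elements_by_id):
    """Merge an item's own properties with those of the elements it references.

    own_properties: {name: [values]} found inside the item's own element.
    itemref: the item's space-separated itemref attribute value.
    elements_by_id: hypothetical flattened lookup {id: (itemprop, value)};
        the real algorithm also crawls the referenced element's descendants
        and sorts properties into document order, which this sketch skips.
    """
    merged = {name: list(values) for name, values in own_properties.items()}
    for ref in itemref.split():
        if ref not in elements_by_id:
            continue  # unresolvable ids are ignored here
        itemprop, value = elements_by_id[ref]
        for name in itemprop.split():  # itemprop can hold several names
            merged.setdefault(name, []).append(value)
    return merged


elements_by_id = {"oscon-url": ("url", "http://lanyrd.com/2011/oscon/")}
schema_event = properties_with_itemref(
    {"name": ["OSCON 2011"], "startDate": ["2011-07-25"]},
    "oscon-url",
    elements_by_id,
)
print(schema_event["url"])  # ['http://lanyrd.com/2011/oscon/']
```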

<h3>Content Negotiation</h3>

<p>A final workaround would be for Lanyrd to serve up an HTML page that uses the schema.org vocabulary to search engines and an HTML page that uses the vEvent vocabulary to browsers, by sniffing the <code>User-Agent</code> header.</p>

<p>This has the advantage of not having to try to cram two conflicting vocabularies into a single page but the disadvantage of having to code for the content negotiation. Essentially, it shifts the complexity and repetition from the HTML page to the code that generates the HTML page, but it does address the three disadvantages that I listed for the &#8216;repeated data&#8217; workaround described above.</p>
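<p>A minimal sketch of the server-side choice, in Python. The list of crawler tokens is an assumption and would need maintaining; the part that isn&#8217;t optional is the <code>Vary: User-Agent</code> response header, which tells caches that the page differs by user agent so that they don&#8217;t serve the search-engine version to browsers or vice versa.</p>

```python
# Hypothetical crawler tokens; a real list would need maintaining and would
# still misclassify some user agents.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def choose_vocabulary(user_agent):
    """Pick which vocabulary's markup to render for a request."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "schema.org"
    return "vevent"


def response_headers():
    """The page varies by User-Agent, so tell caches so via Vary."""
    return {"Content-Type": "text/html; charset=utf-8", "Vary": "User-Agent"}


print(choose_vocabulary(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```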

<h2>Consumer Changes</h2>

<p>Lanyrd could also lobby schema.org and/or WHATWG to make changes to what data they consume.</p>

<h3>Lobby for Convergence</h3>

<p>Lanyrd could lobby schema.org to understand the vEvent vocabulary and/or WHATWG to specify browser handling of the schema.org vocabulary.</p>

<p>This might work, but the vocabularies do have different goals and requirements, which might make it hard to unify them: vEvent maps neatly and easily to iCalendar, whereas schema.org is oriented around Rich Snippets in search engine results. The modelling of the <code>location</code> property in each shows this different emphasis: it only needs to map to a string in iCalendar so there&#8217;s no need to model the location as an item itself, but in search engine results it&#8217;s useful to link to the location, display a map and so on, which is only possible if the location is modelled as an item in its own right.</p>

<h3>Lobby for Different Processing</h3>

<p>Finally, Lanyrd could lobby schema.org and/or WHATWG to trigger their recognition of an event based on something other than the <code>itemtype</code> of an item, and to interpret full URIs for properties in the same way as equivalent short names.</p>

<p>For example, currently the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#conversion-to-icalendar">conversion of vEvent to iCalendar defined in the WHATWG HTML specification</a> is triggered by the presence of an item that is a vEvent:</p>

<blockquote>
  <p>If none of the nodes in nodes are items with the type <code>http://microformats.org/profile/hcalendar#vevent</code>, then there is no vEvent data. Abort the algorithm, returning nothing.</p>
</blockquote>

<p>Let&#8217;s say that it were instead triggered by items with <em>either</em>:</p>

<ul>
<li>an <code>itemtype</code> of <code>http://microformats.org/profile/hcalendar#vevent</code> <em>or</em></li>
<li>a <code>http://microformats.org/profile/hcalendar#type</code> of <code>vevent</code></li>
</ul>

<p>and that in the former case, it would read short name properties but in the latter case it would read properties with URIs like <code>http://microformats.org/profile/hcalendar#location</code>.</p>
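<p>Here is a Python sketch of how a consumer might apply that hypothetical trigger, with items modelled as dictionaries of property names to value lists. Items typed as vEvents are read by short name; other items are read through the full hCalendar property URIs if they declare an <code>hcalendar#type</code> of <code>vevent</code>.</p>

```python
HCAL = "http://microformats.org/profile/hcalendar#"
VEVENT = HCAL + "vevent"


def vevent_properties(item):
    """Return vEvent properties keyed by short name, or None if not a vEvent.

    Follows the hypothetical rule above: an itemtype of vevent means short
    names are read; otherwise an hcalendar#type value of 'vevent' means the
    full hcalendar property URIs are read instead.
    """
    props = item["properties"]
    if VEVENT in item["type"]:
        return {n: v for n, v in props.items() if ":" not in n}
    if "vevent" in props.get(HCAL + "type", []):
        return {n[len(HCAL):]: v for n, v in props.items()
                if n.startswith(HCAL) and n != HCAL + "type"}
    return None


lanyrd = {
    "type": ["http://schema.org/Event"],
    "properties": {
        HCAL + "type": ["vevent"],
        "name": ["OSCON 2011"],
        HCAL + "summary": ["OSCON 2011"],
        HCAL + "dtstart": ["2011-07-25"],
    },
}
print(sorted(vevent_properties(lanyrd)))  # ['dtstart', 'summary']
```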

<p>In that case, Lanyrd could use the schema.org vocabulary for the type given in the <code>itemtype</code> attribute, but mark up extra properties for the item using the vEvent property URIs. The markup would be something like:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
  &lt;meta itemprop="http://microformats.org/profile/hcalendar#type" content="vevent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url http://microformats.org/profile/hcalendar#url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name http://microformats.org/profile/hcalendar#summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;meta itemprop="http://microformats.org/profile/hcalendar#location" 
        content="United States / Portland"&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate http://microformats.org/profile/hcalendar#dtstart" 
          datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate   http://microformats.org/profile/hcalendar#dtend"   
          datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>Note that here the location has to be repeated because vEvent expects a string while schema.org expects an item.</p>

<p>The same kind of pattern could work the other way around: schema.org could recognise events based on a <code>http://schema.org/type</code> property with the value <code>Event</code>, and understand property URIs that were equivalent to each of the short-name properties that it uses. (Such <a href="http://schema.org/schema.owl">URIs for schema.org properties</a> already exist.)</p>

<h2>Multiple Types for Microdata Items</h2>

<p>Earlier this year there was some <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032243.html">discussion on the WHATWG mailing list</a> about the requirement for multiple types for items.</p>

<p>The use cases there were not the same multiple-consumers use case that I have outlined above, but around support for inheritance in types. For example, schema.org lets people define their own types in the schema.org domain, such as <code>http://schema.org/Event/Conference</code>. There&#8217;s no way of exposing the type hierarchy explicitly (that all <code>Conference</code>s are <code>Event</code>s) so scripts that use microdata&#8217;s DOM API to search for items of the type <code>http://schema.org/Event</code> won&#8217;t find conferences. The discussion was about alleviating this by allowing publishers to put both <code>http://schema.org/Event</code> and <code>http://schema.org/Event/Conference</code> within the <code>itemtype</code> attribute. Alternatively, a conference could be typed as a <code>SocialEvent</code>, <code>BusinessEvent</code> <em>and</em> an <code>EducationEvent</code>, enabling it to take properties from all three.</p>

<p>The conclusion of the discussion was that it just wasn&#8217;t possible to use what would seem to be the obvious method of assigning multiple types to an item: having a space-separated list in the <code>itemtype</code> attribute. If we look at the markup that you would get for this example, we can see why there&#8217;s a problem:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event http://microformats.org/profile/hcalendar#vevent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;meta itemprop="location" content="United States / Portland"&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate   dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>There are two issues with this markup. First, the definitions of the two types (in prose within the two specs) have different expectations about:</p>

<ul>
<li>what properties will be present: schema.org expects a <code>name</code> property and not a <code>summary</code> property, and vice versa for vEvent; similarly for <code>startDate</code>/<code>dtstart</code> and <code>endDate</code>/<code>dtend</code></li>
<li>what values the properties will have: schema.org expects <code>location</code> to have an item value whereas vEvent expects a string</li>
</ul>

<p>The result is that the mixed markup isn&#8217;t conformant with either vocabulary, and hence not a conformant HTML document. (Whether microdata consumers do anything about that non-conformance is a different question &#8212; they could just ignore properties that they don&#8217;t understand, or with value types that they don&#8217;t expect.)</p>
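<p>A consumer that takes the lenient route could simply drop what it doesn&#8217;t understand. This Python sketch does that for a schema.org-style consumer reading the mixed item above; the table of expected value kinds is an assumption for illustration, not taken from schema.org.</p>

```python
# Hypothetical expectations for a schema.org-style Event consumer:
# "item" means a nested item (modelled as a dict), "text" a plain string.
EXPECTED = {"name": "text", "url": "text", "startDate": "text",
            "endDate": "text", "location": "item"}


def kind(value):
    return "item" if isinstance(value, dict) else "text"


def tolerant_read(properties, expected):
    """Keep known properties, and only values of the expected kind."""
    kept = {}
    for name, values in properties.items():
        if name not in expected:
            continue  # e.g. vEvent's summary, dtstart, dtend
        good = [v for v in values if kind(v) == expected[name]]
        if good:
            kept[name] = good
    return kept


mixed = {
    "name": ["OSCON 2011"], "summary": ["OSCON 2011"],
    "location": [{"name": ["United States / Portland"]},
                 "United States / Portland"],
    "startDate": ["2011-07-25"], "dtstart": ["2011-07-25"],
}
print(tolerant_read(mixed, EXPECTED))
```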

<p>Second, if the data is turned into any kind of format that needs full URIs for properties, such as RDF through <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#rdf">microdata&#8217;s RDF mapping</a>, it&#8217;s impossible to tell what type URI to use as the basis of that URI. If the item is assigned <em>just</em> the schema.org type, the <code>name</code> property would map to the URI:</p>

<pre><code>http://www.w3.org/1999/xhtml/microdata#http://schema.org/Event%23:name
</code></pre>

<p>If there is <em>just</em> the vEvent type, the <code>summary</code> property would map to the URI:</p>

<pre><code>http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcalendar%23vevent:summary
</code></pre>

<p>When the item has more than one type, there is no way to know which type should be used as the basis of the URI generated for the property, or even if <em>both</em> should be used, as in the properties used by both vocabularies such as <code>url</code>.</p>

<p>(This issue isn&#8217;t specific to the RDF mapping defined in the microdata specification; it would arise in any RDF mapping from microdata, or any mapping in which short names for properties needed to be turned into globally unique terms.)</p>
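<p>The ambiguity is easy to see if the predicate rule is written out. This Python sketch follows my reading of the rule in microdata&#8217;s RDF mapping that produces the two URIs above (append <code>#</code> to the type if it has none, then <code>:</code> and the short name, then escape), simplified so that only <code>#</code> is escaped. Given two types, it can only offer two candidates.</p>

```python
MICRODATA_NS = "http://www.w3.org/1999/xhtml/microdata#"


def predicate_uri(item_type, name):
    """Sketch of the predicate rule in microdata's RDF mapping (as of 2011)."""
    if ":" in name:
        # Simplification: the spec checks for an absolute URL.
        return name
    s = item_type if "#" in item_type else item_type + "#"
    s += ":" + name
    # The spec %-escapes every character not allowed in an IRI fragment;
    # '#' is the only such character in these examples.
    return MICRODATA_NS + s.replace("#", "%23")


def candidate_predicates(item_types, name):
    """With several types there is no single answer, only candidates."""
    return [predicate_uri(t, name) for t in item_types]


types = ["http://schema.org/Event",
         "http://microformats.org/profile/hcalendar#vevent"]
for uri in candidate_predicates(types, "url"):
    print(uri)
```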

<h2>Guidance for Microdata Vocabularies</h2>

<p>Having short names for properties makes writing microdata simple. They are much easier on the fingers and the eye than URIs, and, because they are scoped by type rather than vocabulary, they can be given simple names while still being specified tightly in terms of the types on which they can appear and the values that they can take. For example, the <code>location</code> of an <code>Event</code> can be limited to a geographical place while the <code>location</code> of a <code>Click</code> can be specified in terms of a point on a web page, rather than having to use more complex property names like <code>placeLocation</code> and <code>pointLocation</code>.</p>

<p>This could be a particular advantage in large and wide-ranging vocabularies such as schema.org&#8217;s where it&#8217;s likely that at some point there will be a clash in meaning between properties with the same name for different things. (Though the flip side for schema.org is that it has lots of inherited properties which really do have the same meaning across subtypes.)</p>

<p>The biggest problem with short names arises when you want to provide data to different consumers that use different vocabularies for that data. My guess is that in real life, in many cases this won&#8217;t be an issue, and certainly microdata has been designed with that assumption. Realistically, the majority of websites will probably only care about embedding data in web pages to the extent that search engines will read it, and will therefore only use one vocabulary &#8212; schema.org&#8217;s. Where more than one vocabulary <em>is</em> used in the page, it may well be that they are used in different locations (eg OGP for Facebook in the head of a page, schema.org in the body), or to mark up data about completely different kinds of things.</p>

<p>However, if you&#8217;re a publisher who wants to provide data to multiple consumers who understand different vocabularies &#8212; search engines <em>and</em> browsers, as in the Lanyrd example above &#8212; and those consumers define what they will consume solely based on the <code>itemtype</code> of an item, then you&#8217;re going to have either to work around those consumers&#8217; behaviour as I described above, or to ask those consumers to change how they work.</p>

<p>The most promising direction I can see at the moment would be to ask consumers to define their vocabularies such that they include</p>

<ol>
<li>a property that is used to identify the in-vocabulary type of items whose <code>itemtype</code> is not in that vocabulary</li>
<li>defined URIs for properties that are equivalent to (and processed in the same way as) the short name properties for a given type</li>
</ol>

<p>The type-defining property <em>could</em> be <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</code>, with the value being the URI of the relevant type. However,</p>

<pre><code>&lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
      href="http://schema.org/Event"&gt;
</code></pre>

<p>is a lot more verbose than:</p>

<pre><code>&lt;meta itemprop="http://schema.org/type" content="Event"&gt;
</code></pre>

<p>so I imagine that the designers of microdata vocabularies would prefer to ask publishers to do the latter. On the other hand, if publishers are using multiple vocabularies, they might find it easier to use a consistent type-defining property across vocabularies; it&#8217;s hard to tell what the global usability payoffs might be here.</p>
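<p>Whichever type-defining property a vocabulary chooses, a consumer could accept both patterns and normalise them to type URIs. A Python sketch, assuming the hypothetical <code>http://schema.org/type</code> property takes schema.org short names:</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"


def declared_types(properties):
    """Collect type URIs from either type-defining property pattern.

    The rdf:type pattern carries full type URIs; the (hypothetical)
    http://schema.org/type pattern carries short names within schema.org.
    """
    found = list(properties.get(RDF_TYPE, []))
    found += [SCHEMA + short for short in properties.get(SCHEMA + "type", [])]
    return found


verbose = {RDF_TYPE: ["http://schema.org/Event"]}
concise = {SCHEMA + "type": ["Event"]}
print(declared_types(verbose) == declared_types(concise))  # True
```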

<p>Or microdata could standardise the pattern by adding an attribute (eg <code>itemkind</code>, <code>iteminherit</code>, <code>itemothertype</code>, <code>itemmixin</code>, I dunno) which would list additional types. These could be exposed within the DOM API (which would be a big advantage for in-page scripts) but not used in the interpretation of short-name properties.</p>
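<p>To illustrate the split between the DOM API and short-name interpretation, here is a Python sketch in which a hypothetical <code>mixins</code> list stands in for such an attribute. Type lookups match on the <code>itemtype</code> plus the additional types, roughly as <code>document.getItems()</code> matches items having all the types it is given, while short names are still interpreted against the <code>itemtype</code> alone.</p>

```python
def get_items(items, *wanted_types):
    """Roughly like document.getItems(): match items having all wanted types,
    looking at itemtype and at a hypothetical extra-types attribute."""
    wanted = set(wanted_types)
    return [item for item in items
            if wanted.issubset(set(item["type"]) | set(item.get("mixins", [])))]


def short_name_basis(item):
    """Short names are still interpreted against itemtype alone."""
    return item["type"][0] if item["type"] else None


conference = {"type": ["http://schema.org/Event/Conference"],
              "mixins": ["http://schema.org/Event"]}
print(len(get_items([conference], "http://schema.org/Event")))  # 1
print(short_name_basis(conference))  # http://schema.org/Event/Conference
```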

<p>Vocabularies that don&#8217;t support processing of items based on a type-defining property and property URIs are effectively indicating that they don&#8217;t anticipate being mixed with others that have <em>also</em> made the same assumption that they won&#8217;t be mixed with others. Currently, for example, schema.org and the vocabularies defined within the WHATWG microdata specification both make this assumption. Working with one vocabulary that makes that assumption for a particular type is fine; working with two in microdata is much harder.</p>
    ]]></content>
  </entry>
  <entry>
    <title>My Experience of Web Standards</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/160" />
    <id>http://www.jenitennison.com/blog/node/160</id>
    <published>2011-07-24T17:24:00+01:00</published>
    <updated>2011-07-26T18:18:44+01:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="xml" />
    <category term="html5" />
    <category term="microdata" />
    <category term="rdf" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>One of the things that&#8217;s been niggling at the back of my mind since the <a href="http://schema.org">schema.org</a> announcement is how small a role search engine results play in the wider data sharing efforts that I&#8217;m more familiar with in my work on <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>One of the things that&#8217;s been niggling at the back of my mind since the <a href="http://schema.org">schema.org</a> announcement is how small a role search engine results play in the wider data sharing efforts that I&#8217;m more familiar with in my work on <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.</p>

<!--break-->

<p>My day job (the one I actually get paid for) is web development. The site I spend most of my time and effort on is <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>. This deals with complex content (UK legislation) that has to be presented in multiple formats (users love PDFs of legislation). Our aim is to make the data as reusable as possible by third parties through good RESTful web architecture, and we want to use open standards and open source technologies as part of the <a href="http://www.cabinetoffice.gov.uk/resource-library/open-source-open-standards-and-re-use-government-action-plan">UK government&#8217;s general strategy</a>.</p>

<p>legislation.gov.uk is not a global website like Amazon or eBay, but it&#8217;s not small either: it covers 60,000 items of changing legislation, provides point-in-time views for many of them, and has more added every day. It&#8217;s one of the top ten most used UK Government websites, with 2 million visits (about 10-12 million page views) each month and typically about 120 requests/second during the active times of the day. Legislation might sound like a highly specialist interest, but if you <a href="http://twitter.com/search/legislation.gov.uk">search for legislation.gov.uk on Twitter</a> you&#8217;ll see it being referenced over and over by people who want to share what the law says.</p>

<p>I do not by any means claim that my experience is representative of the wider web. I know that there are large numbers of sites that deal only in data, not documents, and certainly not documents with the kind of rich semantic structure that legislation has. I offer the following discussion as a data point, partly because I can&#8217;t quite believe that legislation.gov.uk is <em>completely</em> unique in its requirements and partly because obviously my perspective on a bunch of issues arises from this experience.</p>

<h2>Technology Stacks</h2>

<p>Legislation items are complex, semi-structured documents. Their natural fit is XML (well, that&#8217;s not quite true &#8212; their natural fit would be something that allowed overlapping markup &#8212; but XML is the closest that we have). So we store them as XML in a native XML database, and we use an XML toolset to query them (XQuery) and to transform them (XSLT) into various formats, including rendering them as PDF (through XSL-FO).</p>

<p>Our next step for the development of the site involves looking at legislative effects. These form a graph: one item of legislation affects other items of legislation which may in turn affect other items and so on. There are all sorts of other links between items of legislation in terms of commencements, conferred powers and so on. Particularly because we already have well-thought-through URIs for legislation, the natural fit is to use RDF to represent this graph. We already offer a SPARQL endpoint for accessing some aspects of our data, but we expect to expand and develop this over the next few months and to use it, in much the same way as we use the XML database, both as a layer underneath the website and as a service exposed to reusers.</p>

<p>As a government site, we have fairly strict limits on what we can do within our web pages: we have to make sure that they&#8217;re accessible to everyone who wants to view them. We aren&#8217;t able to use technologies that are only available in the latest browsers, but that&#8217;s OK because with the kind of content we deal with, we don&#8217;t have to do anything fancy anyway. So we use pretty basic HTML, CSS and JavaScript, because that&#8217;s how you deliver content to end-users on the web (as well as exposing the underlying XML and RDF, to enable others to reuse the data).</p>

<p>In other words, we use three web stacks for delivering legislation.gov.uk:</p>

<ul>
<li>the XML stack, which is great for single-source publishing of documents that have more semantic structures than those supported by HTML</li>
<li>the RDF stack, which is well-suited for metadata about things that are identified by URIs</li>
<li>the HTML stack, which is absolutely necessary for delivering human-accessible content on the web</li>
</ul>

<p>Given this experience, what bemuses me is that the narrative around these technologies sometimes seems to be framed as an exclusive choice between them. For example, <a href="http://twitter.com/mattur/status/89331716430372864">@mattur asked</a>:</p>

<p style="text-align:center;">
  <a href="http://twitter.com/mattur/status/89331716430372864"><img src="/blog/files/mattur-tweet.jpg" alt="@gimsieke @JeniT how may TAG members believe RDF(a) and X(HT)ML are way forward? How many think they aren't?" /></a>
</p>

<p>It is as if, if you use XML, you <em>cannot</em> appreciate the utility of error-handling in HTML; or if you use RDF, you <em>cannot</em> understand the need to represent documents in XML; or if you want to utilise HTML fully, you <em>cannot</em> adopt RDF&#8217;s view of data on the web. That&#8217;s simply not my experience. They each have their role on the web; supporting the use of one does not necessitate rejecting the use of the others.</p>

<p>It&#8217;s interesting that some of the standards that are most reviled are those that arise at the intersections, where it appears that one technology is trying to encroach on the space of another:</p>

<ul>
<li>XHTML at the border of XML and HTML</li>
<li>RDF/XML at the border of RDF and XML</li>
<li>RDFa at the border of all three</li>
</ul>

<p>At the same time, within legislation.gov.uk, we publish XHTML (because it&#8217;s the natural output from an XML toolchain) and create and process RDF/XML (because it gives us access to that data from within the XML toolchain). We use a small bit of RDFa in the XHTML to indicate the rights under which our information is available. We don&#8217;t yet use RDFa within our XML, but we are thinking about using it there to mark up non-document semantics, so that the XML markup can focus on the document structures that it&#8217;s good at. For all their imperfections, these intersection technologies are useful for managing cross-overs; the problems arise when they overstep their remit and people start to think that <em>all</em> HTML must be XHTML or <em>all</em> XML must be RDF/XML or <em>all</em> RDF must be RDFa.</p>
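
<p>For illustration only, a rights statement like the one mentioned above can be expressed in XHTML+RDFa with a single <code>rel="license"</code> link. This is a minimal sketch, not the markup legislation.gov.uk actually serves, and the licence URL is a hypothetical placeholder.</p>

<pre><code>&lt;!-- hypothetical licence URL; "license" is a reserved XHTML vocabulary term in RDFa 1.0 --&gt;
&lt;p&gt;This information is available under the
  &lt;a rel="license" href="http://example.org/licence"&gt;Example Licence&lt;/a&gt;.&lt;/p&gt;</code></pre>

<p>Because the link has no <code>about</code> attribute, an RDFa 1.0 processor should take the page itself as the subject and generate a <code>http://www.w3.org/1999/xhtml/vocab#license</code> triple pointing at the licence. Browsers simply render an ordinary link.</p>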

<h2>Sharing Scenarios</h2>

<p>The second thing that I wanted to explore is the experience from legislation.gov.uk of what it&#8217;s like to be a publisher who actively wants to share their data. We need to operate simultaneously at three levels in our data sharing efforts.</p>

<h3>Large-Scale Consumer-Driven Data Sharing</h3>

<p>Our first target for data sharing is the search engines. Obviously we&#8217;re not selling anything, but we want people to be able to locate legislation easily when they want it, and we want people who have searched to be able to see some information about the legislation so that they know that they&#8217;ve located the right item.</p>

<p>This is large-scale consumer (search engine) driven data sharing, typified by schema.org and Facebook&#8217;s <a href="http://developers.facebook.com/docs/opengraph/">Open Graph Protocol</a> (OGP). There are a few very big data consumers (Google, Microsoft, Yahoo!, Facebook etc) who need to consume data from large numbers of data providers. These consumers obviously can&#8217;t understand <em>everything</em>, so they determine and document what syntaxes and vocabularies they <em>do</em> understand and expect publishers to follow.</p>

<p>The benefits that publishers get from a particular consumer determine which syntax/vocabulary they use; publishers who are particularly keen to show up prettily within search results will target schema.org whereas those who want to be sharable within Facebook will target OGP. Many publishers will want to target both. There is probably a driver towards eventual convergence:</p>

<ul>
<li>publishers might push back about inserting two lots of very similar data in their pages</li>
<li>consumers might want to include data from publishers who haven&#8217;t specifically targeted them</li>
</ul>

<p>although there&#8217;s likely to be a period where they coexist, much as there was for VHS and Betamax (and <a href="http://en.wikipedia.org/wiki/Video_2000">V2000</a>, I know, dad) during the early days of video players.</p>

<p>As <a href="http://www.jenitennison.com/blog/node/157">I discussed previously</a>, these large-scale consumers will be driven by the data that they find in the wild, in all its messy variety. They get relatively little benefit directly from using a generic <em>syntax</em>, as they are really interested in only a few, pretty generic, <em>vocabularies</em> for which they have hardwired processing. Indirectly, adopting a generic syntax has benefits: publishers might find it easier to find tools that enable them to generate it and tutorials about how to use it, and might feel less locked in to something proprietary. However, rejecting data that isn&#8217;t marked up properly using that syntax has no benefit for consumers except insofar as it makes them feel that they are being good community members.</p>

<p>This is the pattern we see with schema.org (which accepts microdata but, based on its documentation, won&#8217;t reject data that isn&#8217;t fully compliant with it) and with OGP (which accepts a subset of RDFa but doesn&#8217;t reject data that hasn&#8217;t got prefixes properly bound, for example).</p>
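
<p>A minimal sketch of the OGP case, using a hypothetical page title: the first document binds the <code>og</code> prefix as RDFa 1.0 requires, while the second leaves it unbound. Under RDFa 1.0&#8217;s CURIE rules, a strict processor should ignore the <code>og:title</code> in the second document, yet OGP consumers accept both.</p>

<pre><code>&lt;!-- prefix bound: valid RDFa 1.0 --&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#"&gt;
  &lt;head&gt;
    &lt;title&gt;An Example Act&lt;/title&gt;
    &lt;meta property="og:title" content="An Example Act" /&gt;
  &lt;/head&gt;
&lt;/html&gt;

&lt;!-- prefix unbound: no triple from a strict RDFa 1.0 processor --&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head&gt;
    &lt;title&gt;An Example Act&lt;/title&gt;
    &lt;meta property="og:title" content="An Example Act" /&gt;
  &lt;/head&gt;
&lt;/html&gt;</code></pre>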

<p>Another point to mention is that there is very little trust in this scenario. The communication between consumers and publishers is very limited, and the consumers will want to protect themselves against accidental or malicious errors that are evident in mismatches between explicit metadata and that which is parsed from the visible content of the page.</p>

<p>The parallels to HTML and browser vendors are very strong in this type of data sharing.</p>

<h3>Small-Scale Consumer-Driven Data Sharing</h3>

<p>A second type of data sharing is again driven by consumers, but this time on a much smaller and more specialised scale. For legislation.gov.uk, these are services such as <a href="http://www.glin.gov/">GLIN</a>, which is a global legislation registry. Other examples are the recent work that we&#8217;ve done to publish <a href="http://data.gov.uk/organogram">UK Government organograms</a> or <a href="http://countculture.wordpress.com/">Chris Taggart</a>&#8217;s <a href="http://openelectiondata.org/">Open Election Data</a> project. In these cases, there&#8217;s a single, relatively small and specialised consumer and a small number of publishers that coordinate closely with it.</p>

<p>As in the large-scale case, the consumer ultimately determines the syntax/vocabulary that it recognises, and communicates that to the publishers. However, small-scale consumers typically have close coordination with the publishers, which has a number of side-effects:</p>

<ul>
<li>consumers may be better able both to apply pressure to publishers and to help them do well in their markup</li>
<li>publishers have the opportunity to feed back directly to the consumer any suggestions that they have about changes to the syntax/vocabulary</li>
<li>publishers are likely to gain an immediate and tangible benefit from their cooperation, such as visualisations of their data that they otherwise wouldn&#8217;t have seen</li>
</ul>

<p>Another noteworthy point about small-scale consumers is that they&#8217;re unlikely to have the engineering capability to create a custom parser for a particular syntax, but will instead want to use something off-the-shelf to extract data from pages and into their own backend systems. This, coupled with the closer coordination with publishers, means that they&#8217;re much more likely to stick to a specification, assuming that the off-the-shelf tools do.</p>

<h3>Publisher-Driven Data Sharing</h3>

<p>The final type of data sharing is driven by publishers. At legislation.gov.uk, we&#8217;re motivated to make our data available for reuse for transparency/accountability reasons (to help citizens understand the law), efficiency reasons (to help parliament and government departments to publish new legislation better) and economic reasons (to foster innovation in legal publishing). We don&#8217;t have any individual consumers in mind when we publish our data, but have found that simply by publishing it well, we foster reuse.</p>

<p>In this case, we as publishers are highly motivated to ensure that the data we publish is easily parsed with something off-the-shelf, since that lowers the barrier for potential consumers. Publishers like us are very likely to have unique, specialised content and to need a vocabulary that fits closely with our internal data structures, as this lowers implementation costs. Consumers can also trust publishers like us: we simply have no motivation to lie in the data that we provide for reuse.</p>

<h2>Mixed Markup</h2>

<p>As I&#8217;ve outlined above, publishers like legislation.gov.uk need to target several potential consumers at the same time:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>We cannot use a single vocabulary for all these different purposes. (Well, we could write our own vocabulary and describe mappings to other vocabularies using RDFS, but search engines wouldn&#8217;t read it.)</p>

<p>We must therefore use a mix of vocabularies:</p>

<ul>
<li>generic vocabularies about things that search engines care about</li>
<li>specialised vocabularies for particular small consumers</li>
<li>site-specific vocabularies for sharing our unique data</li>
</ul>

<p>It&#8217;s repetitive, but it&#8217;s manageable so long as we have a syntax that enables us to say that an item of legislation is a <code>http://schema.org/CreativeWork</code> and a <code>http://purl.org/dc/dcmitype/Text</code> and a <code>http://www.legislation.gov.uk/def/legislation/Legislation</code> and allows us to give multiple properties the same value.</p>
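
<p>As a rough sketch of what that looks like in XHTML+RDFa 1.0, the fragment below declares all three types in one <code>typeof</code> attribute and gives two properties the same value through one <code>property</code> attribute. The item URI and title are hypothetical placeholders, and pairing <code>dc:title</code> with <code>schema:name</code> is just one plausible choice.</p>

<pre><code>&lt;div xmlns:schema="http://schema.org/"
     xmlns:dc="http://purl.org/dc/terms/"
     xmlns:dctype="http://purl.org/dc/dcmitype/"
     xmlns:leg="http://www.legislation.gov.uk/def/legislation/"
     about="http://example.org/id/legislation/1"
     typeof="schema:CreativeWork dctype:Text leg:Legislation"&gt;
  &lt;!-- one literal, two properties --&gt;
  &lt;h1 property="schema:name dc:title"&gt;An Example Act&lt;/h1&gt;
&lt;/div&gt;</code></pre>

<p>An RDFa 1.0 processor should produce three <code>rdf:type</code> triples for <code>http://example.org/id/legislation/1</code>, plus two triples giving it the literal &#8220;An Example Act&#8221;, one for each property.</p>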

<p>The way things are going at the moment, we might well end up having to use multiple <em>syntaxes</em> on the same page, as some consumers understand microdata, others consume RDFa, and still others will parse microformats. This leads to more repetition: adding <code>itemprop</code> for microdata, <code>property</code> for RDFa and specialised <code>class</code> attributes for microformats. But worse (much worse), each of the syntaxes uses a different parsing model to create an entity-property-value structure, so not only do we have to learn substantially different markup patterns but our pages quickly become some kind of hideous polyglot mess trying to balance between them.</p>
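
<p>A minimal sketch of that repetition, using the same hypothetical item: the heading carries <code>itemprop</code> for microdata and <code>property</code> for RDFa, and a microformat would add yet another <code>class</code> value. The two syntaxes also attach the value differently. By default, microdata assigns an <code>itemprop</code> to the nearest enclosing <code>itemscope</code> element, whereas RDFa assigns a <code>property</code> to the current subject, here set by <code>about</code>. As a result, each syntax needs its own scoping attributes on the same container.</p>

<pre><code>&lt;div itemscope="itemscope" itemtype="http://schema.org/CreativeWork"
     xmlns:dc="http://purl.org/dc/terms/"
     xmlns:dctype="http://purl.org/dc/dcmitype/"
     about="http://example.org/id/legislation/1"
     typeof="dctype:Text"&gt;
  &lt;!-- microdata reads itemprop; RDFa reads property --&gt;
  &lt;h1 itemprop="name" property="dc:title"&gt;An Example Act&lt;/h1&gt;
&lt;/div&gt;</code></pre>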

<h2>Looking Forward</h2>

<p>As I said at the start, I&#8217;m fairly sure that my experience at legislation.gov.uk isn&#8217;t representative of the wider web, but I don&#8217;t have a clear idea about just how unrepresentative it is, in terms of technology use or motivations around data sharing. When I read my Twitter stream or blogs, there&#8217;s a massive sampling bias, both in terms of who I follow and what I read and in terms of who talks about what they&#8217;re doing. (I&#8217;m reminded of <a href="http://www.codinghorror.com/blog/">Jeff Atwood</a>&#8217;s post on the <a href="http://www.codinghorror.com/blog/2007/11/the-two-types-of-programmers.html">Two Types of Programmers</a>: the vast majority of web developers don&#8217;t make a noise about what they do.)</p>

<p>Taking part in web standardisation today often feels like being part of an ongoing cold war between distinct camps rather than a community working towards common aims. The underlying question seems to be &#8220;whose side are you on?&#8221; Every decision and activity is cast as a victory or defeat. Time is wasted on attack and defence, or on raking over past slights and stupidities, rather than on progress. Valid criticism from outside a group cannot be listened to for fear of giving ground, and cannot be made within a group, where it seems like betrayal.</p>

<p>It is the <a href="http://en.wikipedia.org/wiki/Realistic_conflict_theory#The_Robbers_Cave_Experiment">Robbers Cave Experiment</a> played out in web standards. As a psychologist, I find it fascinating. As a developer, and particularly one who doesn&#8217;t self-identify with any single group, it is frustrating. As a TAG member, trying to work for the longer-term good of the web, it is worrying, because situations of intergroup conflict lead to <a href="http://en.wikipedia.org/wiki/Groupthink">groupthink</a> and non-optimal solutions.</p>

<p>As I described above, a non-optimal outcome seems to be the most likely result of the particular microdata vs RDFa conflict for us at legislation.gov.uk. While I know we are not generally representative, I believe that it will be similarly bad for other developers: publishers, consumers and tool implementers.</p>

<p>This is a problem for all who want to foster data sharing on the web using open standards; it is not one that any one group can fix on its own. It&#8217;s my hope that a balanced task force of individuals with a variety of experience and backgrounds can provide a focus for us all to work together to solve it. If we can&#8217;t, then we have let our prejudice and bias overcome our judgement.</p>
    ]]></content>
  </entry>
</feed>

