<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Jeni's Musings</title>
  <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog"/>
  <link rel="self" type="application/atom+xml" href="http://www.jenitennison.com/blog/atom/feed"/>
  <id>http://www.jenitennison.com/blog/atom/feed</id>
  <updated>2011-06-10T20:27:38+00:00</updated>
  <entry>
    <title>Microdata and RDFa Living Together in Harmony</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/165" />
    <id>http://www.jenitennison.com/blog/node/165</id>
    <published>2011-08-20T16:39:11+00:00</published>
    <updated>2011-08-24T21:01:01+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>One of the options that the TAG put forward when it <a href="http://lists.w3.org/Archives/Public/public-html/2011Jun/0366.html">asked the W3C to put together a task force on embedded data in HTML</a> was the co-existence of RDFa and microdata. If <a href="http://lists.w3.org/Archives/Public/www-tag/2011Aug/0050.html">that&#8217;s what we&#8217;re headed for</a>, what might make things easier for consumers and publishers who have to live in that world?</p>

<p>In a situation where there are two competing standards, I think that developers &#8212; both on the publication and consumption sides &#8212; are going to want to hedge their bets. They will want to avoid being tied to one syntax in case it turns out that that syntax isn&#8217;t supported by the majority of publishers/consumers in the long term and they have to switch.</p>

<p>Publishers like us at <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> who are aiming to share their data with whoever is interested in it (rather than having a particular consumer in mind) are also likely to want to publish in both microdata and RDFa, rather than force potential consumers to adopt a particular processing model, and will therefore need to mix the syntaxes within their pages.</p>

<p>(Of course developers might just avoid embedded data altogether while they wait to see what happens, but let&#8217;s assume that they want to press ahead regardless of the lack of consensus from the standardistas.)</p>

<p>I&#8217;ve therefore embarked on a task of trying:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. I&#8217;ve broken down the result into three posts:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This is the last of these posts. It is probably the only one you will want to read :)</p>

<!--break-->

<p>Please treat this as a draft on which I&#8217;d welcome comments. I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHATWG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. Plus the specs are changing all the time. Here I have only considered the syntax of the two languages, not features such as DOM APIs or drag-and-drop support, where there are also clear differences.</p>

<p>Please add comments if there are things that I&#8217;ve missed or got wrong, or just to have your say.</p>

<p><a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> and <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> have both proved invaluable for testing &#8212; thank you to both for making these services available. I heartily recommend them.</p>

<h2>Mapping Rules</h2>

<p>The first problem is how to judge equivalence when microdata and RDFa have different data models. Microdata essentially uses a JSON data model: there are objects (items) with properties that have values that are strings, other objects, or arrays of strings or objects or both. RDFa naturally uses an RDF data model: there are resources with properties that have values that are literals (of some datatype or with a language) or other resources.</p>

<p>Underlying both is the same basic entity-attribute-value pattern, but there are various mismatches between the models that make some mappings more complicated than others, or in other cases mean that information is necessarily lost on conversion.</p>

<p>In performing the analysis, I&#8217;ve tried to map microdata into sensible RDF and then match that RDF output using RDFa, and to map RDFa into sensible microdata+JSON and then match that microdata+JSON using microdata. The microdata-to-RDF mapping rules that I&#8217;ve followed are basically those outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>. To create microdata JSON from RDFa, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item.</p>

<p>These rules need to be formalised, obviously, but the basics above work well enough for the examples from the specs.</p>

<h2>Mismatched Features</h2>

<p>The following features are problematic when mapping from microdata to RDFa or vice versa. I&#8217;ve described them roughly in an order from things where it might be relatively easy to address the problem by changing one or other specification, to places where the necessary changes would be difficult to make in the specs, which means that publishers and consumers need to be aware of the issue so that they can make an educated choice about how they proceed.</p>

<h3>Local Property Names</h3>

<p>Many of the microdata examples involve items with no type and local property names. I&#8217;ve assumed in the analysis below that this generates properties whose URI is based on the document in which they are found, but this is not a helpful solution for data sharing: if a whole site uses short property names across its pages, those properties really need to be recognised as being the same across the site for any kind of useful processing to occur.</p>

<p>What microdata actually creates here is a global namespace, shared by everyone, specifically for embedded data. There are three things that could be done at different levels here:</p>

<ol>
<li><p>In a mapping from microdata to RDF, any short property names on items that don&#8217;t have a type could be assigned to a global namespace (eg <code>http://w3.org/ns/global/</code>). Of course there will be clashes in semantics within this namespace, but that is true in microdata generally and not having to create a new namespace makes the initial experimentation easier for those starting with embedded data. The W3C (or whoever operates the namespace) could operate a wiki at that location that would operate as an informal registry for the property names.</p></li>
<li><p>HTML+RDFa could change to use this global namespace as the default vocabulary URI (rather than not having one). This would make it a little easier for people to convert microdata to RDFa: if they don&#8217;t use types for their items, there would then be no need for a <code>vocab</code> attribute to be added to the HTML. It also makes it possible to use RDFa in a basic, lightweight way, which might help people get started with it.</p></li>
<li><p>Publishers can be advised to use <code>itemtype</code> within their microdata, reusing existing classes or creating their own, if they want to ensure that the embedded data within their pages isn&#8217;t misinterpreted by global consumers.</p></li>
</ol>
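
<p>To make the first two options concrete, here is a rough sketch. The <code>http://w3.org/ns/global/</code> namespace is only the proposal above, not something that exists, and the <code>name</code> and <code>band</code> property names are made up for illustration. An untyped microdata item such as:</p>

<pre><code>&lt;div itemscope&gt;
  &lt;span itemprop="name"&gt;Amanda&lt;/span&gt; plays in
  &lt;span itemprop="band"&gt;Jazz Band&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>would, under option 1, map to RDF properties <code>http://w3.org/ns/global/name</code> and <code>http://w3.org/ns/global/band</code>. The equivalent RDFa currently has to declare that vocabulary explicitly (this assumes RDFa 1.1 handling, where <code>about="_:amanda"</code> names a blank node); under option 2 the <code>vocab</code> attribute could simply be dropped:</p>

<pre><code>&lt;div vocab="http://w3.org/ns/global/" about="_:amanda"&gt;
  &lt;span property="name"&gt;Amanda&lt;/span&gt; plays in
  &lt;span property="band"&gt;Jazz Band&lt;/span&gt;
&lt;/div&gt;
</code></pre>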

<h3>Interpretation of <code>&lt;time&gt;</code> Element&#8217;s <code>datetime</code> Attribute</h3>

<p>Interpreting the <code>datetime</code> attribute of the <code>&lt;time&gt;</code> element to supply a value, rather than repeating that value in a <code>content</code> attribute, is <a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a> on RDFa, and hopefully RDFa will be changed to use that value (or the content of the element if there is no <code>datetime</code> attribute), add a seconds component if necessary, and work out an appropriate date/time datatype for it based on its syntax.</p>
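
<p>As an illustration of the gap (the <code>birthday</code> property name is hypothetical, and the RDFa fragment assumes a <code>vocab</code> is in scope), microdata takes the value straight from the <code>datetime</code> attribute:</p>

<pre><code>&lt;time itemprop="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;
</code></pre>

<p>whereas RDFa, as currently specified, needs the value repeated in <code>content</code> along with an explicit <code>datatype</code>:</p>

<pre><code>&lt;time property="birthday" content="2009-05-10" datatype="xsd:date" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;
</code></pre>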

<h3>Content Overrides</h3>

<p>In RDFa, publishers can provide a machine-readable version of the content of an element (or even an entirely different value) using the <code>content</code> attribute. This can only be done for date/times in microdata. The ability to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240">annotate non-date/time content with machine-readable values</a> is a current issue on HTML5. Resolving this in favour of providing such annotation would make using RDFa and microdata in concert, or converting between them, easier, particularly if HTML5 uses the attribute <code>content</code> or RDFa adopts whatever attribute is introduced to HTML5.</p>
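
<p>A sketch of the difference, using a made-up <code>price</code> property and assuming the fragments sit inside a resource or item: in RDFa the machine-readable value can sit on the same element as the human-readable text:</p>

<pre><code>&lt;span property="price" content="3.50"&gt;three pounds fifty&lt;/span&gt;
</code></pre>

<p>In microdata, the closest equivalent at the moment is to leave the visible text unannotated and supply the value through a separate <code>&lt;meta&gt;</code> element:</p>

<pre><code>&lt;span&gt;three pounds fifty&lt;/span&gt;
&lt;meta itemprop="price" content="3.50"&gt;
</code></pre>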

<h3><code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> Elements in Flow Content</h3>

<p>The ability to <a href="http://dev.w3.org/html5/md/Overview.html#content-models">use <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements in flow content</a> is only supported in microdata: it&#8217;s support that&#8217;s added by the microdata specification (in the Editor&#8217;s Draft since May 31st; the text allowing this didn&#8217;t make it into the Last Call version of the spec), in which it&#8217;s limited to <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements with an <code>itemprop</code> attribute. </p>

<p>It would be possible for the RDFa specification to similarly make the statement that <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> elements are allowed in flow content as long as they have particular attributes. This would ease the transition between the two formats, and works a lot better than empty <code>&lt;span&gt;</code> elements which crop up fairly commonly in RDFa content.</p>

<p>(One oddity here is that because date/time values have to be on a <code>&lt;time&gt;</code> element in microdata, publishers cannot replace empty <code>&lt;time&gt;</code> elements with <code>&lt;meta&gt;</code> elements as they might an empty <code>&lt;span&gt;</code>.)</p>

<h3>Identifiers without Types</h3>

<p>Many of the RDFa examples are of resources that have a URI identifier but for which no type is supplied. Microdata, on the other hand, states that <code>itemid</code> is only allowed on elements that also have an <code>itemtype</code> (and an <code>itemscope</code>). The reason given is that the <code>itemid</code> needs to be interpreted based on the <code>itemtype</code>. This would be understandable if it held a string, but given that the <code>itemid</code> provides a URI it seems a bit strange. Perhaps it&#8217;s an attempt to avoid the whole <a href="http://www.jenitennison.com/blog/node/159">httpRange-14 / ambiguity in URIs issue</a>.</p>

<p>If this restriction remains, the advice to RDFa users who might want to convert to microdata at a future date would be to always provide a type for your (non-blank-node) resources. It may be useful to define a <code>http://w3.org/ns/global/Thing</code> within the vocabulary that I propose above, given that the URI for <code>rdfs:Resource</code> is long and hard to recall.</p>
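
<p>For example, RDFa like the following (the <code>example.org</code> URI is illustrative) has an identifier but no type:</p>

<pre><code>&lt;div about="http://example.org/people/amanda"&gt;
  &lt;span property="http://xmlns.com/foaf/0.1/name"&gt;Amanda&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>To be valid microdata, the same data needs an <code>itemtype</code> alongside the <code>itemid</code>. This sketch uses the hypothetical <code>Thing</code> class proposed above, which doesn&#8217;t exist at present:</p>

<pre><code>&lt;div itemscope itemid="http://example.org/people/amanda" itemtype="http://w3.org/ns/global/Thing"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Amanda&lt;/span&gt;
&lt;/div&gt;
</code></pre>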

<h3>Built-in Prefixes</h3>

<p>The built-in <a href="http://www.w3.org/profile/rdfa-1.1">profile for RDFa</a> defines a number of prefixes for vocabularies that are either coined by the W3C or coined elsewhere but in common use on the web. This, coupled with <code>vocab</code> and the ability to directly use URIs in the relevant attributes, means that declaring prefixes within the document is increasingly unnecessary in RDFa.</p>

<p>In contrast, using existing vocabularies, even popular ones, within microdata is relatively difficult, particularly when vocabularies are mixed on the same item.</p>

<p>Most useful for publishers would be if both RDFa and microdata recognised the same set of prefixes. This would reduce the size of microdata created from existing RDFa content as well as making it easier to move between the languages. At the very least, it would be good to have <code>rdf:</code>, <code>rdfs:</code>, <code>xsd:</code> and <code>xhv:</code> built into both.</p>

<p>The list of popular vocabularies is likely to change over time; for example a prefix for the schema.org vocabulary might be useful at some point in the near future. The problem is that publishers and consumers need to be synchronised in their use of prefixes: it&#8217;s no good for a publisher to use the prefix <code>sch:</code> if there might be processors for the page that don&#8217;t recognise it. Equally, consumers shouldn&#8217;t be reliant on a network connection to retrieve the latest set of prefix mappings in order to parse the page. It&#8217;s not clear to me how best to manage this evolution, but even a fixed set of prefixes at the point the specs reach Recommendation is more usable than spelling out URIs all the time.</p>

<h3>Literals Including Markup</h3>

<p>RDFa supports literals that include markup (the <code>innerHTML</code> of an element) as well as those that don&#8217;t (the <code>textContent</code> of an element), whereas microdata only supports creating values from particular attributes or the <code>textContent</code> of the element. This makes it hard to create embedded microdata that includes values which contain things like mathematical or chemical formulae, ruby text, or multiple paragraphs.</p>

<p>A solution would be for microdata to introduce an <code>itemhtml</code> (or something) attribute that, when present, indicates that the value of the property should include markup. There is a current issue on microdata to <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13468">support HTML values</a>.</p>
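
<p>To illustrate what is lost, here is a minimal sketch with a made-up <code>formula</code> property (assuming a <code>vocab</code> is in scope for the RDFa). In RDFa, giving the property element the <code>rdf:XMLLiteral</code> datatype keeps the markup in the value:</p>

<pre><code>&lt;span property="formula" datatype="rdf:XMLLiteral"&gt;H&lt;sub&gt;2&lt;/sub&gt;O&lt;/span&gt;
</code></pre>

<p>The equivalent microdata takes the <code>textContent</code> of the element, so the value is just the string <code>H2O</code> and the subscript is gone:</p>

<pre><code>&lt;span itemprop="formula"&gt;H&lt;sub&gt;2&lt;/sub&gt;O&lt;/span&gt;
</code></pre>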

<h3>Itemref</h3>

<p>RDFa can support a subset of <code>itemref</code>&#8217;s functionality, namely to have properties defined elsewhere in a document be associated with a given resource. What it doesn&#8217;t support is the sharing of properties defined in one place by two or more resources.</p>

<p>RDFa could add such support by adding an attribute that mirrors <code>itemref</code> (eg <code>ref</code>, I guess), with the referenced element being processed using the <a href="http://www.w3.org/TR/rdfa-core/#evaluation-context">evaluation context</a> inherited by the referencing element (which means that attributes such as <code>vocab</code> would sometimes have a scope that wasn&#8217;t based on the document tree). This would make it easier to tackle the use case for <code>itemref</code> using RDFa as well as making it easier to move between or mix RDFa and microdata.</p>
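
<p>The sharing case looks something like this in microdata (a sketch loosely modelled on the kind of example in the microdata specification; the <code>work</code> and <code>license</code> property names and the file names are illustrative). Both items pick up the single <code>license</code> property from the paragraph they reference:</p>

<pre><code>&lt;figure itemscope itemref="licenses"&gt;
  &lt;img itemprop="work" src="images/house.jpeg" alt="A house"&gt;
&lt;/figure&gt;
&lt;figure itemscope itemref="licenses"&gt;
  &lt;img itemprop="work" src="images/mailbox.jpeg" alt="A mailbox"&gt;
&lt;/figure&gt;
&lt;p id="licenses"&gt;Both images are licensed under the
  &lt;a itemprop="license" href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT license&lt;/a&gt;.&lt;/p&gt;
</code></pre>

<p>In RDFa as it stands, the license statement would have to be repeated once for each image.</p>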

<h3>Lists</h3>

<p>It is easy for microdata to represent a property with a list of values, and really, really hard to do the same in RDFa. This is in part because RDF views lists as resources rather than as a distinct data type, and in part because RDFa hasn&#8217;t added any syntax sugar to make creating <code>rdf:List</code> resources easy. Adding some syntax sugar for lists would make life a lot easier for anyone using RDFa, but especially if they are adapting existing microdata content to RDFa.</p>
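
<p>To give a feel for the difference, here is a sketch with a made-up <code>member</code> property and a placeholder <code>http://example.org/vocab#</code> vocabulary. In microdata, repeating the property is enough, and the values come out in document order:</p>

<pre><code>&lt;div itemscope&gt;
  &lt;span itemprop="member"&gt;John&lt;/span&gt;,
  &lt;span itemprop="member"&gt;Paul&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>In RDFa, an ordered list has to be spelled out node by node with <code>rdf:first</code> and <code>rdf:rest</code>. This assumes RDFa 1.1 handling of blank node references such as <code>_:m1</code> in <code>about</code> and <code>resource</code>:</p>

<pre><code>&lt;div vocab="http://example.org/vocab#" about="_:band"&gt;
  &lt;span rel="member" resource="_:m1"&gt;&lt;/span&gt;
  &lt;span about="_:m1"&gt;
    &lt;span property="rdf:first"&gt;John&lt;/span&gt;,
    &lt;span rel="rdf:rest" resource="_:m2"&gt;&lt;/span&gt;
  &lt;/span&gt;
  &lt;span about="_:m2"&gt;
    &lt;span property="rdf:first"&gt;Paul&lt;/span&gt;
    &lt;span rel="rdf:rest" resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/span&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>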

<h3>Datatypes</h3>

<p>Microdata assumes that consumers will convert values to appropriate datatypes based on the property (which they understand) as a separate process after microdata processing, whereas RDFa supports the use of a <code>datatype</code> attribute to explicitly indicate the datatype of each value. This mismatch means that information is lost when RDFa is converted to microdata, and has to be added when microdata is converted to RDFa.</p>

<p>Bringing the languages completely into sync would mean either microdata adding a facility to support (at least some) datatypes, or deprecating the <code>datatype</code> attribute in RDFa. Alternatively, this may simply be an area where the differences in behaviour between the two specifications don&#8217;t matter because the data models that they use are distinct anyway.</p>
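
<p>A small sketch of the mismatch, using a hypothetical <code>age</code> property (and assuming a <code>vocab</code> is in scope for the RDFa). RDFa can state that the value is an integer:</p>

<pre><code>&lt;span property="age" datatype="xsd:integer"&gt;42&lt;/span&gt;
</code></pre>

<p>while the microdata version yields the plain string <code>42</code>, leaving it to consumers who know the property to interpret it as a number:</p>

<pre><code>&lt;span itemprop="age"&gt;42&lt;/span&gt;
</code></pre>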

<h3>Languages</h3>

<p>Languages are similar to datatypes, in that RDF (and hence RDFa) supports annotating strings with the language that they are in whereas microdata doesn&#8217;t within its core data model or its JSON serialisation. However, the elements that represent properties within the HTML, used within the DOM API access to microdata, will have a language.</p>

<p>It may be that in practice consumers need to base their microdata processing on the DOM API rather than the core microdata data model or JSON extracted through a standalone process, and thus pick up the language from the property elements; I don&#8217;t know. In any case, the microdata JSON serialisation, used for drag-and-drop, is lossy and could be extended to include the language of each value when available, at fairly substantial complexity cost.</p>

<p>For publishers, it doesn&#8217;t much matter either way; if they are dealing with multi-lingual text they will want to include a <code>lang</code> attribute in the HTML anyway, regardless of the impact on embedded data.</p>
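
<p>As a sketch (the <code>greeting</code> property is made up, and a <code>vocab</code> is assumed to be in scope for the RDFa), the markup is nearly identical in both languages:</p>

<pre><code>&lt;span property="greeting" lang="fr"&gt;Bonjour&lt;/span&gt;
&lt;span itemprop="greeting" lang="fr"&gt;Bonjour&lt;/span&gt;
</code></pre>

<p>The difference is in what comes out: the RDFa produces a literal tagged as French, whereas the microdata data model and JSON carry only the string <code>Bonjour</code>, with the language recoverable only by going back to the element in the DOM.</p>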

<h3>Multiple Types</h3>

<p>RDFa supports having multiple types named in the <code>typeof</code> attribute whereas microdata only supports one type per item. In any mapping from RDFa to microdata, publishers have to choose which type is the primary type for the item and move the others to be expressed via <code>rdf:type</code> properties. Consumers who want to support publishers who might not choose their type as the primary type have to detect items that have the type they are interested in within the <code>rdf:type</code> property as well as those which have the type as the main type. Given that the <code>rdf:type</code> URI is long and (naturally) associated with RDF, it might be better to define a property such as <code>http://w3.org/ns/global/type</code> for this use.</p>

<p>Microdata could be extended to allow multiple values in the <code>itemtype</code> attribute, with the first being used to interpret any properties that aren&#8217;t full URIs. This would make it easier both for consumers to detect when a type they were interested in was used and for publishers to use RDFa and microdata in tandem or move between them.</p>
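
<p>As a sketch of the current workaround (the <code>example.org</code> URI is illustrative), an RDFa resource with two types:</p>

<pre><code>&lt;div about="http://example.org/people/amanda"
  typeof="http://schema.org/Person http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span property="http://schema.org/name"&gt;Amanda&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>has to become a microdata item with one primary type and the other expressed as an <code>rdf:type</code> property:</p>

<pre><code>&lt;div itemscope itemid="http://example.org/people/amanda" itemtype="http://schema.org/Person"&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" href="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;
&lt;/div&gt;
</code></pre>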

<h3>The <code>src</code> Attribute</h3>

<p>RDFa and microdata interpret the <code>src</code> attribute in opposite ways. In RDFa, it provides the identifier for a new resource (equivalent to <code>itemid</code> in microdata); in microdata, it provides a URL value of a property on elements that support it (equivalent to <code>resource</code> or <code>href</code> in RDFa).</p>

<p>RDFa interprets <code>src</code> in this way to make it easier to make assertions about an image, but it&#8217;s of limited effect as even in RDFa it&#8217;s only possible to make three such assertions (through the <code>typeof</code>, <code>rel</code> and <code>property</code> attributes). So, for example, you can specify the type of the image, link to its license and give the name of its creator, with:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"&gt;
</code></pre>

<p>but this won&#8217;t help you if you <em>also</em> want to give the title for the image and when it was created (say). At that point, the microdata and RDFa start to look similar:</p>

<pre><code>&lt;div itemscope itemid="photo1.jpg" itemtype="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;link itemprop="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;
  &lt;time itemprop="http://purl.org/dc/terms/created" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg" typeof="foaf:Image"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"&gt;
&lt;/div&gt;
</code></pre>

<p>and really, to make the markup consistent, you may as well not use the <code>src</code> of the image at all in the RDFa either:</p>

<pre><code>&lt;div about="photo1.jpg" typeof="http://xmlns.com/foaf/0.1/Image"&gt;
  &lt;span rel="license" href="http://creativecommons.org/licenses/by/2.0/"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/creator" content="Mark Birbeck"&gt;&lt;/span&gt;
  &lt;span property="http://purl.org/dc/terms/title" content="Picture of Mark"&gt;&lt;/span&gt;
  &lt;time property="http://purl.org/dc/terms/created" content="2009-03-17" datatype="xsd:date" datetime="2009-03-17"&gt;&lt;/time&gt;
  &lt;img src="photo1.jpg"&gt;
&lt;/div&gt;
</code></pre>

<p>So it&#8217;s not clear to me that interpreting the <code>src</code> attribute as the subject of triples offers such a huge advantage that it&#8217;s worth the inconvenience that it brings for the simple things, such as having to use:</p>

<pre><code>&lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
</code></pre>

<p>rather than:</p>

<pre><code>&lt;img property="image" src="google-logo.png" alt="Google"&gt;
</code></pre>

<h3>Link relations</h3>

<p>This isn&#8217;t so much a clash between RDFa and microdata as between the interpretation that RDFa has for the <code>rel</code> attribute and that specified in HTML.</p>

<p>The built-in <code>rel</code> values in HTML are a bit of a mix. Some of them, like <code>alternate</code>, <code>prev</code> and <code>next</code> encode relationships between the document in which the link appears and another document. Others, such as <code>bookmark</code> and <code>help</code>, create relationships between the context in which the link is found and the referenced document. Still others, like <code>nofollow</code>, <code>noreferrer</code> and <code>prefetch</code>, are really instructions to the client about how to manage the act of traversing the link.</p>

<p>It doesn&#8217;t seem semantically correct to automatically create relationships based on the built-in HTML <code>rel</code> values, unless you are deliberately trying to extract <a href="http://lin-clark.com/blog/two-meanings-semantics-html5"><em>document</em> semantics</a> from the page. This is a problem for RDFa, which reuses the <code>rel</code> attribute to provide property values for the embedded <em>data</em>.</p>

<p>One thing that could be done would be for RDFa to consistently use the <code>property</code> attribute everywhere rather than the <code>rel</code> attribute. This would not only ease the overloading but also reduce the confusion for users, who currently have to work out which attribute to use based on whether the value is a resource or a literal.</p>
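
<p>A sketch of the kind of clash I mean, with illustrative URIs and assuming <code>next</code> is among the link relations that the RDFa processor recognises. The pagination link here is meant as document navigation, but because it sits inside the description of a person it would typically be read as a statement about that person:</p>

<pre><code>&lt;div about="http://example.org/people/amanda"&gt;
  &lt;span property="http://xmlns.com/foaf/0.1/name"&gt;Amanda&lt;/span&gt;
  &lt;a rel="next" href="http://example.org/people?page=2"&gt;Next page&lt;/a&gt;
&lt;/div&gt;
</code></pre>

<p>If <code>property</code> were used for all embedded data, <code>rel</code> could be left to mean whatever HTML says it means.</p>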

<h2>Possible Subset of RDFa</h2>

<p>When mapping from microdata to RDFa, the only attributes that are really needed are:</p>

<ul>
<li><code>vocab</code> to define a vocabulary for the types and properties within its scope (not technically necessary, but keeps the markup simple compared to spelling out URIs for everything)</li>
<li><code>typeof</code> to define the type of a resource or indicate a new blank node</li>
<li><code>about</code> to provide a URI for a resource or a local identifier for a blank node</li>
<li><code>property</code> and <code>rel</code> to define property names (though see above for discussion about dropping <code>rel</code>)</li>
<li><code>href</code>, <code>src</code> and <code>content</code> to provide values (and <code>datetime</code> assuming that is supported)</li>
</ul>

<p>In the mappings in the analysis below, I did also use the <code>resource</code> attribute, but only to create a reference to a blank node that was described elsewhere, when replicating the functionality of <code>itemref</code>. If RDFa were to enable <code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> in content in the same way as microdata, <code>resource</code> functionality could be replicated using <code>&lt;link&gt;</code>; as it is, you can get away with using an empty <code>&lt;a&gt;</code> element.</p>

<p>Similarly, I only used <code>datatype</code> when providing a datatype for date/time values, something that could be done automatically by RDFa. But this isn&#8217;t surprising given that microdata doesn&#8217;t support datatypes at all and the examples I was using for the mapping were from the microdata specification.</p>

<p>There was no need for:</p>

<ul>
<li><code>prefix</code> which defines prefixes to simplify references to properties and classes; this is hardly surprising as few of the microdata examples involved mixing namespaces, but it&#8217;s notable that the built-in prefixes of <code>rdf:</code> and <code>xsd:</code> were useful</li>
<li><code>profile</code> which is a pointer to an external document that defines a set of terms; this is being dropped from RDFa in any case</li>
</ul>

<p>I also kept to a simplified version of the syntax in which each property element only provided one value. This subset is basically:</p>

<ul>
<li>resource elements can have <code>about</code> (equivalent to <code>itemid</code>) and <code>typeof</code> (equivalent to <code>itemtype</code>) attributes on them</li>
<li>property elements can have <code>property</code> or <code>rel</code> (equivalent to <code>itemprop</code>), and a value-providing attribute on them such as <code>href</code> or <code>content</code></li>
<li>no element is both a resource element and a property element; to provide a property whose value is a resource, nest the resource element within the property element (using &#8220;hanging rel&#8221; processing)</li>
<li>no property element should provide more than one value for a property; in particular, a &#8220;hanging rel&#8221; should only have a single resource element child</li>
</ul>

<p>This simplified profile of RDFa is fairly easy to remember and maps easily to and from microdata: most attributes can be simply renamed; the only attribute that needs to be moved as well as renamed is the &#8220;hanging rel&#8221;, which moves onto the resource element and is renamed to <code>itemprop</code>. Note that it also means avoiding using the <code>src</code> attribute to encode embedded data.</p>
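
<p>Here is a sketch of the subset in use, with schema.org types and an illustrative <code>example.org</code> identifier. The RDFa keeps resource elements and property elements separate, with a &#8220;hanging rel&#8221; for the nested resource:</p>

<pre><code>&lt;div vocab="http://schema.org/" about="http://example.org/people/amanda" typeof="Person"&gt;
  &lt;span property="name"&gt;Amanda&lt;/span&gt;
  &lt;div rel="address"&gt;
    &lt;div typeof="PostalAddress"&gt;
      &lt;span property="addressLocality"&gt;London&lt;/span&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>The microdata equivalent renames the attributes and moves the <code>address</code> property from the wrapper onto the nested item:</p>

<pre><code>&lt;div itemscope itemid="http://example.org/people/amanda" itemtype="http://schema.org/Person"&gt;
  &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;
  &lt;div&gt;
    &lt;div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress"&gt;
      &lt;span itemprop="addressLocality"&gt;London&lt;/span&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>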

<p>In addition to sticking to this subset of attributes, developers might be advised that using HTML link relations may lead to clashes with browser or search engine interpretation of the links in the page.</p>

<h2>Possible Subset of Microdata</h2>

<p>Microdata is pretty minimalistic already. The only feature that developers need to be warned about is <code>itemref</code>, which has no RDFa equivalent at the moment.</p>

<h2>Guidelines for Vocabulary Authors</h2>

<p>There are several guidelines that come out of this comparison for people putting together vocabularies that aim to be usable in both RDFa and microdata:</p>

<ul>
<li>The classes in the vocabulary should be distinct, or subclasses created with any relevant combinations of superclasses, so that publishers don&#8217;t have to assign more than one type to an item/resource. This restriction helps with using the vocabulary with microdata, which assumes that every item has a single type.</li>
<li>Provide explicit classes for everything which you anticipate might be given an identifier, as microdata doesn&#8217;t (currently) enable items to have an identifier without also having a type.</li>
<li>Put classes and properties in the same namespace, but do not name classes and properties with the same local name; while this doesn&#8217;t matter in microdata because the properties are interpreted relative to the class, standard conversions to RDF will create a class and a property with the same URI. URIs are case-sensitive, so a simple way of ensuring that there aren&#8217;t clashes is to follow the usual RDF convention of beginning class names with an upper-case letter and property names with a lower-case letter.</li>
<li>Avoid property names that contain dots, as these aren&#8217;t allowed in non-URI property names in microdata.</li>
<li>Ensure that properties either only expect one type of value or expect values whose type can be sniffed based on the syntax of the value. If publishers use microdata, they will not be able to indicate the type of a value through the markup.</li>
<li>Be aware that consumers of microdata using your vocabulary will have to use the DOM API to identify the language used in any strings, and that language information won&#8217;t be carried through the standard microdata JSON serialisation (used by drag-and-drop, for example). If you anticipate multi-lingual use of your vocabulary, you may want to define a <code>MultiLingual</code> class with <code>value</code> and <code>language</code> properties that people can use as nested items. (It may be useful for this class and properties to be defined in the proposed &#8216;global&#8217; W3C namespace so that it can be used anywhere.) If you know what languages will be used then provide separate properties for each language (eg for UK legislation I know the languages are English and Welsh, so in a vocabulary for UK legislation I could have <code>title-en</code> and <code>title-cy</code> properties).</li>
<li>To make markup cleaner, only reuse properties from other vocabularies on your classes if they have built-in prefixes (eg unless <code>rdfs:</code> is built-in to microdata as well as RDFa, don&#8217;t use <code>rdfs:label</code> to provide a label, but create your own <code>label</code> property). On the other hand, do reuse classes from other vocabularies if you don&#8217;t need to add any specialised properties to them. Note that avoiding reuse has the unfortunate side-effect of not enabling processors that understand these other vocabularies to process your data.</li>
<li>Avoid having properties whose values need to be retrieved in order, as these are hard to represent in RDFa. Instead, use properties with distinct names when position is important. (Yes, I know this sucks.)</li>
</ul>
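
<p>To make the multi-lingual guideline a bit more concrete, here&#8217;s a sketch of the two options in microdata. The <code>http://example.org/legislation#</code> and <code>http://example.org/global#</code> namespaces and the <code>Act</code> class are placeholders I&#8217;ve invented for illustration; only the <code>MultiLingual</code>, <code>value</code>, <code>language</code>, <code>title-en</code> and <code>title-cy</code> names come from the guideline itself:</p>

<pre><code>&lt;!-- Option 1: nested MultiLingual items carry the language as data --&gt;
&lt;div itemscope itemtype="http://example.org/legislation#Act"&gt;
  &lt;span itemprop="title" itemscope itemtype="http://example.org/global#MultiLingual"&gt;
    &lt;span itemprop="value" lang="en"&gt;Welsh Language Act 1993&lt;/span&gt;
    &lt;meta itemprop="language" content="en" /&gt;
  &lt;/span&gt;
  &lt;span itemprop="title" itemscope itemtype="http://example.org/global#MultiLingual"&gt;
    &lt;span itemprop="value" lang="cy"&gt;Deddf yr Iaith Gymraeg 1993&lt;/span&gt;
    &lt;meta itemprop="language" content="cy" /&gt;
  &lt;/span&gt;
&lt;/div&gt;

&lt;!-- Option 2: one property per known language --&gt;
&lt;div itemscope itemtype="http://example.org/legislation#Act"&gt;
  &lt;span itemprop="title-en" lang="en"&gt;Welsh Language Act 1993&lt;/span&gt;
  &lt;span itemprop="title-cy" lang="cy"&gt;Deddf yr Iaith Gymraeg 1993&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>In both cases the language survives into the microdata JSON, because it is held in a property value or a property name rather than only in the <code>lang</code> attribute.</p>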

<h2>Choosing Between Microdata and RDFa</h2>

<p>The choices developers make between microdata and RDFa will, I suspect, be largely dictated by what their consumers/toolsets/publishers will support. Nevertheless, there are some features that are better supported by one or other format and might therefore sway developers one way or another:</p>

<ul>
<li><strong>multi-lingual embedded data</strong> is better supported in RDFa than microdata+JSON</li>
<li><strong>explicit datatypes for values</strong> can be provided by RDFa but not microdata</li>
<li><strong>resources with multiple types</strong> are a lot easier to describe in RDFa</li>
<li><strong>property values that include markup</strong> are a lot easier to write in RDFa</li>
<li><strong>mixed vocabulary use</strong> is a bit easier in RDFa than in microdata</li>
<li><strong>HTML5 link relations</strong> may be misinterpreted by RDFa processors</li>
<li><strong>properties with list values</strong> are much easier to support in microdata</li>
<li><strong>common content</strong> adopted by multiple entities is much easier in microdata</li>
</ul>
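
<p>As a reminder of what the first three of these look like in practice, here&#8217;s a small RDFa sketch. The <code>ex:</code> vocabulary and its <code>ex:Party</code> class are invented for the example, and I&#8217;ve declared all the prefixes explicitly rather than relying on any being built in:</p>

<pre><code>&lt;p prefix="cal: http://www.w3.org/2002/12/cal/ical# ex: http://example.org/vocab# xsd: http://www.w3.org/2001/XMLSchema#"
   about="#bbq" typeof="cal:Vevent ex:Party"&gt;
  I'm holding
  &lt;span property="cal:summary" lang="en"&gt;one last summer barbecue&lt;/span&gt;,
  on
  &lt;span property="cal:dtstart" content="2015-09-16T16:00:00-05:00"
        datatype="xsd:dateTime"&gt;September 16th at 4pm&lt;/span&gt;.
&lt;/p&gt;
</code></pre>

<p>The second type, the language of the summary and the <code>xsd:dateTime</code> datatype all end up in the RDF. In microdata, the extra type and the datatype have nowhere to go in the markup, and the language is only available through the DOM.</p>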

<h2>Final Words</h2>

<p>I have no doubt that developers would be better off if there were only one recommended way of embedding data in HTML (so long as it met their requirements of course). But realistically that is, and always has been, a long shot, given the entrenched positions of the microdata and RDFa communities.</p>

<p>Regardless, there are lessons that RDFa and microdata could learn from each other, and changes to both languages that would help developers use them on their own, switch between them and mix them in the same document. I expect and welcome debate about the viability and effectiveness of the changes and guidelines that I&#8217;ve suggested here.</p>

<p>Investigating those lessons, documenting those changes and generating those guidelines was something that I had hoped the microdata/RDFa task force would be able to do. The other question to ask, given the argument that there shouldn&#8217;t be a task force at all if it&#8217;s not going to be able to bring the languages together, is whether this kind of analysis is worthwhile, and worth publishing as something more official than a blog post?</p>
    ]]></content>
  </entry>
  <entry>
    <title>Mapping RDFa to Microdata</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/164" />
    <id>http://www.jenitennison.com/blog/node/164</id>
    <published>2011-08-20T16:38:38+00:00</published>
    <updated>2011-08-20T16:38:38+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata.  In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and aim to create that JSON with the microdata.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the second of these, which looks at how RDFa might be mapped to microdata.  In this case, I will aim to express the RDF created from the RDFa as the equivalent microdata JSON, and aim to create that JSON with the microdata.</p>

<!--break-->

<p>To create the microdata JSON, I&#8217;ve used the rule that the URI of the first type of a resource is processed to provide a namespace that is stripped from the URIs of the properties (to create simple names where possible). In addition, when a resource has no properties, it will be represented as a string (URI) value rather than as a nested item. Other than that I hope the mapping will be obvious; I&#8217;ll point out where it involves a loss of information. I&#8217;m assuming that the document is at <code>http://example.org/</code> throughout.</p>

<p>I have based what&#8217;s written here on the latest specifications of both microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHAT WG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants) and <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a> but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://foolip.org/microdatajs/live/">Philip Jägenstedt&#8217;s Live Microdata service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the RDFa specification plus one additional example from the wild. I haven&#8217;t included examples that don&#8217;t illustrate anything new, so there are some that are skipped. Other examples would be welcome.</p>

<h2>Page Metadata</h2>

<blockquote>
  <p>When parsing begins, the current subject will be the IRI of the document being parsed, or a value as set by a Host Language-provided mechanism (e.g., the base element in (X)HTML). This means that by default any metadata found in the document will concern the document itself:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this document that we&#8217;d want to create is (<strong>note: invalid example</strong>):</p>

<pre><code>{ "items": [
  {
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>and we&#8217;d want to create it with (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>However, this is not valid according to the microdata specification. In microdata, <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#attr-itemid">only items that have types are allowed to have identifiers</a>. Rather than losing the identifier, we&#8217;ll add a type; I&#8217;m going to use <code>rdfs:Resource</code>. It&#8217;s not the nicest of URIs to type, but it&#8217;s got something close to the correct semantics. So we&#8217;ll aim for the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>which means we need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The <code>itemscope</code> is necessary for the page to be recognised as containing any data at all.</li>
<li>The <code>itemid</code> can&#8217;t just be empty: the <code>.</code> is the shortest URI you can use to reference the page itself.</li>
<li>I put the <code>itemscope</code>, <code>itemtype</code> and <code>itemid</code> on the <code>&lt;head&gt;</code> element rather than the <code>&lt;html&gt;</code> element so that they wouldn&#8217;t be inherited into the <code>&lt;body&gt;</code>: it seems to make sense for any data within the <code>&lt;head&gt;</code> to be about the page itself.</li>
<li>The <code>foaf:</code> and <code>dc:</code> prefixes are built-in to RDFa, so it&#8217;s easy for people to use classes and properties in those common vocabularies without having to remember their full URI. In microdata, that URI and the one for the <code>rdfs:Resource</code> class have to be spelled out in full.</li>
</ul>

<h2>Base URI</h2>

<blockquote>
  <p>In (X)HTML the value of base may change the initial value of current subject:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This changes the id of the item generated:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.org/jo/blog" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://www.example.org/jo/blog#bbq" ] ,
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  }
]}
</code></pre>

<p>In the microdata, the <code>itemid</code> can still be <code>.</code> as the base URI set by the <code>&lt;base&gt;</code> element is used to resolve it:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;base href="http://www.example.org/jo/blog" /&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    ...
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<h2>Explicit Subjects / ItemIds</h2>

<blockquote>
  <p>To illustrate how this affects the statements, note in this markup how the properties inside the (X)HTML body element become part of a new calendar event object, rather than referring to the document as they do in the head of the document:</p>

<pre><code>&lt;html prefix="cal: http://www.w3.org/2002/12/cal/ical#"&gt;
  &lt;head&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link rel="foaf:primaryTopic" href="#bbq" /&gt;
    &lt;meta property="dc:creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p about="#bbq" typeof="cal:Vevent"&gt;
      I'm holding
      &lt;span property="cal:summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;span property="cal:dtstart" content="2015-09-16T16:00:00-05:00" 
            datatype="xsd:dateTime"&gt;
        September 16th at 4pm
      &lt;/span&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>In microdata JSON, this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/primaryTopic": [ "http://example.org/#bbq" ],
      "http://purl.org/dc/terms/creator": [ "Jo" ]
    }
  } ,
  {
    "type": "http://www.w3.org/2002/12/cal/ical#Vevent" ,
    "id": "http://example.org/#bbq" ,
    "properties": {
      "summary": [ "one last summer barbecue" ] ,
      "dtstart": [ "2015-09-16T16:00:00-05:00" ]
    }
  }
]}
</code></pre>

<p>Note that this mapping loses the fact that the value of the <code>dtstart</code> property is a date-and-time. Processors of this JSON are expected to know that the <code>dtstart</code> property takes a date/time value and would have to sniff the value to work out that it&#8217;s a date-and-time rather than a date.</p>

<p>In-browser microdata processors can identify the value as a date/time value because the property element itself is accessed through the <code>element.properties</code> IDL attribute; processors that work with this DOM API can tell that it&#8217;s a <code>&lt;time&gt;</code> element, get hold of the date/time itself and access the content of the element for the human-readable representation used on the page. However, this information isn&#8217;t part of the core <a href="http://www.w3.org/TR/microdata/#the-microdata-model">microdata data model</a>.</p>

<p>To create this JSON from microdata you need:</p>

<pre><code>&lt;html&gt;
  &lt;head itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="."&gt;
    &lt;title&gt;Jo's Friends and Family Blog&lt;/title&gt;
    &lt;link itemprop="http://xmlns.com/foaf/0.1/primaryTopic" href="#bbq" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Jo" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p itemscope itemid="#bbq" itemtype="http://www.w3.org/2002/12/cal/ical#Vevent"&gt;
      I'm holding
      &lt;span itemprop="summary"&gt;
        one last summer barbecue
      &lt;/span&gt;,
      on
      &lt;time itemprop="dtstart" datetime="2015-09-16T16:00:00-05:00"&gt;
        September 16th at 4pm
      &lt;/time&gt;.
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are no prefix definitions in microdata, so the type has to be spelled out in full. However, with the mapping I&#8217;m assuming from RDFa to microdata JSON, the properties in that same namespace for items in that class don&#8217;t have to be: short names will do.</li>
<li>The <code>itemscope</code> has to be added despite the <code>&lt;p&gt;</code> element having both an <code>itemid</code> and an <code>itemtype</code>; if the <code>itemscope</code> is forgotten, the item isn&#8217;t recognised.</li>
<li>The original <code>&lt;span&gt;</code> element has to be changed to a <code>&lt;time&gt;</code> element because it isn&#8217;t conformant microdata for a date/time value to be supplied by any other element.</li>
</ul>

<h2>Items from the <code>src</code> Attribute</h2>

<blockquote>
  <p>If @about is not present, then @src is next in priority order, for setting the subject of a statement. A typical use would be to indicate the licensing type of an image:</p>

<pre><code>&lt;img src="photo1.jpg" rel="license" 
     resource="http://creativecommons.org/licenses/by/2.0/" /&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ]
    }
  }
]}
</code></pre>

<p>The <code>src</code> attribute in microdata is only used for a value, so creating the microdata about the image means a wrapper <code>&lt;span&gt;</code> element and a separate <code>&lt;link&gt;</code> element:</p>

<pre><code> &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
   &lt;img src="photo1.jpg" /&gt;
   &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
 &lt;/span&gt;
</code></pre>

<p>Note:</p>

<ul>
<li>The <code>license</code> property is part of the built-in set of link relationships in HTML, but there is no easy way to refer to such properties from microdata; they have to be spelled out as full URLs.</li>
</ul>

<h2>Additional Properties for Images</h2>

<blockquote>
  <p>Since there is no difference between @src and @about, then the information expressed in the last example in the section on @about (the creator of an image), could be expressed as follows:</p>

<pre><code>&lt;img src="photo1.jpg"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:creator" content="Mark Birbeck"
/&gt;
</code></pre>
</blockquote>

<p>This is a simple additional property in the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/photo1.jpg" ,
    "properties": {
      "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

<p>which can be created through a <code>&lt;meta&gt;</code> element within the <code>&lt;span&gt;</code>:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="photo1.jpg"&gt;
  &lt;img src="photo1.jpg" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
&lt;/span&gt;
</code></pre>

<h2>Nested Images</h2>

<blockquote>
  <p>Since normal chaining rules will apply, the image IRI can also be used to complete hanging triples:</p>

<pre><code>&lt;div about="http://www.blogger.com/profile/1109404" rel="foaf:img"&gt;
  &lt;img src="photo1.jpg"
    rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
    property="dc:creator" content="Mark Birbeck"
  /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.blogger.com/profile/1109404" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/img": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://example.org/photo1.jpg" ,
        "properties": {
          "http://www.w3.org/1999/xhtml/vocab#license": [ "http://creativecommons.org/licenses/by/2.0/" ] ,
          "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>The microdata to generate this is:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://www.blogger.com/profile/1109404"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/img" 
        itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
        itemid="photo1.jpg"&gt;
    &lt;img src="photo1.jpg" /&gt;
    &lt;link itemprop="http://www.w3.org/1999/xhtml/vocab#license" href="http://creativecommons.org/licenses/by/2.0/" /&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The big gotcha in this conversion is that in microdata, the <code>foaf:img</code> property has to be moved onto the item that is a value of that property; there&#8217;s no equivalent to the &#8220;hanging rel&#8221; processing in RDFa. A disadvantage of this is that anyone copying-and-pasting the <code>&lt;span&gt;</code> element to embed the same information about the image within their own page will have the <code>itemprop</code> attribute carried along with the image, into a context where the <code>foaf:img</code> property might not be relevant.</li>
</ul>

<h2>Types with Blank Nodes</h2>

<blockquote>
  <p>For example, an author may wish to create markup for a person using the FOAF vocabulary, but without having a clear identifier for the item:</p>

<pre><code>&lt;div typeof="foaf:Person"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="foaf:givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>Now we have an explicit type, we can create microdata JSON that uses short names:</p>

<pre><code>{ "items": [
  {
    "type": "http://xmlns.com/foaf/0.1/Person" ,
    "properties": {
      "name": [ "Albert Einstein" ] ,
      "givenName": [ "Albert" ]
    }
  }
]}
</code></pre>

<p>This can be generated with the microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
  &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span itemprop="givenName"&gt;Albert&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>which is nice and simple.</p>

<h2>Inherited Subject</h2>

<blockquote>
  <p>The most usual way that an inherited subject might get set would be when the parent statement has an object that is a resource. Returning to the earlier example, in which the long name for the German_Empire was added, the following markup was used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire"
    property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata JSON for this would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [ "http://dbpedia.org/resource/German_Empire" ]
    }
  } , {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/German_Empire" ,
    "properties": {
      "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
    }
  }
]}
</code></pre>

<p>Note that this microdata JSON could only be generated syntactically from the RDFa, not via RDF, because going via RDF would make it impossible to know whether to give the <code>dbp:birthPlace</code> property a string (which is a URI) value or a nested item. We&#8217;ll see the alternative version of the microdata JSON in the next example.</p>

<p>To create this microdata JSON, we need:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;link itemprop="http://dbpedia.org/property/birthPlace" href="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
        itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>I&#8217;ve had to change two elements here: the <code>&lt;span&gt;</code> for the date of birth has become a <code>&lt;time&gt;</code> element as the value of the property is a date, and the <code>&lt;div&gt;</code> for the birth place has become a <code>&lt;link&gt;</code> element because the value of that property is a URL.</li>
<li>I&#8217;ve also had to add a nested <code>&lt;span&gt;</code> element as it&#8217;s not possible in microdata to have a single element describe both an item and a property for that item as it is in RDFa.</li>
</ul>

<blockquote>
  <p>In an earlier illustration the subject and object for the German Empire were connected by removing the @resource, relying on the @about to set the object:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire"
      property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>While this generates the same RDF as the previous example, the microdata JSON that it generates should probably be different: this time, the item for the German Empire is nested within the item for Albert Einstein:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
      "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
      "http://dbpedia.org/property/birthPlace": [{
        "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
        "id": "http://dbpedia.org/resource/German_Empire" ,
        "properties": {
          "http://dbpedia.org/property/conventionalLongName": [ "the German Empire" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>To create this, the microdata needs to look like:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
  &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;div itemprop="http://dbpedia.org/property/birthPlace" 
       itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"  
       itemid="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span itemprop="http://dbpedia.org/property/conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Note that while this looks quite similar to the RDFa version, in fact the <code>itemid</code> attribute that holds the URI for the German Empire is on a different element from the <code>about</code> attribute in the RDFa.</p>

<p>The third RDFa example around this same content is:</p>

<blockquote>
  <p>but it is also possible for authors to achieve the same effect by removing the @about and leaving the @resource:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp:birthPlace" resource="http://dbpedia.org/resource/German_Empire"&gt;
    &lt;span property="dbp:conventionalLongName"&gt;the German Empire&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should lead to the same microdata JSON, so I won&#8217;t bother repeating the microdata. What&#8217;s interesting is that this pattern, in which the wrapper element carries both the property (<code>rel</code>) and the identifier for the item that is the value of that property (<code>resource</code>), is a lot closer to the microdata pattern for expressing nested items. The big distinction is that while in microdata the <code>itemtype</code> also resides on that element, if you tried adding a <code>typeof</code> attribute to the inner <code>&lt;div&gt;</code> in RDFa, you&#8217;d end up with a new blank node.</p>

<h2>Anonymous Nested Resources</h2>

<blockquote>
  <p>However, an author could just as easily say that Spinoza influenced something by the name of Albert Einstein, that was born on March 14th, 1879:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;div&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate the microdata JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>which again means moving an attribute in microdata:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced"
       itemscope&gt;
    &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>It is generally harder to move to microdata from RDFa when the RDFa has an element that both provides a subject and provides a property.</p>

<p>The RDFa spec provides a couple of additional methods of marking up the same content to give exactly the same RDF (and microdata JSON):</p>

<blockquote>
  <p>Note that the div is superfluous, and an RDFa Processor will create the intermediate object even if the element is removed:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
&lt;/div&gt;
</code></pre>
  
  <p>An alternative pattern is to keep the div and move the @rel onto it:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
    &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
  
  <p>From the point of view of the markup, this latter layout is to be preferred, since it draws attention to the &#8216;hanging rel&#8217;. But from the point of view of an RDFa Processor, all of these permutations need to be supported.</p>
</blockquote>

<p>Interestingly, it&#8217;s this latter permutation that is the one that&#8217;s closest to the microdata method of expressing the data, though as we will see in the next section, the &#8220;hanging rel&#8221; is not exactly equivalent to the <code>itemprop</code> on the wrapper element.</p>

<h2>Hanging Rels</h2>

<blockquote>
  <p>Note that each occurrence of @about will complete any incomplete triples. For example, to mark up the fact that Albert Einstein had a residence both in the German Empire and Switzerland, an author need only specify one @rel value that is then used with multiple @about values:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Albert_Einstein" rel="dbp-owl:residence"&gt;
  &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
  &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The data embedded here gives two values for the <code>dbp-owl:residence</code> property:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Albert_Einstein" ,
    "properties": {
      "http://dbpedia.org/ontology/residence": [
        "http://dbpedia.org/resource/German_Empire" ,
        "http://dbpedia.org/resource/Switzerland"
      ]
    }
  }
]}
</code></pre>

<p>In microdata, the <code>itemprop</code> attribute has to appear on both the nested elements to make it clear that they both provide values for that property:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" 
     itemid="http://dbpedia.org/resource/Albert_Einstein"&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence" 
       href="http://dbpedia.org/resource/German_Empire" /&gt;
 &lt;link itemprop="http://dbpedia.org/ontology/residence"
       href="http://dbpedia.org/resource/Switzerland" /&gt;
&lt;/div&gt;
</code></pre>

<p>The next example illustrates this with nested items rather than strings:</p>

<blockquote>
  <p>To illustrate, to indicate that Spinoza influenced both Einstein and Schopenhauer, the following markup could be used:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div rel="dbp-owl:influenced"&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
    &lt;/div&gt;
    &lt;div typeof="foaf:Person"&gt;
      &lt;span property="foaf:name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1788-02-22&lt;/span&gt;
    &lt;/div&gt;          
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should generate:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ]
        }
      }, {
        "type": "http://xmlns.com/foaf/0.1/Person" ,
        "properties": {
          "name": [ "Arthur Schopenhauer" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1788-02-22" ]
        }
      }]
    }
  }
]}
</code></pre>

<p>In this case, the <code>itemprop</code> that is equivalent to the RDFa <code>rel</code> has to move down onto the elements representing the items:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;/div&gt;
    &lt;div itemprop="http://dbpedia.org/ontology/influenced"
         itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
      &lt;span itemprop="name"&gt;Arthur Schopenhauer&lt;/span&gt;
      &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1788-02-22&lt;/time&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>The wrapper <code>&lt;div&gt;</code> around both items isn&#8217;t necessary; I&#8217;ve left it to stay as close to the markup of the original RDFa as possible.</p>
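
<p>For comparison, here is a sketch of the same microdata with the wrapper removed. Since the wrapper carries no microdata attributes, this should give the same JSON:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced"
       itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
    &lt;span itemprop="name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
  &lt;/div&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced"
       itemscope itemtype="http://xmlns.com/foaf/0.1/Person"&gt;
    &lt;span itemprop="name"&gt;Arthur Schopenhauer&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1788-02-22&lt;/time&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>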

<h2>Implicit Resources</h2>

<blockquote>
  <p>Triples are also &#8216;completed&#8217; if any one of @property, @rel or @rev are present. However, unlike the situation when @about or @typeof are present, all predicates are attached to one bnode:</p>

<pre><code>&lt;div about="http://dbpedia.org/resource/Baruch_Spinoza" rel="dbp-owl:influenced"&gt;
  &lt;span property="foaf:name"&gt;Albert Einstein&lt;/span&gt;
  &lt;span property="dbp:dateOfBirth" datatype="xsd:date"&gt;1879-03-14&lt;/span&gt;
  &lt;div rel="dbp-owl:residence"&gt;
    &lt;span about="http://dbpedia.org/resource/German_Empire" /&gt;
    &lt;span about="http://dbpedia.org/resource/Switzerland" /&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>To be equivalent to the RDF generated from this markup, the microdata JSON would be:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://dbpedia.org/resource/Baruch_Spinoza" ,
    "properties": {
      "http://dbpedia.org/ontology/influenced": [{
        "properties": {
          "http://xmlns.com/foaf/0.1/name": [ "Albert Einstein" ] ,
          "http://dbpedia.org/property/dateOfBirth": [ "1879-03-14" ] ,
          "http://dbpedia.org/ontology/residence": [
            "http://dbpedia.org/resource/German_Empire" ,
            "http://dbpedia.org/resource/Switzerland"
          ]
        }
      }]
    }
  }
]}
</code></pre>

<p>Microdata is a lot more explicit about when items get created, and consequently requires a bit more markup:</p>

<pre><code>&lt;div itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
     itemid="http://dbpedia.org/resource/Baruch_Spinoza"&gt;
  &lt;div itemprop="http://dbpedia.org/ontology/influenced" itemscope&gt;
    &lt;span itemprop="http://xmlns.com/foaf/0.1/name"&gt;Albert Einstein&lt;/span&gt;
    &lt;time itemprop="http://dbpedia.org/property/dateOfBirth"&gt;1879-03-14&lt;/time&gt;
    &lt;div&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence" 
           href="http://dbpedia.org/resource/German_Empire" /&gt;
     &lt;link itemprop="http://dbpedia.org/ontology/residence"
           href="http://dbpedia.org/resource/Switzerland" /&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<h2>Overriding Text Content</h2>

<blockquote>
  <p>The value of @content is given precedence over any element content, so the following would give exactly the same triple as shown above:</p>

<pre><code>&lt;span about="http://internet-apps.blogspot.com/"
      property="dc:creator" content="Mark Birbeck"&gt;John Doe&lt;/span&gt;
</code></pre>
</blockquote>

<p>The equivalent microdata should generate the JSON:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://internet-apps.blogspot.com/" ,
    "properties": {
      "http://purl.org/dc/terms/creator": [ "Mark Birbeck" ]
    }
  }
]}
</code></pre>

<p>Only the <code>&lt;time&gt;</code> element and links override the content of an element in microdata. So a mirror of this example needs a separate element:</p>

<pre><code>  &lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
        itemid="http://internet-apps.blogspot.com/"&gt;
    &lt;meta itemprop="http://purl.org/dc/terms/creator" content="Mark Birbeck" /&gt;
    John Doe
  &lt;/span&gt;
</code></pre>

<h2>Language Support</h2>

<blockquote>
  <p>In RDFa the Host Language may provide a mechanism for setting the language tag. In XHTML+RDFa [XHTML-RDFA], for example, the XML language attribute @xml:lang or the attribute @lang is used to add this information, whether the plain literal is designated by @content, or by the inline text of the element:</p>

<pre><code>&lt;meta about="http://example.org/node"
  property="ex:property" xml:lang="fr" content="chat" /&gt;
</code></pre>
</blockquote>

<p>Like the datatype of a value, the language of a value isn&#8217;t captured by the microdata data model or the JSON representation of that data model. So the fact that &#8216;chat&#8217; is French is lost:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://example.org/node" ,
    "properties": {
      "http://example.org/property": [ "chat" ]
    }
  }
]}
</code></pre>

<p>The equivalent microdata is thus:</p>

<pre><code>&lt;span itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource"
      itemid="http://example.org/node"&gt;
  &lt;meta itemprop="http://example.org/property" xml:lang="fr" content="chat" /&gt;
&lt;/span&gt;
</code></pre>

<p>with the language only accessible if you are using the DOM to process the microdata.</p>

<h2>Literals that Include Markup</h2>

<blockquote>
  <p>RDFa therefore supports the use of normal markup to express XML literals, by using @datatype:</p>

<pre><code>&lt;h2 property="dc:title" datatype="rdf:XMLLiteral"&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
&lt;/h2&gt;
</code></pre>
</blockquote>

<p>The <code>datatype="rdf:XMLLiteral"</code> acts like a flag to indicate that the serialised content of the element (<code>innerHTML</code>) needs to be used as the value of the property, rather than the <code>textContent</code>. The resulting value, which includes markup, can be expressed in microdata JSON as follows:</p>

<pre><code>{ "http://purl.org/dc/terms/title": "E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time" }
</code></pre>

<p>There&#8217;s no way to generate this in microdata except by repeating the escaped version of the content in a <code>content</code> attribute:</p>

<pre><code>&lt;h2&gt;
  E = mc&lt;sup&gt;2&lt;/sup&gt;: The Most Urgent Problem of Our Time
  &lt;meta itemprop="http://purl.org/dc/terms/title"
    content="E = mc&amp;lt;sup&gt;2&amp;lt;/sup&gt;: The Most Urgent Problem of Our Time" /&gt;
&lt;/h2&gt;
</code></pre>

<p>This is hardly ideal. It&#8217;s tedious enough with a short string like this one; for larger amounts of content, such as a long description of an event, it would be far worse.</p>

<h2>The <code>resource</code> Attribute</h2>

<blockquote>
  <p>RDFa provides the @resource attribute as a way to set the object of statements. This is particularly useful when referring to resources that are not themselves navigable links:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote about="#q1" rel="dc:source" resource="urn:ISBN:0140449132" &gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This should produce:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2000/01/rdf-schema#Resource" ,
    "id": "http://www.example.com/candp.xhtml#q1" ,
    "properties": {
      "http://purl.org/dc/terms/source": [ "urn:ISBN:0140449132" ]
    }
  }
]}
</code></pre>

<p>which is expressed through:</p>

<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;On Crime and Punishment&lt;/title&gt;
    &lt;base href="http://www.example.com/candp.xhtml" /&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;blockquote itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="#q1"&gt;
      &lt;link itemprop="http://purl.org/dc/terms/source" href="urn:ISBN:0140449132" /&gt;
      &lt;p id="q1"&gt;
        Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!
      &lt;/p&gt;
    &lt;/blockquote&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>The property and value have to be moved onto a nested <code>&lt;link&gt;</code>, but this is a more extensible pattern than the RDFa method as it enables other properties to be expressed in the same way.</p>
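
<p>As a sketch of what that extensibility might look like, here is the same <code>&lt;blockquote&gt;</code> with a second statement added. The <code>dc:language</code> property and its value are my own additions for illustration and aren&#8217;t part of the original example:</p>

<pre><code>&lt;blockquote itemscope itemtype="http://www.w3.org/2000/01/rdf-schema#Resource" itemid="#q1"&gt;
  &lt;link itemprop="http://purl.org/dc/terms/source" href="urn:ISBN:0140449132" /&gt;
  &lt;meta itemprop="http://purl.org/dc/terms/language" content="en" /&gt;
  &lt;p id="q1"&gt;
    Rodion Romanovitch! My dear friend! If you go on in this way
    you will go mad, I am positive! Drink, pray, if only a few drops!
  &lt;/p&gt;
&lt;/blockquote&gt;
</code></pre>

<p>Each further property is just another <code>&lt;link&gt;</code> or <code>&lt;meta&gt;</code> child, whereas the RDFa <code>rel</code>/<code>resource</code> pair on the <code>&lt;blockquote&gt;</code> itself can only carry the one statement.</p>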

<h2>Multiple Types</h2>

<p>This last example comes from the wild rather than being an example in the specification. At <a href="http://bitmunk.com/browse">http://bitmunk.com/browse</a> we find:</p>

<pre><code>&lt;span about="http://bitmunk.com/about#service" 
      typeof="vcard:VCard commerce:Business gr:BusinessEntity" 
      property="rdfs:label vcard:fn"&gt;Bitmunk&lt;/span&gt;
</code></pre>

<p>This shows the use of multiple types and of multiple properties with the same value, because the pages are attempting to use multiple vocabularies that cover the same domain (organisations) to different depths. In the equivalent microdata, we have to choose one of the types; I&#8217;m going to assume that it should just use the first one from the <code>typeof</code> attribute:</p>

<pre><code>{ "items": [
  {
    "type": "http://www.w3.org/2006/vcard/ns#VCard" ,
    "id": "http://bitmunk.com/about#service" ,
    "properties": {
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
        "http://purl.org/commerce#Business" ,
        "http://purl.org/goodrelations/v1#BusinessEntity"
      ] ,
      "http://www.w3.org/2000/01/rdf-schema#label": [ "Bitmunk" ] ,
      "fn": [ "Bitmunk" ]
    }
  }
]}
</code></pre>

<p>The microdata equivalent is:</p>

<pre><code>&lt;span itemscope itemid="http://bitmunk.com/about#service" 
      itemtype="http://www.w3.org/2006/vcard/ns#VCard"&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/commerce#Business" /&gt;
  &lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
       href="http://purl.org/goodrelations/v1#BusinessEntity" /&gt;
  &lt;span itemprop="http://www.w3.org/2000/01/rdf-schema#label fn"&gt;Bitmunk&lt;/span&gt;
&lt;/span&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>Technically, the RDFa doesn&#8217;t place any ordering on the three classes, but I&#8217;m picking the first for the purpose of the microdata conversion. The other classes are harder to get at in the JSON: they have to be referenced via the <code>rdf:type</code> microdata property rather than the <code>type</code> JSON property. Consumers that are on the lookout for items of the type <code>gr:BusinessEntity</code> wouldn&#8217;t spot these items.</li>
</ul>
    ]]></content>
  </entry>
  <entry>
    <title>Mapping Microdata to RDFa</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/163" />
    <id>http://www.jenitennison.com/blog/node/163</id>
    <published>2011-08-20T16:35:28+00:00</published>
    <updated>2011-08-22T14:50:57+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>This post is part of a three-part series that analyses the differences in features and syntax between microdata and RDFa. The series attempts:</p>

<ul>
<li>to identify the differences in approach and functionality of the two languages, which should help developers choose between them</li>
<li>to identify any guidelines for developers of vocabularies for use with both languages</li>
<li>to identify a subset of functionality that is common between the two languages, which developers might want to stick to to make switching and mixing easier</li>
<li>to identify mapping rules that might be applied to automatically or manually map from one language to another if the simple subset is used</li>
</ul>

<p>I&#8217;ve done this by looking at converting microdata examples to RDFa and vice versa, and the lessons to be drawn from that exercise. The three posts are on:</p>

<ul>
<li><a href="http://www.jenitennison.com/blog/node/163">converting microdata to RDFa</a></li>
<li><a href="http://www.jenitennison.com/blog/node/164">converting RDFa to microdata</a></li>
<li><a href="http://www.jenitennison.com/blog/node/165">lessons learned from this exercise</a></li>
</ul>

<p>This post is the first of these, which looks at how microdata might be mapped to RDFa, in terms of generating the same RDF according to the microdata-to-RDF mapping rules that I outlined in my post on <a href="http://www.jenitennison.com/blog/node/162">Microdata + RDF</a>.</p>

<!--break-->

<p>I have based what&#8217;s written here on the latest specifications of microdata (in its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">WHATWG</a> and <a href="http://dev.w3.org/html5/md/Overview.html">W3C</a> variants), <a href="http://www.w3.org/2010/02/rdfa/drafts/2011/ED-rdfa-core-20110814/">RDFa Core</a> and <a href="http://dev.w3.org/html5/rdfa/">HTML+RDFa</a>, but I haven&#8217;t consulted with anyone involved in these efforts and may well have got things wrong. <a href="http://rdf.greggkellogg.net/distiller">Gregg Kellogg&#8217;s Distiller service</a> has proved invaluable for testing, so many thanks to him for providing that service.</p>

<p>The post is rather heavy going and you might want to just <a href="http://www.jenitennison.com/blog/node/165">skip to the summary</a> instead of reading the whole thing.</p>

<p>The post goes through the examples from the microdata specification (most of them are in both versions, the only exceptions being those that use the vCard vocabulary). I haven&#8217;t included examples that don&#8217;t illustrate anything new, so there are some that are skipped. Other examples would be welcome.</p>

<h2>Unidentified Items / Blank Node Subjects</h2>

<blockquote>
  <p>Here there are two items, each of which has the property &#8220;name&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div itemscope&gt;
 &lt;p&gt;My name is &lt;span itemprop="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>The first challenge in mapping this into RDFa is that the properties are tokens rather than URIs and there is no type for either of the items. What I&#8217;ll assume here is that the <code>name</code> properties are local to the document itself and thus the equivalent RDF is:</p>

<pre><code>[ &lt;#name&gt; "Elizabeth" ] .
[ &lt;#name&gt; "Daniel" ] .
</code></pre>

<p>This can be achieved in RDFa through either:</p>

<pre><code>&lt;div vocab="#" about="_:elizabeth"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" about="_:daniel"&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>or:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;

&lt;div vocab="#" typeof&gt;
  &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>The <code>vocab="#"</code> sets the vocabulary to the location of the current document (plus an empty fragment identifier); this URI is then concatenated to the property token (<code>name</code>) to create a URI that is unique to the document. In a document such as this it would make sense to put the <code>vocab="#"</code> attribute on the <code>&lt;html&gt;</code> element rather than on every single item.</li>
<li>With no type in sight, blank nodes can either be created by having an empty <code>typeof</code> attribute or through an <code>about</code> attribute whose value starts with <code>_:</code>. The latter has the advantage of providing an identifier for the blank node that can be used elsewhere in the document, but the former is shorter so will be used where possible in the remaining examples of this post.</li>
</ul>
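
<p>As a sketch of the first note, moving the <code>vocab="#"</code> attribute up to the <code>&lt;html&gt;</code> element should give the same RDF, because the default vocabulary is inherited by descendant elements (the surrounding page structure here is just for illustration):</p>

<pre><code>&lt;html vocab="#"&gt;
  &lt;body&gt;
    &lt;div typeof&gt;
      &lt;p&gt;My name is &lt;span property="name"&gt;Elizabeth&lt;/span&gt;.&lt;/p&gt;
    &lt;/div&gt;

    &lt;div typeof&gt;
      &lt;p&gt;My name is &lt;span property="name"&gt;Daniel&lt;/span&gt;.&lt;/p&gt;
    &lt;/div&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>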

<h2>Values from the <code>src</code> Attribute</h2>

<p>The next example introduces the use of the <code>src</code> attribute to set the value of the property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;image&#8221;, whose value is a URL:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;img itemprop="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should probably be mapped to the RDF:</p>

<pre><code>[ &lt;#image&gt; &lt;google-logo.png&gt; ] .
</code></pre>

<p>The difficulty with this is that in RDFa, the <code>src</code> attribute is used for the <em>subject</em> of a statement (equivalent to a microdata item) rather than the <em>object</em> (equivalent to a microdata value). So we have two choices for equivalent RDFa. One is to use a similar pattern to that used above, but introduce a wrapper element that provides the property:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span rel="image"&gt;&lt;img src="google-logo.png" alt="Google"&gt;&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<p>Another is to provide what would normally be an <em>object</em> through a <code>resource</code> attribute and then use a <code>rev</code> attribute (rather than the usual <code>rel</code>) to reverse the relationship:</p>

<pre><code>&lt;div vocab="#"&gt;
 &lt;img resource="_:thing" rev="image" src="google-logo.png" alt="Google"&gt;
&lt;/div&gt;
</code></pre>

<p>This has three disadvantages over the first option:</p>

<ul>
<li>the <code>resource</code> attribute that creates the item is on the <code>&lt;img&gt;</code> element rather than on the wrapper <code>&lt;div&gt;</code> which makes it hard to create other properties for that item</li>
<li>we have to use a <code>rev</code> attribute, reversing the normal flow of relationships; I (at least) find this hard to figure out when there&#8217;s not a <code>rel</code> attribute as well</li>
<li><ins>we have to make up an id for the blank node we want to generate</ins></li>
</ul>

<p>I&#8217;ll note that it took me five or six failed attempts to generate the above options. If I hadn&#8217;t had the <a href="http://rdf.greggkellogg.net/distiller">RDF Distiller</a> to test with, I would have got it wrong. <del>Note that at least through the RDF Distiller, to be recognised, the <code>resource</code> attribute has to have an (empty) value &#8212; it is not enough for it to simply be present, unlike with the <code>typeof</code> attribute.</del> <ins>Note that the <code>resource</code> attribute has to explicitly point to a blank node to create a blank node rather than having the property be associated with the document in which this appears.</ins></p>

<h2>Values from the <code>datetime</code> Attribute</h2>

<p>The next example illustrates the use of the <code>&lt;time&gt;</code> element to provide a date/time value for a property.</p>

<blockquote>
  <p>In this example, the item has one property, &#8220;birthday&#8221;, whose value is a date:</p>

<pre><code>&lt;div itemscope&gt;
 I was born on &lt;time itemprop="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>
</blockquote>

<p>I&#8217;m assuming this should map to the RDF:</p>

<pre><code>[ &lt;#birthday&gt; "2009-05-10"^^&lt;http://www.w3.org/2001/XMLSchema#date&gt; ]
</code></pre>

<p>There is an open issue (<a href="http://www.w3.org/2010/02/rdfa/track/issues/97">ISSUE-97</a>) about this on RDFa, which currently requires the use of the <code>content</code> attribute to provide the value as follows:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" content="2009-05-10" datatype="xsd:date" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Note that the <code>xsd:</code> prefix is built-in within RDFa so there&#8217;s no need for any declaration for it, which makes it fairly easy to specify the standard date/time datatypes.</p>

<p>If ISSUE-97 were resolved nicely it would be possible to instead do:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 I was born on &lt;time property="birthday" datetime="2009-05-10"&gt;May 10th 2009&lt;/time&gt;.
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>To make this work, RDFa processors would have to look at the syntax of the <code>datetime</code> attribute to work out what datatype the value should be matched to.</li>
<li>The syntax permitted in the <code>datetime</code> attribute isn&#8217;t exactly the same as that permitted by the XML Schema <code>time</code> and <code>dateTime</code> types usually used in RDF (and XML), in that the seconds component is optional within HTML. The resolution to ISSUE-97 will need to take this into account. Otherwise, anyone mapping from microdata to RDFa manually will need to ensure that the <code>content</code> attribute includes the seconds component.</li>
</ul>
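
<p>To illustrate the second note, here is a sketch using a made-up <code>start</code> property (it isn&#8217;t from the microdata specification). HTML accepts <code>16:35</code> as a time in the <code>datetime</code> attribute, but the XML Schema <code>time</code> type requires the seconds, so they have to be spelled out in the <code>content</code> attribute:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 The meeting starts at &lt;time property="start" content="16:35:00" datatype="xsd:time" datetime="16:35"&gt;4:35pm&lt;/time&gt;.
&lt;/div&gt;
</code></pre>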

<h2>Nested Items / Object Properties</h2>

<blockquote>
  <p>In this example, the outer item represents a person, and the inner one represents a band:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span itemprop="band" itemscope&gt; &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt; (&lt;span itemprop="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>
  
  <p>The outer item here has two properties, &#8220;name&#8221; and &#8220;band&#8221;. The &#8220;name&#8221; is &#8220;Amanda&#8221;, and the &#8220;band&#8221; is an item in its own right, with two properties, &#8220;name&#8221; and &#8220;size&#8221;. The &#8220;name&#8221; of the band is &#8220;Jazz Band&#8221;, and the &#8220;size&#8221; is &#8220;12&#8221;.</p>
</blockquote>

<p>The equivalent RDF for this example would be:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
]
</code></pre>

<p>Note that the <code>size</code> property is just a plain literal value; unlike with date/times, there&#8217;s no way to tell from the microdata that the value is a number.</p>

<p>In RDFa this could be done with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>This follows the microdata fairly closely but note that the nested resource doesn&#8217;t need an empty <code>typeof</code> attribute: it&#8217;s only the top-level items that do. It might be easier, for consistency and extensibility, to always include an explicit nested element (with an empty <code>typeof</code> attribute in this case) to represent the nested resource:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Band: &lt;span rel="band"&gt;&lt;span typeof&gt; &lt;span property="name"&gt;Jazz Band&lt;/span&gt; (&lt;span property="size"&gt;12&lt;/span&gt; players)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>The other thing that people have to watch out for is that because the value of the <code>band</code> property is a resource rather than a literal, we have to use the <code>rel</code> attribute rather than the <code>property</code> attribute as we do elsewhere.</p>

<h2>Itemref</h2>

<blockquote>
  <p>This example is the same as the previous one, but all the properties are separated from their items:</p>

<pre><code>&lt;div itemscope id="amanda" itemref="a b"&gt;&lt;/div&gt;
&lt;p id="a"&gt;Name: &lt;span itemprop="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div id="b" itemprop="band" itemscope itemref="c"&gt;&lt;/div&gt;
&lt;div id="c"&gt;
 &lt;p&gt;Band: &lt;span itemprop="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span itemprop="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should create the same RDF as the previous example:</p>

<pre><code>[
  &lt;#name&gt; "Amanda" ;
  &lt;#band&gt; [
    &lt;#name&gt; "Jazz Band" ;
    &lt;#size&gt; "12"
  ]
]
</code></pre>

<p>The goal is to produce this while changing the markup as little as possible. The RDFa equivalent is:</p>

<pre><code>&lt;div id="amanda"&gt;&lt;/div&gt;
&lt;p vocab="#" about="_:amanda"&gt;Name: &lt;span property="name"&gt;Amanda&lt;/span&gt;&lt;/p&gt;
&lt;div vocab="#" about="_:amanda" rel="band" resource="_:c"&gt;&lt;/div&gt;
&lt;div vocab="#" about="_:c"&gt;
 &lt;p&gt;Band: &lt;span property="name"&gt;Jazz Band&lt;/span&gt;&lt;/p&gt;
 &lt;p&gt;Size: &lt;span property="size"&gt;12&lt;/span&gt; players&lt;/p&gt;
&lt;/div&gt;
</code></pre>

<p>In microdata, the <code>itemref</code> attribute lets an item adopt name/value pairs that are described elsewhere in the page. In RDFa, the equivalent is to relate all the name/value pairs to the same resource by consistently naming that resource as the subject of the statements. In the above case, there are two blank nodes labelled <code>_:amanda</code> and <code>_:c</code>, and the <code>about</code> attribute is used on the elements that provide the properties (or on a wrapper element) to identify the subject of the statements.</p>

<p>Notes:</p>

<ul>
<li>The <code>resource</code> attribute has to be used to indicate the blank node for the band.</li>
<li>As before, the <code>rel</code> attribute has to be used for the <code>band</code> property, rather than the <code>property</code> attribute, because the object of the statement is a resource. The rule is that if you&#8217;re using <code>resource</code>, you should use <code>rel</code>. (I used <code>property</code> erroneously the first time I tried to write this mapping. I will never learn.)</li>
</ul>

<p>There is another example of <code>itemref</code> in use later in the microdata specification:</p>

<blockquote>
<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;
   &lt;figcaption itemprop="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses"&gt;
   &lt;img itemprop="work" src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;
   &lt;figcaption itemprop="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p id="licenses"&gt;All images licensed under the &lt;a itemprop="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>
</blockquote>

<p>This is equivalent to the RDF:</p>

<pre><code>[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/house.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The house I found." ;
] .
[
  a &lt;http://n.whatwg.org/work&gt; ;
  &lt;http://n.whatwg.org/license&gt; &lt;http://www.opensource.org/licenses/mit-license.php&gt; ;
  &lt;http://n.whatwg.org/work&gt; &lt;images/mailbox.jpeg&gt; ;
  &lt;http://n.whatwg.org/title&gt; "The mailbox." ;
] .
</code></pre>

<p>Note that the <code>license</code> property is adopted by both the items in the microdata. In this particular example, the two items have the same type, and thus the <code>license</code> property has the same meaning in each item. It&#8217;s also possible for <code>itemref</code> to be used on two items that have different types, pointing to the same element, in which case the shared properties defined within that element could mean different things for the two items.</p>
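
<p>As a minimal sketch of that second case (the two types and the <code>title</code> property are invented for illustration and don&#8217;t come from the microdata specification):</p>

<pre><code>&lt;div itemscope itemtype="http://example.org/books#book" itemref="shared"&gt;&lt;/div&gt;
&lt;div itemscope itemtype="http://example.org/people#person" itemref="shared"&gt;&lt;/div&gt;
&lt;p id="shared"&gt;Title: &lt;span itemprop="title"&gt;Doctor&lt;/span&gt;&lt;/p&gt;
</code></pre>

<p>Under the mapping assumed in this post, the shared <code>title</code> would become <code>http://example.org/books#title</code> for the first item and <code>http://example.org/people#title</code> for the second, and the two need not mean the same thing.</p>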

<p>There is no way that I am aware of within RDFa to support shared use of portions of content. There could be a rough equivalent that would work in the case where the shared properties had the same semantics if RDFa allowed the <code>about</code> attribute to take multiple values (<strong>note: invalid example</strong>):</p>

<pre><code>&lt;!DOCTYPE HTML&gt;
&lt;html&gt;
 &lt;head&gt;
  &lt;title&gt;Photo gallery&lt;/title&gt;
 &lt;/head&gt;
 &lt;body vocab="http://n.whatwg.org/"&gt;
  &lt;h1&gt;My photos&lt;/h1&gt;
  &lt;figure about="_:house" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/house.jpeg" alt="A white house, boarded up, sits in a forest."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The house I found.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;figure about="_:mailbox" typeof="work"&gt;
   &lt;span rel="work"&gt;&lt;img src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside."&gt;&lt;/span&gt;
   &lt;figcaption property="title"&gt;The mailbox.&lt;/figcaption&gt;
  &lt;/figure&gt;
  &lt;footer&gt;
   &lt;p about="_:house _:mailbox"&gt;All images licensed under the &lt;a rel="license"
   href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
   license&lt;/a&gt;.&lt;/p&gt;
  &lt;/footer&gt;
 &lt;/body&gt;
&lt;/html&gt;
</code></pre>

<p>but this wouldn&#8217;t support the possibility of the same property having different semantics (and therefore different URIs) for the separate resources.</p>
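
<p>A valid, if repetitive, alternative is to state the license once per image. As a sketch, reusing the blank node labels and the <code>vocab</code> from the invalid example above (the empty second link is just one hypothetical way of avoiding repeated visible text):</p>

<pre><code>&lt;footer&gt;
 &lt;p&gt;All images licensed under the &lt;a about="_:house" rel="license"
 href="http://www.opensource.org/licenses/mit-license.php"&gt;MIT
 license&lt;/a&gt;.
 &lt;a about="_:mailbox" rel="license"
 href="http://www.opensource.org/licenses/mit-license.php"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/footer&gt;
</code></pre>

<p>This yields one <code>license</code> statement per image, at the cost of an extra element for every item that shares the property.</p>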

<p>It&#8217;s also worth noting that the mapping to RDF that I&#8217;m assuming results, in this example, in <code>http://n.whatwg.org/work</code> being both a class and a property. The creators of RDF vocabularies tend to name classes with an uppercase initial letter and properties with a lowercase initial letter, and thus avoid these kinds of clashes. Vocabulary designers who are mindful of mappings to RDF may want to take the same approach.</p>

<h2>Multiple Values</h2>

<blockquote>
  <p>This example describes an ice cream, with two flavors:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li itemprop="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li itemprop="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>
  
  <p>This thus results in an item with two properties, both &#8220;flavor&#8221;, having the values &#8220;Lemon sorbet&#8221; and &#8220;Apricot sorbet&#8221;.</p>
</blockquote>

<p>This example highlights one of the real nightmares of RDF: lists. In microdata, the order of the values &#8216;Lemon sorbet&#8217; and &#8216;Apricot sorbet&#8217; is naturally retained. There are three possible mappings to RDF.</p>

<h3>Creating Multiple Statements</h3>

<p>If the order of the flavours of ice cream in this example doesn&#8217;t actually matter, the equivalent RDF is:</p>

<pre><code>[ &lt;#flavor&gt; "Lemon sorbet" , "Apricot sorbet" ]
</code></pre>

<p>which is equivalent to:</p>

<pre><code>[ &lt;#flavor&gt; "Apricot sorbet" , "Lemon sorbet" ]
</code></pre>

<p>In this case, the RDFa is straightforward:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;ul&gt;
  &lt;li property="flavor"&gt;Lemon sorbet&lt;/li&gt;
  &lt;li property="flavor"&gt;Apricot sorbet&lt;/li&gt;
 &lt;/ul&gt;
&lt;/div&gt;
</code></pre>

<p>It&#8217;s surprising how common it is that order doesn&#8217;t actually matter when there are multiple values for a property, such that this mapping is quite sufficient. But I&#8217;m absolutely not going to pretend that order is never important&#8230;</p>

<h3>Creating an <code>rdf:Seq</code></h3>

<p>If the order of the flavours does matter, there are two ways of representing that order using RDF. The first is to use an <code>rdf:Seq</code> resource. This method was the original method of representing lists in RDF and is very natural to do in RDF/XML, but has largely fallen out of favour for the second method which I&#8217;ll describe below.</p>

<p>Using the <code>rdf:Seq</code> method, the equivalent RDF for the microdata would be:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[
  &lt;#flavor&gt; [
    a rdf:Seq ;
    rdf:_1 "Lemon sorbet" ;
    rdf:_2 "Apricot sorbet"
  ]
]
</code></pre>

<p>which can be generated with:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:Seq"&gt;
   &lt;li property="rdf:_1"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li property="rdf:_2"&gt;Apricot sorbet&lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>There are various other ways in which the namespace for the <code>rdf:Seq</code> could be created, but since the <code>rdf:</code> prefix is built-in to RDFa 1.1, it seems easier to use that than anything that explicitly writes out the full (ugly) RDF namespace.</li>
<li>The <code>&lt;div&gt;</code> wrapper for the <code>&lt;ul&gt;</code> is needed in the same way as the wrapper <code>&lt;span&gt;</code> element was needed in the <code>&lt;img&gt;</code> example above. Whereas in microdata, the property element also describes the value of that property, in RDFa when the object of a statement is a resource the description of that resource is nested inside the property element (in a similar way to RDF/XML).</li>
</ul>

<h3>Creating an <code>rdf:List</code></h3>

<p>The current recommended way to create a list in RDF is to use an <code>rdf:List</code> resource. This essentially uses a <a href="http://en.wikipedia.org/wiki/Linked_list">linked list</a> model to represent lists, with the <code>rdf:first</code> item of a list being a value and the <code>rdf:rest</code> being either another <code>rdf:List</code> or <code>rdf:nil</code>. Spelled out, the RDF would look like:</p>

<pre><code>@prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
[ 
  &lt;#flavor&gt; [
    a rdf:List ;
    rdf:first "Lemon sorbet" ;
    rdf:rest [
      rdf:first "Apricot sorbet" ;
      rdf:rest rdf:nil
    ]
  ]
]
</code></pre>

<p>but of course Turtle lets you write it:</p>

<pre><code>[] &lt;#flavor&gt; ( "Lemon sorbet" "Apricot sorbet" ) .
</code></pre>

<p>Unfortunately, RDFa has no such syntax sugar. Which means:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Yep, horrific. Verbose and easy to get wrong, and that&#8217;s just for two items. If a third is added, the pattern is to add an <code>about</code> attribute on the middle items of the list so that the <code>rdf:rest</code> property which covers the next item in the list can be assigned to it. For example:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;p&gt;Flavors in my favorite ice cream:&lt;/p&gt;
 &lt;div rel="flavor"&gt;
  &lt;ul typeof="rdf:List"&gt;
   &lt;li property="rdf:first"&gt;Lemon sorbet&lt;/li&gt;
   &lt;li rel="rdf:rest"&gt;
    &lt;span about="_:2" typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Apricot sorbet&lt;/span&gt;
    &lt;/span&gt;
   &lt;/li&gt;
   &lt;li about="_:2" rel="rdf:rest"&gt;
    &lt;span typeof="rdf:List"&gt;
     &lt;span property="rdf:first"&gt;Raspberry sorbet&lt;/span&gt;
     &lt;a rel="rdf:rest" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"&gt;&lt;/a&gt;
    &lt;/span&gt;
   &lt;/li&gt;
  &lt;/ul&gt;
 &lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>I&#8217;ve used an empty <code>&lt;a&gt;</code> element with a <code>href</code> attribute to point to the <code>rdf:nil</code> resource. An alternative would be to use the <code>resource</code> attribute, which would have the advantage of not having to spell out the full URI for <code>rdf:nil</code>, but I&#8217;m trying to stick to using as few attributes as possible.</li>
<li>Using an empty <code>&lt;a&gt;</code> element for a link isn&#8217;t ideal; it would be neater to use a <code>&lt;link&gt;</code> element, but these aren&#8217;t allowed in flow content within HTML5 (<code>&lt;link&gt;</code> and <code>&lt;meta&gt;</code> are only permitted within the microdata specification, and then only if they have an <code>itemprop</code> attribute). The RDFa specification could likewise allow them.</li>
</ul>
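
<p>For comparison, here is a sketch of the last item of the list using the <code>resource</code> attribute instead, assuming an RDFa 1.1 processor that accepts a CURIE in <code>resource</code>:</p>

<pre><code>&lt;span typeof="rdf:List"&gt;
 &lt;span property="rdf:first"&gt;Raspberry sorbet&lt;/span&gt;
 &lt;span rel="rdf:rest" resource="rdf:nil"&gt;&lt;/span&gt;
&lt;/span&gt;
</code></pre>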

<h2>Multiple Properties Sharing a Value</h2>

<blockquote>
  <p>Here we see an item with two properties, &#8220;favorite-color&#8221; and &#8220;favorite-fruit&#8221;, both set to the value &#8220;orange&#8221;:</p>

<pre><code>&lt;div itemscope&gt;
 &lt;span itemprop="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>
</blockquote>

<p>This should map to the RDF:</p>

<pre><code>[
  &lt;#favorite-color&gt; "orange" ;
  &lt;#favorite-fruit&gt; "orange"
]
</code></pre>

<p>Like <code>itemprop</code>, <code>property</code> can take multiple values, so the RDFa equivalent is simply:</p>

<pre><code>&lt;div vocab="#" typeof&gt;
 &lt;span property="favorite-color favorite-fruit"&gt;orange&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<h2>Types</h2>

<blockquote>
  <p>Here, the item&#8217;s type is &#8220;http://example.org/animals#cat&#8221;:</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
  
  <p>In this example the &#8220;http://example.org/animals#cat&#8221; item has three properties, a &#8220;name&#8221; (&#8220;Hedral&#8221;), a &#8220;desc&#8221; (&#8220;Hedral is&#8230;&#8221;), and an &#8220;img&#8221; (&#8220;hedral.jpeg&#8221;).</p>
</blockquote>

<p>I&#8217;ll assume that this should be mapped to the RDF:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
]
</code></pre>

<p>In this case, the <code>vocab</code> can be set to <code>http://example.org/animals#</code> and both the <code>typeof</code> and the various <code>property</code> and <code>rel</code> attributes will use that as the basis for their identifying URIs:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.&lt;/p&gt;
 &lt;div rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/div&gt;
&lt;/section&gt;
</code></pre>

<h2>Global Identifiers</h2>

<blockquote>
  <p>Here, an item is talking about a particular book:</p>

<pre><code>&lt;dl itemscope
    itemtype="http://vocab.example.net/book"
    itemid="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd itemprop="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd itemprop="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time itemprop="pubdate" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>
</blockquote>

<p>Here, the item has an identifier so, unlike in the previous examples, the subject of the statements in the RDF is no longer a blank node:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
&lt;urn:isbn:0-330-34032-8&gt;
  a &lt;http://vocab.example.net/book&gt; ;
  &lt;http://vocab.example.net/title&gt; "The Reality Dysfunction\n " ;
  &lt;http://vocab.example.net/author&gt; "Peter F. Hamilton\n " ;
  &lt;http://vocab.example.net/pubdate&gt; "1996-01-26"^^xsd:date ;
  .
</code></pre>

<p>In RDFa, the subject is provided using the <code>about</code> attribute:</p>

<pre><code>&lt;dl vocab="http://vocab.example.net/"
    typeof="book"
    about="urn:isbn:0-330-34032-8"&gt;
 &lt;dt&gt;Title
 &lt;dd property="title"&gt;The Reality Dysfunction
 &lt;dt&gt;Author
 &lt;dd property="author"&gt;Peter F. Hamilton
 &lt;dt&gt;Publication date
 &lt;dd&gt;&lt;time property="pubdate" content="1996-01-26" datatype="xsd:date" datetime="1996-01-26"&gt;26 January 1996&lt;/time&gt;
&lt;/dl&gt;
</code></pre>

<h2>Global Property Names</h2>

<blockquote>
  <p>Here, an item is an &#8220;http://example.org/animals#cat&#8221;, and most of the properties have names that are words defined in the context of that type. There are also a few additional properties whose names come from other vocabularies.</p>

<pre><code>&lt;section itemscope itemtype="http://example.org/animals#cat"&gt;
 &lt;h1 itemprop="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p itemprop="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 itemprop="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 itemprop="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;
&lt;/section&gt;
</code></pre>
</blockquote>

<p>The RDF equivalent to this is:</p>

<pre><code>[
  a &lt;http://example.org/animals#cat&gt; ;
  &lt;http://example.org/animals#name&gt; "Hedral" ;
  &lt;http://example.com/fn&gt; "Hedral" ;
  &lt;http://example.org/animals#desc&gt; "Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly." ;
  &lt;http://example.com/color&gt; "black" , "white" ;
  &lt;http://example.org/animals#img&gt; &lt;hedral.jpeg&gt;
]
</code></pre>

<p>To create this, we need the RDFa:</p>

<pre><code>&lt;section vocab="http://example.org/animals#" typeof="cat"&gt;
 &lt;h1 property="name http://example.com/fn"&gt;Hedral&lt;/h1&gt;
 &lt;p property="desc"&gt;Hedral is a male american domestic
 shorthair, with a fluffy &lt;span
 property="http://example.com/color"&gt;black&lt;/span&gt; fur with &lt;span
 property="http://example.com/color"&gt;white&lt;/span&gt; paws and belly.&lt;/p&gt;
 &lt;span rel="img"&gt;&lt;img src="hedral.jpeg" alt="" title="Hedral, age 18 months"&gt;&lt;/span&gt;
&lt;/section&gt;
</code></pre>

<h2>Link Relations</h2>

<blockquote>
  <p>Here is an example of a page that uses the vEvent vocabulary to mark up an event:</p>

<pre><code>&lt;body itemscope itemtype="http://microformats.org/profile/hcalendar#vevent"&gt;
 ...
 &lt;h1 itemprop="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
 ...
 &lt;time itemprop="dtstart" datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
 (until &lt;time itemprop="dtend" datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
 ...
 &lt;a href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
    rel="bookmark" itemprop="url"&gt;Link to this page&lt;/a&gt;
 ...
 &lt;p&gt;Location: &lt;span itemprop="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
 ...
 &lt;p&gt;&lt;input type=button value="Add to Calendar"
           onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
 ...
 &lt;meta itemprop="description" content="via livebrum.co.uk"&gt;
&lt;/body&gt;
</code></pre>
</blockquote>

<p>This example is interesting because it contains, in the natural markup of the page, a <code>rel</code> attribute with the value <a href="http://www.w3.org/TR/html5/links.html#link-type-bookmark"><code>bookmark</code></a>, which is used for links that go to the page or section of the page within which the link is found. In this case, it&#8217;s the page. The RDF that should be generated from the page is:</p>

<pre><code>@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
[
  a &lt;http://microformats.org/profile/hcalendar#vevent&gt; ;
  &lt;http://microformats.org/profile/hcalendar#summary&gt; "Bluesday Tuesday: Money Road" ;
  &lt;http://microformats.org/profile/hcalendar#dtstart&gt; "2009-05-05T19:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#dtend&gt; "2009-05-05T21:00:00Z"^^xsd:dateTime ;
  &lt;http://microformats.org/profile/hcalendar#url&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  &lt;http://microformats.org/profile/hcalendar#location&gt; "The RoadHouse" ;
  &lt;http://microformats.org/profile/hcalendar#description&gt; "via livebrum.co.uk"
] .
</code></pre>

<p>The following statement could legitimately be generated as well:</p>

<pre><code>&lt;&gt; 
  &lt;http://www.w3.org/1999/xhtml/vocab#bookmark&gt; &lt;http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road&gt; ;
  .
</code></pre>

<p>but the item representing the event should definitely not have the <code>http://www.w3.org/1999/xhtml/vocab#bookmark</code> property.</p>

<p>Achieving this without significantly changing the HTML markup is problematic in RDFa because RDFa uses the <code>rel</code> attribute to provide properties for the resources that it describes within the page, overloading its standard use in HTML which is to describe properties of the page or sections within the page. The following involves the least amount of repetition:</p>

<pre><code>&lt;body vocab="http://microformats.org/profile/hcalendar#"&gt;
 &lt;div typeof="vevent"&gt;
  ...
  &lt;h1 property="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
  ...
  &lt;time property="dtstart" content="2009-05-05T19:00:00Z" datatype="xsd:dateTime" 
        datetime="2009-05-05T19:00:00Z"&gt;May 5th @ 7pm&lt;/time&gt;
  (until &lt;time property="dtend" content="2009-05-05T21:00:00Z" datatype="xsd:dateTime" 
               datetime="2009-05-05T21:00:00Z"&gt;9pm&lt;/time&gt;)
  ...
  &lt;a rel="url" href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"&gt;&lt;/a&gt;
  &lt;a about href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
     rel="bookmark"&gt;Link to this page&lt;/a&gt;
  ...
  &lt;p&gt;Location: &lt;span property="location"&gt;The RoadHouse&lt;/span&gt;&lt;/p&gt;
  ...
  &lt;p&gt;&lt;input type=button value="Add to Calendar"
            onclick="location = getCalendar(this)"&gt;&lt;/p&gt;
  ...
  &lt;span property="description" content="via livebrum.co.uk"&gt;&lt;/span&gt;
 &lt;/div&gt;
&lt;/body&gt;
</code></pre>

<p>Notes:</p>

<ul>
<li>In the above, the <code>typeof</code> attribute has been moved onto a wrapper <code>&lt;div&gt;</code> that encompasses the entirety of the page because if it resides on the <code>&lt;body&gt;</code> element, it&#8217;s assumed to apply to the document itself rather than a blank node. An alternative mapping would use <code>about="_:event"</code> to create a blank node for the event.</li>
<li>There&#8217;s no way to avoid creating a statement for the <code>rel="bookmark"</code> link, so the best we can do is make sure that the statement is accurate, and relates the current document to the provided URI. Unfortunately, that means creating a separate element for the <code>url</code> property, repeating that URL within the page, and adding an empty <code>about</code> attribute to the bookmark link. Here I&#8217;ve used an empty <code>&lt;a&gt;</code> element to express the <code>url</code> relationship; a <code>&lt;link&gt;</code> element would do the same job if it were allowed in flow content.</li>
<li>The <code>&lt;meta&gt;</code> element in the original has been mapped to an empty <code>&lt;span&gt;</code> element as it isn&#8217;t allowed in flow content without an <code>itemprop</code> attribute.</li>
</ul>
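
<p>As an untested sketch of the <code>about="_:event"</code> alternative mentioned in the first note (the <code>_:event</code> label is arbitrary):</p>

<pre><code>&lt;body vocab="http://microformats.org/profile/hcalendar#"
      about="_:event" typeof="vevent"&gt;
 ...
 &lt;h1 property="summary"&gt;Bluesday Tuesday: Money Road&lt;/h1&gt;
 ...
 &lt;a rel="url" href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"&gt;&lt;/a&gt;
 &lt;a about href="http://livebrum.co.uk/2009/05/05/bluesday-tuesday-money-road"
    rel="bookmark"&gt;Link to this page&lt;/a&gt;
 ...
&lt;/body&gt;
</code></pre>

<p>The wrapper <code>&lt;div&gt;</code> disappears, but the empty <code>about</code> attribute on the bookmark link is still needed to switch the subject back to the document.</p>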
    ]]></content>
  </entry>
  <entry>
    <title>Microdata + RDF</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/162" />
    <id>http://www.jenitennison.com/blog/node/162</id>
    <published>2011-07-31T19:55:44+00:00</published>
    <updated>2011-08-02T08:49:38+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="rdf" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>As part of the ongoing discussion about how to reconcile RDFa and microdata (if at all), <a href="http://webr3.org/blog/">Nathan Rixham</a> has put together a suggested <a href="http://www.w3.org/wiki/Microdata_RDFa_Merge">Microdata RDFa Merge</a> which brings together parts of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">microdata</a> and parts of <a href="http://www.w3.org/TR/rdfa-core/">RDFa</a>, creating a completely new set of attributes, but a parsing model that more or less follows microdata&#8217;s.</p>

<p>I want here to put forward another possibility to the debate. I should say that this is just some noodling on my part as a way of exploring options, not any kind of official position on the behalf of the W3C or the TAG or any other body that you might associate me with, nor even a decided position on my part.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>As part of the ongoing discussion about how to reconcile RDFa and microdata (if at all), <a href="http://webr3.org/blog/">Nathan Rixham</a> has put together a suggested <a href="http://www.w3.org/wiki/Microdata_RDFa_Merge">Microdata RDFa Merge</a> which brings together parts of <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html">microdata</a> and parts of <a href="http://www.w3.org/TR/rdfa-core/">RDFa</a>, creating a completely new set of attributes, but a parsing model that more or less follows microdata&#8217;s.</p>

<p>I want here to put forward another possibility to the debate. I should say that this is just some noodling on my part as a way of exploring options, not any kind of official position on the behalf of the W3C or the TAG or any other body that you might associate me with, nor even a decided position on my part.</p>

<!--break-->

<h2>Simplifying RDFa</h2>

<p>As <a href="http://www.jenitennison.com/blog/node/103">I&#8217;ve said before</a>, RDFa, in my experience, is complicated not primarily because of the whole namespaces/CURIEs issue but because its processing model tries to be too clever. RDFa was designed to largely fit in with existing markup and turn it into embedded data &#8220;just&#8221; by adding a few attributes here and there. Thus a simple image like:</p>

<pre><code>&lt;img src="photo1.jpg"&gt;
</code></pre>

<p>is first marked up to indicate that it&#8217;s an image:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"&gt;
</code></pre>

<p>then to provide its license:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"&gt;
</code></pre>

<p>and finally to add a title:</p>

<pre><code>&lt;img src="photo1.jpg" typeof="foaf:Image"
  rel="license" resource="http://creativecommons.org/licenses/by/2.0/"
  property="dc:title" content="A Pretty Picture"&gt;
</code></pre>

<p>all by adding attributes to the one <code>&lt;img&gt;</code> element. The trouble with this approach is that the rules about how statements are made become extremely complex, dependent on context (eg what other attributes are present, what the parent element has on it, what content it has) and default in ways that are hard to remember.</p>

<p>Even having written an RDFa parser, having written code to mark up documents with RDFa, having <em>taught</em> it, I still cannot write RDFa past a trivial example and be 100% sure that it will produce what I was aiming to produce.</p>

<p>If we were to look at really simplifying RDFa, rather than making cosmetic changes, we need to address this complexity. It would certainly mean backwards-incompatible changes, such as dropping the use of particular attributes and revising the way the processing model works, such that future RDFa processors couldn&#8217;t be used on RDFa 1.0. There are two possible ways of approaching this:</p>

<ol>
<li>retaining some backwards compatibility, and aiming for a simplified subset of RDFa 1.0 such that an RDFa 1.0 processor will still get the intended triples out of data marked up with RDFa 1.1</li>
<li>dropping backwards compatibility entirely and using completely different attributes, essentially creating a new language</li>
</ol>

<p>I do not know which of these routes is the best one to take.</p>

<p>My instinct is that the first will be hard to do. For example, there are already certain simplifications in RDFa 1.1 (such as assuming an element with no <code>datatype</code> attribute is giving a string value rather than looking to see if there are any non-text nodes in the content of the element) which lead to markup that will not be processed correctly by RDFa 1.0 processors. Perhaps that could be addressed by rewriting history: creating an RDFa 1.0 Second Edition that includes any changes that are needed to make a simple subset viable.</p>
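
<p>To make that concrete, here is a sketch of markup on which the two versions disagree (the title text is made up; the <code>dc</code> prefix is bound to the Dublin Core elements namespace):</p>

<pre><code>&lt;p xmlns:dc="http://purl.org/dc/elements/1.1/"
   property="dc:title"&gt;A &lt;em&gt;Pretty&lt;/em&gt; Picture&lt;/p&gt;
</code></pre>

<p>An RDFa 1.0 processor sees the <code>&lt;em&gt;</code> child and generates an XML literal, whereas an RDFa 1.1 processor generates the plain string &#8220;A Pretty Picture&#8221;.</p>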

<p>What I want to explore here is what the second route &#8212; using entirely different attributes from those currently used in RDFa 1.0 &#8212; might mean. I think that in this case the substantial difference between microdata and this new language would be support for that much-derided requirement: decentralised extensibility.</p>

<h2>Adding Decentralised Extensibility to Microdata</h2>

<p>As I discussed <a href="http://www.jenitennison.com/blog/node/161">earlier in the week</a>, microdata is simply not designed for use in a web where publishers might want to use multiple vocabularies to mark up the same thing for different consumers. This focus is very probably the right one for the majority of uses, where publishers address single consumers or everyone has standardised on a single vocabulary. It&#8217;s certainly an assumption that keeps the markup simple.</p>

<p>However, there is a larger data web out there. It&#8217;s not just browsers and search engines who might look for and process data embedded within a page. Unlike with HTML, those few, large consumers don&#8217;t have to understand a particular vocabulary for other consumers to get valuable information from it. If you operate in a world of multiple consumers with different requirements, you need decentralised extensibility. And support for decentralised extensibility is RDF&#8217;s niche as a data model, its unique selling point.</p>

<p>Given that a new language would have to use a different processing model from RDFa 1.0, I would suggest that it simply uses microdata&#8217;s as a starting point. Using attributes from RDFa 1.0 would only cause conflicts with RDFa 1.0 processors. Microdata processing is there, already defined, already implemented. It isn&#8217;t going to go away. And you know, <em>it&#8217;s pretty good</em>.</p>

<p>The &#8216;new language&#8217; would then be not so much a &#8216;new language&#8217; as an enhancement of something that already exists. It would be a set of additions that augment the data that is generated from normal microdata processing with a few extra features that are useful in a world where there are multiple vocabularies for the same domain, where publishers have to provide data to multiple consumers, where an RDF view of data is useful. Call it microdata+RDF.</p>

<p>So what would we need to add? Well, there are three things, I think, that make microdata hard to use in a decentralised world, and make it hard to generate good RDF from microdata markup:</p>

<ol>
<li>lack of support for multiple types</li>
<li>scoping of properties by type</li>
<li>lack of datatypes</li>
</ol>

<p>We would need to find a way to add these for use within the RDF extracted from the microdata markup such that a basic microdata parser would still generate the same JSON, and such that microdata&#8217;s DOM API would work as specified in the microdata spec. So we can&#8217;t change the types of values that are possible in microdata&#8217;s attributes or how they&#8217;re interpreted in the DOM API.</p>

<h3>Multiple Types</h3>

<p>Because of the restriction I just mentioned on not touching microdata itself, we can&#8217;t simply make <code>itemtype</code> take multiple URLs. We could rely on <code>itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"</code> as a mechanism for providing types for use by RDF processors, but I think that the type of something is such a fundamental property that it makes sense to have a dedicated attribute.</p>

<p>I suggest <code>itemclass</code>. It would only be allowed on elements with an <code>itemscope</code> attribute and would take a space-separated set of values in exactly the same way as the <code>itemprop</code> attribute. The values would be turned into URIs in the same way as for the <code>itemprop</code> attribute, which I&#8217;ll describe below.</p>

<p>Microdata+RDF would add a method to the existing microdata DOM API to enable people to access items by class rather than their single type. So:</p>

<pre><code>document . getItemsByClass( classes )
Returns a NodeList of the elements in the Document that create items, that are not 
part of other items, and that have one or more of the types or classes given in the 
argument.

The classes argument is interpreted as a space-separated list of classes.
</code></pre>

<p>Note that for simplicity, because they are interpreted in the same way within the RDF model, this returns items whose <code>itemtype</code> is listed in the argument list of classes as well as those whose <code>itemclass</code> is listed.</p>
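
<p>As a rough illustration of that matching rule, here is a minimal Python sketch. It is not the DOM API itself: the <code>items</code> list of dictionaries is a hypothetical stand-in for the top-level item elements in a document, and comparing the raw attribute strings (rather than expanded URIs) is an assumption made to keep the sketch short.</p>

```python
def get_items_by_class(items, classes):
    """Return the items that have at least one of the given types or classes.

    `items` is a hypothetical list of dicts with an optional "itemtype"
    (a single URL) and an optional "itemclass" (space-separated string).
    `classes` is a space-separated string, as in the proposed DOM method.
    """
    wanted = set(classes.split())
    matches = []
    for item in items:
        # itemtype and itemclass are treated alike, as in the RDF model
        declared = set(item.get("itemclass", "").split())
        if item.get("itemtype"):
            declared.add(item["itemtype"])
        if not wanted.isdisjoint(declared):
            matches.append(item)
    return matches


# Illustrative data only, loosely based on the examples in this post
items = [
    {
        "itemtype": "http://schema.org/Event",
        "itemclass": "http://microformats.org/profile/hcalendar#vevent /vocab/Conference",
    },
    {"itemtype": "http://schema.org/Place"},
]
events = get_items_by_class(items, "http://schema.org/Event")
```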

<p>Within the DOM API, the <code>itemClass</code> IDL attribute on HTML elements would reflect the <code>itemclass</code> attribute.</p>

<p>The <code>itemclass</code> attribute would be ignored for the purpose of creating JSON from microdata, and only be used when creating RDF.</p>

<p>An example would be:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event"
    itemclass="http://microformats.org/profile/hcalendar#vevent /vocab/Conference"&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>The JSON generated from this would look like:</p>

<pre><code>{
  "type": "http://schema.org/Event" ,
  "id": "http://lanyrd.com/2011/oscon/",
  "properties": {}
}
</code></pre>

<p>The RDF would look like:</p>

<pre><code>&lt;http://lanyrd.com/2011/oscon/&gt;
  a &lt;http://schema.org/Event&gt; ,
    &lt;http://microformats.org/profile/hcalendar#vevent&gt; ,
    &lt;http://lanyrd.com/vocab/Conference&gt; ;
  .
</code></pre>

<h3>Disambiguating Properties</h3>

<p>To work with the RDF model, properties have to have URIs. We need to have a way of easily creating the URIs for the short-name properties without people changing their existing microdata markup.</p>

<blockquote>
  <p>Note: I&#8217;ve substantially revised this section following discussion with <a href="http://blog.foolip.org/">Philip Jägenstedt</a>. Old text is struck through, new text underlined.</p>
</blockquote>

<p>The way that this is done in RDFa 1.1 is through a <code>vocab</code> attribute, which provides a URI prefix that is concatenated to any short-name properties or types. <strike>We could use the same approach here, but call the attribute <code>itemvocab</code> to fit in with the general method of naming attributes in microdata.</strike> <u>Using this with microdata would be tedious for users however, and it would be easy for the <code>itemtype</code> and <code>itemvocab</code> to get out of sync in weird ways.</u></p>

<p><strike><code>itemvocab</code> would only be allowed on elements with an <code>itemscope</code>. The scope of <code>itemvocab</code> would be limited to the item itself, so that it&#8217;s not forgotten when it&#8217;s needed, particularly in copy-and-paste scenarios. However, to make it easier to use I think it should probably be given a default value if it isn&#8217;t present, as follows:</strike></p>

<p><u>Instead, the vocabulary for the properties could be identified as follows:</u></p>

<ol>
<li>set <em>vocab</em> to the <code>itemtype</code> of the item if it is present, and the URL of the document if not</li>
<li>use a substring of <em>vocab</em>:
<ol><li>if <em>vocab</em> contains a <code>#</code>, the substring of <em>vocab</em> up to and including the <code>#</code></li>
<li>otherwise, the substring of <em>vocab</em> up to and including its final <code>/</code></li></ol></li>
</ol>

<p>For example, if you have:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event"
    itemclass="http://microformats.org/profile/hcalendar#vevent /vocab/Conference"&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>then <strike><code>itemvocab</code></strike> <u>the item vocabulary</u> would default to <code>http://schema.org/</code>.</p>
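
<p>The two steps above are simple enough to sketch in a few lines of Python. The function name <code>item_vocabulary</code> and its parameters are mine, purely for illustration; the behaviour when the value contains no <code>/</code> at all (an empty vocabulary) is an assumption, since the steps don&#8217;t cover that case.</p>

```python
def item_vocabulary(itemtype, document_url):
    """Work out the vocabulary prefix for an item (illustrative sketch).

    Step 1: start from the itemtype if present, else the document URL.
    Step 2: keep everything up to and including the '#', or failing that
    up to and including the final '/'.
    """
    vocab = itemtype if itemtype else document_url
    if "#" in vocab:
        return vocab[: vocab.index("#") + 1]
    # rfind returns -1 when there is no '/', which yields an empty string
    return vocab[: vocab.rfind("/") + 1]


schema_vocab = item_vocabulary("http://schema.org/Event", "http://lanyrd.com/")
```

<p>With an <code>itemtype</code> of <code>http://microformats.org/profile/hcalendar#vevent</code>, the same function would give <code>http://microformats.org/profile/hcalendar#</code>.</p>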

<p><strike>There could be an extra restriction that if <code>itemtype</code> is specified, <code>itemvocab</code> must be in the same domain as that type; that could help prevent the weird situation where in the generated RDF the properties would be interpreted as being in a completely different vocabulary from the <code>itemtype</code>.</strike></p>

<p><strike>Within the DOM API, the <code>itemVocab</code> IDL attribute on HTML elements would reflect the <code>itemvocab</code> attribute.</strike></p>

<p><u>Note: the following example has been altered in place.</u></p>

<p>For example, take the following markup:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event" 
    itemclass="SocialEvent BusinessEvent EducationEvent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope itemid="/places/portland/"
     itemtype="http://schema.org/Place"&gt;
    &lt;span itemprop="name"&gt;&lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;&lt;/span&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate" datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>The vocabulary for the <code>&lt;li&gt;</code> element defaults to <code>http://schema.org/</code> based on the value of the <code>itemtype</code>. The short-named properties and classes within that item are turned into URIs by prepending <code>http://schema.org/</code> to their name. Similarly, the vocabulary for the nested <code>http://schema.org/Place</code> item is <code>http://schema.org/</code> (the substring of its <code>itemtype</code> up to and including the final <code>/</code>), so its properties are also prepended with <code>http://schema.org/</code>. The resulting RDF would be:</p>

<pre><code>@prefix s: &lt;http://schema.org/&gt; .
@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .

&lt;/2011/oscon/&gt;
  a s:Event ,
    s:SocialEvent ,
    s:BusinessEvent ,
    s:EducationEvent ;
  s:url &lt;http://lanyrd.com/2011/oscon/&gt; ;
  s:name "OSCON 2011" ;
  s:location &lt;/places/portland/&gt; ;
  s:startDate "2011-07-25"^^xsd:date ;
  s:endDate "2011-07-29"^^xsd:date ;
  .

&lt;/places/portland/&gt;
  a s:Place ;
  s:url &lt;http://lanyrd.com/places/portland/&gt; ;
  s:name "United States / Portland" ;
  .
</code></pre>

<p>Note: see below for how the values are created in this example.</p>
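
<p>The name-to-URI step used in this example might be sketched as follows. This is a hedged illustration, not a specification: <code>expand_name</code> and <code>expand_names</code> are hypothetical helpers, the vocabulary is passed in rather than computed, and values that are relative paths (such as <code>/vocab/Conference</code> in the earlier example) are not handled here.</p>

```python
from urllib.parse import urlsplit


def expand_name(name, vocab):
    """Turn one itemprop/itemclass value into a URI (illustrative sketch)."""
    # Values that already have a scheme (http:, https:, ...) are kept as-is
    if urlsplit(name).scheme:
        return name
    # Short names are prefixed with the item's vocabulary
    return vocab + name


def expand_names(attribute_value, vocab):
    """Expand every token of a space-separated attribute value."""
    return [expand_name(token, vocab) for token in attribute_value.split()]


classes = expand_names("SocialEvent BusinessEvent EducationEvent", "http://schema.org/")
```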

<p>The JSON would be just the same as from a standard microdata processor; there&#8217;s no mapping to URIs for that output:</p>

<pre><code>{
  "type": "http://schema.org/Event",
  "id": "http://lanyrd.com/2011/oscon/",
  "properties": {
    "url": [
      "http://lanyrd.com/2011/oscon/"
    ],
    "name": [
      "OSCON 2011"
    ],
    "location": [
      {
        "type": "http://schema.org/Place",
        "id": "http://lanyrd.com/places/portland/",
        "properties": {
          "name": [
            "United States / Portland"
          ],
          "url": [
            "http://lanyrd.com/places/portland/"
          ]
        }
      }
    ],
    "startDate": [
      "2011-07-25"
    ],
    "endDate": [
      "2011-07-29"
    ]
  }
}
</code></pre>

<h3>Adding Datatypes</h3>

<p>How to manage datatypes in RDF generated from microdata is something where the best approach is not at all clear to me. A couple of years ago I talked about some <a href="http://www.jenitennison.com/blog/node/120">frustrations with RDF datatyping</a>, and datatypes in RDF still frustrate me by being hard to use in sensible ways throughout the RDF toolchain. Nevertheless, it&#8217;s what we have. </p>

<p>The possibilities I can see for microdata+RDF are:</p>

<ol>
<li><p>Use plain literals for everything, including URIs, equivalent to using strings as microdata does. This makes things simple for the publisher and keeps the markup in the page clean, but makes it difficult for consumers who are using RDF toolchains: they will <em>usually</em> have to do some kind of processing of the RDF generated from microdata+RDF to add appropriate datatypes to the values. There are two issues with this approach:</p>

<ul><li>I have a feeling that microdata+RDF processors will make up their own rules to add datatypes to the data extracted from a page (using rules like those described below and/or sniffing of values and/or using information from known built-in vocabularies), in an effort to add value for their users. But if different processors do that in different ways, we have an interoperability problem.</li>
<li>In some vocabularies, the datatype of a value is not derivable from the property. The most important/common example of this is <a href="http://www.w3.org/TR/skos-reference/#notations"><code>skos:notation</code></a>, which uses values with different datatypes to supply different identifiers from different identification schemes for a given concept.</li></ul></li>
<li><p>Assign datatypes based on the element type in the HTML. If the property value has come from a URL attribute, assume that it&#8217;s a resource rather than a literal; if the element is a <code>&lt;time&gt;</code> element, work out the datatype based on the syntax of the <code>datetime</code> attribute; otherwise assume it&#8217;s a string and give it a language in the case that one is specified. This gives some information but leads to a somewhat strange situation where you can mark up something as a date/time but not as a number.</p></li>
<li><p>Supplement the processing described in 2. with some basic datatype sniffing. Basically, if the value looks like a number or a boolean value then assign it a numeric or boolean datatype based on its syntax. This could reuse the <a href="http://www.w3.org/TeamSubmission/turtle/#literal">rules for recognising different literals from Turtle</a>. This wouldn&#8217;t be perfect; in particular, it would guess that strings that consist purely of numbers such as zip codes were numbers. I&#8217;m inclined not to go down this path.</p></li>
<li><p>Supplement the processing described in 2. with an <code>itemvaltype</code> attribute that takes a token from the list of <a href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">built-in XML Schema Datatypes</a> or the token &#8216;<code>literal</code>&#8217;. The &#8216;<code>literal</code>&#8217; token would be used to override the normal processing of URL attributes in the case where those really should be literals rather than resources. In this design, it would be easy to create literals using one of the most usual datatypes, but not possible to use datatypes that are specific to a given vocabulary.</p></li>
<li><p>Supplement the processing described in 4. by allowing the <code>itemvaltype</code> to take either a token or a URL. The thing I don&#8217;t like about this design is that the token would be interpreted as being within the XML Schema Datatypes vocabulary rather than the vocabulary specified for <code>itemvocab</code> (used for tokens in <code>itemprop</code> and <code>itemclass</code>). This seems like it might turn into a source of confusion, but if we went the other way and had <code>itemvaltype</code> being interpreted based on <code>itemvocab</code>, it would be harder to give a value the more common datatypes such as numbers and boolean values.</p></li>
</ol>
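
<p>To make the weakness of option 3 concrete, here is a small Python sketch of syntax-based sniffing. The patterns are only loosely modelled on Turtle&#8217;s literal forms and are a simplification of my own, and the zip code value is just an illustrative input.</p>

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified patterns, loosely modelled on Turtle's literal syntax
PATTERNS = [
    (re.compile(r"[+-]?[0-9]+"), XSD + "integer"),
    (re.compile(r"[+-]?[0-9]*\.[0-9]+"), XSD + "decimal"),
    (re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)[eE][+-]?[0-9]+"), XSD + "double"),
    (re.compile(r"true|false"), XSD + "boolean"),
]


def sniff_datatype(value):
    """Guess a datatype URI from the syntax of a value, or None for a string."""
    for pattern, datatype in PATTERNS:
        if pattern.fullmatch(value):
            return datatype
    return None


ticket_count = sniff_datatype("1938")   # a genuine number
zip_code = sniff_datatype("97201")      # an identifier that merely looks like one
price = sniff_datatype("$35")           # not recognised, stays a string
```

<p>Both <code>ticket_count</code> and <code>zip_code</code> come out as integers, which is exactly the kind of wrong guess described in option 3.</p>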

<p>My inclination, somewhat reluctantly as it&#8217;s the most complex, would be to use the last of these, because it provides for decentralised extensibility of datatypes, and support for decentralised extensibility is the core aim of these extensions. In other words, have an <code>itemvaltype</code> attribute that can hold either a token, which must be one of <code>literal</code> or the local name of an XML Schema datatype, or a URL. On a <code>&lt;time&gt;</code> element, this would default to the appropriate type based on the syntax of the value of the <code>datetime</code> attribute.</p>

<p>To be conformant, the <code>itemvaltype</code> would have to be an allowed value type for the properties given in <code>itemprop</code> and the value of the property must be a legal value for the datatype. (In keeping with the style of the microdata specification, the mechanisms for working out what value types are allowed and what the legal values are for non-XML Schema datatypes would be left undefined &#8212; a consuming application would look at the definition of the vocabulary.)</p>

<p>Within the DOM API, the <code>itemValType</code> IDL attribute on HTML elements would reflect the <code>itemvaltype</code> attribute. The value of <code>itemvaltype</code> <em>wouldn&#8217;t</em> change the types of the values returned by <code>element.itemValue</code> or in the JSON mapping from microdata; it would purely be used when generating RDF from that data.</p>
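
<p>A rough Python sketch of how an RDF-generating processor might resolve <code>itemvaltype</code> under this design follows. Everything here is an assumption for illustration: the function names, the <code>PLAIN_LITERAL</code> marker, the deliberately incomplete set of XML Schema tokens, and the simplified patterns used to pick a default for <code>&lt;time&gt;</code> elements.</p>

```python
import re
from urllib.parse import urlsplit

XSD = "http://www.w3.org/2001/XMLSchema#"
PLAIN_LITERAL = "literal"  # marker meaning "force a plain literal"

# A small, deliberately incomplete subset of XML Schema datatype names
XSD_TOKENS = {"string", "boolean", "integer", "decimal", "double",
              "date", "time", "dateTime"}


def time_default(datetime_value):
    """Pick a datatype from the syntax of a datetime attribute (simplified)."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", datetime_value):
        return XSD + "date"
    if re.fullmatch(r"[0-9]{2}:[0-9]{2}(:[0-9]{2})?", datetime_value):
        return XSD + "time"
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?"
                    r"(Z|[+-][0-9]{2}:[0-9]{2})", datetime_value):
        return XSD + "dateTime"
    return None


def resolve_value_type(itemvaltype, element_name="span", datetime_value=None):
    """Return a datatype URI, PLAIN_LITERAL, or None when nothing is known."""
    if itemvaltype is None:
        # No explicit type: only time elements get a default
        if element_name == "time" and datetime_value is not None:
            return time_default(datetime_value)
        return None
    if itemvaltype == "literal":
        return PLAIN_LITERAL
    if urlsplit(itemvaltype).scheme:
        return itemvaltype  # a vocabulary-specific datatype URL
    if itemvaltype in XSD_TOKENS:
        return XSD + itemvaltype
    raise ValueError("unrecognised itemvaltype token: " + itemvaltype)


offer_count_type = resolve_value_type("integer")
```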

<p>For example, if someone started with some markup like:</p>

<pre><code>&lt;div itemscope itemtype="http://schema.org/AggregateOffer"&gt;
  Priced from: &lt;span itemprop="lowPrice"&gt;$35&lt;/span&gt;
  &lt;span itemprop="offerCount"&gt;1938&lt;/span&gt; tickets left
&lt;/div&gt;
</code></pre>

<p>it might be supplemented with some type information like:</p>

<pre><code>&lt;div itemscope itemtype="http://schema.org/AggregateOffer"&gt;
  Priced from: &lt;span itemprop="lowPrice" itemvaltype="http://schema.org/Price"&gt;$35&lt;/span&gt;
  &lt;span itemprop="offerCount" itemvaltype="integer"&gt;1938&lt;/span&gt; tickets left
&lt;/div&gt;
</code></pre>

<p>which would generate RDF like:</p>

<pre><code>@prefix s: &lt;http://schema.org/&gt; .

[] a s:AggregateOffer ;
  s:lowPrice "$35"^^s:Price ;
  s:offerCount 1938 ;
  .
</code></pre>

<p>(Note: Here I&#8217;m assuming that schema.org defines a <code>http://schema.org/Price</code> datatype which includes a currency and a number. They don&#8217;t currently.)</p>

<p>The JSON would still be:</p>

<pre><code>{
  "type": "http://schema.org/AggregateOffer",
  "properties": {
    "lowPrice": [
      "$35"
    ],
    "offerCount": [
      "1938"
    ]
  }
}
</code></pre>

<h3>Non-Additions</h3>

<p>When I wrote a couple of years ago about <a href="http://www.jenitennison.com/blog/node/103">what microdata can&#8217;t do</a>, one of the things that I identified was not being able to express XML Literals. Having thought about this more, what&#8217;s actually missing isn&#8217;t to do with RDF, but is the ability to use the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/content-models.html#innerhtml"><code>innerHTML</code></a> of an element to provide a value for a property rather than its <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#textcontent"><code>textContent</code></a>.</p>

<p>For example, the description of an event might run over several paragraphs, or even in a single paragraph include other markup such as emphasised text, ruby markup, or links to additional information. People who are working from the DOM API can capture this information when they need it by getting the <code>innerHTML</code> of the element rather than its <code>itemValue</code>, but in the JSON mapping, the value is always the <code>itemValue</code> &#8212; the text content of the element.</p>

<p>So this is a general, simplifying limitation of microdata. I&#8217;d argue that we shouldn&#8217;t add any special handling to plug this hole at the microdata+RDF level. If it turns out that having values that contain markup is useful then it will be added to microdata, and the microdata+RDF mapping would then be extended to create <code>rdf:XMLLiteral</code>s or HTML literals (for which there is no defined datatype in RDF at the moment) for such values.</p>

<p>Similarly, I haven&#8217;t said anything in this post about providing machine-readable values to override the text content of an element. There is <a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=13240">an open bug</a> about whether and how that capability might be added to HTML/microdata. I happen to think that it&#8217;s useful, but that utility isn&#8217;t limited to RDF processing. Whichever route is chosen there, I think it&#8217;s important to keep the property values used by basic microdata and microdata+RDF aligned.</p>

<h2>Summary</h2>

<p>To summarise, one direction that we could take in aligning microdata and RDFa would be to define an extension to microdata to add support for decentralised extensibility and the RDF data model. I think that would entail adding attributes such as:</p>

<ul>
<li><code>itemclass</code> to make it easy to define multiple types for an item</li>
<li><code>itemvocab</code> and some default processing to provide nice mappings for short-name properties into URIs</li>
<li><code>itemvaltype</code> and some default processing to assign datatypes to values</li>
</ul>

<p>For publishers and consumers, a single language with optional extensions greatly simplifies the use of embedded data. Property names don&#8217;t have to be repeated, and there is no balancing act to perform between different processing models.</p>

<p>RDFa proponents get a syntax that can be used to generate a natural RDF model against which they can build RDF-oriented APIs and map to other formats such as JSON-LD.</p>

<p>For microdata proponents, this approach doesn&#8217;t pollute microdata with requirements that they see as superfluous, and doesn&#8217;t change the behaviour of core microdata processors. Browsers, search engines and other consumers can continue to use the JSON output and only those who really want to support RDF need to do so.</p>

<p>I&#8217;m sure that there are things that I&#8217;ve missed in my outline above, issues that I haven&#8217;t thought of. But if there is to be any kind of convergence between microdata/RDFa, this layered approach seems to me to be the kind of convergence that is most likely to eventually result in one language for embedding data in HTML rather than two or three.</p>

<p><strong>Note: if you prefer to comment on Google+, please add your comment to <a href="https://plus.google.com/u/0/112095156983892490612/posts/aUqGQSLzDPv">my announcement post there</a></strong></p>
    ]]></content>
  </entry>
  <entry>
    <title>Using Multiple Vocabularies in Microdata</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/161" />
    <id>http://www.jenitennison.com/blog/node/161</id>
    <published>2011-07-28T08:25:21+00:00</published>
    <updated>2011-07-28T08:25:21+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="microdata" />
    <category term="schema.org" />
    <summary type="html"><![CDATA[<p>I <a href="http://www.jenitennison.com/blog/node/160">wrote the other day</a> about how <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> needs to share data at three levels to satisfy its goals as a website:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>and the requirement to use multiple, incrementally more specialised, vocabularies to describe the same things as a result.</p>

<p>What I want to do here is explore how a publisher might handle this kind of situation using microdata. The ground has already been substantially covered by <a href="http://openspring.net/blog/2011/06/10/microdata-multiple-vocabularies">Stéphane Corlosquet</a>; what I do here is work through an example where the consumers are microdata&#8217;s primary targets &#8212; search engines and browsers &#8212; look at why it&#8217;s hard to fix this within microdata itself, and discuss how people who create vocabularies to be used with microdata might help publishers who find themselves in this situation by designing those vocabularies to be used together as well as on their own.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>I <a href="http://www.jenitennison.com/blog/node/160">wrote the other day</a> about how <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a> needs to share data at three levels to satisfy its goals as a website:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>and the requirement to use multiple, incrementally more specialised, vocabularies to describe the same things as a result.</p>

<p>What I want to do here is explore how a publisher might handle this kind of situation using microdata. The ground has already been substantially covered by <a href="http://openspring.net/blog/2011/06/10/microdata-multiple-vocabularies">Stéphane Corlosquet</a>; what I do here is work through an example where the consumers are microdata&#8217;s primary targets &#8212; search engines and browsers &#8212; look at why it&#8217;s hard to fix this within microdata itself, and discuss how people who create vocabularies to be used with microdata might help publishers who find themselves in this situation by designing those vocabularies to be used together as well as on their own.</p>

<!--break-->

<h2>Use Case</h2>

<p>I&#8217;m going to use <a href="http://lanyrd.com/">Lanyrd</a> from <a href="http://simonwillison.net/">Simon Willison</a> and <a href="http://natbat.net/">Natalie Downe</a> as an example. Lanyrd is &#8220;the social conference directory&#8221;: it keeps track of conferences that you&#8217;re attending or speaking at, and lets you know about ones that your friends (or at least the people you follow on Twitter) are going to, as well as providing a bunch of other useful facilities.</p>

<p>Lanyrd currently uses microformats to mark up events so that nice summaries appear within search engine results. Here&#8217;s a (slightly simplified for concision) example from the front page:</p>

<pre><code>&lt;li class="conference vevent"&gt;
  &lt;h3&gt;&lt;a href="/2011/oscon/" class="summary url"&gt;OSCON 2011&lt;/a&gt;&lt;/h3&gt;
  &lt;p class="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;abbr class="dtstart" title="2011-07-25"&gt;25th&lt;/abbr&gt;–
    &lt;abbr class="dtend"   title="2011-07-29"&gt;29th July 2011&lt;/abbr&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>This is easy enough to understand: OSCON 2011 has a URL of <a href="http://lanyrd.com/2011/oscon/"><code>http://lanyrd.com/2011/oscon/</code></a>, is located in Portland (US), starts on 25th July and ends on 29th July 2011.</p>

<p>Say that Lanyrd decided to switch to using <a href="http://www.schema.org/">schema.org</a> microdata. The markup would change to something like the following:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>A few notes, because there were some design decisions involved in this mapping:</p>

<ul>
<li>I&#8217;ve used a plain <a href="http://schema.org/Event"><code>http://schema.org/Event</code></a> because I wasn&#8217;t sure how to classify a conference &#8212; is it a <code>SocialEvent</code> or a <code>BusinessEvent</code> or an <code>EducationEvent</code>? Depends on the conference, I guess</li>
<li>I&#8217;ve assumed that the URIs for both the conference and its location are also item identifiers</li>
<li>I&#8217;ve changed the markup a bit to add <code>&lt;span&gt;</code> elements where necessary to get the desired data out, namely around the names of the conference and the place; I could have used separate <code>&lt;meta&gt;</code> or <code>&lt;link&gt;</code> elements instead but that would have meant repetition of data within the page</li>
</ul>

<p>All well and good.</p>

<p>Now let&#8217;s say that browsers start to support the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#vevent">vEvent vocabulary defined within the WHATWG HTML microdata specification</a> and offer some really nice functionality: because there&#8217;s a clear mapping to iCalendar, they enable users to drag an event from the browser to a calendar application, and have it create an entry within the calendar.</p>

<p>Say Lanyrd really want to take advantage of this. It means marking up their pages in something like a mix between the two examples we&#8217;ve looked at so far &#8212; microdata syntax but with the vEvent vocabulary (which is based on the hCalendar microformat vocabulary) rather than the schema.org vocabulary:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>But now Lanyrd have a dilemma. If they mark up their pages using the schema.org vocabulary, they can&#8217;t take advantage of the browser drag-and-drop support; if they mark up their pages using the vEvent vocabulary they won&#8217;t get their pages displaying nicely in search engine results. They can get the benefits from one consumer or the other but not both at the same time. What to do?</p>

<h2>Publisher Workarounds</h2>

<p>What could Lanyrd do to work around this problem?</p>

<h3>Different Syntaxes</h3>

<p>The first, eminently pragmatic, workaround would be to use different syntaxes to encode the event information for the two different consumers. Since schema.org is likely to continue to understand microformats for the foreseeable future, Lanyrd could stick to their original microformat markup and just add similar microdata for browsers to pull out to create iCalendar data. The page would look like:</p>

<pre><code>&lt;li class="conference vevent" itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" class="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="summary" class="summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" class="location"&gt;
    &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
  &lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="dtstart" class="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="dtend"   class="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>In other words, they could handle their requirement by not using microdata for one of the vocabularies. I don&#8217;t think this is a particularly acceptable solution, given that schema.org specifically wants publishers to use microdata, but it would work.</p>

<h3>Repeated Data</h3>

<p>A second workaround that Lanyrd could use would be to have some shadow markup for the data targeted at schema.org; the visible event information in the page itself should still be marked up using the vEvent vocabulary because it gives an area of the page that users can drag and drop. The basic version of this would look like:</p>

<pre><code>&lt;li class="conference"&gt;
  &lt;!-- data for browsers --&gt;
  &lt;span itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
    &lt;h3&gt;
      &lt;a itemprop="url" href="/2011/oscon/"&gt;
        &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
      &lt;/a&gt;
    &lt;/h3&gt;
    &lt;p itemprop="location"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/p&gt;
    &lt;p class="date"&gt;
      &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
      &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
    &lt;/p&gt;
    ...
  &lt;/span&gt;

  &lt;!-- data for search engines --&gt;
  &lt;span itemscope itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
    &lt;link itemprop="url" href="/2011/oscon/"&gt;
    &lt;meta itemprop="name" content="OSCON 2011"&gt;
    &lt;span itemprop="location" itemscope 
          itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
      &lt;link itemprop="url" href="/places/portland/"&gt;
      &lt;meta itemprop="name" content="United States / Portland"&gt;
    &lt;/span&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;&lt;/time&gt;
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;&lt;/time&gt;
    ...
  &lt;/span&gt;
&lt;/li&gt;
</code></pre>

<p>Note: I&#8217;ve used empty <code>&lt;time&gt;</code> elements to mark up the dates for the conference in the schema.org shadow data because the microdata spec says &#8220;If a property&#8217;s value represents a date, time, or global date and time, the property must be specified using the <code>datetime</code> attribute of a <code>time</code> element.&#8221; They&#8217;re empty, though, so they won&#8217;t be displayed on the page.</p>

<p>There are a few issues with this workaround:</p>

<ul>
<li>it repeats content and thus bloats the page</li>
<li>in the microdata DOM API, there now appear to be two items when really there&#8217;s one conference; this might not be a big deal if scripts access items by type rather than just getting all the items</li>
<li>search engines might (wild speculation follows) be more suspicious of data that isn&#8217;t visible within the page; there&#8217;s no way for schema.org to know that the same data appears visibly elsewhere with equivalent markup</li>
</ul>

<h3>Use <code>itemref</code></h3>

<p>A third possibility for Lanyrd would be to do something similar to the previous example but use the <code>itemref</code> attribute to point to any shared data. Unfortunately in this case, there&#8217;s only one property that&#8217;s actually shared (with the same semantics) between the two vocabularies &#8212; <code>url</code> &#8212; so using this technique doesn&#8217;t improve the markup all that much from the previous example:</p>

<pre><code>&lt;li class="conference"&gt;
  &lt;!-- data for browsers --&gt;
  &lt;span itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
    &lt;h3&gt;
      &lt;a id="oscon-url" itemprop="url" href="/2011/oscon/"&gt;
        &lt;span itemprop="summary"&gt;OSCON 2011&lt;/span&gt;
      &lt;/a&gt;
    &lt;/h3&gt;
    &lt;p itemprop="location"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/p&gt;
    &lt;p class="date"&gt;
      &lt;time itemprop="dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
      &lt;time itemprop="dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
    &lt;/p&gt;
    ...
  &lt;/span&gt;

  &lt;!-- data for search engines --&gt;
  &lt;span itemscope itemtype="http://schema.org/Event" itemid="/2011/oscon/"
    itemref="oscon-url"&gt;
    &lt;meta itemprop="name" content="OSCON 2011"&gt;
    &lt;span itemprop="location" itemscope 
          itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
      &lt;link itemprop="url" href="/places/portland/"&gt;
      &lt;meta itemprop="name" content="United States / Portland"&gt;
    &lt;/span&gt;
    &lt;time itemprop="startDate" datetime="2011-07-25"&gt;&lt;/time&gt;
    &lt;time itemprop="endDate"   datetime="2011-07-29"&gt;&lt;/time&gt;
    ...
  &lt;/span&gt;
&lt;/li&gt;
</code></pre>

<p>In other situations, where there is more overlap in the property names used by the two types, there might be more advantage in this approach.</p>

<h3>Content Negotiation</h3>

<p>A final workaround would be for Lanyrd to serve up an HTML page that uses the schema.org vocabulary to search engines and an HTML page that uses the vEvent vocabulary to browsers, by sniffing the <code>User-Agent</code> header.</p>

<p>This has the advantage of not having to try to cram two conflicting vocabularies into a single page but the disadvantage of having to code for the content negotiation. Essentially, it shifts the complexity and repetition from the HTML page to the code that generates the HTML page, but does address the three disadvantages that I listed for the &#8216;repeated content&#8217; solution described above.</p>

<h2>Publisher Workarounds</h2>

<p>Lanyrd could also lobby schema.org and/or WHATWG to make changes to what data they consume.</p>

<h3>Lobby for Convergence</h3>

<p>Lanyrd could lobby schema.org to understand the vEvent vocabulary and/or WHATWG to specify browser handling of the schema.org vocabulary.</p>

<p>This might work, but the vocabularies do have different goals and requirements, which might make it hard to unify them: vEvent maps neatly and easily to iCalendar, whereas schema.org is oriented around Rich Snippets in search engine results. The modelling of the <code>location</code> property in each shows this different emphasis: it only needs to map to a string in iCalendar so there&#8217;s no need to model the location as an item itself, but in search engine results it&#8217;s useful to link to the location, display a map and so on, which is only possible if the location is modelled as an item in its own right.</p>

<h3>Lobby for Different Processing</h3>

<p>Finally, Lanyrd could lobby schema.org and/or WHATWG to trigger their recognition of an event based on something other than the <code>itemtype</code> of an item, and to interpret full URIs for properties in the same way as equivalent short names.</p>

<p>For example, currently the <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#conversion-to-icalendar">conversion of vEvent to iCalendar defined in the WHATWG HTML specification</a> is triggered by the presence of an item that is a vEvent:</p>

<blockquote>
  <p>If none of the nodes in nodes are items with the type <code>http://microformats.org/profile/hcalendar#vevent</code>, then there is no vEvent data. Abort the algorithm, returning nothing.</p>
</blockquote>

<p>Let&#8217;s say that it were instead triggered by items with <em>either</em>:</p>

<ul>
<li>an <code>itemtype</code> of <code>http://microformats.org/profile/hcalendar#vevent</code> <em>or</em></li>
<li>a <code>http://microformats.org/profile/hcalendar#type</code> of <code>vevent</code></li>
</ul>

<p>and that in the former case, it would read short name properties but in the latter case it would read properties with URIs like <code>http://microformats.org/profile/hcalendar#location</code>.</p>

<p>In that case, Lanyrd could use the schema.org vocabulary for the type given in the <code>itemtype</code> attribute, but mark up extra properties for the item using the vEvent property URIs. The markup would be something like:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://schema.org/Event" itemid="/2011/oscon/"&gt;
  &lt;meta itemprop="http://microformats.org/profile/hcalendar#type" content="vevent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url http://microformats.org/profile/hcalendar#url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name http://microformats.org/profile/hcalendar#summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;meta itemprop="http://microformats.org/profile/hcalendar#location" 
        content="United States / Portland"&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate http://microformats.org/profile/hcalendar#dtstart" 
          datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate   http://microformats.org/profile/hcalendar#dtend"   
          datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>Note that here the location has to be repeated because vEvent expects a string while schema.org expects an item.</p>

<p>The same kind of pattern could work the other way around: schema.org could recognise events based on a <code>http://schema.org/type</code> property with the value <code>Event</code>, and understand property URIs that were equivalent to each of the short-name properties that it uses. (Such <a href="http://schema.org/schema.owl">URIs for schema.org properties</a> already exist.)</p>
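
<p>As a rough sketch of that reverse pattern, the markup might look something like the following. It assumes that schema.org honoured a <code>http://schema.org/type</code> property and property URIs of the form <code>http://schema.org/name</code>; neither behaviour is defined at the moment, so the names here are purely illustrative:</p>

<pre><code>&lt;li class="conference" itemscope 
    itemtype="http://microformats.org/profile/hcalendar#vevent" itemid="/2011/oscon/"&gt;
  &lt;!-- hypothetical type-defining property for schema.org consumers --&gt;
  &lt;meta itemprop="http://schema.org/type" content="Event"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url http://schema.org/url" href="/2011/oscon/"&gt;
      &lt;span itemprop="summary http://schema.org/name"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;!-- vEvent wants a string here, so the schema.org Place item would still need separate markup --&gt;
  &lt;p itemprop="location"&gt;United States / Portland&lt;/p&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="dtstart http://schema.org/startDate" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="dtend   http://schema.org/endDate"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>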

<h2>Multiple Types for Microdata Items</h2>

<p>Earlier this year there was some <a href="http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032243.html">discussion on the WHATWG mailing list</a> about the requirement for multiple types for items.</p>

<p>The use cases there were not the same multiple-consumers use case that I have outlined above, but around support for inheritance in types. For example, schema.org lets people define their own types in the schema.org domain, such as <code>http://schema.org/Event/Conference</code>. There&#8217;s no way of exposing the type hierarchy explicitly (that all <code>Conference</code>s are <code>Event</code>s) so scripts that use microdata&#8217;s DOM API to search for items of the type <code>http://schema.org/Event</code> won&#8217;t find conferences. The discussion was about alleviating this by allowing publishers to put both <code>http://schema.org/Event</code> and <code>http://schema.org/Event/Conference</code> within the <code>itemtype</code> attribute. Alternatively, a conference could be typed as a <code>SocialEvent</code>, a <code>BusinessEvent</code> <em>and</em> an <code>EducationEvent</code>, enabling it to take properties from all three.</p>

<p>The conclusion of the discussion was that it just wasn&#8217;t possible to use what would seem to be the obvious method of assigning multiple types to an item: having a space-separated list in the <code>itemtype</code> attribute. If we look at the markup that you would get for this example, we can see why there&#8217;s a problem:</p>

<pre><code>&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event http://microformats.org/profile/hcalendar#vevent"&gt;
  &lt;h3&gt;
    &lt;a itemprop="url" href="/2011/oscon/"&gt;
      &lt;span itemprop="name summary"&gt;OSCON 2011&lt;/span&gt;
    &lt;/a&gt;
  &lt;/h3&gt;
  &lt;p itemprop="location" itemscope 
     itemtype="http://schema.org/Place" itemid="/places/portland/"&gt;
    &lt;span itemprop="name"&gt;
      &lt;a href="/places/usa/"&gt;United States&lt;/a&gt; / &lt;a itemprop="url" href="/places/portland/"&gt;Portland&lt;/a&gt;
    &lt;/span&gt;
  &lt;/p&gt;
  &lt;meta itemprop="location" content="United States / Portland"&gt;
  &lt;p class="date"&gt;
    &lt;time itemprop="startDate dtstart" datetime="2011-07-25"&gt;25th&lt;/time&gt;–
    &lt;time itemprop="endDate   dtend"   datetime="2011-07-29"&gt;29th July 2011&lt;/time&gt;
  &lt;/p&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>There are two issues with this markup. First, the definitions of the two types (in prose within the two specs) have different expectations about:</p>

<ul>
<li>what properties will be present: schema.org expects a <code>name</code> property and not a <code>summary</code> property, and vice versa for vEvent; similarly for <code>startDate</code>/<code>dtstart</code> and <code>endDate</code>/<code>dtend</code></li>
<li>what values the properties will have: schema.org expects <code>location</code> to have an item value whereas vEvent expects a string</li>
</ul>

<p>The result is that the mixed markup isn&#8217;t conformant with either vocabulary, and hence not a conformant HTML document. (Whether microdata consumers do anything about that non-conformance is a different question &#8212; they could just ignore properties that they don&#8217;t understand, or with value types that they don&#8217;t expect.)</p>

<p>Second, if the data is turned into any kind of format that needs full URIs for properties, such as RDF through <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#rdf">microdata&#8217;s RDF mapping</a>, it&#8217;s impossible to tell what type URI to use as the basis of that URI. If the item is assigned <em>just</em> the schema.org type, the <code>name</code> property would map to the URI:</p>

<pre><code>http://www.w3.org/1999/xhtml/microdata#http://schema.org/Event%23:name
</code></pre>

<p>If there is <em>just</em> the vEvent type, the <code>summary</code> property would map to the URI:</p>

<pre><code>http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcalendar%23vevent:summary
</code></pre>

<p>When the item has more than one type, there is no way to know which type should be used as the basis of the URI generated for the property, or even if <em>both</em> should be used, as in the properties used by both vocabularies such as <code>url</code>.</p>
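
<p>Following the same pattern as the two URIs above, the shared <code>url</code> property would have two equally plausible candidates, one per type:</p>

<pre><code>http://www.w3.org/1999/xhtml/microdata#http://schema.org/Event%23:url
http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcalendar%23vevent:url
</code></pre>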

<p>(This issue isn&#8217;t specific to the RDF mapping defined in the microdata specification; it would arise in any RDF mapping from microdata, or any mapping in which short names for properties needed to be turned into globally unique terms.)</p>

<h2>Guidance for Microdata Vocabularies</h2>

<p>Having short names for properties makes writing microdata simple. They are much easier on the fingers and the eye than URIs, and, because they are scoped by type rather than vocabulary, they can be given simple names while still being specified tightly in terms of the types on which they can appear and the values that they can take. For example, the <code>location</code> of an <code>Event</code> can be limited to a geographical place while the <code>location</code> of a <code>Click</code> can be specified in terms of a point on a web page, rather than having to use more complex property names like <code>placeLocation</code> and <code>pointLocation</code>.</p>

<p>This could be a particular advantage in large and wide-ranging vocabularies such as schema.org&#8217;s where it&#8217;s likely that at some point there will be a clash in meaning between properties with the same name for different things. (Though the flip side for schema.org is that it has lots of inherited properties which really do have the same meaning across subtypes.)</p>

<p>The biggest problem with short names arises when you want to provide data to different consumers that use different vocabularies for that data. My guess is that in real life, in many cases this won&#8217;t be an issue, and certainly microdata has been designed with that assumption. Realistically, the majority of websites will probably only care about embedding data in web pages to the extent that search engines will read it, and will therefore only use one vocabulary &#8212; schema.org&#8217;s. Where more than one vocabulary <em>is</em> used in the page, it may well be that they are used in different locations (eg OGP for Facebook in the head of a page, schema.org in the body), or to mark up data about completely different kinds of things.</p>

<p>However, if you&#8217;re a publisher who wants to provide data to multiple consumers who understand different vocabularies &#8212; search engines <em>and</em> browsers, as in the Lanyrd example above &#8212; and those consumers define what they will consume solely based on the <code>itemtype</code> of an item, then you&#8217;re going to have to either work around the consumers&#8217; behaviour as I described above, or ask those consumers to change how they work.</p>

<p>The most promising direction I can see at the moment would be to ask consumers to define their vocabularies such that they include</p>

<ol>
<li>a property that is used to identify the in-vocabulary type of items whose <code>itemtype</code> is not in that vocabulary</li>
<li>defined URIs for properties that are equivalent (and processed in the same way as) the short name properties for a given type</li>
</ol>

<p>The type-defining property <em>could</em> be <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</code>, with the value being the URI of the relevant type. However,</p>

<pre><code>&lt;link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
      href="http://schema.org/Event"&gt;
</code></pre>

<p>is a lot more verbose than:</p>

<pre><code>&lt;meta itemprop="http://schema.org/type" content="Event"&gt;
</code></pre>

<p>so I imagine that the designers of microdata vocabularies would prefer to ask publishers to do the latter. On the other hand, if publishers are using multiple vocabularies, they might find it easier to use a consistent type-defining property across vocabularies; it&#8217;s hard to tell what the global usability payoffs might be here.</p>

<p>Or microdata could standardise the pattern by adding an attribute (eg <code>itemkind</code>, <code>iteminherit</code>, <code>itemothertype</code>, <code>itemmixin</code>, I dunno) which would list additional types. These could be exposed within the DOM API (which would be a big advantage for in-page scripts) but not used in the interpretation of short-name properties.</p>
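
<p>To make that concrete, here is a sketch using <code>itemkind</code>. The attribute does not exist in microdata; the name is just one of the placeholders above, and the behaviour described in the comment is an assumption:</p>

<pre><code>&lt;!-- itemtype still scopes the short names; the hypothetical itemkind only lists extra types --&gt;
&lt;li class="conference" itemscope itemid="/2011/oscon/"
    itemtype="http://schema.org/Event"
    itemkind="http://microformats.org/profile/hcalendar#vevent"&gt;
  &lt;span itemprop="name http://microformats.org/profile/hcalendar#summary"&gt;OSCON 2011&lt;/span&gt;
  ...
&lt;/li&gt;
</code></pre>

<p>Here <code>name</code> would still be interpreted against <code>http://schema.org/Event</code>, while a script looking for vEvent items could find this one through the extra type.</p>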

<p>Vocabularies that don&#8217;t support processing of items based on a type-defining property and property URIs are effectively indicating that they don&#8217;t anticipate being mixed with others that have <em>also</em> made the same assumption that they won&#8217;t be mixed with others. Currently, for example, schema.org and the vocabularies defined within the WHATWG microdata specification both make this assumption. Working with one vocabulary that makes that assumption for a particular type is fine; working with two in microdata is much harder.</p>
    ]]></content>
  </entry>
  <entry>
    <title>My Experience of Web Standards</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/160" />
    <id>http://www.jenitennison.com/blog/node/160</id>
    <published>2011-07-24T16:24:00+00:00</published>
    <updated>2011-07-26T17:18:44+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="xml" />
    <category term="html5" />
    <category term="microdata" />
    <category term="rdf" />
    <category term="rdfa" />
    <summary type="html"><![CDATA[<p>One of the things that&#8217;s been niggling at the back of my mind since the <a href="http://schema.org">schema.org</a> announcement is how small a role search engine results play in the wider data sharing efforts that I&#8217;m more familiar with in my work on <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>One of the things that&#8217;s been niggling at the back of my mind since the <a href="http://schema.org">schema.org</a> announcement is how small a role search engine results play in the wider data sharing efforts that I&#8217;m more familiar with in my work on <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.</p>

<!--break-->

<p>My day job (the one I actually get paid for) is web development. The site I spend most of my time and effort on is <a href="http://www.legislation.gov.uk/">legislation.gov.uk</a>. This deals with complex content (UK legislation) that has to be presented in multiple formats (users love PDFs of legislation). Our aim is to make the data as reusable as possible by third parties through good, RESTful, web architecture, and we want to use open standards and open source technologies as part of the <a href="http://www.cabinetoffice.gov.uk/resource-library/open-source-open-standards-and-re-use-government-action-plan">UK government&#8217;s general strategy</a>.</p>

<p>legislation.gov.uk is not a global website like Amazon or eBay, but it&#8217;s not small either: it covers 60,000 changing items of legislation, providing point-in-time views for many of them, and with more added every day. It&#8217;s one of the top ten most used UK Government websites, with 2 million visits (about 10-12 million page views) each month and typically about 120 requests/second during the active times of the day. Legislation might sound like a highly specialist interest, but if you <a href="http://twitter.com/search/legislation.gov.uk">search for legislation.gov.uk on Twitter</a> you&#8217;ll see it being referenced over and over by people who want to share what the law says.</p>

<p>I do not by any means claim that my experience is representative of the wider web. I know that there are large numbers of sites that deal only in data, not documents, and certainly not documents with the kind of rich semantic structure that legislation has. I offer the following discussion as a data point, partly because I can&#8217;t quite believe that legislation.gov.uk is <em>completely</em> unique in its requirements and partly because obviously my perspective on a bunch of issues arises from this experience.</p>

<h2>Technology Stacks</h2>

<p>Legislation items are complex, semi-structured documents. Their natural fit is XML (well, that&#8217;s not quite true &#8212; their natural fit would be something that allowed overlapping markup &#8212; but XML is the closest that we have). So we store it in XML in a native XML database and we use an XML toolset to query it (XQuery) and transform it (XSLT) into various formats including rendering it as PDF (through XSL-FO).</p>

<p>Our next step for the development of the site involves looking at legislative effects. These form a graph: one item of legislation affects other items of legislation which may in turn affect other items and so on. There are all sorts of other links between items of legislation in terms of commencements, conferred powers and so on. Particularly because we already have well-thought-through URIs for legislation, the natural fit is to use RDF to represent this graph. We already offer a SPARQL endpoint for accessing some aspects of our data, but we expect to expand and develop this over the next few months and to use it as a layer under the website and exposed for reusers, in much the same way as we use the XML database.</p>

<p>As a government site, we have fairly strict limits on what we can do within our web pages: we have to make sure that they&#8217;re accessible by everyone who wants to view them. We aren&#8217;t able to use technologies that are only available in the latest browsers, but that&#8217;s OK because with the kind of content we deal with, we don&#8217;t have to do anything fancy anyway. So we use pretty basic HTML and CSS and Javascript, because that&#8217;s how you deliver content to end-users on the web (as well as exposing the underlying XML and RDF, to enable others to reuse the data).</p>

<p>In other words, we use three web stacks for delivering legislation.gov.uk:</p>

<ul>
<li>the XML stack, which is great for single-source publishing of documents that have more semantic structures than those supported by HTML</li>
<li>the RDF stack, which is well-suited for metadata about things that are identified by URIs</li>
<li>the HTML stack, which is absolutely necessary for delivering human-accessible content on the web</li>
</ul>

<p>What bemuses me, because of this experience, is that sometimes it appears that the narrative around these technologies is framed in terms of an exclusive choice between them. For example, <a href="http://twitter.com/mattur/status/89331716430372864">@mattur asked</a>:</p>

<p style="text-align:center;">
  <a href="http://twitter.com/mattur/status/89331716430372864"><img src="/blog/files/mattur-tweet.jpg" alt="@gimsieke @JeniT how may TAG members believe RDF(a) and X(HT)ML are way forward? How many think they aren't?" /></a>
</p>

<p>It is as if, if you use XML you <em>cannot</em> appreciate the utility of error-handling in HTML; or if you use RDF you <em>cannot</em> understand the need to represent documents in XML; or if you want to utilise HTML fully, you <em>cannot</em> adopt RDF&#8217;s view of data on the web. That&#8217;s simply not my experience. They each have their role on the web; supporting the use of one does not necessitate rejecting the use of the others.</p>

<p>It&#8217;s interesting that some of the standards that are most reviled are those that arise at the intersections, where it appears that one technology is trying to encroach on the space of another:</p>

<ul>
<li>XHTML at the border of XML and HTML</li>
<li>RDF/XML at the border of RDF and XML</li>
<li>RDFa at the border of all three</li>
</ul>

<p>At the same time, within legislation.gov.uk, we publish XHTML (because it&#8217;s the natural output from an XML toolchain) and create and process RDF/XML (because it gives us access to that data from within the XML toolchain). We use a small bit of RDFa in the XHTML to indicate the rights under which our information is available and, although we don&#8217;t yet, we are thinking about using RDFa to mark up non-document semantics within our XML (to enable the XML markup to focus on the document structures that it&#8217;s good at). For all their imperfections, these intersection technologies are useful for managing cross-overs; the problems arise when they overstep their remit and people start to think that <em>all</em> HTML must be XHTML or <em>all</em> XML must be RDF/XML or <em>all</em> RDF must be RDFa.</p>

<h2>Sharing Scenarios</h2>

<p>The second thing that I wanted to explore is the experience from legislation.gov.uk of what it&#8217;s like to be a publisher who actively wants to share their data. We need to operate simultaneously at three levels in our data sharing efforts.</p>

<h3>Large-Scale Consumer-Driven Data Sharing</h3>

<p>The first target for our data sharing efforts is the search engines. Obviously we&#8217;re not selling anything, but we want people to be able to locate legislation easily when they want it, and we want people who have done the search to be able to see some information about the legislation so that they know that they&#8217;ve located the right item.</p>

<p>This is large-scale consumer (search engine) driven data sharing, typified by schema.org and Facebook&#8217;s <a href="http://developers.facebook.com/docs/opengraph/">Open Graph Protocol</a> (OGP). There are a few very big data consumers (Google, Microsoft, Yahoo!, Facebook etc) who need to consume data from large numbers of data providers. These consumers obviously can&#8217;t understand <em>everything</em>, so they determine and document what syntaxes and vocabularies they <em>do</em> understand and expect publishers to follow.</p>

<p>The benefits that publishers get from a particular consumer determine which syntax/vocabulary they use; publishers who are particularly keen to show up prettily within search results will target schema.org whereas those who want to be sharable within Facebook will target OGP. Many publishers will want to target both. There is probably a driver towards eventual convergence:</p>

<ul>
<li>publishers might push back about inserting two lots of very similar data in their pages</li>
<li>consumers might want to include data from publishers who haven&#8217;t specifically targeted them</li>
</ul>

<p>although there&#8217;s likely to be a period where they coexist, much as there was for VHS and Betamax (and <a href="http://en.wikipedia.org/wiki/Video_2000">V2000</a>, I know, dad) during the early days of video players.</p>

<p>As <a href="http://www.jenitennison.com/blog/node/157">I discussed previously</a>, these large-scale consumers will be driven by the data that they find in the wild, in all its messy variety. They get relatively little benefit directly from using a generic <em>syntax</em>, as they are really interested in only a few, pretty generic, <em>vocabularies</em> for which they have hardwired processing. Indirectly, adopting a generic syntax has benefits in that publishers might find it easier to find tools that enable them to generate it, tutorials about how to use it, and feel that they aren&#8217;t being quite as locked in to something proprietary. However, rejecting data that isn&#8217;t marked up properly using that syntax has no benefit for consumers except in so far as it makes them feel that they are being good community members. </p>

<p>This is the pattern we see with schema.org (which accepts microdata but, based on its documentation, won&#8217;t reject data that isn&#8217;t fully compliant with it) and with OGP (which accepts a subset of RDFa but doesn&#8217;t reject data that hasn&#8217;t got prefixes properly bound, for example).</p>

<p>Another point to mention is that there is very little trust in this scenario. The communication between consumers and publishers is very limited, and the consumers will want to protect themselves against accidental or malicious errors that are evident in mismatches between explicit metadata and that which is parsed from the visible content of the page.</p>

<p>The parallels to HTML and browser vendors are very strong in this type of data sharing.</p>

<h3>Small-Scale Consumer-Driven Data Sharing</h3>

<p>A second type of data sharing is again driven by consumers, but this time at a lot smaller and more specialised scale. For legislation.gov.uk, these are services such as <a href="http://www.glin.gov/">GLIN</a>, which is a global legislation registry. Other examples are the recent work that we&#8217;ve done to publish <a href="http://data.gov.uk/organogram">UK Government organograms</a> or <a href="http://countculture.wordpress.com/">Chris Taggart</a>&#8217;s <a href="http://openelectiondata.org/">Open Election Data</a> project. In these cases, there&#8217;s a single, relatively small and specialised consumer and a small number of publishers which are closely coordinated together.</p>

<p>As in the large-scale case, the consumer ultimately determines the syntax/vocabulary that it recognises, and communicates that to the publishers. However, small-scale consumers typically have close coordination with the publishers, which has a number of side-effects:</p>

<ul>
<li>consumers may be more able to both apply pressure to and help publishers to do well in their markup</li>
<li>publishers have the opportunity to feed back directly to the consumer any suggestions that they have about changes to the syntax/vocabulary</li>
<li>publishers are likely to gain an immediate and tangible benefit from their cooperation, such as visualisations of their data that they otherwise wouldn&#8217;t have seen</li>
</ul>

<p>Another noteworthy point about small-scale consumers is that they&#8217;re unlikely to have the engineering capability to create a custom parser for a particular syntax, but will instead want to use something off-the-shelf to extract data from pages and into their own backend systems. This, coupled with the closer coordination with publishers, means that they&#8217;re much more likely to stick to a specification, assuming that the off-the-shelf tools do.</p>

<h3>Publisher-Driven Data Sharing</h3>

<p>The final type of data sharing is driven by publishers. At legislation.gov.uk, we&#8217;re motivated to make our data available for reuse for transparency/accountability reasons (to help citizens understand the law), efficiency reasons (to help parliament and government departments to publish new legislation better) and economic reasons (to foster innovation in legal publishing). We don&#8217;t have any individual consumers in mind when we publish our data, but have found that simply by publishing it well, we foster reuse.</p>

<p>In this case, we as publishers are highly motivated to ensure that the data we publish is easily parsed with something off-the-shelf, since that lowers the barrier for potential consumers. Publishers like us are very likely to have unique, specialised, content and need to use a vocabulary that fits closely to our internal data structures as this lowers implementation cost. Consumers can also trust publishers like us: we simply have no motivation to lie in the data that we provide for reuse.</p>

<h2>Mixed Markup</h2>

<p>As I&#8217;ve outlined above, publishers like legislation.gov.uk need to target several potential consumers at the same time:</p>

<ul>
<li>large-scale consumers such as search engines</li>
<li>small-scale consumers that provide us with a useful service</li>
<li>specialist consumers that are interested specifically in our data</li>
</ul>

<p>We cannot use a single vocabulary for all these different purposes. (Well, we could write our own vocabulary and describe mappings to other vocabularies using RDFS, but search engines wouldn&#8217;t read it.)</p>

<p>We must therefore use a mix of vocabularies:</p>

<ul>
<li>generic vocabularies about things that search engines care about</li>
<li>specialised vocabularies for particular small consumers</li>
<li>site-specific vocabularies for sharing our unique data</li>
</ul>

<p>It&#8217;s repetitive, but it&#8217;s manageable so long as we have a syntax that enables us to say that an item of legislation is a <code>http://schema.org/CreativeWork</code> and a <code>http://purl.org/dc/dcmitype/Text</code> and a <code>http://www.legislation.gov.uk/def/legislation/Legislation</code> and allows us to give multiple properties the same value.</p>
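
<p>In RDFa, for instance, that might look something like the sketch below. The prefixes, the <code>about</code> URI, the title and the choice of <code>dct:title</code> and <code>schema:name</code> are illustrative assumptions rather than what legislation.gov.uk actually publishes:</p>

<pre><code>&lt;div xmlns:schema="http://schema.org/"
     xmlns:dcmitype="http://purl.org/dc/dcmitype/"
     xmlns:dct="http://purl.org/dc/terms/"
     xmlns:leg="http://www.legislation.gov.uk/def/legislation/"
     about="http://www.legislation.gov.uk/id/ukpga/2010/15"
     typeof="schema:CreativeWork dcmitype:Text leg:Legislation"&gt;
  &lt;!-- one value, two properties --&gt;
  &lt;h1 property="dct:title schema:name"&gt;Equality Act 2010&lt;/h1&gt;
&lt;/div&gt;
</code></pre>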

<p>The way things are going at the moment, we might well end up having to use multiple <em>syntaxes</em> on the same page, as some consumers understand microdata, others consume RDFa, and still others will parse microformats. This leads to more repetition: adding <code>itemprop</code> for microdata, <code>property</code> for RDFa and specialised <code>class</code> attributes for microformats. But worse (much worse), each of the syntaxes uses a different parsing model to create an entity-property-value structure, so not only do we have to learn substantially different markup patterns but our pages quickly become some kind of hideous polyglot mess trying to balance between them.</p>

<h2>Looking Forward</h2>

<p>As I said at the start, I&#8217;m fairly sure that my experience at legislation.gov.uk isn&#8217;t representative of the wider web, but I don&#8217;t have a clear idea about just how unrepresentative it is, in terms of technology use or motivations around data sharing. When I read my twitter stream or blogs, there&#8217;s a massive sampling bias, both in terms of who I follow and what I read, but also about who talks about what they&#8217;re doing. (I&#8217;m reminded of <a href="http://www.codinghorror.com/blog/">Jeff Atwood</a>&#8217;s post on the <a href="http://www.codinghorror.com/blog/2007/11/the-two-types-of-programmers.html">Two Types of Programmers</a>: the vast majority of web developers don&#8217;t make a noise about what they do.)</p>

<p>Taking part in web standardisation today often feels like being part of an ongoing cold war between distinct camps rather than a community working towards common aims. The underlying question seems to be &#8220;whose side are you on?&#8221; Every decision and activity is cast as a victory or defeat. Time is wasted on attack and defence, or on raking over past slights and stupidities, rather than on progress. Valid criticism from outside a group cannot be listened to for fear of giving ground, and cannot be made within a group where it seems like betrayal.</p>

<p>It is the <a href="http://en.wikipedia.org/wiki/Realistic_conflict_theory#The_Robbers_Cave_Experiment">Robbers Cave Experiment</a> played out in web standards. As a psychologist, I find it fascinating. As a developer, and particularly one who doesn&#8217;t self-identify with any single group, it is frustrating. As a TAG member, trying to work for the longer-term good of the web, it is worrying, because situations of intergroup conflict lead to <a href="http://en.wikipedia.org/wiki/Groupthink">groupthink</a> and non-optimal solutions.</p>

<p>As I described above, a non-optimal outcome seems to be the most likely result of the particular microdata vs RDFa conflict for us at legislation.gov.uk. While I know we are not generally representative, I believe that it will be similarly bad for other developers: publishers, consumers and tool implementers.</p>

<p>This is a problem for all who want to foster data sharing on the web using open standards; it is not one that any one group can fix on their own. It&#8217;s my hope that a balanced task force of individuals with a variety of experience and backgrounds can provide a focus for us all to work together to solve it. If we can&#8217;t, then we have let our prejudice and bias overcome our judgement.</p>
    ]]></content>
  </entry>
  <entry>
    <title>What Do URIs Mean Anyway?</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/159" />
    <id>http://www.jenitennison.com/blog/node/159</id>
    <published>2011-07-05T22:06:33+00:00</published>
    <updated>2011-07-05T22:06:33+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="linked data" />
    <category term="uris" />
    <summary type="html"><![CDATA[<p>If you&#8217;ve hung around in linked data circles for any amount of time, you&#8217;ll probably have come across the <a href="http://www.w3.org/wiki/HttpRange14Webography">httpRange-14 issue</a>. This was an issue placed before the <a href="http://www.w3.org/2001/tag/">W3C TAG</a> years and years ago which has become a <a href="http://en.wiktionary.org/wiki/permathread">permathread</a> on semantic web and linked data mailing lists. The basic question (or my interpretation of it) is:</p>

<blockquote>
  <p>Given that URIs can sometimes be used to name things that aren&#8217;t on the web (eg the novel Moby Dick) and sometimes things that are (eg the <a href="http://en.wikipedia.org/wiki/Moby-Dick">Wikipedia page about Moby Dick</a>), how can you tell, for a given URI, how it&#8217;s being used so that you can work out what a statement (say, about its author) means?</p>
</blockquote>
    ]]></summary>
    <content type="html"><![CDATA[<p>If you&#8217;ve hung around in linked data circles for any amount of time, you&#8217;ll probably have come across the <a href="http://www.w3.org/wiki/HttpRange14Webography">httpRange-14 issue</a>. This was an issue placed before the <a href="http://www.w3.org/2001/tag/">W3C TAG</a> years and years ago which has become a <a href="http://en.wiktionary.org/wiki/permathread">permathread</a> on semantic web and linked data mailing lists. The basic question (or my interpretation of it) is:</p>

<blockquote>
  <p>Given that URIs can sometimes be used to name things that aren&#8217;t on the web (eg the novel Moby Dick) and sometimes things that are (eg the <a href="http://en.wikipedia.org/wiki/Moby-Dick">Wikipedia page about Moby Dick</a>), how can you tell, for a given URI, how it&#8217;s being used so that you can work out what a statement (say, about its author) means?</p>
</blockquote>

<!--break-->

<p>One answer is to use a <a href="http://www.jenitennison.com/blog/node/154">hash URI</a> whenever you want to refer to something that doesn&#8217;t live on the web, with the base URI providing information about that thing. For example:</p>

<ul>
<li><code>http://en.wikipedia.org/wiki/Moby-Dick</code> is the URI for the Wikipedia page</li>
<li><code>http://en.wikipedia.org/wiki/Moby-Dick#thing</code> is a URI for the novel itself</li>
</ul>

<p>The problem some people (including me) have with this is that hash URIs are primarily used to indicate portions of a web page, and using them for things that aren&#8217;t page fragments overloads them. It&#8217;s also an inflexible method, because the server isn&#8217;t told what the fragment identifier is, and therefore it can&#8217;t be used as the basis for a redirection, for example.</p>
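
<p>To illustrate the point about the server never seeing the fragment, here is a minimal Python sketch. The helper name <code>split_hash_uri</code> is made up for this example; it simply separates the part of a hash URI that goes into the HTTP request from the part that stays on the client.</p>

```python
from urllib.parse import urldefrag


def split_hash_uri(uri):
    """Return (uri_sent_to_server, fragment_kept_by_client).

    Hypothetical helper: the fragment identifier is stripped by the
    client before the request is made, so the server cannot redirect
    (or do anything else) based on it.
    """
    base, fragment = urldefrag(uri)
    return base, fragment


base, fragment = split_hash_uri("http://en.wikipedia.org/wiki/Moby-Dick#thing")
print(base)      # the only part that appears in the HTTP request
print(fragment)  # stays with the client
```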

<p>The <a href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html">2005 TAG resolution</a> for people who wanted to use separate non-hash URIs, such as [<em>warning, made-up URIs</em>]</p>

<ul>
<li><code>http://en.wikipedia.org/wiki/Moby-Dick</code> is the URI for the Wikipedia page</li>
<li><code>http://wikipedia.org/thing/Moby-Dick</code> is the URI for the novel itself</li>
</ul>

<p>was:</p>

<ol>
<li>if you get a <code>2XX</code> response when you request a URI, that URI refers to a document (the document that you get back)</li>
<li>if you get a <code>303</code> response when you request a URI, that URI could refer to anything, and the resource you get by following the redirection describes that thing (hence if a URI should refer to something that isn&#8217;t on the web then requests to it should respond with a 303)</li>
<li>if you get a <code>4XX</code> response when you request a URI, that URI could represent anything</li>
</ol>

<p>This leads to the <code>303</code> pattern described for example within <a href="http://www.w3.org/TR/cooluris/#r303gendocument">Cool URIs for the Semantic Web</a>; in the example here, the response to <code>http://wikipedia.org/thing/Moby-Dick</code> would be a 303 redirection to <code>http://en.wikipedia.org/wiki/Moby-Dick</code>.</p>
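
<p>As a rough sketch of how a client might apply the three rules of the resolution, here is a small Python function. The function name and the labels it returns are my own, for illustration only; they are not part of the TAG resolution, and status codes that the resolution doesn&#8217;t mention are simply reported as not covered.</p>

```python
def interpret_response(status):
    """Say what a URI may refer to, given the status code of a request to it.

    Illustrative only: the labels are invented for this sketch.
    """
    if status // 100 == 2:
        # Rule 1: a 2XX response means the URI refers to the document returned.
        return "document"
    if status == 303:
        # Rule 2: the URI could refer to anything; the redirect target describes it.
        return "anything (described by the redirect target)"
    if status // 100 == 4:
        # Rule 3: a 4XX response tells us nothing; the URI could represent anything.
        return "anything"
    return "not covered by the resolution"


for code in (200, 303, 404, 500):
    print(code, interpret_response(code))
```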

<p>Six years later, we have a lot of experience of this technique of distinguishing between things that are or are not on the web, and it has a bunch of practical limitations.</p>

<ul>
<li>it requires access to web server configuration (to add <code>303</code> redirections), which makes life difficult for people without that level of access</li>
<li>URIs for things that aren&#8217;t on the web always require two round-trips to get hold of information, as the first always responds with a <code>303</code> redirection, which adds server load and slows things down (this is made worse as <code>303</code> responses can&#8217;t be cached &#8212; an oversight in the HTTP spec that I gather is fixed in <a href="http://tools.ietf.org/wg/httpbis/">HTTPbis</a>)</li>
<li>using the <code>303</code> pattern requires a level of knowledge and understanding that is beyond most web developers, particularly if they get no benefit from taking care over their use of URIs (for example, Facebook, schema.org and so on all encourage the use of URIs for non-web things without a word about <code>303</code> redirections)</li>
<li>even people who do have this knowledge and understanding sometimes find it hard to work out whether a particular thing that they want to talk about is a thing-on-the-web or not and therefore whether the use of a <code>303</code> redirection is required</li>
<li>even people who <em>do</em> try to take care in their use of URIs easily make mistakes because we interact with URIs by copy-and-pasting them from browser address bars, and the only URIs that appear there are URIs for things on the web</li>
</ul>

<p>Basically, while the web architectural principles behind the use of <code>303</code> redirections are (arguably!) sound, the collective experience of the past six years indicates that many publishers will not use it because they don&#8217;t know to, because they don&#8217;t care to, because they make mistakes or because they simply can&#8217;t while meeting the other practical constraints of their project.</p>

<p>A number of other approaches have been suggested, before and after the TAG decision, many of which are documented within the draft TAG finding <a href="http://www.w3.org/2001/tag/awwsw/issue57/latest/">Providing and discovering definitions of URIs</a>.</p>

<p>The first observation that I want to make is that many of the objections to the <code>303</code> pattern are about the practicalities of publishers using it. Therefore, any suggestions to provide an alternative technique that involves</p>

<ul>
<li>introducing new URI schemes (eg <code>tdb</code>)</li>
<li>introducing new HTTP methods (eg <code>MGET</code>)</li>
<li>introducing new HTTP status codes (eg <code>209</code>)</li>
<li>using particular HTTP headers (eg <code>Link</code> or <code>Content-Location</code> or other specialist headers)</li>
</ul>

<p>are not going to be widely used for exactly the same reason. I&#8217;m not at all persuaded that it&#8217;s worth spending time developing them.</p>

<p>My second observation is that there are three questions that are being conflated and we might make more progress if we separated them:</p>

<ul>
<li>Must publishers provide separate URIs for things-on-the-web and the non-web-things that they describe?</li>
<li>How can you tell what a reference to a particular URI within a piece of data (eg an RDF statement) means?</li>
<li>How can you get from a URI to information about whatever that URI refers to?</li>
</ul>

<h2>Ambiguity in URIs</h2>

<p>Both the hash URI pattern and the <code>303</code> pattern make the assumption that you need to have separate URIs for things that are not on the web (eg books) and documents on the web about them (eg pages about books). This is useful because it enables people to make separate statements about the author of a book:</p>

<pre><code>&lt;http://wikipedia.org/thing/Moby-Dick&gt; 
  dct:creator &lt;http://wikipedia.org/thing/Herman_Melville&gt; ;
  .
</code></pre>

<p>from the authors of the Wikipedia page about that book:</p>

<pre><code>&lt;http://en.wikipedia.org/wiki/Moby-Dick&gt;
  dct:creator 
    &lt;http://wikipedia.org/user/Aristophanes68&gt; ,
    &lt;http://wikipedia.org/user/SporkBot&gt; ,
    &lt;http://wikipedia.org/user/Curb_Chain&gt; ,
    ...
  .
</code></pre>

<p>If we only have the URI <code>http://en.wikipedia.org/wiki/Moby-Dick</code> then we run into difficulties interpreting statements made about that URI, and indeed different people might use the URI in different ways, or make some statements that use the URI to mean the novel and some to mean the Wikipedia page.</p>

<p>So there are good reasons to have two separate URIs in these cases.</p>

<p>But the fact is that many publishers currently have a one-URI-fits-all policy. And even if they don&#8217;t, people reusing those URIs will often make mistakes and use the wrong one. It would be nice if we could make the world see that this leads to all sorts of logical problems for the Semantic Web, but I just can&#8217;t see that happening.</p>

<p>This situation reminds me of one of the central innovations that the web had over previous hypertext systems. There is a <a href="http://www.w3.org/2006/09dc-aus/swpf#(7)">great slide</a> by <a href="http://en.wikipedia.org/wiki/Dan_Connolly">Dan Connolly</a> which roughly looks like:</p>

<blockquote>
  <table border="1">
    <tr>
      <th></th>
      <th>Web</th>
      <th>Semantic Web</th>
    </tr>
    <tr>
      <th>Traditional Design</th>
      <td style="text-align: center">hypertext</td>
      <td style="text-align: center">logic/database</td>
    </tr>
    <tr>
      <th>+</th>
      <td colspan="2" style="text-align: center">URIs</td>
    </tr>
    <tr>
      <th>-</th>
      <td style="text-align: center">link integrity</td>
      <td style="text-align: center">?</td>
    </tr>
    <tr>
      <th>=</th>
      <td colspan="2" style="text-align: center">viral growth</td>
    </tr>
  </table>
  <p>Are there parts of traditional logic and databases that, if we set them aside, will result in viral growth of the Semantic Web?</p>
</blockquote>

<p>(By the way, in case my replication of this slide is interpreted incorrectly: I&#8217;m certainly not implying that viral growth of the Semantic Web is an end in itself, though I would like to see viral growth in data sharing.)</p>

<p>Dropping the requirement for link integrity, coping with the fact that sometimes links would break, was what made the web work. It would have been simply impossible to build the web as a decentralised system if there had been a requirement for links to always work.</p>

<p>Of course that doesn&#8217;t mean that we <em>like it</em> when links get broken. There&#8217;s oodles of best practice advice out there on making sure that you retain support for old URIs if you change your web space; we have backup systems in place in the form of web archives so we can work out what was once at the end of a particular URI; and the resolvability of links is something a linter will check about your website.</p>

<p>So it&#8217;s not that when he developed the web TimBL rejected entirely the very concept of link integrity, it&#8217;s that he recognised that we have to work with the imperfection of the real world. Links break. HTTP copes. Browsers cope. People cope.</p>

<p>The imperfection of the real world as it applies to linked data is that <a href="http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html">URIs will be used in ambiguous ways</a>. We might not like it; we might write best practice documents that encourage people to have separate URIs for web-thing and non-web-thing, develop tools that help people detect when they&#8217;ve used the wrong URI, and so on. But it will still happen, and in my opinion we need to work out how to cope.</p>

<p>In fact, ambiguity in URIs goes much further than just a confusion between the Wikipedia page about Moby Dick and the novel Moby Dick itself. URIs are names, and names are used by different people to mean different things. The same URI might end up meaning:</p>

<ul>
<li>the Wikipedia page about Moby Dick</li>
<li>the novel Moby Dick</li>
<li>the whale Moby Dick</li>
<li>the story Moby Dick (originally a novel but later adapted as a film)</li>
<li><em>and so on</em></li>
</ul>

<p>Even if the publisher provides a clear and unambiguous definition about what the URI <code>http://en.wikipedia.org/wiki/Moby-Dick</code> means, other people will use it to mean something different because it&#8217;s close enough for what they want to say.</p>

<p>So I think the answer to the first question I posed &#8212; &#8220;Must publishers provide separate URIs for things-on-the-web and the non-web-things that they describe?&#8221; &#8212; has to be &#8220;No, though it is good practice to.&#8221; We can fight against ambiguity, but we have to accept that we cannot win.</p>

<h2>Disambiguating Statements</h2>

<p>As discussed above, in a perfect world, we would have separate URIs for things-on-the-web and non-web-things and any data that we published about Moby Dick would use the URI for the Wikipedia page to talk about things like the licence for that information, or how the information was created (its provenance), and the URI for the novel to talk about things like the licence for the novel and what characters appeared in it.</p>

<p>But the world is not perfect, and we are going to end up with situations where the same URI is used to refer to a whole range of different things. How do we cope?</p>

<p>Well, first let me say that I don&#8217;t see people merging data together willy-nilly and hoping to get something useful out of it. URIs give us connection points and RDF gives us a flexible data model, which means that merging data can be easier than the kinds of custom merging that you have to do with CSV and JSON, but I don&#8217;t think it can ever remove entirely the requirement for curation. We want to ensure that the need for intervention in merging two datasets is kept to a minimum, but we can&#8217;t expect it to be entirely removed.</p>

<p>So with that in mind, there are at least three techniques that can be used to get useful data out of a world in which the same URI is used to mean different things.</p>

<h3>One-Step-Removed Properties</h3>

<p>The first technique is to interpret particular properties as describing a one-or-more-step-removed relationship between a resource and a value. For example, the <code>bib:author</code> and <code>dct:creator</code> properties would be defined such that the RDF statements</p>

<pre><code>&lt;http://en.wikipedia.org/wiki/Moby-Dick&gt;
  bib:author &lt;http://en.wikipedia.org/wiki/Herman_Melville&gt; ;
  dct:creator &lt;http://en.wikipedia.org/wiki/User:Aristophanes68&gt; ;
  .
</code></pre>

<p>would be interpreted as saying</p>

<blockquote>
  <p>The <strong>topic of the page</strong> <code>http://en.wikipedia.org/wiki/Moby-Dick</code> was authored by the <strong>topic of the page</strong> <code>http://en.wikipedia.org/wiki/Herman_Melville</code>. The creator of the page <code>http://en.wikipedia.org/wiki/Moby-Dick</code> is the <strong>topic of the page</strong> <code>http://en.wikipedia.org/wiki/User:Aristophanes68</code>.</p>
</blockquote>

<p>The biggest problem with the global application of this approach is that there are a lot of existing properties defined in vocabularies such as FOAF or Dublin Core that aren&#8217;t defined as one-step-removed properties. One publisher might use <code>dct:creator</code> to link to &#8220;a page describing the creator of this page&#8221; and another might use it to point directly to a (non-web-thing) URI for the creator of the page. So practically, this approach requires the interpretation of properties to be done on a dataset-by-dataset basis. Which leads onto the next approach.</p>

<h3>Named Graphs</h3>

<p>A second technique would be to make the assumption that within a single dataset, a single URI has a single meaning, but that the meaning may differ between datasets. I suspect that this is true even when publishers attempt to take care about which URI they use, because, like names, the meaning of a URI is slightly different depending on its use.</p>

<p>Re-users of data need to work out whether the way URIs are used in one dataset is close enough to the way they are used in another dataset, to ascertain whether it&#8217;s appropriate to simply merge the datasets or whether something slightly more complicated needs to be done to bring the datasets together.</p>

<p>The problem with this approach is that it raises the barrier to joining together graphs: you can&#8217;t just bung the data into a triplestore and perform queries on it, you have to work out some kind of mapping between the datasets up front.</p>
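
<p>To make that up-front mapping concrete, here is a minimal Python sketch. The dataset names, the <code>merge</code> function and the shape of the mapping are all assumptions made for illustration, and triples are plain tuples rather than anything from a real RDF library. One dataset uses the Wikipedia URI to mean the novel, so it is mapped onto the (made-up) thing URI before merging; the other uses it to mean the page and is left alone.</p>

```python
PAGE = "http://en.wikipedia.org/wiki/Moby-Dick"
NOVEL = "http://wikipedia.org/thing/Moby-Dick"  # made-up URI, as earlier in the post

# Two hypothetical datasets that use the same URI with different meanings.
datasets = {
    "library": [
        (PAGE, "bib:author", "http://en.wikipedia.org/wiki/Herman_Melville"),
    ],
    "wiki-history": [
        (PAGE, "dct:creator", "http://en.wikipedia.org/wiki/User:Aristophanes68"),
    ],
}

# The curation step: per-dataset decisions about what each URI "really" means.
mappings = {
    "library": {PAGE: NOVEL},
}


def merge(datasets, mappings):
    """Merge datasets, rewriting subjects and objects per dataset first."""
    merged = set()
    for name, triples in datasets.items():
        mapping = mappings.get(name, {})
        for s, p, o in triples:
            merged.add((mapping.get(s, s), p, mapping.get(o, o)))
    return merged


for triple in sorted(merge(datasets, mappings)):
    print(triple)
```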

<h3>Duck Typing</h3>

<p>The final technique that I&#8217;ll talk about here is to say that different applications need to access different properties, and can ignore any properties that don&#8217;t fit with how they want to use the data. It is relatively rarely useful to have generic RDF viewers; people (generally) build applications to answer questions and perform tasks, not to just browse around data.</p>

<p>For example, if a single dataset were to contain:</p>

<pre><code>&lt;http://en.wikipedia.org/wiki/Moby-Dick&gt;
  a bib:Book ;
  bib:author &lt;http://en.wikipedia.org/wiki/Herman_Melville&gt; ;
  a foaf:Document ;
  dct:creator &lt;http://en.wikipedia.org/wiki/User:Aristophanes68&gt; ;
  .
</code></pre>

<p>then an application that was interested in gathering data about books would only care about the fact that <code>http://en.wikipedia.org/wiki/Moby-Dick</code> was a book with an author of <code>http://en.wikipedia.org/wiki/Herman_Melville</code> and wouldn&#8217;t care about the FOAF or Dublin Core classes or properties associated with the URI. An application that was interested in gathering information about the authorship of documents on the web, on the other hand, might look for the <code>foaf:Document</code> class and Dublin Core properties and ignore everything else.</p>
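
<p>A minimal Python sketch of this kind of filtering follows. It treats triples as plain tuples rather than using an RDF library. The <code>bib</code> namespace URI is a placeholder I have invented, since the example above doesn&#8217;t say what <code>bib:</code> expands to, and the <code>view</code> function is likewise hypothetical.</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
BIB = "http://example.org/bib#"  # placeholder namespace, not a real vocabulary

MOBY = "http://en.wikipedia.org/wiki/Moby-Dick"

TRIPLES = [
    (MOBY, RDF_TYPE, BIB + "Book"),
    (MOBY, BIB + "author", "http://en.wikipedia.org/wiki/Herman_Melville"),
    (MOBY, RDF_TYPE, FOAF + "Document"),
    (MOBY, DCT + "creator", "http://en.wikipedia.org/wiki/User:Aristophanes68"),
]


def view(triples, namespaces):
    """Keep only the triples expressed in the vocabularies an application cares about.

    For rdf:type triples the class decides; otherwise the property does.
    """
    kept = []
    for s, p, o in triples:
        term = o if p == RDF_TYPE else p
        if any(term.startswith(ns) for ns in namespaces):
            kept.append((s, p, o))
    return kept


book_view = view(TRIPLES, [BIB])            # what a books application sees
document_view = view(TRIPLES, [FOAF, DCT])  # what a web-authorship application sees
print(book_view)
print(document_view)
```

<p>Each application gets a consistent picture of the URI without either one having to resolve what the URI &#8220;really&#8221; means.</p>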

<p>To me, this approach seems the most promising way of retaining the core benefits of RDF. It seems more robust in the face of user error than the idea of defining one-step-removed properties, and retains the ease of mashing together data from different sources in a way that you wouldn&#8217;t get if you had to think about the URI usage within each of the datasets that you want to bring together.</p>

<h2>Locating Data From URIs</h2>

<p>And so we get to the final question: how should people be able to get from a URI to information about whatever the URI refers to?</p>

<p>I&#8217;ve discussed above how I think distinguishing between things-on-the-web and non-web-things has to be seen as a best practice. I think we should continue to recommend the <code>303</code> or hash URI methods as the best practice for accessing data from a URI. My reason for this is that introducing yet another method will just make it harder for publishers to know which method to use when, plus I don&#8217;t want to see people who have adopted these techniques in good faith being told that they were doing the wrong thing all along. What I&#8217;d like to aim to do is to find a way of fitting these methods into a larger approach.</p>

<p>I also recognise the argument that articulating the relationships between on-the-web and not-on-the-web resources purely through HTTP responses isn&#8217;t ideal. It&#8217;s useful to have explicit links between resources within the data itself. Within the linked data work that I&#8217;ve done for <code>data.gov.uk</code> I&#8217;ve tried to adopt a pattern of explicitly using <code>foaf:primaryTopic</code>, <code>foaf:primaryTopicOf</code> and <code>foaf:page</code> to link together the different resources. Other people have suggested the <a href="http://www.w3.org/2007/05/powder-s#describedby"><code>wdrs:describedby</code></a> property for pointers from a resource to information about that resource; <code>rdfs:isDefinedBy</code> performs a similar function for classes and properties within RDFS.</p>

<p>It would be nice to have one defined property or set of properties to describe these relationships, but we have to recognise that not everyone will use them, so the approach we take has to work when these links aren&#8217;t present. The majority of people and sites are going to start off by publishing data about something at a single URI, and simply return data about that thing (a <code>200</code> response) when the URI is requested. If they then progress to wanting to have separate URIs for that thing and the page about the thing, or indeed to disambiguate the URI that they&#8217;ve used in some other way, we need to make it easy for them to do so.</p>

<p>I think we need two properties: <code>eg:describedBy</code> and <code>eg:couldBe</code>. <code>eg:describedBy</code> describes the link between a resource (of any type) and a document that describes it; <code>eg:couldBe</code> is a disambiguation link that points from a URI to other possible, more precise, URIs.</p>

<p>Then I think we need some rules along the lines of (I don&#8217;t pretend these are entirely worked out):</p>

<ul>
<li>if you get a <code>303</code> response redirecting to <code>U'</code> when you fetch a URI <code>U</code> then behave as if the response from <code>U'</code> included the triple <code>U eg:describedBy U'</code></li>
<li>if the URI <code>U</code> is a hash URI whose base URI is <code>U'</code> then behave as if the response from <code>U'</code> included the triple <code>U eg:describedBy U'</code></li>
<li>if you get a <code>2XX</code> response in response to a URI <code>U</code> then:
<ul><li>if there are multiple triples that match the pattern <code>U eg:describedBy ?page</code> then assume that the document you have is <code>U'</code> where <code>U'</code> <code>eg:couldBe</code> any of the <code>?page</code>s</li>
<li>otherwise, if there is a single triple that matches the pattern <code>U eg:describedBy ?page</code> then assume that the document that you have is <code>?page</code> and it is about <code>U</code> (along with other things, possibly); statements about <code>?page</code> might include information about the licence or provenance of the returned document</li>
<li>if there are any triples that match the pattern <code>?thing eg:describedBy U</code> then assume that the document you have is <code>U</code> and it is about (possibly multiple) <code>?thing</code>s</li>
<li>otherwise, behave as if there is a triple <code>U eg:describedBy U</code>; in this case, <code>U</code> is being used in an ambiguous way</li></ul></li>
</ul>
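
<p>To show how these rules might hang together, here is a rough Python sketch. Everything in it is an assumption layered on top of rules that I&#8217;ve already said aren&#8217;t fully worked out: the function names, the dictionary that is returned, and the decision to apply the <code>2XX</code> sub-rules in the order they are listed, stopping at the first that matches. Triples are plain tuples and <code>eg:describedBy</code> is just a string.</p>

```python
DESCRIBED_BY = "eg:describedBy"


def implied_triple(uri, status=None, location=None):
    """Triple implied by a 303 redirect or by a hash URI, else None."""
    if status == 303 and location:
        return (uri, DESCRIBED_BY, location)
    base, _, fragment = uri.partition("#")
    if fragment:
        return (uri, DESCRIBED_BY, base)
    return None


def interpret_2xx(uri, triples):
    """Apply the 2XX sub-rules, in listed order, to the triples in the response."""
    pages = [o for s, p, o in triples if s == uri and p == DESCRIBED_BY]
    things = [s for s, p, o in triples if o == uri and p == DESCRIBED_BY]
    if len(pages) > 1:
        # The document is some unnamed U' that could be any of the pages.
        # (Recording uri under "about" is my reading; the rule only names the pages.)
        return {"document": None, "could_be": pages, "about": [uri], "ambiguous": False}
    if len(pages) == 1:
        return {"document": pages[0], "could_be": [], "about": [uri], "ambiguous": False}
    if things:
        return {"document": uri, "could_be": [], "about": things, "ambiguous": False}
    # No links at all: behave as if uri eg:describedBy uri.
    return {"document": uri, "could_be": [], "about": [uri], "ambiguous": True}


U = "http://en.wikipedia.org/wiki/Moby-Dick"
print(implied_triple("http://wikipedia.org/thing/Moby-Dick", 303, U))
print(implied_triple(U + "#thing"))
print(interpret_2xx(U, []))
```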

<p>We could go further and say:</p>

<ul>
<li>if there are two triples that match the pattern <code>U eg:couldBe ?page . ?thing eg:describedBy ?page</code> then assume that the document you have is <code>?page</code> and it is about <code>?thing</code></li>
<li>if there are two triples that match the pattern <code>U eg:couldBe ?thing . ?thing eg:describedBy ?page</code> then assume that the document you have is <code>?page</code> and it is about <code>?thing</code></li>
</ul>

<p>This way, if someone starts off using <code>U</code> in an ambiguous way, or to mean only the page or only the thing, they can later add <code>eg:describedBy</code> and <code>eg:couldBe</code> statements to disambiguate and add information about the page or thing the page describes.</p>

<p>It&#8217;s worth bearing in mind that we shouldn&#8217;t just be concerned about locating information about things that aren&#8217;t on the web, but about things that <em>are</em> on the web but that cannot have metadata embedded within them. For example, how do we discover the licence associated with a particular image? Although there are methods of embedding metadata within image and other binary formats, such as <a href="http://en.wikipedia.org/wiki/Extensible_Metadata_Platform">XMP</a>, it&#8217;s still useful to be able to locate metadata about images based on their URI.</p>

<p>With a scheme such as that described above, publishers that used content negotiation to return some data about the image in another format could use <code>eg:describedBy</code> to indicate that the returned document is about the image (or set of images in different formats).</p>

<h2>Summary</h2>

<p>The summary of my thinking is:</p>

<ul>
<li>we should learn to cope with ambiguity in URIs</li>
<li>we should not constrain how applications manage that ambiguity, though duck typing seems the most promising approach to me</li>
<li>we should define some specific properties that can be used to disambiguate URIs, describe their defaults with <code>303</code>s and hash URIs and provide an easy upgrade path as publishers choose to add more specificity</li>
</ul>

<p>The key will be how we find practical ways to cope with the real, imperfect, fuzzy web of data while providing an evolutionary path to greater clarity and specificity that publishers can take when they see the benefit of doing so.</p>
    ]]></content>
  </entry>
  <entry>
    <title>TAG F2F, June 2011</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/158" />
    <id>http://www.jenitennison.com/blog/node/158</id>
    <published>2011-06-17T10:44:12+00:00</published>
    <updated>2011-06-17T10:44:12+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="html5" />
    <category term="linked data" />
    <category term="microdata" />
    <category term="rdfa" />
    <category term="tag" />
    <category term="uris" />
    <category term="web" />
    <summary type="html"><![CDATA[<p>As you may know, I accepted an appointment to the <a href="http://www.w3.org/2001/tag/">W3C&#8217;s Technical Architecture Group</a> earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the <a href="http://en.wikipedia.org/wiki/Ray_and_Maria_Stata_Center">Stata Center</a> at MIT. As you can tell from the <a href="http://www.w3.org/2001/tag/2011/06/06-agenda">agenda</a> (which was in fact revised as we went along) it was a packed three days.</p>

<p>What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. In no way should any of what I write here be judged as revealing the official opinion of the TAG, it&#8217;s just me saying what I think, and I&#8217;m not going to go into anything in depth because they&#8217;re all incredibly gnarly and contentious topics and I&#8217;d not only be here all year but also end up in a tar pit.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>As you may know, I accepted an appointment to the <a href="http://www.w3.org/2001/tag/">W3C&#8217;s Technical Architecture Group</a> earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the <a href="http://en.wikipedia.org/wiki/Ray_and_Maria_Stata_Center">Stata Center</a> at MIT. As you can tell from the <a href="http://www.w3.org/2001/tag/2011/06/06-agenda">agenda</a> (which was in fact revised as we went along) it was a packed three days.</p>

<p>What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. In no way should any of what I write here be judged as revealing the official opinion of the TAG, it&#8217;s just me saying what I think, and I&#8217;m not going to go into anything in depth because they&#8217;re all incredibly gnarly and contentious topics and I&#8217;d not only be here all year but also end up in a tar pit.</p>

<!--break-->

<h2>Role of the TAG</h2>

<p>Usefully for me as a newcomer, our first session was about the ongoing role of the TAG. The TAG occupies a unique position within the W3C. According to its <a href="http://www.w3.org/2004/10/27-tag-charter.html">charter</a> it was set up</p>

<blockquote>
  <p>To improve the effectiveness of Working Groups, to reduce misunderstandings and overlapping work, and to improve the consistency of Web technologies developed inside and outside W3C</p>
</blockquote>

<p>The TAG ultimately has three routes to do this:</p>

<ol>
<li>by providing specific advice on issues that are brought to its attention</li>
<li>by writing documents on basic web architecture principles that go through community review, particularly the general review of the W3C standards track, and become Recommendations</li>
<li>by advising the W3C Director (Tim Berners-Lee) about what he should do on the extremely rare occasions when there are issues that he is supposed to adjudicate on</li>
</ol>

<p>In none of these cases is there anything that binds the people receiving the advice of the TAG, or reading Findings or Recommendations made by the TAG, to accept them or do anything about them. The power and authority of the TAG depends solely on the quality and utility of its arguments, which is how it should be in my opinion.</p>

<h2>Client-Side Application State</h2>

<p>The first technical session was about <a href="http://www.w3.org/2001/tag/2011/06/06-agenda#clientState">client-side application state</a> and was a review of the <a href="http://www.w3.org/2001/tag/doc/IdentifyingApplicationState-20110515.html">Identifying Application State draft</a> that <a href="http://en.wikipedia.org/wiki/T._V._Raman">T.V. Raman</a> began before he left the TAG and that <a href="http://www.linkedin.com/pub/ashok-malhotra/4/675/6a2">Ashok Malhotra</a> has been working on since. This should in the next few months or so be published as a TAG Finding (though it is currently on the Recommendation track).</p>

<p>This work is essentially about documenting the different ways in which you can identify application state within a URI, why that&#8217;s a useful thing to do, and some of the pitfalls of using <a href="http://www.jenitennison.com/blog/node/154">hash URIs</a> to do so. Most of the discussion was about details to do with wording within the document. One thing I thought particularly interesting was the point that URI-based application state is relevant in all &#8216;active content&#8217;, not just in HTML; for example, scripting in SVG or in PDFs brings the same considerations.</p>
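
<p>As a rough illustration (my own sketch, not something taken from the draft), here is how a client might pull application state out of the fragment of a URI. The <code>#!key=value</code> layout and the example URI are assumptions made purely for the example. The fragment is never sent to the server in an HTTP request, so it is the client that has to interpret it.</p>

```python
from urllib.parse import urldefrag, parse_qs


def split_state(uri):
    """Return (resource_uri, state), where state is parsed from a '#!' fragment.

    The '#!key=value&...' layout is a hypothetical convention for this sketch.
    """
    resource, fragment = urldefrag(uri)
    if not fragment.startswith("!"):
        return resource, {}
    # parse_qs returns a list of values per key; keep the first value for each
    state = {key: values[0] for key, values in parse_qs(fragment[1:]).items()}
    return resource, state


resource, state = split_state("http://example.org/map#!lat=51.5&zoom=12")
print(resource)  # the part an HTTP request would actually use
print(state)     # the part only the client-side application sees
```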

<h2>Buffer Bloat</h2>

<p>Over lunch on Monday we listened to and discussed a presentation by <a href="http://en.wikipedia.org/wiki/Jim_Gettys">Jim Gettys</a> on <a href="http://www.w3.org/2001/tag/2011/06/06-agenda#bufferbloat">buffer bloat</a>. Basically (and all the errors here are introduced by me), TCP/IP is designed to route around network blockages, but it can only do so if it detects them quickly. When you have big buffers in place, as in the case of all modern operating systems and hardware, blockages aren&#8217;t detected quickly; they&#8217;re only detected when the buffers fill up. Then buffers empty and the data has to be sent again. The net result is that connections get really slow, not just for upload or download but for both, not just for you but for everyone using the network.</p>
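
<p>To get a feel for the scale of the problem, here is a back-of-the-envelope sketch of how long a packet waits behind a full buffer. The buffer size and link speed are illustrative figures that I have picked for the example, not numbers from Jim&#8217;s presentation, and the model assumes a simple first-in-first-out queue.</p>

```python
def queue_delay_seconds(buffer_bytes, link_bits_per_second):
    """Time for a full FIFO buffer to drain onto a link of the given speed."""
    return buffer_bytes * 8 / link_bits_per_second


# Illustrative figures only: a 256 KiB buffer in front of a 1 Mbit/s uplink
delay = queue_delay_seconds(256 * 1024, 1_000_000)
print(f"{delay:.2f} s of added latency while the buffer is full")
```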

<p>Jim talked about how this is exacerbated by the large amount of web traffic and the design of HTTP, particularly the lack of use of HTTP pipelining (whereby several HTTP requests and responses are sent over one long-term connection), because it leads to lots of small messages which can&#8217;t be handled effectively. There&#8217;s lots more about this <a href="http://gettys.wordpress.com/">on his blog</a>.</p>

<p>Jim also talked about the failure of certificate authorities and how we should be looking at distributed protocols using digitally signed data, pointing us in particular to <a href="http://www.ccnx.org/">CCNx</a>.</p>

<h2>Fragment ID Semantics</h2>

<p>First thing Tuesday was a session that I led on <a href="http://www.w3.org/2001/tag/2011/06/06-agenda.html#mimefrag">fragids</a>, in particular the problems that are arising out of the mime type registration of +xml types (<a href="http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-04.html#frag">3023bis</a>) clashing with those that are used for, say, <a href="http://www.w3.org/TR/2011/WD-media-frags-20110317/">images</a>, and what happens when these come together in something like SVG.</p>

<p>The same issues arise whenever you have documents with types that &#8216;inherit&#8217; fragid semantics from two directions. For example, XHTML documents are XML documents, so constraints on +xml mean you shouldn&#8217;t use interpreted fragids (eg hash-bangs) on them, but they are also &#8216;active content&#8217; which makes interpreted fragids useful. Similarly, in linked data you shouldn&#8217;t really use a hash URI to mean a Person with a primary resource that provides as a response an XML document with embedded RDFa, because according to XML fragid semantics, such a URI should point to an XML element.</p>
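
<p>The clash can be seen in a small sketch. Under XML fragid semantics a bare fragment such as <code>#me</code> is expected to identify the element with that ID, so resolving it gives you a <code>div</code> rather than a person. The sample document and URI are made up, and for simplicity the sketch only looks at plain <code>id</code> attributes, which is an assumption on my part rather than the full rule in the specs.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical XHTML+RDFa document served at http://example.org/alice
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="me" about="#me" typeof="foaf:Person">
      <span property="foaf:name">Alice</span>
    </div>
  </body>
</html>"""


def resolve_fragment(uri, document):
    """Return the element whose id attribute matches the URI's fragment, if any."""
    _, fragment = urldefrag(uri)
    root = ET.fromstring(document)
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


element = resolve_fragment("http://example.org/alice#me", XHTML)
print(element.tag)  # an XML element, whereas the RDFa says #me is a foaf:Person
```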

<p>Basically the use of fragids has grown markedly outside their original scope and these situations aren&#8217;t really covered in the specs. I am now tasked to create a document that describes the issues and suggests ways forward. So that will be fun.</p>

<h2>Telcon with IAB</h2>

<p>The second session on Tuesday was a telcon with the <a href="http://www.iab.org/">IAB</a> which has a similar role within the <a href="http://www.ietf.org/">IETF</a> as the TAG does within the W3C. This was a bit of a &#8216;getting to know you&#8217; session, covering the work of the two groups on:</p>

<ul>
<li>versioning and extensibility</li>
<li>security</li>
<li>privacy, including Do Not Track</li>
</ul>

<p>and talking about opportunities to meet and work together on various topics like these.</p>

<h2>URI Definition Discovery and Metadata Architecture</h2>

<p>The <a href="http://www.w3.org/2001/tag/2011/06/06-agenda#metadata">afternoon session on Tuesday</a> was spent on <a href="http://mumble.net/~jar/">Jonathan Rees&#8217;s</a> work on the <a href="http://www.w3.org/wiki/AwwswHome">Architecture of the World Wide Semantic Web</a>, which covers, amongst other things, what people in semantic web circles call <a href="http://www.w3.org/wiki/HttpRange14Webography">httpRange-14</a>. At core, this is about the kinds of URIs we can use to refer to real-world things, what the response to HTTP requests on those URIs should be, and how we find out information about these resources.</p>

<p>Jonathan has put together a document called <a href="http://www.w3.org/2001/tag/awwsw/issue57/20110531/">Providing and discovering definitions of URIs</a> which covers the various ways that have been suggested over time, including the 303 method that was <a href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039">recommended by the TAG in 2005</a> and methods that have been suggested by various people since that time.</p>
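
<p>For anyone who has not met it, the 303 method can be sketched roughly as follows: a request for the URI of a real-world thing is answered with a <code>303 See Other</code> redirect to a document that describes it, while a request for the document itself gets a normal <code>200</code>. The <code>/id/</code> and <code>/doc/</code> path convention and the <code>respond</code> function are my own assumptions for the example, not something the TAG advice prescribes.</p>

```python
def respond(path):
    """Return (status, headers) for a request path.

    Hypothetical convention: /id/... names a real-world thing and
    /doc/... names the document that describes it.
    """
    if path.startswith("/id/"):
        # The thing itself cannot be sent over HTTP, so point at a description
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}


print(respond("/id/alice"))   # redirect: two round trips to reach the data
print(respond("/doc/alice"))  # the describing document
```

<p>The extra round trip, and the need to configure a server to behave this way, are among the practical costs that make the method awkward to deploy.</p>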

<p>It&#8217;s clear that the 303 method has lots of practical shortcomings for people deploying linked data, and isn&#8217;t the way in which URIs are commonly used by Facebook and schema.org, who don&#8217;t currently care about using separate URIs for documents and the things those documents are about. We discussed these alongside concerns that we continue to support people who want to do things like describe the license or provenance of a document (as well as the facts that it contains) and don&#8217;t introduce anything that is incompatible with the ways in which people who have been following recommended practice are publishing their linked data. The general mood was that we need to support some kind of &#8216;punning&#8217;, whereby a single URI could be used to refer to both a document and a real-world thing, with different properties being assigned to different &#8216;views&#8217; of that resource.</p>

<p>Jonathan is going to continue to work on the draft, incorporating some other possible approaches. It&#8217;s a <a href="http://lists.w3.org/Archives/Public/public-lod/2011Jun/0186.html">very contentious topic within the linked data community</a>. My opinion is that while we need to provide some &#8216;good practice&#8217; guides for linked data publishers, we can&#8217;t just stick to a theoretical ideal that experience has shown not to be practical. What I&#8217;d hope is that the TAG can help to pull together the various arguments for and against different options, and document whatever approach the wider community supports.</p>

<h2>Can publication of hyperlinks cause copyright infringement?</h2>

<p>The <a href="http://www.w3.org/2001/tag/2011/06/06-agenda.html#linkcopyright">first session on Wednesday</a> was another session that I led, discussing the <a href="http://www.w3.org/2001/tag/doc/publishingAndLinkingOnTheWeb-2011-05-28">Publishing and Linking on the Web draft</a> that <a href="http://torgo.com/blog/">Dan Appelquist</a> and I have been working on.</p>

<p>The aim of this document is to explain the tensions between terms that are commonly used in legal documents such as &#8220;possession&#8221;, &#8220;adaptation&#8221; and &#8220;distribution&#8221; and the way that publication works on the web, in which multiple servers may have copies of the same document (because they cache copies to make the &#8216;net go faster), automated agents may make changes to those documents (such as compressing or resizing documents, or merging Javascript) and people may refer others to those documents through linking.</p>

<p>We&#8217;re particularly keen to argue that linking to something is not the same thing as distributing it. The web&#8217;s power arises through its links, so it&#8217;s important that people are able to link to something without being worried about what happens when/if the domain they link to is taken over by something illegal.</p>

<p>Dan and I are going to continue to work on this document in response to various suggestions around organisation and terminology, with a view to getting some &#8216;friendly legal experts&#8217; to look it over and then seeking wider review. The intention is for it to eventually become a Recommendation as this will give greater weight to it as a document for a legal audience.</p>

<h2>API Minimisation and Client-Side Storage</h2>

<p>There were then a couple of short sessions.</p>

<p>Dan talked about <a href="http://www.w3.org/2001/tag/2011/06/06-agenda.html#apis">API Minimisation</a>, which is the design principle that to increase privacy we should design APIs that enable people requesting information to say exactly what information they need, and return only that rather than everything known about a thing. Dan&#8217;s put together a <a href="http://www.w3.org/2001/tag/doc/APIMinimization-20100605.html">draft</a> and should be calling for review of it soon.</p>
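
<p>A minimal sketch of the principle, with a made-up record and a hypothetical <code>get_fields</code> function rather than anything from Dan&#8217;s draft: the caller names the fields it needs and gets back only those.</p>

```python
# Made-up record standing in for everything a service knows about someone
PROFILE = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "postcode": "AB1 2CD",
    "date_of_birth": "1980-01-01",
}


def get_fields(record, requested):
    """Return only the requested fields that exist in the record."""
    return {field: record[field] for field in requested if field in record}


# A delivery estimate only needs the postcode, so that is all it asks for
minimal = get_fields(PROFILE, ["postcode"])
print(minimal)
```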

<p>Ashok then led discussion on <a href="http://www.w3.org/2001/tag/2011/06/06-agenda.html#webAppStorage">client-side storage</a> and we brainstormed around some of the architectural/design issues about which we might want to write if we were to put together a document. This work is at a very early stage.</p>

<h2>TAG Priorities</h2>

<p>After lunch, we had a <a href="http://www.w3.org/2001/tag/2011/06/06-agenda#priorities">session on TAG priorities</a> where we discussed which of the various pieces of work that we&#8217;re doing should receive the most attention and had a quick review of who is doing what within the TAG.</p>

<p>Our basic problem is that a lot of this stuff feels quite urgent, and we want to be responsive, but with only 5-6 of us &#8220;actively involved&#8221; (which means 1 day/week) in drafting documents, and other TAG duties taking up our time, it feels like we have taken on too much work. Our focus for the next little while is going to be on responding to issues where our lack of response might either hold people up or cause longer term problems (for example the publication of contradictory mime type definitions), which means things like the document on publishing and linking on the web will need to bubble in the background rather than being the focus of activity.</p>

<h2>HTML5 Last Call</h2>

<p>Our <a href="http://www.w3.org/2001/tag/2011/06/06-agenda.html#htmlreview">final session</a>, for which we were joined by <a href="http://www.w3.org/People/LeHegaret/">Philippe Le Hégaret</a>, was on the HTML5 Last Call documents. The TAG has raised various issues over the course of HTML5 development and wants to follow up on how those issues have been addressed in the documents. Our role means that we&#8217;re responsible for making sure there&#8217;s consistency with other specifications, and that there isn&#8217;t anything that seems like it&#8217;s going to cause problems in the long term.</p>

<p>The part that we spent most discussion time on was the relationship between <a href="http://www.w3.org/TR/2011/WD-microdata-20110525/">Microdata</a> and <a href="http://www.w3.org/TR/2011/WD-rdfa-in-html-20110525/">RDFa</a>. We talked about the precedents for having two specifications that do very similar things but with different approaches, such as CSS and XSL, and how this isn&#8217;t necessarily a bad thing so long as they don&#8217;t contradict each other and people can move between them easily (because they have the same conceptual foundations).</p>

<p>I&#8217;m going to save my opinion on this topic for another post. Suffice it to say that microdata and RDFa as currently specified don&#8217;t work well with each other but it&#8217;s not at all clear what the best path forward is. The TAG decided to recommend that the W3C set up a Task Force to look at what the best way forward might be.</p>

<h2>Final Words</h2>

<p>If you want links to the minutes of the TAG F2F, they&#8217;re available within the agenda or on separate pages for:</p>

<ul>
<li><a href="http://www.w3.org/2001/tag/2011/06/06-minutes">Monday 6th June</a></li>
<li><a href="http://www.w3.org/2001/tag/2011/06/07-minutes">Tuesday 7th June</a></li>
<li><a href="http://www.w3.org/2001/tag/2011/06/08-minutes">Wednesday 8th June</a></li>
</ul>

<p>If you have anything to say on any of these topics, please send email to the <a href="mailto:www-tag@w3.org">TAG mailing list</a>. Or you could comment here or <a href="mailto:jeni@jenitennison.com">email me directly</a> if you like. Which leads me on to talking about what I&#8217;d like to do in the TAG.</p>

<p>One of the guidance notes for new members to the TAG says:</p>

<blockquote>
  <p>TAG members are elected or appointed not to represent their individual member organizations, but the Web community as a whole. We try to take that responsibility very seriously.</p>
</blockquote>

<p>I do take that responsibility seriously. Web architecture has to be a combination of practice and theory, balancing approaches that work right now with a desire to not break anything long term. I do practical work developing web applications with HTML, CSS, Javascript, XML, RDF, XSLT, XQuery and so on and so on every day, but I know I don&#8217;t see all the difficult corners of the open web standard space: no one person can.</p>

<p>I can listen though, so that&#8217;s what I will try to do: listen, digest, reflect and act.</p>

<p>But I have limited resources. Unlike most of the members of the TAG, I am not employed by a large organisation that pays me for time I take on the work that I do for the TAG. The W3C kindly paid for my flights to and from F2Fs, but not hotels or expenses. I wouldn&#8217;t have taken this on if I wasn&#8217;t prepared to shoulder the financial burden, but if there is anyone out there who might sponsor my participation, I&#8217;d love to hear from you.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Schema.org and the Responsibility of Monopoly</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/157" />
    <id>http://www.jenitennison.com/blog/node/157</id>
    <published>2011-06-12T19:23:27+00:00</published>
    <updated>2011-06-30T16:33:53+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="google" />
    <category term="schema.org" />
    <summary type="html"><![CDATA[<p><em>Update: This post has been translated to Italian on the <a href="http://www.linkedopendata.it/schema-org-e-le-responsabilita-dei-monopolisti">Linked Open Data Italia</a> blog.</em></p>

<p>In this post about <a href="http://schema.org">schema.org</a> I&#8217;m going to speculate about the economic drivers that affect how search engines use structured metadata on the web. I discuss how the technical features and choices within schema.org may cause wider long-term harm, and the role of open standards as a method for responsible companies to avoid the pitfalls of monopoly.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p><em>Update: This post has been translated to Italian on the <a href="http://www.linkedopendata.it/schema-org-e-le-responsabilita-dei-monopolisti">Linked Open Data Italia</a> blog.</em></p>

<p>In this post about <a href="http://schema.org">schema.org</a> I&#8217;m going to speculate about the economic drivers that affect how search engines use structured metadata on the web. I discuss how the technical features and choices within schema.org may cause wider long-term harm, and the role of open standards as a method for responsible companies to avoid the pitfalls of monopoly.</p>

<!--break-->

<p>Before I launch into this, two things. The first is the standard disclaimer that I am speaking purely for myself. The second is that I recommend that you read <a href="http://en.wikipedia.org/wiki/Rufus_Pollock">Rufus Pollock</a>&#8217;s paper <a href="http://rufuspollock.org/economics/papers/search_engines.pdf">Is Google the Next Microsoft? Competition, Welfare and Regulation in Internet Search</a>. In it, he demonstrates how the search engine market will naturally tend to monopoly, and that because of the economic drivers in the search engine market, those monopolies will generally under-perform in terms of social good. In other words, if you are a <a href="http://www.guardian.co.uk/commentisfree/2011/jun/02/google-claws-web-dominance-challenged">search engine monopolist</a>, you have to take positive steps to <a href="http://en.wikipedia.org/wiki/Don't_be_evil">not be evil</a> because all the market drivers force you in that direction.</p>

<p>Clearly schema.org is a significant move by our current search engine monopolist, Google, on several fronts and while I don&#8217;t pretend to have any particular insight, it&#8217;s fun to speculate about how schema.org fits with their wider goals, the extent to which they are avoiding monopolist traps, and what it might mean for the web in general.</p>

<p>Search engines serve their customers: advertisers. So why the interest in structured metadata? Structured metadata benefits search engines in at least three ways:</p>

<ol>
<li>presenting richer information increases the utility of the search engine for users, thus attracting more of them (more users => more attention overall => more money from advertisers)</li>
<li>presenting richer information keeps users on the site for longer because search engines can present relevant information directly rather than users navigating away from the search engine&#8217;s site (more time on the site => more attention from individual users => more money from advertisers)</li>
<li>analysing social metadata extracted from web pages, such as <a href="http://schema.org/Person">social graphs</a> and individual interests can aid the targeting of adverts to particular users (more targeted adverts => more effective adverts => more money from advertisers)</li>
</ol>

<p>Clearly there&#8217;s a lot of potential for search engines in structured metadata. Their difficulty is in getting people to use it such that they don&#8217;t lie, don&#8217;t find it too much hassle, and don&#8217;t make too many mistakes, because that way lies <a href="http://www.well.com/~doctorow/metacrap.htm">metacrap</a>.</p>

<p>So the drivers for search engines are towards making it as easy as it could possibly be for publishers to embed metadata in their pages. It is also in their interest to ensure that the information that they extract is based as much as possible on the visible content of the page as this reduces the opportunity for people to lie (or make honest mistakes) by providing one value in the metadata and another in the content of the page. And it is in their interest to correct for errors when publishers make them.</p>

<p>The trap is that blindly pursuing these interests can also lead to anti-competitive behaviour.</p>

<h2>Raising Barriers to Entry</h2>

<p>The <a href="http://schema.org/docs/datamodel.html">Conformance section of the Data Model page</a> says (my emphasis):</p>

<blockquote>
  <p>While we would like all the markup we get to follow the schema, in practice, we expect a lot of data that does not. We expect schema.org properties to be used with new types. We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string. <strong>In the spirit of &#8220;some data is better than none&#8221;, we will accept this markup and do the best we can.</strong></p>
</blockquote>

<p>Schema.org contains multiple examples of properties whose values should be interpreted as being of a particular type, such as dates, times, numbers, durations, and specialised micro-syntaxes such as for an <a href="http://schema.org/EventVenue"><code>EventVenue</code>&#8217;s</a> <code>openingHours</code> property or an <a href="http://schema.org/Article"><code>Article</code>&#8217;s</a> <code>interactionCount</code> property which (from the examples, if not the text) expects a syntax like <code>UserTweets:65</code>. These seem clear enough.</p>

<p>However, looking in more detail at the examples, it seems that even putting aside the option of providing a string when the schema expects an item, there are a variety of ways of expressing values for properties in schema.org. There are examples where <a href="http://schema.org/Offer">numbers contain commas or are preceded by currency signs</a>. <a href="http://schema.org/Distance">Distances</a> are a number followed by a &#8220;unit of measurement&#8221; without any indication of what the acceptable units of measurement are. <a href="http://schema.org/NutritionInformation">Fat content</a> seems to follow some kind of syntax that includes a number and a measure but various other text as well. Even when values have to adhere to a particular microsyntax, there are examples that are non-standard (such as initial &#8216;<code>P</code>&#8217;s missing from durations).</p>
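
<p>To give a feel for the guesswork this pushes onto consumers, here is a sketch of the kind of lenient parsing a tool might attempt for prices and durations. The rules are entirely my own guesses for the sake of illustration; they are not the documented behaviour of any search engine, and that is rather the point.</p>

```python
import re


def parse_price(text):
    """Guess a number from strings such as '$1,299.00'.

    Assumes commas are thousands separators, which is wrong for many locales.
    """
    cleaned = re.sub(r"[^0-9.]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None


# ISO 8601 time-only durations, tolerating a missing leading 'P'
DURATION = re.compile(r"^P?T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration_seconds(text):
    """Return the duration in seconds, or None if the text is not understood."""
    match = DURATION.match(text.strip().upper())
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(parse_price("$1,299.00"))
print(parse_duration_seconds("PT1H30M"))  # standard form
print(parse_duration_seconds("T1H30M"))   # non-standard, missing the 'P'
```

<p>Every consumer that writes code like this will make slightly different guesses, which is exactly how interoperability gets lost.</p>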

<p>In other words, there is no documentation about the way in which the values of the schema.org properties will be interpreted by search engines and there is a clear intention on the part of the search engines behind schema.org to be generous in what they accept, so as to ensure that publishers can be lazy while search engines maximise the amount of data that they can understand on the web. Lacking a specification that describes how values are interpreted, the only way for publishers, validators and tool developers to work it out will be to try it out, see what happens, and attempt to find patterns that are generally interpreted in the same way by at least the major search engines, or more likely (because why bother with anyone else), try to work out what Google is going to do with it.</p>

<p>We have been here before, with HTML, pre-WHATWG. Then, IE, which dominated the browser market, had the clear intention to be generous in what it accepted, and there was no specification that described the various error handling quirks that had to be reproduced in bug-for-bug compatible user agents. WHATWG have had to work extremely hard to reverse engineer a specification that provides some kind of predictability and consistency for publishers as well as making it possible for new entrants to the browser market (such as Google&#8217;s Chrome), validators, and other tools to reproduce the behaviour of existing browsers. This work has paid off: over the past few years, <a href="http://en.wikipedia.org/wiki/Usage_share_of_web_browsers#Historical_usage_share">browser market share has diversified</a> somewhat, largely due to the rise of mobile browsers and Chrome taking market share from IE.</p>

<p>With structured metadata, Google is in an extremely dominant position. Concretely, it will be very hard for Google to reveal the methods by which they extract meaningful metadata from the huge variety of textual content on the web: they may have patents that cover some aspects, and in other cases (particularly when that interpretation depends on the analysis of their vast caches of web pages, as in the case of natural language translation) the behaviour simply might not be replicable by any third party.</p>

<p>None of this, by the way, would be helped by using a different syntax to express the data within the page. The only way it could be addressed is by much more clarity, detail, and conformance criteria within the schema.org vocabulary specification.</p>

<p>Without that specificity, we get into a world where Bing, Facebook and any other search engines will spend a lot of time and effort trying to reverse engineer Google&#8217;s behaviour to extract the same data as it does. They might even sometimes manage to introduce useful quirks of interpretation of their own, but that&#8217;s unlikely given that their constrained engineering effort will naturally be focused on matching Google. This also forms a massive barrier to entry (as if those weren&#8217;t already significant) to potential new search engines. Overall, the lack of specificity suppresses innovation in the market.</p>

<p>And of course publishers, writers and tool creators are left struggling to keep up.</p>

<h2>Syntax Fixing</h2>

<p>While both Google and Yahoo! have previously used information described using <a href="http://microformats.org/">microformats</a> and <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa</a> to provide similar functionality, in schema.org they deprecate that support, both by using microdata throughout the examples and by <a href="http://schema.org/docs/faq.html#11">explicitly saying</a>:</p>

<blockquote>
  <p>If you have already done markup and it is already being used by Google, Microsoft, or Yahoo!, the markup format will continue to be supported. Changing to the new markup format could be helpful over time because you will be switching to a standard that is accepted across all three companies, but you don&#8217;t have to do it.</p>
</blockquote>

<p>Whichever technology they choose, the act of search engine monopolies making that choice and the consequent widespread adoption via SEO creates a large barrier to changes to the technology. Even if the specification for the technology changes, those changes are likely to be ignored in practice as Google (and hence other search engines) seek to retain backwards compatibility with the examples and guidance published on schema.org as they stand now.</p>

<p>It is particularly damaging to have the choice be microdata because microdata is a relatively new technology that has only just reached W3C Last Call Working Draft. In my experience, Last Call is usually the <em>first time</em> that a wider community outside interested Working Groups starts to look at a technology seriously. To create better technologies and better specifications, Working Groups must be able to change in response to this review.</p>

<p>The ultimate result is again standardisation-by-implementation, which has long term adverse consequences in restricting competition (not between technologies, but between organisations using those technologies) and leads us to a situation where we could end up using something that is less than optimal for any kind of wider purpose outside the interests of the monopolist.</p>

<h2>Standards Bodies</h2>

<p>The development of schema.org might seem like a very minor thing, only of interest to people interested in SEO and structured metadata, but it is part of a bigger picture of the kinds of ripple-through effects the dominant players on the internet can have. It is almost <a href="http://gigaom.com/2010/02/26/the-myth-of-the-benign-monopoly/">impossible for monopolies not to do harm</a>, not because anyone within them sets out to, but simply because they are so large that their behaviour is that much more important than anyone else&#8217;s.</p>

<p>The kinds of effects described above &#8212; ones that result in an overall sub-optimal outcome for society as a whole &#8212; are why society has <a href="http://en.wikipedia.org/wiki/Competition_law">competition laws</a> that constrain monopolies and <a href="http://en.wikipedia.org/wiki/Cartel">cartels</a>. Sooner or later, <a href="http://en.wikipedia.org/wiki/European_Union_Microsoft_competition_case">just as it did with Microsoft</a>, society applies the corrective force of regulation. <a href="http://www.huffingtonpost.com/2011/05/24/sarkozy-eg8-governments-regulate-internet_n_866065.html">There are already rumblings of this storm approaching.</a></p>

<p>It is also why we have neutral standards bodies, such as the <a href="http://www.w3.org/">W3C</a> or the <a href="http://www.ietf.org/">IETF</a>, which provide a <a href="http://www.w3.org/Consortium/Patent-Policy/">royalty-free patent policy</a> as well as a <a href="http://www.w3.org/Consortium/Process/">defined process</a> for developing specifications. These might seem tedious to comply with, and it might seem beneficial to companies to form a small cabal in order to get things done more quickly without having to seek wide consensus, but the bigger picture is that open standards developed within standards bodies protect companies from antitrust actions. Companies can point to royalty-free standards developed through a defined and fair process as proof of good behaviour that demonstrates their understanding of a wider responsibility to society as a whole.</p>

<p>As <a href="http://hansard.millbanksystems.com/commons/1947/nov/11/parliament-bill#column_207">Winston Churchill might have said</a>:</p>

<blockquote>
  <p>Many ways of developing standards have been tried and will be tried in this world of sin and woe. No one pretends that standards bodies are perfect or all-wise, and it has been said that developing standards within standards bodies is the worst possible way to do it except all those other ways that have been tried from time to time.</p>
</blockquote>

<p>Objections to schema.org may seem to be <a href="http://hsivonen.iki.fi/schema-org-and-communities/">sour grapes</a> because they didn&#8217;t use a particular existing syntax or vocabulary, but look deeper and the issues schema.org raises are all about the responsibilities of monopolies and the role of open standards. The parallels with HTML, IE and Microsoft are striking; it will be interesting to see if this turns out the same way.</p>
    ]]></content>
  </entry>
  <entry>
    <title>Lessons for Microdata from schema.org</title>
    <link rel="alternate" type="text/html" href="http://www.jenitennison.com/blog/node/156" />
    <id>http://www.jenitennison.com/blog/node/156</id>
    <published>2011-06-10T20:27:38+00:00</published>
    <updated>2011-06-10T20:27:38+00:00</updated>
    <author>
      <name>Jeni</name>
    </author>
    <category term="html5" />
    <category term="microdata" />
    <category term="schema.org" />
    <summary type="html"><![CDATA[<p>There is (obviously, from the way my tweet stream, feed reader and email have filled up) lots to say at many levels about <a href="http://schema.org/">schema.org</a>, a new collaboration between Google, Microsoft and Yahoo! that describes the next phase in search engines&#8217; extraction of semantics from web pages. In this post I&#8217;m going to focus on what we can learn from schema.org about the design of <a href="http://www.w3.org/TR/microdata/">microdata</a> and how it might be improved.</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>There is (obviously, from the way my tweet stream, feed reader and email have filled up) lots to say at many levels about <a href="http://schema.org/">schema.org</a>, a new collaboration between Google, Microsoft and Yahoo! that describes the next phase in search engines&#8217; extraction of semantics from web pages. In this post I&#8217;m going to focus on what we can learn from schema.org about the design of <a href="http://www.w3.org/TR/microdata/">microdata</a> and how it might be improved.</p>

<!--break-->

<p>Digging into the details of schema.org there are several examples of places where its recommended method of marking up metadata directly contradicts the HTML5 specs. Given the number of internal contradictions within schema.org, I&#8217;m assuming that these are mistakes that will be corrected as the material is reviewed and matures rather than deliberate forking of HTML5.</p>

<blockquote>
  <p><em>Note: What I say about HTML5 here is equally true &#8212; at least at time of writing &#8212; of the WHATWG version of HTML, which of course already diverges from HTML5.</em></p>
</blockquote>

<p>One of the inputs to the design of microdata was to look at the mistakes that people make and try to design something to address the cause of those errors, so it&#8217;s interesting to apply that method to the errors made by schema.org. This doesn&#8217;t mean changing specs so that erroneous markup is conformant, but it does mean providing facilities that enable people to more easily do things in a conformant way, removing the temptation of non-conformance and lowering the likelihood of future mistakes.</p>

<blockquote>
  <p><em>Note: I am certain that there would also have been errors had schema.org used RDFa or microformats; indeed, I gather that they are common in the documentation of <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=99170">Google&#8217;s Rich Snippets</a>.</em></p>
</blockquote>

<h2>Use of <code>&lt;time&gt;</code> element</h2>

<p>The first example is one <a href="http://tantek.com/2011/155/t5/schemaorg-html5-fork-smoke-openinghours-time-duration">spotted by Tantek</a>: the value of the <code>openingHours</code> property of <a href="http://schema.org/EventVenue">EventVenue</a> is described as:</p>

<blockquote>
  <p>The opening hours for a business. Opening hours can be specified as a weekly time range, starting with days, then times per day. Multiple days can be listed with commas &#8216;,&#8217; separating each day. Day or time ranges are specified using a hyphen &#8216;-&#8217;.</p>
  
  <ul>
  <li>Days are specified using the following two-letter combinations: <code>Mo</code>, <code>Tu</code>, <code>We</code>, <code>Th</code>, <code>Fr</code>, <code>Sa</code>, <code>Su</code>.</li>
  <li>Times are specified using 24:00 time. For example, 3pm is specified as <code>15:00</code>.</li>
  </ul>
  
  <p>Here is an example: <code>&lt;time itemprop="openingHours" datetime="Tu,Th 16:00-20:00"&gt;Tuesdays and Thursdays 4-8pm&lt;/time&gt;</code></p>
</blockquote>

<p>A similar error involving the <code>&lt;time&gt;</code> element can be found on the <a href="http://schema.org/docs/gs.html#advanced_dates">Getting Started page</a> which has an example in which the <code>datetime</code> attribute contains an ISO 8601 <em>duration</em>:</p>

<blockquote>
  <p>Durations can be specified in an analogous way using the time tag with the datetime attribute. Durations are prefixed with the letter P (stands for &#8220;period&#8221;). Here&#8217;s how you can specify a recipe cook time of 1 ½ hours:</p>

<pre><code>&lt;time itemprop="cookTime" datetime="P1H30M"&gt;1 1/2 hrs&lt;/time&gt;
</code></pre>
  
  <p>H is used to designate the number of hours, and M is used to designate the number of minutes.</p>
</blockquote>

<p>According to the HTML5 specification, the <code>datetime</code> attribute holds a <a href="http://www.w3.org/TR/html5/Overview.html#valid-date-or-time-string">valid date or time string</a>, which is specified as using the normal ISO 8601 syntaxes for dates and times; neither a weekly range of opening hours nor a duration qualifies. <a href="http://www.w3.org/TR/html5/Overview.html#conforming-html5-documents">Conforming HTML5 documents</a> must therefore not use the syntax shown in the examples above. From what I can tell (please correct me if I have this wrong), <a href="http://www.w3.org/TR/html5/Overview.html#data-mining">conforming data-mining tools</a> <em>can</em> process this syntax because they use the <a href="http://www.w3.org/TR/microdata/#values">value of the <code>datetime</code> content attribute</a>, rather than the value of the <code>dateTime</code> IDL attribute, to provide the property&#8217;s value. From an authoring perspective, though, no one should be encouraging people to create non-conformant HTML5.</p>
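
<p>To make the distinction concrete, here is a rough Python sketch of the kind of check a conformance tool might apply to a <code>datetime</code> value. The patterns are my own simplified approximation of a date, a time, and a date-and-time with a time zone; they are not the grammar from the specification, and they skip range checks such as rejecting month 13. The function name is made up for illustration.</p>

```python
import re

# Simplified, assumed patterns; range checks (e.g. month 13) are omitted.
DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}:\d{2}(:\d{2}(\.\d+)?)?"
ZONE = r"(Z|[+-]\d{2}:\d{2})"
PATTERNS = [
    re.compile(DATE),
    re.compile(TIME),
    re.compile(DATE + "T" + TIME + ZONE),
]

def looks_like_date_or_time(value):
    """Return True if value matches one of the simplified patterns in full."""
    return any(p.fullmatch(value) for p in PATTERNS)
```

<p>Under these assumptions <code>2011-06-10</code> and <code>15:00</code> pass, while both of the schema.org values quoted above, <code>Tu,Th 16:00-20:00</code> and <code>P1H30M</code>, are rejected.</p>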

<p>In fact, given that values from microdata are never typed, it&#8217;s not clear why these examples use the <code>&lt;time&gt;</code> element at all. The conformant way to provide the data would be to use a separate <code>&lt;meta&gt;</code> element to hold the data:</p>

<pre><code>&lt;meta itemprop="openingHours" content="Tu,Th 16:00-20:00"&gt;
Tuesdays and Thursdays 4-8pm
</code></pre>

<p>But I can understand the itch to use the <code>&lt;time&gt;</code> element here; the <code>&lt;meta&gt;</code> element above is <em>before</em>, rather than <em>around</em>, the textual content of the page which it reflects, whereas with the <code>&lt;time&gt;</code> element it is more obviously an explicit machine-readable version of that content. In microformats, the pattern is to use a <code>title</code> attribute:</p>

<pre><code>&lt;span class="openingHours" title="Tu,Th 16:00-20:00"&gt;
  Tuesdays and Thursdays 4-8pm
&lt;/span&gt;
</code></pre>

<p>and in RDFa a <code>content</code> attribute:</p>

<pre><code>&lt;span property="openingHours" content="Tu,Th 16:00-20:00"&gt;
  Tuesdays and Thursdays 4-8pm
&lt;/span&gt;
</code></pre>

<p>So maybe this error is an indication that microdata needs an <code>itemvalue</code> attribute for those cases where the human-readable content can be expressed in a more formal machine-readable microsyntax, keeping the special handling of the <code>&lt;time&gt;</code> element for the case when the value is a date/time and of <code>href</code>/<code>src</code> for when the value is a URI.</p>

<h2>String parsing of URIs</h2>

<p>The second example of non-conformance with HTML5 in schema.org is the method by which they <a href="http://schema.org/docs/extension.html">support extensibility</a>. Schema.org provides a set of types for the things described by web pages. Naturally, this set does not cover everything, but search engines still want to be able to use the metadata within the page about a <code>Person</code> even when that <code>Person</code> is described as a <code>Minister</code> (say).</p>

<p>So schema.org says that to extend their type hierarchy, you simply append the name of the new type after a <code>/</code> at the end of the URI for the parent type. In this example, a <code>Minister</code> should be given the type <code>http://schema.org/Person/Minister</code>.</p>

<p>My guess (as I can&#8217;t see any other way in which they&#8217;d do it) is that the search engines intend to use string processing on the type URI in order to work out whether it&#8217;s a subtype of a known type (ie, does the URI start with the string <code>http://schema.org/Person</code>? If it does, it&#8217;s a Person of some kind).</p>
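
<p>In Python, the string processing I am guessing at might look something like the sketch below. This is purely my assumption about the approach, not anything schema.org has published; the set of known types and the function name are invented for the example.</p>

```python
# Hypothetical set of types a consumer already understands.
KNOWN_TYPES = {"http://schema.org/Person", "http://schema.org/Place"}

def guess_parent_type(item_type):
    """Return the known type that item_type extends by prefix, else None."""
    for known in KNOWN_TYPES:
        # Require a "/" after the known type so that, for example,
        # http://schema.org/PersonalTrainer is not mistaken for a Person.
        if item_type == known or item_type.startswith(known + "/"):
            return known
    return None
```

<p>With this, <code>http://schema.org/Person/Minister</code> is treated as a kind of <code>http://schema.org/Person</code>, while a type URI from any other domain is simply not recognised.</p>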

<p>The <a href="http://www.w3.org/TR/2011/WD-microdata-20110525/#items">microdata specification</a> states that:</p>

<blockquote>
  <p>The item type must be a type defined in an <a href="http://www.w3.org/TR/html5/Overview.html#other-applicable-specifications">applicable specification</a>.</p>
</blockquote>

<p>and that:</p>

<blockquote>
  <p>Item types are opaque identifiers, and user agents must not dereference unknown item types, or otherwise deconstruct them, in order to determine how to process items that use them.</p>
</blockquote>

<p>Treating URIs as opaque identifiers, rather than inferring semantic meaning from them through string processing, is a fairly fundamental principle of web architecture; it also fits with microdata&#8217;s constraint that types are specified somewhere.</p>

<p>Perhaps the gloss is that schema.org is an applicable specification that states that any URI that looks like <code>http://schema.org/{known-type}/{extension-type}</code> is a type that is defined in schema.org. But I think what&#8217;s actually happening is that schema.org wants to grow their vocabulary organically, responding to the data that the search engines find on the web. They recognise that people will want to use their own vocabularies for their own purposes (for example to provide data for scripts, reusers, or for browsers and other agents that aren&#8217;t search engines) but want to continue to be able to understand the semantics of that data.</p>

<p>Schema.org is constrained in the mechanisms that it could use to recognise these new types:</p>

<ul>
<li>They can&#8217;t resolve the type URI and expect metadata at the end of it to indicate the parent schema.org type, because the HTML5 microdata specification forbids resolving unrecognised item type URIs. Even if they were allowed to do so, working out inheritance by declarative mechanisms that involve resolving URIs and interpreting the results is computationally expensive compared to string munging; for search engines that expect to do this with billions of web pages, this may be a significant processing burden.</li>
<li>They can&#8217;t tell people to use the schema.org type in addition to their own type, because the HTML5 microdata specification doesn&#8217;t allow an item to have more than one type. If it did, publishers could write <code>itemtype="http://schema.org/Person http://reference.data.gov.uk/def/central-government/Minister"</code> rather than <code>itemtype="http://schema.org/Person/Minister"</code>.</li>
</ul>
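
<p>If <code>itemtype</code> did take a space-separated list of types (which it does not in the current draft, so this is a sketch of a hypothetical extension), a consumer could pick out the types it understands by exact comparison, without deconstructing any URI. The names here are again made up for illustration.</p>

```python
# Hypothetical set of types a consumer already understands.
KNOWN_TYPES = {"http://schema.org/Person"}

def recognised_types(itemtype_attr):
    """Split a space-separated itemtype value and keep the types we know."""
    return [t for t in itemtype_attr.split() if t in KNOWN_TYPES]
```

<p>Given the two-type value from the list above, this returns just <code>http://schema.org/Person</code>; the more specific Minister type is left alone for consumers that do know about it.</p>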

<p>I suspect that having multiple types is currently disallowed in microdata because the semantics of the (non-URI) properties of an item are based on its type, but I think that microdata could handle multiple types for an item if instead of saying:</p>

<blockquote>
  <p>If the item is a typed item: [the property token must be] a defined property name allowed in this situation according to the specification that defines the relevant type for the item</p>
</blockquote>

<p>it said:</p>

<blockquote>
  <p>If the item is a typed item: [the property token must be] a defined property name allowed in this situation according to any of the specifications that define the relevant types for the item; if the property is defined for more than one type, these definitions must be identical</p>
</blockquote>
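
<p>The &#8220;definitions must be identical&#8221; condition in that suggested wording could be checked mechanically. The sketch below assumes a made-up representation in which each type maps property names to some comparable definition; nothing like this structure appears in the microdata specification.</p>

```python
def conflicting_properties(types, definitions):
    """Return property names defined differently by two or more of the types."""
    seen = {}
    conflicts = set()
    for item_type in types:
        for prop, definition in definitions.get(item_type, {}).items():
            # A property is only a problem if an earlier type defined it
            # and the two definitions differ.
            if prop in seen and seen[prop] != definition:
                conflicts.add(prop)
            seen.setdefault(prop, definition)
    return sorted(conflicts)
```

<p>An item whose types agree on every shared property gets an empty list back; any name in the result is a property whose meaning would be ambiguous for that item.</p>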

<p>It&#8217;s worth noting that, unlike <code>itemtype</code>, the <code>itemprop</code> attribute <em>can</em> take multiple values which, along with its ability to take URIs, provides for an easier inheritance mechanism. However, schema.org again recommends a string-based approach for extended properties; maybe that&#8217;s for consistency, or perhaps the real reason is that they want to become a centralised repository for acceptable microdata on the web.</p>

<h2>Summary</h2>

<p>Whatever method publishers use, creating structured machine-processable data is hard and embedding it in a page is even harder. It is a layer that necessarily sits above or alongside the content of a page, invisible to people simply looking at the page in a browser and therefore difficult to get right without additional tooling.</p>

<p>Microdata specifically aims to make this easy to do; schema.org demonstrates some of the ways in which it doesn&#8217;t <em>quite</em> meet that requirement. Perhaps the search engines behind the effort haven&#8217;t managed to (or bothered to) implement HTML5/microdata parsing correctly or perhaps the people writing the documentation thought they understood how HTML5/microdata works but actually didn&#8217;t. </p>

<p>Either way, the mistakes are worth learning from to improve the specs while they are not yet final. As I discussed above, I think that means:</p>

<ul>
<li>adding an <code>itemvalue</code> attribute for machine-readable versions of content</li>
<li>enabling <code>itemtype</code> to take multiple values to support extensibility</li>
</ul>

<p>though I suspect the latter point at least will be contentious among those who don&#8217;t think decentralised extensibility is ever a desirable feature.</p>
    ]]></content>
  </entry>
</feed>

