<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>html5</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/44</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>My Experience of Web Standards</title>
 <link>http://www.jenitennison.com/blog/node/160</link>
 <description>&lt;p&gt;One of the things that&amp;#8217;s been niggling at the back of my mind since the &lt;a href=&quot;http://schema.org&quot;&gt;schema.org&lt;/a&gt; announcement is how small a role search engine results play in the wider data sharing efforts that I&amp;#8217;m more familiar with in my work on &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&amp;#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;My day job (the one I actually get paid for) is web development. The site I spend most of my time and effort on is &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;. This deals with complex content (UK legislation) that has to be presented in multiple formats (users love PDFs of legislation). Our aim is to make the data as reusable as possible by third parties through good, RESTful, web architecture, and we want to use open standards and open source technologies as part of the &lt;a href=&quot;http://www.cabinetoffice.gov.uk/resource-library/open-source-open-standards-and-re-use-government-action-plan&quot;&gt;UK government&amp;#8217;s general strategy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;legislation.gov.uk is not a global website like Amazon or eBay, but it&amp;#8217;s not small either: it covers 60,000 changing items of legislation, providing point-in-time views for many of them, and with more added every day. It&amp;#8217;s one of the top ten most used UK Government websites, with 2 million visits (about 10-12 million page views) each month and typically about 120 requests/second during the active times of the day. Legislation might sound like a highly specialist interest, but if you &lt;a href=&quot;http://twitter.com/search/legislation.gov.uk&quot;&gt;search for legislation.gov.uk on Twitter&lt;/a&gt; you&amp;#8217;ll see it being referenced over and over by people who want to share what the law says.&lt;/p&gt;

&lt;p&gt;I do not by any means claim that my experience is representative of the wider web. I know that there are large numbers of sites that deal only in data, not documents, and certainly not documents with the kind of rich semantic structure that legislation has. I offer the following discussion as a data point, partly because I can&amp;#8217;t quite believe that legislation.gov.uk is &lt;em&gt;completely&lt;/em&gt; unique in its requirements and partly because obviously my perspective on a bunch of issues arises from this experience.&lt;/p&gt;

&lt;h2&gt;Technology Stacks&lt;/h2&gt;

&lt;p&gt;Legislation items are complex, semi-structured documents. Their natural fit is XML (well, that&amp;#8217;s not quite true &amp;#8212; their natural fit would be something that allowed overlapping markup &amp;#8212; but XML is the closest that we have). So we store it in XML in a native XML database and we use an XML toolset to query it (XQuery) and transform it (XSLT) into various formats including rendering it as PDF (through XSL-FO).&lt;/p&gt;

&lt;p&gt;Our next step for the development of the site involves looking at legislative effects. These form a graph: one item of legislation affects other items of legislation which may in turn affect other items and so on. There are all sorts of other links between items of legislation in terms of commencements, conferred powers and so on. Particularly because we already have well-thought-through URIs for legislation, the natural fit is to use RDF to represent this graph. We already offer a SPARQL endpoint for accessing some aspects of our data, but we expect to expand and develop this over the next few months and to use it as a layer under the website and exposed for reusers, in much the same way as we use the XML database.&lt;/p&gt;

&lt;p&gt;As a government site, we have fairly strict limits on what we can do within our web pages: we have to make sure that they&amp;#8217;re accessible to everyone who wants to view them. We aren&amp;#8217;t able to use technologies that are only available in the latest browsers, but that&amp;#8217;s OK because with the kind of content we deal with, we don&amp;#8217;t have to do anything fancy anyway. So we use pretty basic HTML, CSS and JavaScript, because that&amp;#8217;s how you deliver content to end-users on the web (as well as exposing the underlying XML and RDF, to enable others to reuse the data).&lt;/p&gt;

&lt;p&gt;In other words, we use three web stacks for delivering legislation.gov.uk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML stack, which is great for single-source publishing of documents that have more semantic structures than those supported by HTML&lt;/li&gt;
&lt;li&gt;the RDF stack, which is well-suited for metadata about things that are identified by URIs&lt;/li&gt;
&lt;li&gt;the HTML stack, which is absolutely necessary for delivering human-accessible content on the web&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What bemuses me, because of this experience, is that sometimes it appears that the narrative around these technologies is framed in terms of an exclusive choice between them. For example, &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;@mattur asked&lt;/a&gt;:&lt;/p&gt;

&lt;p style=&quot;text-align:center;&quot;&gt;
  &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;&lt;img src=&quot;/blog/files/mattur-tweet.jpg&quot; alt=&quot;@gimsieke @JeniT how may TAG members believe RDF(a) and X(HT)ML are way forward? How many think they aren&#039;t?&quot; /&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;It is as if, if you use XML you &lt;em&gt;cannot&lt;/em&gt; appreciate the utility of error-handling in HTML; or if you use RDF you &lt;em&gt;cannot&lt;/em&gt; understand the need to represent documents in XML; or if you want to utilise HTML fully, you &lt;em&gt;cannot&lt;/em&gt; adopt RDF&amp;#8217;s view of data on the web. That&amp;#8217;s simply not my experience. They each have their role on the web; supporting the use of one does not necessitate rejecting the use of the others.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s interesting that some of the standards that are most reviled are those that arise at the intersections, where it appears that one technology is trying to encroach on the space of another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XHTML at the border of XML and HTML&lt;/li&gt;
&lt;li&gt;RDF/XML at the border of RDF and XML&lt;/li&gt;
&lt;li&gt;RDFa at the border of all three&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the same time, within legislation.gov.uk, we publish XHTML (because it&amp;#8217;s the natural output from an XML toolchain) and create and process RDF/XML (because it gives us access to that data from within the XML toolchain). We use a small bit of RDFa in the XHTML to indicate the rights under which our information is available. We don&amp;#8217;t yet use RDFa to mark up non-document semantics within our XML, but are thinking about doing so (to enable the XML markup to focus on the document structures that it&amp;#8217;s good at). For all their imperfections, these intersection technologies are useful for managing cross-overs; the problems arise when they overstep their remit and people start to think that &lt;em&gt;all&lt;/em&gt; HTML must be XHTML or &lt;em&gt;all&lt;/em&gt; XML must be RDF/XML or &lt;em&gt;all&lt;/em&gt; RDF must be RDFa.&lt;/p&gt;

&lt;h2&gt;Sharing Scenarios&lt;/h2&gt;

&lt;p&gt;The second thing that I wanted to explore is the experience from legislation.gov.uk of what it&amp;#8217;s like to be a publisher who actively wants to share their data. We need to operate simultaneously at three levels in our data sharing efforts.&lt;/p&gt;

&lt;h3&gt;Large-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The first target for our data sharing efforts is the search engines. Obviously we&amp;#8217;re not selling anything, but we want people to be able to locate legislation easily when they want it, and we want people who have done the search to be able to see some information about the legislation so that they know that they&amp;#8217;ve located the right item.&lt;/p&gt;

&lt;p&gt;This is large-scale consumer (search engine) driven data sharing, typified by schema.org and Facebook&amp;#8217;s &lt;a href=&quot;http://developers.facebook.com/docs/opengraph/&quot;&gt;Open Graph Protocol&lt;/a&gt; (OGP). There are a few very big data consumers (Google, Microsoft, Yahoo!, Facebook etc) who need to consume data from large numbers of data providers. These consumers obviously can&amp;#8217;t understand &lt;em&gt;everything&lt;/em&gt;, so they determine and document what syntaxes and vocabularies they &lt;em&gt;do&lt;/em&gt; understand and expect publishers to follow.&lt;/p&gt;

&lt;p&gt;The benefits that publishers get from a particular consumer determine which syntax/vocabulary they use; publishers who are particularly keen to show up prettily within search results will target schema.org whereas those who want to be sharable within Facebook will target OGP. Many publishers will want to target both. There is probably a driver towards eventual convergence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;publishers might push back about inserting two lots of very similar data in their pages&lt;/li&gt;
&lt;li&gt;consumers might want to include data from publishers who haven&amp;#8217;t specifically targeted them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;although there&amp;#8217;s likely to be a period where they coexist, much as there was for VHS and Betamax (and &lt;a href=&quot;http://en.wikipedia.org/wiki/Video_2000&quot;&gt;V2000&lt;/a&gt;, I know, dad) during the early days of video players.&lt;/p&gt;

&lt;p&gt;As &lt;a href=&quot;http://www.jenitennison.com/blog/node/157&quot;&gt;I discussed previously&lt;/a&gt;, these large-scale consumers will be driven by the data that they find in the wild, in all its messy variety. They get relatively little benefit directly from using a generic &lt;em&gt;syntax&lt;/em&gt;, as they are really interested in only a few, pretty generic, &lt;em&gt;vocabularies&lt;/em&gt; for which they have hardwired processing. Indirectly, adopting a generic syntax has benefits in that publishers might find it easier to find tools that enable them to generate it and tutorials about how to use it, and might feel that they aren&amp;#8217;t being quite as locked in to something proprietary. However, rejecting data that isn&amp;#8217;t marked up properly using that syntax has no benefit for consumers except in so far as it makes them feel that they are being good community members.&lt;/p&gt;

&lt;p&gt;This is the pattern we see with schema.org (which accepts microdata but, based on its documentation, won&amp;#8217;t reject data that isn&amp;#8217;t fully compliant with it) and with OGP (which accepts a subset of RDFa but doesn&amp;#8217;t reject data that hasn&amp;#8217;t got prefixes properly bound, for example).&lt;/p&gt;

&lt;p&gt;Another point to mention is that there is very little trust in this scenario. The communication between consumers and publishers is very limited, and the consumers will want to protect themselves against accidental or malicious errors that are evident in mismatches between explicit metadata and that which is parsed from the visible content of the page.&lt;/p&gt;

&lt;p&gt;The parallels to HTML and browser vendors are very strong in this type of data sharing.&lt;/p&gt;

&lt;h3&gt;Small-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;A second type of data sharing is again driven by consumers, but this time at a lot smaller and more specialised scale. For legislation.gov.uk, these are services such as &lt;a href=&quot;http://www.glin.gov/&quot;&gt;GLIN&lt;/a&gt;, which is a global legislation registry. Other examples are the recent work that we&amp;#8217;ve done to publish &lt;a href=&quot;http://data.gov.uk/organogram&quot;&gt;UK Government organograms&lt;/a&gt; or &lt;a href=&quot;http://countculture.wordpress.com/&quot;&gt;Chris Taggart&lt;/a&gt;&amp;#8217;s &lt;a href=&quot;http://openelectiondata.org/&quot;&gt;Open Election Data&lt;/a&gt; project. In these cases, there&amp;#8217;s a single, relatively small and specialised consumer and a small number of publishers which are closely coordinated together.&lt;/p&gt;

&lt;p&gt;As in the large-scale case, the consumer ultimately determines the syntax/vocabulary that it recognises, and communicates that to the publishers. However, small-scale consumers typically have close coordination with the publishers, which has a number of side-effects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consumers may be more able to both apply pressure to and help publishers to do well in their markup&lt;/li&gt;
&lt;li&gt;publishers have the opportunity to feed back directly to the consumer any suggestions that they have about changes to the syntax/vocabulary&lt;/li&gt;
&lt;li&gt;publishers are likely to gain an immediate and tangible benefit from their cooperation, such as visualisations of their data that they otherwise wouldn&amp;#8217;t have seen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another noteworthy point about small-scale consumers is that they&amp;#8217;re unlikely to have the engineering capability to create a custom parser for a particular syntax, but will instead want to use something off-the-shelf to extract data from pages and into their own backend systems. This, coupled with the closer coordination with publishers, means that they&amp;#8217;re much more likely to stick to a specification, assuming that the off-the-shelf tools do.&lt;/p&gt;

&lt;h3&gt;Publisher-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The final type of data sharing is driven by publishers. At legislation.gov.uk, we&amp;#8217;re motivated to make our data available for reuse for transparency/accountability reasons (to help citizens understand the law), efficiency reasons (to help parliament and government departments to publish new legislation better) and economic reasons (to foster innovation in legal publishing). We don&amp;#8217;t have any individual consumers in mind when we publish our data, but have found that simply by publishing it well, we foster reuse.&lt;/p&gt;

&lt;p&gt;In this case, we as publishers are highly motivated to ensure that the data we publish is easily parsed with something off-the-shelf, since that lowers the barrier for potential consumers. Publishers like us are very likely to have unique, specialised, content and need to use a vocabulary that fits closely to our internal data structures as this lowers implementation cost. Consumers can also trust publishers like us: we simply have no motivation to lie in the data that we provide for reuse.&lt;/p&gt;

&lt;h2&gt;Mixed Markup&lt;/h2&gt;

&lt;p&gt;As I&amp;#8217;ve outlined above, publishers like legislation.gov.uk need to target several potential consumers at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-scale consumers such as search engines&lt;/li&gt;
&lt;li&gt;small-scale consumers that provide us with a useful service&lt;/li&gt;
&lt;li&gt;specialist consumers that are interested specifically in our data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot use a single vocabulary for all these different purposes. (Well, we could write our own vocabulary and describe mappings to other vocabularies using RDFS, but search engines wouldn&amp;#8217;t read it.)&lt;/p&gt;

&lt;p&gt;We must therefore use a mix of vocabularies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generic vocabularies about things that search engines care about&lt;/li&gt;
&lt;li&gt;specialised vocabularies for particular small consumers&lt;/li&gt;
&lt;li&gt;site-specific vocabularies for sharing our unique data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It&amp;#8217;s repetitive, but it&amp;#8217;s manageable so long as we have a syntax that enables us to say that an item of legislation is a &lt;code&gt;http://schema.org/CreativeWork&lt;/code&gt; and a &lt;code&gt;http://purl.org/dc/dcmitype/Text&lt;/code&gt; and a &lt;code&gt;http://www.legislation.gov.uk/def/legislation/Legislation&lt;/code&gt; and allows us to give multiple properties the same value.&lt;/p&gt;
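
&lt;p&gt;As a rough sketch of what that could look like in RDFa (one possible syntax, not necessarily what we would deploy), the fragment below gives a single item all three types and gives two properties the same value. The item URI, the prefix names and the choice of &lt;code&gt;schema:name&lt;/code&gt; and &lt;code&gt;dct:title&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Illustrative only: the item URI and prefix names are assumptions --&amp;gt;
&amp;lt;div xmlns:schema=&quot;http://schema.org/&quot;
     xmlns:dctype=&quot;http://purl.org/dc/dcmitype/&quot;
     xmlns:dct=&quot;http://purl.org/dc/terms/&quot;
     xmlns:leg=&quot;http://www.legislation.gov.uk/def/legislation/&quot;
     about=&quot;http://www.legislation.gov.uk/id/ukpga/1998/42&quot;
     typeof=&quot;schema:CreativeWork dctype:Text leg:Legislation&quot;&amp;gt;
  &amp;lt;!-- one value, two properties --&amp;gt;
  &amp;lt;h1 property=&quot;schema:name dct:title&quot;&amp;gt;Human Rights Act 1998&amp;lt;/h1&amp;gt;
&amp;lt;/div&amp;gt;&lt;/code&gt;&lt;/pre&gt;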

&lt;p&gt;The way things are going at the moment, we might well end up having to use multiple &lt;em&gt;syntaxes&lt;/em&gt; on the same page, as some consumers understand microdata, others consume RDFa, and still others will parse microformats. This leads to more repetition: adding &lt;code&gt;itemprop&lt;/code&gt; for microdata, &lt;code&gt;property&lt;/code&gt; for RDFa and specialised &lt;code&gt;class&lt;/code&gt; attributes for microformats. But worse (much worse), each of the syntaxes uses a different parsing model to create an entity-property-value structure, so not only do we have to learn substantially different markup patterns but our pages quickly become some kind of hideous polyglot mess trying to balance between them.&lt;/p&gt;
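
&lt;p&gt;To make the repetition concrete, here is a hedged sketch of one title marked up for all three syntaxes at once. The choice of schema.org&amp;#8217;s &lt;code&gt;CreativeWork&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt;, and of hAtom&amp;#8217;s &lt;code&gt;hentry&lt;/code&gt; and &lt;code&gt;entry-title&lt;/code&gt; classes, is an assumption for illustration rather than something any particular consumer is known to require.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Illustrative only: the same two facts stated three times --&amp;gt;
&amp;lt;div xmlns:schema=&quot;http://schema.org/&quot;
     itemscope=&quot;itemscope&quot; itemtype=&quot;http://schema.org/CreativeWork&quot;
     typeof=&quot;schema:CreativeWork&quot;
     class=&quot;hentry&quot;&amp;gt;
  &amp;lt;h1 itemprop=&quot;name&quot;
      property=&quot;schema:name&quot;
      class=&quot;entry-title&quot;&amp;gt;Human Rights Act 1998&amp;lt;/h1&amp;gt;
&amp;lt;/div&amp;gt;&lt;/code&gt;&lt;/pre&gt;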

&lt;h2&gt;Looking Forward&lt;/h2&gt;

&lt;p&gt;As I said at the start, I&amp;#8217;m fairly sure that my experience at legislation.gov.uk isn&amp;#8217;t representative of the wider web, but I don&amp;#8217;t have a clear idea about just how unrepresentative it is, in terms of technology use or motivations around data sharing. When I read my Twitter stream or blogs, there&amp;#8217;s a massive sampling bias, both in terms of who I follow and what I read, but also about who talks about what they&amp;#8217;re doing. (I&amp;#8217;m reminded of &lt;a href=&quot;http://www.codinghorror.com/blog/&quot;&gt;Jeff Atwood&lt;/a&gt;&amp;#8217;s post on the &lt;a href=&quot;http://www.codinghorror.com/blog/2007/11/the-two-types-of-programmers.html&quot;&gt;Two Types of Programmers&lt;/a&gt;: the vast majority of web developers don&amp;#8217;t make a noise about what they do.)&lt;/p&gt;

&lt;p&gt;Taking part in web standardisation today often feels like being part of an ongoing cold war between distinct camps rather than a community working towards common aims. The underlying question seems to be &amp;#8220;whose side are you on?&amp;#8221; Every decision and activity is cast as a victory or defeat. Time is wasted on attack and defence, or on raking over past slights and stupidities, rather than on progress. Valid criticism from outside a group cannot be listened to for fear of giving ground, and cannot be made within a group where it seems like betrayal.&lt;/p&gt;

&lt;p&gt;It is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Realistic_conflict_theory#The_Robbers_Cave_Experiment&quot;&gt;Robbers Cave Experiment&lt;/a&gt; played out in web standards. As a psychologist, I find it fascinating. As a developer, and particularly one who doesn&amp;#8217;t self-identify with any single group, it is frustrating. As a TAG member, trying to work for the longer-term good of the web, it is worrying, because situations of intergroup conflict lead to &lt;a href=&quot;http://en.wikipedia.org/wiki/Groupthink&quot;&gt;groupthink&lt;/a&gt; and non-optimal solutions.&lt;/p&gt;

&lt;p&gt;As I described above, a non-optimal outcome seems to be the most likely result of the particular microdata vs RDFa conflict for us at legislation.gov.uk. While I know we are not generally representative, I believe that it will be similarly bad for other developers: publishers, consumers and tool implementers.&lt;/p&gt;

&lt;p&gt;This is a problem for all who want to foster data sharing on the web using open standards; it is not one that any one group can fix on their own. It&amp;#8217;s my hope that a balanced task force of individuals with a variety of experience and backgrounds can provide a focus for us all to work together to solve it. If we can&amp;#8217;t, then we have let our prejudice and bias overcome our judgement.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/160#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Sun, 24 Jul 2011 16:24:00 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">160 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>TAG F2F, June 2011</title>
 <link>http://www.jenitennison.com/blog/node/158</link>
 <description>&lt;p&gt;As you may know, I accepted an appointment to the &lt;a href=&quot;http://www.w3.org/2001/tag/&quot;&gt;W3C&amp;#8217;s Technical Architecture Group&lt;/a&gt; earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the &lt;a href=&quot;http://en.wikipedia.org/wiki/Ray_and_Maria_Stata_Center&quot;&gt;Stata Center&lt;/a&gt; at MIT. As you can tell from the &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda&quot;&gt;agenda&lt;/a&gt; (which was in fact revised as we went along) it was a packed three days.&lt;/p&gt;

&lt;p&gt;What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. In no way should any of what I write here be judged as revealing the official opinion of the TAG; it&amp;#8217;s just me saying what I think. I&amp;#8217;m not going to go into anything in depth, because these are all incredibly gnarly and contentious topics and I&amp;#8217;d not only be here all year but also end up in a tar pit.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Role of the TAG&lt;/h2&gt;

&lt;p&gt;Usefully for me as a newcomer, our first session was about the ongoing role of the TAG. The TAG occupies a unique position within the W3C. According to its &lt;a href=&quot;http://www.w3.org/2004/10/27-tag-charter.html&quot;&gt;charter&lt;/a&gt; it was set up&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;To improve the effectiveness of Working Groups, to reduce misunderstandings and overlapping work, and to improve the consistency of Web technologies developed inside and outside W3C&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The TAG ultimately has three routes to do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;by providing specific advice on issues that are brought to its attention&lt;/li&gt;
&lt;li&gt;by writing documents on basic web architecture principles that go through community review, particularly the general review of the W3C standards track, and become Recommendations&lt;/li&gt;
&lt;li&gt;by advising the W3C Director (Tim Berners-Lee) about what he should do on the extremely rare occasions when there are issues that he is supposed to adjudicate on&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In none of these cases is there anything that binds the people receiving the advice of the TAG, or reading Findings or Recommendations made by the TAG, to accept them or do anything about them. The power and authority of the TAG depends solely on the quality and utility of its arguments, which is how it should be in my opinion.&lt;/p&gt;

&lt;h2&gt;Client-Side Application State&lt;/h2&gt;

&lt;p&gt;The first technical session was about &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda#clientState&quot;&gt;client-side application state&lt;/a&gt; and was a review of the &lt;a href=&quot;http://www.w3.org/2001/tag/doc/IdentifyingApplicationState-20110515.html&quot;&gt;Identifying Application State draft&lt;/a&gt; that &lt;a href=&quot;http://en.wikipedia.org/wiki/T._V._Raman&quot;&gt;T.V. Raman&lt;/a&gt; began before he left the TAG and that &lt;a href=&quot;http://www.linkedin.com/pub/ashok-malhotra/4/675/6a2&quot;&gt;Ashok Malhotra&lt;/a&gt; has been working on since. This should in the next few months or so be published as a TAG Finding (though it is currently on the Recommendation track).&lt;/p&gt;

&lt;p&gt;This work is essentially about documenting the different ways in which you can identify application state within a URI, why that&amp;#8217;s a useful thing to do, and some of the pitfalls of using &lt;a href=&quot;http://www.jenitennison.com/blog/node/154&quot;&gt;hash URIs&lt;/a&gt; to do so. Most of the discussion was about details to do with wording within the document. One thing I thought particularly interesting was the point that URI-based application state is relevant in all &amp;#8216;active content&amp;#8217;, not just in HTML; for example, scripting in SVG or in PDFs brings the same considerations.&lt;/p&gt;

&lt;h2&gt;Buffer Bloat&lt;/h2&gt;

&lt;p&gt;Over lunch on Monday we listened to and discussed a presentation by &lt;a href=&quot;http://en.wikipedia.org/wiki/Jim_Gettys&quot;&gt;Jim Gettys&lt;/a&gt; on &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda#bufferbloat&quot;&gt;buffer bloat&lt;/a&gt;. Basically (and all the errors here are introduced by me), TCP/IP is designed to route around network blockages, but it can only do so if it detects them quickly. When you have big buffers in place, as in the case of all modern operating systems and hardware, blockages aren&amp;#8217;t detected quickly; they&amp;#8217;re only detected when the buffers fill up. Then buffers empty and the data has to be sent again. The net result is that connections get really slow, not just for upload or download but for both, not just for you but for everyone using the network.&lt;/p&gt;

&lt;p&gt;Jim talked about how this is exacerbated by the large amount of web traffic and the design of HTTP, particularly the lack of use of HTTP pipelining (whereby several HTTP requests and responses are sent over one long-term connection), because it leads to lots of small messages which can&amp;#8217;t be handled effectively. There&amp;#8217;s lots more about this &lt;a href=&quot;http://gettys.wordpress.com/&quot;&gt;on his blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jim also talked about the failure of certificate authorities and how we should be looking at distributed protocols using digitally signed data, pointing us in particular to &lt;a href=&quot;http://www.ccnx.org/&quot;&gt;CCNx&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Fragment ID Semantics&lt;/h2&gt;

&lt;p&gt;First thing Tuesday was a session that I led on &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda.html#mimefrag&quot;&gt;fragids&lt;/a&gt;, in particular the problems that are arising out of the MIME type registration of +xml types (&lt;a href=&quot;http://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-04.html#frag&quot;&gt;3023bis&lt;/a&gt;) clashing with those that are used for, say, &lt;a href=&quot;http://www.w3.org/TR/2011/WD-media-frags-20110317/&quot;&gt;images&lt;/a&gt;, and what happens when these come together in something like SVG.&lt;/p&gt;

&lt;p&gt;The same issues arise whenever you have documents with types that &amp;#8216;inherit&amp;#8217; fragid semantics from two directions. For example, XHTML documents are XML documents, so constraints on +xml mean you shouldn&amp;#8217;t use interpreted fragids (eg hash-bangs) on them, but they are also &amp;#8216;active content&amp;#8217; which makes interpreted fragids useful. Similarly, in linked data you shouldn&amp;#8217;t really use a hash URI to mean a Person when the primary resource responds with an XML document with embedded RDFa, because according to XML fragid semantics, such a URI should point to an XML element.&lt;/p&gt;
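
&lt;p&gt;A minimal sketch of that second clash, assuming a hypothetical XHTML+RDFa document served from &lt;code&gt;http://example.org/people&lt;/code&gt; (the URI, the &lt;code&gt;id&lt;/code&gt; value and the use of FOAF are all assumptions for illustration):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Hypothetical document at http://example.org/people --&amp;gt;
&amp;lt;div xmlns:foaf=&quot;http://xmlns.com/foaf/0.1/&quot;
     id=&quot;jeni&quot; about=&quot;#jeni&quot; typeof=&quot;foaf:Person&quot;&amp;gt;
  &amp;lt;span property=&quot;foaf:name&quot;&amp;gt;Jeni&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As I understand it, the RDFa says that &lt;code&gt;http://example.org/people#jeni&lt;/code&gt; is a person, while XML fragid semantics would have the same URI identify the &lt;code&gt;div&lt;/code&gt; element whose ID is &lt;code&gt;jeni&lt;/code&gt;.&lt;/p&gt;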

&lt;p&gt;Basically the use of fragids has grown markedly outside their original scope and these situations aren&amp;#8217;t really covered in the specs. I am now tasked to create a document that describes the issues and suggests ways forward. So that will be fun.&lt;/p&gt;

&lt;h2&gt;Telcon with IAB&lt;/h2&gt;

&lt;p&gt;The second session on Tuesday was a telcon with the &lt;a href=&quot;http://www.iab.org/&quot;&gt;IAB&lt;/a&gt; which has a similar role within the &lt;a href=&quot;http://www.ietf.org/&quot;&gt;IETF&lt;/a&gt; as the TAG does within the W3C. This was a bit of a &amp;#8216;getting to know you&amp;#8217; session, covering the work of the two groups on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;versioning and extensibility&lt;/li&gt;
&lt;li&gt;security&lt;/li&gt;
&lt;li&gt;privacy, including Do Not Track&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and talking about opportunities to meet and work together on various topics like these.&lt;/p&gt;

&lt;h2&gt;URI Definition Discovery and Metadata Architecture&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda#metadata&quot;&gt;afternoon session on Tuesday&lt;/a&gt; was spent on &lt;a href=&quot;http://mumble.net/~jar/&quot;&gt;Jonathan Rees&amp;#8217;s&lt;/a&gt; work on the &lt;a href=&quot;http://www.w3.org/wiki/AwwswHome&quot;&gt;Architecture of the World Wide Semantic Web&lt;/a&gt;, which covers, amongst other things, what people in semantic web circles call &lt;a href=&quot;http://www.w3.org/wiki/HttpRange14Webography&quot;&gt;httpRange-14&lt;/a&gt;. At core, this is about the kinds of URIs we can use to refer to real-world things, what the response to HTTP requests on those URIs should be, and how we find out information about these resources.&lt;/p&gt;

&lt;p&gt;Jonathan has put together a document called &lt;a href=&quot;http://www.w3.org/2001/tag/awwsw/issue57/20110531/&quot;&gt;Providing and discovering definitions of URIs&lt;/a&gt; which covers the various ways that have been suggested over time, including the 303 method that was &lt;a href=&quot;http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039&quot;&gt;recommended by the TAG in 2005&lt;/a&gt; and methods that have been suggested by various people since that time.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s clear that the 303 method has lots of practical shortcomings for people deploying linked data, and isn&amp;#8217;t the way in which URIs are commonly used by Facebook and schema.org, who don&amp;#8217;t currently care about using separate URIs for documents and the things those documents are about. We discussed these alongside concerns that we continue to support people who want to do things like describe the license or provenance of a document (as well as the facts that it contains) and don&amp;#8217;t introduce anything that is incompatible with the ways in which people who have been following recommended practice are publishing their linked data. The general mood was that we need to support some kind of &amp;#8216;punning&amp;#8217;, whereby a single URI could be used to refer to both a document and a real-world thing, with different properties being assigned to different &amp;#8216;views&amp;#8217; of that resource.&lt;/p&gt;

&lt;p&gt;Jonathan is going to continue to work on the draft, incorporating some other possible approaches. It&amp;#8217;s a &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-lod/2011Jun/0186.html&quot;&gt;very contentious topic within the linked data community&lt;/a&gt;. My opinion is while we need to provide some &amp;#8216;good practice&amp;#8217; guides for linked data publishers, we can&amp;#8217;t just stick to a theoretical ideal that experience has shown not to be practical. What I&amp;#8217;d hope is that the TAG can help to pull together the various arguments for and against different options, and document whatever approach the wider community supports.&lt;/p&gt;

&lt;h2&gt;Can publication of hyperlinks cause copyright infringement?&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda.html#linkcopyright&quot;&gt;first session on Wednesday&lt;/a&gt; was another session that I led, discussing the &lt;a href=&quot;http://www.w3.org/2001/tag/doc/publishingAndLinkingOnTheWeb-2011-05-28&quot;&gt;Publishing and Linking on the Web draft&lt;/a&gt; that &lt;a href=&quot;http://torgo.com/blog/&quot;&gt;Dan Appelquist&lt;/a&gt; and I have been working on.&lt;/p&gt;

&lt;p&gt;The aim of this document is to explain the tensions between terms that are commonly used in legal documents such as &amp;#8220;possession&amp;#8221;, &amp;#8220;adaptation&amp;#8221; and &amp;#8220;distribution&amp;#8221; and the way that publication works on the web, in which multiple servers may have copies of the same document (because they cache copies to make the &amp;#8216;net go faster), automated agents may make changes to those documents (such as compressing or resizing documents, or merging Javascript) and people may refer others to those documents through linking.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;re particularly keen to argue that linking to something is not the same thing as distributing it. The web&amp;#8217;s power arises through its links, so it&amp;#8217;s important that people are able to link to something without being worried about what happens when/if the domain they link to is taken over by something illegal.&lt;/p&gt;

&lt;p&gt;Dan and I are going to continue to work on this document in response to various suggestions around organisation and terminology, with a view to getting some &amp;#8216;friendly legal experts&amp;#8217; to look it over and then seeking wider review. The intention is for it to eventually become a Recommendation as this will give greater weight to it as a document for a legal audience.&lt;/p&gt;

&lt;h2&gt;API Minimisation and Client-Side Storage&lt;/h2&gt;

&lt;p&gt;There were then a couple of short sessions.&lt;/p&gt;

&lt;p&gt;Dan talked about &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda.html#apis&quot;&gt;API Minimisation&lt;/a&gt;, which is the design principle that to increase privacy we should design APIs that enable people requesting information to say exactly what information they need, and return only that rather than everything known about a thing. Dan&amp;#8217;s put together a &lt;a href=&quot;http://www.w3.org/2001/tag/doc/APIMinimization-20100605.html&quot;&gt;draft&lt;/a&gt; and should be calling for review for it soon.&lt;/p&gt;

&lt;p&gt;Ashok then led discussion on &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda.html#webAppStorage&quot;&gt;client-side storage&lt;/a&gt; and we brainstormed around some of the architectural/design issues about which we might want to write if we were to put together a document. This work is at a very early stage.&lt;/p&gt;

&lt;h2&gt;TAG Priorities&lt;/h2&gt;

&lt;p&gt;After lunch, we had a &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda#priorities&quot;&gt;session on TAG priorities&lt;/a&gt; where we discussed which of the various pieces of work that we&amp;#8217;re doing should receive the most attention and had a quick review of who is doing what within the TAG.&lt;/p&gt;

&lt;p&gt;Our basic problem is that a lot of this stuff feels quite urgent, and we want to be responsive, but with only 5-6 of us &amp;#8220;actively involved&amp;#8221; (which means 1 day/week) in drafting documents, and other TAG duties taking up our time, it feels like we have taken on too much work. Our focus for the next little while is going to be on responding to issues where our lack of response might either hold people up or cause longer term problems (for example the publication of contradictory mime type definitions), which means things like the document on publishing and linking on the web will need to bubble in the background rather than being the focus of activity.&lt;/p&gt;

&lt;h2&gt;HTML5 Last Call&lt;/h2&gt;

&lt;p&gt;Our &lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-agenda.html#htmlreview&quot;&gt;final session&lt;/a&gt;, for which we were joined by &lt;a href=&quot;http://www.w3.org/People/LeHegaret/&quot;&gt;Philippe Le Hégaret&lt;/a&gt;, was on the HTML5 Last Call documents. The TAG has raised various issues over the course of HTML5 development and wants to follow up on how those issues have been addressed in the documents. Our role means that we&amp;#8217;re responsible for making sure there&amp;#8217;s consistency with other specifications, and that there isn&amp;#8217;t anything that seems like it&amp;#8217;s going to cause problems in the long term.&lt;/p&gt;

&lt;p&gt;The part that we spent most discussion time on was the relationship between &lt;a href=&quot;http://www.w3.org/TR/2011/WD-microdata-20110525/&quot;&gt;Microdata&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/TR/2011/WD-rdfa-in-html-20110525/&quot;&gt;RDFa&lt;/a&gt;. We talked about the precedents for having two specifications that do very similar things but with different approaches, such as CSS and XSL, and how this isn&amp;#8217;t necessarily a bad thing so long as they don&amp;#8217;t contradict each other and people can move between them easily (because they have the same conceptual foundations).&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m going to save my opinion on this topic for another post. Suffice it to say that microdata and RDFa as currently specified don&amp;#8217;t work well with each other but it&amp;#8217;s not at all clear what the best path forward is. The TAG decided to recommend that the W3C set up a Task Force to look at what the best way forward might be.&lt;/p&gt;

&lt;h2&gt;Final Words&lt;/h2&gt;

&lt;p&gt;If you want links to the minutes of the TAG F2F, they&amp;#8217;re available within the agenda or on separate pages for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/06-minutes&quot;&gt;Monday 6th June&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/07-minutes&quot;&gt;Tuesday 7th June&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/2001/tag/2011/06/08-minutes&quot;&gt;Wednesday 8th June&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have anything to say on any of these topics, please send email to the &lt;a href=&quot;mailto:www-tag@w3.org&quot;&gt;TAG mailing list&lt;/a&gt;. Or you could comment here or &lt;a href=&quot;mailto:jeni@jenitennison.com&quot;&gt;email me directly&lt;/a&gt; if you like. Which leads me on to talking about what I&amp;#8217;d like to do in the TAG.&lt;/p&gt;

&lt;p&gt;One of the guidance notes for new members to the TAG says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;TAG members are elected or appointed not to represent their individual member organizations, but the Web community as a whole. We try to take that responsibility very seriously.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I do take that responsibility seriously. Web architecture has to be a combination of practice and theory, balancing approaches that work right now with a desire to not break anything long term. I do practical work developing web applications with HTML, CSS, Javascript, XML, RDF, XSLT, XQuery and so on and so on every day, but I know I don&amp;#8217;t see all the difficult corners of the open web standard space: no one person can.&lt;/p&gt;

&lt;p&gt;I can listen though, so that&amp;#8217;s what I will try to do: listen, digest, reflect and act.&lt;/p&gt;

&lt;p&gt;But I have limited resources. Unlike most of the members of the TAG, I am not employed by a large organisation that pays me for time I take on the work that I do for the TAG. The W3C kindly paid for my flights to and from F2Fs, but not hotels or expenses. I wouldn&amp;#8217;t have taken this on if I wasn&amp;#8217;t prepared to shoulder the financial burden, but if there is anyone out there who might sponsor my participation, I&amp;#8217;d love to hear from you.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/158#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/73">tag</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/69">uris</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Fri, 17 Jun 2011 10:44:12 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">158 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Lessons for Microdata from schema.org</title>
 <link>http://www.jenitennison.com/blog/node/156</link>
 <description>&lt;p&gt;There is (obviously, from the way my tweet stream, feed reader and email have filled up) lots to say at many levels about &lt;a href=&quot;http://schema.org/&quot;&gt;schema.org&lt;/a&gt;, a new collaboration between Google, Microsoft and Yahoo! that describes the next phase in search engines&amp;#8217; extraction of semantics from web pages. In this post I&amp;#8217;m going to focus on what we can learn from schema.org about the design of &lt;a href=&quot;http://www.w3.org/TR/microdata/&quot;&gt;microdata&lt;/a&gt; and how it might be improved.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Digging into the details of schema.org there are several examples of places where its recommended method of marking up metadata directly contradicts the HTML5 specs. Given the number of internal contradictions within schema.org, I&amp;#8217;m assuming that these are mistakes that will be corrected as the material is reviewed and matures rather than deliberate forking of HTML5.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Note: What I say about HTML5 here is equally true &amp;#8212; at least at time of writing &amp;#8212; of the WHATWG version of HTML, which of course already diverges from HTML5.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the inputs to the design of microdata was to look at the mistakes that people make and try to design something to address the cause of those errors, so it&amp;#8217;s interesting to apply that method to the errors made by schema.org. This doesn&amp;#8217;t mean changing specs so that erroneous markup is conformant, but it does mean providing facilities that enable people to more easily do things in a conformant way, removing the temptation of non-conformance and lowering the likelihood of future mistakes.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Note: I am certain that there would also have been errors had schema.org used RDFa or microformats, indeed I gather that they are common in the documentation of &lt;a href=&quot;http://www.google.com/support/webmasters/bin/answer.py?answer=99170&quot;&gt;Google&amp;#8217;s Rich Snippets&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Use of &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element&lt;/h2&gt;

&lt;p&gt;The first example is one &lt;a href=&quot;http://tantek.com/2011/155/t5/schemaorg-html5-fork-smoke-openinghours-time-duration&quot;&gt;spotted by Tantek&lt;/a&gt;: the value of the &lt;code&gt;openingHours&lt;/code&gt; property of &lt;a href=&quot;http://schema.org/EventVenue&quot;&gt;EventVenue&lt;/a&gt; is described as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The opening hours for a business. Opening hours can be specified as a weekly time range, starting with days, then times per day. Multiple days can be listed with commas &amp;#8216;,&amp;#8217; separating each day. Day or time ranges are specified using a hyphen &amp;#8216;-&amp;#8217;.&lt;/p&gt;
  
  &lt;ul&gt;
  &lt;li&gt;Days are specified using the following two-letter combinations: &lt;code&gt;Mo&lt;/code&gt;, &lt;code&gt;Tu&lt;/code&gt;, &lt;code&gt;We&lt;/code&gt;, &lt;code&gt;Th&lt;/code&gt;, &lt;code&gt;Fr&lt;/code&gt;, &lt;code&gt;Sa&lt;/code&gt;, &lt;code&gt;Su&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Times are specified using 24:00 time. For example, 3pm is specified as &lt;code&gt;15:00&lt;/code&gt;.&lt;/li&gt;
  &lt;/ul&gt;
  
  &lt;p&gt;Here is an example: &lt;code&gt;&amp;lt;time itemprop=&quot;openingHours&quot; datetime=&quot;Tu,Th 16:00-20:00&quot;&amp;gt;Tuesdays and Thursdays 4-8pm&amp;lt;/time&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A similar error involving the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element can be found on the &lt;a href=&quot;http://schema.org/docs/gs.html#advanced_dates&quot;&gt;Getting Started page&lt;/a&gt; which has an example in which the &lt;code&gt;datetime&lt;/code&gt; attribute contains an ISO 8601 &lt;em&gt;duration&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Durations can be specified in an analogous way using the time tag with the datetime attribute. Durations are prefixed with the letter P (stands for &amp;#8220;period&amp;#8221;). Here&amp;#8217;s how you can specify a recipe cook time of 1 ½ hours:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;time itemprop=&quot;cookTime&quot; datetime=&quot;P1H30M&quot;&amp;gt;1 1/2 hrs&amp;lt;/time&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
  
  &lt;p&gt;H is used to designate the number of hours, and M is used to designate the number of minutes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to the HTML5 specification, the &lt;code&gt;datetime&lt;/code&gt; attribute holds a &lt;a href=&quot;http://www.w3.org/TR/html5/Overview.html#valid-date-or-time-string&quot;&gt;valid date or time string&lt;/a&gt; which is specified as using the normal ISO 8601 syntaxes for dates and times. A &lt;a href=&quot;http://www.w3.org/TR/html5/Overview.html#conforming-html5-documents&quot;&gt;conforming HTML5 document&lt;/a&gt; must not use the syntax shown in the above examples. From what I can tell (please correct me if I have this wrong), &lt;a href=&quot;http://www.w3.org/TR/html5/Overview.html#data-mining&quot;&gt;conforming data-mining tools&lt;/a&gt; &lt;em&gt;can&lt;/em&gt; process this syntax because they use the &lt;a href=&quot;http://www.w3.org/TR/microdata/#values&quot;&gt;value of the &lt;code&gt;datetime&lt;/code&gt; content attribute&lt;/a&gt;, rather than the value of the &lt;code&gt;dateTime&lt;/code&gt; IDL attribute, to provide the property&amp;#8217;s value, but from an authoring perspective, no one should be encouraging people to create non-conformant HTML5.&lt;/p&gt;

&lt;p&gt;In fact, given that values from microdata are never typed, it&amp;#8217;s not clear why these examples use the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element at all. The conformant way to provide the data would be to use a separate &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; element to hold the data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;meta itemprop=&quot;openingHours&quot; content=&quot;Tu,Th 16:00-20:00&quot;&amp;gt;
Tuesdays and Thursdays 4-8pm
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But I can understand the itch to use the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element here; the &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; element above is &lt;em&gt;before&lt;/em&gt;, rather than &lt;em&gt;around&lt;/em&gt;, the textual content of the page which it reflects, whereas with the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element it is more obviously an explicit machine-readable version of that content. In microformats, the pattern is to use a &lt;code&gt;title&lt;/code&gt; attribute:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span class=&quot;openingHours&quot; title=&quot;Tu,Th 16:00-20:00&quot;&amp;gt;
  Tuesdays and Thursdays 4-8pm
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and in RDFa a &lt;code&gt;content&lt;/code&gt; attribute:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span property=&quot;openingHours&quot; content=&quot;Tu,Th 16:00-20:00&quot;&amp;gt;
  Tuesdays and Thursdays 4-8pm
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So maybe this error is an indication that microdata needs an &lt;code&gt;itemvalue&lt;/code&gt; attribute for those cases where the human-readable content can be expressed in a more formal machine-readable microsyntax, with special handling for the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element when the value is a date/time and for &lt;code&gt;href&lt;/code&gt;/&lt;code&gt;src&lt;/code&gt; when the value is a URI.&lt;/p&gt;
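
&lt;p&gt;As a rough sketch of how that might look, assuming a hypothetical &lt;code&gt;itemvalue&lt;/code&gt; attribute (it is not part of the microdata specification, so no processor will recognise it today), the opening hours example could be written as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span itemprop=&quot;openingHours&quot; itemvalue=&quot;Tu,Th 16:00-20:00&quot;&amp;gt;
  Tuesdays and Thursdays 4-8pm
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This keeps the machine-readable value wrapped &lt;em&gt;around&lt;/em&gt; the human-readable text, as in the microformats and RDFa patterns above.&lt;/p&gt;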

&lt;h2&gt;String parsing of URIs&lt;/h2&gt;

&lt;p&gt;The second example of non-conformance with HTML5 in schema.org is the method by which they &lt;a href=&quot;http://schema.org/docs/extension.html&quot;&gt;support extensibility&lt;/a&gt;. Schema.org provides a set of types for the things described by web pages. Naturally, this set does not cover everything, but search engines still want to be able to use the metadata within the page about a &lt;code&gt;Person&lt;/code&gt; even when that &lt;code&gt;Person&lt;/code&gt; is described as a &lt;code&gt;Minister&lt;/code&gt; (say).&lt;/p&gt;

&lt;p&gt;So schema.org says that to extend their type hierarchy, you simply append the name of the new type after a &lt;code&gt;/&lt;/code&gt; at the end of the URI for the parent type. In this example, a &lt;code&gt;Minister&lt;/code&gt; should be given the type &lt;code&gt;http://schema.org/Person/Minister&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My guess (as I can&amp;#8217;t see any other way in which they&amp;#8217;d do it) is that the search engines intend to use string processing on the type URI in order to work out whether it&amp;#8217;s a subtype of a known type (ie, does the URI start with the string &lt;code&gt;http://schema.org/Person&lt;/code&gt;? If it does, it&amp;#8217;s a Person of some kind).&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://www.w3.org/TR/2011/WD-microdata-20110525/#items&quot;&gt;microdata specification&lt;/a&gt; states that:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The item type must be a type defined in an &lt;a href=&quot;http://www.w3.org/TR/html5/Overview.html#other-applicable-specifications&quot;&gt;applicable specification&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and that:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Item types are opaque identifiers, and user agents must not dereference unknown item types, or otherwise deconstruct them, in order to determine how to process items that use them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Treating URIs as opaque identifiers, and forbidding inferring semantic meaning through string processing, is a fairly fundamental web architectural principle as well as fitting with microdata&amp;#8217;s constraint that types are specified somewhere.&lt;/p&gt;

&lt;p&gt;Perhaps the gloss is that schema.org is an applicable specification that states that any URI that looks like &lt;code&gt;http://schema.org/{known-type}/{extension-type}&lt;/code&gt; is a type that is defined in schema.org. But I think what&amp;#8217;s actually happening is that schema.org wants to grow their vocabulary organically, responding to the data that the search engines find on the web. They recognise that people will want to use their own vocabularies for their own purposes (for example to provide data for scripts, reusers, or for browsers and other agents that aren&amp;#8217;t search engines) but want to continue to be able to understand the semantics of that data.&lt;/p&gt;

&lt;p&gt;Schema.org is constrained in the mechanisms that it could use to recognise these new types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they can&amp;#8217;t resolve the type URI and expect metadata at the end of it to indicate the parent schema.org type, because the HTML5 microdata specification forbids resolving unrecognised item type URIs; even if they were allowed to do so, working out inheritance by declarative mechanisms that involve resolving URIs and interpreting the results is computationally expensive compared to string munging; for search engines that expect to do this with billions of web pages, this may be a significant processing burden&lt;/li&gt;
&lt;li&gt;they can&amp;#8217;t tell people to use the schema.org type in addition to their own type because the HTML5 microdata specification doesn&amp;#8217;t allow an item to have more than one type; if it did, they could encourage people to use the schema.org type as well as their more specific one, so rather than &lt;code&gt;itemtype=&quot;http://schema.org/Person/Minister&quot;&lt;/code&gt;, enable publishers to do &lt;code&gt;itemtype=&quot;http://schema.org/Person http://reference.data.gov.uk/def/central-government/Minister&quot;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I suspect that having multiple types is currently disallowed in microdata because the semantics of the (non-URI) properties of an item are based on its type, but I think that microdata could handle multiple types for an item if instead of saying:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the item is a typed item: [the property token must be] a defined property name allowed in this situation according to the specification that defines the relevant type for the item&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;it said:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the item is a typed item: [the property token must be] a defined property name allowed in this situation according to any of the specifications that define the relevant types for the item; if the property is defined for more than one type, these definitions must be identical&lt;/p&gt;
&lt;/blockquote&gt;
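
&lt;p&gt;To illustrate, here is a sketch of markup that the revised wording would permit. It is not conformant with the microdata draft as it currently stands, the person&amp;#8217;s name is made up, and I&amp;#8217;m assuming that &lt;code&gt;name&lt;/code&gt; is a property defined for schema.org&amp;#8217;s &lt;code&gt;Person&lt;/code&gt; type:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div itemscope itemtype=&quot;http://schema.org/Person http://reference.data.gov.uk/def/central-government/Minister&quot;&amp;gt;
  &amp;lt;span itemprop=&quot;name&quot;&amp;gt;Jane Example&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A search engine that only knows about schema.org could then treat the item as a &lt;code&gt;Person&lt;/code&gt; by matching the first type token exactly, without deconstructing either URI.&lt;/p&gt;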

&lt;p&gt;It&amp;#8217;s worth noting that, unlike &lt;code&gt;itemtype&lt;/code&gt;, the &lt;code&gt;itemprop&lt;/code&gt; attribute &lt;em&gt;can&lt;/em&gt; take multiple values which, along with its ability to take URIs, provides for an easier inheritance mechanism. However, schema.org again recommends a string-based approach for extended properties; maybe that&amp;#8217;s for consistency, or perhaps the real reason is that they want to become a centralised repository for acceptable microdata on the web.&lt;/p&gt;
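
&lt;p&gt;For instance, a publisher could pair a schema.org property name with a URI from their own vocabulary on a single element. In this sketch the &lt;code&gt;http://example.org/def/fullName&lt;/code&gt; property URI and the person&amp;#8217;s name are invented purely for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div itemscope itemtype=&quot;http://schema.org/Person&quot;&amp;gt;
  &amp;lt;span itemprop=&quot;name http://example.org/def/fullName&quot;&amp;gt;Jane Example&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both properties take the same value, so consumers that only understand schema.org still get &lt;code&gt;name&lt;/code&gt;, while others can use the more specific property.&lt;/p&gt;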

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;Whatever method publishers use, creating structured machine-processable data is hard and embedding it in a page is even harder. It is a layer that necessarily sits above or alongside the content of a page, invisible to people simply looking at the page in a browser and therefore difficult to get right without additional tooling.&lt;/p&gt;

&lt;p&gt;Microdata specifically aims to make this easy to do; schema.org demonstrates some of the ways in which it doesn&amp;#8217;t &lt;em&gt;quite&lt;/em&gt; meet that requirement. Perhaps the search engines behind the effort haven&amp;#8217;t managed to (or bothered to) implement HTML5/microdata parsing correctly or perhaps the people writing the documentation thought they understood how HTML5/microdata works but actually didn&amp;#8217;t. &lt;/p&gt;

&lt;p&gt;Either way, the mistakes are worth learning from to improve the specs while they are not yet final. As I discussed above I think that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;adding an &lt;code&gt;itemvalue&lt;/code&gt; attribute for machine-readable versions of content&lt;/li&gt;
&lt;li&gt;enabling &lt;code&gt;itemtype&lt;/code&gt; to take multiple values to support extensibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;though I suspect the latter point at least will be contentious among those who don&amp;#8217;t think decentralised extensibility is ever a desirable feature.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/156#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/72">schema.org</category>
 <pubDate>Fri, 10 Jun 2011 20:27:38 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">156 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The HTML5 DOM and RDFa</title>
 <link>http://www.jenitennison.com/blog/node/129</link>
 <description>&lt;p&gt;One of the fundamental disconnects between HTML5 and previous versions of HTML is the way in which you answer the question &amp;#8220;what is the structure of this page?&amp;#8221;. Things that make use of that structure, such as RDFa, need to take this into account.&lt;/p&gt;

&lt;p&gt;An example is the document:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;html&amp;gt;
  &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;HTML example&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;table&amp;gt;
      &amp;lt;span&amp;gt;Example title&amp;lt;/span&amp;gt;
      &amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;Example table&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;
    &amp;lt;/table&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are two different ways in which you might interpret the structure of this document. First, you might view the structure to be as it is written, with the &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; element as a child of the &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt; element and therefore a tree that looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+- html
   +- head
   | +- title
   +- body
      +- table
         +- span
         +- tr
            +- td
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Second, you might view the structure of the page to be the DOM as it is constructed by an HTML5 processor, which will move the &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; out from the table due to &lt;a href=&quot;http://dev.w3.org/html5/spec/Overview.html#foster-parenting&quot;&gt;foster parenting&lt;/a&gt;, giving the result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;+- html
   +- head
   | +- title
   +- body
      +- span
      +- table
         +- tr
            +- td
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which you view it as doesn&amp;#8217;t really matter at this point, but it does when you start to introduce markup that infers information based on the structure of the page, such as RDFa. Let me introduce some RDFa markup to the document:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;html xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot;&amp;gt;
  &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;HTML+RDFa example&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;table about=&quot;http://example.com&quot;&amp;gt;
      &amp;lt;span property=&quot;dc:title&quot;&amp;gt;Example title&amp;lt;/span&amp;gt;
      &amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;Example table&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;
    &amp;lt;/table&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, if you view the structure to be as written, the &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; element is within the &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt; element, and is therefore viewed as talking about whatever it is that the &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt; element is talking about, namely &lt;code&gt;http://example.com&lt;/code&gt;. So the RDF that you will glean from this page is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://example.com&amp;gt; dc:title &quot;Example title&quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On the other hand, if you view the structure to be that constructed by an HTML5 processor, the &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; element is not within the &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt; element, and is therefore viewed as talking about whatever the document is talking about, namely the document itself. So the RDF that you will glean from the page is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;&amp;gt; dc:title &quot;Example title&quot;
&lt;/code&gt;&lt;/pre&gt;
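
&lt;p&gt;Authors who control their own markup can sidestep the difference by keeping the table content valid, so that an HTML5 parser has nothing to move. As a sketch (a variation on the example above rather than anything taken from the RDFa drafts, and assuming the same &lt;code&gt;xmlns:dc&lt;/code&gt; declaration on the &lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; element), putting the title in a &lt;code&gt;&amp;lt;caption&amp;gt;&lt;/code&gt; keeps it inside the table under both interpretations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;table about=&quot;http://example.com&quot;&amp;gt;
  &amp;lt;caption property=&quot;dc:title&quot;&amp;gt;Example title&amp;lt;/caption&amp;gt;
  &amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;Example table&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;
&amp;lt;/table&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Either way the title is attached to &lt;code&gt;http://example.com&lt;/code&gt;. That only helps with markup you write yourself, though; it doesn&amp;#8217;t remove the underlying ambiguity for pages as they are found on the web.&lt;/p&gt;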

&lt;p&gt;This isn&amp;#8217;t exactly a new problem. There has always been the possibility of Javascript embedded within a page changing the page by moving or inserting elements, making the page that a non-browser sees fundamentally different from the one that a browser sees. This has been used by SEO people and spam merchants to get search engines to direct people to pages which mutated into something different when they were actually visited by a browser. And this eventually led those people who cared about interpreting meaning from the structure of pages (ie search engines) to go at least some way towards evaluating the Javascript within the page in order to &amp;#8220;see&amp;#8221; the page as a human would.&lt;/p&gt;

&lt;p&gt;So it&amp;#8217;s not a new problem, but it&amp;#8217;s still a problem.&lt;/p&gt;

&lt;p&gt;For those people trying to define how RDFa is interpreted in HTML5, there are several unpleasant alternatives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Define RDFa as operating over an HTML5 DOM. This would make things easy for Javascript implementations in as much as they can rely on being used with HTML5 DOMs, ie in HTML5 browsers. But it raises the implementation burden for other implementations, such as those based on XSLT or a simple tidy-then-interpret-as-XML approach: essentially every implementation will need to include an HTML5 parsing library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define RDFa as operating over a DOM, but leave the creation of that DOM as implementation-defined. This effectively passes the buck (&amp;#8220;it&amp;#8217;s not &lt;em&gt;our&lt;/em&gt; fault that HTML5 processors will construct a different DOM from XML processors&amp;#8221;) but makes it hard to test implementation conformance and for authors to know exactly how their page will be interpreted. For example, an implementation that constructed a DOM with randomly rearranged elements would be entirely conformant despite producing completely different triples from one that took the elements in the original order.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define RDFa as operating over a serialisation, with precise wording that describes how that serialisation is mapped into a tree structure that is walked to process the RDFa within the page. This approach will prevent implementations that use other methods of constructing trees from being conformant; depending on how it&amp;#8217;s defined that might include XSLT implementations and/or Javascript implementations and/or implementations that use standard (XML-based) libraries for parsing the documents.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Personally I lean towards the second of these: defining RDFa as operating over a DOM but placing no constraints on how that DOM is created. It leaves open the possibility of Javascript implementations to work on the DOM they see, which may be radically different from the one seen by other processors due both to HTML5 reordering of elements and the dynamic modification of the page through Javascript. (Several people use rdfQuery to do before-and-after parsing of RDFa within a page, turning browsers into semantic editors, for example.) But it also lets conformant implementations be constructed in other ways for implementation ease or user needs, supporting the use of XSLT through GRDDL and the static crawling of content with minimal processing.&lt;/p&gt;

&lt;p&gt;Perhaps the set of permissible methods of DOM creation could be listed to prevent completely random processing, but I expect that it will be effectively limited through social and technological pressures. Implementations that build DOMs in random ways aren&amp;#8217;t going to be as useful (to their users) as those that build them in expected ways; it&amp;#8217;s also going to be far easier to implement RDFa processors using standard parsing libraries.&lt;/p&gt;

&lt;p&gt;The approach is not without its downsides, of course. XSLT is similarly defined as operating over a tree model, with the question of how that tree model is constructed left to the implementation. Most processors decided to construct the tree using standard XML parsing, but famously MSXML would strip certain whitespace-only text nodes from the tree (unless you specified a parsing flag telling it not to), leading to incompatibilities and user confusion.&lt;/p&gt;

&lt;p&gt;My guess is that the same kind of thing will happen with RDFa processors. It could very well be the case that an author will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;check their RDFa in an RDFa validator that constructs a static HTML5 DOM, revealing one set of triples&lt;/li&gt;
&lt;li&gt;be confused when they then use a Javascript RDFa library within their page and get a slightly different set of triples because of some Javascript embedded in the page that changes its structure&lt;/li&gt;
&lt;li&gt;be further confused when a search engine that uses a tidy-and-interpret-as-XML approach gleans yet another slightly different set of triples and displays it in the search result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if this approach were chosen, I would expect wording in the specification that required implementations to state the method they used to create the DOM (i.e. it should be implementation-&lt;em&gt;defined&lt;/em&gt; rather than implementation-&lt;em&gt;dependent&lt;/em&gt;) and that warned authors of the most likely causes of differences between implementations (such as tree modifications performed by HTML5 processors and Javascript within the page). I&amp;#8217;d also like to see tools that take an HTML page and indicate the triples that it generates under different common DOM construction methods, so that authors can see the variation in how their documents might be interpreted.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/129#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Thu, 24 Sep 2009 08:05:03 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">129 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>On Resolvability</title>
 <link>http://www.jenitennison.com/blog/node/126</link>
 <description>&lt;p&gt;In my &lt;a href=&quot;http://www.jenitennison.com/blog/node/124&quot;&gt;last post about RDFa and HTML&lt;/a&gt; I talked about how one of the gulfs that separates the HTML5 and Semantic Web communities is the attitude to the resolvability of property (and class) URIs.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m currently experimenting with introducing the ability to automatically locate information about properties and other resources that are referenced within triples to &lt;a href=&quot;http://code.google.com/p/rdfquery&quot;&gt;rdfQuery&lt;/a&gt;, so now is a good time, as far as I&amp;#8217;m concerned, to look more closely at what the ability to resolve properties gives you and how to avoid problems if the property URI is (temporarily or permanently) unresolvable or resolvable to something new.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m going to attempt to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do or might applications use property and class URIs?&lt;/li&gt;
&lt;li&gt;How can data and ontology publishers assist them in doing so?&lt;/li&gt;
&lt;li&gt;What should frameworks (such as rdfQuery) do to help application developers?&lt;/li&gt;
&lt;/ul&gt;

&lt;!--break--&gt;

&lt;h2&gt;Application Developers&lt;/h2&gt;

&lt;p&gt;We can divide applications using online data into three general categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;data-specific applications&lt;/strong&gt; are constructed around particular data sets that are known to the developer of the application; the &lt;a href=&quot;http://www.jenitennison.com/blog/node/125&quot;&gt;visualisations&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/123&quot;&gt;that&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/119&quot;&gt;I&amp;#8217;ve been&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/113&quot;&gt;doing&lt;/a&gt; are examples of data-specific applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vocabulary-specific applications&lt;/strong&gt; are constructed around particular vocabularies, wherever the data might be found that uses them; &lt;a href=&quot;http://code.google.com/apis/socialgraph/&quot;&gt;Google&amp;#8217;s Social Graph API&lt;/a&gt; and &lt;a href=&quot;http://developer.search.yahoo.com/start&quot;&gt;Yahoo! SearchMonkey&lt;/a&gt; are examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;generic applications&lt;/strong&gt; are constructed to navigate through any RDF that they find; &lt;a href=&quot;http://www.w3.org/2005/ajar/tab&quot;&gt;Tabulator&lt;/a&gt; is one example&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most mashups are data-specific applications. When you, as a developer, create a data-specific application, the thing that you need to know most of all is what information the dataset contains. Part of that is working out the meaning of a particular property (or class). What the data publisher needs to do is make sure that the data they publish is documented.&lt;/p&gt;

&lt;p&gt;There are three ways of locating the documentation about a particular property or class:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;looking through the general documentation the data publisher has provided&lt;/li&gt;
&lt;li&gt;resolving the URI of the class or property&lt;/li&gt;
&lt;li&gt;searching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a developer, it&amp;#8217;s very useful to find out about a property by bunging its URI into a browser and hitting return. Want to know what &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt; means? Look up that URI. By comparison, if you want to know what a &lt;code&gt;vevent&lt;/code&gt; is, your best bet is a search engine. In the results I get from Google, the microformat definition of &lt;code&gt;vevent&lt;/code&gt; is currently second on the list. (The Microdata definition of &lt;code&gt;vevent&lt;/code&gt; doesn&amp;#8217;t even feature.) &lt;strong&gt;Even if a property isn&amp;#8217;t available at its URI, its URI gives a more distinctive identifier to search for than a short term&lt;/strong&gt;: you&amp;#8217;re more likely to find relevant information if you search for &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt; than if you search for &lt;code&gt;name&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But there&amp;#8217;s no requirement for data-specific applications to use computer-readable information about properties or classes. If you know the data that&amp;#8217;s available in a dataset, you can find out the semantics of the properties and classes it contains and hard-code those within your application. Most applications that reuse data are currently of this type, and it tends to be the only kind that non-Semantic Web people think about.&lt;/p&gt;

&lt;p&gt;Vocabulary-specific and generic applications will have some vocabularies built in but may also operate with unknown vocabularies. For example, an application that cares about FOAF profiles is almost certainly going to want to hard-code information about FOAF rather than download its schema every time it&amp;#8217;s used. &lt;/p&gt;

&lt;p&gt;There are three reasons for building in knowledge about particular vocabularies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some information about a vocabulary simply can&amp;#8217;t be represented in a schema or ontology; if you want special handling for particular properties, you&amp;#8217;re going to want to hard-code it&lt;/li&gt;
&lt;li&gt;downloading, parsing and interpreting a schema that you know you&amp;#8217;re going to need every time you run the application is really inefficient&lt;/li&gt;
&lt;li&gt;relying on the network to provide information about a vocabulary you know you&amp;#8217;re going to need makes your application fragile, especially if you do not have control over the publication of the schema yourself&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;It&amp;#8217;s worth noting that applications increasingly do rely on the availability of networked resources in order to operate &amp;#8212; that&amp;#8217;s what &lt;a href=&quot;http://en.wikipedia.org/wiki/Cloud_computing&quot;&gt;cloud computing&lt;/a&gt; is all about &amp;#8212; but the resources are usually ones that the application developers have some kind of control over.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;For well-known vocabularies, using URIs for properties and classes helps only inasmuch as it means that property and class names from different vocabularies won&amp;#8217;t clash&lt;/strong&gt;, so you don&amp;#8217;t have to worry about your application confusing &lt;code&gt;http://xmlns.com/foaf/0.1/title&lt;/code&gt; with &lt;code&gt;http://purl.org/dc/terms/title&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On the other hand, if data uses an unknown vocabulary, vocabulary-specific and generic applications would like to get hold of extra information. This falls into three categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;human-readable information&lt;/strong&gt; includes things that help with the display of data, such as human-readable labels for properties and classes; the expected datatype of the values of a property might also fall into this category&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mapping information&lt;/strong&gt; helps applications map unknown properties and classes onto known ones; for example, if &lt;code&gt;http://people.example.org/ontology/fullName&lt;/code&gt; is defined as a sub-property of &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt; then the application can use or display the value of &lt;code&gt;http://people.example.org/ontology/fullName&lt;/code&gt; in exactly the same way as the value of &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;reasoning information&lt;/strong&gt; helps applications draw further conclusions about the resources for which there&amp;#8217;s information based on what they already know; for example, if &lt;code&gt;http://people.example.org/ontology/fullName&lt;/code&gt; has a domain of &lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt; then anything that has the property &lt;code&gt;http://people.example.org/ontology/fullName&lt;/code&gt; must be a &lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
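
&lt;p&gt;As a rough illustration of the mapping and reasoning categories above, here is a minimal Python sketch. It treats triples as plain tuples and applies one pass of the sub-property and domain rules. The &lt;code&gt;infer&lt;/code&gt; function and the subject URI are made up for the example, and a real application would more likely use an RDF library than hand-rolled tuples.&lt;/p&gt;

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FULL_NAME = "http://people.example.org/ontology/fullName"

# Hypothetical subject URI, used only for this illustration.
SUBJECT = "http://people.example.org/id/jeni"


def infer(data, schema):
    """Return the data plus triples implied by one pass of two RDFS rules."""
    inferred = set(data)
    for subject, prop, value in data:
        for s, p, o in schema:
            if s != prop:
                continue
            if p == SUB_PROPERTY_OF:
                # Mapping: the value also holds for the known super-property.
                inferred.add((subject, o, value))
            elif p == DOMAIN:
                # Reasoning: anything with this property is in the domain class.
                inferred.add((subject, RDF_TYPE, o))
    return inferred


schema = {
    (FULL_NAME, SUB_PROPERTY_OF, FOAF_NAME),
    (FULL_NAME, DOMAIN, FOAF_PERSON),
}
data = {(SUBJECT, FULL_NAME, "Jeni Tennison")}
result = infer(data, schema)
```

&lt;p&gt;With the schema available, &lt;code&gt;result&lt;/code&gt; holds the original triple plus a FOAF name and a FOAF Person type; with an empty schema it holds only the original triple.&lt;/p&gt;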

&lt;p&gt;These are in descending order of priority: many applications will want to interact with the user in some way, in which case human-readable information is vital. Applications that have built-in knowledge about one or more vocabularies are likely to have special handling for those vocabularies, so being able to map unknown properties and classes into those known vocabularies will enhance the behaviour of the application, although it adds a bit of complexity in the implementation to do so. Further reasoning has the potential to increase the value of sparse data but again increases the complexity of implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using URIs for classes and properties provides a mechanism for applications to get hold of this extra information about unknown vocabularies&lt;/strong&gt;. They might try four tactics, in order of priority:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;look at the data they already know&lt;/strong&gt;; the information they need about the unknown properties and classes may be included in the files they&amp;#8217;ve already accessed (including those containing data)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;look in an application-specific (possibly cloud-hosted) cache&lt;/strong&gt; of vocabularies that the application has already downloaded&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;resolve the URI&lt;/strong&gt; of the class or property by performing an HTTP GET (and add it to the application-specific cache)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;look in a general-purpose cache&lt;/strong&gt;, such as &lt;a href=&quot;http://www.archive.org/&quot;&gt;the Internet Archive&lt;/a&gt; or an ontology repository such as &lt;a href=&quot;http://swoogle.umbc.edu/&quot;&gt;Swoogle&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
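
&lt;p&gt;The four tactics above could be sketched along these lines in Python. Everything here is an assumption made for illustration: the &lt;code&gt;resolve&lt;/code&gt; function, the stand-in fetch functions and the dictionary-based cache and archive are not part of rdfQuery or any other library, and a real fetch would need proper HTTP handling.&lt;/p&gt;

```python
def resolve(uri, known_data, app_cache, http_get, archive_lookup):
    """Return (source, definition) for a property or class URI, or (None, None)."""
    # 1. Data the application has already loaded.
    if uri in known_data:
        return "known data", known_data[uri]
    # 2. The application-specific cache of downloaded vocabularies.
    if uri in app_cache:
        return "application cache", app_cache[uri]
    # 3. An HTTP GET on the URI itself; remember the result for next time.
    try:
        definition = http_get(uri)
    except OSError:
        definition = None
    if definition is not None:
        app_cache[uri] = definition
        return "http get", definition
    # 4. A general-purpose archive or ontology repository.
    definition = archive_lookup(uri)
    if definition is not None:
        return "archive", definition
    return None, None


def offline(uri):
    """Stand-in for a fetch that fails because the network is down."""
    raise OSError("network unavailable")


def online(uri):
    """Stand-in for a fetch that succeeds."""
    return {"label": "name"}


NAME = "http://cricket.example.org/ontology#name"
cache = {}
archive = {NAME: {"label": "name (archived copy)"}}

first = resolve(NAME, {}, cache, offline, archive.get)
second = resolve(NAME, {}, cache, online, archive.get)
third = resolve(NAME, {}, cache, offline, archive.get)
```

&lt;p&gt;In this run the first call falls through to the archive, the second succeeds over the network and fills the cache, and the third is answered from the cache even though the network is down again.&lt;/p&gt;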

&lt;p&gt;Robust applications will not break if they don&amp;#8217;t manage to locate the definition of a property or class. They can certainly continue to parse any data that they come across. To create a human-readable label, they might use the part of the URI after the last &lt;code&gt;#&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt;. It&amp;#8217;s no loss (to the application) if they cannot perform other reasoning: they might display the data in some default way or simply ignore it.&lt;/p&gt;
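
&lt;p&gt;A minimal sketch of that fallback label in Python might look like this. The function name is made up, and returning the whole URI when nothing follows the last separator is my own assumption about sensible behaviour.&lt;/p&gt;

```python
def fallback_label(uri):
    """Use the part of the URI after the last '#' or '/' as a display label."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    local = uri[cut + 1:]
    # If nothing follows the last separator, fall back to the full URI.
    return local or uri


hash_label = fallback_label("http://cricket.example.org/ontology#name")
slash_label = fallback_label("http://people.example.org/ontology/fullName")
bare_label = fallback_label("http://xmlns.com/foaf/0.1/")
```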

&lt;p&gt;It&amp;#8217;s worth noting, because of the fear of &lt;a href=&quot;http://en.wikipedia.org/wiki/Denial-of-service_attack&quot;&gt;DDoS attacks&lt;/a&gt; that some people have, that the majority of applications won&amp;#8217;t need to actually &lt;code&gt;GET&lt;/code&gt; property or class URIs, either because they are data-specific applications or because they only work with vocabularies that are hard-coded into them. Applications that are good web citizens will avoid DDoS attacks on popular vocabularies by hard-coding knowledge about those vocabularies and/or maintaining a cache, either locally or in the cloud, of vocabularies that have already been resolved.&lt;/p&gt;

&lt;h2&gt;Publishers&lt;/h2&gt;

&lt;p&gt;With what I&amp;#8217;ve said above in mind, what can publishers do to help applications to understand the data that they provide?&lt;/p&gt;

&lt;p&gt;If a publisher is only concerned about data-specific, point-to-point mashups, all they &lt;em&gt;have&lt;/em&gt; to provide is the data itself. It will help developers if there is some documentation of the dataset and the properties and classes used within it. But data publishers who only want their data to be discoverable by &lt;em&gt;people&lt;/em&gt; can rely on human intelligence for locating information, and for them using URIs for properties and classes may seem like overkill.&lt;/p&gt;

&lt;p&gt;But in a linked data world, publishers should really support their data being discovered automatically via the links from other data. Here we&amp;#8217;re talking about making life easier for vocabulary-specific and generic applications to use the data that you provide.&lt;/p&gt;

&lt;p&gt;The vocabularies that you use within your data fall into three general categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;well-known vocabularies&lt;/strong&gt; are vocabularies that are commonly enough used that vocabulary-specific and generic applications are likely to have them built-in; these vocabularies tend to be useful across domains, such as FOAF, which is useful whenever you want to express information about people or organisations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;local vocabularies&lt;/strong&gt; are vocabularies that are specific to the dataset that you are publishing; you have as much control over their publication as you do over the publication of the data itself&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;reused vocabularies&lt;/strong&gt; are vocabularies that you are using that are owned by other people but that do not have the take-up of well-known vocabularies; these are typically domain-specific; one example is &lt;a href=&quot;http://www.metalex.eu/&quot;&gt;Metalex&lt;/a&gt;, which is a vocabulary about legislation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a data publisher, the first thing you can do is to &lt;strong&gt;use well-known vocabularies in your data wherever possible&lt;/strong&gt;, even if you also use local or reused vocabularies to express the same properties or classes.&lt;/p&gt;

&lt;p&gt;For example, say you have some data describing a cricket team and use &lt;code&gt;http://cricket.example.org/ontology#name&lt;/code&gt; for the name of a member of a team, and that you mean it to be a sub-property of &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt; (which is itself a sub-property of &lt;code&gt;http://www.w3.org/2000/01/rdf-schema#label&lt;/code&gt;). If you &lt;em&gt;just&lt;/em&gt; publish the &lt;code&gt;http://cricket.example.org/ontology#name&lt;/code&gt; property then the only way that a generic application can know that &lt;code&gt;http://cricket.example.org/ontology#name&lt;/code&gt; can be used as a label for a resource (which is a person) is by attempting to resolve &lt;code&gt;http://cricket.example.org/ontology&lt;/code&gt; and reasoning based on what it finds. On the other hand, if you &lt;em&gt;also&lt;/em&gt; provide &lt;code&gt;http://xmlns.com/foaf/0.1/name&lt;/code&gt; and &lt;code&gt;http://www.w3.org/2000/01/rdf-schema#label&lt;/code&gt; properties, applications are no longer dependent on the network, nor on having the ability to reason, to use that information.&lt;/p&gt;
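
&lt;p&gt;On the publishing side, adding the well-known properties could be automated roughly as follows. This Python sketch is only an illustration: the &lt;code&gt;materialise&lt;/code&gt; helper, the player URI and the hand-written table of super-properties are all assumptions, not part of any existing tool.&lt;/p&gt;

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
CRICKET_NAME = "http://cricket.example.org/ontology#name"

# Direct super-properties, as the publisher understands them.
SUPER_PROPERTIES = {
    CRICKET_NAME: [FOAF_NAME],
    FOAF_NAME: [RDFS_LABEL],
}


def ancestors(prop, super_properties):
    """Return every direct and indirect super-property, nearest first."""
    seen = []
    queue = list(super_properties.get(prop, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(super_properties.get(current, []))
    return seen


def materialise(triples, super_properties):
    """Return the triples plus a copy of each under its super-properties."""
    out = list(triples)
    for subject, prop, value in triples:
        for ancestor in ancestors(prop, super_properties):
            extra = (subject, ancestor, value)
            if extra not in out:
                out.append(extra)
    return out


# Hypothetical player URI and name.
PLAYER = "http://cricket.example.org/players/1"
published = materialise([(PLAYER, CRICKET_NAME, "A. Player")], SUPER_PROPERTIES)
```

&lt;p&gt;Here &lt;code&gt;published&lt;/code&gt; contains the same value under the local property, the FOAF name and the RDFS label, so a consumer needs neither the network nor a reasoner to find a label.&lt;/p&gt;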

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; also provide mappings onto any reused vocabularies that you specialise, but this is less worthwhile given that vocabulary-specific and generic applications are unlikely to understand them either. &lt;/p&gt;

&lt;p&gt;The second thing you can do is to &lt;strong&gt;include information about the properties that you use within the data that you publish&lt;/strong&gt;. This isn&amp;#8217;t important for well-known vocabularies (because they&amp;#8217;re&amp;#8230; uh&amp;#8230; well-known) and it&amp;#8217;s only useful for local vocabularies if you&amp;#8217;re not publishing those vocabularies, because if someone can access your data, odds are they&amp;#8217;re able to access your local vocabulary&amp;#8217;s property URIs as well. But it is useful for reused vocabularies, where you can&amp;#8217;t guarantee access, in just the same way as it&amp;#8217;s useful to provide basic labelling information about any resources you reference.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;If you&amp;#8217;re publishing your data embedded within a web page, as well as marking up the &lt;strong&gt;data&lt;/strong&gt;, you can mark up the &lt;strong&gt;labels&lt;/strong&gt; that you use for those values, which more than likely appear as headings in a table or something similar.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are publishing a schema or ontology that describes your properties and types, there are also things that you can do to help applications. The most important thing is to assist caches in their caching of the ontology, which will reduce the number of times that it needs to be accessed directly and help you avoid DDoS attacks: see &lt;a href=&quot;http://www.mnot.net/cache_docs/&quot;&gt;Mark Nottingham&amp;#8217;s Caching Tutorial&lt;/a&gt;. You can also reduce the number of hits on your server by using hash URIs for your property and class names, and you can use standard load-balancing techniques to manage the traffic.&lt;/p&gt;
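
&lt;p&gt;The reason hash URIs cut down on requests is that the fragment is dropped before the request is made, so every term in the vocabulary lives in one document. The Python sketch below counts the distinct documents a client would have to fetch. The property names other than &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;fullName&lt;/code&gt; are invented, and the sketch assumes the client fetches each document at most once.&lt;/p&gt;

```python
from urllib.parse import urldefrag


def documents_to_fetch(uris):
    """Return the distinct documents a client must request for the given URIs."""
    return sorted({urldefrag(uri).url for uri in uris})


hash_uris = [
    "http://cricket.example.org/ontology#name",
    "http://cricket.example.org/ontology#team",
    "http://cricket.example.org/ontology#battingAverage",
]
slash_uris = [
    "http://people.example.org/ontology/fullName",
    "http://people.example.org/ontology/nickname",
]

hash_documents = documents_to_fetch(hash_uris)
slash_documents = documents_to_fetch(slash_uris)
```

&lt;p&gt;The three hash URIs collapse to a single document, while each slash URI is a separate request (unless the server redirects them to a shared document, which still costs a round trip each).&lt;/p&gt;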

&lt;p&gt;If you&amp;#8217;re referring to reused vocabularies within your own, you can also embed information about the relevant properties and classes from those vocabularies within your own ontology. This can save applications an extra hop, and lessens the risk of the reused vocabulary disappearing (perhaps forever).&lt;/p&gt;

&lt;p&gt;If you want to help people who might reuse your ontology, you can make the process of copying it easier by publishing it as a single file, or broken up into segments that are likely to be reused individually. At a non-technical level, it&amp;#8217;s also a good idea to provide an announcement mailing list or a feed so that people who reuse your vocabulary can be kept up to date with any changes you make to it.&lt;/p&gt;

&lt;h2&gt;Framework Developers&lt;/h2&gt;

&lt;p&gt;Bearing all this in mind, what should I (and other framework developers) do to support the reusers of data? I think I need to make it easy for application developers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;load in known ontologies from known locations&lt;/li&gt;
&lt;li&gt;hard-code relevant semantics in the script&lt;/li&gt;
&lt;li&gt;create catalogs that map known property or class names onto known locations of documents that contain details about them&lt;/li&gt;
&lt;li&gt;use caching proxies when accessing unknown vocabularies&lt;/li&gt;
&lt;li&gt;access vocabularies directly at the relevant URI&lt;/li&gt;
&lt;li&gt;fall back on archives when the URI cannot be found&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, I need to make it easy for people to use a range of strategies for getting hold of information about a property or class, aside from simply trying to access it at its URI. I think that means that it&amp;#8217;s better to provide a lightweight solution, giving developers the opportunity to be in control of which URIs get resolved rather than automatically downloading extra information from the URI that&amp;#8217;s actually used for the property or class. It also means I need to provide hooks in the code that they can use to trigger that resolution.&lt;/p&gt;

&lt;p&gt;It would also be useful, of course, for developers to be able to use information about properties and classes easily, in particular to reason with it. That kind of support is something I&amp;#8217;ve been working on for rdfQuery. It&amp;#8217;s not quite ready yet.&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;My (somewhat contentious) view is that we place too much emphasis on the resolvability of property and class names, and that this can put people off the idea of the Semantic Web. You can do useful things with data without resolving properties or classes. And for a large number of useful applications, being able to actually &lt;em&gt;reason&lt;/em&gt; over the data you get at the end of a property URI would have a high implementation cost without providing a great deal of functional benefit. &lt;/p&gt;

&lt;p&gt;Further, for data publishers, the requirement to enable the resolution of every property and class URI you use within your data just adds to the publishing burden, especially if you&amp;#8217;re made to feel it has to resolve to some kind of grand OWL ontology.&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s a concept in psychology of the &lt;a href=&quot;http://en.wikipedia.org/wiki/Zone_of_proximal_development&quot;&gt;zone of proximal development&lt;/a&gt;. The idea is that if someone is operating at a particular level then as a teacher you should help them to achieve something &lt;em&gt;slightly&lt;/em&gt; above that level, rather than trying to get them to do everything straight away.&lt;/p&gt;

&lt;p&gt;The same is true here. We need to help publishers make the small steps that they can make, one at a time, to gradually get them to full Semantic Web goodness:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;publish a dataset in some kind of open format (CSV, XML etc) so that people can get hold of it&lt;/li&gt;
&lt;li&gt;publish the data with distinct URIs for distinct resources so that people can reference them&lt;/li&gt;
&lt;li&gt;publish the data in a machine-readable format so that people can easily reuse it&lt;/li&gt;
&lt;li&gt;publish the data in a way that can be interpreted as RDF, with URIs for properties and types, to avoid conflicts with other vocabularies and so that the data can be &amp;#8220;understood&amp;#8221; even when discovered automatically&lt;/li&gt;
&lt;li&gt;put some human-readable documentation at the end of the property/type URIs, so that developers can easily discover what your data&amp;#8217;s about&lt;/li&gt;
&lt;li&gt;embed machine-readable labels and descriptions for your properties/types within your data, so that applications can display it&lt;/li&gt;
&lt;li&gt;embed &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt;/&lt;code&gt;rdfs:subClassOf&lt;/code&gt; mappings from your properties/types to well-known properties/types within your data, so that it can be displayed in custom ways&lt;/li&gt;
&lt;li&gt;put the machine-readable information about the properties/types at the end of the property/type URIs, so that you can update your vocabulary easily and so that other people can reuse it&lt;/li&gt;
&lt;li&gt;add other RDFS and OWL statements about the properties/types, so that reasoners can add value to your data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The biggest leap, the one that requires the most persuasion and the most justification, is probably from simply publishing the data in a machine-readable format to using the RDF model with URIs for properties and types. But if you remove the cost of having to provide anything at the end of the URI and factor in the potential benefits you may reap in the future (as you step further up that ladder), the question becomes less &amp;#8220;why?&amp;#8221; and more &amp;#8220;why not?&amp;#8221;.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/126#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/40">rdfQuery</category>
 <pubDate>Fri, 28 Aug 2009 22:02:16 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">126 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>HTML5/RDFa Arguments</title>
 <link>http://www.jenitennison.com/blog/node/124</link>
 <description>&lt;p&gt;When I came back from holiday, I caught up with the recent &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Aug/thread.html&quot;&gt;discussions around RDFa and HTML5&lt;/a&gt;. It&amp;#8217;s exhausting reading so many posts repetitively reiterating the positions of people who all have the best of intentions but fundamentally different priorities. And such a shame that so much energy is spent on fruitless discussion when it could be spent at the very least improving specifications, if not testing, implementing, experimenting or otherwise in some very minor way changing the world.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The particular thread&amp;#8217;s subject was the use of prefixes, which are used to provide a shorthand for URIs, which are used to name properties such as &lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt;. It&amp;#8217;s unquestionable, really, that prefixes are a source of problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;with copy-and-paste, because it&amp;#8217;s easy to lose the bindings that are used to construct the full URI&lt;/li&gt;
&lt;li&gt;in general understandability, because people just don&amp;#8217;t get the idea that the prefix &lt;code&gt;foaf&lt;/code&gt; could be used for something other than &lt;code&gt;http://xmlns.com/foaf/0.1/&lt;/code&gt; or that &lt;code&gt;http://xmlns.com/foaf/0.1/&lt;/code&gt; might be bound to a prefix other than &lt;code&gt;foaf&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;when declared using namespace declarations, because of differences in interpretation between HTML and XHTML and for the general namespaces-in-content problems that we see in XML&lt;/li&gt;
&lt;/ul&gt;
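
&lt;p&gt;To make the first two problems concrete, here is a small Python sketch of prefix expansion. The &lt;code&gt;expand&lt;/code&gt; function and the alternative binding to &lt;code&gt;http://example.org/other#&lt;/code&gt; are invented for the example, and real RDFa processing has more rules than this.&lt;/p&gt;

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def expand(name, bindings):
    """Expand a prefixed name such as foaf:firstName using the given bindings."""
    prefix, _, local = name.partition(":")
    if prefix not in bindings:
        # This is what happens when markup is pasted without its declarations.
        raise KeyError("no binding for prefix " + repr(prefix))
    return bindings[prefix] + local


standard = {"foaf": FOAF}
renamed = {"f": FOAF}
rebound = {"foaf": "http://example.org/other#"}

usual = expand("foaf:firstName", standard)
same_uri_other_prefix = expand("f:firstName", renamed)
other_uri_same_prefix = expand("foaf:firstName", rebound)
```

&lt;p&gt;The same prefixed name yields different URIs under different bindings, a different prefix can yield the same URI, and with no bindings at all the expansion fails outright.&lt;/p&gt;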

&lt;p&gt;I think it&amp;#8217;s generally the case that the Semantic Web community, who are used to using syntaxes such as RDF/XML and Turtle which use prefixes all the time, judge these as being less disadvantageous than the members of the HTML5 community, who are much more in touch with and concerned about the &amp;#8220;common user&amp;#8221;.&lt;/p&gt;

&lt;p&gt;But underlying the arguments about the costs of prefixes are arguments about whether these disadvantages are important enough to stop&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;giving people shorthands for URIs and/or&lt;/li&gt;
&lt;li&gt;using URIs when naming properties and/or&lt;/li&gt;
&lt;li&gt;using RDF as the data model for data on the web&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fundamentally, members of the Semantic Web (capital S, capital W) community take it as a matter of faith that the correct data model to use for expressing data about things both on and off the web is RDF. If there&amp;#8217;s anything that defines the Semantic Web community, it&amp;#8217;s that underlying assumption. (Well, probably: I&amp;#8217;m sure there are still some who hanker after Topic Maps.)&lt;/p&gt;

&lt;p&gt;Further, they think it is absolutely essential that if a property such as &amp;#8216;first name&amp;#8217; is used within a page, you can &lt;code&gt;GET&lt;/code&gt; a URI to find interesting information about the property. For example, with &lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt;, you will get&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Property rdf:about=&quot;http://xmlns.com/foaf/0.1/firstName&quot; 
    vs:term_status=&quot;testing&quot; 
    rdfs:label=&quot;firstName&quot; 
    rdfs:comment=&quot;The first name of a person.&quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&quot;http://www.w3.org/2002/07/owl#DatatypeProperty&quot;/&amp;gt;
    &amp;lt;rdfs:domain rdf:resource=&quot;http://xmlns.com/foaf/0.1/Person&quot;/&amp;gt;
    &amp;lt;rdfs:range rdf:resource=&quot;http://www.w3.org/2000/01/rdf-schema#Literal&quot;/&amp;gt;
    &amp;lt;rdfs:isDefinedBy rdf:resource=&quot;http://xmlns.com/foaf/0.1/&quot;/&amp;gt;
  &amp;lt;/rdf:Property&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which among other things includes a human-readable label that you might use in a display of information about things that have the property, and a statement about the domain of the property, from which you can tell that anything that has a &lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt; property is a &lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Given that you need URIs for properties, members of the Semantic Web community generally think that you need a way to shorten those URIs to make them palatable to people who might be embedding metadata in their pages. The cost of that is that you have to use prefixes, with the disadvantages that I&amp;#8217;ve outlined above, but it&amp;#8217;s a cost that&amp;#8217;s worth paying to gain resolvability and concision. And since it can be done in RDF/XML and Turtle and SPARQL and almost every other syntax in existence for expressing RDF, it seems unnatural not to be able to do it in the metadata embedded within webpages.&lt;/p&gt;

&lt;p&gt;At an equally fundamental level, members of the HTML5 community are unconvinced that RDF is a necessary or useful model to use for data. They do not see how it offers significant advantages over Javascript object structures, for example.&lt;/p&gt;

&lt;p&gt;Part of the reason for not being convinced of the utility of RDF is that members of the HTML5 community think it simply isn&amp;#8217;t important for properties to be named with resolvable URIs. After all, &lt;a href=&quot;http://microformats.org/&quot;&gt;microformats&lt;/a&gt; have illustrated that applications can derive meaning without having a machine-readable definition of the semantics of a property.&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t heard them make these arguments, but they could also point out that there are vastly more mash-ups constructed with the developer having knowledge and understanding of the data that they are mashing up (and therefore requiring human-readable but not machine-readable definitions), than there are generic applications that could, or do, actually retrieve and reason with schemas or ontologies.&lt;/p&gt;

&lt;p&gt;Some within the HTML5 community even think it is &lt;em&gt;dangerous&lt;/em&gt; to use information from a property or class&amp;#8217;s URI, because if the metadata within an HTML page can only be accurately understood when coupled with information from an external document, applications that are built on being able to locate that external information are in real trouble if (when) that document disappears, either temporarily or permanently.&lt;/p&gt;

&lt;p&gt;For example, say that I have a page that contains the triple:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.jenitennison.com/#me&amp;gt; 
  &amp;lt;http://xmlns.com/foaf/0.1/firstName&amp;gt; &quot;Jenifer&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#token&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If it is the case that an application that resolves that triple also (a) retrieves the information available about the property and (b) reasons using it and any information derived from it, then having that triple in my page entails:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.jenitennison.com/#me&amp;gt; 
  a &amp;lt;http://xmlns.com/foaf/0.1/Person&amp;gt; ,
    &amp;lt;http://xmlns.com/foaf/0.1/Agent&amp;gt; ,
    &amp;lt;http://www.w3.org/2000/10/swap/pim/contact#Person&amp;gt; ,
    &amp;lt;http://www.w3.org/2000/10/swap/pim/contact#SocialEntity&amp;gt; ,
    &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;but this information can only be known by collecting the documents at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt; (actually the same document as &lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt; but I can&amp;#8217;t know that)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://xmlns.com/foaf/0.1/Agent&lt;/code&gt; (as with &lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt;, actually the same document as &lt;code&gt;http://xmlns.com/foaf/0.1/firstName&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://www.w3.org/2000/10/swap/pim/contact&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://www.w3.org/2003/01/geo/wgs84_pos&lt;/code&gt; (resolving this doesn&amp;#8217;t actually tell me anything extra, but I have to do it all the same to check)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these circumstances, if, say, &lt;code&gt;http://xmlns.com/foaf/0.1/Person&lt;/code&gt; can&amp;#8217;t be located (my connection dips out for a second or whatever) then suddenly the page&amp;#8217;s metadata &lt;em&gt;means&lt;/em&gt; something different, which is surely a problem in a &lt;em&gt;Semantic&lt;/em&gt; Web.&lt;/p&gt;

&lt;p&gt;Putting aside all that, even if you did need URIs for properties, the HTML5 community feels that the costs in usability of using prefixes to shorten URIs are simply too high to justify the benefit of concision. Why not simply use the full URI within an attribute?&lt;/p&gt;
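
&lt;p&gt;As a rough illustration of that trade-off, here is the same FOAF property named both ways. This is a sketch only: the first fragment uses an RDFa-style prefix binding, and the second assumes a microdata-style &lt;code&gt;itemprop&lt;/code&gt; attribute that takes a full URI and sits inside some enclosing item.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- prefixed: short, but only meaningful together with the xmlns:foaf binding --&amp;gt;
&amp;lt;span xmlns:foaf=&quot;http://xmlns.com/foaf/0.1/&quot; property=&quot;foaf:firstName&quot;&amp;gt;Jenifer&amp;lt;/span&amp;gt;

&amp;lt;!-- full URI: longer, but the property name is self-contained --&amp;gt;
&amp;lt;span itemprop=&quot;http://xmlns.com/foaf/0.1/firstName&quot;&amp;gt;Jenifer&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;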

&lt;p&gt;In summary, you will not be able to persuade members of the HTML5 community that it&amp;#8217;s worth paying the usability cost of prefixes until you have persuaded them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;that RDF has significant advantages as a way of modelling data over and above JavaScript object structures&lt;/li&gt;
&lt;li&gt;that the ability to resolve the URIs that are used to name properties is not dangerous, and can even be helpful&lt;/li&gt;
&lt;li&gt;that it is ugly and tedious to have to use full URIs when naming properties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Really I&amp;#8217;m just trying to draw attention to the fact that the HTML5 community has very reasonable concerns about things much more fundamental than using prefix bindings. After redrafting this concluding section many times, the things that I want to say are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;so wouldn&amp;#8217;t things be better if we put as much effort into understanding each other as persuading each other (hah, what an idealist!)&lt;/li&gt;
&lt;li&gt;so we will make more progress in discussions if we focus on the underlying arguments&lt;/li&gt;
&lt;li&gt;so we need to talk in a balanced way about the advantages and disadvantages of RDF&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;or, in a more realistic frame of mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;so it&amp;#8217;s just not going to happen for HTML5&lt;/li&gt;
&lt;li&gt;so why not just stop arguing and use the spare time and energy &lt;em&gt;doing&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;so why not demonstrate RDF&amp;#8217;s power in real-world applications?&lt;/li&gt;
&lt;/ul&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/124#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Fri, 21 Aug 2009 18:28:16 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">124 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>What You Can&#039;t Do with HTML5 Microdata</title>
 <link>http://www.jenitennison.com/blog/node/103</link>
 <description>&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Fixed a couple of errors in the microdata code.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://dev.w3.org/html5/spec/Overview.html#microdata&quot; title=&quot;W3C: HTML5: Microdata&quot;&gt;HTML5 microdata&lt;/a&gt; proposal has hit the web, just days before Google announced its &lt;a href=&quot;http://radar.oreilly.com/2009/05/google-announces-support-for-m.html&quot; title=&quot;O&#039;Reilly: Google Announces Support for Microformats and RDFa&quot;&gt;support for RDFa&lt;/a&gt; (or at least &lt;a href=&quot;http://iandavis.com/blog/2009/05/googles-rdfa-a-damp-squib&quot; title=&quot;Ian Davis: Google&#039;s RDFa a Damp Squib&quot;&gt;one vocabulary encoded using RDFa attributes&lt;/a&gt;). These are, indeed, &lt;a href=&quot;http://en.wikipedia.org/wiki/May_you_live_in_interesting_times&quot; title=&quot;Wikipedia: May you live in interesting times&quot;&gt;&amp;#8220;interesting times&amp;#8221;&lt;/a&gt; for the semantic web.&lt;/p&gt;

&lt;p&gt;Now, if you&amp;#8217;re one of those weirdos who want to embed RDF triples within your web pages, what you&amp;#8217;re going to care about is whether you can use microdata to do it. Those of us who have been using RDFa in anger, rather than in toy examples, know that it can be hard to map a particular set of RDF statements onto HTML content. I thought I&amp;#8217;d take a look to see just what it would be like to create particular RDF with the HTML5 microdata proposal.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Basics&lt;/h2&gt;

&lt;p&gt;On the face of it, you can express any triple in microdata because a triple like this (Turtle):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.example.com/subject&amp;gt; &amp;lt;http://www.example.com/property&amp;gt; &amp;lt;http://www.example.com/object&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;can always, and anywhere, be expressed with (HTML5):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span item&amp;gt;
  &amp;lt;link itemprop=&quot;about&quot; href=&quot;http://www.example.com/subject&quot;&amp;gt;
  &amp;lt;link itemprop=&quot;http://www.example.com/property&quot; href=&quot;http://www.example.com/object&quot;&amp;gt;
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;while a triple like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.example.com/subject&amp;gt; &amp;lt;http://www.example.com/otherProperty&amp;gt; &quot;value&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;can be expressed with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span item&amp;gt;
  &amp;lt;link itemprop=&quot;about&quot; href=&quot;http://www.example.com/subject&quot;&amp;gt;
  &amp;lt;meta itemprop=&quot;http://www.example.com/otherProperty&quot; content=&quot;value&quot;&amp;gt;
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course having to use all those long, repetitive URIs is a bit of a pain and bloats out the markup, but we&amp;#8217;d never expect this to be hand-authored, right? Right? And what we really care about is that we can express the RDF.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s not just the URIs that are long-winded, by the way. RDFa manages to cram a lot into each element, whereas microdata usually requires separate elements. This is an example from the RDFa specification:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;photo1.jpg&quot;
  rel=&quot;license&quot; resource=&quot;http://creativecommons.org/licenses/by/2.0/&quot;
  property=&quot;dc:creator&quot; content=&quot;Mark Birbeck&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which produces the triples:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;photo1.jpg&amp;gt; xhv:license &amp;lt;http://creativecommons.org/licenses/by/2.0/&amp;gt; .
&amp;lt;photo1.jpg&amp;gt; dc:creator &quot;Mark Birbeck&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In HTML5, I think this has to be done with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;span item&amp;gt;
  &amp;lt;img itemprop=&quot;about&quot; src=&quot;photo1.jpg&quot;&amp;gt;
  &amp;lt;link itemprop=&quot;http://www.w3.org/1999/xhtml/vocab#license&quot; 
        href=&quot;http://creativecommons.org/licenses/by/2.0/&quot;&amp;gt;
  &amp;lt;meta itemprop=&quot;http://purl.org/dc/elements/1.1/creator&quot; 
        content=&quot;Mark Birbeck&quot;&amp;gt;
&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&amp;#8217;s a bit more tedious, but also more obvious what&amp;#8217;s going on. Even after handling RDFa as much as I have, I still struggle to work out when, for example, an &lt;code&gt;href&lt;/code&gt; attribute is providing the object for a statement, and when the subject. And if you look at the &lt;a href=&quot;http://www.london-gazette.co.uk/&quot; title=&quot;London Gazette&quot;&gt;London Gazette&lt;/a&gt; RDFa, you&amp;#8217;ll notice many occasions where empty &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; elements are used to provide the equivalent of the inline &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; elements shown above. (In fact, as far as I recall, earlier drafts of RDFa allowed &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;meta&amp;gt;&lt;/code&gt; elements to be used this way too.)&lt;/p&gt;

&lt;p&gt;From what I can see, though, there are two things that the microdata proposal in its current form can&amp;#8217;t handle: datatyping and XML literals.&lt;/p&gt;

&lt;h2&gt;Datatypes&lt;/h2&gt;

&lt;p&gt;Datatypes are important in RDF. Values of properties are often not just strings, but dates, times, integers and so on. The microdata proposal mentions using the &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element to create values, and has this example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div item&amp;gt;
 I was born on &amp;lt;time itemprop=&quot;birthday&quot; datetime=&quot;2009-05-10&quot;&amp;gt;May 10th 2009&amp;lt;/time&amp;gt;.
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The triple that you&amp;#8217;d want to create from this is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;&amp;gt; &amp;lt;http://www.w3.org/1999/xhtml/custom#birthday&amp;gt; &quot;2009-05-10&quot;^^xsd:date .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which makes it plain that the value is a date. However, under the proposal&amp;#8217;s mapping from microdata to RDF, the triple that&amp;#8217;s actually created is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;&amp;gt; &amp;lt;http://www.w3.org/1999/xhtml/custom#birthday&amp;gt; &quot;2009-05-10&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In other words, the value is a plain literal, not a date.&lt;/p&gt;

&lt;p&gt;In RDFa, the &lt;code&gt;datatype&lt;/code&gt; attribute is used to indicate the datatype of the value, so you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div xmlns:custom=&quot;http://www.w3.org/1999/xhtml/custom#&quot;
     xmlns:xsd=&quot;http://www.w3.org/2001/XMLSchema#&quot;&amp;gt;
  I was born on &amp;lt;span property=&quot;custom:birthday&quot; content=&quot;2009-05-10&quot; datatype=&quot;xsd:date&quot;&amp;gt;May 10th 2009&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It would be easy enough to say that the value of a &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; element has the datatype &lt;code&gt;xsd:date&lt;/code&gt;, &lt;code&gt;xsd:time&lt;/code&gt; or &lt;code&gt;xsd:dateTime&lt;/code&gt; depending on the syntax of its &lt;code&gt;datetime&lt;/code&gt; attribute, but there are other occasions when you want typed values. We&amp;#8217;ve used strings (as opposed to plain literals), integers and years. I wouldn&amp;#8217;t want to rule out the use of custom datatypes such as colours (and RDF permits these). The JSON mapping could, perhaps, use an appropriate object if there is one, and otherwise use just the string value without too much loss of power.&lt;/p&gt;
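
&lt;p&gt;The &lt;code&gt;&amp;lt;time&amp;gt;&lt;/code&gt; rule suggested above might play out roughly as follows. This is a sketch of a hypothetical rule, not something the microdata proposal defines, and the &lt;code&gt;alarm&lt;/code&gt; and &lt;code&gt;meeting&lt;/code&gt; property names are invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div item&amp;gt;
  &amp;lt;!-- date only: the value would be typed as xsd:date --&amp;gt;
  &amp;lt;time itemprop=&quot;birthday&quot; datetime=&quot;2009-05-10&quot;&amp;gt;May 10th 2009&amp;lt;/time&amp;gt;
  &amp;lt;!-- time only: the value would be typed as xsd:time --&amp;gt;
  &amp;lt;time itemprop=&quot;alarm&quot; datetime=&quot;07:30:00&quot;&amp;gt;half past seven&amp;lt;/time&amp;gt;
  &amp;lt;!-- date and time: the value would be typed as xsd:dateTime --&amp;gt;
  &amp;lt;time itemprop=&quot;meeting&quot; datetime=&quot;2009-05-10T09:00:00Z&quot;&amp;gt;9am on May 10th&amp;lt;/time&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;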

&lt;h2&gt;XML Literals&lt;/h2&gt;

&lt;p&gt;Arguably less important is the lack of support for XML literals, which are values that contain markup. The example in the RDFa spec is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;h2 property=&quot;dc:title&quot;&amp;gt;
  E = mc&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;: The Most Urgent Problem of Our Time
&amp;lt;/h2&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which generates the triple (Turtle):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;&amp;gt; &amp;lt;http://purl.org/dc/elements/1.1/title&amp;gt; &quot;E = mc&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;: The Most Urgent Problem of Our Time&quot;^^rdf:XMLLiteral .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;RDFa allows you to force a value to be an XML literal or a plain literal using the &lt;code&gt;datatype&lt;/code&gt; attribute. Otherwise, if the element has any element children then it&amp;#8217;s assumed to be an XML literal, and if not, a plain literal. I think the microdata proposal could adopt the same course of action. The JSON mapping could, perhaps, result in a value which is an array or some other container for a sequence of text and element nodes.&lt;/p&gt;
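
&lt;p&gt;If microdata did adopt that default, the effect might look something like this. Again this is only a sketch of a hypothetical rule, and the second property and its value are made up for contrast:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div item&amp;gt;
  &amp;lt;!-- contains an element child (sup): the value would be an rdf:XMLLiteral --&amp;gt;
  &amp;lt;h2 itemprop=&quot;http://purl.org/dc/elements/1.1/title&quot;&amp;gt;
    E = mc&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;: The Most Urgent Problem of Our Time
  &amp;lt;/h2&amp;gt;
  &amp;lt;!-- contains text only: the value would stay a plain literal --&amp;gt;
  &amp;lt;p itemprop=&quot;http://purl.org/dc/elements/1.1/description&quot;&amp;gt;A text-only value&amp;lt;/p&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;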

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;To my mind, the HTML5 microdata proposal is unacceptable in its current form because, unlike RDFa, it can&amp;#8217;t be used to represent all the statements that you might want to represent. If those issues were fixed, there would be pros and cons between it and RDFa. Microdata is more long-winded, but more explicit. RDFa is more arcane but doesn&amp;#8217;t swamp the content of the page to quite the same extent.&lt;/p&gt;

&lt;p&gt;Like a lot of people, I would far rather have seen a proposal which didn&amp;#8217;t reinvent the wheel, but how does the old saying go: &amp;#8220;The great thing about standards is that there are so many to choose from.&amp;#8221; If the microdata proposal stays the course, I only hope that we&amp;#8217;ll see consumers supporting both it and RDFa so that producers can choose which to use rather than being forced to embed both within their pages.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/103#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Wed, 13 May 2009 00:57:46 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">103 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Evolving Standards</title>
 <link>http://www.jenitennison.com/blog/node/102</link>
 <description>&lt;p&gt;I&amp;#8217;ve been trying to finalise this post for a long time now, but today&amp;#8217;s publication of an &lt;a href=&quot;http://dev.w3.org/html5/spec/&quot; title=&quot;W3C Working Draft: HTML5&quot;&gt;HTML5&lt;/a&gt; draft that includes a new &lt;a href=&quot;http://dev.w3.org/html5/spec/#microdata&quot;&gt;microdata section&lt;/a&gt; makes it all the more relevant. The long and short of it is that I am less and less concerned about the huge mess that is the HTML5 standardisation process. On the one hand, it&amp;#8217;s a huge mess; on the other, it doesn&amp;#8217;t matter.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;ve been thinking about it this way: the web is an ecosystem, a rich, complex and ever-changing environment in which HTML documents live alongside browsers, developers, JavaScript, CSS, spiders and robots and so on and on.&lt;/p&gt;

&lt;p&gt;Every HTML document is like a creature living in this ecosystem. If you study the documents, you discover species and families and genera. Some may be like &lt;a href=&quot;http://en.wikipedia.org/wiki/Nautilina&quot; title=&quot;Wikipedia: Nautilus&quot;&gt;nautiluses&lt;/a&gt; &amp;#8212; throwbacks to earlier eras when browser wars raged and nested tables were acceptable ways of laying out a page, but all the documents that you can &lt;code&gt;GET&lt;/code&gt; are managing to survive in this modern environment in their own ways.&lt;/p&gt;

&lt;p&gt;The kinds of documents on the web have evolved over time, as new browsers, legal regulations, and development practices open up new ecological niches. The documents we see now on the web are a snapshot that reflects that evolutionary process and the current web ecology.&lt;/p&gt;

&lt;p&gt;What we know from the natural world is that the best way to survive in such an environment is to gradually adapt. In the natural world, the genetic make-up of an organism determines its survivability, and the mixing and mutation of genes allow organisms to react to changing conditions. The web&amp;#8217;s counterparts to genes are elements and attributes. Some are evident across multiple organisms, essential parts of the way the web works (&lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt;) while others are only present in the rarest of specimens (&lt;code&gt;&amp;lt;dir&amp;gt;&lt;/code&gt;). The elements and attributes that a page uses determine its utility to a large extent: who can view it, how its content can be processed.&lt;/p&gt;

&lt;p&gt;What is the role of standards bodies in this rich, complex and ever-changing environment? They are like geneticists: documenting and manipulating genes in the hope of improving the survivability (utility) of the documents they affect. And only the kind of mad, hubristic geneticists you see in B-movies would attempt to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;make large genetic deviations from known working organisms&lt;/li&gt;
&lt;li&gt;commit specicide (there will always be an ecological niche for successful organisms)&lt;/li&gt;
&lt;li&gt;create organisms that cannot evolve (because those that cannot adapt will die)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Translated to the web, I predict:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creating grand new designs for HTML will not work (witness XHTML2); instead, it will develop through a series of small, gradual adjustments. &lt;/li&gt;
&lt;li&gt;HTML5 cannot hope to eliminate the use of any particular markup (such as RDFa), if it&amp;#8217;s useful for even the smallest population. Introducing an alternative that satisfies those same requirements in a better way might cause some change but the legacy will never completely die.&lt;/li&gt;
&lt;li&gt;Even if HTML5 is not defined to be extensible, documents will be created that extend it to meet requirements we don&amp;#8217;t even know about yet.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, long term, I&amp;#8217;m not worried. The web will evolve and there is space enough in it for us all. Short term, it&amp;#8217;s a pity that so many intelligent, thoughtful and engaged people are wasting so much energy on a conflict that can&amp;#8217;t be won.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/102#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Sun, 10 May 2009 20:36:52 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">102 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

