One of the things that’s been niggling at the back of my mind since the schema.org announcement is how small a role search engine results play in the wider data sharing efforts that I’m more familiar with from my work on legislation.gov.uk, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I’m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.
As you may know, I accepted an appointment to the W3C’s Technical Architecture Group earlier this year. Last week was the first face-to-face meeting that I attended, hosted in the Stata Center at MIT. As you can tell from the agenda (which was in fact revised as we went along) it was a packed three days.
What I intend to do here is to briefly report on the major areas that we discussed and give a tiny bit of my own personal take on them. None of what I write here should be read as the official opinion of the TAG; it’s just me saying what I think. I’m not going to go into anything in depth, because they’re all incredibly gnarly and contentious topics and I’d not only be here all year but also end up in a tar pit.
There is (obviously, from the way my tweet stream, feed reader and email have filled up) lots to say at many levels about schema.org, a new collaboration between Google, Microsoft and Yahoo! that describes the next phase in search engines’ extraction of semantics from web pages. In this post I’m going to focus on what we can learn from schema.org about the design of microdata and how it might be improved.
One of the fundamental disconnects between HTML5 and previous versions of HTML is the way in which you answer the question “what is the structure of this page?”. Things that make use of that structure, such as RDFa, need to take this into account.
An example is the document:
<html>
<head><title>HTML example</title></head>
<body>
<table>
<span>Example title</span>
<tr><td>Example table</td></tr>
</table>
</body>
</html>
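To make the question of "structure" concrete, here is a minimal sketch in Python that compares two views of the document above. The first view parses the markup exactly as written, using an XML parser, so the `span` stays inside the `table`. The second view is a hand-written assumption of what an HTML5 parser would build: as I understand the HTML5 parsing algorithm, content that isn't allowed directly inside a table is moved out in front of it, and a `tbody` is implied around the row. The `HTML5_TREE` string and the `parent_of` helper are my own illustrations, not the output of any real parser.

```python
import xml.etree.ElementTree as ET

# The document exactly as written in the example above.
SOURCE = """<html>
<head><title>HTML example</title></head>
<body>
<table>
<span>Example title</span>
<tr><td>Example table</td></tr>
</table>
</body>
</html>"""

# ASSUMPTION: a hand-written approximation of the tree an HTML5 parser
# would build, with the span moved before the table and a tbody implied.
HTML5_TREE = """<html>
<head><title>HTML example</title></head>
<body>
<span>Example title</span><table>
<tbody><tr><td>Example table</td></tr></tbody>
</table>
</body>
</html>"""


def parent_of(root, tag):
    """Return the tag name of the first element that has a child named `tag`."""
    for parent in root.iter():
        for child in parent:
            if child.tag == tag:
                return parent.tag
    return None


xml_view = ET.fromstring(SOURCE)
html5_view = ET.fromstring(HTML5_TREE)

print("As written, span is a child of:", parent_of(xml_view, "span"))
print("In the assumed HTML5 tree, span is a child of:", parent_of(html5_view, "span"))
print("In the assumed HTML5 tree, tr is a child of:", parent_of(html5_view, "tr"))
```

Anything that relies on ancestry, as RDFa does when it inherits a subject from a containing element, could see a different context for the `span` depending on which of these two trees it is handed.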
In my last post about RDFa and HTML I talked about how one of the gulfs that separates the HTML5 and Semantic Web communities is the attitude to the resolvability of property (and class) URIs.
I’m currently experimenting with adding to rdfQuery the ability to automatically locate information about properties and other resources that are referenced within triples. So now is a good time, as far as I’m concerned, to look more closely at what the ability to resolve properties gives you, and at how to avoid problems if the property URI is (temporarily or permanently) unresolvable or resolves to something new.
I’m going to attempt to answer:
When I came back from holiday, I caught up with the recent discussions around RDFa and HTML5. It’s exhausting reading so many posts repetitively reiterating the positions of people who all have the best of intentions but fundamentally different priorities. And such a shame that so much energy is spent on fruitless discussion when it could be spent at the very least improving specifications, if not testing, implementing, experimenting or otherwise in some very minor way changing the world.
Update: Fixed a couple of errors in the microdata code.
The HTML5 microdata proposal has hit the web, just days before Google announced its support for RDFa (or at least one vocabulary encoded using RDFa attributes). These are, indeed, “interesting times” for the semantic web.
Now, if you’re one of those weirdos who want to embed RDF triples within your web pages, what you’re going to care about is whether you can use microdata to do it. Those of us who have been using RDFa in anger, rather than in toy examples, know that it can be hard to map a particular set of RDF statements onto HTML content. I thought I’d take a look to see just what it would be like to create particular RDF with the HTML5 microdata proposal.
I’ve been trying to finalise this post for a long time now, but today’s publication of an HTML5 draft that includes a new microdata section makes it all the more relevant. The long and short of it is that I am less and less concerned about the huge mess that is the HTML5 standardisation process. On the one hand, it’s a huge mess; on the other, it doesn’t matter.