When I came back from holiday, I caught up with the recent discussions around RDFa and HTML5. It’s exhausting reading so many posts repetitively reiterating the positions of people who all have the best of intentions but fundamentally different priorities. And such a shame that so much energy is spent on fruitless discussion when it could be spent at the very least improving specifications, if not testing, implementing, experimenting or otherwise in some very minor way changing the world.
The particular thread’s subject was the use of prefixes, which are used to provide a shorthand for URIs, which are used to name properties such as http://xmlns.com/foaf/0.1/firstName. It’s unquestionable, really, that prefixes are a source of problems:
foaf could be used for something other than http://xmlns.com/foaf/0.1/ or that http://xmlns.com/foaf/0.1/ might be bound to a prefix other than foaf
I think it’s generally the case that the Semantic Web community, who are used to using syntaxes such as RDF/XML and Turtle which use prefixes all the time, judge these as being less disadvantageous than the members of the HTML5 community, who are much more in touch with and concerned about the “common user”.
But underlying the arguments about the costs of prefixes are arguments about whether these disadvantages are important enough to stop
Fundamentally, members of the Semantic Web (capital S, capital W) community take it as a matter of faith that the correct data model to use for expressing data about things both on and off the web is RDF. If there’s anything that defines the Semantic Web community, it’s that underlying assumption. (Well, probably: I’m sure there are still some who hanker after Topic Maps.)
Further, they think it is absolutely essential that if a property such as ‘first name’ is used within a page, you can GET a URI to find interesting information about the property. For example, with http://xmlns.com/foaf/0.1/firstName, you will get
<rdf:Property rdf:about="http://xmlns.com/foaf/0.1/firstName"
vs:term_status="testing"
rdfs:label="firstName"
rdfs:comment="The first name of a person.">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
<rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
<rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/"/>
</rdf:Property>
which among other things includes a human-readable label that you might use in a display of information about things that have the property, and a statement about the domain of the property, from which you can tell that anything that has a http://xmlns.com/foaf/0.1/firstName property is a http://xmlns.com/foaf/0.1/Person.
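As a rough illustration of what an application might do with such a description, here is a minimal Python sketch that pulls the label, comment and domain out of the RDF/XML above. It works on a copy of the description held in a string rather than fetching anything; the surrounding rdf:RDF element, the namespace declarations and the function name describe_property are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The description quoted in the post, wrapped in an rdf:RDF element with
# namespace declarations (an assumption) so that it is well-formed by itself.
SCHEMA = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:vs="http://www.w3.org/2003/06/sw-vocab-status/ns#">
  <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/firstName"
      vs:term_status="testing"
      rdfs:label="firstName"
      rdfs:comment="The first name of a person.">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
    <rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
    <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/"/>
  </rdf:Property>
</rdf:RDF>
"""


def describe_property(rdf_xml, property_uri):
    """Return the label, comment and domain given for property_uri, or None."""
    root = ET.fromstring(rdf_xml)
    for prop in root.findall(f"{{{RDF}}}Property"):
        if prop.get(f"{{{RDF}}}about") != property_uri:
            continue
        domain = prop.find(f"{{{RDFS}}}domain")
        return {
            "label": prop.get(f"{{{RDFS}}}label"),
            "comment": prop.get(f"{{{RDFS}}}comment"),
            "domain": domain.get(f"{{{RDF}}}resource") if domain is not None else None,
        }
    return None


info = describe_property(SCHEMA, "http://xmlns.com/foaf/0.1/firstName")
print(info["label"], "applies to", info["domain"])
```

A real consumer would use an RDF parser rather than walking the XML, since the same statements can be serialised in many different ways.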
Given that you need URIs for properties, members of the Semantic Web community generally think that you need a way to shorten those URIs to make them palatable to people who might be embedding metadata in their pages. The cost of that is that you have to use prefixes, with the disadvantages that I’ve outlined above, but it’s a cost that’s worth paying to gain resolvability and concision. And since it can be done in RDF/XML and Turtle and SPARQL and almost every other syntax in existence for expressing RDF, it seems unnatural not to be able to do it in the metadata embedded within webpages.
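To make the trade-off concrete, here is a small Python sketch of prefix expansion. The function name expand_curie and the http://example.org/friends# binding are made up for illustration; the point is only that the shorthand means nothing without the binding, and that the same shorthand can name different properties under different bindings.

```python
def expand_curie(curie, prefixes):
    """Expand a prefix:reference shorthand into a full URI using the bindings."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no prefix binding found for {curie!r}")
    return prefixes[prefix] + reference


usual = {"foaf": "http://xmlns.com/foaf/0.1/"}
unusual = {"foaf": "http://example.org/friends#"}  # hypothetical binding
renamed = {"f": "http://xmlns.com/foaf/0.1/"}

# Same shorthand, different property, depending on the binding in scope.
print(expand_curie("foaf:firstName", usual))
print(expand_curie("foaf:firstName", unusual))
# Different shorthand, same property.
print(expand_curie("f:firstName", renamed))
```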
At an equally fundamental level, members of the HTML5 community are unconvinced that RDF is a necessary or useful model to use for data. They do not see how it offers significant advantages over JavaScript object structures, for example.
Part of the reason for not being convinced of the utility of RDF is that members of the HTML5 community think it simply isn’t important for properties to be named with resolvable URIs. After all, microformats have illustrated that applications can derive meaning without having a machine-readable definition of the semantics of a property.
I haven’t heard them make these arguments, but they could also point out that there are vastly more mash-ups constructed with the developer having knowledge and understanding of the data that they are mashing up (and therefore requiring human-readable but not machine-readable definitions), than there are generic applications that could, or do, actually retrieve and reason with schemas or ontologies.
Some within the HTML5 community even think it is dangerous to use information from a property or class’s URI, because if the metadata within an HTML page can only be accurately understood when coupled with information from an external document, applications that are built on being able to locate that external information are in real trouble if (when) that document disappears, either temporarily or permanently.
For example, say that I have a page that contains the triple:
<http://www.jenitennison.com/#me>
<http://xmlns.com/foaf/0.1/firstName> "Jenifer"^^<http://www.w3.org/2001/XMLSchema#token> .
If it is the case that an application that resolves that triple also (a) retrieves the information available about the property and (b) reasons using it and any information derived from it, then having that triple in my page entails:
<http://www.jenitennison.com/#me>
a <http://xmlns.com/foaf/0.1/Person> ,
<http://xmlns.com/foaf/0.1/Agent> ,
<http://www.w3.org/2000/10/swap/pim/contact#Person> ,
<http://www.w3.org/2000/10/swap/pim/contact#SocialEntity> ,
<http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
but this information can only be known by collecting the documents at:
http://xmlns.com/foaf/0.1/firstName
http://xmlns.com/foaf/0.1/Person (actually the same document as http://xmlns.com/foaf/0.1/firstName but I can’t know that)
http://xmlns.com/foaf/0.1/Agent (as with http://xmlns.com/foaf/0.1/Person, actually the same document as http://xmlns.com/foaf/0.1/firstName)
http://www.w3.org/2000/10/swap/pim/contact
http://www.w3.org/2003/01/geo/wgs84_pos (resolving this doesn’t actually tell me anything extra, but I have to do it all the same to check)
In these circumstances, if, say, http://xmlns.com/foaf/0.1/Person can’t be located (my connection dips out for a second or whatever) then suddenly the page’s metadata means something different, which is surely a problem in a Semantic Web.
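The dependency can be sketched in a few lines of Python. Everything here is a toy: the DOCUMENTS table stands in for the external schema documents, and the domain and subclass statements in it are assumptions chosen to reproduce the entailment listed above, not a copy of the real schemas. The function inferred_types is likewise hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Assumed contents of each external document, keyed by document URI.
DOCUMENTS = {
    FOAF: {
        "domain": {FOAF + "firstName": FOAF + "Person"},
        "subClassOf": {
            FOAF + "Person": [FOAF + "Agent", CONTACT + "Person", GEO + "SpatialThing"],
        },
    },
    CONTACT[:-1]: {
        "domain": {},
        "subClassOf": {CONTACT + "Person": [CONTACT + "SocialEntity"]},
    },
    GEO[:-1]: {"domain": {}, "subClassOf": {}},
}


def inferred_types(property_uri, available):
    """Types entailed for the subject of property_uri, given reachable documents."""
    domains, parents = {}, {}
    for doc in available:
        domains.update(DOCUMENTS[doc]["domain"])
        for cls, supers in DOCUMENTS[doc]["subClassOf"].items():
            parents.setdefault(cls, []).extend(supers)
    todo = [domains[property_uri]] if property_uri in domains else []
    types = set()
    while todo:  # follow subclass links transitively
        cls = todo.pop()
        if cls not in types:
            types.add(cls)
            todo.extend(parents.get(cls, []))
    return types


everything = inferred_types(FOAF + "firstName", [FOAF, CONTACT[:-1], GEO[:-1]])
no_contact = inferred_types(FOAF + "firstName", [FOAF, GEO[:-1]])
no_foaf = inferred_types(FOAF + "firstName", [CONTACT[:-1], GEO[:-1]])
print(len(everything), len(no_contact), len(no_foaf))
```

With every document reachable, the subject gets all five types; lose the contact document and one of them silently disappears; lose the FOAF document and nothing is entailed at all.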
Putting aside all that, even if you did need URIs for properties, the HTML5 community feels that the costs in usability of using prefixes to shorten URIs are simply too high to justify the benefit of concision. Why not simply use the full URI within an attribute?
In summary, you will not be able to persuade members of the HTML5 community that it’s worth paying the usability cost of prefixes until you have persuaded them:
Really I’m just trying to draw attention to the fact that the HTML5 community has very reasonable concerns about things much more fundamental than using prefix bindings. After redrafting this concluding section many times, the things that I want to say are:
or, in a more realistic frame of mind:
Comments
Re: HTML5/RDFa Arguments
One thing that I'm missing in this discussion is what, in the HTML5 model, actively prevents one from using the RDF model. Or, put another way, is it not possible to encode an RDF graph using the microdata syntax (I admit that I've thought about it only very briefly, and not looked at the details, and also skipped the discussion after a while...) and, if not, are there acceptably small and simple changes that would allow it?
I understand that a concern could then be that some pages would have microdata in an RDF model, while others would use an entirely different approach, which could confuse tools — but it's not as if there were no technical solutions to that issue. One could flag that a document uses the RDF-in-Microdata (let's call it RiM for short) approach with some form of meta element.
I know I'm handwaving lots here, mostly because it's hot here and I appreciate the air, but also because I'm wondering if we need to agree on a model. If we can share a syntax, and everyone can stuff their own model on top of it, does it not fly? There is some value in the minimal amount of synergy that comes with using the same syntax, in tool support, as well as in eased cross-pollination. And it's not as if yet another syntax for RDF was going to be one too many.
URLs for-ever
The Semantic Web is supposed to be distributed, so while you can count on Google’s or FOAF’s URLs being reachable “forever”, that might not be true for some mom-n-pop-turtle-collection URLs. Would you be able to read metadata from a page that is 20 years old?
It would be nice if archive.org or the like could be used as a fallback for those URLs. Can it be done? (I mean in a standard, reliable manner, not just “implement that in your client if you want”.)
Re: URLs for-ever
I think that’s the general idea behind purl.org, which then redirects to whatever organization is currently serving the required metadata document. It’s not the Internet Archive, though; it’s the OCLC.
Re: HTML5/RDFa Arguments
Jeni,
A lot of calm, very well reasoned analysis here that demonstrates you have delved into the depths of many of these technologies. Thank you for adding a high quality critique to these discussions.
When I first proposed microformats in 2004[1] with Kevin Marks, it was in many ways a proof of concept that neither the RDF data model nor (and especially) its syntax(es) had significant advantages as a way of modeling (and especially *authoring*) data over and above not just JavaScript object structures, but simple markup that mainstream-to-modern web designers were perhaps already broadly familiar with. And not just a lack of advantages, but that, frankly, simpler, more real-world solutions were possible.
Five years later and there is solid support for several microformats in numerous sites from as large as AOL, Google, Yahoo, Yelp, to a very long tail of smaller sites - at last count Yahoo Searchmonkey reported over 1.3 billion hCards on the web. What started as a proof of concept has become the dominant form of representing semantics (over and above what's built into HTML) on the web.
Thus, it is not only right to challenge the assumptions[2] from the perhaps more "official" RDF / Semantic Web community, but there is absolutely the possibility of developing better solutions that will be more useful and more easily/widely adopted on the web, in less time.
Note that microdata didn't happen overnight. Much of the design and simplicity of microdata is based on years of work on principles[3] deliberately designed to help guide and create simpler, more usable and accessible solutions. It happened so quickly because Ian Hickson designed microdata based upon years of work by others, as good (efficient) scientists and engineers do.
I think there are some very clever ideas in general purpose microdata, it has a lot of potential, and despite being a bit more abstract than microformats, still much simpler/easier to explain to web designers/developers/programmers than any form of RDF.
General microdata may be the right solution to the general purpose data representation problem that microformats specifically chose not to work on.
One bit of technical addition to your post regarding microformats, and URIs for vocabulary:
When I was working on XFN (effectively the first microformat) in 2003, I specifically designed the underlying technology of XHTML Meta Data Profiles (XMDP)[4] to *enable* all HTML rel/class additions/extensions (what would later become known as "microformats") to be *optionally* bound by URLs. And not just any URLs, but URLs that were compatible with and looked like RDF vocabulary URLs that ended with a "#" and term name. This was a deliberate design decision on my part, because I knew that there would be those who insisted on defining their terms with URLs. Here is an HTML markup fragment that demonstrates this with the above-mentioned hCard:
The terms "vcard" and "fn" which are used as class names are defined by the hCard XMDP profile[5] and the respective URLs for those terms are created by referencing the fragment IDs in that document:
http://microformats.org/profile/hcard#vcard
http://microformats.org/profile/hcard#fn
Thus providing the necessary URLs for any (meta)data system that stores/reasons about data based on vocabulary based on URLs (whether RDF or some other URL-based vocabulary store).
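As a minimal sketch of that derivation in Python (the function name term_url is made up; the profile URL and the two terms are the ones given above), a term's URL is just the profile URL with the term name as its fragment:

```python
from urllib.parse import urldefrag

HCARD_PROFILE = "http://microformats.org/profile/hcard"


def term_url(profile_url, term):
    """Build a vocabulary URL from a profile URL and a class/rel term name."""
    base = urldefrag(profile_url).url  # drop any fragment already present
    return f"{base}#{term}"


urls = [term_url(HCARD_PROFILE, term) for term in ("vcard", "fn")]
print("\n".join(urls))
```

A consumer that cares about URL-based vocabularies can do this mapping itself; a consumer that doesn't care can ignore the profile entirely and just read the class names.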
The key here is that this was made *optional*, not required, and thus does not have the potential "web of fragility" problem that you described for systems (like RDF) that *depend* on URI based vocabulary.
In practice, few people use XMDP profiles, well, except for at least the millions of WordPress blogs[6] out there - just view source on them and you'll find: profile="http://gmpg.org/xfn/11" near the top of the page.
But the point is that those that want to use XMDP profiles have the *option* of using them, while not burdening everyone else with doing so.
Tantek
[1] http://tantek.com/presentations/2004etech/realworldsemanticspres.html
[2] http://microformats.org/wiki/namespaces-considered-harmful
[3] http://microformats.org/wiki/principles
[4] http://gmpg.org/xmdp/description
[5] http://microformats.org/profile/hcard
[6] http://wordpress.com/
Re: HTML5/RDFa Arguments
I would like to contrast a contrarian voice against Tantek’s (perhaps understandable) unchecked positivity about microformats:
Gordon Luk: A Warning About the Real Cost of Microformats
This is a real-world perspective from an adopter who discovered that things aren’t necessarily roses and sunshine in practice.
The upshot is that microformats are no panacea. They are useful in some classes of problem and not so much in others – just like any other meta-/data model.
Re: HTML5/RDFa Arguments
On the optional URI for the profile, notwithstanding issues of politics, you could see some possibility for consensus in these discussions around syntax if you were, say, to throw out the reverse DNS from microdata, but allow something like Ben Adida is proposing here, which is the same basic idea that Tantek is explaining of assigning a URI-based scope to property tokens.
I also think the RDFa “about” attribute is pretty important, and don’t much like having to create a separate empty link element just to encode that information.
Re: HTML5/RDFa Arguments
To preface, I regret not paying much attention in the past to the development of this technology. Expect more input from me going forward.
It is of supreme importance to me to be able to guarantee my customers and colleagues a high standard of clarity and conceptual integrity in the work that I do.
I wish to be able to enrich messages originally intended for people by weaving into them a side channel of structured data intended for computers. I have a multitude of reasons for wanting to do this, and I envision the results of such a capability not only consumed by myself and my own interests, but by those whom I serve.
Over approximately five years I have learned about the Semantic Web on an incremental and practical basis. While I do not profess to be an expert, I recognize it has serious, albeit soluble, flaws. I also recognize that the system and community have evolved to surmount them. In my opinion, RDF is a net-positive extant step in the evolution of the Web, and RDFa satisfies my aforementioned objectives in a fashion superior to the others I have had the opportunity to examine.
I agree completely with Mr. Bray's assessment that designing a vocabulary is hard, as those members of the WHATWG must undoubtedly be experiencing. I submit that this was the most important lesson the journey through XML and RDF has taught us.
The design of a vocabulary directly reflects our concept of a problem space — it practically bares our belief structures. It also renders patent what we haven't given sufficient thought. It is because of this phenomenon that I find it important to support an abstract structure and symbol space that affords learning, mingling and the making of (hopefully private) mistakes.
Topic Maps
Heh, you make us Topic Mappers sound like old geezers who pine for the good ol’ days of Turbo Pascal. Ahem.
What is very confusing for my little brain to grasp is how come you can’t have both? Are there some underlying “big reasons” for why namespaces - which are dominant in any modern incarnation of XML - should be missing from the way forward? Did one of the HTML5 guys have a bad namespace experience when he was younger or something?
Now, I’m not a big fan of RDF and namespaces to begin with, but I think that stems from the amount of them you need to make any sense. But with a bit of clever namespace design, you can make elegance happen, which would not be possible in HTML5. And here I was thinking they were on the right track and making it look good and all.
But then, I live in my Topic Maps bubble, and I’m trying to understand what you two (HTML5 vs. RDF) guys are really worrying over, without much success. :)
Re: Topic Maps
Ian seems pretty clearly set against any kind of prefix mechanism, and XML namespaces in particular have problems in HTML. So this is the crux of the conflict: microdata allows properties to be denoted using a simple token, a full URI, or a reverse DNS identifier, while RDFa accepts only a prefixed CURIE.
Re: Topic Maps
By “problems with namespaces in HTML”, do you mean that anything namespacy is ignored in HTML? Not sure it’s a problem with what we want to do as opposed to what we have done. :) I personally don’t understand why we can’t just use namespaces in HTML even if the standard doesn’t support it. For all intents and purposes of backwards compatibility, surely someone could put together HTML 4.02 for this purpose, and just leave the ignored namespaces back in the 90’s where they belong?
Thanks for pointing out the CURIE part, though. I’m not actually a fan of CURIE, at least not in its current form where you can have valid empty CURIEs, and I miss the full-fledged template nature of, say, URI Templates.
I used to be into microformats myself, until I realized how quickly they become useless as the number of namespaces and the amount of metadata grow.
Re: HTML5/RDFa Arguments
Except that you have forgotten the folks who are now looking at HTML5 and the microdata section, and are confused about why the W3C has promoted RDF all these years, and now, all of a sudden, introduces something completely new, that isn’t backwards compatible, that doesn’t work with any of the RDF or RDFa tools, and says, oh, well this is OK, too.
Microdata was hacked together in a few days by an author who was irritated that his judgement was being questioned. It has never been challenged, it has never really been tested, there are no tools for it, and frankly, none of the browser companies are paying any attention to it. Even the author has stated he doesn’t like it.
But it hangs on, causing confusion, leaving people unsure of where they are supposed to put their effort and energy. You may not see this as a bad situation, but I, and I think others, do.
Re: HTML5/RDFa Arguments
Hi Shelley,
No, I haven’t forgotten about Microdata.
When you’re designing something, the decisions you make are choices, based on your priorities, which are consequences of your experience, your beliefs, your hopes, your fears. Other people will have different priorities.
Microdata is a moderately reasonable design from someone whose priorities don’t revolve around the Semantic Web. It’s what you get if you only grudgingly accept the requirement for embedding data within web pages, discard the criterion that the data model must support (the entirety of) RDF and the criterion that properties must be resolvable, and you include criteria around ease of use for people who are pretty technologically illiterate. Those aren’t quite my priorities, but I certainly understand them.
Another choice you make is whether to design based on your own priorities, or to try to take into account other people’s. A design based on a single coherent set of priorities will be more internally consistent than one based on a variety of conflicting priorities. That’s just the way it is. The downside is that it also causes conflict with other people with different priorities. But this is only a disadvantage if you don’t like conflict.
If you look at Ian Hickson’s life policies you’ll see:
So with HTML5, we have a situation where the Designer has decided not to design by consensus, and has very different priorities from the members of the Semantic Web community. The benefit is a certain level of consistency of approach with HTML5 (as much as that’s possible with the legacy constraint). The cost is that the Semantic Web community, indeed any community with different priorities from the Designer, can have very little input into that design, and this causes a whole load of conflict. But this actually only disadvantages those people who, unlike Ian Hickson, do not find debate enjoyable for the sake of it.
(Personally I find this my-opinion-above-all approach to the design process abhorrent. I would show some humility and sacrifice some consistency to avoid the divisions, conflict and confusion that this approach causes.)
For those of us not the Designer, I’d liken the process to trying to break down a brick wall with your head. You can just charge against the wall if you like, but it actually just hurts; I’ve had enough of these emotionally charged and emotionally draining technical discussions to know. Better to step back a minute, examine the wall, think about why it’s been built the way it has. You may find a weak point, or discover that the wall is so strongly built that there is no way your head will make any kind of impact.
Talking specifically about Microdata, I absolutely agree with you that Microdata hasn’t received the level of critical review that it needs to have if it’s going to go forward. I think it should be dropped from the specification if it doesn’t receive the level of feedback and implementation that other parts of the HTML5 spec have. I don’t find it a problem in and of itself, though: over time the web will decide which of the many methods of embedding data in a page gives the biggest advantage with the lowest cost (currently I’d say that embedded JavaScript is winning).
In my opinion, the things that are likely to have an effect on the design of Microdata are:
Anyway, I have the greatest admiration for those of you who are banging your heads against the wall. But I do find it painful to watch.
Re: HTML5/RDFa Arguments
Couldn’t have put it better myself.
Re: HTML5/RDFa Arguments
The strangest and most confusing part of RDFa was when I put a full URI in a property attribute just as I'd done in the @about and @href and even an @resource... and the resulting document failed to parse. I'd read the CURIE spec (heck, I'd just implemented a parser for it... and a creator when I discovered this), I'd read the RDFa primer, and it really didn't dawn on me that I couldn't NOT use prefixed properties. RDF/XML has the same annoying limitation as well. On the other hand Microdata's most confusing moment was the realization that talking about the same object/item in two parts of a document isn't really possible, or I'm not smart enough to figure out how.
I'm not really convinced by the "but my RDF vocabularies might not be reachable" argument. In my case I've been adding RDFa (initially microdata) to the O'Reilly catalog pages. See first try, and 2nd try. From the top of the page I can list the resources out of our control that will cause the page to fail to render, not just fail to provide some more data.
That's all before the end of the header if we keep going...
Right now we've reached the footer, time for the really fun stuff:
The idea that URI-based metadata somehow made this page more dependent on the rest of the Web seems to ignore the current state of the web. The reasonably simple, slow-changing static files that tend to be at the end of most RDF vocab URIs don't really seem to add much danger. The web already doesn't work without the web; in just adding this comment I note that the page uses the wonderful reCAPTCHA service to avoid comment spam. The idea of an unconnected web doesn't seem like the future.
Re: HTML5/RDFa Arguments
Gavin,
I’m really glad that you’re trying out both RDFa and Microdata in the same, real-world pages. It’s exactly that kind of comparison that will illustrate the advantages and disadvantages of each, improve their designs, and help users choose which meets their needs. It’s precisely this kind of analysis that we need to move the situation forward.
I completely agree with you that the argument about the danger of referencing external ontologies seems weird considering the number of other resources that are linked to within a page and affect its appearance and behaviour. If you follow the comments on the post I made about Google’s RDFa support, Henri Sivonen explains the dangers as he sees them.
I think we need more experience of when, whether and how much ontology is useful, and what you have to do as an application writer, using the information embedded in HTML pages, to shield yourself from the possibility of ontologies going missing. We will be more persuasive if we say “yes, that could be a problem but this is how you mitigate it” than if we deny that the possibility exists.