This post was imported from my old Drupal blog. To see the full thing, including comments, it's best to visit the Internet Archive.
Yes, I’m determined to write up every talk I attended at XTech 2007, so that I have a record of it if nothing else. On Wednesday afternoon, I attended sessions on microformats, internationalisation and NVDL (as well as giving my own talk, of course).
This was a supremely well-put-together presentation on microformats: beautiful slides, drama and humour, and a reference to Neal Stephenson’s Diamond Age (was I really one of only three people in the packed room to have read it?). There was a lot about what microformats are, how they’re designed, what their niche is (Jeremy was very up-front about the fact they don’t solve every problem), and how they’re developed. But there weren’t any demonstrations of microformat-based applications, which I would have really liked to see. The other thing I thought was worth noting was that Jeremy talked about the dangers of “grey goo” (he was using a nanotechnology metaphor): the proliferation of microformats. He expressed the strong desire that the set of microformats be kept small, and even said (I paraphrase) “Do use semantic class names in your HTML, but don’t call them microformats [unless they’ve been through the microformats standardisation process]!”
Liam Quin gave a paper entitled Microformats: Contaminants or Ingredients at Extreme last year, asking what we, as traditional markup geeks, should do about them. Some were very sceptical, saying something along the lines of “They’re headed for a trainwreck; we should sit back, watch it happen, and pick up the pieces.” Others wanted to celebrate: the fact that tagging has become widely understood is really good news for the semantic web, open data and all that jazz.
Both the traditional markup and the microformats community have the same goals: they want to make information easier to search for, to query, to integrate and so on. The microformats approach is to minimise the cost to those supplying information, and to target just a few, very common, kinds of data such as contact information, events and social networks. Traditional markup, on the other hand, aims to cover every single kind of information you might want to make available, and has to worry about issues like validating, styling, and distinguishing between tag sets.
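To make the contrast concrete, here’s roughly what the microformats approach looks like in practice: a minimal hCard, marking up contact information purely with agreed class names on ordinary HTML elements (the name and URL are invented for illustration):

```html
<!-- hCard: contact information expressed with agreed class names -->
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
  <a class="url" href="http://example.org/">example.org</a>
</div>
```

No new elements, no namespaces: a parser that knows the hCard vocabulary can extract a contact record, and every other consumer just sees HTML.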
It seems that a fundamental problem is that the benefits of including semantic markup aren’t immediately obvious to the supplier. Whether you use semantic class names in HTML or use elements in known namespaces, it’s purely a matter of faith that this will make your information easier to locate or use. You can’t know that search engines will include that information in their weighting algorithms, or that people reading your page will have the screen-scraping software necessary to pull anything out. With so little (obvious) benefit, authors will only supply semantic data if the cost is low. Adding class names to existing HTML elements is easy whether a web page is generated by hand or automatically. Adding namespaces and authoring special CSS might not be that much more costly to do, but it’s much more costly to grok.
So if we want authors to start putting elements in their own namespaces in their web pages, we need an application that immediately cranks up the benefit of doing so. I have no idea what that is.
This was a good introduction to ITS, a standard I only knew about vaguely. It’s definitely worth knowing about the its:* attributes for defining i18n features, such as indicating which content should be translated, marking terms, providing comments for localisation and so on, just in case you need to build those in to new markup languages.
I also have much admiration for how the ITS standard doesn’t expect people to completely rework their markup languages to incorporate ITS data. Instead of using the ITS attributes directly in a document, you can use global rules embedded in the document itself, referenced from the document, or embedded in the schema for the document. I think this approach will prove useful in the development of LIX, when we get around to formalising it.
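For flavour, here’s a small sketch of the two styles, with element and attribute names as in ITS 1.0 but an invented document vocabulary: a local its:translate attribute on one element, and a global rule that covers every matching element from a single place:

```xml
<book xmlns:its="http://www.w3.org/2005/11/its">
  <!-- global rule: nothing inside a <code> element should be translated -->
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <para>
    Run <code its:translate="no">configure</code> before building.
    <!-- the local attribute here is redundant given the global rule,
         but shows the in-place style -->
  </para>
</book>
```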
NVDL is Part 4 of DSDL, specifically targeted at organising the validation of documents that incorporate multiple namespaces, such as XHTML documents containing islands of SVG, RDF and MathML. NVDL’s approach is to identify subtrees within the document that need to be validated against a particular schema. A subtree need not be confined to a single namespace, though it often will be.
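A minimal NVDL script for the XHTML-plus-SVG case might look something like this (the schema file names are placeholders):

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- validate XHTML subtrees against the XHTML RELAX NG schema -->
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
  <!-- SVG islands are detached and validated separately -->
  <namespace ns="http://www.w3.org/2000/svg">
    <validate schema="svg.rng"/>
  </namespace>
  <!-- anything else (RDF metadata, say) is allowed through unchecked -->
  <anyNamespace>
    <allow/>
  </anyNamespace>
</rules>
```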
The XML Schema wonks in the room (Henry Thompson and Michael Sperberg-McQueen) were a bit befuddled, I think, because with XML Schema you just supply a whole bunch of schema documents to the processor, for different namespaces, and as long as the schemas contain wildcards they’ll do the right thing. The concept of supplying multiple schemas to a validator isn’t part of RELAX NG’s validation approach, so you need something like NVDL if you don’t want to rework your schema for every combination of namespaces.
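The XML Schema idiom they had in mind is the wildcard: a content model that explicitly leaves room for elements from other namespaces, validated laxly against whatever schemas the processor happens to have been given. Roughly:

```xml
<xs:element name="head" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType>
    <xs:sequence>
      <!-- allow elements from any other namespace; validate them
           if a schema for that namespace is available -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```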
Henry and Michael were particularly concerned about the fact that this lets you override the original schema, allowing elements from foreign namespaces in places where the original schema hasn’t allowed them. But as Henry said, it just means that the schema defining what’s allowed where is actually an NVDL schema: not an auxiliary validation layer like Schematron, but the language for your primary schema.
Later, I wondered how much the XProc work would render NVDL irrelevant. After all, XProc can invoke validation of subtrees against multiple external schemas. On the other hand, NVDL’s syntax is going to be easier to use if that’s all you want to do. Perhaps someone will write a tool to convert NVDL schemas to XProc pipelines…
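To sketch what such a conversion might produce (step and namespace names as in the XProc drafts; the schema location is a placeholder): a p:viewport selects each SVG island and validates it in place with RELAX NG.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:svg="http://www.w3.org/2000/svg" version="1.0">
  <!-- visit each SVG island and validate it against the SVG schema -->
  <p:viewport match="svg:svg">
    <p:validate-with-relax-ng>
      <p:input port="schema">
        <p:document href="svg.rng"/>
      </p:input>
    </p:validate-with-relax-ng>
  </p:viewport>
</p:pipeline>
```

That’s noticeably more ceremony than the NVDL equivalent, which rather supports the point about NVDL’s syntax being easier when validation is all you want.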
Actually, Jirka & Petr’s experience with JNVDL is interesting from the XProc viewpoint, in particular the problems they had with reporting meaningful line numbers when validating subtrees. That’s something XProc implementers might want to look at in regard to error reporting with