XTech 2007: Wednesday 16th May Afternoon

Yes, I’m determined to write up every talk I attended at XTech 2007, so that I have a record of it if nothing else. On Wednesday afternoon, I attended sessions on microformats, internationalisation and NVDL (as well as giving my own talk, of course).

Microformats: the nanotechnology of the semantic web

Jeremy Keith

This was a supremely well-put-together presentation on microformats: beautiful slides, drama and humour, and a reference to Neal Stephenson’s Diamond Age (was I really one of only three people in the packed room to have read it?). There was a lot about what microformats are, how they’re designed, what their niche is (Jeremy was very up-front about the fact they don’t solve every problem), and how they’re developed. But there weren’t any demonstrations of microformat-based applications, which I would have really liked to see. The other thing I thought was worth noting was that Jeremy talked about the dangers of “grey goo” (he was using a nanotechnology metaphor): the proliferation of microformats. He expressed the strong desire that the set of microformats be kept small, and even said (I paraphrase) “Do use semantic class names in your HTML, but don’t call them microformats [unless they’ve been through the microformats standardisation process]!”

Liam Quin gave a paper entitled Microformats: Contaminants or Ingredients at Extreme last year, asking what we, as traditional markup geeks, should do about them. Some were very sceptical, saying something along the lines of “They’re headed for a trainwreck; and we should sit back, watch it happen, and pick up the pieces.” Others wanted to celebrate: the fact that tagging has become understood is really good news for the semantic web, open data and all that jazz.

Both the traditional markup and the microformats community have the same goals: they want to make information easier to search for, to query, to integrate and so on. The microformats approach is to minimise the cost to those supplying information, and to target just a few, very common, kinds of data such as contact information, events and social networks. Traditional markup, on the other hand, aims to cover every single kind of information you might want to make available, and has to worry about issues like validating, styling, and distinguishing between tag sets.

It seems that a fundamental problem is that the benefits of including semantic markup aren’t immediately obvious to the supplier. Whether you use semantic class names in HTML or use elements in known namespaces, it’s purely a matter of faith that this will make your information easier to locate or use. You can’t know that search engines will include that information in their weighting algorithms, or that people reading your page will have the screen-scraping software necessary to pull anything out. With so little (obvious) benefit, authors will only supply semantic data if the cost is low. Adding class names to existing HTML elements is easy whether a web page is generated by hand or automatically. Adding namespaces and authoring special CSS might not be that much more costly to do, but it’s much more costly to grok.
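To give a feel for the consuming side of that bargain, here is a minimal sketch of the kind of screen-scraping a microformats-aware tool might do, using only Python's standard library. It is not how any particular tool works: the three property names are a small subset of hCard picked for illustration, the sample markup is invented, and the parser assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
# A small, illustrative subset of hCard property class names (an assumption).
PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collect text from hCard-style class names inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per vcard element found
        self._stack = []       # (property name or None, is_vcard) per open element
        self._open_cards = 0   # how many vcard elements are currently open

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        is_card = "vcard" in classes
        if is_card:
            self.cards.append({})
            self._open_cards += 1
        prop = next((c for c in classes if c in PROPERTIES), None)
        self._stack.append((prop, is_card))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        _, is_card = self._stack.pop()
        if is_card:
            self._open_cards -= 1

    def handle_data(self, data):
        if not self._open_cards:
            return
        # Text belongs to the innermost open element that names a property.
        prop = next((p for p, _ in reversed(self._stack) if p), None)
        if prop:
            card = self.cards[-1]
            card[prop] = card.get(prop, "") + data


def extract_hcards(html):
    """Return a list of dicts, one per vcard, with whitespace normalised."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in c.items()} for c in parser.cards]


# Invented sample markup: ordinary HTML plus a few class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Example Person</span>,
  <span class="org">Example Org</span>
</div>
"""

if __name__ == "__main__":
    print(extract_hcards(SAMPLE))
```

The point of the sketch is how little the author has to do: the markup is plain HTML with class attributes, and all the work falls on whoever wants to consume it.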

So if we want authors to start putting elements in their own namespaces in their web pages, we need an application that immediately cranks up the benefit of doing so. I have no idea what that is.

Applying the Internationalization Tag Set

Yves Savourel

This was a good introduction to a standard, the Internationalization Tag Set (ITS), that I only knew about vaguely. It’s definitely worth knowing about the its:* attributes for defining i18n features, such as indicating which content should be translated, marking terms and providing comments for localisation, just in case you need to build those into new markup languages.
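As a rough illustration of the local markup, the sketch below collects the text a translator would be handed, honouring its:translate="yes"/"no" and letting the setting inherit down the tree. The sample document and the function name are my own inventions, and the sketch ignores attributes and every other ITS data category; it is meant only to show the idea, not to be a conforming ITS processor.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"

# Invented sample: a keyboard shortcut that should not be translated.
SAMPLE = """
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <p>Press <code its:translate="no">Ctrl+S</code> to save.</p>
</doc>
"""


def translatable_text(elem, inherited=True):
    """Yield the text chunks under elem that are marked as translatable.

    Element content is treated as translatable by default; a local
    its:translate attribute overrides the value inherited from the parent.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, translate)
        # Tail text sits inside the parent, so the parent's setting applies.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(list(translatable_text(root)))
```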

I also have much admiration for how the ITS standard doesn’t expect people to completely rework their markup languages to incorporate ITS data. Instead of using the ITS attributes directly in a document, you can use global rules embedded in the document itself, referenced from the document, or embedded in the schema for the document. I think this approach will prove useful in the development of LIX, when we get around to formalising it.
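The global-rules idea can be sketched in the same spirit. Here an its:rules element embedded in an invented sample document says that every code element is untranslatable, so the content itself carries no ITS attributes at all. Real ITS selectors are full XPath expressions; because ElementTree only supports a subset of XPath, this sketch handles nothing beyond selectors of the form //name, which is an assumption made purely to keep the example within the standard library.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Invented sample: the rule lives in the document, the content is untouched.
SAMPLE = """
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Press <code>Ctrl+S</code> to save.</p>
</doc>
"""


def untranslatable_elements(root):
    """Return the elements that embedded translateRule elements mark as translate="no"."""
    marked = []
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        selector = rule.get("selector", "")
        # ElementTree rejects absolute paths on an element, so "//x" is
        # rewritten to ".//x"; any other selector shape is skipped here.
        if selector.startswith("//"):
            marked.extend(root.findall("." + selector))
    return marked


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print([e.text for e in untranslatable_elements(root)])
```

The same rules could equally sit in a separate file or in the schema, which is what makes the approach so unintrusive for existing markup languages.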

NVDL - a breath of fresh air for compound document validation

Jirka Kosek & Petr Nálevka

NVDL is Part 4 of DSDL, specifically targeted at organising the validation of documents that incorporate multiple namespaces, such as XHTML documents containing islands of SVG, RDF and MathML. NVDL’s approach is to identify subtrees within the document that need to be validated against a particular schema. The subtrees don’t need to only hold one namespace, but often that will be the case.
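A very simplified sketch of the subtree idea follows. It is not NVDL and not how JNVDL works: it just walks a document, starts a new section wherever an element's namespace differs from its parent's, and looks up which schema would be used for that section. The schema file names in the mapping are hypothetical, and no actual validation is performed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical namespace-to-schema mapping; the file names are placeholders.
SCHEMAS = {XHTML_NS: "xhtml.rng", SVG_NS: "svg.rng"}

SAMPLE = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg"><circle r="10"/></svg>
  </body>
</html>
"""


def namespace_of(elem):
    """Return the namespace URI from an ElementTree '{uri}local' tag, or ''."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def sections(elem, parent_ns=None):
    """Yield (namespace, element) wherever the namespace changes, in document order."""
    ns = namespace_of(elem)
    if ns != parent_ns:
        yield ns, elem
    for child in elem:
        yield from sections(child, ns)


def plan_validation(root, schemas):
    """Pair the local name of each section's root element with its schema (or None)."""
    return [(elem.tag.rsplit("}", 1)[-1], schemas.get(ns))
            for ns, elem in sections(root)]


if __name__ == "__main__":
    print(plan_validation(ET.fromstring(SAMPLE), SCHEMAS))
```

In this sketch a section always holds a single namespace; as noted above, NVDL itself is more flexible than that.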

The XML Schema wonks in the room (Henry Thompson and Michael Sperberg-McQueen) were a bit befuddled, I think, because with XML Schema you just supply a whole bunch of schema documents to the processor, for different namespaces, and as long as the schemas contain wildcards they’ll do the right thing. The concept of supplying multiple schemas to a validator isn’t part of RELAX NG’s validation approach, so you need something like NVDL if you don’t want to rework your schema for every combination of namespaces.

Henry and Michael were particularly concerned about the fact that it means you can override the original schema, allowing elements from foreign namespaces in situations where the original schema hasn’t allowed them. But as Henry said, it just means that the primary schema you use to define what’s allowed where is actually an NVDL schema: it’s not auxiliary validation like Schematron is, but a language for the primary schema you use.

Later, I wondered how much the XProc work would render NVDL irrelevant. After all, XProc can invoke validation of subtrees against multiple external schemas. On the other hand, NVDL’s syntax is going to be easier to use if that’s all you want to do. Perhaps someone will write a tool to convert NVDL schemas to XProc pipelines…

Actually, Jirka & Petr’s experience with JNVDL is interesting from the XProc viewpoint, in particular the problems that they had with reporting meaningful line numbers when validating subtrees. Something that XProc implementers might want to look at in regard to error reporting with <p:viewport>.

Comments

Re: XTech 2007: Wednesday 16th May Afternoon

Hi Jeni,

I think the basis of the misunderstanding between NVDL and XML Schema folks is really very simple: one side likes loosely coupled systems and approaches, the other tightly coupled ones. As you say, if you use xsi:schemaLocation with multiple schemas with wildcards, you can validate compound documents. But this means that the authors of the original schemas had your usage scenario in mind, that you believe that after 20 years you will still be using W3C XML Schema for validation (and thus will not mind xsi:schemaLocation in your instances), and that you are always validating your document against one fixed set of schemas.

Of course I (and many other people involved in NVDL) do not believe this. In general, the schema should not be specified in the instance, because validation technologies evolve over time, and you probably do not want to edit your document just to point to a new and better schema. Moreover, I quite often find it useful to validate a document against different schemas to check different constraints for different purposes. And finally, with NVDL you can create compound documents using XML vocabularies which were not originally developed for such purposes (or at least whose schemas were not written with such a usage scenario in mind). IMHO this is a very pragmatic approach, as almost no one can imagine and foresee all possible uses of a schema he or she has developed.

Sorry for the short blurb, but an alpha version of the web front-end for JNVDL was just launched at http://relaxed.vse.cz/nextgeneration/

Regarding XProc: You can certainly simulate NVDL behaviour with XProc, but I suppose that in almost all cases the XProc code will be much longer and more procedural than the corresponding, very declarative, NVDL code. But this is the case for specialized languages like NVDL.

But I think that XProc has much greater overlap with DSDL Part 10 (Validation management) than with NVDL. It may even turn out that XProc is sufficient for Part 10 and will simply be adopted for it.

I’m thinking about “CreoNVDL”: something like NVDL operating on overlapping markup. The NVDL approach could be very useful for orthogonal vocabularies, like XHTML/DocBook/SomeOfficeXML + revision tracking + comments. I think that for certain use cases this could be more effective than the somewhat complex content models you can get with Creole (or at least complex for someone who is not yet very used to the “overlapping” idea).

Anyway, I was glad that I could meet you in person during XTech.

Jirka

Re: XTech 2007: Wednesday 16th May Afternoon

I think you’re misrepresenting XSD a bit. There’s no requirement for tight-coupling in systems that use XML Schema. They provide the xsi:schemaLocation attribute if you want to specify the schemas that can be used to validate a document, but that certainly doesn’t preclude you from validating that document against other schemas.

The way XSD developers think is that you have a pool of schemas that are associated with namespaces, and in each of them you provide wildcards that indicate where content from other namespaces can go (and what kind of validation you want for that content). If you want your XHTML to contain RDF, SVG and MathML then make sure you design your XHTML with wildcards to indicate where elements from foreign namespaces can go, and provide the pool of XHTML, RDF, SVG and MathML schemas to the validator.

I’ll have to blog separately about the difference between Creole’s and NVDL’s approaches…