In my last post I talked about different techniques for representing overlap within XML. One technique is fragmentation. In the work that I’ve been doing, I’ve been using milestone-based formats similar to ECLIX, but my eyes were opened at the GODDAG workshop: fragmentation would make overlap so much easier to process in XSLT, especially when dealing with localised overlap such as revision or comment markup.
But how could fragmentation be used with full-on overlap? I had a little play and came up with some XSLT to demonstrate.
I’m still on an overlap jag. I’ve shown some examples in the last couple of posts of TexMECS, XCONCUR and LMNL syntax, which depart from the usual well-formedness strictures in XML. But these syntaxes have one big problem: they’re not XML. XML is well-known, well-understood, and has great tools available for it, for querying, transforming, and pipelining. So it would be a real win if overlap could be represented within XML in a usable manner.
In my last post I discussed the kinds of situations where overlapping markup can appear in documents, and the distinction between containment, when one element happens to contain another, and dominance, where the relationship between the two elements is more meaningful. Here I’ll expand a bit more on the issue of whether dominance relationships are or should be part of the essential information in the document.
I’ve spent the last few days at a workshop on overlapping markup in Amsterdam. It was organised by Claus Huitfeldt and Michael Sperberg-McQueen under a GODDAG banner, but included representatives of other approaches, such as the XCONCUR crowd and the LMNListas Wendell and myself.
Wendell Piez forwarded me an interesting poster by Bert Van Elsacker on automatic fragmentation of overlapping structures. That’s taking something like:
<bold> this is bold <italic> and italic </bold> text </italic>
and turning it into something well-formed, like:
<bold> this is bold <italic> and italic </italic></bold><italic> text </italic>
When you do this, you have to decide which elements can be split and which can’t, and their relative priorities. Wendell suggested that perhaps Creole might help to do this. What I have been thinking about is using Creole to add annotations to markup (roughly: you add attributes to the Creole patterns and they get copied on to the matched ranges, or are used to create new ranges), but I haven’t done that yet. Actually, I think you probably want a different kind of language to do it (a new kind of schema language, as James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you’re going to process them.
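To make the fragmentation step concrete, here is a minimal Python sketch of one way it could work. It is not the method from the poster, and the `fragment` function and `TOKEN` pattern are names made up for illustration. It assumes attribute-free tags and a fixed policy: whichever element was opened most recently is the one that gets split. A real tool would need the per-element splitting rules and priorities discussed above.

```python
import re

# Hypothetical tokenizer: attribute-free start/end tags, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


def fragment(markup):
    """Rewrite overlapping start/end tags into well-formed fragments."""
    out = []
    stack = []  # names of currently open elements, innermost last
    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if text is not None:
            out.append(text)
        elif not closing:
            stack.append(name)
            out.append(f"<{name}>")
        else:
            if name not in stack:
                raise ValueError(f"end tag </{name}> has no matching start tag")
            # Close everything opened after `name`, innermost first.
            interrupted = []
            while stack[-1] != name:
                inner = stack.pop()
                out.append(f"</{inner}>")
                interrupted.append(inner)
            stack.pop()
            out.append(f"</{name}>")
            # Reopen the interrupted elements in their original order.
            for inner in reversed(interrupted):
                stack.append(inner)
                out.append(f"<{inner}>")
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return "".join(out)


overlapping = "<bold> this is bold <italic> and italic </bold> text </italic>"
print(fragment(overlapping))
```

Run on the overlapping example above, this should produce the same well-formed version shown earlier. Note that the sketch throws away the fact that the two italic fragments were originally one range; a usable format would probably need to link the pieces together, for instance with a shared identifier.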
Argh. I’ve been contacted by the guys at WikiCreole who want me to change the name of Creole. What should I do? Not only is “Creole” a great name for a schema language that deals with concurrent markup, but it’s a great acronym too (Composable regular expressions for overlapping languages etc.)
I did Google when I first came up with the name in August 2006, but didn’t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know about, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?