I’ve spent the last few days at a workshop on overlapping markup in Amsterdam. It was organised by Claus Huitfeldt and Michael Sperberg-McQueen under a GODDAG banner, but included representatives of other approaches, such as the XCONCUR crowd and the LMNL-istas, Wendell and myself.
So first there was the XML Summer School. This year was my sixth, and it was really great to hang out with chums old and new. I love that I left feeling not only invigorated and inspired, but also a part of a fun and friendly community.
Wendell Piez forwarded me an interesting poster by Bert Van Elsacker on automatic fragmentation of overlapping structures. That’s taking something like:
<bold> this is bold <italic> and italic </bold> text </italic>
and turning it into something well-formed, like:
<bold> this is bold <italic> and italic </italic></bold><italic> text </italic>
When you do this, you have to decide which elements can be split and which can’t, and their relative priorities. Wendell suggested that perhaps Creole might help to do this. One thing I have been thinking about is using Creole to add annotations to markup (something like: you add attributes to the Creole patterns and they get copied on to the matched ranges, or are used to create new ranges), but I haven’t done that yet. Actually, I think you probably want a different kind of language to do it (a new kind of schema language, like James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you’re going to process them.
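To make the fragmentation step concrete, here is a minimal sketch in Python. It is not the algorithm from the poster, and it has nothing to do with Creole: it assumes the input has already been tokenised into `(kind, value)` pairs (a made-up representation), and it treats every element as splittable with no priorities at all, which is exactly the decision a real tool would need to let you control.

```python
def fragment(tokens):
    """Rewrite a tag stream with overlapping elements as a well-formed string.

    `tokens` is a list of (kind, value) pairs, where kind is "open",
    "close" or "text". This token format is hypothetical, chosen only
    to keep the sketch short.
    """
    out = []
    stack = []  # names of the currently open elements, outermost first
    for kind, value in tokens:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "close":
            if value not in stack:
                raise ValueError(f"closing <{value}> which is not open")
            # Close everything opened after `value`, innermost first,
            # remembering what was closed so it can be reopened.
            interrupted = []
            while stack[-1] != value:
                inner = stack.pop()
                out.append(f"</{inner}>")
                interrupted.append(inner)
            stack.pop()
            out.append(f"</{value}>")
            # Reopen the interrupted elements in their original order.
            for inner in reversed(interrupted):
                stack.append(inner)
                out.append(f"<{inner}>")
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return "".join(out)


tokens = [
    ("open", "bold"), ("text", " this is bold "),
    ("open", "italic"), ("text", " and italic "),
    ("close", "bold"), ("text", " text "),
    ("close", "italic"),
]
print(fragment(tokens))
```

Run on the bold/italic example, this splits the italic element at the point where bold closes. A version that respected unsplittable elements would have to choose a different element to break (or give up) at the point where this one blindly closes and reopens whatever is in the way.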
Henry Thompson had a lot to say after my Creole presentation (open takahashi.xul?data=creole.data; requires Firefox) about the benefits of stand-off markup for linguistic information. From his overview, it seems that the NITE XML Toolkit that he’s been involved with represents overlapping linguistic data by holding atoms (here meaning the “lowest common denominator” shared pieces of data) and having multiple trees marking up these atoms. The trees are independently validated (since they are pure XML), with cross-hierarchy validation done through the query language. This is pretty similar to the XCONCUR approach, which augments a CONCUR-like multi-grammar validation with a Schematron-like constraint language.
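The general shape of that stand-off idea can be sketched in a few lines of Python. To be clear, this is not the NITE XML Toolkit’s data model, API or query language; the layer format and function names are invented for illustration. It just shows shared atoms, layers that are each checked on their own, and a separate query that looks across layers.

```python
# Shared "atoms": the lowest-common-denominator pieces every layer points at.
atoms = ["this", "is", "bold", "and", "italic", "text"]

# Each layer is a list of (name, start, end) ranges over atom indices,
# with `end` exclusive. Both layers here are hypothetical examples.
bold_layer = [("bold", 0, 5)]
italic_layer = [("italic", 3, 6)]


def is_hierarchy(ranges):
    """Return True if no two ranges partially overlap, so they form a tree."""
    for i, (_, s1, e1) in enumerate(ranges):
        for _, s2, e2 in ranges[i + 1:]:
            if s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1:
                return False
    return True


def overlaps(atoms, layer_a, layer_b):
    """Cross-layer query: which ranges from the two layers share atoms?"""
    found = []
    for name_a, s1, e1 in layer_a:
        for name_b, s2, e2 in layer_b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                found.append((name_a, name_b, atoms[start:end]))
    return found


print(is_hierarchy(bold_layer), is_hierarchy(italic_layer))
print(is_hierarchy(bold_layer + italic_layer))
print(overlaps(atoms, bold_layer, italic_layer))
```

Each layer passes the tree check by itself, while the two merged together do not, which is why the layers are kept apart and any constraint between them has to be expressed as a query rather than as ordinary grammar-based validation.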
Argh. I’ve been contacted by the guys at WikiCreole who want me to change the name of Creole. What should I do? Not only is “Creole” a great name for a schema language that deals with concurrent markup, but it’s a great acronym too (Composable regular expressions for overlapping languages etc.)
I did Google when I first came up with the name in August 2006, but didn’t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?