overlapping markup

Partitioning overlapping markup

Wendell Piez forwarded me an interesting poster by Bert Van Elsacker on automatic fragmentation of overlapping structures. That’s taking something like:

<bold> this is bold <italic> and italic </bold> text </italic>

and turning it into something well-formed, like:

<bold> this is bold <italic> and italic </italic></bold><italic> text </italic>

When you do this, you have to decide which elements can be split and which can’t, and what their relative priorities are. Wendell suggested that perhaps Creole might help to do this. One thing I have been thinking about is using Creole to add annotations to markup (something like: you add attributes to the Creole patterns and they get copied on to the matched ranges, or are used to create new ranges). I haven’t done that yet, though, and actually I think you probably want a different kind of language to do it (a new kind of schema language, like James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you’re going to process them.
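To make the fragmentation step concrete, here is a minimal sketch in Python. It is not the algorithm from the poster, just the simplest policy I can think of: it assumes every element may be split, that tags carry no attributes, and that whichever element is closed "too early" wins, so anything opened inside it gets closed and then reopened. The function name `fragment` and the tag regex are made up for illustration.

```python
import re

# Hypothetical, simplified tag syntax: <name> or </name>, no attributes.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")

def fragment(text):
    """Make overlapping tags well-formed by splitting the inner elements."""
    out = []
    open_stack = []  # names of currently open elements, outermost first
    pos = 0
    for match in TAG.finditer(text):
        out.append(text[pos:match.start()])  # copy the text before this tag
        pos = match.end()
        is_close, name = match.group(1) == "/", match.group(2)
        if not is_close:
            open_stack.append(name)
            out.append(f"<{name}>")
            continue
        if name not in open_stack:
            raise ValueError(f"close tag </{name}> has no matching open tag")
        # Index of the innermost open element with this name.
        idx = len(open_stack) - 1 - open_stack[::-1].index(name)
        inner = open_stack[idx + 1:]  # elements opened after it, still open
        for inner_name in reversed(inner):  # close them, innermost first
            out.append(f"</{inner_name}>")
        out.append(f"</{name}>")
        for inner_name in inner:  # reopen them in their original order
            out.append(f"<{inner_name}>")
        del open_stack[idx]
    out.append(text[pos:])
    if open_stack:
        raise ValueError(f"unclosed elements: {open_stack}")
    return "".join(out)

overlapping = "<bold> this is bold <italic> and italic </bold> text </italic>"
print(fragment(overlapping))
```

Run on the example above, this splits the italic element at the point where bold closes. A real tool would need the decisions discussed here (which elements are allowed to split, and which take priority) instead of the single hard-wired policy in this sketch.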

A Creole by any other name...

Argh. I’ve been contacted by the guys at WikiCreole who want me to change the name of Creole. What should I do? Not only is “Creole” a great name for a schema language that deals with concurrent markup, but it’s a great acronym too (Composable regular expressions for overlapping languages etc.)

I did Google when I first came up with the name in August 2006, but didn’t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know about, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?
