<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>creole</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/7</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Overlap, Containment and Dominance</title>
 <link>http://www.jenitennison.com/blog/node/95</link>
 <description>&lt;p&gt;I&amp;#8217;ve spent the last few days at a &lt;a href=&quot;http://ilps.science.uva.nl/PoliticalMashup/2008/11/workshop-on-multi-dimensional-markup/&quot; title=&quot;Workshop on multi dimensional markup&quot;&gt;workshop on overlapping markup&lt;/a&gt; in Amsterdam. It was organised by &lt;a href=&quot;http://www.hf.uib.no/i/Filosofisk/claus/&quot; title=&quot;Claus Huitfeldt&quot;&gt;Claus Huitfeldt&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/People/cmsmcq/&quot; title=&quot;Michael Sperberg-McQueen&quot;&gt;Michael Sperberg-McQueen&lt;/a&gt; under a GODDAG banner, but included representatives of other approaches, such as the &lt;a href=&quot;http://www.xconcur.org/&quot; title=&quot;XCONCUR&quot;&gt;XCONCUR crowd&lt;/a&gt; and the &lt;a href=&quot;http://www.lmnl.org/wiki/&quot; title=&quot;LMNL Wiki&quot;&gt;LMNListas&lt;/a&gt; &lt;a href=&quot;http://www.piez.org/wendell/&quot; title=&quot;Wendell Piez&quot;&gt;Wendell&lt;/a&gt; and myself.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Overlap is arguably the main remaining problem area for markup technologists. Capturing and analysing the overlap between poetic and syntactic structures in poems and plays helps academics gain a deeper understanding of the ways poetic technique has changed over time. And the complexities of structures in documents such as the Bible simply cannot be represented without allowing overlap to happen.&lt;/p&gt;

&lt;p&gt;But academic study aside, overlap is a really important problem because whenever we collaborate on documents and whenever we change documents, we create overlapping structures. One of the major projects that I&amp;#8217;ve worked on at &lt;a href=&quot;http://www.tso.co.uk/&quot; title=&quot;The Stationery Office&quot;&gt;TSO&lt;/a&gt; deals with publishing &lt;a href=&quot;http://www.opsi.gov.uk/legislation/revised&quot; title=&quot;OPSI: Revised Legislation&quot;&gt;consolidated legislation&lt;/a&gt;, showing the places where &amp;#8220;current&amp;#8221; legislation was amended over time from its original, enacted state. The authors of legislation care little for document structures, and amendments often overlap document structures such as paragraphs and list items, and each other.&lt;/p&gt;

&lt;h2&gt;An Example&lt;/h2&gt;

&lt;p&gt;I used the following example in my talk on the &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/Creole&quot; title=&quot;Creole Schema Language&quot;&gt;Creole&lt;/a&gt; schema language at the workshop. It uses &lt;a href=&quot;http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html&quot; title=&quot;TexMECS&quot;&gt;TexMECS&lt;/a&gt; notation, in which &lt;code&gt;&amp;lt;name|&lt;/code&gt; is a start tag, &lt;code&gt;|name&amp;gt;&lt;/code&gt; is an end tag and attributes use the normal XML syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;page n=&quot;199&quot;|
...
&amp;lt;poem|
  &amp;lt;title|&amp;lt;pl|Recueillement|pl&amp;gt;|title&amp;gt;
  &amp;lt;stanza|
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Sois sage, ô ma douleur, et tiens-toi plus |pl&amp;gt;
                                        &amp;lt;pl|tranquille.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Tu réclamais le Soir; il descend; le voici:|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Une atmosphère obscure enveloppe la ville,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Aux uns portant la paix, aux autres le souci.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
  |stanza&amp;gt;
  &amp;lt;stanza|
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Pendant que des mortels la multitude vile,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Sous le fouet du Plaisir, ce bourreau sans merci,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Va cueillir des remords dans la fête servile,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Ma douleur, donne-moi la main; viens par ici,|pl&amp;gt;|sl&amp;gt;
  |stanza&amp;gt;
  &amp;lt;stanza|
    &amp;lt;sl|&amp;lt;pl|Loin d&#039;eux.|s&amp;gt; &amp;lt;s|Vois se pencher les défuntes |pl&amp;gt;
                                                &amp;lt;pl|Années,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Sur les balcons du ciel, en robes surannées;|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Surgir du fond des eaux le Regret souriant;|pl&amp;gt;|sl&amp;gt;
  |stanza&amp;gt;|page&amp;gt;
  &amp;lt;page n=&quot;200&quot;|&amp;lt;stanza|
    &amp;lt;sl|&amp;lt;pl|Le Soleil moribond s&#039;endormir sous une arche,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Et, comme un long linceul traînant à l&#039;Orient,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Entends, ma chère, entends la douce Nuit qui |pl&amp;gt;
                                              &amp;lt;pl|marche.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
  |stanza&amp;gt;
|poem&amp;gt;
...
|page&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The start and end tags mark &lt;em&gt;ranges&lt;/em&gt; in the text. (In some discussions of overlap, the ranges are called &amp;#8220;elements&amp;#8221;, but I prefer to reserve that term for structures that are self-contained, such as those in XML, to avoid confusion.) In Creole&amp;#8217;s compact syntax, you could articulate the structure as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# a book is a sequence of pages; it is also a sequence of poems
start = element book { page+ ~ poem+ }

# a page is a sequence of page lines
page = range page { pl+ }

# a poem starts with a title; the body of the poem can be characterised
# as a sequence of stanzas, but also as a sequence of sentences
poem = range poem { title, ( stanza+ ~ s+ ) }

# a title is a self-contained structure that may contain several page lines
title = element title { pl+ }

# a stanza contains several stanza lines
stanza = range stanza { sl+ }

# a stanza line contains one or more page lines
sl = range sl { pl+ }

# a sentence contains some text
s = range s { text }

# a page line contains some text
pl = range pl { text }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You could go further: sentences are made up of phrases, which are made up of words, which are made up of syllables, which are made up of letters. Stanzas within a sonnet such as this one can be clustered into an octave and a sestet and classified as quatrains and tercets based on the number of lines they contain. Stanza lines are also made up of syllables. And so on. Analysing the way in which the syntactic (sentence/phrase) structure overlaps with the prosodic (stanza/line) structure is one important way in which you can &lt;a href=&quot;http://www.tau.ac.il/~tsurxx/Recueillement.html&quot; title=&quot;Archetypal Pattern in Baudelaire&#039;s &#039;Recueillement&#039;&quot;&gt;analyse a poem&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Containment vs Dominance&lt;/h2&gt;

&lt;p&gt;When you&amp;#8217;re talking about overlapping structures, it&amp;#8217;s useful to distinguish between structures that &lt;em&gt;contain&lt;/em&gt; each other and structures that &lt;em&gt;dominate&lt;/em&gt; each other. Containment is a happenstance relationship between ranges, while dominance is one that carries meaning. A page may happen to &lt;em&gt;contain&lt;/em&gt; a stanza, but a poem &lt;em&gt;dominates&lt;/em&gt; the stanzas that it contains.&lt;/p&gt;

&lt;p&gt;In LMNL, we view a document as consisting of a &lt;em&gt;sequence of atoms&lt;/em&gt;, usually characters, and ranges over those characters. But the model makes no assertions about dominance relationships between the ranges. This document model is easy to construct from a serialised document like the one above.&lt;/p&gt;
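
&lt;p&gt;As a rough illustration, here is a minimal Python sketch of such a containment-only model. It is not LMNL&amp;#8217;s actual data model or any existing API. The &lt;code&gt;Range&lt;/code&gt; and &lt;code&gt;Document&lt;/code&gt; classes and the character offsets are assumptions made for the example. Containment falls out of comparing offsets, and the containment it finds runs in both directions: one page line contains a sentence, while a sentence contains another page line. That is why containment alone says nothing about dominance.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named range over atoms; start is inclusive, end is exclusive."""
    name: str
    start: int
    end: int

    def contains(self, other: "Range") -> bool:
        # Containment is purely positional: it implies nothing about dominance.
        return other.start >= self.start and self.end >= other.end

    def overlaps(self, other: "Range") -> bool:
        # Overlap: the ranges share atoms but neither contains the other.
        shares_atoms = self.end > other.start and other.end > self.start
        return shares_atoms and not (self.contains(other) or other.contains(self))


@dataclass
class Document:
    """A flat model: a sequence of atoms (here, characters) plus ranges."""
    atoms: str
    ranges: list

    def text_of(self, r: Range) -> str:
        return self.atoms[r.start:r.end]

    def containment_pairs(self) -> set:
        # Every (container, contained) pair of range names in this document.
        return {(a.name, b.name)
                for a in self.ranges for b in self.ranges
                if a is not b and a.contains(b)}


# Hypothetical offsets for the stanza line that opens the third stanza.
atoms = "Loin d'eux. Vois se pencher les défuntes Années,"
split = atoms.index("Années")
pl1 = Range("pl", 0, split)
pl2 = Range("pl", split, len(atoms))
sl = Range("sl", 0, len(atoms))
s1 = Range("s", 0, atoms.index(".") + 1)
s2 = Range("s", atoms.index("Vois"), len(atoms))  # truncated at the line end
doc = Document(atoms, [pl1, pl2, sl, s1, s2])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here both &lt;code&gt;pl1.contains(s1)&lt;/code&gt; and &lt;code&gt;s2.contains(pl2)&lt;/code&gt; are true, and &lt;code&gt;s2.overlaps(pl1)&lt;/code&gt; is also true. As a result, &lt;code&gt;doc.containment_pairs()&lt;/code&gt; includes both &lt;code&gt;(&quot;pl&quot;, &quot;s&quot;)&lt;/code&gt; and &lt;code&gt;(&quot;s&quot;, &quot;pl&quot;)&lt;/code&gt;.&lt;/p&gt;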

&lt;p&gt;Conversely, &lt;a href=&quot;http://www.w3.org/People/cmsmcq/2000/poddp2000.html&quot; title=&quot;GODDAG: A Data Structure for Overlapping Hierarchies&quot;&gt;GODDAG document models&lt;/a&gt; are directed acyclic graphs (DAGs): the nodes within those graphs have children and parents, with leaf nodes containing characters, and the parent-child relationship implies dominance. This is a useful model for processing, and particularly querying. Navigating a DAG is a lot like navigating a tree, just one that represents multiple hierarchies. But it isn&amp;#8217;t possible to construct a DAG from a serialised document like the one above without extra information about which containment relationships are actually dominance relationships, and which are mere happenstance.&lt;/p&gt;

&lt;p&gt;So an important challenge is how to get from a flat, containment-only model to a DAG. There are four approaches that can be taken:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For any document, for each pair of range names A and B, if every range named A contains a range named B, then assume that A dominates B; from that set of relationships, create a DAG.&lt;/li&gt;
&lt;li&gt;Introduce additional syntax into tags, such that dominance relationships between ranges can be expressed explicitly within the serialisation.&lt;/li&gt;
&lt;li&gt;Associate each document with a schema, and use the model expressed in the schema to identify dominance relationships; a Creole schema like the one above could be taken as asserting that poems dominate stanzas, for example, since stanzas are mentioned in the content model of the poem range.&lt;/li&gt;
&lt;li&gt;Defer the construction of a DAG to the point of processing; a document would then not be a DAG in and of itself, but only in relation to a particular process.&lt;/li&gt;
&lt;/ol&gt;
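
&lt;p&gt;The first approach is simple enough to sketch. The Python below is a hypothetical illustration, not part of any existing tool. Ranges are &lt;code&gt;(name, start, end)&lt;/code&gt; tuples with invented offsets, and the function only infers the dominance relation, leaving DAG construction aside. In this single-page example it concludes that a page dominates its stanzas, purely because the whole poem happens to fit on the page.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict
from itertools import permutations


def contains(outer, inner):
    # Ranges are (name, start, end) tuples; end is exclusive.
    return inner[1] >= outer[1] and outer[2] >= inner[2]


def infer_dominance(ranges):
    """Approach 1: assume A dominates B if every A range contains some B range."""
    by_name = defaultdict(list)
    for r in ranges:
        by_name[r[0]].append(r)
    dominance = set()
    for a, b in permutations(by_name, 2):
        if all(any(contains(ra, rb) for rb in by_name[b]) for ra in by_name[a]):
            dominance.add((a, b))
    return dominance


# Hypothetical offsets: one page holding a two-stanza poem of two sentences.
ranges = [
    ("page", 0, 100),
    ("poem", 10, 100),
    ("stanza", 20, 60), ("stanza", 60, 100),
    ("s", 20, 50), ("s", 50, 100),
]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The inferred set for these ranges includes &lt;code&gt;(&quot;page&quot;, &quot;stanza&quot;)&lt;/code&gt; but not &lt;code&gt;(&quot;stanza&quot;, &quot;s&quot;)&lt;/code&gt;, because the second stanza starts partway through the second sentence. Two range names that always have identical spans would be inferred to dominate each other, giving a cycle rather than a DAG.&lt;/p&gt;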

&lt;p&gt;I find the last of these the most satisfactory. The first is too arbitrary. The second requires too much syntax. The third requires a single schema per document (which, from experience with XML, I think is a broken model). With the fourth, one could imagine a particular process specifying something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;book &amp;gt; page &amp;gt; pl &amp;gt; #text
book &amp;gt; poem &amp;gt; stanza &amp;gt; sl &amp;gt; #text
book &amp;gt; poem &amp;gt; s &amp;gt; #text
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and having this generate a DAG in which a &lt;code&gt;book&lt;/code&gt; node had &lt;code&gt;page&lt;/code&gt; and &lt;code&gt;poem&lt;/code&gt; children, &lt;code&gt;page&lt;/code&gt; nodes had &lt;code&gt;pl&lt;/code&gt; children, which had text children, &lt;code&gt;poem&lt;/code&gt; nodes had &lt;code&gt;stanza&lt;/code&gt; children and &lt;code&gt;s&lt;/code&gt; children, and so on. With this structure, it would be easy enough to find stanzas with four lines (&lt;code&gt;/book/poem/stanza[count(sl) = 4]&lt;/code&gt;) without having to worry about happenstance containment, such as some stanza lines being contained by sentences that are themselves contained by stanzas.&lt;/p&gt;
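
&lt;p&gt;A minimal Python sketch of that fourth approach follows, under some stated assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the path syntax is just the illustrative notation above;&lt;/li&gt;
&lt;li&gt;ranges are &lt;code&gt;(name, start, end)&lt;/code&gt; tuples with invented offsets;&lt;/li&gt;
&lt;li&gt;text leaves are left implicit;&lt;/li&gt;
&lt;li&gt;the hypothetical &lt;code&gt;stanzas_with_lines&lt;/code&gt; function is only a rough analogue of the XPath, since it does not check the &lt;code&gt;/book/poem&lt;/code&gt; ancestry.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A range becomes the child of another only when the process-specific paths say that its name is dominated by the other&amp;#8217;s name &lt;em&gt;and&lt;/em&gt; the other range contains it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import defaultdict

PATHS = """
book > page > pl > #text
book > poem > stanza > sl > #text
book > poem > s > #text
"""


def parse_paths(spec):
    """Turn each 'a > b > c' line into (parent, child) pairs of range names."""
    pairs = set()
    for line in spec.strip().splitlines():
        steps = [step.strip() for step in line.split(">")]
        for parent, child in zip(steps, steps[1:]):
            if child != "#text":  # text leaves are left implicit here
                pairs.add((parent, child))
    return pairs


def build_dag(ranges, spec):
    """Map each range index to the indices of the ranges it dominates."""
    pairs = parse_paths(spec)
    children = defaultdict(list)
    for i, (pname, pstart, pend) in enumerate(ranges):
        for j, (cname, cstart, cend) in enumerate(ranges):
            # Containment only counts when the process declares dominance.
            if (pname, cname) in pairs and cstart >= pstart and pend >= cend:
                children[i].append(j)
    return children


def stanzas_with_lines(ranges, children, n):
    """Rough analogue of stanza[count(sl) = n] over the generated DAG."""
    return [i for i, (name, _, _) in enumerate(ranges)
            if name == "stanza"
            and sum(1 for j in children[i] if ranges[j][0] == "sl") == n]


# Hypothetical offsets: two stanzas (four lines, then three) and two sentences.
ranges = [
    ("book", 0, 280), ("poem", 0, 280),
    ("stanza", 0, 160),
    ("sl", 0, 40), ("sl", 40, 80), ("sl", 80, 120), ("sl", 120, 160),
    ("stanza", 160, 280),
    ("sl", 160, 200), ("sl", 200, 240), ("sl", 240, 280),
    ("s", 0, 40), ("s", 40, 280),
]
dag = build_dag(ranges, PATHS)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here &lt;code&gt;stanzas_with_lines(ranges, dag, 4)&lt;/code&gt; returns &lt;code&gt;[2]&lt;/code&gt;, the first stanza. The second sentence gets no children at all, even though it happens to contain six stanza lines and the whole of the second stanza.&lt;/p&gt;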

&lt;p&gt;There&amp;#8217;s lots more to talk about here, in particular the useful and appropriate ways of querying and transforming these structures, and how best to serialise them in XML. But I&amp;#8217;ll leave those thoughts for another post.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/95#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Sat, 06 Dec 2008 20:56:52 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">95 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>And she&#039;s back</title>
 <link>http://www.jenitennison.com/blog/node/46</link>
 <description>&lt;p&gt;So first there was the &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;Oxford XML Summer School&quot;&gt;XML Summer School&lt;/a&gt;. This year was my sixth, and it was really great to hang out with &lt;a href=&quot;http://www.xmlsummerschool.com/speakers.html&quot; title=&quot;XML Summer School Speakers List&quot;&gt;chums&lt;/a&gt; old and new. I love that&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you get to meet people from all corners of the XML community, even ones you haven&amp;#8217;t got the slightest interest in, and learn that they&amp;#8217;re human too (even the web services guys)&lt;/li&gt;
&lt;li&gt;there&amp;#8217;s always &lt;em&gt;something&lt;/em&gt; to learn; I&amp;#8217;ve seen some talks six years on the trot while others were completely new this year, but they&amp;#8217;re all worth attending because the audience, war stories and discussion are always different. Also, because each talk is aimed at newcomers, you get a great overview of topics that you&amp;#8217;re not so familiar with, and you can always chat to the speaker later to find out more&lt;/li&gt;
&lt;li&gt;there are social events laid on every evening that you&amp;#8217;re expected to attend, so you&amp;#8217;re practically forced to socialise, which is useful for an insecure introvert like me who&amp;#8217;d otherwise be sitting in her hotel room getting miserable imagining everyone else having a good time&lt;/li&gt;
&lt;li&gt;there&amp;#8217;s a creche, so despite being inseparable from two small children over the last four years, I&amp;#8217;ve still been able to attend without dragging an entourage with me (not that I object to the entourage, just the expense and the dependency)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I left feeling not only invigorated and inspired, but also a part of a fun and friendly community.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The following week, we moved house. Twelve days later, we&amp;#8217;re &lt;em&gt;almost&lt;/em&gt; completely unpacked, and the important things are done. We have wireless, positioned nicely so that virtually the whole house gets &amp;#8220;Excellent&amp;#8221; coverage. Thanks to my father, we have a &lt;a href=&quot;http://en.wikipedia.org/wiki/NSLU2&quot; title=&quot;Wikipedia: NSLU2&quot;&gt;NSLU2&lt;/a&gt; running &lt;a href=&quot;http://en.wikipedia.org/wiki/Debian&quot; title=&quot;Wikipedia: Debian&quot;&gt;Debian&lt;/a&gt; and acting as our low-power-consumption file and mail server. (For those that are interested, I&amp;#8217;m getting access from my Windows machines using &lt;a href=&quot;http://en.wikipedia.org/wiki/Xming&quot; title=&quot;Wikipedia: Xming&quot;&gt;Xming&lt;/a&gt; to actually interact with the machine, and &lt;a href=&quot;http://www.webdrive.com/&quot; title=&quot;South River Technologies&quot;&gt;WebDrive&lt;/a&gt; to map a drive onto the file system.) And we have an area entirely dedicated to Lego. Yes, this might be my dream house.&lt;/p&gt;

&lt;p&gt;(We&amp;#8217;re also only a 40-minute drive from Heathrow and a 30-minute train ride from Waterloo, so if anyone I know&amp;#8217;s visiting the UK and wants to drop in, you&amp;#8217;re more than welcome. There&amp;#8217;s even a spare room.)&lt;/p&gt;

&lt;p&gt;And amongst all this, I had to record a &lt;a href=&quot;http://www.jenitennison.com/extreme/Creole.zip&quot; title=&quot;Zipped Powerpoint with linked sound files&quot;&gt;virtual presentation&lt;/a&gt; on &lt;a href=&quot;http://www.lmnl.org/wiki/Creole&quot; title=&quot;Creole: schema language for overlapping markup&quot;&gt;Creole&lt;/a&gt; for &lt;a href=&quot;http://www.extrememarkup.com/overlap/index.html&quot; title=&quot;International Workshop on Markup of Overlapping Structures&quot;&gt;Overlap day&lt;/a&gt; at &lt;a href=&quot;http://www.extrememarkup.com/&quot; title=&quot;Extreme Markup Languages Conference&quot;&gt;Extreme&lt;/a&gt;. The sound quality&amp;#8217;s not great, but it&amp;#8217;s a reasonable 10-minute introduction, I think. It sounds like they got good attendance: will someone who was there please post about it?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/46#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/27">life</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/10">gadgets</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/3">conferences</category>
 <pubDate>Sat, 11 Aug 2007 20:54:45 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">46 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Partitioning overlapping markup</title>
 <link>http://www.jenitennison.com/blog/node/27</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.piez.org/&quot; title=&quot;Wendell&#039;s Home Page&quot;&gt;Wendell Piez&lt;/a&gt; forwarded me an interesting poster by &lt;a href=&quot;http://www.huygensinstituut.knaw.nl/index.php?option=com_content&amp;amp;task=view&amp;amp;id=120&amp;amp;Itemid=57&quot; title=&quot;Bert Van Elsacker&quot;&gt;Bert Van Elsacker&lt;/a&gt; on automatic fragmentation of overlapping structures. That&amp;#8217;s taking something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/bold&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and turning it into something well-formed, like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/italic&amp;gt;&amp;lt;/bold&amp;gt;&amp;lt;italic&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you do this, you have to decide which elements can be split and which can&amp;#8217;t, and their relative priorities. Wendell suggested that perhaps Creole might help to do this. One thing I have been thinking about is using Creole to add annotations to markup (something like, you add attributes to the Creole patterns and they get copied onto the matched ranges, or are used to create new ranges), but I haven&amp;#8217;t done that yet, and actually I think you probably want a different kind of language to do it (&lt;a href=&quot;http://blog.jclark.com/2007/04/do-we-need-new-kind-of-schema-language.html&quot; title=&quot;James Clark: Do we need a new kind of schema language?&quot;&gt;a new kind of schema language&lt;/a&gt; like James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you&amp;#8217;re going to process them.&lt;/p&gt;
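
&lt;p&gt;Whatever language ends up expressing those decisions, the fragmentation itself can be sketched in a few lines of Python. This is a hypothetical illustration, not the method from the poster. It makes two assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;each range name has a numeric priority, and higher priorities are kept whole first;&lt;/li&gt;
&lt;li&gt;each range name has a flag saying whether it may be split.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It cuts each lower-priority range at the boundaries of any already-placed range that it overlaps, so the output is properly nested.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def overlaps(a_start, a_end, b_start, b_end):
    # The spans share characters but neither contains the other.
    shares = a_end > b_start and b_end > a_start
    a_holds_b = b_start >= a_start and a_end >= b_end
    b_holds_a = a_start >= b_start and b_end >= a_end
    return shares and not a_holds_b and not b_holds_a


def fragment(ranges, priority, splittable):
    """Split ranges so that no two of the results overlap.

    ranges:     list of (name, start, end) with end exclusive
    priority:   dict of name -> int; higher priorities are placed first
    splittable: set of names that may be broken into fragments
    """
    placed = []
    for name, start, end in sorted(ranges, key=lambda r: -priority[r[0]]):
        cuts = set()
        for _, p_start, p_end in placed:
            if overlaps(p_start, p_end, start, end):
                # Exactly one boundary of the placed range falls inside this one.
                cuts.update(x for x in (p_start, p_end) if x > start and end > x)
        if cuts and name not in splittable:
            raise ValueError(f"{name} would have to be split but is not splittable")
        points = [start, *sorted(cuts), end]
        placed.extend((name, s, e) for s, e in zip(points, points[1:]))
    return placed


text = "this is bold and italic text"
bold = ("bold", 0, text.index(" text"))
italic = ("italic", text.index("and"), len(text))
result = fragment([bold, italic], {"bold": 2, "italic": 1}, {"bold", "italic"})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With &lt;code&gt;bold&lt;/code&gt; given the higher priority, the result keeps it whole and splits &lt;code&gt;italic&lt;/code&gt; at the point where &lt;code&gt;bold&lt;/code&gt; ends, matching the well-formed version above. Swapping the priorities would split &lt;code&gt;bold&lt;/code&gt; instead, so the choice of priorities is where the processing-dependent decisions come in.&lt;/p&gt;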

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;m reminded of the paper&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Sperberg-McQueen, C. M., David Dubin, Claus Huitfeldt and Allen Renear. “&lt;a href=&quot;http://www.idealliance.org/papers/extreme/proceedings/html/2002/CMSMcQ01/EML2002CMSMcQ01.html&quot;&gt;Drawing inferences on the basis of markup.&lt;/a&gt;” In Proceedings of Extreme Markup Languages 2002. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;in which (based on my memory of the talk) they discuss how different elements allow you to make different assertions about the text they contain, and consequently can be split in different ways. For example, a &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; element can&amp;#8217;t be split into two &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; elements without changing the meaning of the document, whereas a &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; element can be split into two &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; elements with no problems because it&amp;#8217;s really indicating &amp;#8220;these characters are bold&amp;#8221; rather than &amp;#8220;this is a bold phrase&amp;#8221;.&lt;/p&gt;

&lt;p&gt;You can take a purist view (which would usually entail splitting hardly any elements, since most elements &lt;em&gt;do&lt;/em&gt; mark up a range of text rather than the individual characters they contain), but I think the main reason you want to do this fragmentation is for presentation. And in that context, the notional semantics of the element don&amp;#8217;t really matter: what matters is how they&amp;#8217;re styled. For example, a &lt;code&gt;&amp;lt;comment&amp;gt;&lt;/code&gt; element, marking up a range of text that has been commented on, might not be splittable at a theoretical level, but if you&amp;#8217;re going to render it simply by turning the background yellow, then in fact you &lt;em&gt;can&lt;/em&gt; split it for that purpose.&lt;/p&gt;

&lt;p&gt;Since it&amp;#8217;s related to presentation, I wonder whether you could use a (simplified) CSS stylesheet to provide both the fragmentation and the style. Block-level elements (&lt;code&gt;display: block;&lt;/code&gt;) couldn&amp;#8217;t be split whereas inline elements could. Elements that have the box model properties (margin, padding &amp;amp; borders) can&amp;#8217;t be split, or, if they are, you need to mark the fragments as &amp;#8220;left&amp;#8221;, &amp;#8220;middle&amp;#8221; and &amp;#8220;right&amp;#8221;, and only apply the &lt;em&gt;left&lt;/em&gt; margin/padding/border to the &amp;#8220;left&amp;#8221; fragment, and similarly with the right.&lt;/p&gt;
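
&lt;p&gt;Here is a minimal Python sketch of how such a stylesheet might drive the decision, taking the second of the two options for box properties. Some parts are assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;STYLESHEET&lt;/code&gt; dictionary stands in for a simplified CSS stylesheet, and there is no CSS parsing;&lt;/li&gt;
&lt;li&gt;only the longhand &lt;code&gt;-left&lt;/code&gt; and &lt;code&gt;-right&lt;/code&gt; box properties are handled, and shorthands such as &lt;code&gt;margin&lt;/code&gt; are not expanded;&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;fragment&lt;/code&gt; key that labels each piece is a hypothetical marker, not a CSS property.&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code&gt;LEFT_ONLY = {"margin-left", "padding-left", "border-left"}
RIGHT_ONLY = {"margin-right", "padding-right", "border-right"}

# Hypothetical simplified stylesheet: element name -> CSS-like properties.
STYLESHEET = {
    "p": {"display": "block", "margin-left": "1em"},
    "comment": {"display": "inline", "background": "yellow",
                "padding-left": "2px", "padding-right": "2px"},
}


def can_split(style):
    # Block-level elements stay whole; inline elements may be fragmented.
    return style.get("display", "inline") != "block"


def fragment_styles(style, n):
    """Return one style per fragment when an element is split into n pieces."""
    if n == 1:
        return [dict(style)]
    if not can_split(style):
        raise ValueError("block-level elements cannot be split")
    pieces = []
    for i in range(n):
        position = "left" if i == 0 else "right" if i == n - 1 else "middle"
        piece = {}
        for prop, value in style.items():
            # Left-side box properties go on the first fragment only,
            # right-side ones on the last fragment only.
            if prop in LEFT_ONLY and position != "left":
                continue
            if prop in RIGHT_ONLY and position != "right":
                continue
            piece[prop] = value
        piece["fragment"] = position
        pieces.append(piece)
    return pieces
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Splitting a &lt;code&gt;comment&lt;/code&gt; into three pieces keeps the yellow background on all of them. It puts &lt;code&gt;padding-left&lt;/code&gt; only on the &amp;#8220;left&amp;#8221; piece and &lt;code&gt;padding-right&lt;/code&gt; only on the &amp;#8220;right&amp;#8221; one. Asking to split a &lt;code&gt;p&lt;/code&gt; raises an error.&lt;/p&gt;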

&lt;p&gt;It wouldn&amp;#8217;t be a general purpose transformation mechanism, but it would be darned useful!&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/27#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Mon, 11 Jun 2007 20:36:09 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">27 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech Creole presentation fallout</title>
 <link>http://www.jenitennison.com/blog/node/17</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.ltg.ed.ac.uk/~ht/&quot; title=&quot;Henry S. Thompson&#039;s Home Page&quot;&gt;Henry Thompson&lt;/a&gt; had a lot to say after &lt;a href=&quot;http://www.jenitennison.com/blog/files/XTech2007CreoleSlides.zip&quot; title=&quot;XTech 2007 Creole presentation&quot;&gt;my Creole presentation&lt;/a&gt; (open takahashi.xul?data=creole.data; requires Firefox) about the benefits of stand-off markup for linguistic information. From his overview, it seems that the &lt;a href=&quot;http://www.ltg.ed.ac.uk/NITE&quot; title=&quot;NITE XML Toolkit&quot;&gt;NITE XML Toolkit&lt;/a&gt; that he&amp;#8217;s been involved with represents overlapping linguistic data by holding atoms (here meaning the &amp;#8220;lowest common denominator&amp;#8221; shared pieces of data) and having multiple trees marking up these atoms. The trees are independently validated (since they are pure XML), with cross-hierarchy validation done through the query language. This is pretty similar to the &lt;a href=&quot;http://www.idealliance.org/papers/extreme/Proceedings/html/2006/Schonefeld01/EML2006Schonefeld01.html&quot; title=&quot;Towards Validation of Concurrent Markup&quot;&gt;XCONCUR&lt;/a&gt; approach, which augments a CONCUR-like multi-grammar validation with a Schematron-like constraint language.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Now, I have nothing against using constraint languages (like Schematron) to validate documents, but grammars (like RELAX NG) have big advantages. Most importantly, they are easier to write (if they&amp;#8217;re designed properly), and tools can analyse them to do useful things, such as tell you what element or attribute is expected next. If it&amp;#8217;s possible to write cross-grammar constraints in a grammar (like Creole) then why would you use a constraint language to do it?&lt;/p&gt;

&lt;p&gt;The big difference between Henry&amp;#8217;s domain and the one that I think will move overlap into the mainstream is the difference between global and local concurrence. With global concurrence, entirely separate hierarchies are applied to the same data, so the natural validation mechanism is to use entirely separate grammars (with perhaps a few small rules to do cross-grammar validation where that proves necessary). With local concurrence, the vast majority of the document follows a single hierarchy with concurrence happening at a low level.&lt;/p&gt;

&lt;p&gt;Actually, the best example for this doesn&amp;#8217;t even involve overlap. Consider HTML paragraphs, which contain various inline elements such as &lt;code&gt;&amp;lt;strong&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt;. It doesn&amp;#8217;t make sense for these elements to contain themselves (strong text is neither made stronger nor negated by appearing in two &lt;code&gt;&amp;lt;strong&amp;gt;&lt;/code&gt; elements, and links aren&amp;#8217;t allowed to contain other links). So the natural model in Creole is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;p      = element p { mixed { strong* &amp;amp; em* &amp;amp; a* } }
strong = range strong { text }
em     = range em { text }
a      = range a { attribute href { text }, ..., text }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This model allows &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; elements to appear within &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; elements, or vice versa, not because of the content model of &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; but because the two ranges are interleaved (and one arrangement of interleaved ranges is containment). It doesn&amp;#8217;t allow any of these elements to appear inside themselves. It would be a real maintenance headache to have separate grammars for each of these inline elements, when most of each of the grammars (all the hierarchy down to the paragraph level) would be the same.&lt;/p&gt;
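
&lt;p&gt;To make the constraint concrete, here is a small Python sketch of the check that this content model implies for a single paragraph. It is not a Creole validator. It assumes the inline ranges have already been extracted as &lt;code&gt;(name, start, end)&lt;/code&gt; tuples with offsets relative to the paragraph, and it ignores attributes such as &lt;code&gt;href&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from itertools import combinations

ALLOWED = {"strong", "em", "a"}


def check_paragraph(length, ranges):
    """List the problems with the inline ranges inside one p element.

    ranges are (name, start, end) tuples; end is exclusive and offsets are
    relative to the start of the paragraph.
    """
    problems = []
    for name, start, end in ranges:
        if name not in ALLOWED:
            problems.append(f"{name} is not allowed in p")
        if not (end >= start and start >= 0 and length >= end):
            problems.append(f"{name} [{start}, {end}) lies outside p")
    for (n1, s1, e1), (n2, s2, e2) in combinations(ranges, 2):
        # strong*, em* and a* are each sequences, so two ranges with the same
        # name must not share characters; ranges with different names are
        # interleaved and may overlap or contain one another freely.
        if n1 == n2 and e1 > s2 and e2 > s1:
            problems.append(f"{n1} [{s1}, {e1}) clashes with [{s2}, {e2})")
    return problems
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So &lt;code&gt;check_paragraph(20, [(&quot;em&quot;, 0, 10), (&quot;a&quot;, 5, 15), (&quot;strong&quot;, 2, 4)])&lt;/code&gt; reports nothing, even though the link overlaps the emphasis. By contrast, &lt;code&gt;check_paragraph(20, [(&quot;a&quot;, 0, 10), (&quot;a&quot;, 3, 6)])&lt;/code&gt; reports a link inside a link.&lt;/p&gt;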

&lt;p&gt;Actually, looking at NITE, it seems like it employs a data model that&amp;#8217;s quite like &lt;a href=&quot;http://www.lmnlwiki.org/index.php/LMNL_data_model&quot; title=&quot;LMNL data model&quot;&gt;LMNL&amp;#8217;s&lt;/a&gt;, in that it has the concept of layers over atoms or ranges/elements. (Interestingly, it looks like they get around the problem of identifying which ranges belong to which layers purely by using their name.) Another difference here might be that while I&amp;#8217;m talking about supporting overlap in fairly heavily structured documents (like office documents), they&amp;#8217;re really using fairly flat annotations, where there isn&amp;#8217;t much of a grammar anyway. But I might have that wrong: I need to do more reading. The other thing to investigate is whether they have any support for self-overlap (&lt;code&gt;&amp;lt;phrase&amp;gt;&lt;/code&gt; elements overlapping other &lt;code&gt;&amp;lt;phrase&amp;gt;&lt;/code&gt; elements): I kinda gather that they don&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;Anyway, Henry also made the points (a) that he doesn&amp;#8217;t want a new syntax for overlap and (b) that stand-off markup works very well, thank you. To address the latter point first, I think stand-off markup works very well if you have the tools to support it. It&amp;#8217;s fine if you have an integrated toolkit which can pull together and display the stand-off markup as embedded markup, and let you create ranges by highlighting text with a mouse. But the great power of HTML and other web technologies is that you don&amp;#8217;t need to use a specialised toolkit to write it: you can just use a text editor and it&amp;#8217;s all right there in front of you with no (or minimal) cross-referencing required. Frankly, I&amp;#8217;m not interested in &amp;#8220;core&amp;#8221; technologies that require me to install a particular piece of software in order to make use of them (cf. &lt;a href=&quot;http://research.microsoft.com/~emeijer/&quot; title=&quot;Erik Meijer&#039;s Home Page&quot;&gt;Erik Meijer&lt;/a&gt;&amp;#8217;s talk on &lt;a href=&quot;http://msdn.microsoft.com/data/ref/linq/&quot; title=&quot;LINQ&quot;&gt;LINQ&lt;/a&gt;, which I&amp;#8217;ll have to discuss another time). I expect to be able to write a document containing overlap as easily as I can write a normal XML document.&lt;/p&gt;

&lt;p&gt;On Henry&amp;#8217;s point about yet another syntax for overlap, I am increasingly coming to the conclusion that overlap will hit the mainstream if we have a simple way of encoding overlap in normal XML documents, namely something along the lines of &lt;a href=&quot;http://www.lmnlwiki.org/index.php/Talk:ECLIX#LIX&quot; title=&quot;LMNL-in-XML&quot;&gt;LIX&lt;/a&gt;. Interestingly, &lt;a href=&quot;http://www.translate.com/&quot; title=&quot;Yves Savourel&#039;s Website&quot;&gt;Yves Savourel&lt;/a&gt;&amp;#8217;s talk on Applying the &lt;a href=&quot;http://www.w3.org/TR/2007/REC-its-20070403/&quot; title=&quot;Internationalization Tag Set (ITS) Version 1.0&quot;&gt;Internationalization Tag Set&lt;/a&gt; was quite inspirational in this regard, since the working group seem to have put together a standard that provides both a set of standard elements and attributes to guide localisation and a method of mapping elements and attributes in existing markup languages onto those ITS elements and attributes. I wonder whether a similar approach could be used with LIX&amp;#8230; but I&amp;#8217;ll have to leave those thoughts for another time.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/17#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Wed, 16 May 2007 20:48:47 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">17 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>A Creole by any other name...</title>
 <link>http://www.jenitennison.com/blog/node/6</link>
 <description>&lt;p&gt;Argh. I&amp;#8217;ve been contacted by the guys at &lt;a href=&quot;http://www.wikicreole.org&quot; title=&quot;Creole Wiki Markup language&quot;&gt;WikiCreole&lt;/a&gt; who want me to change the name of &lt;a href=&quot;http://www.lmnlwiki.org&quot; title=&quot;Creole schema language&quot;&gt;Creole&lt;/a&gt;. What should I do? Not only is &amp;#8220;Creole&amp;#8221; a great name for a schema language that deals with concurrent markup, but it&amp;#8217;s a great acronym too (Composable regular expressions for overlapping languages etc.)&lt;/p&gt;

&lt;p&gt;I did Google when I first came up with the name in August 2006, but didn&amp;#8217;t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?&lt;/p&gt;

&lt;!--break--&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/6#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/8">schema</category>
 <pubDate>Wed, 25 Apr 2007 20:09:28 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">6 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

