<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>overlapping markup</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/9</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Working With Fragmented Overlapping Markup</title>
 <link>http://www.jenitennison.com/blog/node/98</link>
 <description>&lt;p&gt;In my &lt;a href=&quot;http://www.jenitennison.com/blog/node/97&quot; title=&quot;Jeni&#039;s Musings: Representing Overlap in XML&quot;&gt;last post&lt;/a&gt; I talked about different techniques for representing overlap within XML. One technique is fragmentation. In the work that I&amp;#8217;ve been doing, I&amp;#8217;ve been using milestone-based formats similar to &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/ECLIX&quot; title=&quot;LMNL Wiki: Extended Canonical LMNL in XML&quot;&gt;ECLIX&lt;/a&gt;, but my eyes were opened at the &lt;a href=&quot;http://ilps.science.uva.nl/PoliticalMashup/2008/11/workshop-on-multi-dimensional-markup/&quot; title=&quot;Workshop on Multi-Dimensional Markup&quot;&gt;GODDAG workshop&lt;/a&gt;: fragmentation would make overlap so much easier to process in XSLT, especially when dealing with localised overlap such as revision or comment markup.&lt;/p&gt;

&lt;p&gt;But how could fragmentation be used with full-on overlap? I had a little play and came up with &lt;a href=&quot;http://www.jenitennison.com/blog/files/fragmentation-utils.xsl&quot; title=&quot;fragmentation-utils.xsl&quot;&gt;some XSLT to demonstrate&lt;/a&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Fragmentation Example&lt;/h2&gt;

&lt;p&gt;First, an example of how to represent overlap using fragments. Using fragments for overlap comes straight out of the support for &lt;a href=&quot;http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SAAG&quot; title=&quot;TEI: Linking, Segmentation, and Alignment: Aggregation&quot;&gt;aggregation within TEI&lt;/a&gt; where they&amp;#8217;re used to not only represent overlapping structures but to construct completely new &amp;#8220;virtual&amp;#8221; elements that are not necessarily contiguous (and may even contain fragments in different orders from how they appear in the text). In TEI, they usually use the &lt;code&gt;next&lt;/code&gt; and &lt;code&gt;prev&lt;/code&gt; attributes to point from one fragment to another in order to reconstruct the element.&lt;/p&gt;

&lt;p&gt;In the example here, I&amp;#8217;ve done something slightly different, namely to use an ID in the &lt;code&gt;http://www.jenitennison.com/xslt/fragmentation&lt;/code&gt; namespace to link the elements: all elements with the same ID are actually the same element. Here&amp;#8217;s what it looks like.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;book xmlns:f=&quot;http://www.jenitennison.com/xslt/fragmentation&quot;&amp;gt;
    &amp;lt;page f:id=&quot;page199&quot; n=&quot;199&quot;&amp;gt;
        ...
    &amp;lt;/page&amp;gt;
    &amp;lt;poem&amp;gt;
        &amp;lt;page f:id=&quot;page199&quot; n=&quot;199&quot;&amp;gt;
            &amp;lt;title&amp;gt;
                &amp;lt;pl&amp;gt;Recueillement&amp;lt;/pl&amp;gt;
            &amp;lt;/title&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;s&amp;gt;&amp;lt;pl&amp;gt;Sois sage, ô ma douleur, et tiens-toi plus &amp;lt;/pl&amp;gt;
                                                          &amp;lt;pl&amp;gt;tranquille.&amp;lt;/pl&amp;gt;&amp;lt;/s&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;s&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Tu réclamais le Soir; il descend; le voici:&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Une atmosphère obscure enveloppe la ville,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Aux uns portant la paix, aux autres le souci.&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;s f:id=&quot;s3&quot;&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Pendant que des mortels la multitude vile,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Sous le fouet du Plaisir, ce bourreau sans merci,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Va cueillir des remords dans la fête servile,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Ma douleur, donne-moi la main; viens par ici,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;&amp;lt;s f:id=&quot;s3&quot;&amp;gt;Loin d&#039;eux. &amp;lt;/s&amp;gt;&amp;lt;s f:id=&quot;s4&quot;&amp;gt;Vois se pencher les défuntes &amp;lt;/s&amp;gt;&amp;lt;/pl&amp;gt;
                                                                        &amp;lt;pl&amp;gt;&amp;lt;s f:id=&quot;s4&quot;&amp;gt;Années, &amp;lt;/s&amp;gt;&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;s f:id=&quot;s4&quot;&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Sur les balcons du ciel, en robes surannées; &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Surgir du fond des eaux le Regret souriant; &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
        &amp;lt;/page&amp;gt;
        &amp;lt;page f:id=&quot;page200&quot; n=&quot;200&quot;&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;s f:id=&quot;s4&quot;&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Le Soleil moribond s&#039;endormir sous une arche, &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Et, comme un long linceul traînant à l&#039;Orient, &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Entends, ma chère, entends la douce Nuit qui &amp;lt;/pl&amp;gt;
                                                                   &amp;lt;pl&amp;gt;marche.&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
        &amp;lt;/page&amp;gt;
    &amp;lt;/poem&amp;gt;
&amp;lt;/book&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can see that I&amp;#8217;ve played pretty fast and loose here with the markup language. The &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements can be children of &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; or even &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt;, purely depending on what happens to most neatly contain them. This makes the XML inconsistent, but less verbose than it would otherwise be. Elements that are actually fragments have &lt;code&gt;f:id&lt;/code&gt; attributes, and multiple elements may have the same &lt;code&gt;f:id&lt;/code&gt;; this is precisely what&amp;#8217;s used to work out that they&amp;#8217;re the same element.&lt;/p&gt;
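
&lt;p&gt;As a rough illustration of that linking (a sketch only, not taken from &lt;code&gt;fragmentation-utils.xsl&lt;/code&gt;), an XSLT 2.0 key can gather the fragments that share an &lt;code&gt;f:id&lt;/code&gt;. The key name &lt;code&gt;fragments&lt;/code&gt; and the plain-text report it produces are assumptions made for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:f=&quot;http://www.jenitennison.com/xslt/fragmentation&quot;&amp;gt;

  &amp;lt;xsl:output method=&quot;text&quot; /&amp;gt;

  &amp;lt;!-- Index every fragment by its f:id, so that all the pieces of one
       logical element can be retrieved together. --&amp;gt;
  &amp;lt;xsl:key name=&quot;fragments&quot; match=&quot;*[@f:id]&quot; use=&quot;@f:id&quot; /&amp;gt;

  &amp;lt;xsl:template match=&quot;/&quot;&amp;gt;
    &amp;lt;!-- For each distinct f:id, report the ID and its number of fragments. --&amp;gt;
    &amp;lt;xsl:for-each-group select=&quot;//*[@f:id]&quot; group-by=&quot;@f:id&quot;&amp;gt;
      &amp;lt;xsl:value-of select=&quot;current-grouping-key(), &#039;has&#039;,
                            count(key(&#039;fragments&#039;, current-grouping-key())),
                            &#039;fragment(s);&#039;&quot; /&amp;gt;
    &amp;lt;/xsl:for-each-group&amp;gt;
  &amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same key lookup is the kind of building block you would need in order to stitch the fragments of one element back together.&lt;/p&gt;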

&lt;h2&gt;Desired Rendering&lt;/h2&gt;

&lt;p&gt;So what would we like to do when processing this? Say we wanted to create an HTML rendition of the poem, looking something like:&lt;/p&gt;

&lt;blockquote style=&quot;width: 30em; &quot;&gt;
  &lt;hr /&gt;
  &lt;p style=&quot;text-align: right; &quot;&gt;page 199&lt;/p&gt;
  &lt;h3&gt;Recueillement&lt;/h3&gt;
  &lt;ol start=&quot;1&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Sois sage, ô ma douleur, et tiens-toi plus &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;tranquille.&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Tu réclamais le Soir; il descend; le voici:&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Une atmosphère obscure enveloppe la ville,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Aux uns portant la paix, aux autres le souci.&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;ol start=&quot;5&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Pendant que des mortels la multitude vile,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Sous le fouet du Plaisir, ce bourreau sans merci,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Va cueillir des remords dans la fête servile,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Ma douleur, donne-moi la main; viens par ici,&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;ol start=&quot;9&quot;&gt;
     &lt;li style=&quot;background-color: yellow; &quot;&gt;
        &lt;p&gt;Loin d&amp;#8217;eux. Vois se pencher les défuntes &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;Années, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Sur les balcons du ciel, en robes surannées; &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Surgir du fond des eaux le Regret souriant; &lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;hr /&gt;
  &lt;p style=&quot;text-align: right; &quot;&gt;page 200&lt;/p&gt;
  &lt;ol start=&quot;12&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Le Soleil moribond s&amp;#8217;endormir sous une arche, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Et, comme un long linceul traînant à l&amp;#8217;Orient, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Entends, ma chère, entends la douce Nuit qui &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;marche.&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The logic behind this rendition is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;process the pages; for each one, create a horizontal rule followed by a paragraph giving the page number&lt;/li&gt;
&lt;li&gt;process the parts of the poem within each page; give the title if it has one in this fragment, followed by the stanzas&lt;/li&gt;
&lt;li&gt;create an ordered list for each stanza, starting at the number for the stanza line within the (whole) poem, and process the stanza lines&lt;/li&gt;
&lt;li&gt;create a list item for each stanza line; if the line contains parts of more than one sentence and the first of these sentences doesn&amp;#8217;t begin in this line, highlight the line, as this indicates an interesting overlap between prosodic and syntactic structures&lt;/li&gt;
&lt;li&gt;process the page lines within each stanza line; if there&amp;#8217;s more than one, align the second to the right&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It wouldn&amp;#8217;t be easy to express that logic against the fragmented XML above, for two reasons.&lt;/p&gt;

&lt;p&gt;First, the fragmented markup above is inconsistent: you can&amp;#8217;t tell what kinds of children a particular element will have and which elements will be fragmented. You could fix this in the markup by deciding, for example, that the prosodic hierarchy of book/poem/stanza/sl would be primary and all other elements fragmented as necessary; you could further decide which of the hierarchies would be secondary within this: whether an &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; element would hold &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements as children.&lt;/p&gt;

&lt;p&gt;Second, though, the different logical steps require the markup to be structured in different ways: 1 requires a physical hierarchy where the markup is primarily divided into pages; 2 and 3 require a prosodic hierarchy within the page, dividing the poem into stanzas and stanza lines; 4 requires a syntactic hierarchy, where the stanza lines are split into sentences; 5 requires switching back to the physical hierarchy to see the page lines within the stanza line.&lt;/p&gt;

&lt;p&gt;What you can do (and what I&amp;#8217;ve done) is to write a function to help with this kind of processing by switching between the different hierarchies as and when necessary.&lt;/p&gt;

&lt;h2&gt;Labelling Elements&lt;/h2&gt;

&lt;p&gt;To prepare for switching, you must annotate the elements in the document with an indication of the trees that they belong to. The trees can be called anything you like; for the example above, I could use the labels &amp;#8220;physical&amp;#8221; (book, page, page line), &amp;#8220;syntactic&amp;#8221; (book, poem, sentence) and &amp;#8220;prosodic&amp;#8221; (book, poem, stanza, stanza line). The idea of labelling elements based on a tree that they belong to comes from the &lt;a href=&quot;http://www.research.att.com/~divesh/papers/jlssw2004-mct.pdf&quot; title=&quot;Colorful XML: One Hierarchy Isn&#039;t Enough&quot;&gt;multi-coloured trees&lt;/a&gt; technique, but I think it&amp;#8217;s more useful to use meaningful labels if you can.&lt;/p&gt;

&lt;p&gt;You could imagine a built-in extension element that allowed you to describe the trees that different elements belonged to, with the annotation happening at the level of the XPath Data Model as it&amp;#8217;s created. But to make things easier I&amp;#8217;m using an &lt;code&gt;f:trees&lt;/code&gt; attribute on each element. Adding the attribute can be done in XSLT with code like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;*&quot; mode=&quot;annotate&quot;&amp;gt;
  &amp;lt;xsl:copy&amp;gt;
    &amp;lt;xsl:attribute name=&quot;f:trees&quot;&amp;gt;
      &amp;lt;xsl:apply-templates select=&quot;.&quot; mode=&quot;trees&quot; /&amp;gt;
    &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;xsl:copy-of select=&quot;@*&quot; /&amp;gt;
    &amp;lt;xsl:apply-templates mode=&quot;annotate&quot; /&amp;gt;
  &amp;lt;/xsl:copy&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template match=&quot;book&quot; mode=&quot;trees&quot;&amp;gt;prosodic syntactic physical&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;poem&quot; mode=&quot;trees&quot;&amp;gt;prosodic syntactic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;title | stanza | sl&quot; mode=&quot;trees&quot;&amp;gt;prosodic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;s&quot; mode=&quot;trees&quot;&amp;gt;syntactic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;page | pl&quot; mode=&quot;trees&quot;&amp;gt;physical&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once elements are labelled with the trees they belong to, it&amp;#8217;s possible to work out dominance hierarchies. An element A is a descendant of an element B if the elements share a tree and A starts and ends within B. If A is within B but they don&amp;#8217;t appear in the same tree, then the containment is happenstance and does not imply dominance.&lt;/p&gt;
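
&lt;p&gt;Here is one way that rule might be sketched in XSLT 2.0. This is not the code in &lt;code&gt;fragmentation-utils.xsl&lt;/code&gt;: the function names &lt;code&gt;f:share-tree()&lt;/code&gt;, &lt;code&gt;f:fragments()&lt;/code&gt; and &lt;code&gt;f:dominates()&lt;/code&gt; and the &lt;code&gt;fragments&lt;/code&gt; key are all hypothetical, and the sketch assumes that the annotated elements live in a tree rooted at a document node and that the dominating element is contiguous.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
    xmlns:f=&quot;http://www.jenitennison.com/xslt/fragmentation&quot;&amp;gt;

  &amp;lt;xsl:key name=&quot;fragments&quot; match=&quot;*[@f:id]&quot; use=&quot;@f:id&quot; /&amp;gt;

  &amp;lt;!-- Hypothetical helper: true if the two elements are labelled
       with at least one tree in common. --&amp;gt;
  &amp;lt;xsl:function name=&quot;f:share-tree&quot; as=&quot;xs:boolean&quot;&amp;gt;
    &amp;lt;xsl:param name=&quot;a&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:param name=&quot;b&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:sequence select=&quot;tokenize(normalize-space($a/@f:trees), &#039; &#039;) =
                          tokenize(normalize-space($b/@f:trees), &#039; &#039;)&quot; /&amp;gt;
  &amp;lt;/xsl:function&amp;gt;

  &amp;lt;!-- Hypothetical helper: every fragment of an element, or just the
       element itself if it has no f:id. --&amp;gt;
  &amp;lt;xsl:function name=&quot;f:fragments&quot; as=&quot;element()*&quot;&amp;gt;
    &amp;lt;xsl:param name=&quot;e&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:sequence select=&quot;if ($e/@f:id)
                          then key(&#039;fragments&#039;, $e/@f:id, root($e))
                          else $e&quot; /&amp;gt;
  &amp;lt;/xsl:function&amp;gt;

  &amp;lt;!-- B dominates A if they share a tree and every fragment of A
       sits inside some fragment of B. --&amp;gt;
  &amp;lt;xsl:function name=&quot;f:dominates&quot; as=&quot;xs:boolean&quot;&amp;gt;
    &amp;lt;xsl:param name=&quot;b&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:param name=&quot;a&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:sequence select=&quot;f:share-tree($a, $b) and
      (every $frag in f:fragments($a) satisfies
         exists($frag/ancestor::* intersect f:fragments($b)))&quot; /&amp;gt;
  &amp;lt;/xsl:function&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the labels above, a &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; that happens to sit inside an &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; is not dominated by it, because &amp;#8220;physical&amp;#8221; and &amp;#8220;prosodic&amp;#8221; have no label in common.&lt;/p&gt;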

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Note: Trees should be defined so that all the elements within a given tree fit within each other without fragmenting. I haven&amp;#8217;t considered how self-overlap should be handled here; the elements need to be part of the same tree, but they can still overlap and therefore be fragmented even when that particular tree is primary. In my experience, self-overlap usually occurs in situations like comments or revision markup, in which the self-overlapping markup is never primary anyway, so I&amp;#8217;m not sure how serious this issue is.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Swapping Hierarchies&lt;/h2&gt;

&lt;p&gt;Once the elements are annotated, it&amp;#8217;s possible to swap between hierarchies. The function I&amp;#8217;ve written &amp;#8212; &lt;code&gt;f:swap()&lt;/code&gt; &amp;#8212; takes two or three arguments. The first is an element, and the &lt;code&gt;f:swap()&lt;/code&gt; function returns this same element (actually a copy) but with its children, and possibly its parents, rearranged based on the trees listed in the second argument. The third argument defaults to the element specified as the first argument and provides a starting point from which the rearrangement takes place; the two most useful values for this argument are the element itself (which means that its children are restructured) and the root of the tree (which means that the entire document is rearranged).&lt;/p&gt;

&lt;p&gt;Some examples will help make this clearer. Starting with the poem above, to get the rendering I want, I need to swap to a &amp;#8220;physical&amp;#8221; view and process the pages:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:apply-templates select=&quot;$annotated/book/f:swap(., &#039;physical&#039;)/page&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;f:swap()&lt;/code&gt; call here returns the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element but with its descendants rearranged so that the physical hierarchy is primary. The new version of the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element will have &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; children, which will themselves have &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; children. The &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; elements will contain fragments of &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements, nested purely based on their happenstance containment within a particular &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s the code for processing the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;page&quot;&amp;gt;
  &amp;lt;hr /&amp;gt;
  &amp;lt;p style=&quot;text-align: right; &quot;&amp;gt;page &amp;lt;xsl:value-of select=&quot;@n&quot; /&amp;gt;&amp;lt;/p&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;f:swap(., (&#039;prosodic&#039;, &#039;syntactic&#039;))/poem&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So each &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; generates a horizontal rule and a paragraph containing the page number, and then processes the &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; fragments it contains. Here the switch is from the physical hierarchy to a prosodic/syntactic hierarchy. The list of two items as the second argument of &lt;code&gt;f:swap()&lt;/code&gt; means that the primary hierarchy is prosodic (poems, containing stanzas, containing stanza lines), but once you reach the bottom of the prosodic hierarchy (the stanza lines) you switch to a syntactic hierarchy (sentences) rather than a physical hierarchy (page lines).&lt;/p&gt;

&lt;p&gt;The fact that the &lt;code&gt;f:swap()&lt;/code&gt; call above only has two arguments means that the rearrangement starts from the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element that&amp;#8217;s being processed. The ancestry of the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element itself stays the same, and only its content is rearranged according to the views specified in the second argument. So in this case the &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; elements that a given &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; contains will be fragments.&lt;/p&gt;

&lt;p&gt;Processing the poems can continue in the normal way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;poem&quot;&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;title&quot; /&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;stanza&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template match=&quot;title&quot;&amp;gt;
  &amp;lt;h3&amp;gt;&amp;lt;xsl:value-of select=&quot;.&quot; /&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The next difficulty appears when I want to start the numbering for a particular stanza based on the number of the first line within the stanza. I&amp;#8217;m doing this by setting the &lt;code&gt;start&lt;/code&gt; attribute like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;stanza&quot;&amp;gt;
  &amp;lt;ol&amp;gt;
    &amp;lt;xsl:attribute name=&quot;start&quot;&amp;gt;
      &amp;lt;xsl:number select=&quot;f:swap(., &#039;prosodic&#039;, /)/sl[1]&quot; 
        count=&quot;sl&quot; from=&quot;poem&quot; level=&quot;any&quot; /&amp;gt;
    &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;sl&quot; /&amp;gt;
  &amp;lt;/ol&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This illustrates the three-argument version of the &lt;code&gt;f:swap()&lt;/code&gt; function. To number the stanza line, I need to know the number of that stanza line within the poem that contains it. That would be easy to do with &lt;code&gt;&amp;lt;xsl:number&amp;gt;&lt;/code&gt; (or in other ways), but for the fact that the &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; element the &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; element appears in is currently fragmented between two &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements. To work out the number of the line, I really want an XML document in which the physical hierarchy is completely ignored, and the elements are arranged &lt;code&gt;book/poem/stanza/sl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The three-argument version of &lt;code&gt;f:swap()&lt;/code&gt; allows me to swap to a prosodic hierarchy, starting from the very root of the document. It returns the element given as the first argument as it appears within the new hierarchy. Unlike the two-argument version, which only affects the descendants of the first argument, the three-argument version may also affect its ancestors, and even merge the element if it&amp;#8217;s originally fragmented or split it if it doesn&amp;#8217;t appear in the primary hierarchy. In this example, the returned &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; element&amp;#8217;s parent &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; is a child of the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element rather than being a fragmented child of the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element.&lt;/p&gt;

&lt;p&gt;The rearrangement for the purposes of computing the start number for the list doesn&amp;#8217;t affect the tree that&amp;#8217;s being processed; the template for the &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; elements goes on to process the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements it contains, which use this template:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;sl&quot;&amp;gt;
  &amp;lt;li&amp;gt;
    &amp;lt;xsl:if test=&quot;count(s) &amp;gt; 1 and not(f:first(s[1]))&quot;&amp;gt;
      &amp;lt;xsl:attribute name=&quot;style&quot;&amp;gt;background-color: yellow; &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;/xsl:if&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;f:swap(., &#039;physical&#039;)/pl&quot; /&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Recall that the hierarchy currently being processed is a prosodic/syntactic hierarchy. The &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements contain &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements, and it&amp;#8217;s therefore possible to check whether the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; element being processed contains more than one &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt;. The &lt;code&gt;f:first()&lt;/code&gt; function checks whether a given fragment is the first fragment of that element, so the test in the &lt;code&gt;&amp;lt;xsl:if&amp;gt;&lt;/code&gt; in this template checks whether the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; contains more than one &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; and the first &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; is not the first fragment of the sentence it represents.&lt;/p&gt;
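
&lt;p&gt;To give a feel for the kind of check involved, here is a guess at how such a test could be written. It is not the &lt;code&gt;f:first()&lt;/code&gt; from &lt;code&gt;fragmentation-utils.xsl&lt;/code&gt;, which may well work differently; the name &lt;code&gt;f:is-first-fragment()&lt;/code&gt; is hypothetical, and the sketch assumes that every fragment of the element is present, in document order, in the tree that holds the fragment being tested.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
    xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
    xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
    xmlns:f=&quot;http://www.jenitennison.com/xslt/fragmentation&quot;&amp;gt;

  &amp;lt;!-- Hypothetical stand-in for f:first(): a fragment is the first one
       if no earlier element in the tree carries the same f:id.
       Elements without an f:id are unfragmented, so they count as first. --&amp;gt;
  &amp;lt;xsl:function name=&quot;f:is-first-fragment&quot; as=&quot;xs:boolean&quot;&amp;gt;
    &amp;lt;xsl:param name=&quot;fragment&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:sequence
      select=&quot;empty($fragment/preceding::*[@f:id = $fragment/@f:id])&quot; /&amp;gt;
  &amp;lt;/xsl:function&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;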

&lt;p&gt;To get the rendering I want, I need to generate an HTML paragraph for each page line within the stanza line. Currently the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements contain &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements, so to get the page lines I need to switch once more to the physical hierarchy and process the &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; elements that are children of this &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt;. That processing is done by the template:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;pl&quot;&amp;gt;
  &amp;lt;p&amp;gt;
    &amp;lt;xsl:if test=&quot;preceding-sibling::pl&quot;&amp;gt;
      &amp;lt;xsl:attribute name=&quot;style&quot;&amp;gt;text-align: right; &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;/xsl:if&amp;gt;
    &amp;lt;xsl:value-of select=&quot;.&quot; /&amp;gt;
  &amp;lt;/p&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and we&amp;#8217;re done.&lt;/p&gt;

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;The one thing that concerns me about the approach I&amp;#8217;m taking is that, because XSLT can&amp;#8217;t actually amend an existing tree, the &lt;code&gt;f:swap()&lt;/code&gt; function essentially makes a copy of the entire tree every time you use it. I don&amp;#8217;t know how well that will scale (both in terms of memory and in terms of work copying elements) when you get to documents that are larger than this toy example. Maybe processors are clever enough to discard trees they no longer need so it won&amp;#8217;t be an issue; I just don&amp;#8217;t know.&lt;/p&gt;

&lt;p&gt;Other than that, I think this approach is promising because it enables users to mostly use familiar tree-processing approaches rather than having to learn new paradigms for transforming overlapping markup or introducing a raft of new axes.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/98#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <enclosure url="http://www.jenitennison.com/blog/files/fragmentation-utils.xsl" length="6042" type="text/xml" />
 <pubDate>Sun, 28 Dec 2008 20:15:56 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">98 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Representing Overlap in XML</title>
 <link>http://www.jenitennison.com/blog/node/97</link>
 <description>&lt;p&gt;I&amp;#8217;m still on an overlap jag. I&amp;#8217;ve shown some examples in the &lt;a href=&quot;http://www.jenitennison.com/blog/node/95&quot; title=&quot;Jeni&#039;s Musings: Overlap, Containment and Dominance&quot;&gt;last couple&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/96&quot; title=&quot;Jeni&#039;s Musings: Essential Hierarchy&quot;&gt;of posts&lt;/a&gt; of &lt;a href=&quot;http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html&quot; title=&quot;TexMECS&quot;&gt;TexMECS&lt;/a&gt;, &lt;a href=&quot;http://www.xconcur.org/&quot; title=&quot;XCONCUR&quot;&gt;XCONCUR&lt;/a&gt; and &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_syntax&quot; title=&quot;LMNL Wiki: LMNL syntax&quot;&gt;LMNL syntax&lt;/a&gt;, which depart from the usual well-formedness strictures in XML. But these syntaxes have one big problem: they&amp;#8217;re not XML. XML is well-known, well-understood, and has great tools available for it, for querying, transforming, and &lt;a href=&quot;http://www.w3.org/TR/xproc/&quot; title=&quot;XProc: An XML Pipeline Language&quot;&gt;pipelining&lt;/a&gt;. So it would be a real win if overlap could be represented within XML in a usable manner.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;XML syntaxes for overlap, such as in &lt;a href=&quot;http://www.tei-c.org/index.xml&quot; title=&quot;TEI: Text Encoding Initiative&quot;&gt;TEI&lt;/a&gt; or in &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/Alternative_Syntaxes&quot; title=&quot;LMNL Wiki: Alternative Syntaxes&quot;&gt;LMNL&lt;/a&gt;, adopt five different techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;milestones:&lt;/strong&gt; one hierarchy is represented through normal XML markup; the others through empty elements (or, in some cases, processing instructions) that mark the start and end of structures that do not fit into that hierarchy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fragmentation:&lt;/strong&gt; one hierarchy is represented through normal XML markup; the others are represented through fragment elements that are linked together through their attributes (e.g. all XML elements that represent the same structure are given the same identifier)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;flattened:&lt;/strong&gt; all start and end tags are represented by milestones within a single meaningless root element&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;multiple document:&lt;/strong&gt; the document is split into multiple documents, each with a different set of elements within them. A particular element may be present in more than one of these documents, and of course the textual content (known as the &lt;strong&gt;frontier&lt;/strong&gt;) remains the same in each&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;stand-off:&lt;/strong&gt; the frontier is kept in a single place, perhaps including a common (&lt;strong&gt;sacred&lt;/strong&gt;) hierarchy, and all other overlapping structures are represented by elements that point to their start and end within that content (using either offsets or &lt;a href=&quot;http://www.w3.org/TR/xptr-framework/&quot; title=&quot;W3C: XPointer Framework&quot;&gt;XPointers&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
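
&lt;p&gt;To make the difference between the first two techniques concrete, here is a single line of verse in which one sentence ends and another begins, marked up both ways. The element and attribute names (&lt;code&gt;sl&lt;/code&gt; for a stanza line, &lt;code&gt;s&lt;/code&gt; for a sentence, &lt;code&gt;sentence-start&lt;/code&gt;, &lt;code&gt;sentence-end&lt;/code&gt;, &lt;code&gt;f:id&lt;/code&gt;) are invented for the illustration rather than taken from TEI or LMNL, and the &lt;code&gt;f&lt;/code&gt; prefix is assumed to be declared on an enclosing element.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- milestones: the stanza line is a normal element; sentence
     boundaries are marked by empty elements --&amp;gt;
&amp;lt;sl&amp;gt;Loin d&#039;eux. &amp;lt;sentence-end ref=&quot;s3&quot; /&amp;gt;&amp;lt;sentence-start id=&quot;s4&quot; /&amp;gt;Vois se pencher les défuntes Années,&amp;lt;/sl&amp;gt;

&amp;lt;!-- fragmentation: the same line, with each sentence split into
     fragments that share an identifier with their other pieces --&amp;gt;
&amp;lt;sl&amp;gt;&amp;lt;s f:id=&quot;s3&quot;&amp;gt;Loin d&#039;eux. &amp;lt;/s&amp;gt;&amp;lt;s f:id=&quot;s4&quot;&amp;gt;Vois se pencher les défuntes Années,&amp;lt;/s&amp;gt;&amp;lt;/sl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the milestone version, the content of sentence &lt;code&gt;s4&lt;/code&gt; has to be found by scanning forward to the matching end marker; in the fragmented version, each piece of it is already an element.&lt;/p&gt;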

&lt;p&gt;There&amp;#8217;s been some really good research at the University of Bologna on &lt;a href=&quot;http://upsilon.cc/~zack/research/publications/nrhm-overlapping-conversions.pdf&quot; title=&quot;Towards the unification of formats for overlapping markup&quot;&gt;how to translate between formats that use these techniques&lt;/a&gt; (as well as LMNL and TexMECS syntax). What I want to look at here is when and why it might be appropriate to use each of them.&lt;/p&gt;

&lt;p&gt;All these representations are useful in their own ways and in different situations. I&amp;#8217;m going to talk a bit about the trade-offs here. Here&amp;#8217;s a summary of pros and cons:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;technique&lt;/th&gt;
      &lt;th&gt;advantages&lt;/th&gt;
      &lt;th&gt;disadvantages&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th align=&quot;left&quot; valign=&quot;top&quot;&gt;milestones&lt;/th&gt;
      &lt;td valign=&quot;top&quot;&gt;easy to see main structure&lt;/td&gt;
      &lt;td valign=&quot;top&quot;&gt;favours one main structure;&lt;br /&gt;hard to identify content of overlapping structures&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th align=&quot;left&quot; valign=&quot;top&quot;&gt;fragmentation&lt;/th&gt;
      &lt;td valign=&quot;top&quot;&gt;easy to see main structure;&lt;br /&gt;easy to work out content of overlapping structures&lt;/td&gt;
      &lt;td valign=&quot;top&quot;&gt;favours one main structure;&lt;br /&gt;leads to spurious containment;&lt;br /&gt;can lead to discontinuous elements&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th align=&quot;left&quot; valign=&quot;top&quot;&gt;flattened&lt;/th&gt;
      &lt;td valign=&quot;top&quot;&gt;all structures treated equally&lt;/td&gt;
      &lt;td valign=&quot;top&quot;&gt;hard to see any structure;&lt;br /&gt;hard to process naturally using XML tools&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th align=&quot;left&quot; valign=&quot;top&quot;&gt;stand-off&lt;/th&gt;
      &lt;td valign=&quot;top&quot;&gt;all structures treated equally&lt;/td&gt;
      &lt;td valign=&quot;top&quot;&gt;hard to see any structure;&lt;br /&gt;hard to process naturally using XML tools;&lt;br /&gt;hard to edit without tools&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th align=&quot;left&quot; valign=&quot;top&quot;&gt;multiple document&lt;/th&gt;
      &lt;td valign=&quot;top&quot;&gt;easy to see individual structures&lt;/td&gt;
      &lt;td valign=&quot;top&quot;&gt;content gets repeated;&lt;br /&gt;complex to align structures;&lt;br /&gt;hard to do cross-hierarchy analysis;&lt;br /&gt;hard to edit without tools&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;So first, there&amp;#8217;s the ability of the format to represent different kinds of overlap or support specialised tasks. Multiple separate well-formed documents, for example, can&amp;#8217;t represent self-overlapping markup unless you have some variable, and possibly increasing, number of documents based on how much self-overlap there is. Fragmentation naturally supports discontinuous elements in a way that the other methods don&amp;#8217;t. Stand-off markup lets you mark up other people&amp;#8217;s documents without having write permissions on them. You might be constrained to use, or avoid, a particular technique simply because of what kind of overlap you&amp;#8217;re dealing with.&lt;/p&gt;

&lt;p&gt;Second, there&amp;#8217;s editability. Milestones, fragments and (arguably) flattened structures are almost as easy to edit by hand as normal XML (that is, straightforward for geeks, impossible for normal people). Stand-off markup (depending a little on how the marked up parts of the document are referenced) and multiple documents really require specific editing tools both for adding markup and changing the content of the document. I&amp;#8217;m generally of the opinion that tools should never be a necessity, but if you&amp;#8217;re creating an editor then you&amp;#8217;re probably going to use stand-off markup or multiple documents behind the scenes.&lt;/p&gt;

&lt;p&gt;Third, there&amp;#8217;s how much, and how easily, you can use standard XML tools to process the documents. If you were using XSLT (as I usually am), and were presented with multiple documents, stand-off markup or flattened structures, you&amp;#8217;d want someone to translate them into a milestoned or fragmented structure before you did anything. I have done quite a few transforms in which the documents were represented using a milestone technique, for example the changes within the &lt;a href=&quot;http://www.opsi.gov.uk/legislation/revised&quot; title=&quot;OPSI: Revised Legislation&quot;&gt;revised statutes on the OPSI website&lt;/a&gt;, which all have to be highlighted in blue, surrounded by square brackets and have a link at the start, and they&amp;#8217;re fiddly (especially in XSLT 1.0), though tractable. Fragmented markup, on the other hand, would be much easier to process.&lt;/p&gt;

&lt;p&gt;Finally, there&amp;#8217;s the issue of whether one particular hierarchy has prominence within the XML. In some examples of overlap, particularly the ones I&amp;#8217;m concerned with such as comments and revisions, there&amp;#8217;s an obvious primary hierarchy (the main document markup) with the others being secondary. This makes techniques such as milestones and fragmentation particularly appropriate. On the other hand, when there are multiple equal hierarchies, particularly when the two hierarchies use elements with the same names (such as marking up the pages in two editions of the same book), it might seem strange to choose one over another. You either need a neutral format (such as flattened markup, stand-off markup or multiple documents) or, in processing, the ability to easily switch between different primary hierarchies.&lt;/p&gt;

&lt;p&gt;So I don&amp;#8217;t think it&amp;#8217;s ever going to be possible to say &amp;#8220;this is &lt;em&gt;the&lt;/em&gt; way in which you should mark up your overlap using XML&amp;#8221;. However, I do think it would be really useful to have standard vocabularies for marking up overlap in XML, mostly in the form of namespaced attributes. If we had a set of model-neutral (ie not LMNL, nor GODDAG, nor XCONCUR, nor &amp;#8230;) and markup-language-neutral (ie not TEI, nor &amp;#8230;) vocabularies for representing overlap, we could start constructing querying, transformation and validation tools that would be useful across a range of projects. (I was thinking an RFC, but I have no idea how to go about it.)&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/97#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Thu, 18 Dec 2008 21:44:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">97 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Essential Hierarchy</title>
 <link>http://www.jenitennison.com/blog/node/96</link>
 <description>&lt;p&gt;In my &lt;a href=&quot;http://www.jenitennison.com/blog/node/95&quot; title=&quot;Overlap, Containment and Dominance&quot;&gt;last post&lt;/a&gt; I discussed the kinds of situations where overlapping markup can appear in documents, and the distinction between &lt;em&gt;containment&lt;/em&gt;, when one element happens to contain another, and &lt;em&gt;dominance&lt;/em&gt;, where the relationship between the two elements is more meaningful. Here I&amp;#8217;ll expand a bit more on the issue of whether dominance relationships are or should be part of the essential information in the document.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;I wrote:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;So an important challenge is how to get from a flat, containment-only model to a DAG. There are four approaches that can be taken:&lt;/p&gt;
  
  &lt;ol&gt;
  &lt;li&gt;For any document, for each pair of range names A and B, if every range named A contains a range named B, then assume that A dominates B; from that set of relationships, create a DAG.&lt;/li&gt;
  &lt;li&gt;Introduce additional syntax into tags, such that dominance relationships between ranges can be expressed explicitly within the serialisation.&lt;/li&gt;
  &lt;li&gt;Associate each document with a schema, and use the model expressed in the schema to identify dominance relationships; a Creole schema like the one above could be taken as asserting that poems dominate stanzas, for example, since stanzas are mentioned in the content model of the poem range.&lt;/li&gt;
  &lt;li&gt;Defer the construction of a DAG to the point of processing; a document would then not be a DAG in and of itself, but only in relation to a particular process.&lt;/li&gt;
  &lt;/ol&gt;
  
  &lt;p&gt;I find the last of these the most satisfactory. 1 is too arbitrary. 2 requires too much syntax. 3 requires a single schema per document (which, from experience with XML, I think is a broken model).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;http://www.jclark.com/&quot; title=&quot;James Clark&quot;&gt;James Clark&lt;/a&gt; commented:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’m surprised by your rejection of approach 2 on the grounds that it “requires too much syntax”. I would be inclined to start by designing the information model first and then figure out a syntax to represent that information model. Maybe I’m just brainwashed by too much XML/SGML, but the hierarchical relationships seem like a fundamental aspect of the information about the document which the markup should be capturing explicitly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I shall expand upon my rather throw-away rejection of the option of using additional syntax within tags to express dominance relationships. There are two parts to it: philosophical and pragmatic.&lt;/p&gt;

&lt;h2&gt;Philosophy&lt;/h2&gt;

&lt;p&gt;My philosophical objection applies to both the idea of indicating hierarchy within tags and using a schema (option 3 above). My attitude going into the development of &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_data_model&quot; title=&quot;LMNL data model&quot;&gt;LMNL&lt;/a&gt; (an attitude that might not be shared by the &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/Ad_Hoc_LMNL_Committee&quot; title=&quot;Ad Hoc LMNL Committee&quot;&gt;other members of the ad hoc LMNL committee&lt;/a&gt;) has been to carry over as few assumptions as possible from the SGML/XML world, and to see how far we can get without those assumptions. So as well as overlap, LMNL has weird things like &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_data_model#Annotations&quot; title=&quot;LMNL: Annotations&quot;&gt;structured and ordered annotations&lt;/a&gt;, &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_data_model#Atoms&quot; title=&quot;LMNL: Atoms&quot;&gt;atoms&lt;/a&gt;, and &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_data_model#Ranges&quot; title=&quot;LMNL: Ranges&quot;&gt;anonymous ranges&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In that spirit, I want to see if we can get away with &lt;em&gt;not&lt;/em&gt; having hierarchy as a fundamental part of the information model. Does this allow us to do things that we couldn&amp;#8217;t otherwise do, or is it a burden? I don&amp;#8217;t know yet.&lt;/p&gt;

&lt;p&gt;I view this as somewhat similar to the questions around datatyping XML. Some people (seem to) think that elements and attributes have a particular datatype as part of their essential nature, others (myself amongst them) that two processes could reasonably view a document using different datatypes or no datatypes at all if it wasn&amp;#8217;t important for that particular processing. Just as with datatyping, the restrictive, &amp;#8220;there can be only one&amp;#8221;, attitude is fine for people whose documents fit that model, but causes problems for those whose don&amp;#8217;t. Conversely, if your documents need only ever be seen in one way, it won&amp;#8217;t (or shouldn&amp;#8217;t) hurt you that other people want to take a more permissive approach. So allowing processing to determine hierarchy seems more likely to satisfy more people.&lt;/p&gt;

&lt;p&gt;As a small illustration that hierarchy is in the eye of the beholder, take the &lt;a href=&quot;http://www.jenitennison.com/blog/node/95&quot; title=&quot;Overlap, Containment and Dominance&quot;&gt;poem from my previous post&lt;/a&gt;. During my talk at the workshop in Amsterdam I asserted that the stanza line (&lt;code&gt;sl&lt;/code&gt;) elements could hold (dominate) one or more page line (&lt;code&gt;pl&lt;/code&gt;) elements. &lt;a href=&quot;http://vitali.web.cs.unibo.it/&quot; title=&quot;Professor Fabio Vitali, University of Bologna&quot;&gt;Fabio Vitali&lt;/a&gt; objected strongly to this, saying that the relationship was simply one of containment. Shouldn&amp;#8217;t it be possible for us both to process the poem based on our differing views?&lt;/p&gt;

&lt;h2&gt;Pragmatism&lt;/h2&gt;

&lt;p&gt;There are several models floating around as possibilities for representing overlapping structures. I talked about two of them in my last post: the &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_data_model&quot; title=&quot;LMNL data model&quot;&gt;LMNL data model&lt;/a&gt; in which a document is basically a sequence of atoms (most of which will be characters) with annotated ranges over them, and the &lt;a href=&quot;http://www.w3.org/People/cmsmcq/2000/poddp2000.html&quot; title=&quot;GODDAG: A Data Structure for Overlapping Hierarchies&quot;&gt;GODDAG&lt;/a&gt; model in which a document is a directed acyclic graph (DAG) of nodes. The GODDAG model is closest to SGML/XML in that it views the hierarchy of elements as a fundamental part of the document.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html&quot; title=&quot;TexMECS&quot;&gt;TexMECS&lt;/a&gt; syntax, which is supposed to be amenable to representing GODDAG structures, only represents hierarchy through containment. If you want to read about it, &lt;a href=&quot;http://www.mapageweb.umontreal.ca/marcoux/&quot; title=&quot;Yves Marcoux&quot;&gt;Yves Marcoux&lt;/a&gt; did some &lt;a href=&quot;http://www.balisage.net/Proceedings/html/2008/Marcoux01/Balisage2008-Marcoux01.html&quot; title=&quot;Graph characterization of overlap-only TexMECS and other overlapping markup formalisms&quot;&gt;analysis of what is and isn&amp;#8217;t serialisable within TexMECS&lt;/a&gt; at &lt;a href=&quot;http://www.balisage.net/&quot; title=&quot;Balisage: The Markup Conference&quot;&gt;Balisage&lt;/a&gt; earlier this year.&lt;/p&gt;

&lt;p&gt;In LMNL, we have the concept of limina which hold ranges (rather than atoms/characters), over which you can define other ranges. This gives us a structure roughly equivalent to a GODDAG. But despite a &lt;em&gt;lot&lt;/em&gt; of back and forth, we were never able to come up with a satisfactory serialisation. You might imagine an extension to the &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/LMNL_syntax&quot; title=&quot;LMNL syntax&quot;&gt;LMNL syntax&lt;/a&gt; that goes something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[page=p199 [n}199{]}
...
[poem=poem}
  [title^poem}[pl^p199}Recueillement{pl]{title]
  [stanza=s1^poem}
    [s^poem}[sl^s1}[pl^p199}Sois sage, ô ma douleur, et tiens-toi plus {pl]
                                        [pl^p199}tranquille.{pl]{sl]{s]
    [s^poem}[sl^s1}[pl^p199}Tu réclamais le Soir; il descend; le voici:{pl]{sl]
    [sl^s1}[pl^p199}Une atmosphère obscure enveloppe la ville,{pl]{sl]
    [sl^s1}[pl^p199}Aux uns portant la paix, aux autres le souci.{pl]{sl]{s]
  {stanza]
  [stanza=s2^poem}
    [s^poem}[sl^s2}[pl^p199}Pendant que des mortels la multitude vile,{pl]{sl]
    [sl^s2}[pl^p199}Sous le fouet du Plaisir, ce bourreau sans merci,{pl]{sl]
    [sl^s2}[pl^p199}Va cueillir des remords dans la fête servile,{pl]{sl]
    [sl^s2}[pl^p199}Ma douleur, donne moi la main; viens par ici,{pl]{sl]
  {stanza]
  [stanza=s3^poem}
    [sl^s3}[pl^p199}Loin d&#039;eux.{s] [s^poem}Vois se pencher les défuntes {pl]
                                                [pl^p199}Années,{pl]{sl]
    [sl^s3}[pl^p199}Sur les balcons du ciel, en robes surannées;{pl]{sl]
    [sl^s3}[pl^p199}Surgir du fond des eaux le Regret souriant;{pl]{sl]
  {stanza]{page]
  [page=p200 [n}200{]}[stanza=s4^poem}
    [sl^s4}[pl^p200}Le Soleil moribond s&#039;endormir sous une arche,{pl]{sl]
    [sl^s4}[pl^p200}Et, comme un long linceul traînant à l&#039;Orient,{pl]{sl]
    [sl^s4}[pl^p200}Entends, ma chère, entends la douce Nuit qui {pl]
                                              [pl^p200}marche.{pl]{sl]{s]
  {stanza]
{poem]
...
{page]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;in which each range can point to its parent range via its ID (IDs being set with the standard &lt;code&gt;=id&lt;/code&gt; and the child relationship being indicated by &lt;code&gt;^id&lt;/code&gt;). Or various other ways of doing it, none of which are convincing.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.xconcur.org/&quot; title=&quot;XCONCUR&quot;&gt;XCONCUR&lt;/a&gt; (which is pretty much the same syntax as CONCUR in SGML, but using XML) does indicate hierarchy, using syntax like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;(phy)page n=&quot;199&quot;&amp;gt;
...
&amp;lt;(syn)poem&amp;gt;&amp;lt;(pro)poem&amp;gt;
  &amp;lt;(pro)title&amp;gt;&amp;lt;(syn)title&amp;gt;&amp;lt;(phy)pl&amp;gt;Recueillement&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(syn)title&amp;gt;&amp;lt;/(pro)title&amp;gt;
  &amp;lt;(pro)stanza&amp;gt;
    &amp;lt;(syn)s&amp;gt;&amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Sois sage, ô ma douleur, et tiens-toi plus &amp;lt;/(phy)pl&amp;gt;
                                        &amp;lt;(phy)pl&amp;gt;tranquille.&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;&amp;lt;/(syn)s&amp;gt;
    &amp;lt;(syn)s&amp;gt;&amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Tu réclamais le Soir; il descend; le voici:&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Une atmosphère obscure enveloppe la ville,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Aux uns portant la paix, aux autres le souci.&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;&amp;lt;/(syn)s&amp;gt;
  &amp;lt;/(pro)stanza&amp;gt;
  &amp;lt;(pro)stanza&amp;gt;
    &amp;lt;(syn)s&amp;gt;&amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Pendant que des mortels la multitude vile,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Sous le fouet du Plaisir, ce bourreau sans merci,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Va cueillir des remords dans la fête servile,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Ma douleur, donne moi la main; viens par ici,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
  &amp;lt;/(pro)stanza&amp;gt;
  &amp;lt;(pro)stanza&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Loin d&#039;eux.&amp;lt;/(syn)s&amp;gt; &amp;lt;(syn)s&amp;gt;Vois se pencher les défuntes &amp;lt;/(phy)pl&amp;gt;
                                                &amp;lt;(phy)pl&amp;gt;Années,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Sur les balcons du ciel, en robes surannées;&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Surgir du fond des eaux le Regret souriant;&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
  &amp;lt;/(pro)stanza&amp;gt;&amp;lt;/(phy)page&amp;gt;
  &amp;lt;(phy)page n=&quot;200&quot;&amp;gt;&amp;lt;(pro)stanza&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Le Soleil moribond s&#039;endormir sous une arche,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Et, comme un long linceul traînant à l&#039;Orient,&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;
    &amp;lt;(pro)sl&amp;gt;&amp;lt;(phy)pl&amp;gt;Entends, ma chère, entends la douce Nuit qui &amp;lt;/(phy)pl&amp;gt;
                                              &amp;lt;(phy)pl&amp;gt;marche.&amp;lt;/(phy)pl&amp;gt;&amp;lt;/(pro)sl&amp;gt;&amp;lt;/(syn)s&amp;gt;
  &amp;lt;/(pro)stanza&amp;gt;
&amp;lt;/(pro)poem&amp;gt;&amp;lt;/(syn)poem&amp;gt;
...
&amp;lt;/(phy)page&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;pro&lt;/code&gt;, &lt;code&gt;syn&lt;/code&gt; and &lt;code&gt;phy&lt;/code&gt; labels in brackets before the element names indicate the hierarchy to which each element belongs. Elements can overlap if they have different labels. As you can see, this syntax means a lot of repetition for elements that belong to more than one hierarchy and there&amp;#8217;s a built-in limitation here regarding self-overlap (ie of elements that can overlap other elements with the same name).&lt;/p&gt;
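
&lt;p&gt;To make that rule concrete, here&amp;#8217;s a minimal sketch in Python (it is not part of XCONCUR) that checks a stream of tags one hierarchy at a time. The &lt;code&gt;(kind, layer, name)&lt;/code&gt; event tuples and the &lt;code&gt;layers_well_formed&lt;/code&gt; function are names invented for illustration, and the events are a hand-written reduction of the stanza line containing &amp;#8220;Loin d&amp;#8217;eux.&amp;#8221;, not the output of a real parser:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def layers_well_formed(events):
    """Check that the tags in each hierarchy nest properly on their own.

    events is a list of (kind, layer, name) tuples, where kind is "open"
    or "close". This tuple shape is an assumption made for the sketch.
    """
    stacks = {}  # one stack of open element names per hierarchy label
    for kind, layer, name in events:
        stack = stacks.setdefault(layer, [])
        if kind == "open":
            stack.append(name)
        elif not stack or stack.pop() != name:
            # a close tag that does not match the innermost open element
            return False
    # every hierarchy must have closed everything it opened
    return not any(stacks.values())


# Hand-written events for the stanza line that starts "Loin d'eux."
EVENTS = [
    ("open", "syn", "s"), ("open", "pro", "sl"), ("open", "phy", "pl"),
    ("close", "syn", "s"), ("open", "syn", "s"), ("close", "phy", "pl"),
    ("open", "phy", "pl"), ("close", "phy", "pl"),
    ("close", "pro", "sl"), ("close", "syn", "s"),
]

print(layers_well_formed(EVENTS))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A check like this always reads two start tags with the same label and name, followed by two matching end tags, as nesting; it has no way to tell that the first element was meant to close first. That is the self-overlap limitation in miniature.&lt;/p&gt;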

&lt;p&gt;Now this isn&amp;#8217;t altogether rational, but I think that the fact that we haven&amp;#8217;t managed to come up with a good syntax that expresses hierarchy, even without the restrictions of XML well-formedness, is an indication that it&amp;#8217;s not meant to be. I am firmly of the opinion that simplicity and elegance are hallmarks of good design. If hierarchies can only be expressed through an ugly syntax, then it&amp;#8217;s just not worth it.&lt;/p&gt;

&lt;p&gt;Slightly more rationally: if the syntax for expressing hierarchies is that verbose and difficult to use, people won&amp;#8217;t use it, and we&amp;#8217;ll have to find a way to add dominance relationships programmatically. We might as well start from that point.&lt;/p&gt;

&lt;p&gt;But perhaps someone out there can come up with a clean, elegant syntax for expressing dominance within overlapping markup?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/96#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Tue, 09 Dec 2008 17:59:32 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">96 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Overlap, Containment and Dominance</title>
 <link>http://www.jenitennison.com/blog/node/95</link>
 <description>&lt;p&gt;I&amp;#8217;ve spent the last few days at a &lt;a href=&quot;http://ilps.science.uva.nl/PoliticalMashup/2008/11/workshop-on-multi-dimensional-markup/&quot; title=&quot;Workshop on multi dimensional markup&quot;&gt;workshop on overlapping markup&lt;/a&gt; in Amsterdam. It was organised by &lt;a href=&quot;http://www.hf.uib.no/i/Filosofisk/claus/&quot; title=&quot;Claus Huitfeldt&quot;&gt;Claus Huitfeldt&lt;/a&gt; and &lt;a href=&quot;http://www.w3.org/People/cmsmcq/&quot; title=&quot;Michael Sperberg-McQueen&quot;&gt;Michael Sperberg-McQueen&lt;/a&gt; under a GODDAG banner, but included representatives of other approaches, such as the &lt;a href=&quot;http://www.xconcur.org/&quot; title=&quot;XCONCUR&quot;&gt;XCONCUR crowd&lt;/a&gt; and the &lt;a href=&quot;http://www.lmnl.org/wiki/&quot; title=&quot;LMNL Wiki&quot;&gt;LMNListas&lt;/a&gt; &lt;a href=&quot;http://www.piez.org/wendell/&quot; title=&quot;Wendell Piez&quot;&gt;Wendell&lt;/a&gt; and myself.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Overlap is arguably the main remaining problem area for markup technologists. Capturing and analysing the overlap between poetic and syntactic structures in poems and plays helps academics gain a deeper understanding of the ways poetic technique has changed over time. And the complexities of structures in documents such as the Bible simply cannot be represented without allowing overlap to happen.&lt;/p&gt;

&lt;p&gt;But academic study aside, overlap is a really important problem because whenever we collaborate on documents and whenever we change documents, we create overlapping structures. One of the major projects that I&amp;#8217;ve worked on at &lt;a href=&quot;http://www.tso.co.uk/&quot; title=&quot;The Stationery Office&quot;&gt;TSO&lt;/a&gt; deals with publishing &lt;a href=&quot;http://www.opsi.gov.uk/legislation/revised&quot; title=&quot;OPSI: Revised Legislation&quot;&gt;consolidated legislation&lt;/a&gt;, showing the places where &amp;#8220;current&amp;#8221; legislation was amended over time from its original, enacted state. The authors of legislation care little for document structures, and amendments often overlap document structures such as paragraphs and list items, and each other.&lt;/p&gt;

&lt;h2&gt;An Example&lt;/h2&gt;

&lt;p&gt;I used the following example during my talk on the &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/Creole&quot; title=&quot;Creole Schema Language&quot;&gt;Creole&lt;/a&gt; schema language during the workshop. It uses &lt;a href=&quot;http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html&quot; title=&quot;TexMECS&quot;&gt;TexMECS&lt;/a&gt; notation, in which &lt;code&gt;&amp;lt;name|&lt;/code&gt; is a start tag, &lt;code&gt;|name&amp;gt;&lt;/code&gt; an end tag and the normal XML syntax is used for attributes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;page n=&quot;199&quot;|
...
&amp;lt;poem|
  &amp;lt;title|&amp;lt;pl|Recueillement|pl&amp;gt;|title&amp;gt;
  &amp;lt;stanza|
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Sois sage, ô ma douleur, et tiens-toi plus |pl&amp;gt;
                                        &amp;lt;pl|tranquille.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Tu réclamais le Soir; il descend; le voici:|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Une atmosphère obscure enveloppe la ville,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Aux uns portant la paix, aux autres le souci.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
  |stanza&amp;gt;
  &amp;lt;stanza|
    &amp;lt;s|&amp;lt;sl|&amp;lt;pl|Pendant que des mortels la multitude vile,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Sous le fouet du Plaisir, ce bourreau sans merci,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Va cueillir des remords dans la fête servile,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Ma douleur, donne moi la main; viens par ici,|pl&amp;gt;|sl&amp;gt;
  |stanza&amp;gt;
  &amp;lt;stanza|
    &amp;lt;sl|&amp;lt;pl|Loin d&#039;eux.|s&amp;gt; &amp;lt;s|Vois se pencher les défuntes |pl&amp;gt;
                                                &amp;lt;pl|Années,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Sur les balcons du ciel, en robes surannées;|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Surgir du fond des eaux le Regret souriant;|pl&amp;gt;|sl&amp;gt;
  |stanza&amp;gt;|page&amp;gt;
  &amp;lt;page n=&quot;200&quot;|&amp;lt;stanza|
    &amp;lt;sl|&amp;lt;pl|Le Soleil moribond s&#039;endormir sous une arche,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Et, comme un long linceul traînant à l&#039;Orient,|pl&amp;gt;|sl&amp;gt;
    &amp;lt;sl|&amp;lt;pl|Entends, ma chère, entends la douce Nuit qui |pl&amp;gt;
                                              &amp;lt;pl|marche.|pl&amp;gt;|sl&amp;gt;|s&amp;gt;
  |stanza&amp;gt;
|poem&amp;gt;
...
|page&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The start and end tags mark &lt;em&gt;ranges&lt;/em&gt; in the text. (In some discussions of overlap, the ranges are called &amp;#8220;elements&amp;#8221;, but I prefer to reserve that term for structures that are self-contained, such as those in XML, to avoid confusion.) In Creole&amp;#8217;s compact syntax, you could articulate the structure as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# a book is a sequence of pages; it is also a sequence of poems
start = element book { page+ ~ poem+ }

# a page is a sequence of page lines
page = range page { pl+ }

# a poem starts with a title; the body of the poem can be characterised
# as a sequence of stanzas, but also as a sequence of sentences
poem = range poem { title, ( stanza+ ~ s+ ) }

# a title is a self-contained structure that may contain several page lines
title = element title { pl+ }

# a stanza contains several stanza lines
stanza = range stanza { sl+ }

# a stanza line contains one or more page lines
sl = range sl { pl+ }

# a sentence contains some text
s = range s { text }

# a page line contains some text
pl = range pl { text }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You could go further: sentences are made up of phrases, which are made up of words, which are made up of syllables, which are made up of letters. Stanzas within a sonnet such as this one can be clustered into an octet and a sestet and classified as quatrains and tercets based on the number of lines they contain. Stanza lines are also made up of syllables. And so on. Analysing the way in which the syntactic (sentence/phrase) structure overlaps with the prosodic (stanza/line) structure is one important way in which you can &lt;a href=&quot;http://www.tau.ac.il/~tsurxx/Recueillement.html&quot; title=&quot;Archetypal Pattern in Baudelaire&#039;s &#039;Recueillement&#039;&quot;&gt;analyse a poem&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Containment vs Dominance&lt;/h2&gt;

&lt;p&gt;When you&amp;#8217;re talking about overlapping structures, it&amp;#8217;s useful to make the distinction between structures that &lt;em&gt;contain&lt;/em&gt; each other and structures that &lt;em&gt;dominate&lt;/em&gt; each other. Containment is a happenstance relationship between ranges while dominance is one that has a meaningful semantic. A page may happen to &lt;em&gt;contain&lt;/em&gt; a stanza, but a poem &lt;em&gt;dominates&lt;/em&gt; the stanzas that it contains.&lt;/p&gt;

&lt;p&gt;In LMNL, we view a document as consisting of a &lt;em&gt;sequence of atoms&lt;/em&gt;, usually characters, and ranges over those characters. But the model makes no assertions about dominance relationships between the ranges. This document model is easy to construct from a serialised document like the one above.&lt;/p&gt;

&lt;p&gt;Conversely, &lt;a href=&quot;http://www.w3.org/People/cmsmcq/2000/poddp2000.html&quot; title=&quot;GODDAG: A Data Structure for Overlapping Hierarchies&quot;&gt;GODDAG document models&lt;/a&gt; are directed acyclic graphs (DAGs): the nodes within those graphs have children and parents, with leaf nodes containing characters, and the parent-child relationship implies dominance. This is a useful model for processing, and particularly querying. Navigating a DAG is a lot like navigating a tree, just one that represents multiple hierarchies. But it isn&amp;#8217;t possible to construct a DAG from a serialised document like the one above without extra information about which containment relationships are actually dominance relationships, and which mere happenstance.&lt;/p&gt;

&lt;p&gt;So an important challenge is how to get from a flat, containment-only model to a DAG. There are four approaches that can be taken:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For any document, for each pair of range names A and B, if every range named A contains a range named B, then assume that A dominates B; from that set of relationships, create a DAG.&lt;/li&gt;
&lt;li&gt;Introduce additional syntax into tags, such that dominance relationships between ranges can be expressed explicitly within the serialisation.&lt;/li&gt;
&lt;li&gt;Associate each document with a schema, and use the model expressed in the schema to identify dominance relationships; a Creole schema like the one above could be taken as asserting that poems dominate stanzas, for example, since stanzas are mentioned in the content model of the poem range.&lt;/li&gt;
&lt;li&gt;Defer the construction of a DAG to the point of processing; a document would then not be a DAG in and of itself, but only in relation to a particular process.&lt;/li&gt;
&lt;/ol&gt;
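
&lt;p&gt;As a rough illustration of the first approach, here&amp;#8217;s a small Python sketch. It assumes ranges are available as &lt;code&gt;(name, start, end)&lt;/code&gt; tuples; that shape, the &lt;code&gt;infer_dominance&lt;/code&gt; function and the offsets are all invented for the example rather than taken from any LMNL or GODDAG tool:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from itertools import permutations


def infer_dominance(ranges):
    """Guess dominance from containment alone (the first approach above).

    ranges is a list of (name, start, end) tuples; the offsets count atoms.
    Returns (A, B) pairs where every range named A contains some range named B.
    """
    names = sorted({name for name, _, _ in ranges})
    by_name = {n: [r for r in ranges if r[0] == n] for n in names}

    def holds(outer, inner):
        # inner lies within outer if both of its ends fall inside outer's span
        span = range(outer[1], outer[2] + 1)
        return inner[1] in span and inner[2] in span

    return [
        (a, b)
        for a, b in permutations(names, 2)
        if all(any(holds(ra, rb) for rb in by_name[b]) for ra in by_name[a])
    ]


# Invented offsets: two pages, and a poem of two stanzas with two lines each.
RANGES = [
    ("page", 0, 60), ("page", 60, 120),
    ("poem", 10, 110),
    ("stanza", 10, 50), ("stanza", 50, 110),
    ("sl", 10, 30), ("sl", 30, 50), ("sl", 50, 80), ("sl", 80, 110),
]

for pair in infer_dominance(RANGES):
    print(pair)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these offsets the sketch pairs &lt;code&gt;poem&lt;/code&gt; with &lt;code&gt;stanza&lt;/code&gt; and &lt;code&gt;stanza&lt;/code&gt; with &lt;code&gt;sl&lt;/code&gt;, but it also pairs &lt;code&gt;page&lt;/code&gt; with &lt;code&gt;sl&lt;/code&gt;, purely because each page happens to hold at least one whole stanza line.&lt;/p&gt;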

&lt;p&gt;I find the last of these the most satisfactory. 1 is too arbitrary. 2 requires too much syntax. 3 requires a single schema per document (which, from experience with XML, I think is a broken model). One could imagine being able to specify something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;book &amp;gt; page &amp;gt; pl &amp;gt; #text
book &amp;gt; poem &amp;gt; stanza &amp;gt; sl &amp;gt; #text
book &amp;gt; poem &amp;gt; s &amp;gt; #text
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and this generating a DAG in which a &lt;code&gt;book&lt;/code&gt; node had &lt;code&gt;page&lt;/code&gt; and &lt;code&gt;poem&lt;/code&gt; children, &lt;code&gt;page&lt;/code&gt; nodes had &lt;code&gt;pl&lt;/code&gt; children which had text children, &lt;code&gt;poem&lt;/code&gt; nodes had &lt;code&gt;stanza&lt;/code&gt; children and &lt;code&gt;s&lt;/code&gt; children, and so on. With this structure, it would be easy enough to find stanzas with four lines (&lt;code&gt;/book/poem/stanza[count(sl) = 4]&lt;/code&gt;) without having to worry about the possibilities of happenstance containment, such as some stanza lines being contained by sentences that are contained by stanzas.&lt;/p&gt;
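
&lt;p&gt;Here&amp;#8217;s a rough Python sketch of what such a process might do, under some loud assumptions: the &lt;code&gt;Range&lt;/code&gt; tuple, the &lt;code&gt;CHAINS&lt;/code&gt; list and the &lt;code&gt;build_dag&lt;/code&gt; function are all hypothetical, the offsets are made up, and the &lt;code&gt;book&lt;/code&gt; and text levels are left out to keep it short. A range only becomes a child of another if it is contained by it &lt;em&gt;and&lt;/em&gt; the pair of names appears in one of the declared chains:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import namedtuple

Range = namedtuple("Range", "name start end")

# Dominance chains modelled on the declarations above (book and text omitted)
CHAINS = [["page", "pl"], ["poem", "stanza", "sl"], ["poem", "s"]]


def contains(outer, inner):
    span = range(outer.start, outer.end + 1)
    return outer != inner and inner.start in span and inner.end in span


def build_dag(ranges, chains):
    """Map each range to the list of ranges it dominates."""
    allowed = {pair for chain in chains for pair in zip(chain, chain[1:])}
    return {
        r: [c for c in ranges if (r.name, c.name) in allowed and contains(r, c)]
        for r in ranges
    }


# Invented offsets: a four-line stanza, a three-line stanza, and a sentence
# that runs across the stanza boundary.
RANGES = [
    Range("poem", 0, 70),
    Range("stanza", 0, 40), Range("stanza", 40, 70),
    Range("sl", 0, 10), Range("sl", 10, 20), Range("sl", 20, 30),
    Range("sl", 30, 40), Range("sl", 40, 50), Range("sl", 50, 60),
    Range("sl", 60, 70),
    Range("s", 5, 45),
]

dag = build_dag(RANGES, CHAINS)
quatrains = [r for r in RANGES if r.name == "stanza" and len(dag[r]) == 4]
print(quatrains)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The sentence in this data happens to contain three whole stanza lines, but because no chain puts &lt;code&gt;sl&lt;/code&gt; under &lt;code&gt;s&lt;/code&gt;, it gets no children and doesn&amp;#8217;t interfere with counting the lines in each stanza.&lt;/p&gt;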

&lt;p&gt;There&amp;#8217;s lots more to talk about here. In particular, things about the useful and appropriate ways of querying and transforming these structures, and how to best serialise them in XML. But I&amp;#8217;ll leave those thoughts for another post.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/95#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Sat, 06 Dec 2008 20:56:52 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">95 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Partitioning overlapping markup</title>
 <link>http://www.jenitennison.com/blog/node/27</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.piez.org/&quot; title=&quot;Wendell&#039;s Home Page&quot;&gt;Wendell Piez&lt;/a&gt; forwarded me an interesting poster by &lt;a href=&quot;http://www.huygensinstituut.knaw.nl/index.php?option=com_content&amp;amp;task=view&amp;amp;id=120&amp;amp;Itemid=57&quot; title=&quot;Bert Van Elsacker&quot;&gt;Bert Van Elsacker&lt;/a&gt; on automatic fragmentation of overlapping structures. That&amp;#8217;s taking something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/bold&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and turning it into something well-formed, like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/italic&amp;gt;&amp;lt;/bold&amp;gt;&amp;lt;italic&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you do this, you have to decide which elements can be split and which can&amp;#8217;t, and their relative priorities. Wendell suggested that perhaps Creole might help to do this. What I have been thinking about is using Creole to add annotations to markup (something like, you add attributes to the Creole patterns and they get copied on to the matched ranges, or are used to create new ranges), but I haven&amp;#8217;t done that yet, and actually I think you probably want a different kind of language to do it (&lt;a href=&quot;http://blog.jclark.com/2007/04/do-we-need-new-kind-of-schema-language.html&quot; title=&quot;James Clark: Do we need a new kind of schema language?&quot;&gt;a new kind of schema language&lt;/a&gt; like James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you&amp;#8217;re going to process them.&lt;/p&gt;
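
&lt;p&gt;As a rough illustration of the starting point for any such fragmentation, here is a minimal sketch in Python (not taken from the poster or from Creole) that cuts the text at every range boundary and records which ranges cover each piece. The &lt;code&gt;partition&lt;/code&gt; function and the &lt;code&gt;(name, start, end)&lt;/code&gt; tuples are hypothetical, chosen just for this example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def partition(text, ranges):
    """Split text at every range boundary.

    ranges is a list of (name, start, end) character offsets, end exclusive.
    Returns (substring, names) pairs, where names lists the ranges that
    cover that substring, in the order the ranges were given.
    """
    # Every start and end offset is a cut point, plus the ends of the text.
    cuts = sorted({0, len(text)} | {p for _, s, e in ranges for p in (s, e)})
    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        # No boundary falls inside a piece, so testing its first offset
        # is enough to know whether a range covers the whole piece.
        names = [n for n, s, e in ranges if left in range(s, e)]
        pieces.append((text[left:right], names))
    return pieces


# The bold/italic example from above, with offsets computed from the text.
a, b, c = " this is bold ", " and italic ", " text "
text = a + b + c
ranges = [("bold", 0, len(a + b)), ("italic", len(a), len(text))]
for piece, names in partition(text, ranges):
    print(repr(piece), names)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For the example this gives three pieces, covered by bold, by both bold and italic, and by italic alone. Deciding which of the covering elements gets split around which, and in what priority, is the separate step that the rest of this post is about.&lt;/p&gt;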

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;m reminded of the paper&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Sperberg-McQueen, C. M., David Dubin, Claus Huitfeldt and Allen Renear. “&lt;a href=&quot;http://www.idealliance.org/papers/extreme/proceedings/html/2002/CMSMcQ01/EML2002CMSMcQ01.html&quot;&gt;Drawing inferences on the basis of markup.&lt;/a&gt;” In Proceedings of Extreme Markup Languages 2002. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;in which (based on my memory of the talk) they discuss how different elements allow you to make different assertions about the text they contain, and consequently can be split in different ways. For example, a &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; element can&amp;#8217;t be split into two &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; elements without changing the meaning of the document, whereas a &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; element can be split into two &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; elements with no problems because it&amp;#8217;s really indicating &amp;#8220;these characters are bold&amp;#8221; rather than &amp;#8220;this is a bold phrase&amp;#8221;.&lt;/p&gt;

&lt;p&gt;You can take a purist view (which would usually entail splitting hardly any elements, since most elements &lt;em&gt;do&lt;/em&gt; mark up a range of text rather than the individual characters they contain), but I think the main reason you want to do this fragmentation is for presentation. And in that context, the notional semantics of the element don&amp;#8217;t really matter: what matters is how they&amp;#8217;re styled. For example, a &lt;code&gt;&amp;lt;comment&amp;gt;&lt;/code&gt; element, marking up a range of text that has been commented on, might not be splittable at a theoretical level, but if you&amp;#8217;re going to render it simply by turning the background yellow, then in fact you &lt;em&gt;can&lt;/em&gt; split it for that purpose.&lt;/p&gt;

&lt;p&gt;Since it&amp;#8217;s related to presentation, I wonder whether you could use a (simplified) CSS stylesheet to provide both the fragmentation and the style. Block-level elements (&lt;code&gt;display: block;&lt;/code&gt;) couldn&amp;#8217;t be split whereas inline elements could. Elements that have the box model properties (margin, padding &amp;amp; borders) can&amp;#8217;t be split, or, if they are, you need to mark the fragments as &amp;#8220;left&amp;#8221;, &amp;#8220;middle&amp;#8221; and &amp;#8220;right&amp;#8221;, and only apply the &lt;em&gt;left&lt;/em&gt; margin/padding/border to the &amp;#8220;left&amp;#8221; fragment, and similarly with the right.&lt;/p&gt;

&lt;p&gt;It wouldn&amp;#8217;t be a general purpose transformation mechanism, but it would be darned useful!&lt;/p&gt;
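
&lt;p&gt;A minimal sketch of that rule in Python, assuming the stylesheet has already been resolved to a plain dictionary of property names per element. The function names, the dictionary form and the &lt;code&gt;whole&lt;/code&gt; label for an unsplit element are all hypothetical, not part of CSS.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BOX_PROPERTIES = ("margin", "padding", "border")


def can_split(style):
    """Inline elements can be fragmented; block-level ones cannot.

    An element with no display property is treated as inline, which is
    the initial value of display in CSS.
    """
    return style.get("display", "inline") != "block"


def fragment_positions(style, count):
    """Label count fragments (count is at least 1) of one element.

    Returns None when the element has no box properties, since its
    fragments can all be styled identically. Otherwise the labels say
    which fragment gets the left-hand and which the right-hand
    margin, padding and border.
    """
    if not any(prop in style for prop in BOX_PROPERTIES):
        return None
    if count == 1:
        return ["whole"]
    return ["left"] + ["middle"] * (count - 2) + ["right"]


# A comment rendered with a yellow background can be split freely.
print(can_split({"background": "yellow"}))
# A bordered inline element split in three needs labelled fragments.
print(fragment_positions({"border": "1px solid"}, 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A real version would need to handle the longhand properties (such as &lt;code&gt;margin-left&lt;/code&gt;) and the other display values; this only covers the two cases discussed above.&lt;/p&gt;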
</description>
 <comments>http://www.jenitennison.com/blog/node/27#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Mon, 11 Jun 2007 20:36:09 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">27 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>A Creole by any other name...</title>
 <link>http://www.jenitennison.com/blog/node/6</link>
 <description>&lt;p&gt;Argh. I&amp;#8217;ve been contacted by the guys at &lt;a href=&quot;http://www.wikicreole.org&quot; title=&quot;Creole Wiki Markup language&quot;&gt;WikiCreole&lt;/a&gt; who want me to change the name of &lt;a href=&quot;http://www.lmnlwiki.org&quot; title=&quot;Creole schema language&quot;&gt;Creole&lt;/a&gt;. What should I do? Not only is &amp;#8220;Creole&amp;#8221; a great name for a schema language that deals with concurrent markup, but it&amp;#8217;s a great acronym too (Composable regular expressions for overlapping languages etc.)&lt;/p&gt;

&lt;p&gt;I did Google when I first came up with the name in August 2006, but didn&amp;#8217;t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?&lt;/p&gt;

&lt;!--break--&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/6#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/8">schema</category>
 <pubDate>Wed, 25 Apr 2007 20:09:28 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">6 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

