<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>creole</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/7</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>And she&#039;s back</title>
 <link>http://www.jenitennison.com/blog/node/46</link>
 <description>&lt;p&gt;So first there was the &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;Oxford XML Summer School&quot;&gt;XML Summer School&lt;/a&gt;. This year was my sixth, and it was really great to hang out with &lt;a href=&quot;http://www.xmlsummerschool.com/speakers.html&quot; title=&quot;XML Summer School Speakers List&quot;&gt;chums&lt;/a&gt; old and new. I love that&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you get to meet people from all corners of the XML community, even ones you haven&amp;#8217;t got the slightest interest in, and learn that they&amp;#8217;re human too (even the web services guys)&lt;/li&gt;
&lt;li&gt;there&amp;#8217;s always &lt;em&gt;something&lt;/em&gt; to learn; I&amp;#8217;ve seen some talks for six years on the trot, others were completely new this year, but they&amp;#8217;re all worth attending because the audience, war stories and discussion are always different. Also, because each talk is aimed at newcomers, you get a great overview of topics that you&amp;#8217;re not so familiar with, and you can always chat to the speaker later to find out more&lt;/li&gt;
&lt;li&gt;there are social events laid on every evening that you&amp;#8217;re expected to attend, so you&amp;#8217;re practically forced to socialise, which is useful for an insecure introvert like me who&amp;#8217;d otherwise be sitting in her hotel room getting miserable imagining everyone else having a good time&lt;/li&gt;
&lt;li&gt;there&amp;#8217;s a creche, so despite being inseparable from two small children over the last four years, I&amp;#8217;ve still been able to attend without dragging an entourage with me (not that I object to the entourage, just the expense and the dependency)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I left feeling not only invigorated and inspired, but also a part of a fun and friendly community.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The following week, we moved house. Twelve days later, we&amp;#8217;re &lt;em&gt;almost&lt;/em&gt; completely unpacked, and the important things are done. We have wireless, positioned nicely so that virtually the whole house gets &amp;#8220;Excellent&amp;#8221; coverage. Thanks to my father, we have a &lt;a href=&quot;http://en.wikipedia.org/wiki/NSLU2&quot; title=&quot;Wikipedia: NSLU2&quot;&gt;NSLU2&lt;/a&gt; running &lt;a href=&quot;http://en.wikipedia.org/wiki/Debian&quot; title=&quot;Wikipedia: Debian&quot;&gt;Debian&lt;/a&gt; and acting as our low-power-consumption file and mail server. (For those that are interested, I&amp;#8217;m getting access from my Windows machines using &lt;a href=&quot;http://en.wikipedia.org/wiki/Xming&quot; title=&quot;Wikipedia: Xming&quot;&gt;Xming&lt;/a&gt; to actually interact with the machine, and &lt;a href=&quot;http://www.webdrive.com/&quot; title=&quot;South River Technologies&quot;&gt;WebDrive&lt;/a&gt; to map a drive onto the file system.) And we have an area entirely dedicated to Lego. Yes, this might be my dream house.&lt;/p&gt;

&lt;p&gt;(We&amp;#8217;re also only a 40 minute drive from Heathrow, 30 minute train from Waterloo, so if anyone I know&amp;#8217;s visiting the UK and wants to drop in, you&amp;#8217;re more than welcome. There&amp;#8217;s even a spare room.)&lt;/p&gt;

&lt;p&gt;And amongst all this, I had to record a &lt;a href=&quot;http://www.jenitennison.com/extreme/Creole.zip&quot; title=&quot;Zipped Powerpoint with linked sound files&quot;&gt;virtual presentation&lt;/a&gt; on &lt;a href=&quot;http://www.lmnl.org/wiki/Creole&quot; title=&quot;Creole: schema language for overlapping markup&quot;&gt;Creole&lt;/a&gt; for &lt;a href=&quot;http://www.extrememarkup.com/overlap/index.html&quot; title=&quot;International Workshop on Markup of Overlapping Structures&quot;&gt;Overlap day&lt;/a&gt; at &lt;a href=&quot;http://www.extrememarkup.com/&quot; title=&quot;Extreme Markup Languages Conference&quot;&gt;Extreme&lt;/a&gt;. The sound quality&amp;#8217;s not great, but it&amp;#8217;s a reasonable 10-minute introduction, I think. It sounds like they got good attendance: will someone who was there please post about it?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/46#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/27">life</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/10">gadgets</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/3">conferences</category>
 <pubDate>Sat, 11 Aug 2007 21:54:45 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">46 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Partitioning overlapping markup</title>
 <link>http://www.jenitennison.com/blog/node/27</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.piez.org/&quot; title=&quot;Wendell&#039;s Home Page&quot;&gt;Wendell Piez&lt;/a&gt; forwarded me an interesting poster by &lt;a href=&quot;http://www.huygensinstituut.knaw.nl/index.php?option=com_content&amp;amp;task=view&amp;amp;id=120&amp;amp;Itemid=57&quot; title=&quot;Bert Van Elsacker&quot;&gt;Bert Van Elsacker&lt;/a&gt; on automatic fragmentation of overlapping structures. That&amp;#8217;s taking something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/bold&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and turning it into something well-formed, like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;bold&amp;gt; this is bold &amp;lt;italic&amp;gt; and italic &amp;lt;/italic&amp;gt;&amp;lt;/bold&amp;gt;&amp;lt;italic&amp;gt; text &amp;lt;/italic&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you do this, you have to decide which elements can be split and which can&amp;#8217;t, and their relative priorities. Wendell suggested that perhaps Creole might help to do this. What I have been thinking about is using Creole to add annotations to markup (something like, you add attributes to the Creole patterns and they get copied on to the matched ranges, or are used to create new ranges), but I haven&amp;#8217;t done that yet, and actually I think you probably want a different kind of language to do it (&lt;a href=&quot;http://blog.jclark.com/2007/04/do-we-need-new-kind-of-schema-language.html&quot; title=&quot;James Clark: Do we need a new kind of schema language?&quot;&gt;a new kind of schema language&lt;/a&gt; like James Clark suggested), because the way in which you break up overlapping structures has a lot to do with how you&amp;#8217;re going to process them.&lt;/p&gt;
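
&lt;p&gt;To make the splitting step concrete, here is a minimal Python sketch. It is not Bert&amp;#8217;s algorithm and not something Creole does; it assumes the document has already been tokenised into a flat list of open, close and text events, and the &lt;code&gt;fragment&lt;/code&gt; function, the event tuples and the &lt;code&gt;splittable&lt;/code&gt; set are all names invented for illustration. It also makes one simplifying choice about priorities: when a close event arrives for an element that is not innermost, it always splits the elements opened after it.&lt;/p&gt;

```python
def fragment(events, splittable):
    """Return a properly nested copy of `events`.

    `events` is a list of ("open", name), ("close", name) and
    ("text", string) tuples. Only element names in `splittable`
    may be closed early and reopened.
    """
    stack = []  # names of the elements currently open, innermost last
    out = []
    for kind, value in events:
        if kind == "open":
            stack.append(value)
            out.append((kind, value))
        elif kind == "close":
            if value not in stack:
                raise ValueError("close without matching open: " + value)
            # Anything opened after `value` and still open overlaps it,
            # so close those elements first and remember them.
            reopen = []
            while stack[-1] != value:
                inner = stack.pop()
                if inner not in splittable:
                    raise ValueError("cannot split element: " + inner)
                out.append(("close", inner))
                reopen.append(inner)
            stack.pop()
            out.append(("close", value))
            # Reopen the split elements in their original order.
            for inner in reversed(reopen):
                stack.append(inner)
                out.append(("open", inner))
        else:
            # Text events pass through unchanged.
            out.append((kind, value))
    if stack:
        raise ValueError("unclosed elements: " + ", ".join(stack))
    return out


# The bold/italic example from above, as a hypothetical event list.
overlapping = [
    ("open", "bold"), ("text", " this is bold "),
    ("open", "italic"), ("text", " and italic "),
    ("close", "bold"), ("text", " text "),
    ("close", "italic"),
]
well_formed = fragment(overlapping, {"bold", "italic"})
```

&lt;p&gt;With both elements marked as splittable, the italic range is closed just before the bold one ends and reopened straight afterwards, which is the well-formed arrangement shown above. If italic is left out of &lt;code&gt;splittable&lt;/code&gt;, the sketch simply gives up rather than trying to split bold instead; a real tool would need the relative priorities to decide that.&lt;/p&gt;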

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;m reminded of the paper&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Sperberg-McQueen, C. M., David Dubin, Claus Huitfeldt and Allen Renear. “&lt;a href=&quot;http://www.idealliance.org/papers/extreme/proceedings/html/2002/CMSMcQ01/EML2002CMSMcQ01.html&quot;&gt;Drawing inferences on the basis of markup.&lt;/a&gt;” In Proceedings of Extreme Markup Languages 2002. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;in which (based on my memory of the talk) they discuss how different elements allow you to make different assertions about the text they contain, and consequently can be split in different ways. For example, a &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; element can&amp;#8217;t be split into two &lt;code&gt;&amp;lt;paragraph&amp;gt;&lt;/code&gt; elements without changing the meaning of the document, whereas a &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; element can be split into two &lt;code&gt;&amp;lt;bold&amp;gt;&lt;/code&gt; elements with no problems because it&amp;#8217;s really indicating &amp;#8220;these characters are bold&amp;#8221; rather than &amp;#8220;this is a bold phrase&amp;#8221;.&lt;/p&gt;

&lt;p&gt;You can take a purist view (which would usually entail splitting hardly any elements, since most elements &lt;em&gt;do&lt;/em&gt; mark up a range of text rather than the individual characters they contain), but I think the main reason you want to do this fragmentation is for presentation. And in that context, the notional semantics of the element don&amp;#8217;t really matter: what matters is how they&amp;#8217;re styled. For example, a &lt;code&gt;&amp;lt;comment&amp;gt;&lt;/code&gt; element, marking up a range of text that has been commented on, might not be splittable at a theoretical level, but if you&amp;#8217;re going to render it simply by turning the background yellow, then in fact you &lt;em&gt;can&lt;/em&gt; split it for that purpose.&lt;/p&gt;

&lt;p&gt;Since it&amp;#8217;s related to presentation, I wonder whether you could use a (simplified) CSS stylesheet to provide both the fragmentation and the style. Block-level elements (&lt;code&gt;display: block;&lt;/code&gt;) couldn&amp;#8217;t be split whereas inline elements could. Elements that have the box model properties (margin, padding &amp;amp; borders) can&amp;#8217;t be split, or, if they are, you need to mark the fragments as &amp;#8220;left&amp;#8221;, &amp;#8220;middle&amp;#8221; and &amp;#8220;right&amp;#8221;, and only apply the &lt;em&gt;left&lt;/em&gt; margin/padding/border to the &amp;#8220;left&amp;#8221; fragment, and similarly with the right.&lt;/p&gt;
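
&lt;p&gt;The &amp;#8220;left&amp;#8221;, &amp;#8220;middle&amp;#8221; and &amp;#8220;right&amp;#8221; bookkeeping could look something like the following Python sketch. The &lt;code&gt;box_sides&lt;/code&gt; function and the &amp;#8220;whole&amp;#8221; label for an unsplit element are assumptions made up for illustration, and it expects a positive fragment count; it only covers the horizontal sides of the box.&lt;/p&gt;

```python
def box_sides(fragment_count):
    """Return (label, sides) pairs for an element split into fragments.

    `sides` names the horizontal margin/padding/border sides that a
    fragment keeps. `fragment_count` is assumed to be 1 or more.
    """
    if fragment_count == 1:
        # An unsplit element keeps both of its horizontal sides.
        return [("whole", ("left", "right"))]
    # Every fragment between the first and the last keeps no sides.
    middles = [("middle", ())] * (fragment_count - 2)
    return [("left", ("left",))] + middles + [("right", ("right",))]


# An element split into three fragments.
three = box_sides(3)
```

&lt;p&gt;Only the first fragment keeps the left margin, padding and border, only the last keeps the right ones, and anything in between keeps neither.&lt;/p&gt;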

&lt;p&gt;It wouldn&amp;#8217;t be a general purpose transformation mechanism, but it would be darned useful!&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/27#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <pubDate>Mon, 11 Jun 2007 21:36:09 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">27 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech Creole presentation fallout</title>
 <link>http://www.jenitennison.com/blog/node/17</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.ltg.ed.ac.uk/~ht/&quot; title=&quot;Henry S. Thompson&#039;s Home Page&quot;&gt;Henry Thompson&lt;/a&gt; had a lot to say after &lt;a href=&quot;http://www.jenitennison.com/blog/files/XTech2007CreoleSlides.zip&quot; title=&quot;XTech 2007 Creole presentation&quot;&gt;my Creole presentation&lt;/a&gt; (open takahashi.xul?data=creole.data; requires Firefox) about the benefits of stand-off markup for linguistic information. From his overview, it seems that the &lt;a href=&quot;http://www.ltg.ed.ac.uk/NITE&quot; title=&quot;NITE XML Toolkit&quot;&gt;NITE XML Toolkit&lt;/a&gt; that he&amp;#8217;s been involved with represents overlapping linguistic data by holding atoms (here meaning the &amp;#8220;lowest common denominator&amp;#8221; shared pieces of data) and having multiple trees marking up these atoms. The trees are independently validated (since they are pure XML), with cross-hierarchy validation done through the query language. This is pretty similar to the &lt;a href=&quot;http://www.idealliance.org/papers/extreme/Proceedings/html/2006/Schonefeld01/EML2006Schonefeld01.html&quot; title=&quot;Towards Validation of Concurrent Markup&quot;&gt;XCONCUR&lt;/a&gt; approach, which augments a CONCUR-like multi-grammar validation with a Schematron-like constraint language.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Now, I have nothing against using constraint languages (like Schematron) to validate documents, but grammars (like RELAX NG) have big advantages. Most importantly, they are easier to write (if they&amp;#8217;re designed properly), and tools can analyse them to do useful things, such as tell you what element or attribute is expected next. If it&amp;#8217;s possible to write cross-grammar constraints in a grammar (like Creole) then why would you use a constraint language to do it?&lt;/p&gt;

&lt;p&gt;I think the big difference between Henry&amp;#8217;s domain and the one that I think will move overlap into the mainstream is between global and local concurrence. With global concurrence, entirely separate hierarchies are applied to the same data, so the natural validation mechanism is to use entirely separate grammars (with perhaps a few small rules to do cross-grammar validation where that proves necessary). With local concurrence, the vast majority of the document follows a single hierarchy with concurrence happening at a low level.&lt;/p&gt;

&lt;p&gt;Actually, the best example for this doesn&amp;#8217;t even involve overlap. Consider HTML paragraphs, which contain various inline elements such as &lt;code&gt;&amp;lt;strong&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt;. It doesn&amp;#8217;t make sense for these elements to contain themselves (strong text is neither made stronger nor negated by appearing in two &lt;code&gt;&amp;lt;strong&amp;gt;&lt;/code&gt; elements, and it&amp;#8217;s not allowed for links to contain other links). So the natural model in Creole is&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;p      = element p { mixed { strong* &amp;amp; em* &amp;amp; a* } }
strong = range strong { text }
em     = range em { text }
a      = range a { attribute href { text }, ..., text }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This model allows &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; elements to appear within &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; elements, or vice versa, not because of the content model of &lt;code&gt;&amp;lt;em&amp;gt;&lt;/code&gt; but because the two ranges are interleaved (and one arrangement of interleaved ranges is containment). It doesn&amp;#8217;t allow any of these elements to appear inside themselves. It would be a real maintenance headache to have separate grammars for each of these inline elements, when most of each of the grammars (all the hierarchy down to the paragraph level) would be the same.&lt;/p&gt;

&lt;p&gt;Actually, looking at NITE, it seems like it employs a data model that&amp;#8217;s quite like &lt;a href=&quot;http://www.lmnlwiki.org/index.php/LMNL_data_model&quot; title=&quot;LMNL data model&quot;&gt;LMNL&amp;#8217;s&lt;/a&gt;, in that it has the concept of layers over atoms or ranges/elements. (Interestingly it looks like they get around the problem of identifying which ranges belong to which layers purely by using their name.) Another difference here might be that while I&amp;#8217;m talking about supporting overlap in fairly heavily structured documents (like office documents), they&amp;#8217;re really using fairly flat annotations, where there isn&amp;#8217;t much of a grammar anyway. But I might have that wrong: need to do more reading. The other thing to investigate is whether they have any support for self-overlap (&lt;code&gt;&amp;lt;phrase&amp;gt;&lt;/code&gt; elements overlapping other &lt;code&gt;&amp;lt;phrase&amp;gt;&lt;/code&gt; elements): I kinda gather that they don&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;Anyway, Henry also made the points that (a) he doesn&amp;#8217;t want a new syntax for overlap and (b) stand-off markup works very well thank you. To address the latter point first, I think stand-off markup works very well if you have the tools to support it. It&amp;#8217;s fine if you have an integrated toolkit which can pull together and display the stand-off markup as embedded markup, and let you create ranges by highlighting text with a mouse. But the great power of HTML and other web technologies is that you don&amp;#8217;t need to use a specialised toolkit to write it: you can just use a text editor and it&amp;#8217;s all right there in front of you with no (or minimal) cross-referencing required. Frankly, I&amp;#8217;m not interested in &amp;#8220;core&amp;#8221; technologies that require me to install a particular piece of software in order to make use of them (cf &lt;a href=&quot;http://research.microsoft.com/~emeijer/&quot; title=&quot;Erik Meijer&#039;s Home Page&quot;&gt;Erik Meijer&lt;/a&gt;&amp;#8217;s talk on &lt;a href=&quot;http://msdn.microsoft.com/data/ref/linq/&quot; title=&quot;LINQ&quot;&gt;LINQ&lt;/a&gt;, which I&amp;#8217;ll have to discuss another time). I expect to be able to write a document containing overlap as easily as I can write a normal XML document.&lt;/p&gt;

&lt;p&gt;On Henry&amp;#8217;s point about yet another syntax for overlap, I am more and more coming to the conclusion that overlap will hit the mainstream if we have a simple way of encoding overlap in normal XML documents, namely something along the lines of &lt;a href=&quot;http://www.lmnlwiki.org/index.php/Talk:ECLIX#LIX&quot; title=&quot;LMNL-in-XML&quot;&gt;LIX&lt;/a&gt;. Interestingly, &lt;a href=&quot;http://www.translate.com/&quot; title=&quot;Yves Savourel&#039;s Website&quot;&gt;Yves Savourel&lt;/a&gt;&amp;#8217;s talk on Applying the &lt;a href=&quot;http://www.w3.org/TR/2007/REC-its-20070403/&quot; title=&quot;Internationalization Tag Set (ITS) Version 1.0&quot;&gt;Internationalization Tag Set&lt;/a&gt; was quite inspirational in this regard, since the working group seem to have put together a standard that provides both a set of standard elements and attributes to guide localisation and a method of mapping elements and attributes in existing markup languages onto those ITS elements and attributes. I wonder whether a similar approach could be used with LIX&amp;#8230; but I&amp;#8217;ll have to leave those thoughts for another time.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/17#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Wed, 16 May 2007 21:48:47 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">17 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>A Creole by any other name...</title>
 <link>http://www.jenitennison.com/blog/node/6</link>
 <description>&lt;p&gt;Argh. I&amp;#8217;ve been contacted by the guys at &lt;a href=&quot;http://www.wikicreole.org&quot; title=&quot;Creole Wiki Markup language&quot;&gt;WikiCreole&lt;/a&gt; who want me to change the name of &lt;a href=&quot;http://www.lmnlwiki.org&quot; title=&quot;Creole schema language&quot;&gt;Creole&lt;/a&gt;. What should I do? Not only is &amp;#8220;Creole&amp;#8221; a great name for a schema language that deals with concurrent markup, but it&amp;#8217;s a great acronym too (Composable regular expressions for overlapping languages etc.)&lt;/p&gt;

&lt;p&gt;I did Google when I first came up with the name in August 2006, but didn&amp;#8217;t discover WikiCreole (unsurprisingly, since it was only coined in July 2006 itself). But now far more people know, care about and use WikiCreole than Creole grammars. So any suggestions for alternative names?&lt;/p&gt;

&lt;!--break--&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/6#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/7">creole</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/8">schema</category>
 <pubDate>Wed, 25 Apr 2007 21:09:28 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">6 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>
