<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>pipelines</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/6</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>RELAX NG for matching</title>
 <link>http://www.jenitennison.com/blog/node/79</link>
 <description>&lt;p&gt;I&amp;#8217;m still thinking about doing &lt;a href=&quot;http://www.jenitennison.com/blog/node/76&quot; title=&quot;Jeni&#039;s Musings: Automatic markup and XML pipelines&quot;&gt;automatic markup with XML pipelines&lt;/a&gt;, and the kind of components that you might need in such a pipeline. These are the useful ones (list inspired by the components offered by &lt;a href=&quot;http://www.gate.ac.uk/&quot; title=&quot;General Architecture for Text Engineering&quot;&gt;GATE&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;tokeniser&lt;/strong&gt; that uses regular expressions to add markup to plain text&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;gazetteer&lt;/strong&gt; that uses a lookup to add markup to plain text&lt;/li&gt;
&lt;li&gt;an &lt;strong&gt;annotator&lt;/strong&gt; that adds attributes to existing elements based on their context/content&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;grouper&lt;/strong&gt; that adds markup around sequences of existing markup&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;stripper&lt;/strong&gt; that removes markup&lt;/li&gt;
&lt;li&gt;a general purpose &lt;strong&gt;transformer&lt;/strong&gt; that uses XSLT to do just about everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;!--break--&gt;

&lt;p&gt;The &amp;#8220;grouper&amp;#8221; is the most interesting and difficult of these. It needs to act like a tokeniser, except that it uses regular expressions over markup rather than over text. For example, say I had:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;number&amp;gt;06&amp;lt;/number&amp;gt;&amp;lt;punc&amp;gt;/&amp;lt;/punc&amp;gt;&amp;lt;number&amp;gt;03&amp;lt;/number&amp;gt;&amp;lt;punc&amp;gt;/&amp;lt;/punc&amp;gt;&amp;lt;number&amp;gt;08&amp;lt;/number&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I want to be able to create a rule that says &amp;#8220;any sequence consisting of a number element that contains a two-digit number between 1 and 31, followed by a punc element that contains a slash, followed by a number element that contains a two-digit number between 1 and 12, followed by a punc element that contains a slash, followed by a number element that contains any two-digit number should be wrapped in a date element&amp;#8221;.&lt;/p&gt;
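
&lt;p&gt;As a minimal sketch of what such a rule might look like in code (Python, purely for illustration), the grouper below assumes each element has been reduced to a &lt;code&gt;(name, text)&lt;/code&gt; tuple and represents a rule as a list of predicates, one per element. &lt;code&gt;two_digit&lt;/code&gt;, &lt;code&gt;punc&lt;/code&gt;, &lt;code&gt;DATE_RULE&lt;/code&gt; and &lt;code&gt;group&lt;/code&gt; are hypothetical names, and a wrapped run is returned as a &lt;code&gt;("date", [...])&lt;/code&gt; tuple standing in for a new element. Like any fixed list of predicates, it only copes with groups of a known length.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from typing import Callable, List, Tuple

# Hypothetical model: each element is a (name, text) tuple, and a rule is
# a list of predicates, one per element in the group to be wrapped.
Token = Tuple[str, str]
Pred = Callable[[Token], bool]


def two_digit(lo: int, hi: int) -> Pred:
    """Match a number element holding exactly two digits, valued lo..hi."""
    def test(tok: Token) -> bool:
        name, text = tok
        return (name == "number" and len(text) == 2 and text.isdigit()
                and hi >= int(text) >= lo)
    return test


def punc(value: str) -> Pred:
    """Match a punc element whose content is exactly value."""
    return lambda tok: tok == ("punc", value)


DATE_RULE: List[Pred] = [two_digit(1, 31), punc("/"), two_digit(1, 12),
                         punc("/"), two_digit(0, 99)]


def group(tokens: List[Token], rule: List[Pred], wrapper: str) -> list:
    """Wrap each run of len(rule) tokens that satisfies rule, left to right."""
    out: list = []
    i, n = 0, len(rule)
    while len(tokens) > i:
        window = tokens[i:i + n]
        if len(window) == n and all(p(t) for p, t in zip(rule, window)):
            out.append((wrapper, window))   # stands in for a new element
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run over the five elements above, &lt;code&gt;group(tokens, DATE_RULE, "date")&lt;/code&gt; returns a single date group containing all of them.&lt;/p&gt;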

&lt;p&gt;Now this is something that XPath is really bad at. Try writing an expression that selects, from a sequence of elements that may contain other &lt;code&gt;&amp;lt;number&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;punc&amp;gt;&lt;/code&gt; elements as well as other elements, only those sequences of elements that match the pattern I just described. It&amp;#8217;s something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;number[. &amp;gt;= 1 and . &amp;lt;= 31 and string-length(.) = 2]
      [following-sibling::*[1]/self::punc = &#039;/&#039;]
      [following-sibling::*[2]/self::number[. &amp;gt;= 1 and . &amp;lt;= 12 and string-length(.) = 2]]
      [following-sibling::*[3]/self::punc = &#039;/&#039;]
      [following-sibling::*[4]/self::number[string-length(.) = 2]]
  /(self::number, following-sibling::*[position() &amp;lt;= 4])
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which is fiddly and messy and only works in this particular example because I know precisely how many elements there are supposed to be in the group.&lt;/p&gt;

&lt;p&gt;In fact, this kind of grouping is difficult even in XSLT 2.0 with &lt;code&gt;&amp;lt;xsl:for-each-group&amp;gt;&lt;/code&gt;, because its grouping is designed around items returning the same grouping value, or around groups that start or end with a particular kind of element, rather than around sequences that have a particular internal structure.&lt;/p&gt;

&lt;p&gt;The language that &lt;em&gt;is&lt;/em&gt; designed to describe sequences of elements is RELAX NG. Obviously RELAX NG is really useful as a schema language, but it&amp;#8217;s really all to do with defining regular expressions over XML structures. We can use RELAX NG to describe the pattern of elements we want to match:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;group datatypeLibrary=&quot;http://www.w3.org/2001/XMLSchema-datatypes&quot;&amp;gt;
  &amp;lt;element name=&quot;number&quot;&amp;gt;
    &amp;lt;data type=&quot;integer&quot;&amp;gt;
      &amp;lt;param name=&quot;minInclusive&quot;&amp;gt;1&amp;lt;/param&amp;gt;
      &amp;lt;param name=&quot;maxInclusive&quot;&amp;gt;31&amp;lt;/param&amp;gt;
      &amp;lt;param name=&quot;pattern&quot;&amp;gt;[0-9]{2}&amp;lt;/param&amp;gt;
    &amp;lt;/data&amp;gt;
  &amp;lt;/element&amp;gt;
  &amp;lt;element name=&quot;punc&quot;&amp;gt;
    &amp;lt;value&amp;gt;/&amp;lt;/value&amp;gt;
  &amp;lt;/element&amp;gt;
  &amp;lt;element name=&quot;number&quot;&amp;gt;
    &amp;lt;data type=&quot;integer&quot;&amp;gt;
      &amp;lt;param name=&quot;minInclusive&quot;&amp;gt;1&amp;lt;/param&amp;gt;
      &amp;lt;param name=&quot;maxInclusive&quot;&amp;gt;12&amp;lt;/param&amp;gt;
      &amp;lt;param name=&quot;pattern&quot;&amp;gt;[0-9]{2}&amp;lt;/param&amp;gt;
    &amp;lt;/data&amp;gt;
  &amp;lt;/element&amp;gt;
  &amp;lt;element name=&quot;punc&quot;&amp;gt;
    &amp;lt;value&amp;gt;/&amp;lt;/value&amp;gt;
  &amp;lt;/element&amp;gt;
  &amp;lt;element name=&quot;number&quot;&amp;gt;
    &amp;lt;data type=&quot;integer&quot;&amp;gt;
      &amp;lt;param name=&quot;pattern&quot;&amp;gt;[0-9]{2}&amp;lt;/param&amp;gt;
    &amp;lt;/data&amp;gt;
  &amp;lt;/element&amp;gt;
&amp;lt;/group&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;or, in compact syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;element number { 
  xs:integer { minInclusive = &quot;1&quot; maxInclusive = &quot;31&quot; pattern = &quot;[0-9]{2}&quot; }
},
element punc { &quot;/&quot; },
element number { 
  xs:integer { minInclusive = &quot;1&quot; maxInclusive = &quot;12&quot; pattern = &quot;[0-9]{2}&quot; }
},
element punc { &quot;/&quot; },
element number { 
  xs:integer { pattern = &quot;[0-9]{2}&quot; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As a language, RELAX NG is really good at this. You could even imagine adding attributes to name subexpressions which you could then do things with (in the same way as you can get the substring matching a subexpression when you use a regular expression over text).&lt;/p&gt;

&lt;p&gt;So I think a &amp;#8220;grouper&amp;#8221; component should use RELAX NG to identify sequences to be marked up. But I have no idea if there are RELAX NG libraries out there that can be used in this way: to identify and extract matching sequences rather than to validate entire documents.&lt;/p&gt;
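
&lt;p&gt;For what it&amp;#8217;s worth, one plausible way to build such a matcher is the derivative-based approach used by some RELAX NG validators: take the Brzozowski derivative of the pattern with respect to each element in turn, and note each point at which what remains of the pattern can match an empty sequence. The Python sketch below is heavily simplified and is not the API of any RELAX NG library. Elements are reduced to &lt;code&gt;(name, text)&lt;/code&gt; tuples, the pattern constructors, &lt;code&gt;longest_match&lt;/code&gt; and &lt;code&gt;group&lt;/code&gt; are hypothetical, and attributes, nesting, &lt;code&gt;interleave&lt;/code&gt; and named subexpressions are left out. Unlike a fixed-position XPath, it copes with groups of varying length, such as a number followed by one or more capitalised words.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical sketch: element sequences are lists of (name, text) tuples.
# Patterns are tuples named after RELAX NG's group/choice/oneOrMore.
EMPTY = ("empty",)             # matches the empty sequence
NOT_ALLOWED = ("notAllowed",)  # matches nothing at all


def elem(pred):
    return ("element", pred)


def seq(a, b):
    # simplify as we go so that failed branches collapse to NOT_ALLOWED
    if a is NOT_ALLOWED or b is NOT_ALLOWED:
        return NOT_ALLOWED
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return ("group", a, b)


def choice(a, b):
    if a is NOT_ALLOWED:
        return b
    if b is NOT_ALLOWED:
        return a
    return ("choice", a, b)


def one_or_more(a):
    return ("oneOrMore", a)


def nullable(p):
    """True if p matches the empty sequence."""
    kind = p[0]
    if kind == "empty":
        return True
    if kind in ("notAllowed", "element"):
        return False
    if kind == "group":
        return nullable(p[1]) and nullable(p[2])
    if kind == "choice":
        return nullable(p[1]) or nullable(p[2])
    return nullable(p[1])  # oneOrMore


def deriv(p, tok):
    """Brzozowski derivative: what p must still match after seeing tok."""
    kind = p[0]
    if kind in ("empty", "notAllowed"):
        return NOT_ALLOWED
    if kind == "element":
        return EMPTY if p[1](tok) else NOT_ALLOWED
    if kind == "choice":
        return choice(deriv(p[1], tok), deriv(p[2], tok))
    if kind == "group":
        d = seq(deriv(p[1], tok), p[2])
        return choice(d, deriv(p[2], tok)) if nullable(p[1]) else d
    # oneOrMore(a): the derivative of a, then zero or more further a
    return seq(deriv(p[1], tok), choice(p, EMPTY))


def longest_match(p, tokens, start):
    """Length of the longest match of p at tokens[start], or 0 if none."""
    best, i = 0, start
    while len(tokens) > i:
        p = deriv(p, tokens[i])
        if p is NOT_ALLOWED:
            break
        i += 1
        if nullable(p):
            best = i - start
    return best


def group(tokens, pattern, wrapper):
    """Wrap the longest non-empty match at each position."""
    out, i = [], 0
    while len(tokens) > i:
        n = longest_match(pattern, tokens, i)
        if n:
            out.append((wrapper, tokens[i:i + n]))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def is_number(tok):
    return tok[0] == "number" and tok[1].isdigit()


def is_capitalised_word(tok):
    return tok[0] == "word" and tok[1][:1].isupper()


# hypothetical pattern: a number followed by one or more capitalised words
street = seq(elem(is_number), one_or_more(elem(is_capitalised_word)))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Given tokens for &lt;code&gt;10 Downing Street, London&lt;/code&gt;, &lt;code&gt;group(tokens, street, "address")&lt;/code&gt; wraps the first three and leaves the comma and &lt;code&gt;London&lt;/code&gt; alone. Taking the longest match at each position mirrors how tokenisers usually behave; other policies would be just as easy to plug in.&lt;/p&gt;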
</description>
 <comments>http://www.jenitennison.com/blog/node/79#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/8">schema</category>
 <pubDate>Thu, 06 Mar 2008 14:59:03 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">79 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Automatic markup and XML pipelines</title>
 <link>http://www.jenitennison.com/blog/node/76</link>
 <description>&lt;p&gt;The project I&amp;#8217;m working on at the moment aims to use RDFa (in XHTML) to expose some of the semantics in some natural-language text. We&amp;#8217;re aiming moderately low &amp;#8212; marking up dates, addresses, people&amp;#8217;s names, and various other more domain-specific things &amp;#8212; at least at the moment.&lt;/p&gt;

&lt;p&gt;The problem we&amp;#8217;re getting into now is how to get that information marked up. Because the information comes from various pretty unregulated sources, there&amp;#8217;s no way we can force the authors to do the markup. And the scope for making it &amp;#8220;worth their while&amp;#8221; (in terms of making their authoring job easier or more effective or even offering financial rewards) is very low.&lt;/p&gt;

&lt;p&gt;So we&amp;#8217;re taking a look at the technologies we might use for automating the markup, specifically &lt;a href=&quot;http://www.gate.ac.uk/&quot; title=&quot;GATE: A General Architecture for Text Engineering&quot;&gt;GATE&lt;/a&gt; and &lt;a href=&quot;http://incubator.apache.org/uima/&quot; title=&quot;Apache UIMA: Unstructured Information Management Applications&quot;&gt;UIMA&lt;/a&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;These technologies basically use pipelines of components which each add some (out of line) annotations to the text. The annotations are done out of line because they might overlap, but you can (usually) serialize them into XML, which is what we want to do.&lt;/p&gt;
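
&lt;p&gt;As a rough illustration of that last step, here is a Python sketch (using the standard library&amp;#8217;s &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;) that serialises out-of-line annotations, assumed here to be &lt;code&gt;(start, end, name)&lt;/code&gt; character offsets into the text, as inline XML. The function names and the annotation format are assumptions, not GATE&amp;#8217;s or UIMA&amp;#8217;s own APIs. The sketch simply refuses annotations that overlap without nesting, which is precisely the case that keeps the annotations out of line in the first place.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET


def _append_text(elem, s):
    # ElementTree stores text before an element's first child in .text and
    # text after a child in that child's .tail
    if not s:
        return
    kids = list(elem)
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + s
    else:
        elem.text = (elem.text or "") + s


def to_inline_xml(text, annotations, root_name="doc"):
    """Serialise (start, end, name) character-offset annotations inline.

    Raises ValueError when two annotations overlap without nesting.
    """
    # outer annotations first: by start offset, then longest first
    spans = sorted(annotations, key=lambda a: (a[0], -a[1]))
    root = ET.Element(root_name)
    stack = [(root, len(text))]   # open elements and their end offsets
    cursor = 0                    # text[:cursor] has already been placed
    for start, end, name in spans:
        # close open elements that finish at or before this span starts
        while len(stack) > 1 and start >= stack[-1][1]:
            elem, stop = stack.pop()
            _append_text(elem, text[cursor:stop])
            cursor = stop
        if end > stack[-1][1]:
            raise ValueError(f"{name} at {start}-{end} crosses an enclosing annotation")
        _append_text(stack[-1][0], text[cursor:start])
        cursor = start
        stack.append((ET.SubElement(stack[-1][0], name), end))
    # close everything still open, innermost first
    while stack:
        elem, stop = stack.pop()
        _append_text(elem, text[cursor:stop])
        cursor = stop
    return ET.tostring(root, encoding="unicode")
```
&lt;/code&gt;&lt;/pre&gt;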

&lt;p&gt;I find these technologies frustrating for a number of reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;any configuration we do will be specific to that particular application; it&amp;#8217;ll be hard for us to change to another implementation later on, and reuse by others will be limited to those who use the same implementation&lt;/li&gt;
&lt;li&gt;they involve a fair bit of proper coding (by which I mean Java or C++)&lt;/li&gt;
&lt;li&gt;where components can be configured through declarative means (such as keyword lists), there&amp;#8217;s no way to reuse (XML/RDF) resources that we already have; we&amp;#8217;ll have to manage transformations from them into the accepted formats through some external means, and I just &lt;em&gt;know&lt;/em&gt; they&amp;#8217;ll get out of sync&lt;/li&gt;
&lt;li&gt;their user documentation is dreadful; it seems like you need to have a good understanding of natural language processing to have a hope of even getting started&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It strikes me that the really powerful part of each of these technologies is the pipelining. The pipelining allows you to string together relatively simple operations (tokenising text, splitting text into sentences, marking up keywords, resolving ambiguities based on context etc.) which together give you something reasonably sophisticated.&lt;/p&gt;
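
&lt;p&gt;The appeal is that each step can stay tiny. Purely to illustrate the shape of such a pipeline (this is not how GATE or UIMA are configured), here is a Python sketch in which a regular-expression tokeniser and a lookup-based gazetteer are chained together. Every name in it, including the &lt;code&gt;(kind, text)&lt;/code&gt; token format, is made up for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import re
from functools import reduce

# Hypothetical token model: (kind, text) tuples.
_TOKEN = re.compile(r"(\d+)|(\w+)|([^\w\s])")
_KINDS = ("number", "word", "punc")


def tokenise(text):
    """Regular-expression tokeniser: digit runs, words, single punctuation."""
    for m in _TOKEN.finditer(text):
        # lastindex tells us which of the three alternatives matched
        yield (_KINDS[m.lastindex - 1], m.group())


def gazetteer(lookup):
    """Build a step that relabels word tokens found in the lookup table."""
    def step(tokens):
        for kind, text in tokens:
            yield (lookup.get(text, kind) if kind == "word" else kind, text)
    return step


def pipeline(*steps):
    """Chain steps so that each consumes the previous step's output."""
    return lambda source: reduce(lambda data, step: step(data), steps, source)


run = pipeline(tokenise, gazetteer({"Jeni": "person", "London": "place"}))
```
&lt;/code&gt;&lt;/pre&gt;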

&lt;p&gt;Using &lt;a href=&quot;http://www.w3.org/TR/xproc/&quot; title=&quot;W3C Working Draft: XProc: An XML Pipeline Language&quot;&gt;XProc&lt;/a&gt; to coordinate the pipeline would alleviate many of my frustrations. XProc can and will be implemented on many platforms, in many languages, so it&amp;#8217;ll be possible to move the pipeline from place to place (assuming that the components of the pipelines are similarly generic). It&amp;#8217;s declarative, so no &amp;#8220;proper coding&amp;#8221;. We&amp;#8217;ll be able to incorporate any transformations from existing XML/RDF data to the required configuration formats right into the pipeline. And&amp;#8230; OK, it won&amp;#8217;t automatically give us great user documentation or GUIs, but they&amp;#8217;ll come.&lt;/p&gt;

&lt;p&gt;The big problem is that XProc is still a Working Draft and the XProc ecosystem isn&amp;#8217;t well-developed. If we were one or two years down the line, XProc would be a Recommendation, there&amp;#8217;d be a .NET implementation readily available, and even perhaps extension XProc step types for tokenising, grouping and the other things we&amp;#8217;d need to do; anything that was missing we could pull together using XSLT.&lt;/p&gt;

&lt;p&gt;As it is, we&amp;#8217;re in that annoying in-between-time when the Right technology isn&amp;#8217;t ready and it looks like we&amp;#8217;re going to have to put effort into working with what feels like the Wrong technology just to get things done. But perhaps I&amp;#8217;m overlooking something in GATE or UIMA, or have missed another technology that would help us. Anyone out there got some experience that could help guide us?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/76#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/16">markup</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <pubDate>Mon, 25 Feb 2008 22:02:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">76 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Detecting streamability in XPath expressions and patterns</title>
 <link>http://www.jenitennison.com/blog/node/61</link>
 <description>&lt;p&gt;The XSL Working Group &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-xml-processing-model-comments/2007Oct/0118.html&quot; title=&quot;XSL WG Comments on XProc Last Call&quot;&gt;gave some comments&lt;/a&gt; recently on the &lt;a href=&quot;http://www.w3.org/TR/2007/WD-xproc-20070920/&quot; title=&quot;W3C: XProc Last Call Working Draft&quot;&gt;Last Call Working Draft of XProc&lt;/a&gt;. One of the comments was about a bunch of standard steps that we&amp;#8217;ve specified which do things you can do in XSLT, such as renaming certain nodes. These steps generally use XPath expressions or XSLT patterns to identify which nodes should be processed.&lt;/p&gt;

&lt;p&gt;What bothers the XSL WG is that these steps aren&amp;#8217;t guaranteed to be streamable. In a streamable process, an input document can be delivered to the processor as a stream of events (and an output similarly generated as a stream of events) rather than as an in-memory representation. Such processes will start producing results more quickly and require less memory than non-streamable ones. And, because they don&amp;#8217;t need as much memory, they are able to work on larger documents.&lt;/p&gt;

&lt;p&gt;If the processes we defined in XProc &lt;em&gt;were&lt;/em&gt; streamable, they&amp;#8217;d have a clear advantage over their XSLT equivalents, and therefore a purpose. However, since they&amp;#8217;re &lt;em&gt;not&lt;/em&gt; guaranteed streamable, it looks like we&amp;#8217;re simply creating yet another transformation language.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;My &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-xml-processing-model-comments/2007Oct/0123.html&quot; title=&quot;Jeni&#039;s response to XSL WG comments on XProc&#039;s streamability&quot;&gt;response&lt;/a&gt; was basically that we left it down to implementers to detect when a particular expression/pattern was streamable because defining a streamable subset of XPath would (a) take too long, (b) require people to learn a particular XPath subset, raising the adoption barrier, (c) require implementers to implement their own XPath engines, raising the implementation barrier.&lt;/p&gt;

&lt;p&gt;But if you put those pragmatic reasons to one side, I think there are good abstract reasons not to specify a streamable XPath subset. First, there is no clear line that can be drawn between a streamable XPath and an unstreamable one, only a scale between &amp;#8220;buffering nothing&amp;#8221; and &amp;#8220;buffering everything&amp;#8221; (building an object model). Second, you can&amp;#8217;t judge the streamability of an XPath expression on its own: there are multiple other factors that affect how streamable a given XPath expression is for a particular processor.&lt;/p&gt;

&lt;p&gt;To illustrate, say that we&amp;#8217;re renaming all elements that we select, and let&amp;#8217;s start with an expression that&amp;#8217;s obviously streamable:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;//section
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No problems here: as soon as we hit a start-tag (or end-tag) for a &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; element, we can change its name.&lt;/p&gt;

&lt;p&gt;Now add a predicate:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;//section[@type = &#039;summary&#039;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This predicate tests the value of the &lt;code&gt;type&lt;/code&gt; attribute on the &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; element. If we&amp;#8217;re using SAX or StAX events, then this is as straightforwardly streamable as the previous example, because attribute values are reported at the same time as start-tags. But that&amp;#8217;s purely down to the API: the underlying algorithm for streaming RELAX NG validation uses a different event model, for example, in which attributes are reported after the start tag begins (and before the start tag ends). So &lt;strong&gt;streamability depends on the event model&lt;/strong&gt;.&lt;/p&gt;
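
&lt;p&gt;To make that concrete, here is a minimal Python sketch of a renaming step over a simplified, SAX-like stream in which each event is a tuple and attributes arrive with the start event. The event shapes and &lt;code&gt;rename_stream&lt;/code&gt; are assumptions for illustration. Both &lt;code&gt;//section&lt;/code&gt; and &lt;code&gt;//section[@type = 'summary']&lt;/code&gt; can be handled without holding back any events. The only state is a stack of flags, one per open element, so that each end-tag gets the same name as its start-tag.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Simplified, SAX-like events (an assumption for this sketch):
#   ("start", name, attrs)  ("text", data)  ("end", name)
def rename_stream(events, old, new, test=lambda attrs: True):
    """Rename every element called old whose attributes pass test.

    The default test models //section; a test that inspects attrs models
    //section[@type = 'summary']. No event is ever held back.
    """
    renamed = []   # one flag per open element
    for ev in events:
        if ev[0] == "start":
            hit = ev[1] == old and test(ev[2])
            renamed.append(hit)
            yield ("start", new if hit else ev[1], ev[2])
        elif ev[0] == "end":
            yield ("end", new if renamed.pop() else ev[1])
        else:
            yield ev
```
&lt;/code&gt;&lt;/pre&gt;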

&lt;p&gt;Now a different predicate:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;//section[title = &#039;Summary&#039;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This predicate tests the value of the &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; child of the &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; element. In fact, it tests whether the &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; element has &lt;em&gt;any&lt;/em&gt; &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; child with the value &lt;code&gt;&#039;Summary&#039;&lt;/code&gt;. Normally, an XPath processor won&amp;#8217;t be able to tell that a &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; &lt;em&gt;doesn&amp;#8217;t&lt;/em&gt; satisfy the predicate until it gets to the end-tag of the element. So it will have to buffer the events from each &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; start-tag to its end-tag before it can work out whether to do the renaming or not.&lt;/p&gt;
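
&lt;p&gt;Here is a minimal Python sketch of that buffering. It assumes simplified, SAX-like tuple events (&lt;code&gt;("start", name, attrs)&lt;/code&gt;, &lt;code&gt;("text", data)&lt;/code&gt;, &lt;code&gt;("end", name)&lt;/code&gt;), and &lt;code&gt;rename_if_titled&lt;/code&gt; is a hypothetical name. Every event from a &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; start-tag onwards is held back until its end-tag, and nested sections end up inside their ancestors&amp;#8217; buffers. For simplicity it holds back matching sections too, even though a match could in principle be acted on as soon as the matching title closes.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Simplified, SAX-like events (an assumption for this sketch):
#   ("start", name, attrs)  ("text", data)  ("end", name)
def rename_if_titled(events, new="summary", wanted="Summary"):
    """Rename sections matching //section[title = 'Summary'] (by default).

    All events of a section are held back until its end tag, because only
    then is a non-match certain.
    """
    frames = []   # open sections, innermost last
    depth = 0     # current element depth
    for ev in events:
        kind = ev[0]
        if kind == "start":
            depth += 1
            if ev[1] == "section":
                frames.append({"depth": depth, "buf": [], "hit": False,
                               "title": None})
            elif (ev[1] == "title" and frames
                  and depth == frames[-1]["depth"] + 1):
                frames[-1]["title"] = []        # a child title has opened
        elif kind == "text" and frames and frames[-1]["title"] is not None:
            frames[-1]["title"].append(ev[1])
        elif kind == "end" and frames and ev[1] == "title":
            top = frames[-1]
            if top["title"] is not None and depth == top["depth"] + 1:
                top["hit"] = top["hit"] or "".join(top["title"]) == wanted
                top["title"] = None
        # hold the event back while any enclosing section is undecided
        if frames:
            frames[-1]["buf"].append(ev)
        else:
            yield ev
        if kind == "end":
            if frames and ev[1] == "section" and depth == frames[-1]["depth"]:
                done = frames.pop()
                buf = done["buf"]
                if done["hit"]:
                    buf[0] = ("start", new, buf[0][2])
                    buf[-1] = ("end", new)
                if frames:
                    frames[-1]["buf"].extend(buf)   # outer still undecided
                else:
                    yield from buf
            depth -= 1
```
&lt;/code&gt;&lt;/pre&gt;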

&lt;p&gt;But say the &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; elements in this markup language can only contain a single &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt;, and that&amp;#8217;s the first child of the &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt;. For an XPath processor that&amp;#8217;s aware of the DTD or schema that the document adheres to, the situation is then very similar to the previous one, which tested the attribute. So &lt;strong&gt;streamability depends on how much the processor knows about the markup language&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Changing the XPath to&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;//section[title[1] = &#039;Summary&#039;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;similarly limits how much the processor will have to buffer if the &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; always appears, and always appears first, even without the processor being told that rule through a schema. So &lt;strong&gt;streamability depends on the markup language itself&lt;/strong&gt;.&lt;/p&gt;
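
&lt;p&gt;To put a number on the difference, here is a small Python sketch that counts how many events of a single, non-nested &lt;code&gt;&amp;lt;section&amp;gt;&lt;/code&gt; a processor must see before each predicate is decided. It assumes events are tuples such as &lt;code&gt;("start", name, attrs)&lt;/code&gt;, &lt;code&gt;("text", data)&lt;/code&gt; and &lt;code&gt;("end", name)&lt;/code&gt;, and &lt;code&gt;decision_point&lt;/code&gt; is a hypothetical helper. For a section whose first title is &amp;#8220;Intro&amp;#8221;, &lt;code&gt;title = 'Summary'&lt;/code&gt; is only settled at the end-tag, whereas &lt;code&gt;title[1] = 'Summary'&lt;/code&gt; is settled as soon as that first title closes.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Events for one non-nested section, as simplified tuples (an assumption):
#   ("start", name, attrs)  ("text", data)  ("end", name)
def decision_point(section, wanted="Summary", first_only=False):
    """Return (events seen before the predicate is decided, result).

    first_only=False models section[title = wanted];
    first_only=True models section[title[1] = wanted].
    """
    depth = 0
    text = None   # text of the child title currently open, if any
    for seen, ev in enumerate(section, start=1):
        if ev[0] == "start":
            depth += 1
            if ev[1] == "title" and depth == 2:
                text = []
        elif ev[0] == "text" and text is not None:
            text.append(ev[1])
        elif ev[0] == "end":
            if ev[1] == "title" and depth == 2 and text is not None:
                matched = "".join(text) == wanted
                if matched or first_only:
                    return seen, matched   # no need to read further
                text = None
            depth -= 1
    return len(section), False   # only the end tag settled it
```
&lt;/code&gt;&lt;/pre&gt;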

&lt;p&gt;Anyway, I had a quick look at some academic work on streamability, such as &lt;a href=&quot;http://www.cs.umd.edu/projects/xsq/&quot; title=&quot;XSQ: A Streaming XPath Engine&quot;&gt;XSQ&lt;/a&gt;, &lt;a href=&quot;http://www-cs-students.stanford.edu/~amrutaj/work/papers/xpath.pdf&quot; title=&quot;Project Report on Streaming XPath Engine&quot;&gt;TurboXPath&lt;/a&gt; or the recent paper &lt;a href=&quot;http://doi.acm.org/10.1145/1247480.1247512&quot; title=&quot;Efficient Algorithms for Evaluating XPath over Streams&quot;&gt;&amp;#8220;Efficient Algorithms for Evaluating XPath over Streams&amp;#8221;&lt;/a&gt;. These papers really surprised me. The things that prove difficult include backwards axes (which is surprising since information about the previous nodes should be easily available), the descendant axis, and the position function. On the other hand, predicates are absolutely fine (despite requiring a &amp;#8220;look ahead&amp;#8221;). [Weirdly enough, all the papers I looked at contained XPath errors; I guess when you&amp;#8217;re considering abstract algorithms you don&amp;#8217;t have to care about insignificant things like language syntax.]&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://doi.acm.org/10.1145/1247480.1247512&quot; title=&quot;Efficient Algorithms for Evaluating XPath over Streams&quot;&gt;paper&lt;/a&gt; I mentioned above actually defines something called Univariate XPath which conforms to the syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Path      := Step | Path Step
Step      := Axis NodeTest
             | Axis NodeTest &#039;[&#039; Predicate &#039;]&#039;
Axis      := &#039;/&#039; | &#039;//&#039;
NodeTest  := Name | &#039;*&#039;
Predicate := Path
             | Path CompOp Path
             | Predicate &#039;and&#039; Predicate
             | Predicate &#039;or&#039; Predicate
             | &#039;not&#039; Predicate                            [sic]
CompOp    := &#039;=&#039; | &#039;!=&#039; | &#039;&amp;gt;&#039; | &#039;&amp;gt;=&#039; | &#039;&amp;lt;&#039; | &#039;&amp;lt;=&#039;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This might be a useful starting point, but it omits useful things like attributes and functions which (as far as I can tell) wouldn&amp;#8217;t affect the applicability of the algorithms. It&amp;#8217;s also worth noting that it will allow paths such as &lt;code&gt;/database[dummy]/record&lt;/code&gt;, which could involve buffering every &lt;code&gt;&amp;lt;record&amp;gt;&lt;/code&gt; until either a &lt;code&gt;&amp;lt;dummy&amp;gt;&lt;/code&gt; child turned up or the end tag of the &lt;code&gt;&amp;lt;database&amp;gt;&lt;/code&gt; document element was reached. This illustrates that just because an XPath is theoretically streamable (can be evaluated based on a stream of events) doesn&amp;#8217;t mean it can be evaluated efficiently.&lt;/p&gt;
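
&lt;p&gt;A small Python sketch makes the cost visible. It hand-codes an evaluator for just this one path over simplified tuple events (&lt;code&gt;("start", name, attrs)&lt;/code&gt;, &lt;code&gt;("text", data)&lt;/code&gt;, &lt;code&gt;("end", name)&lt;/code&gt;), an assumption made for illustration rather than the algorithm from the paper. The hypothetical &lt;code&gt;select_records&lt;/code&gt; also reports the largest number of records it ever had to hold back.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Simplified, SAX-like events (an assumption for this sketch):
#   ("start", name, attrs)  ("text", data)  ("end", name)
def select_records(events):
    """Evaluate /database[dummy]/record over a stream of events.

    Returns (selected records, peak number of records held back). A real
    streaming processor would emit records as soon as they are known to be
    selected; records read before any dummy child have to wait.
    """
    depth = 0
    root_ok = False     # is the document element a database?
    has_dummy = False   # has the [dummy] predicate been satisfied yet?
    pending = []        # records held back while the predicate is unknown
    selected = []
    current = None      # events of the record being read, if any
    peak = 0
    for ev in events:
        if ev[0] == "start":
            depth += 1
            if depth == 1:
                root_ok = ev[1] == "database"
            elif depth == 2 and root_ok and ev[1] == "dummy":
                has_dummy = True
                selected.extend(pending)   # release everything held back
                pending = []
            elif depth == 2 and root_ok and ev[1] == "record":
                current = []
        if current is not None:
            current.append(ev)
        if ev[0] == "end":
            if depth == 2 and current is not None and ev[1] == "record":
                (selected if has_dummy else pending).append(current)
                peak = max(peak, len(pending))
                current = None
            depth -= 1
    return selected, peak   # without a dummy child, pending is discarded
```
&lt;/code&gt;&lt;/pre&gt;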

&lt;p&gt;Some final thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I wonder if there&amp;#8217;s scope for an XPath subset that can be mapped to RELAX NG syntax and therefore evaluated using Brzozowski derivatives&lt;/li&gt;
&lt;li&gt;what about an algorithm that evaluates XPaths using a pipeline process, whereby the stream of events is actually passed through several filters in order to provide the final evaluation?&lt;/li&gt;
&lt;li&gt;I&amp;#8217;m sure there&amp;#8217;s preprocessing that could be done on some XPath expressions that would increase their streamability&lt;/li&gt;
&lt;/ul&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/61#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <pubDate>Tue, 06 Nov 2007 19:45:45 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">61 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XProc Last Call</title>
 <link>http://www.jenitennison.com/blog/node/56</link>
 <description>&lt;p&gt;Can you believe it, we&amp;#8217;ve made it to Last Call on XProc (&lt;em&gt;the&lt;/em&gt; XML pipeline language)! That&amp;#8217;s only, like, nine months later than the &lt;a href=&quot;http://www.w3.org/XML/Processing/#schedule&quot; title=&quot;XML Processing Working Group Schedule&quot;&gt;published schedule&lt;/a&gt;, which I reckon is pretty good going. (Then again, I&amp;#8217;m judging it against XSLT 2.0&amp;#8230;)&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m really excited about XProc. I&amp;#8217;ve found that pipelining in XSLT &amp;#8212; splitting up processing tasks into smaller, more manageable processing tasks and stringing them together &amp;#8212; has greatly improved my productivity and the simplicity and maintainability of the code I write. But some processing (such as that used by my &lt;a href=&quot;http://www.jenitennison.com/xslt/utilities/unit-testing/index.html&quot; title=&quot;XSLT Unit Test Framework&quot;&gt;XSLT unit test framework&lt;/a&gt;) can&amp;#8217;t be done in a single transformation, some is on massive documents that you can&amp;#8217;t realistically process with XSLT (and I &lt;em&gt;really&lt;/em&gt; don&amp;#8217;t want to have to write SAX or StAX code to do it), and some I just want to do on all the files in a directory.&lt;/p&gt;

&lt;p&gt;XProc gives me a high-level, declarative, streamable processing language for XML documents. And I think we&amp;#8217;ve struck the right balance between something that&amp;#8217;s simple enough to be easy for everyday tasks, and powerful enough to be able to do the more complex things you might want to do with it.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;It&amp;#8217;s going to be really interesting to see how much of current XSLT use it replaces, and how much it opens up. For example, the ability to use a &lt;a href=&quot;http://www.w3.org/TR/xproc/#p.viewport&quot; title=&quot;XProc: Viewport&quot;&gt;viewport&lt;/a&gt; to isolate a subtree of a document for processing means that XSLT could be used on (the records in) huge database dumps. A bit like &lt;a href=&quot;http://www.saxonica.com/documentation/sourcedocs/serial.html&quot; title=&quot;Saxonica: Streaming Large Documents&quot;&gt;Saxon&amp;#8217;s support for streaming large documents&lt;/a&gt;, but standardised.&lt;/p&gt;
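
&lt;p&gt;Outside XProc, something similar can already be done by hand. As a rough analogy only (this is neither XProc&amp;#8217;s viewport nor Saxon&amp;#8217;s feature), the Python sketch below uses the standard library&amp;#8217;s &lt;code&gt;xml.etree.ElementTree.iterparse&lt;/code&gt; to hand over one complete &lt;code&gt;&amp;lt;record&amp;gt;&lt;/code&gt; subtree at a time and then discard it. &lt;code&gt;each_record&lt;/code&gt; is a hypothetical helper, and it assumes the records are direct children of the document element.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET


def each_record(source, tag="record"):
    """Yield each tag element in source one at a time, then forget it.

    Assumes records are direct children of the document element, so that
    clearing the document element drops every record already processed.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)          # start event of the document element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem               # the complete record subtree
            root.clear()             # then discard it and any earlier ones
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A caller would do its per-record work inside the loop, for example &lt;code&gt;[r.findtext("title") for r in each_record("dump.xml")]&lt;/code&gt;, where the file name is made up; only one record is fully built in memory at any time.&lt;/p&gt;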

&lt;p&gt;Anyway, &lt;strong&gt;Last Call&lt;/strong&gt;, guys! &lt;a href=&quot;http://www.w3.org/TR/2007/WD-xproc-20070920/&quot; title=&quot;XProc: An XML Pipeline Language: Last Call Working Draft 20 September 2007&quot;&gt;Read the specification.&lt;/a&gt; &lt;a href=&quot;mailto:public-xml-processing-model-comments@w3.org&quot; title=&quot;public-xml-processing-model-comments@w3.org&quot;&gt;Send us your comments.&lt;/a&gt; Write some pipelines. Try them out with &lt;a href=&quot;http://norman.walsh.name/2007/projects/xproc&quot; title=&quot;Norm Walsh&#039;s XML Pipeline Processor&quot;&gt;Norm&amp;#8217;s XML Pipeline Processor&lt;/a&gt;. Heck, write &lt;a href=&quot;http://norman.walsh.name/2007/09/05/xprocTests&quot; title=&quot;Norm Walsh: Bring out your tests&quot;&gt;test cases&lt;/a&gt;! Implement it!&lt;/p&gt;

&lt;p&gt;On the subject of comments, I recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resending any comments you already made that you don&amp;#8217;t feel we&amp;#8217;ve addressed; as I understand it, we&amp;#8217;re obligated to discuss every comment we receive during Last Call&lt;/li&gt;
&lt;li&gt;sending a separate mail for each technical comment (but a single mail with multiple editorial comments is OK)&lt;/li&gt;
&lt;li&gt;only sending requests that you think are must-haves for version 1.0&lt;/li&gt;
&lt;li&gt;supporting requests with examples&lt;/li&gt;
&lt;li&gt;searching for what you want to comment on in the &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/&quot; title=&quot;W3C XML Processing Model WG Discussion Archive&quot;&gt;archives of the XML Processing WG mailing list&lt;/a&gt; just to make sure we haven&amp;#8217;t already discussed it to death&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope that doesn&amp;#8217;t sound as if I&amp;#8217;m discouraging comments. If you&amp;#8217;ve got something to say, say it. But be aware that the more comments we receive, the longer it&amp;#8217;ll take us to get to Recommendation, so make &amp;#8216;em count. You&amp;#8217;ve got until the 24th of October.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/56#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <pubDate>Fri, 21 Sep 2007 22:35:20 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">56 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Pipelines (of lentils) in action</title>
 <link>http://www.jenitennison.com/blog/node/23</link>
 <description>&lt;p&gt;We went to the &lt;a href=&quot;http://www.sciencemuseum.org.uk/&quot; title=&quot;London Science Museum&quot;&gt;Science Museum&lt;/a&gt; on Monday. In &lt;a href=&quot;http://www.sciencemuseum.org.uk/visitmuseum/galleries/launchpad.aspx&quot; title=&quot;Launch Pad Gallery&quot;&gt;Launch Pad&lt;/a&gt;, there are lots of hands-on activities for children. One of them starts with a big container with lots of lentils in it. You have to fill a bucket with lentils, then hoist the bucket up and along so it meets with a device that flips it over so that the lentils spill down a funnel into a tube and along a chute into another large container. From there there are two &lt;a href=&quot;http://en.wikipedia.org/wiki/Archimedes_screw&quot; title=&quot;Archimedes Screw&quot;&gt;Archimedes screws&lt;/a&gt; linked together that, when you turn their handles, take the lentils into another funnel and down another tube into yet another large-ish container. From there, there are two conveyor belts with scoops attached that take the lentils up to another funnel, down another pipe and back into the first big container, where they can start the entire process again.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Around this closed system of lentil logistics were about fifteen children. Each of them was doing one of the jobs necessary to make the system work: filling buckets, turning handles, pulling ropes, pushing stubborn lentils down chutes and so on. There was no one ordering anyone about; each child was totally absorbed and content with their single job, and every job was filled (whenever anyone left to do something else, their place was immediately taken by another child).&lt;/p&gt;

&lt;p&gt;It made me wonder how many tasks I&amp;#8217;m engaged in that are ultimately pointless. And whether I really care that they&amp;#8217;re ultimately pointless, so long as I&amp;#8217;m fulfilled doing them. And what the human race could achieve if at least some of us were engaged in a system in which we&amp;#8217;d each do tasks we enjoyed while actually working towards a non-pointless goal. Like, you know, avoiding mass extinction or something.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/23#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/23">environment</category>
 <pubDate>Tue, 29 May 2007 19:27:02 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">23 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XTech 2007: Wednesday 16th May Afternoon</title>
 <link>http://www.jenitennison.com/blog/node/19</link>
 <description>&lt;p&gt;Yes, I&amp;#8217;m determined to write up every talk I attended at XTech 2007, so that &lt;em&gt;I&lt;/em&gt; have a record of it if nothing else. On Wednesday afternoon, I attended sessions on microformats, internationalisation and NVDL (as well as giving my own talk, of course).&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/paper/41&quot; title=&quot;Microformats: the nanotechnology of the semantic web&quot;&gt;Microformats: the nanotechnology of the semantic web&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://adactio.com/&quot; title=&quot;Jeremy Keith&#039;s Website&quot;&gt;Jeremy Keith&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This was a supremely well-put-together presentation on &lt;a href=&quot;http://microformats.org/&quot; title=&quot;Microformats Website&quot;&gt;microformats&lt;/a&gt;: beautiful slides, drama and humour, and a reference to &lt;a href=&quot;http://en.wikipedia.org/wiki/Neal_Stephenson&quot; title=&quot;Wikipedia: Neal Stephenson&quot;&gt;Neal Stephenson&amp;#8217;s&lt;/a&gt; &lt;a href=&quot;http://www.amazon.com/Diamond-Age-Illustrated-Primer-Spectra/dp/0553380966&quot; title=&quot;Amazon: Diamond Age&quot;&gt;Diamond Age&lt;/a&gt; (was I really one of only three people in the packed room to have read it?). There was a lot about what microformats are, how they&amp;#8217;re designed, what their niche is (Jeremy was very up-front about the fact they don&amp;#8217;t solve every problem), and how they&amp;#8217;re developed. But there weren&amp;#8217;t any demonstrations of microformat-based applications, which I would have really liked to see. The other thing I thought was worth noting was that Jeremy talked about the dangers of &amp;#8220;grey goo&amp;#8221; (he was using a nanotechnology metaphor): the proliferation of microformats. He expressed the strong desire that the set of microformats be kept small, and even said (I paraphrase) &amp;#8220;Do use semantic class names in your HTML, but don&amp;#8217;t call them microformats [unless they&amp;#8217;ve been through the microformats standardisation process]!&amp;#8221;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.holoweb.net/~liam/&quot; title=&quot;Liam Quin&#039;s Website&quot;&gt;Liam Quin&lt;/a&gt; gave a paper entitled &lt;a href=&quot;http://www.idealliance.org/papers/extreme/proceedings/html/2006/Quin01/EML2006Quin01.html&quot; title=&quot;Microformats: Contaminants or Ingredients&quot;&gt;Microformats: Contaminants or Ingredients&lt;/a&gt; at &lt;a href=&quot;http://www.extrememarkup.com/&quot; title=&quot;Extreme Markup Languages&quot;&gt;Extreme&lt;/a&gt; last year, asking what we, as traditional markup geeks, should do about them. Some people were very sceptical, saying something along the lines of &amp;#8220;They&amp;#8217;re headed for a trainwreck; and we should sit back, watch it happen, and pick up the pieces.&amp;#8221; Others wanted to celebrate: the fact that people now understand tagging is really good news for the semantic web, open data and all that jazz.&lt;/p&gt;

&lt;p&gt;The traditional markup community and the microformats community both have the same goals: they want to make information easier to search for, to query, to integrate and so on. The microformats approach is to minimise the cost to those supplying information, and to target just a few very common kinds of data such as contact information, events and social networks. Traditional markup, on the other hand, aims to cover every single kind of information you might want to make available, and has to worry about issues like validating, styling, and distinguishing between tag sets.&lt;/p&gt;

&lt;p&gt;It seems that a fundamental problem is that the benefits of including semantic markup aren&amp;#8217;t immediately obvious to the supplier. Whether you use semantic class names in HTML or use elements in known namespaces, it&amp;#8217;s purely a matter of faith that this will make your information easier to locate or use. You can&amp;#8217;t know that search engines will include that information in their weighting algorithms, or that people reading your page will have the screen-scraping software necessary to pull anything out. With so little (obvious) benefit, authors will only supply semantic data if the cost is low. Adding class names to existing HTML elements is easy whether a web page is written by hand or generated automatically. Adding namespaces and authoring special CSS might not be much more costly to do, but it&amp;#8217;s much more costly to grok.&lt;/p&gt;

&lt;p&gt;So if we want authors to start putting elements in their own namespaces in their web pages, we need an application that immediately cranks up the benefit of doing so. I have no idea what that is.&lt;/p&gt;

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/paper/50&quot; title=&quot;Applying the Internationalization Tag Set&quot;&gt;Applying the Internationalization Tag Set&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://www.translate.com/&quot; title=&quot;Yves Savourel&#039;s Website&quot;&gt;Yves Savourel&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This was a good introduction to ITS, a standard I had only known about vaguely. It&amp;#8217;s definitely worth knowing about the &lt;code&gt;its:*&lt;/code&gt; attributes for defining i18n features, such as indicating which content should be translated, marking terms, providing comments for localisation and so on, just in case you need to build them into new markup languages.&lt;/p&gt;

&lt;p&gt;I also much admire the way the ITS standard doesn&amp;#8217;t expect people to completely rework their markup languages to incorporate ITS data. Instead of using the ITS attributes directly in a document, you can use global rules that are embedded in the document itself, referenced from the document, or embedded in the schema for the document. I think this approach will prove useful in the development of &lt;a href=&quot;http://www.lmnlwiki.org/index.php/Talk:ECLIX#LIX&quot; title=&quot;LMNL in XML&quot;&gt;LIX&lt;/a&gt;, when we get around to formalising it.&lt;/p&gt;
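&lt;p&gt;The sketch below shows what the two styles might look like side by side, written against ITS 1.0 as I understand it. The host vocabulary (&lt;code&gt;doc&lt;/code&gt;, &lt;code&gt;term&lt;/code&gt; and so on) is made up for illustration. The &lt;code&gt;&amp;lt;its:rules&amp;gt;&lt;/code&gt; block applies to everything its XPath selectors match. The local &lt;code&gt;its:translate&lt;/code&gt; attribute affects only the element it sits on.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;doc xmlns:its=&quot;http://www.w3.org/2005/11/its&quot; its:version=&quot;1.0&quot;&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;its:rules version=&quot;1.0&quot;&amp;gt;
      &amp;lt;!-- global rule: never translate code samples --&amp;gt;
      &amp;lt;its:translateRule selector=&quot;//code&quot; translate=&quot;no&quot; /&amp;gt;
      &amp;lt;!-- global rule: every term element marks a term --&amp;gt;
      &amp;lt;its:termRule selector=&quot;//term&quot; term=&quot;yes&quot; /&amp;gt;
    &amp;lt;/its:rules&amp;gt;
  &amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;p&amp;gt;Declare a &amp;lt;term&amp;gt;key&amp;lt;/term&amp;gt; with &amp;lt;code&amp;gt;xsl:key&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;
    &amp;lt;!-- local attribute: just this paragraph stays untranslated --&amp;gt;
    &amp;lt;p its:translate=&quot;no&quot;&amp;gt;ACME Widget 3000&amp;lt;/p&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/doc&amp;gt;
&lt;/code&gt;&lt;/pre&gt;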

&lt;h2&gt;&lt;a href=&quot;http://2007.xtech.org/public/schedule/detail/48&quot; title=&quot;NVDL - a breath of fresh air for compound document validation&quot;&gt;NVDL - a breath of fresh air for compound document validation&lt;/a&gt;&lt;/h2&gt;

&lt;h3&gt;&lt;a href=&quot;http://xmlguru.cz/&quot; title=&quot;Jirka Kosek&#039;s Website&quot;&gt;Jirka Kosek&lt;/a&gt; &amp;amp; &lt;a href=&quot;http://nalevka.com/&quot; title=&quot;Petr Nálevka&#039;s Website&quot;&gt;Petr Nálevka&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;http://www.nvdl.org/&quot; title=&quot;Namespace-based Validation Dispatching Language&quot;&gt;NVDL&lt;/a&gt; is Part 4 of &lt;a href=&quot;http://www.dsdl.org/&quot; title=&quot;Document Schema Definition Languages&quot;&gt;DSDL&lt;/a&gt;, specifically targeted at organising the validation of documents that incorporate multiple namespaces, such as XHTML documents containing islands of SVG, RDF and MathML. NVDL&amp;#8217;s approach is to identify subtrees within the document that need to be validated against a particular schema. A subtree doesn&amp;#8217;t have to contain elements from just one namespace, but often it will.&lt;/p&gt;
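&lt;p&gt;Here is a minimal sketch of an NVDL script for XHTML with SVG and MathML islands. The schema file names are hypothetical. Each &lt;code&gt;&amp;lt;namespace&amp;gt;&lt;/code&gt; rule names the schema that validates sections in that namespace. The &lt;code&gt;&amp;lt;anyNamespace&amp;gt;&lt;/code&gt; rule rejects anything else.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;rules xmlns=&quot;http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0&quot;&amp;gt;
  &amp;lt;!-- the XHTML host document --&amp;gt;
  &amp;lt;namespace ns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;validate schema=&quot;xhtml.rng&quot; /&amp;gt;
  &amp;lt;/namespace&amp;gt;
  &amp;lt;!-- SVG islands are validated separately --&amp;gt;
  &amp;lt;namespace ns=&quot;http://www.w3.org/2000/svg&quot;&amp;gt;
    &amp;lt;validate schema=&quot;svg.rng&quot; /&amp;gt;
  &amp;lt;/namespace&amp;gt;
  &amp;lt;!-- as are MathML islands --&amp;gt;
  &amp;lt;namespace ns=&quot;http://www.w3.org/1998/Math/MathML&quot;&amp;gt;
    &amp;lt;validate schema=&quot;mathml.rng&quot; /&amp;gt;
  &amp;lt;/namespace&amp;gt;
  &amp;lt;!-- elements in any other namespace are an error --&amp;gt;
  &amp;lt;anyNamespace&amp;gt;
    &amp;lt;reject /&amp;gt;
  &amp;lt;/anyNamespace&amp;gt;
&amp;lt;/rules&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As I understand NVDL&amp;#8217;s default behaviour, the SVG and MathML sections are detached before the XHTML section is validated. That means &lt;code&gt;xhtml.rng&lt;/code&gt; never has to mention them.&lt;/p&gt;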

&lt;p&gt;The XML Schema wonks in the room (Henry Thompson and Michael Sperberg-McQueen) were a bit befuddled, I think, because with XML Schema you just supply a whole bunch of schema documents to the processor, for different namespaces, and as long as the schemas contain wildcards they&amp;#8217;ll do the right thing. The concept of supplying multiple schemas to a validator isn&amp;#8217;t part of RELAX NG&amp;#8217;s validation approach, so you need something like NVDL if you don&amp;#8217;t want to rework your schema for every combination of namespaces.&lt;/p&gt;

&lt;p&gt;Henry and Michael were particularly concerned that NVDL lets you override the original schema, allowing elements from foreign namespaces in places where the original schema doesn&amp;#8217;t allow them. But as Henry said, it just means that the primary schema you use to define what&amp;#8217;s allowed where is actually an NVDL schema: it&amp;#8217;s not auxiliary validation like Schematron is, but a language for the primary schema you use.&lt;/p&gt;

&lt;p&gt;Later, I wondered how much the &lt;a href=&quot;http://www.w3.org/TR/xproc&quot; title=&quot;XProc: An XML Pipeline Language&quot;&gt;XProc&lt;/a&gt; work would render NVDL irrelevant. After all, XProc can invoke validation of subtrees against multiple external schemas. On the other hand, NVDL&amp;#8217;s syntax is going to be easier to use if that&amp;#8217;s all you want to do. Perhaps someone will write a tool to convert NVDL schemas to XProc pipelines&amp;#8230;&lt;/p&gt;

&lt;p&gt;Actually, Jirka &amp;amp; Petr&amp;#8217;s experience with &lt;a href=&quot;http://sourceforge.net/projects/jnvdl/&quot; title=&quot;Java implementation of NVDL&quot;&gt;JNVDL&lt;/a&gt; is interesting from the XProc viewpoint, in particular the problems that they had with reporting meaningful line numbers when validating subtrees. XProc implementers might want to look at this with regard to error reporting within &lt;code&gt;&amp;lt;p:viewport&amp;gt;&lt;/code&gt;.&lt;/p&gt;
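&lt;p&gt;For comparison, here is a rough sketch of the XProc equivalent for the SVG islands alone. The step and port names are as I understand XProc. They may not match the current draft exactly, and &lt;code&gt;svg.rng&lt;/code&gt; is a hypothetical schema location. The &lt;code&gt;&amp;lt;p:viewport&amp;gt;&lt;/code&gt; passes each matching subtree through its subpipeline, and that is where the line-number problem would show up.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;p:pipeline xmlns:p=&quot;http://www.w3.org/ns/xproc&quot;
            xmlns:svg=&quot;http://www.w3.org/2000/svg&quot;
            version=&quot;1.0&quot;&amp;gt;
  &amp;lt;!-- run the subpipeline over each SVG island in turn --&amp;gt;
  &amp;lt;p:viewport match=&quot;svg:svg&quot;&amp;gt;
    &amp;lt;!-- an invalid island raises an error; a valid one
         passes through unchanged --&amp;gt;
    &amp;lt;p:validate-with-relax-ng&amp;gt;
      &amp;lt;p:input port=&quot;schema&quot;&amp;gt;
        &amp;lt;p:document href=&quot;svg.rng&quot; /&amp;gt;
      &amp;lt;/p:input&amp;gt;
    &amp;lt;/p:validate-with-relax-ng&amp;gt;
  &amp;lt;/p:viewport&amp;gt;
&amp;lt;/p:pipeline&amp;gt;
&lt;/code&gt;&lt;/pre&gt;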
</description>
 <comments>http://www.jenitennison.com/blog/node/19#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/16">markup</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/8">schema</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/4">xtech</category>
 <pubDate>Sun, 20 May 2007 22:52:14 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">19 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Pipelining in XSLT</title>
 <link>http://www.jenitennison.com/blog/node/3</link>
 <description>&lt;p&gt;I took on a long-term contract back in January which is good fun (of course I have to say that; my boss might read this) and pretty challenging.&lt;/p&gt;

&lt;p&gt;First, I&amp;#8217;m hobbled by having to use XSLT 1.0 (MSXML, what&amp;#8217;s more). Until I started this work, I hadn&amp;#8217;t really realised how fantastic XSLT 2.0 is, or how used to it I&amp;#8217;d become. How I miss user-defined functions, sequence constructors and &lt;code&gt;if&lt;/code&gt; expressions.&lt;/p&gt;

&lt;p&gt;Second, my task is to take some XHTML generated from WordprocessingML and (a) turn all the CSS styling relative, so that it uses ems and percentages all over the place rather than points and (b) rationalise the CSS so that common styling appears in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; of the XHTML rather than on individual elements.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Now I hesitate to say that something is impossible in out-of-the-box XSLT 1.0, but I have developed a sense of when something is impractical. Here, to rationalise styles, we&amp;#8217;re looking at grouping elements by a calculated value (their inherited style). That would be OK in XSLT 2.0, with &lt;code&gt;&amp;lt;xsl:for-each-group&amp;gt;&lt;/code&gt; and user-defined functions, but in XSLT 1.0 it means code that is hard to write and impossible to maintain.&lt;/p&gt;
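&lt;p&gt;The sketch below illustrates the kind of thing XSLT 2.0 makes easy. To keep it short, it groups elements by their own normalised &lt;code&gt;style&lt;/code&gt; attribute rather than by their full inherited style. The &lt;code&gt;my:normalizedStyle()&lt;/code&gt; function, its namespace URI and the &lt;code&gt;.s1&lt;/code&gt;, &lt;code&gt;.s2&lt;/code&gt;&amp;#8230; class names are all made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;
  xmlns:my=&quot;http://example.org/ns/my&quot;&amp;gt;

  &amp;lt;!-- sort the declarations so that &quot;a:1; b:2&quot; and &quot;b:2; a:1&quot; match --&amp;gt;
  &amp;lt;xsl:function name=&quot;my:normalizedStyle&quot; as=&quot;xs:string&quot;&amp;gt;
    &amp;lt;xsl:param name=&quot;element&quot; as=&quot;element()&quot; /&amp;gt;
    &amp;lt;xsl:variable name=&quot;declarations&quot; as=&quot;xs:string*&quot;&amp;gt;
      &amp;lt;xsl:perform-sort
          select=&quot;for $d in tokenize($element/@style, &#039;;&#039;)[normalize-space(.)]
                  return normalize-space($d)&quot;&amp;gt;
        &amp;lt;xsl:sort select=&quot;.&quot; /&amp;gt;
      &amp;lt;/xsl:perform-sort&amp;gt;
    &amp;lt;/xsl:variable&amp;gt;
    &amp;lt;xsl:sequence select=&quot;string-join($declarations, &#039;; &#039;)&quot; /&amp;gt;
  &amp;lt;/xsl:function&amp;gt;

  &amp;lt;!-- one class rule for each style that is used more than once --&amp;gt;
  &amp;lt;xsl:template name=&quot;classRules&quot;&amp;gt;
    &amp;lt;xsl:for-each-group select=&quot;//*[@style]&quot; group-by=&quot;my:normalizedStyle(.)&quot;&amp;gt;
      &amp;lt;xsl:if test=&quot;count(current-group()) gt 1&quot;&amp;gt;
        &amp;lt;xsl:value-of
            select=&quot;concat(&#039;.s&#039;, position(), &#039; { &#039;, current-grouping-key(), &#039; } &#039;)&quot; /&amp;gt;
      &amp;lt;/xsl:if&amp;gt;
    &amp;lt;/xsl:for-each-group&amp;gt;
  &amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You would call &lt;code&gt;classRules&lt;/code&gt; from inside the &lt;code&gt;&amp;lt;style&amp;gt;&lt;/code&gt; element in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;. Within &lt;code&gt;&amp;lt;xsl:for-each-group&amp;gt;&lt;/code&gt;, &lt;code&gt;position()&lt;/code&gt; counts groups, so the class numbers are unique but not necessarily consecutive.&lt;/p&gt;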

&lt;p&gt;The good news? I&amp;#8217;m allowed to use &lt;code&gt;msxsl:node-set()&lt;/code&gt;, and that means I can use a pipeline. I can break down the complex process into simple steps, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;turn CSS declarations into attributes&lt;/li&gt;
&lt;li&gt;inherit declarations from element and class rules down to individual elements&lt;/li&gt;
&lt;li&gt;inherit declarations down the tree, so that child elements inherit undefined inheritable properties from their parent&lt;/li&gt;
&lt;li&gt;inherit declarations &lt;em&gt;up&lt;/em&gt; the tree, so that parent elements have declarations that are common to their children&lt;/li&gt;
&lt;li&gt;convert those properties that can take relative values into relative values&lt;/li&gt;
&lt;li&gt;update existing and create new class definitions for combinations of declarations that are used many times in the document&lt;/li&gt;
&lt;li&gt;remove declarations that are inherited from element and class rules or from parent elements&lt;/li&gt;
&lt;li&gt;remove elements that don&amp;#8217;t have any new declarations&lt;/li&gt;
&lt;li&gt;turn attributes back into CSS declarations&lt;/li&gt;
&lt;/ul&gt;
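&lt;p&gt;The sketch below shows how small an individual step can be: the first step, in XSLT 1.0. The namespace for the new attributes, &lt;code&gt;http://example.org/ns/css&lt;/code&gt;, is hypothetical. It keeps CSS properties such as &lt;code&gt;width&lt;/code&gt; from clashing with XHTML attributes of the same name. The sketch ignores CSS edge cases such as semicolons inside quoted values.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!-- identity: copy everything else unchanged --&amp;gt;
&amp;lt;xsl:template match=&quot;@*|node()&quot; mode=&quot;styleToAttributes&quot;&amp;gt;
  &amp;lt;xsl:copy&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;@*|node()&quot; mode=&quot;styleToAttributes&quot; /&amp;gt;
  &amp;lt;/xsl:copy&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;!-- replace style=&quot;a: b; c: d&quot; with css:a=&quot;b&quot; css:c=&quot;d&quot; --&amp;gt;
&amp;lt;xsl:template match=&quot;@style&quot; mode=&quot;styleToAttributes&quot;&amp;gt;
  &amp;lt;xsl:call-template name=&quot;splitDeclarations&quot;&amp;gt;
    &amp;lt;xsl:with-param name=&quot;css&quot; select=&quot;concat(., &#039;;&#039;)&quot; /&amp;gt;
  &amp;lt;/xsl:call-template&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template name=&quot;splitDeclarations&quot;&amp;gt;
  &amp;lt;xsl:param name=&quot;css&quot; /&amp;gt;
  &amp;lt;xsl:variable name=&quot;declaration&quot; select=&quot;substring-before($css, &#039;;&#039;)&quot; /&amp;gt;
  &amp;lt;!-- skip empty declarations, e.g. from a trailing semicolon --&amp;gt;
  &amp;lt;xsl:if test=&quot;contains($declaration, &#039;:&#039;)&quot;&amp;gt;
    &amp;lt;xsl:attribute name=&quot;css:{normalize-space(substring-before($declaration, &#039;:&#039;))}&quot;
                   namespace=&quot;http://example.org/ns/css&quot;&amp;gt;
      &amp;lt;xsl:value-of select=&quot;normalize-space(substring-after($declaration, &#039;:&#039;))&quot; /&amp;gt;
    &amp;lt;/xsl:attribute&amp;gt;
  &amp;lt;/xsl:if&amp;gt;
  &amp;lt;!-- recurse on whatever follows the first semicolon --&amp;gt;
  &amp;lt;xsl:if test=&quot;normalize-space(substring-after($css, &#039;;&#039;))&quot;&amp;gt;
    &amp;lt;xsl:call-template name=&quot;splitDeclarations&quot;&amp;gt;
      &amp;lt;xsl:with-param name=&quot;css&quot; select=&quot;substring-after($css, &#039;;&#039;)&quot; /&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
  &amp;lt;/xsl:if&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, &lt;code&gt;style=&quot;font-size: 12pt; color: red&quot;&lt;/code&gt; becomes &lt;code&gt;css:font-size=&quot;12pt&quot; css:color=&quot;red&quot;&lt;/code&gt;. Later steps can then use ordinary attribute tests instead of string parsing.&lt;/p&gt;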

&lt;p&gt;In my stylesheet, I have a different mode for each step. I can capture the result of processing a document in that mode in a result tree fragment, then convert that into a new document using &lt;code&gt;msxsl:node-set()&lt;/code&gt;, and process that document. I end up with lots of variable declarations like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:variable name=&quot;inheritedDeclarations&quot;&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;msxsl:node-set($declarationsAsAttributes)&quot;
    mode=&quot;inheritDeclarations&quot; /&amp;gt;
&amp;lt;/xsl:variable&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The trouble with this is that it&amp;#8217;s hard to modify the pipeline and you have to make up unique names for each intermediate variable, which is such a challenge that they end up with meaningless names. So I created a mini pipeline definition in the stylesheet:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;my:pipeline&amp;gt;
  &amp;lt;my:step mode=&quot;styleToAttributes&quot; /&amp;gt;
  &amp;lt;my:step mode=&quot;inheritDeclarations&quot; /&amp;gt;
  ...
&amp;lt;/my:pipeline&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and then process it with a template similar to the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template name=&quot;processPipeline&quot;&amp;gt;
  &amp;lt;xsl:param name=&quot;steps&quot; select=&quot;document(&#039;&#039;)/*/my:pipeline/my:step&quot; /&amp;gt;
  &amp;lt;xsl:param name=&quot;source&quot; select=&quot;/&quot; /&amp;gt;
  &amp;lt;xsl:choose&amp;gt;
    &amp;lt;xsl:when test=&quot;$steps&quot;&amp;gt;
      &amp;lt;xsl:variable name=&quot;mode&quot; select=&quot;$steps[1]/@mode&quot; /&amp;gt;
      &amp;lt;xsl:variable name=&quot;result&quot;&amp;gt;
        &amp;lt;xsl:choose&amp;gt;
          &amp;lt;xsl:when test=&quot;$mode = &#039;styleToAttributes&#039;&quot;&amp;gt;
            &amp;lt;xsl:apply-templates select=&quot;$source&quot; mode=&quot;styleToAttributes&quot; /&amp;gt;
          &amp;lt;/xsl:when&amp;gt;
          &amp;lt;xsl:when test=&quot;$mode = &#039;inheritDeclarations&#039;&quot;&amp;gt;
            &amp;lt;xsl:apply-templates select=&quot;$source&quot; mode=&quot;inheritDeclarations&quot; /&amp;gt;
          &amp;lt;/xsl:when&amp;gt;
          ...
        &amp;lt;/xsl:choose&amp;gt;
      &amp;lt;/xsl:variable&amp;gt;
      &amp;lt;xsl:call-template name=&quot;processPipeline&quot;&amp;gt;
        &amp;lt;xsl:with-param name=&quot;steps&quot; select=&quot;$steps[position() &amp;gt; 1]&quot; /&amp;gt;
        &amp;lt;xsl:with-param name=&quot;source&quot; select=&quot;msxsl:node-set($result)&quot; /&amp;gt;
      &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;/xsl:when&amp;gt;
    &amp;lt;xsl:otherwise&amp;gt;
      &amp;lt;!-- no steps left: output the result of the final step --&amp;gt;
      &amp;lt;xsl:copy-of select=&quot;$source&quot; /&amp;gt;
    &amp;lt;/xsl:otherwise&amp;gt;
  &amp;lt;/xsl:choose&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I have to modify the &lt;code&gt;processPipeline&lt;/code&gt; template each time I introduce a new step, but dropping, repeating, or reordering the steps in my pipeline is very easy: I just change the pipeline definition by removing, copying or moving the &lt;code&gt;&amp;lt;my:step&amp;gt;&lt;/code&gt; elements.&lt;/p&gt;

&lt;p&gt;(You can use the same template in other XSLT 1.0 processors by replacing the &lt;code&gt;msxsl:node-set()&lt;/code&gt; call with your processor&amp;#8217;s equivalent, such as EXSLT&amp;#8217;s &lt;code&gt;exsl:node-set()&lt;/code&gt;. And you can use the same code in XSLT 2.0 by just dropping the call to &lt;code&gt;msxsl:node-set()&lt;/code&gt; altogether, though I&amp;#8217;d also change the &lt;code&gt;&amp;lt;xsl:copy-of&amp;gt;&lt;/code&gt; to an &lt;code&gt;&amp;lt;xsl:sequence&amp;gt;&lt;/code&gt; to prevent unnecessary node creation.)&lt;/p&gt;
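&lt;p&gt;In XSLT 2.0, for instance, the tail of &lt;code&gt;processPipeline&lt;/code&gt; might become something like the sketch below. The &lt;code&gt;&amp;lt;xsl:choose&amp;gt;&lt;/code&gt; that dispatches on the mode stays as it is, because XSLT 2.0 modes still can&amp;#8217;t be chosen at run time.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;      &amp;lt;xsl:call-template name=&quot;processPipeline&quot;&amp;gt;
        &amp;lt;xsl:with-param name=&quot;steps&quot; select=&quot;$steps[position() gt 1]&quot; /&amp;gt;
        &amp;lt;!-- $result is already a temporary tree, so no node-set() --&amp;gt;
        &amp;lt;xsl:with-param name=&quot;source&quot; select=&quot;$result&quot; /&amp;gt;
      &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;/xsl:when&amp;gt;
    &amp;lt;xsl:otherwise&amp;gt;
      &amp;lt;!-- return the final tree itself rather than a copy of it --&amp;gt;
      &amp;lt;xsl:sequence select=&quot;$source&quot; /&amp;gt;
    &amp;lt;/xsl:otherwise&amp;gt;
  &amp;lt;/xsl:choose&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;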

&lt;p&gt;The only thing that bugs me about this approach is the performance: creating so many nodes seems wasteful, especially when many are straightforward copies of existing nodes. Now that I understand better what the stylesheet needs to do, perhaps I can merge some of the steps together. In any case, in the environment I&amp;#8217;m writing for, performance isn&amp;#8217;t a particularly big issue.&lt;/p&gt;

&lt;p&gt;Of course what I &lt;em&gt;really&lt;/em&gt; want is &lt;a href=&quot;http://www.w3.org/TR/xproc&quot; title=&quot;XML Pipeline Language&quot;&gt;XProc&lt;/a&gt; to be finished and implemented (and adopted by the company I&amp;#8217;m working for). Several of the steps I&amp;#8217;m using could be streamed and might be better implemented in something other than XSLT. But at least breaking down my transformation in this way has made it easier (possible!) to write, more manageable, and more amenable to migration to a proper pipeline in the future.&lt;/p&gt;
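&lt;p&gt;The migration might look something like the sketch below. Each &lt;code&gt;&amp;lt;my:step&amp;gt;&lt;/code&gt; becomes a separate XSLT step that runs the same stylesheet in a different initial mode. This assumes an &lt;code&gt;&amp;lt;p:xslt&amp;gt;&lt;/code&gt; step with an &lt;code&gt;initial-mode&lt;/code&gt; option, which may not be exactly what the final XProc spec offers. The file name &lt;code&gt;css.xsl&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;p:pipeline xmlns:p=&quot;http://www.w3.org/ns/xproc&quot; version=&quot;1.0&quot;&amp;gt;
  &amp;lt;!-- each former my:step becomes its own XSLT step --&amp;gt;
  &amp;lt;p:xslt initial-mode=&quot;styleToAttributes&quot;&amp;gt;
    &amp;lt;p:input port=&quot;stylesheet&quot;&amp;gt;
      &amp;lt;p:document href=&quot;css.xsl&quot; /&amp;gt;
    &amp;lt;/p:input&amp;gt;
  &amp;lt;/p:xslt&amp;gt;
  &amp;lt;!-- the previous step&#039;s result flows in as the source --&amp;gt;
  &amp;lt;p:xslt initial-mode=&quot;inheritDeclarations&quot;&amp;gt;
    &amp;lt;p:input port=&quot;stylesheet&quot;&amp;gt;
      &amp;lt;p:document href=&quot;css.xsl&quot; /&amp;gt;
    &amp;lt;/p:input&amp;gt;
  &amp;lt;/p:xslt&amp;gt;
  &amp;lt;!-- ...and so on for the remaining steps --&amp;gt;
&amp;lt;/p:pipeline&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the sequencing handled by the pipeline, the &lt;code&gt;processPipeline&lt;/code&gt; template and its mode dispatch would no longer be needed.&lt;/p&gt;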
</description>
 <comments>http://www.jenitennison.com/blog/node/3#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <pubDate>Sun, 22 Apr 2007 22:28:50 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">3 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>
