Pipelining in XSLT

I took on a long-term contract back in January which is good fun (of course I have to say that; my boss might read this) and pretty challenging.

First, I’m hobbled by having to use XSLT 1.0 (MSXML, what’s more). Until I started this work, I hadn’t really realised how fantastic XSLT 2.0 is, nor how used to it I’ve become. How I miss user-defined functions, sequence constructors and if expressions.

Second, my task is to take some XHTML generated from WordprocessingML and (a) make all the CSS styling relative, so that it uses ems and percentages throughout rather than points, and (b) rationalise the CSS so that common styling appears in the <head> of the XHTML rather than on individual elements.

Now, I hesitate to say that anything is impossible in out-of-the-box XSLT 1.0, but I have developed a sense of when something is impractical. Here, rationalising styles means grouping elements by a calculated value (their inherited style). That would be OK in XSLT 2.0, with <xsl:for-each-group> and user-defined functions. In XSLT 1.0, though, key-based (Muenchian) grouping can only group on a value that a single XPath expression can compute, so it means code that is hard to write and impossible to maintain.
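For contrast, here is roughly what the XSLT 2.0 grouping could look like. This is a minimal sketch. It assumes the XHTML is in the XHTML namespace and uses a hypothetical function, my:inherited-style(), as a crude stand-in for computing an element's inherited style. The point is that <xsl:for-each-group> can group directly on the result of a user-defined function.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs h my">

  <!-- Hypothetical helper: a crude "inherited style" made by joining the
       style attributes of the element and its ancestors. A real version
       would resolve CSS inheritance properly. -->
  <xsl:function name="my:inherited-style" as="xs:string">
    <xsl:param name="element" as="element()" />
    <xsl:sequence select="string-join($element/ancestor-or-self::*/@style, ';')" />
  </xsl:function>

  <xsl:template match="/">
    <styles>
      <!-- group every element in the body by its computed style -->
      <xsl:for-each-group select="//h:body//*" group-by="my:inherited-style(.)">
        <style count="{count(current-group())}">
          <xsl:value-of select="current-grouping-key()" />
        </style>
      </xsl:for-each-group>
    </styles>
  </xsl:template>
</xsl:stylesheet>
```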

The good news? I’m allowed to use msxsl:node-set(), which turns a result tree fragment into a node-set that can be processed further, and that means I can use a pipeline. I can break the complex process down into simple steps, such as:

  • turn CSS declarations into attributes
  • inherit declarations from element and class rules down to individual elements
  • inherit declarations down the tree, so that child elements inherit undefined inheritable properties from their parent
  • inherit declarations up the tree, so that parent elements have declarations that are common to their children
  • convert those properties that can take relative values into relative values
  • update existing and create new class definitions for combinations of declarations that are used many times in the document
  • remove declarations that are inherited from element and class rules or from parent elements
  • remove elements that don’t have any new declarations
  • turn attributes back into CSS declarations
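As an illustration of how small each step can be, the first one (turning CSS declarations into attributes) might look like this in XSLT 1.0. This is a sketch under some assumptions:

- The CSS is simple and predictable, with no semicolons or colons inside values.
- Each declaration becomes an attribute in a made-up namespace (http://example.com/css). For example, style="font-size: 12pt; color: red" becomes css:font-size="12pt" css:color="red".
- The template name declarationsToAttributes is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:css="http://example.com/css">

  <!-- Identity template for the styleToAttributes mode -->
  <xsl:template match="node() | @*" mode="styleToAttributes">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="styleToAttributes" />
    </xsl:copy>
  </xsl:template>

  <!-- Replace a style attribute with one css:* attribute per declaration -->
  <xsl:template match="@style" mode="styleToAttributes">
    <xsl:call-template name="declarationsToAttributes">
      <!-- a trailing ';' guarantees the last declaration is terminated -->
      <xsl:with-param name="declarations" select="concat(., ';')" />
    </xsl:call-template>
  </xsl:template>

  <!-- Recursively peel off one "property: value;" declaration at a time -->
  <xsl:template name="declarationsToAttributes">
    <xsl:param name="declarations" />
    <xsl:if test="contains($declarations, ';')">
      <xsl:variable name="declaration" select="substring-before($declarations, ';')" />
      <xsl:variable name="property"
        select="normalize-space(substring-before($declaration, ':'))" />
      <!-- skip empty declarations, such as the one left by ';;' -->
      <xsl:if test="$property">
        <xsl:attribute name="css:{$property}" namespace="http://example.com/css">
          <xsl:value-of select="normalize-space(substring-after($declaration, ':'))" />
        </xsl:attribute>
      </xsl:if>
      <xsl:call-template name="declarationsToAttributes">
        <xsl:with-param name="declarations" select="substring-after($declarations, ';')" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because the identity template processes attributes before child nodes, the new attributes are always added before any content of the copied element. Adding attributes after content would be an error.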

In my stylesheet, I have a different mode for each step. I can capture the result of processing a document in that mode in a result tree fragment, then convert that into a new document using msxsl:node-set(), and process that document. I end up with lots of variable declarations like:

<xsl:variable name="inheritedDeclarations">
  <xsl:apply-templates select="msxsl:node-set($declarationsAsAttributes)"
    mode="inheritDeclarations" />
</xsl:variable>
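Wired up by hand, the first two steps of such a pipeline might look something like the sketch below. The msxsl prefix must be bound to MSXML's extension namespace, urn:schemas-microsoft-com:xslt. The templates for each mode are omitted, and only two steps are shown.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl">

  <xsl:template match="/">
    <!-- Step 1: a result tree fragment built from the original input -->
    <xsl:variable name="declarationsAsAttributes">
      <xsl:apply-templates select="/" mode="styleToAttributes" />
    </xsl:variable>
    <!-- Step 2: msxsl:node-set() turns the fragment into a document we can process -->
    <xsl:variable name="inheritedDeclarations">
      <xsl:apply-templates select="msxsl:node-set($declarationsAsAttributes)"
        mode="inheritDeclarations" />
    </xsl:variable>
    <!-- ...and so on: every further step needs another uniquely named variable -->
    <xsl:copy-of select="$inheritedDeclarations" />
  </xsl:template>

  <!-- the templates for each mode go here -->
</xsl:stylesheet>
```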

The trouble with this is that it’s hard to modify the pipeline, and you have to make up a unique name for each intermediate variable. That is hard enough that the variables end up with meaningless names. So I created a mini pipeline definition in the stylesheet:

<my:pipeline>
  <my:step mode="styleToAttributes" />
  <my:step mode="inheritDeclarations" />
  ...
</my:pipeline>

and then process it with a template similar to the following:

<xsl:template name="processPipeline">
  <!-- the steps still to run; by default, all those in the pipeline definition -->
  <xsl:param name="steps" select="document('')/*/my:pipeline/my:step" />
  <!-- the document to process; by default, the input document -->
  <xsl:param name="source" select="/" />
  <xsl:choose>
    <xsl:when test="$steps">
      <xsl:variable name="mode" select="$steps[1]/@mode" />
      <!-- run the first step, capturing its output as a result tree fragment -->
      <xsl:variable name="result">
        <xsl:choose>
          <xsl:when test="$mode = 'styleToAttributes'">
            <xsl:apply-templates select="$source" mode="styleToAttributes" />
          </xsl:when>
          <xsl:when test="$mode = 'inheritDeclarations'">
            <xsl:apply-templates select="$source" mode="inheritDeclarations" />
          </xsl:when>
          ...
        </xsl:choose>
      </xsl:variable>
      <!-- recurse on the remaining steps, turning the fragment into a node-set -->
      <xsl:call-template name="processPipeline">
        <xsl:with-param name="steps" select="$steps[position() > 1]" />
        <xsl:with-param name="source" select="msxsl:node-set($result)" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- no steps left: output the final document -->
      <xsl:copy-of select="$source" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
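To show how the pieces might fit together, here is a sketch of the surrounding stylesheet for MSXML. The namespace URI for the my prefix (http://example.com/my) is a made-up placeholder. An XSLT 1.0 processor ignores top-level elements in a non-XSLT namespace, but document('') can still read them back from the stylesheet itself.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="msxsl my">

  <!-- ignored by the processor, but readable via document('') -->
  <my:pipeline>
    <my:step mode="styleToAttributes" />
    <my:step mode="inheritDeclarations" />
  </my:pipeline>

  <!-- run the whole pipeline over the input document, using the
       default values of the steps and source parameters -->
  <xsl:template match="/">
    <xsl:call-template name="processPipeline" />
  </xsl:template>

  <!-- processPipeline and the templates for each mode go here -->
</xsl:stylesheet>
```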

I have to modify the processPipeline template each time I introduce a new step, but dropping, repeating, or reordering the steps in my pipeline is very easy: I just change the pipeline definition by removing, copying or moving the <my:step> elements.

(You can use the same template in other XSLT 1.0 processors by replacing msxsl:node-set() with your processor’s equivalent, such as EXSLT’s exsl:node-set(). And you can use the same code in XSLT 2.0 by simply dropping the call to msxsl:node-set() altogether, though I’d also change the <xsl:copy-of> to an <xsl:sequence> to avoid unnecessary node creation.)
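The XSLT 2.0 version described in that parenthesis might look like the following sketch. It assumes the same my:pipeline definition and placeholder namespace as before. In XSLT 2.0 the $result variable holds a temporary tree that can be processed directly, and <xsl:sequence> returns the final tree without copying it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="my">

  <xsl:template name="processPipeline">
    <xsl:param name="steps" select="document('')/*/my:pipeline/my:step" />
    <xsl:param name="source" as="node()" select="/" />
    <xsl:choose>
      <xsl:when test="exists($steps)">
        <xsl:variable name="mode" select="string($steps[1]/@mode)" />
        <!-- a temporary tree: no node-set() conversion needed -->
        <xsl:variable name="result">
          <xsl:choose>
            <xsl:when test="$mode = 'styleToAttributes'">
              <xsl:apply-templates select="$source" mode="styleToAttributes" />
            </xsl:when>
            <xsl:when test="$mode = 'inheritDeclarations'">
              <xsl:apply-templates select="$source" mode="inheritDeclarations" />
            </xsl:when>
          </xsl:choose>
        </xsl:variable>
        <xsl:call-template name="processPipeline">
          <xsl:with-param name="steps" select="$steps[position() > 1]" />
          <xsl:with-param name="source" select="$result" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- return the final tree as-is rather than copying it -->
        <xsl:sequence select="$source" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```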

The only thing that bugs me about this approach is the performance: creating so many nodes seems wasteful, especially when many are straightforward copies of existing nodes. Now that I understand better what the stylesheet needs to do, perhaps I can merge some of the steps together. In any case, in the environment I’m writing for, performance isn’t a particularly big issue.

Of course, what I really want is for XProc to be finished and implemented (and adopted by the company I’m working for). Several of the steps I’m using could be streamed and might be better implemented in something other than XSLT. But at least breaking down my transformation in this way has made it easier (possible!) to write, more manageable, and more amenable to migration to a proper pipeline in the future.

Comments

Welcome to the blogosphere!

Hi Jeni,

Good to see you blogging. I would love to discuss with you soon the idea of taking XML pipelines forward for the general power user. I call it user-generated pipelines, and I think you might find it interesting too.

Ian Forrester - cubicgarden.com

Re: Welcome to the blogosphere!

Thanks for the welcome. Your talk, “Pipelines: Plumbing for the next web”, is one of the talks at XTech that I’d really like to attend, but it’s right opposite my slot (OK, so I could duck out immediately after my talk, but I think that’s generally considered bad manners). Anyway, perhaps we can meet up there to discuss your ideas.

Re: Pipelining in XSLT

Hi Jeni,
Welcome to the “blogosphere” :o) !

As for the long <xsl:choose .../>, you can successfully use FXSL (the functions f:compose() and f:compose-flist()) to ease this pain.

Cheers,
Dimitre Novatchev
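FXSL itself goes much further, but the basic trick it builds on can be sketched without it. Instead of testing $mode in an <xsl:choose>, you apply templates to the my:step element itself, and let a small template per step do the dispatching. This is not FXSL’s f:compose(). It is a minimal sketch of the template-dispatch idea, and the runStep mode name is hypothetical. With this arrangement, adding a step means adding one template rather than editing processPipeline.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="msxsl my">

  <xsl:template name="processPipeline">
    <xsl:param name="steps" select="document('')/*/my:pipeline/my:step" />
    <xsl:param name="source" select="/" />
    <xsl:choose>
      <xsl:when test="$steps">
        <!-- dispatch on the step element instead of testing its mode -->
        <xsl:variable name="result">
          <xsl:apply-templates select="$steps[1]" mode="runStep">
            <xsl:with-param name="source" select="$source" />
          </xsl:apply-templates>
        </xsl:variable>
        <xsl:call-template name="processPipeline">
          <xsl:with-param name="steps" select="$steps[position() > 1]" />
          <xsl:with-param name="source" select="msxsl:node-set($result)" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="$source" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- one small template per step, matched on the step's mode attribute -->
  <xsl:template match="my:step[@mode = 'styleToAttributes']" mode="runStep">
    <xsl:param name="source" />
    <xsl:apply-templates select="$source" mode="styleToAttributes" />
  </xsl:template>

  <xsl:template match="my:step[@mode = 'inheritDeclarations']" mode="runStep">
    <xsl:param name="source" />
    <xsl:apply-templates select="$source" mode="inheritDeclarations" />
  </xsl:template>

  <!-- the my:pipeline definition and the per-mode templates go here -->
</xsl:stylesheet>
```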

Re: Pipelining in XSLT

Hi

Another approach could be to generate the driver module with something like:

Glad to be able to read your blog, Jeni. Regards,

—drkm

Re: Pipelining in XSLT

Elements didn't pass...

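<xsl:template match="step">
  <axsl:variable name="my:{generate-id(.)}">
    <xsl:choose>
      <xsl:when test="preceding-sibling::step">
        <axsl:apply-templates mode="{@mode}" select="
            msxsl:node-set($my:{generate-id(preceding-sibling::step[1])})"/>
      </xsl:when>
      <!-- the first step has no predecessor, so it processes the input -->
      <xsl:otherwise><axsl:apply-templates mode="{@mode}" select="/"/></xsl:otherwise>
    </xsl:choose>
  </axsl:variable>
</xsl:template>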

--drkm
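The step template above relies on XSLT’s namespace aliasing, so that axsl:* literal result elements come out as real xsl:* elements. A sketch of the rest of such a generator might look like the following. It assumes the pipeline definition is a separate, unnamespaced <pipeline> document, and that the mode templates live in a hypothetical steps.xsl that the generated stylesheet imports. The axsl and my namespace URIs are placeholders. The msxsl and my prefixes are declared here so that those namespace declarations are copied into the generated stylesheet, where the generated XPath expressions need them.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://example.com/alias"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:my="http://example.com/my">

  <!-- axsl:* elements in this stylesheet come out as xsl:* in the result -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl" />

  <!-- Input: <pipeline><step mode="..."/>...</pipeline> -->
  <xsl:template match="/pipeline">
    <axsl:stylesheet version="1.0">
      <!-- xsl:import must be the first child of the generated stylesheet -->
      <axsl:import href="steps.xsl" />
      <!-- one global variable per step, generated by the step template -->
      <xsl:apply-templates select="step" />
      <axsl:template match="/">
        <!-- the generated stylesheet outputs the last step's result -->
        <axsl:copy-of select="$my:{generate-id(step[last()])}" />
      </axsl:template>
    </axsl:stylesheet>
  </xsl:template>

  <!-- plus the template match="step" from the comment above -->
</xsl:stylesheet>
```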

WebWorks

I work for WebWorks.com, and we focus on converting FrameMaker and Word documents to HTML/XML, using MSXML to do so. We have our own pipelining built in and a user interface for working with CSS. Most folks purchase ePublisher (the name of the product) because of our ready-made formats, but once you get an idea of how things work, creating your own formats and projects becomes very simple and powerful.

It sounds like you already have things up and running, but if you are working with Word and have any need for users dealing with style it is something to consider.

BTW, this is not supposed to be an advertisement… just one XSLT hacker offering up a helpful tool to another. Great post!

Processing CSS and pipelines with XSLT

Sounds interesting; I'll have to have a closer read, but I've had some similar experiences myself:

1) I have done some XSLT work recently parsing CSS and applying it to an Atom feed as 'style' attributes, so that subsequent templates can convert those styles into actual styles for the destination user agent, but I was using XSLT 2. It, too, dealt with inheritance.

Are you actually parsing CSS files or picking up style attributes?

2) About a year ago I embarked on an experiment to build a graphics renderer, X-Reyes, using XSLT 2 (it produces a Base64-encoded TIFF image). One of the first things I had to do to break up the problem was to implement a pipeline processor in XSLT. This is expensive, as I was recursively descending the pipeline, feeding the result of each step into the next, but it helps wonderfully in isolating each phase of the rendering process. You can also add stop conditions to catch the output of intermediate steps. As you can see from the blog link above, I had some success, but the project remains dormant for now...

...or maybe not, ho hmmm.
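A stop condition like the one described above can be added to the XSLT 1.0 processPipeline template with very little code. In this sketch, a hypothetical global parameter, stopAfter, names the step whose output you want to inspect. The fragment replaces the recursive <xsl:call-template> inside the <xsl:when test="$steps"> branch, where $mode and $result are already defined.

```xml
<!-- Hypothetical global parameter: the mode of the step to stop after -->
<xsl:param name="stopAfter" select="''" />

<!-- Inside processPipeline, in place of the recursive call: -->
<xsl:choose>
  <xsl:when test="$mode = $stopAfter">
    <!-- stop here and emit the intermediate tree for debugging -->
    <xsl:copy-of select="$result" />
  </xsl:when>
  <xsl:otherwise>
    <xsl:call-template name="processPipeline">
      <xsl:with-param name="steps" select="$steps[position() > 1]" />
      <xsl:with-param name="source" select="msxsl:node-set($result)" />
    </xsl:call-template>
  </xsl:otherwise>
</xsl:choose>
```

Because the default is an empty string, the pipeline runs to completion unless a step name is passed in when the transformation is invoked.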

Re: Processing CSS and pipelines with XSLT

  1. Yes, I’m doing some CSS parsing, but it’s pretty straightforward. All the CSS is generated by the WordprocessingML-to-XHTML stylesheet, so it’s very predictable, and I’m not doing all the things I should be, such as ignoring the contents of string literals. Also, the selectors I’m dealing with are all very simple (just element names, or class names preceded by a .), so working out how to apply declarations to individual elements isn’t as complicated as it would be in the general case.

  2. I agree with your observations about being able to isolate particular parts of a pipeline being incredibly useful for debugging.
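With selectors that simple, deciding whether a rule applies to an element needs only two tests in XPath 1.0:

- a name test for element selectors;
- a whitespace-token test on @class for class selectors.

The sketch below assumes the rules have already been parsed into hypothetical my:rule elements. For illustration, it outputs the selectors that match the current element. The concat(' ', ..., ' ') idiom stops a selector such as .Head from matching class="Heading1".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="my">

  <!-- Hypothetical parsed rules, e.g. from "p { ... }" and ".Heading1 { ... }" -->
  <my:rules>
    <my:rule selector="p" />
    <my:rule selector=".Heading1" />
  </my:rules>

  <!-- Output the selectors of the rules that apply to the current element -->
  <xsl:template match="*" mode="matchingRules">
    <xsl:variable name="element" select="." />
    <xsl:for-each select="document('')/*/my:rules/my:rule">
      <xsl:choose>
        <!-- class selector: ".name" must be a whole token of @class -->
        <xsl:when test="starts-with(@selector, '.')">
          <xsl:if test="contains(concat(' ', normalize-space($element/@class), ' '),
                                 concat(' ', substring(@selector, 2), ' '))">
            <xsl:value-of select="concat(@selector, ' ')" />
          </xsl:if>
        </xsl:when>
        <!-- element selector: compare with the element's local name -->
        <xsl:when test="@selector = local-name($element)">
          <xsl:value-of select="concat(@selector, ' ')" />
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```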