<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>xml</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/14</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>My Experience of Web Standards</title>
 <link>http://www.jenitennison.com/blog/node/160</link>
 <description>&lt;p&gt;One of the things that&amp;#8217;s been niggling at the back of my mind since the &lt;a href=&quot;http://schema.org&quot;&gt;schema.org&lt;/a&gt; announcement is how small a role search engine results plays in the wider data sharing efforts that I&amp;#8217;m more familiar with in my work on &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I&amp;#8217;m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;My day job (the one I actually get paid for) is web development. The site I spend most of my time and effort on is &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt;. This deals with complex content (UK legislation) that has to be presented in multiple formats (users love PDFs of legislation). Our aim is to make the data as reusable as possible by third parties through good, RESTful, web architecture, and we want to use open standards and open source technologies as part of the &lt;a href=&quot;http://www.cabinetoffice.gov.uk/resource-library/open-source-open-standards-and-re-use-government-action-plan&quot;&gt;UK government&amp;#8217;s general strategy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;legislation.gov.uk is not a global website like Amazon or eBay, but it&amp;#8217;s not small either: it covers 60,000 changing items of legislation, with more added every day, and provides point-in-time views for many of them. It&amp;#8217;s one of the ten most-used UK Government websites, with 2 million visits (about 10-12 million page views) each month and typically about 120 requests/second during the most active times of the day. Legislation might sound like a highly specialist interest, but if you &lt;a href=&quot;http://twitter.com/search/legislation.gov.uk&quot;&gt;search for legislation.gov.uk on Twitter&lt;/a&gt; you&amp;#8217;ll see it being referenced over and over by people who want to share what the law says.&lt;/p&gt;

&lt;p&gt;I do not by any means claim that my experience is representative of the wider web. I know that there are large numbers of sites that deal only in data, not documents, and certainly not documents with the kind of rich semantic structure that legislation has. I offer the following discussion as a data point, partly because I can&amp;#8217;t quite believe that legislation.gov.uk is &lt;em&gt;completely&lt;/em&gt; unique in its requirements and partly because obviously my perspective on a bunch of issues arises from this experience.&lt;/p&gt;

&lt;h2&gt;Technology Stacks&lt;/h2&gt;

&lt;p&gt;Legislation items are complex, semi-structured documents. Their natural fit is XML (well, that&amp;#8217;s not quite true &amp;#8212; their natural fit would be something that allowed overlapping markup &amp;#8212; but XML is the closest that we have). So we store them as XML in a native XML database and use an XML toolset to query them (XQuery) and to transform them (XSLT) into various formats, including PDF (through XSL-FO).&lt;/p&gt;

&lt;p&gt;Our next step for the development of the site involves looking at legislative effects. These form a graph: one item of legislation affects other items of legislation, which may in turn affect other items, and so on. There are all sorts of other links between items of legislation in terms of commencements, conferred powers and so on. Particularly because we already have well-thought-through URIs for legislation, the natural fit is to use RDF to represent this graph. We already offer a SPARQL endpoint for accessing some aspects of our data, but we expect to expand and develop it over the next few months, using it as a layer under the website and exposing it for reusers, in much the same way as we use the XML database.&lt;/p&gt;
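
&lt;p&gt;To make the shape of that graph concrete, here is a minimal Python sketch, assuming a hypothetical adjacency list of "affects" links between made-up item identifiers. It walks the links breadth-first to find every item affected, directly or indirectly, by one starting item. Over RDF, the same question could be asked with a SPARQL 1.1 property path using the &lt;code&gt;+&lt;/code&gt; operator.&lt;/p&gt;

```python
from collections import deque

# Hypothetical "affects" links: each key is an item of legislation and each
# value lists the items it amends, repeals or otherwise affects.
AFFECTS = {
    "act-a": ["act-b", "act-c"],
    "act-b": ["act-d"],
    "act-c": ["act-d"],
    "act-d": [],
}

def transitively_affected(start, graph):
    """Breadth-first walk returning every item reachable from `start`."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # already visited; this also stops cycles looping forever
        seen.add(item)
        queue.extend(graph.get(item, []))
    return seen

print(sorted(transitively_affected("act-a", AFFECTS)))
# ['act-b', 'act-c', 'act-d']
```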

&lt;p&gt;As a government site, we have fairly strict limits on what we can do within our web pages: we have to make sure that they&amp;#8217;re accessible by everyone who wants to view them. We aren&amp;#8217;t able to use technologies that are only available in the latest browsers, but that&amp;#8217;s OK because with the kind of content we deal with, we don&amp;#8217;t have to do anything fancy anyway. So we use pretty basic HTML and CSS and JavaScript, because that&amp;#8217;s how you deliver content to end-users on the web (as well as exposing the underlying XML and RDF, to enable others to reuse the data).&lt;/p&gt;

&lt;p&gt;In other words, we use three web stacks for delivering legislation.gov.uk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML stack, which is great for single-source publishing of documents that have more semantic structures than those supported by HTML&lt;/li&gt;
&lt;li&gt;the RDF stack, which is well-suited for metadata about things that are identified by URIs&lt;/li&gt;
&lt;li&gt;the HTML stack, which is absolutely necessary for delivering human-accessible content on the web&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What bemuses me, because of this experience, is that sometimes it appears that the narrative around these technologies is framed in terms of an exclusive choice between them. For example, &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;@mattur asked&lt;/a&gt;:&lt;/p&gt;

&lt;p style=&quot;text-align:center;&quot;&gt;
  &lt;a href=&quot;http://twitter.com/mattur/status/89331716430372864&quot;&gt;&lt;img src=&quot;/blog/files/mattur-tweet.jpg&quot; alt=&quot;@gimsieke @JeniT how may TAG members believe RDF(a) and X(HT)ML are way forward? How many think they aren&#039;t?&quot; /&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;It is as if, if you use XML you &lt;em&gt;cannot&lt;/em&gt; appreciate the utility of error-handling in HTML; or if you use RDF you &lt;em&gt;cannot&lt;/em&gt; understand the need to represent documents in XML; or if you want to utilise HTML fully, you &lt;em&gt;cannot&lt;/em&gt; adopt RDF&amp;#8217;s view of data on the web. That&amp;#8217;s simply not my experience. They each have their role on the web; supporting the use of one does not necessitate rejecting the use of the others.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s interesting that some of the standards that are most reviled are those that arise at the intersections, where it appears that one technology is trying to encroach on the space of another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XHTML at the border of XML and HTML&lt;/li&gt;
&lt;li&gt;RDF/XML at the border of RDF and XML&lt;/li&gt;
&lt;li&gt;RDFa at the border of all three&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the same time, within legislation.gov.uk, we publish XHTML (because it&amp;#8217;s the natural output from an XML toolchain) and create and process RDF/XML (because it gives us access to that data from within the XML toolchain). We use a small bit of RDFa in the XHTML to indicate the rights under which our information is available, and although we don&amp;#8217;t do so yet, we are thinking about using RDFa to mark up non-document semantics within our XML (so that the XML markup can focus on the document structures that it&amp;#8217;s good at). For all their imperfections, these intersection technologies are useful for managing cross-overs; the problems arise when they overstep their remit and people start to think that &lt;em&gt;all&lt;/em&gt; HTML must be XHTML or &lt;em&gt;all&lt;/em&gt; XML must be RDF/XML or &lt;em&gt;all&lt;/em&gt; RDF must be RDFa.&lt;/p&gt;

&lt;h2&gt;Sharing Scenarios&lt;/h2&gt;

&lt;p&gt;The second thing that I wanted to explore is the experience from legislation.gov.uk of what it&amp;#8217;s like to be a publisher who actively wants to share their data. We need to operate simultaneously at three levels in our data sharing efforts.&lt;/p&gt;

&lt;h3&gt;Large-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The first targets of our data sharing efforts are the search engines. Obviously we&amp;#8217;re not selling anything, but we want people to be able to locate legislation easily when they want it, and we want people who have done the search to be able to see some information about the legislation so that they know that they&amp;#8217;ve located the right item.&lt;/p&gt;

&lt;p&gt;This is large-scale consumer (search engine) driven data sharing, typified by schema.org and Facebook&amp;#8217;s &lt;a href=&quot;http://developers.facebook.com/docs/opengraph/&quot;&gt;Open Graph Protocol&lt;/a&gt; (OGP). There are a few very big data consumers (Google, Microsoft, Yahoo!, Facebook etc) who need to consume data from large numbers of data providers. These consumers obviously can&amp;#8217;t understand &lt;em&gt;everything&lt;/em&gt;, so they determine and document what syntaxes and vocabularies they &lt;em&gt;do&lt;/em&gt; understand and expect publishers to follow.&lt;/p&gt;

&lt;p&gt;The benefits that publishers get from a particular consumer determine which syntax/vocabulary they use; publishers who are particularly keen to show up prettily within search results will target schema.org whereas those who want to be sharable within Facebook will target OGP. Many publishers will want to target both. There is probably a driver towards eventual convergence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;publishers might push back about inserting two lots of very similar data in their pages&lt;/li&gt;
&lt;li&gt;consumers might want to include data from publishers who haven&amp;#8217;t specifically targeted them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;although there&amp;#8217;s likely to be a period where they coexist, much as there was for VHS and Betamax (and &lt;a href=&quot;http://en.wikipedia.org/wiki/Video_2000&quot;&gt;V2000&lt;/a&gt;, I know, dad) during the early days of video players.&lt;/p&gt;

&lt;p&gt;As &lt;a href=&quot;http://www.jenitennison.com/blog/node/157&quot;&gt;I discussed previously&lt;/a&gt;, these large-scale consumers will be driven by the data that they find in the wild, in all its messy variety. They get relatively little benefit directly from using a generic &lt;em&gt;syntax&lt;/em&gt;, as they are really interested in only a few, pretty generic, &lt;em&gt;vocabularies&lt;/em&gt; for which they have hardwired processing. Indirectly, adopting a generic syntax has benefits in that publishers might find it easier to find tools that enable them to generate it, tutorials about how to use it, and feel that they aren&amp;#8217;t being quite as locked in to something proprietary. However, rejecting data that isn&amp;#8217;t marked up properly using that syntax has no benefit for consumers except in so far as it makes them feel that they are being good community members. &lt;/p&gt;

&lt;p&gt;This is the pattern we see with schema.org (which accepts microdata but, based on its documentation, won&amp;#8217;t reject data that isn&amp;#8217;t fully compliant with it) and with OGP (which accepts a subset of RDFa but doesn&amp;#8217;t reject data that hasn&amp;#8217;t got prefixes properly bound, for example).&lt;/p&gt;

&lt;p&gt;Another point to mention is that there is very little trust in this scenario. The communication between consumers and publishers is very limited, and the consumers will want to protect themselves against accidental or malicious errors that are evident in mismatches between explicit metadata and that which is parsed from the visible content of the page.&lt;/p&gt;

&lt;p&gt;The parallels to HTML and browser vendors are very strong in this type of data sharing.&lt;/p&gt;

&lt;h3&gt;Small-Scale Consumer-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;A second type of data sharing is again driven by consumers, but this time at a lot smaller and more specialised scale. For legislation.gov.uk, these are services such as &lt;a href=&quot;http://www.glin.gov/&quot;&gt;GLIN&lt;/a&gt;, which is a global legislation registry. Other examples are the recent work that we&amp;#8217;ve done to publish &lt;a href=&quot;http://data.gov.uk/organogram&quot;&gt;UK Government organograms&lt;/a&gt; or &lt;a href=&quot;http://countculture.wordpress.com/&quot;&gt;Chris Taggart&lt;/a&gt;&amp;#8217;s &lt;a href=&quot;http://openelectiondata.org/&quot;&gt;Open Election Data&lt;/a&gt; project. In these cases, there&amp;#8217;s a single, relatively small and specialised consumer and a small number of closely coordinated publishers.&lt;/p&gt;

&lt;p&gt;As in the large-scale case, the consumer ultimately determines the syntax/vocabulary that it recognises, and communicates that to the publishers. However, small-scale consumers typically have close coordination with the publishers, which has a number of side-effects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consumers may be better able both to apply pressure to publishers and to help them get their markup right&lt;/li&gt;
&lt;li&gt;publishers have the opportunity to feed back directly to the consumer any suggestions that they have about changes to the syntax/vocabulary&lt;/li&gt;
&lt;li&gt;publishers are likely to gain an immediate and tangible benefit from their cooperation, such as visualisations of their data that they otherwise wouldn&amp;#8217;t have seen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another noteworthy point about small-scale consumers is that they&amp;#8217;re unlikely to have the engineering capability to create a custom parser for a particular syntax, but will instead want to use something off-the-shelf to extract data from pages and into their own backend systems. This, coupled with the closer coordination with publishers, means that they&amp;#8217;re much more likely to stick to a specification, assuming that the off-the-shelf tools do.&lt;/p&gt;

&lt;h3&gt;Publisher-Driven Data Sharing&lt;/h3&gt;

&lt;p&gt;The final type of data sharing is driven by publishers. At legislation.gov.uk, we&amp;#8217;re motivated to make our data available for reuse for transparency/accountability reasons (to help citizens understand the law), efficiency reasons (to help parliament and government departments to publish new legislation better) and economic reasons (to foster innovation in legal publishing). We don&amp;#8217;t have any individual consumers in mind when we publish our data, but have found that simply by publishing it well, we foster reuse.&lt;/p&gt;

&lt;p&gt;In this case, we as publishers are highly motivated to ensure that the data we publish is easily parsed with something off-the-shelf, since that lowers the barrier for potential consumers. Publishers like us are very likely to have unique, specialised content and to need a vocabulary that closely fits our internal data structures, as this lowers implementation cost. Consumers can also trust publishers like us: we simply have no motivation to lie in the data that we provide for reuse.&lt;/p&gt;

&lt;h2&gt;Mixed Markup&lt;/h2&gt;

&lt;p&gt;As I&amp;#8217;ve outlined above, publishers like legislation.gov.uk need to target several potential consumers at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-scale consumers such as search engines&lt;/li&gt;
&lt;li&gt;small-scale consumers that provide us with a useful service&lt;/li&gt;
&lt;li&gt;specialist consumers that are interested specifically in our data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We cannot use a single vocabulary for all these different purposes. (Well, we could write our own vocabulary and describe mappings to other vocabularies using RDFS, but search engines wouldn&amp;#8217;t read it.)&lt;/p&gt;

&lt;p&gt;We must therefore use a mix of vocabularies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generic vocabularies about things that search engines care about&lt;/li&gt;
&lt;li&gt;specialised vocabularies for particular small consumers&lt;/li&gt;
&lt;li&gt;site-specific vocabularies for sharing our unique data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It&amp;#8217;s repetitive, but it&amp;#8217;s manageable so long as we have a syntax that enables us to say that an item of legislation is a &lt;code&gt;http://schema.org/CreativeWork&lt;/code&gt; and a &lt;code&gt;http://purl.org/dc/dcmitype/Text&lt;/code&gt; and a &lt;code&gt;http://www.legislation.gov.uk/def/legislation/Legislation&lt;/code&gt; and allows us to give multiple properties the same value.&lt;/p&gt;
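
&lt;p&gt;As a rough sketch of the target data, assuming a hypothetical item URI and title, the Python below builds the entity-property-value triples for one item: the three types named above, plus a single title value shared by two properties (&lt;code&gt;http://schema.org/name&lt;/code&gt; and &lt;code&gt;http://purl.org/dc/terms/title&lt;/code&gt;, chosen purely as examples). Whatever syntax we use in the page has to be able to produce this set without us writing the value out more than once.&lt;/p&gt;

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The three classes named above, from three different vocabularies.
TYPES = [
    "http://schema.org/CreativeWork",
    "http://purl.org/dc/dcmitype/Text",
    "http://www.legislation.gov.uk/def/legislation/Legislation",
]

# Illustrative properties that should all carry the same title value.
TITLE_PROPERTIES = [
    "http://schema.org/name",
    "http://purl.org/dc/terms/title",
]

def describe(item_uri, title):
    """Return the set of (subject, property, value) triples for one item."""
    triples = {(item_uri, RDF_TYPE, cls) for cls in TYPES}
    triples |= {(item_uri, prop, title) for prop in TITLE_PROPERTIES}
    return triples

# Hypothetical item URI and title, for illustration only.
for triple in sorted(describe("http://example.org/id/item/1", "An Example Act")):
    print(triple)
```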

&lt;p&gt;The way things are going at the moment, we might well end up having to use multiple &lt;em&gt;syntaxes&lt;/em&gt; on the same page, as some consumers understand microdata, others consume RDFa, and still others will parse microformats. This leads to more repetition: adding &lt;code&gt;itemprop&lt;/code&gt; for microdata, &lt;code&gt;property&lt;/code&gt; for RDFa and specialised &lt;code&gt;class&lt;/code&gt; attributes for microformats. But worse (much worse), each of the syntaxes uses a different parsing model to create an entity-property-value structure, so not only do we have to learn substantially different markup patterns but our pages quickly become some kind of hideous polyglot mess trying to balance between them.&lt;/p&gt;

&lt;h2&gt;Looking Forward&lt;/h2&gt;

&lt;p&gt;As I said at the start, I&amp;#8217;m fairly sure that my experience at legislation.gov.uk isn&amp;#8217;t representative of the wider web, but I don&amp;#8217;t have a clear idea about just how unrepresentative it is, in terms of technology use or motivations around data sharing. When I read my Twitter stream or blogs, there&amp;#8217;s a massive sampling bias, not only in terms of who I follow and what I read, but also in terms of who talks about what they&amp;#8217;re doing. (I&amp;#8217;m reminded of &lt;a href=&quot;http://www.codinghorror.com/blog/&quot;&gt;Jeff Atwood&lt;/a&gt;&amp;#8217;s post on the &lt;a href=&quot;http://www.codinghorror.com/blog/2007/11/the-two-types-of-programmers.html&quot;&gt;Two Types of Programmers&lt;/a&gt;: the vast majority of web developers don&amp;#8217;t make a noise about what they do.)&lt;/p&gt;

&lt;p&gt;Taking part in web standardisation today often feels like being part of an ongoing cold war between distinct camps rather than a community working towards common aims. The underlying question seems to be &amp;#8220;whose side are you on?&amp;#8221; Every decision and activity is cast as a victory or defeat. Time is wasted on attack and defence, or on raking over past slights and stupidities, rather than on progress. Valid criticism from outside a group cannot be listened to for fear of giving ground, and cannot be made within a group, where it seems like betrayal.&lt;/p&gt;

&lt;p&gt;It is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Realistic_conflict_theory#The_Robbers_Cave_Experiment&quot;&gt;Robbers Cave Experiment&lt;/a&gt; played out in web standards. As a psychologist, I find it fascinating. As a developer, and particularly one who doesn&amp;#8217;t self-identify with any single group, it is frustrating. As a TAG member, trying to work for the longer-term good of the web, it is worrying, because situations of intergroup conflict lead to &lt;a href=&quot;http://en.wikipedia.org/wiki/Groupthink&quot;&gt;groupthink&lt;/a&gt; and non-optimal solutions.&lt;/p&gt;

&lt;p&gt;As I described above, a non-optimal outcome seems to be the most likely result of the particular microdata vs RDFa conflict for us at legislation.gov.uk. While I know we are not generally representative, I believe that it will be similarly bad for other developers: publishers, consumers and tool implementers.&lt;/p&gt;

&lt;p&gt;This is a problem for all who want to foster data sharing on the web using open standards; it is not one that any one group can fix on their own. It&amp;#8217;s my hope that a balanced task force of individuals with a variety of experience and backgrounds can provide a focus for us all to work together to solve it. If we can&amp;#8217;t, then we have let our prejudice and bias overcome our judgement.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/160#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/44">html5</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/71">microdata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <pubDate>Sun, 24 Jul 2011 16:24:00 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">160 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XML Summer School 2009</title>
 <link>http://www.jenitennison.com/blog/node/107</link>
 <description>&lt;p&gt;Registration has just opened for this year&amp;#8217;s &lt;a href=&quot;http://xmlsummerschool.com/&quot;&gt;XML Summer School&lt;/a&gt;, held in Oxford on 20-25th September. I&amp;#8217;m teaching a couple of sessions and helping with a workshop on the &lt;a href=&quot;http://xmlsummerschool.com/curriculum2009/xslt-xsl-fo-and-xquery/&quot;&gt;&amp;#8220;XSLT, XSL-FO and XQuery&amp;#8221; track&lt;/a&gt; along with &lt;a href=&quot;http://www.snee.com/bob/&quot;&gt;Bob DuCharme&lt;/a&gt;, &lt;a href=&quot;http://saxonica.blogharbor.com/blog/cmd=view_user/username=mhkay&quot;&gt;Michael Kay&lt;/a&gt; and &lt;a href=&quot;http://www.datypic.com/&quot;&gt;Priscilla Walmsley&lt;/a&gt;. It&amp;#8217;s one of my favourite events, for three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I get to listen to experts talk about particular technologies in depth. The sessions are particularly good because they&amp;#8217;re provided by people who don&amp;#8217;t just spend all their time training, but actually practice what they&amp;#8217;re talking about, and therefore positively relish the kinds of discussions that normal trainers might shy away from.&lt;/li&gt;
&lt;li&gt;I get to meet a whole bunch of people who are using XML in different areas: publishing, healthcare, government, you name it. In that way it&amp;#8217;s like a conference: many of the most useful conversations happen during the breaks or at the bar.&lt;/li&gt;
&lt;li&gt;I get to go punting, visit Oxford&amp;#8217;s best pubs and dress up for a formal dinner &amp;#8212; more social engagements in a single week than I usually have in a year!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I know a lot of beginners go to the XML Summer School for the &lt;a href=&quot;http://xmlsummerschool.com/curriculum2009/hands-on-intro/&quot;&gt;introduction course&lt;/a&gt;, but to me the real value is for people who are actually using XML on a day-to-day basis and want to keep on top of the latest tools and technologies that will actually help them do their jobs. I learn something new every year.&lt;/p&gt;

&lt;p&gt;Anyway, I wanted to blog about it because there&amp;#8217;s a discount on &lt;a href=&quot;http://xmlsummerschool.com/registration2009/&quot;&gt;registration&lt;/a&gt; up until 30th June. Grab &amp;#8216;em while you can!&lt;/p&gt;

&lt;!--break--&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/107#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/45">xmlsummerschool09</category>
 <pubDate>Thu, 25 Jun 2009 21:03:14 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">107 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>My Perfect XML-Based Publishing Platform</title>
 <link>http://www.jenitennison.com/blog/node/105</link>
 <description>&lt;p&gt;For the last several months, I&amp;#8217;ve been working on a project at &lt;a href=&quot;http://www.tso.co.uk/&quot;&gt;TSO&lt;/a&gt; for publishing &lt;a href=&quot;http://www.opsi.gov.uk/legislation&quot;&gt;UK legislation&lt;/a&gt; using a native XML database (eg &lt;a href=&quot;http://www.exist-db.org/&quot;&gt;eXist&lt;/a&gt; or &lt;a href=&quot;http://www.marklogic.com/&quot;&gt;MarkLogic Server&lt;/a&gt;) with some middleware (eg &lt;a href=&quot;http://www.orbeon.com/&quot;&gt;Orbeon&lt;/a&gt; or &lt;a href=&quot;http://cocoon.apache.org/&quot;&gt;Cocoon&lt;/a&gt;). It&amp;#8217;s a powerful and flexible approach that&amp;#8217;s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the &lt;a href=&quot;http://sandbox.opsi.gov.uk/&quot;&gt;Command and House Papers demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But the killer platform isn&amp;#8217;t quite here yet, partly because the specs aren&amp;#8217;t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; &lt;a href=&quot;http://www.w3.org/TR/xproc/&quot;&gt;XProc&lt;/a&gt; is just around the corner. XML databases are all over the place in their conformance to XQuery, its optional features and the not-quite-finalised specs for free-text searching and updating.&lt;/p&gt;

&lt;p&gt;People talk about how productive you can be using &lt;a href=&quot;http://rubyonrails.org/&quot;&gt;Ruby on Rails&lt;/a&gt; or &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django&lt;/a&gt;, and they work great for publishing data you can store in a relational database. What &lt;em&gt;we&lt;/em&gt; need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;The killer platform would have a configuration mechanism for mapping the HTTP requests that it receives onto XProc pipelines. The choice of pipeline could be based on one or more of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the HTTP method&lt;/li&gt;
&lt;li&gt;the requested URI&lt;/li&gt;
&lt;li&gt;any HTTP header&lt;/li&gt;
&lt;/ul&gt;
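
&lt;p&gt;A minimal Python sketch of that configuration idea follows: an ordered list of rules, each naming a pipeline and matching on method, URI and headers, where the first matching rule wins. The rule format, the glob-style patterns and the &lt;code&gt;.xpl&lt;/code&gt; file names are all assumptions for illustration, not features of any existing platform.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class Route:
    pipeline: str                   # hypothetical pipeline file to run
    method: str = "*"               # HTTP method, or "*" for any method
    uri: str = "*"                  # glob pattern matched against the request path
    headers: dict = field(default_factory=dict)  # header name -> glob pattern

    def matches(self, method, path, request_headers):
        if self.method != "*" and self.method != method:
            return False
        if not fnmatchcase(path, self.uri):
            return False
        # HTTP header names are case-insensitive, so compare them lower-cased.
        received = {name.lower(): value for name, value in request_headers.items()}
        return all(fnmatchcase(received.get(name.lower(), ""), pattern)
                   for name, pattern in self.headers.items())

# Rules are tried in order; the first match wins.
ROUTES = [
    Route("update-item.xpl", method="PUT", uri="/id/*"),
    Route("item-pdf.xpl", method="GET", uri="/id/*",
          headers={"Accept": "*application/pdf*"}),
    Route("item-html.xpl", method="GET", uri="/id/*"),
]

def select_pipeline(method, path, request_headers, routes=ROUTES):
    for route in routes:
        if route.matches(method, path, request_headers):
            return route.pipeline
    return None  # no rule matched; the platform might answer 404 or 405
```

&lt;p&gt;Putting the more specific rules first, as with the PDF rule here, keeps the first-match logic simple.&lt;/p&gt;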

&lt;p&gt;The pipelines would have a signature like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a primary &amp;#8216;source&amp;#8217; input that encodes the HTTP method, headers and body of the request; this would use the &lt;code&gt;&amp;lt;c:request&amp;gt;&lt;/code&gt; element used within the &lt;a href=&quot;http://www.w3.org/XML/XProc/docs/langspec.html#c.http-request&quot;&gt;&lt;code&gt;p:http-request&lt;/code&gt; XProc step&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;a parameter input populated by parsing the URI against a simplified version of &lt;a href=&quot;http://tools.ietf.org/html/draft-gregorio-uritemplate-03&quot;&gt;URI templates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;a primary &amp;#8216;result&amp;#8217; output that is an XML version of the response body&lt;/li&gt;
&lt;li&gt;a &amp;#8216;response&amp;#8217; output that encodes the HTTP status code and headers of the response; this would use the &lt;code&gt;&amp;lt;c:response&amp;gt;&lt;/code&gt; element used within the &lt;code&gt;p:http-request&lt;/code&gt; XProc step&lt;/li&gt;
&lt;li&gt;a &amp;#8216;serialize&amp;#8217; output that holds a &lt;code&gt;&amp;lt;c:parameters&amp;gt;&lt;/code&gt; element containing parameters for serialising the result body; possible serialisations would include serialising XSL-FO as PDF and SVG as JPEG, for example.&lt;/li&gt;
&lt;/ul&gt;
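
&lt;p&gt;The parameter input could be populated along these lines. This Python sketch supports only the simplest kind of template, where each &lt;code&gt;{name}&lt;/code&gt; matches a single path segment, and ignores the more elaborate expansions in the URI template draft; the template and path are hypothetical. Each resulting name/value pair would become one parameter passed to the pipeline.&lt;/p&gt;

```python
import re

def compile_template(template):
    """Compile a template such as '/{type}/{year}/{number}' into a regex
    plus the ordered list of its variable names."""
    # re.split with a capturing group alternates literal text and names:
    # '/{type}/{year}' -> ['/', 'type', '/', 'year', '']
    parts = re.split(r"\{(\w+)\}", template)
    literals, names = parts[0::2], parts[1::2]
    pattern = re.escape(literals[0])
    for literal in literals[1:]:
        pattern += "([^/]+)" + re.escape(literal)  # one segment per variable
    return re.compile(pattern), names

def template_params(template, path):
    """Return a dict of variable values, or None if the path doesn't fit."""
    regex, names = compile_template(template)
    match = regex.fullmatch(path)
    if match is None:
        return None
    return dict(zip(names, match.groups()))

print(template_params("/{type}/{year}/{number}", "/ukpga/2010/15"))
# {'type': 'ukpga', 'year': '2010', 'number': '15'}
```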

&lt;p&gt;The pipeline engine would of course include efficient implementations of all the required steps, most importantly XSLT 2.0.&lt;/p&gt;

&lt;p&gt;The platform would have an easy mechanism for invoking queries on its XML store through an implementation-defined step that was similar to the &lt;a href=&quot;http://www.w3.org/XML/XProc/docs/langspec.html#c.xquery&quot;&gt;&lt;code&gt;p:xquery&lt;/code&gt; XProc step&lt;/a&gt;. The step might have the signature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a primary &amp;#8216;query&amp;#8217; input for the query itself (like the &amp;#8216;query&amp;#8217; input for &lt;code&gt;p:xquery&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;a parameter input for specifying the values of external variables within the query&lt;/li&gt;
&lt;li&gt;a &amp;#8216;database&amp;#8217; option for specifying the database to query&lt;/li&gt;
&lt;li&gt;a primary &amp;#8216;result&amp;#8217; output carrying the sequence of documents returned by the query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The XML store itself would support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xquery/&quot;&gt;XQuery 1.0&lt;/a&gt;, with no extensions to the syntax except those permitted by that specification&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xpath-full-text-10/&quot;&gt;XQuery and XPath Full Text 1.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.w3.org/TR/xquery-update-10/&quot;&gt;XQuery Update Facility 1.0&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It would also support setting up indexes on any expression for a particular kind of context node (usually an element); these would work like keys in XSLT, except that the XQuery engine would automatically detect when the index could be applied. For example, it would be possible to set up a key on a document for the expression &lt;code&gt;substring(//dc:identifier, 7, 2)&lt;/code&gt; and if the query used exactly this expression, the index would be used.&lt;/p&gt;
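
&lt;p&gt;The matching rule is easy to state in code. In this Python sketch, documents are plain dictionaries rather than XML, a Python function stands in for evaluating the XPath expression, and the hypothetical store uses an index only when a query supplies exactly the expression text the index was declared with; otherwise it falls back to scanning every document.&lt;/p&gt;

```python
from collections import defaultdict

def xpath_substring(s, start, length):
    # XPath substring() counts from 1, so substring(s, 7, 2) is s[6:8].
    return s[start - 1:start - 1 + length]

class Store:
    """Hypothetical document store with expression indexes (like xsl:key)."""

    def __init__(self):
        self.docs = []
        self.indexes = {}  # expression text -> (evaluator, {value: [docs]})

    def add_index(self, expression, evaluate):
        buckets = defaultdict(list)
        for doc in self.docs:
            buckets[evaluate(doc)].append(doc)
        self.indexes[expression] = (evaluate, buckets)

    def add_doc(self, doc):
        self.docs.append(doc)
        for evaluate, buckets in self.indexes.values():
            buckets[evaluate(doc)].append(doc)  # keep every index up to date

    def select(self, expression, evaluate, value):
        """Return (matching docs, whether an index was used)."""
        if expression in self.indexes:
            # The query uses exactly the indexed expression: look it up.
            _, buckets = self.indexes[expression]
            return list(buckets.get(value, [])), True
        # Otherwise evaluate the expression against every document.
        return [doc for doc in self.docs if evaluate(doc) == value], False

EXPR = "substring(//dc:identifier, 7, 2)"

def identifier_key(doc):
    # Python stand-in for evaluating EXPR against one document.
    return xpath_substring(doc["identifier"], 7, 2)

store = Store()
store.add_index(EXPR, identifier_key)
for ident in ["abcdef09-1", "abcdef10-2", "ghijkl09-3"]:  # made-up identifiers
    store.add_doc({"identifier": ident})

docs, used = store.select(EXPR, identifier_key, "09")
print(len(docs), used)  # 2 True
docs, used = store.select("substring(//dc:identifier,7,2)", identifier_key, "09")
print(len(docs), used)  # 2 False
```

&lt;p&gt;Comparing raw expression text is the crudest possible test; a real engine would presumably compare parsed expressions, so that differences in whitespace would not defeat the index.&lt;/p&gt;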

&lt;p&gt;The platform would provide an extensible architecture such that it would be possible to set up replicated XML store(s) on separate servers from the main pipeline engine. It would cache the results of queries against the XML store. It would serve up static content such as images and scripts bypassing the pipeline. It would be configured using files, so that it was easy to transfer a configuration between development and production platforms and to version control configurations through normal means.&lt;/p&gt;

&lt;p&gt;Have you used (or developed!) anything that comes close? What&amp;#8217;s on your wish-list?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/105#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/6">pipelines</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/29">xquery</category>
 <pubDate>Fri, 29 May 2009 21:40:20 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">105 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Your Website is Your API: Quick Wins for Government Data</title>
 <link>http://www.jenitennison.com/blog/node/100</link>
 <description>&lt;p&gt;&lt;em&gt;This is the talk I prepared for the UKGovWeb Barcamp, in blog form. It&amp;#8217;s probably better this way. Most of what&amp;#8217;s written here seems blindingly obvious to me, and probably to most readers of this blog, but maybe Google will direct someone here who finds it useful.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Working with public-sector information on the web, one of the things that I take an interest in is making government data freely available for anyone to re-present, mash-up, analyse and generally do whatever they want to do. This post is born out of a feeling that the people who control data don&amp;#8217;t realise that the smallest changes can be beneficial: they don&amp;#8217;t need to do &lt;em&gt;everything&lt;/em&gt; right now, just &lt;em&gt;something&lt;/em&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;There are three fundamental things that you need to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;identify&lt;/strong&gt; the data that you control&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;represent&lt;/strong&gt; that data in a way that people can use&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;expose&lt;/strong&gt; the data to the wider world&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;but you can choose the degree to which you do each of these things.&lt;/p&gt;

&lt;h2&gt;Identify&lt;/h2&gt;

&lt;p&gt;Take a look at what data you have some kind of responsibility for or control over. You might have a PDF containing a table of schools in the local area and their intakes over the last couple of years. You might have a spreadsheet of the amount of money assigned to maintaining the playgrounds within the borough. You might have a database of company information. You might have a set of HTML agendas for court cases.&lt;/p&gt;

&lt;p&gt;The first step is simply to identify what the information is &lt;em&gt;about&lt;/em&gt;. Schools, playgrounds, companies, court cases &amp;#8212; each row in your table or spreadsheet or database, or each section in your document will be about something. We call this a &lt;strong&gt;resource&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To play nicely with the web, every resource should have an &lt;strong&gt;identifier&lt;/strong&gt;. A Uniform Resource Identifier. A URI. That URI tells us where we can find information about the resource (we&amp;#8217;ll get to what those look like later). So your second step is to work out URIs for each of your resources.&lt;/p&gt;

&lt;p&gt;Now, there are actually three levels of URI that you might care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identifier URIs&lt;/li&gt;
&lt;li&gt;document URIs&lt;/li&gt;
&lt;li&gt;representation URIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably already have document and representation URIs on your web server. Representation URIs are URIs for particular formats and languages and views of the information that you make available. Document URIs are typically the same URI without an extension; web servers use &lt;strong&gt;content negotiation&lt;/strong&gt; to work out which representation to serve up when a web browser asks for the page at a particular document URI.&lt;/p&gt;

&lt;p&gt;So you already have a URI for the PDF that contains the table of schools, for the Excel spreadsheet about the playgrounds. You already have URIs for the results of a particular query on your database, and of course the HTML pages that you deliver have URIs already. That&amp;#8217;s all in place. You don&amp;#8217;t want to change it.&lt;/p&gt;

&lt;p&gt;But identifier URIs are what are really important when it comes to opening up your data. They shift the focus from the documents that you serve to the resources that they are about. &lt;strong&gt;By assigning URIs to resources, you enable other people to talk about them. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if &lt;a href=&quot;http://www.companieshouse.co.uk/&quot; title=&quot;Companies House&quot;&gt;Companies House&lt;/a&gt; stated that companies could be referred to using URIs of the form &lt;code&gt;http://www.companieshouse.co.uk/id/company/{registeredNumber}&lt;/code&gt; then other people who needed to talk about companies (websites containing customer feedback, monitoring companies going into receivership, displaying stock price information, whatever) could use these URIs whenever they referred to a company. If all websites that make data available about companies point to the same identifier for a company, then it&amp;#8217;s possible to pull that data together very easily.&lt;/p&gt;

&lt;p&gt;Now the URIs that you use should be short, clean, readable, hackable, hierarchical and so on. If you can, &lt;strong&gt;you should use a natural identifier for the resource within the URI for that resource&lt;/strong&gt;. So URIs for registered companies should use their registered number. URIs for schools should use the school&amp;#8217;s unique reference number (URN). URIs for playgrounds could use the name of the playground (scoped within the council responsible for the playground). URIs for court cases should include the court, the year, and the case number. And so on.&lt;/p&gt;
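&lt;p&gt;As a minimal sketch of that advice (in Python, with hypothetical function names and, apart from the Companies House pattern above, made-up base URIs), each URI below is built from nothing more than the natural identifiers someone would already know:&lt;/p&gt;

```python
from urllib.parse import quote

# Hypothetical base URIs; only the company pattern comes from the text.
COMPANY_BASE = "http://www.companieshouse.co.uk/id/company/"
SCHOOL_BASE = "http://example.gov.uk/id/school/"
COURT_CASE_BASE = "http://example.gov.uk/id/case/"


def company_uri(registered_number: str) -> str:
    """Identifier URI for a company, built from its registered number."""
    return COMPANY_BASE + quote(registered_number.strip(), safe="")


def school_uri(urn: int) -> str:
    """Identifier URI for a school, built from its unique reference number.

    The school's type (primary, secondary, ...) is left out on purpose:
    it can be derived from the URN, so it does not belong in the URI.
    """
    return SCHOOL_BASE + str(urn)


def court_case_uri(court: str, year: int, case_number: int) -> str:
    """Identifier URI for a court case: court, then year, then case number."""
    court_part = quote(court.strip().lower(), safe="")
    return f"{COURT_CASE_BASE}{court_part}/{year}/{case_number}"
```

&lt;p&gt;For example, &lt;code&gt;company_uri(&quot;00445790&quot;)&lt;/code&gt; gives &lt;code&gt;http://www.companieshouse.co.uk/id/company/00445790&lt;/code&gt;. Percent-encoding each part keeps a stray space or slash in an identifier from changing the shape of the URI.&lt;/p&gt;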

&lt;p&gt;Remember as you&amp;#8217;re creating these identifier URIs that they have nothing to do with the structure of your website or the user&amp;#8217;s experience of navigating through your website. For navigation, you might want to group schools into primary, secondary and sixth-form, but you shouldn&amp;#8217;t do that in the identifier URIs. To help you decide, imagine someone who wants to construct a URI, and think about the information that they need to do so. If any of the information they need can be derived from other information (as a school&amp;#8217;s type can be derived from its URN), leave it out.&lt;/p&gt;

&lt;p&gt;When you&amp;#8217;re doing this, you might realise that actually you shouldn&amp;#8217;t be the one in control of these URIs. If you&amp;#8217;re not the one assigning the registered number, URN or case number then there&amp;#8217;s probably a higher authority that does assign those (real-world) identifiers. Don&amp;#8217;t let that stop you creating URIs &amp;#8212; you&amp;#8217;ll still find them useful for identifying &lt;em&gt;your&lt;/em&gt; information about that particular resource &amp;#8212; but do look to see whether there are existing URIs that you could point to and, if there are, reuse whatever scheme they use.&lt;/p&gt;

&lt;h2&gt;Represent&lt;/h2&gt;

&lt;p&gt;So I said in the last section that assigning URIs to resources was useful. And it is. But it&amp;#8217;s even more useful if you provide some kind of response when someone &lt;strong&gt;requests&lt;/strong&gt; those URIs. A request for a URI can be made by a web browser or by one of those search-engine-spider-things that crawls the web looking for data. Requests are made on the web using HTTP (Hypertext Transfer Protocol), specifically using a &lt;strong&gt;GET&lt;/strong&gt; request, which means &amp;#8220;get this resource&amp;#8221;.&lt;/p&gt;

&lt;p&gt;When a web server receives a request, it sends back a &lt;strong&gt;response&lt;/strong&gt;. The first part of the response is a &lt;strong&gt;status code&lt;/strong&gt; that tells the browser, spider, or whatever issued the request, generally what kind of response it is. Now when a browser says &amp;#8220;get this company&amp;#8221; or &amp;#8220;get this school&amp;#8221; a web server should either respond with a &lt;code&gt;404 Not Found&lt;/code&gt; response or a &lt;code&gt;303 See Other&lt;/code&gt; response.&lt;/p&gt;

&lt;p&gt;If the company or school doesn&amp;#8217;t exist, a web server should respond with a &lt;code&gt;404 Not Found&lt;/code&gt; response. It&amp;#8217;s actually really useful to give appropriate &lt;code&gt;404 Not Found&lt;/code&gt; responses, because it tells whoever made the request that the resource (company/school/playground/court case) doesn&amp;#8217;t exist. This can act as simple validation: if I&amp;#8217;m building a site that parents can use to rate schools, and a parent enters a URN into a form, I can construct a URI based on that URN, try to GET the information about that school, and if I get a &lt;code&gt;404 Not Found&lt;/code&gt; response then I know that the parent has entered an invalid URN.&lt;/p&gt;

&lt;p&gt;If the company or school exists, a web server should respond with a &lt;code&gt;303 See Other&lt;/code&gt; response that points the browser to a &lt;em&gt;document URI&lt;/em&gt; that contains information about the company or school. After all, the web server can&amp;#8217;t very well deliver the company or school itself into your lap; all it can do is give you &lt;em&gt;information&lt;/em&gt; about it. &lt;code&gt;303 See Other&lt;/code&gt; means &amp;#8220;if you want information about that, see that other thing over there instead&amp;#8221;. The &amp;#8220;other thing over there&amp;#8221; will be a document of some kind. It might be the PDF that contains information about the school, or the spreadsheet that contains information about the playground.&lt;/p&gt;
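&lt;p&gt;The choice between the two responses can be sketched as a small Python function. This is only an illustration: the register of schools, the paths and the function name are hypothetical, and hooking the function up to a real web server is left out.&lt;/p&gt;

```python
from http import HTTPStatus

# Hypothetical register: school URN mapped to the document URI that
# holds information about that school.
SCHOOLS = {
    "100000": "http://example.gov.uk/doc/school/100000",
}
ID_PREFIX = "/id/school/"


def respond_to_identifier(path: str):
    """Return (status, headers) for a GET on an identifier URI path.

    303 See Other points to a document about a school that exists;
    404 Not Found says that no such school exists.
    """
    if not path.startswith(ID_PREFIX):
        return HTTPStatus.NOT_FOUND, {}
    document_uri = SCHOOLS.get(path[len(ID_PREFIX):])
    if document_uri is None:
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.SEE_OTHER, {"Location": document_uri}
```

&lt;p&gt;A school-rating site could use the same distinction as validation: if a request for &lt;code&gt;/id/school/999999&lt;/code&gt; comes back with &lt;code&gt;404&lt;/code&gt;, the URN that the parent typed in does not identify a school.&lt;/p&gt;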

&lt;p&gt;&lt;strong&gt;Simply giving a yes-this-exists or no-this-doesn&amp;#8217;t-exist response is useful. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s even more useful, though, if you can make the information that you have about the school, playground, company, court case or whatever, available in a format that can be processed by a computer reasonably easily. PDFs are really really hard to extract information from, so do everything you can not to use PDFs. Word documents and Excel spreadsheets are the next worst; if you have to use them, keep them really really simple and definitely don&amp;#8217;t use WordArt or embed images to display your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You should always make your data available in HTML.&lt;/strong&gt; Try to make it as clean and regular as you can; use &lt;a href=&quot;http://www.microformats.org/&quot; title=&quot;microformats&quot;&gt;microformats&lt;/a&gt; to indicate information about people, places and events. If you want to push the boat out, use &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; to mark up the data in your page even more explicitly.&lt;/p&gt;

&lt;p&gt;The great thing about HTML is that it&amp;#8217;s human readable as well as (if you do it well) machine readable. You can also make your data available in explicitly machine-readable forms if you want: XML, JSON, RDF/XML, whatever floats your boat. If there are already standard formats or ontologies for the kind of data that you&amp;#8217;re making available, then use them, certainly, but it&amp;#8217;s very likely that there aren&amp;#8217;t. And in comparison to the nightmare of extracting anything useful from a PDF, it&amp;#8217;s easy to transform between different formats, so you only have to concern yourself with different formats if you want to.&lt;/p&gt;

&lt;p&gt;If you do provide multiple formats for your data, you should use server-driven content negotiation to deliver the data in an appropriate format to whatever&amp;#8217;s requesting it. So a web browser will request HTML; a semantic web crawler will request RDF/XML; a JavaScript program will request JSON and so on. The &lt;code&gt;200 OK&lt;/code&gt; response that the web server sends with your data should include a &lt;code&gt;Content-Location&lt;/code&gt; header that gives the representation URI of whichever format is being returned, and a &lt;code&gt;Vary&lt;/code&gt; header that tells caches which request headers (usually &lt;code&gt;Accept&lt;/code&gt;) were used to decide which representation to serve up.&lt;/p&gt;
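&lt;p&gt;Here is a rough Python sketch of server-driven negotiation that sets both headers. The media-type table and function names are assumptions for illustration, and the &lt;code&gt;Accept&lt;/code&gt; parsing is deliberately simplified: wildcards just fall back to HTML, where a stricter server might answer &lt;code&gt;406 Not Acceptable&lt;/code&gt; instead.&lt;/p&gt;

```python
# Hypothetical table: media type mapped to the extension used in the
# representation URI for that format.
REPRESENTATIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}
DEFAULT_TYPE = "text/html"


def parse_accept(header: str):
    """Media types from an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Negative q sorts the best first; position breaks ties.
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(document_uri: str, accept: str):
    """Choose a representation; return (status, headers) for 200 OK."""
    chosen = DEFAULT_TYPE
    for media_type in parse_accept(accept or "*/*"):
        if media_type in REPRESENTATIONS:
            chosen = media_type
            break
    return 200, {
        "Content-Type": chosen,
        "Content-Location": f"{document_uri}.{REPRESENTATIONS[chosen]}",
        "Vary": "Accept",
    }
```

&lt;p&gt;With this sketch, a request for &lt;code&gt;http://example.gov.uk/doc/school/100000&lt;/code&gt; sent with &lt;code&gt;Accept: application/json&lt;/code&gt; gets a &lt;code&gt;Content-Location&lt;/code&gt; ending in &lt;code&gt;.json&lt;/code&gt;, while a browser that prefers HTML gets the &lt;code&gt;.html&lt;/code&gt; representation.&lt;/p&gt;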

&lt;h2&gt;Expose&lt;/h2&gt;

&lt;p&gt;All the good work identifying resources and representing them comes to naught if you don&amp;#8217;t expose it. You can (and should!) tell other people about the URIs that you&amp;#8217;ve developed, but the best way to give them exposure is to use them yourself, within your website. &lt;strong&gt;Simply using the URIs within your website gives them exposure. Even if that&amp;#8217;s all you do, you have done good.&lt;/strong&gt; People who are interested in linking to you will look at your site and they will learn about your URI scheme from your use of it.&lt;/p&gt;

&lt;p&gt;The identifier URIs that you&amp;#8217;ve created might not be particularly easy to generate. For example, with the URI scheme that I suggested above for Companies House, unless you happen to know that Tesco Plc&amp;#8217;s registered company number is &lt;code&gt;00445790&lt;/code&gt;, you&amp;#8217;re not going to be able to get to information about them. So &lt;strong&gt;you should have a way of searching&lt;/strong&gt; based on something that people &lt;em&gt;will&lt;/em&gt; know, such as the name of the company. Use an HTML search form that makes GET requests like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://www.companieshouse.gov.uk/company?name=Tesco+Plc
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The response should be a &lt;code&gt;302 Found&lt;/code&gt; that redirects (using the &lt;code&gt;Location&lt;/code&gt; header) to the true identifier URI for the company (&lt;code&gt;http://www.companieshouse.gov.uk/id/company/00445790&lt;/code&gt;). If it&amp;#8217;s not possible to identify a single resource from the search string (for example, there are lots of companies with &amp;#8216;Tesco&amp;#8217; in their name), then the correct response is a &lt;code&gt;300 Multiple Choices&lt;/code&gt; that provides a list of links to the possible URIs (in HTML).&lt;/p&gt;
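&lt;p&gt;A search along these lines could be sketched in Python as follows. The register is hypothetical (apart from the Tesco Plc number given above, its entry is made up), the function name is an assumption, and answering a search that matches nothing with &lt;code&gt;404 Not Found&lt;/code&gt; is a choice of this sketch.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Hypothetical register: company name mapped to registered number.
COMPANIES = {
    "Tesco Plc": "00445790",                 # number from the text above
    "Tesco Example Trading Ltd": "99999999",  # made-up entry
}
ID_BASE = "http://www.companieshouse.gov.uk/id/company/"


def search(request_uri: str):
    """Answer a name search with (status, headers, body)."""
    query = parse_qs(urlsplit(request_uri).query)
    name = query.get("name", [""])[0].strip().lower()
    exact = [n for n in COMPANIES if n.lower() == name]
    partial = [n for n in COMPANIES if name and name in n.lower()]
    matches = exact or partial
    if len(matches) == 1:
        # 302 Found: redirect to the one identifier URI that matches.
        return 302, {"Location": ID_BASE + COMPANIES[matches[0]]}, ""
    if not matches:
        return 404, {}, ""
    # 300 Multiple Choices: an HTML list of links to the candidates.
    ul = ET.Element("ul")
    for n in sorted(matches):
        link = ET.SubElement(ET.SubElement(ul, "li"), "a",
                             href=ID_BASE + COMPANIES[n])
        link.text = n
    return 300, {"Content-Type": "text/html"}, ET.tostring(ul, encoding="unicode")
```

&lt;p&gt;So &lt;code&gt;?name=Tesco+Plc&lt;/code&gt; redirects straight to &lt;code&gt;http://www.companieshouse.gov.uk/id/company/00445790&lt;/code&gt;, while &lt;code&gt;?name=Tesco&lt;/code&gt; matches both entries and gets a &lt;code&gt;300&lt;/code&gt; response listing them.&lt;/p&gt;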

&lt;p&gt;There are other ways to help people find your data. If there aren&amp;#8217;t gazillions of resources, you can list the URIs within your &lt;strong&gt;sitemap&lt;/strong&gt;, which will make them discoverable by search engines. You can also list them on web pages and, especially for data that&amp;#8217;s constantly updating, in (Atom) &lt;strong&gt;feeds&lt;/strong&gt; which you link to from your HTML pages. Use metadata within the pages and feeds to help the consumers of your data work out what&amp;#8217;s relevant to them.&lt;/p&gt;
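&lt;p&gt;A sitemap is a small XML file that follows the sitemaps.org protocol, which allows at most 50,000 URLs per file, so larger sets need splitting across several files. As a sketch (the function name and example URI are hypothetical), Python&amp;#8217;s standard &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; module can generate one from a list of URIs:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(uris):
    """Return a sitemap document (as a string) listing the given URIs."""
    uris = list(uris)
    if len(uris) > MAX_URLS:
        raise ValueError("too many URIs for one sitemap file; split them up")
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri in uris:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
    return ET.tostring(urlset, encoding="unicode")
```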

&lt;p&gt;To help even more, slice your Atom feeds into portions that different consumers of your data are going to be interested in. Slice by type, by area, by subject. That way people can stay up to date with just the resources that they&amp;#8217;re interested in, and not be bothered with information about those that are irrelevant to them.&lt;/p&gt;
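&lt;p&gt;Slicing can be as simple as grouping resources on one property and writing a feed per group. The Python sketch below is hypothetical throughout (the resource dictionaries, keys and feed URIs are made up); it builds minimal Atom feeds containing the elements Atom requires: &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;updated&lt;/code&gt; and a feed &lt;code&gt;author&lt;/code&gt;, plus an alternate link for each entry.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "http://www.w3.org/2005/Atom"


def atom(parent, name, text=None, **attrs):
    """Add an Atom element to parent and return it."""
    elem = ET.SubElement(parent, f"{{{ATOM}}}{name}", attrs)
    if text is not None:
        elem.text = text
    return elem


def slice_feeds(resources, key, feed_base, publisher):
    """Group resources by resource[key]; return one Atom feed per group."""
    groups = defaultdict(list)
    for resource in resources:
        groups[resource[key]].append(resource)
    feeds = {}
    for value, members in groups.items():
        feed = ET.Element(f"{{{ATOM}}}feed")
        atom(feed, "id", f"{feed_base}/{key}/{value}")
        atom(feed, "title", f"Changes where {key} is {value}")
        # Timestamps are assumed to share one ISO 8601 format, so the
        # largest string is also the latest time.
        atom(feed, "updated", max(m["updated"] for m in members))
        atom(atom(feed, "author"), "name", publisher)
        for m in members:
            entry = atom(feed, "entry")
            atom(entry, "id", m["uri"])
            atom(entry, "title", m["title"])
            atom(entry, "updated", m["updated"])
            atom(entry, "link", href=m["uri"])
        feeds[value] = feed
    return feeds
```

&lt;p&gt;Someone who only cares about one area of the borough can then subscribe to that area&amp;#8217;s feed alone.&lt;/p&gt;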

&lt;h2&gt;That&amp;#8217;s It&lt;/h2&gt;

&lt;p&gt;What I&amp;#8217;ve tried to describe here is the minimum that you need to do to help people use the information you have, and some of the other things that you can do to make it even more useful. Here are some things that you shouldn&amp;#8217;t do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define a URI scheme for the things that you want to talk about&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait for someone else to define an XML schema or RDF ontology for your data&lt;/li&gt;
&lt;li&gt;don&amp;#8217;t wait until you can find the time and money to do it all &amp;#8220;properly&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just do what you can, now.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/100#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/42">rdfa</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/22">rest</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/43">ukgc09</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Sun, 01 Feb 2009 09:28:57 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">100 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Working With Fragmented Overlapping Markup</title>
 <link>http://www.jenitennison.com/blog/node/98</link>
 <description>&lt;p&gt;In my &lt;a href=&quot;http://www.jenitennison.com/blog/node/97&quot; title=&quot;Jeni&#039;s Musings: Representing Overlap in XML&quot;&gt;last post&lt;/a&gt; I talked about different techniques for representing overlap within XML. One technique is fragmentation. In the work that I&amp;#8217;ve been doing, I&amp;#8217;ve been using milestone-based formats similar to &lt;a href=&quot;http://www.lmnl.org/wiki/index.php/ECLIX&quot; title=&quot;LMNL Wiki: Extended Canonical LMNL in XML&quot;&gt;ECLIX&lt;/a&gt;, but my eyes were opened at the &lt;a href=&quot;http://ilps.science.uva.nl/PoliticalMashup/2008/11/workshop-on-multi-dimensional-markup/&quot; title=&quot;Workshop on Multi-Dimensional Markup&quot;&gt;GODDAG workshop&lt;/a&gt;: fragmentation would make overlap so much easier to process in XSLT, especially when dealing with localised overlap such as revision or comment markup.&lt;/p&gt;

&lt;p&gt;But how could fragmentation be used with full-on overlap? I had a little play and came up with &lt;a href=&quot;http://www.jenitennison.com/blog/files/fragmentation-utils.xsl&quot; title=&quot;fragmentation-utils.xsl&quot;&gt;some XSLT to demonstrate&lt;/a&gt;.&lt;/p&gt;

&lt;!--break--&gt;

&lt;h2&gt;Fragmentation Example&lt;/h2&gt;

&lt;p&gt;First, an example of how to represent overlap using fragments. Using fragments for overlap comes straight out of the support for &lt;a href=&quot;http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SAAG&quot; title=&quot;TEI: Linking, Segmentation, and Alignment: Aggregation&quot;&gt;aggregation within TEI&lt;/a&gt;, where fragments are used not only to represent overlapping structures but also to construct completely new &amp;#8220;virtual&amp;#8221; elements that are not necessarily contiguous (and may even contain fragments in different orders from how they appear in the text). In TEI, they usually use the &lt;code&gt;next&lt;/code&gt; and &lt;code&gt;prev&lt;/code&gt; attributes to point from one fragment to another in order to reconstruct the element.&lt;/p&gt;

&lt;p&gt;In the example here, I&amp;#8217;ve done something slightly different, namely to use an ID in the &lt;code&gt;http://www.jenitennison.com/xslt/fragmentation&lt;/code&gt; namespace to link the elements: all elements with the same ID are actually the same element. Here&amp;#8217;s what it looks like.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;book xmlns:f=&quot;http://www.jenitennison.com/xslt/fragmentation&quot;&amp;gt;
    &amp;lt;page f:id=&quot;page199&quot; n=&quot;199&quot;&amp;gt;
        ...
    &amp;lt;/page&amp;gt;
    &amp;lt;poem&amp;gt;
        &amp;lt;page f:id=&quot;page199&quot; n=&quot;199&quot;&amp;gt;
            &amp;lt;title&amp;gt;
                &amp;lt;pl&amp;gt;Recueillement&amp;lt;/pl&amp;gt;
            &amp;lt;/title&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;s&amp;gt;&amp;lt;pl&amp;gt;Sois sage, ô ma douleur, et tiens-toi plus &amp;lt;/pl&amp;gt;
                                                          &amp;lt;pl&amp;gt;tranquille.&amp;lt;/pl&amp;gt;&amp;lt;/s&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;s&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Tu réclamais le Soir; il descend; le voici:&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Une atmosphère obscure enveloppe la ville,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                    &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Aux uns portant la paix, aux autres le souci.&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
            &amp;lt;stanza&amp;gt;
              &amp;lt;s f:id=&quot;s3&quot;&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Pendant que des mortels la multitude vile,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Sous le fouet du Plaisir, ce bourreau sans merci,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Va cueillir des remords dans la fête servile,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Ma douleur, donne moi la main; viens par ici,&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
        &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
            &amp;lt;stanza&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;&amp;lt;s f:id=&quot;s3&quot;&amp;gt;Loin d&#039;eux. &amp;lt;/s&amp;gt;&amp;lt;s f:id=&quot;s4&quot;&amp;gt;Vois se pencher les défuntes &amp;lt;/s&amp;gt;&amp;lt;/pl&amp;gt;
                                                                        &amp;lt;pl&amp;gt;&amp;lt;s f:id=&quot;s4&quot;&amp;gt;Années, &amp;lt;/s&amp;gt;&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;s f:id=&quot;s4&quot;&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Sur les balcons du ciel, en robes surannées; &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Surgir du fond des eaux le Regret souriant; &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
        &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
        &amp;lt;/page&amp;gt;
        &amp;lt;page f:id=&quot;page200&quot;&amp;gt;
            &amp;lt;stanza&amp;gt;
              &amp;lt;s f:id=&quot;s4&quot;&amp;gt;
                  &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Le Soleil moribund s&#039;endormir sous une arche, &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Et, comme un long linceul traînant à l&#039;Orient, &amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;sl&amp;gt;&amp;lt;pl&amp;gt;Entends, ma chère, entends la douce Nuit qui &amp;lt;/pl&amp;gt;
                                                               &amp;lt;pl&amp;gt;marche.&amp;lt;/pl&amp;gt;&amp;lt;/sl&amp;gt;
                &amp;lt;/s&amp;gt;
            &amp;lt;/stanza&amp;gt;
        &amp;lt;/page&amp;gt;
    &amp;lt;/poem&amp;gt;
&amp;lt;/book&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can see that I&amp;#8217;ve played pretty fast and loose here with the markup language. The &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements can be children of &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; or even &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt;, depending purely on whichever element happens to contain them most neatly. This makes the XML inconsistent, but less verbose than it would otherwise be. Elements that are actually fragments have &lt;code&gt;f:id&lt;/code&gt; attributes, and multiple elements may have the same &lt;code&gt;f:id&lt;/code&gt;; this is precisely what&amp;#8217;s used to work out that they&amp;#8217;re the same element.&lt;/p&gt;
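&lt;p&gt;Because shared &lt;code&gt;f:id&lt;/code&gt; values are what tie fragments together, the text of each logical element can be recovered by visiting its fragments in document order. Here is a hypothetical Python sketch using &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;; it rebuilds one stanza line of the third stanza by hand rather than parsing the whole example, and it assumes that fragments with the same &lt;code&gt;f:id&lt;/code&gt; are never nested inside each other.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

F_NS = "http://www.jenitennison.com/xslt/fragmentation"
F_ID = f"{{{F_NS}}}id"


def join_fragments(root):
    """Map each f:id to (tag, text of all its fragments in document order)."""
    joined = {}
    for elem in root.iter():  # iter() walks the tree in document order
        frag_id = elem.get(F_ID)
        if frag_id is None:
            continue
        tag, text_so_far = joined.get(frag_id, (elem.tag, ""))
        joined[frag_id] = (tag, text_so_far + "".join(elem.itertext()))
    return joined


def add_fragment(parent, tag, frag_id, text):
    """Append an element carrying an f:id attribute (hypothetical helper)."""
    elem = ET.SubElement(parent, tag, {F_ID: frag_id})
    elem.text = text
    return elem


# The first stanza line of the third stanza, built by hand.
stanza = ET.Element("stanza")
sl = ET.SubElement(stanza, "sl")
first_pl = ET.SubElement(sl, "pl")
add_fragment(first_pl, "s", "s3", "Loin d'eux. ")
add_fragment(first_pl, "s", "s4", "Vois se pencher les défuntes ")
second_pl = ET.SubElement(sl, "pl")
add_fragment(second_pl, "s", "s4", "Années, ")
```

&lt;p&gt;Here &lt;code&gt;join_fragments(stanza)[&quot;s4&quot;]&lt;/code&gt; is &lt;code&gt;(&quot;s&quot;, &quot;Vois se pencher les défuntes Années, &quot;)&lt;/code&gt;: two fragments in two page lines, read back as one piece of one sentence.&lt;/p&gt;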

&lt;h2&gt;Desired Rendering&lt;/h2&gt;

&lt;p&gt;So what would we like to do when processing this? Say we wanted to create an HTML rendition of the poem, looking something like:&lt;/p&gt;

&lt;blockquote style=&quot;width: 30em; &quot;&gt;
  &lt;hr /&gt;
  &lt;p style=&quot;text-align: right; &quot;&gt;page 199&lt;/p&gt;
  &lt;h3&gt;Recueillement&lt;/h3&gt;
  &lt;ol start=&quot;1&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Sois sage, ô ma douleur, et tiens-toi plus &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;tranquille.&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Tu réclamais le Soir; il descend; le voici:&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Une atmosphère obscure enveloppe la ville,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Aux uns portant la paix, aux autres le souci.&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;ol start=&quot;5&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Pendant que des mortels la multitude vile,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Sous le fouet du Plaisir, ce bourreau sans merci,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Va cueillir des remords dans la fête servile,&lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Ma douleur, donne-moi la main; viens par ici,&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;ol start=&quot;9&quot;&gt;
     &lt;li style=&quot;background-color: yellow; &quot;&gt;
        &lt;p&gt;Loin d&amp;#8217;eux. Vois se pencher les défuntes &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;Années, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Sur les balcons du ciel, en robes surannées; &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Surgir du fond des eaux le Regret souriant; &lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
  &lt;hr /&gt;
  &lt;p style=&quot;text-align: right; &quot;&gt;page 200&lt;/p&gt;
  &lt;ol start=&quot;12&quot;&gt;
     &lt;li&gt;
        &lt;p&gt;Le Soleil moribond s&amp;#8217;endormir sous une arche, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Et, comme un long linceul traînant à l&amp;#8217;Orient, &lt;/p&gt;
     &lt;/li&gt;
     &lt;li&gt;
        &lt;p&gt;Entends, ma chère, entends la douce Nuit qui &lt;/p&gt;
        &lt;p style=&quot;text-align: right; &quot;&gt;marche.&lt;/p&gt;
     &lt;/li&gt;
  &lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The logic behind this rendition is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;process the pages; for each one, create a horizontal rule followed by a paragraph giving the page number&lt;/li&gt;
&lt;li&gt;process the parts of the poem within each page; give the title if it has one in this fragment, followed by the stanzas&lt;/li&gt;
&lt;li&gt;create an ordered list for each stanza, starting at the number for the stanza line within the (whole) poem, and process the stanza lines&lt;/li&gt;
&lt;li&gt;create a list item for each stanza line; if the line contains parts of two sentences and the first of these sentences doesn&amp;#8217;t begin in this line, highlight it, as this indicates an interesting overlap between prosodic and syntactic structures&lt;/li&gt;
&lt;li&gt;process the page lines within each stanza line; if there&amp;#8217;s more than one, align the second to the right&lt;/li&gt;
&lt;/ol&gt;
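&lt;p&gt;Steps 3 and 4 come down to simple calculations once the right hierarchy is in view. As a hypothetical Python sketch over plain lists (not over the XML, and not part of the stylesheet described here): the list start numbers are running totals of stanza line counts, and the highlight test looks at which sentences a stanza line touches.&lt;/p&gt;

```python
def list_starts(lines_per_stanza):
    """Step 3: the start number for each stanza's ordered list."""
    starts, lines_so_far = [], 0
    for count in lines_per_stanza:
        starts.append(lines_so_far + 1)  # first line number in this stanza
        lines_so_far += count
    return starts


def should_highlight(sentence_pieces):
    """Step 4: sentence_pieces holds (sentence_id, begins_in_this_line) pairs.

    Highlight when the line touches two or more sentences and the first
    of them began in an earlier line.
    """
    if not sentence_pieces:
        return False
    sentence_ids = {sentence_id for sentence_id, _ in sentence_pieces}
    first_begins_here = sentence_pieces[0][1]
    return len(sentence_ids) >= 2 and not first_begins_here
```

&lt;p&gt;For the poem above, &lt;code&gt;list_starts([4, 4, 3, 3])&lt;/code&gt; gives &lt;code&gt;[1, 5, 9, 12]&lt;/code&gt;, the &lt;code&gt;start&lt;/code&gt; values in the rendering, and line 9, described as &lt;code&gt;[(&quot;s3&quot;, False), (&quot;s4&quot;, True)]&lt;/code&gt;, is the only line that gets highlighted.&lt;/p&gt;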

&lt;p&gt;It wouldn&amp;#8217;t be easy to express that logic against the fragmented XML above, for two reasons.&lt;/p&gt;

&lt;p&gt;First, the fragmented markup above is inconsistent: you can&amp;#8217;t tell what kinds of children a particular element will have and which elements will be fragmented. You could fix this in the markup by deciding, for example, that the prosodic hierarchy of book/poem/stanza/sl would be primary and all other elements fragmented as necessary; you could further decide which of the hierarchies would be secondary within this: whether an &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; element would hold &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements as children.&lt;/p&gt;

&lt;p&gt;Second, though, the different logical steps require the markup to be structured in different ways: step 1 requires a physical hierarchy where the markup is primarily divided into pages; steps 2 and 3 require a prosodic hierarchy within the page, dividing the poem into stanzas and stanza lines; step 4 requires a syntactic hierarchy, where the stanza lines are split into sentences; step 5 requires switching back to the physical hierarchy to see the page lines within the stanza line.&lt;/p&gt;

&lt;p&gt;What you can do (and what I&amp;#8217;ve done) is to write a function to help with this kind of processing by switching between the different hierarchies as and when necessary.&lt;/p&gt;

&lt;h2&gt;Labelling Elements&lt;/h2&gt;

&lt;p&gt;To prepare for switching, you must annotate the elements in the document with an indication of the trees that they belong to. The trees can be called anything you like; for the example above, I could use the labels &amp;#8220;physical&amp;#8221; (book, page, page line), &amp;#8220;syntactic&amp;#8221; (book, poem, sentence) and &amp;#8220;prosodic&amp;#8221; (book, poem, stanza, stanza line). The idea of labelling elements based on a tree that they belong to comes from the &lt;a href=&quot;http://www.research.att.com/~divesh/papers/jlssw2004-mct.pdf&quot; title=&quot;Colorful XML: One Hierarchy Isn&#039;t Enough&quot;&gt;multi-coloured trees&lt;/a&gt; technique, but I think it&amp;#8217;s more useful to use meaningful labels if you can.&lt;/p&gt;

&lt;p&gt;You could imagine a built-in extension element that allowed you to describe the trees that different elements belonged to, with the annotation happening at the level of the XPath Data Model as it is created. But to make things easier I&amp;#8217;m using an &lt;code&gt;f:trees&lt;/code&gt; attribute on each element. Adding the attribute can be done in XSLT with code like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;*&quot; mode=&quot;annotate&quot;&amp;gt;
  &amp;lt;xsl:copy&amp;gt;
    &amp;lt;xsl:attribute name=&quot;f:trees&quot;&amp;gt;
      &amp;lt;xsl:apply-templates select=&quot;.&quot; mode=&quot;trees&quot; /&amp;gt;
    &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;xsl:copy-of select=&quot;@*&quot; /&amp;gt;
    &amp;lt;xsl:apply-templates mode=&quot;annotate&quot; /&amp;gt;
  &amp;lt;/xsl:copy&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template match=&quot;book&quot; mode=&quot;trees&quot;&amp;gt;prosodic syntactic physical&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;poem&quot; mode=&quot;trees&quot;&amp;gt;prosodic syntactic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;title | stanza | sl&quot; mode=&quot;trees&quot;&amp;gt;prosodic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;s&quot; mode=&quot;trees&quot;&amp;gt;syntactic&amp;lt;/xsl:template&amp;gt;
&amp;lt;xsl:template match=&quot;page | pl&quot; mode=&quot;trees&quot;&amp;gt;physical&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
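&lt;p&gt;The same labelling can be sketched outside XSLT. The Python below is a hypothetical equivalent using &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;: a lookup table plays the role of the &lt;code&gt;trees&lt;/code&gt; templates, and elements whose names are not in the table are left unlabelled.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

F_NS = "http://www.jenitennison.com/xslt/fragmentation"

# Element name mapped to the trees it belongs to, mirroring the
# mode="trees" templates (hypothetical Python equivalent).
TREES = {
    "book": "prosodic syntactic physical",
    "poem": "prosodic syntactic",
    "title": "prosodic",
    "stanza": "prosodic",
    "sl": "prosodic",
    "s": "syntactic",
    "page": "physical",
    "pl": "physical",
}


def annotate(root):
    """Add an f:trees attribute to every known element, in place."""
    for elem in root.iter():
        trees = TREES.get(elem.tag)
        if trees is not None:
            elem.set(f"{{{F_NS}}}trees", trees)
    return root
```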

&lt;p&gt;Once elements are labelled with the trees they belong to, it&amp;#8217;s possible to work out dominance hierarchies. An element A is a descendant of an element B if the elements share a tree and A starts and ends within B. If A is within B but they don&amp;#8217;t appear in the same tree, then the containment is happenstance and does not imply dominance.&lt;/p&gt;
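&lt;p&gt;That rule can be written as a predicate. In this hypothetical Python sketch, each element (with its fragments joined back together) is reduced to a name, the set of trees it belongs to, and start and end offsets into the text of the document; the names and offsets in the usage note are made up.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A whole (reassembled) element as character offsets into the text."""
    name: str
    trees: frozenset
    start: int
    end: int


def dominates(b: Span, a: Span) -> bool:
    """True if B dominates A: they share a tree and A lies within B."""
    if a == b:
        return False
    shares_tree = bool(a.trees.intersection(b.trees))
    within = a.start >= b.start and b.end >= a.end
    return shares_tree and within
```

&lt;p&gt;With &lt;code&gt;poem = Span(&quot;poem&quot;, frozenset({&quot;prosodic&quot;, &quot;syntactic&quot;}), 0, 100)&lt;/code&gt;, a stanza line lying inside the poem is dominated by it, but a page line in the same place is not, because a page line belongs only to the physical tree.&lt;/p&gt;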

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Note: Trees should be defined so that all the elements within a given tree fit within each other without fragmenting. I haven&amp;#8217;t considered how self-overlap should be handled here; the elements need to be part of the same tree, but they can still overlap and therefore be fragmented even when that particular tree is primary. In my experience, self-overlap usually occurs in situations like comments or revision markup, in which the self-overlapping markup is never primary anyway, so I&amp;#8217;m not sure how serious this issue is.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Swapping Hierarchies&lt;/h2&gt;

&lt;p&gt;Once the elements are annotated, it&amp;#8217;s possible to swap between hierarchies. The function I&amp;#8217;ve written &amp;#8212; &lt;code&gt;f:swap()&lt;/code&gt; &amp;#8212; takes two or three arguments. The first is an element, and the &lt;code&gt;f:swap()&lt;/code&gt; function returns this same element (actually a copy) but with its children, and possibly its parents, rearranged based on the trees listed in the second argument. The third argument defaults to the element specified as the first argument and provides a starting point from which the rearrangement takes place; the two most useful values for this argument are the element itself (which means that its children are restructured) and the root of the tree (which means that the entire document is rearranged).&lt;/p&gt;

&lt;p&gt;Some examples will help make this clearer. Starting with the poem above, to get the rendering I want, I need to swap to a &amp;#8220;physical&amp;#8221; view and process the pages:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:apply-templates select=&quot;$annotated/book/f:swap(., &#039;physical&#039;)/page&quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;f:swap()&lt;/code&gt; call here returns the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element but with its descendants rearranged so that the physical hierarchy is primary. The new version of the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element will have &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; children, which will themselves have &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; children. The &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; elements will contain fragments of &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements, nested purely based on their happenstance containment within a particular &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt;.&lt;/p&gt;
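&lt;p&gt;Part of what such a rearrangement involves is cutting every element outside the primary hierarchy wherever a primary element begins or ends, with each piece keeping the identity of the element it came from (as &lt;code&gt;f:id&lt;/code&gt; does). The Python below sketches only that splitting step over character offsets; it is not the implementation of &lt;code&gt;f:swap()&lt;/code&gt;, which also has to rebuild the nesting, and the function name and offsets are hypothetical.&lt;/p&gt;

```python
def fragment_at(boundaries, spans):
    """Split (frag_id, start, end) spans at the given boundary offsets.

    boundaries are the offsets where primary-hierarchy elements (such as
    pages) start or end; every piece keeps the id of its original span.
    """
    cuts = sorted(set(boundaries))
    pieces = []
    for frag_id, start, end in spans:
        inner = [cut for cut in cuts if cut > start and end > cut]
        points = [start] + inner + [end]
        for piece_start, piece_end in zip(points, points[1:]):
            pieces.append((frag_id, piece_start, piece_end))
    return pieces
```

&lt;p&gt;If page 199 covered offsets 0 to 60 and page 200 covered 60 to 100, &lt;code&gt;fragment_at([0, 60, 100], [(&quot;s4&quot;, 40, 90)])&lt;/code&gt; would give two &lt;code&gt;s4&lt;/code&gt; pieces, 40 to 60 and 60 to 90, just as sentence s4 is split across the two pages in the example.&lt;/p&gt;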

&lt;p&gt;Here&amp;#8217;s the code for processing the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;page&quot;&amp;gt;
  &amp;lt;hr /&amp;gt;
  &amp;lt;p style=&quot;text-align: right; &quot;&amp;gt;page &amp;lt;xsl:value-of select=&quot;@n&quot; /&amp;gt;&amp;lt;/p&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;f:swap(., (&#039;prosodic&#039;, &#039;syntactic&#039;))/poem&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So each &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; generates a horizontal rule, a paragraph containing the page number, and then processes the poems on that page. Here the switch is from the physical hierarchy to a prosodic/syntactic hierarchy. Passing a list of two items as the second argument of &lt;code&gt;f:swap()&lt;/code&gt; means that the primary hierarchy is prosodic (poems, containing stanzas, containing stanza lines), but once you reach the bottom of the prosodic hierarchy (the stanza lines) you switch to a syntactic hierarchy (sentences) rather than a physical hierarchy (page lines).&lt;/p&gt;

&lt;p&gt;The fact that the &lt;code&gt;f:swap()&lt;/code&gt; call above only has two arguments means that the rearrangement starts from the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element that&amp;#8217;s being processed. The ancestry of the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element itself stays the same, and only its content is rearranged according to the views specified in the second argument. So in this case the &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; elements that a given &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; contains will be fragments.&lt;/p&gt;

&lt;p&gt;Processing the poems can continue in the normal way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;poem&quot;&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;title&quot; /&amp;gt;
  &amp;lt;xsl:apply-templates select=&quot;stanza&quot; /&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;xsl:template match=&quot;title&quot;&amp;gt;
  &amp;lt;h3&amp;gt;&amp;lt;xsl:value-of select=&quot;.&quot; /&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The next difficulty appears when I want to start the numbering for a particular stanza based on the number of the first line within the stanza. I&amp;#8217;m doing this by setting the &lt;code&gt;start&lt;/code&gt; attribute like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;stanza&quot;&amp;gt;
  &amp;lt;ol&amp;gt;
    &amp;lt;xsl:attribute name=&quot;start&quot;&amp;gt;
      &amp;lt;xsl:number select=&quot;f:swap(., &#039;prosodic&#039;, /)/sl[1]&quot; 
        count=&quot;sl&quot; from=&quot;poem&quot; level=&quot;any&quot; /&amp;gt;
    &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;sl&quot; /&amp;gt;
  &amp;lt;/ol&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This illustrates the three-argument version of the &lt;code&gt;f:swap()&lt;/code&gt; function. To set the start number, I need to know the number of the stanza&amp;#8217;s first line within the poem that contains it. That would be easy to do with &lt;code&gt;&amp;lt;xsl:number&amp;gt;&lt;/code&gt; (or in other ways), but for the fact that the &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; element the &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; element appears in is currently fragmented between two &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; elements. To work out the number of the line, I really want an XML document in which the physical hierarchy is completely ignored, and the elements are arranged &lt;code&gt;book/poem/stanza/sl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The three-argument version of &lt;code&gt;f:swap()&lt;/code&gt; allows me to swap to a prosodic hierarchy, starting from the very root of the document. It returns the element given as the first argument as it appears within the new hierarchy. Unlike the two-argument version, which only affects the descendants of the first argument, the three-argument version may also affect its ancestors, and even merge the element if it&amp;#8217;s originally fragmented or split it if it doesn&amp;#8217;t appear in the primary hierarchy. In this example, the returned &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; element&amp;#8217;s parent &lt;code&gt;&amp;lt;poem&amp;gt;&lt;/code&gt; is a child of the &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; element rather than being a fragmented child of the &lt;code&gt;&amp;lt;page&amp;gt;&lt;/code&gt; element.&lt;/p&gt;

&lt;p&gt;The rearrangement for the purposes of computing the start number for the list doesn&amp;#8217;t affect the tree that&amp;#8217;s being processed; the template for the &lt;code&gt;&amp;lt;stanza&amp;gt;&lt;/code&gt; elements goes on to process the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements it contains, which use this template:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;sl&quot;&amp;gt;
  &amp;lt;li&amp;gt;
    &amp;lt;xsl:if test=&quot;count(s) &amp;gt; 1 and not(f:first(s[1]))&quot;&amp;gt;
      &amp;lt;xsl:attribute name=&quot;style&quot;&amp;gt;background-color: yellow; &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;/xsl:if&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;f:swap(., &#039;physical&#039;)/pl&quot; /&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Recall that the hierarchy currently being processed is a prosodic/syntactic hierarchy. The &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements contain &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements, and it&amp;#8217;s therefore possible to check whether the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; element being processed contains more than one &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt;. The &lt;code&gt;f:first()&lt;/code&gt; function checks whether a given fragment is the first fragment of that element, so the test in the &lt;code&gt;&amp;lt;xsl:if&amp;gt;&lt;/code&gt; in this template checks whether the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; contains more than one &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; and the first &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; is not the first fragment of the sentence it represents.&lt;/p&gt;

&lt;p&gt;To get the rendering I want, I need to generate an HTML paragraph for each page line within the stanza line. Currently the &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt; elements contain &lt;code&gt;&amp;lt;s&amp;gt;&lt;/code&gt; elements, so to get the page lines I need to switch once more to the physical hierarchy and process the &lt;code&gt;&amp;lt;pl&amp;gt;&lt;/code&gt; elements that are children of this &lt;code&gt;&amp;lt;sl&amp;gt;&lt;/code&gt;. That processing is done by the template:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&quot;pl&quot;&amp;gt;
  &amp;lt;p&amp;gt;
    &amp;lt;xsl:if test=&quot;preceding-sibling::pl&quot;&amp;gt;
      &amp;lt;xsl:attribute name=&quot;style&quot;&amp;gt;text-align: right; &amp;lt;/xsl:attribute&amp;gt;
    &amp;lt;/xsl:if&amp;gt;
    &amp;lt;xsl:value-of select=&quot;.&quot; /&amp;gt;
  &amp;lt;/p&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and we&amp;#8217;re done.&lt;/p&gt;

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;The one thing that concerns me about the approach I&amp;#8217;m taking is the fact that because XSLT can&amp;#8217;t actually amend an existing tree, the &lt;code&gt;f:swap()&lt;/code&gt; function essentially makes a copy of the entire tree every time you use it, and I don&amp;#8217;t know how well that will scale (both in terms of memory and in terms of work copying elements) when you get to documents that are larger than this toy example. Maybe processors are clever enough to discard trees they no longer need so it won&amp;#8217;t be an issue; I just don&amp;#8217;t know.&lt;/p&gt;

&lt;p&gt;Other than that, I think this approach is promising because it enables users to mostly use familiar tree-processing approaches rather than having to learn new paradigms for transforming overlapping markup or introducing a raft of new axes.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/98#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/9">overlapping markup</category>
 <enclosure url="http://www.jenitennison.com/blog/files/fragmentation-utils.xsl" length="6042" type="text/xml" />
 <pubDate>Sun, 28 Dec 2008 20:15:56 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">98 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Free Our Bills</title>
 <link>http://www.jenitennison.com/blog/node/83</link>
 <description>&lt;p&gt;The &lt;a href=&quot;http://www.theyworkforyou.com/freeourbills/&quot; title=&quot;TheyWorkForYou.com: Free Our Bills&quot;&gt;Free Our Bills&lt;/a&gt; campaign was launched recently in the UK. &lt;a href=&quot;http://www.theregister.co.uk/2008/03/26/mysociety_xml_bills_cameron/comments/#c_185029&quot; title=&quot;The Register: Comments on UK.gov urged to adopt web-friendly legislation format&quot;&gt;Some of the comments I&amp;#8217;ve seen&lt;/a&gt; about the campaign makes me think that it might be helpful if people understood more about how Bills and legislation get published in the UK. I thought I&amp;#8217;d offer a bit of background based on my experience (though there are many people with more intimate knowledge of the processes involved; perhaps they&amp;#8217;ll correct me when I get it wrong).&lt;/p&gt;

&lt;!--break--&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bills are draft legislation that is under discussion within the House of Commons or House of Lords. A Bill becomes law (legislation) when it is enacted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are published by Parliament and are available on the &lt;a href=&quot;http://services.parliament.uk/bills/&quot; title=&quot;UK Parliament: Bills Before Parliament&quot;&gt;Parliament website&lt;/a&gt;. Legislation is published by &lt;a href=&quot;http://www.tso.co.uk/&quot; title=&quot;The Stationery Office&quot;&gt;The Stationery Office (TSO)&lt;/a&gt; under contract to the Office of Public Sector Information (OPSI) on the &lt;a href=&quot;http://www.opsi.gov.uk/legislation&quot; title=&quot;OPSI: Legislation&quot;&gt;OPSI website&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are changed (amended) as they progress through the Houses of Parliament. People are mostly interested in the most recent version of a Bill. Legislation can be changed (amended) by other legislation; the version of a piece of legislation with all the changes applied to it is known as consolidated legislation. Consolidated legislation is published in the &lt;a href=&quot;http://www.statutelaw.gov.uk&quot; title=&quot;Statute Law Database&quot;&gt;Statute Law Database&lt;/a&gt; as well as (to a more limited extent) on the &lt;a href=&quot;http://www.opsi.gov.uk/legislation/revised&quot; title=&quot;OPSI: Revised Legislation&quot;&gt;OPSI website&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are edited by a dedicated team of Parliament employees who must reflect the amendments that the MPs say they want to make. They use a WYSIWYG XML editor. As is usual in an environment that has only been concerned about printed copies for centuries, they tend to focus on appearance rather than semantics, even when the XML supports the semantics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Free Our Bills campaign is not about making Bills (or legislation) easier for humans to read and understand; it&amp;#8217;s about making it easier to extract information from a Bill so that people can be notified when a new Bill comes along on a subject they care about, or an old Bill is redrafted, and so on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are already available for the public to view on the web, in PDF and HTML forms. The problem is that the HTML is Really Really Bad (&lt;a href=&quot;http://www.publications.parliament.uk/pa/ld200708/ldbills/044/08044.i-v.html&quot; title=&quot;Parliament: Climate Change Bill&quot;&gt;View Source to see&lt;/a&gt;) and that makes it Really Really Hard to extract useful information from them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are reasons for the Bills HTML being Really Really Bad:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The HTML must look &lt;em&gt;exactly&lt;/em&gt; like it does in printed form, otherwise Members of Parliament (MPs) would get Really Really Confused.&lt;/li&gt;
&lt;li&gt;MPs refer to pieces of a Bill (which they might want to change) by page and line number, not by the semantic structure of the Bill, so the HTML must have page and line numbers in it or MPs would get Really Really Confused. &lt;/li&gt;
&lt;li&gt;Although the formatting of Bills is pretty consistent, there&amp;#8217;s always the chance that a piece will need to be formatted specially. It might be safe to assume a particular presentation for a particular semantic 99% of the time, but if that 1% isn&amp;#8217;t formatted in the different way, MPs would be Really Really Confused.&lt;/li&gt;
&lt;li&gt;The code that creates the Bill HTML was written several years ago, when browser support for CSS was Really Really Bad.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The picture for legislation is rather better because a strategic decision was made to focus on semantics rather than presentation. When a Bill is enacted, it gets converted into &lt;a href=&quot;http://www.opsi.gov.uk/legislation/schema/&quot; title=&quot;OPSI: Legislation schema&quot;&gt;reasonably good semantic XML&lt;/a&gt;, which forms the basis of all the HTML views. It also helps that this HTML was designed fairly recently, for modern browsers; it makes heavy use of CSS so there&amp;#8217;s relatively little obfuscation of the content.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think there are interesting general lessons here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Different user communities have different requirements.&lt;/strong&gt; MPs have different requirements of Bills than the general public, who don&amp;#8217;t care (as) much about line or page numbers. On the other hand, you need to actually consult with users about what they need rather than make assumptions about it: are MPs really likely to get Really Really Confused if the HTML presentation of a Bill looks slightly different from the PDF print version? I don&amp;#8217;t know.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authors don&amp;#8217;t care about what they don&amp;#8217;t use.&lt;/strong&gt; When the only way of using a Bill is to print it, it&amp;#8217;s natural that authors and publishers only care about how it looks when it&amp;#8217;s printed. Training people to care about semantic markup is really hard, and it&amp;#8217;s made harder by WYSIWYG tools that allow them to override the semantic style. If a difference isn&amp;#8217;t visible, then in authors&amp;#8217; eyes it doesn&amp;#8217;t exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You have to positively decide to ignore appearance.&lt;/strong&gt; When transforming from a WYSIWYG view, replicating appearance is the obvious thing to do. But it&amp;#8217;s worthwhile in the long run to focus on extracting the semantics, because the resulting documents are so much more reusable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HTML, XML and XSLT are not inherently good.&lt;/strong&gt; Parliament wanted Bills in HTML so that they were more accessible on the web. But the HTML is dreadfully inaccessible because of the other requirements placed on it. Similarly, XML can be incredibly obfuscated, or entirely about presentation, as formats such as OOXML illustrate. And writing your code in XSLT does not make it inherently easier to maintain than (say) a SAX transformation. It&amp;#8217;s easy to misuse a technology.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developers who produce atrocious HTML aren&amp;#8217;t necessarily ignorant.&lt;/strong&gt; Unfortunately, there&amp;#8217;s sometimes a limit to how much you can argue with your customers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/83#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/37">legislation</category>
 <pubDate>Mon, 31 Mar 2008 19:10:14 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">83 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>RDF and XML Q&amp;A: Which should I use?</title>
 <link>http://www.jenitennison.com/blog/node/74</link>
 <description>&lt;p&gt;Another question to answer:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’ve been reading about RDF, and I’m not sure in what situations it is more appropriate to use RDF over straight XML. I usually see RDF expressed as XML, but sometimes I see it written as language-independent functions (or methods).&lt;/p&gt;
  
  &lt;p&gt;Part of me is wondering if RDF is more appropriate for this project. What might the benefits be? And if it is, how difficult it would be to refactor it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(Note that the person asking the question is talking about a small data-oriented project.) There&amp;#8217;s a huge amount that could be said about this, so I might well post about some of it again. Here, I&amp;#8217;m going to cut to the chase. This is what I&amp;#8217;d recommend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model your application in RDF terms&lt;/strong&gt;: Create a description of what classes of resources your application needs to deal with, and which properties link those together. You can call this description an RDF schema or conceptual model or ontology, depending on how impressive you want to sound. This modelling activity is useful in itself, largely because it helps you understand what information you’re dealing with and how it fits together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a markup language that can be mapped to RDF&lt;/strong&gt;: An XML version of your data makes it more generally available and reusable than it would be if it were locked away in a triple store. Do one of the following:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Define a subset of &lt;a href=&quot;http://www.w3.org/TR/rdf-syntax-grammar/&quot; title=&quot;W3C Recommendation: RDF/XML Syntax Specification&quot;&gt;RDF/XML&lt;/a&gt; for your application&lt;/strong&gt;: The full flexibility of RDF/XML is complicated for plain XML processors to handle, so subset it: for example, always use typed elements (such as &lt;code&gt;&amp;lt;my:Course&amp;gt;&lt;/code&gt;) rather than &lt;code&gt;rdf:type&lt;/code&gt; properties, and use referencing or nesting in a consistent way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design markup languages that use &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C Working Draft: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; attributes to reflect the semantics of the data&lt;/strong&gt;: This gives you a standard way of mapping your markup language into RDF triples without having to adopt the &amp;#8220;striped&amp;#8221; design of RDF/XML in your markup language. A lot of the attributes can be defaulted to leave the markup language fairly streamlined.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design markup languages exactly as you like, and define &lt;a href=&quot;http://www.w3.org/TR/grddl/&quot; title=&quot;W3C Recommendation: Gleaning Resource Descriptions from Dialects of Languages (GRDDL)&quot;&gt;GRDDL&lt;/a&gt; mappings from them into RDF/XML&lt;/strong&gt;: This gives you the most flexibility in your markup language design (though not complete flexibility &amp;#8212; you still need to be able to identify the statements that you want to make from the XML), at the expense of having to write some XSLT.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
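&lt;p&gt;As a rough illustration of the first two steps, here is a minimal sketch. The namespace &lt;code&gt;http://example.com/ns#&lt;/code&gt;, the &lt;code&gt;my:Course&lt;/code&gt; class and the &lt;code&gt;my:title&lt;/code&gt; and &lt;code&gt;my:prerequisite&lt;/code&gt; properties are all hypothetical, made up for the example. The model itself can be written down as a small RDF schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
         xmlns:rdfs=&quot;http://www.w3.org/2000/01/rdf-schema#&quot;&amp;gt;
  &amp;lt;!-- hypothetical class for a course-based application --&amp;gt;
  &amp;lt;rdfs:Class rdf:about=&quot;http://example.com/ns#Course&quot;&amp;gt;
    &amp;lt;rdfs:label&amp;gt;Course&amp;lt;/rdfs:label&amp;gt;
  &amp;lt;/rdfs:Class&amp;gt;
  &amp;lt;!-- a literal-valued property of courses --&amp;gt;
  &amp;lt;rdf:Property rdf:about=&quot;http://example.com/ns#title&quot;&amp;gt;
    &amp;lt;rdfs:domain rdf:resource=&quot;http://example.com/ns#Course&quot; /&amp;gt;
    &amp;lt;rdfs:range rdf:resource=&quot;http://www.w3.org/2000/01/rdf-schema#Literal&quot; /&amp;gt;
  &amp;lt;/rdf:Property&amp;gt;
  &amp;lt;!-- a property linking one course to another --&amp;gt;
  &amp;lt;rdf:Property rdf:about=&quot;http://example.com/ns#prerequisite&quot;&amp;gt;
    &amp;lt;rdfs:domain rdf:resource=&quot;http://example.com/ns#Course&quot; /&amp;gt;
    &amp;lt;rdfs:range rdf:resource=&quot;http://example.com/ns#Course&quot; /&amp;gt;
  &amp;lt;/rdf:Property&amp;gt;
&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An application-specific subset of RDF/XML might then require typed elements and require related resources to be referenced with &lt;code&gt;rdf:resource&lt;/code&gt;. Full RDF/XML would also allow the same triples to be written with an &lt;code&gt;rdf:Description&lt;/code&gt; element and an &lt;code&gt;rdf:type&lt;/code&gt; property, or with the prerequisite course nested inline. The subset rules out those alternatives, so plain XML tools always see one predictable structure:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
         xmlns:my=&quot;http://example.com/ns#&quot;&amp;gt;
  &amp;lt;!-- typed element instead of an rdf:type property --&amp;gt;
  &amp;lt;my:Course rdf:about=&quot;http://example.com/courses/xml201&quot;&amp;gt;
    &amp;lt;my:title&amp;gt;Advanced XML&amp;lt;/my:title&amp;gt;
    &amp;lt;!-- related courses are always referenced, never nested --&amp;gt;
    &amp;lt;my:prerequisite rdf:resource=&quot;http://example.com/courses/xml101&quot; /&amp;gt;
  &amp;lt;/my:Course&amp;gt;
&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;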

&lt;p&gt;The point of doing this is to put you in a position where you &lt;em&gt;can&lt;/em&gt; just use XML if you want, but you also have the flexibility of using RDF either now or in the future.&lt;/p&gt;

&lt;p&gt;The benefits of using RDF are partly to do with the ease with which you can do certain kinds of processing (specifically combining &amp;#8220;facts&amp;#8221; together to draw conclusions) and partly to do with the potential for reuse of your data. In the same way that XML gives people a common &lt;em&gt;syntax&lt;/em&gt; and thus aids interchange of information, RDF allows others to draw &lt;em&gt;some&lt;/em&gt; conclusions (more than they would with a random mess of elements and attributes) about what your data means.&lt;/p&gt;

&lt;p&gt;I don&amp;#8217;t think that using RDF triple stores, &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-query/&quot; title=&quot;W3C Recommendation: SPARQL Query Language for RDF&quot;&gt;SPARQL&lt;/a&gt; and all that jazz gives you a great return for a small-scale, personal project &amp;#8212; you&amp;#8217;re better off sticking to flat files and some XSLT &amp;#8212; but it doesn&amp;#8217;t hurt to build in some of the formality of RDF anyway.&lt;/p&gt;
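&lt;p&gt;If you take the GRDDL route, or just want to get RDF out of flat XML files with some XSLT, the mapping is an ordinary stylesheet. What follows is a minimal sketch, not a definitive implementation. The &lt;code&gt;http://example.com/ns/courses&lt;/code&gt; markup language, the &lt;code&gt;http://example.com/ns#&lt;/code&gt; vocabulary and the &lt;code&gt;courses2rdf.xsl&lt;/code&gt; filename are all hypothetical. The source document points at its transformation with the &lt;code&gt;grddl:transformation&lt;/code&gt; attribute on its root element:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;courses xmlns=&quot;http://example.com/ns/courses&quot;
         xmlns:grddl=&quot;http://www.w3.org/2003/g/data-view#&quot;
         grddl:transformation=&quot;courses2rdf.xsl&quot;&amp;gt;
  &amp;lt;course id=&quot;xml201&quot; requires=&quot;xml101&quot;&amp;gt;
    &amp;lt;title&amp;gt;Advanced XML&amp;lt;/title&amp;gt;
  &amp;lt;/course&amp;gt;
&amp;lt;/courses&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The stylesheet then turns each course into a typed RDF resource:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;1.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
  xmlns:c=&quot;http://example.com/ns/courses&quot;
  xmlns:my=&quot;http://example.com/ns#&quot;
  exclude-result-prefixes=&quot;c&quot;&amp;gt;

&amp;lt;!-- wrap all the generated statements in a single rdf:RDF element --&amp;gt;
&amp;lt;xsl:template match=&quot;/c:courses&quot;&amp;gt;
  &amp;lt;rdf:RDF&amp;gt;
    &amp;lt;xsl:apply-templates select=&quot;c:course&quot; /&amp;gt;
  &amp;lt;/rdf:RDF&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;!-- each course becomes a my:Course resource; the prerequisite is optional --&amp;gt;
&amp;lt;xsl:template match=&quot;c:course&quot;&amp;gt;
  &amp;lt;my:Course rdf:about=&quot;http://example.com/courses/{@id}&quot;&amp;gt;
    &amp;lt;my:title&amp;gt;&amp;lt;xsl:value-of select=&quot;c:title&quot; /&amp;gt;&amp;lt;/my:title&amp;gt;
    &amp;lt;xsl:if test=&quot;@requires&quot;&amp;gt;
      &amp;lt;my:prerequisite rdf:resource=&quot;http://example.com/courses/{@requires}&quot; /&amp;gt;
    &amp;lt;/xsl:if&amp;gt;
  &amp;lt;/my:Course&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the stylesheet only selects &lt;code&gt;c:course&lt;/code&gt; elements, any other content in the source is ignored, so the markup language can carry presentational or descriptive material that never becomes triples.&lt;/p&gt;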
</description>
 <comments>http://www.jenitennison.com/blog/node/74#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sun, 17 Feb 2008 20:10:12 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">74 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Posterity</title>
 <link>http://www.jenitennison.com/blog/node/59</link>
 <description>&lt;p&gt;We just had photos taken of the children, and it&amp;#8217;s put me in a reflective mood. &lt;a href=&quot;http://norman.walsh.name/2007/10/15/ajax&quot; title=&quot;Norm Walsh: A little bit of Ajax&quot;&gt;Norm posted&lt;/a&gt; the other day about his experience with information/task management products:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Then it hit me.&lt;/p&gt;
  
  &lt;p&gt;None of them, with the notable exception of Tinderbox, seem to store the data in any open format. I was seriously considering one of these commercial black boxes for an important chunk of the data that drives my day-to-day life. The little voice in my head reacted viscerally when the observation was made: “What the hell you thinking, man! Stop that!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;ve been experimenting a bit with &lt;a href=&quot;http://en.wikipedia.org/wiki/Getting_Things_Done&quot; title=&quot;Wikipedia: Getting Things Done&quot;&gt;GTD&lt;/a&gt; applications recently, and have the same reaction as Norm. The two tools that I&amp;#8217;ve tried, &lt;a href=&quot;http://www.thinkingrock.com.au/&quot; title=&quot;ThinkingRock: Free GTD software&quot;&gt;ThinkingRock&lt;/a&gt; and &lt;a href=&quot;http://freemind.sourceforge.net/wiki/index.php/Main_Page&quot; title=&quot;FreeMind: free mind-mapping software&quot;&gt;FreeMind&lt;/a&gt;, can both export to an XML format (and import too), which is great, no doubt about it, but you&amp;#8217;ve got to remember to do the exporting to take advantage of it. I know me: I just won&amp;#8217;t do it (even with my GTD tool to remind me to). What I really want is an application that &lt;em&gt;natively&lt;/em&gt; stores its data as XML, preferably in some nicely structured, standard format. So even after I&amp;#8217;ve wiped the original application off my computer, or moved the file from one computer to another, I can still read that file and (with a little XSLT magic) load it into something else.&lt;/p&gt;

&lt;p&gt;This is possibly the biggest thing that bugs me about most of the Web 2.0 applications out there. Of course I&amp;#8217;ve got to be connected to the &amp;#8216;net to use them, and I&amp;#8217;m not all the time (most particularly at tech conferences, it seems). But more important, they&amp;#8217;ve got my data tucked away in their databases, out of reach. Some of them will let me export it, or get at it through an API, but that isn&amp;#8217;t enough for me. I want it here, so that even if the company folds or I forget my login and password, or the key I used to encrypt my personal data from potentially prying eyes&amp;#8230; even &lt;em&gt;years&lt;/em&gt; later, I can still read that file. It&amp;#8217;s part of my history, but I won&amp;#8217;t remember to keep it until it&amp;#8217;s gone.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m thinking, you see, about some of the things I did on computers years ago. A half-finished book I wrote when I was about 18. The code I wrote for my PhD. Letters from my university days. These aren&amp;#8217;t from &lt;em&gt;that&lt;/em&gt; long ago, but now here I am using radically different software, in a completely different world, and these pieces from my past are lost, irretrievable because of the formats used to save them (as well as the hardware on which they&amp;#8217;re saved: it&amp;#8217;s getting harder to read a floppy nowadays).&lt;/p&gt;

&lt;p&gt;I read &lt;a href=&quot;http://www.amazon.com/Glasshouse-Charles-Stross/dp/0441015085&quot; title=&quot;Amazon: Glasshouse by Charles Stross&quot;&gt;Glasshouse by Charles Stross&lt;/a&gt; a couple of months ago (well worth the read). It&amp;#8217;s set in the far future, and contains the following passage:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&amp;#8220;We know why the dark age happened,&amp;#8221; Fiore continues. &amp;#8220;Our ancestors allowed their storage and processing architectures to proliferate uncontrollably, and they tended to throw away old technologies instead of virtualizing them. For reasons of commercial advantage, some of their largest entities deliberately created incompatible information formats and locked up huge quantities of useful material in them, so that when new architectures replaced old, the data became inaccessible.&lt;/p&gt;
  
  &lt;p&gt;&amp;#8220;This particularly affected our records of personal and household activities during the latter half of the dark age. Early on, for example, we have a lot of &lt;em&gt;film&lt;/em&gt; data captured by amateurs and home enthusiasts. They used a thing called a cine camera, which captured images on a photochemical medium. You could actually decode it with your eyeball. But a third of the way into the dark age, they switched to using magnetic storage tape, which degrades rapidly, then to digital storage, which was even worse because for no obvious reason they encrypted everything. The same sort of things happened to their audio recordings, and to text. Ironically, we know a lot more about their culture around the beginning of the dark age, around old-style year 1950, than about the end of the dark age, around 2040.&amp;#8221;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I&amp;#8217;m looking forward to the end of the dark age. In the meantime, the photos of the children will be hardcopies in the shoebox at the bottom of the wardrobe. And I think I might try the &lt;a href=&quot;http://www.flickr.com/photos/jazzmasterson/sets/48077/&quot; title=&quot;Flickr photoset: Getting Things Done with Index Cards&quot;&gt;index card version&lt;/a&gt; of GTD.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/59#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/33">gtd</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Wed, 17 Oct 2007 22:08:29 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">59 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Web 2.0 Project: Using Atom and XML with Graph Data Structures</title>
 <link>http://www.jenitennison.com/blog/node/54</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.louisecrow.com/blog/&quot; title=&quot;Louise Crow&#039;s blog&quot;&gt;A Ruby on Rails specialist friend&lt;/a&gt; and I are building a Web 2.0 application. I would say it&amp;#8217;s &amp;#8220;social networking for the dead&amp;#8221; except that I doubt that description would be attractive to most people (my ex-Goth &lt;a href=&quot;http://en.wikipedia.org/wiki/Domestic_partnership&quot; title=&quot;Wikipedia: domestic partner/common law husband/father of my children etc. etc.&quot;&gt;defacto&lt;/a&gt; being a rare exception), and it can be for the living too. It&amp;#8217;s a bit like &lt;a href=&quot;http://www.ancestry.com/&quot; title=&quot;ancestry.com&quot;&gt;all&lt;/a&gt; &lt;a href=&quot;http://www.familypursuit.com/&quot; title=&quot;familypursuit.com&quot;&gt;those&lt;/a&gt; &lt;a href=&quot;http://www.geni.com/&quot; title=&quot;geni.com&quot;&gt;genalogy&lt;/a&gt; websites, except that our focus is on people&amp;#8217;s social relationships as well as their familial ones.&lt;/p&gt;

&lt;p&gt;(I should say that this is all very casual. We&amp;#8217;re both fitting it in around our other responsibilities, and are mainly interested in working together, learning new things, and trying out all the best practices that everyone keeps talking about. So don&amp;#8217;t think I&amp;#8217;m becoming a dotcom entrepreneur or anything. It&amp;#8217;s got a very Web 2.0 name, which I&amp;#8217;m not telling you only because I don&amp;#8217;t want you to start hitting our servers. We&amp;#8217;re nowhere near ready for visitors.)&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;We&amp;#8217;re using the &lt;a href=&quot;http://www.ngsgenealogy.org/ngsgentech/projects/Gdm/Gdm.cfm&quot; title=&quot;GENTECH genealogical data model&quot;&gt;Gentech data model&lt;/a&gt; as the basis for the application (though I expect that we&amp;#8217;ll tweak it a bit). You don&amp;#8217;t really need to know anything about it to follow what I&amp;#8217;m talking about here. The Gentech data model is very much a relational model. They might call it a logical model, but for anyone who &lt;em&gt;isn&amp;#8217;t&lt;/em&gt; a database head, it&amp;#8217;s a physical model. That&amp;#8217;s fine; we&amp;#8217;re storing our data in a database, so a relational model for that is great.&lt;/p&gt;

&lt;p&gt;In the Rails world, the model that Rails uses is object-oriented rather than relational. So there&amp;#8217;s a certain amount of mapping from the relational world into the OO world, in particular eliding the tables that are created simply for normalisation purposes. Making that mapping is one thing that Rails is very good at, of course.&lt;/p&gt;

&lt;p&gt;Then we&amp;#8217;re into the worlds that I&amp;#8217;m particularly interested in. One of our goals is to use &lt;a href=&quot;http://en.wikipedia.org/wiki/Atom_(standard)&quot; title=&quot;Wikipedia: Atom&quot;&gt;Atom&lt;/a&gt; as an API, on the basis that it&amp;#8217;s a fairly generic way of packaging things (entries) and lists-of-things (feeds) with a bunch of metadata. Plus, the &lt;a href=&quot;http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-17.txt&quot; title=&quot;Atom Publishing Protocol&quot;&gt;Atom Publishing Protocol&lt;/a&gt; shows you how to do RESTful applications right.&lt;/p&gt;

&lt;p&gt;The trouble, &lt;a href=&quot;http://code.google.com/apis/gdata/overview.html&quot; title=&quot;Google Data (GData) API&quot;&gt;as others&lt;/a&gt; &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Dare Obasanjo: Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;have found&lt;/a&gt;, is that Atom is designed for a flattish structure, in which you have things, and a list of things. Like blog posts and feeds of posts, or pictures and feeds of pictures. But the model that we&amp;#8217;re starting from is relational, or object-oriented, or anyway it&amp;#8217;s a &lt;strong&gt;graph&lt;/strong&gt;. And that makes things more complicated.&lt;/p&gt;

&lt;p&gt;The first steps are pretty obvious. Objects are equivalent to entries, and lists of objects equivalent to feeds. So every object has its own URL, and every significant feed has its own URL too. There&amp;#8217;s the obvious &lt;code&gt;http://www.example.com/people/DarwinC01&lt;/code&gt; for a person, and &lt;code&gt;http://www.example.com/people/&lt;/code&gt; for a feed of people, but also &lt;code&gt;http://www.example.com/people/DarwinC01/events/&lt;/code&gt; for events that are related to a particular person. An entry&amp;#8217;s content is an XML document that describes the equivalent object. It has attributes and children to represent the properties from the OO model (columns in the database tables).&lt;/p&gt;

&lt;p&gt;Atom defines a bunch of metadata that you can associate with the content in an entry. These are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;id&lt;/li&gt;
&lt;li&gt;title&lt;/li&gt;
&lt;li&gt;summary (optional, as long as there&amp;#8217;s textual or XML content)&lt;/li&gt;
&lt;li&gt;updated&lt;/li&gt;
&lt;li&gt;published (optional)&lt;/li&gt;
&lt;li&gt;category (multiple, optional)&lt;/li&gt;
&lt;li&gt;source (optional)&lt;/li&gt;
&lt;li&gt;author (multiple, optional as long as there&amp;#8217;s a source that specifies one or the entry&amp;#8217;s in a feed that specifies one)&lt;/li&gt;
&lt;li&gt;contributor (multiple, optional)&lt;/li&gt;
&lt;li&gt;link (multiple, optional as long as there&amp;#8217;s some content)&lt;/li&gt;
&lt;li&gt;rights (optional, defaults to the feed&amp;#8217;s rights)&lt;/li&gt;
&lt;li&gt;extension elements (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The metadata properties need to be used to indicate who created/updated the object and when. This gets confusing because some of the information in our system is likely to be &lt;em&gt;about&lt;/em&gt; content that has authors and publishing dates and so on: the Gentech data model is strong on documenting the sources of information about people you&amp;#8217;re researching. Even when documenting the source of some information, the Atom metadata should still be metadata about that object in our data model.&lt;/p&gt;

&lt;p&gt;The set of Atom metadata does indicate a place where we&amp;#8217;re going to want to tweak the Gentech data model though: every object should have metadata associated with it, at the very least an updated date, to populate the Atom metadata fields. Also, we need to identify the property of each object that is used in the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt;, though the title can be something generic if there isn&amp;#8217;t an obvious one.&lt;/p&gt;
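
&lt;p&gt;As a rough illustration of that wrapping, here is a minimal sketch written in Python with the standard library&amp;#8217;s ElementTree, purely for illustration and not the application&amp;#8217;s actual code. It puts one object&amp;#8217;s XML into an Atom entry with the required &lt;code&gt;&amp;lt;atom:id&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;atom:updated&amp;gt;&lt;/code&gt;, falling back to a generic title when the object has no obvious one. The function name &lt;code&gt;object_to_entry&lt;/code&gt; and the entry URL are hypothetical, and it assumes the enclosing feed supplies the author.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = 'http://www.w3.org/2005/Atom'
ET.register_namespace('atom', ATOM)  # serialize with the atom: prefix


def atom(local):
    # Clark notation {namespace-uri}local-name, as ElementTree expects.
    return '{' + ATOM + '}' + local


def object_to_entry(obj_url, title, updated, content_elem):
    # Wrap one object of the data model in a minimal Atom entry.
    # RFC 4287 requires id, title and updated; author is left out on the
    # assumption that the enclosing feed supplies one.
    entry = ET.Element(atom('entry'))
    ET.SubElement(entry, atom('id')).text = obj_url
    ET.SubElement(entry, atom('title')).text = title or 'Untitled'
    ET.SubElement(entry, atom('updated')).text = updated.isoformat()
    content = ET.SubElement(entry, atom('content'), type='application/xml')
    content.append(content_elem)  # the object's own XML
    return entry


# Usage with the passenger assertion (the entry URL is made up).
passenger = ET.Element('passenger')
ET.SubElement(passenger, 'source',
              href='http://www.aboutdarwin.com/voyage/voyage01.html')
entry = object_to_entry('http://www.example.com/assertions/1', None,
                        datetime(2007, 9, 2, tzinfo=timezone.utc), passenger)
print(ET.tostring(entry, encoding='unicode'))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note the explicit &lt;code&gt;type=&quot;application/xml&quot;&lt;/code&gt;: without it, Atom treats content as plain text, which may not contain child elements.&lt;/p&gt;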

&lt;p&gt;Now the question that&amp;#8217;s vexing me: how should we represent relationships to other objects/entries? Let&amp;#8217;s take the example of documenting &lt;a href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; title=&quot;About Darwin: HMS Beagle Voyage&quot;&gt;Charles Darwin&amp;#8217;s voyage on HMS Beagle&lt;/a&gt;. It goes something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      ...
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;passenger&amp;gt;&lt;/code&gt; element needs to reference a person and a voyage (event), to say that Darwin was a passenger on the voyage.&lt;/p&gt;

&lt;p&gt;Here are the options, I think:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;, with a URL in &lt;code&gt;rel&lt;/code&gt; that indicates the kind of relationship  &lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use extension elements within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, referencing the URLs of the related objects&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
      &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; Atom entry or feed&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Charles Darwin&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
            &amp;lt;persona&amp;gt;
              &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
              ...
            &amp;lt;/persona&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Beagle Voyage&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
            &amp;lt;event&amp;gt;
              &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
              &amp;lt;date-range&amp;gt;
                &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
                &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
              &amp;lt;/date-range&amp;gt;
              ...
            &amp;lt;/event&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; XML content&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don&amp;#8217;t think that there&amp;#8217;s any point in using an extension element (#2), given that using &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; (#1) situates the information in the same place in a more standard way.&lt;/p&gt;
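
&lt;p&gt;For a client, option #1 means the related objects can be found without understanding the content at all. The Python sketch below (the function name &lt;code&gt;related_links&lt;/code&gt; is hypothetical) groups an entry&amp;#8217;s &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s by &lt;code&gt;rel&lt;/code&gt;, resolving each &lt;code&gt;href&lt;/code&gt; against the entry&amp;#8217;s &lt;code&gt;xml:base&lt;/code&gt;. It uses absolute relationship URIs (the &lt;code&gt;http://www.example.com/link-relationships/&lt;/code&gt; prefix is illustrative), since RFC 4287 only allows a &lt;code&gt;rel&lt;/code&gt; value that is a registered name or a full IRI, and &lt;code&gt;xml:base&lt;/code&gt; applies to &lt;code&gt;href&lt;/code&gt; rather than &lt;code&gt;rel&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = '{http://www.w3.org/2005/Atom}'
XML_BASE = '{http://www.w3.org/XML/1998/namespace}base'  # the xml:base attribute


def related_links(entry):
    # Group an entry's atom:link elements by rel, resolving each href against
    # the entry's xml:base (xml:base on the links themselves is ignored here).
    base = entry.get(XML_BASE, '')
    links = {}
    for link in entry.findall(ATOM + 'link'):
        rel = link.get('rel', 'alternate')  # RFC 4287: missing rel means alternate
        links.setdefault(rel, []).append(urljoin(base, link.get('href')))
    return links


# Option #1 for the passenger assertion, with illustrative absolute rel URIs.
REL = 'http://www.example.com/link-relationships/'
entry = ET.Element(ATOM + 'entry', {XML_BASE: 'http://www.example.com'})
ET.SubElement(entry, ATOM + 'link', rel=REL + 'assertion-persona',
              href='/persona/DarwinC01')
ET.SubElement(entry, ATOM + 'link', rel=REL + 'assertion-event',
              href='/events/BeagleVoyage')
print(related_links(entry)[REL + 'assertion-persona'])
# ['http://www.example.com/persona/DarwinC01']
```
&lt;/code&gt;&lt;/pre&gt;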

&lt;p&gt;Embedding information (as in #4 and #5) is a good thing because it means fewer requests to the server in order to get some useful information. Providing access to Atom feeds (as in #1, #3 and #4) is a good thing because it means you can get metadata about who created the referenced objects, and additional information about them. So #4 is good, since it does both these things, but I don&amp;#8217;t like embedding Atom in the XML because it&amp;#8217;s a lot of extra weight in the XML (making it harder to read/process).&lt;/p&gt;

&lt;p&gt;In fact, #1, #3 and #5 aren&amp;#8217;t mutually exclusive. It&amp;#8217;s possible to add relevant &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s to the metadata, reference the URLs of the other objects &lt;em&gt;and&lt;/em&gt; embed their content at the same time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona src=&quot;/persona/DarwinC01&quot;&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event src=&quot;/events/BeagleVoyage&quot;&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We embed the core information for easy access (#5), reference its original URI for more details (#3), and then we may as well add the &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s (#1) so that run-of-the-mill Atom readers who have no knowledge about our content can do something useful. We &lt;em&gt;don&amp;#8217;t&lt;/em&gt; get the metadata embedded in the XML, but it&amp;#8217;s retrievable: a client could use the entry as a kind of &amp;#8220;low resolution&amp;#8221; information set, which they can add to by retrieving the &amp;#8220;high resolution&amp;#8221; Atom for the referenced objects, via their URLs, as necessary.&lt;/p&gt;
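
&lt;p&gt;On the client side, that &amp;#8220;low resolution&amp;#8221;/&amp;#8220;high resolution&amp;#8221; split could look something like the following Python sketch. &lt;code&gt;low_res&lt;/code&gt; reads the simple fields embedded in an element, and &lt;code&gt;high_res&lt;/code&gt; resolves its &lt;code&gt;src&lt;/code&gt; attribute and asks a &lt;code&gt;fetch&lt;/code&gt; callable for the full entry. Both names are hypothetical, and &lt;code&gt;fetch&lt;/code&gt; stands in for a real HTTP request so that the example runs offline.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def low_res(elem):
    # The embedded copy: simple (leaf) child elements as a dict.
    # Nested structures such as date-range are skipped in this sketch.
    return {child.tag: child.text for child in elem if len(child) == 0}


def high_res(elem, base, fetch):
    # Follow src to the referenced object's own Atom entry. fetch is any
    # callable that takes an absolute URL and returns an Element (for instance
    # an HTTP GET followed by ET.fromstring). Returns None if there is no src.
    src = elem.get('src')
    if src is None:
        return None
    return fetch(urljoin(base, src))


# The embedded persona from the combined example above.
persona = ET.Element('persona', src='/persona/DarwinC01')
ET.SubElement(persona, 'name').text = 'Charles Darwin'

# Stand-in for the server: absolute URL to full Atom entry.
server = {
    'http://www.example.com/persona/DarwinC01':
        ET.Element('{http://www.w3.org/2005/Atom}entry'),
}

print(low_res(persona))  # {'name': 'Charles Darwin'}
full = high_res(persona, 'http://www.example.com', server.get)
```
&lt;/code&gt;&lt;/pre&gt;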

&lt;p&gt;The problem with using an embedding method rather than a referencing method is that the object model is a graph, not a hierarchy. So you can&amp;#8217;t &lt;em&gt;always&lt;/em&gt; embed an object&amp;#8217;s XML: sometimes you have to only use a reference (#3 without #5) to avoid getting into an endless loop of repeated information. As a publisher, sometimes you might &lt;em&gt;want&lt;/em&gt; to only use a reference, because the information is only tangential to the main subject of the original entry. I&amp;#8217;m imagining that we might serve several different Atom entries for the same object, with different amounts of detail. Maybe.&lt;/p&gt;
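
&lt;p&gt;One way to keep embedding from looping is to track which objects are already being serialized. The Python sketch below walks a small, hypothetical in-memory graph (&lt;code&gt;GRAPH&lt;/code&gt;, keyed by URL path) and embeds related objects, marked with &lt;code&gt;src&lt;/code&gt;, until either a depth limit is reached or an object would repeat one of its ancestors; at that point it falls back to an &lt;code&gt;href&lt;/code&gt; reference carrying just the name. The depth parameter is also a crude way of serving entries with different amounts of detail.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET

# Hypothetical object graph: URL path mapped to
# (element name, simple fields, {relation element name: related URL path}).
GRAPH = {
    '/persona/DarwinC01': (
        'persona', {'name': 'Charles Darwin'},
        {'event': '/events/BeagleVoyage'}),
    '/events/BeagleVoyage': (
        'event', {'name': 'Beagle Voyage'},
        {'persona': '/persona/DarwinC01'}),
}


def serialize(url, depth=1, ancestors=frozenset()):
    # Serialize one object; embed related objects (marked with src) while
    # depth allows and they are not already an ancestor, otherwise just
    # reference them (marked with href). The ancestor check is what stops
    # persona, event, persona, ... from recursing forever.
    tag, fields, links = GRAPH[url]
    elem = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(elem, name).text = value
    ancestors = ancestors | {url}
    for rel_tag, target in links.items():
        if depth == 0 or target in ancestors:
            ref = ET.SubElement(elem, rel_tag, href=target)
            # Keep a title-like label so the reference is still clickable.
            ref.text = GRAPH[target][1].get('name')
        else:
            child = serialize(target, depth - 1, ancestors)
            child.tag = rel_tag  # name the element after the relationship
            child.set('src', target)
            elem.append(child)
    return elem


print(ET.tostring(serialize('/events/BeagleVoyage', depth=5), encoding='unicode'))
```
&lt;/code&gt;&lt;/pre&gt;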

&lt;p&gt;As an author constructing XML for new objects (either in code or by hand), you can&amp;#8217;t include references to them, because they won&amp;#8217;t have URLs yet. Therefore, for the purpose of &lt;em&gt;creating&lt;/em&gt; objects as defined by the Atom Publishing Protocol, you&amp;#8217;ll use embedded XML (#5), with references to existing objects where necessary. The resource returned will include the references for all the created objects. When updating, I imagine you&amp;#8217;ll want to include as little as possible aside from the updated information (small updates being less prone to clashes than large ones).&lt;/p&gt;

&lt;p&gt;By the way, I&amp;#8217;m using &lt;code&gt;src&lt;/code&gt; attributes when the information is embedded and &lt;code&gt;href&lt;/code&gt; attributes when the information is purely referenced (or almost purely referenced; the referencing elements might still have some content equivalent to the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt; element, in the interests of presenting a clickable link).&lt;/p&gt;

&lt;p&gt;So that&amp;#8217;s the plan at the moment, but we&amp;#8217;re open to suggestions. Anybody?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/54#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <pubDate>Sun, 02 Sep 2007 19:57:36 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">54 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The perils of default namespaces</title>
 <link>http://www.jenitennison.com/blog/node/36</link>
 <description>&lt;p&gt;A lot of people run into problems with namespaces, and most of those arise from using default namespaces (ie not giving namespaces prefixes). The transformation technology you use can have a big effect on how confusing and irritating it gets.&lt;/p&gt;

&lt;p&gt;Default namespaces make XML documents easier to read because they allow you to just give the local name of an element rather than using prefixes all over the place. For example, using:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;house status=&quot;For Sale&quot; xmlns=&quot;http://www.example.com/ns/house&quot;&amp;gt;
  &amp;lt;askingPrice&amp;gt;...&amp;lt;/askingPrice&amp;gt;
  &amp;lt;address&amp;gt;...&amp;lt;/address&amp;gt;
  &amp;lt;layout&amp;gt;...&amp;lt;/layout&amp;gt;
&amp;lt;/house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;!--break--&gt;

&lt;p&gt;rather than:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;h:house status=&quot;For Sale&quot; xmlns:h=&quot;http://www.example.com/ns/house&quot;&amp;gt;
  &amp;lt;h:askingPrice&amp;gt;...&amp;lt;/h:askingPrice&amp;gt;
  &amp;lt;h:address&amp;gt;...&amp;lt;/h:address&amp;gt;
  &amp;lt;h:layout&amp;gt;...&amp;lt;/h:layout&amp;gt;
&amp;lt;/h:house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In some cases, specifically documents that are validated against a DTD or interpreted by non-namespace-aware applications, you might be forced to use the default namespace. The biggest example of this is (X)HTML.&lt;/p&gt;

&lt;p&gt;In transformation technologies, such as &lt;a href=&quot;http://www.w3.org/Style/XSL/&quot;&gt;XSLT&lt;/a&gt;, &lt;a href=&quot;http://www.w3.org/XML/Query/&quot;&gt;XQuery&lt;/a&gt; and &lt;a href=&quot;http://www.xlinq.net/&quot;&gt;XLinq in VB.NET&lt;/a&gt;, you have to deal with at least two documents: the source documents that you are processing and the result documents that you are creating. Often, the source and result documents will use default namespaces, or at any rate you&amp;#8217;ll want to query and create the documents without using prefixes. Sometimes, the source and result documents all use the same namespace, but it&amp;#8217;s far more common that they don&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;So transformation technologies have to support at least &lt;em&gt;two&lt;/em&gt; default namespaces: one for querying and one for construction.&lt;/p&gt;

&lt;p&gt;In XPath 1.0, you must specify a prefix for each namespace you want to use. A path like &lt;code&gt;/house/layout&lt;/code&gt; will only select &lt;code&gt;&amp;lt;layout&amp;gt;&lt;/code&gt; elements in no namespace. In XSLT 1.0, the default namespace in the stylesheet (as declared by the &lt;code&gt;xmlns&lt;/code&gt; attribute on &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt;) is then free to be used for the result documents. For example, I might do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;1.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns:h=&quot;http://www.example.com/ns/house&quot;
  exclude-result-prefixes=&quot;h&quot;
  xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;h:house&quot;&amp;gt;
  &amp;lt;div class=&quot;house&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;xsl:apply-templates select=&quot;h:askingPrice&quot; /&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;[The best way to deal with multiple result documents in different default namespaces is to simply have different stylesheet documents to handle their generation, all included or imported into your main stylesheet application.]&lt;/p&gt;
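
&lt;p&gt;The same rule is easy to see outside XSLT. By default, the path syntax of Python&amp;#8217;s &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; behaves like XPath 1.0 here: an unprefixed name matches only elements in no namespace, so you bind a prefix instead. A minimal sketch, building the house document from above in code (the helper &lt;code&gt;h&lt;/code&gt; is just for brevity):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET

HOUSE = 'http://www.example.com/ns/house'


def h(local):
    # Clark notation for a name in the house namespace.
    return '{' + HOUSE + '}' + local


# The house document from the start of this post, built in code.
house = ET.Element(h('house'), status='For Sale')
for local in ('askingPrice', 'address', 'layout'):
    ET.SubElement(house, h(local)).text = '...'

# As in XPath 1.0, a bare name only matches elements in no namespace...
print(house.find('layout'))  # None
# ...so you bind a prefix, as xmlns:h does in the stylesheet.
print(house.find('h:layout', {'h': HOUSE}) is not None)  # True
```
&lt;/code&gt;&lt;/pre&gt;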

&lt;p&gt;Users of XSLT 1.0 found it confusing that they couldn&amp;#8217;t just copy the namespace declarations (including a default namespace declaration) from a sample source document and have the paths just work. So in XPath 2.0, rather than no prefix meaning no namespace, the &lt;strong&gt;default element/type namespace&lt;/strong&gt; in the context is used for element names with no prefix. If the default element/type namespace is set to &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt;, then the path &lt;code&gt;/house/layout&lt;/code&gt; will select all &lt;code&gt;&amp;lt;layout&amp;gt;&lt;/code&gt; elements in the &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt; namespace. You can set this default element/type namespace in XSLT 2.0 using the &lt;code&gt;[xsl:]xpath-default-namespace&lt;/code&gt; attribute. It usually goes on the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; element, but it can go on any element in the stylesheet: without the &lt;code&gt;xsl:&lt;/code&gt; prefix on XSLT elements, and with it on literal result elements. The default element/type namespace can be scoped to a particular area of your stylesheet in the same way as namespace declarations.&lt;/p&gt;

&lt;p&gt;Otherwise, XSLT 2.0 works like XSLT 1.0 in that the default namespace in the stylesheet supplies the default namespace for created elements, so you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
  xpath-default-namespace=&quot;http://www.example.com/ns/house&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;house&quot;&amp;gt;
  &amp;lt;div class=&quot;house&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;xsl:apply-templates select=&quot;askingPrice&quot; /&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By keeping the default query namespace and the default construction namespace separate, you&amp;#8217;re able to use unprefixed names in both paths and element constructors, even if the default namespaces in the two cases are different.&lt;/p&gt;
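
&lt;p&gt;The same separation can be sketched with Python&amp;#8217;s ElementTree, where (since Python 3.8) an entry with the empty string as its key in the &lt;code&gt;namespaces&lt;/code&gt; mapping plays the role of &lt;code&gt;xpath-default-namespace&lt;/code&gt; for paths, while constructed elements are given the XHTML namespace independently. &lt;code&gt;house_to_div&lt;/code&gt; is a hypothetical name mirroring the template above; this is an analogy, not XSLT.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET

HOUSE = 'http://www.example.com/ns/house'
XHTML = 'http://www.w3.org/1999/xhtml'

# Query side: the '' key gives unprefixed names in paths a namespace,
# much like xpath-default-namespace (needs Python 3.8 or later).
QUERY_NS = {'': HOUSE}


def house_to_div(house):
    # Construction side: result names carry the XHTML namespace,
    # chosen independently of the query namespace.
    div = ET.Element('{' + XHTML + '}div', {'class': 'house'})
    heading = ET.SubElement(div, '{' + XHTML + '}h1')
    heading.text = house.findtext('askingPrice', namespaces=QUERY_NS)
    return div


house = ET.Element('{' + HOUSE + '}house', status='For Sale')
ET.SubElement(house, '{' + HOUSE + '}askingPrice').text = '...'
div = house_to_div(house)
print(div.tag)  # {http://www.w3.org/1999/xhtml}div
```
&lt;/code&gt;&lt;/pre&gt;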

&lt;p&gt;XQuery and VB.NET, on the other hand, provide a single default namespace that is used for both queries and construction, and they work in slightly different ways.&lt;/p&gt;

&lt;p&gt;In XQuery you can declare the default namespace for the query, with&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;declare default element namespace &quot;http://www.example.com/ns/house&quot;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which means that you can query the source document with paths like &lt;code&gt;/house/askingPrice&lt;/code&gt; and create elements in the &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt; namespace with direct element constructors without prefixes, like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;house status=&quot;For Sale&quot;&amp;gt;
  &amp;lt;askingPrice&amp;gt;...&amp;lt;/askingPrice&amp;gt;
  &amp;lt;address&amp;gt;...&amp;lt;/address&amp;gt;
  &amp;lt;layout&amp;gt;...&amp;lt;/layout&amp;gt;
&amp;lt;/house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you then want to generate XHTML (or some other result for which you&amp;#8217;d prefer to use the default namespace), you can use a default namespace declaration on the XHTML you generate:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
  &amp;lt;h1&amp;gt;...&amp;lt;/h1&amp;gt;
  ...
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But the default namespace declaration in the element constructor carries through into the embedded expressions, so&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;{ /house/askingPrice }&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;won&amp;#8217;t work. As a result, you end up having to use the query&amp;#8217;s default namespace declaration to set the default namespace to XHTML (or whatever the default namespace is in the result), and use prefixes in your queries (essentially the same situation as in XSLT 1.0).&lt;/p&gt;

&lt;p&gt;In XLinq in VB.NET, there&amp;#8217;s the same kind of pattern. The &lt;code&gt;Imports&lt;/code&gt; statement allows you to declare a default namespace that&amp;#8217;s used in both queries and construction, as in:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Imports &amp;lt;xmlns=&quot;http://www.example.com/ns/house&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you can use a default namespace declaration on the XHTML you generate to provide the default namespace for the elements in the XML literal:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;houseDiv =
  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;...&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But unlike in XQuery, the default XHTML namespace declaration in the &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; &lt;em&gt;doesn&amp;#8217;t&lt;/em&gt; have an effect on the default namespace used in embedded expressions, which means you can still use unprefixed element names in any paths used within the XML literal, like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;houseDiv =
  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;%= doc.&amp;lt;house&amp;gt;.&amp;lt;askingPrice&amp;gt; %&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, if you build up your XHTML gradually, perhaps by using separate variables or methods, then every time you create a snippet of XHTML you have to specify this default namespace. Again, people will end up using the &lt;code&gt;Imports&lt;/code&gt; statement to set the default namespace to the default result namespace and using prefixes in their paths.&lt;/p&gt;

&lt;p&gt;The other factor to consider is that sometimes no prefix really does mean no namespace. If you&amp;#8217;re querying an XML document that contains elements in no namespace, you have to set the default query namespace to no namespace. In XSLT 1.0, that&amp;#8217;s always the case anyway; in XSLT 2.0, the &lt;code&gt;xpath-default-namespace&lt;/code&gt; shouldn&amp;#8217;t be set (or should be unset for those places that need to query no-namespace elements). In XQuery you can&amp;#8217;t use the query default namespace declaration, and in XLinq in VB.NET you can&amp;#8217;t use the &lt;code&gt;Imports&lt;/code&gt; statement. In both these cases, you had better hope your result is in no namespace too. If not, the best route (to make it work at all in XQuery, and to avoid repetitive &lt;code&gt;xmlns&lt;/code&gt; attributes in VB.NET) is to create a no-namespace version of your result first, and have a standard function or method that will add the right default namespace to that result.&lt;/p&gt;
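
&lt;p&gt;A sketch of that &amp;#8220;add the namespace afterwards&amp;#8221; function, written in Python with ElementTree for illustration rather than in XQuery or VB.NET: it copies a no-namespace tree, moving every unqualified element into the target namespace and leaving attributes (which are in no namespace when unprefixed) untouched. The name &lt;code&gt;in_namespace&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'


def in_namespace(elem, uri):
    # Copy a tree, putting every element that has no namespace into uri.
    # Already-qualified elements keep their names; attributes are copied as-is
    # because unprefixed attributes are in no namespace anyway. Comments and
    # processing instructions (whose tag is not a string) keep their tag.
    tag = elem.tag
    if isinstance(tag, str) and not tag.startswith('{'):
        tag = '{' + uri + '}' + tag
    copy = ET.Element(tag, dict(elem.attrib))
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(in_namespace(child, uri))
    return copy


# Build the result in no namespace first...
div = ET.Element('div', {'class': 'house'})
ET.SubElement(div, 'h1').text = '...'
# ...then add the XHTML namespace in one place.
xhtml_div = in_namespace(div, XHTML)
print(xhtml_div[0].tag)  # {http://www.w3.org/1999/xhtml}h1
```
&lt;/code&gt;&lt;/pre&gt;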
</description>
 <comments>http://www.jenitennison.com/blog/node/36#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/15">xlinq</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/29">xquery</category>
 <pubDate>Sun, 01 Jul 2007 19:32:44 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">36 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

