<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>xml</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/14</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Free Our Bills</title>
 <link>http://www.jenitennison.com/blog/node/83</link>
 <description>&lt;p&gt;The &lt;a href=&quot;http://www.theyworkforyou.com/freeourbills/&quot; title=&quot;TheyWorkForYou.com: Free Our Bills&quot;&gt;Free Our Bills&lt;/a&gt; campaign was launched recently in the UK. &lt;a href=&quot;http://www.theregister.co.uk/2008/03/26/mysociety_xml_bills_cameron/comments/#c_185029&quot; title=&quot;The Register: Comments on UK.gov urged to adopt web-friendly legislation format&quot;&gt;Some of the comments I&amp;#8217;ve seen&lt;/a&gt; about the campaign make me think that it might be helpful if people understood more about how Bills and legislation get published in the UK. I thought I&amp;#8217;d offer a bit of background based on my experience (though there are many people with more intimate knowledge of the processes involved; perhaps they&amp;#8217;ll correct me when I get it wrong).&lt;/p&gt;

&lt;!--break--&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bills are draft legislation that is under discussion within the House of Commons or House of Lords. A Bill becomes law (legislation) when it is enacted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are published by Parliament and are available on the &lt;a href=&quot;http://services.parliament.uk/bills/&quot; title=&quot;UK Parliament: Bills Before Parliament&quot;&gt;Parliament website&lt;/a&gt;. Legislation is published by &lt;a href=&quot;http://www.tso.co.uk/&quot; title=&quot;The Stationery Office&quot;&gt;The Stationery Office (TSO)&lt;/a&gt; under contract to the Office of Public Sector Information (OPSI) on the &lt;a href=&quot;http://www.opsi.gov.uk/legislation&quot; title=&quot;OPSI: Legislation&quot;&gt;OPSI website&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are changed (amended) as they progress through the Houses of Parliament. People are mostly interested in the most recent version of a Bill. Legislation can be changed (amended) by other legislation; the version of a piece of legislation with all the changes applied to it is known as consolidated legislation. Consolidated legislation is published in the &lt;a href=&quot;http://www.statutelaw.gov.uk&quot; title=&quot;Statute Law Database&quot;&gt;Statute Law Database&lt;/a&gt; as well as (to a more limited extent) on the &lt;a href=&quot;http://www.opsi.gov.uk/legislation/revised&quot; title=&quot;OPSI: Revised Legislation&quot;&gt;OPSI website&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are edited by a dedicated team of Parliament employees who must reflect the amendments that the MPs say they want to make. They use a WYSIWYG XML editor. As is usual in an environment that has been concerned only with printed copies for centuries, they tend to focus on appearance rather than semantics, even when the XML supports the semantics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Free Our Bills campaign is not about making Bills (or legislation) easier for humans to read and understand; it&amp;#8217;s about making it easier to extract information from a Bill so that people can be notified when a new Bill comes along on a subject they care about, or an old Bill is redrafted, and so on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bills are already available for the public to view on the web, in PDF and HTML forms. The problem is that the HTML is Really Really Bad (&lt;a href=&quot;http://www.publications.parliament.uk/pa/ld200708/ldbills/044/08044.i-v.html&quot; title=&quot;Parliament: Climate Change Bill&quot;&gt;View Source to see&lt;/a&gt;) and that makes it Really Really Hard to extract useful information from them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are reasons why the HTML for Bills is Really Really Bad:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;The HTML must look &lt;em&gt;exactly&lt;/em&gt; like it does in printed form, otherwise Members of Parliament (MPs) would get Really Really Confused.&lt;/li&gt;
&lt;li&gt;MPs refer to pieces of a Bill (which they might want to change) by page and line number, not by the semantic structure of the Bill, so the HTML must have page and line numbers in it or MPs would get Really Really Confused. &lt;/li&gt;
&lt;li&gt;Although the formatting of Bills is pretty consistent, there&amp;#8217;s always the chance that a piece will need to be formatted specially. It might be safe to assume a particular presentation for a particular semantic 99% of the time, but if the other 1% isn&amp;#8217;t formatted differently, MPs would be Really Really Confused.&lt;/li&gt;
&lt;li&gt;The code that creates the Bill HTML was written several years ago, when browser support for CSS was Really Really Bad.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The picture for legislation is rather better because a strategic decision was made to focus on semantics rather than presentation. When a Bill is enacted, it gets converted into &lt;a href=&quot;http://www.opsi.gov.uk/legislation/schema/&quot; title=&quot;OPSI: Legislation schema&quot;&gt;reasonably good semantic XML&lt;/a&gt;, which forms the basis of all the HTML views. It also helps that this HTML was designed fairly recently, for modern browsers; it makes heavy use of CSS so there&amp;#8217;s relatively little obfuscation of the content.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
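&lt;p&gt;To make the difference concrete, here is a minimal Python sketch of the kind of extraction the campaign wants to enable. It assumes a hypothetical semantic markup in which a Bill has a &lt;code&gt;Title&lt;/code&gt; and &lt;code&gt;Section&lt;/code&gt;s with &lt;code&gt;Heading&lt;/code&gt;s. These names are illustrative only and are not taken from the OPSI legislation schema. With markup like this, one path expression is enough to check whether a Bill touches on a subscriber&amp;#8217;s interests. With HTML built around page and line numbers, the same job means reverse-engineering the layout.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_sample_bill():
    # Hypothetical semantic markup: the element names are illustrative and
    # are not taken from the real OPSI legislation schema.
    bill = ET.Element("Bill")
    ET.SubElement(bill, "Title").text = "Climate Change Bill"
    for heading in ("Carbon targets", "Carbon budgets", "Reporting"):
        section = ET.SubElement(bill, "Section")
        ET.SubElement(section, "Heading").text = heading
    return bill


def headings(bill):
    # One path expression finds every heading; nothing has to be inferred
    # from fonts, indentation, page numbers or line numbers.
    return [h.text for h in bill.iterfind("Section/Heading")]


def interested_subscribers(bill, subscriptions):
    # subscriptions maps a subscriber to a keyword they want alerts about.
    text = " ".join([bill.findtext("Title")] + headings(bill)).lower()
    return sorted(name for name, keyword in subscriptions.items()
                  if keyword.lower() in text)


bill = build_sample_bill()
print(headings(bill))
print(interested_subscribers(bill, {"alice": "carbon", "bob": "fisheries"}))
```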

&lt;p&gt;I think there are interesting general lessons here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Different user communities have different requirements.&lt;/strong&gt; MPs have different requirements of Bills than the general public, who don&amp;#8217;t care (as) much about line or page numbers. On the other hand, you need to actually consult with users about what they need rather than make assumptions about it: are MPs really likely to get Really Really Confused if the HTML presentation of a Bill looks slightly different from the PDF print version? I don&amp;#8217;t know.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authors don&amp;#8217;t care about what they don&amp;#8217;t use.&lt;/strong&gt; When the only way of using a Bill is to print it, it&amp;#8217;s natural that authors and publishers only care about how it looks when it&amp;#8217;s printed. Training people to care about semantic markup is really hard, and it&amp;#8217;s made harder by WYSIWYG tools that allow them to override the semantic style. If a difference isn&amp;#8217;t visible, then in authors&amp;#8217; eyes it doesn&amp;#8217;t exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You have to positively decide to ignore appearance.&lt;/strong&gt; When transforming from a WYSIWYG view, replicating appearance is the obvious thing to do. But it&amp;#8217;s worthwhile in the long run to focus on extracting the semantics, because the resulting documents are so much more reusable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HTML, XML and XSLT are not inherently good.&lt;/strong&gt; Parliament wanted Bills in HTML so that they were more accessible on the web. But the HTML is dreadfully inaccessible because of the other requirements placed on it. Similarly, XML can be incredibly obfuscated, or entirely about presentation, as formats such as OOXML illustrate. And writing your code in XSLT does not make it inherently easier to maintain than (say) a SAX transformation. It&amp;#8217;s easy to misuse a technology.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developers who produce atrocious HTML aren&amp;#8217;t necessarily ignorant.&lt;/strong&gt; Unfortunately, there&amp;#8217;s sometimes a limit to how much you can argue with your customers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/83#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/37">legislation</category>
 <pubDate>Mon, 31 Mar 2008 20:10:14 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">83 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>RDF and XML Q&amp;A: Which should I use?</title>
 <link>http://www.jenitennison.com/blog/node/74</link>
 <description>&lt;p&gt;Another question to answer:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I’ve been reading about RDF, and I’m not sure in what situations it is more appropriate to use RDF over straight XML. I usually see RDF expressed as XML, but sometimes I see it written as language-independent functions (or methods).&lt;/p&gt;
  
  &lt;p&gt;Part of me is wondering if RDF is more appropriate for this project. What might the benefits be? And if it is, how difficult it would be to refactor it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(Note that the person asking the question is talking about a small data-oriented project.) There&amp;#8217;s a huge amount that could be said about this, so I might well post about some of it again. Here, I&amp;#8217;m going to cut to the chase. This is what I&amp;#8217;d recommend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model your application in RDF terms&lt;/strong&gt;: Create a description of what classes of resources your application needs to deal with, and which properties link those together. You can call this description an RDF schema or conceptual model or ontology, depending on how impressive you want to sound. This modelling activity is useful in itself, largely because it helps you understand what information you’re dealing with and how it fits together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a markup language that can be mapped to RDF&lt;/strong&gt;: An XML version of your data makes it more generally available and reusable than it would be if it were locked away in a triple store. Do one of the following:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;Define a subset of &lt;a href=&quot;http://www.w3.org/TR/rdf-syntax-grammar/&quot; title=&quot;W3C Recommendation: RDF/XML Syntax Specification&quot;&gt;RDF/XML&lt;/a&gt; for your application&lt;/strong&gt;: The full flexibility of RDF/XML is complicated for plain XML processors to handle, so subset it. For example, always use typed elements (such as &lt;code&gt;&amp;lt;my:Course&amp;gt;&lt;/code&gt;) rather than &lt;code&gt;rdf:type&lt;/code&gt; properties, and use referencing or nesting consistently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design markup languages that use &lt;a href=&quot;http://www.w3.org/TR/xhtml-rdfa-primer/&quot; title=&quot;W3C Working Draft: RDFa Primer&quot;&gt;RDFa&lt;/a&gt; attributes to reflect the semantics of the data&lt;/strong&gt;: This gives you a standard way of mapping your markup language into RDF triples without having to adopt the &amp;#8220;striped&amp;#8221; design of RDF/XML in your markup language. A lot of the attributes can be defaulted to leave the markup language fairly streamlined.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design markup languages exactly as you like, and define &lt;a href=&quot;http://www.w3.org/TR/grddl/&quot; title=&quot;W3C Recommendation: Gleaning Resource Descriptions from Dialects of Languages (GRDDL)&quot;&gt;GRDDL&lt;/a&gt; mappings from them into RDF/XML&lt;/strong&gt;: This gives you the most flexibility in your markup language design (though not complete flexibility &amp;#8212; you still need to be able to identify the statements that you want to make from the XML), at the expense of having to write some XSLT.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
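&lt;p&gt;The following Python sketch illustrates the first markup option. It uses nothing more than a plain XML parser to read a restricted subset of RDF/XML into triples. In that subset, each node is a typed element with &lt;code&gt;rdf:about&lt;/code&gt;, and its children are properties that hold either text or an &lt;code&gt;rdf:resource&lt;/code&gt; reference, so the subset consistently references rather than nests. The &lt;code&gt;http://example.org/my#&lt;/code&gt; vocabulary and the course data are hypothetical. The sketch deliberately ignores everything else RDF/XML allows, such as &lt;code&gt;rdf:Description&lt;/code&gt;, nested nodes, property attributes and datatypes.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MY = "http://example.org/my#"  # hypothetical application vocabulary


def uri(tag):
    # ElementTree names look like "{namespace}local"; RDF/XML forms the
    # class or property URI by concatenating the namespace and local name.
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples(rdf_root):
    # Only the agreed subset is handled: each child of rdf:RDF is a typed
    # node element with rdf:about, and each of its children is a property
    # whose value is either text (a literal) or an rdf:resource reference.
    # This sketch does not distinguish literals from URIs in its output.
    result = []
    for node in rdf_root:
        subject = node.get("{%s}about" % RDF)
        result.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            target = prop.get("{%s}resource" % RDF)
            value = target if target is not None else (prop.text or "")
            result.append((subject, uri(prop.tag), value))
    return result


# Build the equivalent of a typed my:Course element in RDF/XML.
root = ET.Element("{%s}RDF" % RDF)
course = ET.SubElement(root, "{%s}Course" % MY,
                       {"{%s}about" % RDF: "http://example.org/courses/xslt101"})
ET.SubElement(course, "{%s}title" % MY).text = "Introduction to XSLT"
ET.SubElement(course, "{%s}tutor" % MY,
              {"{%s}resource" % RDF: "http://example.org/people/tutor1"})

for triple in triples(root):
    print(triple)
```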

&lt;p&gt;The point of doing this is to put you in a position where you &lt;em&gt;can&lt;/em&gt; just use XML if you want, but you also have the flexibility of using RDF either now or in the future.&lt;/p&gt;

&lt;p&gt;The benefits of using RDF are partly to do with the ease with which you can do certain kinds of processing (specifically combining &amp;#8220;facts&amp;#8221; together to draw conclusions) and partly to do with the potential for reusing your data. In the same way that XML gives people a common &lt;em&gt;syntax&lt;/em&gt; and thus aids interchange of information, RDF allows others to draw &lt;em&gt;some&lt;/em&gt; conclusions (more than they would with a random mess of elements and attributes) about what your data means.&lt;/p&gt;
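&lt;p&gt;Here is a tiny example of what &amp;#8220;combining facts to draw conclusions&amp;#8221; can mean in practice. The Python sketch below merges triples from two independent sources and applies a single RDFS rule until no new facts appear: if X has type C and C is a subclass of D, then X also has type D. The &lt;code&gt;ex:&lt;/code&gt; names are hypothetical and are written as prefixed strings for brevity. A real application would use full URIs and an RDF library rather than this hand-rolled loop.&lt;/p&gt;

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(*sources):
    # Merge every source into one set of (subject, predicate, object) facts,
    # then apply the RDFS subclass rule repeatedly until nothing new appears:
    # X rdf:type C  +  C rdfs:subClassOf D  =>  X rdf:type D
    facts = set()
    for source in sources:
        facts |= set(source)
    while True:
        derived = {(x, RDF_TYPE, d)
                   for (x, p, c) in facts if p == RDF_TYPE
                   for (c2, q, d) in facts if q == SUBCLASS_OF and c2 == c}
        if derived.issubset(facts):
            return facts
        facts |= derived


# One source describes a course; another, written independently,
# describes how the classes relate to each other.
course_data = [("ex:xslt101", RDF_TYPE, "ex:Course")]
class_data = [("ex:Course", SUBCLASS_OF, "ex:LearningActivity"),
              ("ex:LearningActivity", SUBCLASS_OF, "ex:Event")]

facts = infer_types(course_data, class_data)
print(sorted(f for f in facts if f[0] == "ex:xslt101"))
```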

&lt;p&gt;I don&amp;#8217;t think that using RDF triple stores, &lt;a href=&quot;http://www.w3.org/TR/rdf-sparql-query/&quot; title=&quot;W3C Recommendation: SPARQL Query Language for RDF&quot;&gt;SPARQL&lt;/a&gt; and all that jazz gives you a great return for a small-scale, personal project &amp;#8212; you&amp;#8217;re better off sticking to flat files and some XSLT &amp;#8212; but it doesn&amp;#8217;t hurt to build in some of the formality of RDF anyway.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/74#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sun, 17 Feb 2008 20:10:12 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">74 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Posterity</title>
 <link>http://www.jenitennison.com/blog/node/59</link>
 <description>&lt;p&gt;We just had photos taken of the children, and it&amp;#8217;s put me in a reflective mood. &lt;a href=&quot;http://norman.walsh.name/2007/10/15/ajax&quot; title=&quot;Norm Walsh: A little bit of Ajax&quot;&gt;Norm posted&lt;/a&gt; the other day about his experience with information/task management products:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Then it hit me.&lt;/p&gt;
  
  &lt;p&gt;None of them, with the notable exception of Tinderbox, seem to store the data in any open format. I was seriously considering one of these commercial black boxes for an important chunk of the data that drives my day-to-day life. The little voice in my head reacted viscerally when the observation was made: “What the hell you thinking, man! Stop that!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;!--break--&gt;

&lt;p&gt;I&amp;#8217;ve been experimenting a bit with &lt;a href=&quot;http://en.wikipedia.org/wiki/Getting_Things_Done&quot; title=&quot;Wikipedia: Getting Things Done&quot;&gt;GTD&lt;/a&gt; applications recently, and have the same reaction as Norm. The two tools that I&amp;#8217;ve tried, &lt;a href=&quot;http://www.thinkingrock.com.au/&quot; title=&quot;ThinkingRock: Free GTD software&quot;&gt;ThinkingRock&lt;/a&gt; and &lt;a href=&quot;http://freemind.sourceforge.net/wiki/index.php/Main_Page&quot; title=&quot;FreeMind: free mind-mapping software&quot;&gt;FreeMind&lt;/a&gt;, can both export to an XML format (and import too), which is great, no doubt about it, but you&amp;#8217;ve got to remember to do the exporting to take advantage of it. I know me: I just won&amp;#8217;t do it (even with my GTD tool to remind me to). What I really want is an application that &lt;em&gt;natively&lt;/em&gt; stores its data as XML, preferably in some nicely structured, standard format. So even after I&amp;#8217;ve wiped the original application off my computer, or moved the file from one computer to another, I can still read that file and (with a little XSLT magic) load it into something else.&lt;/p&gt;

&lt;p&gt;This is possibly the biggest thing that bugs me about most of the Web 2.0 applications out there. Of course I&amp;#8217;ve got to be connected to the &amp;#8216;net to use them, and I&amp;#8217;m not all the time (most particularly at tech conferences, it seems). But more important, they&amp;#8217;ve got my data tucked away in their databases, out of reach. Some of them will let me export it, or get at it through an API, but that isn&amp;#8217;t enough for me. I want it here, so that even if the company folds or I forget my login and password, or the key I used to encrypt my personal data from potentially prying eyes&amp;#8230; even &lt;em&gt;years&lt;/em&gt; later, I can still read that file. It&amp;#8217;s part of my history, but I won&amp;#8217;t remember to keep it until it&amp;#8217;s gone.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m thinking, you see, about some of the things I did on computers years ago. A half-finished book I wrote when I was about 18. The code I wrote for my PhD. Letters from my university days. These aren&amp;#8217;t from &lt;em&gt;that&lt;/em&gt; long ago, but now here I am using radically different software, in a completely different world, and these pieces from my past are lost, irretrievable because of the formats used to save them (as well as the hardware on which they&amp;#8217;re saved: it&amp;#8217;s getting harder to read a floppy nowadays).&lt;/p&gt;

&lt;p&gt;I read &lt;a href=&quot;http://www.amazon.com/Glasshouse-Charles-Stross/dp/0441015085&quot; title=&quot;Amazon: Glasshouse by Charles Stross&quot;&gt;Glasshouse by Charles Stross&lt;/a&gt; a couple of months ago (well worth the read). It&amp;#8217;s set in the far future, and contains the following passage:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&amp;#8220;We know why the dark age happened,&amp;#8221; Fiore continues. &amp;#8220;Our ancestors allowed their storage and processing architectures to proliferate uncontrollably, and they tended to throw away old technologies instead of virtualizing them. For reasons of commercial advantage, some of their largest entities deliberately created incompatible information formats and locked up huge quantities of useful material in them, so that when new architectures replaced old, the data became inaccessible.&lt;/p&gt;
  
  &lt;p&gt;&amp;#8220;This particularly affected our records of personal and household activities during the latter half of the dark age. Early on, for example, we have a lot of &lt;em&gt;film&lt;/em&gt; data captured by amateurs and home enthusiasts. They used a thing called a cine camera, which captured images on a photochemical medium. You could actually decode it with your eyeball. But a third of the way into the dark age, they switched to using magnetic storage tape, which degrades rapidly, then to digital storage, which was even worse because for no obvious reason they encrypted everything. The same sort of things happened to their audio recordings, and to text. Ironically, we know a lot more about their culture around the beginning of the dark age, around old-style year 1950, than about the end of the dark age, around 2040.&amp;#8221;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I&amp;#8217;m looking forward to the end of the dark age. In the meantime, the photos of the children will be hardcopies in the shoebox at the bottom of the wardrobe. And I think I might try the &lt;a href=&quot;http://www.flickr.com/photos/jazzmasterson/sets/48077/&quot; title=&quot;Flickr photoset: Getting Things Done with Index Cards&quot;&gt;index card version&lt;/a&gt; of GTD.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/59#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/33">gtd</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/12">web</category>
 <pubDate>Wed, 17 Oct 2007 23:08:29 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">59 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Web 2.0 Project: Using Atom and XML with Graph Data Structures</title>
 <link>http://www.jenitennison.com/blog/node/54</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.louisecrow.com/blog/&quot; title=&quot;Louise Crow&#039;s blog&quot;&gt;A Ruby on Rails specialist friend&lt;/a&gt; and I are building a Web 2.0 application. I would say it&amp;#8217;s &amp;#8220;social networking for the dead&amp;#8221; except that I doubt that description would be attractive to most people (my ex-Goth &lt;a href=&quot;http://en.wikipedia.org/wiki/Domestic_partnership&quot; title=&quot;Wikipedia: domestic partner/common law husband/father of my children etc. etc.&quot;&gt;de facto&lt;/a&gt; being a rare exception), and it can be for the living too. It&amp;#8217;s a bit like &lt;a href=&quot;http://www.ancestry.com/&quot; title=&quot;ancestry.com&quot;&gt;all&lt;/a&gt; &lt;a href=&quot;http://www.familypursuit.com/&quot; title=&quot;familypursuit.com&quot;&gt;those&lt;/a&gt; &lt;a href=&quot;http://www.geni.com/&quot; title=&quot;geni.com&quot;&gt;genealogy&lt;/a&gt; websites, except that our focus is on people&amp;#8217;s social relationships as well as their familial ones.&lt;/p&gt;

&lt;p&gt;(I should say that this is all very casual. We&amp;#8217;re both fitting it in around our other responsibilities, and are mainly interested in working together, learning new things, and trying out all the best practices that everyone keeps talking about. So don&amp;#8217;t think I&amp;#8217;m becoming a dotcom entrepreneur or anything. It&amp;#8217;s got a very Web 2.0 name, which I&amp;#8217;m keeping to myself only because I don&amp;#8217;t want you hitting our servers. We&amp;#8217;re nowhere near ready for visitors.)&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;We&amp;#8217;re using the &lt;a href=&quot;http://www.ngsgenealogy.org/ngsgentech/projects/Gdm/Gdm.cfm&quot; title=&quot;GENTECH genealogical data model&quot;&gt;Gentech data model&lt;/a&gt; as the basis for the application (though I expect that we&amp;#8217;ll tweak it a bit). You don&amp;#8217;t really need to know anything about it to follow what I&amp;#8217;m talking about here. The Gentech data model is very much a relational model. They might call it a logical model, but for anyone who &lt;em&gt;isn&amp;#8217;t&lt;/em&gt; a database head, it&amp;#8217;s a physical model. That&amp;#8217;s fine; we&amp;#8217;re storing our data in a database, so a relational model for that is great.&lt;/p&gt;

&lt;p&gt;In the Rails world, the model that Rails uses is object-oriented rather than relational. So there&amp;#8217;s a certain amount of mapping from the relational world into the OO world, in particular eliding the tables that are created simply for normalisation purposes. Making that mapping is one thing that Rails is very good at, of course.&lt;/p&gt;

&lt;p&gt;Then we&amp;#8217;re into the worlds that I&amp;#8217;m particularly interested in. One of our goals is to use &lt;a href=&quot;http://en.wikipedia.org/wiki/Atom_(standard)&quot; title=&quot;Wikipedia: Atom&quot;&gt;Atom&lt;/a&gt; as an API, on the basis that it&amp;#8217;s a fairly generic way of packaging things (entries) and lists-of-things (feeds) with a bunch of metadata. Plus, the &lt;a href=&quot;http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-17.txt&quot; title=&quot;Atom Publishing Protocol&quot;&gt;Atom Publishing Protocol&lt;/a&gt; shows you how to do RESTful applications right.&lt;/p&gt;

&lt;p&gt;The trouble, &lt;a href=&quot;http://code.google.com/apis/gdata/overview.html&quot; title=&quot;Google Data (GData) API&quot;&gt;as others&lt;/a&gt; &lt;a href=&quot;http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx&quot; title=&quot;Dare Obasanjo: Why GData/APP Fails as a General Purpose Editing Protocol for the Web&quot;&gt;have found&lt;/a&gt;, is that Atom is designed for a flattish structure, in which you have things and lists of things, like blog posts and feeds of posts, or pictures and feeds of pictures. But the model that we&amp;#8217;re starting from is relational, or object-oriented, or anyway it&amp;#8217;s a &lt;strong&gt;graph&lt;/strong&gt;. And that makes things more complicated.&lt;/p&gt;

&lt;p&gt;The first steps are pretty obvious. Objects are equivalent to entries, and lists of objects equivalent to feeds. So every object has its own URL, and every significant feed has its own URL too. There&amp;#8217;s the obvious &lt;code&gt;http://www.example.com/people/DarwinC01&lt;/code&gt; for a person, and &lt;code&gt;http://www.example.com/people/&lt;/code&gt; for a feed of people, but also &lt;code&gt;http://www.example.com/people/DarwinC01/events/&lt;/code&gt; for events that are related to a particular person. An entry&amp;#8217;s content is an XML document that describes the equivalent object. It has attributes and children to represent the properties from the OO model (columns in the database tables).&lt;/p&gt;
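&lt;p&gt;The URL scheme is regular enough to capture in a couple of helper functions. This Python sketch is illustrative only. &lt;code&gt;entry_url&lt;/code&gt; and &lt;code&gt;feed_url&lt;/code&gt; are hypothetical names, and &lt;code&gt;http://www.example.com&lt;/code&gt; stands in for the real host.&lt;/p&gt;

```python
from urllib.parse import quote

BASE = "http://www.example.com"  # placeholder host, as in the examples here


def entry_url(collection, object_id):
    # Every object gets its own URL, e.g. /people/DarwinC01
    return "%s/%s/%s" % (BASE, collection, quote(object_id))


def feed_url(collection, owner=None):
    # A feed of every object in a collection (e.g. /people/), or of the
    # objects related to one object (e.g. /people/DarwinC01/events/).
    if owner is None:
        return "%s/%s/" % (BASE, collection)
    owner_collection, owner_id = owner
    return "%s/%s/" % (entry_url(owner_collection, owner_id), collection)


print(entry_url("people", "DarwinC01"))
print(feed_url("people"))
print(feed_url("events", owner=("people", "DarwinC01")))
```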

&lt;p&gt;Atom defines a bunch of metadata that you can associate with the content in an entry. These are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;id&lt;/li&gt;
&lt;li&gt;title&lt;/li&gt;
&lt;li&gt;summary (optional, as long as there&amp;#8217;s textual or XML content)&lt;/li&gt;
&lt;li&gt;updated&lt;/li&gt;
&lt;li&gt;published (optional)&lt;/li&gt;
&lt;li&gt;category (multiple, optional)&lt;/li&gt;
&lt;li&gt;source (optional)&lt;/li&gt;
&lt;li&gt;author (multiple, optional as long as there&amp;#8217;s a source that specifies one or the entry&amp;#8217;s in a feed that specifies one)&lt;/li&gt;
&lt;li&gt;contributor (multiple, optional)&lt;/li&gt;
&lt;li&gt;link (multiple, optional as long as there&amp;#8217;s some content)&lt;/li&gt;
&lt;li&gt;rights (optional, defaults to the feed&amp;#8217;s rights)&lt;/li&gt;
&lt;li&gt;extension elements (optional)&lt;/li&gt;
&lt;/ul&gt;
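&lt;p&gt;The rules in this list can be checked mechanically. The Python sketch below reports which required metadata an &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt; is missing. It follows the simplified rules listed above rather than every clause of RFC 4287. For example, it ignores the &lt;code&gt;summary&lt;/code&gt; requirement for non-textual content and does not check that a link has &lt;code&gt;rel=&quot;alternate&quot;&lt;/code&gt;. The function name and the &lt;code&gt;feed_has_author&lt;/code&gt; flag are hypothetical.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def missing_metadata(entry, feed_has_author=False):
    # id, title and updated are always required.
    missing = [name for name in ("id", "title", "updated")
               if entry.find(ATOM + name) is None]
    # An author is needed unless the entry's source, or the feed that
    # contains the entry, already specifies one.
    has_author = (entry.find(ATOM + "author") is not None
                  or entry.find(ATOM + "source/" + ATOM + "author") is not None
                  or feed_has_author)
    if not has_author:
        missing.append("author")
    # An entry without content must have at least one link.
    if entry.find(ATOM + "content") is None and entry.find(ATOM + "link") is None:
        missing.append("link")
    return missing


entry = ET.Element(ATOM + "entry")
for name, text in (("id", "http://www.example.com/people/DarwinC01"),
                   ("title", "Charles Darwin"),
                   ("updated", "2007-09-01T12:00:00Z")):
    ET.SubElement(entry, ATOM + name).text = text
ET.SubElement(entry, ATOM + "content", {"type": "application/xml"})

print(missing_metadata(entry))                        # the author is missing
print(missing_metadata(entry, feed_has_author=True))  # nothing is missing
```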

&lt;p&gt;The metadata properties need to be used to indicate who created/updated the object and when. This gets confusing because some of the information in our system is likely to be &lt;em&gt;about&lt;/em&gt; content that has authors and publishing dates and so on: the Gentech data model is strong on documenting the sources of information about people you&amp;#8217;re researching. Even when documenting the source of some information, the Atom metadata should still be metadata about that object in our data model.&lt;/p&gt;

&lt;p&gt;The set of Atom metadata does indicate a place where we&amp;#8217;re going to want to tweak the Gentech data model though: every object should have metadata associated with it, at the very least an updated date, to populate the Atom metadata fields. Also, we need to identify the property of each object that is used in the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt;, though the title can be something generic if there isn&amp;#8217;t an obvious one.&lt;/p&gt;
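&lt;p&gt;One way to meet both requirements is to store an updated timestamp on every object and to keep a per-class table naming the property that supplies &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt;, with a generic fallback. The Python sketch below assumes objects arrive as dictionaries with &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;updated&lt;/code&gt; keys. The &lt;code&gt;TITLE_PROPERTY&lt;/code&gt; table, the &lt;code&gt;to_entry&lt;/code&gt; function and the example URLs are hypothetical. They are not part of Rails or of the Gentech model.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical table: which property of each class supplies the atom:title.
TITLE_PROPERTY = {"persona": "name", "event": "name"}


def to_entry(kind, obj, url):
    entry = ET.Element(ATOM + "entry")
    ET.SubElement(entry, ATOM + "id").text = url
    prop = TITLE_PROPERTY.get(kind)
    title = obj.get(prop) if prop else None
    # Fall back to a generic title when a class has no obvious one.
    ET.SubElement(entry, ATOM + "title").text = (
        title or "%s %s" % (kind.capitalize(), obj["id"]))
    # Every object now carries an updated timestamp (RFC 3339 format).
    ET.SubElement(entry, ATOM + "updated").text = obj["updated"]
    return entry


darwin = {"id": "DarwinC01", "name": "Charles Darwin",
          "updated": "2007-09-01T12:00:00Z"}
passenger = {"id": "P1", "updated": "2007-09-02T09:30:00Z"}
darwin_entry = to_entry("persona", darwin, "http://www.example.com/persona/DarwinC01")
passenger_entry = to_entry("passenger", passenger, "http://www.example.com/passengers/P1")
print(darwin_entry.findtext(ATOM + "title"))     # Charles Darwin
print(passenger_entry.findtext(ATOM + "title"))  # Passenger P1
```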

&lt;p&gt;Now the question that&amp;#8217;s vexing me: how should we represent relationships to other objects/entries? Let&amp;#8217;s take the example of documenting &lt;a href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; title=&quot;About Darwin: HMS Beagle Voyage&quot;&gt;Charles Darwin&amp;#8217;s voyage on HMS Beagle&lt;/a&gt;. It goes something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      ...
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;passenger&amp;gt;&lt;/code&gt; element needs to reference a person and a voyage (event), to say that Darwin was a passenger on the voyage.&lt;/p&gt;

&lt;p&gt;Here are the options, I think:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;, with a URL in the &lt;code&gt;rel&lt;/code&gt; attribute that indicates the kind of relationship&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use extension elements within the &lt;code&gt;&amp;lt;atom:entry&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, referencing the URLs of the related objects&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona href=&quot;/persona/DarwinC01&quot; /&amp;gt;
      &amp;lt;event href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; Atom entry or feed&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Charles Darwin&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
            &amp;lt;persona&amp;gt;
              &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
              ...
            &amp;lt;/persona&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;atom:entry&amp;gt;
          ...
          &amp;lt;atom:title&amp;gt;Beagle Voyage&amp;lt;/atom:title&amp;gt;
          &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
            &amp;lt;event&amp;gt;
              &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
              &amp;lt;date-range&amp;gt;
                &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
                &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
              &amp;lt;/date-range&amp;gt;
              ...
            &amp;lt;/event&amp;gt;
          &amp;lt;/atom:content&amp;gt;
        &amp;lt;/atom:entry&amp;gt;
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use child elements in the object&amp;#8217;s XML, embedding the related objects&amp;#8217; XML content&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I don&amp;#8217;t think that there&amp;#8217;s any point in using an extension element (#2), given that using &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; (#1) situates the information in the same place in a more standard way.&lt;/p&gt;

&lt;p&gt;Embedding information (as in #4 and #5) is a good thing because it means fewer requests to the server to get some useful information. Providing access to Atom feeds (as in #1, #3 and #4) is a good thing because it means you can get metadata about who created the referenced objects, and additional information about them. So #4 is good, since it does both these things, but I don&amp;#8217;t like embedding Atom in the XML because it adds a lot of extra weight to the XML (making it harder to read and process).&lt;/p&gt;

&lt;p&gt;In fact, #1, #3 and #5 aren&amp;#8217;t mutually exclusive. It&amp;#8217;s possible to add relevant &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s to the metadata, reference the URLs of the other objects &lt;em&gt;and&lt;/em&gt; embed their content at the same time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;atom:entry xml:base=&quot;http://www.example.com&quot;&amp;gt;
  ...
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-persona&quot;
             href=&quot;/persona/DarwinC01&quot; /&amp;gt;
  &amp;lt;atom:link rel=&quot;/link-relationships/assertion-event&quot;
             href=&quot;/events/BeagleVoyage&quot; /&amp;gt;
  &amp;lt;atom:content type=&quot;application/xml&quot;&amp;gt;
    &amp;lt;passenger&amp;gt;
      &amp;lt;source href=&quot;http://www.aboutdarwin.com/voyage/voyage01.html&quot; /&amp;gt;
      &amp;lt;persona src=&quot;/persona/DarwinC01&quot;&amp;gt;
        &amp;lt;name&amp;gt;Charles Darwin&amp;lt;/name&amp;gt;
        ...
      &amp;lt;/persona&amp;gt;
      &amp;lt;event src=&quot;/events/BeagleVoyage&quot;&amp;gt;
        &amp;lt;name&amp;gt;Beagle Voyage&amp;lt;/name&amp;gt;
        &amp;lt;date-range&amp;gt;
          &amp;lt;date&amp;gt;1831-12-27&amp;lt;/date&amp;gt;
          &amp;lt;date&amp;gt;1836-10-02&amp;lt;/date&amp;gt;
        &amp;lt;/date-range&amp;gt;
        ...
      &amp;lt;/event&amp;gt;
    &amp;lt;/passenger&amp;gt;
  &amp;lt;/atom:content&amp;gt;
&amp;lt;/atom:entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We embed the core information for easy access (#5), reference its original URI for more details (#3), and then we may as well add the &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt;s (#1) so that run-of-the-mill Atom readers that know nothing about our content can still do something useful. We &lt;em&gt;don&amp;#8217;t&lt;/em&gt; get the metadata embedded in the XML, but it&amp;#8217;s retrievable: a client could treat the entry as a kind of &amp;#8220;low resolution&amp;#8221; information set, and add to it as necessary by retrieving the &amp;#8220;high resolution&amp;#8221; Atom for the referenced objects via their URLs.&lt;/p&gt;
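&lt;p&gt;Here is what such a generic client could do with the combined entry without understanding the &lt;code&gt;&amp;lt;passenger&amp;gt;&lt;/code&gt; content. It finds the &lt;code&gt;&amp;lt;atom:link&amp;gt;&lt;/code&gt; elements and resolves each &lt;code&gt;href&lt;/code&gt; against the entry&amp;#8217;s &lt;code&gt;xml:base&lt;/code&gt;. This is a minimal Python sketch using the standard library&amp;#8217;s &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; and &lt;code&gt;urllib.parse&lt;/code&gt;, not part of our implementation. The entry is built in code rather than parsed, keeps only the links, and the &lt;code&gt;linked_urls()&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"  # xml:base in Clark notation

# A cut-down version of the combined entry: just xml:base and the two links.
entry = ET.Element("{%s}entry" % ATOM, {XML_BASE: "http://www.example.com"})
for rel, href in [("/link-relationships/assertion-persona", "/persona/DarwinC01"),
                  ("/link-relationships/assertion-event", "/events/BeagleVoyage")]:
    ET.SubElement(entry, "{%s}link" % ATOM, rel=rel, href=href)


def linked_urls(entry):
    """Return (rel, absolute URL) pairs for the entry's atom:link children.

    Only the entry's own xml:base is considered; a fuller client would also
    honour xml:base on enclosing elements.
    """
    base = entry.get(XML_BASE, "")
    return [(link.get("rel"), urljoin(base, link.get("href")))
            for link in entry.findall("atom:link", {"atom": ATOM})]


for rel, url in linked_urls(entry):
    print(rel, url)
```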

&lt;p&gt;The problem with using an embedding method rather than a referencing method is that the object model is a graph, not a hierarchy. So you can&amp;#8217;t &lt;em&gt;always&lt;/em&gt; embed an object&amp;#8217;s XML: sometimes you have to use only a reference (#3 without #5) to avoid getting into an endless loop of repeated information. As a publisher, you might sometimes &lt;em&gt;want&lt;/em&gt; to use only a reference, because the information is only tangential to the main subject of the original entry. I&amp;#8217;m imagining that we might serve several different Atom entries for the same object, with different amounts of detail. Maybe.&lt;/p&gt;
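&lt;p&gt;One way to avoid the endless loop is to keep track of which objects are already being embedded. In this Python sketch, the &lt;code&gt;objects&lt;/code&gt; dictionary and the &lt;code&gt;serialize()&lt;/code&gt; function are hypothetical illustrations, not part of any real system. Each object&amp;#8217;s XML is embedded the first time it is reached on a path, marked with a &lt;code&gt;src&lt;/code&gt; attribute. Any object already being embedded further up the tree is emitted as a bare &lt;code&gt;href&lt;/code&gt; reference instead.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Hypothetical object graph keyed by URI. Darwin links to the voyage and
# the voyage links back to Darwin, so this is a graph, not a tree.
objects = {
    "/persona/DarwinC01": {"tag": "persona",
                           "fields": {"name": "Charles Darwin"},
                           "links": ["/events/BeagleVoyage"]},
    "/events/BeagleVoyage": {"tag": "event",
                             "fields": {"name": "Beagle Voyage"},
                             "links": ["/persona/DarwinC01"]},
}


def serialize(uri, path=frozenset()):
    """Embed the object at uri, recursing into the objects it links to.

    path holds the URIs already being embedded further up the tree; if uri
    is among them, only a reference is emitted, which breaks the cycle.
    """
    obj = objects[uri]
    if uri in path:
        return ET.Element(obj["tag"], href=uri)  # reference only
    elem = ET.Element(obj["tag"], src=uri)       # embedded content
    for name, value in obj["fields"].items():
        ET.SubElement(elem, name).text = value
    for target in obj["links"]:
        elem.append(serialize(target, path | {uri}))
    return elem


print(ET.tostring(serialize("/persona/DarwinC01"), encoding="unicode"))
```

&lt;p&gt;This sketch tracks only the current path, so an object reached by two different routes is embedded twice. A publisher wanting less detail could share one set across the whole traversal, or add a depth limit.&lt;/p&gt;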

&lt;p&gt;As an author constructing XML for new objects (either in code or by hand), you can&amp;#8217;t include references to them, because they won&amp;#8217;t have URLs yet. Therefore, when &lt;em&gt;creating&lt;/em&gt; objects with the Atom Publishing Protocol, you&amp;#8217;ll use embedded XML (#5), with references to existing objects where necessary. The resource returned will include the references for all the created objects. When updating, I imagine you&amp;#8217;ll want to include as little as possible aside from the updated information (small updates being less prone to clashes than large ones).&lt;/p&gt;

&lt;p&gt;By the way, I&amp;#8217;m using &lt;code&gt;src&lt;/code&gt; attributes when the information is embedded and &lt;code&gt;href&lt;/code&gt; attributes when the information is purely referenced (or almost purely referenced; the referencing elements might still have some content equivalent to the &lt;code&gt;&amp;lt;atom:title&amp;gt;&lt;/code&gt; element, in the interests of presenting a clickable link).&lt;/p&gt;

&lt;p&gt;So that&amp;#8217;s the plan at the moment, but we&amp;#8217;re open to suggestions. Anybody?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/54#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/18">atom</category>
 <pubDate>Sun, 02 Sep 2007 20:57:36 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">54 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The perils of default namespaces</title>
 <link>http://www.jenitennison.com/blog/node/36</link>
 <description>&lt;p&gt;A lot of people run into problems with namespaces, and most of those arise from using default namespaces (ie not giving namespaces prefixes). The transformation technology you use can have a big effect on how confusing and irritating it gets.&lt;/p&gt;

&lt;p&gt;Default namespaces make XML documents easier to read because they allow you to just give the local name of an element rather than using prefixes all over the place. For example, using:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;house status=&quot;For Sale&quot; xmlns=&quot;http://www.example.com/ns/house&quot;&amp;gt;
  &amp;lt;askingPrice&amp;gt;...&amp;lt;/askingPrice&amp;gt;
  &amp;lt;address&amp;gt;...&amp;lt;/address&amp;gt;
  &amp;lt;layout&amp;gt;...&amp;lt;/layout&amp;gt;
&amp;lt;/house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;!--break--&gt;

&lt;p&gt;rather than:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;h:house status=&quot;For Sale&quot; xmlns:h=&quot;http://www.example.com/ns/house&quot;&amp;gt;
  &amp;lt;h:askingPrice&amp;gt;...&amp;lt;/h:askingPrice&amp;gt;
  &amp;lt;h:address&amp;gt;...&amp;lt;/h:address&amp;gt;
  &amp;lt;h:layout&amp;gt;...&amp;lt;/h:layout&amp;gt;
&amp;lt;/h:house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In some cases, specifically documents that are validated against a DTD or interpreted by non-namespace-aware applications, you might be forced to use the default namespace. The biggest example of this is (X)HTML.&lt;/p&gt;

&lt;p&gt;In transformation technologies, such as &lt;a href=&quot;http://www.w3.org/Style/XSL/&quot;&gt;XSLT&lt;/a&gt;, &lt;a href=&quot;http://www.w3.org/XML/Query/&quot;&gt;XQuery&lt;/a&gt; and &lt;a href=&quot;http://www.xlinq.net/&quot;&gt;XLinq in VB.NET&lt;/a&gt;, you have to deal with at least two kinds of document: the source documents that you are processing and the result documents that you are creating. Often, the source and result documents will use default namespaces, or at any rate you&amp;#8217;ll want to query and create the documents without using prefixes. Sometimes the source and result documents all use the same namespace, but far more often they don&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;So transformation technologies have to support at least &lt;em&gt;two&lt;/em&gt; default namespaces: one for querying and one for construction.&lt;/p&gt;

&lt;p&gt;In XPath 1.0, you must specify a prefix for each namespace you want to use. A path like &lt;code&gt;/house/layout&lt;/code&gt; selects only &lt;code&gt;&amp;lt;layout&amp;gt;&lt;/code&gt; elements in no namespace. In XSLT 1.0, the default namespace in the stylesheet (as declared by the &lt;code&gt;xmlns&lt;/code&gt; attribute on &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt;) is then free to be used for the result documents. For example, I might write:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;1.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns:h=&quot;http://www.example.com/ns/house&quot;
  exclude-result-prefixes=&quot;h&quot;
  xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;h:house&quot;&amp;gt;
  &amp;lt;div class=&quot;house&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;xsl:apply-templates select=&quot;h:askingPrice&quot; /&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;[The best way to deal with multiple result documents in different default namespaces is to simply have different stylesheet documents to handle their generation, all included or imported into your main stylesheet application.]&lt;/p&gt;

&lt;p&gt;Users of XSLT 1.0 found it confusing that they couldn&amp;#8217;t just copy the namespace declarations (including a default namespace declaration) from a sample source document and have the paths just work. So in XPath 2.0, rather than no prefix meaning no namespace, the &lt;strong&gt;default element/type namespace&lt;/strong&gt; in the context is used for element names with no prefix. If the default element/type namespace is set to &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt; then the path &lt;code&gt;/house/layout&lt;/code&gt; will select all &lt;code&gt;&amp;lt;layout&amp;gt;&lt;/code&gt; elements in the &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt; namespace. You can set this default element/type namespace in XSLT 2.0 using the &lt;code&gt;[xsl:]xpath-default-namespace&lt;/code&gt; attribute, which can go anywhere but will usually be situated on the &lt;code&gt;&amp;lt;xsl:stylesheet&amp;gt;&lt;/code&gt; element (in which case it appears without the &lt;code&gt;xsl:&lt;/code&gt; prefix). The default element/type namespace can be scoped to a particular area of your stylesheet in the same way as namespace declarations.&lt;/p&gt;

&lt;p&gt;Otherwise, XSLT 2.0 works like XSLT 1.0 in that the default namespace in the stylesheet supplies the default namespace for created elements, so you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&quot;2.0&quot;
  xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;
  xmlns=&quot;http://www.w3.org/1999/xhtml&quot;
  xpath-default-namespace=&quot;http://www.example.com/ns/house&quot;&amp;gt;

&amp;lt;xsl:template match=&quot;house&quot;&amp;gt;
  &amp;lt;div class=&quot;house&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;xsl:apply-templates select=&quot;askingPrice&quot; /&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
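&lt;p&gt;Python&amp;#8217;s &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;, whose path language is a small subset of XPath, shows both behaviours side by side. It isn&amp;#8217;t XSLT and doesn&amp;#8217;t implement XPath 2.0, so this is only a minimal sketch using it as a stand-in. An unprefixed name matches only no-namespace elements, as in XPath 1.0. A prefix bound in a namespace map works, as in the XSLT 1.0 stylesheet. From Python 3.8, a map entry whose key is the empty string acts as a default element namespace for unprefixed names, much like &lt;code&gt;xpath-default-namespace&lt;/code&gt;.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

HOUSE = "http://www.example.com/ns/house"

# Build the example document. ElementTree stores namespaced names in
# "Clark notation": {namespace-uri}local-name.
house = ET.Element("{%s}house" % HOUSE, status="For Sale")
for name in ("askingPrice", "address", "layout"):
    ET.SubElement(house, "{%s}%s" % (HOUSE, name)).text = "..."

# Like XPath 1.0: no prefix means no namespace, so nothing matches.
print(house.find("layout"))                               # None

# Explicit prefix binding, as in the XSLT 1.0 stylesheet.
print(house.find("h:layout", {"h": HOUSE}) is not None)   # True

# Python 3.8+: the "" key supplies a default namespace for unprefixed
# element names, comparable to xpath-default-namespace in XSLT 2.0.
print(house.find("layout", {"": HOUSE}) is not None)      # True
```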

&lt;p&gt;By keeping the default query namespace and the default construction namespace separate, you&amp;#8217;re able to use unprefixed names in both paths and element constructors, even if the default namespaces in the two cases are different.&lt;/p&gt;
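&lt;p&gt;The same separation can be mimicked outside XSLT. This Python sketch again uses &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; as a stand-in; the &lt;code&gt;xhtml()&lt;/code&gt; helper and the asking price are made up for illustration. Queries use a namespace map whose default is the house namespace, while every constructed element goes into the XHTML namespace, so neither side needs prefixes. Calling &lt;code&gt;register_namespace()&lt;/code&gt; with an empty prefix makes the serializer write XHTML as the default namespace. Note that it changes module-wide state.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

HOUSE = "http://www.example.com/ns/house"
XHTML = "http://www.w3.org/1999/xhtml"
QUERY_NS = {"": HOUSE}   # default namespace for queries only

# Source document in the house namespace (made-up content).
house = ET.Element("{%s}house" % HOUSE, status="For Sale")
ET.SubElement(house, "{%s}askingPrice" % HOUSE).text = "250000"


def xhtml(local, attrs=None):
    """Construct an element in XHTML, the default namespace for construction."""
    return ET.Element("{%s}%s" % (XHTML, local), attrs or {})


div = xhtml("div", {"class": "house"})
h1 = xhtml("h1")
h1.text = house.find("askingPrice", QUERY_NS).text   # unprefixed query
div.append(h1)

ET.register_namespace("", XHTML)   # serialize XHTML without a prefix
print(ET.tostring(div, encoding="unicode"))
```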

&lt;p&gt;XQuery and VB.NET, on the other hand, provide a single default namespace that is used for both queries and construction, and they work in slightly different ways.&lt;/p&gt;

&lt;p&gt;In XQuery you can declare the default namespace for the query, with&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;declare default element namespace &quot;http://www.example.com/ns/house&quot;;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which means that you can query the source document with paths like &lt;code&gt;/house/askingPrice&lt;/code&gt; and create elements in the &lt;code&gt;http://www.example.com/ns/house&lt;/code&gt; namespace with direct element constructors without prefixes, like&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;house status=&quot;For Sale&quot;&amp;gt;
  &amp;lt;askingPrice&amp;gt;...&amp;lt;/askingPrice&amp;gt;
  &amp;lt;address&amp;gt;...&amp;lt;/address&amp;gt;
  &amp;lt;layout&amp;gt;...&amp;lt;/layout&amp;gt;
&amp;lt;/house&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you then want to generate XHTML (or some other result for which you&amp;#8217;d prefer to use the default namespace), you can use a default namespace declaration on the XHTML you generate:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
  &amp;lt;h1&amp;gt;...&amp;lt;/h1&amp;gt;
  ...
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But the default namespace declaration in the element constructor carries through into the embedded expressions, so&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;{ /house/askingPrice }&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;won&amp;#8217;t work. As a result, you end up having to use the query&amp;#8217;s default namespace declaration to set the default namespace to XHTML (or whatever the default namespace is in the result), and use prefixes in your queries (essentially the same situation as in XSLT 1.0).&lt;/p&gt;

&lt;p&gt;In XLinq in VB.NET, there&amp;#8217;s the same kind of pattern. The &lt;code&gt;Imports&lt;/code&gt; statement allows you to declare a default namespace that&amp;#8217;s used in both queries and construction, as in:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Imports &amp;lt;xmlns=&quot;http://www.example.com/ns/house&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you can use a default namespace declaration on the XHTML you generate to provide the default namespace for the elements in the XML literal:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;houseDiv =
  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;...&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But unlike in XQuery, the default XHTML namespace declaration in the &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; &lt;em&gt;doesn&amp;#8217;t&lt;/em&gt; have an effect on the default namespace used in embedded expressions, which means you can still use unprefixed element names in any paths used within the XML literal, like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;houseDiv =
  &amp;lt;div class=&quot;house&quot; xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;%= doc.&amp;lt;house&amp;gt;.&amp;lt;askingPrice&amp;gt; %&amp;gt;&amp;lt;/h1&amp;gt;
    ...
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, if you build up your XHTML gradually, perhaps by using separate variables or methods, then every time you create a snippet of XHTML you have to specify this default namespace. Again, people will end up using the &lt;code&gt;Imports&lt;/code&gt; statement to set the default namespace to the default result namespace and using prefixes in their paths.&lt;/p&gt;

&lt;p&gt;The other factor to consider is that sometimes no prefix really does mean no namespace. If you&amp;#8217;re querying an XML document that contains elements in no namespace, you have to set the default query namespace to no namespace. In XSLT 1.0, that&amp;#8217;s always the case anyway; in XSLT 2.0, the &lt;code&gt;xpath-default-namespace&lt;/code&gt; attribute shouldn&amp;#8217;t be set (or should be unset in those places that need to query no-namespace elements). In XQuery you can&amp;#8217;t use the query&amp;#8217;s default namespace declaration, and in XLinq in VB.NET you can&amp;#8217;t use the &lt;code&gt;Imports&lt;/code&gt; statement. In both these cases, you&amp;#8217;d better hope your result is in no namespace too. If not, the best route (to make it work at all in XQuery, and to avoid repetitive &lt;code&gt;xmlns&lt;/code&gt; attributes in VB.NET) is to create a no-namespace version of your result first, and have a standard function or method that will add the right default namespace to that result.&lt;/p&gt;
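&lt;p&gt;Here is a minimal Python sketch of that last approach, with &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; standing in for XQuery or VB.NET. &lt;code&gt;into_namespace()&lt;/code&gt; is a hypothetical helper, not a library function. The result is built entirely in no namespace, so unprefixed names mean no namespace throughout, and a single function moves it into XHTML at the end.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def into_namespace(root, ns):
    """Move every no-namespace element in the tree under root into ns.

    Works in place and returns root. Attributes are untouched: unprefixed
    attributes belong to no namespace whatever their element's namespace.
    """
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them,
        # and leave elements that already have a namespace alone.
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = "{%s}%s" % (ns, elem.tag)
    return root


# Build the result in no namespace, as if namespaces didn't exist...
div = ET.Element("div", {"class": "house"})
ET.SubElement(div, "h1").text = "..."

# ...then put it into XHTML in a single, reusable step.
into_namespace(div, XHTML)
print([elem.tag for elem in div.iter()])
```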
</description>
 <comments>http://www.jenitennison.com/blog/node/36#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/5">xslt</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/15">xlinq</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/29">xquery</category>
 <pubDate>Sun, 01 Jul 2007 20:32:44 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">36 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>XML Paths in Programming Languages</title>
 <link>http://www.jenitennison.com/blog/node/25</link>
 <description>&lt;p&gt;I&amp;#8217;ve finally finished my &amp;#8220;Progress in Processing&amp;#8221; talk for this year&amp;#8217;s &lt;a href=&quot;http://www.xmlsummerschool.com/&quot; title=&quot;XML Summer School in Oxford&quot;&gt;XML Summer School&lt;/a&gt;. It&amp;#8217;s been really interesting looking at the different APIs developed for different programming languages in the last few years, all &lt;em&gt;so&lt;/em&gt; much easier to use than the &lt;a href=&quot;http://www.w3.org/DOM/&quot;&gt;DOM&lt;/a&gt;. One of the themes is the use of path-based syntax to query XML.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Even with the simpler XML APIs, accessing nodes in an XML tree can be pretty laborious. For example, to get all the &lt;code&gt;&amp;lt;room&amp;gt;&lt;/code&gt; elements in the first &lt;code&gt;&amp;lt;floor&amp;gt;&lt;/code&gt; element of a house in &lt;a href=&quot;http://www.xlinq.net/&quot; title=&quot;XLinq website&quot;&gt;XLinq&lt;/a&gt;, you write:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;doc.Element(&quot;house&quot;).Element(&quot;floor&quot;).Elements(&quot;room&quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course &lt;a href=&quot;http://www.w3.org/1999/xpath&quot; title=&quot;W3C: XPath specification&quot;&gt;XPath&lt;/a&gt; does this pretty well:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/house/floor[1]/room
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and many of the APIs that I looked at provided XPath access. For example (this is Ruby&amp;#8217;s &lt;a href=&quot;http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/&quot; title=&quot;REXML&#039;s Ruby Documentation&quot;&gt;REXML&lt;/a&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;doc.elements[&quot;/house/floor[1]/room&quot;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But using XPath is tricky for a couple of reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A cognitive leap is required to switch from the usual object/method dot-notation syntax that you use in the surrounding language to the specialised XPath notation. In particular, it&amp;#8217;s difficult mixing the one-based indexing in XPath with the zero-based indexing that&amp;#8217;s used in most programming languages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;XPaths have to be passed as strings; there&amp;#8217;s a temptation to construct the strings automatically, which leads to all sorts of headaches (such as remembering to put quotes around the strings that you concatenate into the XPath when you really want them to be interpreted as strings rather than element names). [A clean way of approaching this would be to use variables in the XPath and pass in a set of variable bindings when you use the XPath, but I don&amp;#8217;t know any API that actually does this.]&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
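&lt;p&gt;Without variable bindings, the safest fallback when splicing a value into an XPath string is to turn it into a proper XPath 1.0 string literal. XPath 1.0 has no escape character, so a value containing both kinds of quote has to be assembled with &lt;code&gt;concat()&lt;/code&gt;. Here is a Python sketch; &lt;code&gt;xpath_literal()&lt;/code&gt; is a hypothetical helper, not part of any of the APIs discussed here.&lt;/p&gt;

```python
def xpath_literal(value):
    """Return an XPath 1.0 expression that evaluates to the string value."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote characters present: split on the apostrophes and glue the
    # single-quoted pieces back together with concat(), re-inserting each
    # apostrophe as a double-quoted literal.
    parts = value.split("'")
    return "concat('" + "', \"'\", '".join(parts) + "')"


room = 'Jeni\'s "study"'
print("/house/floor/room[@name = " + xpath_literal(room) + "]")
```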

&lt;p&gt;Because of these issues, there&amp;#8217;s been some effort to use the native dot-notation syntax to query XML within general-purpose programming languages. I knew about &lt;a href=&quot;http://java.sun.com/developer/technicalArticles/WebServices/jaxb/&quot; title=&quot;Java API for XML Binding&quot;&gt;JAXB&lt;/a&gt; before I started looking, but hadn&amp;#8217;t come across &lt;a href=&quot;http://uche.ogbuji.net/&quot; title=&quot;Uche Ogbuji&#039;s Home Page&quot;&gt;Uche Ogbuji&lt;/a&gt;&amp;#8217;s &lt;a href=&quot;http://uche.ogbuji.net/tech/4suite/amara/&quot; title=&quot;Amara: Python XML Toolkit&quot;&gt;Amara&lt;/a&gt; or the details of the VB.NET interface. Whereas with JAXB you have to compile a schema (an XML Schema schema, what&amp;#8217;s more) into Java classes, with Amara and VB.NET there&amp;#8217;s the kind of dynamic binding you get with XPath. In Amara, for example, you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;doc.house.floor.room
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;while in VB.NET you can use (I think):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;doc.&amp;lt;house&amp;gt;.&amp;lt;floor&amp;gt;.First().&amp;lt;room&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Don&amp;#8217;t ask me how to get the rooms on the &lt;em&gt;second&lt;/em&gt; floor in VB.NET; that, I couldn&amp;#8217;t figure out. In Amara, it&amp;#8217;s &lt;code&gt;doc.house.floor[1].room&lt;/code&gt;.)&lt;/p&gt;
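&lt;p&gt;The indexing mismatch is easy to demonstrate in Python. The path syntax in &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; (an XPath subset) counts positions from 1, while the list that &lt;code&gt;findall()&lt;/code&gt; returns is indexed from 0. So the first floor is &lt;code&gt;floor[1]&lt;/code&gt; in one notation and &lt;code&gt;[0]&lt;/code&gt; in the other. This is a sketch, and the two-floor house is made up for the example.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# A made-up house with two floors.
house = ET.Element("house")
for rooms in (["kitchen", "lounge"], ["bedroom", "bathroom"]):
    floor = ET.SubElement(house, "floor")
    for name in rooms:
        ET.SubElement(floor, "room", name=name)

# Path syntax: positions count from 1, like XPath.
print([r.get("name") for r in house.findall("floor[1]/room")])   # ['kitchen', 'lounge']

# Python list: indexes count from 0, so [1] is the second floor.
second = house.findall("floor")[1]
print([r.get("name") for r in second.findall("room")])            # ['bedroom', 'bathroom']
```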

&lt;p&gt;Path-based syntax in general-purpose programming languages is really neat: it exposes XML documents as if they were objects, which makes them &amp;#8220;closer&amp;#8221; to you as a programmer. It works particularly well for data-oriented XML, in which elements contain either elements or text but not both.&lt;/p&gt;

&lt;p&gt;There are two main areas where the path-based languages differ.&lt;/p&gt;

&lt;p&gt;First, they differ in what they do with paths whose intermediate steps select more than one element. For example, in XPath, &lt;code&gt;/house/floor/room&lt;/code&gt; gets you all the rooms on all the floors of the house, as does &lt;code&gt;doc.&amp;lt;house&amp;gt;.&amp;lt;floor&amp;gt;.&amp;lt;room&amp;gt;&lt;/code&gt; in VB.NET: both iterate implicitly over the elements selected by the intermediate steps. In Amara, &lt;code&gt;doc.house.floor.room&lt;/code&gt; gets you all the rooms on the &lt;em&gt;first&lt;/em&gt; floor of the house, so you have to iterate explicitly over the &lt;code&gt;&amp;lt;floor&amp;gt;&lt;/code&gt; elements if you want to collect all the rooms in the house.&lt;/p&gt;
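&lt;p&gt;The two behaviours can be contrasted in Python with &lt;code&gt;xml.etree.ElementTree&lt;/code&gt;. This is a sketch using the same made-up two-floor house, not Amara itself. A multi-step path iterates implicitly over every intermediate element. Stepping through &lt;code&gt;find()&lt;/code&gt; returns only the first match, much like Amara&amp;#8217;s dot notation, so reaching the other floors takes an explicit loop.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# A made-up house with two floors.
house = ET.Element("house")
for rooms in (["kitchen", "lounge"], ["bedroom", "bathroom"]):
    floor = ET.SubElement(house, "floor")
    for name in rooms:
        ET.SubElement(floor, "room", name=name)

# XPath-style: the floor step selects every floor, then all their rooms.
all_rooms = [r.get("name") for r in house.findall("floor/room")]

# Amara-style: find() picks only the first floor.
first_floor = [r.get("name") for r in house.find("floor").findall("room")]

# Collecting every room that way means looping over the floors explicitly.
looped = [r.get("name") for f in house.findall("floor") for r in f.findall("room")]

print(all_rooms)            # ['kitchen', 'lounge', 'bedroom', 'bathroom']
print(first_floor)          # ['kitchen', 'lounge']
print(looped == all_rooms)  # True
```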

&lt;p&gt;Second, how they handle namespaces. In XPath, you have to provide a set of namespace bindings whenever you evaluate an XPath expression, and the prefixes you use on element names are resolved against those namespace bindings. In XPath 1.0, element names with no prefix only match elements in no namespace; in XPath 2.0, you can also provide a default namespace that&amp;#8217;s used for names with no prefix.&lt;/p&gt;

&lt;p&gt;That works well when XPath is embedded in some XML (such as in XSLT, XForms, XProc and so on), because the namespace bindings from the XML environment can provide the namespace bindings for the XPath expression. But that can&amp;#8217;t generally happen when XPaths are used in a programming language.&lt;/p&gt;

&lt;p&gt;All the APIs that use XPaths allow you to specify the namespace bindings explicitly, but some, such as REXML, do an automatic namespace binding based on the namespace bindings from the source document. So if I have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;xs:schema xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;&amp;gt;
  ...
&amp;lt;/xs:schema&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;in the document I&amp;#8217;m querying then I can use the &lt;code&gt;xs&lt;/code&gt; prefix to mean the XML Schema namespace in the path that I use to query the document, such as &lt;code&gt;/xs:schema/xs:element/@name&lt;/code&gt; to get the names of the global element declarations.&lt;/p&gt;

&lt;p&gt;This makes paths nice and simple&amp;#8230; right until you have to use them to process a document that uses different namespace bindings. For example, it&amp;#8217;s not uncommon to find XML Schema documents that use the prefix &lt;code&gt;xsd&lt;/code&gt; instead of &lt;code&gt;xs&lt;/code&gt;, or even the default namespace; for those documents, the automatic binding won&amp;#8217;t work and the path &lt;code&gt;/xs:schema/xs:element/@name&lt;/code&gt; will give you an error. [REXML also provides &lt;code&gt;XPath.match()&lt;/code&gt; and &lt;code&gt;XPath.each()&lt;/code&gt;, to which you can provide an explicit set of namespace bindings; you&amp;#8217;ll use these if you care about keeping the indirection between prefixes and namespaces.]&lt;/p&gt;
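&lt;p&gt;Explicit bindings keep that indirection. Python&amp;#8217;s &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; stores names with their namespace URI rather than their prefix. The prefixes in a path are therefore resolved only against the map you pass in, and whatever prefix (or default namespace) the schema document used doesn&amp;#8217;t matter. Here is a minimal sketch with a made-up schema built in code; &lt;code&gt;element_names()&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Two global element declarations in a made-up schema. ElementTree keeps
# only the namespace URI, never the prefix the schema's author chose.
schema = ET.Element("{%s}schema" % XSD)
for name in ("house", "floor"):
    ET.SubElement(schema, "{%s}element" % XSD, name=name)


def element_names(schema, path, bindings):
    """Names of the global element declarations, using explicit bindings."""
    return [e.get("name") for e in schema.findall(path, bindings)]


# Any prefix works, as long as it is bound to the XML Schema namespace.
print(element_names(schema, "xs:element", {"xs": XSD}))     # ['house', 'floor']
print(element_names(schema, "xsd:element", {"xsd": XSD}))   # ['house', 'floor']
```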

&lt;p&gt;In Amara (when using Pythonic paths), you can just forget about namespaces: the elements and attributes are selected purely on the basis of their local names. The only time you&amp;#8217;ll run into problems is if you actually have, in the same context, two elements from different namespaces with the same local name, which happens more rarely than people using different prefixes for a given namespace. In the XML Schema example, you can use &lt;code&gt;doc.schema.element.name&lt;/code&gt; (yes, attributes are picked up with the same syntax as elements), and you will only have a problem if there&amp;#8217;s an &lt;code&gt;&amp;lt;element&amp;gt;&lt;/code&gt; element in some other namespace. [Amara also provides XPath-based querying, and you can supply explicit namespace bindings for that.]&lt;/p&gt;
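&lt;p&gt;From Python 3.8, &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; offers a similar namespace-blind option: &lt;code&gt;{*}element&lt;/code&gt; matches an &lt;code&gt;element&lt;/code&gt; child in any namespace, or in none. This sketch isn&amp;#8217;t Amara, and the stray &lt;code&gt;urn:example:other&lt;/code&gt; namespace is invented. It shows both the convenience and the clash described above.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
OTHER = "urn:example:other"   # invented, unrelated namespace

schema = ET.Element("{%s}schema" % XSD)
ET.SubElement(schema, "{%s}element" % XSD, name="house")

# Match on local name only, much as Amara's Pythonic paths do.
print([e.get("name") for e in schema.findall("{*}element")])   # ['house']

# The rare clash: an element with the same local name in another namespace.
ET.SubElement(schema, "{%s}element" % OTHER, name="intruder")
print([e.get("name") for e in schema.findall("{*}element")])   # ['house', 'intruder']
```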

&lt;p&gt;In VB.NET, the &lt;code&gt;Imports&lt;/code&gt; directive is used to provide global namespace bindings, so it gets the benefits that you would have from using XPath in an XML context. What&amp;#8217;s more, you can use a default namespace binding so that you don&amp;#8217;t have to use prefixes in your paths. So you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Imports &amp;lt;xmlns:xs=&quot;http://www.w3.org/2001/XMLSchema&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and then use &lt;code&gt;doc.&amp;lt;xs:schema&amp;gt;.&amp;lt;xs:element&amp;gt;.@name&lt;/code&gt;, which will work as planned no matter what prefixes were actually used in the schema document. Or you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Imports &amp;lt;xmlns=&quot;http://www.w3.org/2001/XMLSchema&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and then use &lt;code&gt;doc.&amp;lt;schema&amp;gt;.&amp;lt;element&amp;gt;.@name&lt;/code&gt;. Overall, I think it&amp;#8217;s pretty impressive that VB.NET is going to have support for querying XML documents built in at such a low level.&lt;/p&gt;

&lt;p&gt;Using default namespaces in paths is a tricky issue, though. I&amp;#8217;ll have to dedicate a different post to that; this one&amp;#8217;s quite long enough already.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/25#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <pubDate>Tue, 05 Jun 2007 22:35:48 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">25 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Unicode database in XML</title>
 <link>http://www.jenitennison.com/blog/node/15</link>
 <description>&lt;p&gt;Whatever &lt;a href=&quot;http://www.jenitennison.com/blog/node/12&quot; title=&quot;Levenshtein distance on the diagonal&quot;&gt;algorithm&lt;/a&gt; you use to &lt;a href=&quot;http://www.jenitennison.com/blog/node/11&quot; title=&quot;Levenshtein distance in XSLT 2.0&quot;&gt;calculate Levenshtein distance&lt;/a&gt;, one of its great features is that you can tweak the cost of letter substitutions. For example, you can do a case-insensitive comparison of two strings, or perhaps more interestingly a semi-case-sensitive comparison of two strings, where the cost of replacing a character for its upper or lower case equivalent is less than the cost of replacing a character with an unrelated character, but more than zero. But that requires knowledge of whether and how two characters are related.&lt;/p&gt;

&lt;p&gt;Of course all that information is stored in the &lt;a href=&quot;http://www.unicode.org/Public/UNIDATA/&quot; title=&quot;Unicode Database directory&quot;&gt;Unicode Database&lt;/a&gt;, which is a set of text files in a structured format. I looked for an XML version but couldn&amp;#8217;t find one (well, Googling &amp;#8220;Unicode database XML&amp;#8221; isn&amp;#8217;t much help). So I downloaded &lt;a href=&quot;http://www.unicode.org/Public/UNIDATA/UnicodeData.txt&quot; title=&quot;Unicode Database&quot;&gt;UnicodeData.txt&lt;/a&gt; and &lt;a href=&quot;http://www.unicode.org/Public/UNIDATA/NamesList.txt&quot; title=&quot;Unicode Names List Database&quot;&gt;NamesList.txt&lt;/a&gt; and put together an &lt;a href=&quot;http://www.jenitennison.com/blog/files/Unicode.xsl&quot; title=&quot;Unicode database builder XSLT&quot;&gt;XSLT 2.0 stylesheet&lt;/a&gt; to create an &lt;a href=&quot;http://www.jenitennison.com/blog/files/unicode.zip&quot; title=&quot;Unicode XML&quot;&gt;XML version of the Unicode database&lt;/a&gt;.&lt;/p&gt;
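&lt;p&gt;The case relationships come straight out of &lt;code&gt;UnicodeData.txt&lt;/code&gt;. Each line holds 15 semicolon-separated fields. Counting from 0, field 0 is the code point in hex, and fields 12 to 14 are the simple uppercase, lowercase and titlecase mappings (empty when there is none). Here is a Python sketch of reading them, with two sample lines copied in. It isn&amp;#8217;t the XSLT mentioned above, and it doesn&amp;#8217;t expand the First/Last range entries used for large blocks such as the CJK ideographs.&lt;/p&gt;

```python
def parse_case_mappings(lines):
    """Map each character to its name, category and simple case mappings.

    Expects lines in UnicodeData.txt format: field 0 is the code point in
    hex, 1 the name, 2 the general category, and 12-14 the simple
    uppercase, lowercase and titlecase mappings (empty if none).
    """
    mappings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        char = chr(int(fields[0], 16))
        upper, lower, title = (chr(int(f, 16)) if f else None for f in fields[12:15])
        mappings[char] = {"name": fields[1], "category": fields[2],
                          "upper": upper, "lower": lower, "title": title}
    return mappings


sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
]
data = parse_case_mappings(sample)
print(data["A"]["lower"], data["a"]["upper"])   # a A
```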

&lt;!--break--&gt;

&lt;p&gt;The XML contains practically everything that you can get from those two files, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;block and subblock structures&lt;/li&gt;
&lt;li&gt;hexadecimal and decimal codepoints&lt;/li&gt;
&lt;li&gt;names, aliases and comments&lt;/li&gt;
&lt;li&gt;category and numeric information&lt;/li&gt;
&lt;li&gt;uppercase, lowercase and titlecase equivalents&lt;/li&gt;
&lt;li&gt;decomposition of various kinds&lt;/li&gt;
&lt;li&gt;related characters&lt;/li&gt;
&lt;li&gt;bidi information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It might prove easier to search than grepping the text files, if you&amp;#8217;re used to using XPath. I might split it up and put together an AJAX browser, in my Copious Spare Time.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/15#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/14">xml</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/13">unicode</category>
 <pubDate>Mon, 14 May 2007 21:11:23 +0100</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">15 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>
