For the last several months, I’ve been working on a project at TSO for publishing UK legislation using a native XML database (eg eXist or MarkLogic Server) with some middleware (eg Orbeon or Cocoon). It’s a powerful and flexible approach that’s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the Command and House Papers demo.
But the killer platform isn’t quite here yet, partly because the specs aren’t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; XProc is just around the corner. XML databases are all over the place in their conformance to XQuery, its optional features and the not-quite-finalised specs for free-text searching and updating.
People talk about how productive you can be using Ruby on Rails or Django, and they work great for publishing data you can store in a relational database. What we need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.
The killer platform would have a configuration mechanism for mapping HTTP requests that it receives onto XProc pipelines. The pipeline that would be used could be based on one or more of:
The pipelines would have a signature like:
<c:request> element used within the
<c:response> element used within the
<c:parameters> element containing parameters for serialising the result body; possible serialisations would include serialising XSL-FO as PDF and SVG as JPEG, for example.
The pipeline engine would of course include efficient implementations of all the required steps, most importantly XSLT 2.0.
The platform would have an easy mechanism for invoking queries on its XML store through an implementation-defined step that was similar to the p:xquery XProc step. The step might have the signature:
The XML store itself would support:
It would also support setting up indexes on any expression for a particular kind of context node (usually an element); these would work like keys in XSLT, except that the XQuery engine would automatically detect when the index could be applied. For example, it would be possible to set up a key on a document for the expression substring(//dc:identifier, 7, 2), and if the query used exactly this expression, the index would be used.
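To make the idea of an expression index concrete, here is a rough sketch in Python rather than in any real XML database. Everything in it is an assumption for illustration: the Store class, its method names, and the identifier_slice function (a hand-written stand-in for the XPath expression) are hypothetical, and a real engine would compile the expression itself instead of being handed a Python function. The point is only the lookup rule: the index is consulted when the query's expression text matches the indexed expression exactly, and the store falls back to scanning every document otherwise.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
EXPR = "substring(//dc:identifier, 7, 2)"


def identifier_slice(doc):
    """Hand-written Python stand-in for substring(//dc:identifier, 7, 2)."""
    # iter() walks the whole tree, including the root element itself.
    node = next(doc.iter(f"{{{DC_NS}}}identifier"), None)
    text = "".join(node.itertext()) if node is not None else ""
    # XPath substring() is 1-based: start 7, length 2 is text[6:8] in Python.
    return text[6:8]


class Store:
    """Hypothetical document store with indexes keyed by expression text."""

    def __init__(self):
        self.docs = {}        # uri -> parsed document
        self.evaluators = {}  # expression text -> function(doc) -> value
        self.indexes = {}     # expression text -> {value: set of uris}

    def create_index(self, expression, evaluator):
        self.evaluators[expression] = evaluator
        self.indexes[expression] = {}
        for uri, doc in self.docs.items():
            self._index_doc(expression, uri, doc)

    def _index_doc(self, expression, uri, doc):
        value = self.evaluators[expression](doc)
        self.indexes[expression].setdefault(value, set()).add(uri)

    def add(self, uri, xml_text):
        # Sketch only: replacing an existing uri would leave stale entries.
        doc = ET.fromstring(xml_text)
        self.docs[uri] = doc
        for expression in self.evaluators:
            self._index_doc(expression, uri, doc)

    def select(self, expression, evaluator, value):
        """Return (matching uris, whether an index was used)."""
        index = self.indexes.get(expression)
        if index is not None:
            # Exact textual match on the expression: answer from the index.
            return sorted(index.get(value, set())), True
        # No index for this exact expression: scan every document.
        hits = [u for u, d in self.docs.items() if evaluator(d) == value]
        return sorted(hits), False
```

Matching on the exact expression text is the crudest possible detection rule, but it mirrors the wish as stated; a smarter engine could normalise whitespace or recognise equivalent expressions before deciding whether an index applies.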
The platform would provide an extensible architecture such that it would be possible to set up replicated XML store(s) on separate servers from the main pipeline engine. It would cache the results of queries against the XML store. It would serve up static content such as images and scripts directly, bypassing the pipeline. It would be configured using files, so that it was easy to transfer a configuration between development and production platforms and to version-control configurations through normal means.
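As a very rough illustration of how file-based configuration, static bypass, request-to-pipeline mapping and query caching might fit together, here is a small Python sketch. None of it comes from an existing product: the JSON layout, the Platform class, the view-paper.xpl pipeline name and the URL patterns are all made up for the example, and the configuration is held in a string here only so the sketch is self-contained (in practice it would live in a file under version control).

```python
import json
import re

# Hypothetical configuration; in practice this would be a file on disk.
CONFIG_TEXT = """
{
  "static_prefixes": ["/images/", "/scripts/"],
  "routes": [
    {"method": "GET",
     "pattern": "^/papers/(?P<id>[0-9]+)$",
     "pipeline": "view-paper.xpl"}
  ]
}
"""


class Platform:
    """Hypothetical front controller: routes requests and caches queries."""

    def __init__(self, config, run_query):
        self.static_prefixes = tuple(config["static_prefixes"])
        self.routes = [
            (r["method"], re.compile(r["pattern"]), r["pipeline"])
            for r in config["routes"]
        ]
        self.run_query = run_query  # function that actually hits the XML store
        self.cache = {}             # query text -> cached result

    def dispatch(self, method, path):
        """Decide how a request is handled, without running anything."""
        # Static content bypasses the pipeline engine entirely.
        if path.startswith(self.static_prefixes):
            return ("static", path, {})
        for route_method, pattern, pipeline in self.routes:
            match = pattern.match(path)
            if route_method == method and match:
                # Named groups become parameters passed to the pipeline.
                return ("pipeline", pipeline, match.groupdict())
        return ("not-found", None, {})

    def query(self, text):
        """Run a query against the store, reusing a cached result if any."""
        # Sketch only: nothing here invalidates the cache when the store changes.
        if text not in self.cache:
            self.cache[text] = self.run_query(text)
        return self.cache[text]


platform = Platform(json.loads(CONFIG_TEXT), lambda q: f"<result>{q}</result>")
print(platform.dispatch("GET", "/papers/42"))
```

Because the whole configuration is plain data, moving it between development and production is a file copy, and changes show up as ordinary diffs.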
Have you used (or developed!) anything that comes close? What’s on your wish-list?