For the last several months, I’ve been working on a project at TSO for publishing UK legislation using a native XML database (eg eXist or MarkLogic Server) with some middleware (eg Orbeon or Cocoon). It’s a powerful and flexible approach that’s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the Command and House Papers demo.
But the killer platform isn’t quite here yet, partly because the specs aren’t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; XProc is just around the corner. XML databases are all over the place in their conformance to XQuery, its optional features and the not-quite-finalised specs for free-text searching and updating.
People talk about how productive you can be using Ruby on Rails or Django, and they work great for publishing data you can store in a relational database. What we need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.
The killer platform would have a configuration mechanism for mapping HTTP requests that it receives onto XProc pipelines. The pipeline that would be used could be based on one or more of:
The pipelines would have a signature like:
<c:request> element used within the p:http-request XProc step
<c:response> element used within the p:http-request XProc step
<c:parameters> element containing parameters for serializing the result body; possible serialisations would include serialising XSL-FO as PDF and SVG as JPEG, for example.
The pipeline engine would of course include efficient implementations of all the required steps, most importantly XSLT 2.0.
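To make the request-to-pipeline mapping a little more concrete, here is a minimal sketch in Python. It assumes that rules can match on the HTTP method, the requested URI and HTTP headers (the properties quoted in the first comment below). The `Rule` class, the `pick_pipeline` function, the `/papers/` URIs and the `.xpl` file names are all hypothetical; a real platform would presumably express this in a configuration file rather than in code.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    """One hypothetical mapping rule: request properties -> pipeline file."""
    pipeline: str
    method: Optional[str] = None       # e.g. "GET"; None matches any method
    uri_pattern: Optional[str] = None  # regular expression over the request path
    headers: dict = field(default_factory=dict)  # header name -> required substring


def pick_pipeline(rules, method, uri, headers):
    """Return the pipeline of the first rule that matches, or None."""
    # HTTP header names are case-insensitive, so normalise them once.
    lowered = {name.lower(): value for name, value in headers.items()}
    for rule in rules:
        if rule.method is not None and rule.method != method.upper():
            continue
        if rule.uri_pattern is not None and not re.fullmatch(rule.uri_pattern, uri):
            continue
        if any(wanted not in lowered.get(name.lower(), "")
               for name, wanted in rule.headers.items()):
            continue
        return rule.pipeline
    return None


# Assumed example configuration: more specific rules come first.
RULES = [
    Rule("paper-as-pdf.xpl", method="GET", uri_pattern=r"/papers/\d+",
         headers={"Accept": "application/pdf"}),
    Rule("paper-as-html.xpl", method="GET", uri_pattern=r"/papers/\d+"),
    Rule("update-paper.xpl", method="PUT", uri_pattern=r"/papers/\d+"),
]
```

With these rules, a GET for `/papers/42` that accepts `application/pdf` would be routed to the hypothetical `paper-as-pdf.xpl`, and the same URI with any other Accept header would fall through to `paper-as-html.xpl`.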
The platform would have an easy mechanism for invoking queries on its XML store through an implementation-defined step that was similar to the p:xquery XProc step. The step might have the signature:
p:xquery)
The XML store itself would support:
It would also support setting up indexes on any expression for a particular kind of context node (usually an element); these would work like keys in XSLT, except that the XQuery engine would automatically detect when the index could be applied. For example, it would be possible to set up a key on a document for the expression substring(//dc:identifier, 7, 2) and if the query used exactly this expression, the index would be used.
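The exact-match behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not how any real XQuery engine works: the `ExpressionIndex` class is hypothetical, a plain Python function stands in for evaluating the XPath expression, and the identifier values are made up.

```python
class ExpressionIndex:
    """Hypothetical store-side index: expression text -> {key value: [doc ids]}."""

    def __init__(self):
        self._evaluators = {}  # expression text -> function standing in for XPath
        self._indexes = {}     # expression text -> {value: [doc ids]}

    def define(self, expression, evaluate):
        """Register an index for exactly this expression text."""
        self._evaluators[expression] = evaluate
        self._indexes[expression] = {}

    def add_document(self, doc_id, doc):
        # Update every defined index as the document is stored.
        for expression, evaluate in self._evaluators.items():
            self._indexes[expression].setdefault(evaluate(doc), []).append(doc_id)

    def find(self, expression, evaluate, value, docs):
        """Return (matching doc ids, strategy used)."""
        if expression in self._indexes:
            # The query text matches an indexed expression exactly.
            return sorted(self._indexes[expression].get(value, [])), "index"
        # Otherwise fall back to evaluating the expression on every document.
        return sorted(d for d, doc in docs.items() if evaluate(doc) == value), "scan"


EXPR = "substring(//dc:identifier, 7, 2)"


def series_code(doc):
    # XPath substring() is 1-based: position 7, length 2 is the Python slice [6:8].
    return doc["identifier"][6:8]


# Made-up documents, reduced to the one field the expression reads.
DOCS = {
    "a": {"identifier": "paper-hc-0042"},
    "b": {"identifier": "paper-cm-0007"},
    "c": {"identifier": "paper-hc-0101"},
}

store = ExpressionIndex()
store.define(EXPR, series_code)
for doc_id, doc in DOCS.items():
    store.add_document(doc_id, doc)
```

In this sketch a query that uses the expression exactly as registered is answered from the index, while one written differently (for example without the spaces after the commas) gives the same result via a full scan. A real engine might normalise expressions before comparing them; that is beyond what this sketch assumes.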
The platform would provide an extensible architecture such that it would be possible to set up replicated XML store(s) on separate servers from the main pipeline engine. It would cache the results of queries against the XML store. It would serve up static content such as images and scripts bypassing the pipeline. It would be configured using files, so that it was easy to transfer a configuration between development and production platforms and to version control configurations through normal means.
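Two of these behaviours, caching query results and serving static content without going through a pipeline, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `CachingStore` wrapper, the `handle` function, the path prefixes, the query text, and the policy of clearing the whole cache after any update.

```python
class CachingStore:
    """Wraps a (hypothetical) query function and remembers its results."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._cache = {}
        self.hits = 0

    def query(self, xquery, **params):
        # The cache key is the query text plus its parameters, in a stable order.
        key = (xquery, tuple(sorted(params.items())))
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = self._run_query(xquery, params)
        return self._cache[key]

    def invalidate(self):
        # Assumed policy: drop everything after any update to the store.
        self._cache.clear()


# Assumed locations of static content such as images and scripts.
STATIC_PREFIXES = ("/images/", "/scripts/")


def handle(path, store):
    """Serve static paths directly; send everything else to the store."""
    if path.startswith(STATIC_PREFIXES):
        return ("static", path)
    return ("pipeline", store.query("doc($uri)", uri=path))
```

A request for `/images/logo.png` never reaches the store, and a second request for the same document is answered from the cache until `invalidate()` is called.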
Have you used (or developed!) anything that comes close? What’s on your wish-list?
Comments
Re: My Perfect XML-Based Publishing Platform
One addition, to make it more perfect. You said: the HTTP method, the requested URI, any HTTP header.
Future services may well work over XMPP/Jabber.
So HTTP methods and URIs must be carefully encapsulated (they are just addressing conventions). For many years there have been papers (by Tim BL, for instance) that use other representations (N3, for instance). In addition, there is no clear direction yet on how to implement two-level TP.
I have read some strange things in recent updates to the XQuery specifications.
Re: My Perfect XML-Based Publishing Platform
Jeni,
Would it be possible for me to republish this article on XMLToday.org? We’re discussing general requirements for an XRX/RESTful Services based Drupal-like CMS, and I think you’ve covered a lot of the salient points.
— Kurt Cagle
Re: My Perfect XML-Based Publishing Platform
Kurt,
Sure, you can republish so long as you indicate where you got the article from (ie abide by the CC attribution licence referenced at the bottom of the page).
Cheers,
Jeni
Re: My Perfect XML-Based Publishing Platform
Regarding “invoking queries on its XML store”, or more generally having a REST client, one approach would be to leverage the xforms:submission element.
In fact in Orbeon we have already wrapped that element in an XPL processor (or “step” as per the XProc terminology). The benefit is that the XForms 1.1 xforms:submission really acts like a quite complete REST client.
-Erik
Re: My Perfect XML-Based Publishing Platform
I am amazed to see the power of XML and allied technologies. The Command and House Papers demo above is just wonderful. To date I have not seen such a wonderful application of XML technology from the UI and reporting perspective. If I am not wrong, clicking on an item in the ‘Papers’ section of the demo displays the entire XML document, with XSL applied, in a navigational layout. You seem to have applied multiple stylesheets using an XML pipeline. And when I click on the PDF file, the same XML file is converted to PDF using XSL-FO.
I am a newbie to XML and its allied technologies. I have written the flow of the processing above as I understand it. Kindly correct me, and provide some leads on developing such an application for domains that might come up in the course of my work.
Re: My Perfect XML-Based Publishing Platform
Dynamic Delivery Services (DDS) by EMC, soon to be released free for developer use via the EMC XML Developer Community (http://developer.emc.com/xmltech), is pretty close to what you are describing. A platform for creating XML content delivery applications, DDS is built on top of EMC Documentum xDB (formerly known as X-Hive/DB) and uses technologies such as XQuery and XProc heavily.
The XProc engine used in DDS (also to be released as a separate download) integrates with xDB seamlessly. It’s not just the usual load/store/query type of integration - all standard XProc steps can operate directly on the content in the database, which makes it possible to process very large documents, make use of indexes, transaction control, etc.
DDS has an HTTP interface to its services, including XProc. The interfaces are not completely RESTful yet (What??? - Yes, I know…), but it will not stay that way for long.
Re: My Perfect XML-Based Publishing Platform
that’s a very nice list of useful features. it would be interesting to compare it to my recently posted list of the perfect REST toolbox at http://dret.typepad.com/dretblog/2009/05/rest-programming-toolbox-requirements.html . after all, an XML CMS is probably just an application built on top of a RESTful framework, and on the other hand, many RESTful applications use XML and manage content in one way or the other (given a general definition of “content”, for example extending to physical objects). i think we are close to getting good platforms and toolboxes, and i am looking forward to working with one that really “gets the web”.
Re: My Perfect XML-Based Publishing Platform
Perhaps M/DB:X (http://www.mgateway.com/mdbx.html) could provide the basis of such a platform? It’s Open Source and could be extended to provide the facilities you’re wanting. The version just released is the “bare bones”, eg there’s no XQuery capability yet, but I’d like to see that added by someone with the time and knowledge of XQuery. I’d love to have some collaborators to add the kind of features you describe.
Re: My Perfect XML-Based Publishing Platform
Most of the processes and methods you mention we use in the CMS I developed. Unfortunately, as I am a designer by trade, our CMS is built to resolve problems I faced - without knowledge of existing standards such as XProc, URI Templates etc.
It’s amazing to see that the process you outlined above mirrors very much the workflow we have used - we have obviously encountered the same problems from different ends - and, it would seem, come to a similar solution.
So while the idea you outline above closely mirrors our workflow - our ‘proprietary’ solution would need a lot more work to tick all the boxes above.
Re: My Perfect XML-Based Publishing Platform
I’ve been wishing for something very similar for at least ten years. :)
Re: My Perfect XML-Based Publishing Platform
We are presently building an easy to use web-based XML content portal using best-of-breed technologies. It is intended for non-technical users and requires minimal coding. We take care of all the setup.
Basically, an author or editor logs into the system. They select the content they want to work with from a tree that has organized the different types of content per the client’s needs. They are then presented with an interactive list showing all of the related documents and any number of “document-control” metadata (e.g. Last update, location, size), any number of “content-control” metadata (e.g. sortation overrides, topographic coding) and any number of content elements. This is all defined with the editor while loading the content. From this list, the user can also create and store reports on the content. Also in this view, the editor can decide to have 1 or more views into the XML. This can be a visual presentation, a restricted set of elements, or a WYSIWYG editing view where the user can choose to see tags or not. Again, all of this is easily customized to the client’s needs. Lastly, the user can choose to extract the content for full-scale pagination and output using our high-end typesetting/pagination system.
A quick outline of some of the features we have:
1) Customizable application and “XML” security (through views).
2) Transparent access to the underlying XML. Managing the underlying documents in the database can be done using a standard FTP client.
3) Word-processor like views of the data instead of a tags-only environment.
4) We handle all of the hosting issues, like backups, connectivity, software, and hardware.
5) Almost 100% client software free.
6) Very low cost compared to other CMS offerings.
And quickly:
• Dynamic embedded TOCs for long structured documents.
• Embedded image capabilities.
• Self-sorting sequences.
• Conditional display.
• Autolists.
• CALS tables and XML-table editing.
• CSS-aware.
• RTF, PDF, HTML, and XSL-FO outputs.
• 2-way linkable crossreferences and footnotes.
It doesn’t support XProc, but there is a full-scale programming API available for the underlying database.
Unfortunately, it is targeted at the publisher market, so it is more of an Editorial Management system than a website publishing platform; although I’m sure it could be turned into one, since full-text and XML-aware indexing is also available.