My Perfect XML-Based Publishing Platform

May 29, 2009

For the last several months, I’ve been working on a project at TSO for publishing UK legislation using a native XML database (eg eXist or MarkLogic Server) with some middleware (eg Orbeon or Cocoon). It’s a powerful and flexible approach that’s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the Command and House Papers demo.

But the killer platform isn’t quite here yet, partly because the specs aren’t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; XProc, the W3C’s standard pipeline language, is just around the corner. XML databases are all over the place in their conformance to XQuery, in which of its optional features they support, and in how they handle the not-quite-finalised specs for full-text searching and updating.

People talk about how productive you can be using Ruby on Rails or Django, and they work great for publishing data you can store in a relational database. What we need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.

What You Can't Do with HTML5 Microdata

May 13, 2009

Update: Fixed a couple of errors in the microdata code.

The HTML5 microdata proposal has hit the web, just days before Google announced its support for RDFa (or at least one vocabulary encoded using RDFa attributes). These are, indeed, “interesting times” for the semantic web.

Now, if you’re one of those weirdos who want to embed RDF triples within your web pages, what you’re going to care about is whether you can use microdata to do it. Those of us who have been using RDFa in anger, rather than in toy examples, know that it can be hard to map a particular set of RDF statements onto HTML content. I thought I’d take a look at just what it would be like to produce a particular set of RDF statements using the HTML5 microdata proposal.

Google's RDFa Support

May 13, 2009

I can’t reply to Henri Sivonen

@JeniT What’s wrong with http://rdf.data-vocabulary.org/rdf.xml ?

in 140 characters.

http://rdf.data-vocabulary.org/rdf.xml is the RDF schema that describes the classes and properties recognised by Google’s rich snippets, which promise to provide richer information about search results than is currently available, in the manner of Yahoo!’s SearchMonkey.

So what’s so bad about this RDF schema?

Evolving Standards

May 10, 2009

I’ve been trying to finalise this post for a long time now, but today’s publication of an HTML5 draft that includes a new microdata section makes it all the more relevant. The long and short of it is that I am less and less concerned about the huge mess that is the HTML5 standardisation process. On the one hand, it’s a huge mess; on the other, it doesn’t matter.

Temporal Scope for RDF Triples

Feb 15, 2009

To me, the biggest deficiency in RDF is how hard it is to associate metadata with statements. I’ve talked before about the need, in the genealogical application I’m toying with, to record metadata such as who made a statement, when, on the basis of which source, how certain it is and so on. But there’s one type of metadata that I think is required in practically every domain: the temporal scoping of statements.
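A plain RDF triple has nowhere to put a date range. What follows is a minimal sketch, in Python rather than any RDF syntax, of what temporal scoping means in practice: each statement carries an optional validity interval, and a query asks which statements held on a given date. The `ScopedStatement` class, the `statements_on` helper and the `ex:` identifiers are hypothetical illustrations, not part of RDF or of any library.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ScopedStatement:
    """Hypothetical record pairing a subject/predicate/object triple with a
    validity interval. Both bounds are inclusive; None means open-ended."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None

    def holds_on(self, when: date) -> bool:
        # True if `when` lies inside the (possibly open-ended) interval.
        if self.valid_from is not None and when < self.valid_from:
            return False
        if self.valid_to is not None and when > self.valid_to:
            return False
        return True


def statements_on(store: List[ScopedStatement], when: date) -> List[ScopedStatement]:
    # Filter the store down to the statements that were true on `when`.
    return [s for s in store if s.holds_on(when)]


# Illustrative genealogical data (made up): a residence that changes over time,
# plus a statement with no temporal scope at all.
store = [
    ScopedStatement("ex:alice", "ex:livesIn", "ex:london", date(1880, 1, 1), date(1895, 12, 31)),
    ScopedStatement("ex:alice", "ex:livesIn", "ex:leeds", date(1896, 1, 1)),
    ScopedStatement("ex:alice", "ex:name", '"Alice"'),
]

for s in statements_on(store, date(1890, 6, 1)):
    print(s.subject, s.predicate, s.obj)
```

Expressing the same interval in RDF itself means hanging it off a reified statement or a named graph, the usual workarounds for attaching metadata to statements.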