Registration has just opened for this year’s XML Summer School, held in Oxford on 20-25th September. I’m teaching a couple of sessions and helping with a workshop on the “XSLT, XSL-FO and XQuery” track along with Bob DuCharme, Michael Kay and Priscilla Walmsley. It’s one of my favourite events, for three reasons:
I know a lot of beginners go to the XML Summer School for the introduction course, but to me the real value is for people who are actually using XML on a day to day basis and want to keep on top of the latest tools and technologies that will actually help them do their jobs. I learn something new every year.
Anyway, I wanted to blog about it because there’s a discount on registration up until 30th June. Grab ‘em while you can!
For the last several months, I’ve been working on a project at TSO for publishing UK legislation using a native XML database (eg eXist or MarkLogic Server) with some middleware (eg Orbeon or Cocoon). It’s a powerful and flexible approach that’s built on declarative languages like XQuery, XSLT, and XML pipelines; you can see it in action with the Command and House Papers demo.
But the killer platform isn’t quite here yet, partly because the specs aren’t quite done. Both Orbeon and Cocoon use XML pipelines, but they use different languages to define them; XProc is just around the corner. XML databases are all over the place in their conformance to XQuery, its optional features and the not-quite-finalised specs for free-text searching and updating.
People talk about how productive you can be using Ruby on Rails or Django, and they work great for publishing data you can store in a relational database. What we need is a similarly easy-to-use platform for document-oriented, XML-based content. This is my wish-list.
This is the talk I prepared for the UKGovWeb Barcamp, in blog form. It’s probably better this way. Most of what’s written here seems blindingly obvious to me, and probably to most readers of this blog, but maybe Google will direct someone here who finds it useful.
Working with public-sector information on the web, one of the things that I take an interest in is making government data freely available for anyone to re-present, mash-up, analyse and generally do whatever they want to do. This post is born out of a feeling that the people who control data don’t realise that the smallest changes can be beneficial: they don’t need to do everything right now, just something.
In my last post I talked about different techniques for representing overlap within XML. One technique is fragmentation. In the work that I’ve been doing, I’ve been using milestone-based formats similar to ECLIX, but my eyes were opened at the GODDAG workshop: fragmentation would make overlap so much easier to process in XSLT, especially when dealing with localised overlap such as revision or comment markup.
But how could fragmentation be used with full-on overlap? I had a little play and came up with some XSLT to demonstrate.
The Free Our Bills campaign was launched recently in the UK. Some of the comments I’ve seen about the campaign makes me think that it might be helpful if people understood more about how Bills and legislation get published in the UK. I thought I’d offer a bit of background based on my experience (though there are many people with more intimate knowledge of the processes involved; perhaps they’ll correct me when I get it wrong).
Another question to answer:
I’ve been reading about RDF, and I’m not sure in what situations it is more appropriate to use RDF over straight XML. I usually see RDF expressed as XML, but sometimes I see it written as language-independent functions (or methods).
Part of me is wondering if RDF is more appropriate for this project. What might the benefits be? And if it is, how difficult it would be to refactor it.
We just had photos taken of the children, and it’s put me in a reflective mood. Norm posted the other day about his experience with information/task management products:
Then it hit me.
None of them, with the notable exception of Tinderbox, seem to store the data in any open format. I was seriously considering one of these commercial black boxes for an important chunk of the data that drives my day-to-day life. The little voice in my head reacted viscerally when the observation was made: “What the hell you thinking, man! Stop that!”
A Ruby on Rails specialist friend and I are building a Web 2.0 application. I would say it’s “social networking for the dead” except that I doubt that description would be attractive to most people (my ex-Goth defacto being a rare exception), and it can be for the living too. It’s a bit like all those genalogy websites, except that our focus is on people’s social relationships as well as their familial ones.
(I should say that this is all very casual. We’re both fitting it in around our other responsibilities, and are mainly interested in working together, learning new things, and trying out all the best practices that everyone keeps talking about. So don’t think I’m becoming a dotcom entrepreneur or anything. Its got a very Web 2.0 name, and I’m only not telling you in case you start hitting our servers. We’re nowhere near ready for visitors.)
A lot of people run into problems with namespaces, and most of those arise from using default namespaces (ie not giving namespaces prefixes). The transformation technology you use can have a big effect on how confusing and irritating it gets.
Default namespaces make XML documents easier to read because they allow you to just give the local name of an element rather than using prefixes all over the place. For example, using:
<house status="For Sale" xmlns="http://www.example.com/ns/house">
<askingPrice>...</askingPrice>
<address>...</address>
<layout>...</layout>
</house>
I’ve finally finished my “Progress in Processing” talk for this year’s XML Summer School. It’s been really interesting looking at the different APIs developed for different programming languages in the last few years, all so much easier to use than the DOM. One of the themes is the use of path-based syntax to query XML.