I’m aware I’ve been quiet for the past few months. This isn’t because nothing interesting has been going on — rather the opposite. It’s been difficult to get a chance to sit down and write about the work I’ve been doing, when actually doing the work has been taking up so much time.
Most of my time has been spent on the new legislation.gov.uk website and its underlying API. There’s so much to say about this project that I hardly know where to start, so I’ll just try to do an overview and we can take it from there. Let me know what you’re interested in.
legislation.gov.uk is a government website built on the principles of transparency and open data, including ideas laid out in the Power of Information Taskforce Report. We have a lovely user interface which helps end-users find and understand legislation, but it’s layered over the top of an API that anyone is free to use to construct their own websites based on the same data.
In fact, we built the API first, and it’s been around (though not in a particularly stable state) for about a year. However, it turned out that building the user interface really helped in two ways. First, it helped the legislation experts who were looking at the documents to spot errors in a way that they unsurprisingly struggled to do when presented with raw XML. Second, it helped to identify things that the API needed to do to support a useful website, such as always providing links to the table of contents for an item of legislation or providing a search based on modification date.
Now, if you’ve been reading Sean McGrath’s blog you’ll know that as far as content goes, legislation is about as tough as you can get. For a start, Acts and Statutory Instruments are semi-structured documents, not tabular data. It’s not a simple matter of storing and extracting rows in a database: we need to be able to address portions of an item of legislation such as “Local Government Act 1988 (c. 9, SIF 81:2), Sch. 3 para. 13(1)(b)(2)” (this is an actual citation! I am not making this up!).
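To make the addressing problem a little more concrete, here is a minimal sketch of turning the pinpoint part of a citation such as “Sch. 3 para. 13(1)(b)(2)” into a hierarchical path. The abbreviation table, the functions `parse_pinpoint` and `to_path`, and the path layout are all hypothetical illustrations, not the site’s actual parser or identifier scheme, and real citations come in far more forms than this handles.

```python
import re
from typing import List, Tuple

# Hypothetical mapping from citation abbreviations to structural element names.
ABBREVIATIONS = {
    "s.": "section",
    "Sch.": "schedule",
    "para.": "paragraph",
    "art.": "article",
    "reg.": "regulation",
}

# An abbreviation, a number (e.g. "13" or "13A"), then any run of "(x)" subdivisions.
TOKEN = re.compile(r"(s\.|Sch\.|para\.|art\.|reg\.)\s*(\d+[A-Z]*)((?:\([0-9a-z]+\))*)")
SUBDIVISION = re.compile(r"\(([0-9a-z]+)\)")


def parse_pinpoint(pinpoint: str) -> List[Tuple[str, str]]:
    """Split a pinpoint like 'Sch. 3 para. 13(1)(b)(2)' into (element, number) steps."""
    steps: List[Tuple[str, str]] = []
    for abbrev, number, subdivisions in TOKEN.findall(pinpoint):
        steps.append((ABBREVIATIONS[abbrev], number))
        # Bracketed subdivisions nest inside the element they follow.
        for sub in SUBDIVISION.findall(subdivisions):
            steps.append(("sub", sub))
    return steps


def to_path(steps: List[Tuple[str, str]]) -> str:
    """Join the steps into a slash-separated path (hypothetical layout)."""
    parts: List[str] = []
    for element, number in steps:
        if element != "sub":
            parts.append(element)
        parts.append(number)
    return "/".join(parts)
```

For example, `to_path(parse_pinpoint("Sch. 3 para. 13(1)(b)(2)"))` gives `schedule/3/paragraph/13/1/b/2`: each level of nesting in the citation becomes one level of the path, which is what makes fragments individually addressable.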
The content itself is complex. For legislation.gov.uk, the main challenge is not to do with faithfully reconstructing page and line breaks (fortunately!) but how to represent complex, annotated changes to legislation over time, and then how to present them. Much of this had already been done (in terms of technology) within the Statute Law and OPSI websites, although the data comes from a variety of sources over time, each with its own set of peculiarities to be navigated. The larger challenge here was to provide a mechanism for navigating through the content that made clear the distinctions between the various versions of legislation that people can look at, and warned them about their status without overwhelming them with information.
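One way to picture the versioning problem is as a list of versions, each with the date from which its text applies, plus a lookup for the version in force on a given date. The sketch below assumes exactly that simple model (the `Version` class and `version_in_force` function are hypothetical) and deliberately ignores the harder parts described above, such as annotations and status warnings.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    valid_from: date  # date from which this text applies (hypothetical field)
    label: str        # an identifier for this version of the text


def version_in_force(versions: List[Version], on: date) -> Optional[Version]:
    """Return the latest version whose valid_from is on or before `on`, or None."""
    ordered = sorted(versions, key=lambda v: v.valid_from)
    starts = [v.valid_from for v in ordered]
    # bisect_right counts versions starting on or before `on`.
    i = bisect_right(starts, on)
    return ordered[i - 1] if i > 0 else None
```

Sorting and binary search keep each lookup cheap even for an item with around a hundred versions; a date before the first version returns `None`, which is the point where a real site would need to explain why no text is shown.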
We also have a lot of documents, some of which are very large. There are nearly 60,000 items of legislation on the site. The largest and most complex of them has hundreds of sections and about a hundred distinct versions. When you consider all the versions of all the possible fragments of all the items of legislation, you’re talking about 6.5 million distinct documents, each of which is available as HTML, XML and PDF, with some accompanying RDF metadata.
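The document count comes from combining every fragment of an item with every version, and each combination then has several representations. A minimal sketch of enumerating them, assuming a hypothetical path layout of item, optional fragment, version date and a `data.{format}` file name (loosely styled on the site’s URLs but not guaranteed to match them); `representation_paths` and `FORMATS` are illustrative names:

```python
from itertools import product
from typing import Iterator, List

# Assumed file extensions, one per representation (HTML, XML, PDF, RDF metadata).
FORMATS = ["htm", "xml", "pdf", "rdf"]


def representation_paths(item: str, fragments: List[str], versions: List[str]) -> Iterator[str]:
    """Yield one hypothetical path per (fragment, version, format) combination.

    An empty fragment string stands for the whole item.
    """
    for fragment, version, fmt in product(fragments, versions, FORMATS):
        parts = [item, fragment, version] if fragment else [item, version]
        yield "/" + "/".join(parts) + f"/data.{fmt}"
```

Using a generator means the millions of combinations never have to be held in memory at once; the total is simply fragments × versions × formats per item, which is why a single large, frequently amended Act contributes so many documents.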
On top of this, the content is constantly changing. New legislation is published every working day, first as PDFs, then as HTML (and XML); various associated documents follow, the most important of which are Explanatory Notes, again published first as PDF and then in HTML/XML form. Old legislation changes too; the legislation.gov.uk editorial team is constantly working through a backlog of changes to existing legislation brought about by new legislation. Simply hooking up the site to keep up to date with these changes has been an enormous challenge.
The content also changes because we intend to add features to the site over time. The site has already seen bug fixes and tweaks to address problems that we’ve encountered post-launch, and there are a number of new features in the pipeline to bring the site up to the level of completeness where it can fully replace the existing OPSI and Statute Law websites.
We also needed the site to be reasonably fast and robust in the face of moderately heavy traffic. Providing fast access to ever-changing content, especially when the changes themselves are unpredictable, is an ongoing challenge.
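A common way to balance speed against freshness is to cache rendered pages for a limited time and also drop an entry as soon as its source is known to have changed. This post doesn’t describe the site’s caching architecture, so what follows is only a minimal in-process sketch of that general technique; `ExpiringCache`, its `ttl` parameter and the `render` callback are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Tiny in-memory cache: entries expire after `ttl` seconds or when invalidated."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[str], str]) -> str:
        """Return a cached page if still fresh, otherwise re-render and store it."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = render(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry immediately, e.g. when the underlying document changes."""
        self._entries.pop(key, None)
```

The time-to-live bounds how stale a page can get when nobody signals a change, while `invalidate` lets a known update show up straight away; the harder real-world problem is knowing which cached fragments a given change affects.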
All of this has only been possible by having an excellent team of experts and developers. One of the things that made this project quite different from the majority of government projects of this size was that it was much closer to Agile than PRINCE2: clients and providers working closely in the same team, chatting on daily calls, working side-by-side. From the developers’ perspective, it gave us direct access to the people who both had the expertise about the content and knew what they wanted. From the customer side, I hope and believe that it gave them as close involvement in the development of the site as they could want and a far deeper level of understanding about exactly how it works (and therefore what is easy and what is hard, and where compromises are best made) than they would have had otherwise.
So here are some credits. First, from TSO, where I work:
From Bunnyfoot:
And from The National Archives:
And finally, none of this would have happened without John Sheridan having the ambition and the vision for how legislation should be published on the web, creating the environment that enabled this project to be done, setting a positive tone and providing support, encouragement and a gently guiding hand throughout the process.
This isn’t everyone who has been involved in the project: there are system administrators and testers and beta users and a whole cloud of other support particularly from MarkLogic, Orbeon and Akamai. But these are the people who let it consume their lives for at least a while. Every one of them was vitally important to the project, bringing their own expertise and skills and personality. I admire them all hugely. No project of this size is completely plain sailing, and I am convinced that we would be in a very different position today if the project hadn’t been built on mutual respect and trust. I’ve sketched some of the challenges that we faced. If it all looks easy, it’s only because this group of people did their jobs incredibly well. This is my public thanks to them for all their work.
Comments
Re: legislation.gov.uk: Credit Where it's Due
Giving credit where it is due would be incomplete without crediting Jeni for the heroic amount of work that she put in as technical lead and main developer.