<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>psi</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/50</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Three challenges for alpha.gov.uk</title>
 <link>http://www.jenitennison.com/blog/node/155</link>
 <description>&lt;p&gt;The new &lt;a href=&quot;http://alpha.gov.uk/&quot;&gt;alpha.gov.uk&lt;/a&gt; website was launched recently, as a prototype for the &amp;#8220;single Government website&amp;#8221; described in Martha Lane Fox&amp;#8217;s report &lt;a href=&quot;http://download.cabinetoffice.gov.uk/digital/directgov-2010-and-beyond.pdf&quot;&gt;Directgov 2010 and Beyond: Revolution Not Evolution&lt;/a&gt;. Apparently &lt;a href=&quot;http://www.guardian.co.uk/government-computing-network/2011/may/11/cabinet-office-launches-alphagovuk&quot;&gt;the real deal could go live &amp;#8220;in about a year&amp;#8221;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The site is lovely, a far cry from the standard government fare. But this isn&amp;#8217;t exactly surprising: it&amp;#8217;s been developed using &lt;a href=&quot;http://twitter.com/jystewart/status/68343015407763456&quot;&gt;modern technologies&lt;/a&gt; by a &lt;a href=&quot;http://alpha.gov.uk/humans.txt&quot;&gt;top team&lt;/a&gt; with a set of &lt;a href=&quot;http://blog.alpha.gov.uk/blog/alpha-gov-uk-design-rules&quot;&gt;design rules&lt;/a&gt; far removed from those usually applied to government websites, a &lt;a href=&quot;http://twitter.com/alphagov/status/68225799282634752&quot;&gt;budget that&amp;#8217;s not exactly tight&lt;/a&gt; and an Agile methodology. These factors mark it out from most (&lt;a href=&quot;http://blog.alpha.gov.uk/blog/shoulders-of-giants&quot;&gt;though not all&lt;/a&gt;) government websites. And this is part of the point: to illustrate the gap between what we have and what a revolution could bring.&lt;/p&gt;

&lt;p&gt;There are three challenges where I am and have been particularly interested to see the alpha.gov.uk approach. These are in balancing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simplicity and complexity&lt;/li&gt;
&lt;li&gt;centralisation and distribution&lt;/li&gt;
&lt;li&gt;end-user and data re-user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not currently clear to me whether alpha.gov.uk has decided an approach on any of these &amp;#8212; whether the way the site works currently is the way that they have decided it should work &amp;#8212; or whether these are areas that are still up in the air at the moment. I&amp;#8217;m hoping it&amp;#8217;s the latter.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: What I write here is largely coloured by having worked on &lt;a href=&quot;http://www.legislation.gov.uk/&quot;&gt;legislation.gov.uk&lt;/a&gt; for the last few years, both in terms of the difficulties that we face and the question at the back of my mind of how legislation.gov.uk fits in this brave new world.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Simplicity vs Complexity&lt;/h2&gt;

&lt;p&gt;alpha.gov.uk does a great job at providing the simple message that is needed by the majority of people. The first challenge, I think, lies in also addressing the complex situations that are faced by the minority. For example, alpha.gov.uk provides information about &lt;a href=&quot;http://alpha.gov.uk/does-my-child-need-car-seat/&quot;&gt;child car seats&lt;/a&gt; but what if my child is disabled? What if I want to check whether it&amp;#8217;s OK to go without in a taxi? Similarly, there&amp;#8217;s a tool for &lt;a href=&quot;http://alpha.gov.uk/calculate-holiday-pay&quot;&gt;calculating holiday entitlement&lt;/a&gt;, but what if I work part time and my days include Mondays; how do bank holidays fit in?&lt;/p&gt;

&lt;p&gt;Does a single Government website need to address all these complexities? I can see arguments both ways. On the one hand, handling minority cases can mean making things unnecessarily complex for everyone &amp;#8212; this is the trap that DirectGov falls into too much of the time. On the other hand, if people can&amp;#8217;t find this information from the single Government website, what other authoritative source is there? Can we even assume that non-authoritative third parties will fill the gaps? Will people trust the information if it isn&amp;#8217;t from a government source?&lt;/p&gt;

&lt;p&gt;My feeling is that, unlike conventional websites, a single Government website has an additional responsibility to provide complex content, and that this includes helping people find information as well as perform tasks. For example, if I&amp;#8217;m writing a holiday travel guide and want to include a page describing what to do when you&amp;#8217;ve lost your passport, including contact details for common countries, I want a description and a list, not a tool.&lt;/p&gt;

&lt;p&gt;One of the alpha.gov.uk design rules was &amp;#8220;optimise for the common case&amp;#8221;, and the emphasis on clear content and straightforward tools is admirable. But optimisation for a common situation shouldn&amp;#8217;t mean completely ignoring the uncommon one. The challenge &amp;#8212; which I think is primarily a design challenge and handled pretty nicely in the Guides &amp;#8212; is to make a website that contains complex information appear simple.&lt;/p&gt;

&lt;h2&gt;Centralisation vs Distribution&lt;/h2&gt;

&lt;p&gt;In her report, Martha Lane Fox laid out a number of advantages to a single government website, including providing consistency in user experience, ensuring that there is no &amp;#8220;wrong part&amp;#8221; of the government web estate from which to start your search for government information, and saving money by building on shared platforms. These imply a high level of centralisation.&lt;/p&gt;

&lt;p&gt;Against that is the fact that there is no single owner of government content. Departments have the best understanding of their policies. HMRC collects taxes. DVLA licenses drivers. The Identity and Passport Service issues passports. The greater the distance between the website and the people who understand the area, the more likely the content and tools are to be simply wrong or out of date and ultimately worthless. To completely centralise all government websites runs counter to the way the web works and the advantages that it gives in distributing control over content. If Google is your front page, then what does it matter how many domains the content is distributed across?&lt;/p&gt;

&lt;p&gt;So this is what I see as the second challenge for a single government website. It is a question of governance and of technology. How does a single Government website enable the people that own information to keep control of it while providing a consistent user experience? &lt;/p&gt;

&lt;p&gt;While there are very likely large swathes of government content that could be perfectly well served by a WordPress or Drupal installation with some custom templates, other content really couldn&amp;#8217;t fit into that kind of content management system (yes, I&amp;#8217;m thinking of legislation), and needs custom solutions which are best owned and updated by those that understand the content. What does the &amp;#8220;single Government website&amp;#8221; mean for these sites? Do they continue running their own sites with a common set of styles and links? Do they only run a service with an API that is skinned centrally?&lt;/p&gt;

&lt;p&gt;In my opinion, alpha.gov.uk should be trying out the different ways in which different content could be brought together from distributed sources, learning lessons from experience at the BBC and beyond, to find approaches that work within government.&lt;/p&gt;

&lt;h2&gt;End User vs Data Reuser&lt;/h2&gt;

&lt;p&gt;The third challenge is particularly important for those of us who care about freeing government data. We don&amp;#8217;t want information locked within government&amp;#8217;s web pages; we want it available for us to re-present, mash together and build into our own products. The balancing act here is between the end-user who really doesn&amp;#8217;t care at all about the data underlying the pages they see, and the citizen developer who wants to reuse that information.&lt;/p&gt;

&lt;p&gt;There is some evidence of the idea of providing access to underlying data in some of the alpha.gov.uk Guides, such as &lt;a href=&quot;http://alpha.gov.uk/guides-redundancy/&quot;&gt;An employee&amp;#8217;s guide to Redundancy&lt;/a&gt;, which include a &amp;#8220;Syndication API&amp;#8221; box with (not yet working) links to JSON, XML and Atom versions of the content. But there&amp;#8217;s no feed on the search results page, and no RDFa or microdata markup, or separate data versions, on pages that contain data such as &lt;a href=&quot;http://alpha.gov.uk/how-much-minimum-wage/&quot;&gt;What is the minimum wage?&lt;/a&gt;. The tools, such as that used to &lt;a href=&quot;http://alpha.gov.uk/calculate-holiday-pay&quot;&gt;Calculate holiday pay&lt;/a&gt;, are provided as client-side scripts rather than server-side APIs which could be called from other applications.&lt;/p&gt;

&lt;p&gt;As we&amp;#8217;ve found with legislation.gov.uk, building websites based on data and API access to that data is very possible, but doing so can change your approach to both site design and technology use. I&amp;#8217;d love to see alpha.gov.uk being built around making reuse easy rather than retrofitting it as a secondary concern.&lt;/p&gt;

&lt;h2&gt;Other Challenges&lt;/h2&gt;

&lt;p&gt;There are undoubtedly other challenging balancing acts for alpha.gov.uk.&lt;/p&gt;

&lt;p&gt;Steph talks about &lt;a href=&quot;http://www.helpfultechnology.com/helpful-blog/2011/05/10-things-alpha-gov-uk-gets-wrong-part-2/&quot;&gt;lack of community&lt;/a&gt;; I view this as something to balance against the need to provide an authoritative set of information, which is hard to do when you let just anyone participate in a site.&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s also the balance between the need for small-g government to provide a persistent and consistent website and the wish of big-g Government to create short-lived special-case sites to tout particular policies, such as &lt;a href=&quot;http://www.redtapechallenge.cabinetoffice.gov.uk/&quot;&gt;Red Tape Challenge&lt;/a&gt; or &lt;a href=&quot;http://yourfreedom.hmg.gov.uk/&quot;&gt;Your Freedom&lt;/a&gt;. How do these sites fit within the single Government website?&lt;/p&gt;

&lt;p&gt;Navigating a way through these challenges will require making hard decisions, reaching difficult compromises and finding imaginative third ways. I for one am very interested to see the directions alpha.gov.uk chooses to take.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/155#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/70">alphagovuk</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/52">opendata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <pubDate>Fri, 20 May 2011 19:45:40 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">155 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Government Should Do its Own Data Homework</title>
 <link>http://www.jenitennison.com/blog/node/148</link>
 <description>&lt;p&gt;I&amp;#8217;ve been reflecting a little since &lt;a href=&quot;http://www.ukuug.org/events/opentech2010/&quot;&gt;OpenTech&lt;/a&gt; on the relationship between the developer community and government.&lt;/p&gt;

&lt;p&gt;Let me set out my perspective first. My goal is to help ensure that the public sector publishes reusable data in the long term.&lt;/p&gt;

&lt;p&gt;To do that, data publication needs to be sustainable. It needs to be embedded within the day-to-day activity of the public sector, something that seems as natural as the generation of PDF reports seems today. It also needs to be useful. It needs to be easy for anyone to understand and reuse the data, with minimal effort. It cannot be the case, long term, that you need to be an expert hacker to reuse government data.&lt;/p&gt;

&lt;p&gt;To get there, we need to work towards a virtuous cycle in which the public sector is rewarded for publishing useful data well. The reward may come from financial savings, from increasing data quality, from better delivery of its remit, or simply from kudos. It doesn&amp;#8217;t matter how, but there needs to be some reward, or it just won&amp;#8217;t happen.&lt;/p&gt;

&lt;p&gt;Over the last few years, government has had to be persuaded that it&amp;#8217;s a good idea to release their data at all. The message from the developer community has been &amp;#8220;give us your data and we&amp;#8217;ll show you what we can do with it!&amp;#8221; Through hack days and various similar activities, developers have excited, wowed and dazzled officials and politicians, opening their eyes to what could be done. Through sustained argument and political pressure, developers have set out the economic and moral case that releasing data not only &lt;em&gt;could&lt;/em&gt;, but &lt;em&gt;should&lt;/em&gt; happen.&lt;/p&gt;

&lt;p&gt;They have been incredibly successful. We have &lt;a href=&quot;http://data.gov.uk/&quot;&gt;data.gov.uk&lt;/a&gt;, &lt;a href=&quot;http://www.ordnancesurvey.co.uk/oswebsite/opendata/&quot;&gt;open data from Ordnance Survey&lt;/a&gt;, strong commitments to open data within the &lt;a href=&quot;http://programmeforgovernment.hmg.gov.uk/government-transparency/&quot;&gt;Coalition Agreement&lt;/a&gt;, and the &lt;a href=&quot;http://data.gov.uk/blog/new-public-sector-transparency-board-and-public-data-transparency-principles&quot;&gt;Public Sector Transparency Board&lt;/a&gt; who are now applying that pressure, with authority, at the heart of government.&lt;/p&gt;

&lt;p&gt;My perception is that the argument that government should open up its data has basically been won. The questions within the public sector are now about &lt;em&gt;how&lt;/em&gt;, not &lt;em&gt;whether&lt;/em&gt;. And as a result, in this changed environment, I&amp;#8217;m growing slightly uneasy about the core developer message of &amp;#8220;give us your data and we&amp;#8217;ll show you what we can do with it!&amp;#8221;&lt;/p&gt;

&lt;p&gt;There are two things about that message that concern me. First, it implies government is doing it all wrong. Second, it implies that government doesn&amp;#8217;t &lt;em&gt;need&lt;/em&gt; to do any better, because the developer community can take up all the slack and fill in all the gaps. It&amp;#8217;s like getting fed up with a child struggling with their homework, and saying &amp;#8220;oh, just give it here and I&amp;#8217;ll do it!&amp;#8221; It&amp;#8217;s a narrative that simultaneously undermines the best efforts of those within government and removes from them the motivation and opportunity to learn to do better.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Of course there is a tricky balance here. We don&amp;#8217;t want to let up pressure on the government to release important information. We don&amp;#8217;t want government to feel that they have to get their data perfect before releasing it. And we can&amp;#8217;t always wait for government, which can be slow-moving as an organisation, to provide everything we need right now.&lt;/p&gt;

&lt;p&gt;However, there are certain things that only the owners of data &amp;#8212; those within the public sector &amp;#8212; can do. People who own data understand it so much better than third parties: what codes mean, what values are used to indicate missing data, what gets included and what gets left out, which columns aren&amp;#8217;t really used any more, which interpretations are safe and which are meaningless. Data owners can be trusted in a way that no one outside could be; when data publication becomes a sustainable part of their activity, they are much better placed to provide a steady, reliable, flow of data than a third-party API that could disappear or get out of date whenever the volunteer behind it moves on to something new.&lt;/p&gt;

&lt;p&gt;People in government must be given the responsibility to publish their data well. And there are three core ways in which I think developers could help them.&lt;/p&gt;

&lt;p&gt;First, while there are many more technically savvy people within government than is sometimes made out, the average civil servant lacks both know-how and tooling. I think developers could help a huge amount here. What about hack days where developers sit side by side with civil servants to help them clean and publish their data? What about engaging with the owners of a particular data set to help &lt;em&gt;them&lt;/em&gt; to publish it in a way that was reusable and sustainable? What about writing services, accessible through the locked-down IT systems that civil servants have to use, that enabled them to convert their data into multiple formats, and to link up the ways they refer to things with the way other people do?&lt;/p&gt;

&lt;p&gt;Second, while government needs to be responsible for publishing its data, it can&amp;#8217;t be responsible for building everything that end-users need based on that data. Developers have the facility to create applications that bring together data from diverse parts of the public sector, and combine it with data from outside. This has always been a feature of hack days, of course; all I&amp;#8217;m arguing for is a focus on applications that the public sector &lt;em&gt;shouldn&amp;#8217;t&lt;/em&gt; be doing itself.&lt;/p&gt;

&lt;p&gt;Third, we need to build the virtuous cycle that I talked about above. Government needs to hear about what works for developers, as well as what doesn&amp;#8217;t. What data releases have been helpful and why? Who are the stars? Who should be rewarded and emulated? We need ways of feeding back in a constructive way to public sector workers who are trying their best with the resources they have &amp;#8212; often extensive subject-matter expertise but little time, locked-down technology and contracting finances.&lt;/p&gt;

&lt;p&gt;The vitality and engagement of the developer community has played a massively important role in the open government data initiative within the UK, and I&amp;#8217;m sure it will continue to do so. We are incredibly lucky, here, to have a collection of talented and motivated developers who volunteer their time to work with government data. My hope is simply that the relationship between government and developers can grow into one that is more encouraging and supportive, that understands the constraints and concerns of those within government, and that provides practical help to overcome them.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/148#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/54">datagovuk</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <pubDate>Sun, 26 Sep 2010 21:41:30 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">148 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>On Standards</title>
 <link>http://www.jenitennison.com/blog/node/146</link>
 <description>&lt;p&gt;I&amp;#8217;m beginning to think that &amp;#8216;to recommend&amp;#8217; is an irregular verb like those that appeared every so often in &lt;a href=&quot;http://en.wikiquote.org/wiki/Yes,_Minister&quot;&gt;Yes, Minister&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Bernard:&lt;/strong&gt; It&amp;#8217;s one of those irregular verbs, isn&amp;#8217;t it: I have an independent mind; you are an eccentric; he is round the twist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Something like: I recommend, you tell people what to do, he engages in premature standardisation.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;Our own recommendations are much more reasonable than those made by other people. &lt;em&gt;We&lt;/em&gt; understand the requirements, whereas &lt;em&gt;they&lt;/em&gt; haven&amp;#8217;t talked to anyone. &lt;em&gt;We&lt;/em&gt; are issuing them as guidance and are open to feedback, whereas &lt;em&gt;they&lt;/em&gt; are ramming them down people&amp;#8217;s throats.&lt;/p&gt;

&lt;p&gt;Of course without guidance, recommendations and standards of some description, it becomes near impossible to do anything useful. Take a look at the wide variety of &lt;a href=&quot;https://spreadsheets.google.com/ccc?key=0AhOqra7su40fdEgtaG4yVFZGVjdYREVIWmprX2dENkE&amp;amp;hl=en_GB&quot;&gt;information released by different councils to meet their commitment to publish spending data&lt;/a&gt;. Many use different formats but even amongst those that use Excel or CSV, the column names are different. Look closer and you see that they actually report at different levels of granularity as well. Some report each transaction, some each invoice item, some the ways these items are assigned to different cost centres. Some stick to the £500 limit, some report everything. Some include VAT in the amounts they quote, some don&amp;#8217;t. Some provide the dates of each transaction, some just the period that it occurred in. If you are clever and committed, &lt;a href=&quot;http://openlylocal.com/councils/spending&quot;&gt;you can find some wood in the trees&lt;/a&gt; but it&amp;#8217;s hard work.&lt;/p&gt;

&lt;p&gt;This variety is not due to pigheadedness or stupidity on the part of the councils. It&amp;#8217;s down to the very different technical and political constraints and approaches, and the fact that there was little guidance at all, &lt;a href=&quot;http://data.gov.uk/blog/local-spending-data-guidance&quot;&gt;up until this week&lt;/a&gt;, about what was expected of them.&lt;/p&gt;

&lt;p&gt;The point I&amp;#8217;m making is that people in different circumstances will naturally do things differently; common practice does not appear overnight by magic.&lt;/p&gt;

&lt;p&gt;Should councils have held off publishing their data until there was some kind of guidance in place? &lt;strong&gt;Absolutely 100% No!&lt;/strong&gt; It is far better to have the data in some form than to not have it at all, and it&amp;#8217;s only by making real data available that they and we get to start informed discussions about what kind of guidance is necessary.&lt;/p&gt;

&lt;p&gt;Should they be working towards publishing something better? &lt;strong&gt;Hell Yeah!&lt;/strong&gt; Data is not really open if the people who consume it have to put in hours or days of effort to understand it, map it, merge it, to be able to do something useful with it.&lt;/p&gt;

&lt;p&gt;What that &amp;#8216;something better&amp;#8217; looks like, I really don&amp;#8217;t know. My prediction is that councils will converge gradually, over time, into a handful of different approaches (rather than the basketful that we have now). Some will converge by choosing to use particular publishers for their data. Others will converge because they want to take advantage of particular tools for analysing or visualising the data that they produce, which will require certain formats. Still others will converge through an interest in &amp;#8220;doing what&amp;#8217;s right&amp;#8221;, based on guidance from groups and organisations that they trust.&lt;/p&gt;

&lt;p&gt;From chaos will come order, eventually. But this is a process that is led by politics &amp;#8212; negotiation, persuasion, socialisation and cultural change &amp;#8212; not by technology. It&amp;#8217;s only to be expected that there will be differences in approaches along the way, because we need to try, to learn, and we need for there to be choice, to evolve.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/146#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/54">datagovuk</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/52">opendata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <pubDate>Sun, 19 Sep 2010 15:54:51 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">146 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>legislation.gov.uk: Credit Where it&#039;s Due</title>
 <link>http://www.jenitennison.com/blog/node/144</link>
 <description>&lt;p&gt;I&amp;#8217;m aware I&amp;#8217;ve been quiet for the past few months. This isn&amp;#8217;t because nothing interesting has been going on &amp;#8212; rather the opposite. It&amp;#8217;s been difficult to get a chance to sit down and write about the work I&amp;#8217;ve been doing, when actually doing the work has been taking up so much time.&lt;/p&gt;

&lt;p&gt;Most of my time has been spent on the new &lt;a href=&quot;http://www.legislation.gov.uk&quot;&gt;legislation.gov.uk&lt;/a&gt; website and its underlying API. There&amp;#8217;s so much to say about this project that I hardly know where to start, so I&amp;#8217;ll just try to do an overview and we can take it from there. Let me know what you&amp;#8217;re interested in.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;legislation.gov.uk is a government website built on the principles of transparency and open data, including ideas laid out in the &lt;a href=&quot;http://webarchive.nationalarchives.gov.uk/20100413152047/http://poit.cabinetoffice.gov.uk/poit/2009/02/modernising-information-publishing-final/&quot;&gt;Power of Information Taskforce Report&lt;/a&gt;. We have a lovely user interface which helps end-users find and understand legislation, but it&amp;#8217;s layered over the top of an API that &lt;a href=&quot;http://www.legislation.gov.uk/licence&quot;&gt;anyone is free to use&lt;/a&gt; to construct their own websites based on the same data.&lt;/p&gt;

&lt;p&gt;In fact, we built the API first, and it&amp;#8217;s been around (though not in a particularly stable state) for about a year. However, it turned out that building the user interface really helped in two ways. First, it helped the legislation experts who were looking at the documents to spot errors in a way that they unsurprisingly struggled to do when presented with raw XML. Second, it helped to identify things that the API needed to do to support a useful website, such as always providing links to the table of contents for an item of legislation or providing a search based on modification date.&lt;/p&gt;

&lt;p&gt;Now, if you&amp;#8217;ve been reading &lt;a href=&quot;http://seanmcgrath.blogspot.com/2010/05/kliss-first-things-first-what-is.html&quot;&gt;Sean McGrath&amp;#8217;s blog&lt;/a&gt; you&amp;#8217;ll know that as far as content goes, legislation is about as tough as you can get. For a start, Acts and Statutory Instruments are &lt;em&gt;semi-structured documents&lt;/em&gt;, not tabular data. It&amp;#8217;s not a simple matter of storing and extracting rows in a database: we need to be able to address portions of an item of legislation such as &amp;#8220;Local Government Act 1988 (c. 9, SIF 81:2), Sch. 3 para. 13(1)(b)(2)&amp;#8221; (this is an &lt;a href=&quot;http://www.legislation.gov.uk/ukpga/1975/30/section/24/2000-09-08#commentary-c1075749&quot;&gt;actual citation&lt;/a&gt;! I am not making this up!).&lt;/p&gt;

&lt;p&gt;The content itself is complex. For legislation.gov.uk, the main challenge is not to do with faithfully reconstructing page and line breaks (fortunately!) but how to represent complex, annotated, changes to legislation over time, and then how to present them. Much of this had already been done (in terms of technology) within the &lt;a href=&quot;http://www.statutelaw.gov.uk&quot;&gt;Statute Law&lt;/a&gt; and &lt;a href=&quot;http://www.opsi.gov.uk/legislation&quot;&gt;OPSI&lt;/a&gt; websites, although the data comes from a variety of sources over time, each with its own set of peculiarities to be navigated. The larger challenge here was to provide a mechanism for navigating through the content that made clear the distinctions between the various versions of legislation that people can look at, and warned them about their status, without overwhelming them with information.&lt;/p&gt;

&lt;p&gt;We also have a lot of documents, some of which are very large. There are nearly 60,000 items of legislation on the site. The largest and most complex of them has hundreds of sections and about a hundred distinct versions. When you consider all the versions of all the possible fragments of all the items of legislation, you&amp;#8217;re talking about 6.5 million distinct documents, each of which is available in HTML, XML and PDF, and each of which has some RDF metadata.&lt;/p&gt;

&lt;p&gt;On top of this, the content is constantly changing. New legislation is published every working day, first as PDFs, then as HTML (and XML), followed by various associated documents, the most important of which are Explanatory Notes, again first in PDF and then in HTML/XML form. Old legislation changes too; the legislation.gov.uk editorial team is constantly working through a backlog of changes to existing legislation brought about by new legislation. Simply hooking up the site to keep up to date with these changes has been an enormous challenge.&lt;/p&gt;

&lt;p&gt;The content also changes because we intend to add features to the site over time. The site has already seen bug fixes and tweaks to address problems that we&amp;#8217;ve encountered post-launch, and there are a number of new features in the pipeline to bring the site up to the level of completeness where it can fully replace the existing OPSI and Statute Law websites.&lt;/p&gt;

&lt;p&gt;Then we needed something that was reasonably fast and robust in the face of moderately heavy traffic. Providing fast access to ever changing content, especially when the changes themselves are unpredictable, is an ongoing challenge.&lt;/p&gt;

&lt;p&gt;All of this has only been possible by having an excellent team of experts and developers. One of the things that made this project quite different from the majority of government projects of this size was that it was much closer to Agile than Prince2: clients and providers working closely in the same team, chatting on daily calls, working side-by-side. From the developer perspective, it gave us direct access to the people who both had the expertise about the content and knew what they wanted. From the customer side, I hope and believe that it gave them as close involvement in the development of the site as they could want and a far deeper level of understanding about exactly how it works (and therefore what is easy and what is hard, and where compromises are best made) than they would have had otherwise.&lt;/p&gt;

&lt;p&gt;So here are some credits. First, from &lt;a href=&quot;http://www.tso.co.uk/&quot;&gt;TSO&lt;/a&gt;, where I work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://twitter.com/careyfarrell&quot;&gt;Carey Farrell&lt;/a&gt;&lt;/strong&gt; ran the project, keeping track of the many and various bits and pieces that needed doing and finding the people to get them done; he has been the project&amp;#8217;s backbone&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://twitter.com/pauldappleby&quot;&gt;Paul Appleby&lt;/a&gt;&lt;/strong&gt; may have moved on to better things part way through, but made his mark early on in its design and architecture, and much much earlier in the design of the XML schema, and in many of the stylesheets that underlie the HTML and PDF views of this data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lee Goodby&lt;/strong&gt; put together the system infrastructure, arranging more machines and memory and disk space to satisfy our endless demands&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunyu Cong&lt;/strong&gt; performed many a thankless data wrangling task without complaint&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Griff Chamberlain&lt;/strong&gt; has worked doggedly on this project (with occasional pauses for beer) since he came aboard, among many &lt;em&gt;many&lt;/em&gt; other things working on the generation of PDF (via XSL-FO) from the XML source and dealing with the difficulties of usable next/previous navigation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://www.menteithconsulting.com/wiki/People/TonyGraham&quot;&gt;Tony Graham&lt;/a&gt;&lt;/strong&gt; made the publication of Tables of Effects his own, as well as constantly improving our build processes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gavin Mannings&lt;/strong&gt; achieved quite remarkable things with a combination of HTML, CSS and JavaScript. If you don&amp;#8217;t believe me, take a look at the source underlying &lt;a href=&quot;http://www.legislation.gov.uk/ukpga&quot;&gt;the histograms on the browse pages&lt;/a&gt; or &lt;a href=&quot;http://www.legislation.gov.uk/ukpga/1985/67/section/6?timeline=true&quot;&gt;the timelines for legislation content&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faiz Muhammad&lt;/strong&gt; quickly got to grips with a whole set of complex and unfamiliar content and technologies, to create the UI from the API data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paul Harvey&lt;/strong&gt; furnished us with data, warned us of bear pits, and remained astonishingly uncomplaining of the changes we were putting him through&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Marc Sturman&lt;/strong&gt; brought all his expertise to bear in managing the publication of legislation from the SLD editorial system into the new website and pulled our fat from the fire both on deployment and in the creation of the larger PDFs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vinod Sathyamoorthy&lt;/strong&gt; worked on all aspects of the infrastructure: scaling out the environment, testing it, configuring it and so on, to make it into a site that more than a few people could access at a time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://twitter.com/RobBullen&quot;&gt;Rob Bullen&lt;/a&gt;&lt;/strong&gt; brought a little more order to a kind of controlled chaos, in the way a good project manager should&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Terry Blake&lt;/strong&gt; had the clout and the clear-sighted vision to get things done, as well as (and perhaps secretly enjoying) getting his hands dirty on occasion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From &lt;a href=&quot;http://www.bunnyfoot.com/&quot;&gt;Bunnyfoot&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mark Pierce&lt;/strong&gt; designed the look and feel of the site, having to get to grips with the complexities of legislative content as well as treading the fine line between making the site look modern yet authoritative, appealing whilst not detracting from the content&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rebecca Gill&lt;/strong&gt; provided clear eyes, analysis and insight to help us understand how to improve the site for our users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And from &lt;a href=&quot;http://www.nationalarchives.gov.uk/&quot;&gt;The National Archives&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://twitter.com/crallison&quot;&gt;Clare Allison&lt;/a&gt;&lt;/strong&gt; has devoted her life to ensuring that the content on the site is as accurate and meaningful as it can be, working with the astonishing complexities of the legislation content with an amazing depth of knowledge and expertise&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://twitter.com/clairelait&quot;&gt;Claire Lait&lt;/a&gt;&lt;/strong&gt; has poured her soul into providing a meaningful and useful experience for the end users of the site with insight, intelligence and unparalleled openness and enthusiasm&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Catherine Tabone&lt;/strong&gt; has dealt with the traumas of the ups and downs of deployment with fortitude and good humour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And finally, none of this would have happened without &lt;strong&gt;&lt;a href=&quot;http://twitter.com/johnlsheridan&quot;&gt;John Sheridan&lt;/a&gt;&lt;/strong&gt; having the ambition and the vision for how legislation should be published on the web, creating the environment that enabled this project to be done, setting a positive tone and providing support, encouragement and a gently guiding hand throughout the process.&lt;/p&gt;

&lt;p&gt;This isn&amp;#8217;t everyone who has been involved in the project: there are system administrators and testers and beta users and a whole cloud of other support particularly from &lt;a href=&quot;http://www.marklogic.com/&quot;&gt;MarkLogic&lt;/a&gt;, &lt;a href=&quot;http://orbeon.com/&quot;&gt;Orbeon&lt;/a&gt; and &lt;a href=&quot;http://www.akamai.com/&quot;&gt;Akamai&lt;/a&gt;. But these are the people who let it consume their lives for at least a while. Every one of them was vitally important to the project, bringing their own expertise and skills and personality. I admire them all hugely. 
No project of this size is completely plain sailing, and I am convinced that we would be in a very different position today if the project hadn&amp;#8217;t been built on mutual respect and trust. I&amp;#8217;ve sketched some of the challenges that we faced. If it all looks easy, it&amp;#8217;s only because this group of people did their jobs incredibly well. This is my public thanks to them for all their work.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/144#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/37">legislation</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/52">opendata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <pubDate>Sat, 14 Aug 2010 11:18:38 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">144 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Expressing Statistics with RDF</title>
 <link>http://www.jenitennison.com/blog/node/132</link>
 <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update:&lt;/strong&gt; If you&amp;#8217;re interested in expressing statistics in RDF, I&amp;#8217;d encourage you to join the &lt;a href=&quot;http://groups.google.com/group/publishing-statistical-data&quot;&gt;publishing statistical data&lt;/a&gt; group and take a look at the documentation for &amp;#8216;SDMX-RDF&amp;#8217; described there.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One of the things that we&amp;#8217;ve been discussing over on the &lt;a href=&quot;http://groups.google.com/group/uk-government-data-developers&quot;&gt;UK Government Data Developers mailing list&lt;/a&gt; is how best to represent the vast quantities of statistical data that the government produces, in RDF. This is what we&amp;#8217;ve come up with.&lt;/p&gt;

&lt;!--break--&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;We&amp;#8217;ll use &lt;a href=&quot;http://sw.joanneum.at/scovo/schema.html&quot;&gt;SCOVO&lt;/a&gt; as our main vocabulary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dimensions (the things a statistic is about) should be instances of specialised classes such as &amp;#8216;Hospital&amp;#8217; or &amp;#8216;School&amp;#8217;; these will often be &lt;a href=&quot;http://www.w3.org/TR/skos-primer/&quot;&gt;SKOS&lt;/a&gt; concepts. We will try to reuse these as much as possible across datasets (see below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We will create subproperties of &lt;code&gt;scv:dimension&lt;/code&gt; that have appropriate names and different subclasses of &lt;code&gt;scv:Dimension&lt;/code&gt;s as ranges. We will try to reuse these as much as possible across datasets (see below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;scv:Item&lt;/code&gt;s we use (representing individual statistics) should not be blank nodes (because giving them URIs allows us to attach other information to them); they will each have a &lt;code&gt;scv:dataset&lt;/code&gt; property that points to the &lt;code&gt;scv:Dataset&lt;/code&gt; they belong to (which will probably also be a &lt;code&gt;void:Dataset&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Every &lt;code&gt;scv:Item&lt;/code&gt; will also be the object of at least one triple whose subject is one of its dimensions; this will usually be the real-world thing that the statistic is associated with (eg the school or hospital).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most statistics are provided for a particular time period; for these, we will use relationships from &lt;a href=&quot;http://www.w3.org/TR/owl-time/&quot;&gt;OWL-Time&lt;/a&gt; to point to &lt;a href=&quot;http://www.placetime.com/&quot;&gt;placetime.com&lt;/a&gt; resources, but will also use appropriately datatyped literals where possible to make querying easier.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here&amp;#8217;s an example of what this looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix scv: &amp;lt;http://purl.org/NET/scovo#&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix dct: &amp;lt;http://purl.org/dc/terms/&amp;gt; .
@prefix void: &amp;lt;http://rdfs.org/ns/void#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
@prefix sdmx: &amp;lt;http://proxy.data.gov.uk/sdmx.org/def/sdmx/&amp;gt; .
@prefix pop: &amp;lt;http://statistics.data.gov.uk/def/population/&amp;gt; .
@prefix year: &amp;lt;http://statistics.data.gov.uk/def/census-year/&amp;gt; .

# The statistics themselves

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE&amp;gt;
  rdfs:label &quot;Cornwall&quot; ;
  pop:totalPopulation &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE/population/total/year/2001&amp;gt; ;
  pop:ruralPopulation &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE/population/rural/year/2001&amp;gt; ;
  ... .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE/population/total/year/2001&amp;gt;
  a scv:Item ;
  rdf:value &quot;499399&quot;^^xsd:integer ;
  scv:dataset &amp;lt;http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001&amp;gt; ;
  sdmx:refArea &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE&amp;gt; ;
  pop:populationType pop:total ;
  sdmx:timePeriod &amp;lt;http://statistics.data.gov.uk/def/census-year/2001&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE/population/rural/year/2001&amp;gt;
  a scv:Item ;
  rdf:value &quot;127904&quot;^^xsd:integer ;
  scv:dataset &amp;lt;http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001&amp;gt; ;
  sdmx:refArea &amp;lt;http://statistics.data.gov.uk/id/local-authority-district/00HE&amp;gt; ;
  pop:populationType pop:rural ;
  sdmx:timePeriod &amp;lt;http://statistics.data.gov.uk/def/census-year/2001&amp;gt; .

...

# Datasets

&amp;lt;http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001&amp;gt;
  a scv:Dataset ;
  a void:Dataset ;
  dct:title &quot;Populations of Local Authority Districts&quot; ;
  ... .

# Common definitions for the dataset

pop:totalPopulation a rdf:Property ;
  rdfs:label &quot;total population&quot; ;
  rdfs:range scv:Item .
pop:ruralPopulation a rdf:Property ;
  rdfs:label &quot;rural population&quot; ;
  rdfs:range scv:Item .
...

pop:populationType rdfs:subPropertyOf scv:dimension ;
  rdfs:label &quot;population type&quot; ;
  rdfs:domain scv:Item ;
  rdfs:range pop:Population .

pop:Population a rdfs:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:subClassOf scv:Dimension ;
  rdfs:label &quot;population type&quot; .

pop:populationScheme a skos:ConceptScheme ;
  skos:prefLabel &quot;Population Types&quot; ;
  skos:hasTopConcept pop:total .

pop:total a pop:Population ;
  skos:prefLabel &quot;total population&quot; ;
  skos:topConceptOf pop:populationScheme ;
  skos:narrower pop:rural ;
  ... .

pop:rural a pop:Population ;
  skos:prefLabel &quot;rural population&quot; ;
  skos:inScheme pop:populationScheme ;
  skos:broader pop:total ;
  ... .

year:Year a rdfs:Class ;
  rdfs:subClassOf time:Interval ;
  rdfs:subClassOf scv:Dimension .

&amp;lt;http://statistics.data.gov.uk/def/census-year/2001&amp;gt;
  rdfs:label &quot;mid-2001&quot; ;
  time:intervalDuring &amp;lt;http://www.placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y&amp;gt; .

&amp;lt;http://www.placetime.com/interval/gregorian/2001-01-01T00:00:00Z/P1Y&amp;gt;
  rdfs:label &quot;2001&quot; ;
  rdf:value &quot;2001&quot;^^xsd:gYear .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One source of sub-properties of &lt;code&gt;scv:dimension&lt;/code&gt; (and subtypes of &lt;code&gt;scv:Dimension&lt;/code&gt;) is &lt;a href=&quot;http://sdmx.org/&quot;&gt;SDMX&lt;/a&gt; (Statistical Data and Metadata eXchange). This provides standard ways of indicating things like the area and time that a statistic applies to. I&amp;#8217;ve made an &lt;a href=&quot;/blog/files/sdmx.ttl&quot;&gt;initial mapping into some RDFS properties&lt;/a&gt; and &lt;a href=&quot;/blog/files/codelists.ttl&quot;&gt;SKOS schemes&lt;/a&gt; as an indication of the kind of thing that would work here, but expect it to change.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;re currently working on providing identifiers for the areas that statistics are likely to be about (such as local authority districts, MSOAs or wards). They are of the form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://statistics.data.gov.uk/id/{area-type}/{ONS-area-code}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and they tie into the &lt;a href=&quot;http://data.ordnancesurvey.co.uk/&quot;&gt;newly released OS data&lt;/a&gt;. I hope we&amp;#8217;ll have them available as Linked Data soon.&lt;/p&gt;

&lt;p&gt;One issue that hasn&amp;#8217;t been resolved is how to handle the huge amount of repetition that is inherent in this method of representing statistical data. For example, in the data above, all the &lt;code&gt;scv:Item&lt;/code&gt;s in the &lt;code&gt;scv:Dataset&lt;/code&gt; &lt;code&gt;http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001&lt;/code&gt; are from 2001. Rather than indicating the year of each individual &lt;code&gt;scv:Item&lt;/code&gt;, it would be nice if we could have a property on the dataset that indicated that &lt;em&gt;all&lt;/em&gt; the items in that dataset had the same value for a particular dimension. If this were called &lt;code&gt;scv:itemDimension&lt;/code&gt;, for example, then we could do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001&amp;gt;
  a scv:Dataset ;
  a void:Dataset ;
  dct:title &quot;Populations of Local Authority Districts&quot; ;
  sdmx:itemTimePeriod &amp;lt;http://statistics.data.gov.uk/def/census-year/2001&amp;gt; ;
  ... .

sdmx:itemTimePeriod rdfs:subPropertyOf scv:itemDimension ;
  rdfs:label &quot;time period of items in the dataset&quot; ;
  rdfs:domain scv:Dataset .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and the individual &lt;code&gt;scv:Item&lt;/code&gt;s would not have to have any &lt;code&gt;sdmx:timePeriod&lt;/code&gt; properties explicitly. Perhaps this is something that the people behind SCOVO might consider, or we might create the property ourselves.&lt;/p&gt;
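
&lt;p&gt;To illustrate what a consumer of such data would then have to do, here is a minimal Python sketch of the fallback lookup. It is only an illustration: the dictionaries stand in for parsed triples, the &lt;code&gt;time_period&lt;/code&gt; helper is hypothetical, and &lt;code&gt;sdmx:itemTimePeriod&lt;/code&gt; is the proposed property rather than anything SCOVO currently defines.&lt;/p&gt;

```python
# Hypothetical in-memory model of the triples above: each resource is a dict
# mapping property names to values. Nothing here is part of SCOVO itself.
TIME_PERIOD = "sdmx:timePeriod"
ITEM_TIME_PERIOD = "sdmx:itemTimePeriod"  # the proposed dataset-level property


def time_period(item, datasets):
    """Return the time period of an item, falling back to its dataset."""
    if TIME_PERIOD in item:
        # An explicit value on the item always wins.
        return item[TIME_PERIOD]
    dataset = datasets.get(item.get("scv:dataset"), {})
    # Otherwise inherit the value stated once for all items in the dataset.
    return dataset.get(ITEM_TIME_PERIOD)


DATASET = "http://statistics.data.gov.uk/doc/local-authority-district/*/population/*/year/2001"
YEAR_2001 = "http://statistics.data.gov.uk/def/census-year/2001"

datasets = {DATASET: {ITEM_TIME_PERIOD: YEAR_2001}}
item = {"rdf:value": 499399, "scv:dataset": DATASET}
print(time_period(item, datasets))
```

&lt;p&gt;The cost of the saving is visible here: every query about an item&amp;#8217;s dimensions has to look at the dataset as well as the item, unless a reasoner or a rule materialises the inherited values first.&lt;/p&gt;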

&lt;p&gt;As far as I know, this pattern for representing statistics has yet to be used &amp;#8220;in anger&amp;#8221;, but I hope that we&amp;#8217;ll have some illustrations soon which will help us assess whether it&amp;#8217;s viable. Any comments and suggestions would, of course, be very welcome!&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/132#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <enclosure url="http://www.jenitennison.com/blog/files/codelists.ttl" length="19853" type="application/octet-stream" />
 <pubDate>Fri, 23 Oct 2009 22:07:50 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">132 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>hmg.gov.uk/data and What We Can Do</title>
 <link>http://www.jenitennison.com/blog/node/131</link>
 <description>&lt;p&gt;This week, the Cabinet Office &lt;a href=&quot;http://blogs.cabinetoffice.gov.uk/digitalengagement/&quot;&gt;went live&lt;/a&gt; with a preview version of &lt;a href=&quot;http://data.hmg.gov.uk&quot;&gt;hmg.gov.uk/data&lt;/a&gt;, available only to those who subscribe to the &lt;a href=&quot;http://groups.google.com/group/uk-government-data-developers&quot;&gt;UK Government Data Developers Google Group&lt;/a&gt;. &lt;a href=&quot;http://harrymetcalfe.com/&quot;&gt;Harry Metcalfe&lt;/a&gt; has written a &lt;a href=&quot;http://thedextrousweb.com/2009/10/the-wraps-come-off-data-gov-uk/&quot;&gt;great review&lt;/a&gt;, or of course you can check it out yourselves.&lt;/p&gt;

&lt;p&gt;Already, though, there are discussions starting on the mailing list about &lt;a href=&quot;http://groups.google.com/group/uk-government-data-developers/browse_thread/thread/73f1f4e8a8c2d6bb&quot;&gt;how the data is being made available&lt;/a&gt;, and I&amp;#8217;m worried that these might distract us from getting things done.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;There is a range of ways in which data can be made available on the web:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embedded in PDFs and Word documents (ie proprietary formats that make it nearly impossible to get hold of the data itself)&lt;/li&gt;
&lt;li&gt;through a Web Services / RPC style interface whereby a custom API is available for each data set&lt;/li&gt;
&lt;li&gt;as downloadable (possibly compressed) files in a machine-readable format such as CSV or XML&lt;/li&gt;
&lt;li&gt;through a RESTful API whereby there is a URI for each resource and GETting that URI provides information about the resource, and there are also URIs for lists of resources&lt;/li&gt;
&lt;li&gt;through a search interface, such as a SQL or SPARQL interface onto a database or triplestore, which may enable aggregations over multiple resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of us Open Data advocates agree that we need to encourage government away from the first two methods of making data &amp;#8220;available&amp;#8221; and towards the last three. But there&amp;#8217;s a vast array of opinion within the developer community about which of the latter three are most useful, and precisely which technologies to use in each of those categories.&lt;/p&gt;

&lt;p&gt;What&amp;#8217;s the real answer? &amp;#8220;All of them!&amp;#8221;&lt;/p&gt;

&lt;p&gt;We need the raw data because it enables us to double-check all the other interfaces which are provided to it. We need RESTful APIs. We need them to serve RDF and XML and JSON and CSV and all the other formats that people ask for. We need the data to be made available in SQL databases and NoSQL databases and triplestores; we need access to SQL queries and SPARQL queries and map/reduce processing. All of that, all of them, and more.&lt;/p&gt;

&lt;p&gt;This is not a &lt;a href=&quot;http://en.wikipedia.org/wiki/Zero-sum&quot;&gt;zero-sum game&lt;/a&gt;. Just because someone makes &lt;a href=&quot;http://www.edubase.gov.uk/&quot;&gt;edubase&lt;/a&gt; available on the &lt;a href=&quot;http://www.talis.com/platform&quot;&gt;Talis platform&lt;/a&gt; through a SPARQL interface does not prevent someone else making it available on &lt;a href=&quot;http://aws.amazon.com/s3/&quot;&gt;Amazon S3&lt;/a&gt;. The more methods of access there are, the more widely available and therefore useful the data is. The more things we try, the more lessons we learn, the better we get.&lt;/p&gt;

&lt;p&gt;One thing is certain, though: the government cannot do all of this itself. They simply don&amp;#8217;t have the resources or expertise. If we think something&amp;#8217;s important, we can help by doing it. And we can help them, and each other, by sharing both the results of our work (so that others can build on it) and how we got them (so that others can follow the same patterns for other datasets). That, as far as I&amp;#8217;m concerned, is what hmg.gov.uk/data is for.&lt;/p&gt;

&lt;p&gt;Whatever our technology preferences, we can help each other by sharing our results whenever we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clean up the data that the government have supplied&lt;/li&gt;
&lt;li&gt;analyse the data (which is often in de-normalised formats)&lt;/li&gt;
&lt;li&gt;find areas of commonality between different data sets&lt;/li&gt;
&lt;li&gt;transform the data into other formats&lt;/li&gt;
&lt;li&gt;build other types of APIs on top of the ones others have constructed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hmg.gov.uk/data site has a certain bias towards Linked Data, it&amp;#8217;s true, and this should come as no surprise &lt;a href=&quot;http://blogs.cabinetoffice.gov.uk/digitalengagement/post/2009/06/09/Data-So-what-happens-now.aspx&quot;&gt;given its advisors&lt;/a&gt;. But whichever side of that particular argument we&amp;#8217;re on, we&amp;#8217;re shooting ourselves in the foot if we assert that this is an exclusive choice.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/131#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/52">opendata</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <pubDate>Sat, 03 Oct 2009 14:00:24 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">131 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>The Real Deal: data.gov.uk</title>
 <link>http://www.jenitennison.com/blog/node/115</link>
 <description>&lt;p&gt;I&amp;#8217;m sure that you&amp;#8217;ve noticed that my recent posts have been somewhat obsessed with publishing and using public sector information. It&amp;#8217;s because I&amp;#8217;ve somehow been sucked into the work going on within the UK government, &lt;a href=&quot;http://blogs.cabinetoffice.gov.uk/digitalengagement/post/2009/06/09/Data-So-what-happens-now.aspx&quot;&gt;with Tim Berners-Lee and Nigel Shadbolt advising&lt;/a&gt;, to publish its data as linked data.&lt;/p&gt;

&lt;p&gt;My &lt;a href=&quot;http://www.jenitennison.com/blog/node/109&quot;&gt;recent&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/110&quot;&gt;blog&lt;/a&gt; &lt;a href=&quot;http://www.jenitennison.com/blog/node/111&quot;&gt;posts&lt;/a&gt; about publishing data using &lt;a href=&quot;http://www.talis.com/platform/&quot;&gt;Talis&lt;/a&gt; have actually been a front for much more complex work that I&amp;#8217;ve been doing with a different data set.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;As an early demonstration of how existing government data sets might be turned into linked data, a few weeks ago I was given a CSV file containing road traffic counts; the raw data that lies behind the &lt;a href=&quot;http://www.dft.gov.uk/matrix/&quot;&gt;traffic flow information&lt;/a&gt; available on the Department for Transport website. The data is really interesting and ripe for visualisations and analysis. For each hour of particular days each year, at particular points on many roads within the UK, the Department for Transport measures the number of bicycles, motorbikes, cars, vans, buses and HGVs of various types that roll past in each direction. The data contains information about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the count of each of the various classes of traffic that pass the point in a particular direction on a particular hour of a particular day&lt;/li&gt;
&lt;li&gt;the points at which these measurements were taken&lt;/li&gt;
&lt;li&gt;the roads on which the points are situated&lt;/li&gt;
&lt;li&gt;the areas in which the points are situated&lt;/li&gt;
&lt;li&gt;the local authority that is in charge of these areas&lt;/li&gt;
&lt;li&gt;the region that the area is in&lt;/li&gt;
&lt;li&gt;the country that the region is in &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenge was to turn the 386MB CSV file into linked data. The result is up and available for you to look at; a good starting point is &lt;a href=&quot;http://geo.data.gov.uk/0/country&quot;&gt;http://geo.data.gov.uk/0/country&lt;/a&gt;. Just follow the links from there.&lt;/p&gt;

&lt;p&gt;With a few false starts and mis-steps, this is the process that I went through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tidied the CSV file so that it could be processed using awk. That meant replacing the commas that were delimiters with &lt;code&gt;|&lt;/code&gt;s. It also meant removing a couple of stray ^M (carriage return) characters that had snuck into the file.&lt;/li&gt;
&lt;li&gt;Examined the data and came up with an informal ontology and prototype URI scheme.&lt;/li&gt;
&lt;li&gt;Created a bunch of awk scripts to extract different data from the files and create RDF/XML from it.&lt;/li&gt;
&lt;li&gt;Ran the scripts to create RDF/XML.&lt;/li&gt;
&lt;li&gt;Uploaded the data into a Talis store.&lt;/li&gt;
&lt;li&gt;Created appropriate PHP for the data and put it into a proxy server.&lt;/li&gt;
&lt;/ol&gt;
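
&lt;p&gt;The tidying in the first step can be done in many ways. As one possible sketch, assuming a standard comma-delimited file with quoted fields, Python&amp;#8217;s &lt;code&gt;csv&lt;/code&gt; module can rewrite it with &lt;code&gt;|&lt;/code&gt; delimiters and drop stray carriage returns. The &lt;code&gt;tidy&lt;/code&gt; function and the sample rows are hypothetical; every field is kept quoted because the awk script shown later strips the first and last character of each field.&lt;/p&gt;

```python
import csv
import io


def tidy(source, target):
    """Copy CSV rows from source to target, pipe-delimited and without ^M."""
    reader = csv.reader(source)
    # QUOTE_ALL keeps every field quoted, so commas inside fields stay harmless.
    writer = csv.writer(target, delimiter="|", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for row in reader:
        # Remove any carriage returns left inside individual fields.
        writer.writerow([field.replace("\r", "") for field in row])


# Hypothetical two-line sample; the real file has many more columns.
sample = io.StringIO(
    '"England","North West","B"\r\n"England","London, Greater","H"\r\n',
    newline="")
result = io.StringIO()
tidy(sample, result)
print(result.getvalue())
```

&lt;p&gt;Parsing the file rather than doing a blind search-and-replace means that a comma inside a quoted field is left alone and only the delimiters change.&lt;/p&gt;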

&lt;p&gt;Some of this has been covered by my recent posts, so I&amp;#8217;m just going to talk about a few of these steps in a bit more detail.&lt;/p&gt;

&lt;p&gt;First, the URIs. Frankly, they&amp;#8217;re an experiment to see how it plays. The templates are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;countries: &lt;code&gt;http://geo.data.gov.uk/0/id/country/{name}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&gt;http://geo.data.gov.uk/0/id/country/england&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;regions: &lt;code&gt;http://geo.data.gov.uk/0/id/region/{name}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/region/north-west&quot;&gt;http://geo.data.gov.uk/0/id/region/north-west&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;areas: &lt;code&gt;http://geo.data.gov.uk/0/id/area/{ONS code}&lt;/code&gt;, eg &lt;a href=&quot;http://geo.data.gov.uk/0/id/area/00KA&quot;&gt;http://geo.data.gov.uk/0/id/area/00KA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;local authorities: &lt;code&gt;http://local-government.data.gov.uk/0/id/local-authority/{ONS code for area}&lt;/code&gt;, eg &lt;a href=&quot;http://local-government.data.gov.uk/0/id/local-authority/00KA&quot;&gt;http://local-government.data.gov.uk/0/id/local-authority/00KA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;roads: &lt;code&gt;http://transport.data.gov.uk/0/id/road/{name}&lt;/code&gt; or &lt;code&gt;http://transport.data.gov.uk/0/id/road/U-{random number}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/road/M5&quot;&gt;http://transport.data.gov.uk/0/id/road/M5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;traffic count points: &lt;code&gt;http://transport.data.gov.uk/0/id/traffic-count-point/{number}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/traffic-count-point/36195&quot;&gt;http://transport.data.gov.uk/0/id/traffic-count-point/36195&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;traffic counts: &lt;code&gt;http://transport.data.gov.uk/0/id/traffic-count/{point number}/{direction}/{date}/{hour}/{traffic type}&lt;/code&gt;, eg &lt;a href=&quot;http://transport.data.gov.uk/0/id/traffic-count/4/N/2008-06-05/08:00:00/HGVr2&quot;&gt;http://transport.data.gov.uk/0/id/traffic-count/4/N/2008-06-05/08:00:00/HGVr2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The subdomains are one way of subdividing the vast set of public sector information into vague categories that might be handled by different departments, without using the (highly changeable) department names in the URI. The &lt;code&gt;/0&lt;/code&gt; portion of each URI is a version number: these URIs are experimental and liable to be unsupported in the future so they&amp;#8217;re marked with a version 0. The &lt;code&gt;/id&lt;/code&gt; portion of each URI indicates that these are URIs for non-information resources; a request for one of them gets a &lt;code&gt;303 See Other&lt;/code&gt; redirect to the same URI without the &lt;code&gt;/id&lt;/code&gt;.&lt;/p&gt;
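
&lt;p&gt;The mapping from identifier URI to redirect target is mechanical, so it is easy to sketch. The Python below is an illustration only, not the code running on the proxy server; &lt;code&gt;document_uri&lt;/code&gt; is a hypothetical helper that assumes the &lt;code&gt;/id&lt;/code&gt; segment always comes straight after the version number, as in the templates above.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def document_uri(identifier_uri):
    """Return the URI a 303 redirect would point to: the same URI minus /id."""
    parts = urlsplit(identifier_uri)
    segments = parts.path.split("/")
    # Paths look like /0/id/region/north-west: version, "id", class, identifier.
    if len(segments) > 2 and segments[2] == "id":
        del segments[2]
    return urlunsplit(parts._replace(path="/".join(segments)))


print(document_uri("http://geo.data.gov.uk/0/id/region/north-west"))
```

&lt;p&gt;For the region example this prints &lt;code&gt;http://geo.data.gov.uk/0/region/north-west&lt;/code&gt;; a URI that has no &lt;code&gt;/id&lt;/code&gt; segment is returned unchanged.&lt;/p&gt;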

&lt;p&gt;After the &lt;code&gt;/id&lt;/code&gt;, the URIs follow a common pattern of naming a class of resource, followed by an appropriate identifier for that resource. The identifiers themselves are designed to be unique, &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;unlikely to change&lt;/a&gt;, and &lt;a href=&quot;http://www.jenitennison.com/blog/node/114&quot;&gt;human readable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The ontologies, well, actually they don&amp;#8217;t exist as yet except in my head. It&amp;#8217;s been more important to make the data available than to provide ontologies for it. Triplestores and SPARQL queries work without ontologies; indeed you have to go out of your way to find applications that actually reason with them. Like schemas for XML documents, they&amp;#8217;re not absolutely essential, but useful for documentation purposes and &lt;em&gt;potentially&lt;/em&gt; useful for applications.&lt;/p&gt;

&lt;p&gt;There are, though, a couple of &lt;a href=&quot;http://www.w3.org/2004/02/skos/&quot;&gt;SKOS&lt;/a&gt; schemes for categorising roads and vehicle types. These are available via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;http://transport.data.gov.uk/0/category/road&lt;/li&gt;
&lt;li&gt;http://transport.data.gov.uk/0/category/vehicle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They were informed by the &lt;a href=&quot;http://www.cbrd.co.uk/roadsfaq/&quot;&gt;British Roads FAQ&lt;/a&gt; and the &lt;a href=&quot;http://www.dft.gov.uk/matrix/forms/definitions.aspx&quot;&gt;data definitions from the Department for Transport&lt;/a&gt;. I heartily recommend a read; it&amp;#8217;s scintillating stuff!&lt;/p&gt;

&lt;p&gt;Anyway, with this size of file, and the kind of processing that needed to be done with it, the simple XSLT that I talked about &lt;a href=&quot;http://www.jenitennison.com/blog/node/109&quot;&gt;previously&lt;/a&gt; for extracting data out of CSV files just wasn&amp;#8217;t going to cut it. Awk, on the other hand, is designed for this kind of processing. Most of the RDF/XML could be generated by collecting unique values from the file. For example, to generate the RDF/XML for the regions I used:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN { 
  FS = &quot;|&quot;;
  print &quot;&amp;lt;rdf:RDF xmlns:rdf=\&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#\&quot;&quot;;
  print &quot;  xmlns:rdfs=\&quot;http://www.w3.org/2000/01/rdf-schema#\&quot;&quot;;
  print &quot;  xmlns:g=\&quot;http://geo.data.gov.uk/0/ontology/geo#\&quot;&amp;gt;&quot;;
}
FNR &amp;gt; 1 {
  countries[$2] = substr($1, 2, length($1) - 2);
  regions[$2] = substr($2, 2, length($2) - 2);
  codes[$2] = substr($3, 2, length($3) - 2);
}
END { 
  for (region in regions) {
    country = countries[region];
    name = regions[region];
    code = codes[region];
    path = tolower(name);
    gsub(&quot; &quot;, &quot;-&quot;, path);
    print &quot;&amp;lt;g:Region rdf:about=\&quot;http://geo.data.gov.uk/0/id/region/&quot; path &quot;\&quot;&amp;gt;&quot;;
    print &quot;  &amp;lt;rdfs:label&amp;gt;&quot; name &quot;&amp;lt;/rdfs:label&amp;gt;&quot;;
    print &quot;  &amp;lt;g:isInCountry&amp;gt;&quot;;
    print &quot;    &amp;lt;g:Country rdf:about=\&quot;http://geo.data.gov.uk/0/id/country/&quot; tolower(country) &quot;\&quot;&amp;gt;&quot;;
    print &quot;      &amp;lt;g:hasRegion rdf:resource=\&quot;http://geo.data.gov.uk/0/id/region/&quot; path &quot;\&quot; /&amp;gt;&quot;;
    print &quot;    &amp;lt;/g:Country&amp;gt;&quot;;
    print &quot;  &amp;lt;/g:isInCountry&amp;gt;&quot;;
    if (code != &quot;&quot;) {
      print &quot;  &amp;lt;g:ONScode rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#NCName\&quot;&amp;gt;&quot; code &quot;&amp;lt;/g:ONScode&amp;gt;&quot;;
    }
    print &quot;&amp;lt;/g:Region&amp;gt;&quot;;
  }
  print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot;; 
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This generated RDF/XML that looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;
  xmlns:rdfs=&quot;http://www.w3.org/2000/01/rdf-schema#&quot;
  xmlns:g=&quot;http://geo.data.gov.uk/0/ontology/geo#&quot;&amp;gt;
&amp;lt;g:Region rdf:about=&quot;http://geo.data.gov.uk/0/id/region/london&quot;&amp;gt;
  &amp;lt;rdfs:label&amp;gt;London&amp;lt;/rdfs:label&amp;gt;
  &amp;lt;g:isInCountry&amp;gt;
    &amp;lt;g:Country rdf:about=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&amp;gt;
      &amp;lt;g:hasRegion rdf:resource=&quot;http://geo.data.gov.uk/0/id/region/london&quot; /&amp;gt;
    &amp;lt;/g:Country&amp;gt;
  &amp;lt;/g:isInCountry&amp;gt;
  &amp;lt;g:ONScode rdf:datatype=&quot;http://www.w3.org/2001/XMLSchema#NCName&quot;&amp;gt;H&amp;lt;/g:ONScode&amp;gt;
&amp;lt;/g:Region&amp;gt;
&amp;lt;g:Region rdf:about=&quot;http://geo.data.gov.uk/0/id/region/yorkshire-and-the-humber&quot;&amp;gt;
  &amp;lt;rdfs:label&amp;gt;Yorkshire and The Humber&amp;lt;/rdfs:label&amp;gt;
  &amp;lt;g:isInCountry&amp;gt;
    &amp;lt;g:Country rdf:about=&quot;http://geo.data.gov.uk/0/id/country/england&quot;&amp;gt;
      &amp;lt;g:hasRegion rdf:resource=&quot;http://geo.data.gov.uk/0/id/region/yorkshire-and-the-humber&quot; /&amp;gt;
    &amp;lt;/g:Country&amp;gt;
  &amp;lt;/g:isInCountry&amp;gt;
  &amp;lt;g:ONScode rdf:datatype=&quot;http://www.w3.org/2001/XMLSchema#NCName&quot;&amp;gt;D&amp;lt;/g:ONScode&amp;gt;
&amp;lt;/g:Region&amp;gt;
...
&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
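
&lt;p&gt;For anyone more comfortable outside awk, the same collect-unique-values idea can be sketched in Python. This is an illustration rather than the script used here: the function names &lt;code&gt;region_path&lt;/code&gt; and &lt;code&gt;unique_regions&lt;/code&gt; and the sample rows (including the header names) are made up, and it assumes the same layout as the awk above, that is pipe-separated quoted fields with country, region name and ONS code in the first three columns.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def region_path(name):
    # Mirror the awk above: lower-case the name and replace spaces with hyphens.
    return name.lower().replace(" ", "-")


def unique_regions(lines):
    # Keep one (country, name, code) entry per distinct region name,
    # skipping the header row, as FNR greater than 1 does in the awk.
    regions = {}
    for line in lines[1:]:
        fields = [field.strip('"') for field in line.rstrip("\n").split("|")]
        country, name, code = fields[0], fields[1], fields[2]
        regions[name] = (country, name, code)
    return regions


# Made-up sample rows; the real file has many more columns and rows.
sample = [
    '"Country"|"Region"|"ONS code"',
    '"England"|"London"|"H"',
    '"England"|"Yorkshire and The Humber"|"D"',
    '"England"|"London"|"H"',
]
for country, name, code in unique_regions(sample).values():
    print(region_path(name), country.lower(), code)
&lt;/code&gt;&lt;/pre&gt;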

&lt;p&gt;In other cases, I needed to split up the RDF/XML that was generated into several files, because uploads to Talis of more than about 2MB fail. The traffic count point RDF/XML needed to be split into 13 separate files. The traffic counts themselves&amp;#8230; well, I haven&amp;#8217;t managed to do it all yet but to give you an idea, the 2008 data alone generated 1800 RDF/XML files, each about 1.6MB in size and each taking about a minute to upload. What&amp;#8217;s there now is all the 2008 data, and the overall motor vehicle counts from all the years. More will be added gradually.&lt;/p&gt;

&lt;p&gt;The awk script that generates the count data in separate files is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN { 
  FS = &quot;|&quot;;
  fileCount = 0;
  countCount = 99999;
  curlFile = &quot;traffic-counts.curl.sh&quot;;
}
FNR &amp;gt; 1 &amp;amp;&amp;amp; $15 ~ /\/2008 / {
  countCount += 1;
  if (countCount &amp;gt; 200) {
    if (fileCount != 0) {
      print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot; &amp;gt; fileName; 
      close(fileName);
    }
    countCount = 0;
    fileCount += 1;
    fileName = &quot;traffic-counts/traffic-counts.&quot; fileCount &quot;.rdf&quot;;
    print &quot;creating&quot;, fileName;
    print &quot;echo loading&quot;, fileName &amp;gt; curlFile;
    print &quot;curl -H \&quot;Content-type: application/rdf+xml\&quot; -o progress.txt --digest -u username:password --data-binary @&quot; fileName &quot; http://api.talis.com/stores/transport/meta&quot; &amp;gt; curlFile;

    print &quot;&amp;lt;?xml version=\&quot;1.0\&quot; encoding=\&quot;ASCII\&quot;?&amp;gt;&quot; &amp;gt; fileName;
    print &quot;&amp;lt;rdf:RDF xmlns:rdf=\&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:rdfs=\&quot;http://www.w3.org/2000/01/rdf-schema#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:xsd=\&quot;http://www.w3.org/2001/XMLSchema#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xmlns:t=\&quot;http://transport.data.gov.uk/0/ontology/traffic#\&quot;&quot; &amp;gt; fileName;
    print &quot;  xml:base=\&quot;http://transport.data.gov.uk/0/id/traffic-count/\&quot;&amp;gt;&quot; &amp;gt; fileName;
  }

  cp = $7;
  date = $15;
  direction = substr($16, 2, length($16) - 2);
  split(date, dateFields, &quot; &quot;);
  date = dateFields[1];
  split(date, dateFields, &quot;/&quot;);
  date = sprintf(&quot;%04d-%02d-%02d&quot;, dateFields[3], dateFields[2], dateFields[1]);
  hour = sprintf(&quot;%02d:00:00&quot;, $17);
  base = &quot;http://transport.data.gov.uk/0/id/traffic-count/&quot; cp &quot;/&quot; direction &quot;/&quot; date &quot;/&quot; hour;

  cycles = $18;
  motorbikes = $19;
  ...

  print &quot;&amp;lt;t:Count rdf:about=\&quot;&quot; base &quot;/cycle\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;t:CountPoint rdf:about=\&quot;http://transport.data.gov.uk/0/id/traffic-count-point/&quot; cp &quot;\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;      &amp;lt;t:count rdf:resource=\&quot;&quot; base &quot;/cycle\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;/t:CountPoint&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;/t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:hour rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#dateTime\&quot;&amp;gt;&quot; date &quot;T&quot; hour &quot;&amp;lt;/t:hour&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:direction&amp;gt;&quot; direction &quot;&amp;lt;/t:direction&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:category rdf:resource=\&quot;http://transport.data.gov.uk/0/category/bicycle\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;rdf:value  rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#integer\&quot;&amp;gt;&quot; cycles &quot;&amp;lt;/rdf:value&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;/t:Count&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;t:Count rdf:about=\&quot;&quot; base &quot;/motorbike\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;t:CountPoint rdf:about=\&quot;http://transport.data.gov.uk/0/id/traffic-count-point/&quot; cp &quot;\&quot;&amp;gt;&quot; &amp;gt; fileName;
  print &quot;      &amp;lt;t:count rdf:resource=\&quot;&quot; base &quot;/motorbike\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;    &amp;lt;/t:CountPoint&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;/t:point&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:hour rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#dateTime\&quot;&amp;gt;&quot; date &quot;T&quot; hour &quot;&amp;lt;/t:hour&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:direction&amp;gt;&quot; direction &quot;&amp;lt;/t:direction&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;t:category rdf:resource=\&quot;http://transport.data.gov.uk/0/category/motorbike\&quot; /&amp;gt;&quot; &amp;gt; fileName;
  print &quot;  &amp;lt;rdf:value  rdf:datatype=\&quot;http://www.w3.org/2001/XMLSchema#integer\&quot;&amp;gt;&quot; motorbikes &quot;&amp;lt;/rdf:value&amp;gt;&quot; &amp;gt; fileName;
  print &quot;&amp;lt;/t:Count&amp;gt;&quot; &amp;gt; fileName;
  ...
}
END {
  # Close the last file only if at least one was opened.
  if (fileCount != 0) {
    print &quot;&amp;lt;/rdf:RDF&amp;gt;&quot; &amp;gt; fileName; 
    close(fileName);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This also generates a shell script that includes the curl instructions to upload the files.&lt;/p&gt;
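
&lt;p&gt;The two fiddly parts of that awk script are the date handling and the batching. Here is a minimal Python sketch of both, under the same assumptions as the awk: dates that look like day/month/year followed by a time, and a bare hour number. The names &lt;code&gt;iso_hour&lt;/code&gt; and &lt;code&gt;batches&lt;/code&gt; and the sample values are invented for illustration. Because the awk script starts a new file once its counter passes 200, each file holds 201 rows of the source data, so that is the batch size used in the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def iso_hour(date_field, hour_field):
    # Drop the time part, reorder day/month/year into year-month-day and
    # append the hour, following the same split-and-reorder steps as the awk.
    day, month, year = date_field.split(" ")[0].split("/")
    return "%04d-%02d-%02dT%02d:00:00" % (int(year), int(month), int(day), int(hour_field))


def batches(rows, size):
    # Cut the rows into consecutive lists of at most `size` items,
    # one list per output file.
    return [rows[start:start + size] for start in range(0, len(rows), size)]


# Hypothetical inputs, chosen only to show the shape of the results.
print(iso_hour("12/05/2008 00:00:00", "7"))
print([len(batch) for batch in batches(list(range(450)), 201)])
&lt;/code&gt;&lt;/pre&gt;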

&lt;p&gt;The original data contained easting/northing information about each point, whereas latitude/longitude is generally easier for mapping. So I extracted the eastings/northings, used the &lt;a href=&quot;http://gps.ordnancesurvey.co.uk/convert.asp&quot;&gt;free (Windows only) software available via the Ordnance Survey&lt;/a&gt; to turn these into latitude/longitude (there is a &lt;a href=&quot;http://gps.ordnancesurvey.co.uk/convertbatch.asp?location=0&quot;&gt;web service&lt;/a&gt; to do the same, but you can only do 200 coordinates at a time), converted those into decimals, then RDF, and uploaded them.&lt;/p&gt;
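
&lt;p&gt;The conversion into decimals is simple arithmetic. As a sketch, assuming the converter reports each coordinate as degrees, minutes and seconds with a hemisphere letter (the exact output format is not shown here, so the input shape and the name &lt;code&gt;dms_to_decimal&lt;/code&gt; are assumptions):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    # Decimal degrees = degrees + minutes/60 + seconds/3600,
    # negated for the southern and western hemispheres.
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Illustrative coordinates only, not taken from the traffic count data.
print(dms_to_decimal(51, 30, 0, "N"))
print(dms_to_decimal(0, 15, 0, "W"))
&lt;/code&gt;&lt;/pre&gt;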

&lt;p&gt;The PHP scripts that serve the data as linked data are exactly what I&amp;#8217;ve &lt;a href=&quot;http://www.jenitennison.com/blog/node/111&quot;&gt;shown before&lt;/a&gt;. I amended the &lt;code&gt;.htaccess&lt;/code&gt; file to redirect to an appropriate PHP script like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;IfModule mod_rewrite.c&amp;gt;
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d

  RewriteRule ^id/(.+)$  id.php [L]

  RewriteCond %{REQUEST_URI} !\.php
  RewriteRule ^([^/]+)(/.+)? $1.php$2 [L,QSA]
&amp;lt;/IfModule&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and created PHP scripts for each of the types of data being published. For example, &lt;code&gt;region.php&lt;/code&gt; is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
  include &quot;utils.php&quot;;
  proxy(&#039;http://geo.data.gov.uk/0/ontology/geo#Region&#039;, 50);
?&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
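
&lt;p&gt;To see what the rewrite rules do to a request, here is a rough Python imitation of the two &lt;code&gt;RewriteRule&lt;/code&gt; lines. It is a simplification under stated assumptions: it takes the path relative to the directory holding the &lt;code&gt;.htaccess&lt;/code&gt; file, makes a single pass, and ignores the file and directory existence checks and the query string handling. The function name &lt;code&gt;rewrite&lt;/code&gt; is invented for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re


def rewrite(path):
    # First rule: anything under id/ is handed to id.php.
    if re.match(r"^id/(.+)$", path):
        return "id.php"
    # Second rule: leave requests that already mention .php alone,
    # as the RewriteCond on REQUEST_URI does.
    if ".php" in path:
        return path
    match = re.match(r"^([^/]+)(/.+)?", path)
    if match is None:
        return path
    # The first path segment names the script; the rest is kept as trailing path.
    return match.group(1) + ".php" + (match.group(2) or "")


for path in ["id/region/london", "region/london", "region"]:
    print(path, "becomes", rewrite(path))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So a request for &lt;code&gt;region/london&lt;/code&gt; ends up at &lt;code&gt;region.php&lt;/code&gt; with &lt;code&gt;/london&lt;/code&gt; as trailing path information.&lt;/p&gt;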

&lt;p&gt;And there we have it. Linked traffic count data on the web.&lt;/p&gt;

&lt;p&gt;(And because this is all published through Talis, there&amp;#8217;s also a &lt;a href=&quot;http://api.talis.com/stores/transport/services/sparql&quot;&gt;SPARQL endpoint&lt;/a&gt; that you could use to run queries and &lt;a href=&quot;http://www.jenitennison.com/blog/node/112&quot;&gt;create visualisations&lt;/a&gt;. Knock yourself out.)&lt;/p&gt;

&lt;p&gt;Please take a look and comment on what we&amp;#8217;ve done. What&amp;#8217;s your opinion of the URI scheme? Is it useful to be able to access the data as linked data? Which other formats would you like to see?&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/115#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/50">psi</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/47">Talis</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/48">uri</category>
 <pubDate>Sun, 26 Jul 2009 15:38:54 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">115 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

