<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jenitennison.com/blog" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>provenance</title>
 <link>http://www.jenitennison.com/blog/taxonomy/term/58</link>
 <description>The taxonomy view with a depth of 0.</description>
 <language>en</language>
<item>
 <title>Using Freebase Gridworks to Create Linked Data</title>
 <link>http://www.jenitennison.com/blog/node/145</link>
 <description>&lt;p&gt;When we encourage people to put their data on the web as linked data, the biggest question is &amp;#8220;How?&amp;#8221;. There are so many &amp;#8220;How?&amp;#8221; questions to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how do we choose what URIs to use for things?&lt;/li&gt;
&lt;li&gt;how do we choose what vocabularies to use?&lt;/li&gt;
&lt;li&gt;how do we handle changing data?&lt;/li&gt;
&lt;li&gt;how do we tell people how the data was created?&lt;/li&gt;
&lt;li&gt;how do we publish it?&lt;/li&gt;
&lt;li&gt;how will other people know about it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and, of course:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how do we create it?&lt;/li&gt;
&lt;/ul&gt;

&lt;!--break--&gt;

&lt;p&gt;Our goal within the linked data part of data.gov.uk (and I know we haven&amp;#8217;t achieved it yet) is to both answer these questions and to make the answers as simple as possible. The answers to the questions &lt;em&gt;cannot&lt;/em&gt; either require up-front knowledge of all possible types of data that might be published or depend on the availability of linked data for all the things we want to talk about. It &lt;em&gt;cannot&lt;/em&gt; require registration at centralised services. It &lt;em&gt;cannot&lt;/em&gt; require everyone to do everything in the same way or at the same pace.&lt;/p&gt;

&lt;p&gt;We must adopt an approach that encourages people to make their data available in forms that are easier for other people to pick up and use &lt;strong&gt;because they see the benefits for them&lt;/strong&gt; and their stakeholders and because the effort of doing so is not too high to bear. We must grow, adapt and evolve incrementally. If linked data eventually wins, it will be due to its benefits, not to faith.&lt;/p&gt;

&lt;p&gt;Anyway, enough rant. The point of this blog post is to talk about one of the answers to the &amp;#8216;How do we create it?&amp;#8217; question: using &lt;a href=&quot;http://code.google.com/p/freebase-gridworks/&quot;&gt;Freebase Gridworks&lt;/a&gt;. For those who haven&amp;#8217;t encountered it, Gridworks is an incredibly useful application that enables you to easily analyse, clean and manipulate tabular data. In a few steps, it can be used to generate linked datasets which can then be published on the web just like any other file, ready for other people to reuse without jumping through hoops. I&amp;#8217;m going to assume that you can &lt;a href=&quot;http://code.google.com/p/freebase-gridworks/wiki/Downloads?tm=2&quot;&gt;download it&lt;/a&gt; and &lt;a href=&quot;http://code.google.com/p/freebase-gridworks/wiki/GettingStarted&quot;&gt;install it&lt;/a&gt; following the instructions provided on the Gridworks site.&lt;/p&gt;

&lt;p&gt;In this post, I&amp;#8217;m going to talk about how to use Gridworks to generate linked data, using an example of local government spending data from &lt;a href=&quot;http://www.rbwm.gov.uk/web/finance_payments_to_suppliers.htm&quot;&gt;Windsor and Maidenhead council&lt;/a&gt;. Like a good train journey, there&amp;#8217;s quite a lot to see along the way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Many thanks to Dave Reynolds for his work on this data and comments on an earlier version of this post.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Importing Data&lt;/h2&gt;

&lt;p&gt;The first step is to import the data into Gridworks. If you just take the Windsor &amp;amp; Maidenhead data and import it directly, you&amp;#8217;ll get a single not-very-useful column as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/bad-import.jpg&quot; title=&quot;Bad import into Gridworks&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you look at the spreadsheet in a normal spreadsheet programme then you&amp;#8217;ll see why. Like a lot of spreadsheets created by normal people, who want to create something readable by human beings rather than computers, it has some extra lines at the top to explain what the spreadsheet contains, as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/spreadsheet.jpg&quot; title=&quot;Original spreadsheet&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Fortunately, Gridworks lets us easily skip over these first few lines. When you import the data, put the number &lt;code&gt;1&lt;/code&gt; in the box for &amp;#8220;Ignore X initial non-blank lines&amp;#8221;, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/import-dialog.jpg&quot; title=&quot;Import dialog&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;(You need the number &lt;code&gt;1&lt;/code&gt; because although there are three lines before the table really starts, the second two of those are blank.)&lt;/p&gt;

&lt;p&gt;That done, the data should look a lot more useful, as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/good-import.jpg&quot; title=&quot;Good import into Gridworks&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;h2&gt;Cleaning Data&lt;/h2&gt;

&lt;p&gt;The next thing to do is to explore the data a bit to get a handle on what&amp;#8217;s there and work out whether any cleaning or rationalisation is necessary to improve its quality.&lt;/p&gt;

&lt;p&gt;With columns that hold names, such as &amp;#8216;Directorate&amp;#8217;, &amp;#8216;Service&amp;#8217; or &amp;#8216;Supplier Name&amp;#8217;, you&amp;#8217;re looking for slight misspellings caused by bad data entry. Gridworks helps you find these by creating a list of the distinct values for a particular column and telling you how many instances there are of each. Use the arrow at the side of the column name to pull down the menu, then choose &lt;code&gt;Facet &amp;gt; Text Facet&lt;/code&gt; to create this list, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/facet-menu.jpg&quot; title=&quot;Choosing from the facet menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Once you&amp;#8217;ve chosen &lt;code&gt;Text Facet&lt;/code&gt;, the list pops up on the left hand side of the window. You can click on any of these values to filter the table to just the rows that have that value in that column, and you can also scan through the list to spot any places where there looks to be a typo or two entries that should really be the same. For example, the Services list holds both &amp;#8216;Libraries &amp;amp; Information Services&amp;#8217; and &amp;#8216;Library &amp;amp; Information Services&amp;#8217;, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/services-list.jpg&quot; title=&quot;Repetition in the Services list&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;
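
&lt;p&gt;If it helps to see what a text facet is doing, the counting can be sketched in a few lines of Python. This is only an illustration, not how Gridworks implements it: the &lt;code&gt;text_facet&lt;/code&gt; function and the sample rows are made up for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter


def text_facet(rows, column):
    """Return each distinct value in a column with the number of rows holding it."""
    return Counter(row[column] for row in rows)


# Hypothetical sample rows, standing in for the imported spreadsheet.
rows = [
    {"Supplier Name": "Siemens plc"},
    {"Supplier Name": "Siemens PLC"},
    {"Supplier Name": "Siemens plc"},
    {"Supplier Name": "Abba Cars"},
]

facet = text_facet(rows, "Supplier Name")
for value, count in sorted(facet.items()):
    print(value, count)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Values that differ only slightly show up as separate entries with their own counts, which is what makes near-duplicates easy to spot.&lt;/p&gt;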

&lt;p&gt;It&amp;#8217;s unlikely that there are really two distinct services with such similar names, so we&amp;#8217;d like to clean up this data by standardising on one name or another. You can quickly change all occurrences of one value to another using the &lt;code&gt;edit&lt;/code&gt; option that appears just to the right of the value when you hover over it. This brings up a dialog that enables you to change all of those values to something else, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/edit-value-dialog.jpg&quot; title=&quot;Editing a value across the spreadsheet&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can do something similar with numeric columns, such as the &amp;#8216;Amount excl vat £&amp;#8217; column. This time choose &lt;code&gt;Numeric Facet&lt;/code&gt; rather than &lt;code&gt;Text Facet&lt;/code&gt; and you&amp;#8217;ll get a histogram up as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/amount-facet.jpg&quot; title=&quot;Amount histogram&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is useful for identifying outliers. If you grab the handle on the left of the histogram and move it to the centre, the rows will get filtered to only those that have an amount within that range. For example, moving it to only show rows between £500,000 and £1,500,000 shows that there are three payments of this size, all made by Children&amp;#8217;s Services to Wilmott Dixon Construction Limited, as shown in this screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/high-value-transactions.jpg&quot; title=&quot;High value transactions&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Although these values are much higher than most of the others in the spreadsheet, they don&amp;#8217;t seem to be errors &amp;#8212; I guess a new school was being built or something &amp;#8212; so there&amp;#8217;s nothing to correct here, but it shows how numeric facets can be used to explore the data.&lt;/p&gt;

&lt;p&gt;Another approach to exploring and cleaning the data is to use the clustering algorithms that are built into Gridworks to identify duplicates. To do this, pull down the column menu and this time choose &lt;code&gt;Edit Cells... &amp;gt; Cluster and Edit&lt;/code&gt;, as shown in the following screenshot, this time for the &amp;#8216;Supplier Name&amp;#8217; column:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/edit-cells-menu.jpg&quot; title=&quot;Choosing from the Edit Cells menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This brings up a dialog that groups together values that look similar. In this case, &amp;#8216;Siemens plc&amp;#8217; and &amp;#8216;Siemens PLC&amp;#8217;, as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/cluster-dialog.jpg&quot; title=&quot;Clustering values in a column&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can use this dialog to change all the similar values to a standard one. Check the &lt;code&gt;Merge&lt;/code&gt; checkbox for the clusters of values that should be merged, edit the &lt;code&gt;New Cell Value&lt;/code&gt; field to whatever standard value you want to adopt, and choose &lt;code&gt;Apply &amp;amp; Re-cluster&lt;/code&gt; or simply &lt;code&gt;Apply &amp;amp; Close&lt;/code&gt; to make the change.&lt;/p&gt;

&lt;p&gt;You will often find that the default clustering algorithm (key collision/fingerprint) doesn&amp;#8217;t come up with any clusters as it&amp;#8217;s fairly conservative. It&amp;#8217;s worth playing around a bit with different algorithms to look for other duplicates by selecting other possibilities from the drop-down menus. For example, choosing the &amp;#8216;nearest neighbour&amp;#8217; method with the Levenshtein distance function and a radius of 2 (edits) results in four possible duplicates within the Suppliers list, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/levenstein-cluster.jpg&quot; title=&quot;Clustering values with Levenshtein distance&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you&amp;#8217;re not sure about whether the cluster is due to a typo or not, hover over the row and click on the &lt;code&gt;Browse this cluster&lt;/code&gt; link that appears. That will bring up a separate window that will show you just the rows in the cluster, from which you should be able to make a judgement. For example, it&amp;#8217;s not clear whether &amp;#8216;Academia Ltd&amp;#8217; is a typo for &amp;#8216;Academics Ltd&amp;#8217; but browsing the cluster shows that the Cost Centre codes and the Types of the transactions are completely different for the two Suppliers, so they are probably different.&lt;/p&gt;
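
&lt;p&gt;To give a feel for what the two kinds of clustering are comparing, here is a rough Python sketch. The &lt;code&gt;fingerprint&lt;/code&gt; function is only an approximation of the key collision idea (Gridworks may normalise values in other ways too), and both function names are hypothetical; the &lt;code&gt;levenshtein&lt;/code&gt; function is the standard edit distance calculation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re


def fingerprint(value):
    """Approximate key-collision key: lower-case, drop punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


same_key = fingerprint("Siemens plc") == fingerprint("Siemens PLC")
distance = levenshtein("Academia Ltd", "Academics Ltd")
print(same_key, distance)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The two Siemens spellings share a fingerprint, so key collision groups them. &amp;#8216;Academia Ltd&amp;#8217; and &amp;#8216;Academics Ltd&amp;#8217; have different fingerprints and are two edits apart, so they only appear as a candidate cluster with a nearest neighbour radius of 2 or more.&lt;/p&gt;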

&lt;h2&gt;Deriving Data&lt;/h2&gt;

&lt;p&gt;The next step is to derive some data from what we have within the spreadsheet. Since our goal is to produce linked data, the kind of derived data that we&amp;#8217;re interested in is URIs.&lt;/p&gt;

&lt;p&gt;At this point we need to start making decisions about what URIs to use. If you look at the &lt;a href=&quot;http://www.rbwm.gov.uk/web/finance_payments_to_suppliers.htm&quot;&gt;list of spending data from Windsor and Maidenhead&lt;/a&gt;, you&amp;#8217;ll see that there are a whole bunch of these spreadsheets. It would be really useful if we could tie these spreadsheets together by using the same URIs for the same things across the datasets. For that reason, the only URI that&amp;#8217;s going to be local to the dataset is the URI for each line (or data point if you like) itself. On the other hand, most of the things that are named here are going to be local to Windsor &amp;amp; Maidenhead: &amp;#8216;Abba Cars&amp;#8217; may be sufficient to identify a single company within Windsor &amp;amp; Maidenhead, but certainly wouldn&amp;#8217;t be nationwide. So the URIs I&amp;#8217;m going to create here are mostly going to be within the &lt;code&gt;www.rbwm.gov.uk&lt;/code&gt; domain.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s the table of the columns and the associated URIs that I&amp;#8217;m going to use. I should stress that this is just for example purposes, but I&amp;#8217;ve used the following principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;URIs for datasets are just like URIs for any other web document, but shouldn&amp;#8217;t have an extension because the data itself should be available in many formats&lt;/li&gt;
&lt;li&gt;URIs for real-world things should have &lt;code&gt;/id&lt;/code&gt; at the start of the path, and URIs for conceptual things should have &lt;code&gt;/def&lt;/code&gt; at the start of their paths; both should result in a 303 redirection to a suitable web page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what we&amp;#8217;re doing within data.gov.uk, but it&amp;#8217;s an important principle of the web that different councils might well choose their own URI schemes, depending on the kind of technology support that they have, without any bad side-effects on the interpretation of the data.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Column&lt;/th&gt;
      &lt;th&gt;URI pattern&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;(Dataset)&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;(Row/ExpenditureLine)&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#{row-number}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;(Council)&lt;/th&gt;
      &lt;td&gt;http://statistics.data.gov.uk/id/local-authority/00ME&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Directorate&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/id/directorate/{directorate-slug}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Updated&lt;/th&gt;
      &lt;td&gt;http://reference.data.gov.uk/id/day/{date}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;TransNo/Payment&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/id/transaction/{transaction-number}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Service&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/id/service/{service-slug}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Cost Centre&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/def/cost-centre/{cost-centre-code}&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;Supplier Name&lt;/th&gt;
      &lt;td&gt;http://www.rbwm.gov.uk/id/supplier/{supplier-slug}&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As you can see, those of the columns that contain text fields have, as part of their URI, a &lt;a href=&quot;http://en.wikipedia.org/wiki/Slug_(production)&quot;&gt;&amp;#8216;slug&amp;#8217;&lt;/a&gt;. This is a shortened, normalised value suitable for putting in a URI: basically ensuring that the string doesn&amp;#8217;t contain any punctuation or spaces. For example, &amp;#8216;Adult &amp;amp; Community Services&amp;#8217; would turn into &amp;#8216;adult-community-services&amp;#8217;.&lt;/p&gt;

&lt;p&gt;Our first task will be to create these slugs. To do this, we&amp;#8217;ll create a new column based on the existing ones by choosing &lt;code&gt;Edit Column &amp;gt; Add Column Based on This Column ...&lt;/code&gt; from the drop-down menu on the appropriate column:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/edit-column-menu.jpg&quot; title=&quot;Edit Column menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Selecting this will bring up a dialog which will ask you to name the new column and then enter a formula to calculate the new value, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/create-slug.jpg&quot; title=&quot;Edit Column menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The default language for this formula is Gridworks&amp;#8217; own, though there are other options available. To create the slug, we need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;turn the value to lower case&lt;/li&gt;
&lt;li&gt;replace all spaces with hyphens&lt;/li&gt;
&lt;li&gt;remove anything that isn&amp;#8217;t a letter, number, or hyphen&lt;/li&gt;
&lt;li&gt;replace all sequences of two hyphens with a single hyphen&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is done in two stages. The first three steps can be done in a single formula:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;replace(replace(toLowercase(value), &#039; &#039;, &#039;-&#039;), /[^-a-z0-9]/, &#039;&#039;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Gridworks helps by listing the original and resulting values for the first several rows of the spreadsheet, so that you can see whether it&amp;#8217;s working as expected. When you&amp;#8217;re happy, hitting &lt;code&gt;OK&lt;/code&gt; creates the new column.&lt;/p&gt;

&lt;p&gt;The last step (replacing all sequences of two hyphens with a single hyphen) can be done by editing the cells in the new column. Bring up the &lt;code&gt;Edit Cells... &amp;gt; Transform...&lt;/code&gt; dialog using the menu:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/edit-cells-menu-2.jpg&quot; title=&quot;Edit Cells menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;and use the formula:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;replace(value, &#039;--&#039;, &#039;-&#039;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;then check the &lt;code&gt;Re-transform until no change&lt;/code&gt; checkbox so that any pairs of hyphens are repeatedly replaced with single hyphens, as shown here:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/transform.jpg&quot; title=&quot;Edit Cells menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;
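
&lt;p&gt;For comparison, the same four steps can be sketched outside Gridworks in Python. The &lt;code&gt;slugify&lt;/code&gt; name is hypothetical; the regular expression is the one used in the first formula, and the loop plays the role of the &lt;code&gt;Re-transform until no change&lt;/code&gt; checkbox.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re


def slugify(value):
    """Lower-case, hyphenate spaces, strip other characters, collapse hyphens."""
    slug = value.lower().replace(" ", "-")
    slug = re.sub(r"[^-a-z0-9]", "", slug)
    # Keep replacing pairs of hyphens until none are left.
    while "--" in slug:
        slug = slug.replace("--", "-")
    return slug


print(slugify("1st Choice - D B Driveways Limited"))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This prints &lt;code&gt;1st-choice-d-b-driveways-limited&lt;/code&gt;. A single pass of the replacement would leave a double hyphen behind wherever three hyphens appear in a row, which is why the repetition matters.&lt;/p&gt;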

&lt;p&gt;The other tabs in the new column and edit cells dialogs are really helpful. The &lt;code&gt;History&lt;/code&gt; tab lets you choose formulae that you&amp;#8217;ve used before to use again. This is useful here because we want to create the slugs for the Service and Supplier Name in the same way. The &lt;code&gt;Help&lt;/code&gt; tab lists all the functions that you can use within the formula.&lt;/p&gt;

&lt;p&gt;Creating the URIs for the columns proceeds in the same way, except this time the formulae are more like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&#039;http://www.rbwm.gov.uk/id/directorate/&#039; + value
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are two that are slightly different. First, there&amp;#8217;s the URI for the date, which needs to be constructed from the date/time value held by Gridworks. We can do this in two stages. First, to construct a new column called &amp;#8216;Date&amp;#8217; to hold the formatted date:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;datePart(value, &#039;year&#039;) + &#039;-&#039; + 
if (datePart(value, &#039;month&#039;) &amp;lt; 9, &#039;0&#039;, &#039;&#039;) + replace(datePart(value, &#039;month&#039;) + 1, &#039;.0&#039;, &#039;&#039;) + &#039;-&#039; + 
if (datePart(value, &#039;day&#039;) &amp;lt; 10, &#039;0&#039;, &#039;&#039;) + datePart(value, &#039;day&#039;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(note that the &lt;code&gt;datePart()&lt;/code&gt; function returns a 0-based count for the month) and then to create the Date URI column based on this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&#039;http://reference.data.gov.uk/id/day/&#039; + value
&lt;/code&gt;&lt;/pre&gt;
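
&lt;p&gt;As a cross-check on the padding logic, the same day URI can be sketched in Python, where &lt;code&gt;date.isoformat()&lt;/code&gt; already zero-pads the month and day and months are counted from 1. The &lt;code&gt;day_uri&lt;/code&gt; function and &lt;code&gt;DAY_URI_BASE&lt;/code&gt; constant are hypothetical names for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from datetime import date

DAY_URI_BASE = "http://reference.data.gov.uk/id/day/"


def day_uri(day):
    """Build the day URI from a date, formatted as YYYY-MM-DD."""
    return DAY_URI_BASE + day.isoformat()


print(day_uri(date(2010, 4, 9)))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For 9 April 2010 this gives &lt;code&gt;http://reference.data.gov.uk/id/day/2010-04-09&lt;/code&gt;, which is the form the Gridworks formula needs to produce.&lt;/p&gt;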

&lt;p&gt;Second, there&amp;#8217;s the URI for the row (an expenditure line) itself, which needs to be constructed using the row number. It&amp;#8217;s useful to construct it as a local URI (ie just the fragment) as this means the same code can be used to construct the column across different datasets, so it&amp;#8217;s just:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&#039;#&#039; + rowIndex
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Exporting Data&lt;/h2&gt;

&lt;p&gt;Once the extra columns have been made, it&amp;#8217;s time to export data from Gridworks. While Gridworks makes it easy to export to CSV or into Freebase, it&amp;#8217;s also possible to export in any format you want using templates. Use the &lt;code&gt;Project&lt;/code&gt; menu and choose &lt;code&gt;Export Filtered Rows &amp;gt; Templating ...&lt;/code&gt;, as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/project-menu.jpg&quot; title=&quot;Project menu&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note that this will only export the rows that you currently have selected, so if you want to export everything, make sure that you deselect any facets that you&amp;#8217;ve currently got selected.&lt;/p&gt;

&lt;p&gt;Choosing the &lt;code&gt;Templating ...&lt;/code&gt; option will open up a dialog that you can use to create whatever format you want. The default, as shown in the following screenshot, is JSON.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/template-dialog-json.jpg&quot; title=&quot;Templating dialog to create JSON&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the left are four fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prefix&lt;/strong&gt; is content that&amp;#8217;s put at the top of the exported data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Row Template&lt;/strong&gt; is content that&amp;#8217;s generated for each row&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Row Separator&lt;/strong&gt; is content that&amp;#8217;s put between each row&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Suffix&lt;/strong&gt; is content that&amp;#8217;s put at the bottom of the exported data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing to be extremely careful of here is that any changes you make to the fields on the left &lt;strong&gt;will not be saved&lt;/strong&gt; when the dialog is closed. For that reason, it&amp;#8217;s a good idea to create your templates in a separate text file and copy and paste them in. Also note that the sample data on the right is only for the first set of rows, not for the whole spreadsheet.&lt;/p&gt;
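
&lt;p&gt;As a rough mental model of how the four fields fit together, assuming they are simply concatenated in the way described above, the export behaves something like this Python sketch. The &lt;code&gt;export&lt;/code&gt; function and the second sample row are hypothetical, and the row template is modelled as a plain function rather than Gridworks template syntax.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def export(rows, prefix, row_template, row_separator, suffix):
    """Assemble the output: prefix, rendered rows joined by the separator, suffix."""
    rendered = [row_template(row) for row in rows]
    return prefix + row_separator.join(rendered) + suffix


# Hypothetical rows; the second transaction number is invented.
rows = [{"TransNo": "2650750"}, {"TransNo": "2650751"}]

output = export(
    rows,
    prefix="[\n",
    row_template=lambda row: '  {"TransNo": "' + row["TransNo"] + '"}',
    row_separator=",\n",
    suffix="\n]\n",
)
print(output)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The separator only appears between rows, never after the last one, which is what makes the default JSON output valid. For Turtle, a blank line is a sensible separator.&lt;/p&gt;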

&lt;p&gt;We&amp;#8217;re going to generate Turtle using the template, so the next stage is to work out precisely what Turtle to generate. We&amp;#8217;ve been working on a small vocabulary for payment data based on the &lt;a href=&quot;http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html&quot;&gt;Data Cube vocabulary&lt;/a&gt; and that&amp;#8217;s what I&amp;#8217;ll use here, although it isn&amp;#8217;t yet as complete or available as it will be. We&amp;#8217;ll start at the bottom, with the individual rows, and then add extra surrounding information as we go.&lt;/p&gt;

&lt;h3&gt;Row Template&lt;/h3&gt;

&lt;p&gt;Within this data, each row corresponds to a &lt;code&gt;payment:ExpenditureLine&lt;/code&gt; within the dataset. The expenditure lines can be organised into groups based on the &lt;code&gt;payment:Payment&lt;/code&gt; that they&amp;#8217;re associated with, which is indicated through the &amp;#8216;TransNo&amp;#8217; column in the spreadsheet. Within the payment vocabulary we&amp;#8217;re using, we can assign individual expenditure lines to the payment using the &lt;code&gt;payment:expenditureLine&lt;/code&gt; property.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;payment:payer&lt;/code&gt; of each &lt;code&gt;payment:Payment&lt;/code&gt; is Windsor &amp;amp; Maidenhead council. The &lt;code&gt;payment:payee&lt;/code&gt; is the &amp;#8216;Supplier&amp;#8217; listed in the spreadsheet. The &lt;code&gt;payment:date&lt;/code&gt; is the &amp;#8216;Updated&amp;#8217; date.&lt;/p&gt;

&lt;p&gt;Each individual line in the spreadsheet is a &lt;code&gt;payment:ExpenditureLine&lt;/code&gt; which is associated with one of these payments. The &lt;code&gt;payment:expenditureCode&lt;/code&gt; is the &amp;#8216;Cost Centre&amp;#8217; and the actual &lt;code&gt;payment:amountExcludingVAT&lt;/code&gt; is the &amp;#8216;Amount excl vat £&amp;#8217; value. Some example Turtle for the first line is thus:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt;
  qb:slice &amp;lt;http://www.rbwm.gov.uk/id/transaction/2650750&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/id/transaction/2650750&amp;gt;
  a payment:Payment , qb:Slice ;
  rdfs:label &quot;Transaction 2650750&quot;@en ;
  qb:sliceStructure payment:payment-slice ;
  payment:transactionReference &quot;2650750&quot; ;
  payment:payer &amp;lt;http://statistics.data.gov.uk/id/local-authority/00ME&amp;gt; ;
  payment:payee &amp;lt;http://www.rbwm.gov.uk/id/supplier/1st-choice-d-b-driveways-limited&amp;gt; ;
  payment:date &amp;lt;http://reference.data.gov.uk/id/day/2010-04-09&amp;gt; ;
  payment:expenditureLine &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0&amp;gt;
  a payment:ExpenditureLine , qb:Observation ;
  rdfs:label &quot;Expenditure Line 0&quot;@en ;
  qb:dataSet &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; ;
  payment:expenditureCode &amp;lt;http://www.rbwm.gov.uk/def/cost-centre/LM05&amp;gt; ;
  payment:amountExcludingVAT 1875.00 .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s the basic data for each line, but there&amp;#8217;s also some other information which should be brought out for each line:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the name of the payee&lt;/li&gt;
&lt;li&gt;the date, year, month and day-of-month for the payment, which may help further analysis of the data&lt;/li&gt;
&lt;li&gt;the meaning of the expenditure code (particularly its association to a particular service)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In each of these cases, pulling the information out from each line is going to lead to a lot of repetition, because the same payee, date and so on will be described in multiple lines, but we don&amp;#8217;t have any choice and we can tidy it up by removing duplicates afterwards. The Turtle for the first line will look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.rbwm.gov.uk/id/supplier/1st-choice-d-b-driveways-limited&amp;gt;
  a org:Organization ;
  rdfs:label &quot;1st Choice - D B Driveways Limited&quot;@en .

&amp;lt;http://reference.data.gov.uk/id/day/2010-04-09&amp;gt;
  a interval:CalendarDay ;
  rdfs:label &quot;2010-04-09&quot; ;
  time:hasBeginning &amp;lt;http://reference.data.gov.uk/id/gregorian-instant/2010-04-09T00:00:00&amp;gt; ;
  interval:ordinalYear 2010 ;
  interval:ordinalMonthOfYear 4 ;
  interval:ordinalDayOfMonth 9 .

&amp;lt;http://reference.data.gov.uk/id/gregorian-instant/2010-04-09T00:00:00&amp;gt;
  a time:Instant ;
  time:inXSDDateTime &quot;2010-04-09T00:00:00&quot;^^xsd:dateTime .

&amp;lt;http://www.rbwm.gov.uk/def/cost-centre/LM05&amp;gt;
  a rbwm:CostCentre , skos:Concept ;
  rdfs:label &quot;Cost Centre LM05&quot;@en ;
  rbwm:costCentreCode &quot;LM05&quot;^^rbwm:CostCentreCode ;
  rbwm:service &amp;lt;http://www.rbwm.gov.uk/id/service/magnet-leisure-centre&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/id/service/magnet-leisure-centre&amp;gt;
  a rbwm:Service ;
  rdfs:label &quot;Magnet Leisure Centre&quot;@en ;
  rbwm:providedBy &amp;lt;http://www.rbwm.gov.uk/id/directorate/adult-community-services&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/id/directorate/adult-community-services&amp;gt;
  a rbwm:Directorate ;
  rdfs:label &quot;Adult &amp;amp; Community Services&quot;@en ;
  org:unitOf &amp;lt;http://statistics.data.gov.uk/id/local-authority/00ME&amp;gt; ;
  rbwm:provides &amp;lt;http://www.rbwm.gov.uk/id/service/magnet-leisure-centre&amp;gt; .

&amp;lt;http://statistics.data.gov.uk/id/local-authority/00ME&amp;gt;
  org:hasUnit &amp;lt;http://www.rbwm.gov.uk/id/directorate/adult-community-services&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#8217;ll see that in the last part of this I&amp;#8217;ve introduced some properties and classes with a &lt;code&gt;rbwm:&lt;/code&gt; prefix. These are for classes and properties that are here in this data, but aren&amp;#8217;t part of the payment vocabulary. The basic schema is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rbwm:CostCentre a rdfs:Class ;
  rdfs:label &quot;Cost Centre&quot;@en ;
  rdfs:comment &quot;A cost centre.&quot;@en .

rbwm:Service a rdfs:Class ;
  rdfs:label &quot;Service&quot;@en ;
  rdfs:comment &quot;A service provided by the council.&quot;@en .

rbwm:Directorate a rdfs:Class ;
  rdfs:label &quot;Directorate&quot;@en ;
  rdfs:comment &quot;A directorate within the council&quot;@en .

rbwm:service a rdf:Property , owl:ObjectProperty ;
  rdfs:label &quot;Service&quot;@en ;
  rdfs:comment &quot;The service associated with a particular cost centre.&quot;@en ;
  rdfs:domain rbwm:CostCentre ;
  rdfs:range rbwm:Service .

rbwm:providedBy a rdf:Property , owl:ObjectProperty ;
  rdfs:label &quot;Provided By&quot;@en ;
  rdfs:comment &quot;The directorate that provides this service.&quot;@en ;
  rdfs:domain rbwm:Service ;
  rdfs:range rbwm:Directorate .

rbwm:provides a rdf:Property , owl:ObjectProperty ;
  rdfs:label &quot;Provides&quot;@en ;
  rdfs:comment &quot;A service provided by this directorate.&quot;@en ;
  rdfs:domain rbwm:Directorate ;
  rdfs:range rbwm:Service .

rbwm:costCentreCode a rdf:Property , owl:DatatypeProperty ;
  rdfs:label &quot;Cost Centre Code&quot;@en ;
  rdfs:comment &quot;The code of this cost centre.&quot;@en ;
  rdfs:domain rbwm:CostCentre ;
  rdfs:range rbwm:CostCentreCode .

rbwm:CostCentreCode a rdfs:Datatype ;
  rdfs:label &quot;Cost Centre Code&quot;@en ;
  rdfs:comment &quot;A cost centre code consisting of two capital letters followed by two digits.&quot;@en .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This illustrates how individual councils might extend the information that they make available in RDF without having to seek any kind of prior agreement from anyone else. If, later on, a third party starts to make available ontologies for cost centres, services and directorates, Windsor &amp;amp; Maidenhead could start to link up their RDF with those more widely standardised classes and properties, with appropriate use of &lt;code&gt;rdfs:subClassOf&lt;/code&gt; or &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt;.&lt;/p&gt;
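
&lt;p&gt;Going back to the repetition in the per-row descriptions of payees, dates and cost centres: tidying up afterwards can be as simple as dropping repeated blocks of text. Here is a minimal Python sketch, assuming the exported statements are separated by single blank lines and that repeated descriptions are textually identical. The function name and the &lt;code&gt;ex:&lt;/code&gt; sample data are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def remove_duplicate_blocks(turtle_text):
    """Keep only the first copy of each blank-line-separated block of statements."""
    seen = set()
    kept = []
    for block in turtle_text.strip().split("\n\n"):
        normalised = block.strip()
        if normalised and normalised not in seen:
            seen.add(normalised)
            kept.append(normalised)
    return "\n\n".join(kept) + "\n"


# Hypothetical exported text in which one description is repeated.
sample = (
    'ex:supplier-a rdfs:label "Supplier A"@en .\n\n'
    'ex:day-1 rdfs:label "2010-04-09" .\n\n'
    'ex:supplier-a rdfs:label "Supplier A"@en .\n'
)

deduplicated = remove_duplicate_blocks(sample)
print(deduplicated)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is purely a text-level clean-up to keep the file small. An RDF graph is a set of triples, so loading the untidied file into a triplestore should give the same graph.&lt;/p&gt;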

&lt;p&gt;Now that we have an idea about what data we can extract for a single row, we can turn this into a Gridworks template. The templates are fairly straightforward. Wherever you want to insert a value from a particular column, you use the syntax &lt;code&gt;${Column Name}&lt;/code&gt;. If you want to do any further processing, you can use the syntax &lt;code&gt;{{Formula}}&lt;/code&gt; to insert the result of a calculation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt;
  qb:slice &amp;lt;${Transaction URI}&amp;gt; .

&amp;lt;${Transaction URI}&amp;gt;
  a payment:Payment , qb:Slice ;
  rdfs:label &quot;Transaction ${TransNo}&quot;@en ;
  qb:sliceStructure payment:payment-slice ;
  payment:transactionReference &quot;${TransNo}&quot; ;
  payment:payer &amp;lt;http://statistics.data.gov.uk/id/local-authority/00ME&amp;gt; ;
  payment:payee &amp;lt;${Supplier URI}&amp;gt; ;
  payment:date &amp;lt;${Date URI}&amp;gt; ;
  payment:expenditureLine &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2${Line URI}&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2${Line URI}&amp;gt;
  a payment:ExpenditureLine , qb:Observation ;
  rdfs:label &quot;Expenditure Line {{rowIndex}}&quot;@en ;
  qb:dataSet &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; ;
  payment:expenditureCode &amp;lt;${Cost Centre URI}&amp;gt; ;
  payment:amountExcludingVAT {{cells[&#039;Amount excl vat £&#039;].value + 0}} .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the last line here uses the expression &lt;code&gt;cells[&#039;Amount excl vat £&#039;].value + 0&lt;/code&gt; in order to ensure that every figure has a decimal place, which makes them into &lt;code&gt;xsd:decimal&lt;/code&gt; values within the resulting RDF.&lt;/p&gt;
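
&lt;p&gt;The difference matters because of how Turtle reads bare numbers: one without a decimal point is shorthand for an &lt;code&gt;xsd:integer&lt;/code&gt;, and one with a decimal point is shorthand for an &lt;code&gt;xsd:decimal&lt;/code&gt;. A small illustration, in which the two subjects and the amounts are made up for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# no decimal point: the same as &quot;500&quot;^^xsd:integer
&amp;lt;#example-a&amp;gt; payment:amountExcludingVAT 500 .

# decimal point: the same as &quot;500.0&quot;^^xsd:decimal
&amp;lt;#example-b&amp;gt; payment:amountExcludingVAT 500.0 .
&lt;/code&gt;&lt;/pre&gt;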

&lt;p&gt;I won&amp;#8217;t do the rest of the row template here, though it&amp;#8217;s &lt;a href=&quot;/blog/files/finance_supplier_payments_2010_q2_provenance.ttl&quot;&gt;available in full in a separate file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The other parts of the template are easier to complete. The prefix needs to contain any namespace prefixes that are used within the RDF. It&amp;#8217;s also useful to put a base URI here and describe the dataset itself. The RDF for the dataset should contain a number of properties about the dataset as a whole. There are a number of levels at which the dataset can be described:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;basic metadata such as its title and the license that it&amp;#8217;s available under&lt;/li&gt;
&lt;li&gt;statistical metadata including what dimensions it has and how it&amp;#8217;s sliced&lt;/li&gt;
&lt;li&gt;linked data metadata such as how this dataset links out to other linked datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Turtle for this description is shown here:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments&amp;gt;
  a void:Dataset ;
  void:subset &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; .

&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt;
  a payment:PaymentDataset , void:Dataset ;
  # basic metadata
  rdfs:label &quot;Windsor &amp;amp; Maidenhead Supplier Payments where charge to specific cost centre is &amp;gt;= £500 for period April 2010 - June 2010&quot;@en ;
  dct:license &amp;lt;http://data.gov.uk/id/licence&amp;gt; ;
  dct:temporal [
    # this time is retrieved from the Last-Modified date on the original spreadsheet
    time:hasBeginning &amp;lt;http://reference.data.gov.uk/id/gregorian-instant/2010-08-02T08:37:02&amp;gt;
  ] ;

  # statistical metadata
  qb:structure payment:payments-with-expenditure-structure ;
  qb:sliceKey payment:payment-slice ;
  payment:currency &amp;lt;http://dbpedia.org/resource/Pound_sterling&amp;gt; ;

  # linked data metadata
  void:exampleResource
    &amp;lt;http://www.rbwm.gov.uk/id/transaction/2650750&amp;gt; ,
    &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2#0&amp;gt; ;
  void:vocabulary payment: , qb: , rbwm: ;
  void:subset [
    a void:Linkset ;
    void:linkPredicate qb:slice ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/transaction&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate payment:payer ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/transaction&amp;gt; ;
    void:objectsTarget &amp;lt;http://statistics.data.gov.uk/id/local-authority&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate payment:payee ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/transaction&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/supplier&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate payment:date ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/transaction&amp;gt; ;
    void:objectsTarget &amp;lt;http://reference.data.gov.uk/id/day&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate payment:expenditureLine ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/transaction&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate payment:expenditureCode ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/def/cost-centre&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate rbwm:service ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/def/cost-centre&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/service&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate rbwm:providedBy ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/service&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/directorate&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate rbwm:provides ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/directorate&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/service&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate org:hasUnit ;
    void:subjectsTarget &amp;lt;http://statistics.data.gov.uk/id/local-authority&amp;gt; ;
    void:objectsTarget &amp;lt;http://www.rbwm.gov.uk/id/directorate&amp;gt; ;
  ] , [
    a void:Linkset ;
    void:linkPredicate org:unitOf ;
    void:subjectsTarget &amp;lt;http://www.rbwm.gov.uk/id/directorate&amp;gt; ;
    void:objectsTarget &amp;lt;http://statistics.data.gov.uk/id/local-authority&amp;gt; ;
  ] .
&lt;/code&gt;&lt;/pre&gt;
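
&lt;p&gt;The namespace declarations in the prefix part of the template take the usual Turtle form. The sketch below lists only the widely used namespaces that appear in the description above; the &lt;code&gt;payment:&lt;/code&gt;, &lt;code&gt;rbwm:&lt;/code&gt; and &lt;code&gt;org:&lt;/code&gt; prefixes need declaring in the same way, and are left out here rather than guessed at.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# well-known namespaces only; payment:, rbwm: and org: must be declared too
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix dct: &amp;lt;http://purl.org/dc/terms/&amp;gt; .
@prefix void: &amp;lt;http://rdfs.org/ns/void#&amp;gt; .
@prefix qb: &amp;lt;http://purl.org/linked-data/cube#&amp;gt; .
@prefix time: &amp;lt;http://www.w3.org/2006/time#&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;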

&lt;h2&gt;Provenance&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;ve described here, verbally, exactly what I&amp;#8217;ve done in terms of the cleaning of the data, deriving new columns, and the template that I&amp;#8217;ve used to create a Turtle rendition of the data in this spreadsheet. One of the things that we&amp;#8217;ve worked hard on within data.gov.uk is finding ways of expressing this provenance information in RDF. There are two reasons for this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Providing provenance increases transparency and enables you to check the processing that the data has been through, increasing your trust in the data.&lt;/li&gt;
&lt;li&gt;Describing the process in sufficient detail for you to replicate that process enables you to modify and repeat the process, which enables you both to add value and to apply the same processing to your own situation, thus spreading best practice.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The basic provenance vocabulary that we&amp;#8217;re using within data.gov.uk is the &lt;a href=&quot;http://code.google.com/p/opmv/&quot;&gt;Open Provenance Model Vocabulary&lt;/a&gt;. This vocabulary talks about Artifacts, Processes that create and use them, and Agents that control those processes. We&amp;#8217;ve created an extension of this vocabulary specifically to help describe this kind of scenario, where a spreadsheet is processed using Gridworks and then exported using a template. I&amp;#8217;ll put this provenance information in a separate file simply because embedding provenance information, which includes a template, in the template itself gets us into nasty recursion issues.&lt;/p&gt;

&lt;p&gt;As well as the template, there are two supplementary artifacts that we need to record the provenance of this data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the Gridworks project itself&lt;/li&gt;
&lt;li&gt;the JSON description of the set of operations performed by Gridworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first can be exported using the &lt;code&gt;Project&lt;/code&gt; menu. The second is accessed through the &lt;code&gt;Undo/Redo&lt;/code&gt; tab as shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/undo-redo.jpg&quot; title=&quot;Undo/Redo tab&quot; style=&quot;text-align: center&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This tab shows the actions that have been carried out on the data, and enables you to undo them in sequence. The &lt;code&gt;extract&lt;/code&gt; link at the bottom opens up the dialog shown in the following screenshot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/blog/files/extract-dialog.jpg&quot; title=&quot;Extract Operations dialog&quot; style=&quot;width: 100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You have to manually copy and paste the JSON description from the right of this dialog into a separate file in order to save it.&lt;/p&gt;

&lt;p&gt;We can then start describing the provenance of the RDF; this needs to go in the Turtle file itself. We start by saying that the RDF that we&amp;#8217;ve created was created from the Gridworks project and through an extraction operation. A simple link to the spreadsheet that was used as the source of the data also provides a quick link back to the original data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2&amp;gt;
  a opmv:Artifact ;
  dct:source &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls&amp;gt; ;
  gridworks:wasExportedBy &amp;lt;finance_supplier_payments_2010_q2_provenance#gridworks-export&amp;gt; ;
  gridworks:wasExportedFrom &amp;lt;finance_supplier_payments_2010_q2_project.tar.gz&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The provenance information then needs to describe the export process:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;#gridworks-export&amp;gt;
  a gridworks:ExportUsingTemplate , opmv:Process ;
  rdfs:label &quot;Process for Exporting Windsor &amp;amp; Maidenhead data as Turtle&quot; ;
  gridworks:project &amp;lt;finance_supplier_payments_2010_q2_project.tar.gz&amp;gt; ;
  gridworks:template &amp;lt;#gridworks-template&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The project itself was created from the original Excel spreadsheet. It was generated by an import that ignored a single non-blank header row, followed by the set of operations described by the JSON.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;finance_supplier_payments_2010_q2_project.tar.gz&amp;gt;
  a gridworks:Project , opmv:Artifact ;
  rdfs:label &quot;Windsor &amp;amp; Maidenhead Supplier Payments April 2010 - June 2010 Gridworks Project&quot;@en ;
  gridworks:wasCreatedFrom &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls&amp;gt; ;
  opmv:wasGeneratedBy &amp;lt;#gridworks-processing&amp;gt; .

&amp;lt;#gridworks-processing&amp;gt;
  a gridworks:Process , opmv:Process ;
  rdfs:label &quot;Processing on the Gridworks Project&quot;@en ;
  common:usedData &amp;lt;http://www.rbwm.gov.uk/public/finance_supplier_payments_2010_q2.xls&amp;gt; ;
  gridworks:ignore 1 ;
  gridworks:operationDescription &amp;lt;finance_supplier_payments_2010_q2_operations.json&amp;gt; .

&amp;lt;finance_supplier_payments_2010_q2_operations.json&amp;gt;
  a gridworks:OperationDescription , opmv:Artifact ;
  rdfs:label &quot;Dump of the Processing carried out by Gridworks on Windsor &amp;amp; Maidenhead Supplier Payments April 2010 - June 2010 data&quot;@en ;
  gridworks:wasExportedFrom &amp;lt;finance_supplier_payments_2010_q2_project.tar.gz&amp;gt; ;
  gridworks:wasExportedBy &amp;lt;#gridworks-operation-description-extraction&amp;gt; .

&amp;lt;#gridworks-operation-description-extraction&amp;gt;
  a gridworks:ExtractOperationDescription , opmv:Process ;
  rdfs:label &quot;Extraction of the operation description from the Windsor &amp;amp; Maidenhead Supplier Payments April 2010 - June 2010 Project from Gridworks&quot;@en ;
  gridworks:project &amp;lt;finance_supplier_payments_2010_q2_project.tar.gz&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The template is described in terms of the separate parts; in fact it&amp;#8217;s useful to use this provenance file as the record of the template that you use, given that Gridworks won&amp;#8217;t save the template in the project itself.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;#gridworks-template&amp;gt;
  a gridworks:Template , opmv:Artifact ;
  gridworks:prefix &quot;&quot;&quot;
...
&quot;&quot;&quot;^^xsd:string ;
  gridworks:rowTemplate &quot;&quot;&quot;
...
&quot;&quot;&quot;^^xsd:string .
&lt;/code&gt;&lt;/pre&gt;
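
&lt;p&gt;None of the descriptions above mention the Agents that OPMV provides for. If you wanted to record who ran the export, something along these lines might do; the &lt;code&gt;&amp;lt;#operator&amp;gt;&lt;/code&gt; resource and its label are hypothetical, added only to show the shape of the data.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# &amp;lt;#operator&amp;gt; is a made-up identifier for whoever ran the export
&amp;lt;#gridworks-export&amp;gt;
  opmv:wasControlledBy &amp;lt;#operator&amp;gt; .

&amp;lt;#operator&amp;gt;
  a opmv:Agent ;
  rdfs:label &quot;Person who ran the Gridworks export&quot;@en .
&lt;/code&gt;&lt;/pre&gt;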

&lt;h2&gt;Rinse and Repeat&lt;/h2&gt;

&lt;p&gt;Gridworks makes it easy to repeat a given set of operations on another spreadsheet that follows the same structure. If you download the &lt;a href=&quot;http://www.rbwm.gov.uk/web/finance_payments_to_suppliers.htm&quot;&gt;Windsor and Maidenhead spending data from 2009 Q4&lt;/a&gt; and import it into Gridworks, you&amp;#8217;ll see that it uses the same set of columns as the 2010 Q2 data that we&amp;#8217;ve been looking at. (Strangely enough, the 2010 Q1 data doesn&amp;#8217;t quite follow the same structure as it doesn&amp;#8217;t include the &amp;#8216;TransNo&amp;#8217; column.)&lt;/p&gt;

&lt;p&gt;There are a couple of differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &amp;#8216;Updated&amp;#8217; column isn&amp;#8217;t recognised as holding dates on import; you can use &lt;code&gt;Edit Cells... &amp;gt; Transform&lt;/code&gt; to change these values into dates using the &lt;code&gt;toDate(value)&lt;/code&gt; formula&lt;/li&gt;
&lt;li&gt;the &amp;#8216;Amount excl vat £&amp;#8217; column isn&amp;#8217;t recognised as holding numbers on import because the values have commas in them; you can use the formula &lt;code&gt;toNumber(replace(value, &#039;,&#039;, &#039;&#039;))&lt;/code&gt; to rectify this&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might want to do some more cleaning, for example to check for duplicates, but once that is done, you use the &lt;code&gt;apply&lt;/code&gt; link at the bottom of the &lt;code&gt;Undo/Redo&lt;/code&gt; tab to apply the JSON operation description that you extracted from the previous project to this one. The templates require only a little tweaking to give different filenames and labels, but otherwise can be used as-is.&lt;/p&gt;

&lt;p&gt;So while the process of cleaning data, deriving values and creating a template for exporting as Turtle is a bit of effort, the likelihood is that you will be able to repeat the same operations on similar data with a minimal amount of work.&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Gridworks is a simply amazing tool for data cleansing, analysis and, as we&amp;#8217;ve seen, transformation. It&amp;#8217;s set to become more so for our purposes in the near future, as it comes to support the mapping of names for things to URIs using configurable reconciliation services (which might allow it to automatically map Government Department names to URIs, for example), and the creation of RDF using a more intuitive and user-friendly approach than the templates that I&amp;#8217;ve illustrated here.&lt;/p&gt;

&lt;p&gt;Of course there are issues, particularly for UK civil servants who typically have to operate on locked-down machines running IE7 (if they&amp;#8217;re lucky). Gridworks also only deals with the fairly simple cases of data that fits in a spreadsheet-like structure, without the complexities of annotations on rows, columns or individual cells that we often see in government data.&lt;/p&gt;

&lt;p&gt;Nevertheless, there&amp;#8217;s huge potential here to provide a fairly easy route to the publication of linked data for people who are familiar with spreadsheets, in particular one that can be tweaked and extended to allow for the variety and complexity of real-world data.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/145#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/54">datagovuk</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/59">gridworks</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/58">provenance</category>
 <enclosure url="http://www.jenitennison.com/blog/files/finance_supplier_payments_2010_q2_project.tar.gz" length="458733" type="application/x-gzip" />
 <pubDate>Sun, 22 Aug 2010 22:23:32 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">145 at http://www.jenitennison.com/blog</guid>
</item>
<item>
 <title>Translating Existing Models to RDF</title>
 <link>http://www.jenitennison.com/blog/node/142</link>
 <description>&lt;p&gt;As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as in XML and a text-based form, that are supported by existing tool chains.&lt;/p&gt;

&lt;p&gt;In contrast, if there is existing RDF in that domain area, it&amp;#8217;s usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user than on those of the professionals in the area.&lt;/p&gt;

&lt;!--break--&gt;

&lt;p&gt;To give an example, the international statistics community uses &lt;a href=&quot;http://sdmx.org&quot;&gt;SDMX&lt;/a&gt; for representing and exchanging statistics (and a lot more besides; it&amp;#8217;s a huge standard). SDMX includes a well-thought-through model for statistical datasets and the observations within them, as well as standard concepts for things like gender, age, unit multipliers and so on. By comparison, &lt;a href=&quot;http://sw.joanneum.at/scovo/schema.html&quot;&gt;SCOVO&lt;/a&gt;, the main RDF model for representing statistics, barely scratches the surface.&lt;/p&gt;

&lt;p&gt;This isn&amp;#8217;t the only example: the &lt;a href=&quot;http://inspire.jrc.ec.europa.eu/&quot;&gt;INSPIRE Directive&lt;/a&gt; defines how geographic information must be made available. &lt;a href=&quot;http://www.gigateway.org.uk/metadata/standards.html&quot;&gt;GEMINI&lt;/a&gt; defines the kind of geospatial metadata that that community cares about. The &lt;a href=&quot;http://openprovenance.org/&quot;&gt;Open Provenance Model&lt;/a&gt; is the result of many contributors from multiple fields, and again has a number of serialisations.&lt;/p&gt;

&lt;p&gt;You could view this as a challenge: experts in their domains already have models and serialisations for the data that they care about; how can we persuade them to adopt an RDF model and serialisations instead?&lt;/p&gt;

&lt;p&gt;But that&amp;#8217;s totally the wrong question. Linked data doesn&amp;#8217;t, can&amp;#8217;t and won&amp;#8217;t replace existing ways of handling data. But it has got some interesting features that can bring great benefit to people who want to publish their data, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;web-scale addresses&lt;/strong&gt; &amp;#8212; being able to name and refer to things like individual observations in a statistical hypercube, a particular road junction, or the particular process that led to something being created&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;annotation&lt;/strong&gt; &amp;#8212; the ability to record metadata about everything that you can name, which is everything!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;distributed publication&lt;/strong&gt; &amp;#8212; enabling multiple publishers to control the publication of their data without having to upload it to a central location&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;links&lt;/strong&gt; &amp;#8212; the joining of information to other information, providing more context, supporting more queries and reducing the requirement for duplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is really about how to enable people to reap these benefits; the answer, because HTTP-based addressing and typed linkage are usually hard to introduce into existing formats, is generally to publish data using an RDF-based model alongside existing formats. This might be done by generating an RDF-based format (such as RDF/XML or Turtle) as an alternative to the standard XML or HTML, accessible via content negotiation, or by providing a &lt;a href=&quot;http://www.w3.org/TR/grddl/&quot;&gt;GRDDL&lt;/a&gt; transformation that maps an XML format into RDF/XML.&lt;/p&gt;

&lt;p&gt;Either way, the underlying model needs to be mapped into RDF. We&amp;#8217;re furthest down this road with &lt;a href=&quot;http://groups.google.com/group/publishing-statistical-data&quot;&gt;statistical data&lt;/a&gt;. I wanted to explore here what it might look like for the Open Provenance Model, building on lessons learned from the statistical domain.&lt;/p&gt;

&lt;h2&gt;Open Provenance Model&lt;/h2&gt;

&lt;p&gt;The Open Provenance Model talks about three main &lt;strong&gt;nodes&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;artifacts&lt;/strong&gt;, which are the things that are produced or used by processes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;processes&lt;/strong&gt;, which are actions that are performed using or producing artifacts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;agents&lt;/strong&gt;, which are the people or systems that perform actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and five kinds of &lt;strong&gt;edges&lt;/strong&gt; that can be defined between them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;process A &lt;strong&gt;used&lt;/strong&gt; artifact B&lt;/li&gt;
&lt;li&gt;artifact A &lt;strong&gt;was generated by&lt;/strong&gt; process B&lt;/li&gt;
&lt;li&gt;process A &lt;strong&gt;was controlled by&lt;/strong&gt; agent B&lt;/li&gt;
&lt;li&gt;process A &lt;strong&gt;was triggered by&lt;/strong&gt; process B&lt;/li&gt;
&lt;li&gt;artifact A &lt;strong&gt;was derived from&lt;/strong&gt; artifact B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then things start getting more complicated. OPM indicates that each artifact and agent plays a different &lt;strong&gt;role&lt;/strong&gt; when it is used by, generated by or controls a process. What&amp;#8217;s more, each artifact and agent might be involved in the process at different &lt;strong&gt;times&lt;/strong&gt; (though timing information is optional within OPM). And a given provenance graph may contain several &lt;strong&gt;accounts&lt;/strong&gt; of how artifacts, processes and agents fit together.&lt;/p&gt;

&lt;h2&gt;Existing Mapping to RDF&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://openprovenance.org/model/opm.owl&quot;&gt;OWL ontology for OPM&lt;/a&gt; is a very literal mapping of OPM into RDF. Each of the types of nodes is a separate class, and each of the types of edges is a separate class. Thus, it introduces a lot of n-ary relationships. Take a really simple example of an XML file being transformed into HTML using XSLT. With the OPM ontology, the RDF would look something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation a opm:Process .
&amp;lt;doc.html&amp;gt; a opm:Artifact .
&amp;lt;doc.xml&amp;gt; a opm:Artifact .
&amp;lt;doc.xsl&amp;gt; a opm:Artifact .
_:processor a opm:Agent .
_:Jeni a opm:Agent .

_:sourceLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xml&amp;gt; ;
  opm:role xslt:source .

_:stylesheetLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; ;
  opm:role xslt:stylesheet .

_:resultLink a opm:WasGeneratedBy ;
  opm:effect &amp;lt;doc.html&amp;gt; ;
  opm:cause _:transformation ;
  opm:role xslt:result .

_:processorLink a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:processor ;
  opm:role xslt:processor .

_:userLink a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:Jeni ;
  opm:role xslt:user .

_:derivation a opm:WasDerivedFrom ;
  opm:effect &amp;lt;doc.html&amp;gt; ;
  opm:cause &amp;lt;doc.xml&amp;gt; .

xslt:source a opm:Role ;
  opm:value &quot;source&quot; .

xslt:stylesheet a opm:Role ;
  opm:value &quot;stylesheet&quot; .

xslt:result a opm:Role ;
  opm:value &quot;result&quot; .

xslt:processor a opm:Role ;
  opm:value &quot;processor&quot; .

xslt:user a opm:Role ;
  opm:value &quot;user&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To give you an idea of what this mapping means, if I wanted to work out who created &lt;code&gt;doc.html&lt;/code&gt;, I would have to do a query like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ?who
WHERE {
  ?generatedBy
    opm:effect &amp;lt;doc.html&amp;gt; ;
    opm:role xslt:result ;
    opm:cause ?transformation .
  ?controlledBy
    opm:effect ?transformation ;
    opm:role xslt:user ;
    opm:cause ?who .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Some Observations&lt;/h2&gt;

&lt;p&gt;There are two things that I want to pull out about the RDF mapping described above.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it&amp;#8217;s incredibly literal; every entity type within the model is mapped onto an RDF class, including the edges, the roles and the accounts (which I didn&amp;#8217;t show above)&lt;/li&gt;
&lt;li&gt;it doesn&amp;#8217;t reuse any existing vocabularies, even when they might help (such as for the &amp;#8216;value&amp;#8217; of a role, which is really a label)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reminds me of the mapping of object-oriented or relational data models into each other or into XML, which often results in a god-awful mess and people swearing that technology X is goddamned ugly.&lt;/p&gt;

&lt;p&gt;The fact is that elegant uses of each modelling paradigm &amp;#8212; ones that are easy to understand and efficient to query &amp;#8212; always take advantage of the unique features of that paradigm. For example, good XML vocabularies take advantage of the distinctions between attributes and elements, of nesting and hierarchies, and of the ability to hold mixed content.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s the same with RDF. There are four features of RDF that I think good vocabularies will take suitable advantage of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;existing vocabularies&lt;/li&gt;
&lt;li&gt;inheritance&lt;/li&gt;
&lt;li&gt;shortcuts and reasoning&lt;/li&gt;
&lt;li&gt;named graphs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reusing existing vocabularies&lt;/strong&gt; takes advantage of the ease of bringing together diverse domains within RDF, and it makes data more reusable. For example, an OPM mapping that encourages the reuse of FOAF for people and organisations saves time and effort for the developers of the OPM RDF vocabulary, that they would otherwise have spent modelling the details of agents; and it means that any agents that are described within the description of a piece of provenance are automatically available as agents in the wider FOAF cloud. The same goes for using DOAP to describe software.&lt;/p&gt;

&lt;p&gt;By reusing vocabularies, the data isn&amp;#8217;t isolated any more, locked within a single context designed for a single use. This is a huge benefit of the linked data approach and it makes sense to leverage it.&lt;/p&gt;
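
&lt;p&gt;As an illustration, the two agents in the earlier example might be described as follows. The names, and the choice of &lt;code&gt;doap:Project&lt;/code&gt; for the XSLT processor, are assumptions made for the sake of the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# the person is typed with FOAF as well as OPM
_:Jeni a opm:Agent , foaf:Person ;
  foaf:name &quot;Jeni&quot; .

# the software is typed with DOAP as well as OPM
_:processor a opm:Agent , doap:Project ;
  doap:name &quot;An XSLT processor&quot; .
&lt;/code&gt;&lt;/pre&gt;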

&lt;p&gt;&lt;strong&gt;Using inheritance&lt;/strong&gt; means creating general purpose classes and properties and encouraging other people to use &lt;code&gt;rdfs:subClassOf&lt;/code&gt; or &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; to specialise them according to their own requirements. Within OPM, the different roles that artifacts and agents might play in a process are a natural fit with either sub-properties or sub-classes, depending on how the edges in the model are represented. For example, rather than&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a opm:Used ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; ;
  opm:role xslt:stylesheet .

xslt:stylesheet a opm:Role ;
  opm:value &quot;stylesheet&quot; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;you could generate data that looked like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Stylesheet ;
  opm:effect _:transformation ;
  opm:cause &amp;lt;doc.xsl&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;xslt:Stylesheet&lt;/code&gt; is defined as a subclass of &lt;code&gt;opm:Used&lt;/code&gt;.&lt;/p&gt;
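
&lt;p&gt;The schema statements behind this would be short. A sketch, in which the label is an assumption:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# every xslt:Stylesheet edge is also an opm:Used edge
xslt:Stylesheet a rdfs:Class ;
  rdfs:subClassOf opm:Used ;
  rdfs:label &quot;Stylesheet&quot;@en .
&lt;/code&gt;&lt;/pre&gt;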

&lt;p&gt;Inheritance is a basic form of &lt;strong&gt;reasoning&lt;/strong&gt;. In the case of the subclass relationship outlined above, the reasoning is that anything that is a &lt;code&gt;xslt:Stylesheet&lt;/code&gt; is also a &lt;code&gt;opm:Used&lt;/code&gt;, and thus:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Stylesheet .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;implies&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:stylesheetLink a xslt:Used .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Taking the scenario where you&amp;#8217;re doing native linked data publishing &amp;#8212; storing data in a triplestore and then publishing it out from there &amp;#8212; you have two choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can store just the basic data, and let the application retrieving it carry out whatever reasoning is necessary to derive the information they need; this limits the size of the triplestore, but can place a large burden on people using it &amp;#8212; either they have to be very familiar with the exact choices made in modelling the basic data, or they have to construct complex SPARQL queries that take account of the fact that the data might be modelled in many different ways&lt;/li&gt;
&lt;li&gt;you can store not only the basic data but also anything that can be derived from it; this increases the number of triples you have to store, but means that people can query it without having to perform any reasoning themselves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The latter is obviously the more user-friendly approach. (And a triplestore could make it easy by understanding and applying schemas, ontologies and rules as data is loaded in.)&lt;/p&gt;

&lt;p&gt;To take a more complex example, provenance could be modelled in a much more direct way, such as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;doc.html&amp;gt; a opm:Artifact ;
  opm:derivedFrom &amp;lt;doc.xml&amp;gt; ;
  opm:generatedBy [
    xslt:source &amp;lt;doc.xml&amp;gt; ;
    xslt:stylesheet &amp;lt;doc.xsl&amp;gt; ;
    xslt:processor _:processor ;
    xslt:user _:Jeni ;
  ] .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;xslt:source&lt;/code&gt; and &lt;code&gt;xslt:stylesheet&lt;/code&gt; are sub-properties of a property called &lt;code&gt;opm:used&lt;/code&gt;, and &lt;code&gt;xslt:processor&lt;/code&gt; and &lt;code&gt;xslt:user&lt;/code&gt; are sub-properties of &lt;code&gt;opm:controlledBy&lt;/code&gt;. This removes the n-ary properties, which (given the use of inheritance to represent roles) are only actually needed if the model needs to capture the timing of the involvement of particular artifacts or agents within a process, and makes the provenance information much easier to query than before:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ?who
WHERE {
  &amp;lt;doc.html&amp;gt; opm:generatedBy ?transformation .
  ?transformation xslt:user ?who .
}
&lt;/code&gt;&lt;/pre&gt;
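
&lt;p&gt;The property hierarchy that this relies on could be declared along these lines. Note that &lt;code&gt;opm:used&lt;/code&gt; and &lt;code&gt;opm:controlledBy&lt;/code&gt; are the properties proposed here, not terms taken from the existing OPM ontology.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# artifacts that the transformation uses
xslt:source rdfs:subPropertyOf opm:used .
xslt:stylesheet rdfs:subPropertyOf opm:used .

# agents that control the transformation
xslt:processor rdfs:subPropertyOf opm:controlledBy .
xslt:user rdfs:subPropertyOf opm:controlledBy .
&lt;/code&gt;&lt;/pre&gt;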

&lt;p&gt;But what if we also want to support the more complex, n-ary-relation-based models? We would need to assert, somehow, a rule that said that the presence of a &lt;code&gt;opm:controlledBy&lt;/code&gt; relationship from a process to an agent was equivalent to having a &lt;code&gt;opm:WasControlledBy&lt;/code&gt; instance with a &lt;code&gt;opm:cause&lt;/code&gt; pointing to the agent and an &lt;code&gt;opm:effect&lt;/code&gt; pointing to the process. Combine this with &lt;code&gt;xslt:user&lt;/code&gt; being a sub-property of &lt;code&gt;opm:controlledBy&lt;/code&gt; and you have the statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation xslt:user _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;implying:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;_:transformation opm:controlledBy _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which in turn implies:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[] a opm:WasControlledBy ;
  opm:effect _:transformation ;
  opm:cause _:Jeni .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same reasoning could be applied in the opposite direction, of course. Part of the definition of the use of OPM in RDF could be that the presence of an &lt;code&gt;opm:WasControlledBy&lt;/code&gt; with an &lt;code&gt;opm:cause&lt;/code&gt; pointing to an agent and an &lt;code&gt;opm:effect&lt;/code&gt; pointing to a process implies an &lt;code&gt;opm:controlledBy&lt;/code&gt; link from the &lt;code&gt;opm:effect&lt;/code&gt; to the &lt;code&gt;opm:cause&lt;/code&gt;. Whichever form was used in the initial modelling of the data, the same query could be used to query it (accepting some loss of precision along the way, but if you&amp;#8217;re not interested in timing information then why should you suffer the cost of querying through n-ary relations?).&lt;/p&gt;
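
&lt;p&gt;One way to sketch these two rules is as SPARQL &lt;code&gt;CONSTRUCT&lt;/code&gt; queries. This is only an illustration, not part of OPM: it assumes a SPARQL 1.1 processor (for the property path), it assumes the &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; statements are loaded alongside the data, and the &lt;code&gt;opm:&lt;/code&gt; namespace URI is a placeholder. The first query expands the direct form into the n-ary form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX opm:  &amp;lt;http://example.org/opm#&amp;gt;   # placeholder namespace

# For every direct link (including sub-properties such as xslt:user),
# build one opm:WasControlledBy node; _:control is fresh per match.
CONSTRUCT {
  _:control a opm:WasControlledBy ;
    opm:effect ?process ;
    opm:cause ?agent .
}
WHERE {
  ?process ?property ?agent .
  ?property rdfs:subPropertyOf* opm:controlledBy .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and the second collapses the n-ary form back into the direct one:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PREFIX opm: &amp;lt;http://example.org/opm#&amp;gt;   # placeholder namespace

CONSTRUCT {
  ?process opm:controlledBy ?agent .
}
WHERE {
  ?control a opm:WasControlledBy ;
    opm:effect ?process ;
    opm:cause ?agent .
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A triplestore that applied rules as data is loaded could do the equivalent internally; running the queries by hand and loading the results back in is just the most portable way to show the idea.&lt;/p&gt;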

&lt;p&gt;The final feature that I said mappings from existing models to RDF should take advantage of is &lt;strong&gt;named graphs&lt;/strong&gt;. In OPM, the obvious role for named graphs is in supporting the different &lt;em&gt;accounts&lt;/em&gt; of provenance. Separate named graphs could be used to represent separate accounts, referencing the same artifacts, agents and processes where appropriate. Individually, the graphs can remain simple; together, you have the full power of OPM.&lt;/p&gt;
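
&lt;p&gt;As a rough sketch, assuming each account has been loaded into its own named graph, the earlier query can be wrapped in a &lt;code&gt;GRAPH&lt;/code&gt; clause to report which account makes each claim. The namespace URIs here are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PREFIX opm:  &amp;lt;http://example.org/opm#&amp;gt;    # placeholder namespace
PREFIX xslt: &amp;lt;http://example.org/xslt#&amp;gt;   # placeholder namespace

# ?account is bound to the name of the graph holding the matching statements
SELECT ?account ?who
WHERE {
  GRAPH ?account {
    &amp;lt;doc.html&amp;gt; opm:generatedBy ?transformation .
    ?transformation xslt:user ?who .
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Dropping the &lt;code&gt;GRAPH&lt;/code&gt; clause and querying a single account as the default graph gives back the simple query from before.&lt;/p&gt;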

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Modelling is a complex design activity, and you&amp;#8217;re best off avoiding doing it if you can. That means reusing conceptual models that have been built up for a domain as much as possible and reusing existing vocabularies wherever you can. But you can&amp;#8217;t and shouldn&amp;#8217;t try to avoid doing design when mapping from a conceptual model to a particular modelling paradigm such as a relational, object-oriented, XML or RDF model.&lt;/p&gt;

&lt;p&gt;If you&amp;#8217;re mapping to RDF, remember to take advantage of what it&amp;#8217;s good at, such as web-scale addressing and extensibility, and always bear in mind how easy or difficult your data will be to query. There is no point publishing linked data if it is unusable.&lt;/p&gt;
</description>
 <comments>http://www.jenitennison.com/blog/node/142#comments</comments>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/46">linked data</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/57">modelling</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/58">provenance</category>
 <category domain="http://www.jenitennison.com/blog/taxonomy/term/31">rdf</category>
 <pubDate>Sat, 13 Mar 2010 20:35:46 +0000</pubDate>
 <dc:creator>Jeni</dc:creator>
 <guid isPermaLink="false">142 at http://www.jenitennison.com/blog</guid>
</item>
</channel>
</rss>

