When we encourage people to put their data on the web as linked data, the biggest question is “How?”. There are so many “How?” questions to answer:
and, of course:
One of the biggest selling points of linked data is that it’s supposed to facilitate web-scale distributed publication of data. Just as with the human web, anyone can publish data at their local site without having to go through any kind of central authority.
Just as with the human web, convergence on particular sets of URIs for particular kinds of things can happen in an evolutionary way: in a blog post I might point to Amazon when I want to talk about a particular book, Wikipedia to define the concepts I mention, people’s blogs or twitter streams when I mention them.
And with everyone using the same terms to talk about the same things, there’s the prospect of being able to easily pull together information from completely different sources to find connections and patterns that we’d never have found otherwise.
What’s been very unclear to me is how this distributed publication of data can be married with the use of SPARQL for querying. After all, SPARQL doesn’t (in its present form) support federated search, so to use SPARQL over all this distributed linked data, it sounds like you really need a central triplestore that contains everything you might want to query.
This post is an attempt to explore this tension, between distributed publication and centralised query, and to try to find a pattern that we might use within the UK government (and potentially more widely, of course) to publish and expose linked data in a queryable way. It’s a bit sketchy, and I’d welcome comments.
As we encourage linked data adoption within the UK public sector, something we run into again and again is that (unsurprisingly) particular domain areas have pre-existing standard ways of thinking about the data that they care about. There are existing models, often with multiple serialisations, such as in XML and a text-based form, that are supported by existing tool chains.
In contrast, if there is existing RDF in that domain area, it’s usually been designed by people who are more interested in the RDF than in the domain area, and is thus generally more focused on the goals of the typical casual data re-user rather than the professionals in the area.
As you probably know, I’ve been working quite a lot recently on the UK government’s use of linked data, and in particular on providing guidance for people who want to publish their data as linked data. One of the things that we need to provide guidance about is how to publish linked data that changes over time. I’ve touched on this topic before but things have progressed now to the stage where we have to make some real, practical, recommendations.
data.gov.uk was finally launched to the public last week (still in beta, but now a more public beta than the beta that it’s been in for the last few months). It’s a great step forward, and everyone involved should be proud of both the amount of data that’s been made available and the website itself, which (unlike a lot of UK government IT) was developed rapidly by a small team based on open source software (and at low cost).
This is a first step on a long road.
This is the fifth part in this series about creating linked data. I’ve talked previously about analysis and modelling, defining URIs, defining concept schemes and defining a vocabulary. In this instalment I’ll talk about the finishing touches that can make linked data easier to browse, query, locate and trust.
Note that we don’t have to do any of these things; they’re not part of the core data. We shouldn’t beat ourselves up if we don’t have time to do it right now, because we can always add them later, and it might be that you just don’t agree that they should be done. But many of them don’t take a lot of time and can enhance the user’s experience of the data.
This is the fourth instalment in a series about turning an existing dataset into some linked data. I’ve previously talked about analysis and modelling, defining URIs and defining concept schemes. In this instalment, we’ll look at developing a schema in which we define the classes, properties and datatypes that we want to use in the RDF that describes the things in our dataset.
This is the third instalment in a series that I’m writing about turning data into linked data. I’m using traffic count data as the example, since that’s a dataset that I’m currently working on. In the last two instalments, I talked about analysing and modelling the data and about designing URIs for the things in that model.
Within the model, there are three sets of things that are concepts:
This is the second instalment in a series of posts about how to create linked data from existing data sets, using traffic count data as an example. In the last instalment, I talked about analysing and modelling data. This instalment discusses the creation of URIs for the various things that have been identified within the model.
This part of the process is the same as what you’d do if you were simply creating a RESTful API to a website. The principal is that everything has a URI, and if you resolve that URI you get information about the thing.
In the Linked Data world, we talk a lot about having URIs that are identifiers for things, and making them HTTP URIs so that they can be dereferenced and people can find more information about those things.
This raises the questions of “what information should you publish?” Let’s make this concrete by using a real example: UK Legislation, which TSO is publishing for OPSI as Linked Data.