In the Linked Data world, we talk a lot about having URIs that are identifiers for things, and making them HTTP URIs so that they can be dereferenced and people can find more information about those things.
This raises the question of what information you should publish. Let’s make this concrete with a real example: UK Legislation, which TSO is publishing for OPSI as Linked Data.
UK Legislation now has a set of URIs that are explicitly intended to be used as unique identifiers for items of legislation and for the parts, sections, subsections and so on within them. If you request one of these URIs and ask for RDF/XML, you will get some information about that bit of legislation, such as:
So we provide some basic information and the links we know about, i.e. those within UK Legislation.
It turns out that lots of things aside from UK Legislation reference legislation, and that when you publish information about them it’s helpful to be able to point to the relevant legislation. For example:
These are all inward pointers. As we publish information about UK Legislation, we won’t know about all of the links that point to it. But people who access information about UK Legislation might well want to know about those links. Wouldn’t it be useful to know, given an item of legislation, what it makes illegal, what it compels local authorities to do, which administrative areas it defines and which notices it has caused to be published?
We were discussing the same issue the other day in respect of spatial objects. The Ordnance Survey, or other organisations peddling spatial data, may define spatial objects, but other people define the things that those spatial objects represent, such as schools, roads, parks and so on. It’s obviously useful to go from a school to the spatial objects that represent its buildings, but it would also be useful to go from a spatial object that is a school building to the school.
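To make the school example concrete, here is a minimal sketch of inverting outward links into an index of inward links, so that you can go from a spatial object back to the school. It assumes links are held as simple (subject, predicate, object) tuples; every URI, including the `hasBuilding` and `hasBoundary` properties, is a hypothetical placeholder rather than a real vocabulary.

```python
from collections import defaultdict

# Hypothetical outward links published by various parties, as
# (subject, predicate, object) triples. All URIs are illustrative placeholders.
TRIPLES = [
    ("http://example.org/school/123", "http://example.org/def/hasBuilding",
     "http://example.net/spatial/9001"),
    ("http://example.org/school/123", "http://example.org/def/hasBuilding",
     "http://example.net/spatial/9002"),
    ("http://example.org/park/7", "http://example.org/def/hasBoundary",
     "http://example.net/spatial/9003"),
]


def build_inward_index(triples):
    """Map each object URI to the (subject, predicate) pairs that point at it."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[o].append((s, p))
    return index


def inward_links(index, uri):
    """Return the known inward links for uri (an empty list if none are known).

    Uses .get so that looking up an unknown URI does not add it to the index.
    """
    return list(index.get(uri, []))


index = build_inward_index(TRIPLES)
# From a spatial object that is a school building back to the school:
print(inward_links(index, "http://example.net/spatial/9001"))
```

The index only ever covers the links the publisher happens to know about, which is exactly the limitation raised above.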
So what should we, as publishers, do about the inward links (that we know about)? When we publish information about something should we also try to publish information about the things that (we know) reference that thing? I think the answer’s “yes,” at the very least in any human-readable access we give to the information. And from that come two further thoughts:
If you are publishing data with outward links, it would be a good idea to provide feeds or other mechanisms that enable people to pull in basic information about the things you’re publishing that link to something they’re publishing. SPARQL queries would do, but something less general-purpose and more approachable would be better: I’m thinking of a URL like http://example.org/links?url=http://example.net/linked/resource.
Information from another source is going to have different provenance and trust characteristics from the primary information you publish. That needs to be clearly indicated somehow, which sounds to me like a requirement for named graphs.
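Putting those two thoughts together, a minimal sketch of the hypothetical `links?url=` lookup might return every known statement that points at the requested URI as N-Quads, where the fourth term names the graph the statement came from. The in-memory quad store, the graph URIs and the `issuedUnder` property are assumptions for illustration only; `dcterms:isPartOf` is a real Dublin Core term.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical quad store: (subject, predicate, object, graph).
# The graph URI records where each statement came from, so links harvested
# from third parties stay distinguishable from the publisher's own data.
QUADS = [
    ("http://example.com/notice/42",
     "http://example.com/def/issuedUnder",       # hypothetical property
     "http://example.net/legislation/act/1",
     "http://example.com/graph/notices-feed"),   # third-party provenance
    ("http://example.net/legislation/act/1/section/1",
     "http://purl.org/dc/terms/isPartOf",
     "http://example.net/legislation/act/1",
     "http://example.net/graph/primary"),        # publisher's own data
    ("http://example.net/legislation/act/1",
     "http://purl.org/dc/terms/isPartOf",
     "http://example.net/legislation/collection",
     "http://example.net/graph/primary"),
]


def lookup_links(request_url, quads):
    """Answer a hypothetical /links?url=... request.

    Returns N-Quads lines for every statement whose object is the requested
    URI, keeping each statement's graph name as its provenance.
    """
    params = parse_qs(urlparse(request_url).query)
    if "url" not in params:
        raise ValueError("missing 'url' parameter")
    target = params["url"][0]
    return [f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads if o == target]


for line in lookup_links(
        "http://example.org/links?url=http://example.net/legislation/act/1", QUADS):
    print(line)
```

Because the requested URI is itself a URL, callers should percent-encode it (for example with `urllib.parse.urlencode({"url": uri})`) so that any `&` or `#` inside it survives; `parse_qs` decodes it again on the server side.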
Comments
Re: Publishing Information About Inward Links
Hi Jeni,
I think that one of the interesting things indexing services like Sindice will do, as they build up maps of mentions of things by URI, is to build up those inward links. e.g. in looking for (void:)Datasets I've used Sindice to search for [?thing rdf:type void:Dataset]. Something similar ought (in the fullness of time) to work for finding things that mention a particular piece of legislation (at least at the leaf level).
Stuart
--
http://sindice.com/search?q=rdf%3Atype&qv=http%3A%2F%2Frdfs.org%2Fns%2Fv...
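As a rough illustration of the kind of lookup described in this comment, the sketch below matches triple patterns in which `None` plays the role of a variable such as `?thing`. It runs over a small local list of crawled triples rather than calling Sindice, whose API it makes no attempt to model; the crawled data and the legislation URI are hypothetical, while the `rdf:type` and `void:Dataset` URIs are the real ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"

# Hypothetical triples an indexing service might have crawled.
CRAWLED = [
    ("http://example.org/dataset/a", RDF_TYPE, VOID_DATASET),
    ("http://example.org/notice/42", "http://example.org/def/issuedUnder",
     "http://example.net/legislation/act/1"),
    ("http://example.org/dataset/a", "http://purl.org/dc/terms/subject",
     "http://example.net/legislation/act/1"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples matching a pattern; None acts as a variable (wildcard)."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


# [?thing rdf:type void:Dataset]
print([s for s, _, _ in match(CRAWLED, p=RDF_TYPE, o=VOID_DATASET)])
# Everything that mentions a particular (leaf-level) item of legislation.
print([s for s, _, _ in match(CRAWLED, o="http://example.net/legislation/act/1")])
```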
Content negotiation for getting meta data.
"...a set of URIs that are explicitly intended to be used as unique identifiers..."
"If you request one of these URIs, ..., you will get some information about that bit of legislation..."
Sorry, I realise this is, at best, very tangential to your main point, but I really think a request on a URI should either return a representation of the resource or redirect to information about the resource. It shouldn't return a representation of something else (e.g., information about the original resource). I think it's an abuse of the content negotiation mechanism to use it to distinguish between a request for a resource and a request for metadata about that resource.
On the other hand, I don't have a better suggestion. I first asked the TAG about this a while ago:
http://lists.w3.org/Archives/Public/www-tag/2007Aug/0075.html
and they seem to have recently decided it's something worth talking about:
http://www.w3.org/2001/tag/group/track/issues/62
Still, I'm not going to hold my breath.
Re: Content negotiation for getting meta data.
What happens with the legislation API is that if you request an identifier URI for some item of legislation (which is a non-information resource) you will get a 303 response that redirects you to an information resource (representing the current information about that legislation). Requests to that information resource will be given a response appropriate to the requested representation, e.g. RDF/XML or HTML. I glossed over this because, as you say, it’s tangential to the point of the post :)
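The 303 pattern described above can be sketched as a plain function that maps a request path and `Accept` header to a status code and response headers. The `/id/` and `/doc/` prefixes are hypothetical and are not the actual UK Legislation URI scheme; the `Accept` handling also ignores q-values for brevity, which a real server should not do.

```python
# Hypothetical URI layout: identifiers under /id/, documents under /doc/.
ID_PREFIX = "/id/"
DOC_PREFIX = "/doc/"

# Media types the information resource can be served as.
SUPPORTED = ("application/rdf+xml", "text/html")


def preferred_type(accept):
    """Pick the first supported media type listed in a simple Accept header.

    Ignores q-values for brevity and falls back to HTML.
    """
    for part in accept.split(","):
        media = part.split(";")[0].strip()
        if media in SUPPORTED:
            return media
    return "text/html"


def respond(path, accept):
    """Return (status, headers) for a request to path."""
    if path.startswith(ID_PREFIX):
        # Non-information resource: 303 See Other to the document about it.
        return 303, {"Location": DOC_PREFIX + path[len(ID_PREFIX):]}
    if path.startswith(DOC_PREFIX):
        # Information resource: content negotiation picks the representation.
        return 200, {"Content-Type": preferred_type(accept), "Vary": "Accept"}
    return 404, {}


print(respond("/id/act/2000/1", "application/rdf+xml"))
print(respond("/doc/act/2000/1", "application/rdf+xml"))
```

The `Vary: Accept` header tells caches that the same document URI returns different bodies depending on the requested format.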
But I am slightly uncomfortable about the current state of affairs in that requests for RDF/XML to the information resource URI will provide metadata about it, whereas requests for HTML to that same URI will provide content (and some metadata). I’m still debating (largely with myself) about whether the 303 redirection should point at different resources for metadata vs content.