To me, the biggest deficiency in RDF is how hard it is to associate metadata with statements. I’ve talked before about the requirement in the genealogical application I’m toying with to provide metadata such as who made a statement, when, based on which source, the certainty in it and so on. But there’s one type of metadata that I think is required in practically every domain: the temporal scoping of statements.
In the London Gazette RDFa work that we’ve been doing for OPSI at TSO, there are frequently notices that contain a statement that is not true at the time the notice is published, but will become true on a particular date in the future. Take this example from a notice published on 10th February 2009:
Jikoa Chiwete Monu, ceased to be a Partner in John & Co Solicitors, of Suites G, H & I, 1st Floor, 135-143 Stockwell Road, London SW9 9TN, with effect from 30 January 2009.
We know that Jikoa Chiwete Monu was a partner in John & Co Solicitors up to 30 January 2009, and was not a partner in John & Co Solicitors after 30 January 2009. We know that on 10 February 2009, the address of John & Co Solicitors was the one given in the notice, but we don’t know that was their address a year ago.
Notices like this are published every day in the London Gazette. The assertions in the notices reflect the state of the world on the day the notice is published, or (as in this case) at some other known date. Also, the date on which a notice is published is important: in many cases it has legal significance. But it would be wrong to believe that the statements in the notices continue to be true indefinitely, or that they were true before the date of publication.
There seem to be two acceptable ways of handling the problem, and one unacceptable way. The unacceptable way is to give a reified triple and hang metadata from that reified object. Something like:
:triple1 a rdf:Statement ;
  rdf:subject :JikoaChiweteMonu ;
  rdf:predicate partnerships:isMemberOf ;
  rdf:object :JohnAndCoSolicitors ;
  dc:temporal [
    g:endDate "2009-01-30"^^xsd:date
  ] .

:triple2 a rdf:Statement ;
  rdf:subject :JohnAndCoSolicitors ;
  rdf:predicate organisation:hasAddress ;
  rdf:object [
    a vcard:Address ;
    vcard:street-address "135-143 Stockwell Road" ;
    vcard:locality "London" ;
    vcard:postal-code "SW9 9TN"
  ] ;
  dc:temporal [
    g:includesDate "2009-02-10"^^xsd:date
  ] .
The reason that this is unacceptable is that reified statements aren’t incorporated into triplestores in the same way as normal statements, so you can’t query as naturally on these statements as you can on unreified statements. It’s even more of a problem when some of the triples are reified and some aren’t.
On to the acceptable ways of representing the temporal scope of the statements. One is to create objects that represent things like ‘membership’ and ‘occupation’. Then you can do things like:
:membership a partnerships:Membership ;
  partnerships:hasMember :JikoaChiweteMonu ;
  partnerships:hasPartnership :JohnAndCoSolicitors ;
  dc:temporal [
    g:endDate "2009-01-30"^^xsd:date
  ] .

:occupation a organisation:Occupation ;
  organisation:isOccupiedBy :JohnAndCoSolicitors ;
  organisation:hasAddress [
    a vcard:Address ;
    vcard:street-address "135-143 Stockwell Road" ;
    vcard:locality "London" ;
    vcard:postal-code "SW9 9TN"
  ] ;
  dc:temporal [
    g:includesDate "2009-02-10"^^xsd:date
  ] .
The other approach is to use named graphs to provide metadata about the statements. This next example uses TriG syntax, in which statements can be clustered into named graphs:
:G1 {
  :JikoaChiweteMonu partnerships:isMemberOf :JohnAndCoSolicitors .
}

:G2 {
  :JohnAndCoSolicitors organisation:hasAddress [
    a vcard:Address ;
    vcard:street-address "135-143 Stockwell Road" ;
    vcard:locality "London" ;
    vcard:postal-code "SW9 9TN"
  ] .
}

{
  :G1 dc:temporal [
    g:endDate "2009-01-30"^^xsd:date
  ] .
  :G2 dc:temporal [
    g:includesDate "2009-02-10"^^xsd:date
  ] .
}
I prefer the latter approach because it doesn’t require any forward planning: you don’t have to think beforehand about the ways in which things might change over time in order to make statements about the temporal scope of assertions. If you look at existing ontologies, such as FOAF, they don’t really address the fact that people might change their name or (even more likely) place of work over time.
If you think about it long term (which we have to given that the Gazette goes back 350 years or thereabouts), even the details of an address can change over time, as streets can change name, be reclassified as boundaries change to belong to different localities and regions, and can change their postcode. Introducing additional resources for all these temporally scoped statements would cause an amazing amount of bloat and lots of indirection.
The named graph method also, in my opinion, enables more natural querying of the dataset. You can do something like:
SELECT ?locality
WHERE {
  :JohnAndCoSolicitors organisation:hasAddress ?address .
  ?address vcard:locality ?locality .
}
rather than:
SELECT ?locality
WHERE {
  ?occupation organisation:isOccupiedBy :JohnAndCoSolicitors ;
    organisation:hasAddress ?address .
  ?address vcard:locality ?locality .
}
Named graphs can be queried with SPARQL. So with the named graph method, it’s possible to look for partners of :JohnAndCoSolicitors prior to a particular date:
SELECT ?person
WHERE {
  GRAPH ?graph {
    ?person partnerships:isMemberOf :JohnAndCoSolicitors .
  }
  OPTIONAL {
    ?graph dc:temporal ?scope .
    ?scope g:endDate ?date .
  }
  # Keep memberships with no recorded end date, or one after the cut-off.
  FILTER (!bound(?date) || ?date > "2009-01-01"^^xsd:date)
}
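To make the intent of that lookup concrete, here is a minimal sketch in plain Python of the same idea: statements carry a graph name, and the graph's metadata decides whether a statement still held on a given date. It is not how any particular triplestore works. The names AnotherPartner, FormerPartner, G3 and G4 are invented for illustration, and I'm assuming that a graph with no recorded end date is still in force.

```python
from datetime import date

# Quads: (subject, predicate, object, graph name).
# G1 comes from the notice above; G3 and G4 are made up for this example.
QUADS = [
    ("JikoaChiweteMonu", "isMemberOf", "JohnAndCoSolicitors", "G1"),
    ("AnotherPartner", "isMemberOf", "JohnAndCoSolicitors", "G3"),
    ("FormerPartner", "isMemberOf", "JohnAndCoSolicitors", "G4"),
]

# Metadata about the graphs: graph name -> end date of its temporal scope.
# Assumption: a graph missing from this table has no known end date.
END_DATES = {
    "G1": date(2009, 1, 30),
    "G4": date(2008, 6, 30),
}


def partners_on(firm, when):
    """Return members of `firm` whose membership had not ended by `when`."""
    result = []
    for subject, predicate, obj, graph in QUADS:
        if predicate != "isMemberOf" or obj != firm:
            continue
        end = END_DATES.get(graph)
        # Keep the statement if its graph has no end date or ends after `when`.
        if end is None or end > when:
            result.append(subject)
    return result


print(partners_on("JohnAndCoSolicitors", date(2009, 1, 1)))
# ['JikoaChiweteMonu', 'AnotherPartner']
print(partners_on("JohnAndCoSolicitors", date(2009, 2, 10)))
# ['AnotherPartner']
```

The point of the sketch is that the membership statements themselves stay as ordinary triples; only the graph name links them to their temporal scope.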
But the standard way of naming a graph seems to be to use the location from which the graph was retrieved. This makes a lot of sense as a default: you can infer the provenance, and that the statements on the page were true on the day the RDF representation was retrieved. But it doesn’t work in cases like that above where there are several statements on the page with different provenance or certainty or temporal scope (or any other statement metadata you might care to mention).
We’re serving the London Gazette notices as RDFa, so the challenge is how to incorporate information about the graph in which a triple appears within RDFa.
A generalised way to do it would be to introduce a new graph attribute, defaulting to the base URI of the page. Then we could do:
<meta about="[:G2]" property="g:onDate" content="2009-02-10" datatype="xsd:date" />
...
<p graph="[:G1]" about="[:JikoaChiweteMonu]">
  Jikoa Chiwete Monu, ceased to be a Partner in
  <span rel="partnerships:isMemberOf" resource="[:JohnAndCoSolicitors]">
    John &amp; Co Solicitors, of
    <span graph="[:G2]" rel="organisation:hasAddress">
      <span typeof="vcard:Address">
        Suites G, H &amp; I, 1st Floor,
        <span property="vcard:street-address">135-143 Stockwell Road</span>,
        <span property="vcard:locality">London</span>
        <span property="vcard:postal-code">SW9 9TN</span>
      </span>
    </span>
  </span>, with effect from
  <span about="[:G1]" property="g:endDate"
        content="2009-01-30" datatype="xsd:date">30 January 2009</span>.
</p>
Another possibility would be to use the id attribute to label parts of the page that hold statements, and use fragment identifiers in about attributes to provide metadata about those particular statements:
<meta about="#G2" property="g:onDate" content="2009-02-10" datatype="xsd:date" />
...
<p id="G1" about="[:JikoaChiweteMonu]">
  Jikoa Chiwete Monu, ceased to be a Partner in
  <span rel="partnerships:isMemberOf" resource="[:JohnAndCoSolicitors]">
    John &amp; Co Solicitors, of
    <span id="G2" rel="organisation:hasAddress">
      <span typeof="vcard:Address">
        Suites G, H &amp; I, 1st Floor,
        <span property="vcard:street-address">135-143 Stockwell Road</span>,
        <span property="vcard:locality">London</span>
        <span property="vcard:postal-code">SW9 9TN</span>
      </span>
    </span>
  </span>, with effect from
  <span about="#G1" property="g:endDate"
        content="2009-01-30" datatype="xsd:date">30 January 2009</span>.
</p>
The disadvantage of this approach is that you can’t have statements in two different parts of the page that belong to the same graph. On the other hand, it’s valid!
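As a rough sketch of what a consumer would have to do with the id-based markup, the graph name for a labelled element can be derived from the URI of the page plus the id, which is also what a relative reference such as about="#G1" resolves to. This uses only Python's standard urllib.parse; the notice URI below is a placeholder, not a real Gazette address, and graph_name is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin


def graph_name(document_uri, element_id):
    """Build a graph name from the page URI and an element's id."""
    # Drop any fragment already on the document URI before adding ours.
    base, _fragment = urldefrag(document_uri)
    return f"{base}#{element_id}"


# Placeholder URI for the page carrying the notice.
page = "http://example.org/notices/123"

print(graph_name(page, "G1"))
# http://example.org/notices/123#G1

# The relative reference used in about="#G1" resolves to the same URI,
# so the metadata ends up attached to the graph named after the element.
print(urljoin(page, "#G1") == graph_name(page, "G1"))
# True
```

Because the name always starts with the page's own URI, a page can only ever name graphs within itself, which is also why two separate parts of the page can't share one graph under this scheme.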
I’m really interested in other approaches that people have used to address the requirement of associating metadata with triples, particularly using RDFa. I’m also interested to know if anyone has existing vocabularies for periods of time with known start/end dates and included dates.
Comments
Re: Temporal Scope for RDF Triples
To quote:
I believe this is a problem with the triple stores and not necessarily RDF. Although an RDF document may have mixed reified triples and normal triples, the store can reify all non-reified triples and insert a normal triple for every reified triple. After this simple parsing, what is the issue? I do not know about SPARQL making it difficult to deal with reification, but it seems that RDF is expressive enough to deal with this.
Nonetheless, reification is pretty annoying with RDF.
Re: Temporal Scope for RDF Triples
We are currently working on a project trying to provide geo data for the University of Oxford called OxPoints. After a long debate on what data model to choose we finally decided to go for RDF. Since one of the requirements is to store historical data, we thought about how to introduce a time dimension to RDF and came to similar conclusions (see http://oxforderewhon.wordpress.com/2008/11/28/rdf-and-the-time-dimension-part-1/). We have started to implement the first prototype and so far everything seems to work out. I hope to be able to give an update on the current status of OxPoints soon. If you find the time, I would really be interested in your comments.
Re: Temporal Scope for RDF Triples
I completely agree with Richard here - best to represent it purely in RDF. Hence, I have a preference for your second example, with:
For completeness, another way of representing that information (similar to the Named Graph approach) would be in N3 (a superset of RDF), and using a graph literal.
Re: Temporal Scope for RDF Triples
Oh, almost forgot: Toby Inkster had basically the same idea about graphs in RDFa and has written it up already: Named Graphs in RDFa (“RDFa Quads”)
Re: Temporal Scope for RDF Triples
While the one-named-graph-per-doc seems a lot more elegant (because you just name the single doc/graph with a HTTP URI and lots of things work), I think the multiple-… is probably necessary. Compare blog front pages or RSS/Atom feeds - multiple nameable (permalinkable) chunks within a single doc. Big drawback with HTML is what happens when you pull the chunks out from the doc wrapper, losing the html:head (Atom allows individual entries to exist as first-class named entities too, smart move).
Dunno how well Toby’s spec handles this aspect, but on a skim it generally looked promising.
Re: Temporal Scope for RDF Triples
I'm a bit hesitant about serializing Named Graphs in everyday Web documents. I heavily use Named Graphs to track provenance, by loading each document into its own Named Graph internally. If I would allow documents to contain several Named Graphs, then essentially someone could fool my store by claiming that this is the graph representing some other document. Does this mean I need to move to Quintuples for my internal data model?
That's why I prefer solutions that remain within the standard RDF data model, and I advocate the solution of creating classes representing the time-scoped relationship in your vocabulary.
Jiri Prochazka and I have written up a little draft RDFS extension that might help with mapping such relationship classes into the pure triple form: Property Reification Vocabulary.
But anyway, this is an important and interesting topic and it's good to see this well thought out proposal.
Re: Temporal Scope for RDF Triples
Thanks Richard,
I get the argument for naming graphs based on where they’re retrieved from as a means of establishing provenance. I thought about responding that the source of a particular graph could be recorded against the graph, but of course if you have multiple sources of the same graph, you can’t distinguish between the triples from different sources.
So instead, I take this as an argument for limiting the method of naming graphs to fragments within the page (ie use something like the id attribute or something with the same lexical space).
(By the way, do you honour the <base> element in RDFa (or xml:base in RDF/XML) when you name a graph based on a retrieved document, or use the retrieval URI? I ask because obviously the <base> or xml:base could point to any URI, raising the same issues, but I would have thought that they’d be honoured because they usually are when working out the base URI of a document.)
The Property Reification Vocabulary is interesting. To continue the example I used in the post, I think it would mean using:
and then in the ontology having something like:
(I’ve changed the reified property name from organisation:hasAddress to organisation:hasOffice to prevent something horribly recursive from happening.)
The only thing I wonder is whether it would ever actually be implemented. Statement reification doesn’t seem to be supported in triple stores despite being baked right into the specs. Property reification requires the same kind of dual-view of any triple that uses a reified property as is required in statement reification, and, as far as I can tell, an ontology like that above requires OWL Full (because the reified property is being treated as a class as well as a property).
Conversely (in favour of named graphs), triple stores are already actually quad stores, and SPARQL can be used over them without the requirement for any level of reasoning capability.
I’m not particularly advocating using named graphs, by the way, just trying to work out the best practical way of addressing this real-world (and pressing!) problem.
Re: Temporal Scope for RDF Triples
Instead of using the URI of the document you retrieved the data from you can do the following when loading external data into your store (given that it's a HTTP URI!):
Use the HTTP vocabulary to record the details of the retrieval process. Create internal URIs for the http:Request and http:Response resources you're describing. Put all retrieved triples into your store using the URI of the http:Response resource as the named graph. The document URI would be available through http:requestURI. You can make the description as detailed as you want, from just storing basic facts about the retrieval process to storing all HTTP headers and the whole message body. That way you're actually building a HTTP cache. This also means that you can load one and the same document more than one time into your store and keep the data (which might have changed) separate. This seems more natural to me since the retrieved representation of the resource doesn't depend on the resource's URI alone but also on the request parameters and the time of retrieval.
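A minimal sketch of that loading scheme, assuming an in-memory store: each retrieval gets its own internal identifier, the retrieved triples are filed under that identifier as their graph name, and the document URI is kept as metadata about the retrieval. The class, the urn:example: identifiers and the dictionary keys are hypothetical; only the requestURI name echoes the property mentioned above, and nothing here uses the real HTTP vocabulary terms.

```python
import itertools
from datetime import datetime, timezone


class RetrievalStore:
    """Toy quad store that names graphs after retrievals, not documents."""

    def __init__(self, prefix="urn:example:response:"):
        self._prefix = prefix
        self._ids = itertools.count(1)
        self.quads = []        # (subject, predicate, object, graph)
        self.retrievals = {}   # graph name -> facts about the retrieval

    def load(self, document_uri, triples, retrieved_at):
        # Mint a fresh internal name for this response.
        graph = f"{self._prefix}{next(self._ids)}"
        self.retrievals[graph] = {
            "requestURI": document_uri,
            "retrieved": retrieved_at,
        }
        self.quads.extend((s, p, o, graph) for s, p, o in triples)
        return graph


store = RetrievalStore()
doc = "http://example.org/notices/123"  # placeholder document URI

first = store.load(doc, [(":a", ":p", ":b")],
                   datetime(2009, 2, 10, tzinfo=timezone.utc))
second = store.load(doc, [(":a", ":p", ":c")],
                    datetime(2009, 3, 1, tzinfo=timezone.utc))

# The same document loaded twice lands in two separate graphs.
print(first != second)  # True
```

Keeping the graph name independent of the document URI is what lets two versions of the same document sit side by side in the store.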
Of course this doesn't solve the problem of what to do with the named graphs used in the response. As Richard said, you'd have to extend your store to be a quintuple store. And then you load data from someone who also uses quintuples and you need to add another dimension to your store (and to the serialisation syntax). Not workable. But I don't think the solution of creating classes for things like membership works either because then you can't re-use widely deployed vocabularies - and in the end, wouldn't you end up creating classes for everything?
Re: Temporal Scope for RDF Triples
Jeni, it’s true that Named Graphs are already in SPARQL stores, but support for the different serialisation syntaxes is still very patchy. So they work well as the de-facto data model of any modern SPARQL store, but so far aren’t really used for sending stuff over the wire.
I believe that property reification can be implemented fairly easily in any environment that has an RDF rules engine, the definition of a reified property translates directly into rules. I also think that this proposal fits well into the RDF technology jigsaw because it addresses a longstanding problem in mapping between vocabularies.
Allowing only local fragments as graph names would address my concerns about messing up provenance, I think. You could allow only graph="foo", which would translate to documentURI#foo (rather than baseURI#foo). This is quite a nifty idea actually. Since we now don’t need to encode arbitrary URIs/CURIEs but only a fragmentID, one could also steal a bit of the namespace of some existing HTML attribute, e.g. class="x-graph-foo" which would again translate to documentURI#foo. This way, to standard RDFa parsers the document would look completely normal.
Re: Temporal Scope for RDF Triples
I never did understand the objections to OWL Full anyway ;)
I was wondering about piggy-backing on class (ala eRDF) too. I thought just graph-foo might do it as I don’t think an x prefix has any particular significance within class? But using id also seems natural. I might try it out in rdfQuery and see what works.
Re: Temporal Scope for RDF Triples
Or else it’s time to bite the bullet and switch to topic maps, which (being invented by librarians rather than logicians) allow any statement to be scoped by a topic (= resource), such that the statement is only true within the scope of that topic.
Of course, standard TM serialization is almost as messy as standard RDF serialization; someone desperately needs to invent a clean serialization of TM in terms of the TAO (topic/association/occurrence) model.
Re: Temporal Scope for RDF Triples
Maybe this could be interesting as well: http://people.kmi.open.ac.uk/carlos/resources/ontologies/time-ontology.lisp Some colleagues from KMi have developed this Time Ontology, which brings support for reasoning about time.
Re: Temporal Scope for RDF Triples
Jeni, I suggest you have a look at Martin Fowler’s discussion of things that change in time (http://martinfowler.com/ap2/timeNarrative.html), if you haven’t already. For example, temporal information can require more than just end dates, depending on what you need to do with that information. Cheers, Tony.
Re: Temporal Scope for RDF Triples
In a topic maps project I’ve used a set of properties similar to that used in HEML (www.heml.org) - earliest start, latest start, earliest end, latest end. Which allows you to be fuzzy about the time period. Also HEML allows you to do relative positioning of events (e.g. Event A ends before start of Event B, or Event C occurs during Event D).
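A small sketch of how those four bounds could be held and queried, assuming plain calendar dates. The field names follow the four properties listed above; the two methods are one possible reading of "fuzzy", not HEML's own API, and the example dates other than 30 January 2009 are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FuzzyPeriod:
    earliest_start: date
    latest_start: date
    earliest_end: date
    latest_end: date

    def possibly_includes(self, day):
        """True if `day` falls inside the widest possible extent."""
        return self.earliest_start <= day <= self.latest_end

    def definitely_includes(self, day):
        """True if `day` falls inside the extent that is certain."""
        return self.latest_start <= day <= self.earliest_end


# A membership known to have started some time in 2005 (invented)
# and to have ended exactly on 30 January 2009.
membership = FuzzyPeriod(
    earliest_start=date(2005, 1, 1),
    latest_start=date(2005, 12, 31),
    earliest_end=date(2009, 1, 30),
    latest_end=date(2009, 1, 30),
)

print(membership.definitely_includes(date(2007, 6, 1)))  # True
print(membership.definitely_includes(date(2005, 6, 1)))  # False
print(membership.possibly_includes(date(2005, 6, 1)))    # True
```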
Also, if you are going back 350 years you have the problem of Gregorian vs Julian dates to worry about. Using a Julian Day Number instead of a calendar date makes you calendar-independent.
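For illustration, the usual integer formulas for converting a calendar date to a Julian Day Number look like this in Python. The function names are made up for this sketch, and it assumes the caller already knows which calendar a given date was written in, which is the hard part with historical sources.

```python
def _shifted(year, month):
    """Shift the year to start in March so the leap day falls last."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_to_jdn(year, month, day):
    """Julian Day Number for a date in the Gregorian calendar."""
    y, m = _shifted(year, month)
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year, month, day):
    """Julian Day Number for a date in the Julian calendar."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# 1 January 2000 (Gregorian) is a commonly quoted reference point.
print(gregorian_to_jdn(2000, 1, 1))  # 2451545

# Britain went from 2 September 1752 (Julian) straight to
# 14 September 1752 (Gregorian), so the day after the 2nd in the old
# calendar is the same day as the 14th in the new one.
print(julian_to_jdn(1752, 9, 3) == gregorian_to_jdn(1752, 9, 14))  # True
```

Storing the day number alongside the date as printed keeps comparisons simple while preserving what the source actually said.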
Re: Temporal Scope for RDF Triples
Anyone having ideas/opinions on using OWL2 Punning for this?
See use case 15, http://www.w3.org/TR/2008/WD-owl2-new-features-20081202/#F12:_Punning