Metadata about RDF triples: reification and Linked Data

Those of you who have been following this blog will know that I’ve been thinking recently about how to handle uncertainty related to RDF triples (specifically in the context of a genealogical web app). Certainty isn’t the only kind of metadata-about-triples that you’d want to keep in an app like this. We need to know things like:

  • who made the statement
  • when the statement was made
  • what evidence led to the statement being made
  • licensing information about the reuse of the statement
  • (if we go with the rating idea) what ratings the statement has been given
  • (if we allow editing of statements) what changes have been made to the statement over time

and so on. In short, all the metadata that you’d want to associate with resources you’d also want to associate with statements.

I’d anticipated using reification to associate metadata with statements, something like this:

<rdf:Statement rdf:about="#statement1">
  <rdf:subject rdf:resource="/people/CharlesDarwin" />
  <rdf:predicate rdf:resource="/ontology/event-roles/passenger" />
  <rdf:object rdf:resource="/events/BeagleVoyage" />
  <dc:creator rdf:resource="/users/JeniT" />
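  <!-- rdf:datatype takes a full URI; prefixes like xsd: are not expanded in attribute values -->
  <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-04-11</dc:date>
  <g:certainty rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">1.0</g:certainty>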
  ...
</rdf:Statement>
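To make the shape of reification concrete, here is a minimal Python sketch that uses plain tuples rather than an RDF library. The function name `reify` is hypothetical. Prefixed names such as `rdf:subject` and `g:certainty` are left unexpanded for readability. It produces the same triples as the rdf:Statement description above. Note that the Darwin triple itself is only described, never asserted.

```python
# A minimal sketch of RDF reification using plain tuples (no RDF library).
# Prefixed names such as "rdf:subject" are left unexpanded for readability.
Triple = tuple[str, str, str]


def reify(triple: Triple, statement_id: str, metadata: dict[str, str]) -> list[Triple]:
    """Describe `triple` with an rdf:Statement node and attach metadata to it.

    The described triple itself is NOT asserted: only the four reification
    triples (type, subject, predicate, object) and the metadata are returned.
    """
    s, p, o = triple
    result = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    # Metadata hangs off the statement node, just like any other resource.
    result.extend((statement_id, prop, value) for prop, value in metadata.items())
    return result


darwin = ("/people/CharlesDarwin", "/ontology/event-roles/passenger", "/events/BeagleVoyage")
triples = reify(darwin, "#statement1", {
    "dc:creator": "/users/JeniT",
    "dc:date": '"2008-04-11"^^xsd:date',
    "g:certainty": '"1.0"^^xsd:decimal',
})
for t in triples:
    print(t)
```

Because the metadata hangs off `#statement1` like any other resource, anything you can say about a resource you can also say about a reified statement.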

or using rdf:ID, although this does limit the URI of our statements to hash-URIs:

<rdf:Description rdf:about="/people/CharlesDarwin">
  <!-- rdf:ID takes a bare name; it is resolved against the base URI as #statement1 -->
  <r:passenger rdf:ID="statement1" rdf:resource="/events/BeagleVoyage" />
</rdf:Description>
<rdf:Description rdf:about="#statement1">
  <dc:creator rdf:resource="/users/JeniT" />
  <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-04-11</dc:date>
  <g:certainty rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">1.0</g:certainty>
</rdf:Description>

(Please feel free to correct my RDF, RDF-folks!)
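In RDF/XML, rdf:ID on a property element does two things. It asserts the triple, and it also reifies it, naming the reification with a fragment identifier resolved against the document's base URI. That resolution is where the hash-URI restriction comes from.

The Python sketch below mimics that behaviour with plain tuples. The function `property_with_rdf_id` and the base URI `http://example.org/trees/darwin` are hypothetical. For simplicity, only the statement's own URI is resolved here; the subject and object are left relative.

```python
from urllib.parse import urljoin

Triple = tuple[str, str, str]


def property_with_rdf_id(triple: Triple, base_uri: str, rdf_id: str) -> list[Triple]:
    """Mimic an RDF/XML property element that carries rdf:ID.

    The triple is asserted AND reified. The statement node's URI is the
    base URI plus "#" + rdf_id, so rdf:ID can only yield hash-URIs.
    """
    statement = urljoin(base_uri, "#" + rdf_id)
    s, p, o = triple
    return [
        (s, p, o),  # the statement itself is asserted
        (statement, "rdf:type", "rdf:Statement"),
        (statement, "rdf:subject", s),
        (statement, "rdf:predicate", p),
        (statement, "rdf:object", o),
    ]


# Hypothetical base URI for the page containing the RDF/XML.
out = property_with_rdf_id(
    ("/people/CharlesDarwin", "r:passenger", "/events/BeagleVoyage"),
    "http://example.org/trees/darwin",
    "statement1",
)
for t in out:
    print(t)
```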

We can embed this information into our web pages using RDFa:

<div about="#statement1" typeof="rdf:Statement">
  <p class="statement">
    <a rel="rdf:subject" href="/people/CharlesDarwin">
      Charles Darwin
    </a>
    was a
    <a rel="rdf:predicate" href="/ontology/event-roles/passenger">
      passenger
    </a>
    on the
    <a rel="rdf:object" href="/events/BeagleVoyage">
      <span about="/people/CharlesDarwin" 
            rel="r:passenger" 
            resource="/events/BeagleVoyage">
        Beagle Voyage
      </span>
    </a>
  </p>
  <dl class="metadata">
    <dt>Author:</dt>
    <dd>
      <a rel="dc:creator" href="/users/JeniT">
        Jeni Tennison
      </a>
    </dd>
    <dt>Date:</dt>
    <dd property="dc:date" datatype="xsd:date" 
        content="2008-04-11">
      11 Apr, 2008
    </dd>
    <dt>Certainty:</dt>
    <dd property="g:certainty" datatype="xsd:decimal"
        content="1.0">
      <img src="stars5.gif" alt="five stars" />
    </dd>
  </dl>
</div>

Note that I’ve incorporated both the reified statement and the statement itself into the RDFa. If I’m correct in my mental parsing of RDFa, I think this leads to the set of triples from the RDF/XML in the above examples plus the triple:

</people/CharlesDarwin> r:passenger </events/BeagleVoyage> .

But then the other day I was reading the tutorial How to publish Linked Data on the Web, which says:

We discourage the use of RDF reification as the semantics of reification are unclear and as reified statements are rather cumbersome to query with the SPARQL query language. Metadata can be attached to the information resource instead, as explained in Section 5.
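To see what the tutorial means by "cumbersome", here is a rough Python sketch of looking up the certainty recorded for one triple when metadata is attached via reification. It uses plain tuples, a hypothetical function name `certainty_of`, and illustrative certainty values. The reified node has to be matched on its subject, predicate and object before its metadata can be read. The docstring gives roughly the equivalent SPARQL triple pattern.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def certainty_of(store: set[Triple], triple: Triple) -> list[str]:
    """Return the g:certainty values recorded for `triple` via reification.

    A statement node only counts if it matches three separate patterns
    (rdf:subject, rdf:predicate, rdf:object); in SPARQL this is roughly
      ?st rdf:subject <s> ; rdf:predicate <p> ; rdf:object <o> ; g:certainty ?c .
    """
    s, p, o = triple
    # Group every triple by its subject so each node's properties can be inspected.
    by_node: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for subj, pred, obj in store:
        by_node[subj][pred].add(obj)
    results: list[str] = []
    for props in by_node.values():
        if s in props["rdf:subject"] and p in props["rdf:predicate"] and o in props["rdf:object"]:
            results.extend(sorted(props["g:certainty"]))
    return results


store = {
    ("#statement1", "rdf:type", "rdf:Statement"),
    ("#statement1", "rdf:subject", "/people/CharlesDarwin"),
    ("#statement1", "rdf:predicate", "r:passenger"),
    ("#statement1", "rdf:object", "/events/BeagleVoyage"),
    ("#statement1", "g:certainty", "1.0"),
    # A second reified statement (hypothetical voyage) that must not match.
    ("#statement2", "rdf:type", "rdf:Statement"),
    ("#statement2", "rdf:subject", "/people/CharlesDarwin"),
    ("#statement2", "rdf:predicate", "r:passenger"),
    ("#statement2", "rdf:object", "/events/SomeOtherVoyage"),
    ("#statement2", "g:certainty", "0.5"),
}
print(certainty_of(store, ("/people/CharlesDarwin", "r:passenger", "/events/BeagleVoyage")))
```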

Jumping to Section 5, I find

Metadata: The representation should contain any metadata you want to attach to your published data, such as a URI identifying the author and licensing information. These should be recorded as RDF descriptions of the information resource that describes a non-information resource; that is, the subject of the RDF triples should be the URI of the information resource. Attaching meta-information to that information resource, rather than attaching it to the described resource itself or to specific RDF statements about the resource (as with RDF reification) plays nicely together with using Named Graphs and the SPARQL query language in Linked Data client applications…

There are some examples of what this looks like within the tutorial. The first is an “authoritative description” found at http://dbpedia.org/data/Alec_Empire after a 303 redirection from http://dbpedia.org/resource/Alec_Empire.

# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire>
    rdfs:label "RDF description of Alec Empire" ;
    rdf:type foaf:Document ;
    dc:publisher <http://dbpedia.org/resource/DBpedia> ;
    dc:date "2007-07-13"^^xsd:date ;
    dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

# The description
<http://dbpedia.org/resource/Alec_Empire> 
    foaf:name "Empire, Alec" ;
    rdf:type foaf:Person ;
    rdf:type <http://dbpedia.org/class/yago/musician> ;
    rdfs:comment
        "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
    rdfs:comment
        "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
    dbpedia:genre <http://dbpedia.org/resource/Techno> ;
    dbpedia:associatedActs 
      <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
    foaf:page <http://en.wikipedia.org/wiki/Alec_Empire> ;
    foaf:page <http://dbpedia.org/page/Alec_Empire> ; 
    rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
    owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .

The second is a non-authoritative description found at http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/ChrisAboutRichard:

# Metadata and Licensing Information
<>
    rdf:type foaf:Document ;
    dc:creator <http://www.bizer.de#chris> ;
    dc:date "2007-07-13"^^xsd:date ;
    cc:license <http://web.resource.org/cc/PublicDomain> .

# The description
<http://richard.cyganiak.de/foaf.rdf#cygri> 
    foaf:name "Richard Cyganiak" ;
    foaf:topic_interest <http://dbpedia.org/resource/Category:Databases> ;
    foaf:topic_interest <http://dbpedia.org/resource/MacBook_Pro> ;
    rdfs:isDefinedBy <http://richard.cyganiak.de/foaf.rdf> ;
    rdfs:seeAlso <> .

Note that rdfs:isDefinedBy does not necessarily point to the data you get when you retrieve the resource, but to an authoritative description of the resource (presumably there can be more than one). It’s also associated with a particular resource rather than a particular statement.

To know which metadata applies to a particular statement, an application must know where it got the statement from. In effect, a statement here has four parts: subject, property, object and location (with the possibility that multiple statements with the same subject, property and object might have different locations and therefore different metadata). This is similar to assigning an ID to a statement, as with rdf:ID, but restricts the statement’s identifier to being the location where it was found.
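Here is a minimal Python sketch of this four-part model. It assumes statements are stored as (subject, predicate, object, location) quads. It also assumes that, as in the tutorial's examples, metadata is stated with the location as its subject. The paths `/trees/darwin` and `/trees/fitzroy`, the user `/users/someoneElse` and the function `metadata_for` are hypothetical.

```python
Quad = tuple[str, str, str, str]  # (subject, predicate, object, location)


def metadata_for(quads: list[Quad], triple: tuple[str, str, str], prop: str) -> dict[str, list[str]]:
    """For each location where `triple` was found, return the `prop` values
    stated about that location (i.e. about the document, not the triple)."""
    locations = sorted({g for (s, p, o, g) in quads if (s, p, o) == triple})
    return {
        loc: sorted(o for (s, p, o, _g) in quads if s == loc and p == prop)
        for loc in locations
    }


statement = ("/people/CharlesDarwin", "r:passenger", "/events/BeagleVoyage")
quads = [
    # The same triple found at two (hypothetical) locations...
    (*statement, "/trees/darwin"),
    (*statement, "/trees/fitzroy"),
    # ...and metadata recorded about each location, as in the tutorial's examples.
    ("/trees/darwin", "dc:creator", "/users/JeniT", "/trees/darwin"),
    ("/trees/fitzroy", "dc:creator", "/users/someoneElse", "/trees/fitzroy"),
]
print(metadata_for(quads, statement, "dc:creator"))
# {'/trees/darwin': ['/users/JeniT'], '/trees/fitzroy': ['/users/someoneElse']}
```

The same triple ends up with two different creators, one per location, which is exactly the "different locations and therefore different metadata" case.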

So what does that mean for the genealogical web app? Well, in the app any given statement by a particular user is going to be quoted on lots of pages. I was intending to mark up every occurrence with RDFa, but that would mean lots of duplicate statements from different locations, potentially bloating the data held by applications that harvest it.
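Here is a back-of-the-envelope sketch of that duplication. It assumes a harvester that records every marked-up occurrence as a quad keyed by the page it was found on; the fifty page URIs are hypothetical.

```python
statement = ("/people/CharlesDarwin", "r:passenger", "/events/BeagleVoyage")
# Hypothetical pages on which the same user's statement is quoted and marked up.
pages = [f"/pages/{n}" for n in range(1, 51)]

# A harvester that keeps locations stores one quad per page...
harvested = [(*statement, page) for page in pages]
# ...even though only one distinct triple is being said.
distinct = {quad[:3] for quad in harvested}

print(len(harvested), len(distinct))  # 50 1
```

Each of those fifty locations would also need its own copy of the creator, date and certainty metadata. Under the Linked Data approach, metadata is found through the location.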

I can’t work out whether I like or loathe the Linked Data concept of associating metadata with the document in which you find triples. In some ways it seems very natural: look for information about a resource at the URI for the resource. But the metadata mechanisms restrict where you can place statements on the web (or at least assign unintended semantics to their location), and that seems like a Bad Thing. On the other hand, perhaps I’m just being overly influenced by the desire to use RDFa, which does lead one to want to mark up data wherever it appears.

I’d welcome any advice.

Comments

Re: Metadata about RDF triples: reification and Linked Data

In my view, the best way to approach these problems is to convert the verb (the predicate) into a noun. John Sowa has illustrated this very nicely with Conceptual Graphs. In one of his examples, he has a cat chasing a mouse: (cat, chase, mouse). This can always be converted so that there is an act (the “chasing”). The cat is the chaser, or actor, and the mouse is the chasee or “patient” (that which is acted upon).

With this transformation, the instance of “chasing” can be annotated with as much metadata as you want. Note that this approach is essentially the same as a typical Topic Maps approach. It is also essentially isomorphic to a relational database approach, where this instance of a chase would be a row in a “chase” table. Indeed, RDF models can be seen as very highly normalized relational models.
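Here is a minimal Python sketch of that transformation. It assumes hypothetical `ex:actor` and `ex:patient` role properties, a hypothetical `ex:Chasing` class and a hypothetical function `to_event`; the certainty value is illustrative. Once the verb is a node, metadata attaches to it directly, with no reification needed.

```python
Triple = tuple[str, str, str]


def to_event(event_id: str, event_type: str, actor: str, patient: str,
             metadata: dict[str, str]) -> list[Triple]:
    """Recast (actor, verb, patient) as an event node with role links.

    The verb becomes an instance of `event_type`; metadata is then attached
    to that node like to any other resource.
    """
    triples = [
        (event_id, "rdf:type", event_type),
        (event_id, "ex:actor", actor),
        (event_id, "ex:patient", patient),
    ]
    triples.extend((event_id, prop, value) for prop, value in metadata.items())
    return triples


# (cat, chase, mouse) becomes an instance of "the chasing".
events = to_event("_:chasing1", "ex:Chasing", "ex:cat", "ex:mouse",
                  {"dc:creator": "/users/JeniT", "g:certainty": "0.8"})
for t in events:
    print(t)
```

This is the "row in a chase table" view from the comment: one node per chasing, with one property per column.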

IMO, most real-world modeling needs to reflect clusters of properties, and those properties are usually grouped together in specific ways, rather than being diffusely spread out over a graph. I like to call these kinds of constructions “idioms”.

Another way of looking at it is that in RDF we can’t directly make statements about a statement. But each instance of a predicate actually represents the entire statement that it occurs in. So if we can make it into a noun, or subject, we can then use it as the hub of all the associated metadata. We could do this by actually creating an instance of the predicate type, but for some reason RDFers don’t favor this method. Alternatively, we can in effect reify the predicate by transforming the original statement into the alternative form I mentioned above. I.e., the verb “chase” when reified becomes an instance of “the chasing”.

Well, this is very long-winded and I don’t know that it helps you directly in writing RDFa for your case, but it is really the situation from a modeling point of view.

Re: Metadata about RDF triples: reification and Linked Data

Hi Jeni, a few links for ya: on named graphs - http://www.w3.org/2004/03/trix/

on n-ary relations (as with Tom’s refactoring, in a lot of places you can avoid making statements about statements by tweaking the model a bit) - http://www.w3.org/TR/swbp-n-aryRelations/

the W3C validator is invaluable for checking RDF/XML (I’d forgotten what rdf:ID did until I pasted a sample in and looked at the diagram): http://www.w3.org/RDF/Validator/

in case you get bored, stuff related to timbl’s N3 and formulae: http://www.w3.org/DesignIssues/Reify.html http://www.w3.org/2000/10/swap/doc/Rules

Re: Metadata about RDF triples: reification and Linked Data

Many thanks, Danny. Named graphs as in TriX give me a nice warm feeling; the Linked Data method I described in this post, of using the URI at which triples are found, then becomes a default rather than the only method of naming a graph.

I also found TriG for a Turtle-based readable syntax, which is pleasing.
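For a feel of what named graphs look like on the page, here is a small Python sketch that groups (subject, predicate, object, graph) quads into TriG-style text. In the W3C TriG Recommendation, triples written outside braces belong to the default graph. That makes the default graph a natural home for metadata about the named graphs. The graph name `</graphs/jeni1>` and the function `to_trig` are hypothetical. Terms are assumed to be pre-serialized Turtle terms, and no @prefix declarations are emitted.

```python
from collections import defaultdict
from typing import Optional

Quad = tuple[str, str, str, Optional[str]]  # graph None = default graph


def to_trig(quads: list[Quad]) -> str:
    """Serialise quads as TriG-style text, one block per named graph.

    Terms must already be written as Turtle terms (<...> or prefix:name);
    no escaping or @prefix declarations are generated.
    """
    graphs: dict[Optional[str], list[str]] = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append(f"{s} {p} {o} .")
    blocks = []
    # Default-graph triples (e.g. metadata about named graphs) need no braces.
    if None in graphs:
        blocks.append("\n".join(graphs.pop(None)))
    for g in sorted(graphs):
        body = "\n".join("    " + line for line in graphs[g])
        blocks.append(f"{g} {{\n{body}\n}}")
    return "\n\n".join(blocks)


quads = [
    ("</people/CharlesDarwin>", "r:passenger", "</events/BeagleVoyage>", "</graphs/jeni1>"),
    ("</graphs/jeni1>", "dc:creator", "</users/JeniT>", None),
    ("</graphs/jeni1>", "dc:date", '"2008-04-11"^^xsd:date', None),
]
print(to_trig(quads))
```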

So what I’m missing now is a way to name graphs in RDFa. A graph attribute holding a URIorCURIE should do it, with the nearest graph to a triple determining which graph it belongs to…
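To sketch how a `graph` attribute with nearest-ancestor scoping might behave, here is a toy Python extractor built on the standard library's `html.parser`. The `graph` attribute is the hypothetical one proposed above, not part of RDFa. The extractor only handles elements that carry `about`, `rel` and `resource` or `href` together. It uses the page URI as the default graph, matching the Linked Data behaviour. The property `r:captain` and all URIs are illustrative.

```python
from html.parser import HTMLParser


class GraphAwareExtractor(HTMLParser):
    """Toy extractor for a *hypothetical* `graph` attribute (not part of RDFa).

    It only recognises elements carrying about + rel + resource/href, so it is
    nowhere near a real RDFa processor. Each element's graph is its own `graph`
    attribute if present, otherwise the nearest ancestor's, falling back to the
    page's own URI (the Linked Data default).
    """

    VOID = {"area", "br", "hr", "img", "input", "link", "meta"}  # no end tags

    def __init__(self, page_uri: str):
        super().__init__()
        self.stack = [page_uri]  # graph in effect for each open element
        self.quads: list[tuple[str, str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        graph = a.get("graph") or self.stack[-1]
        obj = a.get("resource") or a.get("href")
        if a.get("about") and a.get("rel") and obj:
            self.quads.append((a["about"], a["rel"], obj, graph))
        if tag not in self.VOID:
            self.stack.append(graph)

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.stack) > 1:
            self.stack.pop()


page = """
<div graph="/graphs/jeni1">
  <span about="/people/CharlesDarwin" rel="r:passenger" resource="/events/BeagleVoyage">Beagle Voyage</span>
  <div graph="/graphs/other">
    <span about="/people/RobertFitzRoy" rel="r:captain" resource="/events/BeagleVoyage">FitzRoy</span>
  </div>
</div>
<span about="/people/CharlesDarwin" rel="r:passenger" resource="/events/BeagleVoyage">quoted again</span>
"""
extractor = GraphAwareExtractor("/pages/darwin")
extractor.feed(page)
extractor.close()
for quad in extractor.quads:
    print(quad)
```

The same Darwin triple comes out twice, once in `/graphs/jeni1` and once in the page's default graph. This shows how an explicit graph name can override the location-based default.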

(I don’t think n-ary relations are right for statements-about-statements; I’m not prepared to stretch the model that far!)