Re: Overlap, Containment and Dominance
Speaking only for myself of course, although since Jeni and I are “on the same team” (working on LMNL) we do find our attitudes frequently align.
As I see it, #2 and #4 are not actually mutually exclusive: a syntactic representation could be taken as a warrant for a preferred graph, while applications remain free to do other things with a serialization, or more things, than represent that particular graph (as XML applications arguably are free with respect to XML syntax).
In any case, the core of the issue is what the information model is. Much progress was made at the Goddag workshop in exploring the limits that might best be placed on the Goddag structure proposed by Huitfeldt and Sperberg-McQueen in order to ensure that it could be serialized in the form of a markup instance. But the TexMECS/Goddag approach has always been #1, not #2 or #4. This raises the question of whether the particular structure implied by the syntax (such as TexMECS or XML), where the dominance relation is implicit in the order of tagging, is always going to be the best structure, given that other approaches such as #3 (taking a CREOLE instance, say, as not only a schema but a set of structural declarations) might yield a different structure from the same syntactic instance.
This issue aside, let me take up approach #2. Given an information model (let’s say a Goddag structure with limitations to ensure serializability), the question becomes how to represent dominance relations. One solution might be:
(Note: I leave aside issues relating to what CMSMcQ calls “spurious overlap”, namely when tag ordering seems not to reflect containment or enclosure properly, such as <b|bold <i|italic|b>|i>. LMNL deals with this by saying that tag ordering in itself carries no information, and enclosure is to be inferred only from the relation among the tagged ranges as such, so the ‘i’ range is enclosed by the ‘b’ range, despite appearances. But other definitions of the relation between markup and model may differ on this point.)
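To make the point in the note above concrete, here is a minimal sketch in Python of inferring enclosure from the tagged ranges alone, ignoring tag order. It is not LMNL's actual API or data model: the `Range` class, the `parse_ranges` and `encloses` functions, and the tiny regex tokenizer are all hypothetical names of my own, and the tokenizer handles only simple, matched start and end tags of the kind used in the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named range over the text, as half-open offsets [start, end)."""
    name: str
    start: int
    end: int


# Hypothetical tokenizer: start tags look like <name| and end tags like |name>.
TAG = re.compile(r"<(\w+)\||\|(\w+)>")


def parse_ranges(source):
    """Return (plain_text, ranges). Assumes every end tag has a start tag."""
    text_parts = []
    offset = 0          # current offset into the plain text
    pos = 0             # current position in the tagged source
    open_starts = {}    # name -> stack of start offsets still open
    ranges = []
    for m in TAG.finditer(source):
        chunk = source[pos:m.start()]
        text_parts.append(chunk)
        offset += len(chunk)
        pos = m.end()
        if m.group(1):  # start tag
            open_starts.setdefault(m.group(1), []).append(offset)
        else:           # end tag: close the most recent open range of that name
            name = m.group(2)
            ranges.append(Range(name, open_starts[name].pop(), offset))
    text_parts.append(source[pos:])
    return "".join(text_parts), ranges


def encloses(outer, inner):
    """True if inner lies wholly within outer, whatever the tag order was."""
    return (outer is not inner
            and outer.start <= inner.start
            and inner.end <= outer.end)


text, ranges = parse_ranges("<b|bold <i|italic|b>|i>")
by_name = {r.name: r for r in ranges}
print(text)                                   # bold italic
print(encloses(by_name["b"], by_name["i"]))   # True
print(encloses(by_name["i"], by_name["b"]))   # False
```

The ‘b’ range comes out as offsets 0 to 11 and the ‘i’ range as 5 to 11, so ‘i’ is enclosed by ‘b’ even though the end tags appear in the "wrong" order. Note that under this definition two ranges with identical extents enclose each other, which is one of the points on which another model might choose differently.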
The main question that arises in my mind about this approach would be the overhead necessary to check and enforce well-formedness, especially with respect to the third rule. It’s partly for this reason that I tend to agree with Jeni that any syntax is likely to be just too heavy-duty.
This is especially the case since in practice, I think users of any system that really handled overlap would have to rely on a structural validation mechanism over and above well-formedness checking in order to maintain their data properly — schemas would be even more essential (and more necessarily switchable) than with XML. If this is the case, then we are leaning back towards options #3 or #4.
The key distinction between approach #2 and either #3 or #4 is that in the latter, the document structures can only be maintained at the level of the “document type” (either essential, as per #3, or incidental, as per #4), whereas in approach #2, one could have syntactic instances that were invalid with respect to the type — if, say, a range pointed to a parent that was syntactically legal (by virtue of its being enclosed by it) but semantically unsound and invalid according to its schema.
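The distinction can be sketched as two separate checks, here in Python. This is only an illustration under assumptions of my own: that each range may name a parent, that well-formedness requires the parent to enclose the range, and that a schema says which parent names are allowed for which range names. The `Range` class, the `ALLOWED_PARENTS` table, and the range names in it are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Range:
    """A named range [start, end) that may point to a declared parent."""
    name: str
    start: int
    end: int
    parent: Optional["Range"] = None


def well_formed(r):
    """Syntactic check (assumed rule): a declared parent must enclose the range."""
    p = r.parent
    return p is None or (p.start <= r.start and r.end <= p.end)


# Hypothetical schema: which parent names each kind of range may point to.
ALLOWED_PARENTS = {
    "line": {"stanza"},
    "phrase": {"sentence"},
}


def valid(r):
    """Schema check: the declared parent must be of an allowed type.

    Ranges with no declared parent are treated as valid in this sketch.
    """
    if r.parent is None:
        return True
    return r.parent.name in ALLOWED_PARENTS.get(r.name, set())


stanza = Range("stanza", 0, 100)
sentence = Range("sentence", 0, 40)

# Enclosed by the stanza, so syntactically legal, but the schema
# only lets a phrase point to a sentence.
phrase = Range("phrase", 10, 20, parent=stanza)
print(well_formed(phrase), valid(phrase))   # True False

# Pointing to the sentence instead satisfies both checks.
phrase.parent = sentence
print(well_formed(phrase), valid(phrase))   # True True
```

The first `phrase` is exactly the kind of instance approach #2 admits and approaches #3 and #4 do not: well-formed as syntax, yet invalid with respect to its type.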
Whether this would be a feature or a bug remains to be seen, I think. But as to that, I guess I can even imagine schemas for which such a mechanism would be needed to disambiguate which of several graphs was meant — with the possibility of expressing an invalid graph a necessary corollary of this.
Hm.