POWDER: A Balance Between Operational Practicality and Semantic Formality

Document status

This document was written following a meeting with Jeremy Carroll, Stuart Williams and Dan Brickley on 18 December 2007. It is offered as input to the W3C POWDER WG but is not a formal document of that group. Furthermore, it represents a personal interpretation of the problems to be overcome and should not be taken as being endorsed by the working group members. For these reasons, it is not appropriate for this document to be posted in the group's Web space.


Understanding the problem

As recorded in recent posts in the POWDER blog, the basic model for a Description Resource that we've developed in our current published documents has been the subject of considerable discussion since the face to face meeting in Boston last month.

The WG has sought, and is grateful to have received, a lot of input from leading figures in the Semantic Web. So exactly what is the problem and, more pressingly, what's the solution?

Here's an attempt to summarise the problems:

  1. POWDER is about describing collections of resources, typically "a Web site". RDF is designed to describe individual resources such as an image, a document etc. Front and centre, that's a big problem.
  2. POWDER is likely to be used by people who are not specialists in Semantic Web technology so that usability and simplicity are important factors. Complex descriptions quickly become highly verbose and cannot readily be manipulated by non-specialists.
  3. RDF has well-defined semantics so that what 'looks obvious' to a human may not be what a machine interprets — and we really want machines to be able to process the meaning of the data (for smart content personalisation).
  4. Since RDF was first developed there has been an active and ongoing debate about how to trust the data, something that is critical to POWDER. How can you trust any data if you don't know where it's from? Who is making the assertions? Several solutions have been suggested, including reification, named graphs and signed graphs, all of which have problems, at least for now.
  5. Although reification remains part of the RDF specs, the consensus in the Semantic Web community is that it perhaps should have been deprecated in 2004 and that it is a dangerous thing to use (again, the formal semantics don't always fully agree with a human interpretation). The message is therefore clear: "don't use reification." That's a pity, because the OWL-based model developed in Boston depends on it.
  6. Named graphs. If you have a named graph, the graph itself, that is, the block of RDF, can itself be the subject of further description including provenance and so on. See Carroll et al. However, this is not fully specified and, as the recent mailing list thread shows, there is no consensus on it (at least for now).
  7. There is ongoing work on signing RDF (or rather, signing a particular serialisation of a given graph) and methods like PGP and XML Signature can be used. These are all of potential use in POWDER applications but the POWDER WG does not wish to mandate a particular signing technology as a) it is beyond the scope of the group and b) different methods may be more appropriate in different circumstances. That is, you need a different level of trust depending on the importance of whatever it is the DR is describing and the consequences of that information being true or false.
  8. As suggested within the WG, it seems the best way forward is for a DR always to be in a self-contained file with a URI that does not have a fragment ID. That way you can add metadata to the DR using rdf:about="" and thus provide information on the provenance of all the triples in the file. The problem with this is that we anticipate organisations such as trustmark operators making many thousands of DRs available as an RDF dump, which does not fit a one-DR-per-file model.
  9. There's no 'quoting mechanism' in RDF. You can't say "I believe that x has property y" without asserting that x has property y — which means that "x has property y" is a piece of data on the Web that may be handled without any reference to the entity that asserted it.
  10. If you create a blank node and give it some properties, the formal semantics say that this means that there must exist at least one thing in the universe that has those properties. POWDER is about generalised statements and could well describe something that doesn't exist at the time the description was created.
  11. It's very easy to create self-denying DRs: something like "I assert that all resources on example.com are on example.org."
  12. A DR typically has an issue date and a valid until date. Temporal conditions are not part of the RDF model.
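
To make item 8 concrete, here is a minimal sketch of a file that describes itself using rdf:about="". The empty value resolves to the URI of the file that contains it, so the properties below apply to the document carrying the DR rather than to anything the DR describes. The choice of foaf:maker and dcterms:issued is an assumption made for illustration; nothing here is agreed POWDER vocabulary.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">

  <!-- rdf:about="" resolves to the URI of this file, so these
       properties describe the document that carries the DR -->
  <rdf:Description rdf:about="">
    <!-- Assumed provenance properties, chosen only for illustration -->
    <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me"/>
    <dcterms:issued>2007-07-02</dcterms:issued>
  </rdf:Description>

  <!-- The DR itself would follow here, in the same file -->
</rdf:RDF>
```

This only works cleanly when there is one DR per file, which is exactly where it collides with the RDF dump scenario.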

Taken together these problems do rather beg the question: should we be using RDF at all?

Yes. If we can make it work, then yes, this really is the best way.

The Semantic Web is about the machine processing of meaning. A description of a Web site that can be understood (as far as today's computers can be made to understand anything) can provide a much richer set of personalisation options now and in the future than a series of essentially meaningless strings in, say, an XML file (or a PICS label). This and the extensibility and flexibility of RDF are at the core of its potential. OWL, and especially OWL 1.1, hold even greater promise for the future interpretation and processing of Description Resources. A DR is the expression of an opinion. By design, that opinion is open to question and may be contradicted by others — it's a social interaction between the person providing the description and the person evaluating the described content. That's not a linear relationship and needs a multi-dimensional technology at its heart — and that's what the Semantic Web provides.

Towards a solution

The problems listed above perhaps provide an illustration of possible work to be done in future on RDF itself. Some elements of the road map seem reasonably clear already (for example, SPARQL supports the concept of named graphs). But that's not what the POWDER WG is for. So let's look for a way forward.

Here's some RDF/XML with relatively loose meaning that is simple enough to be manipulated by hand if necessary. It's a slightly amended version of the currently published DR example.

1  <DR rdf:id="DR_1" xmlns:="http://www.w3.org/2007/05/powder"
                     xmlns:ex="http://example.org/schema">
2    <maker>http://authority.example.org/foaf.rdf#me</maker>
3    <issued>2007-07-02</issued>
4    <validFrom>2008-07-07</validFrom>
5    <validUntil>2008-07-07</validUntil>

6    <resourceSet rdf:parseType="Resource">
7      <includeHosts>example.org</includeHosts>
8    </resourceSet>

9    <descriptors rdf:parseType="Resource">
10     <ex:property1>value 1</ex:property1>
11     <ex:property2>value 2</ex:property2>
12     <ex:property3 rdf:resource="http://../value3"/>
13     <ex:property4 rdf:resource="#value4"/>
    
14 <!-- NOT: -->
15     <ex:property5>
16        <rdf:Description> <!-- blank node -->
17           ...
18        </rdf:Description>
19     </ex:property5>
20 <!-- end NOT --> 
21   </descriptors>

22   <rdf:Description rdf:about="#value4">
23      ...
24   </rdf:Description>

25   <description>Textual information to display to end users</description>
26 </DR>

This looks rather like a DR but there are some formal elements missing, and the basic idea is that those formal elements can be added through a GRDDL transform. The output of that transform will define the Descriptors class (and possibly the Resource Set) using OWL property restrictions and much better-defined semantics overall.
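
As a rough idea of what "OWL property restrictions" could mean here, the sketch below defines a class whose members have both ex:property1 and ex:property2 with the stated values. It is an assumption about the shape of the verbose output, not the transform's agreed result: the class name DR_1_descriptors is hypothetical, and the property URIs assume the ex namespace ends in '#', which the example above does not actually declare.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">

  <!-- Hypothetical class name: a resource belongs to this class
       if it satisfies every restriction in the intersection -->
  <owl:Class rdf:ID="DR_1_descriptors">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Restriction>
        <!-- Assumed expansion of ex:property1 -->
        <owl:onProperty rdf:resource="http://example.org/schema#property1"/>
        <owl:hasValue>value 1</owl:hasValue>
      </owl:Restriction>
      <owl:Restriction>
        <!-- Assumed expansion of ex:property2 -->
        <owl:onProperty rdf:resource="http://example.org/schema#property2"/>
        <owl:hasValue>value 2</owl:hasValue>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
```

Only the literal-valued descriptors are shown; the resource-valued ones would use owl:hasValue with rdf:resource in the same way.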

Good news: we can use a lot of what we have already published and add the formal semantics to those documents.

This two-tier approach is not dissimilar to the one we discussed in Boston. There we posited an XML instance that GRDDL would transform into RDF/XML, but here we use RDF/XML for the simple, operationally practical version too. The verbose version will have greater power and flexibility and there will be situations that demand the 'full version' but for the simple use cases that drive POWDER, DRs expressed as in the example above will be sufficient. In effect we have POWDER Lite and POWDER Full but the two are yoked together through the GRDDL transform.

Some points:

Line 1: The GRDDL transform itself will be associated with the namespace so no further declaration is necessary. Making POWDER the default namespace keeps the tags nice and simple; only the descriptors need a prefix.
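
For readers unfamiliar with how GRDDL ties a transform to a namespace, the sketch below shows the kind of statement the namespace document would need to carry, using the grddl:namespaceTransformation property from the GRDDL specification. The stylesheet URL is hypothetical, and the namespace URI is written with the trailing '#' used in the corrected example in the addendum; both are assumptions for illustration.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:grddl="http://www.w3.org/2003/g/data-view#">

  <!-- Statement served from the POWDER namespace document -->
  <rdf:Description rdf:about="http://www.w3.org/2007/05/powder#">
    <!-- Hypothetical location of the Lite-to-Full stylesheet -->
    <grddl:namespaceTransformation
        rdf:resource="http://example.org/powder-lite-to-full.xsl"/>
  </rdf:Description>
</rdf:RDF>
```

A GRDDL-aware agent that meets a document whose root element is in this namespace can then find and apply the transform without the DR author declaring anything.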

Lines 2 - 5 are the familiar metadata about the DR. These will be 'GRDDLed' into things like <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me" />.
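
A sketch of how the metadata lines might look once transformed is given below. Only the foaf:maker mapping is stated in the text; mapping issued to dcterms:issued, keeping validFrom and validUntil in the POWDER namespace, the wdr prefix and the xsd:date datatypes are all assumptions made for illustration.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:wdr="http://www.w3.org/2007/05/powder#">

  <wdr:DR rdf:ID="DR_1">
    <!-- The maker becomes a resource reference, not a string -->
    <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me"/>

    <!-- Assumed mappings: typed dates instead of plain literals -->
    <dcterms:issued
        rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-07-02</dcterms:issued>
    <wdr:validFrom
        rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-07-07</wdr:validFrom>
    <wdr:validUntil
        rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-07-07</wdr:validUntil>
  </wdr:DR>
</rdf:RDF>
```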

Lines 6 - 8 are the Resource Set. Note that we use RDF/XML syntax to create a blank node. Following 'problem 10' above, this means, formally, that there exists at least one thing in the universe that has the property of includeHosts with the value example.org. It therefore becomes a requirement that the Resource Set is not empty.

Lines 10 - 13: This shows the benefit of using RDF/XML for POWDER Lite rather than just XML. It's easy to put in values for properties or link to classes defined elsewhere.

Notice that on line 9 we use the same trick as in line 6 to create a blank node. This is OK if and only if that blank node has properties with literals or named classes as values. Having a property that links to another blank node (lines 15 - 19) is NOT OK in this situation as the formal semantics break down. This is a limitation on POWDER Lite that, I believe, will not apply in POWDER Full.

At the time of writing the remaining open questions surround the identification of the graph. The example above gives an identity to the DR which is likely to be OK, at least for POWDER Lite. As noted above, the use cases mean that it would be operationally awkward to require each DR to be in a discrete RDF instance. The concept of a Package with its ordered, closed list of DRs also looks to be supportable in POWDER Lite. The GRDDL transform should be able to generate fully-defined DRs that would be mutually consistent when processed in any order.

Next steps

The next step will be to create the fully-defined, verbose output of the GRDDL transform, then create the transform itself and do some tests, including semantic tests.
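
As a possible starting point for the transform, here is a minimal XSLT 1.0 sketch. It implements a single rule, rewriting the literal-valued maker element as foaf:maker with an rdf:resource value, and copies everything else through unchanged with an identity template. It assumes input whose elements are in the http://www.w3.org/2007/05/powder# namespace; the wdr prefix is my own choice, and a real transform would need rules for the Resource Set and descriptors as well.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:wdr="http://www.w3.org/2007/05/powder#">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy anything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the string-valued maker with a foaf:maker resource link;
       normalize-space() trims whitespace around the URI -->
  <xsl:template match="wdr:DR/wdr:maker">
    <foaf:maker rdf:resource="{normalize-space(.)}"/>
  </xsl:template>

</xsl:stylesheet>
```

Starting from an identity transform means each formal element can be added as its own template and tested in isolation.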

As noted above, the currently published documents will not need radical re-writing; we just need to add some more detail and more formalism along the lines that Jeremy Carroll wrote in a recent post on the subject.

Phil Archer
19 December 2007

Addendum 7 January 2008

I've been playing with the POWDER Lite example above and have corrected some errors to create the example below.

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns="http://www.w3.org/2007/05/powder#"
   xmlns:ex="http://example.org/schema">

  <DR rdf:ID="DR_1">
    <maker>http://authority.example.org/foaf.rdf#me</maker>
    <issued>2007-07-02</issued>
    <validFrom>2008-07-07</validFrom>
    <validUntil>2008-07-07</validUntil>

    <resourceSet rdf:parseType="Resource">
      <includeHosts>example.org</includeHosts>
    </resourceSet>

    <descriptors rdf:parseType="Resource">
      <ex:property1>value 1</ex:property1>
      <ex:property2>value 2</ex:property2>
      <ex:property3 rdf:resource="http://../value3"/>
      <ex:property4 rdf:resource="#value4"/>
    </descriptors>

    <description>Textual information to display to end users</description>

  </DR>

  <rdf:Description rdf:about="#value4">
    <ex:property5>value 5</ex:property5>
  </rdf:Description>

</rdf:RDF>