Si tacuisses, Enrique, ...

Among the great privileges of working at W3C is the occasional geeking with people like Michael Sperberg-McQueen's evil twin Enrique.

Enrique's latest is on what RDF gets us. In that blog item, RDF is characterized as an extremely thin semantic layer -- interestingly, ignoring the RDF Semantics recommendation. The point of that recommendation is that RDF is -- even when you ignore RDF Schema, OWL and friends -- more than just nodes, arrows, and URIs.

The critical piece that's added is a bit of logic that effectively gives you the following rules (which are really flip-sides of each other):

  • You can always add more stuff, and that won't invalidate anything you've learned so far.
  • You can always remove stuff, but you won't learn anything new if you do.

If you think of RDF as a framework to do web-scale data aggregation, then these are very useful principles: They guarantee that you won't run into a world of inconsistency when you discover additional information, and they also guarantee that you can learn things about the world piece by piece. These principles permit relatively stupid and generic software to draw useful conclusions without knowing anything about the "real" meaning of data. They are also why comparing XML and RDF is comparing apples and oranges: There's nothing in XML that permits software to make similar assumptions; XML's semantic layer is indeed thinner than RDF's. All the interesting logic needs to be dealt with on the application layer.
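
As a rough illustration of the two rules, here is a minimal Python sketch. It assumes ground graphs only (no blank nodes), in which case simple entailment reduces to a subset check. The `entails` helper and all `ex:` names are made up for this example; they are not taken from the RDF Semantics recommendation or from any RDF library.

```python
# Sketch only: a ground RDF graph modeled as a set of (subject, predicate, object) triples.
# Assumption: no blank nodes, so simple entailment is just "is a subset of".

def entails(graph, conclusion):
    """Return True if every triple of `conclusion` is present in `graph`."""
    return set(conclusion) <= set(graph)

# Hypothetical example data.
g = {
    ("ex:enrique", "ex:occupation", "ex:philosopher"),
    ("ex:enrique", "ex:knows", "ex:michael"),
}
learned = {("ex:enrique", "ex:occupation", "ex:philosopher")}

# Rule 1: adding triples does not invalidate anything learned so far.
bigger = g | {("ex:enrique", "ex:spoke", "ex:something")}
still_holds = entails(g, learned) and entails(bigger, learned)

# Rule 2: removing triples teaches nothing new; whatever the smaller
# graph entails was already entailed by the original graph.
smaller = g - {("ex:enrique", "ex:knows", "ex:michael")}
nothing_new = all(entails(g, {t}) for t in smaller)

print(still_holds)   # True
print(nothing_new)   # True
```

Note that `entails` never inspects what the URIs mean, which is the sense in which fairly generic software can rely on these rules.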

Now, one important piece of Enrique's thinking is that precisely the thinness of RDF's semantic layer (similar to the thinness of XML's) is what makes it appealing. So, what does the semantic layer that RDF Semantics adds mean for that argument? The gain is clear, in that tools can make stronger assumptions about the data they deal with, and some aspects of application logic are pushed deeper in the stack. The price, though, is that those who model data on top of RDF need to understand what constraints are imposed on them by the format's properties -- in an RDF world, there isn't much of a "no"; "si tacuisset, philosophus mansisset" is a conclusion that won't work, since once you're a philosopher, you remain so till the end of your days.
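
To make the "si tacuisset" point concrete, here is a small Python sketch of the kind of rule that does not fit. It assumes a closed-world reading in which "remained silent" means "no `ex:spoke` triple is present"; the function name and the `ex:` vocabulary are hypothetical and exist only for this illustration.

```python
# Sketch only: a closed-world ("negation as failure") rule over a set of triples.
# Assumption: silence is inferred from the ABSENCE of an ex:spoke triple.

def is_philosopher_closed_world(graph, person):
    """Conclude 'philosopher' as long as no ex:spoke triple exists for `person`."""
    spoke = any(s == person and p == "ex:spoke" for (s, p, o) in graph)
    return not spoke

before = {("ex:enrique", "rdf:type", "ex:Person")}
after = before | {("ex:enrique", "ex:spoke", "ex:something")}

print(is_philosopher_closed_world(before, "ex:enrique"))  # True
print(is_philosopher_closed_world(after, "ex:enrique"))   # False
```

Adding one triple retracts the earlier conclusion, so this rule is not monotonic. That is why a conclusion of this shape cannot be obtained from RDF entailment alone and has to live in application logic.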

RDF semantics, therefore, is exposed to criticism from two angles: On the small scale, it imposes restrictions on those who model data -- restrictions that are harder to understand than those imposed by just using XML trees, and that can indeed bite badly. On the large scale, real life isn't monotonic (we invalidate prior knowledge all the time), and RDF's modeling can't deal with that. The first of these criticisms is ultimately about the ability of people to use the model. The second is about the problem space to which the model can be applied.

XML is "dumb" enough to not be subject to either of these criticisms. It is, however, not even trying to address the issues that large-scale data integration and aggregation will bring.

Comments (3)

Thank you, Thomas. Nice post. Enrique says to remind you that his analysis was done long ago, before the RDF Semantics spec was published; I think he's lying about that, although it's true he has never read it as far as I know. Your post raises some important questions, which I have attempted to address in a response in my own worklog, at http://people.w3.org/~cmsmcq/blog/?p=66
"They guarantee that you won't run into a world of inconsistency when you discover additional information" huh? I might learn A is disjoint from B, and then later learn that c is in both A and B, which is clearly inconsistent. What did you mean?
The inconsistency that you describe is on the level of a specific set of interpretations (namely, those that attribute the meaning "disjoint" to a certain URI). There are other interpretations for which the same set of triples will be consistent. (But yes, the way I wrote things in the blog item was indeed painting things with too broad a brush.)

About

This page contains a single entry from the blog posted on July 20, 2008 2:46 PM.

Creative Commons License
This weblog is licensed under a Creative Commons License.