A while back, when various alternative forms of merging RSS 1.0 with RSS 0.9x were being invented and RSS 2.0 was still called RSS 0.94, I asked a question on the comments page of Burningbird (aka Shelley Powers). Today Shelley posted her answer. This is my follow-up.
(Before anything else: THANK YOU, Shelley, for your time, your patience [*], and for actually looking down and taking notice of a little guy while the rest of the big ones are too loud to hear anybody but their own voice.)
(If you're new to RDF (like I am), pause now, and read her piece. Then read the RDF Primer. Then, of course, come back.)
I've used the last few days to complete some reading on RDF. From the little I know, I fully agree with Shelley that RDF is potentially very useful. RDF, essentially a model for describing and linking distributed islands of information, could be the glue that takes us to the next level of automated knowledge representation and mining. It also has the power to make me sound like I work for marketing ;-)
The story doesn't end here, however.
I'm a developer with some XML experience. I also have some RSS experience. Yet when I look at RSS 1.0 (the specification that unifies syndication with RDF technology), I get an uneasy feeling. From lurking around the RSS-DEV mailing list (and reading one or two weblogs on the subject), I think mine is not a unique experience.
And I think I know where the gap between RDF people (like Shelley, who is writing a book on the subject[**]) and XML people (like myself) lies. The gap is the RDF/XML Syntax Specification. You see, I think this spec fails to answer the needs of XML developers.
What are "the needs of XML developers"? We, XML developers [***], need backward-compatible XML formats that we can parse with our existing XML tools. In particular, when we already have an XML schema (either virtual, or a DTD, or a real XML Schema document) in place, we don't want the addition of RDF to break any existing stuff.
This has been done before. Take an existing, plain-vanilla, data-centric, XML document. What does it take to add a stylesheet (CSS or XSL) to it? Surprisingly, all you need to do is add just a single processing instruction (PI) to it. The stylesheet comes completely out-of-band. It doesn't affect the document. A single stylesheet can be used by a dozen XML documents (a major winning point of CSS in the HTML world). But more importantly, it doesn't break existing tools, and can be ignored. Adding that PI to a document has almost no chance of breaking anyone who is reading your documents. It is also a very minor change to the tools producing your documents. XML Schema works the same way.
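To make the stylesheet point concrete, here is a minimal sketch in Python (standard library only). The sample feed and the file name style.css are made up for illustration; the only claim is that a consumer written before the PI was added keeps working unchanged.

```python
import xml.etree.ElementTree as ET

# A plain, data-centric document (invented sample).
PLAIN = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example channel</title>
    <item><title>First post</title></item>
  </channel>
</rss>"""

# The same document with a single stylesheet PI added before the root.
# "style.css" is a hypothetical file name.
STYLED = PLAIN.replace(
    '<rss version="0.91">',
    '<?xml-stylesheet type="text/css" href="style.css"?>\n<rss version="0.91">',
)

def item_titles(xml_text):
    """An 'existing tool': it knows nothing about stylesheets."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("channel/item/title")]

titles_plain = item_titles(PLAIN)
titles_styled = item_titles(STYLED)
print(titles_plain == titles_styled)
```

The parser simply skips the processing instruction, so the old query sees exactly the same tree.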
The RDF/XML Syntax Specification (let's make that RXSS) is a completely different beast. Where stylesheets and XML Schema are out-of-band, RXSS is in-band. Where stylesheets and XML Schema allow you to keep your documents as they are, RXSS requires you to modify them. At best, it asks you to add attributes to existing elements (such as the rdf:about attribute mentioned in my question). I say "at best" because adding an attribute is still somewhat "out of band", in the sense that existing XPath queries are probably not going to break. At worst, you need to introduce various elements into your document to appease RXSS.
Now if cheap, fully-debugged, performant RDF tools were commonly available, we could all become RDF developers and leave XML behind. Until this is the case, however, some of us (the vast majority of us, I think) must still work at the XML level. At that level, RXSS fails. It fails because you can't take an existing XML document and treat it as an RXSS document.
For starters, RXSS demands that the root element be rdf:RDF (where rdf is the namespace prefix for the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#). Existing specialized tools you have built for your XML documents all need to be checked and possibly modified to handle the old root element suddenly becoming a child element. This is stupid.
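A rough sketch of what that breakage looks like, again in Python. The wrapped document below is a deliberately simplified illustration of "old root becomes a child of rdf:RDF", not a real RSS 1.0 feed, and channel_title stands in for whatever specialized tool you already have.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

OLD = """<rss version="0.91">
  <channel><title>Example channel</title></channel>
</rss>"""

# Illustrative only: the old root pushed one level down under rdf:RDF.
WRAPPED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rss version="0.91">
    <channel><title>Example channel</title></channel>
  </rss>
</rdf:RDF>"""

def channel_title(xml_text):
    """A hypothetical existing tool that assumes <rss> is the root."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        return None  # not a document this tool understands
    node = root.find("channel/title")
    return node.text if node is not None else None

old_result = channel_title(OLD)
wrapped_result = channel_title(WRAPPED)
print(old_result, wrapped_result)
```

Nothing about the channel data changed, yet the tool no longer recognizes the document, because every path it knows is now off by one level.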
Before I continue pouring cold water on RXSS, I want to discuss an important issue Shelley raises. In her answer, Shelley says:
[U]sing straight XML is equivalent to only allowing communication with one verb -- To Have.
I respectfully disagree. The XML spec does not say what "<x><m><l/></m></x>" means. The XML spec is about a model (a tree, with some special considerations such as the notion of order of child elements) and a way to serialize and de-serialize this model to and from text files. What an XML application does with the data is completely unspecified.
Nothing in XML says that the x element (note that in XML we're talking about elements, not the entities they represent, because there's no such association) "has" the m element. This association can be made in various ways. For example, you can write an XML application that has a special knowledge that whenever it sees an m element that is a child of the x element, their relation is that of ownership ("has"). Or, you can use XML Schema to say a similar thing (for example, you assert that the x element is an object of a class and the m element is a data field of that class). Or, you could have an RXSS document that says the relation of an x element that has a child element m is that of ownership. Only you can't, because RXSS is not out-of-band, and seems a bit obsessive about making all predicates explicit.
Reading the RXSS document, I couldn't [****] come up with a way to keep the existing structure of RSS 0.9x without requiring me to modify existing documents. In theory, RDF should allow it (see RDF Model Theory). There's no reason why we can't map an existing RSS 0.9x document to an RDF graph, other than RXSS's artificial restrictions. I can take an RSS 0.9x document and derive from it the necessary <subject, predicate, object> triplets with ease.
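Here is roughly what I mean by deriving the triplets, as a minimal Python sketch. The vocabulary URI http://example.org/rss09x# is invented for the example, the "_:" names are just labels for blank nodes, and the mapping rule (every child element becomes a predicate named after its tag) is my own assumption, not anything a spec defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URI; no such namespace is defined anywhere.
VOCAB = "http://example.org/rss09x#"

RSS = """<rss version="0.91">
  <channel>
    <title>Example channel</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""

def rss_to_triples(xml_text):
    """Map an unmodified RSS 0.9x document to (subject, predicate, object)."""
    channel = ET.fromstring(xml_text).find("channel")
    subject = "_:channel"  # the channel as a blank node
    triples = []
    item_count = 0
    for child in channel:
        if child.tag == "item":
            item_count += 1
            item_node = f"_:item{item_count}"
            # Nesting alone tells us the channel is related to the item.
            triples.append((subject, VOCAB + "item", item_node))
            for prop in child:
                triples.append((item_node, VOCAB + prop.tag, (prop.text or "").strip()))
        else:
            triples.append((subject, VOCAB + child.tag, (child.text or "").strip()))
    return triples

triples = rss_to_triples(RSS)
for t in triples:
    print(t)
```

The document itself is untouched; all the RDF knowledge lives in the few lines of mapping code, which is exactly the kind of thing an out-of-band description could carry.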
But while such mapping can be done on paper, it can't be done in RXSS. For example, I would like the channel element in RSS to be an RDF resource. I can't just say that the "//rss/channel" element (to use XPath) is a nameless resource (a "blank node", I think is the proper RDF name). No, I have to indicate it is a resource by adding an rdf:about attribute. (There are other ways, and they all require some modification to the element.)
Now, RSS is a very simple format. The channel element is always going to be a resource, and while it seems reasonable (indeed, useful) to have a global name for an RSS channel, I don't see why RXSS is not flexible enough to let me keep the RSS structure as-is, while providing the information necessary for constructing an RDF model in an out-of-band file.
Consider another example. In RSS 1.0, the items of a channel are not child elements of the channel element. This is not too disturbing, right? However, there's no way to associate the channel with its items other than to introduce an RDF collection element into the channel element consisting of links to the items. Now imagine what this means: whenever an item is added to or removed from the RSS document, a change must occur in two places -- the item itself, and the channel's item collection. Why an external document can't say that all the item elements in the document are associated (using the same predicate!) with the channel resource is beyond me.
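The "two places" problem is easy to demonstrate. The sketch below uses a simplified RSS 1.0-style document that I wrote by hand (the URLs are invented), in which someone added a second item but forgot to list it in the channel's collection. The function name unsynced_items is mine.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Simplified, hand-written sample: two items exist, only one is listed.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{RSS}">
  <channel rdf:about="http://example.org/rss">
    <title>Example channel</title>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/1"><title>First post</title></item>
  <item rdf:about="http://example.org/2"><title>Second post</title></item>
</rdf:RDF>"""

def unsynced_items(xml_text):
    """Return (items missing from the collection, collection entries with no item)."""
    root = ET.fromstring(xml_text)
    listed = {li.get(f"{{{RDF}}}resource") for li in root.iter(f"{{{RDF}}}li")}
    present = {item.get(f"{{{RDF}}}about") for item in root.findall(f"{{{RSS}}}item")}
    return sorted(present - listed), sorted(listed - present)

missing_from_seq, dangling = unsynced_items(DOC)
print(missing_from_seq, dangling)
```

With items nested inside the channel, as in RSS 0.9x, this whole class of inconsistency cannot occur, because there is only one place to edit.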
Shelley later writes (and this is, I think, the central two paragraphs of her answer, around which all others revolve):
Representing this within XML requires a set of syntactic rules that ensures we don't accidentally shove a predicate next to a predicate and so on. There are rules for how to identify a subject, and how to add a predicate. There are rules for how to repeat properties (predicate-object pairs), and how to group properties. There are even rules for how to create a statement about a statement (known in RDF as 'reification', though I prefer 'RDF's Big Ugly', myself). But fundamentally the rules break down into nothing more than node-edge-node-edge-node, forming a particularly interesting XML pattern called The Striped RDF/XML syntax.
Rule's that basically say that predicates can't be nested directly beneath predicates (edges next to edges) or that whole node-edge-node thing gets blown out of the water. And rules that state when an rdf:about attribute can be applied. In my simplified RDF/RSS, the rdf:about attribute can't be applied directly to the ITEM element because ITEM in this instance is acting as a predicate, with an implied URI of "item" -- it can't act as a new subject, too. Edge-edge.
It is here that the difference between RDFers and XMLers is greatest, I think. RDF not only has this beautiful model (the RDF graph) and a simple way to construct it (triplets). It also has a language, RXSS, that says what a serialization of the model looks like, and that language is incompatible with existing XML practices. To an XML guy, there's nothing wrong with an edge/edge construct, as long as somewhere, somehow, there is a schema that says such constructs imply there's an arc with a known URI leading from one to the other.
RXSS is a language unto itself. What's the difference between document formats and languages? People write document formats assuming they'll have to parse them, and so they try to constrain document creators as far as possible (without losing expressiveness, of course). Languages, OTOH, are usually much looser -- they usually provide more than one way to do something. RXSS is a language in this respect, as it can say the same thing in many different syntaxes. While this is very nice to document authors (who get to pick the syntax that's most useful to them), people who parse the document get the dirty job of supporting multiple syntaxes. As I indicated already, if you have an RDF parser that reads the syntax and builds a normalized object database for you, you generally don't care. But if your tool is the plain-old XML parser, you really want to limit the freedom authors have. While the core RSS 1.0 is quite nice in that it restricts the syntax people can use in documents, I get the feeling some of its extension modules are based on the assumption consumers have full RDF toolkits. No constraints are put on the variety of syntax options RXSS allows.
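As a small illustration of "the same thing in many different syntaxes": as far as I understand RXSS, a literal-valued property can be written either as a child element or, in the abbreviated form, as an attribute on the describing element. The two hand-written documents below make the same single statement; the URLs are invented, and naive_title and tolerant_title are hypothetical consumers working at the plain XML level.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# The property written as a child element.
AS_ELEMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/1">
    <dc:title>First post</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# The same statement in the abbreviated property-attribute form.
AS_ATTRIBUTE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/1" dc:title="First post"/>
</rdf:RDF>"""

def naive_title(xml_text):
    """A consumer written against the element form only."""
    node = ET.fromstring(xml_text).find(f"{{{RDF}}}Description/{{{DC}}}title")
    return node.text if node is not None else None

def tolerant_title(xml_text):
    """A consumer that special-cases both forms by hand."""
    desc = ET.fromstring(xml_text).find(f"{{{RDF}}}Description")
    child = desc.find(f"{{{DC}}}title")
    return child.text if child is not None else desc.get(f"{{{DC}}}title")

print(naive_title(AS_ELEMENT), naive_title(AS_ATTRIBUTE))
print(tolerant_title(AS_ELEMENT), tolerant_title(AS_ATTRIBUTE))
```

And that is just one property and two of the available forms. Every additional syntax option means another branch like the one in tolerant_title for anyone without a real RDF parser.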
Bottom line is, RSS can be a showcase problem for RDF. The "core" RSS has shown itself to be both simple and useful. RSS extension ideas are invented by the dozens, and many of them look like they're in the problem domain RDF was created to solve (describing resources and their relations). However, RSS developers are more likely to develop using XML tools (and thinking) rather than RDF tools. If we could take an existing XML spec, add a single "link-to-RDF-schema" attribute to it, and get something which is both compatible with existing tools and compliant with RDF tools, then we would have achieved something.
[*] If this doesn't bring a smile to you, Bb, I don't know what will.
[**] Don't take it the wrong way: To me, an "RDF Person" is an "XML Person" who -- instead of modelling knowledge in terms of elements and attributes -- is trying to climb up the ladder of abstractions by using RDF graphs. If RDF proves successful, we'll all climb that ladder.
[***] With Yom Kippur only a few days away, I ask in advance the forgiveness of all the XML developers who don't want me to represent them.
[****] A bit late, perhaps, but better than never: I base this post on what I understood from reading various documents on RDF, primarily the primer, RXSS, RDF Schema, and the first section of the RDF Model. It is entirely possible that there are ways of doing all this, which I have missed. If so, they are not mentioned in the RSS 1.0 spec, and so all my rants should be lifted from RXSS and transferred to that document instead.