Saturday, April 5, 2003
> RSS 2 :: Breaking it down
Exploring topics in RSS2.0.

I've been doing some thinking about how to encode topic information into RSS2.0 feeds.  As a simple test of the Radio callback facility I have implemented a very simplistic protocol.  Within each <item> is a tag

<topic id="topic_id" type="topic-type" source="url">topic name</topic>

for each topic associated with the item (post).  A concrete example (using the rsstopics namespace):

<rsstopics:topic rsstopics:id="the_state" rsstopics:source="http://matt.blogs.it/topics/topicsT.html#the_state" rsstopics:type="generic">the state</rsstopics:topic>
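As a rough sketch of how a consumer might read these tags back out of a feed, the Python below pulls the topic elements from each item.  The namespace URI http://example.org/rsstopics and the surrounding feed are placeholders invented for illustration; the real rsstopics namespace URI is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
RSSTOPICS = "http://example.org/rsstopics"

# A made-up RSS 2.0 feed carrying the concrete example topic element.
FEED = f"""
<rss version="2.0" xmlns:rsstopics="{RSSTOPICS}">
  <channel>
    <title>Example</title>
    <item>
      <title>A post</title>
      <rsstopics:topic rsstopics:id="the_state"
          rsstopics:source="http://matt.blogs.it/topics/topicsT.html#the_state"
          rsstopics:type="generic">the state</rsstopics:topic>
    </item>
  </channel>
</rss>
"""


def qualified(name):
    """Return the ElementTree form of a name in the rsstopics namespace."""
    return f"{{{RSSTOPICS}}}{name}"


def item_topics(feed_xml):
    """Return one list of topic dicts per <item>, in document order."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        result.append([
            {
                "id": topic.get(qualified("id")),
                "type": topic.get(qualified("type")),
                "source": topic.get(qualified("source")),
                "name": (topic.text or "").strip(),
            }
            for topic in item.findall(qualified("topic"))
        ])
    return result


TOPICS_PER_ITEM = item_topics(FEED)
```

Because the attributes are namespace-qualified in the concrete example, they have to be looked up by their qualified names too.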

Whilst this does have the advantage that it's simple and direct, it's also a bit silly to invent a new format for topic information when we have two standard culprits available already: RDF and XML Topic Maps (XTM).

RDF is a general format for describing resources.  A resource in RDF terms is anything which can be uniquely identified by a URI.  An RDF statement (utilizing Dublin Core metadata) that asserts me as the owner of my weblog might look something like:

<rdf:Description rdf:about="http://matt.blogs.it">
    <dc:creator>Matt Mower</dc:creator>
</rdf:Description>

If you cut away the syntactic fluff what this says is:

Matt Mower is the Creator of http://matt.blogs.it
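To make the "cut away the syntactic fluff" step concrete, here is a minimal Python sketch that reduces the description to a (subject, predicate, object) triple.  It assumes the description is wrapped in an rdf:RDF root, uses the standard RDF and Dublin Core namespace URIs with the lowercase dc:creator element, and only handles literal property values, so it is nowhere near a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The statement from the text, wrapped in an rdf:RDF root (an assumption).
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://matt.blogs.it">
    <dc:creator>Matt Mower</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def statements(rdf_xml):
    """Yield (subject, predicate, object) for each literal property."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # prop.tag is the property name in {namespace}local form.
            yield (subject, prop.tag, (prop.text or "").strip())


TRIPLES = list(statements(RDF_XML))
```

The single triple that comes out has the weblog URL as its subject, the Dublin Core creator property as its predicate and the name as its object.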

Referring back to the problem at hand, describing what a post (expressed as an RSS item) is about, we could come up with something like:

<item rdf:about="permalink">
    <topic id="topic_id" type="topic-type" source="url">topic name</topic>
</item>

Which is more or less exactly where we started -- using RDF hasn't altered the solution but it has added some framework around it (in this case adding rdf:about to signal the presence of RDF data within the item).  However we can go a step further.  A useful article by Eric van der Vlist discusses this very subject and refers to the RSS1.0 taxonomy module.

Somewhat counter to what you would expect, RSS2.0 does not follow on from RSS1.0, nor does RSS1.0 follow on from the popular RSS0.9x formats.  RSS1.0 is, depending upon your point of view, a step forward or an aberration.  RSS1.0 uses a modular set of RDF-based tags to describe items in the RSS feed.  One such module is the Taxonomy module, which is intended to allow classification of RSS channels & items.

Using the taxonomy module you create something like:

<item rdf:about="permalink">
    <taxo:topics>
        <rdf:Bag>
            <rdf:li resource="topic-uri-1"/>
            <rdf:li resource="topic-uri-2"/>
        </rdf:Bag>
    </taxo:topics>
</item>

Here the <taxo:topics> element contains a list (using the RDF-defined Bag, or unordered list, container element) of resources indicating topics that describe the item.  Each resource then has a <taxo:topic> element that describes the topic.  It might look something like:

<taxo:topic rdf:about="http://matt.blogs.it/topics/topicsT.html#the_state">
    <taxo:link>http://matt.blogs.it/topics/topicsT.html#the_state</taxo:link>

    <rsstopics:type>generic</rsstopics:type>
    <dc:title>The State</dc:title>
</taxo:topic>

Although it's a jumble of RDF, the RSS1.0 taxonomy module, Dublin Core and a custom rsstopics schema, this says exactly the same thing as the original:

<topic id="topic_id" type="topic-type" source="url">topic name</topic>
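The two-step lookup in the taxonomy version (an item lists topic URIs, and each URI is described by its own taxo:topic element) can be sketched in Python as follows.  The feed is a made-up minimal RSS1.0 document, http://example.org/post/1 is a placeholder permalink, and the custom rsstopics:type element is left out because its namespace URI is not given.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "taxo": "http://purl.org/rss/1.0/modules/taxonomy/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF = "{%s}" % NS["rdf"]

# Illustrative feed; the item permalink is a placeholder.
FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/post/1">
    <taxo:topics>
      <rdf:Bag>
        <rdf:li resource="http://matt.blogs.it/topics/topicsT.html#the_state"/>
      </rdf:Bag>
    </taxo:topics>
  </item>
  <taxo:topic rdf:about="http://matt.blogs.it/topics/topicsT.html#the_state">
    <taxo:link>http://matt.blogs.it/topics/topicsT.html#the_state</taxo:link>
    <dc:title>The State</dc:title>
  </taxo:topic>
</rdf:RDF>
"""


def topics_by_item(feed_xml):
    """Map each item's permalink to the titles of the topics it lists."""
    root = ET.fromstring(feed_xml)
    # Index the topic descriptions by the URI they are about.
    titles = {
        topic.get(RDF + "about"): topic.findtext("dc:title", default="", namespaces=NS)
        for topic in root.findall("taxo:topic", NS)
    }
    result = {}
    for item in root.findall("rss:item", NS):
        uris = [
            # Accept both the unqualified and the rdf-qualified attribute.
            li.get("resource") or li.get(RDF + "resource")
            for li in item.findall("taxo:topics/rdf:Bag/rdf:li", NS)
        ]
        # Fall back to the bare URI when no description is present.
        result[item.get(RDF + "about")] = [titles.get(uri, uri) for uri in uris]
    return result


ITEM_TOPICS = topics_by_item(FEED)
```

The indirection is what makes this version heavier than the single tag: the item only carries URIs, and everything readable about the topic lives in a separate element.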

But do we have to deal with such an ugly mess?  Perhaps not.  Our original choices included the XML Topic Maps format.  This is a complete specification for exchanging topic information.  An example of a topic in XTM format might look something like:

<topic id="the_state">
    <instanceOf>
        <topicRef xlink:href="http://www.purl.org/rss-topics/rss-topics#generic"/>
    </instanceOf>
    <baseName>
        <baseNameString>The State</baseNameString>
    </baseName>
    <occurrence id="the-state-item">
        <instanceOf>
            <topicRef xlink:href="http://www.purl.org/rss-topics/rss-topics#story"/>
        </instanceOf>
        <resourceRef xlink:href="<permalink-uri>"/>
    </occurrence>
</topic>

Again this encodes the same information, using a standard format and only one required namespace (that of XTM itself).  A URI such as http://www.purl.org/rss-topics/rss-topics#generic points at a topic in another map (in this case a topic describing the topic-type generic).
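For comparison with the RDF version, here is a hedged Python sketch that reads the same three facts (name, topic type, occurrence) out of an XTM topic.  It assumes the topic sits inside a topicMap root that declares the XTM 1.0 namespace along with XLink for the href attributes, and it substitutes the placeholder http://example.org/post/1 for the permalink.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"xtm": XTM}
HREF = "{%s}href" % XLINK

# The example topic wrapped in a topicMap root; the permalink is a placeholder.
TOPIC_MAP = """
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="the_state">
    <instanceOf>
      <topicRef xlink:href="http://www.purl.org/rss-topics/rss-topics#generic"/>
    </instanceOf>
    <baseName>
      <baseNameString>The State</baseNameString>
    </baseName>
    <occurrence id="the-state-item">
      <instanceOf>
        <topicRef xlink:href="http://www.purl.org/rss-topics/rss-topics#story"/>
      </instanceOf>
      <resourceRef xlink:href="http://example.org/post/1"/>
    </occurrence>
  </topic>
</topicMap>
"""


def read_topics(xtm_xml):
    """Return {topic id: name, type URI and occurrence URIs}."""
    root = ET.fromstring(xtm_xml)
    topics = {}
    for topic in root.findall("xtm:topic", NS):
        # Only the topic's own instanceOf, not the one inside the occurrence.
        type_ref = topic.find("xtm:instanceOf/xtm:topicRef", NS)
        topics[topic.get("id")] = {
            "name": topic.findtext(
                "xtm:baseName/xtm:baseNameString", default="", namespaces=NS
            ),
            "type": type_ref.get(HREF) if type_ref is not None else None,
            "occurrences": [
                ref.get(HREF)
                for ref in topic.findall("xtm:occurrence/xtm:resourceRef", NS)
            ],
        }
    return topics


TOPICS = read_topics(TOPIC_MAP)
```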

The use of XTM comes with a number of advantages, the main one being that there are an increasing number of tools available to process & manipulate it (for example, see topicmap.com).  However there are also a number of problems with this representation when you attempt to embed it within another XML format such as RSS.

  • It's not clear whether an XTM fragment such as this is valid when used in this way
  • Each time a topic is used we will be duplicating its details, bloating the markup & potentially creating invalid entries
  • The <occurrence> relation within the <topic> element is technically redundant.  The enclosing <item> indicates the occurrence.

One way to avoid these problems would be to embed the topics within the RSS <channel> definition and refer to them from each <item>.  However we still need a way to refer to the topic and XTM doesn't provide this.  If we had a good way to reference topics then we could either embed a mini topic map within the RSS file, or just have the <topicMap> in an external file and point to it.  What could we use?  One possibility is RDF.

Using a combination of RDF and XTM would mean something like:

<item rdf:about="<permalink-uri>">
    <rsstopics:topic>http://www.example.org/myTopicMap.xtm#topic-id</rsstopics:topic>    <!-- XTM in an external map -->
</item>

or

<item rdf:about="<permalink-uri>">
    <rsstopics:topic>#topic-id</rsstopics:topic>  <!-- XTM element inline in the RSS -->
</item>

In this example the item now refers to an XTM-defined topic, either elsewhere in the RSS feed (contained within a valid <topicMap> element) or within an external topic map.  The referenced <topic> element can further describe the topic (names, types and so on) using all the expressiveness of XTM.  It's also efficient since there is no duplicated information within the feed.
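A consumer has to tell the two kinds of reference apart, and one plausible way to do it is to split on the fragment: an empty document part means the topic is inline in the feed, anything else names an external map.  The sketch below assumes the topic maps have already been parsed into plain dictionaries keyed by topic id; the helper names and the sample data are invented for illustration.

```python
from urllib.parse import urldefrag


def split_topic_ref(ref):
    """Return (map_url, topic_id); map_url is None for an inline reference."""
    url, fragment = urldefrag(ref.strip())
    if not fragment:
        raise ValueError(f"topic reference has no fragment: {ref!r}")
    return (url or None, fragment)


def resolve(ref, inline_topics, external_maps):
    """Look a reference up in the inline map or in a pre-loaded external map.

    inline_topics: {topic_id: topic} for the map embedded in the feed.
    external_maps: {map_url: {topic_id: topic}} for maps fetched elsewhere.
    Returns None when the map or the topic is unknown.
    """
    map_url, topic_id = split_topic_ref(ref)
    if map_url is None:
        topics = inline_topics
    else:
        topics = external_maps.get(map_url, {})
    return topics.get(topic_id)


# Invented sample data standing in for parsed topic maps.
INLINE = {"topic-id": {"name": "Inline topic"}}
EXTERNAL = {
    "http://www.example.org/myTopicMap.xtm": {
        "topic-id": {"name": "External topic"},
    },
}

inline_hit = resolve("#topic-id", INLINE, EXTERNAL)
external_hit = resolve(
    "http://www.example.org/myTopicMap.xtm#topic-id", INLINE, EXTERNAL
)
```

Fetching and caching the external map is deliberately left out; that is where most of the practical cost of the external form would sit.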

I have described approaches using RDF, XTM and a hybrid of the two.  Each has advantages and disadvantages although I believe the hybrid makes the best use of both formats.

I'd welcome comments and/or opinions from interested parties.

[Curiouser and curiouser!]
> RSS Resources.
Extensive listing of RSS resources: Processing RSS with PHP, Python, Java, Perl, and XSLT... [elearnspace blog]

Another extensive directory of RSS resources.