Expanding the role of RSS into data-oriented applications
For well over 5 years, I've been excited about the role of syndication in evolving how the Internet is used and applied, and am thrilled with the progress that's been made with RSS as a common standard for content syndication, and with SOAP web services for application integration and communication. Content and data syndication represent a powerful model for value exchange in the Internet economy, and open up the possibilities for cooperating applications.
Both RSS and SOAP enable forms of distributed collaboration based on syndicated business models. In theory, RSS can be applied for applications where simple content can be published and subscribed to, and SOAP can be applied for applications where real-time, synchronous data access and transactions are involved. This distinction feels roughly accurate. Increasingly, however, RSS advocates are seeing the power of asynchronous, pub/sub style data exchange and are attempting to use RSS and RSS namespaces to accommodate these applications. While asynchronous SOAP messages could provide a substitute, they require stateful runtime end-points, which sacrifices the flexibility and power of RSS feeds as literal documents, and they introduce a level of complexity that data-oriented syndication applications do not need.
What's needed is a simple data language that can enhance RSS 2.0 applications, expanding its role into a much broader range of data-oriented applications, rather than its current, predominant focus on news and content-oriented applications.
RSS - Keep It Simple Stupid
RSS has been adopted because of its relative simplicity, and that makes it beautiful in my view. RSS solved a specific problem in a relatively general way, and now it's been adopted en masse, the lingua franca of content syndication on the Internet. As RSS readers/clients and parsers proliferate, bright people all over are thinking about ways to tunnel other non-news information into the format. There are hundreds of examples, ranging from bug reports, to classifieds, to calendar data, to dating information -- essentially anything that is text and can be contained in an item, with basic Dublin Core meta-data.
But this approach quickly and clearly breaks down when one wants to share more structured information, such as a purchase order and its fields, or a discussion thread and its tree structure. Even a calendar item will contain much more complicated meta-data.
RSS 2.0 includes an extensibility mechanism by way of XML namespaces, and some existing modules take advantage of this capability. This could solve the problem, but it re-introduces the issue of having to write a custom parser for every new application that extends RSS -- developers would need to parse and map the elements of a calendar XML format, for example, which would work, but requires more effort and isn't portable across other application types.
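To make that cost concrete, here is a minimal Python sketch of the per-vocabulary parsing code a namespace extension tends to demand. The cal namespace URI and its start, end, and location elements are invented for illustration and do not correspond to any published RSS module.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar namespace, made up for this example.
CAL_NS = "http://example.com/ns/calendar"

FEED = f"""<rss version="2.0" xmlns:cal="{CAL_NS}">
  <channel>
    <title>Team calendar</title>
    <item>
      <title>Design review</title>
      <cal:start>2003-11-05T14:00:00Z</cal:start>
      <cal:end>2003-11-05T15:00:00Z</cal:end>
      <cal:location>Room 4</cal:location>
    </item>
  </channel>
</rss>"""


def parse_calendar_items(feed_xml):
    """Map each <item> to a dict using hard-coded calendar knowledge."""
    root = ET.fromstring(feed_xml)
    events = []
    for item in root.iter("item"):
        events.append({
            "title": item.findtext("title"),
            # Each field below has to be known in advance by the parser author.
            "start": item.findtext(f"{{{CAL_NS}}}start"),
            "end": item.findtext(f"{{{CAL_NS}}}end"),
            "location": item.findtext(f"{{{CAL_NS}}}location"),
        })
    return events


events = parse_calendar_items(FEED)
```

Nothing in parse_calendar_items carries over to a purchase order or a discussion thread; each new vocabulary needs its own mapping function.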
RDF is supported in earlier RSS specifications as a means for data extensibility, but RDF is cumbersome, difficult to read and write, and doesn't map cleanly to the kinds of simple data structures that exist in Internet scripting languages, such as structs and arrays.
SOAP emerged because we needed a common data and messaging model for the exchange of object data and messages between programs. Back in 1998, most people thought the world would evolve into thousands of different XML "vocabularies", where programs would access and share these XML documents. Amazingly, not a lot of people understood what a cumbersome world that would be, and that most of the interesting integration use cases were things that would be better left to object-level protocols. Fortunately, we ended up in a good place -- SOAP enables the benefits of loosely coupled applications without the pain of having to define lots of custom over-the-wire formats.
In some respects, RSS is a great message envelope for asynchronous data, but without an over-the-wire data format.
What we need is a simple data model that can expand the use of RSS into application arenas, enabling applications to output RSS with object data, and clients and other applications to easily and predictably include that data. In other words, RSS needs a schema, but it's not XML Schema.
A bit of history: WDDX, XML-RPC, SOAP
My interest in applications based on data syndication goes back to the origins of ColdFusion, but first really manifested itself with the introduction of the Web Distributed Data Exchange (WDDX) format back in 1998. WDDX was designed to enable Internet programming languages to easily exchange data -- synchronously or asynchronously. It was a simple object serialization format written in XML. Eventually, nearly every Internet scripting language supported it -- Perl, PHP, Python, COM/VB/VC, Java, ColdFusion.
Right around the same time, Dave Winer was evangelizing XML-RPC as a format to accomplish similar things, though it also included (and required) an RPC-oriented message envelope. We (Allaire) didn't think the message envelope was necessary, because we envisioned many types of applications where object data exchange would occur without an API invocation, and in an asynchronous manner. Dave and I used to butt heads about this.
A short time later, Microsoft became actively involved in this space, and was looking at WDDX, XML-RPC and SOAP as formats to be the basis of "web services". SOAP won the day, presumably because it offered a good message envelope with extensibility, and because it used XML Schema, which could reflect object data in its richest form. SOAP used the "proper" layering of standards to create a powerful, extensible protocol.
A Simple Data Language
I'd like to revisit some of these standards, especially in light of the incredible growth and prowess of RSS. The world of data syndication (publish/subscribe) can be a transformative element in the emerging Internet landscape, and the standards we have today just don't quite cut it.
A few months ago I approached Dave Winer and a few other people with a very simple idea. Why not use XML-RPC's data serialization format to create a simple data language for object meta-data in RSS (and other!) applications? Interestingly, if you subtract the message envelope from XML-RPC and add Unicode and time-zone support to the standard, you've actually got WDDX, quite literally. Dave really liked the idea, and together we came up with RSS-Data.
Why use RSS-Data? Pragmatism. Because of the rapid growth of blogging software, XML-RPC parsers are already implemented in dozens of languages and platforms. As a result, a simple data language based on XML-RPC's data model could emerge in a matter of days or weeks, as developers quickly refactor their parsers to simply provide data serialization/deserialization components.
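As a rough illustration of how little refactoring this might take, the sketch below uses Python's standard xmlrpc.client module to serialize a struct without any method-call envelope and then read it back. The purchase order fields are made up for the example, and the output uses XML-RPC's own type names and its params/param/value wrappers, which an RSS-Data library would presumably strip or rename.

```python
import xmlrpc.client

# Illustrative payload; the field names are assumptions, not part of any spec.
purchase_order = {
    "po_number": "PO-1001",
    "approved": True,
    "total": 249.5,
    "lines": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# With no methodname and no methodresponse, dumps() emits only a <params>
# fragment: there is no <methodCall> or <methodResponse> wrapper around it.
fragment = xmlrpc.client.dumps((purchase_order,))

# loads() returns (params, methodname); methodname is None for a bare fragment.
(decoded,), method_name = xmlrpc.client.loads(fragment)
```

The serialization and deserialization logic is already in the library; only the envelope handling would need to be set aside.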
RSS-Data would require no changes or revisions to RSS 2.0, though developers wishing to support RSS-Data would obviously need to write RSS parsers that recognize and deserialize RSS-Data in the <sdl:data> element. But, rather than writing custom parsers for every new namespace extension to RSS, developers could confidently work with just one RSS-Data parser that handled 99% of their application meta-data needs.
Here's what I think is necessary for RSS-Data, which is almost literally the XML-RPC data serialization model.
- Same data model, including all elements such as <struct>, <array>, <boolean>, <dateTime>, <string>, <number>, <base64binary>, etc.
- Unicode-based, fixing a known problem with XML-RPC
- Time-zone aware, also fixing a known problem with a variety of serialization approaches
RSS-Data could be used inside any RSS 2.0 element that can contain namespace extensions, including <item>, <channel>, and inside other custom namespaces. Likewise, other XML applications in need of a simple object data exchange format could use the <sdl> namespace to extend their applications.
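Here is one way this could look in practice, as a minimal Python sketch and not a definitive implementation. Several things are assumptions: the namespace URI is a placeholder, the member/name layout for structs is borrowed from XML-RPC with its value wrapper left out for brevity, and the event fields are invented. The element names follow the list above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SDL_NS = "http://example.com/ns/sdl"  # placeholder URI, an assumption
ET.register_namespace("sdl", SDL_NS)


def q(tag):
    """Qualify a tag name with the sdl namespace."""
    return f"{{{SDL_NS}}}{tag}"


def to_sdl(value):
    """Serialize a Python value into an XML-RPC-style element tree."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        node = ET.Element(q("boolean"))
        node.text = "1" if value else "0"
    elif isinstance(value, (int, float)):
        node = ET.Element(q("number"))
        node.text = repr(value)
    elif isinstance(value, str):
        node = ET.Element(q("string"))
        node.text = value
    elif isinstance(value, datetime):
        node = ET.Element(q("dateTime"))
        node.text = value.isoformat()  # keeps the UTC offset when tz-aware
    elif isinstance(value, (list, tuple)):
        node = ET.Element(q("array"))
        for element in value:
            node.append(to_sdl(element))
    elif isinstance(value, dict):
        node = ET.Element(q("struct"))
        for name, item in value.items():
            member = ET.SubElement(node, q("member"))
            ET.SubElement(member, q("name")).text = name
            member.append(to_sdl(item))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return node


def from_sdl(node):
    """Deserialize an element produced by to_sdl back into a Python value."""
    tag = node.tag.split("}", 1)[-1]  # drop the namespace part
    if tag == "boolean":
        return node.text == "1"
    if tag == "number":
        try:
            return int(node.text)
        except ValueError:
            return float(node.text)
    if tag == "string":
        return node.text or ""
    if tag == "dateTime":
        return datetime.fromisoformat(node.text)
    if tag == "array":
        return [from_sdl(child) for child in node]
    if tag == "struct":
        return {member[0].text: from_sdl(member[1]) for member in node}
    raise ValueError(f"unknown element: {tag}")


def build_item(title, data):
    """Create an RSS <item> carrying object data inside <sdl:data>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, q("data")).append(to_sdl(data))
    return item


event = {
    "summary": "Design review",
    "start": datetime(2003, 11, 5, 14, 0, tzinfo=timezone.utc),
    "attendees": ["ann", "raj"],
    "confirmed": True,
    "room": 4,
}
xml_text = ET.tostring(build_item("Design review", event), encoding="unicode")
round_tripped = from_sdl(ET.fromstring(xml_text).find(q("data"))[0])
```

A client that knows nothing about calendars can still recover the full struct, because to_sdl and from_sdl depend only on the generic data model and not on any application vocabulary.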
A New World of Data Syndication Applications
My hope is that RSS-Data will open up a much wider range of data syndication applications layered on top of RSS. Whether it be a calendar data exchange format, or a better way to do trackbacks and threaded comments, RSS-Data has the potential to make RSS much more powerful than it is today.
RSS-Data library builders, let's get going on this!