Jeremy Allaire's Radio

Wednesday, October 01, 2003

More discussion on RSS/XML-Data
There's been good discussion in response to the idea of RSS-Data. I've included a few thoughts below in response to questions and concerns.

I'm not sure I get it, or I quite understand what the client is in this case. Is the vision RSS aggregators that go out and harvest all kinds of data on behalf of a user and hand it off to his or her calendars, recipe books, shopping lists and so forth?

Yes, the idea is that a new breed of RSS aggregators would emerge around domain-specific application areas. I can imagine these aggregators running on servers and in rich clients. With the advent of clients like Macromedia Central, and in the future, .NET client applications, I think it will become more common for client applications to aggregate data across disparate feeds in the network. For example, one might create a "Classifieds Viewer" that subscribed to and aggregated multiple distributed RSS-Data enabled classifieds feeds into a single view. In such an example, RSS-Data could be added to each <item> to provide additional structured data, such as price, quantity, geography, payment details, and other things relevant to a product.
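To make the "Classifieds Viewer" idea a bit more concrete, here is a minimal Python sketch of the aggregation step. It assumes each feed's items have already been deserialized into plain dictionaries. The field names (title, price, geography) and the merge_classifieds helper are illustrative only and not part of any spec.

```python
# Hypothetical, already-deserialized items from two classifieds feeds.
feed_a = [
    {"title": "Road bike", "price": 250.0, "geography": "Boston"},
    {"title": "Desk lamp", "price": 15.0, "geography": "Cambridge"},
]
feed_b = [
    {"title": "Bookshelf", "price": 40.0, "geography": "Boston"},
    {"title": "Free firewood", "geography": "Newton"},  # no price field
]


def merge_classifieds(feeds, max_price):
    """Combine items from several feeds into one view, cheapest first.

    Items without a price are skipped, since they cannot be compared.
    """
    merged = [
        item
        for feed in feeds
        for item in feed
        if "price" in item and item["price"] <= max_price
    ]
    return sorted(merged, key=lambda item: item["price"])


view = merge_classifieds([feed_a, feed_b], max_price=100.0)
for item in view:
    print(f'{item["title"]:<10} {item["price"]:>7.2f}  {item["geography"]}')
```

The point of the sketch is that the viewer filters and sorts on typed fields directly, which it could not do if the price were buried in a description string.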

A recipe aggregator would be another good example, per the above. A client or server application could subscribe to a diverse set of recipe feeds. In the current RSS world, the recipe "data" is constrained to being one large text blob, making it not very useful to a client application that is geared towards rich display and manipulation of recipe information. One can imagine wanting to have access to 'ingredients' as a struct, containing arrays of items with quantity information, etc.
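As a rough illustration of why the structured form matters, the Python sketch below shows what a recipe item might look like once deserialized, and one thing a client can do with it that is impractical with a text blob: rescaling quantities. The field names and the scale_recipe helper are assumptions made for this example.

```python
# Hypothetical shape of a recipe item after deserialization:
# a struct holding an array of ingredient structs.
recipe = {
    "name": "Pancakes",
    "servings": 4,
    "ingredients": [
        {"item": "flour", "quantity": 2.0, "unit": "cup"},
        {"item": "milk", "quantity": 1.5, "unit": "cup"},
        {"item": "egg", "quantity": 2.0, "unit": "whole"},
    ],
}


def scale_recipe(recipe, servings):
    """Return a copy of the recipe with quantities scaled to `servings`."""
    factor = servings / recipe["servings"]
    scaled = dict(recipe, servings=servings)
    scaled["ingredients"] = [
        dict(ingredient, quantity=ingredient["quantity"] * factor)
        for ingredient in recipe["ingredients"]
    ]
    return scaled


doubled = scale_recipe(recipe, 8)
for ingredient in doubled["ingredients"]:
    print(ingredient["quantity"], ingredient["unit"], ingredient["item"])
```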

Ultimately, I'd love to see RSS reader/aggregators become much more generic application clients that can be extended to support additional RSS-Data formats. I can imagine an extensible Reader, perhaps with Flash, Java, or HTML client UI, that could manage feeds in a consistent way, but which could render each into different viewers. I think Macromedia Central will make a great container for these kinds of applications.

Is XML-RPC wedded to HTTP? SOAP can work asynchronously--I can transport it over FTP, SMTP and POP, instant messaging or whatever transport in as stateless a mode as I want--so why can't XML-RPC?

I think you're missing the definition of RSS-Data -- it has nothing to do with XML-RPC as a transport/message envelope. We're just using the data model with some minor improvements, and it can be transported anywhere that an XML document can be transported.

Most RSS feeds are transient. The general understanding is that once an item is no longer included in a feed of the channel, the resource it contains or points to is obsolete. Does an RSS-Data item have an expiration date on it, or is it permanent? Would RSS calendar items, like concert dates from a newspaper, disappear from one's calendar after they've passed, or would calendars accumulate a mass of public event information?

Since this is just inline data inside RSS feeds, it follows whatever expiration you've supplied in your items. For example, a recipe feed might just be for the Top 10, or Most Recent for a Week, or whatever makes sense. This is up to the content provider.

Last I checked, there were already XML formats for things like calendar events and Amazon items. Would RSS-Data items be wrappers for the existing formats, or would the data need to be transformed into a new RSS-Data way of expressing things?

This gets to a core issue, and something I discuss in my first post on the topic. All RSS-Data really does is provide a scripting-language friendly data exchange model that doesn't require developers to parse and marshal data into and out of XML entities, but instead lets them just pass around the common data structures used in applications. This is similar to the motivations for SOAP for API-level data exchange, but is applied to syndication-oriented applications. It's really just a pragmatic way to get people to use RSS as a transport for application data, rather than just textual news content.

While the idea is good, I think this is an excellent opportunity to examine the XML-RPC data model, and extend it in simple, yet appropriate ways. Over the last few years, I've seen comments regarding a lack of tuples (Python), <nil/>, 64-bit numbers (or, ideally, BigNums), etc., some of which might be considered "unnecessary," but others which are reasonable requests for extension to the ways XML-RPC represents data.

I think changing the data types supported at this point would be a mistake, simply in the interest of getting this all going. The data types supported today cover 90% of use cases, and since most of the code for serialization/deserialization is available and in production, it would be a quick hack to get these refactored into RSS parsers. Adding new data types (now) would require more interop testing, etc.

However as folks deploy feeds of a similar nature, it will highlight the need for standardized semantic models. Right now, RSS works in part because of its relatively consistent semantics. An RSS-DATA spec leaves that issue open, but that's okay for now. Others will address application-specific semantics.

Crucially, RSS-Data does not address the need for agreeing on application-specific schemas; it simply accelerates the use of new data formats with RSS. Interestingly, RSS readers/aggregators could support any RSS-Data element even if they didn't understand its context, simply by rendering the item content in a browsable tree view. I can imagine a variety of clever ways to automatically render property-sheet style views of feeds with semi-structured data. Nonetheless, the primary design goal here is to facilitate application- and domain-specific RSS data feeds where developers and producers agree on format.
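A generic tree view of this kind needs nothing more than a recursive walk over structs and arrays. The Python sketch below is one possible approach, assuming the data has already been deserialized into dictionaries and lists; render_tree and the sample item are hypothetical.

```python
def render_tree(value, label="item", depth=0):
    """Return indented text lines for an arbitrary struct/array value.

    The renderer knows nothing about what the fields mean; it only
    follows the shape of the data.
    """
    pad = "  " * depth
    if isinstance(value, dict):  # struct: one child per member
        lines = [f"{pad}{label}"]
        for key, child in value.items():
            lines.extend(render_tree(child, key, depth + 1))
        return lines
    if isinstance(value, (list, tuple)):  # array: one child per index
        lines = [f"{pad}{label}"]
        for index, child in enumerate(value):
            lines.extend(render_tree(child, f"[{index}]", depth + 1))
        return lines
    return [f"{pad}{label}: {value!r}"]  # scalar leaf


listing = {"price": 40, "tags": ["used", "oak"], "seller": {"city": "Boston"}}
print("\n".join(render_tree(listing, "listing")))
```

A reader that falls back to something like this can at least display unknown data, even though it still cannot tell whether an integer is a price or a quantity.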

I couldn't agree more with the desire of associating domain-specific metadata to RSS and extending the usage beyond the news/blog world, but I guess I'm wondering why the current namespace support in RSS 2.0 doesn't satisfy this need. Is it because there's no way to associate any typing information with the elements in the referenced namespace?

I'd love to see an example showing how RSS-Data is a Good Thing compared to a similar RSS 2.0 w/namespace example. It just seems like we're losing some precious semantic information when we drop down to datatypes in the document.

This is an age-old debate that pre-dates the world of RSS, and if you go back to the origins of SOAP, XML Schema and XML Namespaces, you can see this discussion in action. Formats like RSS-Data (or precursors like WDDX) are very simple in that they only require a data structure in a self-contained document. XML namespaces require well-defined XML document entities and an accompanying XML schema. I think that XML Schemas are too complex for the needs of RSS syndication applications. It ultimately comes down to which approach can enable the broadest number of developers, and RSS-Data will be simpler for a broader group of developers (those that use simple scripting languages, as well as developers using stronger object-oriented languages).

I agree with Eric - I don't think this solves any problems. Sure, you define a markup for how to represent a struct, int, date, etc., but that's not helpful. A client tool still doesn't know what to do with the data. It doesn't know whether the int it sees is a price or a quantity. You still need to have some kind of out-of-band agreement between sender and receiver as to what each piece of information means. Given that, it seems to me there's no major win over arbitrary namespaced RSS extensions...

Some of this is discussed above, but it ultimately comes down to a question of what is simpler for developers to adopt: XML Namespace extensions + XML Schemas, or one namespace extension where schemas are implied. Yes, this still requires out-of-band agreement, though not entirely, as one can imagine "smarter" RSS reader/aggregators that surface property meta-data in interesting ways so the viewer can understand the attached data -- grids, property sheets, tree views, etc.

As per Greg and Eric... whether you use this approach or an XML namespace approach, you still have the same need for an out-of-band agreement. In either case you will have nested values and name/value pairs that only "mean" something to the people who write the code that makes it useful. In short, I could get code working with either approach (and will probably have to). There will be thrash, but congratulations for getting a very important ball rolling.

I think a related question to ask is "Why haven't more people used Namespaces to extend RSS?" Is it because RSS isn't a good envelope for richer types of data, or is it because developers prefer to have to work with as few XML formats as possible? Certainly, we're just now hitting a point where RSS can be useful for more than blogs/news exchange, so this is a really important topic to discuss.

Would it be possible to show us a "hello world" example? I think I'm Just Not Quite Getting It. Show me them angle brackets and I'm pretty sure I will, though.

I'll work on some examples... a very simple example might be to inline a "comment" data structure into an item. A comment would include a top-level struct with the latest comment count and the date of the last posted comment. It would contain a nested struct for the comment data, each represented as arrays or structs containing strings of meta-data. A reader could choose to a) ignore this data, or b) render comments in an interesting way that's contextual for the user.
You'd still need a separate namespace for the comment data, but all of its elements would be represented in RSS-Data.

9:19:08 PM

RSS-Data: A Proposed Format
Expanding the role of RSS into data-oriented applications

For well over 5 years, I've been excited about the role of syndication in evolving how the Internet is used and applied, and am thrilled with the progress that's been made with RSS as a common standard for content syndication, and with SOAP web services for application integration and communication. Content and data syndication represent a powerful model for value exchange in the Internet economy, and open up the possibilities for cooperating applications.

Both RSS and SOAP enable forms of distributed collaboration based on syndicated business models. In theory, RSS can be applied for applications where simple content can be published and subscribed to, and SOAP can be applied for applications where real-time, synchronous data access and transactions are involved. This distinction feels roughly accurate. Increasingly, however, RSS advocates are seeing the power of asynchronous, pub/sub style data exchange and are attempting to use RSS and RSS namespaces to accommodate these applications. While asynchronous SOAP messages could provide a substitute, that approach requires stateful runtime end-points, breaking the flexibility and power of RSS as literal documents, and also introduces a potential level of complexity not needed for data-oriented syndication applications.

What's needed is a simple data language that can enhance RSS 2.0 applications, expanding its role into a much broader range of data-oriented applications, rather than its current, predominant focus on news and content-oriented applications.

RSS - Keep It Simple Stupid

RSS has been adopted because of its relative simplicity, and that makes it beautiful in my view. RSS solved a specific problem in a relatively general way, and now it's been adopted en masse, the lingua franca of content syndication on the Internet. As RSS readers/clients and parsers proliferate, bright people all over are thinking about ways to tunnel other non-news information into the format. There are hundreds of examples, ranging from bug reports, to classifieds, to calendar data, to dating information -- essentially anything that is text and can be contained in an item, with basic Dublin Core meta-data.

But this approach quickly and clearly breaks down when one wants to share more structured information, such as a purchase order and its fields, or a discussion thread and its tree structure. Even a calendar item will contain much more complicated meta-data.

RSS 2.0 includes an extensibility mechanism by way of XML namespaces, and there are some modules that exist which take advantage of this capability. This could solve the problem, but once again re-introduces the issue of having to write custom parsers for every new application that extends RSS -- developers would need to parse and map the elements of a calendar XML format, which would be fine, but requires more work and isn't portable across other application types.

RDF is supported in earlier RSS specifications as a means for data extensibility, but RDF is cumbersome, difficult to read and write, and doesn't map cleanly to the kinds of simple data structures that exist in Internet scripting languages, such as structs and arrays.

SOAP emerged because we needed a common data and messaging model for the exchange of object data and messages between programs. Back in 1998, most people thought the world would evolve into thousands of different XML "vocabularies", where programs would access and share these XML documents. Amazingly, not a lot of people understood what a cumbersome world that would be, and that most of the interesting integration use cases were things that would be better left to object-level protocols. Fortunately, we ended up in a good place -- SOAP enables the benefits of loosely coupled applications without the pain of having to define lots of custom over-the-wire formats.

In some respects, RSS is a great message envelope for asynchronous data, but without an over-the-wire data format.

What we need is a simple data model that can expand the use of RSS into application arenas, enabling applications to output RSS with object data, and clients and other applications to easily and predictably include that data. In other words, RSS needs a schema, but it's not XML Schema.

A bit of history: WDDX, XML-RPC, SOAP

My interest in applications based on data syndication goes back to the origins of ColdFusion, but really manifested itself first with the introduction of the Web Distributed Data Exchange (WDDX) format back in 1998. WDDX was designed to enable Internet programming languages to easily exchange data -- synchronously or asynchronously. It was a simple object serialization format written in XML. Eventually, nearly every Internet scripting language supported it -- Perl, PHP, Python, COM/VB/VC, Java, ColdFusion.

Right around the same time, Dave Winer was evangelizing XML-RPC as a format to accomplish similar things, though it also included (and required) an RPC-oriented message envelope. We (Allaire) didn't think the message envelope was necessary, because we envisioned many types of applications where object data exchange would occur without an API invocation, and in an asynchronous manner. Dave and I used to butt heads about this.

A short time later, Microsoft started to actively get involved in this space, and were actively looking at WDDX, XML-RPC and SOAP as formats to be the basis of "web services". SOAP won the day, presumably because it offered a good message envelope with extensibility, and because it used XML Schema, which could reflect object data in its richest form. SOAP used the "proper" layering of standards to create a powerful, extensible protocol.

A Simple Data Language

I'd like to revisit some of these standards, especially in light of the incredible growth and prowess of RSS. The world of data syndication (publish/subscribe) can be a transformative element to the emerging Internet landscape, and the standards we have today just don't quite cut it.

A few months ago I approached Dave Winer and a few other people with a very simple idea: why not use XML-RPC's data serialization format to create a simple data language for object meta-data in RSS (and other!) applications? Interestingly, if you subtract the message envelope from XML-RPC and add Unicode and time-zone support to the standard, you've actually got WDDX, quite literally. Dave really liked the idea, and we came up with the idea of RSS-Data.

Why use RSS-Data? Pragmatism. Because of the rapid growth of blogging software, XML-RPC parsers are already implemented in dozens of languages and platforms. As a result, a simple data language based on XML-RPC's data model could emerge in a matter of days or weeks, as developers quickly refactor their parsers to simply provide data serialization/deserialization components.

RSS-Data would require no changes or revisions to RSS 2.0, though developers wishing to support RSS-Data would obviously need to write RSS parsers that recognized and deserialized RSS-Data in the <sdl:data> namespace. But, rather than writing custom parsers for every new namespace extension to RSS, developers could confidently work with just one RSS-Data parser that handled 99% of their application meta-data needs.

Here's what I think is necessary for RSS-Data, which is almost literally the XML-RPC data serialization model.

Same data model, including all elements such as <struct>, <array>, <boolean>, <dateTime>, <string>, <number>, <base64binary>, etc.

Unicode-based, fixing a known problem with XML-RPC

Time-zone aware, also fixing a known problem with a variety of serialization approaches

RSS-Data could be used inside any RSS 2.0 element that can contain namespace extensions, including <item>, <channel>, and inside other custom namespaces. Likewise, other XML applications in need of a simple object data exchange format could use the <sdl> namespace to extend their applications.
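To give a feel for how small such a parser could be, here is a hedged Python sketch that pulls an <sdl:data> block out of each <item> and turns it into native data structures. Nearly everything about the markup is an assumption made for illustration: the namespace URI is a placeholder, the member/name/value layout is loosely borrowed from XML-RPC, and only a handful of the listed types are handled.

```python
import xml.etree.ElementTree as ET

SDL_NS = "http://example.org/sdl"  # placeholder URI, not defined by the proposal

# Hypothetical feed: one RSS 2.0 item carrying a struct in <sdl:data>.
RSS_TEXT = f"""<rss version="2.0" xmlns:sdl="{SDL_NS}">
  <channel>
    <title>Classifieds</title>
    <item>
      <title>Road bike</title>
      <sdl:data>
        <sdl:struct>
          <sdl:member><sdl:name>price</sdl:name>
            <sdl:value><sdl:number>250</sdl:number></sdl:value></sdl:member>
          <sdl:member><sdl:name>negotiable</sdl:name>
            <sdl:value><sdl:boolean>1</sdl:boolean></sdl:value></sdl:member>
          <sdl:member><sdl:name>tags</sdl:name>
            <sdl:value><sdl:array>
              <sdl:value><sdl:string>bike</sdl:string></sdl:value>
              <sdl:value><sdl:string>used</sdl:string></sdl:value>
            </sdl:array></sdl:value></sdl:member>
        </sdl:struct>
      </sdl:data>
    </item>
  </channel>
</rss>"""


def tag(name):
    """Qualified ElementTree tag name for an element in the sdl namespace."""
    return f"{{{SDL_NS}}}{name}"


def decode(node):
    """Recursively convert one typed element into a Python value."""
    kind = node.tag.replace(f"{{{SDL_NS}}}", "")
    if kind == "struct":
        return {
            member.find(tag("name")).text: decode(member.find(tag("value"))[0])
            for member in node.findall(tag("member"))
        }
    if kind == "array":
        return [decode(value[0]) for value in node.findall(tag("value"))]
    if kind == "boolean":
        return node.text.strip() == "1"
    if kind == "number":
        return float(node.text)
    if kind == "string":
        return node.text or ""
    raise ValueError(f"unsupported type in this sketch: {kind}")


def extract_items(rss_text):
    """Return (title, data) pairs for every item that carries sdl:data."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        data = item.find(tag("data"))
        if data is not None and len(data):
            results.append((item.findtext("title"), decode(data[0])))
    return results


for title, data in extract_items(RSS_TEXT):
    print(title, data)
```

Because the decoder keys off element types rather than application vocabulary, the same function would serve a calendar feed or a recipe feed unchanged; only the code that interprets the resulting dictionaries would differ.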

A New World of Data Syndication Applications

My hope is that RSS-Data will open up a much wider range of data syndication applications layered on top of RSS. Whether it be a calendar data exchange format, or a better way to do trackbacks and threaded comments, RSS-Data has the potential to make RSS much more powerful than it is today.

RSS-Data library builders, let's get going on this!

6:28:30 AM
