Clemens Vasters: Enterprise Development & Alien Abductions

Sonntag, 9. März 2003

Just how RESTful is TV?

I've spent some time this morning going though a bunch of web pages explaining and debating RESTful web services. The claim that I see repetitively is that REST is scalable, because it works like the web works and that has proven to be massively scalable. What I don't get together is the basic idea that "every thing is a uniquely identifiable resource" with the bare fact that the seemingly limitless scalability of the largest websites is indeed an illusion created by folks like those at Akamai, which take replicated content around the globe and bring it as close as possible to the clients. It's a lot like TV. TV scales endlessly as long as you are in reach of a broadcast antenna or a satellite beam. It all works beautifully in one direction. However, we don't really have fully interactive TV, yet, because the whole backchannel story is a major problem. It's a pure scalability problem.

If you GET something from the web, you think you are getting it from a uniquely identifiable bucket, but you are indeed quite often smartly spoofed elsewhere and you will get nearly current information. If you PUT or POST something, however, the data will ultimately have to end up in a single bucket -- and unfortunately there is no magically better concurrency story in REST just because it sits directly 'in' the web technology stack.

I will have to admit that I had a weak moment yesterday where I thought that the whole REST story makes a lot of sense. I was thinking of ways to really make it dead simple to use a tool I am currently playing with in the context of a larger application and REST has a lot of appeal there. The "create" story of REST using HTTP PUT is very pretty for this and HTTP GET speaks for itself. It all falls apart when you think about concurrent updates and concurrent deletes. These simply can't exist in such a world. As long as you only create stuff and reference stuff, all is fine.

One of the core problems with turning the REST model into an universal approach for dealing with data is that it is, in all its simplicity and purity, absolutely unaware of what data it is dealing with and it has no sense of the sort "identity" that grows from within the data itself. Once you assign a URL like 'http://www.tempuri.org/store/items/castings/slagpots/7829' to a data set in reply to a create request (PUT), you may essentially be locking down a whole set of properties of this data set that can never be changed if the URL shall not lose all of its own organizational qualities. Even more so, the data gets totally locked down once you follow this path of "object identity" to the last consequence and allow direct references into the data like through this URI 'http://www.tempuri.org/store/items/castings/slagpots/7829!item/manufacturer/foundrylocation/@country' where the second portion (here separated by a '!') is an XPath reference into an assumed XML InfoSet located at aforementioned URL. If two parties working on a data snapshot (that's all you GET) were competing for updates here, and the first party to hit the resource would do a wholesale replacement of the 'manufacturer' data, the second party would now only manipulate a well-known place (/item/manufacturer/foundrylocation/@country), but a wholly different data context. Taken to the last consequence of every data facet being uniquely identifiable through a URI (which it then should be and is) this model can't properly deal with any modification concurrency. [Note that we have a 'manufacturer of this item' relationship here where the 'country' is immediately relevant to this exact type of item and not to the manufacturer in general - so we can't factor the manufacturer data out to a different URL]

As said, REST can deal beautifully with creating and referencing data sets, but how do you pass references around for which any guarantee of currency can given? In the end, I am interested in a data record that represents a specific data item of interest. In the absence of updatability, I will still always know that there is a URI that points to the most current data set for this data item, but unless I have created this item myself, I can never know which URI that is. So, what are the limits of data granularity to which REST applies? How do you define boundaries of data sets in a way that concurrency could work in some way and how do you define the relationship between the data and the URI that identifies it?

The theory behind REST sounds a lot like the theory behind distributed objects (not distributed systems), where you have one virtual thing that represents one real thing. An 'item' in a catalog is a uniquely identifiable thing with its own object identity. I used to be a big fan of this idea until I found it not to be working for a lot of aspects, including, most prominently, high scalability alongside proper concurrency management. This basic idea is very much in the spirit of what CORBA is all about -- and, don't get me wrong here, that model is very good and very valid for a big number of use cases, especially in real-time or near real-time enviornments that deal with a lot of transient data. [As a sidenote, it's quite interesting that if you go and look in Google's Usenet archives and look for the term CORBA combined with the names of some of the most prominent REST proponents you will find what you would expect.]

I have written about the REST'ish view of HTTP being the one and only protocol in the context of transactions and reliability quite a while ago and that perception hasn't changed. Werner Vogels had some additional comments on the reliable messaging angle two weeks ago.

All that being said - I do believe there is a place for REST and plenty of use-cases. These use-cases are likely very much those for which CORBA's fundamental idea of objects already was a very good model. And there are plenty of use-cases for stateless and stateful RPC (in the absence of the 'single thing' idea) and there are plenty for messaging. There is a place for SOAP, which is at its core simply an envelope for annotating data as it crosses system and organization boundaries, in all of these models. That's what Juliet's night is really about.

10:07:33 AM

comments []