REST + SOAP
By Sam Ruby, July 20, 2002.
The introduction of the Web Method Specification Feature in SOAP 1.2 hopefully will allow the continuing REST vs SOAP debate to focus on the substantive differences between these two approaches. This essay captures what I consider to be the strengths of each approach, and outlines a path whereby one can "cherry-pick" the best features of each in designing an application.
Rest vs RPC
In reality, there aren't two sides. There are at least four.
- Everything is a resource
- Everything is a GET
- Everything is a message
- Everything is a procedure
Telling these guys apart is sometimes difficult. Here are a few clues. Read them along the lines of a Jeff Foxworthy "you might be a redneck if..."
You might be a Resource guy if you actually use HTTP PUT
You might be a Get guy if you use URLs to request parameterized actions
You might be a Message guy if you actually use XML attributes
You might be a Procedure guy if you feel you must encode XML in order to pass it as a parameter
OK, so, I won't quit my day job. But the key point here is that not all HTTP GETs are RESTful, nor are all SOAP calls RPC.
A quick tour through computing history shows how strongly held positions get abstracted away:
- Every assembler instruction and data location was individually addressable. Code and data were interchangeable.
- Gotos were considered harmful. Subroutines provided parameterized and controlled entry points.
- SQL and relational databases were introduced. All data could be accessed and manipulated with insert, select, update, and delete statements.
- Subnets were bridged by an "inter-net" protocol.
- Subroutines with parameters became objects with messages by virtue of moving the first parameter outside of the parenthesis.
- Stored procedures became the norm for any operation that modifies relational databases.
Several key points here. If your leanings are towards REST, then contemplate the notion of stored procedures: why do most modern relational database systems support such a concept? What problem do stored procedures solve? If your leanings are towards SOAP, get prepared for the object reference to move outside of the parenthesis. Either way, realize that things you believe in strongly today may - nay will - get abstracted away in the future.
Resources vs Services
From a protocol (i.e., what goes across the wire) perspective - what's the key difference? To put it in the simplest of terms, the difference is between what goes inside the envelope and what goes outside. When you mail a check to a credit card company, do you put your account number inside or outside of the envelope? This difference might seem a bit esoteric, but the object-oriented revolution can also be expressed in similarly simple terms.
An example of the difference is encoding - in other words, specifying the character set used. If you send XML over HTTP, there is a redundancy. XML provides for the specification of encoding, as does HTTP. The fact is, when you have two places where a piece of data can be represented, you open up the possibility of consistency problems. This exists in HTTP as HTTP is designed to be independent of the representation of the resource, and it exists in XML as encoding is not only relevant during transfer, but also when it is locally transformed or stored.
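To make the redundancy concrete, here is a small sketch (the header and document values are invented for illustration) showing the two independent places the encoding is declared, and how they can drift apart:

```python
import re

# The same fact, the character encoding, is declared twice:
# once in the HTTP Content-Type header, once in the XML declaration.
http_content_type = "text/xml; charset=utf-8"
xml_body = '<?xml version="1.0" encoding="iso-8859-1"?><doc>hi</doc>'

def declared_encodings(content_type, body):
    """Return (HTTP charset, XML declaration encoding)."""
    http_charset = content_type.split("charset=")[-1].strip()
    match = re.search(r'encoding="([^"]+)"', body)
    xml_encoding = match.group(1) if match else "utf-8"  # XML's default
    return http_charset, xml_encoding

# The two declarations disagree. HTTP wins during transfer (per RFC 3023),
# but a parser later handed the raw bytes sees only the XML declaration.
print(declared_encodings(http_content_type, xml_body))
# -> ('utf-8', 'iso-8859-1')
```

The consistency problem is exactly this: once the bytes leave the HTTP exchange (stored to disk, placed on a queue), the winning declaration is no longer attached to them.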
The most extreme difference between these two models concerns the name of the resource itself. In REST, the resource is identified outside of the envelope by a Uniform Resource Identifier (URI). While SOAP doesn't preclude this possibility, most SOAP services deployed today aren't designed in this fashion. This is not all bad. Once you realize that, from an architectural point of view, whether a service is accessed via GET or POST doesn't make it any more RESTful, you realize that Google is a service: one which permits parameters to be encoded on the URL.
Key point here: when designing for resilience in the face of changing requirements, it generally behooves one to make choices that preclude the least number of future alternatives. In particular, one needs to be prepared for the dynamic creation of new resources, parameterized requests, and obtaining the identifiers of resources in responses.
Cosmologists have long posited the existence of dark matter whose sole purpose is to contribute enough inertia to stop the infinite expansion of the universe. In physics, the way one generally makes observations is by bouncing a few photons off of the subject. For large bodies, the effect on the subject is minuscule and can largely be ignored.
On the internet, the analogy to a photon would be an HTTP GET. Clay Shirky wrote an excellent article referring to PCs as the dark matter of the internet, largely because, as a general rule, they don't respond to HTTP GET. Unfortunately, the same is true of virtually all SOAP 1.1 services. While they may interact with one another using alternate mechanisms over HTTP, they don't interact via HTTP GET, making them all but inaccessible to a large number of clients.
SOAP 1.2's WebMethod feature provides the means to shed some light on this situation. In the words of the spec:
Applications SHOULD use GET as the value of webmeth:Method in conjunction with the 6.3 SOAP Response Message Exchange Pattern to support information retrievals which are safe, and for which no parameters other than a URI are required; i.e. when performing retrievals which are idempotent, known to be free of side effects, for which no SOAP request headers are required, and for which security considerations do not conflict with the possibility that cached results would be used.
The key point here is that applications that desire to be broadly accessible should be designed with this in mind - in other words, to maximize their visible surface area.
I have the utmost respect for those individuals who developed the protocols that became the backbone of the modern internet. However, I also have equal respect for those that built the networks that allow our financial institutions to securely transfer funds (e.g., CICS). And for those that have developed OLTP databases that are capable of handling hundreds of thousands of transactions per second and terabytes of data (for examples, see TPC).
It is worth noting that many web sites are updated using mechanisms such as FTP and xcopy. So, while REST is clearly Turing complete, its best known application (i.e., the internet) only clearly demonstrates its applicability and scalability to largely read-only and mostly public data. It is in the expression of higher-level operations (particularly ones that perform non-atomic updates) that SOAP's value proposition becomes apparent. Sometimes one truly wants an atomic "transfer funds from savings to checking" transaction instead of simply a series of discrete GETs and PUTs.
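As a sketch of that difference (the element names and account URLs here are invented), a single self-contained message can carry the whole transfer, where a sequence of discrete GETs and PUTs would leave a window for interleaved updates:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def transfer_request(from_url, to_url, amount):
    """Build one self-contained message for the whole logical operation."""
    req = Element("transferFunds")
    SubElement(req, "from").text = from_url
    SubElement(req, "to").text = to_url
    SubElement(req, "amount").text = str(amount)
    return tostring(req, encoding="unicode")

# The receiver can apply or reject this atomically. The alternative
# (GET both balances, compute, PUT both back) is not atomic: another
# client's request can interleave between the GETs and the PUTs.
print(transfer_request("/accounts/savings", "/accounts/checking", 100))
```

The point is not the XML syntax but the granularity: the message names the operation, not the low-level state changes that implement it.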
And for all of its greatness, REST does little to assure that the HTML I produce will render properly in the browser of your choice. That's simply left as an exercise for the student. That's where WSDL comes in. WSDL builds upon your choice of schema languages (though XML Schema seems to have taken an early and apparently commanding lead at the moment) and adds the notion of a PortType: namely, if my service gets a message of a given shape, it promises to return a message of a given description, or else produce a fault.
Finally, one of the key success factors of the web is not directly related to REST at all, but instead to HTML: the simple statement in the original HTML Internet Draft that any undefined tags may be ignored by parsers. This led the way to a predictable path of evolution for the HTML standard, where new content could remain backwards compatible with older browsers. As I argue in Coping with Change and A Busy Developer's Guide to WSDL 1.1, these simple principles can be applied to Web Services. However, as this rule is not reflected in (or precluded by, for that matter) the current SOAP specifications, one unfortunately cannot rely on all toolkits to implement it. If one could, the rules for evolving a web service would allow adding optional parts/elements with unique qualified names (that's the combined namespace name plus the local name) to existing services. Ideally, such rules would permit the inclusion of optional/ignorable/mandatory flags in the body, akin to the mustUnderstand attribute already permitted in the header.
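A minimal sketch of that must-ignore rule in practice (the message and element names are invented): a consumer written against version 1 of a message keeps working when a newer sender adds elements it has never seen.

```python
from xml.etree.ElementTree import fromstring

V1_ELEMENTS = {"name", "email"}  # all a version-1 consumer understands

def read_contact(xml_text):
    """Extract the elements we know about; silently ignore the rest."""
    root = fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in V1_ELEMENTS}

# A version-2 sender added <phone>; the old consumer still works.
v2_message = (
    "<contact>"
    "<name>Sam</name>"
    "<email>sam@example.com</email>"
    "<phone>555-0100</phone>"
    "</contact>"
)
print(read_contact(v2_message))  # {'name': 'Sam', 'email': 'sam@example.com'}
```

This is exactly the HTML evolution story retold for messages: senders may add, receivers must tolerate.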
The key point here is that much of the value (even in fully RESTful systems) lies in the precise documentation of the structures expected in requests and responses.
It should be clear from the above that I believe it is quite possible to productively apply both supposedly incompatible approaches together. I'll sketch it out below, in prescriptive form. Note: while this is prescriptive, it is expected that local adaptations will be made.
- Start by modeling the persistent resources that you wish to expose. By persistent resources, I am referring to entities that have a typical life expectancy of days or greater. Ensure that each instance has a unique URL. When possible, assign meaningful names to resources.
- Whenever possible, provide a default XML representation for each resource. Unlike traditional object oriented programming languages where there is a unique getter per property, typically there will be a single representation of the entire instance. These representations will often contain XLinks (a.k.a. pointers or references) to other instances.
- Now add high level methods which take care of all composite create, update, and delete operations. A key aspect of the design is that messages for these operations need to be self contained - both the sender and receiver should be able to make the absolute minimum of assumptions as to the other's state, and multiple requests should not be required to implement a single logical operation. All requests should provide the appearance of being executed atomically.
- Query operations deserve special consideration. A general purpose XML syntax should be provided in every case. In addition, when a reasonable expectation exists that query parameters will be of a relatively short size and not require significant encoding, then a HTTP GET with the parameters encoded as a query string should also be provided.
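The four steps above can be sketched in miniature. Everything here (the URLs, element names, and the in-memory store standing in for persistence) is invented for illustration:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Step 1: persistent resources, each with a unique URL.
accounts = {"/accounts/42": {"owner": "sam", "balance": 100}}

def representation(url):
    """Step 2: one default XML representation of the whole instance,
    containing XLink-style pointers to related resources."""
    acct = accounts[url]
    doc = Element("account", href=url)
    SubElement(doc, "owner").text = acct["owner"]
    SubElement(doc, "balance").text = str(acct["balance"])
    SubElement(doc, "statements", href=url + "/statements")
    return tostring(doc, encoding="unicode")

def deposit(url, amount):
    """Step 3: a high-level, self-contained operation; the request carries
    everything needed, and the response returns the resulting state."""
    accounts[url]["balance"] += amount
    return representation(url)

def find_accounts(min_balance):
    """Step 4: a query with short, simple parameters, so it could equally
    be exposed as GET /accounts?min_balance=N."""
    return [url for url, a in accounts.items() if a["balance"] >= min_balance]

print(deposit("/accounts/42", 50))
print(find_accounts(min_balance=120))  # ['/accounts/42']
```

Note how the response to the update operation hands back the representation, including URLs of related resources, satisfying the earlier requirement of obtaining identifiers of resources in responses.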
The following table emphasizes how this unified approach differs from the "pure" (albeit hypothetical) different positions described above.
- Resource: POST operations explicitly have the possibility of modifying multiple resources. PUT and DELETE operations are rarely used, if ever. GETs may contain query arguments.
- Get: GETs must never be used for operations which observably change the state of the recipient. POST should be used instead.
- Message: Do not presume that URLs are static; instead presume that they identify the resource. In particular, recognize that URLs can be dynamically generated. Expect URLs of other SOAP resources in responses. Use the SOAP Response MEP for pure retrieval operations.
- Procedure: Treat the URL itself as the implicit first parameter. Allow URLs to be dynamically generated, and returned in structures. Use HTTP GET for retrieval operations.
Looking to the future, the application-level inter-networking protocols that emerge today will likely be the application-level intra-networking protocols of the next decade. Both REST and SOAP contain features that the other lacks. Most significantly:
REST - SOAP = XLink
The key bit of functionality that SOAP applications miss today is the ability to link together resources. SOAP 1.2 makes significant progress in addressing this. Hopefully WSDL 1.2 will complete this important work.
SOAP - REST = Stored Procedures
Looking at how other large scale systems cope with updates provides some key insights into productive areas for future research with respect to REST.
Finally, it bears repeating: just because a service uses HTTP GET doesn't mean that it is REST. If you are encoding parameters on the URL, you are probably making an RPC request of a service, not retrieving the representation of a resource. It is worth reading Roy Fielding's thoughts on the subject. The only exception to this rule that is routinely condoned within the REST crowd is queries.