Reading the WS-Transaction specification:
WS-Transaction and WS-Coordination are, as we say in German, "hartes Brot" (hard bread to chew on). One potential point of confusion about the diagrams in the WS-Transaction spec is that there seem to be request/response exchanges between participants, when that's in fact not the case.
Here's what I read:
All of WS-Coordination and WS-Transaction assumes two-way connectivity between any two endpoints. When CoordB sends a "Prepare" to CoordC in Figure AT2 (step 8), it will not sit there on the sending thread and wait for a "Prepared" or "Aborted" message (step 11). Instead, CoordB will send "Prepare" in a fire-and-forget, one-way fashion and just remain in phase 1 while it waits for CoordC to actively contact it with "Aborted" or "Prepared" (step 11).
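To make the one-way pattern concrete, here is a minimal sketch in Python. The class and method names (`Coordinator`, `prepare`, `on_message`) and the `send_one_way` callback are my own inventions for illustration and don't come from the spec; the only point is that sending "Prepare" returns immediately and the outcome arrives later as a separate inbound message.

```python
from enum import Enum, auto


class PeerState(Enum):
    ACTIVE = auto()
    PREPARING = auto()   # "Prepare" sent, no outcome received yet
    PREPARED = auto()
    ABORTED = auto()


class Coordinator:
    """Hypothetical stand-in for CoordB; not an API defined by the spec."""

    def __init__(self, send_one_way):
        # send_one_way(peer, message) hands the message to the transport
        # and returns right away, without waiting for any reply.
        self._send = send_one_way
        self.peers = {}

    def enlist(self, peer):
        self.peers[peer] = PeerState.ACTIVE

    def prepare(self, peer):
        # Fire and forget: note the new state, send, and return.
        self.peers[peer] = PeerState.PREPARING
        self._send(peer, "Prepare")

    def on_message(self, peer, message):
        # Invoked later, when the peer actively contacts us.
        if self.peers.get(peer) is not PeerState.PREPARING:
            return  # unexpected or duplicate message; ignored in this sketch
        if message == "Prepared":
            self.peers[peer] = PeerState.PREPARED
        elif message == "Aborted":
            self.peers[peer] = PeerState.ABORTED
```

Between `prepare()` and `on_message()` no thread is blocked; CoordB simply holds CoordC in the `PREPARING` state for as long as it takes.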
Here are a few thoughts as I read:
Not using request/response is essential for relaying WS-Transaction negotiation over protocols like HTTP, because HTTP isn't a reliable transport. Mapping this to HTTP essentially means dropping messages off at the remote endpoint using POST without even looking at the response body (while still recognizing the HTTP 3xx/4xx status classes). The remote endpoint must actively acknowledge every message on a separate connection. What's somewhat lacking in the WS-Transaction spec (as it stands) is a clear statement that reliable messaging is required and that the QoS for delivery of every message must be "at least once" or, better, "exactly once".
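A rough sketch of such a one-way POST, using Python's standard `http.client`. The function names, the outcome labels, and the choice to treat everything outside 2xx/3xx/4xx as "retry" are my assumptions, not something the spec prescribes. Note that a 2xx only means the message was handed off; it is not the acknowledgement.

```python
import http.client
from urllib.parse import urlsplit


def classify_status(status):
    """Map an HTTP status code to an outcome for a one-way send."""
    if 200 <= status < 300:
        return "handed-off"   # accepted by the endpoint, NOT yet acknowledged
    if 300 <= status < 400:
        return "redirected"   # caller has to re-send to the new location
    if 400 <= status < 500:
        return "rejected"     # re-sending the same request will not help
    return "retry"            # anything else: assume it was not delivered


def post_one_way(url, body, timeout=5.0):
    """POST `body` (bytes) and report only the status class.

    The response body is never read.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("POST", target, body=body,
                     headers={"Content-Type": "text/xml; charset=utf-8"})
        return classify_status(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return "retry"        # connection trouble: the message may be lost
    finally:
        conn.close()
```

Whatever this returns, the sender still has to keep the message around for re-sending until the remote endpoint's own acknowledgement comes in.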
Can such acknowledgements be piggybacked on the replies? Yes. However, there needs to be a logical time-instant where CoordB knows that it has CoordC in the "prepare pending" state. So, even if the ack and "Prepared" arrive in the same packet, the ack must be processed before the "Prepared". Must the acknowledgements be piggybacked on the replies? No. Depending on the task, "Prepare" may take quite a bit of time, and to keep the whole process short, CoordB will have a very short "time to panic" period with regard to timeout/resend for its "Prepare" message. CoordB cannot distinguish between "Prepare" getting lost and CoordC simply taking a long time to process it, and therefore it's desirable that CoordC acks the message before starting its "Prepare" work.
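Here's how I picture the bookkeeping on CoordB's side, as a small Python sketch. `PrepareTracker`, the state strings other than "prepare pending", the `"Ack"` message name, and the two-second `ACK_TIMEOUT` are all placeholders of mine. The two things it tries to show: acks are handled before anything else in a bundle, and only an unacknowledged "Prepare" is ever a candidate for re-sending.

```python
ACK_TIMEOUT = 2.0  # hypothetical "time to panic", in seconds


class PrepareTracker:
    """Tracks one outstanding "Prepare" from CoordB's point of view."""

    def __init__(self, sent_at):
        self.state = "prepare-sent"
        self.sent_at = sent_at

    def needs_resend(self, now):
        # Once the ack is in, a slow CoordC is no longer mistaken
        # for a lost message, so the short timeout stops applying.
        return (self.state == "prepare-sent"
                and now - self.sent_at >= ACK_TIMEOUT)

    def deliver(self, bundle):
        # Process acks first, even if "Prepared" precedes them in the
        # bundle. sorted() is stable, so other messages keep their order.
        for message in sorted(bundle, key=lambda m: m != "Ack"):
            self._handle(message)

    def _handle(self, message):
        if message == "Ack" and self.state == "prepare-sent":
            self.state = "prepare-pending"
        elif message == "Prepared" and self.state == "prepare-pending":
            self.state = "prepared"
        # A "Prepared" without any ack is ignored in this sketch.
```

With this, `deliver(["Prepared", "Ack"])` still passes through "prepare pending" on its way to "prepared", which is exactly the logical time-instant CoordB needs.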
Another not-so-clear aspect is the role of WS-Coordination's context expiration in the WS-Transaction scope. By default, with no expiration time set, the operation may take as long as it needs until it's done. However, if an expiration time is set, CoordA will time out at the defined time and cause a unilateral abort. So, does reaching the expiration time-instant mean that all further messages relayed within the scope of the established context will be discarded? No. The context expiration time must be disregarded entirely if one or more participants are in the "commit pending" or "committed" state, and the context lifetime must be considered as having been extended to "forever".
The only participant knowing whether there is potential for any other participant being in either of these two states is indeed CoordA, which listens to the completion protocol requests and is the superior coordinator. None of the participants except for CoordA may cause a unilateral abort or refuse message relay due to timeout of the context once they have reached "Prepared", because some other participants may already be committing or committed.
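Boiled down to a decision function, my reading of the expiration rules looks roughly like this. The function name, its parameters, and the lowercase state strings are my own shorthand, so treat it as a sketch of the reasoning rather than anything the spec defines.

```python
NO_RETURN_STATES = {"commit-pending", "committed"}
PREPARED_OR_LATER = {"prepared"} | NO_RETURN_STATES


def may_abort_on_expiry(is_root, own_state, expires_at, now,
                        participant_states=()):
    """Decide whether context expiration permits a unilateral abort.

    expires_at is None when no expiration time was set on the context.
    participant_states only matters for the root coordinator (CoordA),
    the one party that can know about every participant's state.
    """
    if expires_at is None or now < expires_at:
        return False  # nothing has expired
    if is_root:
        # Expiration is disregarded once anyone may be committing.
        return not (NO_RETURN_STATES & set(participant_states))
    # Everyone else loses the right to time out on reaching "Prepared".
    return own_state not in PREPARED_OR_LATER
```

For example, `may_abort_on_expiry(False, "prepared", 10, 11)` is `False`: a prepared participant has to sit tight, however long ago the context expired.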
Context timeout and message timeouts have no actual relationship in phase 2, and individual message timeouts may be well past the initial context expiration time.
The context timeout is indeed of high information value for any participant before it is in "Prepared". When the context expires, participants can unilaterally abort within their control scope, but must keep replying to messages until the protocol state is cleaned up. If a participant decides to unilaterally abort on context timeout before phase 2, it must still be able to reply to "Prepare" by sending an "Aborted" message. This may mean that if it doesn't find any trace of the transaction that is to be prepared (because it's been locally discarded already), it may reply with "Aborted" just for that reason.
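On the participant side, that could look something like the following Python sketch. `Participant`, its methods, the transaction ids, and the tuple message shape are hypothetical; the reply is spelled "Aborted" here, matching the message name used for step 11 earlier. The interesting branch is the one where a late "Prepare" arrives for a transaction the participant has already thrown away.

```python
class Participant:
    """Hypothetical participant that may give up before "Prepared"."""

    def __init__(self, send_one_way):
        self._send = send_one_way   # send_one_way(destination, message)
        self.transactions = {}      # transaction id -> local state

    def begin(self, tx_id):
        self.transactions[tx_id] = "active"

    def on_context_expired(self, tx_id):
        # Only allowed before "Prepared": roll back and forget locally.
        if self.transactions.get(tx_id) == "active":
            del self.transactions[tx_id]

    def on_prepare(self, tx_id, coordinator):
        state = self.transactions.get(tx_id)
        if state is None:
            # No trace of the transaction: it was discarded locally,
            # so the only honest answer is "Aborted".
            self._send(coordinator, ("Aborted", tx_id))
            return
        if state == "active":
            # The actual prepare work would happen here (omitted).
            self.transactions[tx_id] = "prepared"
        # Also covers a duplicate "Prepare" for a prepared transaction.
        self._send(coordinator, ("Prepared", tx_id))
```

Note that `on_context_expired` does nothing once the state is "prepared"; from that point on the participant has to wait for the coordinator's decision.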