Clemens Vasters: Enterprise Development & Alien Abductions

Montag, 30. Juni 2003

(Some background reading for my DEV359 session in Barcelona on Friday July 7, 16:00, Room 7)

Summary of a Year with Aspects

A bit less than a year ago, I got a few little hints.

Early last year I had been playing around with the managed portion of the .NET Framework’s ServicedComponent infrastructure and wanted to smuggle code between the client and server side for purposes of validating parameters, monitoring and other things. I learned a lot about the interaction between Remoting and the Enterprise Services infrastructure, but found that there was no way to get interception working using managed code. So I talked to some friends at Microsoft about this and after quite a bit of begging they pointed me to the relevant public patents on the COM+ extensibility points, which are documented there – in legal speak – and nowhere else. I also got a hint or two on what GUIDs to look up in the registry and a few other tips, which all wasn’t much but enough to get things rolling. Armed with plenty of assembly-level debugging experience from the time when I wrote large COM frameworks in unmanaged code, I went digging. Deep.

Now, a year later, I have two activators and one policy almost working (more on that in a bit) and what I have done is – by all the Enterprise Services people at Microsoft tell me – possibly the only inside-COM+ extension ever built by anyone outside of Microsoft. And because most of the people in that respective product group are busy building the next generation base infrastructure for Enterprise Services, I even seem to be the only one who has been writing new code in that area for at least two years.

Still, I am about to give up.

The reason for that is technical but not really a problem of Enterprise Services or COM or the .NET Framework. It’s the fact that I am trying to use a beautifully designed extensibility point in the exact way it was envisioned, but for which nobody ever assumed that it’d be used anywhere but outside the product group.

Let’s call that a problem of “opaque aspects”. However, before I can explain the problem, I need to explain a little more how the Enterprise Services (COM+) infrastructure works internally. I am simplifying a bit here, but it’s enough to get the picture.

Whenever a COM object is created from any programming environment, it happens through CoCreateInstanceEx in the end. One in CoCreateInstanceEx, the component’s configuration including the server identity (DLL, process or remote machine), threading models and all of the other essentials is looked up from configuration. The configuration is actually a chain of providers. The first stage is an in-memory cache, the second stage reads the COM+ catalog (which is a very efficient, COM+ specific ISAM database) and the third stage goes to the registry. If a component of which an instance shall be created is found in the COM+ catalog, it is called “configured” and instances are constructed using the COM+ infrastructure. Object construction happens through a chain of so-called “activators”. An activator is a COM object that gets associated with a component through one (or multiple) entries in the catalog. Each component can have any number of activators in each “stage”. The stages indicate on which level the activation process is currently working: client context, client machine, server machine, server process and server context. In each stage, an activator can perform work that needs to be done before the next stage can be entered. If you want to add a “policy” to a newly created context, you will do so at the “server process” stage, because the context setup needs to be complete before the activation process can enter into the “server context” stage. When you install a policy you are indeed usually adding two things to the context: the first thing is usually called a “property” and is an object that can be accessed (by those who have the right header files) using the object context at the application level, the second thing is an interceptor that can subscribe to get notified whenever a call passes in and out of the activated object’s context. Both, the interceptor and the property can be implemented on the same class and that’s what I usually do.

So, in short, the role of the activator is to deal with object creation, the property maintains related state and the policy acts on calls entering and leaving the context. If you configure a class to support just-in-time activation (JITA), the policy will inspect the “done” bit in the property on the call “leave” event and deactivate and disconnect the object if it’s set. When the next call gets in (“enter”) a new object is created and connected. If you configure transactions, a transaction is created by the policy on “enter” and terminated on “leave” if the “done” bit is set or whenever the context is closed. All of COM+ is based on these three elements.

I wrote two activators/policies. The first activator redirects activations into a secondary, configurable application domain in order to fix the problem that all managed Enterprise Services components end up being created in the default domain. The problem with the default domain for out-of-process components is that they all live on top of dllhost.exe and therefore the XML configuration file for all Enterprise Services apps is dllhost.exe.config in the Windows system directory. That’s annoying and therefore I decided to fix that.

The second activator and the policy exists to enable custom extensibility. The goal is to intercept all calls with all inbound and outbound parameters and pass this information on to custom, managed extensibility points that are attached to the class metadata using attributes. So, in essence, that’s an attribute-driven way to implement aspects. What the “AOP people” use pointcuts for in AspectJ is done here using attributes. It’s just a different way to express the necessary metadata to get the interception (“weaving”) going.

So, if you write an attribute (aspect) class called “GreaterThanAttribute” that implements a specific interface and put that on a method parameter like void MyFunc( [GreaterThan(1)] int param ) the aspect is going to be called every time just before the function actually gets invoked on the server. If the validation rule is violated, the aspect can throw an exception and deny the call to proceed. That was the idea and I got it to work – almost.

The “almost” is the sad part of the story. The code has been at 98% complete for the past four to five months and that’s where I am stuck.

There are multiple problems and most of them are related to the way Enterprise Services (managed) is cheating on the unmanaged COM+ infrastructure in order to keep managed calls managed calls and avoid COM/Interop. When you make an in-process call from managed code to managed code, there is no COM call. In fact, all that COM+ learns about the call is that it happens. It doesn’t learn about the exact object that the call is happing on, it doesn’t know about any of the parameters, it just doesn’t know. When you make an out-of-process call from managed code to managed code, there is also no proper COM call in most cases. While the call will come in via COM transport, the actual call data is contained within a binary package passed to on of the methods on the IRemoteDispatch COM interface. The managed implementation of that will unmarshal that package, finds a Remoting IMessage object and will dispatch that on the managed server object. These call paths exist in parallel with the support for inproc and outproc calls from unmanaged clients.

None of the existing COM+ policies ever looks at parameters, but because I wanted to allow parameter inspection I actually had to get at the parameters. Here’s where it gets hairy. For inproc, managed-to-managed calls, there is no COM call and therefore all the information about the call turns out to be NULL in the calls on the policy. No information. How shall I get at the parameters if all I have is nothing? I remember staring at the call stack in the debugger with little hope to get anywhere (that was several weeks and a couple of thousand lines into the project) and seeing everything being NULL, while all information I wanted to have was twelve stack frames above the current position on the stack.

The solution for that problem starts ugly: __asm mov __EBP, ebp. I ended up writing a custom stack walk (and having to compensate for an odd __cdecl frame) that figures out the right frame by a certain signature and steals the necessary parameters from “up there”. That worked. The outproc, managed-to-managed case was fairly easy, because I could simply unmarshal the IMessage myself using the BinaryFormatter. What turned out to be way more complicated than thought was the “traditional” case of unmanaged calls. First, I need to decode IDispatch::Invoke calls and correlate them with the target object “by hand”. That’s hard. Secondly, I need to chain in a universal interceptor that proxies each and every interface on the actual backend object in order to see calls that come in as regular COM calls. In essence this means that the activator will have the default activator create the backend object first, wrap the reference into the interceptor class and return the interceptor. Here’s where it gets ugly.

The “tracker” property/policy that gives you all the cute spinning balls and the somewhat useful statistics in the Component Services explorer doesn’t like the interceptor. While what I am doing is perfectly legal COM, the tracker just doesn’t expect that sort of thing to happen and gets confused. Just in time activation and object pooling have similar issues with that interceptor and either are hard to convince to deal with it (JITA) or simply crash (Pooling). The more services and combination of services you look at, the more colorful the effects become. COM+ is a well-tuned, perfectly integrated set of aspect-like services. The issue is that they don’t expect strangers to show up in the house. Once you introduce any significant changes into the behavior of the infrastructure, the problems you need to deal with get totally out of hand.

The underlying problem is that with aspects, in general, you get the same problems as with objects vs. components. Chaining an aspect into an activation or call chain is very much like overriding a virtual method of a class whose behavior you don’t fully understand. Because the combination and resulting order of aspects results in unknown preconditions for the activities of your code, you will have to understand the interaction of any configuration’s resulting set of aspects in order to get everything right. And just as with classes where you can override virtual functions that either means you will have to have the full source code to look at, change and recompile or very precise documentation to get things working, at all. The real problem is that the problems never end. You develop your aspects assuming a set of pre-existing other aspects that you need to be friendly to and someone else does the same. You combine the two resulting aspects on a single class and everything breaks, because the other person’s aspect doesn’t know to be friendly to yours.

There are very few use-cases where aspects can ever be truly independent of other aspects. “Passive” aspects like logging and monitoring seem harmless and “gatekeepers” like argument validation and custom authorization aspects are such use-cases.

However, even those may have important dependencies. If you log call data into a database for statistics, billing ore other purposes, what do you do if the call is transactional and fails? Do you want to roll back the call data, too? If so, you need to be behind the transaction aspect, if not you need to be before it. If you validate arguments and throw an exception before the call is ever executed, does that get logged and how? If you introduce custom authorization that should definitely happen before transactions are created. This list could go on forever.

Don’t get me wrong, I still see value in the interception approach for this small set of use-cases if you know what you are doing. You can save a lot of code by declaring the need for services in the Enterprise Services way instead of using them imperatively. However, for multiple development organizations to “cooperate anonymously” the model of putting aspects into a simple processing pipeline that acts on messages as they pass in and out of a context or to and from a method is severely broken and insufficient. In order to make that model work, we need something like COM. No, not the technology itself, but we need something that does for aspects what COM did for objects: Allowing multiple parties to build composable parts that can be queried for their requirements and capabilities and implement well-known protocols for effective coordination. I still think that’s entirely possible to do and – as I have mentioned earlier – I have some ideas for such a framework model, including using 2 phase-commit style processing, but that’s not going to fix the problems one faces in existing environments.

I learned a lot doing all this work, so it was definitely not a waste of time. I will move the Enterprise Services aspect framework portions out of the core utility assemblies and into set of special assemblies and declare it as “for experimental use only” for now. You only ever really learn when you fail ;)

3:14:27 PM

comments []