Gordon Weakliem's Weblog

	Tuesday, December 03, 2002

Dare on XML Schema

I've finally made time to read Dare's W3C XML Schema Design Patterns: Avoiding Complexity.

Dare wrote his article as a "counterpoint" (though maybe "derivation by extension" is more apt) to Kohsuke Kawaguchi's W3C XML Schema Made Simple. Kohsuke sums up his view by saying

Consider W3C XML Schema as DTD + datatype + namespace

though you might add "- Notation", since he points out that Notation declarations shouldn't be used because they aren't compatible with DTD Notations. This is probably decent, if conservative, advice. Judging from the comment I noted the other day, and from the comments on Kohsuke's article, the most controversial statement in either article was

Do not try to be a master of XML Schema. It would take months.

which is pretty much the point of both articles: learn what's useful and ignore all the nooks and crannies; they'll just get you into trouble. This is essentially conceding the argument of the anti-Schema crowd that WXS is too complex and ambiguous, but regardless, people are using WXS by choice or compulsion, and these articles are an attempt to steer users towards the best practices. And as far as I'm concerned, it's true. I've tried to wade through Priscilla Walmsley's Definitive XML Schema, but as a friend of mine said, it's "dry as day-old toast". I feel better served by getting a more succinct guide and filling in the details later, if ever.

Dare loosens Kohsuke's guidelines a bit. To start, rather than eliminate the use of local declarations, Dare takes the time to explain the elementFormDefault behavior that put Kohsuke off. It seems like Kohsuke's recommendation could be modified to say "use elementFormDefault='qualified'", which is one of Dare's recommendations, and more useful advice to boot. I don't see a particular problem with unqualified, except that I prefer the way qualified looks, and it seems like that's Dare's justification too. The other justification might be that unqualified interferes with default namespace declarations.
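
To make that last point concrete, here is a small sketch in Python (my choice of language, not something from either article). It parses three instance documents for a made-up `urn:example:books` vocabulary; the namespace URI and element names are invented for illustration. It shows which namespace each child element lands in, and why an unqualified local element gets in the way of a default namespace declaration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:books"  # hypothetical namespace, for illustration only

# Shape expected by a schema with elementFormDefault="qualified":
# local elements live in the target namespace, so one default
# namespace declaration covers the whole document.
qualified = f'<book xmlns="{NS}"><title>XML</title></book>'

# Shape expected with elementFormDefault="unqualified": only the
# global element is in the target namespace; local elements are in
# no namespace, so the root is written with a prefix.
unqualified = f'<b:book xmlns:b="{NS}"><title>XML</title></b:book>'

# With a default namespace on the root, an unqualified child has to
# opt out explicitly with xmlns="".
unqualified_default = f'<book xmlns="{NS}"><title xmlns="">XML</title></book>'


def child_tags(xml_text):
    """Return the expanded names ({namespace}local) of the root's children."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(child_tags(qualified))            # ['{urn:example:books}title']
print(child_tags(unqualified))          # ['title']
print(child_tags(unqualified_default))  # ['title']
```

The third document is the awkward one: to use a default namespace with an unqualified schema, every local element has to undo it.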

I don't quite get the recommendation on built-in types. The initial list of recommendations says "Do use restriction and extension of simple types.", but the actual recommendation is to use the built-in simple types. Dare's recommendation is to use the simple types and consider avoiding the subtypes of string and integer. I've seen (and written, truth be told) schemas that start building levels on top of the simple types, and really all this achieves is a less readable schema. The OTA schemas are very much into subclassing simple types, and others I've talked to who've worked with OTA agree that it hurts readability. The OTA defines types like StringLength32, which may be a valid restriction, but probably not a great first-class type: it's true that lots of elements are 32-character strings, but this seems to me to be a micro-optimization in the type system. Declaring this type would pay off if all the StringLength32 data suddenly became StringLength64, but then you have to carefully consider whether the data's really related to another use of that type and likely to stay in sync. This seems like a parallel to the Inheritance vs. Aggregation considerations in OO design, where you should consider whether a new type really IS-A kind of the other type. I'd say that it's not necessarily a good idea to declare named simple types, unless that type information is really going to be reused.

One other point that Kohsuke made was that when restricting complex types, you have to repeat the entire definition of the base type, and that validators have a difficult time with restriction. Dare gives some concrete examples of the validation problems, but doesn't really offer much besides "here's the rope, don't hang yourself". Restriction has its appeal, maybe because it doesn't work like the type systems I'm used to, but given the problems, I'm not sure complex type restriction is worth even a qualified endorsement.

Overall, Dare did a better job of explaining his rationale than Kohsuke. Kohsuke's guidelines are a bit too conservative, but my trouble with Dare's guidelines comes from features qualified with "use carefully". It's good to get an explanation of the pitfalls, but I felt like the justification for when those features should be used was pretty weak. Maybe this subject needs 2 articles, one for the "safe" parts, and another for the ones that need extra care.

3:42:30 PM permalink

Machine.config versus web.config

I've developed some pretty strong opinions on managing configurations, and I've got the scars (and a #12 ranking on Google ;-) ) to prove it. Someone forwarded me this article by Jonathan Goodyear on ASP Today asking what I thought about storing app settings in machine.config versus web.config. Off the top of my head, it sounded like a bad idea, but I'm willing to listen.

So I've read the article, and I still think it's a bad idea. Jonathan's reasons for discouraging use of web.config were:

"this approach would require you to change your web.config settings every time you migrated your web application to a different environment"
"Second, it is not a good idea for the development team to know what the QA and production database connection information is, so storing it in the web.config file doesn't make a whole lot of sense from a security perspective"

Setting aside the issues of trust, not to mention that you should NEVER use the administrative account in an application, and that individual accounts should be limited in the damage they can do by a good DB design, the second point is predicated on the first; that is, it assumes that the same configuration file is used by everybody from development through production. So I'll just address that point, and it's simple: maintain a different configuration for each system and load it with each system. At least in our environment, our system administrators configure our production machines completely differently than developers do, so it's a little ridiculous to expect that the config would stay the same. In fact, Jonathan's suggesting that you do exactly that, except that you incur the following disadvantages:

If you screw up machine.config, you screw up the entire machine, not just your application.
If you're running multiple webapps that share components, the shared components have to use the same configuration.
Your system admin may want to set machine policy in machine.config, in which case you now have to merge your config with the administrator's.
In many hosting situations (e.g. shared hosting), you aren't allowed to touch machine.config.

The real headache with configuration comes with managing the configuration. You need to be sure that as the application changes, the configuration stays up to date, and you need to coordinate use of the <appSettings> section among developers, so that two of them don't use the same setting for divergent purposes. Jonathan shows how to create a custom configuration section, which is an OK way to keep developers from stepping on each other, but it has the disadvantage of being incredibly verbose. I think that the best way to handle this is to write a custom configuration handler (a class that implements IConfigurationSectionHandler). I won't go into the details of how to do that, but the log4net project has a nice implementation of just such a thing.
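
For a rough idea of the shape, though, here is a sketch in Python rather than .NET. It is only an analogue, not the .NET API and not log4net's code: the point is that a handler is given the XML of its own section and hands back a typed object, so the section can be as compact as its author likes. The `mailSettings` section, its attributes, and the `MailSettings` class are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical config file with one custom section next to <appSettings>.
CONFIG_XML = """
<configuration>
  <appSettings>
    <add key="timeout" value="30" />
  </appSettings>
  <mailSettings host="smtp.example.com" port="25">
    <recipient>ops@example.com</recipient>
    <recipient>dev@example.com</recipient>
  </mailSettings>
</configuration>
"""


@dataclass
class MailSettings:
    host: str
    port: int
    recipients: list


def create_mail_settings(section):
    """Plays the role of the handler: turn the section's XML into a typed object."""
    return MailSettings(
        host=section.get("host"),
        port=int(section.get("port", "25")),
        recipients=[r.text for r in section.findall("recipient")],
    )


def get_section(config_text, name, handler):
    """Find the named section under the root and let its handler parse it."""
    root = ET.fromstring(config_text)
    section = root.find(name)
    if section is None:
        raise KeyError(f"no <{name}> section in configuration")
    return handler(section)


settings = get_section(CONFIG_XML, "mailSettings", create_mail_settings)
print(settings.host, settings.port, settings.recipients)
```

Each component owns its section and its parsing code, so nobody has to negotiate key names in a shared flat list.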

One other thing that has been on my mind is the general practice of having classes read their own configuration, i.e. directly calling ConfigurationSettings from your class. It seems to me that it's better to have external components hand in the configuration. For one, this improves testability, as your unit tests can alter a component's configuration for each test, rather than relying on the local configuration file to cover all the tests. The second reason is that this offers better decoupling than the method that Jonathan suggests, which is to store a path to a configuration section in <appSettings>. Again, log4net offers a nice model of how to do this. I've been searching for a pattern to document specifically what I mean here; Douglas Schmidt's Service Configurator is the one I'm familiar with, but I'm not sure how well it lines up with the systems I'm working on.
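
A minimal sketch of the difference, again in Python rather than .NET. Both mailer classes and the `APP_SETTINGS` dictionary (standing in for whatever the application's config file contains) are hypothetical names invented for this example.

```python
# Stand-in for settings read from the application's config file (hypothetical).
APP_SETTINGS = {"smtp_host": "smtp.example.com", "retries": "3"}


class SelfConfiguringMailer:
    """Reads its own configuration: every test shares whatever the file says."""

    def __init__(self):
        self.host = APP_SETTINGS["smtp_host"]
        self.retries = int(APP_SETTINGS["retries"])


class InjectedMailer:
    """Has its configuration handed in: the caller decides where it comes from."""

    def __init__(self, host, retries):
        self.host = host
        self.retries = retries


def build_mailer_from_settings(settings):
    """The one place in the application that knows about the config source."""
    return InjectedMailer(settings["smtp_host"], int(settings["retries"]))


# Production wiring reads the real settings once, at the edge of the app.
production = build_mailer_from_settings(APP_SETTINGS)

# A unit test can build any configuration it likes without touching a file.
no_retry = InjectedMailer(host="localhost", retries=0)

print(production.host, production.retries)  # smtp.example.com 3
print(no_retry.host, no_retry.retries)      # localhost 0
```

The first class can only be tested against whatever configuration happens to be on the machine; the second doesn't care where its values come from.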

10:06:23 AM permalink
