Weblog: Morgan Delagrange

Friday, May 02, 2003

Continuous integration
Rod has started a series on our Continuous Integration environment [1] [2]. I've also jotted down notes on our CI process for my weblog, but Rod is doing a superlative job, so I'll leave it to the master.

Just one comment on the series so far. When putting together a Continuous Integration environment, you want build failures to evoke one of two feelings:

1. guilt [...] 3 : a feeling of culpability for offenses

2. shame [...] 2 : a condition of humiliating disgrace or disrepute : IGNOMINY

Source: http://www.m-w.com

For most people, shame works better.

2:04:01 PM

JDK 1.4, now including the kitchen sink
James: Be careful of JDK 1.4

No doubt. However, James left out my favorite JDK 1.4 "feature": the built-in XML parser. How convenient! Unless, of course, you want to upgrade or modify it in any way. Then you have to resort to the incredibly awkward Endorsed Standards Override Mechanism, which is the least hosting-friendly addition to Java in recent memory. While the Servlet APIs have in recent years made substantial improvements to the organization of their classloaders, the JDK seems to want to unravel this by providing as little modularity as possible.
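
When a parser is bundled with the JDK, it helps to know which implementation you are actually getting before you fight with overrides. Here is a minimal sketch that asks the JAXP factories for their concrete classes and, where the runtime reports it, where those classes were loaded from. The class name ParserOrigin and the describe helper are made up for illustration, and the "bundled with the JDK" label is my assumption about what a missing code source means.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of any library.
class ParserOrigin {

    /** Returns the concrete class name and, where known, the location it was loaded from. */
    static String describe(Object factory) {
        Class<?> impl = factory.getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        // Classes that ship with the runtime may report no code source at all;
        // treating that as "bundled" is an assumption of this sketch.
        String location = (source == null || source.getLocation() == null)
                ? "bundled with the JDK"
                : source.getLocation().toString();
        return impl.getName() + " <- " + location;
    }

    public static void main(String[] args) {
        System.out.println(describe(DocumentBuilderFactory.newInstance()));
        System.out.println(describe(SAXParserFactory.newInstance()));
    }
}
```

If the output names a class you did not put on the classpath yourself, the JDK's copy won. On JDK 1.4 the documented way to change that is to drop the replacement jars into the endorsed directory or point the java.endorsed.dirs system property at them.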

After making the very painful transition to JDK 1.4 a few months ago, I found myself wishing for an LE version of that JDK. As far as I am concerned, the fewer APIs the better. I could do without JAXP, without SAX (and thanks very much for grabbing other organizations' APIs as well as your own), without Xerces, and even without JDBC. I am capable of putting jars in the classpath on my own; I don't need Sun's help for that. Or at the very least, repackage the classes that don't belong to you! Throw me a bone here, Scott!

Making all those APIs modular does limit Sun's ability to intermingle them, but I think it's worth the tradeoff. Otherwise, I dread all the new APIs I'll have to work around for 1.5.

1:43:14 PM

Encoding, schmencoding
I was writing a character encoding FAQ for our company's Wiki when I came across something in the JAXP javadocs that has always puzzled me:

StreamResult Javadocs: Normally, a stream should be used rather than a reader, so that the transformer may use instructions contained in the transformation instructions to control the encoding.

JAXP has a similar warning for StreamSource as well.

Why is using character streams so ill-advised? The character encoding behaviour that XSLT processors should use seems pretty straightforward to me; everything should be keyed off the output encoding specified in the XSLT stylesheet. If a character produced by the transformation is supported by the desired encoding, output it (based on the user's preference) either as a literal character or as a numeric character reference. If the character is unsupported by the encoding, output it as a numeric character reference. Isn't this a reasonable way to implement encoding?
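
To make that rule concrete, here is a rough sketch of the decision using java.nio's CharsetEncoder. It is not how any particular XSLT processor is implemented; EntityEscaper and escapeUnsupported are names I made up, and the sketch only handles the encoding question (it does not escape markup characters such as & or <).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper illustrating the rule, not a real serializer.
class EntityEscaper {

    /**
     * Keeps every character the target encoding can represent and replaces
     * the rest with decimal numeric character references.
     */
    static String escapeUnsupported(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Work on whole code points so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            String chars = new String(Character.toChars(codePoint));
            if (encoder.canEncode(chars)) {
                out.append(chars);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

For example, escapeUnsupported("caf\u00e9 \u20ac", "ISO-8859-1") keeps the é, which Latin-1 supports, and turns the euro sign into &#8364;. With "US-ASCII" both characters become references, and with "UTF-8" the text comes back unchanged.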

I guess the point of the warning in the JAXP docs is that if you are given an arbitrary Writer, you can't be sure if it is backed by a StringBuffer or an underlying OutputStream. If it's a stream, it's certainly possible that the Writer will receive characters that the character-to-byte conversion of the OutputStream does not support.
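
A small sketch of that failure mode, assuming nothing beyond java.io: a Writer wrapped around an OutputStream with a narrow encoding quietly substitutes characters it cannot convert. LossyWriterDemo and roundTrip are illustrative names, not anything from JAXP.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demo class.
class LossyWriterDemo {

    /**
     * Writes text through a Writer backed by a byte stream in the given
     * encoding, then decodes the bytes with that same encoding.
     */
    static String roundTrip(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charsetName)) {
            // The caller only sees a Writer; unmappable characters are
            // replaced during character-to-byte conversion without any error.
            writer.write(text);
        }
        return bytes.toString(charsetName);
    }
}
```

Calling roundTrip("price: \u20ac5", "ISO-8859-1") returns "price: ?5", because Latin-1 has no euro sign and the Writer never complains. The same call with "UTF-8" returns the original text.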

However, it seems just as likely that, if you give the transformation an arbitrary OutputStream, the bytes produced will be subsequently misinterpreted by any code that converts them back into characters. In the simple case of performing an XSL transformation to a file, an OutputStream would definitely be safer. However, if you were, for example, chaining the result of one transformation to another, using an OutputStream seems like an unnecessary point of potential misconfiguration. I think JAXP's documentation is very poorly worded in this regard; the implication that OutputStreams are somehow safer seems false.
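
Here is a hedged sketch of the two paths, using an identity transformation (no stylesheet) so the only variable is how the result is carried. ChainingSketch and its method names are hypothetical, and the exact serialized output depends on whichever XSLT processor your JDK or classpath provides.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class.
class ChainingSketch {

    /** Runs an identity transformation from a string of XML into the given result. */
    static void identity(String xml, Result result) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new StreamSource(new StringReader(xml)), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Character path: the intermediate result never becomes bytes. */
    static String viaWriter(String xml) {
        StringWriter out = new StringWriter();
        identity(xml, new StreamResult(out));
        return out.toString();
    }

    /** Byte path: the caller has to decode with the encoding the transformer used. */
    static String viaStream(String xml, Charset decodeAs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity(xml, new StreamResult(out));
        return new String(out.toByteArray(), decodeAs);
    }
}
```

With input such as "<a>caf\u00e9</a>", viaWriter hands the next step text that still contains the é. viaStream only does so if the caller happens to decode as UTF-8. Decode the same bytes as ISO-8859-1 and the é is mangled, which is exactly the kind of misconfiguration the byte path invites when chaining.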

And on the topic of encoding, there is nothing convenient about the "convenience" classes java.io.FileReader and java.io.FileWriter. If a principal goal of Java is platform independence, then those two classes are failures. They really should have been made to use UTF-8 rather than the platform's default encoding. At work I see lots of encoding problems that are caused by developers who use those classes.
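
The usual workaround is to skip the convenience classes and name the encoding yourself. A minimal sketch, assuming a reasonably modern JDK (it uses try-with-resources and StandardCharsets); Utf8Files is a made-up helper name.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: file I/O with an explicit encoding
// instead of the platform default.
class Utf8Files {

    /** Writes text to the file as UTF-8, whatever the platform default is. */
    static void write(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    /** Reads the whole file, decoding its bytes as UTF-8. */
    static String read(File file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

A file written this way reads back identically on any machine, because neither side consults the platform's default encoding.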

1:15:59 PM
