Gordon Weakliem's Weblog

	Wednesday, April 23, 2003

Baseline Magazine on Delta

Jon Udell and Phil Windley have discussed the new IT systems at Delta Airlines, but so far the coverage I've seen hasn't had much depth, and it's been mostly from a consumer's perspective. Today Joel Spolsky pointed to a Baseline Magazine article on Delta that I found completely fascinating. It's a long article, but really detailed, and I'm impressed that it actually explains yield management concepts and how Delta's systems support them. Great information, though I did have a problem with the numbers quoted for Delta (DL). While revenue per seat mile is an interesting number, what makes Southwest (WN) the most successful airline is cost per passenger seat mile. I'm no authority, but maximizing revenue seems like ultimately a short-term move: more revenue means consumers are paying more, and eventually, you'd think, they'll wise up. One other interesting point was that DL is apparently using a messaging architecture throughout. I'd like to hear more about that.

Ted Leung also pointed to a News.com article on JetBlue. Ted's right that the News.com article is pretty light on content, but it's tantalizing, because JetBlue represents what many think is the future of air travel, and because its approach seems to be almost exactly the opposite of Delta's (as far as I know, Southwest is mostly old tech, but I don't think Southwest sees information systems as being as key as Delta does). It's trite to say, but it will be very interesting to see whether DL is one of the winners in the big airline shakeout, and whether its IT investment will be a decisive factor.

8:30:48 PM

Remember monitoring and deployment

I wish I had read Ted Neward's latest two installments of Effective Enterprise Java, "Remember deployment" and "Remember monitoring," two years ago. These two pieces are probably the most valuable in the whole lot, at least if you're working in an environment where developers don't get to log on to the production boxes and don't like getting paged during dinner. In fact, that's the ideal situation. On the project I'm finishing up, I was the build master, the de facto system admin for our system test and integration systems, and on call for most of the change tickets involving the .NET side of the system.

I'd really rather not repeat those roles anytime soon, but the one thing that kept me somewhat sane was automation. The lesson I learned from this project was this: automate everything. It's much better to spend a day or two figuring out how to automate a one-hour process than to do that one-hour process 30 times. Even if you spend 8 hours to save 8, I think it's worth it, because automation removes the chance to fat-finger a command line.

You also need to think about the firewall restrictions early in the project, and you should probably even allocate capital and resources to maintain a test environment that resembles your production environment down to the network level. A responsible firewall administrator shuts off all access to a network and then opens only the ports and routes that the application needs to operate. On our project, we did a lot of scrambling when the production systems were first brought online, because we hadn't thought much about firewall rules and found out at the last minute that things we'd assumed would work, wouldn't. For example, we hadn't thought about how we'd load code onto the servers; we assumed the admin could just map a drive to some staging server on our network. It actually ended up being the opposite, partly because the network people didn't want to open up the corporate LAN to any intruder who managed to break into the system. So now we push our code onto the server and the system admin loads it locally.
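One cheap way to find missing firewall rules before go-live is to script the connectivity check and run it from the test network on every deployment, so it gets automated like everything else. Here's a minimal Python sketch, assuming the routes the application needs can be listed as host/port pairs and that a plain TCP connect is a good enough first test. `Route`, `check_route`, and `check_routes` are hypothetical names for illustration, not part of any tool mentioned above.

```python
import socket
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    """A network path the application needs in production (hypothetical example type)."""
    description: str  # admin-readable purpose, e.g. "the SQL Server database"
    host: str
    port: int


def check_route(route: Route, timeout: float = 3.0) -> Optional[str]:
    """Try a plain TCP connect; return None on success, else an actionable message."""
    try:
        with socket.create_connection((route.host, route.port), timeout=timeout):
            return None
    except OSError as exc:  # refused, timed out, unresolvable host, no route...
        return (
            f"Cannot reach {route.description} at {route.host}:{route.port} "
            f"({type(exc).__name__}); ask the network team whether this route is open."
        )


def check_routes(routes: List[Route]) -> List[str]:
    """Check every required route and return the failure messages, if any."""
    failures = []
    for route in routes:
        message = check_route(route)
        if message is not None:
            failures.append(message)
    return failures
```

For example, `check_routes([Route("the SQL Server database", "foo.bar.com", 1433)])` run from the test network should come back empty; anything it returns is a firewall rule to request before the production rollout rather than after. A successful connect only shows that the port is open, not that the service behind it is healthy.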

Monitoring was something we always wanted to do but kept putting off until events forced us to deal with it. Don't procrastinate: think about the operational aspects of the system from day one. If it's inconvenient to administer the system on a development box, it'll be a nightmare on the production box. Even with a snazzy debugger and profiler on your desktop, you won't be able to rely on them in production, so don't use them as a crutch.

I'd like to expand a bit on Ted's idea of a "happy page": I think it's also helpful to be able to drill down and get more detail. Our happy pages show pass/fail at the top, then provide diagnostic output farther down if anything went wrong. So if the system status is fail, you might get a message saying "the database is not reachable," along with suggestions as to why it might not be reachable (firewall misconfigured, server offline, etc.) and how to test it (telnet to foo.bar.com port 1433).

That leads to another point: if you put a message into a log that non-developers read, make sure the message makes sense and is actionable. If an admin sees "java.net.NoRouteToHostException", they'll stop reading at "java" and page a developer. If the admin sees "the database is not reachable, check the network connection to foo.bar.com", they have a chance of doing some diagnosis on their own. Also, if you fill up an administrator's console with messages they can't do anything about, they'll start to ignore them (or worse, start paging developers), so don't put anything onto the admin's channel unless you want an administrator to act on it.
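Here's a rough Python sketch of a happy page body along those lines: overall pass/fail on the first line, then drill-down diagnostics written so an admin can act on them. It assumes each check is a function returning a `Diagnosis`; `Diagnosis`, `check_database`, and `render_happy_page` are hypothetical names, and serving the text over HTTP is left out. The database check reuses the foo.bar.com port 1433 example and only tests that a TCP connection can be opened.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Diagnosis:
    """Result of one health check (hypothetical type for this sketch)."""
    name: str
    passed: bool
    problem: str = ""  # what went wrong, in words an admin can act on
    possible_causes: List[str] = field(default_factory=list)
    how_to_test: str = ""  # a command the admin can run to confirm


def check_database(host: str = "foo.bar.com", port: int = 1433) -> Diagnosis:
    """Only tests that a TCP connection to the database port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=3.0):
            return Diagnosis("database", True)
    except OSError:
        return Diagnosis(
            "database",
            False,
            problem=f"The database is not reachable; check the network connection to {host}.",
            possible_causes=["firewall misconfigured", "server offline"],
            how_to_test=f"telnet {host} {port}",
        )


def render_happy_page(checks: List[Callable[[], Diagnosis]]) -> str:
    """Overall PASS/FAIL on the first line, drill-down details for failures below."""
    results = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:  # a check that crashes counts as a failure
            results.append(Diagnosis(
                getattr(check, "__name__", "unnamed check"),
                False,
                problem=f"The check itself failed ({type(exc).__name__}); contact the development team.",
            ))
    status = "PASS" if all(r.passed for r in results) else "FAIL"
    lines = [f"STATUS: {status}", ""]
    for r in results:
        lines.append(f"[{'PASS' if r.passed else 'FAIL'}] {r.name}")
        if not r.passed:
            lines.append(f"    Problem: {r.problem}")
            lines.extend(f"    Possible cause: {c}" for c in r.possible_causes)
            if r.how_to_test:
                lines.append(f"    To test: {r.how_to_test}")
    return "\n".join(lines)
```

Keeping the status on the first line means a monitoring tool only has to match one string, while a person reading the page gets the problem, likely causes, and a test command without having to read a stack trace.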
One way to keep the admin's channel useful with log4j/log4net is to tailor your messages by severity: a FATAL error is probably going to need an administrator to respond, so when you create messages at the FATAL or ERROR level, make sure an admin can understand what they need to do.

One final note: if you're using a commercial monitoring tool, licenses are often sold by the number of URLs the tool can monitor, so the fewer happy pages you have, the less it'll cost your project. In our system, each web service has its own happy page, but a system-wide page aggregates all the services' diagnostics, and we point the monitor at that page. The savings aren't trivial: IIRC, our tool's license costs something like $1,000 per URL.
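To illustrate the severity split, here's a sketch using Python's standard logging module instead of log4j/log4net; it follows the same logger/handler/level model, with CRITICAL standing in for FATAL. The admin channel only receives ERROR and above, as one-line messages without tracebacks, while the developer log receives everything. `build_logger`, `AdminFormatter`, and the logger name are hypothetical.

```python
import io
import logging


class AdminFormatter(logging.Formatter):
    """One line per message and no traceback: admins get the actionable text only."""

    def format(self, record: logging.LogRecord) -> str:
        return f"{record.levelname}: {record.getMessage()}"


def build_logger(admin_stream: io.TextIOBase, developer_stream: io.TextIOBase) -> logging.Logger:
    """Route ERROR/CRITICAL to the admin channel and everything to the developer log."""
    logger = logging.getLogger("orders_service")  # hypothetical service name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger's handlers

    admin = logging.StreamHandler(admin_stream)
    admin.setLevel(logging.ERROR)  # CRITICAL plays the role of log4j's FATAL
    admin.setFormatter(AdminFormatter())

    developer = logging.StreamHandler(developer_stream)
    developer.setLevel(logging.DEBUG)
    # The default Formatter appends the traceback when exc_info is set.
    developer.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )

    logger.handlers = [admin, developer]  # replace any handlers from an earlier call
    return logger
```

Putting the threshold on the handler rather than on the logger is what lets the developer log keep its DEBUG output while the admin channel stays quiet.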

4:39:21 PM
