Tuesday, February 26, 2002




Nicholas Riley talks about OS X stability today, commenting that he has experienced quite a number of crashes, instabilities, and numerous other bugs. His experience has been very different from mine.

I would say that I'm a heavy-duty user; I constantly rely upon OS X's ability to do many things at once-- including handling heavy simultaneous file I/O while also swapping heavily. I have many applications running simultaneously and regularly do memory-intensive tasks such as debugging software, building complex projects, and large-scale graphics manipulation.

Since OS X 10.1 shipped, I have had exactly one kernel panic. That particular kernel panic was due to my own stupidity; I had upgraded from 10.1.2 to 10.1.3, proceeded to copy several hundred megabytes from one drive to a FireWire device, and pulled the plug on the FireWire device in the middle of the copy. Not exactly a computer-friendly thing to do. The system panicked, but no data was lost and the system rebooted just fine.

I released XOptimize v0.42 on April 28, 2001 (jeez-- has it been that long? I should update that). The app still works fine on 10.1.3. Of the folks that have written to claim that XOptimize corrupted their hard drive, every single one had filesystem problems to start out with or had faulty hardware.

OS X is definitely more sensitive to hardware and disk problems than OS 9. As Nicholas mentions, that is because OS X tends to do a lot of things at once-- it tends to use all available system resources at all times and, as such, tends to uncover weaknesses in the physical system or filesystem much more rapidly than OS 9.

I suspect one of the reasons my system is so rock solid is that I [coincidentally] do very little that requires kernel extensions beyond the primary drivers shipped by Apple. I use an iPod and have a standard Oxford 911 based external HD. Kernel drivers are nasty in that they are the one mechanism that effectively bonds user-level processes into kernel space and can, hence, completely take out your system. I can understand how a random MP3 player could cause a problem-- the companies that make the MP3 players are notorious for making seemingly minor modifications in their hardware in the name of optimizing [i.e. lowering] the cost of construction. The unit that Apple received is very likely not like the units that shipped six months before or six months after.

Digital cameras are quite similar. I have a Sony DSC-F505 camera-- one of the first that shipped-- and it was quite a long time before Apple shipped a version of OS X that wouldn't panic when I detached my camera. Other F505s worked fine. The difference was in the firmware version that Sony shipped to Apple vs. the version on my camera. It sucked.

Certainly, the kernel should never panic because some piece of hardware was disconnected from the USB bus. Having been involved in writing device drivers in the past (data capture board driver for a NeXTbus card that plugged into a cube), I can say that creating a completely bulletproof driver is tediously and painfully hard. Especially when there is no way for you to duplicate the failures that some very small segment of your customers are experiencing.

Bottom line: These things should not happen, but they do. Before blaming Apple for all your woes, make sure it is actually Apple's fault.

If you have installed third-party RAM, try running the system for a day or two without it and see if the crashes go away. Some RAM is borderline and fails only intermittently: it passes memory tests, but fails under heavy load in a running operating system.

Check your filesystem!! I cannot stress this enough. I have seen much blame heaped upon application developers, Apple, and numerous other people when the real problem has been a corrupted filesystem. Running fsck -y in single-user mode is a start, but you really want to use a tool like Norton Utilities booted from a CD.
3:26:45 PM