[Day Permalink] Sunday, November 24, 2002

[Item Permalink] The craft of benchmarking
My professional career started with my first real summer job, where I had to organize a course on vectorization. In a couple of weeks I had to learn how to optimize computationally heavy codes for the vector processor attached to an IBM 3090 mainframe. In those days the RISC phenomenon had not yet really surfaced, Cray vector processors were the measure of supercomputing, and PC processors were considered unsuitable for heavy number crunching.

For several years, I had to teach and write about optimizing codes for a variety of platforms: a Cray X-MP (and later C90), a Convex 3800, an IBM SP (POWER3), a Cray T3E, a Compaq Alpha EV6 system, etc.

In a way, the vector processors were straightforward, because you usually had a big interleaved main memory, which you could regard as one large cache. With RISC architectures the picture was much more complicated. Of course, the compilers got better all the time, but whenever a new architecture came to market, the first compilers were quite inefficient.
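To give an idea of what that restructuring looked like, here is a minimal sketch in C of a plain matrix multiplication loop and a cache-blocked version of the same computation. The loops and the block size are only illustrative, not taken from the original course material, and the right block size depends on the cache of the machine at hand.

    #include <stddef.h>

    #define BS 64  /* illustrative block size; tune to the cache of the target machine */

    /* Plain triple loop: B is traversed column by column on every pass,
       so on a cache-based RISC machine most of those accesses miss. */
    void matmul_naive(size_t n, const double *A, const double *B, double *C)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++) {
                double s = 0.0;
                for (size_t k = 0; k < n; k++)
                    s += A[i*n + k] * B[k*n + j];
                C[i*n + j] = s;
            }
    }

    /* Blocked (tiled) version: works on BS-by-BS sub-blocks so that the
       pieces of A, B and C currently in use stay in the cache and get reused. */
    void matmul_blocked(size_t n, const double *A, const double *B, double *C)
    {
        for (size_t i = 0; i < n*n; i++)
            C[i] = 0.0;
        for (size_t ii = 0; ii < n; ii += BS)
            for (size_t kk = 0; kk < n; kk += BS)
                for (size_t jj = 0; jj < n; jj += BS)
                    for (size_t i = ii; i < ii + BS && i < n; i++)
                        for (size_t k = kk; k < kk + BS && k < n; k++) {
                            double a = A[i*n + k];
                            for (size_t j = jj; j < jj + BS && j < n; j++)
                                C[i*n + j] += a * B[k*n + j];
                        }
    }

On a vector machine the plain loop was often good enough; on a cache-based machine the blocked form (or a compiler clever enough to produce it) made the difference.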

In the final years of my career in code optimization (I moved on to other topics about three years ago), I made a simple linear algebra example to show the benefit of code optimization. I used this example in several user guides where I wrote or edited the chapter on optimization. The example started with an unoptimized code, which produced about 1-4 megaflop/s (millions of floating-point operations per second). Through several steps I arrived at the final version, which called machine-specific subroutine libraries. This code typically achieved 150-250 megaflop/s, so the final version was about 50-100 times faster than the original.
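The original codes are not reproduced here, but the flavour of the final step was roughly the following sketch in C: hand the multiplication to a tuned library (here the standard CBLAS interface to DGEMM stands in for the machine-specific libraries mentioned above) and compute the megaflop/s rate from the operation count, which for an n-by-n matrix multiplication is about 2n³ floating-point operations. The problem size and the timing code are illustrative only.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <cblas.h>

    int main(void)
    {
        const int n = 1000;                     /* illustrative problem size */
        double *A = malloc(sizeof(double) * n * n);
        double *B = malloc(sizeof(double) * n * n);
        double *C = malloc(sizeof(double) * n * n);
        for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

        /* C = 1.0 * A * B + 0.0 * C, row-major storage; the library hides
           the blocking, unrolling and prefetching tuned for the machine. */
        clock_t t0 = clock();
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        if (secs > 0.0)
            printf("%.1f megaflop/s\n", 2.0 * n * n * n / secs / 1.0e6);

        free(A); free(B); free(C);
        return 0;
    }

At 1-4 megaflop/s this 1000-by-1000 multiplication (about 2 billion operations) takes from roughly ten minutes to over half an hour; at 150-250 megaflop/s it finishes in about ten seconds, which is where the factor of 50-100 comes from.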

Nowadays I'm not keen on comparing the performance of systems, because the speed often depends more on the style of coding and the quality of the compiler than on the actual speed of the processor. Sometimes a code developed originally for processor A performs really poorly on processor B.

In any case, there is not much point in making detailed benchmarks of current personal computers. The speed of the processor should be more than sufficient for most tasks. When an application feels slow, it is more often a matter of bad coding than a lack of speed in the processor.