Updated: 2/3/04; 8:49:32 PM.
Ed Foster's Radio Weblog
        

Tuesday, January 13, 2004

Readers had some interesting things to say about my recent Weblog item concerning the rising number of Microsoft products that come with a censorship clause in the license agreement prohibiting publishing of benchmark results. But the comments I found most intriguing came from a Microsoft employee who recalled benchmarks he examined several years ago when he was lead tester for one of the language products.

"Some were done by magazines, some were done by people trying to write fast code, and some were done by people who just found the topic interesting," he wrote. "In a majority of cases, there were problems with how the benchmarks were done. Everything from code that didn't test what it said it was testing, code that was fully optimized away, just plain poor coding, or even building in unoptimized mode -- yes, this did happen, multiple times. Sometimes the error favored one compiler or another, sometimes they hurt all compilers. The problem is that it's hard to come up with good benchmarks and to do them well, so some people do it poorly, but it's hard for readers to tell the difference."

"There's also the concern of giving competitors the opportunity to benchmark your software in the way that is most flattering to them, and then use that as part of their advertising campaign," the Microsoft employee added. "I know that we have tended to participate in objective benchmarks when we think the ground rules are acceptable, though I should note that, as for most companies, ‘acceptable’ always includes a ‘how well will we do?’ consideration."

The Microsoft employee was not speaking officially for the company, but his comments were certainly in line with what Microsoft officials -- and officials of other companies such as Oracle and BEA that have censorship clauses of their own -- have always said in defense of this practice. And it’s easy enough to see why software developers would take offense when they feel their product is being trashed by a dumb benchmark test. When it’s your baby, you want the world to see all the sterling qualities you know are there, and in only the best light.

Designing fair comparative benchmarks is very hard. That’s one of the reasons we see less testing done by the major trade publications than we used to. Vendors can always say a benchmark is biased, because any particular test you run will happen to be closer to one product’s sweet spot than another’s. But the point is not to design benchmarks that demonstrate products at their finely tuned best, but to produce data that will convey some information of relevance to potential customers. Using non-optimized code might be biased, but it might also tell those who read the results something they want to know.

Like all statistics, benchmarks can lie. And, like any other form of free speech, it's up to the listeners to decide if they want to listen and how much credibility to put in test results from a particular source. Benchmark testers with the best of intentions will make mistakes, and those who don't correct them will soon use up any credibility they have. That's the way free and open public discourse works. It's a system that's worked fine in this country for the last few hundred years, so why now make server-based software products the one exception where free speech is not allowed?

Again, I should say that these sneakwrap-based censorship clauses have never been enforced in a court of law against a customer or a publication. But with some of the other horrendous court decisions we’ve seen recently, I’m no longer so sure that it won’t ever happen. And the day that vendors establish the legal right to block publication of information that they think is biased against them is the day we cease to be a free society.


11:54:59 AM

© Copyright 2004 Ed Foster.

