Kevin Schofield's Weblog

Musings on life, kids, work, the Internet, Microsoft, politics, orcas, etc.


Thursday, September 29, 2005

Curves and Stack Ranking Are Not Evil
OK, I'll admit it: I read Mini-Microsoft just like everyone else. While I certainly don't agree with everything he says, I think that open, frank conversation is a Good Thing, and confronting problems, real or perceived (and it's always a mix of both), is also a Good Thing. So I'm glad he does what he does.

He's been ranting for a while now about the evils of "grading on the curve" and of stack ranking (or, as he prefers to call it, "rank and yank"). In both cases, though, he's very light on the details of how these practices actually work at Microsoft and very quick to dismiss them as evil. The truth is that neither is getting a fair hearing in this court of public opinion, and if we're really going to discuss tossing them, we need to be very clear about what stack ranking and performance reviews "on a curve" really mean, both in theory and in practice.

Let me cut straight to the executive summary of my opinion, because it's going to take me a fair amount of prose to reason my way there: stack ranking and the curve are tools. They're good, useful diagnostics that expose biases in the review process. They should NEVER be the only tools you use for performance reviews, but they absolutely have a role, alongside others, in a fair performance review system.

Let's start with stack ranking. The exercise itself, in a vacuum, is actually very simple: it's the lifeboat drill, and ideally it's done after all the managers have already determined everyone's performance review scores, promotions, bonuses, etc. Get your managers in a room, set aside all the data about review scores, ladder levels, etc., and within a particular discipline rank the people on the team from "most valuable" to "least valuable." How much positive impact do they have on the team's results, and how much potential do they have to keep doing that, and more, in the future? It's an incredibly difficult exercise, and every time I've had to do it there's been a LOT of arguing, which I think is healthy. When you're finally done (whew), you go back and put all that other information (review scores, levels, promotions) next to each name. Things immediately jump out at you. You see people who are ready (or nearly ready) to be promoted. You see people who are at a high level but just aren't delivering compared to their peers, or people who are chronically under-rewarded despite blowing away all of their peers. You see when one manager gives systematically higher review scores for the same level of work, or when people in one group are at higher levels than people doing equivalent work in another group. You see people hired two years ago being paid less than someone who just started, because starting salaries had to become more competitive while the internal compensation system lagged behind.
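To make the "put everything back next to each name" step concrete, here is a minimal Python sketch. It assumes a hypothetical `Employee` record with a numeric ladder level and a review score (nothing here reflects Microsoft's actual tools or data), and it flags every pair where someone ranked higher in the lifeboat drill has a lower score or level than someone ranked below them. The flags are prompts for discussion, not verdicts.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    level: int    # hypothetical numeric ladder level
    score: float  # hypothetical review score; higher is better


def inversions(ranked, attr):
    """Return (above, below) name pairs where `above` was ranked as more
    valuable than `below` but has a lower value of `attr`.

    `ranked` must already be ordered from most to least valuable, i.e. the
    lifeboat drill is done before anyone looks at the scores.
    """
    return [
        (above.name, below.name)
        for i, above in enumerate(ranked)
        for below in ranked[i + 1:]
        if getattr(above, attr) < getattr(below, attr)
    ]


# Hypothetical team, already stack-ranked from most to least valuable.
ranked = [
    Employee("Ana", level=2, score=4.0),
    Employee("Ben", level=3, score=3.0),
    Employee("Cy", level=1, score=3.5),
]

print(inversions(ranked, "score"))  # [('Ben', 'Cy')]: Ben outranks Cy but scored lower
print(inversions(ranked, "level"))  # [('Ana', 'Ben')]: Ana outranks Ben at a lower level
```

A level inversion like Ana/Ben is the "ready for promotion" or "not delivering at level" signal described above; deciding which one it is still takes human judgment. Comparing every pair is quadratic, which is fine at team scale.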

And you get an opportunity to tweak and correct things, to fix injustices, before all the numbers are final. It's incredibly valuable in that way.

Now, there are lots of ways to abuse a stack ranking exercise. You can use the ranking to hand out review scores. You can decide the fate of someone's career based on a single ranking exercise. You can do the stack ranking before managers have had time to assign review scores, and inevitably bias that process. Lots of other bad things come to mind. I am sure there are lazy managers who do these things. I have never done any of them, and I never will. Stack ranking is not the review system, and it never should be. It is one way to look at the results of the review system and find mistakes and trends; it is one of MANY ways to do that. After I get all of the numbers for my team, I sort and collate them at least 20 different ways looking for mistakes, biases, and trends. It's an incredibly difficult thing to do, and the results are never perfect because they always rely on human judgment and imperfect information. But it's the best way I know to do this: it doesn't rely on any one thing, and there are checks and balances.
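One of those many cuts can be sketched as a simple group-by: average the review scores per manager, per hire year, or per any other field, and look for groups that stand apart. This is a minimal Python sketch with hypothetical field names and made-up data, not a description of how the author actually slices his numbers; a gap between averages is a reason to ask questions, not proof of bias.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by(records, key):
    """Average review score for each distinct value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {value: mean(scores) for value, scores in sorted(groups.items())}


# Hypothetical records; "manager" and "hire_year" are illustrative fields.
records = [
    {"name": "Ana", "manager": "M1", "hire_year": 2003, "score": 4.0},
    {"name": "Ben", "manager": "M1", "hire_year": 2005, "score": 3.0},
    {"name": "Cy", "manager": "M2", "hire_year": 2003, "score": 4.5},
    {"name": "Di", "manager": "M2", "hire_year": 2005, "score": 4.5},
]

for key in ("manager", "hire_year"):
    print(key, mean_score_by(records, key))
# manager {'M1': 3.5, 'M2': 4.5}
# hire_year {2003: 4.25, 2005: 3.75}
```

With only two people per manager, a one-point gap could easily be chance; the same gap between two large organizations would deserve a much harder look.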

Now, on to "the curve." I'm going to start by digressing for a moment. There is a scene in the book "Jurassic Park" where they are testing the hypothesis that the genetically engineered dinosaurs are incapable of breeding. The keeper brings up a histogram of real-time measurements of the dinosaurs they are tracking (all one species), and it forms a nice little normal curve. The mathematician in the group immediately concludes that the dinosaurs are definitely breeding. Why? Because they were all genetically engineered from the same DNA, and thus they should all be the same height. The only way there could be a normal distribution is if they're breeding, because that's what life does: it breeds diversity.

OK, back to grading on the curve. In our minds, we like to think that Microsoft is like Lake Wobegon: all the children are above average. Yet there are countless studies showing real, fundamental differences in the performance of software developers (and, I would assume, in every other field, but I can personally attest that it's been studied to death for software developers; see Brooks, The Mythical Man-Month). But even if Microsoft really could hire absolutely the cream of the crop, never hire a single "just above average" person, and never make a hiring mistake, there would still be all the other factors. People get sick or depressed. People lose interest in their jobs, or get drawn to other interests. People find significant others, or break up with them. People get tired. It all happens, it all affects people's job performance, and the result is that if you look across Microsoft as a whole and plot employees' performance, you're going to get something that looks like a normal curve. Now, the whole curve may be shifted upward: if recruiting and the hiring managers did their jobs, I would hope that Microsoft's curve sits higher than some run-of-the-mill IT shop's. But it's still approximately a normal distribution, because that's ALWAYS what life gives you when you look at large numbers.
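The statistical intuition here is the central limit theorem: when an outcome is the sum of many small, independent influences, its distribution tends toward a normal curve regardless of how each individual influence is distributed. The Python sketch below is a toy model under loudly stated assumptions (each person's performance is a fixed baseline plus some number of independent, uniformly distributed ups and downs); it is not data about Microsoft or anyone else. Raising the hypothetical `baseline` would shift the whole curve up without removing its spread.

```python
import random
from statistics import mean, stdev


def simulated_performance(n_people, n_factors, baseline=10.0, seed=0):
    """Toy model: baseline plus n_factors independent uniform(-1, 1) shocks
    (illness, motivation, life events, ...). All parameters are hypothetical."""
    rng = random.Random(seed)
    return [
        baseline + sum(rng.uniform(-1, 1) for _ in range(n_factors))
        for _ in range(n_people)
    ]


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of the mean
    (about 68% for a normal distribution)."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


one_factor = simulated_performance(10_000, n_factors=1)
many_factors = simulated_performance(10_000, n_factors=12)

print(f"1 factor:   {share_within_one_sd(one_factor):.1%}")    # about 58%: flat, not bell-shaped
print(f"12 factors: {share_within_one_sd(many_factors):.1%}")  # about 68%: close to a normal curve
```

A single factor gives a flat distribution; a dozen independent ones already produce the familiar bell shape, which is the point the paragraph above makes about people's many ups and downs.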

And that's the catch: the larger the group, the more closely you should expect its distribution to fit the curve, and the smaller the group, the less closely. There are a few corollaries to this:

1. If it doesn't look like a normal curve in the large, then something is wrong. You have bias, some people are gaming the system, you've set the wrong expectations with your employees and your managers, or something else unnatural is going on.

2. You can't force a small group to fit a curve. And it isn't a black-and-white thing: it's not that a group either fits or doesn't; it's a measurement of "how close." Once again: the larger the group, the closer you should expect it to be (or something's wrong). In a group of fewer than 8 people, I barely pay attention to the curve unless things are completely out of whack (that happened to me once; long story).

3. There is one exception to corollary #2: over the long term, even small groups will approach the curve. In other words, take all the performance reviews in a single small group over several review periods, and you'll see the curve emerge. Every group has its ups and downs, and every person in every group has their ups and downs.
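Corollaries 2 and 3 are sampling noise at work, and a short simulation makes them visible. The Python sketch below assumes a hypothetical five-bucket target curve (the `TARGET` shares are made up for illustration, not Microsoft's), draws every group's ratings honestly from that curve, and measures how far each group's actual mix lands from it using total variation distance. Small groups drift far from the curve even when nothing is wrong. Pooling a small team's ratings over several periods behaves like a bigger group, under the strong assumption that each period's ratings are independent draws.

```python
import random

# Hypothetical target share of ratings per bucket, lowest to highest.
TARGET = [0.10, 0.20, 0.40, 0.20, 0.10]


def distance_from_curve(ratings):
    """Total variation distance between the observed bucket shares and TARGET:
    0 means a perfect fit, 1 means no overlap at all."""
    n = len(ratings)
    observed = [ratings.count(bucket) / n for bucket in range(len(TARGET))]
    return 0.5 * sum(abs(o - t) for o, t in zip(observed, TARGET))


def typical_distance(n_ratings, trials=500, seed=1):
    """Average distance when every rating is drawn honestly from TARGET."""
    rng = random.Random(seed)
    buckets = range(len(TARGET))
    total = 0.0
    for _ in range(trials):
        ratings = rng.choices(buckets, weights=TARGET, k=n_ratings)
        total += distance_from_curve(ratings)
    return total / trials


one_team_one_period = typical_distance(5)         # five people, one review period
one_team_eight_periods = typical_distance(5 * 8)  # same team pooled over eight periods
large_org = typical_distance(1000)                # a thousand people, one period

print(one_team_one_period, one_team_eight_periods, large_org)
```

The three distances shrink in that order, even though every rating came from exactly the same honest process.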

The curve, when used right, is a super useful diagnostic tool, just like stack ranking. When used wrong, as a prescriptive tool or on a small group, the results are often horribly unjust. But let's be clear here: the curve isn't evil; what's evil is bad managers forcing a tool to be used the wrong way.

Now, I'm not going to weasel out; I'm going to come right out and say it: if a large group doesn't fit the curve closely enough, then something's wrong, and a responsible management team will go fix it. It's heart-wrenching for the managers to do that (because it almost always means correcting downward for grade inflation), but left uncorrected it's unsustainable in the long term. It takes a lot of investigation, discussion, and just plain work, but it is the Right Thing to do.
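For a large group, "close enough to the curve" can be checked with a standard goodness-of-fit test. This minimal Python sketch computes Pearson's chi-square statistic against a hypothetical five-bucket target (the shares are illustrative, not Microsoft's actual, upward-adjusted curve) and compares it with the textbook 5% critical value for four degrees of freedom, 9.488. The test is only meaningful when every bucket's expected count is reasonably large (a common rule of thumb is at least 5), which is one more reason it says nothing useful about a five-person team. A failed test says the distribution is unlikely under the target, not why; grade inflation is only one possible cause.

```python
# Hypothetical target share of ratings per bucket, lowest to highest.
TARGET = [0.10, 0.20, 0.40, 0.20, 0.10]

# Chi-square critical value for len(TARGET) - 1 = 4 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE = 9.488


def chi_square(counts, target=TARGET):
    """Pearson's chi-square statistic for observed bucket counts vs. target shares."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, target))


def fits_curve(counts):
    """True if the observed counts are consistent with TARGET at the 5% level."""
    return chi_square(counts) <= CRITICAL_VALUE


on_curve = [50, 100, 200, 100, 50]  # 500 people, exactly on target
inflated = [10, 60, 200, 150, 80]   # 500 people, ratings bunched toward the top

print(chi_square(on_curve), fits_curve(on_curve))  # 0.0 True
print(chi_square(inflated), fits_curve(inflated))  # 91.0 False
```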

But on the other hand, if your manager tells you that your group of five people needs to fit the curve, tell him or her to stuff it. That's right, refuse to do it. The manager that tells you to do something like that is either lazy or ignorant. Go talk to HR. That's what they're there for.

And just two closing notes. First, Microsoft doesn't use a perfect normal curve for analyzing its performance review distribution. I don't have the data to tell you exactly what it looks like, but it has definitely been adjusted upward. Second, even at the low end of the curve, jobs at Microsoft pay well. Everyone wants more money, and you can't blame them. (I want more money too! :-) But the pay is competitive with the industry, and no one is being paid slave wages. Yes, you hear reports of people leaving for many reasons, but when was the last time you heard of someone leaving Microsoft because the pay was lousy?

OK, so to come full circle: neither the curve nor stack ranking is inherently evil. I am sure there are managers out there who abuse them. Don't be one of them, and if you meet one, call them on it. Use these two practices as tools, use them well and thoughtfully to be a better manager, and insist that your peers and supervisors do too.

9:31:54 PM

© Copyright 2005 Kevin Schofield.
Last update: 10/1/2005; 11:39:55 AM.
