In many companies, both large and small, getting staffing right is a challenge. Critical teams are always starved for resources, and the common peanut-butter approach of spreading headcount evenly rarely maps to an explicit strategy. In fact, over time, some teams grow disproportionately large, leaving other teams (system test, anyone?) struggling to keep pace.

But why is it that staffing is so difficult to get right?

Theory of Constraints

In yesterday’s blog post, I referenced Eliyahu Goldratt’s seminal business book The Goal and highlighted the Theory of Constraints. I’ll summarize the theory here (if you read yesterday’s post, you can safely skip to the next section).

The Theory of Constraints is basically a management paradigm that promotes attention to bottleneck resources above all else. In any complex system that has multiple subsystems required to deliver something of value, there will be one or more bottleneck resources. A bottleneck resource is the limiting subsystem in the overall system. To increase the overall output of the system as a whole, you have to address the bottlenecks.

The book focuses on a manufacturing plant. There are different stations in the manufacturing process, and each makes a different part of the widget. To get more widgets through the system, the bottlenecks need to be sniffed out and addressed. The premise of the book is that the only activity that matters is that which supports the goal: in this case, getting more widgets out the door so the company can make more money. Anything that doesn’t contribute to getting widgets out the door is just wasted effort.

Applying Theory of Constraints to staffing

So how do we apply the Theory of Constraints to staffing? Software development is a complex manufacturing system. Instead of building physical widgets, developers create code that is ultimately assembled into a product. Imagine a simple case where there are two software development teams, a quality assurance team, and a regression team. To get a product out the door, each team must contribute its piece.

[Figure: the four-team system]

In this system, if the first software team is capable of delivering 7 widgets, the second software team 5 widgets, the QA team 3 widgets, and the regression team 4 widgets, then the overall output through the system is 3 widgets.
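The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming a strictly serial chain where every widget passes through every team; the team keys and the helper names `system_throughput` and `bottleneck` are made up for this example.

```python
# Capacities (widgets per cycle) taken from the example above.
# The dictionary keys are illustrative labels, not real team names.
capacities = {
    "software_1": 7,
    "software_2": 5,
    "qa": 3,
    "regression": 4,
}


def system_throughput(capacities):
    """In a serial chain, output is capped by the lowest-capacity stage."""
    return min(capacities.values())


def bottleneck(capacities):
    """Return the stage with the lowest capacity (first one on ties)."""
    return min(capacities, key=capacities.get)


print(system_throughput(capacities))  # 3
print(bottleneck(capacities))         # qa
```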

This is obviously a simplistic model where everything is dependent on every team. In a real environment, you would have multiple production flows through the system, each with a different set of dependencies and capacities.

How headcount allocation usually works

Most people don’t realize how headcount planning actually works in many companies. You probably imagine that there is some system to determine which heads are the most valuable. Maybe roadmaps are scrutinized and skills gaps identified. It could be that there is some advanced balancing math to make sure that each team stays in perfect harmony with surrounding teams.

In reality, that just isn’t the case. Most companies do their roadmap planning 12 to 18 months in advance, but budget planning typically only happens at the end of each year. Sure, budget plans include three-year projections, but they are redone every year so that most companies really operate year-by-year. So if the roadmaps are set before the headcount planning is complete, how does headcount allocation work?

Most managers have a solid understanding of overall demand-capacity gaps. That is to say that they know how much most teams are asked to do and understand how much they are actually capable of doing. The delta between the two is the gap. If you know the gaps, it is pretty straightforward to close them.

Demand and capacity

The problem with this model is that each team will experience a different capacity gap depending largely on what is upstream.

[Figure: the same system with demand and capacity for each team]

Using the same model as before, imagine that there are 13 input requests to the first team in the chain. That team has the capacity to do 7 things, so they see a gap of 6. The second team only gets a demand of 7 (based on what is actually coming through the system) against a capacity of 5, so their gap is 2. The third team receives 5 and can handle 3, so it also sees a gap of 2. The regression team actually has a surplus of 1 because it is positioned immediately after the bottleneck resource.
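To make the gap arithmetic concrete, here is a rough sketch under the same serial-chain assumption. The `observed_gaps` function and the team labels are hypothetical names chosen for this example; the numbers are the ones from the paragraph above.

```python
def observed_gaps(stages, incoming_demand):
    """Walk a serial chain and report demand minus capacity per stage.

    stages: list of (name, capacity) tuples in flow order.
    A positive gap is a shortfall; a negative gap is a surplus.
    """
    gaps = {}
    demand = incoming_demand
    for name, capacity in stages:
        gaps[name] = demand - capacity
        # Only the work this stage completes reaches the next stage.
        demand = min(demand, capacity)
    return gaps


stages = [("software_1", 7), ("software_2", 5), ("qa", 3), ("regression", 4)]
print(observed_gaps(stages, 13))
# {'software_1': 6, 'software_2': 2, 'qa': 2, 'regression': -1}
```

Notice that the largest gap shows up at the front of the chain, even though the stage that limits output is QA.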

Headcount planning tends to be done based on gaps, in which case the first team sees by far the biggest gap. So when spreading out new heads, the easy thing to do is to give more heads to the first software team. Of course, even if you add 20 heads to the first software team, the total throughput in the system is the same.

Assume for simplicity that each new head adds one widget of capacity. The first head added to the system should go to the QA team to increase its capacity to 4. The next step is almost unconscionable: you would add a head to the QA team again and then one to the team operating with a surplus to get the throughput to 5. After that you would spread resources across the three teams that now share the bottleneck (the second software team, QA, and regression). In fact, the team with the biggest gap ought not get a resource until 9 resources have been allocated.
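The allocation order described above can be reproduced with a simple greedy loop. This is a sketch, not a planning tool: it assumes one head adds exactly one widget of capacity (which real teams rarely honor), and `allocate_heads` and `throughput` are hypothetical helper names.

```python
def throughput(capacities):
    """Serial-chain output is the minimum stage capacity."""
    return min(capacities.values())


def allocate_heads(capacities, heads):
    """Give each new head to whichever team is the current bottleneck.

    Assumption for illustration: one head adds one unit of capacity.
    Returns the order in which teams received heads and the new capacities.
    """
    caps = dict(capacities)  # work on a copy
    order = []
    for _ in range(heads):
        team = min(caps, key=caps.get)  # first lowest-capacity team
        caps[team] += 1
        order.append(team)
    return order, caps


capacities = {"software_1": 7, "software_2": 5, "qa": 3, "regression": 4}

# Adding 20 heads to the team with the biggest gap changes nothing.
overstaffed = dict(capacities, software_1=capacities["software_1"] + 20)
print(throughput(overstaffed))  # 3

order, final = allocate_heads(capacities, 10)
print(order[:3])  # ['qa', 'qa', 'regression']
print(order[9])   # software_1 (first head it receives is the 10th)
```

Under these assumptions, nine heads take the system from 3 widgets to 7 before the first software team needs anyone.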

Easier said than done

When the systems are easy and dependencies clear, this is pretty straightforward. But most companies don’t understand their manufacturing line very well. It is far easier to look at supply and demand and then to close gaps. On top of that, most leaders have a sense of fairness, so they tend to distribute heads evenly across teams in an attempt to placate the more personal ambitions of their managers.

But if Goldratt is right, and the key to success is subordinating everything to the bottleneck resource, managing without knowing the bottleneck is like driving blind. The first thing to do is to identify the different production workflows in your group or company. This is not an easy task. Chances are that you need to look at historical data and map dependencies between teams.

Once you have the dependencies, then you need to look at historical demand and capacity. Remember that the team’s capacity is not how many people are in the team but rather the team’s ability to get stuff out. If it takes 2 people to get a feature out, then the meaningful number is the features, not the headcount. If you measure throughput as a function of headcount, you will find rather unsurprisingly that the headcount in equals the headcount out.
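One way to express capacity in terms of output rather than people is to average what each team actually shipped over past periods. The sketch below is only an illustration: the history values are placeholders invented for this example, not real data, and `estimated_capacity` is a hypothetical helper.

```python
from statistics import mean


def estimated_capacity(features_shipped_per_period):
    """Average features shipped per period, regardless of team size."""
    return mean(features_shipped_per_period)


# Placeholder history, invented for illustration only:
# features each team shipped in each of the last four periods.
history = {
    "software_1": [7, 8, 6, 7],
    "qa": [3, 3, 4, 2],
}

capacity = {team: estimated_capacity(shipped) for team, shipped in history.items()}
print(capacity["software_1"])  # 7
print(capacity["qa"])          # 3
```

Headcount never appears in the calculation, which is the point: the number that matters is what comes out.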

Finally, remember that this is not meant to be overly precise (at least not at the outset). The objective is to gain understanding and then start to work to identify the bottleneck. As you clear one bottleneck, another will always present itself, so this is more an iterative paradigm than a final solution. But your job should be to sniff out these bottlenecks and to unblock them to constantly increase the throughput in the system.

The bottom line

The ideas here are extremely simple, and yet most groups and companies intuit their way through headcount planning. Some of our bottom-up roll-ups fool us into thinking that we have way more command over where resources are allocated and needed than we really do. Don’t fall victim to false security. Instead, take a look at your actual workflows and map out where things flow. Let the data guide you and you will be way ahead of the game.

[Today’s fun fact: Rats and mice are ticklish, and they will laugh when tickled. That is just weird.]