“…and the little one said: ‘Roll over! Roll over!’” – traditional children’s song

Monte Carlo or bust

A couple of weeks ago we looked at Monte Carlo analysis. We saw how this technique can be used to forecast Sprint capacity, and to anticipate the likely completion schedule for a given backlog of work. Perhaps our most important “take-away” was that providing a range of estimates is not necessarily a matter of being vague. Indeed, the range from a Monte Carlo forecast can be more accurate and useful to a team in Sprint Planning than an absolute value, such as an average velocity or burn rate projection. An average can be precisely stated, but it will not necessarily be accurate. Averaging-out generally causes the texture of the available data to be lost.

However, using an average has the benefit of simplicity. It gives us only one number to deal with. In our earlier example, we knew the team velocities for seven Sprints, which were 114, 143, 116, 109, 127, 153, and 120 points. Since the average is 126 story points, the team might reasonably forecast a capacity of 126 points during Sprint Planning. Yet there wasn’t a single demonstrated occasion on which exactly 126 points were completed in a Sprint. This simple and very precise estimate would, therefore, appear to be unreliable, and likely to prove inaccurate should it be employed.
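The arithmetic behind that observation is easy to check. The following is a minimal sketch using only the seven velocities quoted above; the variable names are ours, chosen for illustration.

```python
from statistics import mean

# Velocities for the seven Sprints quoted in the text
velocities = [114, 143, 116, 109, 127, 153, 120]

average = mean(velocities)

# Sprints in which the team delivered exactly the average number of points
exact_hits = [v for v in velocities if v == average]

# How far each Sprint actually landed from the average
deviations = [v - average for v in velocities]

print(f"average velocity: {average}")
print(f"observed range: {min(velocities)} to {max(velocities)}")
print(f"Sprints matching the average exactly: {len(exact_hits)}")
print(f"deviations from the average: {deviations}")
```

The deviations are the “texture” which a single averaged figure throws away.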

If the team were to consider a Monte Carlo projection, on the other hand, they could deduce the probability of achieving a given forecast of work. Suppose that, during Sprint Planning, the team proposes to induct items totalling 125 story points into their Sprint Backlog. Since this forecast lies within the 126 point “budget” indicated by their average velocity, the team might feel bullish about their ability to do the work and to meet any associated Sprint Goal. Nevertheless, they prudently choose to verify their confidence by means of a Monte Carlo analysis. They run 1000 simulations to assess the likelihood of 125 points actually being completed in their Sprint of 10 working days.

[Table 1 and Chart 1]

Hmmm. It turns out that 543 out of 1000 runs completed in 10.00 days or less. That’s a 54.3% chance of the proposed 125-point backlog being completed within the Sprint time-box. The odds against a busted Sprint don’t seem to be much better than flipping a coin. This conclusion can be drawn from simulations using historical data, and to that extent it will likely prove to be accurate. While this analysis is enough to give our team pause, it also looks like a devil of a thing for them to use when revising their capacity.
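For readers who would like to experiment, here is a rough sketch of how such a simulation might be put together. It is not the model used to produce the figures quoted in this article. It assumes, purely for illustration, that each simulated day burns one tenth of a randomly chosen historical velocity, and the function names `days_to_complete` and `completion_probability` are our own. The percentages it yields should therefore not be expected to match the 54.3% quoted above.

```python
import random

# Historical velocities from the text, each earned over a 10-day Sprint
velocities = [114, 143, 116, 109, 127, 153, 120]
SPRINT_DAYS = 10


def days_to_complete(backlog, rng):
    """Simulate one run and return the (fractional) days needed to burn the backlog."""
    remaining = float(backlog)
    days = 0.0
    while True:
        # Assumption: a day's burn is a randomly drawn historical velocity
        # spread evenly across the Sprint's working days.
        daily_burn = rng.choice(velocities) / SPRINT_DAYS
        if daily_burn >= remaining:
            # The backlog is finished part-way through this day
            return days + remaining / daily_burn
        remaining -= daily_burn
        days += 1.0


def completion_probability(backlog, time_box=SPRINT_DAYS, runs=1000, seed=None):
    """Share of simulated runs which finish the backlog within the time-box."""
    rng = random.Random(seed)
    finished = sum(days_to_complete(backlog, rng) <= time_box for _ in range(runs))
    return finished / runs


if __name__ == "__main__":
    likelihood = completion_probability(125, runs=1000, seed=1)
    print(f"Simulated chance of completing 125 points in 10 days: {likelihood:.1%}")
```

Fixing the `seed` makes a run repeatable, which helps when a team wants to compare two proposed forecasts on a like-for-like basis.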

In Sprint Planning, for example, work is either planned into a Sprint Backlog or it is not. An immutable Sprint Goal will be framed around the selection, and at any given point in time, it must be absolutely clear how much work is believed to remain. So, what can a 54.3% likelihood of 125 points being completed really mean for planning purposes? A team cannot shade work with a less than 54.3% chance of completion into their Sprint Backlog using a lighter color.

Stuff nuance then

What we need to remember is that the disposal of texture and nuance is advantageous when we genuinely want to get rid of it. The trick lies in knowing when and why to do so. In this situation, our team can use the enriched data from a Monte Carlo analysis to craft a better-informed policy about how much work they will take on.

For example, suppose that the 54.3% chance of completing a 125-point Sprint Backlog is unacceptable to the team. They say they want at least a 90% probability of completing a forecast before they will commit to a Sprint Goal which depends on this work. The Product Owner agrees to drop a 3-point user story from the proposal. It doesn’t seem like much of a concession, but the team run a Monte Carlo analysis for this slightly reduced forecast of 122 points.

[Table 2 and Chart 2]

Gosh, a total of 994 runs now completed in 10 days or less. That’s a 99.4% likelihood of completing the envisioned backlog of work, and a very significant improvement on the 54.3% chance the team would have had if they took on 3 points more. It appears that even a slight revision downwards can be enough to improve confidence substantially, based on the existing historical data.
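A team could turn this kind of trial and error into a simple search for the largest forecast which meets their confidence threshold. The sketch below is illustrative only. It assumes a whole-day time-box, the same simplified daily-burn model as any toy simulation (one tenth of a randomly drawn historical velocity per day), and hypothetical function names. Its results will not reproduce the percentages quoted in this article.

```python
import random

# Historical velocities from the text, each earned over a 10-day Sprint
velocities = [114, 143, 116, 109, 127, 153, 120]


def completion_probability(backlog, time_box=10, runs=1000, seed=0):
    """Share of simulated Sprints in which `backlog` points are burned within `time_box` days."""
    rng = random.Random(seed)
    finished = 0
    for _ in range(runs):
        # Assumption: each day burns one tenth of a randomly drawn historical velocity
        burned = sum(rng.choice(velocities) / 10 for _ in range(time_box))
        finished += burned >= backlog
    return finished / runs


def largest_forecast(target, time_box=10, upper=160):
    """Largest whole-point backlog whose simulated completion probability meets `target`."""
    # Search downwards from a size beyond anything the team has ever achieved
    for backlog in range(upper, 0, -1):
        if completion_probability(backlog, time_box) >= target:
            return backlog
    return 0


if __name__ == "__main__":
    print("Largest forecast at 90% confidence:", largest_forecast(0.90))
```

Because the same seed is reused for every candidate size, each backlog is judged against an identical set of simulated Sprints, so the estimated probability can only fall as the backlog grows.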

Yet our cautious and observant team can also see that only 4 runs out of 1000 completed in less than 9.5 days. This suggests that if there were to be a significant and unforeseen event which impeded the team by even just half a day, then the chances of success would shrink awfully, to 0.4%. The team feel very nervous about this risk. What to do?

Oh, Mr. Product Owner Sir

It can be tempting to build in a time-buffer which would minimize the risk to any commitment made. For example, the team might adopt a policy of reserving a full day each and every Sprint for handling unforeseen eventualities. By running more simulations, they discover that a backlog of 110 points would have a 98.7% chance of completion within 9 days. That’s more plausible, they feel, and so they take this revised expectation to the Product Owner.

Unfortunately, the PO balks at the idea of eliminating another 12 points-worth of work from the Sprint. He notes that there is a 99.4% likelihood of completing the present 122-point forecast, and complains that reducing capacity to 110 points would be tantamount to “giving the team a day off”. He snarls at them over his monstrous gyrating cigar and grizzled mustache, while his bushy eyebrows frame a cold, dead stare. What to do now?

Well, in Scrum, no-one can force the Development Team to take on more work than they feel reasonably able to complete. The team would be perfectly within their rights to play hard-ball and insist on their one-day buffer, if they genuinely thought it best to do so. Moreover, a good Product Owner should acknowledge the professionalism of the team, and team members’ ability to determine what they can and cannot reasonably commit to. Yet the team will also recognize that it is incumbent upon them to help the PO to maximize value delivery.

They might, therefore, undertake to bring work forward, into the Sprint, should the extra day actually prove to be available. They could promise to make a just-in-time decision on the matter at the beginning of Day 10. In truth, many teams adopt a similar policy of reserving a certain amount of capacity during each iteration. However, there are a number of problems in doing so. For one thing, the buffer which is reserved may not be enough. What if two days are lost due to issues which are by definition unforeseen? For another, it reduces transparency by introducing a fudge-factor which obliterates any lessons that might be drawn from careful simulations of completed work based on actual historical data. Thirdly, it reduces trust between the Development Team and its stakeholders. It creates the impression that the team are prone to accumulating and then sitting on reserve capacity. It brews suspicions that they build in slack, perhaps in order to coast, and that they cannot be entirely trusted to do an honest day’s work.

A better Sprint Planning policy

Let’s go back to the initial forecast of 125 points, and its grim prospect of a mere 54.3% likelihood of completion. Can the team develop a sensible and sustainable Sprint Planning policy which does not require them to game these rather bleak odds?

What they can do is to flex scope, rather than time, in order to mitigate risk. Remember that in Scrum, each well-formed Product Backlog Item ought to have a value. A relative value might allow an informed choice to be made about which items can be scrubbed – if push comes to shove – while preserving the Sprint Goal. The ability to qualify items in terms of MoSCoW priority (Must, Should, Could, Won’t) can also be used to this effect. Interestingly, the MoSCoW approach is sometimes accompanied by a recommendation that no more than 60% of items within a time-box ought to be mandatory. It seems that a workable and sensible policy could be in the offing.

Suppose that the team did adopt the policy of allowing no more than 60% of the items within a Sprint forecast to be “must haves” essential to the Sprint Goal. For an average velocity and projected budget of 126 story points, this would equate to a maximum of 76 points of mandatory work. The remaining 50 points of work might or might not articulate to the Goal, but would not be essential to it.
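The split is simple arithmetic, and can be sketched as follows. The helper name `split_budget` and the rounding to whole story points are assumptions made for illustration.

```python
def split_budget(budget, must_have_share=0.60):
    """Split a Sprint budget into mandatory and flexible story points.

    Assumption: the mandatory portion is rounded to the nearest whole point.
    """
    must_have = round(budget * must_have_share)
    flexible = budget - must_have
    return must_have, flexible


# The 126-point budget from the text, with a 60% cap on "must haves"
must_have, flexible = split_budget(126)
print(f"must-have cap: {must_have} points, flexible scope: {flexible} points")
```

For the 126-point budget this gives a 76-point cap on mandatory work and 50 points of flexible scope. A team which later settles on a different percentage need only change `must_have_share`.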

Running a Monte Carlo simulation indicates that 76 points of work should be fully completed within 6.30 days. All 1000 runs complete in that period. Of course, the team might decide not to action those mandatory items in the first 6.3 days of the Sprint. They may interleave them with various nice-to-haves for tactical or technical reasons. The important thing to take note of is that the policy gives them flexibility in making implementation decisions, and is one which a sensible Product Owner can reasonably buy into. Clearly, though, it is now incumbent upon a wise PO to value work according to this scheme, and not to insist that “all requirements are must-haves”. Inadequate scope-flex will pressure the team into making alternative contingency arrangements…such as reserving an opaque buffer of time instead. This is a healthier partnership which is being offered.

In other words, the team can use Monte Carlo analysis to determine not only how much work they should take on, but also what portion of that amount ought to articulate to a commitment such as a Sprint Goal. If 60% is thought to be suboptimal, they can improve that figure in light of further simulations with suitably revised parameters. Their Sprint Planning policy will then be an informed one, backed up by the available data rather than by averages and rules of thumb.

An important consideration

Any policy which incorporates scope-flex does, however, raise a fresh concern. Can non-mandatory work be actioned during a Sprint without the expectation of it being completed within that same iteration?

Logically of course, it can. There will, after all, be no escalation of commitment. A “nice-to-have” requirement cannot suddenly present hazard to business and become mandatory through the act of being brought into progress by a Development Team. Whether it is actually a good idea for that team to begin work which subsequently remains unfinished is a different matter. It will need to be re-estimated and returned to the Product Backlog. Any value invested in it will decay as it languishes incomplete, and technical debt may accrue. The work can be expected to fossilize as the product continues to evolve around it. It would be better to choose a small item, perhaps even a less valuable one, with the expectation of completing it quickly and of releasing the associated value.

However, if the team – during Product Backlog refinement – determine that a large and comparatively valuable item is quite likely to be planned into the next Sprint, then it may be prudent to start work on it if the Goal has already been met, and “roll it over” unfinished into the next. It may be better to gamble on unfinished work being so planned, rather than for the team to be idle. The risk could be worth it and it might prove to be the least wasteful option. The item will be re-estimated at the end of the Sprint to show the reduced work remaining. It will be returned to the Product Backlog and (hopefully) selected during Sprint Planning. Any other items which have been tentatively earmarked for the Sprint during Product Backlog refinement may have to be reconsidered since the delinquent work will still require some effort to complete. When work is “rolled over”, an anticipated item of equivalent size may have to “fall out” of a forecast if the Sprint budget is to be observed.

This is an interesting proposition, given that “rolling work over” into the next Sprint is generally held to be an antipattern. It is a despised practice, commonly linked to a weak sense of commitment and poor agile discipline. Sprints become meaningless when work just rolls over from one to the next, and their Goals turn to vapor. Yet the commencement and roll-over of unfinished items may be the best possible use of a team’s time, once critical work has been done and the Sprint Goal has actually been met. In short, at that point, the team might reasonably focus on optimizing the flow of work across the Sprint boundary.

Conclusion

In effect, the achievement of the Sprint Goal can be a trigger for re-planning. When that happens, the Development Team may reconsider how the remainder of the Sprint time-box should be best utilized. Strictly speaking, they can do pretty much anything they want, of course. They could throw a party. They could go on training courses or remediate technical debt. On the other hand, if they have outstanding “nice-to-have” items which were planned into their Sprint Backlog, then they might be expected to work on them, and it may well be their policy to do so. Those items were forecast for completion, and their actioning has been predicted even if they do not represent a commitment. Should it be more valuable for the team to work on something else instead, then the remaining work may be traded out of scope to accommodate this new requirement.

Whatever the remaining work for a team might be, they should consider how best to optimize flow across the Sprint boundary, and to minimize waste. Monte Carlo analysis can be used to baseline a Sprint Planning policy and to establish the associated budgets. A sensible policy inclusion is to favor work which can actually be completed within the Sprint time-box over that which cannot. Remember that in lean and agile practice there is no kudos for bringing work into progress, only for delivering it to a standard of release quality.