Applying statistical testing for NPS can be confusing. In this post, we’ll define the problem, break down the mechanics of how to solve it, and avoid math equations to keep things conceptual. In this way, it will be easier to understand what’s going on with stat testing of NPS scores.

The Issue with NPS

The first issue to understand is that we cannot apply a statistical test to an NPS score directly, because an NPS score on its own has no discernible distribution. That is because many, many combinations of Detractors and Promoters can produce the exact same NPS score.
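To make that concrete, here is a small Python sketch. The function name and the three mixes are made up for illustration and do not come from any survey; they simply show very different mixes landing on the same score.

```python
def nps(promoter_pct, detractor_pct):
    """NPS is the percent of Promoters minus the percent of Detractors."""
    return promoter_pct - detractor_pct

# Three hypothetical mixes: (Promoters %, Passives %, Detractors %)
mixes = [(33, 67, 0), (50, 33, 17), (66, 1, 33)]

scores = [nps(promoters, detractors) for promoters, _, detractors in mixes]
print(scores)  # [33, 33, 33]
```

The first mix has no Detractors at all and the last one is heavily polarized, yet the headline number is identical. The score alone cannot tell these apart.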

Since we cannot compute the margin of error from an observed NPS alone, we aren’t applying the testing to the NPS score itself. Rather, we are testing the distribution of the measured incidence of Promoters and Detractors. That’s the first concept to get a handle on: we’re not testing “one number,” despite the claims from the Reichheld disciples. We’re stat testing the component measures of the underlying subgroups of responses.

So, what we’re doing is trying to gauge the error (or precision) around the level of Promoters and Detractors. The Passives are in there too, but we’ll set them aside for the moment since they aren’t part of the calculation.

Think of it like drawing gumballs out of a bag. There’s a certain number of green gumballs for Promoters, red gumballs for Detractors and yellow gumballs for Passives. There are little tiny numbers on the gumballs which tell another level of information within the color, but that’s not important for the purpose. The main focus is to keep track of the three colors as they are drawn from the bag.

500 Gumballs

Provided you are making a more than trivial number of draws (say about 30 or more), the mix that is drawn out can be expected to sit close to the true mix of all the gumballs in the bag, with sampling error that is approximately normal. That reveals a lot about the amounts of Promoters and Detractors.

Suppose it’s a really big Santa Claus-sized bag that’s nearly full, and 500 gumballs will be drawn.

  • 250 green gumballs (Promoters), which is 50% of the draws
  • 167 yellow gumballs (Passives) which is 33% of the draws
  • 83 red gumballs (Detractors) which is 17% of the draws

This results in an NPS score of 33, computed by the 50 percent minus the 17 percent. Though it’s good to know, what’s really interesting is the underlying proportions of the 500 draws made, and their differences. Those are the measures which have distributions that can be teased out and worked with.
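As a quick sketch of that arithmetic in Python (the function name is just illustrative), using the raw counts rather than the rounded percentages:

```python
def nps_from_counts(promoters, passives, detractors):
    """Return NPS on the usual -100..100 scale from raw response counts."""
    total = promoters + passives + detractors
    return 100 * (promoters - detractors) / total

score = nps_from_counts(250, 167, 83)
print(round(score, 1))  # 33.4
```

The exact counts give 33.4; the 33 quoted above comes from subtracting the rounded 17 percent from 50 percent.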

Calculating the Standard Error and Confidence Level

So, what we need to do is understand the error and confidence level of the draws in order to do the statistical test. Since we don’t know the true proportions of the population, estimations must be based on the draws. This is identical to what we do with a normal Top2 Box Score or percent of “Yes” in a “Yes/No” type question. In this case, proportions are being estimated. This is a variation where we are looking at the difference between two dependent proportions. They are dependent because the three shares always add up to 100 percent: if the percentage of Promoters goes up, the percentage of Detractors must go down, assuming that Passives hold.

As such, we need to calculate the standard error of the difference in the proportions drawn for green gumballs and red gumballs. There is an underlying equation, but suffice it to say that the error depends on the two proportions and on the sample size. The more gumballs drawn out of the bag, the better the estimate of the gumball proportions will be.

Once the standard error is calculated (about 3.3 percent here) and combined with how confident we would like to be (90%, 95% or 99%), we get an estimate of the margin of error. At 95% confidence, it is +/-6.57 points for the first set of draws.
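For readers who do want to peek under the hood, here is a minimal Python sketch. It assumes the standard multinomial variance for a difference of two shares from the same sample, and it uses the rounded shares (50% and 17%) with a z-value of 1.96 for 95% confidence. The function name is illustrative; the formula is a common textbook choice that reproduces the figures above, not necessarily the exact equation used to produce them.

```python
import math

def nps_standard_error(p_promoters, p_detractors, n):
    """Standard error of (p_promoters - p_detractors) for one multinomial sample.

    Variance = (p_pro + p_det - (p_pro - p_det)^2) / n, which accounts for
    the negative dependence between the two shares.
    """
    nps = p_promoters - p_detractors
    variance = (p_promoters + p_detractors - nps ** 2) / n
    return math.sqrt(variance)

se = nps_standard_error(0.50, 0.17, 500)  # rounded shares from the first draw
moe_95 = 1.96 * se                        # 1.96 is the z-value for 95% confidence

print(f"standard error: {100 * se:.2f} points")
print(f"margin of error: +/-{100 * moe_95:.2f} points")
```

With these inputs the standard error comes out a little over 3.3 points and the margin of error at about 6.57 points.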

This is interesting as it gives us an error band around our NPS of 33, but what we really need is that standard error to move forward for the period-to-period difference test.

Another Giant Bag of Gumballs

Santa brings back his bag of gumballs the next Christmas and asks for 500 gumballs to be drawn. Since one measurement is being tested against another, the process will be repeated for the second measurement. Crossing our fingers and hoping to get more green gumballs and fewer of the red gumballs, (and fewer of the yellow ones too), the draw comes out like this:

  • 200 green gumballs (Promoters), which is 40% of the draws
  • 175 yellow gumballs (Passives) which is 35% of the draws
  • 125 red gumballs (Detractors) which is 25% of the draws

Rats! There are a lot fewer green gumballs than last year, with more red (and a little more yellow). This is an NPS of 15. Ugh. So, was it just chance or did Santa change up the mix of flavors from last year? Check your stocking for coal, I guess. Or keep going with our statistical approach and discover the answer…

Getting the Combined Standard Error

Remember that each survey will have its own margin of error, and the error for each must be accounted for. Both times Santa came around, it was a separate draw.

Running the same equation, the new measurement has a standard error of 3.5 percent, very close to last year. If Santa had said there could only be 200 total draws this year, that error would go much higher. But for now, it’s almost the same because 500 draws are being made again.
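To see how much the number of draws matters, here is a rough Python sketch that holds this year's shares fixed (40% green, 25% red) and changes only the sample size. It assumes the usual multinomial variance for a difference of two shares; the function name is illustrative.

```python
import math

def nps_standard_error(p_promoters, p_detractors, n):
    """Standard error of the Promoter share minus the Detractor share."""
    nps = p_promoters - p_detractors
    return math.sqrt((p_promoters + p_detractors - nps ** 2) / n)

# Same shares, two different numbers of draws
se_by_n = {n: 100 * nps_standard_error(0.40, 0.25, n) for n in (500, 200)}

for n, se in se_by_n.items():
    print(f"{n} draws: standard error of about {se:.1f} points")
```

Under these assumptions, 500 draws give roughly 3.5 points and 200 draws give roughly 5.6 points.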

He never said whether the mix of gumballs was the same or not, so each measurement keeps its own standard error, calculated from its own proportions and its own base size.

Once the standard error of the difference in the proportions drawn for green gumballs and red gumballs has been calculated again, the standard errors for measurement #1 and measurement #2 are combined to get an overall standard error for the difference between the two.

Technically, this works by taking the square root of the sum of the two squared standard errors, so the errors are essentially rolled up.
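Here is that roll-up as a short Python sketch, using the rounded standard errors quoted above (3.3 and 3.5 percent) and assuming the two draws are independent samples. The function name is illustrative.

```python
import math

def combined_standard_error(se_a, se_b):
    """Standard error of the difference between two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

se_diff = combined_standard_error(0.033, 0.035)
print(f"combined standard error: {100 * se_diff:.2f} points")
```

The result is about 4.8 points. That is larger than either error by itself, but smaller than simply adding the two together (6.8 points).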

The worst-case scenario is that we will see roughly double the error of a single measure, like Top Box percent, compared across the two periods. But don’t worry, it’s often lower than this, particularly when the sample sizes are similar.

Look at the Distribution Rather Than the Actual NPS Score

Now with the combined standard error for the two scores, we just need to understand how big it is compared to the actual difference in scores and check to see where that falls on a normal bell curve of results.

When we look at the ratio of the actual 18-point difference (33 minus 15) to the standard error of the difference, and at where that ratio falls along the normal distribution, the likelihood that this mix was drawn by chance becomes apparent. A difference this large would occur by chance only about 0.02 percent of the time, or two in ten thousand. Indeed, Santa changed up the mix of gumballs!
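The whole comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the rounded shares from both draws, the multinomial variance for each NPS, independent samples, and a two-sided test against the normal curve. The function names are made up for this example.

```python
import math

def nps_standard_error(p_promoters, p_detractors, n):
    """Standard error of the Promoter share minus the Detractor share."""
    nps = p_promoters - p_detractors
    return math.sqrt((p_promoters + p_detractors - nps ** 2) / n)

def two_sided_p_value(z):
    """Chance of a result at least this far from zero under a normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

# Year 1: 50% Promoters, 17% Detractors. Year 2: 40% and 25%. 500 draws each.
se_year1 = nps_standard_error(0.50, 0.17, 500)
se_year2 = nps_standard_error(0.40, 0.25, 500)
se_diff = math.sqrt(se_year1 ** 2 + se_year2 ** 2)

difference = (0.50 - 0.17) - (0.40 - 0.25)  # 18 points
z = difference / se_diff
p_value = two_sided_p_value(z)

print(f"z = {z:.2f}, p = {p_value:.4%}")
```

With these inputs the ratio lands near 3.7 standard errors, and the two-sided chance is roughly 0.02 percent, in line with the figure above.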

At the end of the process, just know that NPS scores can’t be tested directly, but we can come to conclusions by testing the distributions and the associated statistical error of the component groups, namely the proportions of Promoters and Detractors.

It’s a test of dependent proportions coming from a multinomial distribution. We are looking at the combined error in a pair of differences, so the errors tend to “roll up.” It is similar to comparing a pair of Top Box percent scores across time periods, just with more error than a single-proportion Z test.

The answers needed to drive business improvement are out there, waiting to be utilized. Remembering the lesson from Santa’s bag will bring you one step closer to understanding how to properly leverage data, and in the end achieve higher NPS scores.