Searching for Significance: Are You Missing the True Results of Your A/B Tests?

If marketing is a science, A/B testing is its homemade-volcano science fair project. It is one of the most basic tools at a marketer’s disposal for learning about your audience – what they respond to, what their preferences are, or even what distracts them from the action you want them to take – and for getting the most out of your campaigns.

When you successfully run a test, how do you analyze the reports? The answer may not be as simple as the question sounds, so check out the following examples to make sure you aren’t missing key insights from your experimentation efforts.

Example 1: The No-Brainer

Some results are easy to read. Take this A/B test run using two versions of a landing page. In this example, the goal was to determine which landing page would convert at a higher rate based on one question being different on the form.

Example 1: A/B Test of Two Landing Pages

When reviewing the conversion rate of the two pages, Landing Page A had a several-percentage-point edge over Landing Page B, which is evident to the naked eye. If you back up that hunch by plugging the numbers into a statistical significance calculator like this one from Kissmetrics, as shown in the image below, you will find that the difference is statistically significant, meaning you can be highly confident that using Landing Page A will increase your overall form conversion rates. (No test delivers 100% certainty; calculators typically report confidence at a level such as 95% or 99%.)

Example 1 plugged into the Statistical Significance calculator | Results are significant

What is statistical significance? If you took a Statistics class in college, dig back into the glossary of your mind to find that the exact definition is “the likelihood that a result or relationship is caused by something other than mere random chance.”

For those lucky enough to have filled their class schedules with other courses, you can think about this concept as it relates specifically to A/B tests in marketing. Statistical significance indicates that the change you made between the two versions of your asset produced dramatically different outcomes, so much so that the difference cannot be chalked up to other possible causes.

In the above example, determining the winner is considered a no-brainer. One version looks better than the other, and some quick calculations confirm what jumps out on paper. When you get these types of conclusive answers, you can feel more than confident moving forward with the winning version.
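If you are curious what a calculator like the one mentioned above is doing under the hood, it typically runs something close to a two-proportion z-test. Here is a minimal sketch; the visitor and conversion counts are made up for illustration, since the real figures live in the report images.

```python
# Hypothetical sketch of the math behind a significance calculator:
# a pooled two-proportion z-test. All numbers are illustrative.
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via the error function)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# A "no-brainer" scenario: a several-point gap on a decent sample
p = two_proportion_p_value(conv_a=120, n_a=1000, conv_b=80, n_b=1000)
print(f"p-value: {p:.4f}")  # comfortably below 0.05 -> significant
```

A p-value below your chosen threshold (0.05 is the common convention) is what the calculator reports as "significant."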

Example 2: The Hidden Gem

In other cases, you will run A/B tests and the results may seem inconclusive. The below chart shows an example report of two emails where we used subject line variations to optimize for opens.

Example 2: A/B Test of Two Subject Lines

The two open rates seem very similar, but, when you plug the numbers into a calculator, you will discover that the results are, indeed, significant. That’s why this example is called a Hidden Gem. On the surface, the variations made don’t seem to have moved the needle either way. But, when you dig a little deeper, you get insights that you can capitalize on.
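The reason a Hidden Gem can hide in plain sight is sample size: with enough sends, even a one-point gap in open rates clears the significance bar. The sketch below uses hypothetical counts to make the point.

```python
# Hidden-gem sketch: open rates that look nearly identical can still be
# statistically significant when the sample is large. Numbers are made up.
from math import sqrt, erf

n_a = n_b = 50_000                  # emails sent per variation (hypothetical)
opens_a, opens_b = 10_500, 10_000   # 21.0% vs 20.0% open rates

p_a, p_b = opens_a / n_a, opens_b / n_b
p_pool = (opens_a + opens_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"{p_a:.1%} vs {p_b:.1%}: p-value = {p_value:.4f}")  # below 0.05
```

A one-point difference would be easy to shrug off on paper, but at this volume the test says the subject line genuinely moved the needle.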

Example 3: The Imposter

My third and final example starts out looking like either Example 1, a No-Brainer, or Example 2, a Hidden Gem, but, when you dig in, you find it is a mirage.

In this case, you may see reports like this chart from trying two slightly varied banner ads:

Example 3A: A/B Test of Two Banner Ads

The conversion rate of Banner A looks a good bit higher than Banner B’s, so you could be tempted to rely purely on what you see on the page. But, if you input these results into a trusty calculator, you’ll discover that the difference isn’t statistically significant and may simply be chalked up to chance.
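The Imposter is the mirror image of the Hidden Gem: a gap that looks big on paper can vanish under the test when the sample is small. Again, the counts below are hypothetical stand-ins for the numbers in the report.

```python
# Imposter sketch: a 7% vs 4% gap sounds decisive, but on a small sample
# the test cannot rule out chance. Numbers are made up for illustration.
from math import sqrt, erf

n_a = n_b = 200          # banner impressions per variation (hypothetical)
conv_a, conv_b = 14, 8   # 7.0% vs 4.0% conversion rates

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"{p_a:.1%} vs {p_b:.1%}: p-value = {p_value:.4f}")  # above 0.05
```

With a p-value above 0.05, you cannot confidently crown Banner A; a bigger sample might vindicate it, or the gap might evaporate.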

Or, you could have something like this chart from trying two different subject lines in an email:

Example 3B: Second A/B Test of Two Subject Lines

Email A performed slightly better than Email B when looking at the open rates, and the sample size is quite large, so you may suspect that you have a clear winner, as we saw in Example 2. Alas, when you enter these numbers into the calculator as shown in this image, you will find that you do not have conclusive results.

Example 3B plugged into the Statistical Significance calculator | Results are not significant

Foiled again! So, what do you do when faced with an Imposter? Keep testing…either the same variations with a higher volume or a different variation. In each case, you will learn something and make your campaign better, and there’s a heap of value in that.

Well, that’s it for this statistics refresher. I hope you feel a bit more empowered now to analyze your reports and share your data-driven insights with others. As Horace once said, “Begin, be bold, and venture to be wise.” That’s good advice for us marketers and our testing strategies!