

Let me share the lessons I learned the hard way about ad testing.

I’m going to focus primarily on social (display) ads, which I’ve generally found trickier to optimize than search ads. I have an AdWords background, and search ads have a few intrinsic pluses:

  1. By definition, people are searching for a product or service similar to what you offer
  2. AdWords includes experiment functionality within the platform
  3. Keyword bidding is, more or less, specific and intuitive (If you’re selling cheese sandwiches, you probably want to bid on “cheese sandwiches.”)

If a search ad is not converting well for you right out of the gate, there might be keyword or bidding problems, but most likely the fault lies with the landing page (design, offer, clarity, etc.). This is because you can be reasonably sure that those who click on your ad have some kind of proactive interest in the keywords you’re bidding on. The object of search optimization is not so much to find the right target as to get more targeted traffic for your money.

Social ads, by contrast, are display ads. They’re interruptive, meaning the average visitor who clicks on this type of ad has a more casual interest. You can create and target them as professionally as possible and still have a hard time getting traction.

Social ads (particularly Twitter and Facebook) allow you to target on “interests” and “behaviors”, and this targeting is deceptively easy to set up. I’m trying to sell baby bottles…I turn to Facebook ads, and lo, I see a targetable interest group for “baby products.” Hurrah, my job is done! I can just set my ads up to target young women interested in baby products, and then sit back and count my money!

…said no digital advertiser, ever.

After much trial and error, it became necessary for me to set up a formal split test procedure on social advertising just to get ads to convert at all, let alone to optimize conversions.

After making nearly every conceivable mistake in the book, I want to make sure others don’t fall into the same potholes. So, if you want to ruin your social ad campaign and drive it into the ground, by all means do any of the following:

1) Run Informal Trial-and-Error Rather than Hypothesizing and Tracking

This is a sin of sloth. If you’re managing many, many ads, you’re tempted to just tinker with them and then move on, assuming that now they’ll work better. Controls, variables, and tracking are a pain.

There are split testing platforms, like AdEspresso, that help automate this process. They’re very helpful for marketing departments that have that kind of cash lying around. For those of us in the SMB world, this will be manual.

Bite the bullet and set this up now. Open up Excel and set up something like this:
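A minimal sketch of such a log, rendered as CSV with hypothetical columns and experiments (adapt the fields to whatever you actually track):

```python
import csv
import io

# Hypothetical split-test log for manual tracking -- one row per experiment,
# with the hypothesis, the variable under test, and the metric to judge it by.
COLUMNS = ["experiment", "hypothesis", "variable", "control", "variation",
           "start", "end", "primary_metric", "result"]

rows = [
    {"experiment": "EXP-01",
     "hypothesis": "Mobile traffic converts worse than desktop",
     "variable": "device", "control": "desktop", "variation": "mobile",
     "start": "2016-03-01", "end": "2016-03-02",
     "primary_metric": "cost/conversion", "result": ""},
    {"experiment": "EXP-02",
     "hypothesis": "Newsfeed placement beats right column",
     "variable": "placement", "control": "right column", "variation": "newsfeed",
     "start": "2016-03-03", "end": "2016-03-04",
     "primary_metric": "conversion rate", "result": ""},
]

# Write the log out as CSV, which pastes straight into Excel or Sheets.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```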


Pay no attention to the fact that these experiments only last a couple of days. That’s not nearly long enough a time period; I’ll get to that later.

Come up with your first few hypotheses in advance, and don’t be afraid to shuffle the scheduling if a particular result suggests a new line of inquiry.

This leads me to the second way you can really mess up your testing:

2) Do Not Choose Specific, Relevant Metrics

Social ad platforms and Google Analytics offer an overwhelming variety of metrics to track. Getting data is easy; focusing on useful intelligence is hard.

By the way, I’m assuming that you’re tagging all your traffic to register as a campaign through Google Analytics, and that you’ve installed the proper conversion and audience pixels on your website. This is a necessary step. If you haven’t done this yet, you need to stop reading now and make sure this is set up. I’ll wait.
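For reference, tagging your traffic just means appending Google Analytics’ standard UTM campaign parameters to your landing page URLs. A minimal sketch (the URL and parameter values are hypothetical):

```python
from urllib.parse import urlencode

# utm_source / utm_medium / utm_campaign are Google Analytics' standard
# campaign parameters; utm_content is handy for telling ad variations apart.
# The landing page URL and values below are made up for illustration.
base = "https://example.com/landing-page"
params = {
    "utm_source": "facebook",
    "utm_medium": "cpc",
    "utm_campaign": "baby-bottles-launch",
    "utm_content": "newsfeed-headline-a",  # which ad variation drove the click
}
tagged_url = base + "?" + urlencode(params)
print(tagged_url)
```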

Which metrics you want to track will depend on what you want to test, but when we’re talking about converting traffic we usually talk about:

  • Conversion Rate (And Cost/Conversion)
  • Bounce Rate
  • Funnel Drop-off Rate (e.g. step 2 drop off, step 3 drop off, etc.)
  • To a lesser extent, Time on Site and Pages per Session

When we’re optimizing costs, we’ll also talk about metrics like cost/click, although cost metrics work a little differently on social platforms than they do on search.

When you allow a social platform like Facebook to optimize your bidding for a certain result (say, conversions), it does this by showing your ad to a concentrated subset of your defined target that is more likely than average to convert. Web clicks may become more expensive when focusing on this special, conversion-rich group, and that’s okay because we expect them to convert at a higher rate than average. So depending on your objective and bidding scenario, a comparatively high CPC may not be a negative indicator.
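To make the relationships among these metrics concrete, here is the arithmetic with made-up campaign numbers (every figure below is hypothetical):

```python
# Hypothetical campaign numbers -- the point is the arithmetic, not the values.
spend = 500.00      # total ad spend, in dollars
clicks = 1000       # web clicks the campaign bought
sessions = 950      # tracked sessions (some clicks never load the page)
bounces = 600       # single-page sessions
conversions = 40    # completed goal actions

cpc = spend / clicks                      # cost per click
conversion_rate = conversions / sessions  # share of sessions that convert
cost_per_conversion = spend / conversions
bounce_rate = bounces / sessions

print(f"CPC: ${cpc:.2f}")
print(f"Conversion rate: {conversion_rate:.1%}")
print(f"Cost/conversion: ${cost_per_conversion:.2f}")
print(f"Bounce rate: {bounce_rate:.1%}")
```

Note how a campaign can have a pricier CPC yet a cheaper cost per conversion, which is why cost per conversion is usually the number to optimize.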

So far, we’ve wrecked our advertising by failing to structure, and by watching the wrong metrics. We now move on to the third form of destruction:

3) Run Tests On the Insignificant and Piddly

As Peep Laja of ConversionXL so eloquently puts it, “There is no best color.” Ad experiments take time and cost money. The best advertisers are the ones who know which experiments to conduct first. Spending a week testing a green background, another week testing a blue background, and another week testing a fuchsia background is dumb for several reasons. Using fuchsia is one of them.

I have found some success by starting out using the broadest possible relevant target audience, and then experimenting with how I can step down into a more specific target. My experiments seek to slice out less relevant components of an audience until what I have left is highly effective.

Here are some generally good tests to run at the beginning, in that they tend to move the needle to an above average extent:

  • Desktop vs mobile device – many landing pages don’t convert well for mobile traffic specifically, and desktop traffic tends to be more expensive
  • Newsfeed vs column vs ad network placement – newsfeed ads have worked best for me so far, but again, more expensive
  • Age – Look at your initial conversion reports and cut out disinterested age groups
  • Custom audiences – test with page fans, newsletter-matched audiences and web retargeting
  • Headline and Copy – make sure you use this not just to entice, but also to qualify and to set the expectation for what they will see on the landing page.

Once you’ve tested with these major groups, set up your experiment schedule and start honing your targeting. One concept that’s worked well for me on one or two occasions is micro-targeting: using a specific competing product or personality as the interest you target.

4) And Above All, End Experiments Too Early

Of all the pitfalls in advertising, this is one of the hardest for me to deal with. The pressure from clients to see immediate results is directly at odds with the concept of establishing the significance of a test result.

Statistics, for those of us who slept through the class, is the science of avoiding false positives. If you begin a test and it has amazing results, did the change cause those results, or were you just lucky?

There are certain principles and calculations that tell us, for example, that in order to make sure that a certain effect we are seeing is actually real (i.e. not due to chance), we have to gather a data set of an appropriate size. One such tool to tell you exactly how big the sample needs to be can be found here.

Most people gather far too little data to meet the standard of 95% confidence (i.e. no more than 5% risk of a false positive). Some have even said that professional testing tools like Optimizely declare winners too soon. To give you a ballpark range, you would generally look to collect enough data to have seen at least 100 conversions for both the control and the variation. If your site converts at, say, 5%, then you’re looking at 2,000 visitors each for the control and the variation, or 4,000 total.
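The rule of thumb above is easy to turn into arithmetic, and for comparison, the standard two-proportion sample-size formula (a textbook normal approximation, not anything specific to this post) shows how sharply the requirement grows for small lifts:

```python
import math
from statistics import NormalDist

# Rule of thumb from the text: enough visitors to see ~100 conversions per group.
baseline = 0.05                       # 5% conversion rate
per_group = math.ceil(100 / baseline)
print(per_group, "visitors per group,", 2 * per_group, "total")

# Standard two-proportion sample-size formula (normal approximation).
# Stricter than the rule of thumb when the lift you want to detect is small.
def sample_size(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed per group to detect p1 -> p2 at the given alpha/power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95% confidence
    z_b = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Detecting a lift from 5% to 6% takes far more traffic than from 5% to 10%:
print(sample_size(0.05, 0.06), "per group for a 5% -> 6% lift")
print(sample_size(0.05, 0.10), "per group for a 5% -> 10% lift")
```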

That’s a lot of visitors, and costs an awful lot of money. In the real world, I have had to make optimization decisions before receiving a significant sample size. I’ve also gotten hammered on this. I’ll make a change that looks great in the first couple of days, because of a hot streak, but the upside eventually disappears over time.

We can’t always wait for 95% confidence, but declaring a winner on too little data is a risk you need to be wary of. The good news is that huge differences in results (e.g. the variation shows a 500% increase over the control) require smaller samples to prove. So if pressed, I will focus first on factors that move the needle greatly (device, placement, custom audience, etc.), and cut those experiments short because I’m looking for guidance and direction rather than certainty. Once I’ve established a viable initial campaign, I will run experiments for longer periods.

One last tip on the testing period: think in full weeks. Certain times during the week will get you more clicks and conversions than others. So if your test lasts only a few days, you risk having picked an especially productive or unproductive couple of days. This is called seasonality. Sampling no less than a full week helps to eliminate this effect.