Show an A/B test case study to a group of 12 people and ask them why they thought the variation won. It’s possible you could get 12 different answers.
This is called storytelling, and it’s common in the optimization space.
Why Did This Test Win?
I used to train new consultants on advanced optimization techniques as part of my larger role in a consulting organization.
One of the first things I had new consultants do was to take any test whose results they had not seen and explain to me why every version in that test was the best option and why it would clearly win.
Every time we started this game the consultant would say it was impossible and complain how hard it would be, but by the time we were on our 4th or 5th test they would start to see how incredibly easy it was to come up with these explanations:
- “This version helps resonate with people because it improves the value proposition and because it removes clutter.”
- “This version helps add more to the buyer’s path by reinforcing all the key selling points prior to them being forced to take an action.”
- “This version’s use of colors helps drive more product awareness and adds to the brand.”
Keep in mind they had not seen the results and had no real basis for knowing what works or what doesn’t. These stories were just narratives they were creating for the sake of creating narratives.
The second part of this exercise was to have them talk with the people who worked on the test, see the results, and notice how often those people told the exact same stories, and how they always used the same structure to explain something, no matter how often they did it. Grabbing a random sentence off of a jargon generator has as much relevance (and is often more coherent), yet people so desperately want to believe their narrative.
You can do this yourself – just go check out the comments on just about any “case study” and you will inevitably see the same tired explanations played out.
Grasping at Straws to Explain A/B/n Test Results
If you want true humor, go look at fake intelligentsia sites like WhichTestWon and Leadpages, which actually encourage this behavior.
This behavior is a real plague in our industry. They ask people to reason why and then show results. Even worse, they present an extremely limited data set with biased metrics and then expect anyone to get anything from their “case studies.”
A big red flag that someone is using testing to make themselves look good instead of using testing to actually accomplish something – and make no mistake, those are often opposing goals – is how much they want to create a story around their actions. These people are using the data to choose which story they prefer, but without fail the story itself is decided on well before the data is available.
I used this exercise to help people see just how often and how easy it was for people to create a story. In reality the data in no way supported any story because it couldn’t. One of the greatest mistakes that people make is confusing the “why” that they generate in their head with the actual experience that they test.
You can arrive at an experience in any number of ways, and the story you tell yourself is just the mental model you used to create it. The experience is independent of that model, yet people are constantly unable to dissociate them.
Science Doesn’t Care Whether or Not You Believe In It
When you reach the conclusion of any test, no matter how many experiences you have in the test, the only data available is simply positive or negative relative to a change, independent of why you thought you made a change. To use one of my favorite quotes, this one from Neil deGrasse Tyson, “The good thing about science is that it’s true whether or not you believe in it.”
No matter how much you believe in your mental model, the data doesn’t care and has nothing to do with it.
There is no mathematical way from a test to even describe correlation, let alone causation. The best we can hope for is mathematical influence of factors, and even that requires a massive data set and hard core discipline (as anyone who has gone past cursory ANOVA analysis can attest). It is just a single data point, positive or negative, and yet people hold onto these stories, no matter what.
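As a rough illustration of how little a test readout actually contains, here is a minimal Python sketch of a two-variant summary. The function name `summarize_test` and the visitor counts are hypothetical, invented for this example, and the pooled two-proportion z-score is just one common way to express the size of a difference, not a method prescribed by this article. Notice that nothing in the inputs or the output refers to the reason a variant was built.

```python
import math


def summarize_test(conv_a, n_a, conv_b, n_b):
    """Summarize a two-variant test: rates, relative lift, and a z-score."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Relative lift of B over A: the single positive-or-negative number.
    lift = (rate_b - rate_a) / rate_a
    # Pooled two-proportion z-score (one common choice, assumed here).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return {"rate_a": rate_a, "rate_b": rate_b, "lift": lift, "z": z}


# Hypothetical counts, invented for illustration only.
readout = summarize_test(conv_a=200, n_a=10_000, conv_b=240, n_b=10_000)
print(readout)
```

Whatever story was told before the test, the readout is the same set of numbers.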
The Narrative Fallacy
What is actually being presented is what is known as the narrative fallacy – the need people have to weave facts into a story so that they can pretend to have a deeper understanding of the world. Humans – and animals – are wired to need to feel like they control their world and their environment, to mitigate the perception of randomness, and to try to make order of chaos.
It is impossible to discuss the narrative fallacy without talking about the work of Nassim Taleb and modern economic theory. While he ascribes macro-level impact to the need to create post-hoc rationalizations, I especially love his own story of how he first was taught the narrative fallacy.
“When I was about seven, my schoolteacher showed us a painting of an assembly of impecunious Frenchmen in the Middle Ages at a banquet held by one of their benefactors, some benevolent king, as I recall. They were holding the soup bowls to their lips.
The schoolteacher asked me why they had their noses in the bowls and I answered, “Because they were not taught manners.” She replied, “Wrong. The reason is that they are hungry.”
I felt stupid at not having thought of this, but I could not understand what made one explanation more likely than the other, or why we weren’t both wrong (there was no, or little, silverware at the time, which seems the most likely explanation).”
Our stories about what we see say far more about us than they do about the facts.
Just as Taleb had a form of manners pounded into his head, he ascribed a narrative about a lack of manners to the people in the picture. In all cases, the rationalization of the information presented comes after the fact and is based not on the data but on the observer. It in no way changes the actual action. His view of the lack of manners does not change the painting, but it does change his view of it, and yet no real additional information was added.
Other Forms of Post Hoc Rationalizations
Post hoc rationalizations can cause havoc in a large number of ways, from changing future actions to ignoring actual data that we do not want to see.
Even worse is the fact that people are hard-wired to group the data points after the fact to suit their story, which is known as the Texas Sharpshooter Fallacy. Not only do people create these false stories, but they do it by grouping data together after the fact to meet their expectations. We are now shaping our view and the data, leaving no part of reality left to actually use in the decision model.
One of Sherlock Holmes’s most famous quotes best explains this:
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
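The after-the-fact grouping described above can be sketched with a small simulation. This is a minimal sketch under stated assumptions: both “variants” are given the identical underlying conversion rate, the traffic is cut into segments that stand in for any post-hoc slicing (device, region, time of day), and the segment count, traffic, rate, and seed are all hypothetical values chosen for illustration. With a cutoff of |z| > 1.96, roughly one comparison in twenty is expected to look “significant” by chance alone, so anyone who slices enough ways will usually find a target to paint around.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-score; returns 0.0 if there is no variance."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def count_false_winners(n_segments=40, visitors=2000, rate=0.05, seed=7):
    """Count segments that look 'significant' when A and B are identical."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_segments):
        # Both arms draw from the SAME conversion rate: there is no real effect.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if abs(z_score(conv_a, visitors, conv_b, visitors)) > 1.96:
            flagged += 1
    return flagged


flagged = count_false_winners()
print(f"{flagged} of 40 identical-experience segments look 'significant'")
```

Any segment flagged here is pure noise, yet each one would support a perfectly plausible story.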
Now keep in mind, I am describing a known scientific psychological bias. I am not saying why people are wired that way, nor do I care.
What matters is the existence of the behavior itself. Just like a test, in the end it DOES NOT MATTER why something won, what matters is…
- can you act on the results?
- can you move on to the next test? and
- was the original set-up created in a way to maximize efficiency?
None of those are about storytelling. They are about the mathematical realities of the act of optimization.
This is why I make it a requirement for my testing programs to have no storytelling. Storytelling is just a brain trying to rationalize the facts to meet some preconceived notion. By forcing people to stop that act, we can have a rational discussion about the actions that can be taken and about efficiency.
‘Hypothesis’ is Just Another Name for Storytelling
Now keep in mind another fact: storytelling happens after a test, but for many groups it also happens before the test in the form of a “hypothesis” (note that I am using quotes because most people think of a hypothesis as what they learned in 6th grade science and not its actual scientific meaning, unless you are holding yourself accountable for, among many other things, disproving all alternative hypotheses).
People’s confusion of prediction and hypothesis leads them to assume that they have played out the hypothesis in a meaningful way, when in fact they have only looked at a localized prediction.
They say X is true, therefore Y will happen, but they fail to see how often and where X is true, not just held to that local outcome. To actually use a hypothesis and not just pretend to, the onus is on you to correctly prove the concept is correct ALL the time and is also the best explanation of the phenomenon observed, not just construct an action to help your prediction look good (which makes the 10-12% average success rate in industry even funnier).
The hard part of the scientific method is designing your experiment(s) to prove your hypothesis as well as disprove other alternative hypotheses. Even worse, most people fail the basic tenets of experiment design by not evaluating alternative hypotheses, and by creating a bias in the initial experiment to suit their prediction. Even if they’re so adept at grouping data points that they subconsciously create a positive conclusion to their story, they fail to reach step B in a journey that should take them from A to Z.
Essentially we have created a self-reinforcing feedback loop.
You have invested in a story up front, which is fed by your own ego and then built up by the experiment design, and then the data afterwards is manipulated to continue the fraud.
You feel that you have used data and experience to decide what and where to test. Even worse, it is done subconsciously and in a way that feels good. We have a practice which feels good, feels like it helps, but ultimately is left without any ability to add value and with numerous ways to consistently limit value.
How to Combat Storytelling
By creating these stories all you are doing is creating a bias in what you test and allowing yourself to fall for the mental models that you already have.
You are limiting the learning opportunity and most likely reducing efficiency in your program, as you are caught up in the mental model you created and not looking at all feasible alternatives.
Your belief that a red button will work better than a blue one has no bearing on whether the red does work better. Nor does it answer the question of what the best color is, which would require testing red, blue, yellow, black, purple, etc.
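To make the red-versus-blue point concrete, here is a minimal Python sketch that ranks several options by observed conversion rate. The helper `rank_variants` and every count in `results` are hypothetical, made up purely for illustration, and the sketch deliberately ignores statistical confidence. In this invented data the prediction “red beats blue” holds, yet an option nobody argued for comes out on top.

```python
def rank_variants(results):
    """Sort variants by observed conversion rate, best first.

    `results` maps a variant name to a (conversions, visitors) tuple.
    """
    rates = {name: conv / visitors for name, (conv, visitors) in results.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, invented for illustration only.
results = {
    "blue": (500, 10_000),
    "red": (520, 10_000),
    "yellow": (470, 10_000),
    "black": (610, 10_000),
    "purple": (495, 10_000),
}

ranking = rank_variants(results)
best_name, best_rate = ranking[0]
print(best_name, best_rate)
```

A two-option test of red against blue would have declared the story confirmed and never surfaced the better option.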
Discovery of what is the best option does not require a “hypothesis.” It requires discipline. Any storytelling, no matter where and when in the process, has no way of being answered by the data but adds the opportunity to reduce the value of your actions. Be it what you test now or in the future, or what you ignore based on your feelings on a subject, the only outcome is lost value over time.
Discipline in Place of Stories
The great thing is that you don’t need stories to be successful in optimization. Being able to act with discipline is actually the opposite of storytelling. There are many tactics for dealing with this, but I want to reiterate the key ones:
- Never focus on test ideas (all ideas are fungible)
- Focus on the beta of multiple options
- Attack an issue, don’t tell a story
- No stories at the start or at the end
- Hold yourself far more accountable for biases than you hold anyone else
When you drop the stories at the start, you are then only discussing what is possible to do.
When you focus on the beta of options, you are expanding the likelihood of scale and outcome. When you attack an issue, you are optimizing towards a single success metric and not just to see who was right.
When you stop the stories at the end, you are asking people to choose the best performing option and to look for descriptive patterns only.
When you focus on your own biases, and note that you can’t eliminate them, you create a system that is focused on efficiency and not self-aggrandizement.
By having the discipline to only do these things, and only these things, you are allowing the data to take you down the natural course. The more you make excuses, or let discomfort and negative learned behavior dictate your course, the more you are focusing on yourself and not the problem.
If you were “right,” then you will arrive at that same point because the data will take you there. But if you are wrong then you will arrive at a much better performing place. Even better, you just might actually learn something new and unexpected instead of just creating a feedback loop to fulfill your own ego.
The Natural Pushback
Without fail, when I implement this rule or present it to an audience, I get responses along the lines of, “But understanding ‘why’ helps us not make the same mistake and/or helps us with future tests and future hypotheses.”
Nice in theory, but again, it is impossible for the data to validate or invalidate a theory, which means that the only thing that is going to happen is you create a narrative to make you feel like you know something.
You may actually be right, but the data, the test, and the actions you take have no bearing on that and in no way can validate you. Even worse, you stop looking for conflicting or additional information and keep moving down the predefined path, further grouping data as you want to keep pushing forward a story.
The instant that something is “obviously” wrong or that something is going to work “because…” is the moment that your own brain shuts down. It is the moment that our own good intentions change from doing the right thing to doing what feels best.
Cognitive dissonance is a really powerful force, and yet we are incapable of knowing when it is impacting us. Comfort comes in the known, not the unknown. It comes from making sense of what we see or in feeling we know why people do whatever it is that they do. It can be scary and painful to really look out at the world and realize that we have very little power and even less real understanding of the actions of others or most importantly ourselves.
Conclusion
The power in stories and the power in “why” is that they help us place ourselves in the narrative.
They help us feel we have power and insight into the world. When we deal with the reality of our own lack of understanding, it is often very humbling and always mentally painful thanks to cognitive dissonance.
We love to think of ourselves as the unsung hero in a world of Dunning-Kruger-led sheep, not realizing that our own wool is just as thick and just as blinding. We seek shelter; we seek to weave that wool into a cover and blanket ourselves from the harsh realities of the business world.
The irony of course is that nothing should invigorate you more than the knowledge that there is still so much to discover.
Knowing so little and understanding how few hard and fast rules there are means that there is so much to discover and so much that is yet to figure out. Optimization gives us the opportunity to seek out more and more information and to prove what little understanding of the world we have wrong. It gives us the means and the tools to go so much farther than other disciplines because it allows us to get past ourselves.
Exploration is equally scary and exciting. It is hard, it is dangerous, and it is often lonely. It is always easier to stay close to comforting shores and to beware of those fables and tales of legendary adventurers. But are those tales true? Do you discover anything by staying close to shore? How much of the world do we really see versus how much is left to discover? There is only one real way to figure that out, and it all starts with a simple decision:
Do you want results or do you want to stick to “why”?