The concept of perfection is an interesting one.
Like infinity, it represents something that can never truly exist, yet most of us talk about it as if it does. How often have you heard a friend or coworker say they don’t want to put their name on anything that is not “perfect”?
Forty years ago, Nadia Comaneci became the first gymnast to score a perfect “10” in an Olympic gymnastics event, breaking the paradigm that perfection was unattainable.
Is it possible to create perfection in our website testing?
For the athlete, achieving perfection is all about maximizing the output of their body and mind, while achieving more than anyone ever has before. These same principles can be applied to something as simple as website A/B testing. By carefully designing and running a test, then building upon what you have learned to improve your conversion rate optimization (CRO), you can continue to raise your personal bar of success.
In other words, perfection is really a process.
The Process of Perfection
Look at your website testing as a continuous process, not an end in and of itself, and the idea of achieving perfection becomes a little bit less daunting.
“Perfection is attained by slow degrees; it requires the hand of time” – Voltaire
Let’s take a look at some of the fundamental elements of the A/B testing process that are often overlooked. Think of them as some of the building blocks of perfection.
1. The Foundation
Michael Phelps didn’t just set down the potato chips and roll off of his couch to win 18 gold medals. His training included around 50 miles of swimming per week, along with a strict diet, strength and flexibility training.
While intense and specific training provides a foundation for an athlete’s success, there are a couple of key elements that lay the groundwork for a successful A/B test.
Deciding What to Test
Before you spend any time and effort performing an A/B test, it is important to choose the right element of your website to test, and to create a meaningful new variation with which to compare it.
Take a close look at any analytics data you may have available, and carefully study the trends. Heatmaps are an incredible tool for understanding where your customers’ eyes and mouse have been traveling.
Once you decide where best to place your testing efforts, you quickly create a variation “B” to test, so that you can verify that this was the area needing improvement, right?
Not so fast.
To be worth testing, your variation “B” should be designed with a clear idea in mind of why variation “A” was under-performing. The same data that helped you pinpoint the issue might help you to design the improvement. For example, if customers’ eyes were consistently drawn to a company logo graphic that had no real conversion utility, try adding that logo to your new order button.
Once you have carefully chosen what to test, you need to be able to articulate why you are testing it, and how much improvement you expect to see. This will form the basis of your statistical hypothesis.
Creating a Great Hypothesis
Let’s take a minute to refresh on hypothesis testing, since this can sometimes be a confusing concept.
Before you can design a test, you need a hypothesis, or theory. In basic terms, your null hypothesis will always be the theory that nothing changed, and your alternate, or “research,” hypothesis will be that whatever you are trying to prove is true. For the test to be successful, your goal is to gather enough evidence to reject the null hypothesis in favor of the alternate.
On a fundamental level, your null hypothesis is that A and B are equal, and your alternate hypothesis is that B is greater (better) than A.
To maximize the value of your test results, try thinking of your hypothesis as more than just the theory that one variation is better than the other. Decide what your threshold value for success is, and think about the underlying causes that would make option B outperform option A.
Going back to our hypothetical heatmap observation, where elements including the company logo drew frequent attention and clicks, you can use the results of the test to understand the value of the logo, which may be useful information for future website design and testing as well.
Rather than creating a random option B to find out if your original order button was a dud, you can turn the hypothesis test into a multi-faceted learning opportunity.
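To make this concrete, here is a minimal sketch of the statistics underneath such a test: a one-sided two-proportion z-test of the null hypothesis that A and B convert equally. The conversion counts (200 of 5,000 visitors for A, 250 of 5,000 for B) are made up for illustration.

```python
from math import erf, sqrt

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """One-sided test of H0: rate_B <= rate_A vs. H1: rate_B > rate_A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))      # upper tail of normal CDF
    return z, p_value

# Hypothetical numbers: A converts at 4.0%, B at 5.0%
z, p = z_test_two_proportions(200, 5000, 250, 5000)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

If the p-value falls below your chosen significance threshold (0.05 at a 95% confidence level), you reject the null hypothesis that A and B are equal.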
2. Minimize Uncertainty
No matter how much time and effort goes into planning a test, your results will be meaningless if there is too much uncertainty allowed into your test plan.
One thing that is sometimes overlooked is the Margin of Error. It’s amazing to me how often this fundamentally important concept is buried in the (very) fine print.
Consider a 2016 presidential poll showing Trump at 48% and Clinton at 45%, one candidate 3% ahead of the other. If you read the fine print below the numbers, the sampling error is +/- 3.5%.
What does that mean? Based on this margin of error, Clinton actually could be as high as 48.5% and Trump as low as 44.5%, essentially flip-flopping the standings shown. Margin of Error is closely related to confidence interval. If your margin of error is +/- 3.5%, that means the confidence interval for Trump, in this example, is 48% +/-3.5%. This is the range of probable values.
If you are trying to detect a small improvement in your A/B test, always take a close look at margin of error. If the confidence intervals for A and B overlap, this typically indicates too much uncertainty in the results.
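Here is a quick sketch of that overlap check, using a hypothetical poll of 800 voters split 48% to 45% (roughly matching a +/- 3.5% margin of error):

```python
from math import sqrt

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval at ~95% confidence."""
    moe = z * sqrt(p * (1 - p) / n)       # margin of error
    return p - moe, p + moe

leader = wald_ci(0.48, 800)               # hypothetical poll numbers
trailer = wald_ci(0.45, 800)
print(f"leader:  {leader[0]:.3f} to {leader[1]:.3f}")
print(f"trailer: {trailer[0]:.3f} to {trailer[1]:.3f}")
print("overlap:", trailer[1] >= leader[0])   # True -> a statistical tie
```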
Increasing the sample size will help to lower the margin of error, but only up to a certain point. Since margin of error shrinks with the square root of the sample size, not linearly, you will begin to see “diminishing returns” as your sample size increases.
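You can see those diminishing returns directly from the normal-approximation formula for margin of error, z * sqrt(p * (1 - p) / n): because n sits under a square root, each quadrupling of the sample only halves the margin. A quick sketch:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion at ~95% confidence."""
    return z * sqrt(p * (1 - p) / n)

# Worst case p = 0.5; every 4x jump in sample size only halves the margin
for n in (1_000, 4_000, 16_000, 64_000):
    print(f"n = {n:>6}: MOE = {margin_of_error(0.5, n):.4f}")
```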
What is sometimes found in the even finer print, or perhaps no print at all, is the confidence level. If this were added to the presidential poll, it would probably read “sampling error +/- 3.5%, at a 95% confidence level.” That means you are 95% sure that the “real” poll number for Clinton, for example, is between 41.5% and 48.5%.
The reason confidence level is often left out, in my opinion, is that 95% confidence is almost implied, since this default value is used so often. Keep in mind that this is not etched in stone. If you really want to minimize uncertainty, a confidence level of 99% is another commonly used value. Using 99% confidence drops the odds of being incorrect from 1 in 20 to 1 in 100. I like those odds.
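The trade-off is that a higher confidence level widens your intervals. The critical z-values behind each level can be pulled from the standard normal distribution; a small sketch using Python’s standard library:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z-value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

print(f"95%: z = {z_critical(0.95):.3f}")   # ~1.960
print(f"99%: z = {z_critical(0.99):.3f}")   # ~2.576
```

Since margin of error scales with z, moving from 95% to 99% confidence widens every interval by about 31% (2.576 / 1.960).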
As I will discuss later, this uncertainty becomes all the more important as you begin to build upon past results through subsequent testing.
3. Avoid Common Mistakes
If you have carefully designed your experiment, minimized sources of uncertainty and completed your test with adequate sample sizes for both A and B, you are well on your way to completing the “perfect” test. But like the marathon runner who stumbles and falls in the last 100 yards, there are still many ways to squander your hard-earned gains.
When your test is complete, be careful to avoid some of these common analysis mistakes that can lead you astray.
Calling the Test Early
This is a common error, since it is often human nature, and the nature of business, to want things done quickly. Even if you have reached your target sample size for both the A and B legs of your test, and have gotten a conclusive answer with regard to significance (or lack of same), you still need to continue running the test for a full business cycle.
Going back to the “Foundation” planning phase, this full business cycle should be based on patterns you have observed in user behavior, and how often they tend to repeat. The worst that can happen by extending the test duration is that you will end up with an even higher sample size, thus further minimizing uncertainty.
The same applies to the opposite scenario: if you have completed a full business cycle but have not fulfilled your sample size requirements, you need to continue on, even if the results are already showing a clear signal one way or the other.
Since your sample can never give you the complete picture, it’s extremely important that your sample looks as much like the whole as possible. That’s why an adequate sample and a complete business cycle are so important.
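That sample-size requirement should be set before the test starts. Here is a rough sketch of the standard two-proportion sample-size formula, with a made-up goal of detecting a lift from a 4% to a 5% conversion rate:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant (two-sided alpha, given power)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)    # significance threshold
    z_beta = nd.inv_cdf(power)             # chance of catching a real effect
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)

print(sample_size_per_variant(0.04, 0.05))   # thousands of visitors per variant
```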
Extrapolating Beyond the Scope
Once you have completed a test and perhaps seen some encouraging results, resist the temptation to extrapolate beyond the bounds of your test. For example, if adding the logo to your order button was successful, you would be extrapolating if you decided that the same would work for a different website or logo, without testing it. This is what statisticians refer to as “getting outside of the box”.
On the other hand, when you interpolate, you are staying inside the box. To interpolate means to draw inferences within the bounds of what you have tested. If you had order buttons on multiple pages of the same website and added the same successful logo to all of them, this would be considered interpolating.
Think about the relationship between x and y. If you have tested the effect of x on y at the corners of the box, you can safely predict within the box, but not outside of it.
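A toy sketch of the difference, with made-up numbers: fit a line through responses measured at the two tested “corners,” then note which predictions stay inside the box.

```python
# Hypothetical measurements at the tested corners x = 1 and x = 10
xs, ys = [1, 10], [2.0, 6.5]
slope = (ys[1] - ys[0]) / (xs[1] - xs[0])

def predict(x):
    """Linear prediction from the two tested corner points."""
    return ys[0] + slope * (x - xs[0])

print(predict(5))    # interpolation: inside the tested range -> 4.0
print(predict(100))  # extrapolation: outside the box, not to be trusted
```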
4. The Feedback Loop
Perhaps the true value of perfection stems from the fact that it is unattainable – the carrot on a stick that keeps all of us rabbits running around the track. Although we can never get to the carrot, we can continually learn and improve with each lap.
This continuous improvement process is what keeps testing and experimenting interesting. There is always a way to make things better – the bar can always be raised higher.
For A/B testing or other types of website testing, there are five basic steps for putting this into practice:
- Conduct an A/B test
- If successful, implement changes based on your test results; if not, go back to step 1
- Observe the changes in practice and identify areas where your test results did not match expectations
- Plan adjustments based on these observations and discrepancies
- Return to step 1, running the next test based on steps 3 and 4
This continuous loop can apply to testing the same website element multiple times, or testing multiple elements sequentially. The key is to build on past results and learn from your failures, as well as successes.
This is where the optimized hypothesis, minimized uncertainty, and lack of analysis errors really come into play. If you optimize each test run, then fewer iterations of the loop will be required to approach perfection. Conversely, any errors or mistaken conclusions you make will be magnified when you use them as the basis for the next test.
Can you create perfection in your website testing? Not according to the textbook definition of the word. But there are many ways to make your testing as useful, efficient and “perfect” as possible.
Website A/B testing is both a continuous loop, and an unbroken chain of events. Since a chain is only as strong as the weakest link, optimizing your planning, minimizing uncertainty, and avoiding errors all play important roles. Falling short in any one of these areas will not only impact your current test, but subsequent tests that build on the previous results as well.
Scoring a perfect “10” may not happen overnight, but following the process of perfection will eventually lead you closer and closer to this elusive goal.