They say you can never have too much of a good thing. When it comes to time, money and friendship, I completely agree.
For many of us, website testing for CRO has become another “good thing” we have harnessed to put data to work for maximizing profitability. But could there be a time when we have tested too much?
Under certain conditions: Yes.
While A/B and other types of testing for CRO can utilize data in valuable and meaningful ways, we are ultimately responsible for the decisions we make about our testing strategy. Unfortunately, this can leave us open to pesky human emotions like impatience, sentiment and perfectionism.
When we reach important decision points in our testing strategy, emotions and intuition can play an important role, but there is still plenty of room for analysis and logic.
Some of the “signs” I will describe may sound familiar, but the interpretation might help you to apply some sound statistical principles to your decision making.
The Value of Testing
In the 1940s, multi-millionaire Howard Hughes spent several years modifying, testing and re-testing the giant H-4 Hercules transport plane, also known as the ‘Spruce Goose’. By the time Hughes completed the plane and its memorable 1-mile test flight, World War II was over, and the unneeded plane became a giant albatross around Hughes’ previously stellar reputation.
The fact that Hughes was believed to have had an undiagnosed obsessive-compulsive disorder is only somewhat relevant. The more important takeaway is that spending too much time and effort on testing can sometimes be counter-productive.
Thomas Edison is widely known as the inventor of the light bulb, among other modern wonders. In reality, the light bulb was invented over 70 years before Edison got involved.
What Edison and his research organization did contribute was the painstaking testing of a myriad of materials to be used as the light bulb filament. Edison, the business person, knew that the light bulb would never catch on commercially unless it could be made to last longer.
“I have not failed. I’ve just found 10,000 ways that won’t work” -Thomas Edison
History is full of examples, both good and bad, where a testing strategy either launched a business empire, or led to a lost opportunity.
To help you make some history of your own, here are four signs to look for in your test results that could be telling you it’s time for a change.
1. The Vacuum
While Thomas Edison may have been compelled to test every element known to man until he found the best possible light bulb filament, you may not have the same luxury of time and money required to perform endless test iterations.
Hughes and Edison each spent years testing and perfecting their respective products. Why did Hughes fail where Edison succeeded?
The answer can be summed up by awareness of context.
While Edison was perfecting the scientific vacuum, or lack of air, within the light bulb, he knew that his testing was not performed within a business vacuum. In other words, he was aware of the progress of his competitors, financial impact of delayed time to market and other factors that weighed into his testing strategy, outside of the testing and analysis itself.
One potential drawback of A/B testing we need to remain aware of is the inherent lack of context.
Placing too much emphasis on specific testing results could mean taking your eye off the bigger picture. It is entirely possible for elements of your testing to show continuous improvement while your overall sales or traffic continue to decrease.
Since an A/B test is only comparing one option to another, it inherently lacks external context. Option B may out-perform Option A, but does it out-perform the Option A of six months ago? Is the improvement translating to the bottom line? Although it may sound obvious, keeping track of the big picture and the specific details you are testing simultaneously might be easier said than done.
The reasons behind this sort of contradiction might take some time and effort to decipher. One situation that could be producing mixed signals is a lack of power in your A/B testing (I’ll talk about this more shortly).
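To make the "bigger picture" check concrete, here is a minimal Python sketch. It compounds the lifts reported by a series of winning tests and compares the result with the conversion rate you actually observe today. The function names and all of the numbers are hypothetical, chosen only for illustration, and the calculation assumes each reported lift would have carried over in full, which real tests rarely guarantee.

```python
def compounded_rate(baseline_rate, relative_lifts):
    """Conversion rate you would expect if every reported lift held up in full."""
    rate = baseline_rate
    for lift in relative_lifts:
        rate *= 1 + lift
    return rate


def context_gap(baseline_rate, relative_lifts, observed_rate):
    """Relative gap between the observed rate and the rate the tests implied.

    A clearly negative value suggests the test wins are not reaching the
    bottom line and that something outside the tests deserves a look.
    """
    expected = compounded_rate(baseline_rate, relative_lifts)
    return (observed_rate - expected) / expected


# Hypothetical figures: a 4.0% conversion rate six months ago, three winning
# tests reporting 5%, 3% and 2% relative lifts, and a 3.8% rate observed today.
baseline = 0.040
reported_lifts = [0.05, 0.03, 0.02]
observed_today = 0.038

expected_today = compounded_rate(baseline, reported_lifts)
gap = context_gap(baseline, reported_lifts, observed_today)
print(f"Rate implied by test wins: {expected_today:.4f}")
print(f"Rate observed today:       {observed_today:.4f}")
print(f"Relative gap:              {gap:+.1%}")
```

A large negative gap does not say which test was wrong. It only flags that the test-level story and the site-level story disagree.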
2. Diminishing Returns
This is the most obvious sign that a testing change is in order, but it deserves a closer look anyway.
Let’s say you have already completed 2 or 3 successful A/B tests on the same website element, each time observing at least a 2% gain in conversions with option B, although this margin continues to shrink with each successive test. Your instincts may tell you to continue going, optimizing this feature again and again, until you are no longer seeing significant gains.
You may continue to see (statistically) significant improvements with additional test runs, even if your effect size (margin of improvement between A and B) continues to shrink.
The watch-out here is the relationship between effect size and statistical power. Basically, the power of a test is the probability that it will detect an improvement of a given size in conversions, or whatever you were seeking to optimize, if that improvement really exists.
What many people don’t realize is that your results can be significant and under-powered at the same time.
You can set a pre-defined threshold for power before you run the test, which will drive your sample size, or review the resulting power of your test after it has concluded. Either way, the effect size will directly impact the resulting power of your test. For a given sample size and significance level, the smaller the effect size is, the lower the power.
Anything less than a 0.80 value for power (on a scale of 0 to 1.0) is typically considered under-powered, but you may want to set an even higher bar for your own testing, given the importance of this decision.
Don’t be ruled by significance alone. Diminishing power with each successive A/B test is a good sign it’s time to find a new feature to optimize.
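As a rough illustration of how a shrinking effect size erodes power, here is a short Python sketch based on the standard normal approximation for a two-sided test of two proportions. The function names, the conversion rates and the sample size of 20,000 visitors per arm are all assumptions made up for this example, and a dedicated statistics package may use a slightly different formula.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def two_proportion_power(p_a, p_b, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    p_bar = (p_a + p_b) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)  # standard error if A == B
    se_alt = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_arm)
    diff = abs(p_b - p_a)
    # Probability of landing in either rejection region when the true
    # difference between the two rates is `diff`.
    upper = _Z.cdf((diff - z_crit * se_null) / se_alt)
    lower = _Z.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


def required_n_per_arm(p_a, p_b, power=0.80, alpha=0.05):
    """Approximate visitors per arm needed to reach the target power."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    p_bar = (p_a + p_b) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / (p_b - p_a) ** 2)


# Hypothetical sequence of tests on the same element, each with a smaller gain.
successive_tests = [(0.050, 0.056), (0.056, 0.060), (0.060, 0.062)]
visitors_per_arm = 20_000  # assumed traffic per variation

for rate_a, rate_b in successive_tests:
    power = two_proportion_power(rate_a, rate_b, visitors_per_arm)
    needed = required_n_per_arm(rate_a, rate_b)
    print(f"A={rate_a:.3f} B={rate_b:.3f} power={power:.2f} "
          f"n per arm for 0.80 power={needed}")
```

With traffic held constant, each successive test in this made-up sequence has less power than the one before it, which is the pattern to watch for.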
If you want to learn more about the concept of statistical power, “How Alpha and Beta Spell Improved A/B Testing” provides more depth and insight into this topic.
3. Confounding
While diminishing returns on A/B test results could be telling you it’s time to revisit your test strategy, it could also be a sign that your testing has been confounded.
Confounding means including another factor in your test, intentionally or otherwise, that is not being controlled and can, therefore, impact your results.
Hypothetically, if you performed a study to find out who had a higher rate of heart disease, those who consumed large quantities of processed food, or those who did not, your results would be confounded if the group who avoided processed foods also exercised more, smoked less, or experienced less stress than the group who ate processed foods.
There are two possible avenues for confounding factors to creep into our website testing:
I. Internal Confounding
This is the easier to diagnose, since the noise is coming from within the website design itself. Diagnosing the source of the confounding should involve careful study of your analytics to see what is influencing your customer behavior. Some examples might be:
- Hyperlinks that unintentionally divert user attention
- Slow loading videos that increase bounce rate
- Seemingly benign changes to other website elements, such as font or copy changes, that have detrimentally affected your sales funnel
Once you have found the culprit, or have at least identified some compelling clues in that direction, you can either eliminate the confounding element, or change your testing strategy to include (test) it.
II. External Confounding
Unfortunately, confounding can also be created by outside factors. External confounding factors can be harder to diagnose than internal ones, but can have an equally detrimental impact on our testing. Examples of external confounding factors include:
- Competing websites/businesses
- Shift in user population to mobile interfaces that behave differently for the element under test
- Seasonal changes (vacations, holidays, weather)
Many other factors could be a potential source of external confounding. When attempting to identify them, it is important not to jump to conclusions.
A seasonal change, such as the start of summer vacation, might correlate with a shift in user behavior and test results, but this correlation is not necessarily conclusive. It is entirely possible that your closest competitor launched a new website on the same day summer vacation began. You may choose to poll users to gather more information about this change in behavior. Otherwise, you will need to observe and test the suspected confounding factor in both the ON (school out) and OFF (school in) conditions to know for sure.
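One way to look at a suspected factor in both its ON and OFF conditions is to split your results by that factor before comparing A and B. The Python sketch below does this with invented visitor and conversion counts. The data, the condition labels and the function names are all hypothetical. The example also assumes the two variations did not receive the same share of traffic in each period, which is exactly the situation in which a pooled comparison can mislead.

```python
from collections import defaultdict

# Hypothetical counts: (variation, condition) -> (visitors, conversions)
observations = {
    ("A", "school_in"): (8000, 480),
    ("B", "school_in"): (2000, 130),
    ("A", "school_out"): (2000, 60),
    ("B", "school_out"): (8000, 280),
}


def pooled_rates(obs):
    """Conversion rate per variation, ignoring the suspected factor."""
    totals = defaultdict(lambda: [0, 0])
    for (variation, _condition), (visitors, conversions) in obs.items():
        totals[variation][0] += visitors
        totals[variation][1] += conversions
    return {v: conv / vis for v, (vis, conv) in totals.items()}


def stratified_rates(obs):
    """Conversion rate per variation within each condition of the factor."""
    by_condition = defaultdict(dict)
    for (variation, condition), (visitors, conversions) in obs.items():
        by_condition[condition][variation] = conversions / visitors
    return dict(by_condition)


print("Pooled:", pooled_rates(observations))
for condition, rates in stratified_rates(observations).items():
    print(condition, rates)
```

In this made-up data set, A looks better when everything is pooled, yet B converts better within each condition taken separately. A disagreement like that between the pooled and the split view is a strong hint that the factor is confounding the test.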
Staying on the lookout for confounding, both internal and external, is essential to managing your test strategy, and knowing when to alter it. If the confounding is internal, the fix might be as simple as shifting the focus of A/B testing. If the confounding factor is external, more insight and analysis may be required to remain a step ahead of the competition.
4. The Maxima
Is there really such a thing as a “maxima”: a hypothetical mountain peak of CRO where no further upgrades, testing, tweaking or optimizing are necessary?
Probably not. However, there are times when minor tweaks and improvements to your website design are no longer worth the time and effort required to test them. Instead, it may be time for a complete website redesign that will let you freshen up your presentation while incorporating the latest technology.
Knowing when this time has arrived is a very important and individual decision, with dozens of factors weighing into it. Despite the pressure and emotion involved, the projected gains to be achieved through further testing vs. the potential improvements through redesign should be carefully weighed out and analyzed.
This is a complex topic, but like any long-term observer of multiple e-commerce websites, I have witnessed some mistakes and over-reactions from big brands that, judging by the fate of these companies afterwards, apparently did not go unnoticed by other users either. These included:
- Too-frequent changes causing user confusion and loss of familiarity
- Updating to copy a competing website look and feel
- Not-ready-for-prime-time redesigns released complete with bugs
In hindsight, I often wonder if some of these “frequent changes” I observed were actually A/B testing run amok, with the over-testing actually driving away the very customers it was intended to attract. All of these errors in judgment could cause customers to move on, never to return. I liken it to a bad meal at your favorite restaurant – you’re only as good as the last customer’s experience.
With the speed of technological advancement today, a periodic wholesale change to your website design has become inevitable. When you decide to take the plunge, think of it as a giant A/B test with numerous other test opportunities embedded within it.
What do I mean by that?
If your previous design was version A, then the new design is now version B. Hopefully, you will see a marked and immediate improvement with version B. Beyond that, every element within your new design, no matter how similar to the original, will now be perceived differently by users, based on new paradigms for confounding and context, among other things. This is a golden opportunity to begin your testing anew, with miles of uncharted territory ahead of you.
When you reach a crossroads in website testing, make sure you take the time to carefully interpret what the data is telling you. What is perceived as an improvement could be a false hope taken out of context. The result of an A/B test could easily be masked by a confounding variable, or a statistically significant result could be seriously underpowered.
Were Thomas Edison, Bill Gates and Steve Jobs wildly successful because of the superior level of technology their companies advanced, or rather their innate understanding of the changing consumer tastes? In my opinion, it was fairly equal measures of both.
In his college days, Gates was known among his close friends as both a brilliant mathematical mind and a master of the game of poker. Among games of chance, poker is unique for its equal dependence on mathematical and psychological factors contributing to success. While a computer can instantaneously predict the odds of success for any given hand, it cannot read a competitor’s facial expression or project a demeanor of its own that influences the moves of others.
Website testing, like poker, requires patience, analysis, and insight to maximize success. Since emotion can often overrule logic, the importance of analysis and data cannot be overstated.