6 Common Ways Your Split Test Results Could Be Off

by Dale Cudmore

Last updated on August 3rd, 2017

“That can’t be right.”

You ran a split test and picked a winner.

But fast-forward a month and your winner has turned into a loser.

What happened?

Even when you're on the right track, there are factors, both in and out of your control, that can lead you to declare the wrong variation the winner.

But don’t worry, it’s nothing to be afraid of. Just understand these 6 common problems with split tests and you’ll be good to go.

1. Your Software is Off

There are hundreds of split testing software options out there. Maybe you’ve even created your own custom software.

Either way, some of them are off — that’s a fact.

Most of them are reliable, but you should always confirm that your software is working as it should beforehand.

To do this, you need to run an A/A test.

The most basic type of split test is an A/B test, where you have two variations, A and B. With an A/A test, you simply create two identical versions of your original page and split your traffic between them.

This might seem silly at first, but you’ll sometimes get significantly different results. In other words, one of your (identical) pages will win.

If this happens, it’s typically due to a problem with your software. Either it’s calculating an important variable incorrectly, like sample size or confidence, or it’s not recording conversion events as it should, like tracking button clicks.

This is why Neil Patel recommends always running an A/A test before your actual split tests.
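To see why this matters, here's a minimal A/A simulation, assuming a standard two-proportion z-test (the conversion rate and traffic numbers are made up for illustration). Software that works correctly should flag identical pages as "different" at roughly the significance level and no more; if your tool declares winners far more often than this, something is off:

```python
# A minimal A/A simulation: both "variants" share the same true conversion
# rate, yet a correct significance check still flags a "winner" about 5%
# of the time at the 0.05 threshold. A much higher rate suggests a problem.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - norm.cdf(abs(z)))

true_rate = 0.05      # same rate for both pages: it's an A/A test
visitors = 10_000     # visitors per variation
runs = 1_000
false_positives = sum(
    two_proportion_p_value(rng.binomial(visitors, true_rate), visitors,
                           rng.binomial(visitors, true_rate), visitors) < 0.05
    for _ in range(runs)
)
print(f"'Significant' A/A results: {false_positives / runs:.1%}")  # ~5%
```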

2. You Didn’t Segment Your Traffic

Segmenting is the most important part of analyzing the results of a split test.

It refers to looking at the conversion rates of different slices of your traffic (see the sketch after this list). For example:

  • returning visitors
  • new visitors
  • visitors from different countries
  • visitors from different traffic sources
  • visitors on different days
  • visitors on different devices
  • visitors with different browsers
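As a rough illustration of what segmented analysis looks like in practice, here's a sketch that assumes you can export visitor-level data from your testing tool (the file and column names here are hypothetical):

```python
# Break conversion rates out by variation and device type.
# Assumes one row per visitor with a 0/1 "converted" column.
import pandas as pd

visits = pd.read_csv("test_results.csv")  # hypothetical export

by_device = (
    visits.groupby(["variation", "device"])
          .agg(visitors=("converted", "size"),
               conversions=("converted", "sum"))
)
by_device["conv_rate"] = by_device["conversions"] / by_device["visitors"]
print(by_device)  # look for segments where the variations diverge
```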

You need to identify the segments most likely to show different conversion rates. Starting with traffic source is typically a good idea.

Peep Laja of ConversionXL recommends getting 250-400 conversions for each segment that you’re analyzing. But he has clarified that “magic numbers don’t exist.” You need to run a sample size calculator to know how many conversions are needed for each specific test.
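Here's a minimal sketch of that calculation using a standard two-proportion power analysis from statsmodels; the baseline rate and minimum detectable lift are placeholder assumptions you'd swap for your own numbers:

```python
# Required sample size per variation for a two-proportion test.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05   # current conversion rate (illustrative)
lift = 0.20       # smallest relative lift you care about (illustrative)
effect = proportion_effectsize(baseline, baseline * (1 + lift))

n = NormalIndPower().solve_power(effect_size=effect,
                                 alpha=0.05,   # significance level
                                 power=0.80,   # chance of detecting the lift
                                 ratio=1.0)    # equal traffic split
print(f"Visitors needed per variation: {n:,.0f}")
```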

3. You’ve Found the 1%

When you run a split test, you continue until you reach a minimum sample size and confidence.

Under good conditions, your confidence level will be 99%. If you don’t have the traffic, it might only be 95%, but hopefully not less.

At 99% confidence, roughly 1 test in 100 will show a "significant" difference purely by chance; at 95% confidence, it's about 1 in 20. Keep in mind that these rates only cover random variation: a flaw in your test design can lead you to the wrong conclusion far more often.

This is completely out of your control, and one day in the future you may discover that the “winning” variation is actually performing worse. The only way to check it would be to rerun the test from scratch.
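The error rate also compounds across repeated tests. A quick sketch of the chance of seeing at least one spurious winner when no real difference exists:

```python
# Chance of at least one false positive across repeated tests,
# assuming no real difference and a 0.05 significance threshold.
alpha = 0.05
for tests in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** tests
    print(f"{tests:>2} tests: {p_any:.0%} chance of a spurious winner")
# At 20 tests it's about 64%, which is why rerunning a suspicious
# test from scratch is the only real check.
```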

4. You Didn’t Test a Full Business Cycle

The reason you segment your traffic is that visitors in different situations behave differently.

But it's easy to look at a few days of visitor data, see what looks like enough conversions, and declare a winner. In some cases, that's premature.

Whenever you run a split test, let it cover at least one full business cycle. For many businesses, testing in one-week intervals is sufficient.

However, some businesses operate on other cycles, such as monthly cycles. Buying picks up or slows down in the first and last weeks of the month. You might be tempted to run tests only during the middle weeks, but the visitors at the start and end of the month are necessary to get a complete picture of the test's effects.

Figure out how long your business cycle is, and then make sure you have enough conversions in each part of your cycle to be confident of the final split test result.
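One way to sanity-check this, assuming you have a daily export of your results (the file, column names, and bucket boundaries below are illustrative), is to bucket days by their position in the cycle and confirm each bucket clears your minimum conversion count:

```python
# Do all parts of a monthly business cycle have enough conversions?
import pandas as pd

daily = pd.read_csv("daily_results.csv", parse_dates=["date"])

def cycle_bucket(day_of_month):
    """Rough monthly buckets; adjust to match your own cycle."""
    if day_of_month <= 7:
        return "start of month"
    if day_of_month >= 24:
        return "end of month"
    return "mid-month"

daily["bucket"] = daily["date"].dt.day.map(cycle_bucket)
summary = daily.groupby(["bucket", "variation"])["conversions"].sum()
print(summary)  # every bucket should clear your minimum conversion count
```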

5. Your Special Day Wasn’t So Special

When you’re trying to run tests as quickly as possible, you don’t leave a lot of room for error.

A single abnormal day can affect your split test results enough to change the final outcome. An abnormal day might bring unusually high or unusually low conversions. Either way, you can't assume it affects all variations equally.

So what causes abnormal days when it comes to conversions? It could be many things, but typically will be:

  • holidays
  • the start or end of a sale
  • a large mention in the press (positive or negative)
  • a niche-specific event

Take Christmas, for example. It affects a huge number of businesses. But the visitors who come to your site, not just during Christmas but during the entire month of December, will probably be different from those who visit during January.

It doesn’t mean that you can’t run split tests during these months, but you may need to repeat the tests later to confirm results or gain additional insights.

If only a single day skews your results, you can either remove that day from your analysis or extend your test duration (always a safe option) to drown out the noise.
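If you suspect an abnormal day, a simple way to flag candidates (again assuming a hypothetical daily export) is to look for days that sit far from the test's daily average:

```python
# Flag days whose conversions sit more than 3 standard deviations
# from the daily mean; investigate these before trusting the test.
import pandas as pd

daily = pd.read_csv("daily_results.csv", parse_dates=["date"])
mean = daily["conversions"].mean()
std = daily["conversions"].std()
outliers = daily[(daily["conversions"] - mean).abs() > 3 * std]
print(outliers[["date", "conversions"]])
```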

6. Your Test Won, But Only For Some

Imagine this: your original control wins the test by a small margin, say 2%. Reasonably enough, you stick with it and discard the variation.

But you didn’t dig deep enough.

Like I said earlier, it all comes back to segmenting at one point or another when analyzing results. You may look through all your major segments and think everything is fine, but miss something simple like browser or device issues.

Going back to our scenario, suppose the variation actually performed 10% better on Chrome and Firefox but 25% worse on Internet Explorer. If Internet Explorer makes up about a third of your traffic, the variation nets out roughly 2% worse overall.
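As a quick check of that arithmetic:

```python
# Weighted overall lift: +10% on two-thirds of traffic (Chrome/Firefox),
# -25% on the remaining third (Internet Explorer).
chrome_firefox_share, ie_share = 2 / 3, 1 / 3
overall = chrome_firefox_share * 0.10 + ie_share * (-0.25)
print(f"Overall lift: {overall:+.1%}")  # about -1.7%, roughly 2% worse
```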

But clearly, it’s the better option. It’s likely you didn’t perform sufficient quality assurance checks up front. You should always see how your original and test pages look in multiple browsers (use BrowserShots), and check as many devices as you can (phones, tablets, laptops).

Say you do discover an anomaly like this. What do you do?

Since you suspect your variation should outperform the original, it’s worth taking the time to redesign your variation so it renders correctly, and then rerun your test. Obviously this takes a lot of time, which is why it’s best to double-check everything up front.

Conclusion

Most of the problems that arise from split tests come from an over-reliance on conversion tools.

Yes, they are powerful tools that can make your life a lot easier, but if you don’t understand how they work or what specific role they play in your testing, you’ll end up making one or more of the costly mistakes we’ve looked at in this article.

The biggest takeaway I hope you get from this post is that running a split test properly isn't just about plugging some ideas into a tool and analyzing the results later.

A significant amount of time should be spent verifying the software and the test setup, and identifying any potential anomalies.

After all that, you can start digging into the results and ensuring that you have a sufficient number of conversions in each segment.

Now, a question for you: what software do you use to run your split tests, and what have you done to verify that you're getting reliable, long-term results?

