When Are You Ready For A/B Testing?

by Daniel Sims

Last updated on July 27th, 2017

So your website has been successfully launched, your traffic is increasing day over day, and you’re ready to A/B test some new content to improve your CRO…

…or are you?

The answer lies in that magical word that has produced the proverbial “deer in the headlights” reaction on the face of marketing professionals and engineers alike for decades: Statistics.

When one attempts to visualize the meaning of the word, two terrifying images often come to mind: The first one is data – lots and lots of data that requires endless sorting, organizing and analyzing.

The other image is a list of fun (albeit useless) facts presented in a graphical or table form, such as, “More than 50% of Americans fall asleep on their sides”.

Statistics Defined

The true definition is actually neither of those things, and possibly both, depending on how you look at it.

Simply put, a statistic is a piece of information, and, definitions aside, the power (or the “force” if you will) lies in how you use the information.

Understanding some statistical fundamentals is a great way to optimize and understand your A/B testing, decide how much traffic you will ideally need before you get started, and learn how to analyze your results for maximum impact.

Statistics can even be fun…really!

A Few Concepts

Trying to use statistical tools without understanding some of the underlying principles can be like using a calculator without knowing how to add and subtract by hand. Sadly, both scenarios are probably more and more common as technology continues to take over our lives.

Since I want you to avoid that pitfall, I will cover a few basic tools you will need to determine if your website is ready for A/B testing.

Data: Is it all created equal?

Basically, there are two main types of data we come across. Categorical data, like “heads or tails”, “men or women” or “yes or no”, is also known in statistics as attribute data. The other type, numerical data, like “the man was 5’10” tall” or “the house has 1580 square feet”, is also known as variable data. In general, you can do more slicing and dicing with variable data, since it can take an effectively unlimited number of values, but attribute data can be easier to analyze and understand, for the opposite reason.

Since A/B testing is more or less in the “yes or no” category, the good news for now is that we will mainly be concerned with attribute data.

Confidence: Be sure of yourself

Confidence is one of the easiest statistical concepts to grasp, since it essentially has the same meaning in everyday life as it has in statistics.

Step 1 in determining whether your traffic can support A/B testing should be to ask yourself, “How confident do I need to be?” In other words, how sure do I want to be that these results are accurate?

Confidence Level (CL)

Confidence level is simply the answer to that question, expressed as a percentage (%). If your confidence level is 95%, that means you can be 95% sure of what your results are telling you; more precisely, if you repeated the same test many times, about 95% of the resulting intervals would contain the true value. In my experience, 95% is the level often chosen for life and death decisions, like medical device and pharmaceutical testing, so you might be comfortable with a lower level for your A/B testing. As you will see, this decision has a direct impact on your sample size!

Confidence Interval (CI)

Confidence interval is a related concept, giving you the range that is 95% (or whatever your confidence level is) sure to contain the “real” value. For example, “I am 95% confident that 60-70% of customers preferred landing site A to landing site B” means your confidence interval is the range, 65% ± 5%.

Margin of Error (MOE)

So then, the Margin of Error, or MOE, is just the “± 5%” part of that Confidence Interval description.
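For the curious, here is a minimal sketch of how a margin of error and confidence interval can be computed for a proportion. It assumes the common normal approximation, and both the function name `proportion_interval` and the survey numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (estimate, margin_of_error, low, high) for a sample proportion."""
    p_hat = successes / n
    # Two-sided z-score, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Normal-approximation margin of error for a proportion.
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, moe, max(0.0, p_hat - moe), min(1.0, p_hat + moe)


# Hypothetical survey: 234 of 360 customers preferred landing page A.
estimate, moe, low, high = proportion_interval(234, 360)
print(f"{estimate:.0%} ± {moe:.1%} (from {low:.1%} to {high:.1%})")
```

With these invented numbers the estimate is 65% and the margin of error comes out at roughly ±5%, which lines up with the “60-70%” example above.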

Obviously, we want the margin of error to be as small as possible, but making this smaller also means making your sample size bigger. And does your site have enough traffic to support such a sample size?

Sample Size

With the number of sample size calculators readily available on the web and elsewhere, I won’t include any formulas (getting back to my “deer in the headlights” thing). Just make sure you are using sample size calculators or tools intended for attribute data or proportions. Proportion is basically just another way of saying percentage (%).

The important takeaway is that higher confidence level and lower margin of error both require a larger sample size. Deciding what you can tolerate for those will drive everything else going forward.
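If you do want a peek behind the calculators, the sketch below shows how quickly the sample size grows as you raise the confidence level or tighten the margin of error. It assumes the standard formula for a proportion with a very large population and the most conservative guess of a 50% proportion; the function name `base_sample_size` is my own, not something taken from any particular calculator.

```python
import math
from statistics import NormalDist


def base_sample_size(confidence, margin_of_error, p=0.5):
    """Sample size for a proportion, ignoring the size of the population.

    p=0.5 is the most conservative assumption (it gives the largest sample).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)


# Compare a few confidence levels and margins of error side by side.
for confidence in (0.90, 0.95, 0.99):
    for moe in (0.07, 0.05, 0.02):
        n = base_sample_size(confidence, moe)
        print(f"CL {confidence:.0%}, MOE ±{moe:.0%}: n = {n}")
```

Because the margin of error is squared in the denominator, halving it roughly quadruples the sample you need.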

So to figure out whether or not you have enough web traffic to begin some A/B testing, you will always need to have the following pieces of information available:

  • Your total web traffic population (number of distinct users in a given time frame)
  • Your desired confidence level
  • Your acceptable margin of error

Before we move on, another thing to keep in mind about sample size is that your sample and your population (in other words, all users) are two different but inter-related things. As your sample becomes a larger percentage of your overall population, the results become more reliable and less likely to be biased.

Bias and Noise

SPOILER ALERT: Unnecessary Star Wars analogy ahead.

So let’s look at it this way: if the statistical tools I’ve just described provide “The Force” in your decision to begin A/B testing, then Bias and Noise should be considered the “Dark Side”. What you thought was a valid sample size can quite easily be derailed by these twin forces of statistical evil.

Bias: 9 out of 10 dentists

When I think about bias, I can’t help but remember all those toothpaste and gum commercials I watched as a kid, telling us that “9 out of 10 dentists were recommending” something to their patients. I sometimes wondered if 9 out of those 10 just worked for the toothpaste company? If they did, that is bias in a nutshell. To avoid this, always make sure your A/B test sample includes as random a segment of your population as possible.
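One simple way to keep your own “dentists” honest is to let a random number generator pick who lands in each group. The sketch below is only an illustration: the `assign_groups` helper and the visitor IDs are hypothetical, and a real A/B testing tool would typically handle this step for you.

```python
import random


def assign_groups(user_ids, per_group, seed=None):
    """Randomly pick users for the control (A) and test (B) legs."""
    rng = random.Random(seed)
    # Sampling without replacement keeps the two groups from sharing users.
    chosen = rng.sample(list(user_ids), 2 * per_group)
    return chosen[:per_group], chosen[per_group:]


# Hypothetical visitor IDs; in practice these would come from your analytics.
visitors = [f"user-{i}" for i in range(1000)]
control, test = assign_groups(visitors, per_group=100, seed=42)
print(len(control), len(test), len(set(control) & set(test)))
```

Fixing the seed makes the split repeatable, which is handy if you ever need to show how the groups were chosen.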

A classic book from the 1950s, Darrell Huff’s “How to Lie with Statistics”, includes many entertaining examples of bias, and of how statistics in general can be used to mislead and misinform an audience. Most still hold true almost 60 years later.

Noise: Be Quiet!

Statistical noise is similar to bias, but it cannot be controlled by making your testing sample random. Think of noise as something unintended and unexpected, like a huge malware, space alien or killer bee attack happening in the middle of your A/B testing, changing the usual internet behavior of the test subjects.

The best way to avoid the adverse effects of noise is to spread your study out over the course of time (at least a week, depending on what you are testing) so that no event or conditions on a given day can overly influence your results.

Let’s Run Through An Example

Let’s say you started your E-commerce website, “deerintheheadlights.com”, around 6 months ago. You have noticed your traffic growing steadily, and you now have over 3000 visitors per week (3182 at last count). You also have a cool new idea for an updated landing page, but you don’t want to tamper with your 30% conversion rate!

You have decided that you want to attempt an A/B test to see if your new landing page idea positively influences your conversion rate. But do you have enough traffic?

First, decide on what your confidence level should be. Since our example calculator only includes 95% or 99%, let’s select 95% here.

Sample size calculator: surveysystem.com/sscalc.htm

The next thing you need to select is a confidence interval (it’s actually the MOE value but they call it confidence interval in this particular calculator). If you want to be 95% confident your result is accurate within ±7%, you would type “7” under confidence interval.

For the population, just enter your traffic for a given time period. Make sure this time period is equal to the duration you plan to run the A/B test. In this example, that is the 3182 visitors you expect during a one-week test.

When you hit “Calculate”, the necessary sample size of 185 appears.

Keep in mind that once you calculate your sample size (n=185), you will need this many test subjects for both the control and test legs of your A/B test, so the total number of users required for the study would be 185 x 2 = 370. Since 370 is far fewer than your 3182 weekly visitors, you clearly have enough traffic to run this A/B test at the desired confidence level and interval.
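If you would like to double-check the calculator, the example can be reproduced in a few lines. This is a sketch under two assumptions of mine: that the calculator uses the standard sample size formula for a proportion with a finite population correction, and that it assumes the most conservative proportion of 50%. The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin_of_error, p=0.5):
    """Sample size for a proportion, adjusted for a finite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large population.
    n0 = z**2 * p * (1 - p) / margin_of_error**2
    # Finite population correction: smaller populations need fewer samples.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


weekly_visitors = 3182
per_leg = sample_size(weekly_visitors, confidence=0.95, margin_of_error=0.07)
total_needed = 2 * per_leg  # control leg plus test leg

print(f"Per leg: {per_leg}, total: {total_needed}")
print("Enough traffic" if total_needed <= weekly_visitors else "Not enough traffic")
```

With these inputs the sketch arrives at the same 185 per leg, or 370 users in total, comfortably below one week of traffic.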

Results: what to make of it all?

If you chose your sample size wisely, your results will paint a clear picture that you can confidently base decisions on.

Significance and hypothesis testing are statistical concepts that have to do with proving, and associating numbers with, the validity of your results. There’s a lot to these topics (I see that deer up ahead again) so I won’t say too much about that here.

Results Chart AB

To simplify, a personal rule of thumb that I have seen hold true is that if the graphs of two distributions do not overlap at all, there is indeed a valid (not to be confused with significant) difference between the two test groups. The same holds true for either attribute or variable data.

In the example above, you can see that the graph of your control group conversion rate results, with the MOEs included, does not overlap with your optimized landing page results, which is a good early indicator that you have found a meaningful improvement!
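The overlap check itself is easy to sketch in code. The conversion counts below are hypothetical (they are not the numbers behind the chart), the intervals use the normal approximation, and the helper names `interval` and `intervals_overlap` are mine.

```python
import math
from statistics import NormalDist


def interval(conversions, visitors, confidence=0.95):
    """Return (low, high): the conversion rate minus and plus its MOE."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - moe, rate + moe


def intervals_overlap(a, b):
    """True if the two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical results with 185 visitors in each leg.
control = interval(56, 185)  # about a 30% conversion rate
variant = interval(85, 185)  # about a 46% conversion rate

print(f"Control: {control[0]:.1%} to {control[1]:.1%}")
print(f"Variant: {variant[0]:.1%} to {variant[1]:.1%}")
print("Overlap" if intervals_overlap(control, variant) else "No overlap")
```

Keep in mind that this is a conservative check: intervals that do overlap do not, by themselves, prove that there is no difference between the groups.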

Conclusion

A/B testing can be an invaluable tool in today’s ultra-competitive CRO environment. By utilizing some basic statistical concepts, you can better understand your data and know when your website traffic has reached a level where A/B testing makes sense for you and your business.

I have found that almost everyone is eager to use statistics for one purpose or another, but we have become too quick to use web tools or software to do the work for us.

The concepts of data type, confidence, margin of error and sample size are some of the first things you need to understand before you evaluate your web traffic for A/B testing.

These and other basics can also be stepping stones to a greater understanding. Most of us (myself included) are accomplished “guessers” when it comes to making decisions in life and business. The more you learn about statistics, the easier it becomes to back up those equally important gut instincts with some real data analysis.

When you do that, statistics can truly be fun…really!

Daniel Sims

Daniel Sims is a Certified Quality Engineer, part-time writer and Six Sigma practitioner who espouses the use of Engineering discipline and problem solving skills in all areas of business and life.

4 COMMENTS

  1. carissa rose says:
    January 21, 2016 at 5:22 am

    I have been really interested in conversion optimization for some months now, and how data can be utilized for such. This is such a good article. I haven’t had the time to digest every part of it yet, but I’ve already added it to my to-read pile. Thanks heaps!

    • daniel sims says:
      January 26, 2016 at 12:31 am

      Thanks Carissa. I’m glad you enjoyed it and hopefully it will help your CRO as well!

  2. Amit Roy says:
    January 14, 2016 at 11:25 pm

    This is so useful. How do you determine the confidence interval? Everything seems to be so logical. I’m going to recommend it to my team. Thanks!

    • Daniel Sims says:
      January 15, 2016 at 7:44 pm

      Thanks for the feedback Amit. In this example, we chose an arbitrarily large confidence interval of 14% (i.e. +/-7%) and used that to calculate our sample size. This might apply when you were looking for or expecting a large shift. If you want to detect a small shift, you would need a confidence interval less than the difference you want to detect, in other words if you want to be able to detect a 2% difference reliably, you would calculate a sample size for a 2% (i.e. +/- 1%) C.I., which would be much larger. You can also use the calculator to plug in your available sample size, and calculate your C.I. from it.
