How Alpha and Beta Spell Improved A/B Testing

Almost three thousand years ago, the ancient Greeks were already well on their way to developing such innovations as the catapult, indoor plumbing, and of course, the alphabet.

What ultimately became the modern letters “A” and “B” actually originated as Alpha (α) and Beta (β) in the Greek alphabet.

Today, Alpha and Beta have also become commonly used statistical terms. This is particularly true for Hypothesis testing, such as A/B testing for conversion rate optimization. So might there be a connection between A/B testing and these ancient Greek letter counterparts?

Not really.

However, I have found that the statistical concepts of Alpha and Beta, despite their importance in A/B testing and many other applications, are pretty commonly ignored, misunderstood, or both. This is definitely not ideal, since the values of Alpha and Beta selected for your A/B test do nothing less than determine the accuracy and reliability of your results!

You can find many useful A/B testing and sample size calculators on the web with an interface similar to these:

ab test calculator

drpete.co/split-test-calculator

a b significance test

getdatadriven.com/ab-significance-test

Some of these interfaces do not include Alpha or Beta in the user selections, or do include them, only with suggested or “default” values for Alpha and Beta already set. As we will see, these values play an important role in defining the test and making decisions about the results, so we should always know what values are being used.

Given the importance and power of Alpha and Beta in our A/B testing, it is definitely worth the time to understand them. Even the ancient Greek philosophers understood the power of knowledge.

“The only true wisdom is in knowing you know nothing.”

– Socrates

1. From Greek to Geek

Like many ancient ideas, Alpha and Beta found their origin in the most basic of human needs: Food and Shelter.

To make a (very) long story short, the ancient Greeks began to adapt the alphabet of the ancient Phoenicians somewhere around the year 800 B.C. The Phoenician alphabet had associated a noun with each letter in the alphabet. For example, the letter Alpha (α) symbolized the ox, and the letter Beta (β) symbolized the house. Loosely interpreted, the order of the letters in the alphabet was assigned based on their relative importance for survival.

greek alphabet

Image Source

Flash forward to the year 2016, and the symbols Alpha and Beta now have more than 30 definitions in the field of mathematics alone. We have all heard of alpha particles, beta testing and other similar terminology.

Luckily, the definitions of Alpha and Beta in the world of statistics are a bit more straightforward and consistent, since this is where our interest should be.

2. What is Alpha?

Just like the Alpha dog leading the pack across the frozen tundra, the concept of Alpha is a very important one in the world of statistics, since it ties into several other key concepts that influence A/B testing and set the stage up-front for the accuracy of your results later on.

In other words, if you understand Alpha, you can more easily understand related terms like:

p-value
Significance
Confidence Level

Since Alpha is mathematically related to these other three terms, it can unfortunately be confused with them as well. That means all four terms are sometimes mistakenly used interchangeably, making things more difficult than they need to be.

Perhaps the easiest way to remedy this confusion is a side-by-side comparison of how these various concepts relate to one another.

p-value significant confidence level definitions

“Significance” might be the most difficult to understand of all of these concepts, since the word itself can sometimes make it misleading. If you want to learn more about this subject, The Heart of Significance takes a closer look at this very important concept.

In an A/B test, the Alpha value (α) you select when you set up your experiment is:

The probability you are willing to accept for incorrectly concluding that your improvement was successful, even though it was not.

For example, if you select an Alpha value of .05 (which is typical) you stand a 5% chance of thinking your website change improved conversions, when it really didn’t. Since it is a probability, it can range from 0-1, but obviously a lower threshold value makes for a stronger test result.

Is 5% a probability you can live with? Before you choose a value for Alpha, think about how certain you really want (or need) to be.

Being “wrong about it” is also known as a Type 1 error. One easy way to remember this is that Alpha (first letter) is related to Type 1 (first number) errors.

Used correctly, Alpha can be the sturdy and dependable ox that leads you to a successful A/B test.

3. What is Beta?

When I hear the term “Beta” the first thing that comes to mind is usually Beta Testing, as in software.

Why do they call it Beta Testing? Going back to the historical hierarchal sequencing of letters, an Alpha Test in the software industry is an initial test of new software done by the developers “in-house”, and the Beta Testing is done afterwards by rolling the software out to select outside users for a trial run.

In the world of statistics, Beta (β) often plays second fiddle to Alpha, since significance, confidence and p-value are what most people focus their attention on. In my opinion, setting the correct Beta value for your A/B test can be equally as important as setting the correct Alpha value.

So what is Beta? In many ways, Beta is just the corollary or opposite of Alpha, meaning:

The probability you are willing to accept for incorrectly concluding that your improvement was not successful, even though it was.

The wrong Beta value could potentially mean having a great enhancement idea that actually was working, but rejecting it based on your test results. Making this particular mistake is also known as a Type 2 error. Once again Beta (second letter) is related to Type 2 (second number) errors.

The relationship between Alpha and Confidence Level is the same as the relationship between Beta and Statistical Power, since:

Power = 1 -Beta

Even though the word “power “ can conjure mental images of Zeus throwing down lightning bolts from Mt. Olympus, all it really means in this sense is robustness against Type 2 errors. Power is directly related to sample size, so the best way to power up is just to increase your sampling.

If you want to make sure you don’t miss out on detecting a great enhancement due to statistical error, just set your initial Beta value low, which in turn makes the power of your test high.

I find it somewhat curious that most online sample size calculators default to a value of .20 for Beta. That means you still have a 20% chance of getting this wrong! If you want to be more certain than this, make sure you put some careful thought into selecting a Beta value, rather than just going with a default. Of course the trade-off will probably be a higher required sample size to gain this additional power, but like any great house, you need to start with a great foundation.

If the tool you are using doesn’t give you the opportunity to select a Beta value for your test, a .20 value might be pre-selected for you automatically by the software. As you gain more understanding about the importance of Beta and how it can impact your decisions, you will definitely want to know what value is built into your test software, and have the ability to control this variable yourself.

4. It’s All Greek to Me

Many years ago, I found myself sitting with a colleague, staring at an equation like this one in a text book, wondering what the symbols meant, or if this was even the right equation to use for our experiment.

n equation

“What does α (Alpha) mean? Is that just the standard deviation? Maybe β (Beta) is just 1- α ?”

Had I been working on an A/B test at the time, the presence of another Greek letter in the equation would have told me right away that I was on the wrong track: Sigma (σ)

In Statistics, Sigma stands for the Standard Deviation, which is a measure of how much your data deviates from the mean, or average value, and it is the square root of the Variance.

Standard deviation is one of the most useful and important indicators in statistical analysis. Unfortunately, it (typically) does not apply to world of website testing.

high low variance

Why?

In order to calculate a standard deviation, you need to be using what is called variable data. An example would be measuring the size of a car part in the factory. Once you had measured a large group of parts, you would know the average size, and the standard deviation from the average. This value would tell you if your process was in control, or if you were in danger of getting some rejects, based on the size and trend of the standard deviation.

When we look at the data generated from an A/B test, we are usually looking at percentages, i.e. percentage of users buying, clicking, signing up, etc. So there is no average, and no deviation from the average for each user, just a percentage. This is known as attribute data. Much like flipping a coin 100 times, what you are left with in the end is a number of flips, a number of heads, and a number of tails. No mean value or standard deviation are calculated.

However, what we do have plenty of in A/B testing is variation. This is sometimes confused with variance, but is not the same thing.

Things like differences in user demographics, types of devices and daily conversion rate patterns are all examples of variation observed in A/B testing. To go from variation to variance, you need another layer of data which assigns a number to anything that can change, then measures the difference between that number and what is observed over time. This might be used in more complex forms of testing, but typically not in a basic A/B test.

Conclusion

As the ancient Greeks, Egyptians and Phoenicians understood, a symbol can be more versatile, meaningful and unfortunately, confusing than any simple number or letter could ever be. The very presence of an ancient symbol implies importance, depth and historical significance.

Around the same time Alpha and Beta were becoming the first letters of the Greek alphabet, the first Olympic Games were held in Olympia, Greece in 776 B.C. Over the millennia since, these letters have come to represent multiple concepts and ideas, both mathematical and otherwise.

By the time the Games were reborn with a modern slant in 1896, the letters had already come to symbolize important statistical concepts that now have meaning and utility in the world of website testing.

Instead of being confused and intimidated when statistical terms such as Alpha (α), Beta (β) and Sigma (σ) appear, knowing what they really mean leads to a realization that they are not that complicated after all.

By simply knowing that Alpha quantifies a false belief in a test being successful, and Beta quantifies a false belief in lack of success, we can use these concepts to more successfully design and optimize our A/B testing.