In website testing for CRO, most things are pretty definitive.
Did a customer visit convert into a sale or not? Was it A or B, yes or no, win or lose? The simplicity makes data both easy to collect and easy to analyze.
But are we missing something by relying on this simple model and the types of testing built on it?
The ones and zeros that make up a binary digital signal combine in the countless billions to form images on our TVs, phones and computer screens.
In the same way, we can use data to make our website testing paint a more complete picture, sometimes with nothing more than this most basic information.
There are many different types of statistical data. Some are simple, and some are more complex.
What many people don’t realize is that the more basic types of data can actually “evolve” into the more complex ones.
Much like the evolution of technology and all of the possibilities it has engendered (good and bad), the evolution of data is a cautionary tale as well. As the data becomes more complex, understanding and interpreting it can become more complex too.
“Experts often possess more data than judgment.”
– Colin Powell
1. Defining Data
Data is all about observation, and observation is as old as mankind itself. When primitive hunters first began collecting data, they probably formed patterns based on observations like these:
- Did you see a Mammoth today? “Yes.”
- What color was it? “Brown.”
- How many did you see? “One.”
- What time did you see it? “I left my iPhone in the cave, but it was around the time the sun went behind that big tree.”
Along with the first data came the first predictions based on that data. But when the sun went behind the big tree the next day, no Mammoth!
These cave-dwelling statisticians probably didn’t realize they were attempting to use data to predict an outcome. In fact, they were actually using several different types of data to draw this conclusion. They also had no idea their experiment was under-powered.
As with everything else in the world of statistics, there are no simple answers. But to understand the different types of data and how they inter-relate, it is probably best to start with a few basic definitions:
- Binary Data is the aforementioned Yes/No, Pass/Fail data. This most basic form of attribute data has only 2 possible values (e.g. Yes or No). This type of data is familiar to anyone who has performed A/B testing. You are reviewing your results in the most basic form: either a customer converted or they did not.
- Categorical Data is the next rung on the evolutionary ladder of data types. This is also a form of attribute data, but rather than just 2 possibilities, you now have multiple categories into which you can assign your observation. This could be something as simple as political party, with maybe 5 or 6 possible categories, or something like “state of residence” with 50 possible categories.
- Attribute Data, whether binary or categorical, is usually interpreted by converting it into percentages. For example, in an A/B test, 10% of Group A converted, while 12% of Group B converted. Categorical data gives you that next level of granularity, such as “25% of Republicans in Washington State converted with Option B.”
- Ordinal Data is a type of categorical data where each category is assigned a number that carries some meaning or rank. Some examples would be letter grades A-F converted to values 4-0, or the number of stars assigned to an Uber driver. Ordinal data falls somewhere between attribute data and variable data.
- Variable Data assigns numbers to observations, which is why it is also known as numerical data or quantitative data. This is the most commonly used data type in Engineering and Science, where you are typically measuring a part, chemical reaction or some other observation that a very specific number can be assigned to.
- Discrete variable data is where the observations are always whole numbers, such as the number of customers purchasing or the number of individual pages within a website.
- Continuous variable data is numerical data that can be more specific, such as “the video downloaded in 9.18 seconds”. Depending on how you are measuring, your continuous variable data can become more and more precise, down to several decimal places.
Compared to the other data types, continuous variable data is the most specific, and also the most powerful for analyzing a given situation. Since it is so precise, you can sometimes use a smaller sample size to draw conclusions, depending on the test you are running.
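As a rough illustration of how these types might sit side by side in a single record, here is a minimal Python sketch. The `Visit` class and all of its field names and values are hypothetical, invented for this example rather than taken from any analytics tool.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One hypothetical website visit, one field per data type."""
    converted: bool      # binary attribute data: yes or no
    state: str           # categorical attribute data: e.g. state of residence
    star_rating: int     # ordinal data: 1-5 stars, where the order matters
    pages_viewed: int    # discrete variable data: whole numbers only
    load_seconds: float  # continuous variable data: as precise as you can measure


# Made-up values, purely for illustration.
visit = Visit(converted=True, state="WA", star_rating=4,
              pages_viewed=3, load_seconds=9.18)
print(visit)
```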
Many of the data types we have defined were observed by the cavemen in our Mammoth example. Unfortunately, they didn’t know what to do with the data they had collected. They drew conclusions based on a very small sample size, which is why their prediction turned out to be incorrect.
In A/B testing, we are reviewing Attribute-Binary data when we observe whether a customer converted or not.
Once the test is complete, we can turn the overall results into percentages for A and B. These percentages are usually not whole numbers, but rather something like 9.3% or 6.1%. So if we think about it, our results are actually in the form of Variable-Continuous data.
With that, the data evolution has begun.
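The step from binary observations to a continuous percentage can be sketched in a few lines of Python. The function name and the two tiny groups of outcomes are assumptions made up for this example; a real test would have far more visits.

```python
def conversion_rate(outcomes):
    """Return the share of visits that converted, as a percentage."""
    if not outcomes:
        raise ValueError("need at least one observation")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical binary observations: 1 = converted, 0 = did not convert.
group_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
group_b = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

rate_a = conversion_rate(group_a)
rate_b = conversion_rate(group_b)
print(f"A: {rate_a:.1f}%  B: {rate_b:.1f}%")
```

Each input is a plain yes or no, yet the output can take any value between 0 and 100.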
2. A Day at the Races
I once spent several months house-sitting a multi-million dollar home for a family friend in the Bay Area. A few weeks into the assignment, I relayed a phone message from local horse racing guru, Sam Spear.
Why was Sam calling? As it turns out, the family friend was also a Ph.D. economist and statistician who had once developed a predictive algorithm with his partner to accurately handicap horse racing results.
Algorithms put the computing horsepower behind complicated calculations used in many mathematical and programming applications.
Using predictive algorithms that model the outcome of a situation based on a number of key inputs has become the “Holy Grail” for statisticians. Predicting future behavior of financial markets, driving behavior of insurance customers, or the success of an advertising campaign are just a few of the applications.
The horse racing algorithm used both traditional and non-traditional factors to predict winners. Traditional factors included things like past performance of the horse and jockey, breeding factors and age of the horse.
Non-traditional factors included things like diet, travel distance to the race track and number of races competed in over a given time frame.
The predictive algorithm is more or less a “soup” with the statistician determining the recipe. This means not only figuring out which factors do or do not influence the outcome, but also accurately weighing these factors so that winners can be predicted.
The ingredients for this soup can include nearly every type of data we have discussed.
Like the non-traditional data used to predict horse race winners, there is more data than first meets the eye available for website CRO. Could this same type of computing power and statistical know-how be used to predict the most successful design for your website?
Definitely. The advent of website analytics has given us the power to measure, track, analyze and optimize virtually every conceivable type of data associated with customer behavior and website performance.
Let’s take a look at one of the many statistical tools we can use to make more sense of the data available to us, and turn our bland broth of binary data into the tastiest soup in town.
3. Multivariate Testing
When we use A/B testing, we are taking an attribute-binary input (select option A or B) and converting the output into meaningful percentages.
We typically use statistical significance to decide whether the difference in output is meaningful or not. If the difference between A and B is significant and B performs better, then you might want to consider going with option B.
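One common way to check significance for two conversion rates is a two-proportion z-test. The sketch below is only one possible approach, not necessarily what your testing tool does internally. The function name and the visitor counts are assumptions; the 10% and 12% rates echo the earlier A/B example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 visitors per group, 10% vs. 12% converting.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=120, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts the p-value lands above the usual 0.05 threshold, so a 2-point gap on 1,000 visitors per group would not yet count as significant.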
Multivariate testing, also known in other industries as Design of Experiments or DOE, lets us take another step forward in data analysis.
This type of testing is done by creating a matrix to test multiple factors at once, then analyzing which factors are significant by themselves (or not), which combinations of factors are significant, and which combination of factors is optimum.
One nice thing about multivariate testing is that you are able to use pretty much any type of data as an input factor. In the hypothetical example below, we are testing 4 input factors, with each one using a different type of data.
Even though you can test different data types at once, all factors need to be tested at the same number of levels when you run the experiment.
If you are using variable-continuous data as an input, such as our “minimum purchase” in the example, you typically pick values to test that are either at the two extremes of what you might implement, or close to the extremes. Your software will tell you if the best option is one of the extremes you picked, or somewhere in between them.
If your input data is categorical, like the “font color”, the number of colors you can test at once is equal to the number of levels you defined. So if you were running a 4-level experiment, you could test 4 colors at once, along with 4 minimum purchase values instead of 2.
To create the test matrix, you needed to use 2^4 = 16 combinations, for 2 levels with 4 factors.
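A full test matrix like this can be enumerated mechanically. In the minimal sketch below, "minimum purchase" and "font color" come from the example, while the level values and the two remaining factors (`factor_3`, `factor_4`) are placeholders assumed for illustration.

```python
from itertools import product

# Two levels per factor. All level values here are hypothetical.
factors = {
    "minimum_purchase": [25, 50],
    "font_color": ["red", "blue"],
    "factor_3": ["low", "high"],  # placeholder for a third factor
    "factor_4": ["low", "high"],  # placeholder for a fourth factor
}

# Every combination of levels: 2 * 2 * 2 * 2 runs.
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]

print(len(runs))
for run in runs[:3]:
    print(run)
```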
What the matrix doesn’t show is the number of repetitions for each run needed to estimate your output percentage.
Keep in mind when using multivariate testing, even for just 2 or 3 input factors, that every combination needs the same sample size you would need for a single version in an A/B test in order to judge the significance of your conversion rates accurately.
In other words, if you were required to run a sample of 10,000 customers for each version in a traditional A/B test, you would be required to have a sample size of 10,000 x 16 = 160,000 customers for the multivariate test.
Obviously this multiplication factor means you need to have more traffic available to run these more complicated tests accurately.
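The arithmetic is simple enough to sketch as a small helper. The function name is an assumption for this example; the numbers are the ones used above.

```python
def total_sample_size(per_combination, levels, factors):
    """Total customers needed for a full matrix of levels ** factors combinations."""
    return per_combination * levels ** factors


# 10,000 customers per combination, 2 levels, 4 factors.
print(total_sample_size(10_000, levels=2, factors=4))
# For comparison, a plain A/B test is 1 factor at 2 levels.
print(total_sample_size(10_000, levels=2, factors=1))
```

Because the number of combinations grows exponentially with the number of factors, adding even one more 2-level factor doubles the traffic you need.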
Another great feature of multivariate testing is that it can be used either prospectively or retrospectively.
Prospectively would be the traditional way of setting up your matrix, running your experiment, then analyzing the results. Retrospectively would mean looking at data you had already collected over time, plugging it into a multivariate test and looking at what your data is telling you about the past performance.
Using retrospective data is also a good way to get around the sample size dilemma we discussed.
When your test is complete, your software will form a predictive equation, made up of all your input factors, which can point you toward the most favorable output.
This equation is the first recipe for your “soup” that tells you what combination is working best.
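Here is a minimal sketch of what such an equation might look like, assuming a 2-level, 4-factor design with levels coded as -1 and +1 and main effects only (no interactions). The factor names, the "true" effects and the responses are all made up for illustration; real software would fit your measured conversion rates and would usually estimate interactions too.

```python
from itertools import product

names = ["x1", "x2", "x3", "x4"]  # hypothetical factor names
design = list(product([-1, 1], repeat=len(names)))  # 16 coded runs

# Made-up relationship used only to generate demo responses.
true_intercept = 10.0
true_effects = [2.0, -1.0, 0.25, 0.5]
responses = [
    true_intercept + sum(c * x for c, x in zip(true_effects, run))
    for run in design
]

# In a balanced full factorial, each coefficient is simply the average
# of (coded level * response) across all runs.
intercept = sum(responses) / len(responses)
coefficients = [
    sum(run[j] * y for run, y in zip(design, responses)) / len(design)
    for j in range(len(names))
]


def predict(run):
    """Predicted output for one combination of coded levels."""
    return intercept + sum(c * x for c, x in zip(coefficients, run))


best_run = max(design, key=predict)
print(dict(zip(names, coefficients)))
print(best_run, predict(best_run))
```

The sign of each coefficient tells you which level of that factor helps, and its size tells you how much weight that ingredient carries in the recipe.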
With this information in hand, you can continue optimizing, testing new factors, and continually monitoring and improving your recipe as your customers’ taste evolves and expands.
Multivariate testing is just one of many statistical tools available that can be applied to CRO. Understanding the type of data you are collecting will help you to learn what other tools could be in your arsenal.
When we began elementary school, grades were pretty simple.
You either got a smiley face or a frowny face on your report card, along with a sentence or two explaining what a nice (or naughty) child you were.
Somewhere around the 3rd or 4th grade, things became a bit more complex. Now we had letter grades, which converted to numbers, which in turn became a grade point average. Soon after, we began hearing about how important this number was and all the doors it would open (or close).
Somehow I found the smiley face model more satisfying; either I succeeded or I didn’t – no need to put a number value on it.
Life, like statistics, has a way of becoming more complicated over time. Since the field of statistics is all about data and how to analyze it, it was inevitable that specific tools would be developed over the years for different types of data.
Luckily, there are also more flexible tools, like multivariate testing, that can take many different types of data and use them collectively to help you optimize your conversion rate.
A/B testing is a powerful and useful tool, but it is just the beginning. You need to try out different testing methodologies as well.
To begin your evolution from binary to categorical, attribute to variable, and ultimately discrete to continuous data analysis, no Ph.D. is required.
In fact, it’s so easy a caveman could do it.