What Multivariate Testing Solves That AB Testing Can’t

Peter Lowe

Disclosure: Our content is reader-supported, which means we earn commissions from links on Crazy Egg. Commissions do not affect our editorial evaluations or opinions.

Multivariate testing isn’t better or worse than AB testing. They solve different problems.

You can run 90+ percent of the tests you will ever need on a website with AB testing. But there are a few key questions it cannot answer.

Can multivariate testing help you figure out the best next move? Here’s what you need to know.

What Is Multivariate Testing?

Multivariate testing (MVT) is a controlled experiment that lets you measure the impact of changing two or more page elements on a website at the same time.

The core objective of multivariate testing is almost always to find a combination of page elements that has a higher conversion rate than the existing page. The conversion companies care about is usually sales, signups, or some other revenue-driving metric.

Using MVT, brands can answer questions like:

  • What is the highest-converting combination of elements?
  • Which elements work well together?
  • Which elements have the most/least influence on conversions?

The type of multivariate testing used on websites has its roots in experimental designs from behavioral sciences and industrial manufacturing. It’s helped manufacturers perfect intricate assembly lines, and allowed pharmaceutical companies to isolate drug interactions inside the human body.

MVT is perfect for websites, which have diverse and fluctuating traffic. There are so many variables that impact whether a user converts.

Multivariate testing can help you start investigating how different layouts perform and how different elements interact with one another. You’ll discover what really matters, and be able to predict the best possible combination of page elements.

How Multivariate Testing Works

MVT testing starts with a high-value, high-traffic page that you think can convert better than it is right now.

Once you and your team know which elements on this page you want to test, you start building out the different page experiences. 

This is almost always done in a specialized testing platform, which makes it easy to create different variations of page elements, track conversions, and split traffic between the different page experiences you are testing.

In the simplest possible test, where you make a single change to 2 different elements, there will be four combinations. For example:

  1. Headline A, Image A (control)
  2. Headline A, Image B
  3. Headline B, Image A
  4. Headline B, Image B

Each of these different combinations provides a slightly different user experience (UX).

Graphic: four phone screens showing how multivariate testing serves different combinations of page elements.

Typically, traffic is randomized and then divided evenly between all possible combinations of page elements. In the above example, each page experience would receive 25% of the traffic.
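To make the mechanics concrete, here is a minimal Python sketch of how a platform might enumerate the combinations and split traffic evenly (the element names are hypothetical):

```python
from itertools import product

# Hypothetical elements; each list holds the variations, control first.
elements = {
    "headline": ["Headline A", "Headline B"],
    "image": ["Image A", "Image B"],
}

# Full-factorial MVT builds every combination of variations.
combinations = list(product(*elements.values()))
traffic_share = 1 / len(combinations)  # traffic is split evenly

for combo in combinations:
    print(f"{combo} -> {traffic_share:.0%} of traffic")
```

With two elements and two variations each, this prints four combinations at 25% of traffic apiece, matching the example above.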

After the test is complete, you will be able to see the performance metrics of all the different page experiences, measured against the control.

MVT vs AB testing

AB testing is constrained to changing a single page element. This is a feature, not a bug.

Because you are only changing one element, you can get really great data on that specific element. While MVT tells you about the effect of elements in concert, AB testing isolates individual elements to see their direct effect.

AB testing is phenomenal for validating your creative and strategic assumptions about what users want on a webpage.

But AB testing can only test two different experiences at once.

Even if you create an AB test where you change a whole bunch of elements on one page, you are still only testing two different experiences.

In order to test multiple experiences with AB testing, you have to run a series of AB tests, proceeding sequentially through different combinations of page elements.

Companies use sequential AB testing, and it is valuable. It’s just slow, running one test after another. With MVT, you can run through many of those different combinations of elements at the same time.

That’s one of the big selling points. Teams can iterate really fast because they are assessing so many different combinations at once.

They also get to see how different persuasion techniques and typography elements interact, and what effect this has on conversions.

But as I said to start this post, AB testing and MVT testing are both useful tools with different jobs.

AB testing is less expensive, easier to set up, requires way less traffic, and in many cases, it provides more valuable business insights. Which elements matter at all? What messaging and images resonate with users?

Most businesses that use MVT have already AB tested heavily to figure out a design they want to further refine.

Why Does MVT Need So Much Traffic?

You may have heard this, and it’s true: multivariate testing requires a ton of traffic.

There are two contributing factors.

First, you have to get a lot of people through a test in order to know that the results aren’t random. Just think, you can’t be as confident about a test with 10 participants as you can about a test with 1,000 participants.

Second, you need to get enough traffic to every experience you want to test. In AB testing, you usually split traffic in half: 50% goes to the control and 50% goes to the variation.

In the least complex possible MVT design, you are splitting traffic between four experiences, with each getting 25% of your website traffic.

The more elements you change, the higher the number of possible combinations, and the more your traffic is divided. For example:

  • 3 headlines + 2 images = 6 total experiences
  • 3 headlines + 3 images + 2 CTA styles = 18 total experiences
  • 4 headlines + 3 images + 3 CTA styles = 36 total experiences

There are very few websites in the world that could supply enough traffic to 36 different experiences simultaneously. 
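The math behind those counts is simple multiplication, as this quick sketch shows:

```python
from math import prod

def total_experiences(*variation_counts: int) -> int:
    """Full-factorial MVT: multiply the variation counts together."""
    return prod(variation_counts)

print(total_experiences(3, 2))     # 3 headlines x 2 images = 6
print(total_experiences(3, 3, 2))  # 18 experiences
print(total_experiences(4, 3, 3))  # 36 experiences
```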

Full-factorial vs fractional MVT

When you run all possible combinations of experiences, it is known as full-factorial MVT.

But you can also elect to run fractional MVT, which is where not all possible combinations are tested. Test designers can pick and choose a fraction of the total possible combinations to test.

There is an obvious risk here of not getting complete data, but it decreases the traffic requirements significantly. 
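As a rough illustration of the traffic savings, here is a sketch that tests a third of an 18-combination design. Note that real fractional-factorial designs pick their subsets deliberately so the main effects stay measurable; the random sampling here is only for illustration:

```python
import random
from itertools import product

elements = {
    "headline": ["A", "B", "C"],
    "image": ["A", "B", "C"],
    "cta": ["A", "B"],
}
all_combos = list(product(*elements.values()))  # 18 experiences

# Illustration only: real fractional designs use structured subsets
# (e.g., orthogonal arrays), not a random sample.
subset = random.sample(all_combos, k=len(all_combos) // 3)
print(f"Testing {len(subset)} of {len(all_combos)} experiences")
```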

Estimating MVT traffic requirements

Many websites don’t have enough traffic to split their traffic 50/50 for AB testing, let alone split it 4+ ways to run MVT.

For AB testing, the rule of thumb is 8,000 web page visitors per month, which should get 1,000 visitors through each of the two experiences in a week. This is usually what is required to hit statistical significance within a reasonable timeframe under normal conditions.

For MVT, even the least complex design (4 experiences) requires 16,000 visitors per month to get 1,000 users through each experience in a week.

If you start testing lots of page elements and including multiple variations, the required traffic quickly jumps to 100k+ visitors per month.
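To see why, here is a back-of-the-envelope sketch based on the rule-of-thumb numbers above (real platforms compute this with a proper power analysis):

```python
def weeks_to_finish(monthly_visitors: int, experiences: int,
                    visitors_per_experience: int = 1_000) -> float:
    """Rough duration estimate using the 1,000-visitors-per-experience
    rule of thumb. Real platforms run a proper power analysis."""
    weekly_visitors = monthly_visitors / 4
    return (experiences * visitors_per_experience) / weekly_visitors

print(weeks_to_finish(16_000, 4))   # 1.0 week for the simplest design
print(weeks_to_finish(16_000, 36))  # 9.0 weeks for 36 experiences
```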

Driving more traffic to your site is always a good idea, but it takes time. If you are interested in digging into individual page performance, but you are still a ways out from the traffic requirements of MVT or AB testing, consider using website heatmaps, click maps, and session recordings. These tools can provide critical insights without needing much traffic at all.

Common Page Elements To Test With MVT

Because MVT is expensive to conduct, businesses only pursue it on web pages that drive significant revenue, and only spend time testing page elements that are likely to impact the conversion rate.

These are some of the typical page elements targeted for MVT testing:

  • Headlines
  • Images
  • CTA buttons (copy, color, and placement)

There are other potential page elements to experiment with, but you want to be as sure as possible that you are only testing elements that will actually influence user behavior.

Brands will also use MVT to analyze the success of different layouts, often moving these target elements higher on the page to see which ones lead to improvements.

One easy way to find impactful page elements is to run a quick CRO audit on the page to determine if there are obvious places on the site where you are losing conversions. Focusing changes on elements that can address those issues is likely to have a meaningful effect.

If you are truly at the refining stage with MVT testing, you’ll have to get more creative. I would find inspiration from your competitors’ websites. What are they doing to convince buyers in your space?

You can also look at landing pages of brands that are buying sponsored ad space in Google search results. The companies spending lots of money on ads are testing their landing pages heavily. You may be able to snipe a few new tricks from them.

How To Run Multivariate Testing

Everyone I know uses some sort of testing platform to run MVT on websites. These tools enforce traffic randomization, automate the calculations, and provide an easy interface to create and analyze tests.

The testing tool you use will shape the specific workflow, but virtually every MVT method follows a similar process:

  1. Select a page: It must have lots of traffic and (in most cases) you should be happy with the overall design as MVT works best for refining specific elements.
  2. Select goal: This is how the performance of each experience is assessed. The goal must be a trackable metric (like clicks on a specific element) that’s tied to a meaningful business objective.
  3. Create experiences: This involves selecting elements and creating variations in the MVT testing platform editor. Every variation results in more potential combinations of website elements that users could experience.
  4. Estimate the existing performance: This is the baseline conversion rate of the page you are testing. During the test, you will measure against the actual performance of the existing page, but this estimate is important for setting an appropriate test duration.
  5. Confirm statistical power and significance level: Most testing platforms default to 80% statistical power and 95% significance level. Only adjust these if you have a good reason.
  6. Set improvement threshold: This is the change you are looking to detect. If your goal is clicks on a specific element, for example, do you want to see a 10% lift to the conversion rate? A 20% lift? Smaller improvements need a larger sample size to detect with confidence compared to large improvements.
  7. Estimate test duration: This is automatically calculated based on your goals, improvement threshold, the total number of combinations, and the average number of monthly visitors that come to your site. The lower the traffic, the longer the test will have to run.
  8. Launch the test: Let it run and don’t peek at the results. Interfering with the test jeopardizes the validity, which means you can’t be certain any results are repeatable.

The testing platform will alert you when enough users have gone through each combination to achieve statistical significance. This means that any effect you detected with the test is not due to random chance.

You can safely stop the test at this point. 100% of your website traffic will return to your existing URL.

Now let’s take a closer look at some of the key concepts in the MVT process.

Statistical significance in multivariate testing

There is a lot of randomness on a website. Users could be on their phones, at a desk, talking to their children, making dinner, or fighting with a lousy internet connection.

Based on circumstances entirely outside of your control, the same user could have a completely different experience on your site from one day to the next.

Both AB tests and multivariate tests must run for a long enough duration and test a large enough sample size of users to achieve statistical significance. 

This is a key concept in hypothesis testing, and tells researchers that they have enough data to conclude that the outcome was not just due to random chance.

If you stop your test before it has reached statistical significance, you cannot be confident in the results. 

Even if you saw an amazing conversion lift. Doesn’t matter. The change you observed might just be due to random chance.

An easy way to think about statistical significance is to imagine a survey that 10 people have responded to. Would you take those results as seriously as a survey that has 1,000 responses?

Of course not. 

The larger sample size gives us more common sense confidence that the survey results reflect what we’d see again if we asked 1,000 more people.
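That intuition has a simple statistical backbone: the uncertainty in a measured rate shrinks with the square root of the sample size. A quick sketch:

```python
from math import sqrt

def margin_of_error(p: float, n: int) -> float:
    """95% margin of error for an observed proportion p from n responses."""
    return 1.96 * sqrt(p * (1 - p) / n)

# A 50% survey result from 10 people vs. 1,000 people:
print(f"n=10:    +/- {margin_of_error(0.5, 10):.1%}")    # +/- 31.0%
print(f"n=1000:  +/- {margin_of_error(0.5, 1000):.1%}")  # +/- 3.1%
```

With 10 responses, the true rate could plausibly be anywhere from roughly 19% to 81%; with 1,000, it's pinned down to within a few points.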

So how long do you have to let the test run? How many users need to go through the test?

Calculating MVT sample size and duration

The testing platform will do all of this for you with a few basic inputs about the page you are testing. These are:

  • Average monthly traffic: The number of visitors who land on the page in a typical month.
  • Existing conversion rate: The page’s current baseline conversion rate.
  • Improvement threshold: The minimum change in the conversion rate you want to detect.
  • Statistical power: Usually set to 80%, this is the likelihood your test will detect a real effect, if one exists.
  • Significance level: Usually set to 95%, this is how confident you want to be that your results aren’t due to chance.
  • Total number of experiences: The total combinations being tested, based on the number of variables and variations.

Once you plug all the information in, the platform will calculate how many weeks the test needs to run.
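If you are curious what the calculator is doing under the hood, most use some variant of the standard two-proportion sample size formula. Here is a simplified Python sketch (the inputs are hypothetical, and your platform's exact math may differ):

```python
from scipy.stats import norm

def sample_size_per_experience(baseline: float, relative_lift: float,
                               power: float = 0.80,
                               alpha: float = 0.05) -> int:
    """Approximate visitors needed per experience to detect a relative
    lift over the baseline rate (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 at a 95% significance level
    z_beta = norm.ppf(power)           # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# 3% baseline rate, 20% relative lift, 4 experiences,
# on a page that gets 60,000 visitors per month:
n = sample_size_per_experience(0.03, 0.20)
weeks = (n * 4) / (60_000 / 4)
print(f"{n:,} visitors per experience, roughly {weeks:.0f} weeks")
```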

Ideally it will say something close to 2 weeks. That’s a good amount of time to run an MVT test. A week can work, but there can be a lot of randomness week to week that your test won’t be able to pick up on.

If the duration is calculated out at like 10-20+ weeks, consider reducing the number of variations and testing higher-impact elements only. This will split your traffic among fewer total experiences, allowing more users to get through the test quicker.

Analyzing MVT Test Results

The key results for each combination are:

  • Conversion rate: Percentage of conversions to total traffic. Platforms may provide the rate as a fixed number or range.
  • Improvement: Percentage of change compared to the control. Positive percentage for improvement, negative percentage for decline.
  • Confidence level: Percentage expressing how likely it is that the changes you made are responsible for the outcomes you see.

A clear winner would have a large improvement with a high confidence level.

For MVT on websites, most people are shooting for 95% confidence level, which basically means there is only a 5% chance that the outcome was random.

So if you saw an improvement of 20% with a 96% confidence level, that would be fantastic.
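To make those numbers concrete, here is a sketch of how a platform might compute improvement and confidence for one combination from raw conversion counts, using a two-proportion z-test (one common approach; the figures are made up and platforms vary):

```python
from math import sqrt
from scipy.stats import norm

def analyze(control_conv: int, control_visits: int,
            variant_conv: int, variant_visits: int):
    """Improvement and confidence for one experience vs. the control,
    via a two-proportion z-test (one common approach; platforms vary)."""
    p1 = control_conv / control_visits
    p2 = variant_conv / variant_visits
    improvement = (p2 - p1) / p1
    # Pooled standard error under the no-difference hypothesis
    pooled = (control_conv + variant_conv) / (control_visits + variant_visits)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visits + 1 / variant_visits))
    confidence = 1 - 2 * (1 - norm.cdf(abs((p2 - p1) / se)))
    return improvement, confidence

# 300/10,000 control conversions vs. 360/10,000 for one combination:
imp, conf = analyze(300, 10_000, 360, 10_000)
print(f"Improvement: {imp:+.0%}, confidence: {conf:.0%}")  # +20%, 98%
```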

But it’s not always so cut and dried. Sometimes there is no clear winner, no meaningful improvement, or not enough confidence in the result.

In these cases, you can still sometimes find wins with particular traffic segments. Most testing tools allow you to look at traffic from different times of day, countries, referrers, and so on.

Where there isn’t a general trend that stands out, sometimes you can find a win with a particular segment of your target audience.

