If an experiment or study is designed to determine whether some factors are related to other factors of interest, you are testing the correlation between them. For example, you may have noticed that men tend to prefer diet cola while women tend to prefer mineral water. Establishing this type of correlation gives you a predictive relationship that you can apply to future behavior.
The concept of correlation is attributed to Sir Francis Galton, a half-cousin of Charles Darwin, who introduced it in the 1880s. Galton spent many years studying patterns related to physical traits in humans and what behaviors could be predicted by them. By developing the concept of regression, he was able to apply a statistical measure to his observations and quantify the strength of the relationships between them.
Correlation is important to consider in website testing for two primary reasons:
- When you run an A/B test, you want to know if the variable being changed is actually influencing conversions. If you detect a strong correlation between one of the versions, A or B, and higher conversions, you may want to pursue that option.
- Valid correlation studies require isolating the variable of interest from all other lurking variables so that your results are not tainted. This is why the distinction between correlation and causation is important.
Two variables might be correlated without having any type of cause-and-effect relationship.
Causation means that one factor actually caused the other to happen. For example, changing the order button on your website from green to red caused conversions to improve. However, if the new order button was tested in a different month, or on different users (desktop rather than mobile, for example), then you may have detected correlation without the all-important causation.
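The order button scenario can be sketched with a few numbers. The counts below are entirely hypothetical and are chosen so that the red button was shown mostly to desktop users, who convert more often regardless of button color. The function names are illustrative only.

```python
# Hypothetical (visitors, conversions) per button color and device type.
# Red was shown mostly on desktop, green mostly on mobile.
data = {
    ("green", "desktop"): (200, 20),
    ("green", "mobile"): (800, 40),
    ("red", "desktop"): (800, 80),
    ("red", "mobile"): (200, 10),
}


def pooled_rate(color):
    """Conversion rate for a color, ignoring device type."""
    visitors = sum(v for (c, _), (v, _) in data.items() if c == color)
    conversions = sum(k for (c, _), (_, k) in data.items() if c == color)
    return conversions / visitors


def segment_rate(color, device):
    """Conversion rate for a color within a single device type."""
    visitors, conversions = data[(color, device)]
    return conversions / visitors


for color in ("green", "red"):
    print(color, "pooled:", pooled_rate(color))
    for device in ("desktop", "mobile"):
        print(" ", device, segment_rate(color, device))
```

In this made-up data the pooled rate for red is higher than for green, yet within each device type the two colors convert at the same rate. The apparent effect of the button color comes entirely from the device mix, which is the lurking variable here.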
Correlation studies can be used to determine quantitative (numerical) correlations, such as correlating height to weight. Website testing, such as A/B testing, typically looks at categorical correlation, in which a large enough difference between the results seen in one category and the other is considered significant.
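One common way to judge whether the difference between two categories is "large enough" is a two-proportion z-test. The sketch below is one possible approach, not the only one; the function name, the visitor counts, and the conventional 0.05 threshold are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B convert equally well.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: version A converts 100 of 2000 visitors,
# version B converts 140 of 2000 visitors.
z, p_value = two_proportion_z_test(100, 2000, 140, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A small p-value suggests the difference is unlikely to be due to chance alone, but it says nothing about lurking variables; the test is only as trustworthy as the way visitors were assigned to each version.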
In a quantitative correlation, the strength of the relationship between the two factors can be defined by a correlation coefficient, which ranges from -1.0 to 1.0. A coefficient close to 1.0 (or to -1.0, for a relationship where one factor falls as the other rises) means the two factors have a strong linear relationship, and plotting one against the other results in a nearly straight line. A coefficient near 0 means there is little or no linear relationship. You can then use what you observed about the two factors to create an equation, which can predict the expected value of one factor given the actual value of the other.
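As a minimal sketch of both ideas, the code below computes the Pearson correlation coefficient and a least-squares line for a handful of height and weight values. The data points are invented for illustration, and the function names are not taken from any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sample: heights in cm, weights in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 67, 74, 83]

r = pearson_r(heights, weights)
slope, intercept = fit_line(heights, weights)
print(f"r = {r:.3f}")
print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 175 cm: {slope * 175 + intercept:.1f} kg")
```

Note that `fit_line(heights, weights)` predicts weight from height. Swapping the arguments produces a different line, since the fit minimizes errors in whichever variable is being predicted.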
Both quantitative and categorical correlation studies require one variable to be defined as the response variable and the other as the predictor variable. In the height and weight example, height would be considered the predictor variable, since it might be used to approximate weight. In the A/B test (categorical) example, the website version would be the predictor, and the number of conversions would be the response. If there is a strong correlation, knowing the predictor value might tell you something about the response, but not necessarily vice versa.