Introduction
Multivariate (A/B/n) testing is a crucial part of the Fredhopper Insights module. Users can run tests with multiple Fredhopper configurations on their website to determine which variant has the best conversion rate. However, looking at the conversion rates alone is not enough to determine whether a variant really is the 'best' one. To choose a winner, it is also necessary to calculate whether the improvement is significant enough to warrant a change. This is done through the Statistical Significance service, which is an extension of the A/B testing functionality.
What is Confidence / Significance?
Confidence Level: The percentage of time that a statistical result would be correct based on many random samples.
Confidence is often associated with how sure of a result you can be. In statistical terms, it is the percentage of the time that you expect your findings to hold. For example, 95% confidence that variant A has a higher conversion rate than variant B means that, based on current information, this should be true 95% of the time.
N.B. 50% confidence describes a situation where variants A and B are equally likely to be the winning variant (assuming they cannot be equal).
In short, after the tests have completed, the service will pick a winning variant (if there is one) and calculate the confidence for it. This will be displayed on the A/B test. When the tests are run, one variant is considered the control variant, and all other variants are measured against the control.
Types of A/B tests
Depending on the number of variants there are a few possible outcomes:
- Basic A/B test - a control variant, Case A, versus a single alternative variant, Case B. In this case there are three possible outcomes:
  - The alternative variant, Case B, is the winner. In this situation, Case B has the better conversion rate, and the difference in the conversion rate passes the minimum level of confidence to label it as the winner.

    | Variant | Visitors | Number of Conversions | Conversion Rate |
    |---------|----------|-----------------------|-----------------|
    | A       | 1000     | 90                    | 9%              |
    | B       | 1000     | 120                   | 12%             |

    Test Result: Variant B converted 34% better than Variant A. This test is statistically significant with a confidence level of 99%.
  - The control variant, Case A, is the winner. In this case, Case A (the control) has the better conversion rate, and the difference in the conversion rate passes the minimum level of confidence to label it as the winner.

    | Variant | Visitors | Number of Conversions | Conversion Rate |
    |---------|----------|-----------------------|-----------------|
    | A       | 1000     | 120                   | 12%             |
    | B       | 1000     | 90                    | 9%              |

    Test Result: Variant A converted 34% better than Variant B. This test is statistically significant with a confidence level of 99%.
  - There is no winning variant. In this case, regardless of which variant has the better conversion rate, the difference in the conversion rate doesn't pass the minimum level of confidence to label it as a winner.

    | Variant | Visitors | Number of Conversions | Conversion Rate |
    |---------|----------|-----------------------|-----------------|
    | A       | 1000     | 90                    | 9%              |
    | B       | 1000     | 90                    | 9%              |

    Test Result: Variant A converted 0% better than Variant B. This test is not statistically significant with a confidence level of only 50%.
- A/B/n test - a control variant, Case A, versus two or more alternative variants, Case B, Case C, ... , Case N. In this case there are three possible outcomes:
  - An alternative variant, Case C, is the winner. In this case, at least one of the alternative variants, for example Case C, has a better conversion rate than Case A (the control), and the difference in the conversion rate meets the minimum level of confidence to label it as a winner. This does not imply that the difference between Case C and the other alternative variants is statistically significant, only that the picked variant has the best improvement compared to Case A. In the table below, both Case B and Case C have roughly a 12% conversion rate, with Case C slightly higher, but Case C is still selected as the winner with high statistical significance because the comparison is with the control, Case A, and not between the alternatives. To decide whether this variant is also better than the rest of the variants, the test needs to be run again with the variant in question set as the control.

    | Variant | Visitors | Number of Conversions | Conversion Rate |
    |---------|----------|-----------------------|-----------------|
    | A       | 1000     | 90                    | 9%              |
    | B       | 1000     | 120                   | 12%             |
    | C       | 1000     | 120                   | 12%             |

    Test Result: Variant C converted 34% better than Variant A. This test is statistically significant with a confidence level of 99%.
  - The control variant, Case A, is the winner. In this case, the control variant, Case A, has the best conversion rate out of all the variants and, more importantly, the difference in the conversion rate between Case A and the second-best variant passes the minimum level of confidence to label it as a winner.
  - There is no winning variant. In this case, regardless of which variant has the best conversion rate, the difference in the conversion rate between the control, Case A, and the best alternative variant doesn't pass the minimum level of confidence to label any variant as a winner.
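The A/B/n outcomes described above can be sketched as a small selection routine. This is only an illustration of the rules as stated in this article, not the service's actual code: `pickWinner`, `conversionRate`, the `confidenceOf` callback, and the 0.9 default for the minimum confidence are all hypothetical names and values, and the visitor counts for variant C are made up so that C is strictly the best alternative.

```javascript
// Sketch only: every name here, and the 0.9 default, is an assumption.
function conversionRate(variant) {
  return variant.conversions / variant.visitors;
}

// confidenceOf(better, worse) is a caller-supplied function that returns
// the confidence (0.5 to 1) that `better` really outperforms `worse`.
// `alternatives` must contain at least one variant.
function pickWinner(control, alternatives, confidenceOf, minConfidence = 0.9) {
  // The alternative with the highest conversion rate.
  const bestAlt = alternatives.reduce((best, v) =>
    conversionRate(v) > conversionRate(best) ? v : best
  );
  const controlRate = conversionRate(control);
  const altRate = conversionRate(bestAlt);

  // Identical rates: neither variant is more likely to be the winner.
  if (controlRate === altRate) {
    return { winner: null, confidence: 0.5 };
  }

  // The control is always one side of the comparison.
  const [better, worse] =
    altRate > controlRate ? [bestAlt, control] : [control, bestAlt];
  const confidence = confidenceOf(better, worse);
  return {
    winner: confidence >= minConfidence ? better.name : null,
    confidence,
  };
}

// Usage with a stand-in confidence function that returns the 99% figure
// quoted in the table above; a real caller would compute it statistically.
const quotedConfidence = () => 0.99;
const outcome = pickWinner(
  { name: "A", visitors: 1000, conversions: 90 },
  [
    { name: "B", visitors: 1000, conversions: 120 },
    { name: "C", visitors: 1000, conversions: 121 }, // hypothetical count
  ],
  quotedConfidence
);
console.log(outcome); // { winner: 'C', confidence: 0.99 }
```

Note that the alternatives are never compared with each other here, which mirrors the point made above: a winning alternative is only shown to beat the control.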
Confidence level and what it means:
When calculating the confidence level, you get a number between 50 and 100 percent. If variant A is better than variant B by 10% with a confidence level of 95%, this means that in roughly 95 out of 100 random samples of data, variant A would be expected to outperform variant B.
Whether the confidence level is high enough to warrant adopting the proposed change is left to the discretion of the user. However, basic statistical guidelines suggest that for a result to be considered statistically significant, the confidence level needs to be higher than 95% or, if you are being more liberal, at least higher than 90%.
To help the user make a more informed decision, the service shows not only the percentage but also one of the following labels:
- Pending: If the confidence level is below 90%
  - The confidence level is low. It is not recommended to pick the winning variant based on the test.
- Confident: If the confidence level is 90-95%
  - The confidence level is moderately high. You can pick the winning variant based on the test, but it may not be the best choice.
- Significant: If the confidence level is above 95%
  - The confidence level is high. It is recommended that you pick the winning variant based on the test.
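The three labels above amount to a simple threshold check. The sketch below assumes a hypothetical function name, `confidenceLabel`, and assumes that exactly 90% and exactly 95% both fall in the "Confident" band, since the article gives that band as 90-95% and the other two as strictly below and strictly above.

```javascript
// Sketch only: the function name and boundary handling are assumptions.
// `confidencePercent` is expected to be between 50 and 100.
function confidenceLabel(confidencePercent) {
  if (confidencePercent > 95) return "Significant"; // above 95%
  if (confidencePercent >= 90) return "Confident";  // 90-95%
  return "Pending";                                 // below 90%
}

console.log(confidenceLabel(99)); // Significant
console.log(confidenceLabel(92)); // Confident
console.log(confidenceLabel(50)); // Pending
```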
Statistical formula used
The confidence level of the A/B test is calculated using a Z test of proportions. The library used is jStat, a popular JavaScript statistical library. The test checks whether the difference in the observed conversion proportions (rates) of the variants is statistically significant, based on the proportions themselves and the total number of entries.
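As a rough illustration of a Z test of proportions, the sketch below computes a confidence figure for two variants in plain JavaScript. It is not the service's implementation. Several choices are assumptions: the function names (`zTestConfidence`, `normalCdf`, `erf`), the use of a pooled standard error, and the one-sided reading of confidence as the normal CDF of |z|, which was chosen because it yields values between 50% and 100% as described in this article. The normal CDF is approximated by hand so the example runs without dependencies; jStat's own `jStat.normal.cdf` could be used in its place.

```javascript
// Sketch only: names, pooling, and the one-sided reading are assumptions.

// Error function for x >= 0 (Abramowitz-Stegun approximation 7.1.26).
function erf(x) {
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  return 1 - poly * Math.exp(-x * x);
}

// Standard normal cumulative distribution function for z >= 0.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-proportion Z test with a pooled standard error.
function zTestConfidence(convA, visitorsA, convB, visitorsB) {
  const pA = convA / visitorsA;
  const pB = convB / visitorsB;
  const pooled = (convA + convB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  // No variation at all (e.g. zero conversions everywhere): no evidence either way.
  if (se === 0) return { z: 0, confidence: 0.5 };
  const z = (pB - pA) / se;
  return { z, confidence: normalCdf(Math.abs(z)) };
}

// The 90/1000 versus 120/1000 example from the tables above.
const example = zTestConfidence(90, 1000, 120, 1000);
console.log(example.z.toFixed(2), (example.confidence * 100).toFixed(1) + "%");
// prints: 2.19 98.6%
```

Under these assumptions the example lands at about 98.6%, which rounds to the 99% shown in the tables. Two variants with identical rates give a z of 0 and a confidence of about 50%.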