What is it?
In A/B testing, we use statistical significance to determine whether the results we observe reflect a real performance difference or are simply due to chance. We display this as a confidence percentage. The higher the percentage, the more confident we are that the selected winning test case is meaningful. A test should reach at least 90% confidence before we draw conclusions from its results.
How to use it?
In Insights A/B Testing, we show you the confidence percentage for your primary KPI, which is revenue by default. We also report on the following secondary KPIs:
- View Conversions
- Basket Conversions
- Purchase Conversions
If the A/B test has a winner for the primary KPI, an image of a trophy will appear next to the name of the specific test case.
How is it calculated?
Statistical significance is now calculated using the Kruskal-Wallis test. This test determines whether there are statistically significant differences between two or more groups. Here, the groups are the test cases (variants) of the A/B test.
The tracking data collected is passed to the Kruskal-Wallis test, which returns a number called a p-value. The lower the p-value, the stronger the evidence that the test cases really differ. In our statistical significance model, we treat any p-value above 0.10 as not significant enough to draw conclusions about the A/B test.
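As a rough illustration of the step above, here is a minimal sketch of a Kruskal-Wallis calculation in plain Python. It is not the code Insights runs: the function names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) and the sample numbers are made up for this example, and the p-value relies on the standard chi-square approximation. In practice, a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    while a < df / 2.0:                          # step df up by 2 each pass
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H statistic, p-value) for a list of groups of numbers."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: shrink the denominator when values repeat.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:          # every value identical: no evidence at all
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical per-visitor values for two test cases.
h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6]])
print(f"H = {h:.3f}, p = {p:.4f}")   # H = 3.857, p just under 0.05
print("significant" if p <= 0.10 else "not significant")
```

With the 0.10 threshold described above, this made-up example would count as significant and move on to the next step.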
On the other hand, if the p-value is low enough (0.10 or below), we choose the test case that had the best performance (for example, the most views, purchases, or add-to-baskets on average) and perform another test, called the post-hoc Dunn’s test, to check that it is statistically significantly different from all the other test cases. Again, we use a p-value of 0.10 as the threshold. If the Dunn’s test passes, we deem this test case the winner and display it on the dashboard with a percentage showing how confident we are in the result. Note that the confidence is the complement of the p-value (1 minus the p-value), so if the p-value was 0.07, we are 0.93 or 93% confident that the results we observed are statistically significant.
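The winner selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Insights implementation: the names `dunn_p_value` and `pick_winner` are hypothetical, the Dunn p-values are left unadjusted for multiple comparisons, and when there are several other test cases the sketch reports confidence from the largest (weakest) pairwise p-value. It also assumes the Kruskal-Wallis check has already passed.

```python
import math


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def dunn_p_value(groups, a, b):
    """Unadjusted two-sided Dunn p-value for groups[a] versus groups[b]."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    # Tie correction term from Dunn's formula.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    variance = (n * (n + 1) / 12.0 - ties / (12.0 * (n - 1))) * (
        1.0 / len(groups[a]) + 1.0 / len(groups[b])
    )
    if variance <= 0:            # all values identical: nothing to compare
        return 1.0
    z = (mean_ranks[a] - mean_ranks[b]) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail


def pick_winner(groups, alpha=0.10):
    """Return (winner_index, confidence), or (None, None) if no winner."""
    means = [sum(g) / len(g) for g in groups]
    best = max(range(len(groups)), key=lambda i: means[i])
    p_values = [dunn_p_value(groups, best, j)
                for j in range(len(groups)) if j != best]
    weakest = max(p_values)      # assumption: report the weakest comparison
    if weakest > alpha:
        return None, None
    return best, 1.0 - weakest   # confidence = 1 minus the p-value


# Hypothetical per-visitor values for two test cases.
winner, confidence = pick_winner([[1, 2, 3], [4, 5, 6]])
if winner is None:
    print("No winner yet")
else:
    print(f"Test case {winner} wins with {confidence:.0%} confidence")
```

Taking the largest pairwise p-value is a deliberately conservative choice for the sketch: the best test case is only called the winner if it clears the 0.10 threshold against every other test case.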
This seems different to how it was calculated before?
The results may differ noticeably from what you saw before. That is because we now use a more accurate and more suitable statistical model than the previous one.