When I first started as a product manager, I didn’t know what an A/B test was. I read about it briefly and thought, “Okay, all you have to do is compare two versions of the same thing. No big deal.” It wasn’t until I actually implemented my first A/B test that I realized how many nuances I hadn’t even considered. As I’ve learnt, and continue to learn, the concept of an A/B test is simple if you know where and when to use it.
When you are immersed in a culture where you must test everything before coming to a conclusion, it becomes a habit to test even the most trivial things. If you don’t have a hypothesis to test, or if you already know for certain that the hypothesis is wrong, don’t A/B test it. This is something I’ve battled with constantly.
Whenever our CTRs (click-through rates) are down, the first culprit pulled up is the user interface, because it is the easiest, most visible thing to fix. That ignores other possible sources of the problem, such as page load times, seasonality, and sources of traffic. So you jump in with your designer, revamp the page, and A/B test the old and new versions without an actual hypothesis. A wasted exercise.
An A/B test is only useful if its result is statistically significant, so you can only run these exercises if you have a reasonable number of users. If you are using an A/B testing framework like Optimizely, the math is done for you. However, when we ran A/B tests on our own platform, knowing when to start (the minimum number of users) and when to stop (how long to run the test) was much more complicated than we had realized. The final numbers were confusing and the result of the experiment was inconclusive. The lesson I learnt here was that unless you are confident about the statistical rigour of your model, you should rely on the experts in the industry.
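To give a feel for the “minimum number of users” question, here is a minimal sketch in Python. It uses the standard normal-approximation formula for comparing two conversion rates. It is not what Optimizely or our own platform computed; the function name, the 5% baseline rate, and the 80% power default are assumptions chosen purely for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant to detect a relative lift.

    Illustrative assumptions: a two-sided test at significance level
    `alpha`, the given statistical `power`, and the normal approximation
    for a difference between two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")

    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical example: 5% baseline CTR, and we care about a 10% relative lift.
users_needed = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.10)
print(users_needed)
```

With these assumed inputs the answer comes out in the tens of thousands of users per variant, which is why small improvements on low-traffic pages are so hard to test reliably.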
There was an excellent post by Kissmetrics that I’m going to quote from here, which lays out the steps required to run an A/B test correctly.
1. Decide the minimum improvement you care about. (Do you care if a variant results in an improvement of less than 10%?)
2. Determine how many samples you need in order to know within a tolerable percentage of certainty that the variant is better than the original by at least the amount you decided in step 1.
3. Start your test but DO NOT look at the results until you have the number of examples you determined you need in step 2.
4. Set a certainty of improvement that you want to use to determine if the variant is better (usually 95%).
5. After you have seen the observations decided in step 2, then put your results into a t-test (or other favorite significance test) and see if your confidence is greater than the threshold set in step 4.
6. If the results of step 5 indicate that your variant is better, go with it. Otherwise, keep the original.
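The final check in the steps above can be sketched in a few lines of Python. Note that this is not the t-test named in the quote: for click or conversion counts, a one-sided two-proportion z-test is a common choice of “other favorite significance test”, and that is what I assume here. The function name and the example counts are hypothetical, not results from a real experiment.

```python
import math
from statistics import NormalDist


def variant_beats_control(control_conversions, control_users,
                          variant_conversions, variant_users,
                          confidence=0.95):
    """One-sided two-proportion z-test (an assumed stand-in for the t-test).

    Returns (is_better, p_value). `is_better` is True only when the variant's
    conversion rate exceeds the control's at the chosen confidence level.
    """
    p_control = control_conversions / control_users
    p_variant = variant_conversions / variant_users

    # Pooled rate under the null hypothesis that both versions perform the same.
    pooled = (control_conversions + variant_conversions) / (control_users + variant_users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / variant_users))
    if std_err == 0:
        return False, 1.0  # no variation at all, so nothing to conclude

    z = (p_variant - p_control) / std_err
    p_value = 1 - NormalDist().cdf(z)  # probability of a lift this large by chance
    return p_value < (1 - confidence), p_value


# Hypothetical counts, evaluated only once the planned sample size is reached.
is_better, p_value = variant_beats_control(500, 10_000, 600, 10_000)
print("Ship the variant" if is_better else "Keep the original", round(p_value, 4))
```

The important part is the discipline around it rather than the arithmetic: the function should be called once, after the sample size planned up front has been collected, not repeatedly while the test is still running.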
Image credit: https://vwo.com/ab-testing/