- We’ve heard that A/B tests are awesome and all, but should we use them for everything?
- And if not, then when should we run an A/B test?
- We should treat A/B testing like any other decision-making tool: it has costs and benefits, and we should use it when the benefits outweigh the costs.
- Like any other tool, it can yield incorrect results (false positives or false negatives), and it carries real costs (implementation complexity, engineering time, and waiting for results).
- When deciding whether or not to run an A/B test, we should weigh those tradeoffs against alternative ways to make the decision.
Decision-making tools
Here’s a quick summary of a handful of decision-making tools and when to use each. At TaskRabbit, we generally like to use gut calls, qualitative research, and A/B tests to make most business and product decisions.
A couple of observations:
- As a startup, we’re OK with trading off decision-making confidence for speed (i.e., if an A/B test is looking particularly hairy to set up and run, then we’re perfectly fine with making the call based on gut or small-sample qualitative research).
- The folks who work at TaskRabbit use our own service a lot, so we generally develop pretty good intuition about where the problems and opportunities are in our space. If a gut call can instantaneously yield a ~80% confidence decision, then running an A/B test for a week to gain an incremental 10% of confidence is a pretty high price to pay.
* = our preferred decision-making methods
Flip a coin
- Pros:
  - Instantaneous decisions
- Cons:
  - 50/50 chance of a correct decision
  - Only handles binary choices; not so good with multivariate permutations

Gut call *
- Pros:
  - Fast decisions
  - You could be the next Steve Jobs if your gut calls are right 90% of the time
- Cons:
  - Difficult to justify decisions to a skeptical audience
  - Incorrect decisions made this way get attributed to you, personally

Team discussion
- Pros:
  - Team buy-in
  - Access and summarize an entire group's expertise
- Cons:
  - Can turn into a slow, political decision-making process
  - Groupthink can derail rational decision making

Qualitative research * (user interviews & usability tests)
- Pros:
  - Get to know users and build empathy with their concerns
  - Observing users is a great way to discover and surface unknown problems/opportunities
  - Very difficult to argue with video footage or direct quotes from real users
- Cons:
  - The research subjects you choose may not be representative of the entire userbase
  - Recruiting research subjects and running tests takes time and budget

Surveys
- Pros:
  - Good way to gauge the sentiment of a large sample of people
  - Medium-speed decisions; it's fairly easy/fast to recruit and drive traffic to a survey
- Cons:
  - What people say they would do in a situation often turns out to be different from what they'll actually do when faced with a real, working product or a real, live human being
  - A skeptical audience will argue with your survey design

A/B tests *
- Pros:
  - Statistical significance can be tuned to whatever confidence level you need (most teams use 90%+; see the significance sketch after this table)
  - Collects very reliable data on real users interacting with a live product
  - Very difficult to argue with statistically significant results
- Cons:
  - Takes time, code, and sometimes marketing coordination to set up an A/B test
  - Once a test is running, it takes a while (a week or more) to reach a large enough sample size (see the sample-size sketch after this table)
  - Can only run a limited number of A/B tests at one time
  - It's a bit like panning for gold: many/most tests will be inconclusive, i.e., they will not yield statistically significant differences
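To make the confidence-level point concrete, here's a minimal sketch of how you'd check whether an A/B result clears a chosen confidence level using a standard two-proportion z-test. This is just an illustration with hypothetical conversion numbers, not our actual tooling:

```python
from statistics import NormalDist

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis (no real difference)
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Hypothetical numbers: 200/4,000 conversions for A vs. 250/4,000 for B
z, p = two_proportion_z_test(200, 4000, 250, 4000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 90% confidence" if p < 0.10 else "not significant yet")
```

"Tuning" the confidence level just means moving that final threshold: p < 0.10 for 90% confidence, p < 0.05 for 95%, and so on. The higher the bar, the longer you'll wait.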
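And to see why reaching a large enough sample size often takes a week or more, here's a similar back-of-the-envelope sketch using the standard two-proportion sample-size formula, again with hypothetical baseline and traffic numbers:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(base_rate, relative_lift, alpha=0.10, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift
    in a conversion rate (two-sided test, normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645 for 90% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.842 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical: 5% baseline conversion, trying to detect a 10% relative lift
n = sample_size_per_variant(0.05, 0.10)
print(f"~{n:,} visitors per variant")
print(f"at 2,000 visitors/day per variant, that's ~{n / 2000:.0f} days")
```

With those (made-up) numbers, detecting a 10% relative lift on a 5% baseline takes roughly 25,000 visitors per variant: about two weeks at 2,000 visitors per variant per day. That's the waiting cost we keep weighing against a gut call.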
Hope that helps! Please feel free to comment…