When A/B Testing Doesn't Work

Oct 27, 2023

In technical products, there’s a tendency to lean towards A/B tests: running different changes simultaneously across slices of your user base and measuring the outcome.

A/B tests can be extremely useful in some cases — if you’re at Google or Meta scale or if you’re doing something like performance marketing. But in the vast majority of cases, it’s more pain than it’s worth — and might even be detrimental.

  1. You don’t have enough data. Most products don’t have enough users to generate statistically significant results (see the rough sample-size sketch after this list). The more you extrapolate from small sample sizes, the more you risk drawing incorrect conclusions.
  2. A/B tests mean incremental changes. Incremental changes often lead to incremental results. Google testing an algorithm change or UI improvement is unlikely to change the business by more than a few basis points (and that would be a very successful experiment). For most startups and businesses, you need much bigger shifts and effects.
  3. Twice the work. A/B testing is resource-intensive. You have to build both features. You have to build them so they can be feature-gated. You have to build the infrastructure to randomly assign users and measure the change in each population (a sketch of just the assignment piece follows the list). You have to avoid confusing your users. And you need expert data analysts to interpret the results.
  4. Not sure what to measure. While hyper-focused organizations like Meta had a clear North Star (for many years, growth), most experimenters don’t know exactly what they are trying to optimize for. And many organizations don’t fully grasp the more qualitative consequences of a change.
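
To make the first point concrete, here is a rough back-of-the-envelope sketch using the standard normal-approximation formula for comparing two conversion rates. The 5% baseline, the 1-point lift, and the alpha and power thresholds are illustrative assumptions, not numbers from any particular product.

```python
# Rough sketch: how many users per arm a two-sided test of two
# proportions needs. All inputs below are hypothetical.
import math
from scipy.stats import norm

def users_per_arm(p_control: float, p_variant: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in each arm to detect a shift from p_control to p_variant."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # value needed for the desired power
    p_bar = (p_control + p_variant) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_control * (1 - p_control)
                                   + p_variant * (1 - p_variant)))
    return math.ceil(spread ** 2 / (p_control - p_variant) ** 2)

# Detecting a lift from 5% to 6% conversion needs roughly 8,000 users per
# arm. Many products never see that much traffic in a reasonable window.
print(users_per_arm(0.05, 0.06))
```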

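And to give a sense of the infrastructure in point 3, here is a minimal sketch of just the assignment piece: deterministically bucketing users so each one always sees the same variant. The experiment name and 50/50 split are placeholder assumptions, and a real setup still needs exposure logging, overrides, and analysis tooling on top.

```python
# Minimal sketch of deterministic 50/50 assignment; the experiment name
# is a placeholder. Hashing keeps a given user in the same arm forever.
import hashlib

def assign_variant(user_id: str, experiment: str = "new-onboarding") -> str:
    """Map a user id to a stable treatment/control bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100      # stable value in [0, 100)
    return "treatment" if bucket < 50 else "control"

print(assign_variant("user-42"))  # the same user id always lands in the same arm
```
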
Startups especially have to be opinionated. They can’t do everything (it’s hard enough to do one thing), and they don’t have the data or users to run tests.