What do you do when you cannot run a randomized A/B test?
When randomization is not feasible because of ethical, operational, or technical constraints, quasi-experimental designs such as difference-in-differences, regression discontinuity, synthetic control, and instrumental variables can recover causal estimates. Each rests on strong identifying assumptions that can be probed with data but rarely proven.
How to think about it
When randomization fails
Common blockers: a pricing change that must apply uniformly across a market (you cannot give different users different prices), a regulatory constraint (e.g., in health or finance), an infrastructure-wide feature that cannot be selectively rolled out, or a geo-level product launch where you only have a handful of markets.
Difference-in-differences (DiD)
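Compare the treated group's before-after change with the same before-after change in a comparable untreated group; the difference between the two changes is the estimate. The key assumption is parallel trends: absent the treatment, the treated and control groups would have followed the same time trend. This cannot be verified directly, because the treated group's untreated post-period path is never observed, but it can be probed by checking whether pre-period trends are parallel. If they diverge before the treatment, the DiD estimate is likely biased.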
Used extensively for policy changes, price experiments, and geo-launches (e.g., “we launched same-day delivery in city A but not city B — what was the revenue effect?”).
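A minimal sketch of the 2x2 estimator, assuming NumPy and simulated data (the function name `did_estimate` and every number below are hypothetical, not from a real launch). It computes the estimate from the four group means, then checks that the interaction coefficient in the regression y ~ 1 + treated + post + treated*post gives the same number.

```python
import numpy as np

def did_estimate(y, treated, post):
    """2x2 difference-in-differences from the four group means.

    y, treated, post: 1-D arrays of equal length; treated and post are 0/1 flags.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(t, p):
        return y[(treated == t) & (post == p)].mean()

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change

# Hypothetical simulated data: city A (treated=1) vs city B (treated=0).
rng = np.random.default_rng(0)
n = 4000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
true_effect = 5.0
# The group offset (10) and the common time trend (3) are removed by the
# double difference; only treated-and-post observations get the effect.
y = 100 + 10 * treated + 3 * post + true_effect * treated * post + rng.normal(0, 2, n)

print(did_estimate(y, treated, post))

# Same estimate as the interaction coefficient of a saturated OLS regression.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[3])
```

The regression form is the one you would extend in practice (covariates, more periods, clustered standard errors); the simulation bakes in parallel trends, which is exactly the assumption real data cannot guarantee.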
Regression discontinuity (RD)
Exploit a sharp threshold in an assignment rule. Users just above and just below an eligibility cutoff (age 18, credit score 700, account tenure 90 days) can be treated as if randomly assigned, provided they cannot precisely manipulate which side of the cutoff they land on. The causal effect is the jump in the outcome at the discontinuity. Estimates are very credible within a narrow bandwidth around the cutoff, but they do not extrapolate to units far from it.
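A minimal sketch of a sharp RD estimate, assuming NumPy, a uniform kernel, and a hand-picked bandwidth (the function `rd_local_linear` and the simulated credit-score data are hypothetical). It fits a separate straight line on each side of the cutoff within the bandwidth and takes the difference of the two fitted values at the cutoff.

```python
import numpy as np

def rd_local_linear(x, y, cutoff, bandwidth):
    """Sharp RD: difference of local linear fits evaluated at the cutoff.

    Uniform kernel: only observations with |x - cutoff| <= bandwidth are used.
    Assumes units with x >= cutoff are treated.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = x - cutoff
    in_window = np.abs(centered) <= bandwidth

    def value_at_cutoff(mask):
        X = np.column_stack([np.ones(mask.sum()), centered[mask]])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return beta[0]  # intercept = fitted value where centered == 0

    right = in_window & (centered >= 0)
    left = in_window & (centered < 0)
    return value_at_cutoff(right) - value_at_cutoff(left)

# Hypothetical data: eligibility at credit score 700 adds 2.0 to the outcome.
rng = np.random.default_rng(1)
score = rng.uniform(600, 800, 5000)
outcome = 0.05 * score + 2.0 * (score >= 700) + rng.normal(0, 1, score.size)

est = rd_local_linear(score, outcome, cutoff=700, bandwidth=25)
print(est)
```

Real analyses usually replace the hand-picked bandwidth and uniform kernel with a data-driven bandwidth, a triangular kernel, and bias-corrected inference (for example, via the rdrobust package); narrowing the bandwidth trades bias for variance.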
Synthetic control
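For a single treated unit (one country, one product line, one city), construct a weighted combination of untreated units that matches the treated unit’s pre-treatment trajectory as closely as possible; the post-treatment gap between the treated unit and this synthetic control is the effect estimate. Abadie et al. used it to evaluate California’s tobacco control program (Proposition 99), and it is widely used in tech for country-level launches. It needs a long pre-treatment period, so that a close fit reflects shared dynamics rather than overfitting, and a sizable pool of untreated donor units that could plausibly reproduce the treated unit’s trajectory.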
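A simplified sketch of the weighting step, assuming NumPy and simulated data (the helpers `project_to_simplex` and `synthetic_control_weights` are hypothetical names). It finds nonnegative donor weights that sum to 1 and minimize squared pre-period error, using projected gradient descent. It deliberately omits the covariate matching and predictor-weighting matrix of the original method, so treat it as an illustration of the idea, not a replacement for a dedicated implementation.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    cond = u - (css - 1.0) / k > 0
    rho = k[cond][-1]
    theta = (css[cond][-1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(y_treated_pre, Y_donors_pre, n_iter=20_000):
    """Simplex-constrained least squares via projected gradient descent.

    y_treated_pre: shape (T_pre,), treated unit's pre-period outcomes.
    Y_donors_pre:  shape (T_pre, J), one column per untreated donor unit.
    """
    J = Y_donors_pre.shape[1]
    w = np.full(J, 1.0 / J)  # start from equal weights
    # Step size 1/L, where L = 2 * (largest singular value)^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(Y_donors_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * Y_donors_pre.T @ (Y_donors_pre @ w - y_treated_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Hypothetical data: 20 pre-periods, 8 post-periods, 5 donor regions.
rng = np.random.default_rng(3)
T_pre, T_post, J = 20, 8, 5
Y_donors = 2.0 + rng.normal(0, 1, (T_pre + T_post, J))
true_w = np.array([0.6, 0.0, 0.4, 0.0, 0.0])
y_treated = Y_donors @ true_w
y_treated[T_pre:] += 1.5  # simulated post-launch effect

w = synthetic_control_weights(y_treated[:T_pre], Y_donors[:T_pre])
synthetic = Y_donors @ w
effect_by_period = y_treated[T_pre:] - synthetic[T_pre:]
print(np.round(w, 3), effect_by_period.mean())
```

The simplex constraint keeps the synthetic unit inside the range of the donors (no extrapolation through negative or oversized weights), which is one reason the method needs a donor pool that can actually bracket the treated unit.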
Instrumental variables (IV)
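When treatment is not randomly assigned but an instrument exists, IV recovers a local average treatment effect (LATE). A valid instrument shifts treatment uptake, is as good as randomly assigned, and affects the outcome only through the treatment (the exclusion restriction). The LATE is the effect among compliers, the users whose uptake the instrument actually changes, and identifying it also requires monotonicity: the instrument must not push anyone away from the treatment. Classic example: using a randomized email nudge as an instrument for actual feature adoption. Finding valid instruments in product settings is genuinely hard, and weak instruments (ones that barely move uptake) produce biased, unreliable estimates.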
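A minimal sketch of the email-nudge example using the Wald estimator (reduced form divided by first stage), assuming NumPy, a binary instrument, and simulated data in which an unobserved engagement score drives both adoption and the outcome. The function `wald_iv` and all parameters are hypothetical. Because the simulated effect is the same for everyone, the LATE here equals 3.0, while the naive adopter-vs-non-adopter comparison is inflated by the confounder.

```python
import numpy as np

def wald_iv(y, d, z):
    """Wald IV estimate with a binary instrument z.

    Returns (estimate, first_stage); a small first stage signals a weak instrument.
    """
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    z1, z0 = z == 1, z == 0
    reduced_form = y[z1].mean() - y[z0].mean()  # effect of the nudge on the outcome
    first_stage = d[z1].mean() - d[z0].mean()   # effect of the nudge on adoption
    return reduced_form / first_stage, first_stage

# Hypothetical simulation.
rng = np.random.default_rng(2)
n = 100_000
z = rng.integers(0, 2, n)  # randomized email nudge
u = rng.normal(0, 1, n)    # unobserved engagement (confounder)
d = (0.8 * z + u + rng.normal(0, 1, n) > 1).astype(int)  # adoption; nudge only raises it
y = 3.0 * d + 2.0 * u + rng.normal(0, 1, n)              # outcome; true effect 3.0

naive = y[d == 1].mean() - y[d == 0].mean()  # biased upward by u
late, first_stage = wald_iv(y, d, z)
print(naive, late, first_stage)
```

With a single binary instrument and no covariates, this ratio matches what two-stage least squares would produce; the returned first stage is the number to inspect before trusting the estimate.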
In practice, when you cannot randomize, the honest answer involves three things: stating the design clearly, articulating the identifying assumption, and showing the best available test of that assumption (e.g., an event study plot for parallel trends, a density test at the RD cutoff).
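As a rough illustration of what an event study plot for DiD displays, here is a descriptive sketch assuming NumPy and per-period group means (the function `event_study_gaps` and the weekly numbers are hypothetical). It computes the treated-minus-control gap in each period, normalized to the last pre-treatment period; pre-period values near zero are consistent with parallel trends. A full event study would instead estimate leads and lags in a regression, with standard errors.

```python
import numpy as np

def event_study_gaps(treated_means, control_means, treatment_index):
    """Treated-minus-control gap per period, normalized to the last pre-period.

    treatment_index: array position of the first treated period.
    Returns (event_time, normalized_gap); event_time 0 is the first treated period.
    """
    treated_means = np.asarray(treated_means, dtype=float)
    control_means = np.asarray(control_means, dtype=float)
    gap = treated_means - control_means
    normalized_gap = gap - gap[treatment_index - 1]
    event_time = np.arange(gap.size) - treatment_index
    return event_time, normalized_gap

# Hypothetical weekly means: four pre-periods, then launch at position 4.
treated_weekly = [10.0, 10.5, 11.0, 11.5, 14.0, 14.5]
control_weekly = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
event_time, normalized_gap = event_study_gaps(treated_weekly, control_weekly, 4)
# event_time:     -4, -3, -2, -1, 0, 1
# normalized_gap:  0,  0,  0,  0, 2, 2  (flat pre-period, jump after launch)
print(event_time, normalized_gap)
```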