What is CUPED and how does it reduce variance in A/B tests?
CUPED (Controlled-experiment Using Pre-Experiment Data) removes the part of the outcome metric's variance that is explained by a pre-experiment covariate, typically the same metric measured before the experiment. The remaining variance is smaller, so the same test has more power: you can reach a given power with fewer users, or detect smaller effects with the same sample.
How to think about it
The core idea
A user’s revenue during the experiment is usually strongly correlated with their revenue before it. That pre-experiment signal is available before the experiment starts and cannot be affected by treatment assignment, because it lies in the past. If you regress this covariate out of the in-experiment metric, the residual variance is substantially smaller. The fraction of variance removed equals the squared correlation between covariate and outcome, and reductions of 30–70 % are often seen in practice.
The CUPED estimator replaces the raw outcome Y with an adjusted outcome:
Y_cuped = Y - theta * (X - E[X])
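Here X is the pre-experiment covariate, theta is the OLS slope of Y on X (theta = Cov(Y, X) / Var(X)), and E[X] is the grand mean of the covariate. Both theta and E[X] are estimated on the pooled data from all arms, so every arm receives the same adjustment. Because X is independent of treatment assignment, the term theta * (X - E[X]) has the same expectation in every arm and cancels in the difference in means: the adjustment does not bias the treatment effect estimate, it only reduces variance. With this choice of theta, Var(Y_cuped) = Var(Y) * (1 - rho^2), where rho is the correlation between Y and X.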
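The adjustment above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a production implementation: the function name cuped_adjust, the data-generating process, and the effect size of 0.5 are all assumptions made for the example.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return (Y - theta * (X - mean(X)), theta).

    theta and mean(X) are computed on the pooled data (all arms together),
    so control and treatment receive the same adjustment.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean()), theta


# Synthetic data (illustrative assumption, not real experiment data).
rng = np.random.default_rng(0)
n = 20_000
x = rng.gamma(shape=2.0, scale=5.0, size=n)         # pre-experiment metric
treated = rng.integers(0, 2, size=n).astype(bool)   # random assignment
y = 0.8 * x + rng.normal(0.0, 4.0, size=n) + 0.5 * treated

y_cuped, theta = cuped_adjust(y, x)

# Difference in means before and after the adjustment.
raw_effect = y[treated].mean() - y[~treated].mean()
cuped_effect = y_cuped[treated].mean() - y_cuped[~treated].mean()

# Share of variance removed; this matches the squared correlation rho**2.
var_reduction = 1.0 - y_cuped.var(ddof=1) / y.var(ddof=1)
rho = np.corrcoef(y, x)[0, 1]

print(f"raw effect:   {raw_effect:.3f}")
print(f"CUPED effect: {cuped_effect:.3f}")
print(f"variance reduction: {var_reduction:.3f} (rho^2 = {rho**2:.3f})")
```

Both estimates target the same treatment effect; the CUPED version is simply less noisy. The treatment effect itself is then tested on y_cuped exactly as it would be on y, for example with a two-sample t-test.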
What this means practically
If CUPED cuts variance by 50 %, you need only half the sample size to achieve the same power. Equivalently, for a fixed sample size, your minimum detectable effect (MDE) shrinks by a factor of sqrt(0.5), or about 29 %, so you can detect smaller effects. Microsoft, whose researchers introduced CUPED in a 2013 paper, and Booking.com report routine variance reductions of 40–60 % on revenue and engagement metrics.
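The arithmetic behind these statements follows from Var(Y_cuped) = Var(Y) * (1 - rho^2): required sample size scales with variance, and the MDE scales with its square root. The helper below, whose name cuped_gains is made up for this sketch, turns an assumed correlation rho into those ratios.

```python
import math

def cuped_gains(rho):
    """Gains implied by a correlation rho between outcome and covariate."""
    var_ratio = 1.0 - rho ** 2               # Var(Y_cuped) / Var(Y)
    return {
        "variance_reduction": rho ** 2,      # share of variance removed
        "sample_size_ratio": var_ratio,      # users needed vs. unadjusted test
        "mde_ratio": math.sqrt(var_ratio),   # MDE vs. unadjusted, fixed sample
    }

# A correlation of about 0.71 corresponds to a 50 % variance reduction.
gains = cuped_gains(math.sqrt(0.5))
for name, value in gains.items():
    print(f"{name}: {value:.3f}")
```

Estimating rho from historical data before launch gives a rough forecast of how much CUPED will help on a given metric.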
What covariate to use
The best covariate is usually the same metric measured in the pre-experiment period (e.g., 14-day revenue before experiment launch). A longer pre-period tends to raise the correlation and therefore the variance reduction, up to the point where older behavior stops being representative of current behavior. New users have no pre-experiment history, so there is nothing to adjust with: if their missing covariate is set to the covariate mean, their adjustment is exactly zero. CUPED therefore only helps for returning users with prior data.
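One simple way to handle users without history is to fill their missing covariate with the mean of the observed values, which makes their centered covariate, and hence their adjustment, zero. The sketch below assumes missing history is encoded as NaN; the function name and the toy numbers are illustrative only, and other imputation schemes are possible.

```python
import numpy as np

def cuped_adjust_with_missing(y, x):
    """CUPED adjustment where NaN in x marks users with no pre-period data."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    has_history = ~np.isnan(x)
    # Mean imputation: new users get the mean of the observed covariate,
    # so (x - mean) is zero for them and their outcome is left unchanged.
    x_filled = np.where(has_history, x, x[has_history].mean())
    centered = x_filled - x_filled.mean()
    theta = np.cov(y, x_filled, ddof=1)[0, 1] / np.var(x_filled, ddof=1)
    return y - theta * centered

# Toy data: users 2 and 4 are new and have no pre-experiment revenue.
y = np.array([12.0, 3.0, 7.5, 0.0, 9.0, 4.0])
x = np.array([10.0, 2.0, np.nan, 1.0, np.nan, 5.0])

y_adj = cuped_adjust_with_missing(y, x)
print(y_adj)
```

The imputation must not depend on treatment assignment or on in-experiment behavior; otherwise the covariate is no longer independent of treatment and the adjustment can introduce bias.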
CUPED vs. stratified sampling
Stratification at assignment time (e.g., block-randomizing by user tenure decile) achieves a similar goal but must be planned before launch. CUPED is applied at analysis time, after the data are collected, and is therefore more flexible, provided the covariate was recorded before the experiment started. Many modern experimentation platforms offer CUPED or a similar adjustment.
For continuous metrics with high between-user heterogeneity (spend, session duration), CUPED is almost always worth applying. For binary metrics with low baseline rates (rare conversion events), the variance reduction is modest because a rare pre-experiment event is only weakly correlated with the same event during the experiment.