Statistics & Probability | Medium | Asked at Meta, Google, Amazon, Booking

How do you design an A/B test from scratch?

For: Data Scientist, Data Analyst, ML Engineer

The short answer

A rigorous A/B test requires a pre-registered hypothesis, a single primary metric, sample size calculated before launch, random unit-level assignment, and a fixed runtime. Skipping any of these steps opens the door to false positives and post-hoc rationalization.

How to think about it

A well-designed A/B test follows six concrete steps.

1. Write the hypothesis before touching data. State the expected direction and the mechanism: “Showing price-drop badges increases add-to-cart rate because it reduces purchase hesitation.” Pre-registration forces you to commit to what counts as success.

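2. Choose the randomization unit. Users are the default. Sessions can work for pure UX tests where no effect carries over between visits, and account or household IDs are safer when spillover within a household is a concern. The analysis unit should match the randomization unit; if it does not (for example, user-level assignment with a session-level metric), the variance estimate must account for correlation within each unit.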

3. Pick one primary metric and a short list of guardrail metrics. The primary metric drives the ship-or-no-ship call; guardrail metrics (latency, revenue-per-user, support tickets) ensure you are not buying a win on one dimension by degrading another.

4. Calculate sample size. Fix the significance level (alpha = 0.05), the desired power (80–90%), and the minimum detectable effect (MDE), which is the smallest lift that would be business-meaningful. Together with the metric's baseline rate or variance, these inputs determine the required sample size. Do not start the test until you have enough traffic to reach that sample size within a reasonable window.

5. Launch with clean assignment. Use a deterministic hash of user ID + experiment ID so assignment is stable per user, then flip traffic on. Log the assignment event, not just the conversion event.

6. Decide at a fixed horizon. Set the end date before launch based on the power calculation. Analyze once at that date. If you must peek earlier, use a sequential testing framework, because repeated looks at a fixed-horizon test inflate the false-positive rate.
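Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names `sample_size_per_arm` and `assign_variant` are hypothetical, the primary metric is assumed to be a conversion rate analyzed with a two-sided two-proportion z-test (normal approximation), the MDE is assumed to be relative to the baseline, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm (assumes a two-sided two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate if the smallest meaningful lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def assign_variant(user_id, experiment_id, treatment_share=0.5):
    """Deterministically map a user to 'control' or 'treatment' for one experiment."""
    # Including the experiment ID keeps buckets independent across experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex characters are a 32-bit integer; scale it into [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


if __name__ == "__main__":
    # Illustrative inputs only: 10% baseline add-to-cart rate, 5% relative MDE.
    print(sample_size_per_arm(baseline_rate=0.10, relative_mde=0.05))
    print(assign_variant(user_id="user-123", experiment_id="price-drop-badge"))
```

Dividing the per-arm sample size by the expected daily traffic per arm gives a rough runtime, which is what fixes the end date in step 6. Because `assign_variant` depends only on its inputs, calling it again for the same user and experiment returns the same arm, so the assignment can be recomputed and logged at exposure time.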

After the test ends, report the point estimate with confidence intervals, check novelty effects by looking at day-1 versus day-7 lift, and confirm guardrail metrics held.
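The readout described above can be sketched as follows. This is a hedged example that assumes a binary conversion metric and uses a normal-approximation two-proportion z-test; the function name `two_proportion_summary` and the counts in the usage section are made up for illustration, not taken from a real experiment.

```python
import math
from statistics import NormalDist


def two_proportion_summary(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Point estimate, confidence interval, and p-value for treatment minus control."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    diff = p_t - p_c  # absolute lift in conversion rate

    # Pooled standard error for the hypothesis test (assumes equal rates under H0).
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval around the lift.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "diff": diff,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Hypothetical counts: (control conversions, control users, treatment conversions, treatment users).
    overall = two_proportion_summary(1000, 10000, 1100, 10000)
    print(overall)

    # Novelty check: compare the lift on day 1 with the lift on day 7 (hypothetical slices).
    day1 = two_proportion_summary(150, 1500, 180, 1500)
    day7 = two_proportion_summary(140, 1400, 147, 1400)
    print(day1["diff"], day7["diff"])
```

A day-1 lift that is much larger than the day-7 lift suggests a novelty effect. The per-day intervals are wide because each slice is small, so treat the comparison as a diagnostic rather than a second significance test. The same function can be run on binary guardrail metrics.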
