How do you choose a primary metric and guardrail metrics for an A/B test?
The primary metric should be the closest measurable proxy for the user or business outcome you are trying to move, sensitive enough to detect a real change within your traffic budget. Guardrail metrics are the constraints — things that must not degrade even if the primary metric improves.
How to think about it
Primary metric selection criteria
A good primary metric satisfies three properties simultaneously:
- Sensitive: it moves measurably in response to the treatment within the traffic and time window you have. Long-term retention is directionally correct but may take months; a 7-day engagement proxy is often the practical stand-in.
- Aligned: it reflects genuine user or business value, not a vanity metric. Click-through rate is easy to inflate by making a button larger; time-in-product can rise because users are confused. Prefer rate or ratio metrics (conversion rate, revenue per session) over raw counts.
- Owned: the team must have a plausible causal mechanism linking the treatment to the metric. If the mechanism is vague, the metric is wrong.
Guardrail metrics
Guardrail metrics prevent ship decisions that win narrowly on the primary metric while quietly destroying value elsewhere. Standard guardrails include:
- Performance: p99 page-load latency, API error rate. A feature that boosts conversions by 2 % but raises p99 latency by 400 ms is often a net negative.
- Revenue integrity: average order value, refund rate.
- Core engagement: any DAU/MAU signal outside the immediate funnel, to catch cannibalization.
- Ecosystem health: ad load, publisher revenue — relevant in marketplace products.
Set guardrail thresholds before launch (e.g., “p99 latency must not increase by more than 50 ms”). A guardrail breach vetoes a ship even if the primary metric is significantly positive.
In practice, one primary metric plus two to five guardrails is the norm. More guardrails increase the multiple-testing burden without adding proportional safety — pick the ones whose degradation would actually block a launch.