datarekha
Statistics & Probability · Medium · Asked at Meta, Google, Amazon, Booking

What are novelty and primacy effects in A/B testing, and how do you handle them?

The short answer

Novelty effect is the temporary engagement spike users show with any change simply because it is new; primacy effect is the temporary dip when users resist a change to a familiar interface. Both cause the short-term treatment effect to differ materially from the long-term steady-state effect.

How to think about it

Novelty effect

When you introduce a new feature — a redesigned checkout button, a new content format, a recommendation widget — engaged users disproportionately click on it because it is unfamiliar, not because it is genuinely useful. The measured lift in the first few days can be 2–3x the true long-run lift. This effect decays as the feature becomes part of the “normal” experience, usually within one to three weeks for a widely used product.

In practice, if your test shows a large positive effect in week 1 that shrinks in week 2, suspect novelty. Treat the week-2 and week-3 estimates as the more reliable read on the long-run effect.

Primacy effect

The primacy effect runs in the opposite direction. Users accustomed to the old interface may engage less during the first days of treatment while they adapt to the change. A new navigation structure, for example, may initially confuse users and suppress click rates even if it is objectively faster once learned. A short test could then wrongly conclude that the new design is worse.
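A toy model helps show why a short test misleads in both directions. It is an assumption for illustration only and is not fitted to any real experiment. The model treats the daily lift as a constant long-run value plus a transient term that decays exponentially with days since launch. A positive transient mimics novelty; a negative one mimics primacy. The function names (`lift_on_day`, `mean_lift`) and all parameter values, including the 4-day decay constant, are hypothetical.

```python
import math


def lift_on_day(t: float, long_run_lift: float, transient: float, decay_days: float) -> float:
    """Hypothetical toy model: lift(t) = long_run_lift + transient * exp(-t / decay_days).

    transient > 0 mimics a novelty effect (early lift too high);
    transient < 0 mimics a primacy effect (early lift too low).
    """
    return long_run_lift + transient * math.exp(-t / decay_days)


def mean_lift(start_day: int, end_day: int, **params: float) -> float:
    """Average modeled lift over days start_day .. end_day - 1 (day 0 = launch)."""
    days = range(start_day, end_day)
    return sum(lift_on_day(d, **params) for d in days) / len(days)


if __name__ == "__main__":
    # Made-up parameters: same long-run lift, opposite-signed transients.
    novelty = dict(long_run_lift=0.01, transient=0.02, decay_days=4.0)
    primacy = dict(long_run_lift=0.01, transient=-0.02, decay_days=4.0)
    for name, params in [("novelty", novelty), ("primacy", primacy)]:
        week1 = mean_lift(0, 7, **params)
        week3 = mean_lift(14, 21, **params)
        print(f"{name}: week-1 {week1:.4f}, week-3 {week3:.4f}, long run {params['long_run_lift']:.4f}")
```

Under this model, the week-1 average overshoots the long-run lift in the novelty case and undershoots it in the primacy case. The week-3 average sits close to the true value either way, which is the intuition behind running longer.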

How to diagnose both

Plot the daily treatment effect (treatment rate minus control rate) over time. A novelty effect looks like a declining trend that flattens after week 1. A primacy effect looks like an initially negative or suppressed lift that recovers or turns positive over time.
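The computation behind that plot can be sketched as follows, assuming you have per-day conversion counts for each arm. The `DailyCounts` schema, the function names, and the labeling rule are all hypothetical. The sketch computes the daily lift, fits a least-squares slope of lift against day index, and compares the first-week mean with the later mean. It is only a descriptive heuristic: it ignores day-to-day noise, so you would still want confidence intervals around each day's lift before calling a trend real.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    """Conversions and exposed users for one arm on one day (hypothetical schema)."""
    conversions: int
    users: int

    @property
    def rate(self) -> float:
        return self.conversions / self.users


def daily_lift(treatment: list[DailyCounts], control: list[DailyCounts]) -> list[float]:
    """Absolute daily treatment effect: treatment rate minus control rate."""
    if len(treatment) != len(control):
        raise ValueError("treatment and control must cover the same days")
    return [t.rate - c.rate for t, c in zip(treatment, control)]


def ols_slope(values: list[float]) -> float:
    """Least-squares slope of values against day index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def describe_trend(lifts: list[float], first_days: int = 7) -> str:
    """Rough label for the shape of the daily-lift series (no significance test)."""
    if first_days < 1 or len(lifts) <= first_days:
        raise ValueError("need more days than first_days to compare early vs late")
    slope = ols_slope(lifts)
    early = sum(lifts[:first_days]) / first_days
    late = sum(lifts[first_days:]) / len(lifts[first_days:])
    if slope < 0 and early > late:
        return "possible novelty: lift declines over time"
    if slope > 0 and early < late:
        return "possible primacy: lift recovers over time"
    return "no clear time trend"
```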

Mitigation strategies

  1. Run longer: two to three weeks usually gives the novelty spike time to decay and users time to adapt to the change, which yields a more stable estimate.
  2. Segment by tenure in treatment: users who entered the treatment on day 1 have had more exposure by day 7 than users who entered on day 6. Analyzing only the “day 1 cohort” at the end of week 2 gives a longer-exposure estimate.
  3. Holdout for long-run effects: maintain a small holdout group (e.g., 5%) that is kept away from the new feature for 30–90 days after launch to measure the truly long-run incremental impact.
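The tenure idea in point 2 can be generalized by indexing every observation by days since that user's first exposure rather than by calendar day. Below is a rough sketch, assuming user-day records of (arm, days since first exposure, converted) in a hypothetical format. The function name is also hypothetical. One caveat: high exposure-day values are populated only by early entrants, so differences across exposure days mix exposure length with any differences between early and late cohorts.

```python
from collections import defaultdict
from collections.abc import Iterable


def lift_by_exposure_day(observations: Iterable[tuple[str, int, bool]]) -> dict[int, float]:
    """Treatment-minus-control conversion rate, keyed by days since first exposure.

    Each observation is (arm, days_since_first_exposure, converted), where arm is
    "treatment" or "control" (hypothetical schema; other arm labels raise KeyError).
    """
    # counts[day][arm] = [conversions, user_days]
    counts = defaultdict(lambda: {"treatment": [0, 0], "control": [0, 0]})
    for arm, day, converted in observations:
        counts[day][arm][0] += int(converted)
        counts[day][arm][1] += 1

    result: dict[int, float] = {}
    for day in sorted(counts):
        t_conv, t_n = counts[day]["treatment"]
        c_conv, c_n = counts[day]["control"]
        if t_n and c_n:  # skip exposure days with no data in one of the arms
            result[day] = t_conv / t_n - c_conv / c_n
    return result
```

A lift that shrinks as the exposure day grows points toward novelty. A lift that grows points toward primacy.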

For major product redesigns, many companies (including Netflix and LinkedIn) maintain long-running holdout groups specifically to measure persistent effects that short experiments cannot capture.
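Reading a long-running holdout is an ordinary two-group comparison. Below is a sketch that assumes a binary conversion metric and independent users, which ignores repeated visits by the same user. It reports the absolute lift of exposed users over the holdout with a Wald (normal-approximation) confidence interval. This is one common choice; other interval methods exist. The function name and the counts are hypothetical.

```python
import math


def holdout_lift(exposed_conv: int, exposed_n: int,
                 holdout_conv: int, holdout_n: int,
                 z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Absolute lift (exposed rate - holdout rate) with a Wald interval.

    z = 1.96 gives an approximately 95% two-sided interval.
    """
    p_e = exposed_conv / exposed_n
    p_h = holdout_conv / holdout_n
    diff = p_e - p_h
    # Standard error of a difference of two independent proportions.
    se = math.sqrt(p_e * (1 - p_e) / exposed_n + p_h * (1 - p_h) / holdout_n)
    return diff, (diff - z * se, diff + z * se)


if __name__ == "__main__":
    diff, (lo, hi) = holdout_lift(1200, 10000, 50, 500)
    print(f"lift {diff:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

With these made-up counts, the point estimate is a 2-point lift, but the interval spans zero. A small holdout needs enough traffic and time before its long-run read becomes conclusive.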
