
When would you use a multi-armed bandit or shadow deployment instead of a fixed A/B test?

The short answer

A fixed A/B test holds traffic splits constant to get a clean, statistically powered comparison, which is ideal when you need a trustworthy ship decision. A multi-armed bandit dynamically shifts traffic toward the better-performing model, reducing regret when you can't run long enough for significance or when the best arm may change. Shadow deployment sends real traffic to the new model without serving its outputs, so you can validate its behavior and latency before any user is exposed to it.

How to think about it

Use a fixed A/B test when you need a clean, statistically powered yes/no on shipping. Use a multi-armed bandit when you’d rather minimize regret than get a textbook significance result: it shifts traffic toward the winner as evidence accrues. Use a shadow deployment when you want to validate a model on real traffic with zero user exposure before any experiment at all.
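To make "statistically powered" concrete, here is a minimal sketch of a sample-size estimate for a fixed A/B test. It assumes a two-sided two-proportion z-test under the normal approximation; the function name `samples_per_arm` and the 10% vs 11% conversion rates are illustrative, not from any particular library or dataset.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed in EACH arm to detect p_control vs p_treatment.

    Assumes a two-sided two-proportion z-test (normal approximation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / effect ** 2)


# Illustrative numbers: detect a lift from 10% to 11% conversion.
print(samples_per_arm(0.10, 0.11))
```

With these inputs the requirement is on the order of 15,000 users per arm, and halving the detectable lift roughly quadruples it. That cost is what pushes low-traffic or time-sensitive cases toward a bandit.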

Why each exists

  • A/B test: constant splits give an unbiased, interpretable estimate of the effect. Best when the decision is high-stakes and you can afford the traffic/time to reach significance.
  • Multi-armed bandit: balances exploration vs exploitation, allocating more traffic to better-performing arms. Great when you can’t run long enough for significance, when opportunity cost of serving a worse model is high, or when the best arm may drift over time.
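  • Shadow (dark) deployment: the challenger receives a copy of production requests (all of them or a sample) and logs its predictions, but only the champion’s outputs are served. You measure latency, error rates, and prediction agreement without users ever seeing the challenger’s outputs, provided the shadow call stays off the serving path and triggers no side effects.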
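The exploration-vs-exploitation behavior described for bandits can be sketched with Thompson sampling, one common bandit policy (the text above does not prescribe a specific algorithm). This simulation assumes binary rewards and a Beta(1, 1) prior per arm; `thompson_simulation` and the conversion rates are hypothetical.

```python
import random


def thompson_simulation(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Serve the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1

        # Simulated user feedback (in production this is the observed reward).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Hypothetical arms: model A converts at 5%, model B at 15%.
print(thompson_simulation([0.05, 0.15], n_rounds=5000))
```

Early on both arms are sampled often because their posteriors overlap; as evidence accrues, most traffic flows to the better arm. A fixed 50/50 split would keep sending half the traffic to the worse model for the whole run.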

Concrete example

Deploying a new fraud model: first shadow it for a day to confirm it doesn’t time out and its score distribution is sane. Then run an A/B test to measure caught-fraud and false-positive rates with confidence. For a homepage banner ranker where freshness matters and there’s lots of traffic, a bandit harvests more value during the experiment than a static split.
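The shadow step in the fraud example can be sketched as a thin wrapper around the serving call. This is a simplified illustration, not a production design: `serve_with_shadow`, the `log` list, and the toy threshold models are all hypothetical, and the challenger is called synchronously for brevity, whereas a real system would usually run it asynchronously or on mirrored traffic so it cannot add latency to the response.

```python
import time


def serve_with_shadow(request, champion, challenger, log):
    """Serve the champion's output; run the challenger only for logging."""
    start = time.perf_counter()
    served = champion(request)
    record = {
        "request": request,
        "champion_ms": (time.perf_counter() - start) * 1000,
    }

    try:
        start = time.perf_counter()
        shadow_output = challenger(request)
        record["challenger_ms"] = (time.perf_counter() - start) * 1000
        record["agree"] = shadow_output == served
    except Exception as exc:
        # A failing challenger must never affect what the user receives.
        record["challenger_error"] = repr(exc)

    log.append(record)
    return served  # the challenger's output is never returned


# Toy models: flag a transaction as fraud above an amount threshold.
shadow_log = []
decision = serve_with_shadow(
    500,
    champion=lambda amount: amount > 1000,
    challenger=lambda amount: amount > 400,
    log=shadow_log,
)
print(decision, shadow_log[0]["agree"])
```

The log gives you exactly what shadow mode can measure: challenger latency, error rate, and how often it disagrees with the champion.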

Trade-offs

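Bandits complicate clean statistical inference: because allocation adapts to the observed outcomes, naive per-arm estimates and standard significance tests are no longer valid without correction. They can also over-exploit a temporarily lucky arm. Shadow mode can’t measure user-facing outcomes like clicks, only operational behavior and offline-style agreement with the champion.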

Common follow-up / trap

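A frequent probe: “You have very little traffic: A/B or bandit?” A bandit limits how much of that scarce traffic goes to the worse arm, while a CUPED-augmented A/B test uses pre-experiment data to reduce metric variance, so a fixed split needs fewer users to reach significance. The trap is treating these as competitors rather than a pipeline: shadow to de-risk, then A/B or bandit to decide.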
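For readers unfamiliar with CUPED, the core adjustment is short enough to sketch. It subtracts the part of the metric that a pre-experiment covariate already explains. The data below is synthetic and `cuped_adjust` is a hypothetical helper; in a real experiment the coefficient is typically estimated on data pooled across arms and then applied to each arm.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted metric values.

    y: in-experiment metric per user
    x: the same user's pre-experiment covariate (e.g. prior-period spend)
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Centering x keeps the mean of y unchanged while shrinking its variance.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Synthetic users whose in-experiment metric tracks their past behavior.
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [xi + rng.gauss(0, 1) for xi in pre]

adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))
```

The adjusted metric has the same mean but much lower variance when the covariate is strongly correlated with the outcome, which is why CUPED helps most for metrics that are stable per user over time.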
