MLOps | Medium | Asked at: Airbnb, DoorDash, Netflix, Grab

When and how should you trigger model retraining — scheduled vs. event-driven?

The short answer

Scheduled retraining is simple and predictable but wastes compute when nothing has shifted and reacts slowly when drift is sudden. Event-driven retraining ties compute to evidence — a drift alarm, a performance threshold breach, or a data volume trigger — and is more efficient at scale. Most mature systems combine both.

How to think about it

The retraining question is really two questions: when to retrain, and how to validate the new model before it goes live. A well-timed retrain still does damage if an unvalidated model ships, so a complete answer covers both.

Scheduled retraining

Train nightly, weekly, or monthly on a fixed cadence regardless of drift signals. Simple to implement and easy to audit. Works well for models where the world changes slowly and training is cheap.

Downsides: wastes compute during stable periods; reacts with lag when drift is abrupt (breaking news, a competitor launch, a market shock). With a weekly cadence, a shift that lands just after a retrain can degrade predictions for up to seven days before the next scheduled run.

Event-driven (triggered) retraining

Retrain when a monitored condition is met:

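Drift trigger: Population Stability Index (PSI) above 0.2 on a key feature (a common rule-of-thumb cutoff), or Jensen-Shannon divergence between the reference and current output distributions exceeding a chosen threshold.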
Performance trigger: rolling accuracy or AUC (when labels arrive) drops below an acceptable floor.
Data volume trigger: enough new labelled samples have accumulated to meaningfully shift the training distribution.
Business trigger: an external event (product launch, seasonal spike, regulation change) is flagged by a human operator.
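As an illustration of the drift trigger above, here is a minimal standard-library sketch of PSI on a single numeric feature. The function names (`psi`, `is_drifted`), the ten quantile bins, and the `eps` floor for empty bins are assumptions made for this example; only the 0.2 threshold comes from the text.

```python
import math
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D numeric samples.

    Bins are cut at quantiles of the reference sample (an assumption;
    fixed-width bins are another common choice).
    """
    ref_sorted = sorted(reference)
    # Interior bin edges taken at reference quantiles.
    edges = [ref_sorted[(len(ref_sorted) * i) // n_bins] for i in range(1, n_bins)]

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each fraction at eps so log() never sees zero.
        return [max(c / len(values), eps) for c in counts]

    p = fractions(reference)
    q = fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def is_drifted(reference, current, threshold=0.2):
    """True when PSI exceeds the threshold quoted in the text."""
    return psi(reference, current) > threshold


if __name__ == "__main__":
    training_values = list(range(1000))
    live_values = [v + 500 for v in training_values]  # simulated shift
    print(is_drifted(training_values, training_values))  # identical samples
    print(is_drifted(training_values, live_values))      # shifted sample
```

In practice the reference sample would typically be the training window and the current sample a recent serving window; both choices are left to the caller here.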

Event-driven retraining requires robust monitoring infrastructure: if your drift detectors are noisy, you'll thrash with unnecessary retrains.
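One possible way to damp a noisy detector is to require several breaches within a short window before firing. The sketch below is an assumption, not something the text prescribes: the `DebouncedTrigger` class and its `required`/`window` parameters are hypothetical names chosen for illustration.

```python
from collections import deque


class DebouncedTrigger:
    """Fire only when `required` of the last `window` checks breach the threshold."""

    def __init__(self, threshold, required=3, window=5):
        if not 1 <= required <= window:
            raise ValueError("required must be between 1 and window")
        self.threshold = threshold
        self.required = required
        self.recent = deque(maxlen=window)  # rolling record of breach flags

    def update(self, value):
        """Record one monitoring check; return True if a retrain should start."""
        self.recent.append(value > self.threshold)
        fire = sum(self.recent) >= self.required
        if fire:
            # Reset so one drift episode starts one retrain, not several.
            self.recent.clear()
        return fire


if __name__ == "__main__":
    trigger = DebouncedTrigger(threshold=0.2, required=3, window=5)
    for daily_psi in [0.25, 0.10, 0.30, 0.10, 0.27, 0.30]:
        print(daily_psi, trigger.update(daily_psi))
```

A single spike never fires on its own; the trade-off is a delay of a few checks before a genuine shift is acted on.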

Hybrid approach (production best practice)

Use a minimum scheduled cadence (e.g., monthly) to prevent stale models, plus event-driven triggers that can fire sooner. This gives a safety net against runaway drift while keeping compute proportional to need.
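The hybrid policy can be sketched as a single decision function that checks the event-driven triggers first and falls back to the scheduled cadence. Everything named here (`MonitorSnapshot`, `retrain_reason`, the AUC floor of 0.75, the 50,000-row volume cutoff) is a hypothetical placeholder; only the monthly cadence and the PSI threshold of 0.2 are taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MonitorSnapshot:
    last_trained_at: datetime
    max_feature_psi: float
    rolling_auc: Optional[float]  # None while labels have not arrived yet
    new_labelled_rows: int
    operator_flag: bool = False   # set by a human for business events


def retrain_reason(
    snapshot: MonitorSnapshot,
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # scheduled safety net
    psi_threshold: float = 0.2,
    auc_floor: float = 0.75,                  # illustrative value
    min_new_rows: int = 50_000,               # illustrative value
) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to skip."""
    if snapshot.operator_flag:
        return "business"
    if snapshot.rolling_auc is not None and snapshot.rolling_auc < auc_floor:
        return "performance"
    if snapshot.max_feature_psi > psi_threshold:
        return "drift"
    if snapshot.new_labelled_rows >= min_new_rows:
        return "data_volume"
    if now - snapshot.last_trained_at >= max_age:
        return "schedule"
    return None


if __name__ == "__main__":
    trained = datetime(2024, 1, 1)
    snap = MonitorSnapshot(trained, max_feature_psi=0.05,
                           rolling_auc=0.82, new_labelled_rows=1_000)
    print(retrain_reason(snap, now=trained + timedelta(days=3)))
    print(retrain_reason(snap, now=trained + timedelta(days=31)))
```

Returning the reason rather than a bare boolean keeps an audit trail of why each retrain ran, which preserves some of the auditability that a purely scheduled setup offers.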

Validating the retrained model before deployment

Offline evaluation: score the retrained model on a held-out recent window (not the same window used to detect drift).
Shadow deploy: route live traffic to both models, compare predictions without serving the new model’s output.
Canary or A/B test: serve a small traffic slice to the new model, gate on business KPI improvement before full rollout.
Automated champion/challenger: promote only if the challenger exceeds the champion on the evaluation metric by a statistically significant margin.
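The champion/challenger gate can be sketched with a paired bootstrap on the accuracy difference. This is one of several reasonable significance tests, chosen here as an assumption; the function name `challenger_wins`, the use of accuracy as the metric, and the 95% percentile interval are illustrative rather than prescribed by the text.

```python
import random


def challenger_wins(y_true, champion_pred, challenger_pred,
                    n_resamples=2000, alpha=0.05, seed=0):
    """Paired bootstrap on (challenger accuracy - champion accuracy).

    Returns (promote, lower_bound): promote is True only when the lower
    end of the two-sided percentile interval is above zero.
    """
    # Per-example difference in correctness: +1, 0, or -1.
    diffs = [
        int(new == y) - int(old == y)
        for y, old, new in zip(y_true, champion_pred, challenger_pred)
    ]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed keeps the gate reproducible
    means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )
    lower_bound = means[int((alpha / 2) * n_resamples)]
    return lower_bound > 0, lower_bound


if __name__ == "__main__":
    labels = [1] * 200
    champion = [1] * 100 + [0] * 100   # right on half of the examples
    challenger = [1] * 200             # right on all of them
    promote, lower = challenger_wins(labels, champion, challenger)
    print(promote, round(lower, 3))
```

Both models are scored on the same examples, so resampling the per-example differences keeps the comparison paired. A tie, or a gain too small to separate from noise, leaves the champion in place.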