datarekha

Retraining & continual learning

Drift detection is useless if nothing acts on it. Scheduled vs trigger-based retraining, the champion-challenger pattern, and how to retrain safely without auto-shipping a worse model.

7 min read Intermediate MLOps Lesson 19 of 28

What you'll learn

  • Scheduled vs trigger-based retraining and when each fits
  • The champion-challenger pattern for safe model updates
  • Why automated retraining needs guardrails, not blind trust

Drift detection tells you the world has changed. But an alert nobody acts on is just noise. Retraining is the action that closes the loop — and the question isn’t whether to retrain, but when and how safely. Get it wrong and you either serve a stale model for months or auto-ship a broken one.

Scheduled vs trigger-based

Two ways to decide when to retrain:

  • Scheduled: retrain on a fixed cadence (nightly, weekly, monthly). Simple and predictable, and a fine default. The downside is that a fixed cadence rarely matches reality: you either retrain too often (burning compute when nothing changed) or too rarely (the model goes stale between runs).
  • Trigger-based: retrain when a signal fires, such as drift crossing a threshold, performance dropping below an SLA, or enough new labeled data arriving. More efficient (you retrain only when there is evidence it is needed) but more machinery: you need the monitoring and an automated pipeline to act on it.

Most mature teams use both: a scheduled floor (retrain at least weekly) plus drift/performance triggers for when reality moves faster.
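As a rough illustration of combining a scheduled floor with triggers, the sketch below collects every reason to retrain into one list. The `RetrainSignals` fields, the function name, and all default thresholds are hypothetical placeholders rather than recommended values; real thresholds depend on your model and monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetrainSignals:
    """Hypothetical snapshot of what monitoring knows right now."""
    last_trained: datetime   # when the serving model was last trained
    drift_score: float       # e.g. a drift statistic from monitoring
    live_metric: float       # e.g. accuracy on recently labeled traffic
    new_labeled_rows: int    # labeled examples collected since last training


def retrain_reasons(
    s: RetrainSignals,
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # scheduled floor (assumed value)
    drift_threshold: float = 0.2,            # assumed value
    metric_sla: float = 0.90,                # assumed value
    min_new_rows: int = 50_000,              # assumed value
) -> list[str]:
    """Return every reason to retrain now; an empty list means do nothing."""
    reasons = []
    if now - s.last_trained >= max_age:
        reasons.append("scheduled_floor")
    if s.drift_score > drift_threshold:
        reasons.append("drift")
    if s.live_metric < metric_sla:
        reasons.append("performance")
    if s.new_labeled_rows >= min_new_rows:
        reasons.append("new_data")
    return reasons
```

Returning the reasons instead of a bare boolean keeps the decision auditable: the pipeline can log why a given retraining run started.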

The retraining loop

[Figure: a continuous loop of monitor (drift / perf) → trigger fires (threshold crossed) → retrain (on fresh data) → validate (challenger vs champion) → promote (if it wins)]
Monitor → trigger → retrain → validate → promote only if it wins, then back to monitoring.
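One pass through this loop can be sketched as a small function that wires the stages together. The four callables are hypothetical stand-ins for your own monitoring, training, validation, and deployment steps; only the ordering and the early exits come from the loop described here.

```python
from typing import Any, Callable


def run_retraining_cycle(
    trigger_fired: Callable[[], bool],        # monitor + trigger check
    retrain: Callable[[], Any],               # trains a challenger on fresh data
    challenger_wins: Callable[[Any], bool],   # validation against the champion
    promote: Callable[[Any], None],           # swaps the challenger into serving
) -> str:
    """Run one pass of the loop and report what happened."""
    if not trigger_fired():
        return "no_trigger"
    challenger = retrain()
    if not challenger_wins(challenger):
        return "kept_champion"   # the champion keeps serving
    promote(challenger)
    return "promoted"
```

An orchestrator or scheduler would call this repeatedly; the returned status string is a convenient thing to log or alert on.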

Champion-challenger — never auto-ship blind

The dangerous failure mode is automatically deploying every retrained model. Retraining can produce a worse model — bad new data, a label pipeline bug, a distribution shift the model handled poorly. The safe pattern is champion-challenger:

  • The champion is the model currently serving production.
  • A freshly retrained challenger is evaluated against it: first offline (it must beat the champion on the holdout set), then often online in a shadow deployment, a canary release, or a proper A/B test.
  • The challenger is promoted only if it wins. Otherwise the champion stays, and the only cost is the compute spent on the retraining run.

This makes retraining safe to automate: the worst case is “we kept the old model.”
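The offline step of that comparison might look like the sketch below. It assumes models are plain callables that map a feature vector to a class label, uses accuracy as the metric, and requires a minimum gain (`min_gain`, an illustrative value) so that noise on the holdout does not cause a promotion. All names here are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Sequence[float], int]    # (features, true label)
Model = Callable[[Sequence[float]], int]  # features -> predicted label


def accuracy(model: Model, holdout: Sequence[Example]) -> float:
    """Fraction of holdout examples the model labels correctly."""
    if not holdout:
        raise ValueError("holdout set must not be empty")
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)


def passes_offline_gate(
    champion: Model,
    challenger: Model,
    holdout: Sequence[Example],
    min_gain: float = 0.01,   # assumed margin; tune to your metric's noise
) -> bool:
    """True only if the challenger beats the champion by at least min_gain."""
    return accuracy(challenger, holdout) >= accuracy(champion, holdout) + min_gain
```

Both models are scored on the same holdout so the comparison is like for like. A challenger that passes this gate would still typically go through a shadow or canary stage before full promotion.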

Quick check

Q1. What's the tradeoff between scheduled and trigger-based retraining?
Q2. Why use a champion-challenger pattern instead of auto-deploying every retrained model?
Q3. What is the feedback-loop risk in continual learning?

Next

Safe retraining depends on the model registry gate and orchestration to run the loop. Next: the governance side — responsible-AI ops.

Practice this in an interview

How do you decide when to retrain a model, and how do you do it safely?

Choose between scheduled retraining on a fixed cadence and trigger-based retraining fired by monitored drift or a performance drop, picking based on how fast the data distribution changes and how good your monitoring is. Retrain safely by treating it as an automated pipeline that validates data, trains, and gates the new model against the current champion on held-out and business metrics before promotion. Then roll out progressively with shadow or canary so a bad model never fully replaces the champion.

When and how should you trigger model retraining — scheduled vs. event-driven?

Scheduled retraining is simple and predictable but wastes compute when nothing has shifted and reacts slowly when drift is sudden. Event-driven retraining ties compute to evidence — a drift alarm, a performance threshold breach, or a data volume trigger — and is more efficient at scale. Most mature systems combine both.

What's the difference between full retraining, incremental (warm-start) training, and continual online learning?

Full retraining trains a fresh model from scratch on the latest data window, giving the cleanest result but at the highest cost and slowest cadence. Incremental or warm-start training continues from existing weights on new data, which is cheaper and faster but can accumulate drift and forgetting. Continual online learning updates the model continuously from a live stream for maximum freshness, at the cost of stability, harder evaluation, and vulnerability to bad or poisoned data.

What is the difference between data drift, concept drift, and label drift — and how do you detect each?

Data drift is a change in the statistical distribution of model inputs; concept drift is a change in the relationship between inputs and the target; label drift is a shift in the marginal distribution of the target itself. They require different detectors and carry different business urgency.
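As one example of a detector, the Population Stability Index (PSI) is a common way to quantify data drift on a single numeric feature by comparing binned proportions between a reference sample and a current sample. The sketch below is a minimal pure-Python version; the equal-width binning, the bin count, and the `eps` floor for empty bins are assumptions made for illustration. The same comparison can be applied to predictions or labels as a rough label drift check, whereas concept drift generally needs labeled outcomes to detect.

```python
import math
from typing import List, Sequence


def psi(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A frequently quoted rule of thumb treats PSI above roughly 0.2 as a significant shift, but that cutoff is a convention rather than a guarantee, and it is sensitive to the binning and the `eps` floor.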
