How do you safely promote a model to production using a model registry?
Register every candidate as an immutable, versioned artifact, then move it through environments (dev to staging to prod) gated by automated checks rather than promoting straight to prod. In modern MLflow you use aliases like champion and challenger instead of the deprecated stage labels, and promotion is a governed, auditable action with sign-off and an easy rollback by repointing the alias. Always validate in staging and roll out progressively (canary or shadow) before full traffic.
How to think about it
The short answer
Never push a freshly trained model straight to prod. Register it as an immutable, versioned artifact, then promote it through environments — dev to staging to prod — where each transition is gated by automated checks and recorded for audit. Rollback should be a one-line operation.
Why
A model registry is the system of record that decouples training from deployment. It gives you versioning, lineage (which data/code produced this), governance (who approved it), and a stable reference the serving layer reads. As of MLflow 2.9+ and into MLflow 3, the old Staging/Production stages are deprecated in favor of aliases — mutable named pointers like @champion and @challenger. Serving reads models:/fraud@champion; promotion is just repointing the alias, and rollback is repointing it back.
A safe promotion flow
- Register the candidate version with metadata: git SHA, data hash, training metrics.
- Gate in staging: run automated checks — minimum eval metrics, no regression vs current champion, schema/contract tests, latency budget.
- Require sign-off for the prod promotion (manual approval or policy-as-code) so it’s auditable.
- Roll out progressively: shadow or canary the challenger on a slice of traffic, compare live metrics, then repoint
@champion. - Keep the previous version pinned so rollback is instant.
Concrete example
You tag v23 as @challenger, shadow it for a day, confirm its live precision matches offline and p99 latency is fine, then set_registered_model_alias("fraud", "champion", version=23). If alerts fire, repoint champion to v22 — seconds, not a redeploy.
Common follow-up / trap
Interviewers probe: “What goes wrong if you skip staging?” — silent training-serving skew, schema drift, or latency regressions hit users directly. The trap is describing the deprecated stage transitions as current best practice; mention aliases. Another trap: promoting on offline metrics alone — strong answers tie promotion to an online check (canary/A-B) before full rollout.