How do you safely promote a model to production using a model registry?

Register every candidate as an immutable, versioned artifact, then move it through environments (dev to staging to prod) gated by automated checks rather than promoting straight to prod. In modern MLflow you use aliases like champion and challenger instead of the deprecated stage labels, and promotion is a governed, auditable action with sign-off and an easy rollback by repointing the alias. Always validate in staging and roll out progressively (canary or shadow) before full traffic.

What's the difference between experiment tracking and a model registry, and why do you need both?

Experiment tracking logs every run, its parameters, metrics, and artifacts, so you can compare and reproduce experiments during development. A model registry is the curated, governed catalog of the few models you actually intend to deploy, with versioning, stage or alias management, approvals, and lineage. You need both because tracking gives breadth for exploration while the registry gives the controlled, auditable path to production.

What is a model registry, and how does model versioning work in production ML systems?

A model registry is a centralised store that tracks every trained model artifact alongside its metadata — hyperparameters, training data version, evaluation metrics, and lineage. Versioning assigns unique identifiers to each artifact and manages lifecycle stages so teams can promote, roll back, and audit models without manual file management.

Walk me through the full ML lifecycle from problem definition to model retirement.

The ML lifecycle spans eight phases: problem framing, data collection and validation, feature engineering, training and experimentation, offline evaluation, deployment, production monitoring, and retirement or retraining. Each phase has distinct owners, artefacts, and failure modes that an MLOps practice must systematise.

Model registry & promotion — MLOps

The last lesson left us with forty perfectly reproducible runs and a governance problem: provenance is not the same as control. Knowing each run’s exact lineage tells you nothing about which one is live, which is being tested, and how a new version safely takes the throne from the old one. We asked what system decides all that. This lesson is that system.

MLflow tracking answers “what did I try?” The model registry answers a different, scarier question: “which model is in production right now, how was it trained, and who approved it?” If your answer today is “the model.pkl on that server, probably from the notebook Raj ran in March,” you need a registry.

What a registry tracks

A registry sits on top of your tracked runs and adds a governed layer:

A registered model with a name (churn-classifier).
Versions (1, 2, 3…), each pointing to a specific run, its artifacts, metrics, and the exact data/code that produced it.
Stages (None → Staging → Production → Archived) or modern aliases (champion, challenger, shadow) that let you point a label at a version without renaming anything downstream. In modern MLflow, stage labels are deprecated in favor of aliases.

So “the production model” becomes a queryable pointer, not tribal knowledge. Your serving layer loads models:/churn-classifier@champion and you can roll back by moving the alias.
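To make the "queryable pointer" idea concrete, here is a minimal in-memory sketch of a registered model with numbered versions and movable aliases. It is a toy under stated assumptions, not MLflow itself: the RegisteredModel and ModelVersion classes, the run IDs, and the metric values are all hypothetical. MLflow exposes comparable operations through MlflowClient (set_registered_model_alias and get_model_version_by_alias), which this sketch does not call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    """An immutable registry entry that points back at the run that produced it."""
    version: int
    run_id: str
    metrics: dict


@dataclass
class RegisteredModel:
    """Hypothetical stand-in for a registered model such as churn-classifier."""
    name: str
    versions: dict = field(default_factory=dict)  # version number -> ModelVersion
    aliases: dict = field(default_factory=dict)   # alias -> version number

    def register(self, run_id, metrics):
        # Versions are append-only: a new run always gets the next number.
        number = len(self.versions) + 1
        entry = ModelVersion(number, run_id, dict(metrics))
        self.versions[number] = entry
        return entry

    def set_alias(self, alias, version):
        # Moving an alias changes the pointer, never the versions themselves.
        if version not in self.versions:
            raise KeyError(f"{self.name} has no version {version}")
        self.aliases[alias] = version

    def resolve(self, uri):
        # Resolve a URI of the form models:/<name>@<alias> to a concrete version.
        prefix = f"models:/{self.name}@"
        if not uri.startswith(prefix):
            raise ValueError(f"unexpected model URI: {uri}")
        alias = uri[len(prefix):]
        return self.versions[self.aliases[alias]]


registry = RegisteredModel("churn-classifier")
registry.register(run_id="run-a", metrics={"f1": 0.88})  # illustrative values
registry.register(run_id="run-b", metrics={"f1": 0.91})

registry.set_alias("champion", 1)
registry.set_alias("champion", 2)  # promote version 2
promoted = registry.resolve("models:/churn-classifier@champion")

registry.set_alias("champion", 1)  # roll back by moving the alias
rolled_back = registry.resolve("models:/churn-classifier@champion")
```

The serving layer only ever asks for the alias, so promotion and rollback are both a single pointer move and nothing downstream has to be renamed.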

The promotion gate

The registry’s real value isn’t storage; it’s the gate. Promoting a version to Production should require evidence: eval results, a model card, a bias audit, a human sign-off. Consider what happens when you try to promote a model before it’s ready.

You can't ship what you can't vouch for

Version churn-classifier:v7 is in Staging. To promote it to Production, the registry requires four pieces of evidence: eval results, a model card, a bias audit, and a human sign-off. Promotion stays locked until every requirement is met.

A model registry is the single source of truth for which versions exist, how each was trained, and what's in prod, with versioned stages (None → Staging → Production → Archived, or aliases like champion/challenger). Its real power is the promotion gate: it makes shipping a model a reviewed, logged event, not a silent model.pkl copy. That audit trail is also your evidence for regulations like the EU AI Act.

That gate is what turns “someone copied a pickle to prod” into a reviewed, logged, reversible event — and it’s where governance and responsible-AI checks get enforced.
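A gate of this kind can be sketched in a few lines. The example below is a hedged illustration rather than any particular registry's API: the gate names mirror the four kinds of evidence named in this lesson, while the promote function, the audit-log shape, the version numbers, and the approver name are assumptions made for the example.

```python
from datetime import datetime, timezone

# The four kinds of evidence the lesson asks for before a promotion.
REQUIRED_GATES = ("eval_results", "model_card", "bias_audit", "human_sign_off")


class PromotionBlocked(Exception):
    """Raised when a version is missing required evidence."""


def missing_gates(evidence):
    # A gate counts as met only if it is present and truthy.
    return [gate for gate in REQUIRED_GATES if not evidence.get(gate)]


def promote(aliases, audit_log, version, evidence, approver):
    """Move the champion alias to `version` only if every gate is met.

    Both outcomes are written to the audit log, so a refused promotion
    leaves a record just like a successful one.
    """
    blocking = missing_gates(evidence)
    event = {
        "version": version,
        "approver": approver,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    if blocking:
        audit_log.append({**event, "event": "promotion_blocked", "blocking": blocking})
        raise PromotionBlocked(f"v{version} blocked by: {', '.join(blocking)}")
    previous = aliases.get("champion")
    aliases["champion"] = version
    audit_log.append({**event, "event": "promoted", "previous": previous})
    return previous  # the version to point back at if a rollback is needed


aliases = {"champion": 6}  # hypothetical current production version
audit_log = []

try:
    promote(aliases, audit_log, 7, evidence={}, approver="reviewer")
except PromotionBlocked as err:
    blocked_message = str(err)

full_evidence = {gate: True for gate in REQUIRED_GATES}
rollback_target = promote(aliases, audit_log, 7, full_evidence, approver="reviewer")
```

Recording the previous champion in the log is what makes the event reversible: rolling back is a matter of pointing the alias at that version again.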

In one breath

A model registry sits on top of your tracked runs and turns “the production model” from tribal knowledge into a governed, queryable pointer — a named model with numbered versions, each carrying full lineage, promoted through stages or aliases (champion/challenger) by a gate that demands evidence (eval results, model card, bias audit, sign-off) before anything reaches production; the gate is the whole point, converting a silent pickle-copy into a reviewed, logged, reversible event — though promotion only blesses a version, it does not deploy it.
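The final clause of that summary, that promotion blesses a version without deploying it, can be pictured as two separate pieces of state that only a deployment step brings into agreement. This is a minimal sketch with hypothetical names (registry_aliases, serving, deploy_step); a real pipeline would add progressive rollout and health checks before switching traffic.

```python
# What is approved: the registry's alias table.
registry_aliases = {"champion": 6}

# What is live: the version the serving layer has actually loaded.
serving = {"live_version": 6}


def promote(aliases, version):
    """Registry action: bless a version. Touches no serving state."""
    aliases["champion"] = version


def deploy_step(aliases, serving_state):
    """Hypothetical pipeline step: make serving match the blessed version.

    Returns True if it changed what is live, False if already in sync.
    """
    target = aliases["champion"]
    if serving_state["live_version"] == target:
        return False
    serving_state["live_version"] = target
    return True


promote(registry_aliases, 7)
live_after_promotion = serving["live_version"]  # serving has not moved yet

changed = deploy_step(registry_aliases, serving)
live_after_deploy = serving["live_version"]
```

Keeping the two apart means an approval can be reviewed and logged on its own, and the deployment pipeline stays free to decide when and how traffic actually moves.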

Practice

Before the quiz, separate two ideas the registry deliberately keeps apart. Moving a version to the Production stage makes it the blessed model — yet no traffic shifts until a separate pipeline acts on that label. In your own words, why is it valuable that “what is approved” and “how it goes live” are two different systems rather than one? And the governance angle: the promotion gate is where a bias audit or an EU-AI-Act sign-off gets enforced — why is the gate, not the training script, the right place to attach that evidence?

Quick check

Q1. What does a model registry add on top of experiment tracking?

Q2. Why is the promotion gate the most important part of a registry?

Q3. You move a model to the 'Production' stage in the registry. Is it now serving traffic?

A question to carry forward

We just made the promotion gate the hero of the registry — the choke point that demands evidence before a model reaches production. But say that word back slowly: evidence. The gate refuses to promote a model that hasn’t proven itself — and yet we never said what “proven” actually means, or who checks it.

That is the hole this chapter has been circling. A gate is only as trustworthy as the tests standing behind it, and “the model got 0.91 F1” is nowhere near enough evidence — the data could be corrupt, a slice could be failing, the serving features could mismatch training, and the aggregate number would never blink. So the question to carry forward is: what does it actually take to prove an ML system is production-ready — across its data, its model, its infrastructure, and its monitoring — so the gate has something real to check? That is ML testing and the ML Test Score, and it is the next lesson.
