datarekha
MLOps Medium Asked at DatabricksAsked at NetflixAsked at AirbnbAsked at StripeAsked at Lyft

What is a model registry, and how does model versioning work in production ML systems?

The short answer

A model registry is a centralised store that tracks every trained model artifact alongside its metadata — hyperparameters, training data version, evaluation metrics, and lineage. Versioning assigns unique identifiers to each artifact and manages lifecycle stages so teams can promote, roll back, and audit models without manual file management.

How to think about it

A model registry is the single source of truth for trained model artifacts. Without one, models are files dropped in S3 buckets with ad-hoc names, training metadata lives only in someone’s notebook, and rollback means hunting for the right .pkl file.

Core capabilities:

  • Artifact storage — immutable versioned blobs (weights, ONNX files, tokenizers) stored with content-addressable hashing.
  • Metadata — git commit, dataset version, hyperparameters, evaluation metrics, training duration, hardware used.
  • Lifecycle stagesStaging → Production → Archived. Promotion requires a review step; demotion is one API call.
  • Lineage — links a deployed model version back to the exact training run, data snapshot, and code commit.
  • Aliases / aliases tagging — a champion alias always points to the current production version, so serving code references champion rather than a hard-coded version number.
import mlflow

# Log run and register model
with mlflow.start_run() as run:
    mlflow.log_params({"lr": 0.01, "depth": 6})
    mlflow.log_metric("auc", 0.923)
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="fraud-detector",
    )

# Promote to Production via API
from mlflow import MlflowClient
client = MlflowClient()
client.transition_model_version_stage(
    name="fraud-detector",
    version=7,
    stage="Production",
    archive_existing_versions=True,
)
# Serving config references the alias, not a hard version
model:
  name: fraud-detector
  stage: Production  # resolved at load time — survives rollouts

Rollback is instant: call transition_model_version_stage to restore the previous version to Production. The serving fleet re-loads without a code deploy.

Keep practising

All MLOps questions

Explore further

Skip to content