MLOps | Medium | Asked at Microsoft, Amazon, Meta, and Databricks

How do Docker and ONNX complement each other for packaging and deploying ML models portably?

For: MLOps Engineer, ML Engineer, AI / LLM Engineer

The short answer

Docker packages the user-space runtime environment (OS libraries, Python version, system packages) so the model runs in the same environment on every compatible host. ONNX provides a framework- and hardware-neutral model format, so a model trained in PyTorch can be executed by a high-performance runtime such as ONNX Runtime without the training framework as a dependency.

How to think about it

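Docker solves environment reproducibility. A model that depends on a specific CUDA version, NumPy ABI, or shared library can crash, or silently produce wrong results, when it is moved to a host with different versions. A container image pins those user-space dependencies. Image layers are content-addressed and immutable, so deploying by image digest (tags can be repointed) gives every deployment exactly the same runtime. The kernel and, for GPU workloads, the NVIDIA driver still come from the host and must be compatible with the CUDA libraries inside the image.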
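
One way to make environment drift visible is to log a fingerprint of the runtime at startup and compare it across hosts or image builds. The sketch below is illustrative only: `environment_fingerprint` is a hypothetical helper, not part of Docker or any library, and the package list you pass in is your own choice.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Collect interpreter, platform, and package versions plus a stable digest."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment

    info = {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }
    # Hash a canonical JSON encoding so identical environments give identical digests
    canonical = json.dumps(info, sort_keys=True).encode("utf-8")
    info["digest"] = hashlib.sha256(canonical).hexdigest()
    return info


if __name__ == "__main__":
    print(json.dumps(environment_fingerprint(["numpy", "onnxruntime"]), indent=2))
```

Two containers started from the same image should report the same digest. A differing digest on a bare host points at the kind of version mismatch the container is meant to remove.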

ONNX (Open Neural Network Exchange) solves framework lock-in. After training in PyTorch or TensorFlow you export to .onnx, then serve with ONNX Runtime, a lean, optimised inference engine whose execution providers target CPU, CUDA, TensorRT, CoreML, and OpenVINO. The training framework does not need to be installed in the serving container, which cuts image size and attack surface.

Together they form a clean two-layer packaging strategy:

Export layer — convert the trained model to ONNX once, version and push to the model registry.
Serving layer — a minimal Docker image containing only ONNX Runtime and the serving code.
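
For the export layer, "version" can be as simple as recording a content hash next to the file before pushing it. The following is a minimal sketch under that assumption. `build_manifest` and the manifest fields are hypothetical, and the upload to the registry is left out because it depends on the registry you use.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(model_path, opset, chunk_size=1 << 20):
    """Describe an exported model file by name, size, SHA-256 digest, and opset."""
    path = Path(model_path)
    digest = hashlib.sha256()
    # Read in chunks so large model files are not loaded into memory at once
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file": path.name,
        "sha256": digest.hexdigest(),
        "size_bytes": path.stat().st_size,
        "opset": opset,
    }


if __name__ == "__main__":
    # Assumes model.onnx was produced by the export step
    print(json.dumps(build_manifest("model.onnx", opset=17), indent=2))
```

The serving image can then verify the digest of the model it copies in, so a given image is tied to exactly one exported artifact.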

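# Export PyTorch model to ONNX
import torch

# `model` is assumed to be your trained torch.nn.Module taking a (batch, 128) float tensor
model.eval()
dummy = torch.randn(1, 128)  # example input used to trace the model
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["features"],
    output_names=["logit"],
    # mark the batch dimension as dynamic on both the input and the output
    dynamic_axes={"features": {0: "batch_size"}, "logit": {0: "batch_size"}},
    opset_version=17,
)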
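
Before the exported file goes to the registry, it is worth checking that ONNX Runtime reproduces the PyTorch outputs. The sketch below assumes the input and output names from the export above ("features", "logit"). `max_abs_diff` and `check_parity` are hypothetical helpers, and the tolerance is an arbitrary starting point rather than a recommended value.

```python
def max_abs_diff(expected, actual):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(expected) != len(actual):
        raise ValueError(f"length mismatch: {len(expected)} vs {len(actual)}")
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0.0)


def check_parity(model, onnx_path, batch, atol=1e-4):
    """Run the same CPU float32 batch through PyTorch and ONNX Runtime and compare."""
    # Imported lazily so max_abs_diff stays usable without these packages installed
    import torch
    import onnxruntime as ort

    model.eval()
    with torch.no_grad():
        expected = model(batch).flatten().tolist()

    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    actual = sess.run(["logit"], {"features": batch.numpy()})[0].flatten().tolist()

    diff = max_abs_diff(expected, actual)
    if diff > atol:
        raise AssertionError(f"ONNX output differs from PyTorch by {diff}")
    return diff
```

Calling `check_parity(model, "model.onnx", torch.randn(4, 128))` with a batch size other than the one used for export also exercises the dynamic batch axis.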

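# Minimal serving image (no PyTorch needed)
FROM python:3.12-slim
WORKDIR /app
# CPU build of ONNX Runtime; numpy is pinned below 2 because onnxruntime 1.18.0 predates NumPy 2.0
RUN pip install --no-cache-dir onnxruntime==1.18.0 "numpy<2" fastapi uvicorn
COPY model.onnx serve.py ./
EXPOSE 8080
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8080"]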

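# serve.py
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()

# Prefer CUDA when this onnxruntime build offers it, otherwise fall back to CPU
preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]
sess = ort.InferenceSession("model.onnx", providers=providers)

@app.post("/predict")
def predict(features: list[float]):
    # FastAPI reads a lone list-typed parameter from the JSON request body
    arr = np.array([features], dtype=np.float32)  # shape (1, n_features)
    logit = sess.run(["logit"], {"features": arr})[0]
    return {"logit": float(logit[0, 0])}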
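
To smoke-test the running container, a client only needs the standard library. This sketch assumes the service is reachable at http://localhost:8080 (the port comes from the image above, the host is an assumption). It also assumes the request body is a bare JSON array, since the endpoint declares a single list-typed parameter. `build_predict_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8080"):
    """Build a POST request whose body is the feature vector as a JSON array."""
    body = json.dumps([float(x) for x in features]).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://localhost:8080", timeout=5.0):
    """Send the features to the service and return the logit from the response."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["logit"]


if __name__ == "__main__":
    # Requires the serving container to be running and publishing port 8080
    print(predict([0.0] * 128))
```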
