MLOps · Medium · Asked at Netflix, Uber, LinkedIn, Spotify, DoorDash

What are the differences between batch, online, and streaming inference, and when should you use each?

For: MLOps Engineer, ML Engineer, AI / LLM Engineer

The short answer

Batch inference runs predictions on large datasets on a schedule, optimizing for throughput. Online inference serves individual requests in real time, optimizing for low latency. Streaming inference processes continuous event streams with bounded latency requirements between the two extremes.

How to think about it

Batch inference runs a model over a large dataset on a schedule (hourly, nightly). Predictions are stored and looked up later. Ideal when results are not needed at request time — recommendation pre-computation, fraud scoring on daily transactions, churn propensity scores.

Online inference serves a single request synchronously under a strict latency SLA (p99 under 100 ms is common). Used when the model needs context only available at request time: real-time fraud detection, search ranking, autocomplete.

Streaming inference consumes from a message queue (Kafka, Kinesis) and scores records as they arrive. Sits between batch and online: sub-second latency, higher throughput than pure request/response. Common for click-stream feature aggregation or IoT anomaly detection.

The three inference modes trade throughput for latency across a spectrum.

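# Batch: score a DataFrame offline
import joblib
import pandas as pd

# Hypothetical column names; use the columns the model was trained on.
FEATURE_COLS = ["feature_a", "feature_b"]

model = joblib.load("model.pkl")
df = pd.read_parquet("s3://data/features/2026-06-06.parquet")
# Column 1 of predict_proba is the positive-class probability for each row.
df["score"] = model.predict_proba(df[FEATURE_COLS])[:, 1]
df.to_parquet("s3://data/scores/2026-06-06.parquet")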

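# Online: FastAPI endpoint
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load the model once at startup rather than on every request.
model = joblib.load("model.pkl")

class Request(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: Request):
    # Wrap the single feature vector in a list so the model receives 2-D input.
    score = model.predict_proba([req.features])[0, 1]
    return {"score": float(score)}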
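
Streaming inference can be sketched in the same spirit. The version below is a minimal illustration, not a production consumer: an in-memory iterable stands in for a Kafka or Kinesis topic, `MeanModel` is a hypothetical stub in place of a trained model, and `score_stream`, `_flush`, and `max_batch` are names invented for this example. It scores events in small micro-batches as they arrive, which is one common way to get higher throughput than one-request-at-a-time serving.

```python
from typing import Iterable, Iterator


class MeanModel:
    """Hypothetical stand-in for a trained model: scores a row by its mean."""

    def predict(self, rows: list[list[float]]) -> list[float]:
        return [sum(row) / len(row) for row in rows]


def _flush(buffer: list[dict], model) -> list[dict]:
    """Score every buffered event in a single model call."""
    scores = model.predict([event["features"] for event in buffer])
    return [{"id": e["id"], "score": s} for e, s in zip(buffer, scores)]


def score_stream(events: Iterable[dict], model, max_batch: int = 3) -> Iterator[dict]:
    """Consume events in arrival order and yield scored records.

    Events are buffered into micro-batches of up to max_batch items.
    Any leftover events are scored when the stream ends.
    """
    buffer: list[dict] = []
    for event in events:
        buffer.append(event)
        if len(buffer) >= max_batch:
            yield from _flush(buffer, model)
            buffer = []
    if buffer:
        yield from _flush(buffer, model)


if __name__ == "__main__":
    # Assumed event shape: an id plus a list of numeric features.
    stream = [{"id": i, "features": [float(i), float(i) + 2.0]} for i in range(4)]
    for record in score_stream(stream, MeanModel()):
        print(record)
```

In a real deployment the loop would read from a message-queue consumer, and the buffer would usually also flush on a timer so that a slow trickle of events does not wait indefinitely for the batch to fill.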
