MLOps Medium

What is training-serving skew, and how does a feature store help prevent it?

For: ML Engineer, MLOps Engineer, Data Scientist

The short answer

Training-serving skew is any mismatch between how features are computed during training and how they are computed at serving time, which silently degrades a model that looked fine offline. It arises when offline and online feature logic are implemented separately, for example, a rolling average computed over a different window in each path. A feature store prevents it by keeping a single feature definition used for both batch training and online serving, so the same values and logic apply in both, and it supports point-in-time-correct retrieval to avoid leakage.

How to think about it

The short answer

Training-serving skew is any discrepancy between how features are computed during training and during serving. It silently degrades a model that looked great offline, and it’s notoriously hard to debug. A feature store prevents it by holding a single feature definition used for both batch training and online serving — same logic, same values.

Why it happens

The classic cause is two implementations of the same feature: a Python/SQL job computes user_avg_purchase_30d for training, and a separate online service recomputes it at request time — over a slightly different window, time zone, or null-handling rule. The model trained on one definition is served the other, so its inputs no longer match what it learned. As the Nubank engineering team notes, these mismatches cause “catastrophic and hard-to-debug” performance problems.
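To make the failure mode concrete, here is a minimal sketch of two separately written implementations that disagree only on null handling. The function names, the sample data, and the specific rules (drop missing values vs. treat them as zero) are assumptions made for illustration, not taken from any real system.

```python
from statistics import mean

# Hypothetical purchase amounts in the window; None marks a missing record.
purchases = [20.0, None, 40.0, None, 60.0]

def avg_purchase_training(amounts):
    """Offline job (assumed rule): drop missing values before averaging."""
    present = [a for a in amounts if a is not None]
    return mean(present) if present else 0.0

def avg_purchase_serving(amounts):
    """Online service (assumed rule): treat missing values as 0.0."""
    return mean(0.0 if a is None else a for a in amounts) if amounts else 0.0

train_value = avg_purchase_training(purchases)
serve_value = avg_purchase_serving(purchases)

# Same raw data and same feature name, but two different inputs to the model.
print(f"training sees {train_value}, serving sees {serve_value}")
```

Neither implementation is wrong on its own. The skew comes from the two paths not sharing one rule.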

How a feature store helps

One definition, two paths: the store computes/serves the same feature logic offline and online, so definitions can’t drift apart.
Point-in-time-correct retrieval: when building training data it joins feature values as they were at the event time, preventing label leakage from future data.
Reuse & governance: features are documented and shared, so teams stop re-implementing (and re-diverging) the same logic.
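The first two points can be sketched in plain Python. This is not the API of any particular feature store: `user_avg_purchase_30d`, `build_training_row`, and `serve_features` are hypothetical helpers, and the window rule (the 30 days strictly before the reference time) is an assumption chosen for the example. The point is that both paths call one function, and the offline path evaluates it as of the event time.

```python
from datetime import datetime, timedelta

def user_avg_purchase_30d(purchases, as_of):
    """Single feature definition (assumed rule): mean purchase amount in the
    30 days strictly before `as_of`. `purchases` is a list of (timestamp, amount)."""
    start = as_of - timedelta(days=30)
    window = [amount for ts, amount in purchases if start <= ts < as_of]
    return sum(window) / len(window) if window else 0.0

def build_training_row(purchases, event_time, label):
    # Offline path: evaluate the feature as of the labelled event's time,
    # so purchases made after the event cannot leak into the training row.
    return {"user_avg_purchase_30d": user_avg_purchase_30d(purchases, event_time),
            "label": label}

def serve_features(purchases, request_time):
    # Online path: the same function, evaluated at request time.
    return {"user_avg_purchase_30d": user_avg_purchase_30d(purchases, request_time)}

# Hypothetical purchase history for one user.
history = [
    (datetime(2024, 1, 5), 10.0),
    (datetime(2024, 1, 20), 30.0),
    (datetime(2024, 2, 10), 100.0),  # happens after the training event below
]

event_time = datetime(2024, 1, 25)
row = build_training_row(history, event_time, label=1)
served_then = serve_features(history, event_time)
served_later = serve_features(history, datetime(2024, 2, 15))
print(row, served_then, served_later)
```

The training row ignores the February purchase because it had not happened at the event time, and a request served at that same moment would get the identical value.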

Concrete example

A model scores 0.92 AUC offline but 0.84 in production. Investigation finds the online service computed the 30-day average including the current day while training excluded it. Routing both paths through a feature store with one definition closes the gap.
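The window bug in this example is easy to reproduce on toy data. The sketch below assumes daily purchase totals in a list with today's total last. The numbers and function names are invented for illustration and are unrelated to the AUC figures above.

```python
# Hypothetical daily purchase totals, oldest first; the last entry is "today".
daily_totals = [10.0] * 30 + [310.0]

def avg_30d_excluding_today(totals):
    """Training-side logic in this scenario: the 30 days before today."""
    return sum(totals[-31:-1]) / 30

def avg_30d_including_today(totals):
    """Serving-side logic in this scenario: the latest 30 days, today included."""
    return sum(totals[-30:]) / 30

training_value = avg_30d_excluding_today(daily_totals)
serving_value = avg_30d_including_today(daily_totals)

# One large purchase today doubles the feature in the serving path only.
print(training_value, serving_value)
```

An off-by-one in the window boundary is enough to shift the feature whenever the current day differs from the user's recent history.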

Common follow-up / trap

A sharp follow-up: “Does a feature store fully eliminate skew?” No — skew can still creep in from raw upstream data differences, preprocessing outside the store, or stale online values. A feature store removes the definitional mismatch, but you still need monitoring that compares training vs serving feature distributions. The trap is treating the feature store as a silver bullet rather than one (important) layer alongside data contracts and skew monitoring.
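One possible way to do that monitoring is to compare a feature's training sample against logged serving values with a population stability index (PSI). PSI is a technique chosen here for illustration; the text above does not prescribe a specific metric. The binning scheme, the `eps` floor for empty bins, and the sample data are all assumptions of this sketch, and any alert threshold would need tuning per feature.

```python
import math

def population_stability_index(train_values, serve_values, bins=10, eps=1e-6):
    """PSI between two samples, using equal-width bins spanning the training range.
    Serving values outside that range are clamped into the first or last bin."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(train_values), fractions(serve_values)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical feature samples: serving values are shifted upward by 30.
train_sample = [float(v) for v in range(100)]
serve_sample = [v + 30.0 for v in train_sample]

psi_same = population_stability_index(train_sample, train_sample)
psi_shifted = population_stability_index(train_sample, serve_sample)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and the score grows as the serving distribution moves away from the training one. This makes it usable as a per-feature alert signal alongside data contracts.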
