datarekha

Feature stores — when you need one, when you don't

The online/offline skew problem in one sentence, then a runnable Feast-shaped feature pipeline. Plus the honest answer to 'do we need a feature store?' (usually: no — until you do).

10 min read Advanced MLOps Lesson 17 of 17

What you'll learn

  • The training/serving skew problem — why a SQL query that worked offline silently breaks online
  • Offline store vs online store — the two halves and why both exist
  • Defining a feature view with a Feast-shaped API
  • `get_historical_features` (training) vs `get_online_features` (serving) — same definition, two reads
  • When a feature store is the right answer, and when it's premature infrastructure

Before you start

A senior ML engineer once described the problem like this. The data scientist wrote a SQL query that joined three tables, computed a 30-day rolling average of user activity, and trained a churn model that hit 0.87 AUC. Six months later, the same model in production was getting 0.71 AUC, and nobody could figure out why.

It turned out the production feature service computed the same “30-day rolling average” using a slightly different time window: `now() - INTERVAL 30 DAY` instead of the training query’s `event_date BETWEEN snapshot_date - 30 AND snapshot_date`. The features look the same. They aren’t.

That’s training/serving skew. It’s one of the most common failure modes in production ML, and it’s what feature stores were invented to fix.
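To make the gap concrete, here is a minimal sketch of the two windows applied to the same data. The activity log and the function names are hypothetical, and the serving window is one plausible reading of `now() - INTERVAL 30 DAY` (an exclusive lower bound); the training window follows SQL `BETWEEN`, which includes both ends.

```python
from datetime import date, timedelta

# Hypothetical activity log: one session every second day for a single user.
session_days = [date(2024, 1, 1) + timedelta(days=i) for i in range(0, 60, 2)]

def sessions_training(snapshot: date) -> int:
    """Training logic: event_date BETWEEN snapshot - 30 AND snapshot (both ends included)."""
    start = snapshot - timedelta(days=30)
    return sum(start <= d <= snapshot for d in session_days)

def sessions_serving(now: date) -> int:
    """Serving logic: event_date > now - 30 days (lower bound excluded)."""
    start = now - timedelta(days=30)
    return sum(start < d <= now for d in session_days)

day = date(2024, 1, 31)
print(sessions_training(day), sessions_serving(day))  # 16 15
```

Same user, same day, same “30-day” feature, and the two code paths already disagree by one session. In production the anchor usually differs too (wall-clock time versus a snapshot date), which widens the gap.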

The mental model

A feature store has two halves, and both halves answer the same question — “what was feature X for entity Y at time T?” — using very different infrastructure.

| Half | What it stores | Read latency | Used for | Typical backing |
|---|---|---|---|---|
| Offline store | Historical feature values, point-in-time | Minutes | Training, backfills | Parquet on S3, BigQuery, Snowflake, Delta |
| Online store | Latest feature values per entity | < 10 ms | Real-time inference | Redis, DynamoDB, Bigtable, RocksDB |

The whole point: one feature definition, two reads. You define user_thirty_day_activity once. Training reads it from the offline store with full history. Serving reads it from the online store at inference time. The store guarantees they’re computed the same way.

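Skew is ruled out by construction: both reads come from the same upstream computation, so there is no second implementation to drift. That holds only as long as nothing downstream of the store reimplements the feature.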

A feature view, Feast-shaped

Feast is the open-source default. The shape of its API has become the industry’s lingua franca even when teams use other stores.

# feature_repo/features.py — what your feature definitions actually look like
from datetime import timedelta
from feast import (
    Entity, FeatureView, Field, FileSource, ValueType, FeatureService,
)
from feast.types import Float32, Int64

# 1) Entities — the "thing" features are attached to.
user = Entity(name="user_id", value_type=ValueType.INT64, description="User primary key")

# 2) Source — where the offline data lives (here a Parquet file; in prod, BQ/Snowflake/S3).
user_activity_source = FileSource(
    path="s3://yourorg-features/user_activity.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_ts",
)

# 3) Feature view — the unit of definition. Names, types, freshness, TTL.
user_activity_fv = FeatureView(
    name="user_activity_features",
    entities=[user],
    ttl=timedelta(days=2),     # online value is stale after 2 days of no update
    schema=[
        Field(name="thirty_day_session_count", dtype=Int64),
        Field(name="seven_day_avg_session_minutes", dtype=Float32),
        Field(name="days_since_last_purchase", dtype=Int64),
        Field(name="lifetime_purchases", dtype=Int64),
    ],
    source=user_activity_source,
    online=True,
)

# 4) Feature service — what your model actually consumes (a "view of views")
churn_risk_v1 = FeatureService(
    name="churn_risk_v1",
    features=[user_activity_fv],
)

That’s the entire contract. The same user_activity_fv definition backs both training reads and online lookups.

Training — get_historical_features

For training, you have a list of (entity, label, timestamp) rows. You ask the store: “for each of these rows, give me the feature values as they were at that exact timestamp.”

This is point-in-time correctness — the killer feature. Without it, you accidentally use future information to predict the past, and your offline AUC ends up wildly higher than production performance.

# Training-time read — features as of each row's event_timestamp
from feast import FeatureStore
import pandas as pd

fs = FeatureStore(repo_path="feature_repo/")

# Your training spine: (user_id, label, when-the-prediction-was-made)
entity_df = pd.read_parquet("training_labels.parquet")
# columns: user_id, churned_within_30_days, event_timestamp

training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=fs.get_feature_service("churn_risk_v1"),
).to_df()
# training_df now has the original columns + the feature columns,
# each feature value taken at-or-before its row's event_timestamp.

Two non-obvious wins:

  • You can’t accidentally leak future state, because Feast enforces “no feature value with event_timestamp > the row’s timestamp.”
  • You can re-train on any historical date by changing entity_df’s timestamps. Time-travel is built in.
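The first guarantee can be sketched without Feast. This is a minimal illustration of an at-or-before lookup, not Feast's implementation: the feature history and the `value_as_of` helper are hypothetical, and a real store would also apply the feature view's TTL.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history for one user: (event_timestamp, thirty_day_session_count),
# sorted by timestamp.
history = [
    (datetime(2024, 1, 1), 10),
    (datetime(2024, 1, 15), 18),
    (datetime(2024, 2, 1), 27),
]

def value_as_of(history, ts):
    """Return the latest value whose event_timestamp is <= ts, or None if there is none."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of entries at or before ts
    return history[idx - 1][1] if idx else None

print(value_as_of(history, datetime(2024, 1, 20)))  # 18
print(value_as_of(history, datetime(2023, 12, 1)))  # None
```

A label row stamped 20 January sees the value written on 15 January, never the one from 1 February. A naive join on `user_id` alone would have picked up the later value and leaked it into training.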

Serving — get_online_features

For real-time inference, you have a single entity (a user_id) and need the latest values, fast.

# Serving-time read — latest values, low latency
features = fs.get_online_features(
    features=fs.get_feature_service("churn_risk_v1"),
    entity_rows=[{"user_id": 4242}],
).to_dict()

# features = {
#   "user_id": [4242],
#   "thirty_day_session_count": [27],
#   "seven_day_avg_session_minutes": [13.4],
#   "days_since_last_purchase": [9],
#   "lifetime_purchases": [4],
# }

# `model` is assumed to be a trained classifier loaded elsewhere (e.g. at service startup).
model_input = [
    features["thirty_day_session_count"][0],
    features["seven_day_avg_session_minutes"][0],
    features["days_since_last_purchase"][0],
    features["lifetime_purchases"][0],
]
score = model.predict_proba([model_input])[0][1]
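Hand-indexing each feature works for four columns but is easy to get out of order. Below is a small sketch of a helper that builds the vector from a fixed feature order. It assumes the online read can return `None` for an entity that is missing or past its TTL; `FEATURE_ORDER`, `DEFAULTS`, and `build_model_input` are hypothetical names, and the fallback values are placeholders you would choose per feature.

```python
# The order must match the column order the model was trained on.
FEATURE_ORDER = [
    "thirty_day_session_count",
    "seven_day_avg_session_minutes",
    "days_since_last_purchase",
    "lifetime_purchases",
]

# Hypothetical fallbacks, used when the online store has no fresh value.
DEFAULTS = {
    "thirty_day_session_count": 0,
    "seven_day_avg_session_minutes": 0.0,
    "days_since_last_purchase": 0,
    "lifetime_purchases": 0,
}

def build_model_input(features: dict, row: int = 0) -> list:
    """Turn the dict-of-lists shape shown above into one ordered feature vector."""
    vector = []
    for name in FEATURE_ORDER:
        value = features[name][row]
        vector.append(DEFAULTS[name] if value is None else value)
    return vector

example = {
    "user_id": [4242],
    "thirty_day_session_count": [27],
    "seven_day_avg_session_minutes": [None],  # simulated missing value
    "days_since_last_purchase": [9],
    "lifetime_purchases": [4],
}
print(build_model_input(example))  # [27, 0.0, 9, 4]
```

Whatever defaults you pick, apply the same ones when building the training set, or the fallback itself becomes a source of skew.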

A separate job, feast materialize (or feast materialize-incremental), copies the latest feature values from the offline store into the online store (hence “materialize”: turning a logical definition into stored, ready-to-read values). Feast does not schedule this for you; you run it periodically from whatever scheduler you already use. Alongside the online store itself, that job is the main piece of infrastructure the feature store asks you to operate.

A runnable worked example

You can’t run the full Feast stack in the browser (it needs an online store and a materialize job), but you can run a faithful shape-of-it that demonstrates the duality. Same feature definition, two reads, both giving you the right answer.
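A minimal sketch of such a simulation in plain Python follows. Everything in it is a stand-in: the event data is made up, and `materialize`, `materialize_online`, `get_historical_features`, and `get_online_features` only borrow Feast's names to show the shape, not its behaviour or signatures.

```python
from datetime import datetime, timedelta

# Hypothetical raw events: (user_id, session_start).
EVENTS = [
    (4242, datetime(2024, 1, 5)),
    (4242, datetime(2024, 1, 20)),
    (4242, datetime(2024, 2, 10)),
    (7, datetime(2024, 2, 1)),
]

def materialize(user_id, as_of):
    """The single feature definition: sessions in the 30 days up to and including as_of."""
    start = as_of - timedelta(days=30)
    count = sum(1 for uid, ts in EVENTS if uid == user_id and start < ts <= as_of)
    return {"thirty_day_session_count": count}

def get_historical_features(entity_rows):
    """Offline read: each row gets the feature value as of its own event_timestamp."""
    return [
        {**row, **materialize(row["user_id"], row["event_timestamp"])}
        for row in entity_rows
    ]

ONLINE_STORE = {}  # stand-in for Redis/DynamoDB: user_id -> latest feature values

def materialize_online(now):
    """Scheduled job: write the latest values per user into the online store."""
    for user_id in {uid for uid, _ in EVENTS}:
        ONLINE_STORE[user_id] = materialize(user_id, now)

def get_online_features(user_id):
    """Online read: a key lookup, with no computation at request time."""
    return ONLINE_STORE.get(user_id)

training_rows = get_historical_features([
    {"user_id": 4242, "event_timestamp": datetime(2024, 1, 10)},
    {"user_id": 4242, "event_timestamp": datetime(2024, 2, 15)},
])
print([r["thirty_day_session_count"] for r in training_rows])  # [1, 2]

materialize_online(datetime(2024, 2, 15))
print(get_online_features(4242))  # {'thirty_day_session_count': 2}
```

The training row stamped 15 February and the online lookup after a 15 February materialization return the same value, because both go through one definition.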

The thing to internalise from that simulation: the same materialize() function powers both reads. Real Feast doesn’t compute on the fly; it materializes into the online store on a schedule. But the contract is identical.

When a feature store is the right answer

A feature store earns its weight when several of these are true:

  • You have many models reusing the same features. The “definition once, reads everywhere” win compounds with reuse. With one model and four features, the wins are theoretical.
  • You serve in real time (single-digit ms latency budgets). You need an online store.
  • You’ve already shipped a model that drifted because training features and serving features diverged. You know the pain.
  • Multiple teams contribute features and you need governance: ownership, lineage, freshness SLAs.
  • You have point-in-time correctness needs — temporal features where using future data leaks.

When it’s premature infrastructure

Equally true:

  • You have under five production models. The shared-definition payoff is small.
  • You don’t serve in real time. If your inference is batch (nightly scoring), the offline store is your feature store — Parquet on S3 with snapshot dates does the job, and adding Feast on top is ceremony.
  • Your features are simple aggregates from a single source. A view in Snowflake or BigQuery, materialized nightly, is cheaper to operate than a feature store.
  • Nobody on the team will own the feature store. Like Kubeflow, Feast/Tecton needs an owner. An unmaintained feature store decays fast.

The managed vs. open-source landscape

| Option | What it is | Sweet spot |
|---|---|---|
| Feast | Open-source, BYO infra (Redis + S3 + your own materialize) | Teams with existing cloud infra; flexibility over hand-holding |
| Tecton | Managed feature platform with streaming, transformations, SLA monitoring | Real-time, multi-team, complex transformations |
| Databricks Feature Engineering | Feature store inside Databricks workspace | Already on Databricks; want one less integration |
| SageMaker Feature Store | AWS-native feature store with offline + online stores | Heavy SageMaker shops |
| Vertex AI Feature Store | GCP-native equivalent | Heavy Vertex shops |
| Just a warehouse view | Materialized SQL, no special framework | Batch scoring, fewer than 5 models, no real-time needs |

The pattern: the more models, teams, and real-time needs you have, the more justification for a managed (Tecton, Databricks, SageMaker) option. The more you’re a one-team shop with batch needs, the more “a table in Snowflake” is the right answer.

What never changes — the feature contract

Whatever you choose, the idea is the part you internalise:

  1. A feature is a named, typed, owned quantity attached to an entity at a timestamp.
  2. The definition of that feature is code — versioned, reviewed, tested.
  3. Training and serving read the same definition, never reimplement it.

You can do all three without Feast. You can fail at all three with Feast. The infrastructure is downstream of the discipline.
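As a rough illustration of the contract without any framework, here is a sketch in which a feature is a small typed, owned object and both code paths call the same method. `FeatureDefinition`, the owner string, and the row format are all hypothetical, and timestamp handling is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    dtype: type
    owner: str
    entity: str
    compute: Callable[[list], object]  # raw rows for one entity -> feature value

    def value(self, rows: list):
        """Compute the feature and check that it has the declared type."""
        result = self.compute(rows)
        if not isinstance(result, self.dtype):
            raise TypeError(
                f"{self.name}: expected {self.dtype.__name__}, got {type(result).__name__}"
            )
        return result

lifetime_purchases = FeatureDefinition(
    name="lifetime_purchases",
    dtype=int,
    owner="growth-ml",  # hypothetical team name
    entity="user_id",
    compute=lambda rows: sum(1 for r in rows if r["type"] == "purchase"),
)

rows = [{"type": "purchase"}, {"type": "session"}, {"type": "purchase"}]
training_value = lifetime_purchases.value(rows)  # called from the training job
serving_value = lifetime_purchases.value(rows)   # called from the serving path
print(training_value, serving_value)  # 2 2
```

Because the definition lives in one module, it can be versioned, reviewed, and unit-tested like any other code, which is the discipline the list above describes.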

Quick check

Q1. What's the core problem a feature store exists to solve?
Q2. What does point-in-time correctness mean in the context of `get_historical_features`?
Q3. You have one ML model, batch-scored nightly, with features that come from a single Snowflake table. Should you adopt Feast?

Next

You’ve now got the data side (features) and the serving side (FastAPI, K8s) wired up. The next thing that bites a real production system is drift — when the world quietly changes underneath your trained model.

Practice this in an interview

What is a feature store and why is it critical for production ML systems?

A feature store is a shared data platform that computes, stores, and serves ML features consistently for both training and serving. It eliminates training-serving skew by ensuring the same transformation code runs in both contexts, and it reduces duplicated work by letting teams share and discover features across models.

Your model performs well offline but degrades in production. How do you diagnose and fix it?

The most common cause is training-serving skew: the distribution of features at serving time differs from the training data. The fix requires instrumenting the pipeline to log serving inputs, compare their distribution to training data, and identify whether the gap is due to data drift, feature engineering bugs, label leakage, or infrastructure inconsistencies.

What is train/serve skew and how do you prevent it?

Train/serve skew occurs when the feature values a model sees at training time differ from those it sees at inference time, even for the same raw input. Typical causes are divergent preprocessing code paths, different data sources, or temporal leakage. It silently degrades performance without raising obvious errors. Prevention comes down to having training and serving read one shared feature definition rather than reimplementing it in each place.

What is feature leakage and how do you prevent it during feature engineering and preprocessing?

Feature leakage occurs when information from the test set or from the future leaks into training features, making a model appear more accurate than it will be in production. It arises from fitting preprocessing steps on the full dataset, using post-event information as a predictor, or computing aggregates across train-test boundaries. Prevention requires strict pipeline discipline: all stateful transformations must be fit only on training data.
