datarekha
MLOps Medium

How does DVC differ from a feature store, and when would you reach for each?

The short answer

DVC and lakeFS version raw datasets and model artifacts as immutable snapshots, giving reproducibility and rollback; DVC ties each snapshot to a Git commit, while lakeFS keeps its own Git-like commit history. A feature store manages computed features for training and serving; its main job is to keep offline and online feature definitions in sync to prevent training-serving skew. They are complementary: DVC answers “what data made this model?”, while a feature store answers “how do I serve the same features consistently?”

How to think about it

They solve different problems. DVC and lakeFS version datasets and artifacts: each version is an immutable snapshot tied to a commit, which gives you reproducibility and rollback. A feature store manages computed features, and its headline job is keeping training (offline) and serving (online) feature logic identical to prevent training-serving skew.

Why the distinction matters

Data versioning is about lineage and reproducibility: “which exact bytes produced this model, and can I roll back?” A feature store is about consistency and reuse at serving time: the same user_avg_purchase_30d must be computed the same way in your batch training job and your low-latency online endpoint. As a Databricks overview notes, feature stores also add discovery, documentation, and governance so features get reused instead of rebuilt.
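To make the consistency point concrete, here is a minimal sketch in plain Python. It assumes a single shared feature function that both the batch training job and the online endpoint call; the purchase history, the dates, and the `window_days` parameter are hypothetical and only illustrate the idea. A real feature store does this at scale, with storage and serving infrastructure behind it.

```python
from datetime import datetime, timedelta


def user_avg_purchase_30d(purchases, as_of, window_days=30):
    """Average purchase amount in the `window_days` before `as_of`.

    `purchases` is a list of (timestamp, amount) pairs for one user.
    Returns 0.0 when the window contains no purchases.
    """
    start = as_of - timedelta(days=window_days)
    amounts = [amount for ts, amount in purchases if start <= ts < as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0


# Hypothetical purchase history for one user.
history = [
    (datetime(2024, 1, 5), 40.0),   # falls outside the 30-day window
    (datetime(2024, 1, 20), 60.0),
    (datetime(2024, 2, 1), 90.0),
    (datetime(2024, 2, 8), 30.0),
]
as_of = datetime(2024, 2, 10)

# Offline (training) and online (serving) paths share one definition.
offline_value = user_avg_purchase_30d(history, as_of)
online_value = user_avg_purchase_30d(history, as_of)

# A drifted online implementation that silently uses a 7-day window.
skewed_value = user_avg_purchase_30d(history, as_of, window_days=7)

print(offline_value, online_value, skewed_value)
```

With one definition, the offline and online values agree. The 7-day variant returns a different number for the same user at the same moment, which is exactly what training-serving skew looks like.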

Concrete example

  • You retrain a churn model and accuracy drops. DVC lets you diff the dataset snapshot against last week’s to find a corrupted partition, and roll back.
  • Your model scores 0.92 offline but 0.84 in production. That smells like feature store territory: the online service computed a rolling average over a different window than the training pipeline did.
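The first scenario can be sketched with content hashes. This is not how DVC is implemented; it is a simplified illustration of the idea that a snapshot is a manifest of content hashes, so comparing two snapshots tells you which partitions changed. The partition names and bytes below are hypothetical. In a real project you would use DVC's own tooling, such as `dvc diff`, rather than rolling your own.

```python
import hashlib


def snapshot_manifest(partitions):
    """Map each partition name to a SHA-256 digest of its bytes."""
    return {
        name: hashlib.sha256(data).hexdigest()
        for name, data in partitions.items()
    }


def diff_manifests(old, new):
    """List partitions added, removed, or modified between two snapshots."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(name for name in shared if old[name] != new[name]),
    }


# Hypothetical in-memory partitions standing in for files on disk.
last_week = {
    "2024-01.csv": b"id,churned\n1,0\n2,1\n",
    "2024-02.csv": b"id,churned\n3,0\n",
}
this_week = {
    "2024-01.csv": b"id,churned\n1,0\n2,1\n",
    "2024-02.csv": b"id,churned\n3,\x00\n",  # corrupted value
    "2024-03.csv": b"id,churned\n4,1\n",
}

changes = diff_manifests(
    snapshot_manifest(last_week), snapshot_manifest(this_week)
)
print(changes)
```

Here the diff flags `2024-02.csv` as modified and `2024-03.csv` as added, which narrows the search for the corrupted partition to a single file.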

When to reach for each

  • Reach for DVC/lakeFS whenever you need reproducible experiments, dataset rollback, or audit trails — every serious project should have this.
  • Reach for a feature store (Feast, Tecton, Databricks Feature Store) when you have real-time inference and features shared across teams. For a pure batch project with one team, it’s often overkill.

Common follow-up / trap

The trap is claiming a feature store “fixes” reproducibility or replaces data versioning; it does neither. A feature store still needs point-in-time-correct snapshots to avoid label leakage, that is, training on feature values that were not yet available at the label’s timestamp, and you still need artifact versioning for the model itself. The mature answer is “both, layered”: version raw data with DVC, serve features through a store, and stamp the model with both the dataset version and the feature-definition version.
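Point-in-time correctness is easier to see in code. The sketch below uses only the standard library and assumes each feature value is stored with the timestamp at which it became known; the function name `feature_as_of` and the sample history are hypothetical. Feature stores provide this kind of lookup as a point-in-time join over whole tables.

```python
from bisect import bisect_right
from datetime import datetime


def feature_as_of(feature_history, event_time):
    """Return the latest feature value recorded at or before `event_time`.

    `feature_history` is a list of (timestamp, value) pairs sorted by
    timestamp. Returns None when no value was known yet.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, event_time)
    return feature_history[idx - 1][1] if idx else None


# Hypothetical history of one user's feature value.
history = [
    (datetime(2024, 1, 1), 42.0),
    (datetime(2024, 2, 1), 55.0),
    (datetime(2024, 3, 1), 12.0),  # computed after the label was observed
]
label_time = datetime(2024, 2, 15)

# Correct: only use what was known when the label was observed.
point_in_time_value = feature_as_of(history, label_time)

# Leaky: taking the latest value pulls in information from the future.
naive_latest_value = history[-1][1]

print(point_in_time_value, naive_latest_value)
```

The point-in-time lookup returns the February value, while the naive "latest value" join returns the March value, which the model could never have seen at prediction time.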
