datarekha
Infrastructure May 3, 2026

Feature stores in 2026: Tecton, Feast, Hopsworks — death and rebirth

The feature store hype cycle went from peak (2021) to trough (2023, 'just use dbt') and then into a quieter rebirth as the data-for-AI layer for both classical ML and LLM agents. Tecton sold to Databricks. Feast survives as the open-source baseline. Hopsworks redefined itself as an AI lakehouse. Here's what the modern feature store actually is in 2026.

13 min read · by datarekha · feature-stores, tecton, feast, hopsworks

The feature store category, as a marketing category, is mostly dead. The last time a developer told me they were “evaluating feature stores” was late 2023, and they were comparing build-vs-buy with a tone of mild embarrassment. The conventional wisdom had moved decisively: feature stores were a 2021 hype-cycle artifact, the actual problem was solved by dbt for offline features and a Redis cache for online lookup, and the remaining vendor pitches were enterprise software in search of a problem.

Then the agent era arrived, and the same problem — get the right piece of contextual data into the model at the right time, with the right freshness — turned out to be the central infrastructure problem of RAG and agent systems. The vocabulary changed (it’s now “context retrieval” or “AI-data” or “the data layer for agents”) but the work under the hood is what feature stores had been doing all along, plus embeddings.

In August 2025, Databricks acquired Tecton explicitly to power AI-agent context. That deal closed the loop. The feature store didn’t die. It just stopped being marketed as one.

This post is about what actually happened, what the modern feature store does in 2026, and the three players still worth knowing — Tecton (now inside Databricks), Feast, and Hopsworks.

The hype cycle, briefly

The feature store category got its name from the Uber Michelangelo post in 2017, which described Uber’s internal feature store. By 2021 every ML company had one in development; every MLOps startup had one in the pitch deck; Tecton raised at over $1B; Feast had been donated to LF AI; and with Hopsworks, Continual, Iguazio, SageMaker Feature Store, and Vertex Feature Store all in the market, the category was crowded.

By 2023, the trough. The honest pushback from senior engineers was that the offline feature store was just a dbt model with a join, the online feature store was Redis with a sensible schema, and paying a vendor for the orchestration in between was a luxury most teams couldn’t justify. A handful of high-profile teardowns appeared — Chip Huyen wrote one, Tecton’s competitors wrote a couple, and the open-source flagship (Feast) lost commercial momentum.

By 2025-2026, the rebirth. Three trends compounded:

[Figure: Three workloads, one product category. Workload 1, classical ML features (fraud / churn / pricing): point-in-time joins, offline training set, online low-latency lookup at scoring. Workload 2, embeddings + vector retrieval (RAG / search): document vectors, hybrid search, freshness + metadata filter. Workload 3, agent context at inference (support / sales agents): live signals, personalisation, sub-100ms freshness, prompt enrichment. All three workloads need the same underlying primitives. That’s why the category came back.]
In 2022 these were three different vendor categories. In 2026 they are three workloads served by one product. The category came back because the underlying problem turned out to be the same.

The agent and RAG markets needed exactly the infrastructure that feature stores had been quietly building for half a decade: a way to ingest data from many sources, define transformations once and serve them across offline and online use, manage freshness and lineage, and expose the data with sub-second latency to a model at inference time. The vector database boom of 2023 solved part of the problem (the embedding part); the rest — fresh structured signals, joins, point-in-time correctness, governance — was still the feature store.
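Point-in-time correctness is the least intuitive of those primitives, so here is a minimal sketch of the idea in plain Python. It is not any vendor's API: the `feature_history` data, the `feature_as_of` helper, and the label events are all hypothetical, and a real feature store does this join at scale across many entities. The point is only that each training row sees the feature value that was known at the time of the event, never a later one.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one entity: (valid_from, value), sorted by time.
feature_history = [
    (datetime(2026, 1, 1), 120.0),
    (datetime(2026, 1, 10), 340.0),
    (datetime(2026, 1, 20), 90.0),
]


def feature_as_of(history, event_time):
    """Return the latest value written at or before event_time, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, event_time)  # number of entries with time <= event_time
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical labelled events: (event_time, label).
label_events = [
    (datetime(2025, 12, 31), 0),  # before any feature value exists
    (datetime(2026, 1, 10), 1),
    (datetime(2026, 1, 15), 0),
    (datetime(2026, 2, 1), 1),
]

# Each training row only sees what was known at its own timestamp.
training_rows = [
    (ts, feature_as_of(feature_history, ts), label) for ts, label in label_events
]
```

A naive join on "latest value" would hand every row the value 90.0, leaking data from the future into the older rows. The as-of lookup is what prevents that.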

Tecton — the AI-agent pivot

Tecton’s product trajectory tells the story most clearly. Tecton was founded by ex-Uber engineers who had built the Michelangelo feature store. The early product was a classical-ML feature platform: define feature views in Python, run pipelines on Spark or streaming, materialise features into an offline store and an online store, serve them at low latency. It was good. It was also expensive, and the differentiation narrowed as cloud-native feature stores from AWS, GCP, and Databricks matured.

The pivot started in 2024. Tecton 1.0 framed the product as enriching LLM prompts with real-time context, with feature pipelines feeding the data layer of agents. The pitch reframed the same primitives: feature views became “AI context retrieval functions,” materialisation became “data freshness for agents,” the online store became “the low-latency serving layer between the agent and the data warehouse.”
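To make the reframing concrete, here is a rough sketch of what "enriching a prompt with real-time context" amounts to. This is not Tecton's API. The `online_store` dict, the `get_context` and `build_prompt` helpers, and the feature names are all made up for illustration, and the dict stands in for whatever low-latency store sits behind the agent. The one idea worth noticing is the freshness check: stale values are dropped rather than handed to the model.

```python
def get_context(store, entity_id, feature_names, max_age_s, now):
    """Return the requested features that exist and are at most max_age_s old."""
    row = store.get(entity_id, {})
    context = {}
    for name in feature_names:
        if name in row:
            value, written_at = row[name]
            if now - written_at <= max_age_s:
                context[name] = value
    return context


def build_prompt(question, context):
    """Render fresh features as plain-text facts ahead of the user's question."""
    lines = [f"- {name}: {value}" for name, value in sorted(context.items())]
    facts = "\n".join(lines) if lines else "- (no fresh context available)"
    return f"Known facts about the user:\n{facts}\n\nQuestion: {question}"


# Hypothetical store contents: feature name -> (value, written_at in epoch seconds).
now = 1_000_000.0
online_store = {
    "user_42": {
        "open_invoice_total": (129.50, now - 5),      # written 5 seconds ago
        "last_login_country": ("SE", now - 3600),     # written an hour ago
    }
}

context = get_context(
    online_store,
    "user_42",
    ["open_invoice_total", "last_login_country"],
    max_age_s=60,
    now=now,
)
prompt = build_prompt("Why was I charged twice?", context)
```

With a 60-second freshness budget, only the invoice total makes it into the prompt. The hour-old login country is left out, which is the behaviour you want when the agent is about to act on the data.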

The pitch worked. Databricks acquired Tecton in August 2025, explicitly to provide “fast, reliable, real-time data for deploying AI agents.” The acquisition price wasn’t disclosed, but the strategic framing was unambiguous: feature serving had become AI-context serving, and Databricks wanted the technology and the team in one move. Tecton’s public-facing numbers — sub-10ms latency, sub-100ms freshness, 99.99% uptime — became the latency story for Mosaic AI’s agent platform.

In 2026, “Tecton” effectively refers to the feature-and-context layer inside Databricks. The standalone product is being absorbed into the Mosaic AI platform.

Feast — open source survival

Feast is the open-source counterweight. Originally built at Gojek and donated to the Linux Foundation, Feast has been the default open-source feature store for years. The commercial story has been bumpy (Tecton was once the primary sponsor, and the funding model has shifted multiple times), but the project itself has kept shipping. A Feast release earlier in 2026 added improved streaming sources, better governance, and tighter integrations with Azure and AWS.

The Feast value proposition has clarified over time. It is not a managed platform; it is a spec and a reference implementation for what a feature store should do. Teams that want full control of their data layer — large fintechs, regulated banks, hyperscale tech companies — adopt Feast, run it on their own infrastructure, and treat it as part of their platform. Teams that want a managed offering use it as inspiration but pay for Hopsworks, Tecton, or a cloud-native feature store.

The honest read in 2026: Feast is healthy as an open-source project, used in production (in some cases at significant scale) by a modest number of teams that wanted the open-source baseline, and unlikely to become a commercial juggernaut. That’s fine; not every open-source project needs to be Snowflake.

Hopsworks — the AI lakehouse repositioning

Hopsworks is the most interesting strategic story of the three. Founded as a feature store and platform out of Stockholm, Hopsworks spent 2024-25 repositioning as an “AI Lakehouse” — combining the feature store with a lake-table layer (Iceberg, Delta, Hudi support), a vector index, and a serving layer with sub-millisecond latency from RonDB, their open-source key-value store.

The pitch is sharp: most ML platforms are bolted onto a lakehouse that was designed for analytics. Hopsworks is a lakehouse designed for AI from the start, with the feature serving layer as a first-class citizen rather than an afterthought. Hopsworks 5.0 released in early 2026 added agent-driven pipeline building, with the promise of going “from idea to production pipelines in minutes.”

The numbers Hopsworks publishes are aggressive. SIGMOD 2024 benchmarks showed 10x lower online latency than SageMaker Feature Store and Vertex Feature Store. Whether you believe their benchmark or the cloud providers’ benchmarks, the engineering investment in latency is real, and it’s the right thing to be optimising for the agent-context use case.

Hopsworks sits in an interesting market position: more capable than Feast, cheaper than Tecton (now Databricks), and available both as self-hosted open-source-friendly software and as a managed cloud service. That puts it in the path of teams who want a real feature platform but don’t want to be Databricks-deep. It’s also the last credible independent feature-store-shaped vendor in the market.

What a 2026 feature store actually does

Cutting through the marketing, the working definition of a modern feature store in 2026 is:

  • Offline feature pipelines that compute aggregates, joins, and transformations from your warehouse or lakehouse, materialise them into a versioned offline store (typically Iceberg, Delta, or Hudi tables), and produce point-in-time-correct training datasets. This is the classical feature store role.

  • Online feature serving with sub-100ms (often sub-10ms) latency for low-latency scoring. Fraud detection, real-time personalisation, pricing decisions. The same feature definition is used for offline training and online serving, eliminating training-serving skew.

  • Embedding storage and vector retrieval as a first-class data type. Document and entity embeddings live in the feature store, are refreshed on a schedule or via streaming, and can be queried alongside structured features. Hybrid search (vector + metadata filter + structured features) is the typical access pattern.

  • Agent context retrieval that exposes the same feature views to LLM applications as inputs to prompts or as inputs to tool calls. The user’s recent activity, the current account state, the outstanding invoice — all served with the same freshness guarantees and the same governance as classical-ML features.

  • Lineage and governance that track which features came from which sources, who has access, how fresh the data is, and which model and agent versions consumed which features. This is the governance layer that compliance and audit teams care about, and it’s the layer that distinguishes a feature store from “we put some data in Redis.”

The teams that adopt a feature store in 2026 are doing it for the combination of these capabilities. Adopting a feature store for any one of them in isolation is usually overengineering.
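Of the capabilities listed above, lineage is the easiest to wave away, so here is a toy sketch of what it buys you. The `LineageRegistry` class, the feature names, and the consumer names are hypothetical and not taken from any product. A real system would persist this metadata and capture reads automatically. The useful question it answers is "if this source breaks, which models and agents are affected?"

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRecord:
    name: str
    sources: list                                   # upstream tables or streams
    owner: str
    consumers: set = field(default_factory=set)     # model or agent versions


class LineageRegistry:
    """Toy in-memory registry of feature sources and consumers."""

    def __init__(self):
        self._features = {}

    def register(self, name, sources, owner):
        self._features[name] = FeatureRecord(name, list(sources), owner)

    def record_read(self, name, consumer):
        """Note that a model or agent version consumed this feature."""
        self._features[name].consumers.add(consumer)

    def impacted_by(self, source):
        """Consumers that depend, through any feature, on the given source."""
        hit = set()
        for record in self._features.values():
            if source in record.sources:
                hit |= record.consumers
        return sorted(hit)


registry = LineageRegistry()
registry.register("spend_90d", ["warehouse.transactions"], owner="risk-team")
registry.register("session_velocity", ["kafka.clickstream"], owner="risk-team")
registry.record_read("spend_90d", "fraud_model:v7")
registry.record_read("spend_90d", "support_agent:v2")
registry.record_read("session_velocity", "fraud_model:v7")

impacted = registry.impacted_by("warehouse.transactions")
```

Here a problem in `warehouse.transactions` touches both the fraud model and the support agent, while a clickstream outage touches only the fraud model. "We put some data in Redis" cannot answer that question.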

A real example with numbers

A useful illustrative case from a US-based fintech (anonymised because the details are sensitive): the team runs a fraud detection model scoring around 5,000 transactions per second at peak. Features include historical aggregates (90-day spending, average ticket size by merchant category), real-time signals (current session velocity, device-fingerprint patterns), and embedding similarity (merchant embedding distance to the user’s typical merchants).

Before the feature store: separate pipelines for offline training (a Spark job that joined warehouse tables), online serving (a Java microservice that hit DynamoDB and Redis), and embedding lookup (a standalone vector database call). Three different definitions of “recent merchant” maintained in three places. A measurable training-serving skew that caused fraud detection precision to drop ~3 percentage points in production compared to offline evaluation.

After moving to a feature store (in their case, Hopsworks; the same shape works on Tecton-inside-Databricks): one definition per feature, materialised offline for training and served online from a key-value store using that same definition. The vector lookup was integrated as another feature view. Training-serving skew was effectively eliminated. The team’s internally published numbers showed a precision recovery of around 2.5 points, a fraud-loss reduction in the seven figures per year, and a 30% reduction in engineering time spent maintaining feature pipelines.
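The "one definition per feature" idea is simple enough to sketch. This is not the fintech's code or Hopsworks' API. The `spend_90d` function, the transaction data, and both helper paths are hypothetical. What matters is that the offline training path and the online materialisation path call the same function, so there is no second or third copy of the logic to drift.

```python
from datetime import datetime, timedelta


def spend_90d(transactions, as_of):
    """Single definition: total amount in the 90 days up to and including as_of."""
    start = as_of - timedelta(days=90)
    return sum(amount for ts, amount in transactions if start < ts <= as_of)


def build_training_rows(txns_by_user, label_events):
    """Offline path: compute the feature as of each historical label timestamp."""
    return [
        (user, ts, spend_90d(txns_by_user[user], ts), label)
        for user, ts, label in label_events
    ]


def materialise_online(txns_by_user, now):
    """Online path: precompute the same feature per user for key-value lookup."""
    return {user: spend_90d(txns, now) for user, txns in txns_by_user.items()}


# Hypothetical transactions: user -> list of (timestamp, amount).
txns = {
    "user_1": [
        (datetime(2025, 10, 1), 50.0),
        (datetime(2026, 1, 5), 20.0),
        (datetime(2026, 2, 1), 30.0),
    ]
}

rows = build_training_rows(txns, [("user_1", datetime(2026, 1, 10), 0)])
online = materialise_online(txns, now=datetime(2026, 2, 2))
```

The training row for 10 January sees only the 20.0 transaction, and the online value on 2 February sees 20.0 plus 30.0. The numbers differ because the timestamps differ, not because the two paths disagree about what "90-day spend" means.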

These numbers are not unusual. Every team that does the migration tells some version of the same story: the headline gain is in modeling quality (skew elimination); the operational gain is in engineering time (one definition instead of three).

The cloud-native question

The honest question worth flagging: do you need a third-party feature store at all, or is the platform-native one enough?

SageMaker Feature Store, Vertex Feature Store, and Databricks’ built-in feature engineering capabilities are real, mature products. For teams that don’t need extreme low-latency serving, don’t have complex streaming requirements, and are already deep on their cloud provider, the platform-native feature store is increasingly the right answer.

The places where the platform-native feature stores fall short, and where Hopsworks (or Tecton-inside-Databricks) earns its keep:

  • Sub-10ms latency for real-time fraud or agent use cases.
  • Multi-cloud or hybrid deployments that need feature consistency across environments.
  • Complex streaming features with low-latency materialisation from Kafka or Kinesis sources.
  • Embedding plus structured-feature hybrid serving in a single primitive.

For everything else — the median enterprise tabular ML use case — the platform-native feature store is competitive enough that paying extra is hard to justify.
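For a sense of what the streaming-feature requirement in the list above involves, here is a minimal sliding-window counter of the kind behind a "session velocity" signal. It is a sketch in plain Python, not a Kafka or Kinesis consumer. The `SessionVelocity` class is hypothetical, events are assumed to arrive in timestamp order, and state lives in memory, which a production system could not rely on.

```python
from collections import deque


class SessionVelocity:
    """Count of events per key within the last window_s seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._events = {}  # key -> deque of event timestamps, oldest first

    def update(self, key, event_time):
        """Record one event and return the count inside the window (t - window_s, t]."""
        queue = self._events.setdefault(key, deque())
        queue.append(event_time)
        # Evict timestamps that have fallen out of the window.
        while queue and queue[0] <= event_time - self.window_s:
            queue.popleft()
        return len(queue)


velocity = SessionVelocity(window_s=60)
counts = [velocity.update("card_1", t) for t in (0, 10, 20, 75, 80)]
```

The hard parts that vendors charge for sit around this loop rather than inside it: out-of-order events, state that survives restarts, and writing each updated count to the online store within the latency budget.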

Anti-patterns from the trough

A few patterns the 2023 trough taught the industry, worth not repeating:

Building your own feature store from scratch. A small minority of companies (very large tech firms, mostly) genuinely need this. For everyone else, the build cost dwarfs the savings. The 2021-2022 era of “we’ll build our own internal feature store” left a trail of half-finished projects and pipeline teams that spent two years reinventing primitives that already exist in Feast.

Adopting a feature store before you have features that need it. The 2021 hype-cycle version of this purchase pattern was “we’ll be a data-driven company so we need a feature store.” The right purchase pattern is “we have three models in production with measurable training-serving skew and we need to eliminate it.” Adopt when there’s a concrete problem.

Treating the vector database as a separate product category from the feature store. The 2023 split between “feature stores for ML” and “vector databases for RAG” was a vocabulary accident. The 2026 products serve both shapes. Picking your vector store and your feature store as separate procurements doubles your data-plane lock-in for no gain.
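To see why the two shapes belong together, consider a hybrid lookup that filters on structured features and ranks by embedding similarity in one call. The sketch below is brute-force Python over a hypothetical `DOCS` list with made-up fields (`lang`, `views_7d`). A real system would use an approximate nearest-neighbour index rather than scoring every candidate.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical records: an embedding plus structured features on the same row.
DOCS = [
    {"id": "doc_a", "embedding": [1.0, 0.0], "lang": "en", "views_7d": 10},
    {"id": "doc_b", "embedding": [0.8, 0.6], "lang": "en", "views_7d": 500},
    {"id": "doc_c", "embedding": [1.0, 0.0], "lang": "de", "views_7d": 900},
    {"id": "doc_d", "embedding": [0.0, 1.0], "lang": "en", "views_7d": 50},
]


def hybrid_search(query_vec, docs, lang, min_views, k):
    """Filter on structured features first, then rank survivors by similarity."""
    candidates = [
        d for d in docs if d["lang"] == lang and d["views_7d"] >= min_views
    ]
    ranked = sorted(
        candidates, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True
    )
    return [d["id"] for d in ranked[:k]]


results = hybrid_search([1.0, 0.0], DOCS, lang="en", min_views=20, k=2)
```

The two documents closest to the query, `doc_a` and `doc_c`, are both filtered out, one for low recent views and one for language. When the vector store and the feature store are separate products, that filter becomes a cross-system join at request time.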

What to take away

  • The feature store category came back as the data-for-AI layer. Same primitives, broader use case, agent and embedding workloads layered on top of classical ML.
  • Tecton went to Databricks. Feast is the OSS baseline. Hopsworks is the independent. Those are the three names worth knowing.
  • In 2026, “feature store” means offline + online + embeddings + agent context, with lineage and governance underneath. Any product that only does one of those is a partial solution.
  • The cloud-native feature stores are good enough for the median case. Reach for Hopsworks or Tecton-inside-Databricks when latency, multi-cloud, or hybrid retrieval shapes are the actual requirement.
  • The hype-cycle vocabulary lied about the category’s death. What died was the standalone-feature-store pitch. The underlying problem — get the right contextual data to the model at the right time — turned out to be more important in 2026 than it was in 2021.

Five years after the peak, the feature store is more central to ML and AI infrastructure than ever. The category just doesn’t call itself that anymore. If you’re starting an agent or RAG product in 2026, the question to ask is not “do I need a vector database?” The question is “what’s my feature serving layer, and is it going to handle structured features, embeddings, and agent context with the freshness and governance I’ll need next year?” The answer to that question is, in shape if not in name, a feature store.


Further reading: Databricks’ Tecton acquisition post is the clearest statement of the “feature store as AI-context layer” thesis. Hopsworks’ AI Lakehouse manifesto is the most coherent independent vision. The original Uber Michelangelo post from 2017 (revised version) is still the best read on why the category exists at all.
