The vector database shakeout: Pinecone, Weaviate, Qdrant, Chroma, pgvector

In early 2023, if you wanted to ship a RAG system you picked from five vector databases that all looked roughly equivalent on the surface: Pinecone, Weaviate, Qdrant, Milvus, and Chroma. Each one wrote a Substack about why the others were wrong. Each one raised a Series A. Each one had a chart that showed they were 3x faster than the others on some benchmark you’d never heard of.

Three years later the picture is unrecognisable. Pinecone re-architected itself around serverless and abandoned the pod model that put it on the map. Weaviate quietly pivoted to multi-tenant SaaS workloads. Qdrant ate the open-source-and-cheap segment. Chroma became the default for laptop development and stayed there. And the loudest fact of all is that the default vector store for a new RAG project in 2026 isn’t on that list at all: it’s pgvector, the Postgres extension that started as a hobby project and now sits inside Supabase, RDS, Cloud SQL, Tiger Data (née Timescale), and every other managed Postgres.

This post is the field report. What each system is actually good at, what the numbers say, and the production rule that’s emerged for picking between them.

The new default: pgvector + HNSW

The single most consequential thing that happened to the vector database market is that in late 2023, pgvector shipped HNSW indexing. Before that, pgvector only had IVFFlat, which was too slow at any meaningful scale. After HNSW landed in version 0.5.0, the math changed.

Supabase’s pgvector vs Pinecone benchmark showed pgvector HNSW handling 1185% more queries per second (roughly 12.9x the throughput) than the Pinecone s1 pod at equivalent recall, while costing $70/month less. Their follow-up comparison against Qdrant at 1M vectors showed pgvector HNSW matching or beating Qdrant on equivalent compute at 99% recall. These are not numbers from a neutral party, since Supabase is a Postgres company, but the methodology is open and the results have replicated independently.

The practical implication: if you already run Postgres (and you do, almost everyone does), adding a CREATE EXTENSION vector and an HNSW index on a 1536-dimension column is roughly a one-evening change. There’s no new service to operate, no new connection string, no separate billing line, no consistency-between-systems problem when a row’s metadata changes.

The 2026 sweet spots. The bars overlap because the boundaries aren’t sharp — but the leftmost option that meets your constraints is usually the right one.

What pgvector still loses on: filtering performance with high-cardinality metadata, true multi-tenant isolation (you fake it with row-level security), and the absolute top end of scale. At around 50M vectors plain pgvector starts degrading, although Tiger Data’s pgvectorscale extension keeps Postgres in the game at that size. At 100M+ vectors with strict latency SLOs, you want a system whose entire raison d’être is vector search.

The other thing pgvector still doesn’t do well: rapid bulk-rewrites of the index. If your corpus changes frequently — daily re-embeddings, constant document updates — the HNSW graph maintenance becomes a bottleneck. Pinecone, Qdrant, and Weaviate all have purpose-built index update paths that handle this better. The shape of corpus churn matters more than people expect when picking a vector DB.

Pinecone’s serverless reset

Pinecone is the company that defined “vector database” as a product category, and in January 2024 it set fire to its own architecture. The pod model — you pre-provision capacity, you pay 24/7 even when idle — got replaced by a serverless architecture built on object storage with vector clustering. The pitch was a 10-100x cost reduction for bursty workloads.

That pitch has held up in direction, if not always in magnitude. For RAG workloads that go quiet overnight and spike during business hours, which is most of them, Pinecone Serverless prices out at 40-60% less than the equivalent pod deployment. The trade is that cold-start latency is real (the first query against a recently idle index can spike to 500ms+), and the minimum bill is $50/month on Standard or $500/month on Enterprise.

The reason teams still pick Pinecone in 2026 is operational. It is the only vector database where you can credibly write “Pinecone” in the “infrastructure dependencies” section of your runbook and have it mean zero on-call burden. No nodes to upgrade. No quorum to manage. No HNSW rebuild after a bulk insert. The trade-off is that you’ve outsourced the most performance-sensitive piece of your stack to a vendor whose pricing you don’t control. Whether that’s worth it depends on how much your team’s time is worth versus how predictable your workload is.

At true billion-vector scale, Pinecone Serverless is also the easiest of the four to operate. The independent benchmarks suggest p95 query latency of around 50ms at 10K QPS for 1B 1536-dimensional vectors — that’s a workload most teams don’t reach, but the ones that do overwhelmingly run on Pinecone, not because it’s cheapest but because it’s the one they don’t have to think about during a 3am incident.

Qdrant: the open-source cost winner

Qdrant made two correct bets early on. First, they wrote it in Rust, which means it uses 2-3x less memory than Go-based competitors at the same dataset size and never has GC pauses. Second, they made the cloud offering identical to the open-source version, so you can self-host on a $20/month VPS for a hobby project and migrate to their managed cloud at production scale without changing a line of code.

The cost story is the headline. Qdrant Cloud is typically 40-60% of an equivalent Pinecone bill, and the free tier (1GB RAM, 4GB disk) is enough to run a real demo or a small internal tool indefinitely. Self-hosting in your own Kubernetes is well-trodden — most of the production deployments I’ve seen at engineering-heavy startups are self-hosted Qdrant rather than managed anything.

The technical edge: Qdrant’s filtering performance is best-in-class. Their payload-indexed HNSW does true pre-filtering, which adds only 1-3ms overhead regardless of how selective the filter is. By contrast, naive post-filtering implementations (which some competitors still use under load) can blow up to 10x slower when filters become selective. If your queries look like vector_search(query, where: {tenant_id: ..., status: 'active'}), Qdrant is the system that handles that gracefully without surprises.
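To see why the filtering strategy matters, here is a toy sketch that contrasts the two approaches with brute-force search. It is not how Qdrant's payload-indexed HNSW works internally; the data, field names, and function names are all made up for illustration. The point is only the failure mode: with a selective filter, post-filtering throws away most or all of its top-k candidates, so a real system has to over-fetch and retry.

```python
# Sketch: why post-filtering degrades when a filter is selective.
# Brute-force search over a toy dataset; all names and data are illustrative.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(payload, where):
    return all(payload.get(key) == value for key, value in where.items())


def post_filter_search(points, query, where, k):
    """Take the top-k by similarity first, then drop non-matching hits."""
    ranked = sorted(points, key=lambda p: cosine(p["vector"], query), reverse=True)
    return [p["id"] for p in ranked[:k] if matches(p["payload"], where)]


def pre_filter_search(points, query, where, k):
    """Restrict to matching points first, then take the top-k among them."""
    allowed = [p for p in points if matches(p["payload"], where)]
    ranked = sorted(allowed, key=lambda p: cosine(p["vector"], query), reverse=True)
    return [p["id"] for p in ranked[:k]]


points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"tenant_id": "a", "status": "active"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"tenant_id": "a", "status": "active"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"tenant_id": "a", "status": "archived"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"tenant_id": "b", "status": "active"}},
    {"id": 5, "vector": [0.0, 1.0], "payload": {"tenant_id": "b", "status": "active"}},
]
query = [1.0, 0.0]
where = {"tenant_id": "b", "status": "active"}

post_hits = post_filter_search(points, query, where, k=2)  # empty: both top hits belong to tenant a
pre_hits = pre_filter_search(points, query, where, k=2)    # tenant b's two points, nearest first
```

Here the post-filtered search returns nothing even though two valid results exist, because the two nearest neighbours both belong to the other tenant.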

What Qdrant gives up: there’s no hybrid search story as polished as Weaviate’s, no managed multi-region replication as smooth as Pinecone’s, and the “you operate it yourself” reality means you need someone on your team who has read the Qdrant operations docs.

The cost story breaks down at larger scale too. Benchmarks at 50M vectors show Qdrant hitting roughly 41 QPS at 99% recall, while the pgvectorscale extension on equivalent hardware hits 471 QPS. That is a gap of more than 11x, and it surprises Qdrant users when they first hit the scale boundary. (Those numbers were published by Tiger Data, the company behind pgvectorscale, so treat them with the same caution as the Supabase results.) The lesson is that “Qdrant wins on cost” is a 1M-10M vector statement, not a 50M-and-up one.

Weaviate’s hybrid-search moat

Weaviate’s pitch in 2023 was “we do GraphQL + vectors.” That didn’t end up being the differentiator. The differentiator that survived is the combination of two features that nobody else does as well together: hybrid search and first-class multi-tenancy.

Hybrid search, which combines BM25 keyword scores with vector similarity into a single ranked list, is non-trivial to get right. The naive implementation runs both searches, then merges the results with reciprocal rank fusion or a weighted sum. Weaviate’s version pre-filters both sides with the same allow-list (so your metadata constraints apply to both BM25 and vector results before fusion), then merges them with a tuneable alpha parameter, where alpha = 0 is pure keyword search and alpha = 1 is pure vector search. The output is a system where exact-keyword matches and semantic-similarity matches coexist without one drowning out the other.
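The two merge strategies are easier to compare in code. The sketch below is not Weaviate's implementation; the scores are invented, and the function names and the `allowed` parameter are assumptions made for illustration. It shows a min-max normalised weighted sum controlled by alpha, an optional shared allow-list applied to both sides before fusion, and plain reciprocal rank fusion with the commonly used constant of 60.

```python
# Sketch: two common ways to fuse keyword and vector results.
# Scores below are made up; this is not Weaviate's implementation.


def min_max(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def weighted_fusion(keyword_scores, vector_scores, alpha=0.5, allowed=None):
    """alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    if allowed is not None:
        # Apply the same allow-list to both sides before fusing.
        keyword_scores = {d: s for d, s in keyword_scores.items() if d in allowed}
        vector_scores = {d: s for d, s in vector_scores.items() if d in allowed}
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    fused = {
        doc: (1 - alpha) * kw.get(doc, 0.0) + alpha * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Sort by fused score, breaking ties by id so the order is deterministic.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def reciprocal_rank_fusion(rankings, k=60):
    """Sum 1 / (k + rank) over every ranked list a document appears in."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


bm25 = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 1.0}      # exact keyword hit on doc_a
cosine = {"doc_b": 0.91, "doc_c": 0.88, "doc_d": 0.40}  # semantic hits

keyword_heavy = weighted_fusion(bm25, cosine, alpha=0.0)  # doc_a ranks first
vector_heavy = weighted_fusion(bm25, cosine, alpha=1.0)   # doc_b ranks first
```

Normalising before the weighted sum matters because BM25 scores are unbounded while cosine similarity is not; without it the keyword side would dominate at any alpha.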

The multi-tenancy story is the bigger deal at enterprise scale. Weaviate treats each tenant as a separate physical index, which means a noisy tenant’s heavy queries can’t slow down a quiet one, and you can delete a tenant’s data without rebuilding anything. For SaaS companies serving thousands of customers — each of whom needs isolated vector search over their own documents — this is the architecture that makes auditors and account managers happy at the same time.
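The isolation property is simple to picture if you treat each tenant as its own container. This is a deliberately tiny sketch with a Python list standing in for a real per-tenant index; the class and method names are hypothetical and bear no relation to Weaviate's API.

```python
# Sketch: one index per tenant, so deletes and heavy queries stay local.
# Illustrative only: a Python list stands in for a real per-tenant index.


class TenantIndexes:
    def __init__(self):
        self._indexes = {}  # tenant_id -> list of (doc_id, vector)

    def add(self, tenant_id, doc_id, vector):
        self._indexes.setdefault(tenant_id, []).append((doc_id, vector))

    def search(self, tenant_id, query, k=3):
        """Scan only the caller's index; other tenants are never touched."""
        def dot(vector):
            return sum(x * y for x, y in zip(vector, query))

        index = self._indexes.get(tenant_id, [])
        ranked = sorted(index, key=lambda item: dot(item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

    def drop_tenant(self, tenant_id):
        """Offboarding a customer removes one index; nothing else is rebuilt."""
        self._indexes.pop(tenant_id, None)

    def tenants(self):
        return sorted(self._indexes)
```

Contrast this with a single shared index plus a `tenant_id` filter, where deleting one customer means removing their points from a graph everyone else is still querying.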

What Weaviate gives up: it’s the most operationally complex of the four to self-host. The schema-first model that lets it do hybrid search well also makes it the most opinionated about how your data is shaped. And the managed cloud, while solid, is priced closer to Pinecone than to Qdrant.

The interesting development in 2025 was Weaviate’s growing focus on agentic RAG workloads — query-time decisions about which tenant’s data to search, which fields to filter, which retrieval method to prefer. The schema-first model that felt heavy in 2023 turned out to be a structural advantage for these use cases: when an agent needs to reason about what to retrieve before retrieving it, having that metadata structured is the difference between a workable agent and a hallucination machine.

Chroma stays a laptop database

Chroma made a choice that’s looking smart in retrospect: it didn’t try to compete with Pinecone on production scale. Instead, it made the best local-development experience in the category. You pip install chromadb, you import chromadb, you add documents, and Chroma handles embedding generation, persistence, and querying without ever asking you to think about it. The default embedding model (all-MiniLM-L6-v2) downloads on first use. With the persistent client, data is written to disk automatically (SQLite-backed since Chroma 0.4; earlier releases used DuckDB and Parquet).
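For reference, the whole flow fits in a dozen lines. This is a sketch that assumes `chromadb` is installed; the storage path, collection name, and helper names are hypothetical. The first call that embeds documents downloads the default model, so it needs network access once.

```python
# Sketch of the Chroma quick-start flow described above.
# Assumptions: `chromadb` is installed; the path, collection name, and
# helper names are hypothetical.


def make_records(texts):
    """Pair each document with a string id, since Chroma requires ids."""
    return [f"doc-{i}" for i in range(len(texts))], list(texts)


def build_and_query(texts, question, path="./chroma_demo", n_results=2):
    import chromadb  # imported lazily so make_records works without it

    client = chromadb.PersistentClient(path=path)  # persists to disk
    collection = client.get_or_create_collection(name="notes")
    ids, documents = make_records(texts)
    # upsert (rather than add) keeps reruns of the script idempotent.
    collection.upsert(ids=ids, documents=documents)
    result = collection.query(query_texts=[question], n_results=n_results)
    return result["documents"][0]  # hits for the first (and only) query
```

A call such as `build_and_query(["pgvector is a Postgres extension", "Chroma runs locally"], "What runs on my laptop?")` returns the stored documents ordered by similarity to the question.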

This is the right shape for the 90% of vector-DB use cases that are actually “I’m prototyping a RAG demo on my laptop.” Chroma is also the default vector store in many LangChain and LlamaIndex examples, which means a lot of “RAG tutorial” code on GitHub uses Chroma whether it needs to or not.

The trap is that Chroma’s production story is much weaker than its local story. The client-server mode exists, but if you’re going to operate a vector database in production, the answer is rarely Chroma; it’s pgvector, Qdrant, or one of the managed clouds. Chroma is gracefully accepting this niche and positioning itself as the “prototype here, deploy elsewhere” runtime, which is honest and useful.

The bigger pattern: there’s a real, underserved category of “things that are extremely good at being used locally and don’t need to scale beyond a single machine.” DuckDB owns the analytics version of this niche. SQLite owns the OLTP version. Chroma is staking out the vector version. It’s a smaller market than “production vector database,” but it’s a real one, and Chroma is well-positioned to keep it.

The decision the numbers actually support

After watching three years of vector-DB decisions in production, the shape of the choice is no longer “which is best”. It’s a five-question filter, summarised in the decision tree.

A working decision tree. The right answer is the topmost one that fits. Don’t pick a more specialised system until you have a measurable reason — vector search at 1M scale is not the hard part of your RAG system.
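One way to read that rule is as an ordered list where the first match wins. The sketch below is an interpretation of the prose in this post, not a transcription of the decision tree itself: the five questions, their order, the field names, and the 50M threshold are all assumptions, so adjust them to your own measurements.

```python
# Sketch: the post's guidance as an ordered rule list (first match wins).
# The questions, their order, and the 50M threshold are this sketch's reading
# of the prose, not a transcription of the original decision tree.
from dataclasses import dataclass


@dataclass
class Workload:
    laptop_prototype: bool = False
    vectors: int = 1_000_000
    measured_problem: bool = False       # p99, filtering, or churn actually hurts
    needs_tenant_isolation: bool = False
    needs_hybrid_search: bool = False
    wants_zero_ops: bool = False


RULES = [
    (lambda w: w.laptop_prototype, "Chroma"),
    (lambda w: not w.measured_problem and w.vectors < 50_000_000, "pgvector"),
    (lambda w: w.needs_tenant_isolation or w.needs_hybrid_search, "Weaviate"),
    (lambda w: w.wants_zero_ops, "Pinecone"),
    # Fallback: self-hosted, filter-heavy, cost-sensitive workloads.
    (lambda w: True, "Qdrant"),
]


def pick(workload):
    """Return the first option whose rule matches the workload."""
    for applies, answer in RULES:
        if applies(workload):
            return answer
```

The important design choice is that pgvector sits near the top: a workload only falls through to the specialised systems once `measured_problem` is true or the corpus is past the size where plain Postgres stops being comfortable.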

What changed, and what didn’t

The 2023 narrative was “vector databases are the next major database category” — like document stores in 2009 or graph databases in 2015. That framing is now mostly wrong. Vector search is a feature that landed in essentially every database category: Postgres (pgvector, pgvectorscale), Elasticsearch, MongoDB Atlas, Redis, ClickHouse, DuckDB, SingleStore, Snowflake, BigQuery. The “do you need a vector database” question has been replaced with “does your existing database do vectors well enough.”

For most teams, the answer is yes. That doesn’t mean Pinecone, Weaviate, and Qdrant aren’t viable businesses. They are, and the technical work they’re doing on scale, multi-tenancy, and hybrid search is genuinely ahead of what pgvector offers. But they’re now competing for the narrower segment that needs those features, not the broad “every RAG system needs a vector DB” market the original pitch assumed.

The shape of the take-away has barely changed in three years:

Start with pgvector. It is good enough for an enormous fraction of real workloads, and the cost of being wrong is a one-month migration, not a years-long architectural mistake.
Escalate to a specialised vector DB when measurement says you must — not because a benchmark blog post said so, but because your p99 latency, your filtering load, or your multi-tenant requirement broke the simple thing.
Pinecone, Qdrant, and Weaviate are not the same product. They’ve sorted themselves into three distinct shapes — managed serverless, cost-optimised open-source, and multi-tenant hybrid-search — and the right one for you depends on which constraint you’ve hit first.

The shakeout isn’t that one vector DB won. It’s that the category got smaller — and the survivors are good at very different things.

Further reading: Supabase’s pgvector vs Pinecone benchmark, Pinecone’s serverless architecture announcement, and the Qdrant cloud documentation. For the hybrid-search context, Weaviate’s hybrid search explained is the best primer.