GraphRAG: when knowledge graphs beat vector search

When Microsoft Research published “From Local to Global: A Graph RAG Approach to Query-Focused Summarization” in April 2024, the framing was unusually sharp for a RAG paper. The authors named two query classes: local questions, the kind that live in a small region of one document (“who founded Stripe?”), and global questions, the kind that require reasoning across the entire corpus (“what are the main themes in these podcast transcripts?”). They argued that vector RAG handled the first class fine and the second class catastrophically, and that a knowledge graph plus pre-computed community summaries closed the gap.

Two years on, the verdict is sharper than the hype. GraphRAG does exactly what the paper claims, on the queries the paper targeted. It also costs orders of magnitude more to index than vector RAG, which is the part the hype mostly skipped. The teams that picked it up have split into three camps: ones who footed the bill because their corpus justified it, ones who switched to lighter variants like Microsoft’s own LazyGraphRAG or the open-source LightRAG, and ones who quietly reverted to plain hybrid search after the indexing invoice arrived.

This post is the field report. What GraphRAG actually does, what the numbers actually say, and where in your stack the cost-benefit math finally pencils out.

The problem vector RAG can’t solve

The canonical failure case is the one the Microsoft paper opened with: ask vector RAG “what are the main themes in this corpus?” and watch it fail. The retriever pulls the top-K chunks closest to the query embedding, the model dutifully summarizes those chunks, and the summary is wildly biased toward whichever themes happen to be phrased most similarly to the question. Themes that recur across the corpus but in different words — exactly what you’d want a synthesis to surface — never make it into the top-K.

The deeper issue is that vector search is, by construction, a local operation. It returns the chunks nearest to a point in embedding space. There is no representation of “the document as a whole” or “the corpus as a whole” that the retriever can consult. For single-fact questions this is fine — the answer lives in a region — but for sensemaking it’s the wrong shape of computation.

Vector RAG ranks chunks by distance to a query point. GraphRAG instead pre-builds an entity graph and clusters it into communities, each with its own summary the system can consult for global questions.
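To make the “local operation” point concrete, here is a minimal sketch of top-K retrieval in plain Python. The three-dimensional vectors and the chunk names are made up for illustration (real embeddings come from an embedding model and have far more dimensions), but the ranking logic is the same.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks closest to the query point."""
    ranked = sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus: two chunks about pricing, one about hiring, one about security.
chunks = {
    "pricing-1": [0.9, 0.1, 0.0],
    "pricing-2": [0.8, 0.2, 0.0],
    "hiring-1": [0.1, 0.9, 0.0],
    "security-1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # a query that happens to be phrased like the pricing chunks

print(top_k(query, chunks, k=2))
```

Only the two chunks nearest the query point reach the model. The hiring and security chunks are never retrieved, however often those themes recur elsewhere in the corpus.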

What GraphRAG actually does

The pipeline has four stages, and the cost lives in stages 1 and 2:

Stage 1 — entity extraction. Run an LLM (Microsoft used GPT-4 in the paper; the open-source pipeline defaults to GPT-4-class models) over every chunk in the corpus to extract entities and the relations between them. This is the expensive step. For a corpus of 1M tokens with reasonable chunk sizes, you make a few thousand extraction calls.

Stage 2 — graph construction and community detection. Merge duplicate entities across chunks. Run a hierarchical clustering algorithm (the paper uses Leiden) on the entity graph to identify communities — clusters of densely connected entities that correspond to topical regions of the corpus. Then, again with an LLM, generate a community summary for each cluster at each level of the hierarchy.

Stage 3 — local search. For specific questions, find the entities mentioned in the query, walk their immediate neighborhood in the graph, pull the relevant chunks, and answer normally. This is competitive with vector RAG but no faster.

Stage 4 — global search. For corpus-wide questions, broadcast the query to every community summary at a chosen hierarchy level, collect partial answers, then have an LLM merge them into a coherent response. This is what vector RAG cannot do.
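Stages 2 and 4 above can be sketched in a few dozen lines. This is an illustration under loud assumptions, not Microsoft’s implementation: `relations` is assumed to be the output of stage 1, `llm` is a hypothetical prompt-in, text-out callable, entity merging is a naive lowercase match, and community detection uses weighted label propagation as a stand-in for Leiden, with a single level instead of a hierarchy. The real map step also scores and filters partial answers, which is omitted here.

```python
from collections import defaultdict
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns text
Graph = dict[str, dict[str, int]]


def build_graph(relations: list[tuple[str, str]]) -> Graph:
    """Stage 2a: merge duplicate entities and count how often each pair is linked."""
    graph: Graph = defaultdict(lambda: defaultdict(int))
    for src, dst in relations:
        a, b = src.strip().lower(), dst.strip().lower()  # naive dedup by name
        if a != b:
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def detect_communities(graph: Graph, max_rounds: int = 20) -> list[set[str]]:
    """Stage 2b: weighted label propagation (a stand-in for Leiden, not Leiden)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the sketch deterministic
            votes: dict[str, int] = defaultdict(int)
            for neighbor, weight in graph[node].items():
                votes[labels[neighbor]] += weight
            # Heaviest label wins; ties break alphabetically.
            best = min(votes, key=lambda label: (-votes[label], label))
            if best != labels[node]:
                labels[node], changed = best, True
        if not changed:
            break
    groups: dict[str, set[str]] = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


def summarize_communities(communities: list[set[str]], llm: LLM) -> list[str]:
    """Stage 2c: one LLM-written summary per community."""
    return [llm("Summarize what connects: " + ", ".join(sorted(c))) for c in communities]


def global_search(query: str, summaries: list[str], llm: LLM) -> str:
    """Stage 4: map the query over every summary, then reduce the partial answers."""
    partials = [llm(f"Answer '{query}' using only:\n{s}") for s in summaries]
    return llm(f"Merge these partial answers to '{query}':\n" + "\n".join(partials))
```

Note the call count in this sketch: with N communities, indexing makes N summary calls on top of extraction, and every global query makes N + 1 more. That fan-out is where both the indexing bill and the per-query bill come from.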

The Microsoft paper From Local to Global reports that on global sensemaking questions over corpora in the 1M-token range, GraphRAG meaningfully outperforms a vector RAG baseline on both comprehensiveness (does the answer address all relevant aspects) and diversity (does it present varied perspectives). The numbers aren’t a single accuracy score because the questions aren’t single-fact; they’re judged by LLM-as-judge head-to-head, and GraphRAG wins something like 70-80% of the matchups on the corpora the paper used.

The bill nobody talks about

Now the cost. Independent benchmarks have made the indexing economics clear, and it’s the part of GraphRAG that most teams underestimate.

The numbers, from the various community benchmarks summarized at the GraphRAG vs LightRAG comparison and the LightRAG paper itself:

Full GraphRAG indexing: at least one LLM extraction call per chunk, plus a summary call per community at each level of the hierarchy. At GPT-4o pricing (~$2.50 per million input + ~$10 per million output, so output tokens cost four times as much as input) you’re looking at $5-20 for a small-to-medium document set, and thousands of LLM calls.
Per-query global search: also expensive, because the LLM has to fan out to community summaries and merge partial answers. The LightRAG paper’s cost analysis counts roughly 610,000 tokens for a single GraphRAG global query: 610 community reports of about 1,000 tokens each.
Vector RAG indexing: the embedding cost. For the same corpus, pennies on dense embeddings, free on BM25.

For a typical mid-sized enterprise corpus of, say, 50M tokens, you should plan for hundreds of dollars in indexing alone — and if the corpus changes weekly, you re-pay that bill on every re-index. This is the single most common reason teams that pilot GraphRAG don’t ship it.
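The arithmetic behind those figures is easy to sketch. The model below prices stage 1 extraction only. The chunk size, prompt overhead, and output length are assumptions chosen for illustration, not measurements, and community summarization would add to the total. The prices are the GPT-4o figures quoted above.

```python
import math
from dataclasses import dataclass


@dataclass
class ExtractionCostModel:
    chunk_tokens: int = 600              # assumed chunk size
    prompt_overhead_tokens: int = 1500   # assumed instructions + examples per call
    output_tokens_per_chunk: int = 500   # assumed size of the extracted entity list
    input_price_per_m: float = 2.50      # USD per million input tokens
    output_price_per_m: float = 10.00    # USD per million output tokens

    def estimate(self, corpus_tokens: int) -> dict:
        """Estimate LLM calls and dollar cost for one extraction pass over the corpus."""
        calls = math.ceil(corpus_tokens / self.chunk_tokens)
        input_tokens = calls * (self.chunk_tokens + self.prompt_overhead_tokens)
        output_tokens = calls * self.output_tokens_per_chunk
        usd = (
            input_tokens / 1e6 * self.input_price_per_m
            + output_tokens / 1e6 * self.output_price_per_m
        )
        return {"calls": calls, "usd": round(usd, 2)}


model = ExtractionCostModel()
for corpus_tokens in (1_000_000, 50_000_000):
    print(corpus_tokens, model.estimate(corpus_tokens))
```

Under these assumptions a 1M-token corpus comes to roughly $17 and a 50M-token corpus to roughly $850 per indexing pass. A weekly re-index multiplies that second figure by 52 over a year.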

Approximate indexing cost on a 1M-token corpus, normalized against vector RAG. Numbers vary with model choice and chunk size; the shape of the spread is what matters. LazyGraphRAG defers all summarization to query time, which is why its indexing cost matches plain vector RAG.

LazyGraphRAG, LightRAG, and the great unbundling

The expensive parts of GraphRAG are the per-chunk extraction and the per-community summarization. Both happen at indexing time. The alternatives ask: what if we did less of that?

LazyGraphRAG, published by Microsoft Research in late 2024, defers all LLM summarization to query time. Indexing is a lightweight NLP pass whose cost Microsoft describes as identical to vector RAG and 0.1% of full GraphRAG. Microsoft’s own benchmarks show answer quality on global queries comparable to the original GraphRAG’s global search, at more than 700× lower query cost. The honest framing: LazyGraphRAG is the answer to “what if GraphRAG’s economics weren’t insane?”

LightRAG from researchers at the University of Hong Kong takes a different angle. It builds a graph but skips the hierarchical community-summary tree, instead running a dual-level retrieval that combines local entity context with global topic context at query time. Reported costs are roughly 1/100th of full GraphRAG on the same corpora, with quality competitive on standard benchmarks. The pitch is “graph reasoning at vector RAG prices.”

nano-graphrag is a ~1,100-line reimplementation that preserves the core ideas but is explicitly designed to be readable and hackable. Useful as a starting point for teams that want to understand GraphRAG before deciding whether to adopt it.

The pattern across the two lighter variants is the same: the cost lived in indexing, and indexing was over-eager. Defer work to query time, skip the deepest summarization layers, and you keep most of the quality at a fraction of the bill.

When the cost is actually worth it

The 2026 production verdict, based on which teams have actually shipped GraphRAG to users:

Small, fixed corpora where global questions are the primary workload. Regulatory filings (a 1,000-page 10-K corpus where analysts ask “summarize the litigation risk across all filings”), clinical trial archives, internal policy libraries. These are expensive to index, but they don’t change often, and the questions are exactly the kind GraphRAG was built for.

Audit and discovery workflows. Legal e-discovery, compliance audits, due diligence. The “show me everything in this corpus related to X” question is hard for vector RAG and natural for GraphRAG. Neo4j has aggressively positioned its GraphRAG offering for this segment, and it’s the segment where the per-query cost is already absorbed by professional services pricing.

Research and sensemaking tools. Microsoft Discovery, NotebookLM-style research products, intelligence analysis. Here the user is paying for a session, the corpus is bounded, and “what are the themes” is the literal product.

When it’s not worth it. Most chatbots. Most customer support RAG. Most internal knowledge bases over millions of pages. Most documentation Q&A. These workloads are dominated by local questions, and vector RAG with a reranker is faster, cheaper, and equally accurate. The LlamaIndex KG extraction docs do a reasonable job of being explicit about this: build a property graph when your domain genuinely has structure to exploit, not as a default.

A working decision rule

A flowchart, distilled from watching GraphRAG projects ship or shelve over the past 18 months:

A decision tree distilled from production deployments. The teams that go straight to full GraphRAG without checking the upper branches are also the teams whose CFOs surface their Azure invoices in steering meetings.
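One way to encode the criteria from the previous section as code. The three yes/no inputs and the order of the branches are assumptions of this sketch. They compress the guidance above and are not a transcription of the flowchart.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    mostly_global_queries: bool       # "what are the themes?" rather than single-fact lookups
    corpus_small_and_stable: bool     # bounded corpus that is rarely re-indexed
    indexing_budget_approved: bool    # someone has signed off on the LLM indexing bill


def recommend(workload: Workload) -> str:
    """Return a retrieval approach for the workload (hypothetical decision rule)."""
    if not workload.mostly_global_queries:
        # Local questions dominate: similarity search already handles these.
        return "vector RAG + reranker"
    if workload.corpus_small_and_stable and workload.indexing_budget_approved:
        # Global questions on a fixed corpus: the one-off indexing cost is defensible.
        return "full GraphRAG"
    # Global questions, but the corpus churns or the bill is not defensible.
    return "LazyGraphRAG or LightRAG"


support_bot = Workload(False, False, False)
filings_archive = Workload(True, True, True)
weekly_changing_wiki = Workload(True, False, True)

for name, w in [("support bot", support_bot),
                ("filings archive", filings_archive),
                ("weekly-changing wiki", weekly_changing_wiki)]:
    print(name, "->", recommend(w))
```

Checking the query class first is deliberate: it is the cheapest question to answer and it rules out graphs for most workloads before any cost estimate is needed.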

What to take away

GraphRAG is the rare RAG technique where the marketing landed almost perfectly on the use case and almost not at all on the cost. The three lines worth internalizing:

It does solve a real problem that vector RAG cannot. Global, cross-corpus, sensemaking queries are genuinely outside what similarity search can do. If those are your queries, GraphRAG (or one of its lighter cousins) is the right tool.
The indexing economics are the deal-breaker for most teams. Pricing your pilot on a 10K-page sample is fine; pricing your production on a 10M-page corpus that changes weekly is what causes the rollout to stall. Use LazyGraphRAG, LightRAG, or nano-graphrag unless you have a specific reason not to.
Vector RAG with a reranker still beats GraphRAG on the queries most products are actually asked. Don’t reach for graphs because they’re impressive. Reach for them when the query class demands it and the corpus shape makes the bill defensible.

The original Microsoft paper rewards a careful re-read; so does the LazyGraphRAG follow-up. The 2026 take is that GraphRAG, like a lot of post-2023 RAG techniques, found its real footprint not as a universal upgrade but as a precise tool for a narrow query class — the one vector search was never going to handle.

Further reading: Microsoft’s original GraphRAG paper, the GraphRAG GitHub project, Microsoft’s LazyGraphRAG announcement, the LightRAG paper, and the nano-graphrag implementation.