How Cursor's Composer actually works
Multi-file edits feel atomic in Cursor not because the model got smarter, but because the team built a stack of careful workarounds: speculative decoding against the original file, a separate Apply Model, and an indexer that stays one step ahead of you. Here's the engineering.
The first time you watch Cursor Composer rename a function used across nine files, it doesn’t look like AI. It looks like a refactor command in a 1998 IDE — except the IDE knew nothing about your refactor until you typed it in plain English. The edits land in every file at once, the diff is coherent, the tests still compile. You scroll through and the only weird thing is how un-weird it feels.
This is not the obvious outcome. Most agentic coding tools treat multi-file edits as a sequence of independent operations: read file, write file, read next file, write next file. Each step is a fresh LLM call; each LLM call sees a partial view of the change; consistency is supposed to fall out of the model “remembering” what it just did. In practice it doesn’t. Imports get out of sync, type signatures drift, one file calls the new function name while another still uses the old one.
Composer feels different because the team underneath it stopped pretending one model could do the whole job. What ships in Cursor is a small pipeline of specialised parts — a planner, an Apply Model, a speculative decoding trick borrowed from inference research, and an indexer that quietly runs while you’re typing. None of those parts on its own is a breakthrough. The composition is.
This post walks through what each piece does, why it exists, and what agentic IDEs that haven’t built this pipeline give up.
The shape of the problem
A multi-file edit is not “do N single-file edits in a row.” It is one intent fanned out across files that have to agree with each other when the dust settles. At a minimum, you have to design against three failure modes:
- Stale view. By the time the model writes file 4, files 1–3 have changed and the model is now reasoning against an out-of-date snapshot.
- Conflicting edits. Two files describe the same interface and the model rewrites it differently in each.
- Latency cost of fidelity. The obvious fix — give the model the whole repo on every call — is too expensive to run on every keystroke and too slow to feel interactive.
Composer’s design is a direct answer to those three failure modes. The trick is splitting “what change to make” from “how to apply it character-by-character” and then making the application step nearly free.
The indexer that runs while you type
Open a fresh repo in Cursor and there’s a quiet hum of activity in the status bar. The indexer is walking your files, chunking them, and shipping embeddings to a vector store keyed to your workspace. None of this blocks you — the moment you hit Composer with a prompt, the relevant chunks are already retrievable in milliseconds.
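Here is a toy in-memory version of the retrieval half. It is a minimal sketch that rests on explicit assumptions:

- A bag-of-identifiers vector stands in for a learned code-embedding model.
- A dict stands in for the hosted vector store.
- WorkspaceIndex, embed, and the path#symbol chunk IDs are hypothetical names, not anything Cursor exposes.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: counts of lowercase word pieces.

    The regex splits camelCase (validateUser -> validate, user) and drops
    punctuation, so identifiers and prose queries share one vocabulary."""
    return Counter(w.lower() for w in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so absent words contribute nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class WorkspaceIndex:
    """Hypothetical per-workspace store: chunk_id -> (vector, source text)."""
    workspace_id: str
    entries: dict = field(default_factory=dict)

    def upsert(self, chunk_id: str, text: str) -> None:
        # In the real system this would run in a background job on save.
        self.entries[chunk_id] = (embed(text), text)

    def search(self, query: str, k: int = 3) -> list:
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), chunk_id) for chunk_id, (vec, _text) in self.entries.items()),
            reverse=True,
        )
        # Drop chunks with no overlap at all rather than padding the result.
        return [chunk_id for score, chunk_id in scored[:k] if score > 0]
```

Writes happen off the query path and search only reads, so a query never waits on indexing. That is the property the paragraph above describes.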
Two design choices matter:
- The chunking is structure-aware. Functions, classes, and JSX components are chunked as whole units, not arbitrary 512-token blocks. This is the difference between retrieving “the validateUser function” and retrieving “lines 412 through 467 of auth.ts, which happens to contain part of validateUser and the start of validateAdmin.” Anyone who has shipped RAG on code knows the second version is unusable.
- Re-indexing is incremental and pre-emptive. Every file save schedules a re-embed of the affected chunks, so by the time Composer reaches for context, the index reflects your latest save rather than the state from five minutes ago.
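Both design choices can be sketched with Python's standard library. The example uses Python source as its test language. Which parsers Cursor actually uses for TypeScript and JSX is not covered here, so treat this as an illustration of the idea only.

chunk_source and chunks_to_reembed are hypothetical helpers:

- chunk_source uses ast to split a module into whole top-level functions and classes.
- chunks_to_reembed hashes each chunk, so a save re-embeds only the chunks whose text actually changed.

```python
import ast
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str  # symbol name, e.g. "validate_user"
    text: str  # exact source of the whole function or class


def chunk_source(source: str) -> list[Chunk]:
    """Structure-aware chunking: one chunk per top-level function or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment (Python 3.8+) returns the node's exact source span;
            # decorators are not part of that span.
            chunks.append(Chunk(node.name, ast.get_source_segment(source, node)))
    return chunks


def chunks_to_reembed(known: dict[str, str], source: str) -> list[str]:
    """Incremental re-indexing on save: names of chunks whose content changed.

    known maps chunk name -> SHA-256 of its text and is updated in place.
    Deleted chunks are not handled, to keep the sketch short."""
    stale = []
    for chunk in chunk_source(source):
        digest = hashlib.sha256(chunk.text.encode("utf-8")).hexdigest()
        if known.get(chunk.name) != digest:
            known[chunk.name] = digest
            stale.append(chunk.name)
    return stale
```

Editing one function changes exactly one hash, so only that chunk goes back to the embedder. The rest of the file's vectors stay valid.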
When Composer fans out across files, the planner gets a focused window: the file being edited, plus retrieved chunks for any symbols it references, plus a small budget for project-level conventions (the README, the tsconfig, the package manifest). It is not “show the model the whole repo.” It is “show the model the slice of the repo this edit depends on.”
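The window assembly might look something like the sketch below. Several details are assumptions made for illustration:

- the priority order (target file, then referenced symbols, then conventions);
- the separate, smaller budget for conventions;
- counting characters as a cheap proxy for tokens.

build_window and Section are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    label: str
    text: str


def build_window(
    target_path: str,
    target_text: str,
    referenced: list[str],
    symbol_chunks: dict[str, str],
    conventions: dict[str, str],
    budget: int,
    convention_budget: int,
) -> list[Section]:
    """Assemble a planner window in priority order (sizes in characters).

    1. the file being edited, always included;
    2. retrieved chunks for symbols it references, while they fit in budget;
    3. project conventions, truncated to their own small convention_budget."""
    window = [Section(target_path, target_text)]
    used = len(target_text)
    for symbol in referenced:
        chunk = symbol_chunks.get(symbol)
        # Unknown symbols and chunks that would overflow the budget are skipped.
        if chunk is not None and used + len(chunk) <= budget:
            window.append(Section(f"symbol:{symbol}", chunk))
            used += len(chunk)
    remaining = convention_budget
    for path, text in conventions.items():
        if remaining <= 0:
            break
        snippet = text[:remaining]
        window.append(Section(path, snippet))
        remaining -= len(snippet)
    return window
```

Everything outside the target file is optional. The window therefore degrades gracefully on a huge repo: it drops context instead of blowing the budget.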
The planner: terse hints, not full diffs
The most counter-intuitive piece of Cursor’s stack is what the big reasoning model actually outputs. It does not output the final code. It outputs something closer to instructions for a junior who has the file open: “after the imports, add import { z } from 'zod'. In the UserSchema declaration, change email: z.string() to email: z.string().email(). Add a phone field after name.”
This is the Composer team’s hard-won insight, written up in their Instant Apply post: big reasoning models are excellent at deciding what should change and mediocre at outputting long verbatim file content without drift. If you ask Claude or GPT to emit a 600-line file with three changes, you will often get a 600-line file with the three intended changes plus a handful of unintended ones: a reformatted comment here, a “fixed” type annotation there, a silent reordering of imports.
So Composer asks the planner only for the intent of the change. The intent is short, easy to verify, and, crucially, small enough that the planner can emit intents for several files in a single call, with every file in view at once. That shared, single-pass plan is half of why Composer feels atomic.
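One way to get several files' intents out of a single call is to ask the planner for structured output and split it afterwards. The JSON shape below is hypothetical; the article does not describe Cursor's wire format. parse_plan is an illustrative helper that groups hints by file and rejects hints for files that do not exist.

```python
import json


def parse_plan(raw: str, workspace_files: set[str]) -> dict[str, list[str]]:
    """Split one planner response into per-file edit hints.

    Assumes (hypothetically) the planner answers with
    {"edits": [{"path": ..., "hint": ...}, ...]}. All hints come from the
    same call, so they were written against the same view of the repo."""
    plan = json.loads(raw)
    hints: dict[str, list[str]] = {}
    for edit in plan["edits"]:
        path = edit["path"]
        if path not in workspace_files:
            # Fail fast: a hint for a file that does not exist is a planning bug.
            raise ValueError(f"planner referenced unknown file: {path}")
        # Preserve the planner's order of hints within each file.
        hints.setdefault(path, []).append(edit["hint"])
    return hints
```

Validating paths at this boundary is cheap and deterministic. A hallucinated file name is caught before any apply work is spent on it.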
The Apply Model: a small model that knows one trick
The other half is the Apply Model: a smaller, fine-tuned model whose entire job is to take (original file, planner’s edit hints) and produce the new file. It does not reason about design. It does not choose names. It transforms.
This is a much narrower task than “be a coding agent,” and a small model fine-tuned on it can match the quality of a frontier model on this slice of work while running maybe an order of magnitude faster. Anysphere has written that the Apply Model is fine-tuned from open-weights bases and trained on a custom dataset of (file, hint, result) triples mined from their own usage.
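The real Apply Model is a fine-tuned LLM that reads free-text hints, so it cannot be reproduced in a few lines. What can be shown is its contract: (original file, hints) in, new file out, with no design decisions in between.

The sketch below is a deterministic stand-in, not a model. It assumes two hypothetical structured hint types, Replace and InsertAfter, and applies them to the UserSchema example from the planner section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replace:
    old: str  # exact text to find; must occur exactly once
    new: str  # replacement text


@dataclass(frozen=True)
class InsertAfter:
    anchor: str  # an existing line, matched exactly; must occur exactly once
    line: str    # the new line to insert directly below it


def apply_hints(original: str, hints: list) -> str:
    """Deterministic stand-in for an Apply Model: (original file, hints) -> new file.

    It never chooses names or redesigns anything; it only transforms. A hint
    whose target is missing or ambiguous is rejected rather than guessed at."""
    text = original
    for hint in hints:
        if isinstance(hint, Replace):
            if text.count(hint.old) != 1:
                raise ValueError(f"ambiguous or missing target: {hint.old!r}")
            text = text.replace(hint.old, hint.new)
        elif isinstance(hint, InsertAfter):
            lines = text.split("\n")  # split/join keeps a trailing newline intact
            if lines.count(hint.anchor) != 1:
                raise ValueError(f"ambiguous or missing anchor: {hint.anchor!r}")
            lines.insert(lines.index(hint.anchor) + 1, hint.line)
            text = "\n".join(lines)
        else:
            raise TypeError(f"unknown hint type: {type(hint).__name__}")
    return text
```

Refusing ambiguous targets instead of guessing is the deterministic analogue of the narrow task. The applier either executes the hint exactly or hands the problem back.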
But raw speed isn’t quite enough. A 600-line file is still 600 lines of generation, and across nine files that’s a lot of tokens. Which brings us to the speculation trick.
Speculative decoding, but against the original file
Standard speculative decoding speeds up inference by having a small “draft” model cheaply guess the next several tokens, then checking all of those guesses in a single batched forward pass of the big model. Cursor’s twist is that for the Apply Model, the draft is the original file itself.
The intuition is brutal and obvious once you see it: in almost any real-world edit, 90%+ of the output file is identical to the input. The Apply Model doesn’t need to “decide” each of those tokens — it just needs to confirm them. So Composer feeds the original file in as the speculative draft, and the Apply Model’s forward pass either accepts a run of original tokens (cheap) or rejects them and emits its own (the actual edit).
The effect on latency is dramatic. Instead of one token per forward pass, unchanged regions cruise at dozens of tokens per pass. A multi-file edit that would have taken thirty seconds of streaming text now lands in a couple of seconds, all files at once. That is the technical reason Composer feels “atomic” — the application step is fast enough that the user perceives a single edit, not a sequence.
original.ts (draft, 600 tokens)
                 ▼
┌──────────────────────────────────┐
│ Apply Model forward passes       │
│  • accepts tokens 1..348         │  ← unchanged span, ~free
│  • rejects tokens 349..361       │  ← the actual edit
│  • emits 14 replacement tokens   │
│  • accepts tokens 362..600       │  ← unchanged span, ~free
└──────────────────────────────────┘
                 ▼
new file; only the edited span was generated token by token
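The mechanism can be simulated end to end. This is a toy, not Cursor's implementation:

- Tokens are strings.
- next_token is an oracle standing in for the Apply Model's greedy prediction.
- After an edit, the draft is re-aligned by n-gram lookup in the original file. This technique is known as prompt lookup decoding, and using it here is an assumption about how such alignment could work.

draft_from_original and speculative_apply are hypothetical names.

```python
from typing import Callable, Sequence

EOS = "<eos>"


def draft_from_original(original: Sequence[str], out: Sequence[str], k: int, max_ngram: int = 3) -> list[str]:
    """Propose up to k draft tokens copied from the original file.

    Re-synchronises after an edit by finding the latest occurrence of the last
    n generated tokens in the original (n = max_ngram down to 1) and copying
    what follows. An empty draft means: no guess, generate normally."""
    if not out:
        return list(original[:k])
    for n in range(min(max_ngram, len(out)), 0, -1):
        tail = list(out[-n:])
        for start in range(len(original) - n, -1, -1):
            if list(original[start:start + n]) == tail:
                return list(original[start + n:start + n + k])
    return []


def speculative_apply(
    original: Sequence[str],
    next_token: Callable[[list[str]], str],
    k: int = 16,
) -> tuple[list[str], int]:
    """Generate the edited file and count simulated forward passes.

    One pass scores the whole draft at once, so it yields the model's choice
    at every draft position: accept the matching prefix, then take the
    model's own token at the first mismatch (or just past the draft)."""
    out: list[str] = []
    passes = 0
    while True:
        draft = draft_from_original(original, out, k)
        passes += 1
        for token in draft:
            if next_token(out) != token:
                break  # first disagreement: the rest of the draft is discarded
            out.append(token)
        token = next_token(out)  # comes from the same pass, no extra cost
        if token == EOS:
            return out, passes
        out.append(token)
```

Take the edit from the diagram: tokens 349..361 of a 600-token file of distinct tokens are replaced by 14 new ones, and an oracle returns the edited file token by token. This toy then produces the 601-token result in 50 simulated passes instead of 601.

Unchanged stretches advance 17 tokens per pass: 16 accepted draft tokens plus the model's own next token. The edited span, and the first token after it, cost about one pass per token.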
Conflict resolution: the orchestrator stays in charge
The pipeline above describes one file. For a multi-file edit, Composer runs many (planner hint → Apply Model) pairs in parallel and then re-enters the orchestrator with all the resulting diffs in hand. This is where conflicts get resolved.
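The fan-out step itself is ordinary concurrency. In the sketch below, fan_out runs one apply call per hinted file on a thread pool. It hands the orchestrator each new file together with a unified diff from difflib. toy_rename is a hypothetical stand-in for the Apply Model that understands only hints of the form Old->New.

```python
import difflib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(
    files: dict[str, str],
    hints: dict[str, str],
    apply: Callable[[str, str], str],
) -> dict[str, tuple[str, str]]:
    """Run apply(original, hint) for every hinted file concurrently and
    return {path: (new_text, unified_diff)} for the orchestrator to check."""

    def run(path: str) -> tuple[str, str, str]:
        new_text = apply(files[path], hints[path])
        diff = "".join(
            difflib.unified_diff(
                files[path].splitlines(keepends=True),
                new_text.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            )
        )
        return path, new_text, diff

    # Threads fit because a real apply call would be I/O-bound
    # (a request to a model server), not CPU-bound Python work.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return {path: (new, diff) for path, new, diff in pool.map(run, hints)}


def toy_rename(source: str, hint: str) -> str:
    """Hypothetical apply function for the demo: the hint is 'Old->New'."""
    old, new = hint.split("->")
    return source.replace(old, new)
```

Files without a hint are never touched. The orchestrator therefore gets diffs only for the files the plan named, which keeps its consistency checks small.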
Conflicts at this stage are rare because the planner’s hints were generated in a single call with all files in view. The model already knew it was renaming User to Account everywhere when it wrote the hint for auth.ts, and it wrote the hint for db.ts accordingly. The orchestrator’s job is more about consistency checks than reconciliation: did every file that imports User get an updated import? Did the function signature change in types.ts get reflected at every call site?
When a check fails, the orchestrator runs a follow-up loop: a new planner call with the partial diff in context, asking only the narrow question “fix the inconsistency between files A and B.” This is the orchestrator-workers pattern applied with extreme discipline: the loop only fires when a deterministic check says it must.
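Both halves of that loop can be sketched: a deterministic check, and a follow-up that fires only when the check fails and sees only the offending files. The pieces are illustrative:

- stale_references and reconcile are hypothetical names.
- The check (no remaining whole-word uses of the old identifier) is one example of a deterministic rule, not Cursor's actual check list.
- fix is a stand-in for the narrow planner call.

```python
import re
from typing import Callable


def stale_references(files: dict[str, str], old_name: str) -> list[str]:
    """Deterministic consistency check: files that still use the old identifier.

    Word boundaries keep e.g. 'UserCard' from counting as a use of 'User'."""
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    return sorted(path for path, text in files.items() if pattern.search(text))


def reconcile(
    files: dict[str, str],
    old_name: str,
    fix: Callable[[dict[str, str]], dict[str, str]],
    max_rounds: int = 2,
) -> tuple[dict[str, str], int]:
    """Fire a narrow follow-up only when the check fails.

    fix receives just the inconsistent files and returns replacements for
    them. Returns the reconciled files and how many follow-ups were needed."""
    for rounds in range(max_rounds + 1):
        stale = stale_references(files, old_name)
        if not stale:
            return files, rounds
        if rounds == max_rounds:
            break
        # Merge the fixer's output over the current files; others stay as-is.
        files = {**files, **fix({path: files[path] for path in stale})}
    raise RuntimeError(f"still inconsistent after {max_rounds} follow-ups: {stale}")
```

When the first pass is already consistent, reconcile returns after zero follow-ups and no extra model call is made. That is the "only fires when it must" property.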
Why agentic IDEs that re-prompt per file lose
Compare this with a more vanilla agentic loop: read file → prompt model to edit → write file → repeat. Each step is a full prompt, each step sees a partial view of the change, and there is no shared plan tying the edits together. The model has to rediscover the intent on every file.
What that looks like in practice:
- Latency multiplies. Nine files × full prompt × full response = thirty-plus seconds per multi-file change, all serial.
- Drift compounds. Small inconsistencies in the model’s output — a slightly different variable name, a slightly reformatted argument list — accumulate across files until something stops compiling.
- Recovery is hard. When file 7 fails to compile, the loop doesn’t know which earlier file’s edit caused it. The “fix” is usually a cascade of follow-up prompts that often makes things worse.
Composer dodges all three by making the planner write the plan once and the Apply Model execute it fast. The expensive thing happens once; the cheap thing happens in parallel.
What this design buys, and what it costs
The Composer architecture trades simplicity for control. You ship more moving parts, but each part has one job and one performance budget. The planner is allowed to be slow because it runs once. The Apply Model is allowed to be smaller because its task is narrow. The indexer is allowed to be eager because it never blocks the UI. The orchestrator can be deterministic code because the LLMs have already done the reasoning.
The cost is the team you need to build it. This is not a thin wrapper around an API — it’s an inference stack with a fine-tuned model, a custom decoding loop, a custom indexer, and an orchestrator that has been debugged against millions of real edits. Anysphere has been doing this for two years with dozens of engineers on the inference side alone.
What to take away
- Multi-file coherence is an architecture problem, not a model problem. The same frontier models, called the same way, would not produce Composer’s behaviour. The pipeline does.
- Split “decide” from “execute.” Big models decide what to change; small fine-tuned models execute the change. The economics of inference become friendlier; the latency budget becomes tractable.
- Speculation works on anything that’s mostly unchanged. If your task output is a near-copy of your task input (file edits, document revisions, JSON patches), speculative decoding against the original can buy you roughly an order of magnitude in generation speed.
- The indexer is half the magic. A coding agent without an always-on structure-aware index is reasoning against a stale or incomplete world. No amount of prompt engineering fixes that.
The pattern generalises. Anywhere you’re tempted to make one big model do “everything in one prompt,” look for a place to split the work into deciding and executing, then make the executing step boring and fast. That’s the lesson of Composer, and it’s the lesson the next generation of agent tooling is being built on.
Further reading: Anysphere’s Instant Apply post is the canonical writeup of the Apply Model and the speculation trick. The codebase indexing docs describe the indexer at a useful level of detail. For the broader pattern, Anthropic’s “Building effective agents” post covers the orchestrator-workers loop that sits at the top.