Sub-agents, handoffs, supervisors — pick exactly one

If you’ve shipped a multi-agent system in the last 18 months, you’ve hit the moment where it stops working and you can’t figure out why. The traces look right. Each individual agent looks competent. But the system as a whole produces unpredictable results, and the eval scores are sliding sideways.

The usual diagnosis is “the model isn’t good enough.” That is almost never the real cause. More often, the team picked a multi-agent topology halfway through the design and then mixed in techniques from the other topologies because they looked easier in the docs.

This post is the framework for picking — and sticking with — exactly one of the three topologies that actually ship in 2026.

The three topologies

The vocabulary is a mess. “Multi-agent,” “swarm,” “crew,” “team,” “orchestrator,” “supervisor,” “sub-agent,” “handoff” — all of these get used interchangeably in blog posts. The honest taxonomy reduces to three structurally distinct patterns:

Figure: the three topologies. They look similar, but the control flow is structurally different, and you can only debug one at a time cleanly.

Sub-agents

A parent agent spawns one or more child agents with scoped tasks. Each child runs in its own context window, executes the work, and returns one structured result to the parent. The parent integrates the results. The child’s intermediate tool calls and reasoning stay inside the child’s context — they don’t bleed back.

The canonical implementation is Claude Code and the Claude Agent SDK built on it. The parent invokes the Agent tool with a prompt, the sub-agent runs in a fresh context, and only its final message returns to the parent. Crucially, sub-agents can run in parallel: when the parent spawns N independent ones, they execute concurrently, and the wall-clock time is set by the slowest child, not by the sum.
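The fan-out behaviour is easier to see in code. The following is a minimal sketch using plain asyncio, not the Agent SDK itself: run_sub_agent, SubResult, and the sleep-based latencies are hypothetical stand-ins for real sub-agent runs. It shows the two properties that matter, which are that the parent receives one structured result per child and that elapsed time tracks the slowest child.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class SubResult:
    """The single structured value a child hands back to the parent."""
    task: str
    answer: str


async def run_sub_agent(task: str, seconds: float) -> SubResult:
    # Hypothetical stand-in for a real sub-agent run. The scratchpad plays the
    # role of the child's private context: it never leaves this function.
    scratchpad = [f"plan for {task}", f"tool call for {task}"]
    await asyncio.sleep(seconds)  # simulated model and tool latency
    return SubResult(task=task, answer=f"done after {len(scratchpad)} private steps")


async def parent(tasks: dict[str, float]) -> list[SubResult]:
    # Fan out: independent children run concurrently; gather keeps input order.
    return await asyncio.gather(*(run_sub_agent(t, s) for t, s in tasks.items()))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(parent({"angle-a": 0.3, "angle-b": 0.1, "angle-c": 0.2}))
    elapsed = time.perf_counter() - start
    # Elapsed tracks the slowest child (0.3s), not the sum of all three (0.6s).
    print([r.task for r in results], f"{elapsed:.2f}s")
```

If one child needs another child’s output, the gather call no longer applies to that pair, and the parent has to sequence them explicitly.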

Handoffs

One agent decides another agent should take over. Control transfers linearly. The receiving agent picks up the conversation state and continues. There is no return path — the original agent doesn’t get control back unless a future handoff explicitly transfers to it.

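The pattern was crystallized in OpenAI’s Swarm framework and later formalized in the OpenAI Agents SDK. In Swarm, a handoff is a tool function that returns the next Agent object; in the Agents SDK, handoffs are exposed to the model as tools. (LangGraph expresses the same idea by having a tool return a Command(goto=...) object.) Either way, the framework switches the active agent, and the conversation history travels with the handoff.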
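To make the “no return path” property concrete, here is a framework-free sketch of a handoff loop. It does not use the Swarm or Agents SDK APIs; Agent, Handoff, run_chain, and the lambda-based agents are all hypothetical names invented for illustration. Each agent reads the shared history and either answers or names its successor.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Handoff:
    """Signal that control moves to another agent. Nothing returns to the sender."""
    target: str


@dataclass
class Agent:
    name: str
    # step reads the shared history and returns (message, optional handoff).
    step: Callable[[list[str]], tuple[str, Optional[Handoff]]]


def run_chain(agents: dict[str, Agent], start: str, user_msg: str, max_hops: int = 8):
    history = [f"user: {user_msg}"]  # the one piece of state that travels
    current = agents[start]
    for _ in range(max_hops):
        message, handoff = current.step(history)
        history.append(f"{current.name}: {message}")
        if handoff is None:
            # Whoever holds control at the end is the agent talking to the user.
            return current.name, history
        current = agents[handoff.target]
    raise RuntimeError("handoff chain did not terminate")


# Hypothetical agents with canned behaviour in place of model calls.
AGENTS = {
    "intake": Agent("intake", lambda h: ("this is a billing issue", Handoff("billing"))),
    "billing": Agent("billing", lambda h: ("found the order", Handoff("refund"))),
    "refund": Agent("refund", lambda h: (f"refund issued, saw {len(h)} prior messages", None)),
}

if __name__ == "__main__":
    final_agent, history = run_chain(AGENTS, "intake", "refund order 12345")
    print(final_agent)
    print("\n".join(history))
```

Note that the loop has no stack: once intake hands off, nothing in run_chain remembers that intake was ever in control.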

Supervisor

A central routing agent stays in control. Worker agents (or worker tools) get invoked turn by turn based on the supervisor’s decisions. Each worker returns, the supervisor decides the next action, the loop continues. The supervisor is always the one with the global view.

This is LangGraph’s supervisor pattern, which has effectively become the default for serious multi-agent production deployments in 2026.
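The control loop can be sketched without any framework. This is not LangGraph’s API: route, supervise, and the WORKERS table are hypothetical, and the routing policy is a fixed order where a real supervisor would ask a model. What it shows is the shape of the loop, with every decision and every worker output passing through one place.

```python
from typing import Callable, Optional

# Hypothetical workers: each reads the supervisor's state and returns one value.
WORKERS: dict[str, Callable[[dict], str]] = {
    "intent": lambda s: "billing-refund",
    "order": lambda s: "order-12345",
    "draft": lambda s: f"Refund approved for {s['order']}",
    "refund_id": lambda s: "re_demo_001",
}


def route(state: dict) -> Optional[str]:
    # Stand-in routing policy: pick the first piece of work not yet done.
    # A model-backed supervisor would make this decision from the full state.
    for name in ("intent", "order", "draft", "refund_id"):
        if name not in state:
            return name
    return None  # nothing left to do


def supervise(ticket: str, max_turns: int = 10) -> tuple[dict, list[str]]:
    state: dict = {"ticket": ticket}  # the supervisor's running thread
    trace: list[str] = []             # every routing decision, in one place
    for _ in range(max_turns):
        next_worker = route(state)
        if next_worker is None:
            break
        state[next_worker] = WORKERS[next_worker](state)
        trace.append(next_worker)
    return state, trace


if __name__ == "__main__":
    state, trace = supervise("refund order 12345")
    print(trace)
    print(state["draft"])
```

The trace list is the debugging payoff: one ordered record of what was invoked and why the loop stopped.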

Why they are not interchangeable

The mistake teams make is treating these as equivalent abstractions with different ergonomics. They aren’t. They differ on five load-bearing dimensions:

Dimension	Sub-agents	Handoffs	Supervisor
Control flow	Parent dispatches, children return	Linear, no return	Central router every turn
Parallelism	Native	None	Possible via fan-out
Debuggability	High (parent sees only results)	Low (state travels with handoff)	High (every decision in one place)
Context isolation	Strong (fresh per sub)	Weak (history follows)	Moderate (workers can be scoped)
Eval surface	Per sub-agent + parent integration	Per agent + handoff correctness	Supervisor decision + worker quality

The two that get conflated most often are sub-agents and supervisor. They look similar — a top-level agent dispatches work. The structural difference is who keeps state:

Sub-agents: the sub does the work in its own context, returns one structured result, vanishes. The parent never sees the sub’s intermediate steps.
Supervisor: the supervisor keeps a running thread of every worker’s output, decides what to invoke next, accumulates context across multiple worker calls.

If you want parallelism with isolation, you want sub-agents. If you want continuous central control with a global view, you want supervisor. Pick one. Don’t switch.

The failure modes when teams mix them

The single most common multi-agent failure I see in production code review is a team that picks a handoff-style topology but treats the agents like sub-agents, expecting state to flow back. Information gets lost between the linear handoff steps, but the team’s mental model assumes it didn’t, so debugging takes weeks.

A worked example: a customer-support workflow.

As sub-agents (Claude Code style):

parent: "handle this support ticket"
  → spawn sub: "classify intent" → returns "billing-refund"
  → spawn sub: "lookup order #12345" → returns order JSON
  → spawn sub: "draft refund response" → returns email text
  → spawn sub: "issue refund via Stripe" → returns refund_id
parent: assembles final response

Each sub runs fresh. The order-lookup sub doesn’t see the intent classification reasoning. The refund-draft sub gets only the order JSON it needs. Parallelism is free where steps are independent. The parent has the only continuous view of the workflow.

As handoffs (OpenAI Swarm style):

intake-agent: "I'll classify this"
  → hands off to → billing-agent with full chat history
billing-agent: "I'll look up the order"
  → hands off to → refund-agent with full chat history + order
refund-agent: "I'll write the response and issue the refund"
  → returns final result to user

Each agent inherits the full conversation. There is no parent. The last agent in the chain is the one talking to the user. If something went wrong mid-chain, the user is now talking to an agent that may not have the context to recover.

As supervisor (LangGraph style):

supervisor: routes to classifier-worker → gets intent
supervisor: routes to order-lookup-worker → gets order
supervisor: routes to drafter-worker → gets draft
supervisor: routes to stripe-worker → gets refund_id
supervisor: assembles + sends to user

The supervisor sees every worker’s output and decides each next step. Workers can be very small (often a single tool each) because the supervisor has the global view.

All three of these solve the customer-support workflow. The differences show up under stress:

Latency: sub-agents parallelize naturally (intent classification and order lookup can overlap, since neither needs the other’s output). Supervisors can fan out but require the supervisor to explicitly orchestrate the parallelism. Handoffs are inherently serial.

Debuggability: when the workflow fails, sub-agents have the cleanest trace — the parent’s log shows N sub-agent calls with inputs and outputs. Supervisor traces are slightly noisier but still readable. Handoffs are the worst — the failure can be in any agent in the chain and the full context has to be reconstructed by walking the handoffs.

Evals: sub-agents and supervisor both have clean eval boundaries — evaluate each sub-agent independently, plus an integration eval. Handoff systems require evaluating the handoff decision itself, which is hard because the “right” handoff is path-dependent on everything before it.
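The “clean eval boundary” claim can be illustrated with a small harness. This is a sketch under heavy assumptions: classify_intent and draft_reply are hypothetical rule-based stubs standing in for model-backed sub-agents, and exact-match scoring is the simplest possible grader. The point is only the layout, with one suite per unit plus one integration suite over the composed workflow.

```python
from typing import Callable


def classify_intent(ticket: str) -> str:
    # Hypothetical sub-agent stub: a keyword rule in place of a model call.
    return "billing-refund" if "refund" in ticket.lower() else "other"


def draft_reply(intent: str) -> str:
    # Hypothetical sub-agent stub: canned replies keyed on intent.
    if intent == "billing-refund":
        return "Your refund is on its way."
    return "Thanks, we will follow up."


def workflow(ticket: str) -> str:
    # The parent's integration step: compose the two units.
    return draft_reply(classify_intent(ticket))


def pass_rate(fn: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    hits = sum(1 for given, want in cases if fn(given) == want)
    return hits / len(cases)


# One suite per unit, plus one suite for the composed workflow.
SUITES = {
    "classify_intent": (classify_intent, [
        ("Please refund order 12345", "billing-refund"),
        ("Where is my parcel?", "other"),
    ]),
    "draft_reply": (draft_reply, [
        ("billing-refund", "Your refund is on its way."),
        ("other", "Thanks, we will follow up."),
    ]),
    "workflow (integration)": (workflow, [
        ("Please refund order 12345", "Your refund is on its way."),
    ]),
}


def run_evals() -> dict[str, float]:
    return {name: pass_rate(fn, cases) for name, (fn, cases) in SUITES.items()}


if __name__ == "__main__":
    for name, score in run_evals().items():
        print(f"{name}: {score:.0%}")
```

A handoff chain has no equivalent of the per-unit suites, because each agent’s input is the whole history that preceded it rather than a fixed, scoped argument.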

The 2026 production default: supervisor

Most serious multi-agent systems in production today are supervisor topologies. The drivers:

LangGraph’s defaults moved here. As the LangGraph supervisor docs note, supervisor is the recommended starting topology. The framework makes it cheap. Most teams don’t reach for anything else.

OpenAI deprecated Swarm in favor of the Agents SDK. The Agents SDK supports handoffs but is increasingly used in a supervisor pattern — one “triage” agent that dispatches to specialists is structurally a supervisor. The pure peer-handoff vision didn’t survive contact with production debugging needs.

Anthropic’s Claude Code uses sub-agents at one level only. The canonical pattern is parent + sub-agents, with sub-agents not spawning their own sub-agents. This is functionally close to supervisor with parallel workers. The Agent SDK documentation is explicit about this.

Across the three major frameworks, the convergence is striking: there’s one agent that holds the workflow. Whether it’s called a parent, a supervisor, a triage agent, or a planner-executor split — the topology is structurally the same. One agent has the global view. Other agents do scoped work and return.

When sub-agents and handoffs still make sense

Two narrower use cases where the non-supervisor topologies win:

Sub-agents for parallel research

When the task genuinely is “do N independent things and merge them” — research a topic from 5 angles, review code with 3 different lenses, draft 3 alternative responses — sub-agents are correct. The parent fans out, the children parallelize, the parent integrates. Anthropic’s Building Effective Agents calls this the “parallelization” pattern; it works because the work is genuinely independent.

Handoffs for clearly hierarchical workflows

The case for handoffs is narrower than the Swarm framing suggests. Where they win: when a workflow has clear hierarchical phases where the right agent for the next phase is structurally different from the right agent for the current phase, and there’s no need to come back. A common example is “intake → diagnosis → execution,” where each phase needs a different specialist and the work is genuinely linear. Production teams I’ve seen successfully ship handoff systems typically have 2–4 specialist agents, and the handoffs are baked into business logic rather than left to the model to decide.
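“Baked into business logic” can be as simple as a transition table. The sketch below is one way to do it, not a description of any particular team’s system: NEXT_PHASE, HANDLERS, and the three phase functions are hypothetical, and each handler stands in for a specialist agent. The successor of every phase is fixed in code, so no model call decides where control goes next.

```python
from typing import Callable, Optional

# Hypothetical phase graph: each phase has exactly one successor, fixed in code.
NEXT_PHASE: dict[str, Optional[str]] = {
    "intake": "diagnosis",
    "diagnosis": "execution",
    "execution": None,  # end of the line; there is no path back
}


def intake(case: dict) -> dict:
    return {**case, "category": "billing"}


def diagnosis(case: dict) -> dict:
    return {**case, "cause": "duplicate charge"}


def execution(case: dict) -> dict:
    return {**case, "action": "refund issued"}


# Each handler stands in for a specialist agent handling one phase.
HANDLERS: dict[str, Callable[[dict], dict]] = {
    "intake": intake,
    "diagnosis": diagnosis,
    "execution": execution,
}


def run_phases(case: dict, start: str = "intake") -> tuple[dict, list[str]]:
    path: list[str] = []
    phase: Optional[str] = start
    while phase is not None:
        case = HANDLERS[phase](case)  # the specialist does its work
        path.append(phase)
        phase = NEXT_PHASE[phase]     # the table, not the model, picks the successor
    return case, path


if __name__ == "__main__":
    case, path = run_phases({"ticket": "charged twice"})
    print(path)
    print(case)
```

With the transitions in a table, the hard eval problem of judging whether a handoff was the right one largely disappears, because the handoff is no longer a decision.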

The Augment Code analysis gets the framing right: swarms are faster (fewer LLM calls, no intermediary) but harder to debug. The trade is real, and for most production workflows the debuggability wins.

The rule: one topology per workflow

The cleanest production systems pick one topology per coherent workflow and stick with it. Mixing topologies inside a workflow is where the failure modes compound:

Sub-agents that hand off to peer agents lose their return path.
Supervisors that allow workers to hand off to other workers lose their global view.
Handoff chains that try to “return to the original agent” require state mechanisms the framework wasn’t designed for.

If you have a workflow that genuinely needs more than one topology, the production move is layering — one topology per layer, with clean boundaries:

An outer supervisor that dispatches to inner sub-agents (each of which is itself a single-topology workflow).
A sub-agent that, internally, runs a small handoff chain — but its parent sees one structured result.

Layering is fine. Mixing — switching topologies mid-workflow at the same level of abstraction — is the failure mode.
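A sketch of what a clean layer boundary looks like, assuming the second layering example above: refund_sub_agent, RefundResult, and outer_supervisor are hypothetical names, and the inner chain is a plain loop rather than real agents. The inner layer runs a small linear chain with its own private history; the outer layer only ever sees the structured result.

```python
from dataclasses import dataclass


@dataclass
class RefundResult:
    """The only thing that crosses the layer boundary."""
    refund_id: str
    hops: int


def refund_sub_agent(order_id: str) -> RefundResult:
    # Inner layer: a small linear chain with a private history.
    history = [f"order {order_id}"]
    for phase in ("validate", "approve", "issue"):
        history.append(f"{phase}: ok")  # stand-in for one agent in the chain
    # The history is dropped here; only the structured result goes up.
    return RefundResult(refund_id=f"re_{order_id}", hops=len(history) - 1)


def outer_supervisor(ticket: dict) -> dict:
    # Outer layer: one topology, one thread, no knowledge of the inner chain.
    thread: dict = {"ticket": ticket["id"]}
    thread["refund"] = refund_sub_agent(ticket["order_id"])
    return thread


if __name__ == "__main__":
    print(outer_supervisor({"id": "T-1", "order_id": "12345"}))
```

The test for whether you are layering or mixing is whether the outer function’s signature would change if the inner chain were replaced by a single call. Here it would not.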

What to take away

Three topologies, structurally distinct. Sub-agents (parent-dispatch, child-return), handoffs (linear, no return), supervisor (central router).
Sub-agents parallelize. Supervisor centralizes. Handoffs are serial. Pick the one that matches your actual control-flow needs.
The 2026 production default is supervisor. LangGraph, OpenAI Agents SDK in triage mode, Claude Code’s parent+sub-agents — all converging on one agent with the global view.
Don’t mix topologies inside a workflow. Layer them — one topology per level of abstraction, clean boundaries between layers.
Handoffs are still the right choice for hierarchical linear workflows where each phase needs a different specialist and there’s no return path. Just don’t pretend they generalize.

The reason this matters is that multi-agent systems are graphs, and the topology of the graph is the load-bearing design decision. The LLM choice barely affects whether your system is debuggable. The topology entirely determines it. Pick one. Defend it. The teams that treat topology as a real decision — not a framework default they inherit accidentally — are the ones whose multi-agent systems actually ship.

Further reading: LangGraph’s supervisor tutorial is the canonical reference. Claude Code’s sub-agents docs cover the parallel-scoped-work pattern. OpenAI’s Agents SDK is the successor to Swarm. Cognition’s “Don’t Build Multi-Agents” argues for keeping context in a single thread rather than splitting it across agents, which is the strongest version of the case for giving one agent the global view.