Multi-agent: supervisor & swarm
Supervisor, swarm, and hierarchical orchestration — and the harder question of when NOT to go multi-agent. Most tasks are better served by a single agent with good tools.
What you'll learn
- The orchestration topologies — single, supervisor, swarm, hierarchical
- When multi-agent genuinely helps (isolated context windows)
- When NOT to use multi-agent — and why most swarms get rewritten
Before you start
“Let’s have a planner agent talk to a research agent talk to a writer agent.” It sounds sophisticated, and it’s usually a mistake. Multi-agent systems are real and sometimes necessary — but they are not a default, and reaching for them too early is the single most common architecture error in agent engineering. This lesson covers the topologies and the harder discipline: knowing when one agent is the right answer.
The topologies
- Single agent — one agent, a good prompt, a few tools. Start here. It’s the cheapest, fastest, and most debuggable option, and it handles the large majority of tasks.
- Supervisor (orchestrator–workers) — a central agent decomposes the task and delegates sub-tasks to worker agents, then synthesizes their results. This is roughly 70% of production multi-agent systems, because it maps cleanly onto “split this into independent pieces.”
- Swarm — peer agents hand off control to one another (any agent can pass to any other). Flexible and powerful, but routing is emergent and traces are hard to follow. The frontier, not the default.
- Hierarchical — supervisors of supervisors, for very large task trees. Most power, most overhead and complexity.
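To make the difference in control flow concrete, here is a minimal sketch of the supervisor and swarm topologies. Everything in it is an assumption for illustration: `run_supervisor`, `run_swarm`, the `plan` and `synthesize` callables, and the toy agents are hypothetical stand-ins for real model calls, not any framework's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent shapes (assumptions, not a real library's types).
Worker = Callable[[str], str]                            # sub-task -> result
SwarmAgent = Callable[[str], Tuple[str, Optional[str]]]  # message -> (output, next agent or None)


def run_supervisor(
    task: str,
    plan: Callable[[str], List[Tuple[str, str]]],
    workers: Dict[str, Worker],
    synthesize: Callable[[List[str]], str],
) -> str:
    """Supervisor: one central place decides who does what, then merges results."""
    results = [workers[name](sub_task) for name, sub_task in plan(task)]
    return synthesize(results)


def run_swarm(
    message: str,
    agents: Dict[str, SwarmAgent],
    start: str,
    max_hops: int = 10,
) -> Tuple[str, List[str]]:
    """Swarm: each agent decides who goes next; the route only exists at run time."""
    trace: List[str] = []
    current = start
    for _ in range(max_hops):  # cap hand-offs so a routing loop cannot run forever
        trace.append(current)
        message, next_agent = agents[current](message)
        if next_agent is None:
            return message, trace
        current = next_agent
    raise RuntimeError("hand-off limit reached without a final answer")


# Toy supervisor run: the plan is fixed by the orchestrator.
def plan(task: str) -> List[Tuple[str, str]]:
    return [("research", f"find facts about {task}"), ("write", f"draft text about {task}")]

workers: Dict[str, Worker] = {
    "research": lambda sub_task: f"[research] {sub_task}",
    "write": lambda sub_task: f"[write] {sub_task}",
}
report = run_supervisor("solar panels", plan, workers, lambda results: " | ".join(results))

# Toy swarm run: each agent names its successor.
swarm_agents: Dict[str, SwarmAgent] = {
    "triage": lambda msg: (msg + " -> triaged", "billing"),
    "billing": lambda msg: (msg + " -> resolved", None),
}
answer, route = run_swarm("refund request", swarm_agents, start="triage")
```

In the supervisor version the route is visible in `plan` before anything runs. In the swarm version the only record of the route is the `trace` collected along the way, which is one reason swarm traces are harder to follow.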
When multi-agent actually helps
The honest criterion is context isolation. Multi-agent earns its cost when sub-tasks genuinely need separate context windows:
- Parallel research threads — three workers each explore a different source with their own context, then a supervisor merges short summaries.
- A long codebase — split across workers so no single context window has to hold the whole thing.
- Genuinely independent sub-tasks that can run concurrently for speed.
In each case there’s a concrete reason the work can’t share one context window.
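The parallel-research case can be sketched as follows. This is an illustration under stated assumptions: `summarize` is a hypothetical stand-in for a model call (here it just truncates), and the source names and sizes are made up. The point it tries to show is that each worker builds its own private context, and the supervisor only ever sees the short summaries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def summarize(context: List[str], limit: int = 30) -> str:
    """Hypothetical stand-in for an LLM call: 'summarizes' by truncating the last message."""
    return context[-1][:limit]


def research_worker(item: Tuple[str, str]) -> str:
    """One worker, one source, one private context that is discarded afterwards."""
    source_name, source_text = item
    context = [f"You are researching {source_name}.", source_text]  # never shared
    return f"{source_name}: {summarize(context)}"


def research_supervisor(sources: Dict[str, str]) -> List[str]:
    """Run workers concurrently and return only their summaries, in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        return list(pool.map(research_worker, sources.items()))


# Made-up sources of different sizes (assumption for the demo).
sources = {
    "docs": "D" * 5000,
    "forum": "F" * 3000,
    "changelog": "C" * 2000,
}
summaries = research_supervisor(sources)

raw_chars = sum(len(text) for text in sources.values())
supervisor_chars = sum(len(summary) for summary in summaries)
```

Comparing `raw_chars` with `supervisor_chars` shows the isolation at work: the full source text stays inside the workers, and the supervisor's context holds only a small fraction of it.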
The cost you’re signing up for
Every extra agent means more LLM calls (and therefore more latency and more cost), more places to fail, and a harder debugging and evaluation story: non-deterministic hand-offs are genuinely hard to trace and test. Supervisor designs contain this best, because the orchestrator is a single point to log and inspect; swarms are the hardest, because control flow is emergent.
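As a rough sketch of what a single point to log and inspect can look like, the supervisor below routes every delegation through one function that records the worker, the sub-task, the outcome, and the elapsed time. The `delegate` helper, the record fields, and the toy workers are all hypothetical; a real system would more likely send these records to a tracing tool.

```python
import time
from typing import Callable, Dict, List, Optional

Worker = Callable[[str], str]  # hypothetical worker shape: sub-task -> result


def delegate(
    trace: List[dict],
    workers: Dict[str, Worker],
    name: str,
    sub_task: str,
) -> Optional[str]:
    """Call one worker and append exactly one record to the trace, success or failure."""
    started = time.perf_counter()
    record = {"worker": name, "sub_task": sub_task, "ok": False, "output": None, "error": None}
    try:
        record["output"] = workers[name](sub_task)
        record["ok"] = True
    except Exception as exc:  # a failed worker should not crash the whole run
        record["error"] = repr(exc)
    record["seconds"] = time.perf_counter() - started
    trace.append(record)
    return record["output"]


def flaky_worker(sub_task: str) -> str:
    """Toy worker that always fails, to show how a failure appears in the trace."""
    raise TimeoutError("model timed out")


workers: Dict[str, Worker] = {
    "research": lambda sub_task: f"notes on {sub_task}",
    "write": flaky_worker,
}

trace: List[dict] = []
notes = delegate(trace, workers, "research", "battery chemistry")
draft = delegate(trace, workers, "write", "summary of battery chemistry")

failed = [record["worker"] for record in trace if not record["ok"]]
llm_calls = len(trace)  # one record per delegation, so the count of calls is visible too
```

Because every call goes through `delegate`, the trace answers both questions this section raises: how many calls the run made, and which hand-off failed.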
Next
Whatever topology you choose, you have to keep each agent’s context lean and trustworthy — that’s context engineering — and you have to be able to see what happened, which is observability.
Practice this in an interview
Use multiple agents when a task decomposes into distinct specialties or parallel subtasks that exceed one agent's context or reliability; avoid it when a single agent suffices, since multi-agent systems add coordination overhead, latency, cost, and error propagation. A supervisor architecture has an orchestrator routing work to specialized sub-agents, while a swarm lets peer agents hand off control to one another without a central coordinator.
An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.
Key risks include prompt injection, especially indirect injection via tool or retrieval outputs, hijacking the agent, excessive tool permissions enabling damaging actions, data exfiltration, confused-deputy privilege escalation, and unbounded loops driving cost or harm. Mitigations include least-privilege tools, sandboxing, input and output guardrails, human-in-the-loop approval for sensitive actions, and audit logging.
ML workflows are multi-step DAGs with dependencies, and an orchestrator gives you dependency management, retries, backfills, caching, observability, and lineage that chained cron jobs cannot. Airflow is a general-purpose task orchestrator defining DAGs in Python, while Kubeflow Pipelines is ML-native, passing typed artifacts between containerized steps on Kubernetes with conditional logic like deploy only if accuracy exceeds a threshold. Choosing depends on whether you need generic scheduling or ML-specific, container-based pipelines.