Context engineering
The headline agent skill of 2026: curating what's in the context window. Compaction, isolation, and retrieval keep long-running agents lean, coherent, and affordable in ways that prompt wording alone can't.
What you'll learn
- Why a long agent run fills the context window and degrades (context rot)
- The core moves — compaction, isolation, and just-in-time retrieval
- Why curating context beats prompt wording for production agents
Before you start
Prompt engineering is about the words you send. Context engineering is about what’s in the window at all — and for agents, it’s the more important skill. A long-running agent accumulates messages, tool results, and observations every turn. Left unmanaged, that pile fills the context window, quality rots, cost climbs, and eventually the whole run breaks. Curating that window is what separates a demo agent from one that survives a hundred turns.
The problem: the window fills, and rots
Every turn adds tokens — the model’s message plus (often large) tool outputs. Two things go wrong as it grows:
- Context rot — even well below the limit, models use a big context worse than a focused one. Important facts get lost in the middle; accuracy degrades.
- Overflow — eventually you hit the limit and the run errors or silently drops history.
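A toy simulation makes the overflow arithmetic concrete. It is only a sketch: the window size, per-turn token cost, compaction threshold, and summary size are all made-up numbers chosen for illustration, and "compaction" here simply means replacing older turns with a short summary note while keeping the latest few turns verbatim.

```python
from typing import Optional

# Every number below is an illustrative assumption, not a measurement.
WINDOW_LIMIT = 200_000     # hypothetical context window, in tokens
TOKENS_PER_TURN = 6_000    # assumed model message plus tool output per turn
COMPACT_AT = 0.6           # compact once the window is 60% full
SUMMARY_TOKENS = 2_000     # assumed size of the note that replaces old turns
KEEP_RECENT_TURNS = 5      # recent turns kept verbatim after compaction


def simulate(turns: int, compact: bool) -> tuple[Optional[int], int]:
    """Return (turn at which the window overflowed, or None; peak tokens held)."""
    used = 0
    peak = 0
    for turn in range(1, turns + 1):
        used += TOKENS_PER_TURN
        if used > WINDOW_LIMIT:
            return turn, peak  # the run breaks on this turn
        peak = max(peak, used)
        if compact and used >= COMPACT_AT * WINDOW_LIMIT:
            # Keep a short summary plus the most recent turns, drop the rest.
            used = SUMMARY_TOKENS + KEEP_RECENT_TURNS * TOKENS_PER_TURN
    return None, peak


print(simulate(100, compact=False))  # (34, 198000): overflows on turn 34
print(simulate(100, compact=True))   # (None, 122000): survives all 100 turns
```

Under these assumptions the unmanaged run dies about a third of the way in, while the compacted run never holds more than roughly 60% of the window. Real runs have uneven turn sizes, so a production agent would measure actual token counts rather than assume a constant.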
The three moves
Context engineering is mostly three techniques:
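- Compaction — summarize old turns into a compact note and drop the raw history. The agent keeps the gist of what happened without carrying every token. Anthropic reports that a closely related technique, context editing (automatically clearing stale tool results from the window), cut token consumption by 84% in a 100-turn web search evaluation while letting agents finish workflows that would otherwise have exhausted the context.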
- Isolation — give sub-tasks their own context window. A supervisor spins up a worker that does heavy work in isolation and returns only a short summary, so the main agent’s window never sees the noise. (This is arguably the strongest justification for multi-agent architectures.)
- Just-in-time retrieval — don’t pre-load everything “just in case.” Keep the window lean and retrieve the specific document, memory, or tool result only when it’s needed — RAG applied to the agent’s own working memory.
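The three moves can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `Context`, `compact`, `run_isolated`, and `retrieve` are hypothetical names, `estimate_tokens` uses a crude 4-characters-per-token guess instead of a real tokenizer, the "document store" is a plain dict, and `fake_summarize` and `fake_worker` stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude assumption: about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class Context:
    """One context window, modeled as a plain list of message strings."""
    messages: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)


def compact(ctx: Context, summarize: Callable[[list[str]], str], keep_recent: int = 2) -> None:
    """Compaction: swap everything but the newest messages for one summary note."""
    split = len(ctx.messages) - keep_recent
    if split <= 0:
        return  # nothing old enough to summarize
    old, recent = ctx.messages[:split], ctx.messages[split:]
    ctx.messages = ["[summary] " + summarize(old)] + recent


def run_isolated(task: str, worker: Callable[[Context], str]) -> str:
    """Isolation: the worker fills its own window; only its short result returns."""
    scratch = Context(messages=[task])
    return worker(scratch)  # scratch and everything in it is dropped after this


def retrieve(store: dict[str, str], key: str) -> str:
    """Just-in-time retrieval: load a single document at the moment it is needed."""
    return store.get(key, "")


# Hypothetical stand-ins for real model calls.
def fake_summarize(old: list[str]) -> str:
    return f"{len(old)} earlier messages condensed"


def fake_worker(scratch: Context) -> str:
    scratch.messages.extend(["raw search hit " * 200] * 5)  # noise stays in scratch
    return "worker: found 3 relevant files"


main = Context()
for i in range(10):
    main.messages.append(f"turn {i}: " + "tool output " * 100)
before = main.tokens()

compact(main, fake_summarize, keep_recent=2)
main.messages.append(run_isolated("search the repo", fake_worker))

docs = {"deploy.md": "Deploy with the release script.", "style.md": "Use 4 spaces."}
main.messages.append(retrieve(docs, "deploy.md"))

print(f"tokens before: {before}, after: {main.tokens()}, messages: {len(main.messages)}")
```

The main window ends up holding one summary note, the two most recent turns, the worker's one-line result, and a single retrieved document. The worker's bulky search output lives only in its scratch context, and the unused `style.md` is never loaded at all.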
Next
Keeping context lean pairs with seeing what the agent actually did (observability & tracing) and bounding spend (cost & latency control).
Practice this in an interview
Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.
Key risks include prompt injection, especially indirect injection via tool or retrieval outputs, hijacking the agent, excessive tool permissions enabling damaging actions, data exfiltration, confused-deputy privilege escalation, and unbounded loops driving cost or harm. Mitigations include least-privilege tools, sandboxing, input and output guardrails, human-in-the-loop approval for sensitive actions, and audit logging.
The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.
RAG is the default for dynamic, proprietary, or frequently updated knowledge. Fine-tuning is correct when you need to change the model's behavior, format, or domain-specific reasoning style — not just its knowledge. Long-context models are appropriate when your entire knowledge base fits in a single context window and latency is acceptable.