datarekha

Context engineering

The headline 2026 agent skill: curating what's in the context window. Compaction, isolation, and retrieval keep long-running agents lean, coherent, and affordable — where prompt wording can't.

8 min read Intermediate Agentic AI Lesson 34 of 42

What you'll learn

  • Why a long agent run fills the context window and degrades (context rot)
  • The core moves — compaction, isolation, and just-in-time retrieval
  • Why curating context beats prompt wording for production agents

Prompt engineering is about the words you send. Context engineering is about what’s in the window at all — and for agents, it’s the more important skill. A long-running agent accumulates messages, tool results, and observations every turn. Left unmanaged, that pile fills the context window, quality rots, cost climbs, and eventually the whole run breaks. Curating that window is what separates a demo agent from one that survives a hundred turns.

The problem: the window fills, and rots

Every turn adds tokens — the model’s message plus (often large) tool outputs. Two things go wrong as it grows:

  1. Context rot — even well below the limit, models use a big context worse than a focused one. Important facts get lost in the middle; accuracy degrades.
  2. Overflow — eventually you hit the limit and the run errors or silently drops history.
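The growth described above can be sketched with a toy loop. This is an illustration under stated assumptions, not a measurement: `CONTEXT_LIMIT`, `estimate_tokens`, and `run_turns` are hypothetical names, the 4-characters-per-token rule is a rough heuristic rather than a real tokenizer, and the tool output is filler text.

```python
CONTEXT_LIMIT = 8_000  # hypothetical window size, in tokens


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def run_turns(num_turns: int, tool_output_chars: int = 2_000) -> list[int]:
    """Return the running token total after each turn of an unmanaged agent loop."""
    history: list[str] = []
    totals: list[int] = []
    for turn in range(num_turns):
        # Each turn appends the model's message plus a (large) tool result.
        history.append(f"assistant: calling tool on step {turn}")
        history.append("tool: " + "x" * tool_output_chars)
        # Nothing is ever removed, so the total only grows.
        totals.append(sum(estimate_tokens(m) for m in history))
    return totals


totals = run_turns(20)
# First turn (1-indexed) whose total exceeds the limit, or None if it never does.
overflow_turn = next((i + 1 for i, t in enumerate(totals) if t > CONTEXT_LIMIT), None)
print(f"tokens after turn 1: {totals[0]}")
print(f"first turn over the {CONTEXT_LIMIT}-token limit: {overflow_turn}")
```

Because every turn costs about the same number of tokens and nothing is dropped, the total grows linearly and the overflow turn is predictable in advance. Context rot is not modeled here; it would set in earlier than the hard limit.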

The three moves

Context engineering is mostly three techniques:

  • Compaction: summarize old turns into a compact note and drop the raw history. The agent keeps the gist of what happened without carrying every token. Anthropic reports that context editing, a closely related technique that automatically clears stale tool calls and results, cut token consumption by 84% in a 100-turn web search evaluation while letting agents finish workflows that would otherwise exhaust the context.
  • Isolation: give sub-tasks their own context window. A supervisor spins up a worker that does the heavy work in isolation and returns only a short summary, so the main agent’s window never sees the noise. (This is the real justification for multi-agent systems.)
  • Just-in-time retrieval: don’t pre-load everything “just in case.” Keep the window lean and retrieve the specific document, memory, or tool result only when it’s needed. This is RAG applied to the agent’s own working memory.
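The three moves can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a production design: `Context`, `summarize`, `run_worker`, and `retrieve` are hypothetical names, `summarize` is a stand-in for a real LLM summarization call, the worker does a simple substring count in place of a real sub-agent, and token counts use a rough 4-characters-per-token heuristic.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def summarize(messages: list[str]) -> str:
    """Stand-in for an LLM summary: keeps the first 40 characters of each message."""
    return "summary of earlier turns: " + " | ".join(m[:40] for m in messages)


@dataclass
class Context:
    budget: int            # token budget that triggers compaction
    keep_recent: int = 4   # most recent messages kept verbatim (must be >= 1)
    messages: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.tokens() > self.budget:
            self.compact()

    def compact(self) -> None:
        """Compaction: replace everything but the recent messages with one summary note."""
        old = self.messages[:-self.keep_recent]
        recent = self.messages[-self.keep_recent:]
        if old:
            self.messages = [summarize(old)] + recent


def run_worker(task: str, documents: list[str]) -> str:
    """Isolation: the worker loads every document into its own window, returns one line."""
    worker = Context(budget=10**9)  # private window, never merged into the supervisor's
    for doc in documents:
        worker.add(doc)
    hits = sum(task in doc for doc in worker.messages)
    return f"worker result: {hits} of {len(documents)} documents mention '{task}'"


def retrieve(store: dict[str, str], key: str) -> str:
    """Just-in-time retrieval: fetch a single item only when a step needs it."""
    return store.get(key, "")


main = Context(budget=600)
for step in range(12):
    main.add(f"tool output for step {step}: " + "x" * 400)  # compacts when over budget

result = run_worker("invoice", ["invoice 17 overdue " * 50, "weekly notes " * 50])
main.add(result)  # only the short result enters the main window

notes = {"refund_policy": "Refunds are allowed within 30 days."}
main.add(retrieve(notes, "refund_policy"))  # loaded on demand, not up front

print(len(main.messages), "messages,", main.tokens(), "tokens in the main window")
```

The budget has to leave room for the messages kept verbatim. If `keep_recent` messages alone exceed `budget`, compaction in this sketch cannot bring the window back under it, so the two settings need to be chosen together.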

Quick check

Q1. What is 'context rot'?
Q2. What does compaction do for a long-running agent?
Q3. How does 'isolation' relate to multi-agent systems?

Next

Keeping context lean pairs with two other practices: seeing what the agent actually did (observability & tracing) and bounding spend (cost & latency control).

Practice this in an interview

What types of memory do agents use, and what is context engineering and compaction?

Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.

What are the major security risks of deploying autonomous agents?

Key risks include prompt injection (especially indirect injection via tool or retrieval outputs, which can hijack the agent), excessive tool permissions that enable damaging actions, data exfiltration, confused-deputy privilege escalation, and unbounded loops that drive up cost or cause harm. Mitigations include least-privilege tools, sandboxing, input and output guardrails, human-in-the-loop approval for sensitive actions, and audit logging.

What prompt engineering techniques should every LLM practitioner know?

The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.

When should you use RAG vs fine-tuning vs a long-context model?

RAG is the default for dynamic, proprietary, or frequently updated knowledge. Fine-tuning is the right choice when you need to change the model's behavior, format, or domain-specific reasoning style, not just its knowledge. Long-context models are appropriate when your entire knowledge base fits in a single context window and latency is acceptable.
