What types of memory do agents use, and what is context engineering and compaction?

Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.

What are the major security risks of deploying autonomous agents?

Key risks include prompt injection, especially indirect injection via tool or retrieval outputs, hijacking the agent, excessive tool permissions enabling damaging actions, data exfiltration, confused-deputy privilege escalation, and unbounded loops driving cost or harm. Mitigations include least-privilege tools, sandboxing, input and output guardrails, human-in-the-loop approval for sensitive actions, and audit logging.

What prompt engineering techniques should every LLM practitioner know?

The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.

How would you prevent an AI agent from leaking or misusing API credentials?

Keep raw credentials outside model context and traces. Let the model propose typed intent, authorize the final action and arguments deterministically, then have a trusted executor inject a short-lived, narrowly scoped, audience-restricted credential for one call. Re-authorize downstream and gate high-impact writes with explicit approval.

Context engineering — Agentic AI

Prompt engineering is about the words you send. Context engineering is about what’s in the window at all — and for agents, it’s the more important skill. A long-running agent accumulates messages, tool results, and observations every turn. Left unmanaged, that pile fills the context window, quality rots, cost climbs, and eventually the whole run breaks. Curating that window is what separates a demo agent from one that survives a hundred turns.

The problem: the window fills, and rots

Every turn adds tokens — the model’s message plus (often large) tool outputs. Two things go wrong as it grows:

Context rot — even well below the limit, models use a big context worse than a focused one. Important facts get lost in the middle; accuracy degrades.
Overflow — eventually you hit the limit and the run errors or silently drops history.

Plotted over a long run, the difference is stark — unmanaged growth marches into the rot zone and overflows; compaction keeps the window in a lean band:

The three moves

Context engineering is mostly three techniques:

Compaction — summarize old turns into a compact note and drop the raw history. The agent keeps the gist of what happened without carrying every token. Anthropic reports compaction cutting token use by roughly 84% over a long (~100-turn) run while keeping the thread coherent.
Isolation — give sub-tasks their own context window. A supervisor spins up a worker that does heavy work in isolation and returns only a short summary — so the main agent’s window never sees the noise. (This is the real justification for multi-agent.)
Just-in-time retrieval — don’t pre-load everything “just in case.” Keep the window lean and retrieve the specific document, memory, or tool result only when it’s needed — RAG applied to the agent’s own working memory.

In one breath

Prompt engineering is the words you send; context engineering is what’s in the window at all — the more important skill for agents.
A long run fills the window, and even below the limit a crowded context rots (lost-in-the-middle) before it eventually overflows.
Three moves keep it lean: compaction (summarize old turns, drop raw history — ~84% fewer tokens over ~100 turns), isolation (give sub-tasks their own window, return a short summary — the real case for multi-agent), and just-in-time retrieval (fetch only what’s needed now).
A bonus move: let the agent write code that calls tools out-of-context and returns only the answer (~98% token cut on tool-heavy work).
Treat the window as scarce real estate: goal + facts-needed-now + a compact summary in; raw dumps, full docs, and finished transcripts out.

Quick check

0/3

Q1What is 'context rot'?

Q2What does compaction do for a long-running agent?

Q3How does 'isolation' relate to multi-agent systems?

Keeping context lean pairs with seeing what the agent actually did — observability & tracing — and bounding spend — cost & latency control.

Context engineering

What you'll learn

Before you start

The problem: the window fills, and rots

The three moves

In one breath

Quick check

Quick check

Next

Sign in to track your progress

Practice this in an interview

Related lessons

Explore further