Code execution with MCP
Instead of loading every tool and routing every result through the context window, let the agent write code that calls tools out-of-context. The pattern that cut tokens by up to 98.7%.
What you'll learn
- Why loading all tool definitions and intermediate results bloats the context
- How code execution calls tools and processes results out-of-context
- The token, cost, and latency win — and the sandboxing requirement
Before you start
As agents get more MCP tools, a quiet scaling problem appears. The traditional approach loads every tool definition into the context up front, and routes every intermediate result back through the model. Connect a dozen MCP servers with a few hundred tools and fetch a large dataset, and you’re spending a fortune in tokens before the agent has done anything useful. Anthropic’s code execution with MCP flips this — and the savings are dramatic.
The problem: everything flows through the model
Two costs balloon as tool use grows:
- Tool definitions — each tool’s schema and description sits in the context on every call, whether or not it’s used. Hundreds of tools = tens of thousands of tokens of pure overhead.
- Intermediate data — fetch a 50,000-row result to filter it down to 3 rows, and all 50,000 rows pass through the context window on the way. The model pays to read data it’s only going to discard.
The fix: write code, run it out-of-context
In code mode, the MCP tools are presented as a code API the agent can program against. Instead of “call tool, read result, call tool, read result,” the agent writes a short script: call these tools, join and filter the results, return just the final answer. The fetching and crunching happen in a sandbox, outside the context window — only the small final result comes back to the model.
# Conceptually, the agent writes & runs code like this (in a sandbox):
orders = mcp.db.query("SELECT * FROM orders WHERE status='open'") # 50k rows
overdue = [o for o in orders if days_since(o.due) > 30] # filtered out-of-context
summary = f"{len(overdue)} overdue orders, total ${sum(o.amount for o in overdue):,}"
return summary # ONLY this short string re-enters the model's context
The model never sees the 50,000 rows — just the one-line summary. Tool definitions are discovered and called as code, not pre-loaded as schemas. The result, on tool-heavy tasks, is up to a ~98.7% token reduction (Anthropic’s reported 150K → 2K), with matching cuts to cost and latency — and less context rot because the window stays small.
Quick check
Quick check
Next
Round out production readiness with observability & tracing and cost & latency control.
Practice this in an interview
All questionsMCP is an open protocol from Anthropic that standardizes how LLM applications discover and connect to external tools, data sources, and prompts through a common client-server interface. It replaces bespoke per-integration glue with a single protocol, so any MCP-compatible host can use any MCP server, and has been adopted across the broader ecosystem.
Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.
Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.
Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.