datarekha

Code execution with MCP

Instead of loading every tool and routing every result through the context window, let the agent write code that calls tools out-of-context. The pattern that cut tokens by up to 98.7%.

7 min read Intermediate Agentic AI Lesson 41 of 42

What you'll learn

  • Why loading all tool definitions and intermediate results bloats the context
  • How code execution calls tools and processes results out-of-context
  • The token, cost, and latency win — and the sandboxing requirement

Before you start

As agents get more MCP tools, a quiet scaling problem appears. The traditional approach loads every tool definition into the context up front, and routes every intermediate result back through the model. Connect a dozen MCP servers with a few hundred tools and fetch a large dataset, and you’re spending a fortune in tokens before the agent has done anything useful. Anthropic’s code execution with MCP flips this — and the savings are dramatic.

The problem: everything flows through the model

Two costs balloon as tool use grows:

  1. Tool definitions — each tool’s schema and description sits in the context on every call, whether or not it’s used. Hundreds of tools = tens of thousands of tokens of pure overhead.
  2. Intermediate data — fetch a 50,000-row result to filter it down to 3 rows, and all 50,000 rows pass through the context window on the way. The model pays to read data it’s only going to discard.

The fix: write code, run it out-of-context

In code mode, the MCP tools are presented as a code API the agent can program against. Instead of “call tool, read result, call tool, read result,” the agent writes a short script: call these tools, join and filter the results, return just the final answer. The fetching and crunching happen in a sandbox, outside the context window — only the small final result comes back to the model.

# Conceptually, the agent writes & runs code like this (in a sandbox):
orders = mcp.db.query("SELECT * FROM orders WHERE status='open'")  # 50k rows
overdue = [o for o in orders if days_since(o.due) > 30]            # filtered out-of-context
summary = f"{len(overdue)} overdue orders, total ${sum(o.amount for o in overdue):,}"
return summary    # ONLY this short string re-enters the model's context

The model never sees the 50,000 rows — just the one-line summary. Tool definitions are discovered and called as code, not pre-loaded as schemas. The result, on tool-heavy tasks, is up to a ~98.7% token reduction (Anthropic’s reported 150K → 2K), with matching cuts to cost and latency — and less context rot because the window stays small.

Quick check

Quick check

0/3
Q1What two things bloat the context when an agent has many MCP tools?
Q2How does code execution with MCP reduce tokens so dramatically?
Q3What's the essential safety requirement for this pattern?

Next

Round out production readiness with observability & tracing and cost & latency control.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Practice this in an interview

All questions
What is the Model Context Protocol (MCP) and what problem does it solve?

MCP is an open protocol from Anthropic that standardizes how LLM applications discover and connect to external tools, data sources, and prompts through a common client-server interface. It replaces bespoke per-integration glue with a single protocol, so any MCP-compatible host can use any MCP server, and has been adopted across the broader ecosystem.

How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

What is tool use or function calling in LLMs, and how do you design good tools for an agent?

Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.

What types of memory do agents use, and what is context engineering and compaction?

Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.

Related lessons

Explore further

Skip to content