How do function/tool calling and LLM agents work at a high level?
Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.
How to think about it
Tool calling
Modern APIs let you describe available functions in the request, typically as JSON Schema. The model decides when to call one and returns a structured call (a function name plus JSON-encoded arguments) rather than prose. The application, not the model, executes the function and sends the result back as a tool-role message tied to the call's ID.
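import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# JSON Schema description of the one tool the model may call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)

# tool_calls is None when the model answers in prose, so check before indexing.
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    # Execute: result = get_weather(**args)
    # Append the assistant message and a tool-role message carrying the result
    # (with tool_call_id=tool_call.id), then call the API again for the final answer.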
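The two commented steps at the end of the example (execute the tool, then package its result) can be tried without a network call. What follows is a minimal sketch, not a definitive implementation: `get_weather` is a hypothetical stub that returns a fixed placeholder value, and `TOOL_REGISTRY` and `run_tool_call` are names invented for illustration. The returned dict uses the `role`, `tool_call_id`, and `content` fields of a Chat Completions tool message.

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical stub; a real version would query a weather service."""
    return {"city": city, "unit": unit, "temperature": 21}  # placeholder value

# Hypothetical registry mapping declared tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and wrap the outcome as a tool-role message."""
    try:
        args = json.loads(arguments_json)      # arguments arrive as a JSON string
        result = TOOL_REGISTRY[name](**args)   # raises KeyError for an unknown tool
        content = json.dumps(result)
    except Exception as exc:
        # Report the failure to the model instead of crashing the application.
        content = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return {"role": "tool", "tool_call_id": call_id, "content": content}

tool_message = run_tool_call("call_123", "get_weather", '{"city": "Tokyo"}')
print(tool_message)
```

With the real API, the three inputs would come from `tool_call.id`, `tool_call.function.name`, and `tool_call.function.arguments`, and the assistant message containing the tool call has to be appended to `messages` before this tool message.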
The agent loop
An agent wraps tool calling in an observe-think-act loop:
- Observe — receive user goal or previous tool output.
- Think — the LLM decides the next action (or “finish”).
- Act — call the chosen tool; get result.
- Repeat until the model emits a final answer.
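The four steps above can be sketched as a bounded loop. To keep the example runnable offline, the LLM is replaced by `scripted_model`, a hypothetical stand-in that requests one tool call and then finishes; the `add` tool, the `TOOLS` dict, and the decision format (`type`, `name`, `arguments`) are likewise assumptions made for illustration, not any particular library's API.

```python
import json
from typing import Callable

def add(a: float, b: float) -> float:
    return a + b

TOOLS: dict[str, Callable] = {"add": add}

def scripted_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM: calls `add` once, then finishes."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "final", "content": f"The sum is {last['content']}."}
    return {"type": "tool_call", "name": "add", "arguments": {"a": 2, "b": 3}}

def run_agent(goal: str, model: Callable[[list[dict]], dict], max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]      # observe: the user goal
    for _ in range(max_steps):
        decision = model(messages)                       # think: choose the next action
        if decision["type"] == "final":
            return decision["content"]                   # finish: final answer
        result = TOOLS[decision["name"]](**decision["arguments"])  # act: run the tool
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "tool", "content": json.dumps(result)})  # observe: tool output
    return "Stopped: step limit reached."

answer = run_agent("What is 2 + 3?", scripted_model)
print(answer)
```

Swapping `scripted_model` for a function that calls a real chat API would leave the loop unchanged, which is the point of keeping the model behind a plain callable.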
Two widely used patterns are ReAct (reasoning and acting interleaved in the model's text output) and plan-and-execute (a planning step that produces the whole plan, followed by a separate execution phase, which suits long-horizon tasks).
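To make the contrast concrete, here is a rough sketch of the plan-and-execute shape: the plan is produced once, then executed step by step with no model call in between, whereas a ReAct-style loop would consult the model after every tool result. `scripted_planner`, the `EXECUTORS` table, and the `$previous` placeholder convention are all hypothetical names chosen for this example.

```python
from typing import Callable

def scripted_planner(goal: str) -> list[dict]:
    """Hypothetical stand-in for a planning LLM call: returns the full plan up front."""
    return [
        {"tool": "search", "input": goal},
        {"tool": "summarize", "input": "$previous"},  # placeholder for the prior step's output
    ]

# Hypothetical tools; real ones might call a search API and an LLM.
EXECUTORS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"raw results for '{query}'",
    "summarize": lambda text: f"summary of [{text}]",
}

def plan_and_execute(goal: str) -> str:
    plan = scripted_planner(goal)            # planning phase: one call, whole plan
    previous = ""
    for step in plan:                        # execution phase: no re-planning between steps
        step_input = previous if step["input"] == "$previous" else step["input"]
        previous = EXECUTORS[step["tool"]](step_input)
    return previous

final = plan_and_execute("agent frameworks")
print(final)
```

Practical variants often re-plan when a step fails; that is omitted here to keep the two phases visible.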
Key design decisions
- Max iterations — cap the loop to prevent infinite cycles.
- Tool output truncation — tool results can be large; summarize before re-injecting.
- Error handling — pass tool errors back to the model so it can retry or gracefully fail.
- Memory — for long sessions, summarize prior turns to avoid context overflow.
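Three of these decisions can be illustrated with small helpers; the iteration cap is just a bounded loop and is not repeated here. This is a sketch under stated assumptions: `truncate_output`, `safe_call`, `compact_history`, and `failing_tool` are hypothetical names, truncation is by character count rather than tokens, and the "summary" is a placeholder string where a real system would probably ask an LLM to summarize.

```python
from typing import Callable

def truncate_output(text: str, max_chars: int = 200) -> str:
    """Keep the head of a large tool result and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"... [truncated {len(text) - max_chars} chars]"

def safe_call(tool: Callable[..., str], **kwargs) -> str:
    """Return the tool's output, or an error string the model can react to."""
    try:
        return tool(**kwargs)
    except Exception as exc:
        return f"ERROR: {type(exc).__name__}: {exc}"

def compact_history(messages: list[dict], keep_last: int = 4) -> list[dict]:
    """Replace older turns with one summary message; keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    # Placeholder summary: a real system would summarize `older` with an LLM.
    summary = {"role": "system", "content": f"Summary of {len(older)} earlier messages."}
    return [summary] + recent

def failing_tool(path: str) -> str:
    """Hypothetical tool that always fails, used to exercise safe_call."""
    raise FileNotFoundError(path)

long_result = truncate_output("x" * 500, max_chars=100)
error_result = safe_call(failing_tool, path="/tmp/missing.txt")
history = compact_history([{"role": "user", "content": str(i)} for i in range(10)])
print(long_result[-25:], error_result, len(history))
```

Returning errors as ordinary strings means they flow back to the model through the same channel as successful results, so the model can decide whether to retry with different arguments or give up.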
Multi-agent systems
Orchestrator agents delegate subtasks to specialist agents (e.g., a web search agent, a code execution agent). Communication happens through message passing; the orchestrator aggregates results.
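A toy version of this delegation pattern might look like the following. The specialist functions, the `SPECIALISTS` routing table, and `orchestrate` are hypothetical; each specialist would normally run its own LLM loop, and the list of subtasks, hard-coded here, would normally be produced by the orchestrator's model.

```python
from typing import Callable

# Hypothetical specialist agents, reduced to plain functions for illustration.
def web_search_agent(task: str) -> str:
    return f"[search] findings for: {task}"

def code_execution_agent(task: str) -> str:
    return f"[code] output for: {task}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "search": web_search_agent,
    "code": code_execution_agent,
}

def orchestrate(subtasks: list[tuple[str, str]]) -> str:
    """Send each (specialist, task) message to its agent and aggregate the replies."""
    replies = []
    for specialist, task in subtasks:
        agent = SPECIALISTS[specialist]      # route the message by specialist name
        replies.append(agent(task))          # message passing: task in, reply out
    return "\n".join(replies)                # aggregation step

report = orchestrate([
    ("search", "latest Python release"),
    ("code", "print the version string"),
])
print(report)
```

Because specialists only exchange plain messages with the orchestrator, each one can be replaced or tested in isolation.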