The Agent Harness
The LLM is stateless — the harness is everything else. Learn what the runtime scaffolding around a model is, what each component does, and how to build a minimal one from scratch.
What you'll learn
- Why the model is stateless and what the harness must supply to make it an agent
- The six components every production harness has — loop, tool dispatch, context management, permissions, stop conditions, and verification
- How to build a minimal agent harness in Python with a fake model and a real tool dispatch table
- The failure modes that appear when any component is missing or underbuilt
Before you start
Send a message to a language model and it responds. Send the same message again and it responds identically — it has no memory of the first call, no knowledge of what it did before, and no ability to act on anything. The model is a pure function: text in, text out. Nothing about it is agentic.
The thing that makes it an agent is the code around it. That code is the harness (from the engineering term for the scaffolding that drives and monitors a system) — the loop, the tool dispatch, the context manager, the stop conditions, the permissions layer. A well-built harness is invisible when things go right. It is the entire problem when things go wrong.
This lesson builds one from scratch.
The model is stateless. The harness is the loop.
A language model has no memory across calls. Every invocation starts from whatever you put in the context window — nothing more. If you want the model to remember what happened two steps ago, the harness has to put that history into the prompt. If you want the model to call a tool and then reason about the result, the harness has to run the tool and append the result to the context. This is why the harness exists: without it, every API call is independent, and you get a chatbot, not an agent.
The core agent loop is:
while not done:
response = call_model(context)
if response is a tool call:
result = dispatch_tool(response.tool, response.args)
context.append(response, result)
else:
return response.text # final answer
That’s not pseudocode for the hard part — that’s the entire structural idea. Every framework (LangGraph, MAF, ADK) is just this loop with opinions about state management, persistence, and error handling.
The six components
A minimal harness needs six things. Most of the engineering budget in a production agent goes into these.
1. The agent loop (think → act → observe)
The loop drives the model forward until it reaches a terminal state. Each iteration: call the model, inspect the output, if it’s a tool call run the tool and feed back the result, otherwise surface the answer.
The three steps you’ll see named everywhere — think, act, observe — map directly to a single loop iteration: the model thinks (generates a response), the harness acts (runs the tool), the model observes (the result is appended to context for the next call).
2. Tool dispatch and schema validation
Tools are functions registered with the harness. The model requests a tool by name and provides arguments as JSON. The harness is responsible for:
- Schema validation — check that the arguments match the tool’s declared schema before calling anything. If they don’t, return a well-formed error back to the model so it can retry, not a Python traceback.
- Dispatch — route the validated call to the right function.
- Result serialisation — return results as strings (or structured text) the model can reason about.
Without schema validation you get cascading failures: a malformed argument propagates into the tool, the tool raises an exception, the harness panics, the run dies. Models do produce malformed tool calls, especially in edge cases.
3. Context management (windowing and compaction)
Context windows are finite. A naive harness that appends every observation forever eventually overflows. The harness must manage what goes into the context window, using one or more of:
- Sliding window — keep only the last N messages.
- Compaction / summarization — periodically ask the model to summarise earlier history and replace it with the summary.
- Retrieval — store older observations externally and retrieve relevant chunks by embedding similarity when needed.
The tradeoff is always memory vs. fidelity: a tighter context window costs less and runs faster, but the model may forget an earlier observation it needs.
4. Permissions and sandboxing
Tools can do real things — write files, call APIs, send emails. The harness is the last line of defense before those side effects happen. A minimal permissions layer:
- Maintains an allowlist of which tool/argument combinations are permitted for this run or this user.
- Optionally requires human confirmation before irreversible actions (file deletion, sending a message).
- Prevents tool calls that reach outside the declared scope of the task.
Skipping this is fine for demos. It is not fine for anything that touches real infrastructure.
5. Stop conditions and runaway guards
An agent loop without a stop condition can run forever — burning tokens, making tool calls, and producing nothing useful. Stop conditions include:
- Max steps — a hard cap on loop iterations.
- Max tokens — a budget cap across the whole run.
- Repetition detection — if the model calls the same tool with the same arguments twice in a row, it is probably stuck.
- Timeout — wall-clock time limit for the whole run.
A well-designed harness trips the cheapest guard first. Token counting is free; repetition detection costs a dictionary lookup; timeout is a timer. These should fire before an expensive final model call.
6. Verification
Not every loop should terminate at the model’s first “done”. For tasks where correctness matters — code execution, structured extraction, multi-step calculation — the harness can run a verification step after the model declares it is finished: execute the code and check it doesn’t throw, parse the output against a schema, run a secondary evaluator call.
This is the evaluator-optimizer pattern applied inside the harness. It costs one extra model call per run, and it catches a meaningful fraction of hallucinated or malformed final answers.
A minimal harness you can run
The playground below implements all six components in plain Python — no frameworks, no network. The model is faked with a deterministic function so it runs in Pyodide. Walk through each section and see how the pieces connect.
Run it. Trace the output — each [TOOL] line is one dispatch cycle, the
[FINAL] line is the verification-cleared answer. Try changing max_steps=2
to see the runaway guard fire, or comment out the repetition detection to
see what a stuck agent looks like.
What breaks without each component
| Missing component | Failure mode |
|---|---|
| Context management | Context overflow; model forgets earlier observations |
| Schema validation | Malformed args propagate into tools; tracebacks in model context |
| Repetition detection | Agent loops on the same tool call indefinitely |
| Max-steps cap | Runaway spend — a stuck agent at 1000 steps costs real money |
| Permissions check | Model (or adversarial input) calls destructive tools unchecked |
| Verification | Hallucinated final answers accepted as correct |
None of these are hypothetical. They are the six failure categories you will see in the first production incident for any agent system that skips them.
Quick check
Quick check
Next
The next lessons put this harness to work inside LangGraph — where the loop becomes an explicit state machine and the context becomes a typed state dict you can inspect, pause, and resume.
Practice this in an interview
All questionsTool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.
An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.
Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.
The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.