datarekha

The Agent Harness

The LLM is stateless — the harness is everything else. Learn what the runtime scaffolding around a model is, what each component does, and how to build a minimal one from scratch.

9 min read Intermediate Agentic AI Lesson 7 of 42

What you'll learn

  • Why the model is stateless and what the harness must supply to make it an agent
  • The six components every production harness has — loop, tool dispatch, context management, permissions, stop conditions, and verification
  • How to build a minimal agent harness in Python with a fake model and a real tool dispatch table
  • The failure modes that appear when any component is missing or underbuilt

Before you start

Send a message to a language model and it responds. Send the same message again and it responds identically — it has no memory of the first call, no knowledge of what it did before, and no ability to act on anything. The model is a pure function: text in, text out. Nothing about it is agentic.

The thing that makes it an agent is the code around it. That code is the harness (from the engineering term for the scaffolding that drives and monitors a system) — the loop, the tool dispatch, the context manager, the stop conditions, the permissions layer. A well-built harness is invisible when things go right. It is the entire problem when things go wrong.

This lesson builds one from scratch.

The model is stateless. The harness is the loop.

A language model has no memory across calls. Every invocation starts from whatever you put in the context window — nothing more. If you want the model to remember what happened two steps ago, the harness has to put that history into the prompt. If you want the model to call a tool and then reason about the result, the harness has to run the tool and append the result to the context. This is why the harness exists: without it, every API call is independent, and you get a chatbot, not an agent.

The core agent loop is:

while not done:
    response = call_model(context)
    if response is a tool call:
        result = dispatch_tool(response.tool, response.args)
        context.append(response, result)
    else:
        return response.text   # final answer

That’s not pseudocode for the hard part — that’s the entire structural idea. Every framework (LangGraph, MAF, ADK) is just this loop with opinions about state management, persistence, and error handling.

The six components

A minimal harness needs six things. Most of the engineering budget in a production agent goes into these.

1. The agent loop (think → act → observe)

The loop drives the model forward until it reaches a terminal state. Each iteration: call the model, inspect the output, if it’s a tool call run the tool and feed back the result, otherwise surface the answer.

The three steps you’ll see named everywhere — think, act, observe — map directly to a single loop iteration: the model thinks (generates a response), the harness acts (runs the tool), the model observes (the result is appended to context for the next call).

2. Tool dispatch and schema validation

Tools are functions registered with the harness. The model requests a tool by name and provides arguments as JSON. The harness is responsible for:

  • Schema validation — check that the arguments match the tool’s declared schema before calling anything. If they don’t, return a well-formed error back to the model so it can retry, not a Python traceback.
  • Dispatch — route the validated call to the right function.
  • Result serialisation — return results as strings (or structured text) the model can reason about.

Without schema validation you get cascading failures: a malformed argument propagates into the tool, the tool raises an exception, the harness panics, the run dies. Models do produce malformed tool calls, especially in edge cases.

3. Context management (windowing and compaction)

Context windows are finite. A naive harness that appends every observation forever eventually overflows. The harness must manage what goes into the context window, using one or more of:

  • Sliding window — keep only the last N messages.
  • Compaction / summarization — periodically ask the model to summarise earlier history and replace it with the summary.
  • Retrieval — store older observations externally and retrieve relevant chunks by embedding similarity when needed.

The tradeoff is always memory vs. fidelity: a tighter context window costs less and runs faster, but the model may forget an earlier observation it needs.

4. Permissions and sandboxing

Tools can do real things — write files, call APIs, send emails. The harness is the last line of defense before those side effects happen. A minimal permissions layer:

  • Maintains an allowlist of which tool/argument combinations are permitted for this run or this user.
  • Optionally requires human confirmation before irreversible actions (file deletion, sending a message).
  • Prevents tool calls that reach outside the declared scope of the task.

Skipping this is fine for demos. It is not fine for anything that touches real infrastructure.

5. Stop conditions and runaway guards

An agent loop without a stop condition can run forever — burning tokens, making tool calls, and producing nothing useful. Stop conditions include:

  • Max steps — a hard cap on loop iterations.
  • Max tokens — a budget cap across the whole run.
  • Repetition detection — if the model calls the same tool with the same arguments twice in a row, it is probably stuck.
  • Timeout — wall-clock time limit for the whole run.

A well-designed harness trips the cheapest guard first. Token counting is free; repetition detection costs a dictionary lookup; timeout is a timer. These should fire before an expensive final model call.

6. Verification

Not every loop should terminate at the model’s first “done”. For tasks where correctness matters — code execution, structured extraction, multi-step calculation — the harness can run a verification step after the model declares it is finished: execute the code and check it doesn’t throw, parse the output against a schema, run a secondary evaluator call.

This is the evaluator-optimizer pattern applied inside the harness. It costs one extra model call per run, and it catches a meaningful fraction of hallucinated or malformed final answers.

A minimal harness you can run

The playground below implements all six components in plain Python — no frameworks, no network. The model is faked with a deterministic function so it runs in Pyodide. Walk through each section and see how the pieces connect.

Run it. Trace the output — each [TOOL] line is one dispatch cycle, the [FINAL] line is the verification-cleared answer. Try changing max_steps=2 to see the runaway guard fire, or comment out the repetition detection to see what a stuck agent looks like.

What breaks without each component

Missing componentFailure mode
Context managementContext overflow; model forgets earlier observations
Schema validationMalformed args propagate into tools; tracebacks in model context
Repetition detectionAgent loops on the same tool call indefinitely
Max-steps capRunaway spend — a stuck agent at 1000 steps costs real money
Permissions checkModel (or adversarial input) calls destructive tools unchecked
VerificationHallucinated final answers accepted as correct

None of these are hypothetical. They are the six failure categories you will see in the first production incident for any agent system that skips them.

Quick check

Quick check

0/3
Q1A language model is described as 'stateless'. What does this mean for the harness?
Q2Which harness component is responsible for catching the case where a model requests the same tool call twice in a row with identical arguments?
Q3Why does schema validation belong in the harness rather than inside each tool function?

Next

The next lessons put this harness to work inside LangGraph — where the loop becomes an explicit state machine and the context becomes a typed state dict you can inspect, pause, and resume.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Practice this in an interview

All questions
How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

What is an AI agent, and how does it differ from a single LLM call?

An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.

What is tool use or function calling in LLMs, and how do you design good tools for an agent?

Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.

What prompt engineering techniques should every LLM practitioner know?

The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.

Related lessons

Explore further

Skip to content