How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

What is tool use or function calling in LLMs, and how do you design good tools for an agent?

Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.

What is the difference between monitoring and distributed tracing?

Monitoring aggregates health signals such as request rate, error rate and latency percentiles across systems and time, detecting that a population is unhealthy. Distributed tracing follows one request through nested spans across services, localizing where it waited or failed. Metrics trigger investigation; traces explain individual paths.

What is an AI agent, and how does it differ from a single LLM call?

An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.

Observability & tracing — Agentic AI

An agent run is a tree of nested calls — the agent calls the model, which picks a tool, which returns a result, which triggers another model call, and so on, sometimes dozens deep. When something goes wrong (a wrong answer, a $5 query, a 30-second hang), “read the final output” tells you nothing. You need to see the whole run. That’s observability, and for agents it’s not optional — it’s how you debug, evaluate, and control cost.

Monitoring tells you that; tracing tells you where

Monitoring aggregates health across runs and time: request rate, error rate, p95 latency, token cost and tool-failure counts. It detects a fleet-level symptom and drives an alert.

Tracing follows one run through nested operations. It localizes which model, retrieval or tool span waited, retried or failed. In practice they work as a pair: a latency monitor pages, a representative trace reveals three retries in one tool call, and the monitoring dashboard confirms the fix across traffic.

What a trace captures

A trace records the full execution as nested spans — one span per operation, each with timing, inputs, outputs, token counts, and cost:

A trace tree: each span shows duration, tokens, and cost — making the retry loop that ate the latency obvious.

With that view, the bugs that are invisible in the final output jump out:

A retry loop burning latency and tokens (the highlighted span above).
The wrong tool called, or a tool called with bad arguments.
A cost spike — which span spent the tokens, and on what.
An agent looping instead of converging, hitting its step budget.

What to instrument

Log, for the whole run and every span: inputs/outputs, the model and prompt version, token counts (prompt + completion + any reasoning), latency, cost, tool name + arguments + result, and errors/retries. Tie it all to a trace id so a single run is one searchable object.

In one breath

An agent run is a tree of nested calls; “read the final output” can’t explain a wrong answer, a slow run, or a cost spike — you need to see the whole run.
Monitoring aggregates health across runs (rate, error %, p95, cost) and alerts; tracing follows one run through nested spans and localizes the bad step — the two work as a pair.
A trace records each span with timing, inputs/outputs, token counts, and cost, tied to a trace id — so retry loops, wrong-tool calls, and cost spikes jump out.
Instrument the whole run: I/O, model + prompt version, tokens, latency, cost, tool name/args/result, and errors/retries.
Don’t build it from scratch — LangSmith, Langfuse, Arize/AgentOps (increasingly OpenTelemetry); wire tracing before you ship, and reuse captured traces as eval cases.

Quick check

0/3

Q1What does a trace capture for an agent run?

Q2Why is the final output insufficient for debugging an agent?

Q3How do observability traces connect to agent evaluation?

Tracing also surfaces where the money goes — the bridge to cost & latency control.

Observability & tracing

What you'll learn

Before you start

Monitoring tells you that; tracing tells you where

What a trace captures

What to instrument

In one breath

Quick check

Quick check

Next

Sign in to track your progress

Practice this in an interview

Related lessons

Explore further