How do you evaluate an agentic system, and what is the difference between trajectory and outcome evaluation?
Outcome evaluation checks whether the agent's final result is correct, while trajectory evaluation inspects the intermediate steps, tool calls, and decisions along the way. You need both because an agent can reach the right answer through a flawed path or fail despite sound reasoning; trajectory metrics catch wrong tool use, redundant steps, and loops that outcome-only metrics miss.
How to think about it
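Treat the two as complementary checks on the same run. Outcome evaluation scores only the end state: did the final answer, or the resulting state of the environment, match what the task required? Trajectory evaluation scores the path: which tools were called, with what arguments, in what order, and whether any step was unnecessary. An agent that reaches the right answer through a flawed path passes the first check and fails the second; a run that fails despite sound reasoning (for example, because a tool returned an error) does the reverse. Reporting both scores lets you separate wrong tool use, redundant steps, and loops from genuine task failures, which an outcome-only metric cannot do.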
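To make the distinction concrete, here is a minimal sketch that scores one run both ways. It does not use any real evaluation framework's API. The `Step` record, the function names, the exact-match outcome check, and the `max_repeats` threshold are all assumptions chosen for illustration; real systems often use a rubric or model-based grader for the outcome, and a reference trajectory for the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One tool call in an agent run (hypothetical record format)."""
    tool: str
    args: tuple = ()  # a tuple keeps the step hashable and comparable


def outcome_score(final_answer: str, expected: str) -> bool:
    """Outcome evaluation: look only at the final result."""
    return final_answer.strip().lower() == expected.strip().lower()


def trajectory_report(steps, allowed_tools, max_repeats=2):
    """Trajectory evaluation: inspect the path, ignoring the final answer."""
    # Wrong tool use: calls to tools the task should not need (assumed allow-list).
    wrong_tool_calls = sum(1 for s in steps if s.tool not in allowed_tools)

    # Redundant steps: identical calls (same tool, same args) beyond the first.
    redundant_steps = len(steps) - len(set(steps))

    # Loops: the same call repeated back-to-back more than max_repeats times.
    longest_run = run = 0
    prev = None
    for s in steps:
        run = run + 1 if s == prev else 1
        longest_run = max(longest_run, run)
        prev = s

    return {
        "wrong_tool_calls": wrong_tool_calls,
        "redundant_steps": redundant_steps,
        "looped": longest_run > max_repeats,
    }


# A run that reaches the right answer through a flawed path.
steps = [
    Step("search", ("weather paris",)),
    Step("search", ("weather paris",)),
    Step("search", ("weather paris",)),
    Step("calculator", ("2+2",)),
    Step("answer", ("18C",)),
]
outcome_ok = outcome_score("18C ", "18c")
report = trajectory_report(steps, allowed_tools={"search", "answer"})
print(outcome_ok, report)
```

In this example the outcome check passes, while the trajectory report flags one disallowed tool call, two redundant steps, and a loop. An outcome-only metric would have scored the run as a clean success.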