ReAct, Plan-Execute, Reflexion
The three agent reasoning loops every engineer should know in 2026 — interleaved ReAct, upfront Plan-and-Execute, and self-correcting Reflexion. How each works and when to reach for it.
What you'll learn
- The three core agent reasoning loops and how each is structured
- The tradeoffs — LLM calls, adaptivity, and predictability
- Which loop fits a dynamic, a known-multi-step, or a verifiable-hard task
Before you start
The design-patterns lesson covered the workflow shapes (prompt chaining, routing, parallelization). This lesson is about the reasoning loops an agent runs inside those shapes — how it actually decides what to do next. Three dominate the field, and knowing which to reach for is the difference between an agent that’s reliable and one that’s slow, expensive, or brittle.
ReAct — reason and act, interleaved
ReAct (Reason + Act) is the workhorse. The agent loops: Thought → Action → Observation → Thought → … — it reasons about what to do, takes one action (a tool call), observes the result, and reasons again with that new information. Because it reacts to each observation, it handles dynamic situations gracefully — it doesn’t need to know the steps in advance.
The cost: one LLM call per step, so a long task means many calls (more latency and more money), and the agent can wander or loop if the stop condition is loose. ReAct is the default shape behind most tool-using agents.
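The Thought → Action → Observation cycle can be sketched in a few lines. This is a minimal illustration, not a production agent: `react_loop`, `scripted_policy`, the `lookup` tool, and the `("act" | "finish", ...)` step format are all hypothetical names invented here, and the policy function stands in for a real LLM call. The `max_steps` budget is one simple way to tighten the stop condition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: ("act", tool_name, tool_input) or ("finish", answer, "").
Step = Tuple[str, str, str]


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate reasoning and acting until the policy finishes or the budget runs out."""
    transcript: List[str] = []
    for _ in range(max_steps):
        # One policy call per step; in a real agent this is one LLM call.
        kind, name, arg = policy(transcript)
        if kind == "finish":
            return name, transcript
        observation = tools[name](arg)
        transcript.append(f"Action: {name}({arg})")
        transcript.append(f"Observation: {observation}")
    # Hard stop so a loose policy cannot loop forever.
    return "stopped: step budget exhausted", transcript


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for the model: picks the next step from the latest observation."""
    if transcript and "shipped" in transcript[-1]:
        return ("finish", "Order 42 has shipped.", "")
    return ("act", "lookup", "order 42")


tools = {"lookup": lambda query: f"{query}: status=shipped"}
answer, transcript = react_loop(scripted_policy, tools)
print(answer)
```

Each pass through the loop costs one policy call, which is where the per-step latency and spend come from.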
Plan-and-Execute — decide everything first
Plan-and-Execute splits reasoning from doing. A planner LLM lays out all the steps up front; an executor then runs them, often without calling the planner again. Compared with ReAct, this is cheaper and more predictable (one expensive planning call, then mechanical execution) and easier to debug, because the plan is explicit.
The weakness: it’s blind to surprises. If step 2 returns something the plan didn’t anticipate, a pure plan-execute agent plows ahead. In practice you add a re-plan step when execution deviates — a hybrid that recovers some of ReAct’s adaptivity.
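A rough sketch of the hybrid described above: plan once, execute mechanically, and call the planner again only when a step deviates. Everything here is an assumption made for illustration: the `plan_and_execute` helper, the `(tool_name, tool_input)` plan format, the tool names, and the choice to treat a raised exception as the "deviation" signal. The planner function stands in for an LLM call.

```python
from typing import Callable, Dict, List, Tuple

Plan = List[Tuple[str, str]]  # hypothetical format: (tool_name, tool_input) pairs


def plan_and_execute(
    planner: Callable[[str, List[str]], Plan],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_replans: int = 1,
) -> Tuple[List[str], int]:
    """Plan up front, run the steps in order, and re-plan only when a step fails."""
    results: List[str] = []
    planner_calls = 1
    remaining = list(planner(goal, results))
    while remaining:
        name, arg = remaining.pop(0)
        try:
            results.append(tools[name](arg))
        except Exception as exc:  # assumption: an exception marks a deviation
            if planner_calls > max_replans:
                raise
            results.append(f"{name} failed: {exc}")
            remaining = list(planner(goal, results))  # planner sees the failure
            planner_calls += 1
    return results, planner_calls


def fetch_primary(arg: str) -> str:
    raise RuntimeError("primary is down")


def scripted_planner(goal: str, results: List[str]) -> Plan:
    """Stand-in for the planner LLM: switches source after a recorded failure."""
    source = "fetch_backup" if any("failed" in r for r in results) else "fetch_primary"
    return [(source, goal), ("summarize", goal)]


tools = {
    "fetch_primary": fetch_primary,
    "fetch_backup": lambda arg: f"backup rows for {arg}",
    "summarize": lambda arg: f"summary of {arg}",
}
results, planner_calls = plan_and_execute(scripted_planner, tools, "sales")
print(planner_calls, results)
```

With no failures the planner is called exactly once; the re-plan budget caps how much ReAct-style adaptivity the loop is allowed to buy back.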
Reflexion — try, critique, retry
Reflexion adds a self-correction loop: the agent makes an attempt, then a reflection step critiques its own output (“I cited the wrong policy section”), and it retries with that feedback. It trades extra passes for quality, and it shines when first attempts often fail and the result can be checked — code that must pass tests, structured extraction you can validate, math you can verify.
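The try, critique, retry cycle can be sketched as follows, using a structured-extraction check as the verifier. This is a simplified illustration under stated assumptions: `reflexion_loop`, `check_invoice`, `scripted_attempt`, and the `total` field are hypothetical, the attempt function stands in for an LLM call, and the critique here comes from a deterministic validator rather than a model-written reflection.

```python
import json
from typing import Callable, List, Optional, Tuple


def reflexion_loop(
    attempt: Callable[[str, List[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_tries: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Attempt, critique, and retry with accumulated feedback until the check passes."""
    reflections: List[str] = []
    for _ in range(max_tries):
        candidate = attempt(task, reflections)
        critique = check(candidate)  # None means the candidate passed
        if critique is None:
            return candidate, reflections
        reflections.append(critique)  # feed the critique into the next attempt
    return None, reflections


def check_invoice(candidate: str) -> Optional[str]:
    """Hypothetical validator: the output must be a JSON object with a numeric 'total'."""
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError as exc:
        return f"output is not valid JSON: {exc}"
    if not isinstance(data, dict):
        return "output must be a JSON object"
    total = data.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        return f"'total' must be a number, got {type(total).__name__}"
    return None


def scripted_attempt(task: str, reflections: List[str]) -> str:
    """Stand-in for the model: fixes the type once it has seen the critique."""
    if any("must be a number" in r for r in reflections):
        return '{"total": 12.5}'
    return '{"total": "12.50"}'


result, reflections = reflexion_loop(scripted_attempt, check_invoice, "extract invoice total")
print(result, reflections)
```

The loop only pays off because `check_invoice` can actually tell a good answer from a bad one; without a reliable check, the extra passes add cost without a clear quality signal.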
Next
These loops run inside a single agent. When one agent isn’t enough, see multi-agent orchestration — and the equally important question of when not to go multi-agent.
Practice this in an interview
ReAct interleaves reasoning traces with actions step by step, deciding the next tool call based on the latest observation. Plan-and-Execute first drafts a full multi-step plan and then executes it, which is more efficient and predictable for complex tasks but less adaptive. Reflexion adds a self-reflection step where the agent critiques past failures and retries with that feedback.
An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.
The core toolkit is: system prompts (role and constraints), few-shot examples (format and tone anchoring), chain-of-thought (step-by-step reasoning), and output constraints (JSON schema, stop sequences). Combining these predictably closes the gap between a capable base model and a production-ready feature.
Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.
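One turn of that cycle might look like the sketch below. The JSON shape (`{"tool": ..., "arguments": ...}`), the `dispatch_tool_call` helper, the `get_weather` tool, and the `role`/`name`/`content` message layout are assumptions chosen for illustration; they are not any particular provider's wire format.

```python
import json
from typing import Any, Callable, Dict


def dispatch_tool_call(raw: str, registry: Dict[str, Callable[..., Any]]) -> Dict[str, str]:
    """Parse a model-emitted tool call, run the tool, and wrap the result as a new message."""
    call = json.loads(raw)
    name = call["tool"]
    args = call.get("arguments", {})
    if name not in registry:
        # Report the problem back to the model instead of crashing the loop.
        content = f"error: unknown tool {name!r}"
    else:
        content = json.dumps(registry[name](**args))
    return {"role": "tool", "name": name, "content": content}


# Hypothetical tool registry with a canned result.
registry = {"get_weather": lambda city: {"city": city, "temp_c": 18}}

model_output = '{"tool": "get_weather", "arguments": {"city": "Oslo"}}'
message = dispatch_tool_call(model_output, registry)
print(message)
```

The returned message is what gets appended to the conversation before the next model call, which closes the observe, think, act cycle.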