Pydantic AI: typed agents for the Python ecosystem

There is a specific category of Python engineer that the existing agent frameworks have systematically failed: the one whose entire production service is a typed FastAPI app, who uses Pydantic v2 for every external contract, runs pytest with mypy --strict, and reaches for async def by default. They tried LangChain in 2023 and bounced off the dynamic typing. They tried CrewAI and found the role/task model too loose for their codebase. They tried LangGraph and built a working agent — but the state object was a TypedDict and the LLM calls were strings, and the whole thing felt orthogonal to how they write the rest of their code.

For three years, the workaround was “use a framework but wrap it in your own typed layer.” Most production Python LLM code I’ve read does exactly that. The work is to make an untyped framework behave like your typed codebase.

Then in November 2024 the Pydantic team — yes, the same team whose library OpenAI, Google, and Anthropic all use internally for input validation in their SDKs — shipped Pydantic AI. The pitch was the most precise one any agent framework has put in writing: “FastAPI for agents.”

Eighteen months on, it has 16.5K GitHub stars, a stable 1.x API since late 2025, and a specific kind of production team that has standardised on it. This post is about what those teams have in common, what the framework actually does well, and where its growing pains still show.

The shape of the framework

Pydantic AI’s mental model is one sentence: an Agent is a typed function that calls an LLM, takes typed dependencies (a Pydantic model, a dataclass, or any other Python type), and returns a Pydantic-validated result. Everything else is implementation detail.

The shape that earned the FastAPI comparison: typed inputs (including a deps container), an agent that owns the LLM call, a Pydantic-validated typed output. Same ergonomics, different runtime.

The look-and-feel is unmistakable to anyone who has written FastAPI:

from typing import Literal

from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

# Typed dependencies handed to every tool call.
class SupportContext(BaseModel):
    user_id: str
    customer_tier: str

# The shape the agent's final answer must validate against.
class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "feature_request"]
    severity: Literal["p0", "p1", "p2", "p3"]
    suggested_response: str

triage_agent = Agent(
    "anthropic:claude-sonnet-4-5",
    deps_type=SupportContext,
    output_type=TicketTriage,
    system_prompt="You triage support tickets...",
)

@triage_agent.tool
async def fetch_recent_tickets(ctx: RunContext[SupportContext]) -> list[dict]:
    # `db` stands in for the application's own async data-access layer.
    return await db.tickets.recent_for(ctx.deps.user_id)

async def triage_ticket() -> TicketTriage:
    result = await triage_agent.run(
        "my last invoice is wrong",  # the user prompt is the first positional argument
        deps=SupportContext(user_id="u_123", customer_tier="enterprise"),
    )
    # result.output is a TicketTriage. mypy knows. your IDE knows.
    return result.output

The choices that make this feel native to a FastAPI codebase:

Dependency injection through deps_type. Just like FastAPI’s Depends, the agent’s tools and validators get a typed context object. No globals, no hidden state — the same pattern your HTTP routes already use.
Typed output via output_type. Under the hood Pydantic AI uses whichever structured-output mechanism the provider supports (OpenAI Structured Outputs, Anthropic strict tool use, Gemini function declarations). You write a Pydantic model; you get a Pydantic model.
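Async-first. agent.run() is a coroutine. The synchronous entry point, run_sync(), is an explicit convenience for scripts rather than a hidden default, so there is no threading-vs-async confusion and nothing that quietly blocks the event loop.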
Model-agnostic. The same Agent definition runs against OpenAI, Anthropic, Gemini, Mistral, Cohere, Bedrock, Vertex, Ollama, Groq, Together, OpenRouter — at last count, 38 providers. You change one string.

If you write FastAPI all day, almost none of this requires a new mental model. The cost of adopting Pydantic AI from a typed Python codebase is unusually low.

What it competes against, and where it wins

The framing trap to avoid: Pydantic AI is not a LangGraph competitor in the strict sense, because LangGraph is a state machine and Pydantic AI is a typed agent. The fair comparison set is:

Vanilla function calling — what most production Python teams do by default. Direct API calls, hand-rolled retry logic, hand-rolled schema validation. Pydantic AI is the obvious upgrade: it owns the retry-on-validation-error loop and the schema translation.
LangChain agents — the loose, dynamically-typed legacy choice. Pydantic AI is everything LangChain agents were not.
LangGraph for simple agents — overkill if your agent doesn’t have durable state. Pydantic AI is the right tool for the “agent fits in a request” case; LangGraph is right when it doesn’t.
DSPy — different abstraction layer entirely. DSPy optimises prompts; Pydantic AI runs them. Some teams use both.

Where Pydantic AI specifically wins:

The team’s existing stack is typed Python. FastAPI, SQLModel, Pydantic v2 for every external contract. The cost of adopting an untyped framework is real, and Pydantic AI eliminates it.
The agent is short-lived. One request, one or two tool calls, a validated output. No human gates, no multi-hour runs.
You want to swap models. Pydantic AI’s model-agnosticism is better than the alternatives because the model interface is a single string and the rest of the code doesn’t change.
Observability matters. Pydantic Logfire, the team’s OpenTelemetry-native observability product, integrates in one line. Span hierarchy, LLM calls, and dependency calls all land in one trace. There is no setup fee, and the free tier includes 10M spans per month.

The Logfire integration is the underappreciated part. Pydantic AI plus Logfire is the only stack I’ve used in 2026 where I could open a single trace and see HTTP request → endpoint → agent run → tool call → LLM turn → database query → response, with no plumbing. OpenTelemetry is the underlying wire format, so you can hook it into Datadog or Honeycomb if you’d rather. That story is genuinely best-in-class.

What it doesn’t do

This is where the post has to stop sounding like an ad. Pydantic AI’s weaknesses are the predictable ones for a young, focused framework:

No durable state. There is no checkpointer. There is no interrupt(). If your agent has to pause for a human approval hours later, you build that yourself or you reach for LangGraph. Pydantic Graph (released mid-2025) added a graph layer on top of agents, but the durable-state story is still thinner than LangGraph’s.
Smaller ecosystem. LangGraph has an integration for everything; Pydantic AI’s integrations grow week by week but the long tail is shorter. Vector store wrappers, document loaders, retrieval pipelines — you’ll often write your own or call into LangChain’s.
The “FastAPI for agents” framing has a ceiling. FastAPI’s job is to turn HTTP requests into typed Python and back. Agent frameworks also have to handle multi-step planning, retrieval, evaluation, and the human-in-the-loop story. Pydantic AI does the typed-call part beautifully; the rest of the agent lifecycle is partially your job.
Younger, faster-moving API. The 1.x line stabilised in late 2025 but the framework still ships meaningful API additions every few weeks. If you pin to a version and don’t follow the changelog, you miss things — though breaking changes have been rare.

The team is upfront about these. Samuel Colvin’s interviews — the Latent Space episode and the Software Engineering Daily podcast — have consistently framed Pydantic AI as the “agent runtime” layer, not the “long-running orchestration” layer. They built Pydantic Graph when teams asked for orchestration, but the centre of gravity stays on the typed-agent piece.

Who’s running it

Production adoption is real and growing, but it’s quieter than CrewAI or LangGraph because the teams using Pydantic AI tend to be smaller typed-Python shops rather than enterprise platforms with PR teams.

What the community signals look like in mid-2026:

Pydantic Logfire itself is built on Pydantic AI for its internal agent features (the dogfooding tells you something).
Multiple Y Combinator startups in the agent-builder and AI-ops space have standardised on it; Samuel Colvin’s interviews cite a list without naming companies.
Amazon Bedrock AgentCore lists Pydantic AI as a supported framework — one of the few non-Amazon-built frameworks AWS endorses by name on the agent side.
The Pydantic team report 1.x is in production at companies they can’t name — typical of B2B SaaS adoption, but the maintained pipeline of bug reports and feature requests on the GitHub repo reads like a framework with users, not a framework with stargazers.

The honest assessment: Pydantic AI is in the early-majority phase that FastAPI was in around 2019. It’s not the default yet, but it is increasingly the default for the kind of team that overlaps with the existing Pydantic install base.

Side-by-side: the moments that matter

The clearest way to feel the difference between Pydantic AI and the alternatives is to look at the small moments where the ergonomic story either lands or doesn’t. A few examples that come up over and over in real code reviews.

Four micro-ergonomics that decide whether a framework feels native. Pydantic AI’s bet is to nail these — the rest of the framework follows.

The retry-on-validation case is the most underrated one. When the LLM returns an output that doesn’t validate against your Pydantic model, Pydantic AI re-prompts the model with the validation error attached. The model sees something like “the field severity must be one of p0, p1, p2, p3 — you returned urgent. Please retry with a valid value.” No application code; the framework does it. This is the kind of detail most teams build by hand, and Pydantic AI has it as a default.
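
To make the loop concrete, here is a minimal stdlib-only sketch of the retry-on-validation technique. It is not Pydantic AI’s implementation: Triage, parse_triage, run_with_retries and fake_model are hypothetical names invented for illustration, and the fake model simply returns a bad severity first and a valid one after it sees the error message.

```python
import json
from dataclasses import dataclass
from typing import Callable

ALLOWED_SEVERITIES = ("p0", "p1", "p2", "p3")


class OutputValidationError(ValueError):
    """Raised when the model's reply does not match the expected shape."""


@dataclass(frozen=True)
class Triage:
    severity: str
    suggested_response: str


def parse_triage(raw: str) -> Triage:
    """Validate a raw JSON reply; the error text is written for the model to read."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError(f"reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputValidationError("reply must be a JSON object")
    severity = data.get("severity")
    if severity not in ALLOWED_SEVERITIES:
        allowed = ", ".join(ALLOWED_SEVERITIES)
        raise OutputValidationError(
            f"the field severity must be one of {allowed}; you returned {severity!r}"
        )
    response = data.get("suggested_response")
    if not isinstance(response, str):
        raise OutputValidationError("the field suggested_response must be a string")
    return Triage(severity=severity, suggested_response=response)


def run_with_retries(
    call_model: Callable[[list[str]], str], prompt: str, max_retries: int = 2
) -> Triage:
    """Call the model, and on a validation failure re-prompt with the error attached."""
    messages = [prompt]
    last_error = "no attempts were made"
    for _ in range(max_retries + 1):
        raw = call_model(messages)
        try:
            return parse_triage(raw)
        except OutputValidationError as exc:
            last_error = str(exc)
            # Feed the bad reply and the validation error back to the model.
            messages.append(raw)
            messages.append(f"Validation failed: {exc}. Please retry with a valid value.")
    raise OutputValidationError(f"gave up after {max_retries + 1} attempts: {last_error}")


def fake_model(messages: list[str]) -> str:
    # Hypothetical stand-in for an LLM: invalid on the first call,
    # valid once the conversation contains validation feedback.
    if len(messages) == 1:
        return '{"severity": "urgent", "suggested_response": "Looking into it."}'
    return '{"severity": "p1", "suggested_response": "Looking into it."}'


result = run_with_retries(fake_model, "my last invoice is wrong")
```

The part worth noticing is how little the caller does: it passes a prompt and receives a validated object, and the error-feedback conversation stays inside the loop. That is the work the framework takes off your hands.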

The dependency injection case is the other one. Real production agents need access to a database, a feature-flag store, the current user’s permission set, an HTTP client with the right timeouts. Without DI, that becomes module-level globals or factory closures. With DI, your tool signatures stay clean (async def fetch(ctx: RunContext[Deps])) and your tests get to inject fakes. Anyone who’s written FastAPI tests will recognise the pattern instantly.
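
The testing payoff is easier to see in code. What follows is a stdlib-only sketch of the pattern, not Pydantic AI’s API: Ctx is a hypothetical stand-in for RunContext, and TicketStore, SupportDeps and FakeTicketStore are names made up for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, Protocol, TypeVar

DepsT = TypeVar("DepsT")


@dataclass
class Ctx(Generic[DepsT]):
    """Hypothetical stand-in for a run context: it only carries the typed deps."""

    deps: DepsT


class TicketStore(Protocol):
    """The interface the tool needs; production and test stores both satisfy it."""

    async def recent_for(self, user_id: str) -> list[dict]: ...


@dataclass
class SupportDeps:
    user_id: str
    tickets: TicketStore


async def fetch_recent_tickets(ctx: Ctx[SupportDeps]) -> list[dict]:
    # Everything the tool touches arrives through ctx.deps; no module-level globals.
    return await ctx.deps.tickets.recent_for(ctx.deps.user_id)


class FakeTicketStore:
    """In-memory fake that a test injects in place of the real database."""

    def __init__(self, rows: dict[str, list[dict]]) -> None:
        self._rows = rows

    async def recent_for(self, user_id: str) -> list[dict]:
        return self._rows.get(user_id, [])


# In a test: build the deps with a fake store and call the tool directly.
fake_store = FakeTicketStore({"u_123": [{"id": 1, "subject": "invoice"}]})
ctx = Ctx(deps=SupportDeps(user_id="u_123", tickets=fake_store))
tickets = asyncio.run(fetch_recent_tickets(ctx))
```

Because the tool depends only on the protocol, swapping the real store for the fake needs no patching and no import tricks.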

When to pick it

The clearest decision rule:

Pick Pydantic AI when your existing Python codebase is typed, your agent is request-scoped (no human-in-the-loop, no hours-long runs), you want model-agnosticism without rewriting on provider swap, and you value Logfire-grade observability out of the box.
Don’t pick it when the agent must outlive a request and resume across worker restarts (use LangGraph), or when the team’s abstractions are role-based and prefer English-y declarations (use CrewAI), or when you’re building a research benchmark on multi-stage pipelines (use DSPy).
Combine it when you’ve got a LangGraph state machine but the individual nodes are best written as typed agent calls: Pydantic AI inside the node, LangGraph outside. This hybrid is one of the cleaner architectures I’ve seen for serious agent systems.
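
The shape of that hybrid can be sketched without either library. The code below is a stdlib-only illustration and uses neither LangGraph’s nor Pydantic AI’s API: the JSON checkpoint file plays the role of the durable outer layer, and each node is a plain typed function standing in for a request-scoped agent call. WorkflowState, the node functions and step are all hypothetical names.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class WorkflowState:
    ticket: str
    category: Optional[str] = None
    approved: Optional[bool] = None
    next_node: str = "triage"


def triage_node(state: WorkflowState) -> WorkflowState:
    # In the hybrid, this body would be a short-lived typed agent call.
    state.category = "billing" if "invoice" in state.ticket else "technical"
    state.next_node = "approval"
    return state


def approval_node(state: WorkflowState) -> WorkflowState:
    # Stand-in for a human gate that may run hours later on another worker.
    state.approved = True
    state.next_node = "done"
    return state


NODES: dict[str, Callable[[WorkflowState], WorkflowState]] = {
    "triage": triage_node,
    "approval": approval_node,
}


def save(state: WorkflowState, path: Path) -> None:
    path.write_text(json.dumps(asdict(state)))


def load(path: Path) -> WorkflowState:
    return WorkflowState(**json.loads(path.read_text()))


def step(path: Path) -> WorkflowState:
    """Load the checkpoint, run at most one node, and persist the result."""
    state = load(path)
    if state.next_node != "done":
        state = NODES[state.next_node](state)
        save(state, path)
    return state


checkpoint = Path(tempfile.mkdtemp()) / "run.json"
save(WorkflowState(ticket="my last invoice is wrong"), checkpoint)
after_triage = step(checkpoint)  # one worker runs the triage node
final = step(checkpoint)  # any later worker resumes from the file
```

Each call to step shares nothing with the previous one except the checkpoint, which is the property a durable workflow needs. The nodes stay small typed functions that never learn they are part of a longer run.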

What to take away

Three years into the agent framework era, Pydantic AI is the framework that matches the shape of the rest of the Python ecosystem. It is not the most ambitious framework. It does not solve the hardest problems. But it solves the typed-call problem cleanly, and that’s the problem most teams actually have.

The FastAPI comparison is honest. Same dependency injection pattern, same typed-IO pattern, same async-native runtime, same light-touch ergonomics. If you like writing FastAPI, you’ll like writing Pydantic AI.
The Logfire integration is the killer feature most people miss. One-line tracing, OpenTelemetry-native, distributed traces that span HTTP → agent → tool → database. The observability story alone justifies the framework for a lot of teams.
It’s the right tool when the agent is a typed function. It’s the wrong tool when the agent is a durable workflow. Don’t pick one framework for everything — pick the one that matches what your agent is.

The 2026 production stack for a typed-Python team building agents is boring in a good way: FastAPI for the HTTP layer, Pydantic AI for the agent runtime, LangGraph for the durable workflows, Logfire for the traces, Postgres for the state. It is the closest the Python ecosystem has come to a default, and Pydantic AI is the missing piece that made it converge.

Further reading: the Pydantic AI docs, the Latent Space interview with Samuel Colvin on agent engineering and graphs, and the Pydantic Logfire docs for the observability story. The Pydantic AI GitHub repo is also unusually well-documented for a framework of its age.