Agents for legal, finance, healthcare — the high-stakes pattern
Three industries where hallucination is malpractice. The agent companies winning in legal, finance, and healthcare have converged on the same shape — scoped tasks, retrieval over generation, mandatory human checkpoints, audit logging by default. Here's what that pattern looks like in production at Harvey, Hebbia, Abridge, and Suki.
There is a particular kind of agent demo that doesn’t make it past compliance review. A model answers a question confidently, the demo audience nods, and somewhere a general counsel asks “but what happens when it’s wrong?” In consumer products, the answer is “the user shrugs and tries again.” In healthcare, finance, or legal work, the answer is “we get sued, we get fined, or someone dies.”
That asymmetry is the central design constraint for every agent company that has shipped into a regulated industry. And by mid-2026, enough of them have shipped at scale — Harvey at $11B and over 700 customers across 58 countries; Abridge at $5.3B and over 150 health systems; Suki across 400+ healthcare systems; Hebbia serving roughly a third of asset managers — that we can talk about the pattern, not the pitch.
This post is about what the high-stakes agent pattern actually looks like. And about why it is conspicuously not what the autonomous-multi-agent crowd was selling in 2024.
Why “smarter agents” was the wrong answer
The instinct, when you first meet the regulated-industry problem, is to make the agent better. Bigger model. More tools. Smarter reasoning. Surely if the agent is good enough, hallucinations become rare enough that compliance accepts them.
This is wrong, and it took two years of failed pilots for the industry to internalise it. The problem in regulated industries isn’t the rate of errors; it’s the consequence of any single error. A 1% hallucination rate in a consumer chatbot is a “yeah, sometimes it makes things up” caveat. A 1% hallucination rate in a clinical note is a patient-safety incident, and a malpractice suit a year later when the bad note ends up in front of a jury. A 1% hallucination rate in a court filing is a sanction from the judge. That one isn’t hypothetical: in June 2023, in Mata v. Avianca, a federal judge in New York sanctioned two lawyers and their firm for filing a brief with ChatGPT-fabricated citations.
You can’t make agents smart enough to be safe in regulated industries. You have to build a system that doesn’t let a single bad output reach the patient, the court, or the regulator unobserved.
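The companies that figured this out converged on four design rules, none of which involve a smarter model: narrowly scoped tasks, retrieval over generation, a mandatory human checkpoint, and audit logging by default.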
Legal — Harvey, Hebbia, and the citation problem
The first regulated industry where agents went mainstream is law, and the canonical company is Harvey. Founded in 2022, Harvey crossed $100M ARR in August 2025, and as of March 2026 sits at an $11B valuation after a $200M round co-led by GIC and Sequoia. The product is now in 45 AmLaw 100 firms, plus PwC and a long tail of corporate legal teams. The genuinely impressive thing about Harvey is not the valuation — it’s that the major firms, who are professionally paranoid, have decided to trust the workflow.
The trust came from designing for the four rules above. Specifically:
Harvey’s tasks are narrowly scoped. Diligence review, contract analysis, litigation research, drafting from precedents. Not “be a lawyer.” The product is organised around discrete workflows that a senior associate would recognise as “tasks I would delegate to a junior, but never sign without reviewing.” Harvey’s Legal Agent Bench, released in May 2026, contains over 1,200 agent tasks across 24 practice areas — and the framing is deliberate. They are tasks, not roles.
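To make “tasks, not roles” concrete, here is a minimal sketch of a workflow registry. It assumes a hypothetical in-house design in which the agent can only run workflows that were explicitly registered, each one tagged with the role that reviews its output. `Workflow`, `TaskRegistry` and `contract_clause_extraction` are illustrative names, not Harvey’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Workflow:
    """One narrowly scoped task the agent is allowed to perform."""
    name: str
    reviewer_role: str          # who must review the output (recorded, not enforced here)
    run: Callable[[str], str]   # the task implementation


@dataclass
class TaskRegistry:
    workflows: Dict[str, Workflow] = field(default_factory=dict)

    def register(self, wf: Workflow) -> None:
        self.workflows[wf.name] = wf

    def dispatch(self, task_name: str, payload: str) -> str:
        # Anything not explicitly registered is refused instead of being
        # routed to an open-ended "be a lawyer" agent.
        wf = self.workflows.get(task_name)
        if wf is None:
            raise PermissionError(f"{task_name!r} is not an approved workflow")
        return wf.run(payload)


registry = TaskRegistry()
registry.register(Workflow(
    name="contract_clause_extraction",   # hypothetical workflow name
    reviewer_role="senior_associate",
    run=lambda text: f"[draft for review] {len(text.split())} words scanned",
))
```

Breadth in this design comes from registering more workflows. No single entry point gets more open-ended.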
Citations are retrieval-grounded. After the Mata v. Avianca fiasco, every serious legal AI vendor learned the same lesson: never let the model generate a citation. Every cited case, statute, or contract clause comes from a retrieval step into a vetted legal database, and the agent’s prompt explicitly forbids citing anything not retrieved. If the retrieval finds nothing, the agent says so. Harvey, Allen & Overy’s Markets Innovation Group, and the rest of the serious legal stack are all retrieval-first by design.
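The sketch below shows one way to enforce the retrieval-first citation rule in code. It assumes the drafting step marks citations with a bracketed ID such as `[cite:case-001]` and that retrieval returns a mapping of source IDs to text. The marker format and the function name are hypothetical, since real products use their own structured formats. Any draft that cites something the retriever never returned is rejected. If retrieval comes back empty, the function returns an explicit “nothing found” answer instead of a draft.

```python
import re
from typing import Dict

CITE_PATTERN = re.compile(r"\[cite:([^\]]+)\]")  # hypothetical citation marker

NO_SOURCES_MSG = "No supporting authority was found in the vetted database."


def enforce_retrieved_citations(draft: str, retrieved: Dict[str, str]) -> str:
    """Return the draft only if every citation points at a retrieved source.

    `retrieved` maps source IDs (cases, statutes, clauses) to their text.
    """
    if not retrieved:
        # Retrieval found nothing: say so rather than letting the model improvise.
        return NO_SOURCES_MSG
    cited = set(CITE_PATTERN.findall(draft))
    unknown = cited - set(retrieved)
    if unknown:
        raise ValueError(f"draft cites sources that were never retrieved: {sorted(unknown)}")
    return draft
```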
Human signoff is mandatory and tracked. The lawyer reviews, approves, and signs the work product. Harvey logs who approved what, when, and the diff between the agent’s output and the final filed version. When a client (or the bar association) asks “did a human review this?”, the answer is in the audit log.
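Here is a sketch of the fields a signoff record like this might carry. It assumes an append-only Python list as the log store and uses the standard-library `difflib` to diff the agent’s draft against the filed version. The field names are illustrative and are not Harvey’s schema.

```python
import difflib
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SignoffRecord:
    matter_id: str
    approver: str
    approved_at: str      # ISO-8601 UTC timestamp
    draft_sha256: str     # fingerprint of the agent's output
    final_sha256: str     # fingerprint of the filed version
    diff: str             # unified diff: agent draft -> final filed text


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_signoff(log: List[SignoffRecord], matter_id: str, approver: str,
                   agent_draft: str, final_text: str) -> SignoffRecord:
    diff = "".join(difflib.unified_diff(
        agent_draft.splitlines(keepends=True),
        final_text.splitlines(keepends=True),
        fromfile="agent_draft", tofile="final_filed",
    ))
    rec = SignoffRecord(
        matter_id=matter_id,
        approver=approver,
        approved_at=datetime.now(timezone.utc).isoformat(),
        draft_sha256=_sha256(agent_draft),
        final_sha256=_sha256(final_text),
        diff=diff,
    )
    log.append(rec)  # append-only in this sketch; a real store would enforce immutability
    return rec
```

An empty `diff` shows the lawyer approved the draft unchanged. A non-empty one shows exactly what the human changed. A client or a bar association could ask about either case.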
Hebbia sits adjacent: it’s pitched at financial research but used heavily in legal due diligence. Hebbia raised $130M at a $700M valuation in 2024, and as of mid-2026 has processed over a billion pages of documents. The product’s central insight is that the user wants the citation more than the answer. Hebbia’s UX puts the source document literally next to the generated summary, with span-level highlights showing which paragraph of which document supports which claim. The agent doesn’t get to make claims; it surfaces evidence and lets the analyst connect the dots.
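Span-level grounding is straightforward to check mechanically. This minimal sketch assumes each claim carries a document ID, the character offsets of its supporting passage, and the quoted text. The verifier confirms that the quote really sits at those offsets, which is the same check a side-by-side UI lets the analyst do by eye. The data model is hypothetical, not Hebbia’s.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Evidence:
    doc_id: str
    start: int   # character offset where the supporting span begins
    end: int     # character offset where it ends (exclusive)
    quote: str   # the text the summary claims to rely on


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Tuple[Evidence, ...]


def unsupported_claims(claims: List[Claim], corpus: Dict[str, str]) -> List[Claim]:
    """Claims with no evidence, or whose quoted span doesn't match the source."""
    bad = []
    for claim in claims:
        grounded = bool(claim.evidence) and all(
            ev.doc_id in corpus and corpus[ev.doc_id][ev.start:ev.end] == ev.quote
            for ev in claim.evidence
        )
        if not grounded:
            bad.append(claim)
    return bad
```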
The lesson from both companies: in legal, retrieval is the product. The LLM is the tool that summarises retrieved content. Reversing that polarity — making the LLM the answer-generator and retrieval an optional augmentation — is the path back to fabricated citations.
Finance — Hebbia, and the deep-research workflow
Finance shares legal’s “audit-trail-or-die” constraint, with an additional twist: the work products are often the basis of multi-million-dollar deals, and the cost of an error is not just liability but missed alpha. Hebbia’s customer list — American Industrial Partners, Oak Hill Advisors, Charlesbank, Centerview Partners, plus the US Air Force on the government side — skews toward firms whose entire business model is reading documents that competitors can’t read fast enough.
The pattern Hebbia has popularised is “deep research”: a long-horizon agent that, given a research question, plans a sequence of retrievals across hundreds or thousands of documents, drafts a structured analysis, and surfaces every claim with its source citation. OpenAI later called out Hebbia’s product as the inspiration for its own Deep Research feature. Hebbia’s version automates roughly 90% of the work an analyst would otherwise do manually.
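Here is a compressed sketch of that loop. The planner is a hypothetical `plan` function that splits the question into sub-queries. Retrieval is a naive keyword scorer over an in-memory corpus keyed by (document, page). In production, planning and drafting would be LLM calls and retrieval would hit a real index. These stand-ins just make the control flow visible: every finding carries its source, and any sub-question with no hits is reported as a gap instead of being filled in.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (document name, page number) -> passage text
Corpus = Dict[Tuple[str, int], str]


@dataclass
class Finding:
    sub_question: str
    passage: str
    source: Tuple[str, int]   # (document, page) backing this finding


def plan(question: str) -> List[str]:
    # Hypothetical planner: a real system would ask an LLM to decompose the question.
    return [part.strip() for part in question.split(" and ")]


def retrieve(query: str, corpus: Corpus, k: int = 3) -> List[Tuple[Tuple[str, int], str]]:
    # Naive keyword-overlap scoring as a stand-in for a real search index.
    terms = set(query.lower().split())
    scored = []
    for src, text in corpus.items():
        score = len(terms & set(text.lower().split()))
        if score:
            scored.append((score, src, text))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(src, text) for _, src, text in scored[:k]]


def deep_research(question: str, corpus: Corpus) -> Tuple[List[Finding], List[str]]:
    findings: List[Finding] = []
    gaps: List[str] = []
    for sub_q in plan(question):
        hits = retrieve(sub_q, corpus)
        if not hits:
            gaps.append(sub_q)  # report the gap; never fill it from model memory
        for src, text in hits:
            findings.append(Finding(sub_q, text, src))
    return findings, gaps
```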
What makes Hebbia work where similar 2023-vintage “AI research” products flamed out is the obsessive grounding. Every sentence in a Hebbia output links to the document and page that supports it. The analyst’s job has shifted from “do the research” to “verify the chain of evidence.” That shift is the four-rule pattern in disguise: the human is no longer the generator, but they are still the validator, and the audit log of the validation is the deliverable.
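The idea that “the audit log of the validation is the deliverable” can be sketched as an export gate. In this hypothetical review object, the analyst must mark every claim as verified or rejected before the report can leave the system. The export then bundles the full review trail with the text. Nothing here reflects Hebbia’s actual implementation.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ClaimReview:
    claim: str
    source: str                      # document and page, e.g. "CIM, p. 9"
    verdict: Optional[str] = None    # "verified" or "rejected" once reviewed
    reviewer: Optional[str] = None
    reviewed_at: Optional[str] = None


@dataclass
class ResearchReport:
    claims: List[ClaimReview] = field(default_factory=list)

    def review(self, index: int, reviewer: str, verified: bool) -> None:
        c = self.claims[index]
        c.verdict = "verified" if verified else "rejected"
        c.reviewer = reviewer
        c.reviewed_at = datetime.now(timezone.utc).isoformat()

    def export(self) -> Dict[str, list]:
        pending = [c for c in self.claims if c.verdict is None]
        if pending:
            raise RuntimeError(f"{len(pending)} claim(s) still awaiting analyst review")
        return {
            # Only verified claims reach the deliverable...
            "body": [c.claim for c in self.claims if c.verdict == "verified"],
            # ...but the full review trail ships alongside it.
            "review_log": [asdict(c) for c in self.claims],
        }
```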
The companies that tried to ship “analyst-replacement agents” without that grounding — there were a number of them in 2024 — uniformly failed to land in serious firms. The compliance and risk teams at hedge funds and private equity firms don’t sign off on a black box that recommends investments, even a smart one.
Healthcare — Abridge, Suki, and the HIPAA constraint
Healthcare is the strictest of the three. HIPAA imposes audit-log and data-handling requirements that exceed even financial-services regulations, and the FDA’s medical-device framework hovers over anything that veers too close to diagnostic decision-making. The companies winning healthcare — Abridge, Suki, and a handful of others — have done so by aggressively scoping themselves out of the diagnostic loop and into ambient documentation.
The dominant product category is the AI scribe. The patient and clinician have a conversation; the agent listens, transcribes, structures the encounter into the SOAP-note format (subjective, objective, assessment, plan), and presents it back to the clinician for editing and signoff. The clinician’s signature is what makes the note a legal medical record.
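The note lifecycle described above fits in a small sketch. It assumes a hypothetical note object: the agent produces a draft with the four SOAP sections, the clinician can edit it, and only the clinician’s signature turns it into a final record. Once signed, the note is locked, and later changes go through an addendum. The field names are illustrative and are not Abridge’s or Suki’s data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")


@dataclass
class EncounterNote:
    subjective: str
    objective: str
    assessment: str
    plan: str
    status: str = "draft"            # stays "draft" until a clinician signs
    signed_by: Optional[str] = None
    signed_at: Optional[str] = None

    def edit(self, section: str, text: str) -> None:
        if self.status != "draft":
            raise PermissionError("signed notes are locked; use an addendum")
        if section not in SOAP_SECTIONS:
            raise ValueError(f"unknown SOAP section: {section}")
        setattr(self, section, text)

    def sign(self, clinician_id: str) -> None:
        if self.status != "draft":
            raise PermissionError("note is already signed")
        if any(not getattr(self, s).strip() for s in SOAP_SECTIONS):
            raise ValueError("all four SOAP sections must be filled before signing")
        # The signature, not the agent's output, is what makes this the record.
        self.status = "signed"
        self.signed_by = clinician_id
        self.signed_at = datetime.now(timezone.utc).isoformat()
```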
Abridge has executed this pattern at a scale that’s hard to overstate. It raised a $300M Series E in June 2025 at $5.3B (nearly doubling from $2.75B four months earlier), reached $100M ARR by May 2025, and is deployed across 150+ health systems. Suki took longer to scale but is now in 400+ health systems and claims 70% clinician adoption inside its installed base, with notes completed 72% faster than manual documentation.
The design rules in action:
- Scope is narrow. AI scribes don’t diagnose. They don’t prescribe. The core scribe doesn’t suggest billing codes either; products that offer coding support keep it in a clearly separated workflow. They turn an audio recording into a structured note that a human edits. The narrower the scope, the easier the FDA story.
- Retrieval is the patient context. Abridge and Suki pull in the patient’s chart from the EHR — Epic, Cerner, Athena, MEDITECH — so the generated note is grounded in the patient’s documented history, not the model’s training data. Hallucinated allergies are the failure mode no one wants.
- The clinician’s signoff is the legal artifact. The note isn’t real until the clinician signs it. This is non-negotiable, and it’s what keeps the products outside the FDA’s medical-device classification.
- The audit log is HIPAA-grade. Every patient encounter, every prompt, every edit and every signoff is logged and retained according to HIPAA’s documentation rules and the applicable state record-retention laws, with access controls and Business Associate Agreements with the model providers.
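One way to enforce the “retrieval is the patient context” rule for a high-risk field is sketched below. It assumes the chart’s allergy list has already been pulled from the EHR (the fetch itself is out of scope) and that the draft note’s allergies have been extracted into a list. Any allergy in the draft that the chart doesn’t document is flagged for the clinician instead of being silently kept. Matching here is plain case-insensitive string comparison. A real system would map both sides to a coded vocabulary.

```python
from typing import Iterable, List


def _normalise(term: str) -> str:
    return " ".join(term.lower().split())


def ungrounded_allergies(draft_allergies: Iterable[str],
                         chart_allergies: Iterable[str]) -> List[str]:
    """Allergies the draft mentions that the EHR chart does not document."""
    charted = {_normalise(a) for a in chart_allergies}
    return [a for a in draft_allergies if _normalise(a) not in charted]


def review_flags(draft_allergies: List[str], chart_allergies: List[str]) -> List[str]:
    # Each flag is shown to the clinician before signoff; the agent never
    # resolves the discrepancy on its own.
    return [
        f"'{a}' is not in the charted allergy list: confirm with patient or remove"
        for a in ungrounded_allergies(draft_allergies, chart_allergies)
    ]
```

The sketch flags instead of deleting because a patient may genuinely report a new allergy during the visit. Deciding which case applies belongs to the clinician.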
The asymmetric lesson: the AI scribe market is “boring” exactly because it’s narrow. Founders who wanted to build “AI doctors” pivoted to AI scribes or shut down. The companies that stayed narrow grew into multi-billion-dollar businesses.
Enterprise crossover — Glean Agents
A useful adjacent example is Glean Agents, which launched in February 2025 and became generally available in May 2025. Glean isn’t a pure-play regulated-industry vendor (it’s enterprise search), but its customer base includes plenty of regulated-industry firms, and the agent platform’s design choices mirror the four-rule pattern.
Glean Agents are scoped tasks (“update the deal record from this email,” “draft the policy summary from this document”). Every agent action retrieves from the customer’s existing enterprise knowledge graph, not the open web. Permissions are inherited from the underlying systems (an agent acting on a Salesforce record sees only what the user can see). Every action is logged. Glean reportedly hit $200M ARR by December 2025, doubling in nine months.
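Permission inheritance can be sketched as follows. The assumption is that each indexed document carries the access-control list mirrored from its source system at index time. Retrieval filters by the acting user’s groups before anything reaches the model, so the agent never sees more than the user could. These structures are hypothetical and are not Glean’s API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]   # ACL mirrored from the source system


def retrieve_for_user(query: str, index: List[IndexedDoc],
                      user_groups: Set[str]) -> List[IndexedDoc]:
    # 1. Permission filter first: the agent inherits the user's access, nothing more.
    visible = [d for d in index if d.allowed_groups & user_groups]
    # 2. Then relevance (naive substring match as a stand-in for real ranking).
    q = query.lower()
    return [d for d in visible if q in d.text.lower()]
```

The ordering is the design point. Filtering after ranking would still let restricted documents shape what the model sees.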
The pattern travels.
Anti-patterns I keep seeing
A few patterns I’d skip if I were starting an agent company aimed at any of these industries:
Pitching “AI replacement” instead of “AI assist.” The buyer is the lawyer, the analyst, the clinician. They will reject a product framed as replacing them. They will adopt a product framed as making them faster. Harvey, Hebbia, and Abridge all sell as augmentation. The 2023-era pitch decks that promised “autonomous lawyers” or “AI analysts” without human oversight are a tour of dead companies.
Treating compliance as an afterthought. SOC 2, HIPAA Business Associate Agreements, SOX-relevant audit logs — these are not features to add later. They are gating requirements for a sales conversation. The vendors who shipped them at v1 (Harvey had them in 2023, Abridge has had them since launch) are the ones with customers.
Storing prompts and completions casually. A user pastes a contract or a patient record into the prompt. That prompt is now a regulated record. Most generic LLM observability tools default to indefinite retention, which is exactly wrong for these industries. Roll your own retention policy, or pick an observability vendor that lets you set field-level redaction and short retention by default.
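Here is a minimal sketch of the “roll your own retention policy” option, assuming prompts and completions are logged as plain dicts with a timezone-aware `logged_at` timestamp. Fields that may carry client or patient data are blanked before the record is stored, while operational metadata is kept. A purge step then drops anything older than a short retention window. The field list and the 30-day window are illustrative assumptions, not regulatory guidance.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Fields that may carry client or patient data (illustrative; adjust per deployment).
SENSITIVE_FIELDS = {"prompt", "completion", "attachments"}
RETENTION = timedelta(days=30)   # hypothetical short default


def redact(record: Dict[str, object]) -> Dict[str, object]:
    """Copy a log record, blanking sensitive fields but keeping metadata."""
    return {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v) for k, v in record.items()}


def purge_expired(log: List[Dict[str, object]],
                  now: Optional[datetime] = None) -> List[Dict[str, object]]:
    """Drop records older than the retention window (expects tz-aware `logged_at`)."""
    now = now or datetime.now(timezone.utc)
    return [r for r in log if now - r["logged_at"] < RETENTION]
```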
Letting the model write the citation. If one rule matters more than the others, it’s this one. Models can paraphrase, summarise, rank, extract. They should never generate a citation that wasn’t retrieved from a vetted source. Every legal-AI postmortem I’ve read traces back to this mistake.
What to take away
- The high-stakes pattern is four rules, not one model. Narrow scope, retrieval grounding, mandatory human checkpoint, audit log by default. Skip any one and you don’t have a product; you have a liability.
- Retrieval is the product. The LLM is the tool. This polarity is the single biggest difference between agents that ship into regulated industries and agents that don’t.
- Scoped tasks scale; “AI replacements” don’t. Harvey, Hebbia, Abridge, and Suki all run on a library of narrow workflows. The breadth comes from the count of workflows, not from any one of them being autonomous.
- The audit log is the product no one demos but everyone buys. Compliance, security, and legal teams are the gatekeepers of these industries. The audit log is what gets them to “yes.”
There’s a reason the regulated-industry agent companies are among the fastest-growing in the entire AI ecosystem in 2026. It isn’t that their models are better. It’s that their systems around the models are built to a higher standard than the rest of the field has bothered to meet. Anyone trying to ship agents into legal, finance, or healthcare should copy the pattern. The model is the easiest part.
Further reading: Harvey’s customer page is the cleanest published list of which AmLaw firms have moved into production. Abridge’s Series E recap covers the deployment numbers across health systems. The Mata v. Avianca sanction remains the single best teaching example of why legal-AI citation discipline is non-negotiable.