Your AI agent isn't HIPAA-compliant just because the model is good

Thien Nguyen · Jun 30, 2026

In 2026 everyone has shipped an AI agent. Far fewer have shipped one they could defend in an audit. Surveys keep finding the same gap: most security leaders are worried about AI-agent risk, and only a handful have actually put mature controls around it. Teams are deploying agents faster than they can govern them, and in healthcare, finance, or anywhere regulated, that's how a great demo becomes a reportable breach.

Here's the category error underneath it: a capable model is not a compliant system. You can point the best model in the world at protected health information (PHI) and still be wildly non-compliant. Compliance isn't a property of the model; it's a property of the architecture around it. (We've argued before that a good model doesn't make a tool HIPAA-compliant; with agents, the gap gets wider.)

An agent's compliance surface is bigger than a chatbot's

A chatbot reads and replies. An agent does things: it calls tools, queries databases, writes records, sends messages, and remembers across turns. Every one of those is a new place regulated data can leak or an unlogged action can happen:

  • Tool calls reach into systems that hold PHI, and each tool is a new data path that needs least-privilege scoping and, where a third-party vendor sits behind it, a BAA with that vendor.
  • Autonomous actions can change real state (book, cancel, message a patient). Anything affecting care can't be a black-box decision.
  • Memory and logs quietly persist PHI, often in places nobody put under a Business Associate Agreement.
  • Data egress to a hosted model provider is a transfer of PHI to a third party. No BAA with that provider, no compliance. Full stop.

The model is maybe 10% of the risk. The other 90% is everything the agent is wired to touch.
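
To make that concrete, here is a minimal sketch of a gateway that sits between the agent and its tools and checks the points above before any call goes through. Everything in it is hypothetical and for illustration only: the names `ToolSpec`, `ToolGateway`, and `ToolDenied`, the scope strings, and the `vendor_has_baa` flag (which assumes someone maintains that fact from your actual contracts). It is not a substitute for access control in the underlying systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ToolDenied(Exception):
    """Raised when a tool call fails a governance check."""


@dataclass(frozen=True)
class ToolSpec:
    name: str
    handler: Callable[..., object]
    scopes: frozenset            # data scopes the tool needs, e.g. {"appointments:read"}
    vendor_has_baa: bool         # assumption: kept in sync with signed agreements
    changes_state: bool = False  # booking, cancelling, messaging a patient, ...


@dataclass
class ToolGateway:
    granted_scopes: frozenset    # what this particular agent is allowed to touch
    tools: dict = field(default_factory=dict)

    def register(self, spec: ToolSpec) -> None:
        self.tools[spec.name] = spec

    def call(self, name: str, *, approved_by: Optional[str] = None, **kwargs):
        spec = self.tools.get(name)
        if spec is None:
            raise ToolDenied(f"unknown tool: {name}")
        # No BAA behind the data path means the call never happens.
        if not spec.vendor_has_baa:
            raise ToolDenied(f"{name}: no BAA on file for this data path")
        # Least privilege: the agent must hold every scope the tool requires.
        missing = spec.scopes - self.granted_scopes
        if missing:
            raise ToolDenied(f"{name}: missing scopes {sorted(missing)}")
        # Actions that change real state need a named human approver.
        if spec.changes_state and approved_by is None:
            raise ToolDenied(f"{name}: state-changing action needs human approval")
        return spec.handler(**kwargs)
```

The design choice worth noting is that the checks live outside the model. The agent can ask for anything; the gateway decides what actually runs, and every denial is an explicit, loggable event.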

The governance checklist for agents on regulated data

If an agent goes near PHI, these aren't nice-to-haves; they're the difference between "audit-ready" and "liability":

  1. BAA chain, including the model provider. Every service that processes PHI on your behalf (cloud, database, and the LLM API) needs a signed Business Associate Agreement before a single token flows. A consumer LLM endpoint with no BAA is an instant fail.
  2. Minimize and mask PHI before the model sees it. Strip or tokenize identifiers at the boundary. The less PHI reaches the model, the smaller your breach blast radius.
  3. Human-in-the-loop on anything affecting care. Measured accuracy plus a human sign-off, not autonomous decisions on treatment, eligibility, or anything clinical.
  4. Tamper-evident audit logging of every action and tool call. Capture who, what, when, and why, and retain it per HIPAA's six-year expectation. "What did the agent do at 2am?" must have an answer.
  5. Least-privilege tools. Scope each tool to the minimum data and actions it needs. An agent that can read every record will eventually read the wrong one.
  6. No training on PHI. Confirm contractually that your data isn't used to train the provider's models.
  7. Measured, reported accuracy. Evaluation is part of the build, not a launch-day afterthought, and in regulated settings you have to be able to show it.
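
One common way to get the tamper-evident property in item 4 is a hash chain, where each log entry includes the hash of the entry before it, so editing or deleting an earlier record breaks every hash after it. The sketch below is an illustration under assumptions, not a production logger: `AuditLog` and its field names are hypothetical, entries live in memory, and a real deployment would also need append-only storage and a copy of the latest hash kept somewhere the agent cannot write.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, actor: str, action: str, target: str, reason: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),  # when
            "actor": actor,      # who (agent ID or human approver)
            "action": action,    # what (tool name, operation)
            "target": target,    # a record ID, not the PHI itself
            "reason": reason,    # why (task or request reference)
            "prev_hash": prev,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute the chain from the start; any edit or deletion breaks it.
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Calling `verify()` on a schedule, and comparing the latest hash against the externally stored copy, is what turns "we have logs" into "we can show the logs were not altered".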

"But it's just RAG / it's read-only"

Doesn't matter. Read-only still means PHI egresses to wherever you embed and store it. RAG still puts patient data in a vector store and a prompt. The questions an auditor asks (where did the data go, who could see it, what's logged, who signed a BAA) don't care whether your agent writes anything. They care where the data went.
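
For the RAG case, the "minimize and mask" step can be sketched as a filter that runs before text is embedded or placed in a prompt. This is a deliberately small illustration: `mask_phi` and the four regexes are hypothetical, they cover only a few US-style identifier formats, and they will not catch names, dates, or addresses. HIPAA's Safe Harbor method lists 18 identifier types, so treat this as the shape of the boundary rather than a de-identification tool.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real coverage needs far more than this.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}


def mask_phi(text: str, key: bytes):
    """Replace matched identifiers with keyed tokens.

    Returns the masked text plus a token-to-original map. The map is
    itself PHI and must stay inside your own BAA-covered boundary.
    """
    vault = {}

    def make_sub(label):
        def sub(match):
            original = match.group(0)
            # Keyed hash: the same identifier always maps to the same token,
            # so retrieval still groups a patient's records together.
            tag = hmac.new(key, original.encode(), hashlib.sha256).hexdigest()[:10]
            token = f"[{label}_{tag}]"
            vault[token] = original
            return token
        return sub

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(label), text)
    return text, vault
```

Only the masked text would go to the embedding model and the vector store. Note that a patient name in the same sentence passes through untouched here, which is exactly why regexes alone are not enough and why the BAA and logging questions still apply to the store.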

The takeaway

The winners in regulated AI aren't the teams with the flashiest agent. They're the teams whose agent can pass the audit, because the governance was designed in, not bolted on after the demo got applause. If your agent touches PHI (or card data, under PCI), build the framework first and let the model be the easy part.

That's how we build production AI for regulated industries: compliance by design, with the agent governance auditors actually ask for. If that's the bar your product has to clear, here's how we work.

If you're running agents on regulated data, what's your hardest governance problem right now? Logging, BAAs, or keeping humans in the loop without killing the UX?