Who's Responsible When AI Goes Wrong | An Accountability Framework

Insights from Notch Team
May 26, 2026

"The AI did it" is not a defense in any insurance regulatory regime that exists today. State Departments of Insurance, the NAIC model laws, GDPR consumer rights, and FCA conduct rules all anchor accountability on identifiable parties - the carrier, the licensed agent, the regulated entity. None of them recognize the model as a responsible party. None will. That is the legal baseline every accountability framework starts from.

What changes with AI agents is not whether accountability exists. It is how the responsibility chain is structured, what each link is expected to deliver, and what the audit trail must prove when a failure surfaces. This piece walks the four-party chain operating behind every Notch agent deployment, the scenarios that test each party's responsibility, and what carriers should require in writing before signing a vendor.

Why accountability matters more, not less, with AI

The intuition that AI somehow diffuses responsibility runs opposite to how regulated environments actually work. Volume increases scrutiny. A workflow that processes 100 claims a day generates limited exposure from any single mistake. The same workflow at 10,000 claims a day with an agent in the loop generates correspondingly more exposure, because a systematic error repeats across every file it touches, while the regulatory expectations on each claim stay the same.

Regulators do not adjust their standards downward because the carrier deployed AI. They adjust upward. The premise behind every conversation with state DOIs on AI in claims handling is consistent: if you automated it, you also automated your obligation to control it. Accountability tightens.

This is why the responsibility chain matters more, not less, the moment agents enter a workflow. Each link has to be named, the contractual obligations have to be explicit, and the audit trail has to be dense enough to settle a dispute after the fact.

The four-party accountability chain

Every Notch agent deployment carries accountability across four parties. The chain is sequential. Each link is responsible for a different layer of the failure surface, and each is held to a different standard.

The vendor

The vendor is responsible for the architecture - the guardrail layers, the model selection and swap mechanics, the audit logging system, the recovery paths, and the boundary between what the agent can and cannot do regardless of how it is configured. Architectural failures are vendor-side. A prompt injection that bypassed Layer B is a vendor issue. A logging gap that obscures what an agent did is a vendor issue. A model that produces consistent hallucinations on a documented pattern is a vendor issue.

What the vendor is not responsible for: tenant-specific configuration that turns off a guardrail, business rule definitions the carrier sets, and the workflows the carrier scopes to the agent. Those move down the chain.

The carrier

The carrier is responsible for configuration and scope: which workflows the agent runs, what business rules govern those workflows, what the guardrail thresholds are set to, and what falls inside the autonomous boundary versus what escalates. The carrier owns the policy interpretation that the agent executes; the agent does not interpret coverage independently.

Operationally this means the carrier has named owners for the deployed agent's behavior. Not "the AI is responsible for X." A specific ops leader is responsible for X, the agent runs it under their authority, and the deviation from the rules they wrote is their problem to address first.

The human in the loop

For any workflow with a defined escalation path, the human receiving the escalation is responsible for the decision made on that file. The agent prepares the file. The human decides. That responsibility does not transfer because the agent did the prep work. If anything, it sharpens, because the file the adjuster receives is structured, the questions are pre-asked, and the relevant policy form is attached. There are fewer excuses for missing context.

Where the agent did not escalate but should have, responsibility moves up the chain to configuration (carrier) or architecture (vendor) depending on whether the escalation rule existed and failed, or did not exist.

The audit trail

The fourth party is not a party in the legal sense. It is the artifact that determines how the other three are held accountable. Every action, every blocked action, every escalation, every decision path that informed the answer - all of it lands in a log dense enough to reconstruct what happened.

The audit trail is also the protective infrastructure. A carrier that can produce a log showing the agent operated inside its configured boundaries, the human in the loop reviewed the escalation as required, and the vendor's architecture functioned as specified has a defensible position. A carrier that cannot does not.

What the audit trail must prove

For accountability to function, the audit log has to answer five questions on demand:

  • What action did the agent take? With timestamp, parameters, and downstream system response.
  • What action did the agent attempt and not take? With the layer that blocked it and the reason.
  • What information did the agent retrieve, and did the validation layer approve it? The provenance of the answer matters as much as the answer.
  • What escalations were generated, to whom, with what context? The handoff record is as load-bearing as the agent action record.
  • What configuration was in effect at the time? Business rules, guardrail thresholds, model version, tenant policy. Configuration changes after the fact cannot rewrite what happened.

A log that answers all five settles disputes. A log that answers only the first creates them.
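To make the five questions concrete, here is a minimal sketch of a log record and a completeness check. Every name in it (`AuditRecord`, `missing_answers`, and the individual fields) is a hypothetical illustration, not a description of any vendor's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One audit-log entry. All field names are hypothetical."""
    timestamp: datetime
    action: str                               # Q1/Q2: action taken or attempted
    parameters: dict
    outcome: str                              # "executed", "blocked", or "escalated"
    system_response: Optional[str] = None     # Q1: downstream system response
    blocked_by: Optional[str] = None          # Q2: layer that blocked the action
    block_reason: Optional[str] = None        # Q2: reason for the block
    sources: tuple = ()                       # Q3: what the agent retrieved
    validation_passed: Optional[bool] = None  # Q3: validation layer verdict
    escalated_to: Optional[str] = None        # Q4: who received the handoff
    escalation_context: Optional[str] = None  # Q4: what they were given
    config_version: str = ""                  # Q5: configuration in effect
    model_version: str = ""                   # Q5: model in effect


def missing_answers(record: AuditRecord) -> list[str]:
    """Return the questions this record cannot answer."""
    gaps = []
    if record.outcome == "executed" and record.system_response is None:
        gaps.append("action taken: no downstream response")
    if record.outcome == "blocked" and not (record.blocked_by and record.block_reason):
        gaps.append("action blocked: no layer or reason")
    if record.sources and record.validation_passed is None:
        gaps.append("retrieval: no validation result")
    if record.outcome == "escalated" and not (
        record.escalated_to and record.escalation_context
    ):
        gaps.append("escalation: no recipient or context")
    if not (record.config_version and record.model_version):
        gaps.append("configuration: no config or model version")
    return gaps
```

A check like this could run at write time, so that an incomplete record is rejected when it is created rather than discovered during a dispute.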

Contractual structure that holds the chain together

Accountability without contractual structure is a story carriers tell their regulator. The contract is what makes the story enforceable. The structures that matter:

  • Indemnification scoped to the vendor's architectural surface. The vendor indemnifies for failures of the architecture, not for tenant-specific configuration choices. The line has to be explicit.
  • SLAs on the audit log. Availability, completeness, retention period, regulatory subpoena response time. These are not nice-to-haves. They are the operational guarantee behind the accountability story.
  • Model swap rights. The carrier retains the right to require a different underlying model if the deployed model demonstrates a pattern the carrier finds unacceptable. Vendor lock-in to a single model is a liability transfer the carrier should not accept.
  • Configuration change logs. Both vendor-side and carrier-side. Anyone who touched a setting is in the log. Disputes resolve faster when configuration history is available.
  • Right to audit. The carrier reserves the right to inspect the production audit log on demand, not on the vendor's schedule. The right is not theoretical; it is exercised.
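The configuration change log clause is easiest to evaluate against the query it has to support: given a setting and a moment in time, who set the value that was in effect? The sketch below assumes an append-only list of change entries. `ConfigChange`, `setting_as_of`, and the field names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConfigChange:
    """One append-only change-log entry. Field names are hypothetical."""
    changed_at: datetime
    changed_by: str
    side: str            # "vendor" or "carrier"
    setting: str
    new_value: object


def setting_as_of(log: list[ConfigChange], setting: str, when: datetime):
    """Return the change that set `setting` to the value in effect at `when`,
    or None if the setting had not been set by then."""
    history = sorted(
        (c for c in log if c.setting == setting),
        key=lambda c: c.changed_at,
    )
    times = [c.changed_at for c in history]
    # bisect_right counts the changes made at or before `when`.
    idx = bisect_right(times, when)
    return history[idx - 1] if idx else None
```

Because the lookup only reads entries dated at or before the incident, a later edit to the setting cannot change the answer, provided the log itself is append-only.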

Regulatory framing

The accountability conversation does not happen in a regulatory vacuum. Across the regimes that govern US and EU insurance operations, several frameworks are converging on the same expectation: the regulated entity remains accountable for the conduct of any AI deployed in its name.

  • NAIC Model Bulletin on AI: regulated insurers are expected to maintain governance frameworks specifically addressing AI use, including documentation of model risk management, third-party vendor oversight, and consumer outcome testing.
  • State DOI guidance: several states have issued explicit guidance treating AI-driven claims decisions as the carrier's decision for regulatory purposes. The vendor is not a shield.
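  • EU AI Act: AI systems used for risk assessment and pricing in life and health insurance are classified as high-risk, with specific obligations on data quality, transparency, human oversight, and post-market monitoring. Claims handling is not listed by name, so carriers should assess each claims use case against the Act rather than assume its status.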
  • GDPR Article 22: data subjects retain the right not to be subject to a decision based solely on automated processing, including profiling, where that decision produces legal effects or similarly significantly affects them. Claims handling workflows have to preserve a meaningful human review path.

None of these frameworks blames the model. All of them require the regulated entity to demonstrate control over the model. That is the accountability chain made statutory.

Three scenarios and their named owners

Scenario: Hallucinated coverage statement

An agent tells a policyholder that a peril is covered when the policy form excludes it. The statement is recorded in the audit log; the customer relies on it; the claim is subsequently denied.

Owner: vendor first, then carrier. Vendor side: was Layer A knowledge validation in place, and did it fire? If validation should have caught it, the architectural failure is the vendor's. Carrier side: was the policy form ingested correctly, and was the agent scoped to make coverage statements at all? If the carrier scoped the agent for coverage interpretation despite the operational guidance against it, that is configuration responsibility.

Scenario: Wrong-action payment release

The agent approves a payment that exceeds the per-transaction cap. The payment lands in the wrong account.

Owner: vendor primarily, with a configuration question. Layer D should have blocked the action. If it did not, the deterministic layer failed and the vendor is on the hook. If Layer D was configured with a cap higher than the carrier's policy because someone changed the setting, the configuration change log answers who and why.
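As an illustration of what "deterministic" means here, the cap check below is a plain comparison with no model call, and its result records the cap that was applied. This is a sketch under assumptions, not the implementation of the Layer D described above. `check_payment` and `PaymentDecision` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentDecision:
    allowed: bool
    reason: str
    cap_in_effect: Decimal   # logged so the applied cap is on record


def check_payment(amount: Decimal, per_transaction_cap: Decimal) -> PaymentDecision:
    """Deterministic guard: the same inputs always give the same decision."""
    if amount <= 0:
        return PaymentDecision(False, "non-positive amount", per_transaction_cap)
    if amount > per_transaction_cap:
        return PaymentDecision(False, "exceeds per-transaction cap", per_transaction_cap)
    return PaymentDecision(True, "within cap", per_transaction_cap)
```

If the logged `cap_in_effect` matches the carrier's policy and the payment still went out, the guard failed. If the logged cap is higher than the policy, the question moves to the configuration change log.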

Scenario: Missed escalation on a complex claim

A complex claim with coverage ambiguity is processed by the agent and closed without adjuster review. The customer disputes; the adjuster sees the file for the first time after the complaint.

Owner: carrier primarily. Was the escalation rule in place? Did it specify the patterns that should have flagged this file? If the rule was missing or under-specified, the configuration responsibility is the carrier's. If the rule existed and fired and the human in the loop did not pick it up, responsibility is on the human in the loop. If the rule existed and the agent did not fire it despite the trigger being met, the vendor is on the hook.
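The three branches in this scenario reduce to two facts the audit log should be able to supply: whether a configured rule covered the file, and whether an escalation was actually generated. A minimal sketch of that first-pass triage follows. The function and enum names are hypothetical, and the result is a starting point for review, not a legal determination.

```python
from enum import Enum


class Owner(Enum):
    CARRIER = "carrier"
    VENDOR = "vendor"
    HUMAN_IN_THE_LOOP = "human in the loop"


def missed_escalation_owner(rule_covered_file: bool, escalation_fired: bool) -> Owner:
    """First-pass owner for a claim that closed without adjuster review."""
    if not rule_covered_file:
        # Rule missing or under-specified: configuration responsibility.
        return Owner.CARRIER
    if not escalation_fired:
        # Rule existed and its trigger was met, but the agent did not fire it.
        return Owner.VENDOR
    # Rule fired and the escalation was generated, but nobody picked it up.
    return Owner.HUMAN_IN_THE_LOOP
```

Both inputs are meant to come from log queries rather than recollection, which is why the escalation record and the configuration snapshot both need to be in the audit trail.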

What buyers should require in writing

Before signing a vendor contract, the carrier's contracting and compliance teams should require:

  • A named architectural failure surface, with the failures the vendor takes responsibility for explicitly enumerated.
  • Audit log SLAs covering availability, completeness, retention, and subpoena response.
  • Configuration change logging on both sides of the contract.
  • Right-to-audit clauses exercisable without advance notice.
  • Model-swap rights retained by the carrier.
  • Indemnification scope aligned with the architectural surface, not with the carrier's configuration choices.
  • Regulatory cooperation clauses committing the vendor to support carrier responses to DOI inquiries and audits.

None of these clauses are unusual. Carriers that do not require them treat AI vendor risk informally; carriers that do require them treat the vendor as a regulated third party. The latter is the right posture.

FAQ

What happens when accountability is unclear?

It is never unclear if the audit trail is dense and the contract is explicit. Disputes that drag are nearly always traceable to a missing log entry or an under-specified clause. Build the architecture so that the answer to "what happened" is one query away.

Doesn't the vendor try to push everything onto the carrier?

Carriers should expect a vendor that pushes all responsibility downstream to be a vendor that loses the diligence cycle. The architectural surface is real and the vendor owns it. A vendor unwilling to indemnify failures inside that surface is selling a product they do not stand behind.

What about new failure modes that emerge after deployment?

Both parties have an obligation to update. The vendor patches the architecture; the carrier updates the configuration. The contract should anticipate this with a change-management clause and a shared incident process.

How does this differ from a traditional SaaS vendor?

Traditional SaaS vendors deliver tools that humans operate. AI vendors deliver workflow execution. The latter requires accountability structure the former does not. Treating an AI vendor like an ordinary SaaS vendor is a category error that surfaces under regulatory scrutiny.

What is the most common gap in carrier contracts today?

Missing model-swap rights and under-specified audit log SLAs. Both are easy to fix at contracting time, expensive to fix afterward.

If you are negotiating an AI vendor contract for regulated workflows, book a demo. We will share our indemnification and audit-log SLA language as a starting point.