Who's Responsible When AI Goes Wrong in Insurance

"The AI did it" is not a defense in any insurance regulatory regime that exists today. State Departments of Insurance, the NAIC model laws, GDPR consumer rights, EIOPA insurance-specific guidance, and FCA conduct rules all anchor accountability on identifiable parties - the carrier, the licensed agent, the regulated entity. None of them recognize the model as a responsible party. None will. That is the legal baseline every accountability framework starts from.

What changes with AI agents is not whether accountability exists. It is how the responsibility chain is structured, what each link is expected to deliver, and what the audit trail must prove when a failure surfaces. This piece walks through the four-party chain operating behind every Notch agent deployment, the scenarios that test each party's responsibility, and what carriers should require in writing before signing with a vendor - along with the specific regulatory provisions and the August 2, 2026 EU AI Act enforcement date that make this conversation immediate, not theoretical.

Why accountability matters more, not less, with AI

The intuition that AI somehow diffuses responsibility runs in the opposite direction of how regulated environments actually work. Volume increases scrutiny. A workflow that processes 100 claims a day generates limited exposure on any single mistake. The same workflow at 10,000 claims a day with an agent in the loop generates correspondingly more exposure - and the same regulatory expectations.

Regulators do not adjust their standards downward because the carrier deployed AI. They adjust upward. The premise behind every conversation with state DOIs on AI in claims handling is consistent: if you automated it, you also automated your obligation to control it. Accountability tightens. EIOPA's August 2025 Opinion makes this explicit on the EU side: outsourcing AI does not outsource accountability, and AI governance has to be embedded in the insurer's Own Risk and Solvency Assessment (ORSA) with named senior leadership owning the risk.

This is why the responsibility chain matters more, not less, the moment agents enter a workflow. Each link has to be named, the contractual obligations have to be explicit, and the audit trail has to be dense enough to settle a dispute after the fact - on a 15-day clock if it rises to a serious incident under EU AI Act Article 73.

The four-party accountability chain

Every Notch agent deployment carries accountability across four parties. The chain is sequential. Each link is responsible for a different layer of the failure surface, and each is held to a different standard.

The vendor

The vendor is responsible for the architecture - the guardrail layers, the model selection and swap mechanics, the audit logging system, the recovery paths, and the boundary between what the agent can and cannot do regardless of how it is configured. Architectural failures are vendor-side. A prompt injection that bypassed Layer B is a vendor issue. A logging gap that obscures what an agent did is a vendor issue. A model that produces consistent hallucinations on a documented pattern is a vendor issue.

What the vendor is not responsible for: tenant-specific configuration that turns off a guardrail, business rule definitions the carrier sets, and the workflows the carrier scopes to the agent. Those move down the chain.

The carrier

The carrier is responsible for configuration and scope. Which workflows the agent runs, what business rules govern those workflows, what the guardrail thresholds are set to, what falls inside the autonomous boundary and what escalates. The carrier owns the policy interpretation that the agent executes; the agent does not interpret coverage independently.

Operationally this means the carrier has named owners for the deployed agent's behavior. Not "the AI is responsible for X." A specific ops leader is responsible for X, the agent runs it under their authority, and any deviation from the rules they wrote is their problem to address first. EU AI Act Article 17 requires an accountability framework with assigned responsibilities as part of the provider's quality management system, and Article 26 requires deployers to assign human oversight to people with the competence and authority to exercise it; NAIC Exhibit B treats the same question as the test of governance program effectiveness.

The human in the loop

For any workflow with a defined escalation path, the human receiving the escalation is responsible for the decision made on that file. The agent prepares the file. The human decides. That responsibility does not transfer because the agent did the prep work. If anything, it sharpens, because the file the adjuster receives is structured, the questions are pre-asked, and the relevant policy form is attached. There are fewer excuses for missing context.

Where the agent did not escalate but should have, responsibility moves up the chain to configuration (carrier) or architecture (vendor) depending on whether the escalation rule existed and failed, or did not exist. GDPR Article 22 makes this point explicit: a solely automated decision with legal or similarly significant effects requires a meaningful human review path - the existence of that path is the carrier's responsibility, the use of that path is the human's, and the architecture that supports the path is the vendor's.

The audit trail

The fourth party is not a party in the legal sense. It is the artifact that determines how the other three are held accountable. Every action, every blocked action, every escalation, every decision path that informed the answer - all of it lands in a log dense enough to reconstruct what happened.

The audit trail is also the protective infrastructure. A carrier that can produce a log showing the agent operated inside its configured boundaries, the human in the loop reviewed the escalation as required, and the vendor's architecture functioned as specified has a defensible position. A carrier that cannot does not. EU AI Act Article 12 requires automatic record-keeping for high-risk AI systems specifically so this artifact exists by default.

What the audit trail must prove

For accountability to function, the audit log has to answer five questions on demand:

What action did the agent take? With timestamp, parameters, and downstream system response.
What action did the agent attempt and not take? With the layer that blocked it and the reason.
What information did the agent retrieve, and did the validation layer approve it? The provenance of the answer matters as much as the answer.
What escalations were generated, to whom, with what context? The handoff record is as load-bearing as the agent action record.
What configuration was in effect at the time? Business rules, guardrail thresholds, model version, tenant policy. Configuration changes after the fact cannot rewrite what happened.

A log that answers all five settles disputes. A log that answers only the first creates them. The traditional approach assembles this dossier in roughly six weeks of manual work across siloed systems; an architecture built around EU AI Act Article 12 produces it in under a minute on demand. The difference is not a productivity improvement - it is whether the audit trail exists as a product of the system or as a forensic reconstruction after the fact.
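
As a rough illustration of the five questions above, the sketch below models one log entry and a check that reports which questions the entry cannot answer. Every field and function name here (`AuditRecord`, `missing_answers`, `config_version`, and so on) is a hypothetical chosen for this example, not the schema of any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """One agent event. All field names are hypothetical."""
    timestamp: datetime
    action: str                                  # Q1: what the agent did or attempted
    parameters: dict
    outcome: str                                 # "executed" or "blocked"
    downstream_response: Optional[str] = None    # Q1: downstream system response
    blocked_by: Optional[str] = None             # Q2: layer that blocked the action
    block_reason: Optional[str] = None           # Q2: why it was blocked
    sources: tuple = ()                          # Q3: what the agent retrieved
    validation_passed: Optional[bool] = None     # Q3: validation layer verdict
    escalated_to: Optional[str] = None           # Q4: who received the handoff
    escalation_context: Optional[str] = None     # Q4: what they were given
    config_version: Optional[str] = None         # Q5: configuration in effect


def missing_answers(rec: AuditRecord) -> list[str]:
    """Return the questions this record cannot answer."""
    gaps = []
    if rec.outcome == "executed" and rec.downstream_response is None:
        gaps.append("Q1: no downstream response")
    if rec.outcome == "blocked" and not (rec.blocked_by and rec.block_reason):
        gaps.append("Q2: blocked without layer and reason")
    if rec.sources and rec.validation_passed is None:
        gaps.append("Q3: retrieval without validation verdict")
    if rec.escalated_to and not rec.escalation_context:
        gaps.append("Q4: escalation without context")
    if rec.config_version is None:
        gaps.append("Q5: no configuration version")
    return gaps
```

Running a check like this at write time, rather than at dispute time, is one way to keep the log a product of the system instead of a reconstruction.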

Contractual structure that holds the chain together

Accountability without contractual structure is a story carriers tell their regulator. The contract is what makes the story enforceable. The structures that matter:

Indemnification scoped to the vendor's architectural surface. The vendor indemnifies for failures of the architecture, not for tenant-specific configuration choices. The line has to be explicit.
SLAs on the audit log. Availability, completeness, retention period, regulatory subpoena response time. These are not nice-to-haves. They are the operational guarantee behind the accountability story and the practical answer to EU AI Act Article 12 record-keeping obligations.
Model swap rights. The carrier retains the right to require a different underlying model if the deployed model demonstrates a pattern the carrier finds unacceptable. Vendor lock-in to a single model is a liability transfer the carrier should not accept - and EIOPA's 2025 Opinion expects model swap capability as part of operational resilience.
Configuration change logs. Both vendor-side and carrier-side. Anyone who touched a setting is in the log. Disputes resolve faster when configuration history is available.
Right to audit. The carrier reserves the right to inspect the production audit log on demand, not on the vendor's schedule. The right is not theoretical; it is exercised.
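
To make the configuration change log item concrete, here is a minimal sketch of an append-only change history that can answer "what setting was in effect at this moment, and who set it". The class and field names (`ConfigChangeLog`, `ConfigChange`, `in_effect`) and the in-memory list are assumptions for illustration; a production log would sit on durable, tamper-evident storage.

```python
import bisect
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConfigChange:
    changed_at: datetime
    changed_by: str        # person or service account that made the change
    side: str              # "vendor" or "carrier"
    setting: str
    new_value: object


class ConfigChangeLog:
    """Append-only change history; entries are never edited or deleted."""

    def __init__(self) -> None:
        self._entries: list[ConfigChange] = []

    def append(self, change: ConfigChange) -> None:
        # Reject out-of-order writes so history cannot be back-dated.
        if self._entries and change.changed_at < self._entries[-1].changed_at:
            raise ValueError("change is older than the latest entry")
        self._entries.append(change)

    def in_effect(self, setting: str, at: datetime):
        """Return the last change to `setting` made at or before `at`, or None."""
        times = [e.changed_at for e in self._entries]
        idx = bisect.bisect_right(times, at)
        for entry in reversed(self._entries[:idx]):
            if entry.setting == setting:
                return entry
        return None
```

A lookup such as `log.in_effect("payment_cap", incident_time)` returns both the value and the person who set it, which is the pair a dispute usually turns on.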

The regulatory map: penalties, deadlines, and named provisions

The accountability conversation does not happen in a regulatory vacuum, and the timelines are no longer theoretical. The frameworks below converge on the same expectation - the regulated entity remains accountable for the conduct of any AI deployed in its name - but each comes with its own teeth.

NAIC Model Bulletin on AI: adopted by 25 US states as of April 2026, with more coming. Insurers are expected to maintain governance frameworks specifically addressing AI use, including documentation of model risk management, third-party vendor oversight, and consumer outcome testing. The NAIC's Third-Party Data and Models Task Force (formed 2024) explicitly extends accountability to vendor AI systems and is finalizing the AI Evaluation Tool through 2026.
NAIC AI Systems Evaluation Tool: 11 states are currently piloting the tool during regulatory examinations. The tool is a four-exhibit framework - Exhibit A (AI use inventory), Exhibit B (governance controls), Exhibit C (high-risk model details, including explainability and fairness metrics), and Exhibit D (data assessment and lineage). Carriers being examined under the pilot have to produce documentation against each exhibit on a structured timeline.
State DOI guidance: Colorado's Regulation 10-1-1 and SB21-169, California's Department of Insurance Bulletin 2022-5, the New York Proposed Insurance Circular Letter, and the Connecticut Data Privacy Act all treat AI-driven claims and underwriting decisions as the carrier's decision for regulatory purposes. The vendor is not a shield in any of these regimes.
EU AI Act (Regulation 2024/1689): insurance underwriting and life/health claims handling are classified as high-risk under Annex III. Full high-risk obligations apply from August 2, 2026; legacy systems have to be brought into compliance by August 2027. Article 9 (risk management), Article 10 (data governance and bias mitigation), Article 11 (technical documentation), Article 12 (automatic record-keeping), Article 13 (transparency), Article 14 (human oversight), Article 15 (accuracy and robustness), Article 17 (quality management system with assigned responsibilities), Article 26(11) (information to affected persons), and Article 27 (Fundamental Rights Impact Assessment for deployers) collectively define the architectural surface a deployed agent must satisfy. Article 73 mandates serious incident reporting to market surveillance authorities within 15 days. Penalties for non-compliance reach up to €35 million or 7% of global annual turnover, whichever is higher.
EIOPA Opinion on AI Governance (August 2025): supplements the EU AI Act with insurance-specific governance guidance. Board-level accountability is required, AI governance must be embedded in ORSA (Own Risk and Solvency Assessment), and outsourcing AI does not outsource accountability. Black-box models face heightened scrutiny and potential prohibition.
GDPR Article 22 (and Articles 13-15, 17, 35): data subjects retain the right not to be subject to a decision based solely on automated processing, plus rights of access, explanation, erasure, and a Data Protection Impact Assessment requirement for high-risk processing.
DORA (Digital Operational Resilience Act): requires strict third-party ICT risk management for financial institutions, including major ICT-related incident reporting and concentration-risk analysis on foundation model providers.
Solvency II Directive (Article 49): outsourcing requirements that explicitly preserve insurer accountability over outsourced AI activities, including the ongoing oversight of any third-party model.

None of these frameworks blames the model. All of them require the regulated entity to demonstrate control over the model. That is the accountability chain made statutory, and the August 2, 2026 enforcement date for EU high-risk obligations is the next hard deadline on the calendar.

Three scenarios and their named owners

Scenario: Hallucinated coverage statement

An agent tells a policyholder that a peril is covered when the policy form excludes it. The statement is recorded in the audit log; the customer relies on it; the claim is subsequently denied.

Owner: vendor first, then carrier. Vendor side: was Layer A knowledge validation in place, and did it fire? If validation should have caught it, the architectural failure is the vendor's, and it is reportable under EU AI Act Article 73 if the consumer was materially harmed. Carrier side: was the policy form ingested correctly (Article 10 data governance), and was the agent scoped to make coverage statements at all? If the carrier scoped the agent for coverage interpretation despite operational guidance against it, that is configuration responsibility.

Scenario: Wrong-action payment release

The agent approves a payment that exceeds the per-transaction cap. The payment lands in the wrong account.

Owner: vendor primarily, with a configuration question. Layer D should have blocked the action. If it did not, the deterministic layer failed and the vendor is on the hook. If Layer D was configured with a cap higher than the carrier's policy because someone changed the setting, the configuration change log answers who and why. Under Solvency II Article 49 the outsourcing oversight chain runs through the carrier regardless of where the failure originated.
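
The split described in this scenario can be sketched as a deterministic cap check plus a small attribution function. This is an illustration under stated assumptions, not the behavior of any specific product: `cap_allows`, `first_owner`, `configured_cap`, and `policy_cap` are hypothetical names, and the returned labels are placeholders.

```python
from decimal import Decimal


def cap_allows(amount: Decimal, configured_cap: Decimal) -> bool:
    """Deterministic check: True only if the payment is within the configured cap."""
    return amount <= configured_cap


def first_owner(amount: Decimal, configured_cap: Decimal,
                policy_cap: Decimal, payment_released: bool) -> str:
    """Map a released over-cap payment to the party that answers first."""
    if not payment_released or amount <= policy_cap:
        return "none"  # no over-cap release happened
    if not cap_allows(amount, configured_cap):
        # The configured cap should have blocked the payment and did not.
        return "vendor"
    # The check passed only because the configured cap exceeded the policy cap.
    return "configuration: consult change log"
```

The second branch is where the configuration change log matters: the function can say that the configured cap was above policy, but only the log can say who raised it and why.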

Scenario: Missed escalation on a complex claim

A complex claim with coverage ambiguity is processed by the agent and closed without adjuster review. The customer disputes; the adjuster sees the file for the first time after the complaint.

Owner: carrier primarily. Was the escalation rule in place? Did it specify the patterns that should have flagged this file? If the rule was missing or under-specified, the configuration responsibility is the carrier's. If the rule existed and fired and the human in the loop did not pick it up, responsibility is on the human in the loop - and may trigger a GDPR Article 22 inquiry on whether the meaningful human review path was actually meaningful. If the rule existed and the agent did not fire it despite the trigger being met, the vendor is on the hook.
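
The three branches in this scenario reduce to a short decision function. The sketch below is one possible encoding; the `Owner` enum, the function name, and the three boolean inputs are hypothetical, and in practice each input would be read from the audit log rather than passed in by hand.

```python
from enum import Enum
from typing import Optional


class Owner(Enum):
    CARRIER = "carrier (configuration)"
    VENDOR = "vendor (architecture)"
    HUMAN = "human in the loop"


def missed_escalation_owner(rule_covers_case: bool, rule_fired: bool,
                            human_reviewed: bool) -> Optional[Owner]:
    """Return the first party to answer for a file that closed without review."""
    if not rule_covers_case:
        return Owner.CARRIER   # rule missing or under-specified
    if not rule_fired:
        return Owner.VENDOR    # trigger was met, agent did not escalate
    if not human_reviewed:
        return Owner.HUMAN     # escalation was delivered, not picked up
    return None                # the chain worked as designed
```

The order of the checks mirrors the order in which the evidence becomes available: configuration first, then agent behavior, then the handoff record.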

The August 2026 cliff and what changes

The accountability framework above is in force today on the US side and applies in full on the EU side from August 2, 2026. After that date, the EU AI Act's Article 9 risk management, Article 11 technical documentation, Article 12 record-keeping, Article 13 transparency, Article 14 human oversight, Article 15 robustness, Article 17 quality management with assigned responsibilities, Article 26(11) information to affected persons, Article 27 Fundamental Rights Impact Assessment, and Article 73 fifteen-day incident reporting all apply to any insurance AI system serving the EU market. Penalties reach up to €35 million or 7% of global annual turnover. Legacy systems get an additional year - August 2, 2027 - before they too must meet the same bar.
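
For teams wiring these dates into an incident workflow, a minimal sketch of the clock arithmetic follows. It assumes the 15-day window is counted in calendar days from the date the organization becomes aware of the incident; that counting convention, and whether a shorter window applies to a given incident type, should be confirmed against the regulation and with counsel. All names are hypothetical.

```python
from datetime import date, timedelta

HIGH_RISK_APPLIES_FROM = date(2026, 8, 2)   # date given in the text above
REPORTING_WINDOW_DAYS = 15                  # Article 73 window given in the text above


def obligations_apply(on: date) -> bool:
    """True if `on` falls on or after the high-risk application date."""
    return on >= HIGH_RISK_APPLIES_FROM


def report_due(aware_on: date) -> date:
    """Latest reporting date, counting calendar days from awareness (assumption)."""
    return aware_on + timedelta(days=REPORTING_WINDOW_DAYS)


def days_remaining(aware_on: date, today: date) -> int:
    """Days left on the clock; a negative value means the window has passed."""
    return (report_due(aware_on) - today).days
```

For example, an incident the carrier becomes aware of on August 10, 2026 would, under these assumptions, be due by August 25, 2026.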

Carriers operating only in the US face a different calendar but the same shape. The NAIC's AI Systems Evaluation Tool pilots expand through 2026, the Third-Party Data and Models Task Force formed in 2024 is finalizing guidance, and state DOIs are increasingly treating AI-driven claims decisions as the carrier's decision for regulatory purposes. The accountability chain - vendor, carrier, human in the loop, audit trail - is the operational structure that supports compliance under either framework.

The contract clauses, the audit log SLAs, and the right-to-audit provisions named above are not preparation for a future regime. They are the operational hooks the active and incoming frameworks already expect, and the diligence work has to happen before, not during, the next examination.

What buyers should require in writing

Before signing a vendor contract, the carrier's contracting and compliance teams should require:

A named architectural failure surface, with the failures the vendor takes responsibility for explicitly enumerated.
Audit log SLAs covering availability, completeness, retention, and subpoena response - aligned to EU AI Act Article 12 record-keeping expectations and NAIC examination response windows.
Configuration change logging on both sides of the contract.
Right-to-audit clauses exercisable without advance notice.
Model-swap rights retained by the carrier, with a documented swap procedure that does not require vendor cooperation in a crisis.
Indemnification scope aligned with the architectural surface, not with the carrier's configuration choices.
Regulatory cooperation clauses committing the vendor to support carrier responses to DOI inquiries and EU market surveillance authority requests.
Article 73 incident-reporting workflow, committing the vendor to detection, notification template generation, and authority-facing support within the EU AI Act's 15-day window.
Article 27 Fundamental Rights Impact Assessment cooperation, with the vendor providing the technical documentation the deployer needs to complete the FRIA before placing the system in service.
NAIC Exhibit-ready exports, covering Exhibit A inventory, Exhibit B governance evidence, Exhibit C high-risk model documentation, and Exhibit D data lineage - exportable from the platform without separate IT and legal assembly.
DORA-aligned third-party reporting, for any carrier with EU operations or EU-facing foundation model dependencies.

None of these clauses are unusual. Carriers that do not require them treat AI vendor risk informally; carriers that do require them treat the vendor as a regulated third party. The latter is the right posture, and the one regulators expect to see when they ask how the carrier oversees its AI.
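
A contracting team could track the list above as a simple checklist and diff each draft against it. The sketch below is one way to do that; the clause identifiers are labels invented for this example, and deciding whether a draft actually contains a clause remains a legal review task, not something the code determines.

```python
# Hypothetical identifiers for the eleven requirements listed above.
REQUIRED_CLAUSES = {
    "architectural_failure_surface",
    "audit_log_slas",
    "config_change_logging",
    "right_to_audit",
    "model_swap_rights",
    "indemnification_scope",
    "regulatory_cooperation",
    "article_73_workflow",
    "article_27_fria_cooperation",
    "naic_exhibit_exports",
    "dora_third_party_reporting",
}


def contract_gaps(present: set[str]) -> list[str]:
    """Return required clauses missing from a draft, in alphabetical order."""
    return sorted(REQUIRED_CLAUSES - present)
```

Calling `contract_gaps` on the set of clauses that legal review has confirmed in a draft gives the open items for the next negotiation round.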

FAQ

What happens when accountability is unclear?

It is never unclear if the audit trail is dense and the contract is explicit. Disputes that drag are nearly always traceable to a missing log entry or an under-specified clause. Build the architecture so that the answer to "what happened" is one query away.

Doesn't the vendor try to push everything onto the carrier?

Carriers should expect a vendor that pushes all responsibility downstream to be a vendor that loses the diligence cycle. The architectural surface is real and the vendor owns it. A vendor unwilling to indemnify failures inside that surface is selling a product they do not stand behind.

What about new failure modes that emerge after deployment?

Both parties have an obligation to update. The vendor patches the architecture; the carrier updates the configuration. The contract should anticipate this with a change-management clause and a shared incident process - one that lines up with EU AI Act Article 73 and DORA reporting timelines.

How does this differ from a traditional SaaS vendor?

Traditional SaaS vendors deliver tools that humans operate. AI vendors deliver workflow execution. The latter requires accountability structure the former does not. Treating an AI vendor like an ordinary SaaS vendor is a category error that surfaces under regulatory scrutiny - and the EU AI Act's deployer obligations make it a costly category error after August 2026.

What are the most common gaps in carrier contracts today?

Missing model-swap rights, under-specified audit log SLAs, and the absence of an Article 73 incident-reporting workflow committed to in writing. All three are easy to fix at contracting time, expensive to fix afterward.

If you are negotiating an AI vendor contract for regulated workflows, book a demo. We will share our indemnification scope, audit-log SLA language, and Article 73 cooperation provisions as a starting point.
