Platform

Guardrails and Escalations for AI Agents in Regulated Industries

Elool Jacoby

Co-founder and CPO at Notch

Elool Jacoby is the co-founder and CPO at Notch.cx, an autonomous AI customer support platform that uses agentic architecture to resolve customer inquiries end-to-end.

Stay ahead in support AI

Get our newest articles and field notes on autonomous support.

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

In regulated industries, “good answers” are not enough. AI agents must stay compliant, protect sensitive data, and know when to stop, hand off, or escalate. This post lays out practical guardrail patterns for insurance and finance, then goes deeper on the Notch approach: LLM-as-judge guardrails, technical defenses against misuse (like prompt injection), and deterministic business limitations that block risky actions even if the model tries to proceed.

Why guardrails and escalations for AI agents matter in regulated industries

Insurance and financial services operate inside dense rulebooks: privacy requirements, auditability expectations, strict claims and complaint handling standards, and risk controls that exist to protect consumers and institutions.

AI agents add huge leverage, but they also introduce a new kind of operational risk: a single “helpful” response can become an unauthorized disclosure, an unapproved commitment, or a policy violation. Guardrails define the boundaries. Escalations define what happens when the boundaries are reached.

What guardrails are (and what they are not)

Guardrails are enforceable constraints that shape what the agent is allowed to say and do. They are not just “nice prompting.” In regulated environments, guardrails need to be:

Specific: mapped to real policies and regulations, geo and jurisdiction limits (per state, country, regulator)
Repeatable: consistent across time and across conversations
Auditable: you can explain why the agent refused, handed off, or escalated
Action-aware: the riskiest moments are often tool actions (refunds, payouts, account changes), not just text replies
‍

Guardrails at Notch: a layered system

At Notch, we treat guardrails as a multi-layer stack, because no single technique is enough in real production environments.

Notch's 5 layers of guardrails in AI agents

Layer A: LLM-as-judge guardrails (pre-made and org-specific)

This layer uses LLMs in a “judge” role to continuously evaluate the conversation against boundaries.

Two types of judge guardrails:

Pre-made guardrail agents that detect common, cross-industry issues, for example:
- The conversation is stuck in a loop, or the customer is becoming frustrated.
- repetitive or unproductive back-and-forth
- sensitive categories like political content or other disallowed topics (based on the organization’s rules)
Organization-specific guardrail agents tailored to a tenant’s exact policy and compliance needs. Example: “Do not compare pricing against competitors in chat,” or “Never make coverage commitments, you only allowed to explain the process.”
Knowledge Validation is an agent function where the agent assesses the retrieved information to determine if it contains the answer or the needed content. If the answer is not present, the agent takes alternative actions, such as initiating a web search or escalating the query to a human agent.

Why this matters: regulated companies rarely share the same exact boundaries. Your policy language, risk tolerance, and compliance interpretations vary by organization. This layer makes those rules explicit and enforceable.

Layer B: Technical guardrails (prompt-injection and misuse defenses)

Some threats are not “policy gray areas.” They are adversarial attempts to manipulate the system: prompt injection, instruction smuggling, tool abuse, or attempts to extract internal logic.

This layer is built into the system so it does not rely on per-tenant configuration. The goal is to reduce the probability that a malicious or tricky user can jailbreak the agent into unsafe behavior.

For example, we recently blocked a real case where a user attempted to inject system prompts or run background HTML/scripts through their message, aiming to receive a refund greater than their original purchase amount

Layer C: Deterministic Access Limits (classification and eligibility)

This layer answers: “What is this user allowed to do or see, given what we know about them right now?”
It’s deterministic because it’s driven by clear system states and data points (authentication status, verification level, user data, policy role, account ownership, channel, region) - not by model judgment.

Think of it as capabilities by identity class.

Examples

Unauthenticated user can:
- Ask general questions about coverage, timelines, required documents, claim status process
- Receive links or instructions for spesific topics
- Start a “draft intake” that collects non-sensitive info (depending on policy)
Unauthenticated user cannot:
- Receive account-specific data (policy number, claim status, payout details)
- Make changes to a policy
- Get any PHI/PII beyond what’s necessary to authenticate
- Open or modify or open a claim
Authenticated but not fully verified can:
- View limited claim status (high-level milestones)
- Submit missing documents
- Ask clarifying questions about next steps
Authenticated and verified can:
- Access full account-specific information
- Proceed with claim filing flows (within policy)
- Receive detailed explanations tied to their own policy and claim

This layer is what prevents the classic regulated failure mode: the AI being “helpful” to someone who is not entitled to the information.

Layer D: Deterministic Business Limits (actions, tools, counters)

This layer answers: “Even if the user is eligible, what actions are allowed right now?”
These are deterministic rules that govern tool execution and high-risk outcomes, often scoped by tenant, policy, user segment, and rolling counters.

This is the hard stop layer: even if the model tries to take an action, the system blocks it.

Examples

Refund/payout caps:
- per transaction (max amount)
- per day/week/month (rolling counters)
- per customer segment (VIP vs standard)
Claim handling constraints:
- prohibit actions entirely without a specific data point (e.g, “approve payout”)
- require adjuster review above a threshold or for specific categories
Account actions:
- prevent changing address/bank details unless specific verification steps passed
- block cancellation unless retention workflow completed
Risk throttles:
- limit how many “goodwill credits” can be issued per agent per day
- block repeated attempts at the same action after N failures

Why both layers matter

Access limits stop unauthorized disclosure and workflow initiation.
Business limits stop unauthorized or risky execution, even for fully verified users.

LLMs are optimized to be helpful and to keep users satisfied, which means they can sometimes overreach. In regulated environments, that “helpfulness” can turn into overpromising or jump betwen steps - implying a refund will happen, suggesting a claim will be approved, confirming eligibility, or sounding more certain than policy or law allows.

Even when the model is trying to do the right thing, the risk is that it commits to outcomes it cannot legally or operationally guarantee. That’s exactly why deterministic rules are so important: they act as hard, auditable constraints on both access and actions, ensuring the system stays compliant even when the model’s natural tendency is to say “yes” to keep the customer happy.

Layer E: Deterministic Geo and Jurisdiction Limits (state, country, regulator)

This layer answers: “What is allowed in this user’s jurisdiction?”
It’s deterministic because it’s driven by location + legal entity + product + regulator rules (not model discretion). The same request can be allowed in one place and prohibited or require different disclosures, timing, or escalation in another.

What it keys off

Customer location (state, country, province) and sometimes travel context
The insured risk location (important in insurance)
The regulated entity and license (which carrier/bank entity is servicing the user)
Applicable regimes (US state DOI rules, EU GDPR, UK FCA, etc.)

Examples

US insurance, state-by-state differences
- Claims handling timelines, required notices, and consumer protections vary by state.
- A complaint or claim delay may trigger different escalation rules depending on the state DOI expectations.
- Renewal and cancellation times and terms vary by state.
EU/EEA privacy (GDPR) vs US privacy
- The agent may need to route data access/deletion requests into a GDPR-specific workflow (identity verification, logging, deadlines) for EU users, while US handling might follow different privacy frameworks and internal policy.
Cross-border data handling
- If the user is in Europe, you may prohibit certain data transfers or require specific processing paths, so the agent must route to the correct regional system or team.

How it interacts with the other deterministic layers

Access limits (C): “Is this user authenticated/verified enough to access this data?”
Geo limits (E): “Even if they are verified, does their jurisdiction allow this flow, disclosure, or data handling path?”
Business limits (D): “Even if allowed, are we permitted to execute this action right now given thresholds and counters?”

This third deterministic layer is what prevents a common failure mode in regulated AI: behaving correctly for one region while accidentally violating rules in another.

Best practices for implementing guardrails and escalations

Map rules to outcomes, not just categories

Avoid guardrails that only label content (“this is sensitive”). Guardrails should also decide the operational outcome: refuse, offer handoff, or forced escalation.

Put deterministic limits around the highest-risk actions

If an action can create financial loss, regulatory exposure, or irreversible customer impact, enforce it with business limitation guardrails rather than hoping the model behaves.

Build a legal-led test suite using LLM-vs-LLM simulations

One of the most effective practices is a test suite authored by legal and compliance teams.

How it works:

LLM A (your AI agent) following your guardrails.
LLM B plays a simulated customer whose with a spesific goal (e.g. get his claim status).
Each scenario has a measurable expected outcome.

A concrete insurance-style simulation:

The simulated customer tries to push the agent into opening or progressing a claim that triggers your “high-risk” handling rule (for example, “above X dollars,” or “requires adjuster review”).
The test checks whether the conversation correctly ends in the intended outcome: refusal, optional handoff, or forced escalation to an adjuster queue.

This turns compliance from a static document into an executable standard you can run continuously.

What happens when a guardrail triggers: three end states

When a conversation hits a boundary, there are three clean outcomes an AI “elevator” can choose from:

Refuse, then continue
The agent says it cannot help with that specific request, but keeps the conversation open for safe topics.
Offer a handoff (customer choice)
The agent explains it cannot continue on that topic and asks whether the customer wants to move to a human agent.
Forced escalation (no customer choice)
The agent informs the customer it is transferring the conversation, then automatically routes them into the relevant human queue. This is appropriate for high-risk cases: potential PHI exposure, security signals, or regulated workflows where delay or missteps are unacceptable.

This structure keeps the experience predictable for customers and operationally safe for the business.

Below is a rewritten, example-driven version of the escalation section that uses the exact guardrail layers from the article and ends in one of the three outcomes: reply, offer handover, or forced escalation.

Escalation pathways in conversational AI

Here’s an example of how we apply the escalation framework. We’ll use an insurance policy “Coverage” question as the example, where grounding is missing, with the goal of preventing made-up answers.

Scenario
Customer: “Does my policy cover flood damage in a finished basement?”

Why escalation exists here
This is where LLMs tend to “sound sure” and fill gaps when the needed clause is missing, incomplete, or conflicting. The highest risk is confabulation: confidently stated but false content that misleads the customer. NIST Publications highlights this class of risk for generative AI systems.

Guardrail layers involved
LLM-as-judge (knowledge validator): the judge evaluates whether retrieved content actually contains the answer. If grounding is missing, it blocks the agent from “guessing” and pushes the flow to a safer outcome.

Expected behavior

Retrieval returns: “No relevant disclosure found” so the judge labels: “High hallucination risk” / “No relevant knowledge”

What to do once triggered

First, Offer handover (customer choice): The AI agent should say it cannot confirm coverage without reviewing the policy clause and offer a transfer to a licensed agent/claims specialist.
If the user keeps pressing for a definitive yes/no, switch to forced escalation to prevent the model drifting into invented certainty.

For more examples, please read the following article, where we highlighted additional scenarios from other industries.

‍

Conclusion: compliant AI that still feels helpful

Regulated industries do not need “less AI.” They need safer AI.

The winning pattern is layered:

LLM-as-judge guardrails for real-time boundary detection
technical defenses against misuse and prompt injection
deterministic business limitations that block unsafe actions
clear escalation pathways tied to real compliance workflows

When these pieces work together, AI agents can deliver fast, high-quality support while staying inside the rules that insurance and finance organizations must live by.

‍