Versioning AI Agents in Production

The Notch No-Code Approach

Most teams learn a hard lesson the moment an AI agent goes live: iteration is easy (just change the prompt and check the output), but safe iteration is not. A single prompt tweak can shift tone, break a policy boundary, or degrade outcomes in ways that only appear at scale. In other words, shipping AI to production is not a one-time launch; it is continuous engineering.

At Notch, we built versioning for AI production with one guiding principle: treat agents like code. That means disciplined change management, repeatable testing, and release workflows that are collaborative without being chaotic. The difference is that we deliver those engineering-grade controls through a no-code platform, so product, CX, ops, and QA teams can participate directly in improving the agent.

Below is how we do it.

Versioning is non-negotiable for production AI

In traditional software, version control exists because change is constant and regressions are expensive. Production AI shares that same reality, plus a few extra complications:

  • Behavior can shift from small edits that look harmless in isolation.
  • Quality is best evaluated in context: real conversations, real edge cases, real policies.
  • You need fast iteration, but you also need traceability, rollback, and accountability.

So we started from a familiar playbook: branch, test, review, merge, deploy.

Build with confidence. Deliver through teamwork.

Branching like developers

At Notch, every agent evolves through versions, and versions behave like branches in software development. Instead of editing “the live agent,” teams work in a draft or branch that isolates changes from production behavior.

This creates a safe workflow:

  1. Create a new branch (a draft version) of the agent.
  2. Make changes: prompts, policies, tools, configuration, escalation logic, and more.
  3. Run validation and conversational tests (more on this below).
  4. Review outcomes and get approval.
  5. Merge into the main version, then deploy.

This approach enables experimentation without risking live performance, while still keeping progress fast and structured.
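The branch-and-merge workflow above can be sketched in a few lines. This is a minimal illustration, not Notch's actual implementation: the `AgentVersion` and `AgentRepo` names and fields are invented for the example. The key property it demonstrates is that a draft is an isolated copy, so production is untouched until an explicit merge.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AgentVersion:
    """Immutable snapshot of an agent's configuration (hypothetical model)."""
    name: str
    prompt: str
    escalation_threshold: float

class AgentRepo:
    """Keeps one production version plus isolated draft branches."""
    def __init__(self, production: AgentVersion):
        self.production = production
        self.branches: dict[str, AgentVersion] = {}

    def branch(self, branch_name: str) -> AgentVersion:
        # A draft starts as a copy of production; edits never touch live traffic.
        draft = replace(self.production, name=branch_name)
        self.branches[branch_name] = draft
        return draft

    def edit(self, branch_name: str, **changes) -> AgentVersion:
        self.branches[branch_name] = replace(self.branches[branch_name], **changes)
        return self.branches[branch_name]

    def merge(self, branch_name: str) -> AgentVersion:
        # Promotion is an explicit step: the draft becomes the new production version.
        self.production = replace(self.branches.pop(branch_name), name="main")
        return self.production

repo = AgentRepo(AgentVersion("main", "You are a support agent.", 0.8))
repo.branch("tone-update")
repo.edit("tone-update", prompt="You are a friendly support agent.")
# Production is untouched until merge:
assert repo.production.prompt == "You are a support agent."
repo.merge("tone-update")
assert repo.production.prompt == "You are a friendly support agent."
```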

Smart testing

AI behavior should be tested the way it will be used: in conversation. Our pipeline includes a testing suite (like CI/CD) that evaluates changes using real conversational dynamics rather than only static checks.

Two core components are central here:

  • Conversation simulations: we run targeted AI test suites that simulate user interactions, edge cases, and policy boundaries.
  • Historical conversation replays: we run the updated version against past conversations to see how outcomes change, where it improves, and where it regresses.

The goal is straightforward: before a version reaches production, it has already faced the kinds of situations production will throw at it.
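To make the replay idea concrete, here is a toy harness that runs two versions of an agent policy over recorded conversations and reports where outcomes improve or regress. The agent functions, messages, and labels are all invented for illustration; in a real system the policies would call the deployed model rather than string matching.

```python
# Replay sketch: compare an old and a new agent policy against labelled
# historical conversations. Everything here is illustrative.

def agent_v1(message: str) -> str:
    return "escalate" if "refund" in message else "resolve"

def agent_v2(message: str) -> str:
    # Candidate change: handle refunds under policy instead of escalating.
    return "resolve"

def replay(history, old, new):
    """Return (improvements, regressions) between the two versions."""
    improvements, regressions = [], []
    for message, expected in history:
        before, after = old(message), new(message)
        if before == expected and after != expected:
            regressions.append(message)
        elif before != expected and after == expected:
            improvements.append(message)
    return improvements, regressions

history = [
    ("I want a refund", "resolve"),     # labelled ideal outcome
    ("Cancel my account", "resolve"),
]
improved, regressed = replay(history, agent_v1, agent_v2)
assert improved == ["I want a refund"]
assert regressed == []
```

The same diff structure scales up: swap the toy policies for real model calls and the labels for reviewed outcomes, and the regression list becomes a merge gate.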

Merge discipline

Testing is necessary, but not sufficient. In fast-moving AI iteration, the biggest operational risk is not that a change “fails”; it is that multiple changes collide, ship without review, or roll out too broadly before you see the impact. That is why our release workflow mirrors mature software delivery practices in four steps: conflict resolution, peer review, gradual rollout, and deployment.

  • Merge conflicts (and explicit resolution)
    When two branches touch the same behaviors, rules, or configurations, we surface conflicts during merge rather than letting one change silently overwrite the other. The goal is to make collisions visible, force an explicit decision, and preserve intent.
  • Peer review before merge
    Every meaningful change is reviewed by at least one additional stakeholder (for example, CX lead, ops owner, product, or QA). This is not bureaucracy; it is a practical control to catch unintended side effects, validate edge cases, and confirm the change matches the original objective (in several industries, adherence to this requirement is not just encouraged, it is a regulatory imperative).
  • Gradual rollout
    Instead of flipping the entire system from one version to the next, we support controlled rollout strategies so teams can observe performance and behavior under real traffic before expanding exposure. This reduces blast radius and makes regressions easy to contain.
  • Deploy with confidence (and a clear rollback path)
    Once a version has cleared review and rollout gates, it is promoted to deployment. Because the full history is versioned, deploying is a deliberate promotion step, not an irreversible edit, and rollback is a known, auditable action rather than an emergency scramble.

This approach keeps iteration fast while ensuring releases remain controlled, reviewable, and safe to run in production.
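Gradual rollout is often implemented with deterministic hash-based bucketing, so a given user consistently sees the same version while exposure grows. The sketch below shows that general technique; the function and version names are illustrative, not Notch's API.

```python
import hashlib

def rollout_version(user_id: str, candidate_percent: int) -> str:
    """Route a stable slice of traffic to the candidate version.

    Hashing the user ID gives a deterministic bucket in [0, 100), so the
    same user stays on the same version as the percentage ramps up.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "candidate" if bucket < candidate_percent else "stable"

# Exposure can grow 5% -> 25% -> 100% without flipping users back and forth.
assert rollout_version("user-42", 0) == "stable"
assert rollout_version("user-42", 100) == "candidate"
assert rollout_version("user-42", 30) == rollout_version("user-42", 30)
```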

Everything is versioned

At Notch, versioning extends beyond the agent definition itself. Every operational element in the system is treated as a versioned asset.

That includes configurations like:

  • Specific rules and guardrails
  • Allow and deny lists
  • Addresses, business details, and operational constraints
  • Routing logic, escalation settings, and workflow conditions

Any update to configuration is logged, versioned, and traceable. This gives teams full transparency into what changed, who changed it, why it changed, and what version it shipped in. It also enables safe rollback when needed, without guesswork.
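A versioned configuration store with an audit log and rollback can be sketched as follows. The class, field names, and example keys are hypothetical; the point is that every write records who, why, and which version, making rollback a mechanical replay of old values rather than guesswork.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ConfigChange:
    """One auditable entry: what changed, who, why, and in which version."""
    key: str
    old_value: object
    new_value: object
    author: str
    reason: str
    version: str
    at: datetime

class VersionedConfig:
    def __init__(self):
        self.values: dict[str, object] = {}
        self.log: list[ConfigChange] = []

    def set(self, key, value, author, reason, version):
        self.log.append(ConfigChange(key, self.values.get(key), value,
                                     author, reason, version,
                                     datetime.now(timezone.utc)))
        self.values[key] = value

    def rollback(self, version: str):
        # Undo every change shipped in a given version, newest first.
        for change in reversed([c for c in self.log if c.version == version]):
            self.values[change.key] = change.old_value

cfg = VersionedConfig()
cfg.set("refund_limit_usd", 100, "ops@acme", "initial policy", "v1")
cfg.set("refund_limit_usd", 250, "cx@acme", "holiday promo", "v2")
cfg.rollback("v2")
assert cfg.values["refund_limit_usd"] == 100
```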

Multiplayer collaboration

Production AI is not owned by a single person. The best outcomes happen when multiple roles contribute: product defines goals, CX provides ground truth, ops maintains business rules, and engineering ensures reliability.

Notch is designed as a multiplayer collaborative workspace where many people can work in parallel using branches and merges, similar to how developers use Git. That includes:

  • Multiple drafts in flight at the same time for different initiatives
  • Clear separation between experimental branches and production-ready versions
  • Structured review and approval paths before changes reach “main” (Git-style workflows)

To keep this safe at scale, we pair collaboration with strict RBAC (Role-Based Access Control). In practice, this means:

  • Teams can control who can create branches, edit sensitive configurations, run tests, approve merges, and deploy.
  • Permissions can be scoped by role and responsibility so that collaboration does not become risk.
  • Auditability is preserved: every change is attributable and reviewable.

The result is a system that supports rapid iteration across teams, without sacrificing governance.
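In practice, RBAC at this granularity boils down to checking every sensitive action against the caller's roles. The role and action names below are invented for illustration; they mirror the permissions listed above (branching, editing configuration, testing, approving, deploying).

```python
# Minimal RBAC sketch: roles map to allowed actions; every operation is
# checked before it runs. Role and action names are hypothetical.

PERMISSIONS = {
    "editor":   {"create_branch", "edit_config", "run_tests"},
    "reviewer": {"run_tests", "approve_merge"},
    "releaser": {"deploy"},
}

def can(user_roles: set[str], action: str) -> bool:
    return any(action in PERMISSIONS.get(role, set()) for role in user_roles)

def deploy(user_roles: set[str]) -> str:
    if not can(user_roles, "deploy"):
        raise PermissionError("deploy requires the releaser role")
    return "deployed"

assert can({"editor"}, "create_branch")
assert not can({"editor"}, "approve_merge")   # editing does not imply approval
assert deploy({"editor", "releaser"}) == "deployed"
```

Because every action funnels through one check, attribution falls out for free: log the user, role, and action at the same choke point and the audit trail stays complete.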

The next step: AI implementation agent

We are now extending the same approach to AI-driven implementation, which works inside the same versioning system.

Instead of treating AI as only the thing you deploy, we are introducing an AI implementation agent that helps you build and improve the system itself. For example, it can:

  • Draft SOPs and structured playbooks
  • Incorporate feedback into policies or agent behaviors
  • Propose fixes for recurring issues found in reviews or QA
  • Produce iterative changes that can be tested and approved

Crucially, this implementation agent operates inside the same workspace, under the same rules and versions. It does not “change production.” It creates a new branch or draft, runs the relevant tests, and then presents a proposed version for human approval. From there, you can merge it into your main version or keep it as an alternative branch for further refinement.
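The human-in-the-loop approval flow described above can be expressed as a small gate: the agent may only propose a draft, failing drafts never surface, and promotion requires an explicit human decision. All names here are illustrative stand-ins, not a real API.

```python
# Sketch of the approval loop: propose -> test -> human decision -> promote.

def agent_propose(current_prompt: str) -> str:
    # Stand-in for an AI-generated revision of the agent's behavior.
    return current_prompt + " Always confirm the order number first."

def run_tests(prompt: str) -> bool:
    # Stand-in for the conversational test suite.
    return "confirm" in prompt

def review_cycle(production_prompt: str, human_approves: bool) -> str:
    draft = agent_propose(production_prompt)
    if not run_tests(draft):
        return production_prompt      # failing drafts never reach review
    if not human_approves:
        return production_prompt      # proposal stays parked as a branch
    return draft                      # explicit human promotion to main

base = "You are a support agent."
assert review_cycle(base, human_approves=False) == base
assert "confirm" in review_cycle(base, human_approves=True)
```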

This keeps AI-powered implementation accountable, reviewable, and consistent with production-grade release discipline.

Controlled Release Workflow

The promise of AI agents is speed and leverage. The risk is uncontrolled change. Versioning is how you get the upside without the operational debt.

By treating agents like code, versioning every meaningful configuration, enforcing branch-based workflows, validating changes with conversational testing, and enabling collaborative work under strict RBAC, Notch brings engineering-grade production rigor to a no-code environment.

That is the standard we believe production AI deserves: fast iteration, controlled releases, and full accountability from draft to deployment.
