
Engineering Acceleration Playbook


Our CEO recently said something that crystallized what I'd been living through:

"Full-stack engineers are insanely productive today with AI tools. What used to take 3 days can now take 2 hours. But it's not just faster - it's different. There are more decisions while working with AI tools. Not less."

He was talking about the 1:1:1 model - one engineer, one PM, one designer per team. When engineering speed increases, decision density increases. The smallest effective team shrinks. The demands on each person in that team grow.

This post is about what happens when you apply that same acceleration to leadership. I'll walk through how I built an AI-augmented operating system in 20 days, the mistake that almost made it counterproductive, and five principles you can use to build your own, whether you're leading three people or thirty.

I showed up to Notch three weeks ago as R&D Group Lead. ~12 engineers, expanding internationally, AI-powered product, and an intense delivery culture that had outpaced the management structure around it. Not broken - fast. The kind of org where output is high but the systems to sustain it at scale haven't caught up yet. Standard leadership firehose: names, systems, priorities, promises. The kind of first month where you either build a system to process the velocity or the velocity processes you.

I chose to build a system. That was the right call. And the wrong one. Let me explain.

The Pen

I carry a digital pen tablet everywhere. Every meeting, every thought, every half-formed idea gets captured in handwriting. The tablet syncs to the cloud as images - messy scrawl, arrows, mixed languages, doodles in the margins.

The habit predates AI. It predates this job. The pen is my anchor because week one at any leadership role is a firehose, and the only thing you can control is capture. Processing comes later.

My first note, timestamped 09:11 on Day 1: "Sprint extended for 1 week. Board updates are not constant. Bottleneck on the frontend."

That last observation - a bottleneck identified within the first hour - would turn out to be the pattern. The notes weren't just capture. They were diagnosis. But diagnosis that lived in my handwriting, invisible to everyone else, including me the next morning.

The Spark

Day 10. One line on my todo list:

connect handwritten notes with Claude Code

Simple idea: pipe handwritten notes through an AI vision API, get structured markdown out, then query that data alongside the engineering board. By lunchtime I had a working ingestion pipeline. By evening, I'd built a CLI that let me ask natural-language questions about our sprint board. By nightfall, the first digest ran - a morning briefing combining board state, stuck issues, team load, and yesterday's handwritten notes, cross-referenced by AI.

Three separate streams - handwriting, project management data, AI synthesis - connected in a single day.

Over the next 48 hours, it compounded. Board accuracy measurement. End-of-week digest generation. Automated 1:1 prep sheets pulling context from issues and notes. Meeting transcript integration. Customer health snapshots combining engineering state with implementation team commitments.


By Day 20, the system had a name (JavOS - from a childhood nickname), a monorepo structure, 312 tests running in under 0.2 seconds, and eight AI skills that taught my Claude Code co-pilot how to query the board, prep for meetings, and run structured end-of-day reviews.

Every script was tested from day one. Not after the fact - as part of the build. The same AI session that wrote the digest engine wrote the tests for it. This wasn't perfectionism. It was survival: if the tool that drives your morning decisions has a silent failure, you're leading with wrong data. A digest that says "nothing stuck" when twelve things are stuck is worse than no digest at all. Quality isn't a phase you add later. It's baked into the process, or it's missing entirely.
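To make "tests in the same session" concrete, here's a minimal sketch - my own names and data shapes, not the JavOS code - of a stuck-issue check shipped with its assertions in the same block:

```python
from dataclasses import dataclass


@dataclass
class Issue:
    key: str
    assignee: str
    status: str
    days_in_status: int


def stuck(board: list[Issue], threshold: int = 3) -> list[Issue]:
    """Issues sitting in an active state at least `threshold` days."""
    return [
        i for i in board
        if i.status == "In Progress" and i.days_in_status >= threshold
    ]


# Tests written in the same session as the function, not after the fact.
board = [
    Issue("ENG-1", "maya", "In Progress", 5),
    Issue("ENG-2", "omer", "In Progress", 1),
    Issue("ENG-3", "dana", "Done", 9),
]
assert [i.key for i in stuck(board)] == ["ENG-1"]  # only the truly stuck issue
assert stuck([]) == []  # empty board must not crash the morning digest
```

The second assertion is the one that matters: a digest that fails silently on an empty or malformed board is exactly the "nothing stuck when twelve things are stuck" failure mode.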

Each morning before standup: yesterday's board changes, stuck issues, team load distribution, my own handwritten notes, cross-referenced and synthesized. Built entirely with AI assistance in the gaps between actual leadership work.

This is what the 1:1:1 model looks like applied to leadership itself. One leader, one AI co-pilot, one shared context layer. The AI doesn't decide; it surfaces patterns, runs queries, and generates digests. The leader decides what matters.

The Telescope

And then my CTO said something that reframed everything.

I'd been showing him the system, the automated digests, the customer health snapshots, the board accuracy trends. He was genuinely impressed. Then he asked a simple question:

"For each customer in the pulse hub, have you sat down with the account lead and the assigned engineer? Do you know what the customer actually needs, from the people doing the work?"

I paused. For some of them, no. I had beautiful snapshots. I had data. But I hadn't had the conversation.

The uncomfortable truth was that I already knew this about myself. Week 1, I'd written "1:1s with all engineers" on my notepad and then built an ingestion pipeline instead of finishing the 1:1s. The tool I built to surface critical tasks was itself a product of avoiding a critical task.

His feedback crystallized into a principle I now carry everywhere: tools are the telescope, not the ground truth.

The board CLI can tell me who's stuck. The meeting transcript can tell me what was said. The accuracy tracker can tell me whether sprint commitments are reliable. But none of that replaces sitting across from a person, reading the room, and understanding what the numbers actually mean for the humans behind them.

A customer health snapshot without the human conversation behind it isn't readiness, it's a false sense of readiness. The most dangerous kind.

The Pattern

This connects directly to how we think about AI at Notch. Our product is built on a foundational design principle: we assume the model can be wrong, overly confident, or socially engineered. That's why we pair AI reasoning with deterministic guardrails - the AI handles complexity, but hard rules prevent it from crossing boundaries that matter.

The same principle applies to leadership tooling. AI gives you speed and structure. It surfaces patterns you'd miss manually. But it can also give you the world's most sophisticated excuse for staying in your comfort zone.

The system is fast. Humans are unpredictable. The unpredictable part is where the real pattern recognition happens, the kind no algorithm replaces. A developer who says "it's fine" but avoids eye contact. A customer-facing lead whose tone shifts when you ask about a specific account. An engineer who volunteers for extra work because they're afraid to say they're overloaded.

The tooling doesn't catch that. The pen might. But only if you're in the room.

The Playbook: Five Principles for AI-Augmented Leadership

Here's what I'd tell any engineering leader starting this journey:

1. Start with capture, not processing

The pen habit was running for months before I built anything digital. Don't start with the AI tool, start with the raw input. What's your firehose? Email? Slack? Meeting notes? Find a capture method that works for you without technology, then build the processing layer on top.

2. Bake quality in from the first line

312 tests for personal scripts sounds excessive until you realize the alternative: leading with wrong data and not knowing it. Every script I wrote shipped with its tests in the same session, built by the same AI pair. This isn't about testing discipline, it's about building a system you can trust at 9:45 AM when you haven't had coffee yet and the digest says three people are blocked. If you wouldn't ship untested code to production, don't ship untested tools to your own decision-making.

3. Automate the synthesis, not the judgment

AI is extraordinary at combining four data sources into a coherent briefing. It's terrible at deciding what the briefing means for your team. Let it surface - "these three engineers have had stuck issues for five days" - but never let it conclude "therefore, reassign the work." That's your job. The moment you automate judgment, you've delegated leadership.
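One way to enforce that boundary in code - a sketch under my own assumptions, not the actual tooling - is to make the tool's output type incapable of carrying a recommendation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A fact the tooling surfaces. Deliberately has no 'action' field:
    the type itself cannot express 'therefore, do X'."""
    subject: str
    fact: str


def surface_load(issue_counts: dict[str, int], ceiling: int = 4) -> list[Observation]:
    """Flag engineers above a work-in-progress ceiling.

    Returns observations only; what to do about them stays with the leader.
    """
    return [
        Observation(name, f"{n} issues in progress (ceiling {ceiling})")
        for name, n in sorted(issue_counts.items())
        if n > ceiling
    ]
```

The design choice is the point: if the schema can't hold a "reassign" verdict, no prompt drift or over-eager model call can smuggle one into the morning digest.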

4. Earn the trust behind the data

When the system generates accurate data for 14 consecutive mornings, you stop double-checking. That's infrastructure. But infrastructure measures whether the numbers are right. It doesn't measure whether you understand what's behind them. For every automated snapshot, ask: have I had the human conversation that gives this data meaning? If not, the snapshot is incomplete, no matter how accurate it is. Show up. Ask. Listen.

5. The compound effect is your superpower

Day 1: handwritten notes on paper. Day 10: notes flowing into a queryable system. Day 15: automated daily operations. Day 20: a named, tested, documented operating system. None of it was planned from the start. Each layer made the next one possible. Don't design the end state, build the next useful layer.

What's Next

I'm three weeks in. The personal operating system works. The question now is: what happens when it's not just me?

Scaling to the team. The same 1:1:1 model that let me build JavOS in 20 days applies to every squad we're building. Each team lead needs their own context layer, not a copy of mine, but the same pattern: capture, synthesize, decide. I'm designing an org structure where each squad has the autonomy and the tooling to operate with full context, not just full backlogs.

From co-pilot to orchestrator. At Notch, we build AI agents that handle customer service autonomously. Our design philosophy - AI handles complexity, deterministic guardrails prevent it from crossing boundaries - applies directly to how we're thinking about AI inside engineering itself. Not just code completion, but workflow orchestration: automated sprint health checks, predictive delivery risk, customer health monitoring that combines what engineering is tracking with what the customer is actually saying. The agent doesn't replace the leader. It makes the leader's pattern recognition faster and better-informed.

Quality as culture, not phase. The 312 tests aren't a testing strategy. They're a statement: quality is embedded in the process, not bolted on after. Every tool, every script, every AI skill ships tested - not because someone filed a QA ticket, but because the builder treats quality as a first-class constraint. This is the same principle we're building into the engineering org: an agent-ready codebase where quality isn't a gate you pass through, it's the material you build with. When AI can generate code at 10x speed, the only thing that prevents 10x bugs is a culture where testing is as natural as writing the code itself.

Here's what I believe will happen in the next two years: the engineering teams that win won't be the biggest. They'll be the most context-rich. Four-person squads will outperform twelve-person teams, not because they're better engineers, but because each person has full context, AI-augmented synthesis, and the autonomy to act on what they see. The bottleneck won't be headcount. It will be how fast a team can go from "something feels wrong" to "here's what we're doing about it."

The leaders who build these teams will need three things: tooling that makes context available (not just data - context), a quality culture that keeps the tooling trustworthy, and the discipline to stay in the room when the dashboard says everything is fine.

The pen captures. The system processes. The leader understands. All three, or it doesn't work.
