
Treat Your AI Workflow Like You Treat Your Code

I kept having the same argument with Claude.

Not a dramatic one. More like the low-grade friction of a coworker who keeps formatting the PR description wrong no matter how many times you fix it. The AI would reach for the wrong abstraction. I’d correct it. Next session, same mistake. I’d correct it again. And again. And, guess what? Again.

At some point I realized the problem was me. While I was learning from each session, the AI’s brain got wiped. Nothing carried over unless I made it carry over. So I made a simple tweak to my workflow: at the end of each session, I’d ask “what could I change in the project instructions to make this go smoother next time?” Sometimes the answer was nothing. But often enough, it surfaced an improvement (a clarification, a constraint, a pattern to codify) that made the next session measurably better.
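If you want to make the habit harder to forget, you can bake the question into the project instructions themselves. Here’s a minimal sketch of what that might look like in a CLAUDE.md-style file (the heading and wording are mine, not any canonical format):

```markdown
## End-of-session retro

Before we wrap up a session, answer briefly:

- What in these instructions slowed us down or led you astray?
- What clarification, constraint, or pattern could we add so the next
  session goes smoother?

If the honest answer is "nothing", say so. Otherwise, propose a
concrete edit to this file.
```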

That small habit changed how I think about AI-assisted development. The model matters, sure, and the tooling matters, but the thing that’s made the biggest difference for me is treating my AI workflow as something I actively maintain, not a setup task I do once and forget about.


Process Debt

Most developers I talk to set up their AI tools, maybe write a CLAUDE.md or drop in a rules file, and then… leave it. The configuration becomes write-once infrastructure. Some don’t revisit it even when they switch to a new model.

But an AI workflow isn’t just infrastructure. It’s a living codebase. And anyone who’s been writing software for a while already knows what happens to code that doesn’t get actively maintained.

Think about the red/green/refactor cycle. Red/green is the work of the session: get the thing working. Refactor is the step most people skip: stepping back to ask what should change about how we got there. Just like skipping refactor in code leads to technical debt, skipping it in an AI workflow means solving the same problems over and over without anything compounding. I like to call it “process debt”.

The good news is this doesn’t have to be heavy: it can be as simple as one question at the end of a session. But it does have to be deliberate! After all, we wouldn’t expect a new teammate to just figure things out without investing in the working relationship; it’s the same idea here.


Lots of Roads, Same Direction

I asked around SEP and was impressed to find that people had arrived at the same underlying idea from very different directions.

If the AI gets something wrong, don’t get frustrated. Instead, ask yourself: what additional context could I have provided so it got it right?

Steven Mills, Software Engineer 2

I love the practicality of this advice: treat AI mistakes as feedback on what’s missing from the instructions, or what’s gone stale in them. Steven doesn’t update after every session, though. He keeps a mental log of mistakes, and when they start clustering into a category, that’s when he invests in a fix. Dave Mott takes a similar approach, focusing specifically on capturing framework quirks and configuration choices that differ from what the AI would find on the internet. Those are the things it’ll get wrong every time unless told otherwise.

Nathan Sickler took this a step further and built the question directly into his workflow. He added an instruction that asks the AI to append a self-assessment to every substantive response, including a confidence score, any assumptions it made, and one concrete improvement to the instructions that would make the next session better. In practice, he found the assumptions section is where he gets the most value. More than once, he’s skipped verifying the rest of a response entirely because the self-assessment surfaced a fundamentally incorrect assumption he could correct immediately.
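I don’t have Nathan’s exact wording, but an instruction in that spirit might look something like this (the section name, scale, and phrasing are my own sketch, not his):

```markdown
## Self-assessment

Append a short self-assessment to every substantive response:

- **Confidence (low / medium / high):** how sure you are this is
  correct, and what would change your mind.
- **Assumptions:** anything you assumed that I didn't state. Flag the
  ones most likely to be wrong.
- **One improvement:** a concrete change to these instructions that
  would have made this task go better.
```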

Brian Ball connected this to something we already know how to do. In Agile, the one required practice is the retrospective: make time to reflect on what works, consider how to change, and change. His point was that if AI is helping ship 5x as much code, reviewing the agent and skill files daily is the equivalent of a weekly retro. The cadence of reflection should scale with the pace of output.

Brian’s team took this further than anyone else I talked to. They built a pipeline where the developer captures observations while the agent implements, and specialized review agents document their findings separately. A reconciler then compares the two and splits the results into process improvements and product changes. They even built checks that remind the developer to capture their observations if they forget. The part that stuck with me: they ask “how should we work differently?” alongside “what should we build differently?” every single cycle (I wrote previously about how shared AI infrastructure enables more human collaboration, not less, and this is that idea in practice). Joe Coy, also on that team, emphasized that any rhythm requiring people to articulate what they’re doing (retros, mob reviews, onboarding) sharpens thinking and surfaces new ideas. The key is not letting those rhythms become purely ceremonial.

One thing I’ve found increasingly helpful is stopping for a second when something is going off the expected path. I fork my session and work in the forked session to improve the configuration. Once it’s updated, I go back to the original session, restore to a checkpoint, and see if it does better at the same task with the updated configuration.

Brian Hanford, Senior Software Engineer

Other colleagues described their own variations:

  • Craig Belcher tells the AI to persist what it learned, letting it update agent files or skills directly.
  • Zachary Mayhew pauses at the end of each session to ask whether the work is something he’ll repeat in the future. If so, he creates a skill for it (a sketch follows this list).
  • Wes Hoffman had his agent generate lessons-learned documents after each effort, then fed those back into new sessions to produce better results.
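For the skill route, here’s a rough sketch of what a captured skill might look like, following the SKILL.md convention of YAML frontmatter plus step-by-step instructions (the release-notes skill itself is a made-up example, not one of Zachary’s):

```markdown
---
name: release-notes
description: Draft release notes from merged PRs in our team's format.
  Use when asked to prepare or update release notes.
---

# Drafting release notes

1. List the PRs merged since the last release tag.
2. Group them under Added / Changed / Fixed.
3. Write one plain-English line per PR and link its number.
4. Call out anything labeled `breaking-change` at the top.
```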

This isn’t just happening inside SEP. Kieran Klaassen coined “compounding engineering” to describe exactly this loop: every interaction teaches the system, every bug becomes a permanent lesson. Birgitta Böckeler’s article on harness engineering on Martin Fowler’s site frames it through feedforward controls (guides that steer the agent before it acts) and feedback controls (sensors that help it self-correct after). Both agree: maintaining an AI workflow is an ongoing practice, not a one-time configuration.


What to Watch Out For

It would be easy to end the post here and say “just do the thing”, but the practitioners I talked to were just as clear about the failure modes as they were about the benefits.

David Sweet, whose team built the reconciliation pipeline I described earlier, laid out four risks he’s seen in practice:

  1. Instruction bloat: continuously adding to agent files can overwhelm the model and produce unexpected behavior. More isn’t always better.
  2. Coordination overhead: when multiple people are updating agents, the next developer who picks up those changes is essentially the tester.
  3. Diminishing returns: as instructions solidify, incremental tweaks yield less value, and near deadlines they can feel like unnecessary risk.
  4. My favorite: spending more time tweaking the pipeline than actually using it. At some point, optimizing the workflow becomes a way to avoid doing the real work, even though it feels productive.

Paul Steele shared the most surprising (to me) observation: he used to refine his conventions religiously, and it helped a lot. Then newer models came out, and he got better results by deleting everything and starting over. The conventions he’d painstakingly built for one model had become baggage for the next. Andrew Dunlap noticed something similar: every new model tends to make some existing rules redundant, but it’s hard to know which ones without rebuilding from scratch.

The people building these models are seeing the same thing. Anthropic’s own engineering team wrote recently about how harnesses encode assumptions about what the model can’t do on its own, and those assumptions go stale as models improve. Their example: a workaround they’d built for Sonnet 4.5 (context resets to prevent premature task completion) turned out to be dead weight on Opus 4.5, where the behavior had simply disappeared. Every rule in a workflow is implicitly a bet about the model’s current limitations, and those limitations keep shifting.

John Iles offered the most direct pushback: he doesn’t adjust any of this stuff, and Claude has been working well for him since Opus 4.5 with default settings. “Why change what clearly works?” That’s a legitimate position. Not everyone’s workflow needs systematic refinement, especially if the defaults are already producing good results for the type of work being done.

And then there’s Rui Liu, whose take was very pragmatic: updating after every session feels like overkill. She sees the value when domain rules change, after a big refactor, or when the same annoying problem keeps hitting the team; otherwise, the overhead isn’t worth it.

There’s a real tension here. The model-upgrade problem is accumulated investment getting wiped out by forces outside your control. That never feels good! But I think the practice is still worth doing, because the real value is the habit itself: noticing what went well, what didn’t, and why.


Now What?

The specific mechanism matters less than the mindset behind it: treating an AI workflow as something worth actively maintaining. I think this is the transition point between using AI as a tool and working with AI as a teammate: a tool gets configured, whereas a teammate gets onboarded, and then you keep investing in how you work together. I’ve been thinking about what those stages look like in more detail, and I’ll write about that soon, but for now: if the last AI session ended and nothing about the process changed, it’s worth asking whether anything should have.


✨ AI Post Recap

AI tools don’t learn between sessions — you do. Treating your AI workflow as living infrastructure means regularly asking what should change about how you work, not just what you built. Getting the most out of agentic development isn’t just about using AI; it’s about actively maintaining the process around it.


What is “process debt” in AI development? Process debt is what accumulates when you skip the refactor step on your AI workflow — solving the same problems repeatedly instead of codifying the fix. It’s the AI equivalent of technical debt: avoidable, but only if you make reflection a habit.

How often should you update your AI agent instructions? There’s no single right cadence. Some practitioners update after every session; others wait until the same mistake clusters into a pattern. Several practitioners found that model upgrades can make existing instructions stale or even counterproductive, so it’s worth revisiting your configuration when you switch models.

What’s the simplest way to start maintaining an AI workflow? End each session by asking the AI one question: “What could I change in my instructions to make this go smoother next time?” It won’t always surface something, but when it does, the next session is measurably better — and the habit compounds over time.
