Iterative AI: Learning to Fail Fast with Intelligence
You can’t predict how your AI will behave in production. So stop trying. Start learning instead.
In the first article of this series, I argued that engineering fundamentals matter more in the AI era. We’ve explored how to specify uncertain systems, test non-deterministic behaviour, and build continuous evaluation pipelines. Now let’s talk about the development process itself.
If you’ve worked in agile teams, you know the rhythm: short iterations, frequent releases, continuous feedback. Build something small, ship it, learn from it, adjust. This approach emerged because we realised we couldn’t predict upfront exactly what users needed or how systems would behave at scale.
AI amplifies this uncertainty exponentially.
You can’t predict how your recommendation engine will perform across diverse user segments. You can’t anticipate every way users will try to break your chatbot. You can’t know which edge cases matter until real people encounter them. Planning everything up front isn’t just inefficient: it’s impossible.
This makes agile principles not just useful for AI development but absolutely essential. We just need to adapt them to a new kind of uncertainty, one in which not only the users but the technology itself is unpredictable.
The Planning Fallacy Gets Worse
Traditional software has taught us that detailed upfront planning often fails. Requirements change. Users want something different than they asked for. Technical assumptions prove wrong. The waterfall model died because reality kept invalidating our plans.
AI development makes this worse. Not only do requirements and users surprise you, but the technology surprises you, too.
I learned this on Tracto, a platform to support neurodivergent families. We planned a sophisticated content recommendation system. Six weeks of design. Detailed architecture. Clear metrics. We knew exactly what we’d build.
In week three of implementation, we discovered that our planned approach couldn’t generate recommendations fast enough. In week five, we found that our confidence scoring didn’t work for cold-start users. In week seven, early beta users engaged with content that differed from what our data predicted.
None of this was predictable. The AI models behaved differently under real load. User behaviour diverged from historical patterns. Edge cases emerged that no planning session anticipated.
We’d spent six weeks planning and eight weeks discovering our plan was wrong. If we’d spent two weeks building a rough version instead, we’d have learned those lessons in week three and adjusted.
The Minimum Viable AI
The agile concept of MVP (Minimum Viable Product) takes on new meaning with AI. What’s the minimum viable AI?
It’s not the AI with the best accuracy. It’s not the AI with the most features. It’s the AI that teaches you the most about what actually matters.
For our recommendation system, the MVP wasn’t a sophisticated machine learning model. It was a simple rule-based system that logged what it would have recommended and measured whether users would have clicked. We shipped it in two weeks. It gave terrible recommendations. Users saw a basic “trending content” feed instead.
But we learned:
Which content categories users actually engaged with (different from what they said they wanted)
How quickly recommendations needed to load (much faster than we’d spec’d)
What “personalised” meant to users (different from our definition)
Where the system needed to ask for clarification versus making assumptions
These learnings came from production usage, not planning sessions. They informed everything we built next.
The principle: ship something that can measure the problem before you build something to solve it. Let reality teach you what matters.
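To make that concrete, here is a minimal sketch of the shadow-mode pattern in Python. The function names and logging setup are illustrative assumptions, not Tracto’s actual code; the point is that the rule-based recommender runs silently alongside the real feed and only logs what it would have shown.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("shadow_recs")

def rule_based_recommendations(user_id: str, catalogue: list[dict]) -> list[str]:
    """Stand-in recommender: most recently published items first."""
    ranked = sorted(catalogue, key=lambda item: item["published_at"], reverse=True)
    return [item["id"] for item in ranked[:5]]

def serve_feed(user_id: str, catalogue: list[dict], trending: list[str]) -> list[str]:
    """Serve the trending feed, but log what the rules *would* have recommended."""
    shadow = rule_based_recommendations(user_id, catalogue)
    logger.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "shadow_recommendations": shadow,  # compared later against actual clicks
        "served": trending,
    }))
    return trending  # users still see the basic trending feed
```

Joining those logs against subsequent click events tells you whether the shadow recommendations would have beaten trending, before you invest in any model.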
Treating Features as Hypotheses
In traditional agile, you build features that users requested. With AI, you should build experiments to test hypotheses.
Every AI feature embeds assumptions:
“Users want personalised video recommendations” (Do they? Or do they want less effort finding relevant content?)
“The AI should categorise expenses automatically” (Should it? Or should it suggest categories for user approval?)
“High accuracy matters most” (Does it? Or are speed and confidence more important?)
These are hypotheses, not requirements. You don’t know if they’re true until users interact with your system.
Frame your AI work as experiments:
Instead of: “Build a recommendation engine that achieves 85% accuracy”
Try: “Hypothesis: Users will engage more with AI-recommended content than with trending content. Success metric: 20% increase in video completion rate. Experiment: Ship recommendations to 30% of users for two weeks, measure engagement.”
This framing changes everything. You’re not committed to shipping the feature; you’re committed to learning whether the feature solves a real problem. If engagement doesn’t increase, you learned something valuable: users don’t need recommendations; they need something else.
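Written as code, that framing might look like this sketch. The class, field names, and thresholds are hypothetical; the numbers come from the example framing above.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    hypothesis: str
    success_metric: str
    minimum_lift: float       # relative improvement required to call it a win
    rollout_fraction: float
    duration_days: int

    def evaluate(self, baseline: float, treatment: float) -> str:
        """Return a decision based on observed metric values."""
        lift = (treatment - baseline) / baseline
        return "ship or iterate" if lift >= self.minimum_lift else "kill or rethink"

recs = Experiment(
    hypothesis="Users engage more with AI-recommended content than with trending",
    success_metric="video completion rate",
    minimum_lift=0.20,        # the 20% target from the framing above
    rollout_fraction=0.30,    # ship to 30% of users
    duration_days=14,
)
print(recs.evaluate(baseline=0.41, treatment=0.46))  # "kill or rethink" (~12% lift)
```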
On Tracto, we hypothesised that parents wanted AI-generated insights about their child’s progress. We built it. Parents ignored it. But we noticed they heavily used the journal feature to write notes to themselves.
Hypothesis revised: Parents don’t want AI analysis — they want help reflecting on their own observations. We pivoted to AI-assisted journaling prompts. Engagement tripled.
We learned by failing fast with intelligence. The failed hypothesis taught us what users actually needed.
Short Iterations and Fast Feedback
The Agile Manifesto emphasises short iterations: weeks, not months. With AI, think even shorter for initial learning. Each loop looks like this:
Step 1: Define hypothesis and success metrics. What are you trying to learn?
Step 2: Build the simplest possible version. Mock the AI if needed. Focus on instrumentation; you need to measure outcomes.
Step 3: Ship to a small user group. Begin data collection.
Step 4: Analyse results. Did your hypothesis hold? What surprised you? What new questions emerged?
Step 5: Iterate based on learnings OR kill the feature if the hypothesis failed.
This cadence keeps you in learning mode. You’re not building to completion; you’re building to discover what completion means.
The key enabler: feature flags. Deploy code while controlling who sees it. Start with beta users who expect rough edges. Expand as confidence grows. Kill experiments without removing code. With feature flags, your iteration speed is limited only by how fast you can learn, not how fast you can deploy.
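A workable flag doesn’t require heavy tooling. This sketch uses nothing beyond the Python standard library, hashing each user into a stable bucket; real teams often reach for a dedicated flag service instead, but the mechanics are the same.

```python
import hashlib

FLAGS = {
    # experiment name -> fraction of users who see it (0.0 kills it instantly)
    "ai_recommendations": 0.30,
    "smart_notifications": 0.0,
}

def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket a user: same user always gets the same answer."""
    rollout = FLAGS.get(flag, 0.0)
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < rollout

if is_enabled("ai_recommendations", user_id="u-123"):
    ...  # serve AI recommendations
else:
    ...  # serve the trending feed
```

Because the bucket is deterministic, a user never flickers between variants, and killing an experiment is a config change, not a deploy.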
The Feedback Loop Architecture
Agile emphasises feedback from users, from stakeholders, from the team. AI systems need structured feedback loops built into the product itself.
Your AI can’t improve without data about how it’s performing. Design for feedback collection from day one:
Explicit feedback:
Thumbs up/down on AI outputs
“Was this helpful?” prompts
User corrections and edits
Report problem buttons
Implicit feedback:
Click-through rates
Time spent engaging with AI outputs
Completion rates
Return usage patterns
Abandonment signals
Every interaction generates a data point. These data points flow back into your evaluation pipeline. Your continuous evals measure quality. Your model retraining incorporates corrections. The system learns from usage.
This creates a virtuous cycle: ship AI, collect feedback, measure quality, improve model, ship updated AI. The faster this loop, the faster you learn and improve.
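One way to keep that loop fast is to funnel explicit and implicit signals through a single uniform event. The schema and JSONL sink below are assumptions for illustration, not a prescribed design:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    user_id: str
    output_id: str            # which AI output this feedback refers to
    kind: str                 # "thumbs", "correction", "click", "abandon", ...
    value: float              # 1.0 thumbs-up, 0.0 thumbs-down, dwell seconds, ...
    explicit: bool            # True for user-stated feedback, False for behavioural
    ts: str = ""

    def __post_init__(self):
        self.ts = self.ts or datetime.now(timezone.utc).isoformat()

def record(event: FeedbackEvent) -> None:
    """Append to a log that the eval pipeline and retraining jobs can both read."""
    with open("feedback_events.jsonl", "a") as sink:
        sink.write(json.dumps(asdict(event)) + "\n")

record(FeedbackEvent("u-123", "rec-42", kind="thumbs", value=1.0, explicit=True))
record(FeedbackEvent("u-123", "rec-42", kind="click", value=1.0, explicit=False))
```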
Cross-Functional Collaboration
The Agile Manifesto values “individuals and interactions over processes and tools.” With AI, this means breaking down walls between different specialities.
Traditional software teams had developers, designers, and product managers. AI teams need data scientists, ML engineers, domain experts, and users to collaborate continuously.
Why? Because no single person understands the whole picture:
Data scientists understand what models can do, but not always what users need
Engineers know how to ship reliably, but not always what “good AI” means
Product managers understand user problems, but not always AI capabilities and limitations
Domain experts know the subject matter, but not always the technical constraints
Users know what they actually need, which often differs from what they said they wanted
These groups must work together throughout the iteration, not just at handoff points.
Embracing Technical Debt Differently
Agile teams balance speed and quality by occasionally taking on technical debt with the commitment to pay it down. AI systems have unique forms of debt that need different management:
Model debt: Using a simple rule-based system or a generic LLM as a stopgap before building a proper ML model or a fine-tuned LLM. This is often the right choice: ship something, learn, then add sophistication. Just track it explicitly.
Data debt: Training on limited or biased datasets with the intention to improve later. This is dangerous debt. It can encode problems that get worse over time. Pay this down quickly.
Infrastructure debt: Running AI workloads on suboptimal infrastructure to ship faster. This is acceptable early on, but costs compound quickly. Plan the paydown.
Evaluation debt: Shipping AI without comprehensive evals because you’re moving fast. This is the most dangerous debt. You’re flying blind. Never accumulate this debt; it’s not actually faster.
The key difference from traditional technical debt is that AI debt can cause your system to degrade silently over time. Traditional debt makes future work slower. AI debt makes your current production system worse.
Be intentional about which debt you take on and pay it down aggressively.
Responding to Change Over Following a Plan
The Agile Manifesto says “responding to change over following a plan.” With AI, this isn’t just a preference; it’s a requirement.
Your AI will surprise you. User behaviour will shift. Models will drift. Third-party APIs will change. New edge cases will emerge. You must be ready to pivot.
What does this look like practically?
Have rollback plans. Every AI deployment should be reversible. Feature flags, canary deployments, blue-green infrastructure. When your change makes things worse, revert quickly.
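You can even wire the revert to your continuous evals. A hedged sketch, assuming a FLAGS dictionary like the one in the feature-flag example above:

```python
FLAGS = {"ai_recommendations": 0.30}  # as in the feature-flag sketch earlier

def check_and_rollback(flag: str, eval_score: float, baseline: float,
                       tolerance: float = 0.05) -> bool:
    """Disable the rollout if the live eval score drops materially below baseline."""
    if eval_score < baseline * (1 - tolerance):
        FLAGS[flag] = 0.0  # instant kill switch: no redeploy needed
        return True
    return False

# e.g. nightly eval scored 0.71 against a 0.80 baseline: 0.71 < 0.76, so revert
rolled_back = check_and_rollback("ai_recommendations", eval_score=0.71, baseline=0.80)
```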
Reserve capacity for unplanned work. We keep 30% of each sprint for “production learnings”: issues discovered from continuous evaluation, user feedback, or unexpected model behaviour. This isn’t slack; it’s planned responsiveness.
Measure outcomes, not outputs. Don’t judge the sprint by features shipped. Judge it by the learnings gained and the improved user outcomes. In some sprints, we ship nothing new but significantly improve the quality of existing AI.
Kill features that aren’t working. We built an AI-powered “smart notifications” system. It took three sprints. Usage was terrible. We killed it. That’s not failure; that’s learning fast.
The principle is that your plan should be your current best hypothesis, not a commitment. Update the hypothesis as you learn.
Working Software Over Comprehensive Documentation
Agile values working software over comprehensive documentation. But with AI, you need more documentation than traditional software does: an experiment log.
For each AI experiment, document briefly:
Hypothesis: What are you testing?
Success metrics: How will you know if it worked?
Implementation: What did you build? (high level, not code)
Results: What happened? Quantitative and qualitative.
Learnings: What surprised you? What would you do differently?
Decision: Ship, iterate, or kill?
This isn’t bureaucracy. It’s institutional memory. Three months later, someone will ask, “Why did we build it this way?” The experiment log explains the reasoning and learnings that led here.
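The log needs no special tooling; a plain record per experiment is enough. Here is one sketch of an entry, paraphrasing the Tracto insights pivot from earlier (the implementation detail is an illustrative assumption):

```python
experiment_log_entry = {
    "hypothesis": "Parents want AI-generated insights about their child's progress",
    "success_metric": "regular engagement with the insights feature",
    "implementation": "AI summary over progress data, shipped behind a flag",
    "results": "Insights largely ignored; journal note-taking usage unexpectedly high",
    "learnings": "Parents want help reflecting on their own observations, not analysis",
    "decision": "kill; pivot to AI-assisted journaling prompts",
}
```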
Sustainable Pace for AI Teams
Agile also emphasises sustainable pace. AI work makes this harder because the temptation is to constantly chase the next model improvement or experiment result.
The reality: AI development is cognitively demanding. Understanding model behaviour, debugging non-deterministic systems, and analysing experiment results are all draining.
Build in recovery time:
Don’t run back-to-back experiments. Give the team time to analyse results and plan next steps.
Rotate who’s on AI quality on-call. Responding to production eval alerts is taxing.
Schedule explicit learning time. Read papers, explore new techniques, and understand the field.
Celebrate learnings, not just launches. The experiment that failed but taught something valuable deserves recognition.
Start Here
If you’re building AI features without iterative practices, start with one change: frame your next AI feature as an experiment.
Pick something you were going to build anyway. Before starting, write down:
What’s your hypothesis? (What problem will this solve?)
What’s your success metric? (How will you measure if it worked?)
What’s your MVP? (What’s the simplest version that can test the hypothesis?)
What’s your learning timeline? (When will you evaluate results?)
Build the MVP. Ship it to a small group. Measure results. Make a decision: iterate, ship broadly, or kill it.
You’ll learn more in two weeks than six weeks of planning would teach you.
The goal isn’t to ship faster (though you often will). The goal is to learn faster. To let reality teach you what to build instead of making your best guess upfront.
The Intelligence in Failing Fast
“Fail fast” has become a Silicon Valley cliché. But with AI, it’s not about celebrating failure; it’s about designing for rapid learning.
Your AI will behave unexpectedly. You can spend months trying to anticipate every scenario, or you can spend weeks building something that lets you discover what actually matters.
Your users will interact with your AI in surprising ways. You can spend months speculating about edge cases, or you can instrument your system to learn from real usage.
The companies succeeding with AI aren’t the ones that predicted everything upfront. They’re the ones that learn fastest from reality.
Iterative development isn’t just a process choice. It’s a strategy for dealing with uncertainty. And in AI development, uncertainty is the only certainty.
The fundamentals still matter. We’re just applying them to a technology that teaches us through surprises.
Additional Resources:
Agile Manifesto - The original principles that matter more than ever
Extreme Programming Explained by Kent Beck - Foundational practices for iterative development
Lean Startup by Eric Ries - Build-Measure-Learn cycle applied to products
As an Amazon Associate, I earn from qualifying purchases to help support the blogging time.