From Vibe Coding to Agentic Engineering: The Evolution of AI-Assisted Development
One year after Andrej Karpathy coined "vibe coding," he declared the era over. What does "agentic engineering" mean for software teams?
Exactly one year ago, Andrej Karpathy gave a name to something many developers were already doing. "Vibe coding," he called it: a new way of building software where you give in to the vibes, embrace exponentials, and forget that the code even exists. It was liberating, exciting, and a little bit reckless. We were all high on the possibilities.
Last week, Karpathy posted again, acknowledging the anniversary: "Many people have tried to come up with a better name for this to differentiate it from vibe coding, personally my current favourite 'agentic engineering.'" The shift in terminology isn't just semantic. It reflects something fundamental about how AI-assisted development has matured, and where it still struggles.
Here's the tension we need to confront: 41% of worldwide code is now AI-generated, according to recent industry surveys. That number sounds transformational. Yet rigorous research shows that experienced developers sometimes work slower with AI tools than without them. The promise and the reality have been on a collision course for months. Now they've finally met.
This isn't a story about AI failing to live up to expectations. It's about the industry growing up.
The Vibe Coding Era: What We Were Doing
When Karpathy coined "vibe coding" in February 2025, he captured something real. He described an experimental, intuitive approach to building software where the developer's role shifted from writing code to describing intent. You'd tell the AI what you wanted, and it would generate the implementation. Speed and creativity mattered more than rigour. You didn't need to understand every line; you just needed to trust that the output worked.
The philosophy was seductive: Why spend hours writing boilerplate when an AI could do it in seconds? Why memorise syntax when you could describe what you needed in plain English? For many developers, vibe coding felt like cheating, in the best possible way.
And for certain use cases, it genuinely worked. Prototyping became dramatically faster. A developer with an idea could have a working proof of concept within hours rather than days. Personal projects and MVPs flourished. People built things they never would have attempted before because the activation energy dropped so dramatically.
Learning new languages and frameworks became more accessible, too. Instead of grinding through documentation, you could explore by example: ask the AI to show you how things worked, then iterate from there. Quick scripts and automation tasks that used to require careful research could be completed in a single prompt session.
The success stories were real. There's the indie hacker who built a flight simulator MMO in JavaScript, learning as he went, and ended up monetising it through Twitter. There's the entrepreneur who created a successful SaaS product without any prior programming experience. These weren't flukes; they represented a genuine expansion of who could build software and how quickly they could do it.
But vibe coding had boundaries, even if we didn't want to see them.
Production systems with reliability requirements exposed the first cracks. When your code needs to handle edge cases gracefully, process financial transactions correctly, or stay up under load, "trusting the vibes" stops being charming. Security-critical applications revealed an uncomfortable truth: studies showed that roughly 40-50% of AI-generated code contains vulnerabilities. That's not a bug rate you can hand-wave away.
Complex codebases presented another problem. AI tools could generate code, but they struggled to understand the intricate web of dependencies, architectural decisions, and implicit conventions that define a mature codebase. Context mattered, and context was exactly what these tools often lacked.
Perhaps most tellingly, teams discovered that code written by vibes was hard to maintain. When nobody truly understood what the AI had generated (because understanding it wasn't the point), extending and debugging that code became a nightmare. The speed gained upfront was often paid back with interest later.
The turning point came when organisations tried to scale vibe coding from experiments to production. What worked for a solo developer building a side project failed when applied to critical business systems. The limitations weren't theoretical anymore; they were showing up in incident reports and security audits.
The Productivity Reality Check
Let's talk about the numbers, because the numbers tell a story that challenges the hype.
The industry narrative around AI coding tools has been dominated by impressive statistics. Early GitHub Copilot studies reported 55-88% faster task completion. Marketing materials promised transformational productivity gains. The message was clear: adopt AI tools or get left behind.
But rigorous, independent research has painted a more nuanced picture. The METR study, published in late 2025, found that experienced developers working on their own codebases were actually 19% slower with AI tools than without them. Not faster. Slower. Stanford research led by Yegor Denisov-Blanch found that the median productivity lift was 10-15%; meaningful, but nowhere near the 60% figures being tossed around.
Perhaps most striking: developers felt like they were working faster even when measurements showed otherwise. They perceived themselves as 20% more productive even as the clock told a different story. This perception gap matters because it affects how we evaluate and adopt these tools.
Why does such a gap exist between early studies and real-world results? Several factors compound:
First, early studies often used controlled tasks that didn't reflect real-world complexity. Completing a well-specified coding exercise is fundamentally different from navigating a messy codebase with incomplete requirements and legacy constraints.
Second, there's a substantial learning curve. Research suggests it takes 30-100 hours of deliberate practice before developers see consistent productivity gains with AI tools. Most early adopters were still on that curve when the celebratory statistics were being published.
Third, context switching between AI-assisted and manual coding has real overhead. When you're constantly evaluating whether to ask the AI or just write it yourself, you lose time to the decision itself.
Fourth, the verification burden is real. Time saved generating code is often spent reviewing it, testing it, and fixing the subtle bugs the AI introduced. If you're not careful, you just shift the work rather than eliminate it.
Finally, there's what we might call the familiar codebase penalty. Experts who know their systems intimately lose less time doing things the old way than they gain from AI assistance. The tools help most when you're out of your depth, which is valuable, but different from the universal productivity boost the marketing promised.
Here's the key insight from the Stanford research: "The biggest impact isn't in writing code. It's in the stages before and after coding: understanding the codebase, interpreting requirements, debugging, QA, and navigating complex systems." In other words, AI tools are most valuable for the work surrounding code, not necessarily the coding itself.
For teams, this means several things. Don't expect instant productivity gains: budget for the learning curve instead. Measure actual outcomes (features shipped, bugs fixed, systems improved) rather than perceived effort or lines of code. And recognise that AI augments different tasks differently; a blanket "AI makes us X% faster" claim is almost certainly wrong.
Enter Agentic Engineering
So what does Karpathy mean by "agentic engineering"? In his own words: "'agentic' because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do."
This is a meaningful paradigm shift. In the vibe coding era, the model was simple: the developer described the intent, the AI generated code, and the developer accepted or rejected it. The human was always in the driver's seat, using AI as a sophisticated autocomplete.
Agentic engineering inverts this relationship. The developer becomes an orchestrator, directing multiple AI agents that plan, code, test, and debug with increasing autonomy. You're not writing code with AI assistance; you're managing systems that write code while you focus on higher-level concerns.
The technical infrastructure for this has matured rapidly through 2025 and into 2026. Multi-agent systems now feature specialised agents for different tasks: one agent might handle architectural planning, another writes implementation code, a third generates tests, and a fourth handles debugging. These agents don't just run in your IDE; they operate across environments, running in the cloud and accessing browsers, terminals, file systems, and external services.
The Model Context Protocol (MCP) has emerged as a key enabling technology, creating standardised connections between agents and tools. This means agents can interact with databases, APIs, cloud services, and development infrastructure in consistent ways.
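The protocol details are beyond the scope of this post, but the core idea, a standard way for an agent to discover tools and invoke them by name with structured arguments, can be sketched in plain Python. The names below (`ToolServer`, `list_tools`, `call_tool`) are illustrative only, not the actual MCP SDK API:

```python
import json

# Toy registry mimicking MCP's discover-then-invoke pattern.
# All class and method names here are hypothetical, not the real MCP SDK.
class ToolServer:
    def __init__(self):
        self._tools = {}

    def tool(self, name, description):
        """Register a function as a named tool with a description."""
        def register(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self):
        # An agent first discovers what it is allowed to call.
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call_tool(self, name, arguments):
        # Then it invokes a tool by name with JSON-style arguments.
        return self._tools[name]["fn"](**arguments)

server = ToolServer()

@server.tool("query_db", "Run a read-only query against the orders database")
def query_db(sql: str):
    return {"rows": [], "sql": sql}  # stub: a real server would hit a database

print(json.dumps(server.list_tools()))
print(server.call_tool("query_db", {"sql": "SELECT 1"}))
```

The value of the standard is exactly this uniformity: an agent that speaks the protocol can use any conforming tool server without bespoke glue code.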
Consider NVIDIA's VibeTensor project as an example of where this leads. An entire deep learning runtime was built end-to-end by coding agents under high-level human guidance. The humans provided direction and validation; the agents did the implementation work. This isn't vibe coding; it's a fundamentally different way of creating software.
Anthropic's 2026 report on agentic coding trends captures the shift well: "Engineers are shifting from writing code to coordinating agents that write code, focusing their own expertise on architecture, system design, and strategic decisions."
This changes what it means to be an effective developer. The key skills for agentic engineering include:
Defining goals clearly. Agents need well-specified objectives to work effectively. Vague instructions that a human collaborator might interpret charitably will lead an agent astray. The ability to decompose problems and articulate clear success criteria becomes essential.
Reviewing and validating. The human remains accountable for what the agents produce. This means developing a keen eye for AI-generated code, understanding common failure modes, and knowing what to check.
Orchestrating workflows. Managing multi-agent coordination requires thinking about dependencies, sequencing, and information flow. It's more like being a project manager for your AI team than being a solo coder.
Strategic intervention. Knowing when to step in, when to let the agents continue, and when to redirect or take over is a judgment call that requires understanding both the problem domain and the agents' capabilities.
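These four skills show up in miniature in any orchestration loop. The sketch below is hypothetical; `plan`, `implement`, and `review` are stubs standing in for real agent calls. What it illustrates is the shape of the work: explicit success criteria, automated validation of agent output, escalation when criteria aren't met, and a human approval gate:

```python
# Hypothetical orchestration loop. The three functions below stand in for
# real agent/model calls; only the control flow is the point.

def plan(goal):                   # planning agent (stubbed)
    return [f"step: {goal}"]

def implement(step):              # coding agent (stubbed)
    return f"code for {step}"

def review(artifact, criteria):   # validating agent / test run (stubbed)
    return all(c in artifact for c in criteria)

def orchestrate(goal, criteria, approve=lambda a: True, max_attempts=3):
    """Run plan -> implement -> review, escalating to a human on failure."""
    for step in plan(goal):
        for _ in range(max_attempts):
            artifact = implement(step)
            if review(artifact, criteria):
                break
        else:
            # Strategic intervention: criteria never met, redirect or take over.
            raise RuntimeError(f"criteria never met for {step!r}")
        if not approve(artifact):   # human-in-the-loop gate
            raise RuntimeError("human rejected artifact")
        yield artifact

results = list(orchestrate("add retry logic", criteria=["code"]))
```

Note that the human appears twice: as the author of the success criteria up front, and as the approver at the end. That is the orchestrator's leverage.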
If you've been following this series, you'll recognise that these skills connect directly to the testing and evaluation frameworks we've discussed previously. You can't work effectively with agents if you can't evaluate their output. The foundations matter even more now.
The Security Imperative
There's a shadow side to agentic engineering that we need to address directly: security.
The State of AI Agent Security 2026 Report contains some sobering statistics. Eighty-eight percent of organisations reported AI agent security incidents in the past year. Only 14% have full security approval for their agent fleet. Only 47% of agents deployed in enterprise environments are actively monitored.
As Snyk put it: "Shadow agents are the new shadow IT. If you don't know what tools and what MCP servers are being used by the devs, then how are you going to secure them?"
This isn't hypothetical risk. Agents act with delegated authority: your credentials, your permissions, your access. When an agent interacts with a database or an API, it does so as you. If that agent can be manipulated or misused, the blast radius is significant.
Traditional security models struggle with agents for several reasons. Agents behave non-deterministically; the same input won't always produce the same output, making audit trails harder to interpret. The attack surface expands in new directions: prompt injection can trick agents into taking unintended actions, agents can be hijacked mid-task, and resource consumption can spiral beyond expected bounds.
Perhaps most fundamentally, identity management systems weren't designed for autonomous entities that act on behalf of humans but aren't humans themselves. The question "who did this?" becomes genuinely complicated when the answer involves an agent acting with delegated permissions based on high-level instructions that were interpreted in unexpected ways.
What should teams actually do? Here's a practical governance framework:
Build an agent inventory. You cannot secure what you don't know exists. Map out which agents are running in your environment, what they can access, and who authorised them.
Create AI bills of materials (AI-BOMs). Document the components that make up your AI systems: which models, tools, data sources, and permissions.
Apply least privilege aggressively. Agents should get the minimum permissions necessary for their tasks. The convenience of broad access isn't worth the security exposure.
Monitor agents as security principals. Treat agents as first-class entities in your logging, alerting, and audit systems. What did the agent do? When? Why?
Implement human-in-the-loop for critical operations. Some actions should require explicit human approval, regardless of how capable the agent is. Define those checkpoints and enforce them.
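As a minimal sketch (all names hypothetical), the agent inventory can be a plain data structure that the other four practices query: the AI-BOM fields hang off each record, least privilege is a diff between granted and needed permissions, and monitoring and HITL gaps fall out of simple checks:

```python
from dataclasses import dataclass, field

# Hypothetical agent-inventory record: one entry per agent, carrying the
# fields each of the five governance practices needs.
@dataclass
class AgentRecord:
    name: str
    owner: str                                       # who authorised it
    model: str                                       # AI-BOM: which model
    tools: list = field(default_factory=list)        # AI-BOM: tools / MCP servers
    granted: set = field(default_factory=set)        # permissions it has
    needed: set = field(default_factory=set)         # permissions its tasks need
    monitored: bool = False                          # logged as a security principal?
    hitl_actions: set = field(default_factory=set)   # actions needing human approval

def audit(inventory):
    """Flag over-privileged or unmonitored agents."""
    findings = []
    for a in inventory:
        excess = a.granted - a.needed
        if excess:
            findings.append(f"{a.name}: over-privileged {sorted(excess)}")
        if not a.monitored:
            findings.append(f"{a.name}: not monitored")
    return findings

inventory = [
    AgentRecord("deploy-bot", owner="platform", model="gpt-x",
                tools=["terminal"], granted={"db:write", "deploy"},
                needed={"deploy"}, monitored=False,
                hitl_actions={"deploy:prod"}),
]
for finding in audit(inventory):
    print(finding)
```

A real inventory would live in your asset-management system, but even a spreadsheet with these columns answers the question most organisations currently can't: what agents do we have, and what can they touch?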
The organisations that get this right treat agent security as a top-tier priority, not an afterthought. The ones getting it wrong are accumulating technical and security debt that will be expensive to repay.
Practical Transition Guide
So how do you move from vibe coding to agentic engineering? Here's practical guidance organised by role.
For Individual Developers
Invest in the learning curve. Budget 30-100 hours to become genuinely effective with AI tools. That means deliberate practice, not just casual use. Experiment with different tools, learn their strengths and weaknesses, and develop intuition for when AI helps versus when it hinders.
Start with tasks before and after coding. Remember that research shows AI's biggest impact is on understanding codebases, interpreting requirements, debugging, and QA; not necessarily writing code. Use AI to explore unfamiliar systems, generate test cases, and investigate bugs. These are lower-risk applications that build familiarity.
Learn orchestration patterns. Get comfortable with MCP, experiment with agent frameworks, and practice building multi-step workflows. The skill of coordinating agents is distinct from the skill of prompting a single model.
Build verification skills. Code review for AI-generated code is different from reviewing human-written code. Learn common patterns of AI mistakes, develop checklists to cover key areas, and practice catching subtle bugs that pass a superficial inspection.
For Engineering Teams
Set realistic expectations. Don't oversell productivity gains to leadership or team members. The 60% improvement isn't real for most situations. A 10-15% lift is meaningful but won't transform your roadmap overnight. Honest expectations prevent disappointment and backlash.
Establish governance early. Before shadow agents proliferate, put structures in place. Define which tools are approved, how agents should be configured, and what monitoring is required. It's much easier to establish norms early than to rein in chaos later.
Integrate with existing CI/CD/CE. Continuous evaluation is essential when agents write code. Build pipelines that catch AI-introduced regressions, security vulnerabilities, and quality issues. Your evaluation infrastructure is now a first-class engineering investment.
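At its simplest, such a pipeline is a gate that runs a battery of checks over an agent-produced change and blocks the merge if any fail. The check functions below are placeholders; real ones would shell out to your test suite, a security scanner, and your evaluation harness:

```python
# Hypothetical continuous-evaluation gate. Each check returns (passed, detail);
# the stub implementations below stand in for real tests, SAST, and evals.

def run_tests(change):        return (True, "test suite (stubbed)")
def scan_security(change):    return ("eval(" not in change, "SAST scan (stubbed)")
def eval_quality(change):     return (len(change) > 0, "non-empty diff")

CHECKS = [run_tests, scan_security, eval_quality]

def gate(change):
    """Return (merge_ok, report) for an agent-generated change."""
    report, ok = [], True
    for check in CHECKS:
        passed, detail = check(change)
        report.append(f"{check.__name__}: {'PASS' if passed else 'FAIL'} ({detail})")
        ok = ok and passed
    return ok, report

ok, report = gate("def retry(fn): ...")
```

The design point is that every agent-generated change passes through the same gate as human-written code, plus evaluation steps that target known AI failure modes.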
Train for orchestration, not just prompting. The skill bar is rising. Prompting a chatbot is table stakes; orchestrating multi-agent workflows is the new capability gap. Invest in training accordingly.
For Engineering Leaders
Measure outcomes, not activity. AI-generated lines of code is a vanity metric. What matters is features shipped, customer problems solved, and system reliability maintained. Build measurement systems that capture real value.
Invest in security infrastructure. Treat agent security as a priority before you have an incident that forces you to. The cost of prevention is far lower than the cost of remediation.
Rethink hiring and training. The skills your team needs are changing. Orchestration, evaluation, and verification matter more than raw coding speed. Update your hiring criteria and training programs to match.
Plan for the skills gap. Junior developers face a particular challenge: they need to learn fundamentals and develop AI fluency, but the AI tools can mask whether they're actually building foundational skills. Be intentional about how you develop junior talent in an agentic world.
Where This Is Heading
Predictions are humbling, but let me offer a few observations about where things seem to be going.
The optimistic view: IBM's recent predictions suggest "we'll all become AI composers, whether you're a marketer, programmer, or PM." The barrier between "people who code" and "people who don't" is becoming more permeable. More people will be able to create software to solve their own problems, and professional developers will be able to accomplish more with less. Software will continue to eat the world, faster.
The cautious view: There are sustainability concerns we shouldn't ignore. Research suggests that the open-source ecosystem, the foundation on which AI tools are built and trained, may be eroding as contributor engagement declines. AI models learn from human-written code. If fewer humans are writing code and sharing it publicly, what does that mean for the next generation of models? The question is real, and nobody has a confident answer.
My take: Engineering fundamentals matter more in the AI era, not less. Agentic engineering doesn't eliminate the need for architecture, specification, testing, and evaluation; it amplifies their importance. When you're orchestrating agents rather than writing code directly, your leverage comes from clear thinking, well-defined requirements, and robust verification. The sloppy parts of our practice become more costly, not less.
The developers who will thrive aren't those who vibe code the fastest. They're the ones who can orchestrate AI agents while maintaining the engineering discipline to ship reliable software. Speed matters, but speed without control is just chaos arriving faster.
The vibe coding era was fun. It expanded who could build software and showed us what was possible when we lowered barriers. But fun doesn't scale to production. Fun doesn't pass security audits. Fun doesn't maintain systems over years.
Agentic engineering is serious work. And that's okay. The most valuable things usually are.
What You Should Do Now
Audit your current AI tool usage. Do you have shadow agents running in your environment? Do you know what permissions they have and what they're doing? If you can't answer confidently, start there.
Invest in the learning curve. Give it the 30-100 hours that research suggests before you judge whether AI tools work for you. The research is clear that gains appear after significant practice, not immediately.
Build evaluation capabilities. Continuous evaluation isn't optional anymore. If you can't systematically assess what your agents produce, you can't safely rely on them.
Focus on the work AI can't do. Architecture, strategy, validation, understanding user needs: these are where your value lies. Double down on uniquely human contributions while orchestrating AI for everything else.
The transition from vibe coding to agentic engineering isn't optional; it's already happening. The question is whether you'll shape how it happens in your context, or simply react to it.
Choose to shape it.
This post is part of the AI-Era Engineering Practices series. Previous posts have covered Testing the Untestable, CI/CD/CE: The Third Pillar, and Writing User Stories for Uncertain Systems.