Back in 2020, GPT-3 could write functional HTML from a text description, but it's only around now that AI can one-shot functional websites. Likewise, AI can one-shot a functional demo of a SaaS product, but it is far from being able to one-shot the entire engineering effort of a company like Slack.
However, I don't see why the rate of improvement won't continue as it has. The current generation of LLMs hasn't even been trained yet on Nvidia's latest Blackwell chips.
I do agree that vibe-coding is like gambling, but that is beside the point: AI coding models are getting smarter at a rate that is not slowing down. Many people believe they will hit a sigmoid somewhere before reaching human intelligence, but there is no reason to believe that beyond wishful thinking.
My project has a C++ matching engine, Node.js orchestration, Python for ML inference, and a JS frontend. No LLM suggested that architecture - it came from hitting real bottlenecks. The LLMs helped write a lot of the implementation once I knew what shape it needed to be.
Where I've found AI most dangerous is the "dark flow" the article describes. I caught myself approving a generated function that looked correct but had a subtle fallback to rate-matching instead of explicit code mapping. Two different tax codes both had an effective rate of 0, so the rate-match picked the wrong one every time. That kind of domain bug won't get caught by an LLM because it doesn't understand your data model.
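A minimal sketch of that failure mode, with hypothetical names (the real mapping logic was more involved): when two tax codes share an effective rate of 0, a rate-based fallback cannot tell them apart and silently picks whichever comes first.

    # Hypothetical illustration of the fallback bug described above.
    # EXEMPT and ZERO_RATED are distinct tax codes that both happen to
    # have an effective rate of 0, so matching by rate is ambiguous.
    TAX_CODES = {"EXEMPT": 0.0, "ZERO_RATED": 0.0, "STANDARD": 0.20}
    EXPLICIT_MAP = {"EX": "EXEMPT", "ZR": "ZERO_RATED", "STD": "STANDARD"}

    def map_tax_code(source_code: str, source_rate: float) -> str:
        if source_code in EXPLICIT_MAP:
            return EXPLICIT_MAP[source_code]
        # The generated fallback: pick the first internal code whose rate
        # matches. Looks reasonable, but with two zero-rate codes it always
        # returns EXEMPT, even for records that are actually zero-rated.
        for code, rate in TAX_CODES.items():
            if rate == source_rate:
                return code
        raise ValueError(f"unmapped tax code {source_code!r}")

    print(map_tax_code("ZR0", 0.0))  # -> "EXEMPT" (wrong)

Nothing here is syntactically wrong, which is exactly why it sails through review; only domain knowledge tells you the fallback is unsafe.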
Architecture decisions and domain knowledge are still entirely on you. The typing is faster though.
I don't think these are exclusive. Almost a year ago, I wrote a blog post about this [0]. I spent the time since then both learning better software design and learning to vibe code. I've worked through Domain-Driven Design Distilled, Domain-Driven Design, Implementing Domain-Driven Design, Design Patterns, The Art of Agile Software Development, 2nd Edition, Clean Architecture, Smalltalk Best Practice Patterns, and Tidy First?. I'm a far better software engineer than I was in 2024. I've also vibe coded [1] a whole lot of software [2], some good and some bad [3].
You can choose to grow in both areas.
[0]: https://kerrick.blog/articles/2025/kerricks-wager/
[1]: As defined in Vibe Coding: Building Production-Grade Software With GenAI, Chat, Agents, and Beyond by Gene Kim and Steve Yegge, wherein you still take responsibility for the code you deliver.
https://fortune.com/2026/01/29/100-percent-of-code-at-anthro...
Of course you can choose to believe that this is a lie and that Anthropic is hyping its own models, but it's impossible to deny the enormous revenue the company is generating from products it now builds almost entirely with coding agents.
I would have thought sanity-checking the output would be the most elementary next step.
Fortunately, I've retired, so I'm going to focus on flooding the zone with my crazy ideas made manifest in books.
Note: the study used sonnet-3.5 and sonnet-3.7; there weren’t any agents, deep research or similar tools available. I’d like to see this study done again with:
1. juniors and mid-level engineers
2. opus-4.6 high and codex-5.2 xhigh
3. Tasks that require upfront research
4. Tasks that require stakeholder communication, which can be facilitated by AI
The differences are subtle but those of us who are fully bought in (like myself) are working and thinking in a new way to develop effectively with LLMs. Is it perfect? Of course not - but is it dramatically more efficient than the previous era? 1000%. Some of the things I’ve done in the past month I really didn’t think were possible. I was skeptical but I think a new era is upon us and everyone should be hustling to adapt.
My favorite analogy at the moment is that for a while now we've been bowling and been responsible for knocking down the pins ourselves. In this new world we are no longer the bowlers; rather, we are the builders of the bumper rails that keep the new bowlers from landing in the gutter.
Right now I see the former as hugely risky: hallucinated bugs, being coaxed into dead-end architectures, security concerns, not being familiar with the code when a bug shows up in production, less sense of ownership, less hands-on learning, etc. This is true both at the personal level and at the business level. (And it's astounding that CEOs haven't made that connection yet.)
The latter, you may be less productive than optimal, but might the hands-on training and fundamental understanding of the codebase make up for it in the long run?
Additionally, I personally find my best ideas often happen when knee deep in some codebase, hitting some weird edge case that doesn't fit, that would probably never come up if I was just reviewing an already-completed PR.
idk what ya'll are doing with AI, and i dont really care. i can finally - fiiinally - stay focused on the problem im trying to solve for more than 5 minutes.
If you keep some for yourself, there’s a possibility that you might not churn out as much code as quickly as someone delegating all programming to AI. But maybe shipping 45,000 lines a day instead of 50,000 isn’t that bad.
But yes, I usually constrain my plans to one function, or one feature. Too much and it goes haywire.
I think a side benefit is that I think more about the problem itself, rather than the mechanisms of coding.
It’s not. It’s either 33% slower than perceived or perception overestimates speed by 50%. I don’t know how to trust the author if stuff like this is wrong.
Which frankly describes pretty much all real world commercial software projects I've been on, too.
Software engineering hasn't happened yet. Agents produce big balls of mud because we do, too.
Have you tried explicitly asking them about the latter? If you just tell them to code, they aren't going to work on figuring out the software engineering part: it's not part of the goal that was directly reinforced by the prompt. They aren't really all that smart.
And it seemed pretty clear to me that they would have to do with the sort of evergreen software engineering and architecture concepts that you still need a human to design and think through carefully today, because LLMs don't have the judgment or the high-level view for that. They would not be about the specific API surface area or syntax of particular frameworks, libraries, or languages, which LLMs, IDE completion, and online documentation mostly handle.
Especially since well-designed software systems, with deep and narrow module interfaces, maintainable and scalable architectures, well-chosen underlying technologies, clear data flow, and so on, are all things that can vastly increase the effectiveness of an AI coding agent, because they mean it needs less context to understand things, can reason more locally, etc.
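As a rough illustration (hypothetical module, not from any particular codebase), a deep, narrow interface means an agent editing a call site only needs one small signature in its context window:

    from dataclasses import dataclass

    @dataclass
    class ChargeResult:
        ok: bool
        receipt_id: str | None = None
        error: str | None = None

    def charge(customer_id: str, amount_cents: int) -> ChargeResult:
        """Charge a customer. Retries, idempotency keys, and audit logging
        are handled internally; callers (human or LLM) never see them."""
        ...

The narrower and better-documented the seam, the less of the module's internals the agent has to load to reason correctly about a change on either side of it.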
To be clear, this is not about not understanding the paradigms, capabilities, or affordances of the tech stack you choose, either! The next books I plan to get are things like Modern Operating Systems, Data-Oriented Design, Communicating Sequential Processes, and The Go Programming Language, because low-level concepts, too, are things you can direct an LLM to optimize, if you give it the algorithm, but which it won't do very well on its own, and they are generally also evergreen and not subsumed in the "platform minutiae" described above.
Likewise, stretching your brain with new paradigms — actor-oriented, Smalltalk OOP, Haskell FP, Clojure FP, Lisp, etc. — gives you new ways to conceptualize and express your algorithms and architectures, and to judge and refine the code your LLM produces. Ideas like BDD, PBT, and lightweight formal methods (like model checking) all provide direct tools for modeling your domain, specifying behavior, and testing it far better, which lets you use agentic coding tools with more safety and confidence (and a better feedback loop for them) — at the limit, almost creating a way to program declaratively in executable specifications, convert those to code via the LLM, and then test the latter against the former!
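A minimal sketch of that last idea, assuming the Hypothesis library and a hypothetical dedupe_preserving_order function standing in for LLM-generated code: the properties act as the executable specification the generated implementation is tested against.

    from hypothesis import given, strategies as st

    def dedupe_preserving_order(xs: list[int]) -> list[int]:
        # Pretend this body came from the coding agent.
        seen, out = set(), []
        for x in xs:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out

    @given(st.lists(st.integers()))
    def test_spec(xs):
        result = dedupe_preserving_order(xs)
        assert len(result) == len(set(result))  # no duplicates
        assert set(result) == set(xs)           # nothing lost, nothing invented
        positions = [xs.index(x) for x in result]
        assert positions == sorted(positions)   # first-occurrence order preserved

Run it under pytest; if the agent's implementation violates the spec, Hypothesis shrinks the failure to a minimal counterexample, which is exactly the tighter feedback loop described above.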
You'll probably be forming some counter-arguments in your head.
Skip them, throw the DDD books in the bin, and do your co-workers a favour.
If you had the Midas touch, would you rent it out?
The people at the start of the curve are the ones who swear off LLMs for engineering, and they are the loudest in the comments.
The people at the end of the curve are the ones who spam about only vibing, never looking at code, and who are trying to establish the expectation that the interaction layer for software should be exclusively the LLM. These ones are the loudest in posts/blogs.
The ones in the middle are the people who accept using LLMs as a tool, and like with all tools they exercise restraint and caution. Waiting 5 to 10 seconds each time for an LLM to change the color of your font, and then watching it get it wrong, is slower than just going in and making these tiny adjustments yourself.
It's the engineers at both ends that have made me lose my will to live.
It's both. It's using the AI too much to code, and too little to write detailed plans of what you're going to code. The planning stage is by far the easiest to fix if the AI goes off track (it's just writing some notes in plain English) so there is a slot-machine-like intermittent reinforcement to it ("will it get everything right with one shot?") but it's quite benign by comparison with trying to audit and fix slop code.
Like I don’t remember syntax or linting or typos being a problem since I was in high school doing Turbo Pascal or Visual Basic.
This framing is exactly how lots of people in the industry are thinking about AI right now, but I think it's wrong.
The way to adopt new science, new technology, new anything really, has always been that you validate it for small use cases, then expand usage from there. Test on mice, test in clinical trials, then go to market. There's no need to speculate about "too much" or "too little" usage. The right amount of usage is knowable - it's the amount which you've validated will actually work for your use case, in your industry, for your product and business.
The fact that AI discourse has devolved into a Pascal's Wager is saddening to see. And when people frame it this way in earnest, 100% of the time they're trying to sell me something.
When people talk about this stuff they usually mean very different techniques. And last month's way of doing it goes away in favor of a new technique.
I think the best you can do now is try lots of different new ways of working and keep an open mind.
No, it's different from other skills in several ways.
For one, the difficulty of this skill is largely overstated. All it requires is basic natural-language reading and writing, the ability to organize work and issue clear instructions, and some relatively simple technical knowledge about managing context effectively, knowing which tool to use for which task, and other minor details. This pales in comparison with the difficulty of learning a programming language and classical programming. After all, the entire point of these tools is to lower the skill required for tasks that were previously inaccessible to many people. The fact that millions of people are now using them, with varying degrees of success for various reasons, is a testament to this.
I would argue that the results depend far more on the user's familiarity with the domain than their skill level. Domain experts know how to ask the right questions, provide useful guidance, and can tell when the output is of poor quality or inaccurate. No amount of technical expertise will help you make these judgments if you're not familiar with the domain to begin with, which can only lead to poor results.
> might be useful now or in the future
How will this skill be useful in the future? Isn't the goal of the companies producing these tools to make them accessible to as many people as possible? If the technology continues to improve, won't it become easier to use, and be able to produce better output with less guidance?
It's amusing to me that people think this technology is another layer of abstraction, and that they can focus on "important" things while the machine works on the tedious details. Don't you see that this is simply a transition period, and that whatever work you're doing now could eventually be done better/faster/cheaper by the same technology? The goal is to replace all cognitive work. Just because this is not entirely possible today doesn't mean that it won't be tomorrow.
I'm of the opinion that this goal is unachievable with the current tech generation, and that the bubble will burst soon unless another breakthrough is reached. In the meantime, your own skills will continue to atrophy the more you rely on this tech, instead of on your own intellect.
It feels like the kind of thing a human would notice, but which the agents are considering out of their scope.
I have the exact same experience... if you don't use it, you'll lose it
She's not wrong.
A good way to do this calculation is with the log-ratio, a centered measure of proportional difference. It's symmetric, and widely used in economics and statistics for exactly this reason. I.e:
ln(1.2/0.81) = ln(1.2)-ln(0.81) ≈ 0.393
That's nearly 40%, as the post says.
Another wager along the same lines that I remember: “What if climate change is a hoax, and we invested in all this clean energy infrastructure for nothing?”
I’ve also found AI-assisted stuff remarkable for implementing algorithmically complex things.
However one thing I definitely identify with is the trouble sleeping. I am finally able to do a plethora of things I couldn’t do before due to the limits of one man typing. But I don’t build tools I don’t need, I have too little time and too many needs.
But it should be a philosophy, not a directive. There are always tradeoffs to be made, and DDD may be the one to be sacrificed in order to get things done.
https://www.amazon.com/Learning-Domain-Driven-Design-Alignin...
It presents the main concepts like a good lecture and a more modern take than the blue book. Then you can read the blue book.
But DDD should be taken as a philosophy rather than a pattern. Trying to follow it religiously tends to result in good software, but it’s very hard to nail the domain well. If refactoring is no longer an option, you will be stuck with a suboptimal system. It’s more something you want to converge to in the long term than something to get right early. Always start with a simpler design.
https://sequoiacap.com/podcast/training-data-openai-imo/
The thing, however, is that the labs are all in competition with each other. Even if OpenAI had some special model that gave it the ability to build its own SaaS and products, it is worth more to them to sell access to the API and use the profit to scale, because otherwise their competitors will pocket that money and scale faster.
This holds as long as the money from API access to the models is worth more than the comparative advantage a lab retains from not sharing it. Because there are multiple competing labs, the comparative advantage is small (if OpenAI kept GPT-5.X to themselves, people would just use Claude and Anthropic would become bigger, same with Google).
This may not hold forever, however; it is just a phenomenon of labs focusing more heavily on their models, with marginal product efforts.
My theory is that executives must be so focused on the future that they develop a (hopefully) rational FOMO. After all, missing some industry-shaking phenomenon could mean death. If that FOMO is justified, then they've saved the company. If it's not, then maybe the budget suffers but the company survives. Unless of course they bet too hard on a fad, in which case the company may go down in flames or be eclipsed by competitors.
Ideally there is a healthy tension between future looking bets and on-the-ground performance of new tools, techniques, etc.
Note, if staying on the bleeding edge is what excites you, by all means do. I'm just saying for people who don't feel that urge, there's probably no harm just waiting for stuff to standardize and slow down. Either approach is fine so long as you're pragmatic about it.
At the end of the day, it doesn’t matter if a cat is black or white so long as it catches mice.
——
I’ve also found that picking something and learning about it helps me with mental models for picking up other paradigms later, similar to how learning Java doesn’t actually prevent you from, say, picking up Python or JavaScript.
There's a good reason that most of the successful examples for tools like openspec are to-do apps and the like. As soon as the project grows to a 'relevant' size and complexity, maintaining specs is just as hard as whatever any other methodology offers. Also, from my brief attempts: much like with human-based coding, we actually do quite well with incomplete specs. So do agents, but they'll shrug at all the implicit things much more than humans do. So you'll see more flip-flopping on things you did not specify, and if you nail everything down hard, the specs get unwieldy - large and overly detailed.
But also, you don't have to upgrade every iteration. I think it's absolutely worthwhile to step off the hamster wheel every now and then, just work with your head down for a while, and come back after a few weeks. You notice that even though the world didn't stop spinning, you didn't get the whiplash of every rotation.
AI is really good for rubber-ducking through a problem.
The LLM has heard of everything… but learned nothing. It also doesn't really care about your problem.
So, you can definitely learn from it. But the moment it creates something you don't understand, you've lost control.
You had one job.
It’s more obvious if you take more extreme numbers, say: they estimated to take 99% less time with AI, but it took 99% more time - the difference is not 198%, but 19900%. Suddenly you’re off by two orders of magnitude.
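Spelled out, treating the 99% figures as time multipliers (a rough sketch):

    baseline = 1.0
    estimated = baseline * (1 - 0.99)  # 0.01: "99% less time"
    actual = baseline * (1 + 0.99)     # 1.99: "99% more time"
    print(actual / estimated)          # ~199x, i.e. 19900% of the estimate

Adding the percentages gives 198 points, but the ratio is what tells you how far off the estimate really was.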
Maybe they need to start handing out copies of The Mythical Man-Month again, because people seem to be oblivious to insights we already had a few decades ago.
Vibe coding is the creation of large quantities of highly complex AI-generated code, often with the intention that the code will not be read by humans. It has cast quite a spell on the tech industry. Executives push lay-offs claiming AI can handle the work. Managers pressure employees to meet quotas of how much of their code must be AI-generated or risk poor performance reviews. Software developers worry that everyone around them is a “10x developer” and that they’ve fallen behind. College students wonder if it is worth studying computer science now that AI has automated coding. People of all career stages hesitate to invest in their own career development. Won’t AI be able to do their jobs for them anyway a year from now? What is the point?
I work at an AI company, and we use AI every day. AI is useful! However, we approach vibe coding with caution and have seen that much can go wrong.
The results of vibe coding have been far from what early enthusiasts promised. Well-known software developer Armin Ronacher powerfully described some of the issues with AI coding agents. “When [I first got] hooked on Claude, I did not sleep. I spent two months excessively prompting the thing and wasting tokens. I ended up building and building and creating a ton of tools I did not end up using much… Quite a few of the tools I built I felt really great about, just to realize that I did not actually use them or they did not end up working as I thought they would.”
Armin titled his post “agent psychosis”. The term “psychosis” is a strong label. What is it about this technology that could be trapping such productive and experienced developers? The reason may be similar to the addictive qualities of gambling: a sinister undercurrent of the normally positive state of flow.
When coding or doing other creative work, many of us experience a state of flow: full absorption and energized focus. This concept was first formalized by psychologist Mihaly Csikszentmihalyi in the 1970s. In his 1990 best-selling book, he described flow as “a sense that one’s skills are adequate to cope with the challenges at hand, in a goal-directed, rule-bound action system that provides clear clues as to how well one is performing.”
There are activities that can produce feelings of absorption and engaged focus that don’t meet this positive definition of flow. Consider gambling. A key aspect of flow is that the challenge faced be reasonably matched to the person’s skills. “Roulette players develop elaborate systems to predict the turn of the wheel,” Csikszentmihalyi writes of how gamblers often believe their skills are playing a significant role, even in games governed entirely by chance.

Csikszentmihalyi emphasized the importance of skill and challenge being appropriately matched. He later highlighted that optimal flow occurs with high skill and high challenge. Figure adapted from: https://pmc.ncbi.nlm.nih.gov/articles/PMC8943660/
Another key aspect of this kind of flow is that the activity should provide “clear clues as to how well one is performing.” The makers of modern slot machines have gone to great lengths to do the opposite, creating the outcome of a Loss Disguised as a Win (LDW).
On a traditional slot machine, you either win or lose. In contrast, multiline slot machines have 20 rows going at once and reward partial “credits” that create a false sense of winning even as you lose. For example, you can gamble 20 cents and receive a 15 cent “credit”. This is actually a 5 cent loss, yet the slot machine plays celebratory noises that trigger a positive dopamine reaction. Research shows these games induce a similar physiological reaction to an actual win and players are more likely to enter a highly absorbed, flow-like state.

This slot machine allows 4 lines to be played at once; some allow up to 20 lines. Source: Wikimedia Commons
Researchers on gambling addiction have coined the term “dark flow” to describe this insidious variation on true flow. In a 2014 interview, Csikszentmihalyi defined the idea of “junk flow”: “Junk flow is when you are actually becoming addicted to a superficial experience that may be flow at the beginning, but after a while becomes something that you become addicted to instead of something that makes you grow. The problem is that it’s much easier to find pleasure or enjoyment in things that are not growth-producing but are attractive and seductive.”
The concepts of “junk flow” or “dark flow” align with many people’s experience of vibe coding. The results can be disastrous.
Look back at Armin’s experience again: “Quite a few of the tools I built I felt really great about, just to realize that I did not actually use them or they did not end up working as I thought they would.” This sounds like the Loss Disguised as a Win concept from gambling addiction. Consider the hundreds of lines of code, all the apps being created: some of these are genuinely useful, but much of this code is too complex to maintain or modify in the future, and it often contains hidden bugs.
One thing many of us love about computer programming is the experience of flow. On the surface, vibe coding can seem to induce a similar flow. However, it often violates the same characteristics of flow that gambling does:
With vibe coding, people often report not realizing until hours, weeks, or even months later whether the code produced is any good. They find new bugs or they can’t make simple modifications; the program crashes in unexpected ways. Moreover, the signs of how hard the AI coding agent is working and the quantities of code produced often seem like short-term indicators of productivity. These can trigger the same feelings as the celebratory noises from the multiline slot machine.
Vibe coding provides a misleading feeling of agency. The coder specifies what they want to build and is often presented with choices from the LLM on how to proceed. However, those options are quite different than the architectural choices that a programmer would make on their own, directing them down paths they wouldn’t otherwise take.
Both slot machines and LLMs are explicitly engineered to maximize your psychological reaction. The makers of slot machines want to maximize how long you play and how much you gamble. LLMs are fine-tuned to give answers that humans like, encouraging sycophancy and keeping users coming back. As I wrote in a previous blog post and academic paper, AI can be too good at optimizing metrics, often leading to harmful outcomes in the process.
With “junk” (or “dark”) flow we lose our ability to accurately assess our productivity levels and the quality of our work. A study from METR found that when developers used AI tools, they estimated that they were working 20% faster, yet in reality they worked 19% slower. That is nearly a 40% difference between perceived and actual times!

Developers thought that AI was helping them speed up, but it was actually slowing them down. Source: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
It is difficult to evaluate claims from those who enthuse about their productivity with vibe coding. While prior expertise in software engineering and knowledge of how to provide effective context are useful, their impact on vibe coding results is non-linear and opaque.
I found myself unable to read the latest 2 posts of a blog by a leading AI researcher that I have subscribed to (and previously enjoyed) for 10 years. I happened to skip ahead to a subsection of one of the posts, where the author revealed that he had used AI to generate these latest 2 posts. He wrote that he was producing writing of the same quality, only much faster than before. The writer is an intelligent and highly accomplished person whom I respect, yet he seemed unaware that these posts read quite differently than his earlier work. For me at least, they were less readable than his previous articles.
Social media is full of accounts saying how much more they are accomplishing with AI. People may genuinely believe what they are saying, yet individuals are terrible judges of their own productivity.
It is worth experimenting with AI coding agents to see what they can do, but don’t abandon the development of your current skillset. Part of the appeal of vibe coding rests on extrapolations of how effective it will be 6 or 12 months from now. These predictions are pure guesswork, often based more on hope than reality.
Renowned AI researcher Geoffrey Hinton predicted that AI would replace radiologists by 2021. Google CEO Sundar Pichai and head of AI Jeff Dean predicted that all data scientists would be using neural net architecture search to generate customized architectures for their individual problems by 2023. Anthropic CEO Dario Amodei predicted that by late 2025, AI would be writing 90% of all code. There is an entire Wikipedia page documenting Elon Musk's failed predictions of when we would have autonomous vehicles.

Bring a skeptical eye to tech CEO predictions
We all make mistakes and I am not trying to pick on the people listed above. However, it is important to ask if you want to stop investing in your own skills because of a speculative prediction made by an AI researcher or tech CEO. Consider the case where you don’t grow your software engineering or problem-solving skills, yet the forecasts of AI coding agents being able to handle ever expanding complexity don’t come to pass. Where does this leave you?
While AI tools are genuinely impressive and continue to improve, the forecasts from the major foundation labs have consistently overstated the pace at which the tools will develop. This is nothing new. Tech companies have been overhyping their products for decades.
AI coding agents can produce syntactically correct code. However, they don’t produce useful layers of abstraction or meaningful modularization. They don’t value conciseness or improving the organization of a large code base. We have automated coding, but not software engineering.
Similarly, AI can produce grammatically correct, plausible sounding text. However, it does not directly sharpen your ideas. It does not generate the most precise formulations or identify the heart of the matter.
“People who go all in on AI agents now are guaranteeing their obsolescence. If you outsource all your thinking to computers, you stop upskilling, learning, and becoming more competent,” Jeremy Howard shared in his Nvidia Developer interview. AI is a useful tool, but it doesn’t replace core human abilities.
Thank you to Jeremy for feedback on earlier drafts of this essay.
The real profits go to the companies selling them chips, fiber, and power.
Put another way, the ability to use AI became an important factor in overall software engineering ability this year, and as the year goes on, the gap between the best and worst users of AI will widen faster because the models will outpace the harnesses.
That's most code when you're still working on it, no?
> Also, multiple agents can run at once, which is a workflow for many developers. The work essentially doesn't come to a pausing point.
Yeah the agent swarm approach sounds unsurvivably stressful to me lol
If you just mean, "hey you should learn to use the latest version of Claude Code", sure.