I've always said, the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately, which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system and then prompt an LLM to build out various components of the system and create a sustainable codebase. This takes time and attention in a world of vibe coders who are less and less inclined to give their vibe coded products the attention they deserve.
Used Codex for the whole project. At first I used Claude to architect the backend, since that's where I usually work and have experience. The code runner and API endpoints were easy to create for the first prototype. But then it got to the UI, and here's where sh1t got real. The first UI was in React even though I had specifically told it to use Vue. The code editor and output window were a mess in terms of height, there was too much space between them, and no matter how much time I spent prompting and explaining, it never got it right. I got tired, opened Figma, and used it to refine the design into what I wanted. I pushed the code it generated to GitHub, cloned it locally, then told Codex to copy the design, and finally it got it right.
Then came the hosting. I wanted the code runner endpoint to be in a Docker container for security purposes, since someone could execute malicious code that took over the server if I hosted it without some protection. Here it kept selecting out-of-date Docker images, and I had to manually guide it again on what I needed. Finally deployed it and got it working, domain name included. Shared it with a few friends, and they suggested some UI fixes, which took some time.
For the runner security hardening I used DeepSeek and Claude to generate a list of code snippets that I could run to expose potential issues. Despite Codex showing all was fine, I was able to uncover a number of issues. Then here is where it got weird: it started arguing with me even though I had shown it all the issues present. So I compiled all the issues in one document and shared it with Claude along with the Dockerfile and the Linux seccomp config file. It gave me a list of fixes for the Dockerfile to help with security hardening, which I shared back with Codex, and that's when it fixed them.
Currently most of the issues are resolved, but the whole process took me a whole week of working most evenings, and I am still not done. So I agree that you cannot create a usable product used by lots of users in 30 minutes, unless it's some static website. It takes too much constant testing and iteration.
As we move from tailors to big box stores I think we have to get used to getting what we get, rather than feeling we can nitpick every single detail.
I'd also be more interested in how his 3rd, 4th or 5th vibe coded app goes.
The old rules still apply mainly.
To have a polished software project, you must spend time somewhat menially iterating and refining (as each type of user).
To have a polished software project, you need to have started with tests and test coverage from the start for the UI, too.
Writing tests later is not as good.
I have taken a number of projects from a sloppy vibe coded prototype to 100% test coverage. Modern coding LLM agents are good at writing just enough tests for 100% coverage.
But 100% test coverage doesn't mean that it's quality software, that it's fuzzed, or that it's formally verified.
Quality software requires extensive manual testing, iteration, and revision.
I haven't even reviewed this specific project; it's possible that the author developed a quality (CLI?) UI without e2e tests in so much time?
Was the process for this more like "vibe coding" or "pair programming with an LLM"?
There are some good points here to improve harnesses around development and deployment though, like a deployment agent should ask if there is an existing S3 bucket instead of assuming it has to set everything up. Deployment these days is unnecessarily complicated in general, IMO.
My thoughts on vibe coding vs production code:
- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs
- This is partly b/c it is good at things I'm not good at (e.g. front end design)
- But then I need to go in and double check performance, correctness, information flow, security etc
- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)
- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs
- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
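As one illustration of a script that "deterministically checks outputs": a minimal sketch, assuming the thing under test produces rows that can be compared against a saved known-good set. `Row` and `diffRows` are hypothetical names made up for this example, not part of any tool mentioned above.

```typescript
// Hypothetical golden-output check; every name here is illustrative.
type Row = Record<string, string | number>;

// Compare actual rows against expected rows and describe each mismatch.
function diffRows(expected: Row[], actual: Row[]): string[] {
  const problems: string[] = [];
  if (expected.length !== actual.length) {
    problems.push(`row count: expected ${expected.length}, got ${actual.length}`);
  }
  expected.forEach((want, i) => {
    const got = actual[i];
    if (got === undefined) return; // already covered by the row-count message
    // Union of field names from both rows, without duplicates.
    const keys = Object.keys(want)
      .concat(Object.keys(got))
      .filter((key, idx, all) => all.indexOf(key) === idx);
    for (const key of keys) {
      if (want[key] !== got[key]) {
        problems.push(
          `row ${i}, field "${key}": expected ${String(want[key])}, got ${String(got[key])}`,
        );
      }
    }
  });
  return problems;
}
```

An empty result means the run matched. Anything else is a concrete list of mismatches to paste back to the LLM instead of describing the problem by hand.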
So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and an LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
I know it's not the point of this article, but really?
I needed it, I quickly build it myself for myself, and for myself only.
Also this article uses 'pfp' like it's a word, I can't figure out what it means.
I'm able to vibe code simple apps in 30 minutes, polish it in four hours and now I've been enjoying it for 2 months.
those are not copies, they aren't even features. usually part of a tiny feature that barely works only in demo.
with all vibe coding in the world today you still need at least 6 months full time to build a nice note taking app.
If we are talking something more difficult - it will be years - or you will need a team and it will still take a long time.
Everything less will result in an unusable product that works only for demo and has 80% churn.
Examples: AI really wants to use Project Panama (FFM), and while that can be significantly faster than traditional OO approaches, it is almost never the best. And I'm not talking about using deprecated Unsafe calls; I'm talking about primitive arrays being better for Vector/SIMD options on large sets of data, and NIO being better than FFM + mmap for file reading.
You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
The test cases themselves become the focus; the LLM usually can't get them right.
Not knocking the premise of the post. It probably works well for one single user if it’s an iPhone or Android app. But his 100 power hours are probably just right for what he ended up launching as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.
I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.
Bad example, note apps loaded with features are anti-productive and are for people who treat note taking as a hobby itself.
You have Obsidian anyway if you want something open source to work with.
When everyone is able to make their own one off prototype in 30 minutes, no one will pay for the thing that took someone 6 months.
The things that are going away are tools that provide convenience on top of a workflow that's commoditized. Anything where the commercial offering provides convenience rather than capabilities over the open source offerings is gonna get toasted.
I tried it; it works well. I can do the same thing on my Linux machine, but now even my 12-year-old can get Perplexity to build him a tool to compare RAM prices at different Chinese vendors.
Could you do the same in eg. Photoshop? Maybe, but even if, you would need to learn how.
It depends entirely on what you want. You can literally code a JavaScript 1-liner that will make a <textarea> then put the content back in the URL and it will work serverless on pretty much any platform with a Web browser.
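For the curious, the "note lives in the URL" idea can be sketched roughly like this. It is not literally a one-liner, and `noteToUrl` / `urlToNote` are hypothetical helper names. The browser wiring is left as comments so the functions stay runnable outside a page.

```typescript
// Sketch only: noteToUrl and urlToNote are hypothetical helper names.

// Store the note text in the URL fragment so no server is involved.
function noteToUrl(baseUrl: string, text: string): string {
  return `${baseUrl.split("#")[0]}#${encodeURIComponent(text)}`;
}

// Recover the note text from a URL produced by noteToUrl.
function urlToNote(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? "" : decodeURIComponent(url.slice(hashIndex + 1));
}

// In a browser, with `textarea` being the <textarea> element, the wiring
// would be roughly:
//   textarea.value = urlToNote(location.href);
//   textarea.oninput = () => { location.hash = encodeURIComponent(textarea.value); };
```

Bookmarking or sharing the URL is then the whole "save" feature. Anything beyond that (sync, backups, long notes that run into URL length limits) is outside this sketch.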
You can also write a note taking app that will be federated yet private, that will have its own scripting language, etc. I mean you can yak-shave your way to write your own OS or even designing your own CPU for that.
So... I'm not sure that metric, time, means much without proper context, including who does it. Regardless of the tooling used, it's quite different depending on whether you are a professional developer, designer, fullstack dev, prototypist, PM, marketer, writer, etc.
sure. does your note taking app support formatting? you don't need it today. you will need it at some point. images? same.
does it handle file corruption etc? no? then it's pretty much useless.
does it work across devices? in the modern world, again, it is pretty much useless without it
it works across devices? then it needs hosting. if it is hosted it needs auth, it needs backups
you can go on forever.
the bar for a very minimal note taking app that you actually will use is very high, with other software it is even higher.
and this is not even state of the art, these are must-haves
there is a very, very rare use case where diy makes sense. in 99% of cases it's just a toy that feels nice because you kinda did it yourself. but if you factor in the time etc. it always costs 100x more than the $5/month product you could usually buy
(emphasis added)
Not sure if it was actually written by hand or AI was glossed over, but as soon as giving away money was on the table, the author seems to have ditched AI.
Some people seem to be better at it than others. I see a huge gulf in what people can do. Oddly, there is a correlation between having been a good engineer pre-AI and being able to vibe code well.
But I see one odd thing. A subset of those whom people would consider good or even amazing pre-AI struggle. The best I can tell at this stage is that they lacked practice getting good results with unskilled workers in the past and just relied on their own skills to carry the project.
AI coders can do some amazing things. But at this stage you have to be careful about how you guide them down a path, in the same way you did with junior engineers. I am not comparing AI to a junior; the models can by far code better than most senior engineers, and they have access to knowledge at lightning speed.
When we start selling the software, and asking people to pay for/depend upon our product, the rules change, substantially.
Whenever we take a class or see a demo, they always use carefully curated examples, to make whatever they are teaching, seem absurdly simple. That's what you are seeing, when folks demonstrate how "easy" some new tech is.
A couple of days ago, I visited a friend's office. He runs an Internet Tech company, that builds sites, does SEO, does hosting, provides miscellaneous tech services, etc.
He was going absolutely nuts with OpenClaw. He was demonstrating basically rewiring his entire company, with it. He was really excited.
On my way out, I quietly dropped by the desk of his #2; a competent, sober young lady that I respect a lot, and whispered "Make sure you back things up."
When an agent takes a shortcut early on, the next step doesn't know it was a shortcut. It just builds on whatever it was handed. And then the step after that does the same thing. So by hour 80 you're sitting there trying to fix what looks like a UI bug and you realize the actual problem is three layers back. You're not doing the "hard 20%." You're paying interest on shortcuts you didn't even know were taken. (As I type this I'm having flashbacks to helping my kid build lego sets.)
The author figured this out by accident. He stopped prompting and opened Figma to design what he actually wanted. That's the move. He broke the chain before the next stage could build on it. The 100 hours is what it costs when you don't do that.
There's some 80-20:ness to all programming, but with current state of the art coding models, the distribution is the most extreme it's ever been.
Something much closer to production SDLC patterns than a Figma mockup.
There are plenty of ways to code and use code. Whichever works for you is good; just improve on it and make it more effective. I have multiple screens on my computer, and I don't like jumping back and forth opening tabs and browsers, so I have my setup arranged the way that works best for me. As for the AI models, they are not going to be that helpful if you don't understand why a model is doing what it's doing in a particular function, crate (in the case of Rust), or library. I imagine an over-the-top coder with years of experience, knowledge of multiple languages, and in-depth knowledge of libraries could, using the same technique, technically replace a whole department by himself.
Even pretty massive companies like Databricks don't think about those things and basically have a UI template library that they then compose all their interfaces from. Nothing fancy. It's all about features, and LLMs create copious amounts of features.
And then there is one guy, a friend of mine, who is planning to release a "submit a bug report, we will fix it immediately" feature (so: collect an error report from a user, possibly interview them, then assess whether it's a bug or not with a "product owner LLM", then fix it autonomously, and if it passes the tests, merge and push to prod, all in under one hour). That's for a mid-cap company, for their client-facing product. F*** hell! I have a full bag of bug reports ready for when this hits prod :->
EXCEPT... you've just vibe coded the first 90 percent of the product, so completing the remaining 10 percent will take WAY longer than normal because the developers have to work with spaghetti mess.
And right there this guy has shown exactly how little people who are not software developers with experience understand about building software.
Before LLMs the slow part was writing code. Now the slow part is validating whether the generated code is actually correct.
The interesting shift seems to be that building the first version is no longer the bottleneck — distribution, UX polish and reliability are.
Which part of "commodity" is confusing???
Honestly, seeing all the dumb code that it produces, calling this thing "intelligent" is rather generous...
The author accidentally proved it: the moment they stopped prompting and opened Figma to actually design what they wanted, Claude nailed the implementation. The bottleneck was NEVER the code generation, it was the thinking that had to happen BEFORE ever generating that code. It sounds like most of you offload the thinking to AFTER the complexity has arisen when the real pattern is frontloading the architectural thinking BEFORE a single line of code is generated.
Most of the 100-hour gap is architecture and design work that was always going to take time. AI is never going to eliminate that work if you want production grade software. But when harnessed correctly it can make you dramatically faster at the thinking itself, you just have to actually use it as a thinking partner and not just a code monkey.
I shipped a React Native app recently and probably 30% of the total dev time was wrapping every async call in try/catch with timeouts, handling permission denials gracefully, making sure corrupted AsyncStorage doesn't brick the app, and testing edge cases on old devices. None of that is the fun part. None of it shows up in a demo. But it's the difference between "works on my machine" and "works in production."
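To give a flavour of that unglamorous 30%, here is a minimal sketch of two such guards: a timeout around an async call, and a parse that falls back to defaults when stored data is corrupted. `withTimeout` and `parseStored` are hypothetical helpers written for this example. They are not React Native or AsyncStorage APIs, and the raw string is assumed to come from whatever storage the app uses.

```typescript
// Hypothetical helpers; not React Native or AsyncStorage APIs.

// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Parse a stored JSON string, falling back to defaults if it is missing
// or not valid JSON. Note: this does not validate the parsed shape.
function parseStored<T>(raw: string | null, fallback: T): T {
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return fallback;
  }
}
```

The timeout only stops the caller from waiting; it does not cancel the underlying work. A production version would likely also check that the parsed value has the expected fields.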
Vibecoding gets you to the demo. The gap is everything after that.
So many people are just shouting ‘I wanna go fast’ and completely forgetting the lessons learned over the past few decades. Something is going to crash and burn, eventually.
I say this as a daily LLM user, albeit a user with a very skeptical view of anything the LLM puts in front of me.
The result worked but that's just a hacked together prototype. I showed it to a few people back then and they said I should turn it into a real app.
To turn it into a full multi-user scalable product... I'm still at it a year later. Turns out it's really hard!
I look at the comments about weekend apps, and I have some of those too. But creating a real, valuable, bug-free MVP takes work no matter what you do.
Sure, I can build apps way faster now. I spent months learning how to use AI. I did a refactor back in May that was a disaster. The models back then were markedly worse, and it rewrote my app, effectively destroying it. I sat at my desk for 12 hours a day for 2 weeks trying to unpick that mess.
Since December things have definitely gotten better. I can run an agent up to 8 hours unattended, testing every little thing and produce working code quite often.
But there is still a long way to go to produce quality.
Most of the reason it's taking this long is that the agent can't solve the design and infra problems on its own. I end up going down one path, realising there is another way and backtracking. If I accepted everything the AI wanted, then finishing would be impossible.
The only thing he needed to code was an NFT wrapper, which presumably is just forking an existing NFT wholesale.
The interesting, user-facing part of the project isn't code at all! It's just an HTML front end on someone else's image generator and a "pay me" button.
Very disappointing.
I have to say it's a little sad that so many devs think of security and cryptography in the same way as library frameworks, in that they see it as just some black box API to use for their projects rather than respecting that it's a fully developed, complex field that demands expertise to avoid mistakes.
I would say the remaining 10% is about how robust your solution is - anything associated with 'vibe' feels inherently insecure. If you can objectively prove it is not, that's 10% of the time well spent.
The difference I've noticed is that the act of actually typing out code made me backtrack a few times refining the possible solutions before even starting the integration tests, sometimes before even doing a compile.
When generating, the LLM never backtracked, even in the face of broken tests. It would proceed to continue band-aiding until everything passed. It would add special exceptions to general code instead of determining that the general rule should be refined or changed.
The reason that some devs are reporting 10x productivity is that a bunch of duct-taped, band-aided, instant-legacy code is acceptable to them. Others who don't see that level of productivity increase are spending time fixing the code to be something they can read.
Not sure yet if accepting the spaghetti is the right course. If future LLMs can understand this spaghetti then there's no point in good code. If we still need human coders, then the productivity increase is very small.
Additionally, the author seems to build an app just for the sake of building an app / learning, not to solve any real serious business problem. Another "big" claim on LLM capabilities based on a solo toy project.
This is the exact kind of task that LLMs excel at
I'm doing a simple single-line text editor and designing some frame options, which have start and end markers.
This was really hard to get the LLM to do right, until I just took a pen and paper, drew what I wanted, took a photo, and gave it to the LLM.
I suppose that works if you are doing something that truly can be decided based on a test, but I just don't see it, at least for anything I do.
You have to figure out what you want before the AI codes. The thinking BEFORE is the entire game.
Though I will also say that I use Claude for working out designs a lot. Literally hours sometimes with long periods of me thinking it through.
And I still get a ton more done and often use tech that I would never have approached before these glory days.
The 100 hour gap between a vibecoded prototype and a working product
Hasn't happened in a long time. Opus 4.6 is a miracle improvement.
Edit: It's interesting how I am getting downvoted here when pangram confirms my suspicions that this is 100% AI generated.
Only "feels"?
Curious. Can you elaborate on this a bit?
That is pretty bad..
- version 2 -- we realise we're solving a completely different problem to what is needed
- version 3 -- we build what is actually needed
The final solution ended up being something like: 1. Page includes new React report widget. 2. Widget imports generic overlay component and all canned reports, and lets user pick a report. 3. User picks report, widget sets that specific report component as a child of the overlay component, launches overlay. 4. Report component makes call to database with filters and business logic, passes generic set of inputs (report title, other specifics, report data) to a shared report display template.
My original plan was for the report display template to also be unique to each report file. But when the dust settled, they were so similar that it made sense to use a shared component. If a future report diverges significantly, we can just skip the shared component and create a one-off in the file.
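Roughly, the shape described above could be sketched like this. It is a simplified sketch without React, with plain strings standing in for rendered components. `ReportInputs`, `Report`, `sharedTemplate`, `renderReport`, and `salesByRegion` are all hypothetical names invented for illustration, not the actual code.

```typescript
// Sketch of the shape described above, without React; all names are hypothetical.

// Generic inputs every report hands to the shared display template.
interface ReportInputs {
  title: string;
  columns: string[];
  rows: Array<Array<string | number>>;
}

// One canned report: its own query/business logic, optionally its own display.
interface Report {
  id: string;
  load(filters: Record<string, string>): ReportInputs;
  display?(inputs: ReportInputs): string;
}

// Shared display template, used unless a report supplies its own.
function sharedTemplate(inputs: ReportInputs): string {
  const lines = [inputs.title, inputs.columns.join(" | ")];
  for (const row of inputs.rows) lines.push(row.join(" | "));
  return lines.join("\n");
}

// What the widget does once the user picks a report.
function renderReport(report: Report, filters: Record<string, string>): string {
  const inputs = report.load(filters);
  return (report.display ?? sharedTemplate)(inputs);
}

const salesByRegion: Report = {
  id: "sales-by-region",
  // Stand-in for the database call with filters and business logic.
  load: (filters) => ({
    title: `Sales by region (${filters.year ?? "all years"})`,
    columns: ["Region", "Total"],
    rows: [["North", 120], ["South", 95]],
  }),
};
```

The optional `display` is the escape hatch: a report that diverges significantly just supplies its own and skips the shared template.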
I could have designed all this ahead of time, as I would need to do with an LLM. But it was 10x easier to just start coding it while keeping my ultimate scalability goals in mind.
If I'm reviewing all the code, so far I'm still the bottleneck even with a single agent and I don't see an easy way to change that.