I see so many comments from people who either don't seem to follow standard, well-known processes or assume that AI means you no longer need to follow them.
Can I ship more code and features? Absolutely, if I have a good set of requirements and thorough testing. All AI-written code needs to be reviewed and tested, and should land in discrete commits and pull requests. Anyone pushing a PR with thousands of lines of code is a red flag: you wouldn't do it without AI, so why would you do it with AI? Major rewrites / refactors are the only exception I know of, and even then I would argue that these should still have discrete commits you can switch between, so you can see how things changed and make a more informed decision.
If you show me a massive one shot commit or PR I will deny it. Break it down into bits a normal developer can audit.
BUT: The article is 100% right that I spend a lot of time doing other tasks: reviewing other teammates' work, interacting with colleagues, planning, etc. AI isn't quite as helpful there. For example, I find that Copilot code reviews don't add a lot of value, and the AI isn't good at judging a UI.
Maybe we'll get there soon? It's starting to look like the biggest challenge with AI is learning how to use it correctly.
Programming is a logical circuit breaker. There is a wide range of incompleteness that halts development or puts the solutions in an unpublishable state.
A product person has no compiler, no RAM, no database, no state machine. There is nothing that can fail. There are probably strategies to weed out some issues, but none will be perfect.
We need to combine reality with computers. Computers set the constraints and we can only check if we are in bounds of the constraints by solving the problems with computers.
Oddly enough AI has so far nothing to offer to improve the "product people" problems.
^ I say shouldn't because I work in research engineering. Most of the needs of our users are pretty unique. We've had people come in and try and specify every piece of work, and ended up building a CRUD app no one wanted or used.
Another aspect that is not captured here is that the lawyers and subject matter experts will also be using AI to speed up their parts.
> We are now talking about software development, but this is applicable to all processes that take longer than you would like.
Indeed, it's kind of a generalized version of Amdahl's law. Since we only speed up a portion of the work, there are upper bounds on time saved. Worse, work in progress tends to bunch up at a specific point: code review. A coworker of mine literally complained two months ago now that nobody was reviewing code (and that it was blocking his work). I'm not sure review delay has actually gotten better since.
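A rough sketch of that upper bound, using the standard form of Amdahl's law. The 20% share and the 10x factor are made-up illustrations, not measurements from the article or from any real team:

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work gets `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction  # the part nobody sped up
    return 1.0 / (remaining + accelerated_fraction / factor)


def max_speedup(accelerated_fraction: float) -> float:
    """Upper bound: the accelerated part takes no time at all."""
    remaining = 1.0 - accelerated_fraction
    return float("inf") if remaining == 0 else 1.0 / remaining


# Hypothetical: coding is 20% of total delivery time and becomes 10x faster.
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22
print(round(max_speedup(0.2), 2))         # 1.25
```

Under those assumed numbers, even infinitely fast coding caps out at a 1.25x overall gain, which is why the queue in front of code review ends up mattering more than typing speed.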
To some extent, we tell as many lies as we can get away with. Some answers are more convenient than others.
"Why" this is taking so long, like "why did this fail?" are prone to broadly agreed lies. Sometimes this is for obvious blame liability reasons. Often, this is because the lie conflicts with some "meta."
One such fallacy is the idea that software=value. Code= money, because it cost money to write. Features=revenue. Etc.
Irl.. startups produce features very quickly because they actually need features. They start with zero features.
But... LinkedIn, Visa or even Facebook... What they are short on is opportunities to develop code with value, i.e. something that will increase revenue.
FB aren't resource constrained. They're demand constrained. If there were a "write code, make revenue" opportunity available... they'd have taken it already.
This totally conflicts with the experience of working somewhere. That's because you have wishlists, road maps and deadlines.... and it always appears that demand for code is sky high.
The primary issue is simply that developers are the most immediately impacted by this technology. The combination of being able to adopt, willing to adopt, and the tech actually being incredibly good at developer related concerns is unique. The rest of the business will eventually catch up. I'm watching it happen in real time. It is agonizingly slow in most places, but it is happening.
The developers being able to drain a one year long work queue in an afternoon is meaningless if the rest of the business cannot absorb the effects of that work in the same timeframe. The business will not leave your idle work queue on the table for long though. Keep pulling a vacuum on them and they will fill the space eventually.
Once tooling (e.g. agent harnesses, external tools) becomes more mature and consistent, the other 2 will become less of a bottleneck.
If I were to take a gamble here, I would argue that development will at some point reach the more ideal scenario, whereas project planning and scoping will take longer. Also, the documentation section will take almost as long as development, slightly longer at the edges.
The new AI-assisted era will most likely push companies to adopt Waterfall-style management rather than Agile.
- shift towards throughput-oriented vs latency-oriented. Can juggle more tasks, but increasingly hard to speed up individual ones.
- strong scaling is tough. Might even see slowdowns for individual tasks, so reliable benefits come from being able to juggle more and eat the per-task inefficiency
- amdahl's law: we can't speed up tasks beyond their longest sequential (human) unit, so our work becomes identifying those bits and working on them. Related: you can buy bandwidth, but you can't buy latency
The human brings their cumulative experience over a career: the nuances behind every decision and their evolved context at their given company. This context allows them to take that one-line spec and extract tons of detail from it by knowing who wrote the ticket, what was the "trigger" for the ticket, what other work is being done in tandem that might need to be incorporated, etc.
LLMs can be given this context but it's a manual process of transcription into its prompt/memory/skills and that content must be continually updated and refined. It just pushes lots of work to spec writing from the more intuitive nature of feature development a lot of us have a level of mastery over. Then you must constantly have a back-and-forth to refine the output.
Any senior engineer knows that a lot of that communication is wasted energy. If I have a good idea of what I'm building I can develop the feature in a focused flow of output that I refine in an almost unconscious way because I don't need to translate intent into words, just code, and that process is incredibly automatic after years of developing software.
When all the effort is placed into writing specs, re-prompting and then reviewing (often over and over again), that intuitive and automatic ability to build software degrades. Think of a time when you were mostly focused on PR reviews and not contributing to a project. You may have been able to help developers build better code, but if you were to jump into that project to contribute, there would be a real and painful effort to re-familiarize yourself and reconstruct that intuitive familiarity of the project.
LLMs have many very useful qualities, but so far I fear an over-reliance on them can be more of a hindrance than a benefit.
This is how I felt when I first started seeing people discuss things like AGENTS.md etc.
https://podcasts.apple.com/us/podcast/the-daily/id1200361736...
We have a person who wants, effectively, a formatted report generated on demand from four sources. The current interface is four different programs, all of which were written by different groups inside the corp, but they also all draw from the same or similar databases. There's a unified login, but each interface has its own permissions.
The company brings in an AI initiative and soon enough drops all security restrictions for the AI's access to the databases. The new formatted report gets generated through the use of a few tens of thousands of tokens each time, and about 5% of the time synthesizes non-existent data.
A competent DBA and application programmer could have spent a week doing the same thing, producing a program which would do the job faster, cheaper (at run-time), secure and in a way which could be extended and debugged.
But DBA and application programmer time is expensive up-front and the execs are gung-ho about the stock-price now that they are hip and trendy.
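For contrast, here is a minimal sketch of what the deterministic version could look like. Everything in it is hypothetical: the four tables merely stand in for the four source systems, SQLite stands in for whatever databases the company actually runs, and the names are invented for illustration.

```python
import sqlite3

# Hypothetical schema: one table per source system.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (customer_id INTEGER, total REAL);
CREATE TABLE tickets   (customer_id INTEGER, is_open INTEGER);
CREATE TABLE invoices  (customer_id INTEGER, is_overdue INTEGER);
"""

# One parameterized query: the same input always yields the same report.
REPORT_SQL = """
SELECT c.name,
       (SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = c.id),
       (SELECT COUNT(*) FROM tickets  WHERE customer_id = c.id AND is_open = 1),
       (SELECT COUNT(*) FROM invoices WHERE customer_id = c.id AND is_overdue = 1)
FROM customers AS c
WHERE c.id = ?
"""


def build_report(conn: sqlite3.Connection, customer_id: int) -> str:
    """Format the report from real rows only; fail loudly if there is no data."""
    row = conn.execute(REPORT_SQL, (customer_id,)).fetchone()
    if row is None:
        raise LookupError(f"no customer with id {customer_id}")
    name, order_total, open_tickets, overdue_invoices = row
    return (
        f"Customer: {name}\n"
        f"Order total: {order_total:.2f}\n"
        f"Open tickets: {open_tickets}\n"
        f"Overdue invoices: {overdue_invoices}"
    )


# Demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (1, 30.5)])
conn.execute("INSERT INTO tickets VALUES (1, 1)")
conn.execute("INSERT INTO invoices VALUES (1, 0)")
conn.commit()
conn.execute("PRAGMA query_only = ON")  # the reporting path only ever reads

print(build_report(conn, 1))
```

The point of the sketch is the shape, not the schema: a missing customer raises an error instead of producing plausible-looking numbers, and the report path needs read access only.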
Proper implementation and design still take time, but it is still faster for systems that have a lot of resources available online.
> ...but that doesn't mean it's generating the correct code.
Something I'm observing is that now a lot of the pressure moves to the product team to actually figure out the correct thing to build. Some product teams are simply not used to this and are YOLO-ing prototypes now, iterating, finding out they built and shipped the wrong thing, and then unwinding. Before, when there was the notion that "building is expensive", product teams would think things through, do user interviews up-front, actually do discovery around the customer + business context + underlying human process being facilitated with software.
This has shortened the cycle to first working prototype, but I'd guess that in the longer scale, it extends the time to final product because more time is wasted shifting the deliverable and experience on the user during this process of discovery versus nailing most of the product experience in big, stable chunks through design.
At the end of the day, there is a hidden cost to fast iterative shifts on the fundamental design of the software intended for humans to use and for which humans are responsible for operation. First is the cost on the end users who have to stop, provide feedback, and then retrain on each cycle. Second is that such compounding complexities in the underlying implementation as product learns requirements and vibe-codes the solution creates a system that becomes very challenging for humans to operationalize and maintain.
Ultimately, I think the bookends of the software development process are being neglected (as author points out) to the detriment of both the end users and the teams that end up supporting the software. I do wonder if we're entering an "Ikea era" of software where we should just treat everything as disposable artifacts instead.
However, while the engineering team successfully fast-tracked development, UAT, and production testing, largely thanks to AI, other departments only began digging deeper into the project toward the end of April. To be fair, they do use AI in their workflows to some extent, but they haven't adapted their processes to keep pace with engineering's increased productivity.
In my opinion, this lag is mostly because many employees in those departments are older and hesitant to change their routines. While I understand that resistance to change is a natural human trait, what comes to my mind is this beautiful German adage, "Wer nicht mit der Zeit geht, geht mit der Zeit" which loosely translates to, "Who doesn't change with time is left behind by time"
I get the most value from them when I'm asking them either to fill in the blanks of something already half implemented, or when I need some feature in a given context/language that only exists in other languages.
Another option is that lower software costs would significantly reduce the cost of whatever non-software product the software supports (manufactured good, electricity, services, telecom etc.) but I don't know in which industry the cost of software is a large portion of the overall product cost.
And there's another thing. A company that makes tractors can't produce food without land. A company that makes metal machining equipment can't make cars without the raw materials. But a software company that makes software that automatically makes software could just produce the result software itself rather than sell the software-making software. If AI ever reaches the point it makes software at a marginal cost that's not much higher than the cost of the AI itself, what would be the incentive of selling that AI?
> This is often the part that slows down software development. Trying to figure out what a vague, title only, feature request actually means.
But that is exactly what Software Engineering is! It's 2026 and the notion that you can get detailed enough requirements and specifications that you can one-shot a perfect solution needs to die.
In my experience AI has made us able to iterate on features or ideas much faster. Now most of the friction comes from alignment and coordination with other teams. My take is that to accelerate processes we should reduce coordination overhead and empower individuals and teams to make decisions and execute on them.
When I was working we used to get requirements that literally said things like, "Get data and give it to the user". No definition of what data is, where it's stored, or in what format to return it. We would then spend a significant amount of time with the product person trying to figure out what they really wanted.
In order to get good results with LLMs we need to do something similar. Vague requirements get vague results.
Ideation: Throw ideas back & forth, cross reference with knowledge bases, generate design documents. Documentation: Generate large parts of docs. Development: Clear. Deployment: Generate deployment manifests, tooling around testing, knowledge around cloud platforms.
Every single step can be done better & faster with AI. Not all of them, but a lot.
Even development. Yes, some part of your job involves understanding the problem better than anyone & making solutions. But some parts are also purely a chore. If you know you need a button doing X, then designing that button, placing it, figuring out edge cases with hover & press states, connecting to the backend etc - this is a chore that can be skipped. The same principle applies to almost all steps.
On the other hand, it feels like we've been over this tens of times recently, on HN specifically and IRL at work. Another blog post isn't going to convince leaders that this is how the world works when they are socially and financially incentivized to pretend like AI really will speed things up. So now I just wait for their AI projects to fail or go as slowly as previous projects and hope they learn something.
But for a small studio, or independent developer, LLMs are a big game changer. Being able to do a mediocre job at 5 people's jobs is a huge leap over trying to get by without those jobs - relying on third party assets or other sorts of content, or even worse - doing a really awful job of trying to improv those jobs. See the UI of basically any program ever that was clearly laid out by a programmer and not a designer. Or there's the whole trying to rip off stuff from dribbble, but lacking the skills to do so. Whereas with AI, you can suddenly competently rip off everything and everybody - it's basically their entire MO.
This naturally involves a lot of tradeoffs and politics - senior engineers know to avoid adding 'weight' to their airframes and fight hard to avoid adding scope to the systems they're responsible for or divergence from their intended direction of travel. So compromises have to be struck or escalations to management to choose between priorities have to play out.
Maybe AI solves that as well, but that is a much more difficult lift.
Eg: I had a product manager say to me that he envisions a future where any meeting with stakeholders that does not result in an interactive prototype by the end of the meeting would be considered a failure. This feels directionally correct to me.
The other thing I expect to see is Vibecoding being the "Excel 2.0" where it allows significant self-serve of building interactive apps that's engaged in a continual war with IT to turn them into something with better security guarantees, proper access control & logging, scalability, change management etc.
But the larger historical point here is that every revolutionary transition produces, in the early stages, "Steam Horses". The invention of the steam engine had people imagining that the future of transportation would involve horse shaped objects, powered by steam, pulling along conventional carts. It wasn't until later developments that we understood the function of transportation as divorced from the form.
I started talking about Steam Horses originally in the context of MOOCs, which was a classic Steam Horse idea.
No, the code is actually almost always correct. The way it's added is probably not what you're going to like, if you know your code base well enough. You know there's some ceremony about where things are added, how they are named, how many comments you'd like to add and where exactly. Stuff like that seems to irritate people like me when not being done right by the agent, and it seems to fail even if it's in the AGENTS.md.
> If you were to give human developers the same amount of feature/scope documentation you would also see your productivity skyrocket.
Almost 2 decades in IT and I absolutely do not believe this can ever happen. And if it does, it's so rare, it's not even worth talking about it.
I tell them "Us engineers will probably be able to deliver some of our stuff faster, but it won't have even a slight effect on the actual deliverable, because we've never been the bottleneck." The real problem is that the process to get an S3 bucket allocated takes (not exaggerating) 4 weeks there.
If that sounds familiar, it's because it's what dang did over the course of several years.
It's taken a few weeks. I started right around May, and now it's able to render large HN threads (900+ comments) within a factor of five of production HN performance. (Thank you to dang for giving actual performance numbers to compare against.)
A couple days ago, mostly out of curiosity, I ran Claude with "/goal make this as fast as HN." Somewhat surprisingly, it got the job done within a couple hours. I kept the experiment on separate branches, because the code is a mess, just like all AI generated code starts as. But the remarkable part is that it worked, and I can technically claim to have recreated HN within a few weeks.
The real work is in the specifications. My port of HN is missing around a hundred features. Things from favorited comments, to hiding threads, to being able to unvote and re-vote.
But catching up to HN is clearly a matter of effort (time spent actually working on the problem with Claude), not complexity. Each feature in isolation is relatively easy. Getting them all done within a short time span without ruining the codebase is the hard part. And I think that's where a lot of people get tripped up: you can do a lot, but you have to manage it tightly, or else the codebase explodes into an unreadable mess.
It's true that if you don't do that crucial step of "manage the results", you'll end up making more work for yourself in the long run, by a large factor. But it's also true that AI sped me up so much that I was able to do in weeks what would've otherwise taken years (and did take dang years). I'm not claiming parity, just that I got close enough to be an interesting comparison point.
AI can clearly accelerate us. But we need to be disciplined in how we use it, just like any other new tool. That doesn't change the fact that it does work, and I think people might be underestimating how good the results can be.
> Software development is about translating a problem into a solution that a computer can understand and automatically resolve. Preferably in a secure and scalable way.
True. Meanwhile, software engineering puts the optional bits (i.e. secure & scalable) into the requirements bucket.
---
For the problem description and gathering requirements sentiment; I don't think we'll _ever_ have a 100% proper way of doing this. If we did, we'd basically solve any and all problems in the world.
Nevertheless, I think AI can help with investigating and exploring the problem space, especially when the problem is already solved but the prompter hasn't gained enough expertise in it yet.
Moreover, I think (and keep mentioning) we will see different kinds of models in the near future. Those would be more specialized per industry, per language (both programming and human languages), even per field.
Those will open up newer areas for employment & the job market. Something like an "AI-trainer" but more of a knowledge-worker style. Although this can also be automated with LLMs, the limits on context length/size and the amount of compute required to re-train the models to iterate faster are both quite heavy.
So well said.
AI is unveiling how the bureaucracy is the slow part.
The way AI makes your processes go faster will have little to do with cutting software development time in itself, and much more to do with letting an organization be made up of fewer people, which in itself lowers your misalignment issues. A giant company of 200K people will still be about as messy as one today, but you might be able to do a lot more with the same number of people, just like a lone programmer today, without AI, already does quite a bit more than anyone could do by themselves in the 80s.
Maybe some of the advantages are that you don't need quite as many developers, or maybe you can use a smaller marketing team, or you don't need to spend that much time answering questions, because an LLM is doing it for you, and it's tracking what's been asked of it, turning the questions into product research. Either way, the gains come from being able to run leaner, and therefore minimizing organizational misalignment.
Also, I have the impression that LLMs bring some gains or benefits for individuals but not relevant enough at the organization level.
You know, typing fast and accurately is kind of important.
The new speed skill that developers now need is speed reading. LLMs just make copious amounts of output (from tests, documentation, diagnostics). They also produce code so quickly that a skill for focusing on weak points is so important.
> "faster typing won't make you faster".....
I understand a Deloitte consultant has specific incentives. But let's first try to answer a baseline question: why do some companies have thousands of software engineers? What do they all do?
And then, a follow-up: what is actually the bottleneck at most companies? What causes "requirements gathering" to take long?
^ this statement is false. typing infinitely fast would make software development much faster.
typing infinitely fast would not make shipping useful products and features instantaneous, because there is product, technical, and organizational uncertainty that requires iteration and "cross functional collaboration" to figure out.
but ai can make each iteration step a lot faster.
And yes, architecture and how to actually implement the designs are also part of the requirements.
The code is just the implementation, the actual problem that needs solving is one abstraction level higher.
Writing actual code was maybe 10-20% of what I did. Most of it was meetings, design review, authorization requests, etc
I think projects where correct is very clearly defined can benefit from LLM acceleration, as you're describing here.
But so much of modern software development is figuring out what the right thing to build is. And in those situations, I don't think LLMs provide nearly as much benefit.
Therein lies the paradox. And the problem is, interacting with LLMs is akin to a slot machine.
And on top of that, LLM producers want you to view it that way - that's how they generate revenue and can play games
The trend I DO see, at least based on JDs, is a whole lot of "agents" which are glorified Claude Code but in the cloud, with tools focused on a given industry or domain. If this is what you mean, then you are correct.
Computing has been doing that for decades. If your process is fucked, computers make it fucked faster.
It's just that now, we have entire generations alive that have never seen a world without digital computers. ~LLMs~ AI is a fun new lever in some uses so clearly it is finally the hammer that will drive the screws and bolts for us, with less effort on our part!
They just have to learn from experience. It's what you do when you can't be bothered to learn the lessons of the past.
Because the "rate of improvement" is only astonishing in well understood areas and really only astonishing if you yourself are not that great at what you do. Speaking for myself here, my job is extremely safe given that my boss doesn't wanna sit there and prompt AI all day and I work in a fun little 4 person company. We already have plans for the next 3 years which involve me :-)
If you don't like the state of technology with AI tools, just wait a few weeks. Things are still changing at a quite rapid pace. The scope of what is possible seems to shift regularly. A lot of what I did in the last weeks was complete science fiction even a year ago.
This article makes a few good points though. AI won't magically make processes faster. You might actually have to change the process. A lot of processes in companies are about people and how they communicate. The more people you have, the more communication you get, and it grows much faster than headcount does. Using AI in that context just adds to the communication noise.
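One common way to put numbers on the communication point is to count the possible one-to-one conversations in a group of n people, which is n(n-1)/2. Strictly speaking that is quadratic growth, but the effect is the same. A small sketch; the team sizes are arbitrary examples:

```python
def communication_paths(people: int) -> int:
    """Number of distinct pairs among `people` team members: n * (n - 1) / 2."""
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * (people - 1) // 2


for n in (4, 10, 50, 200):
    print(n, communication_paths(n))
# 4 6
# 10 45
# 50 1225
# 200 19900
```

Going from 10 to 50 people multiplies headcount by 5 and the number of possible pairs by about 27.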
But if you restructure your processes you might get different results. Most companies have not really gone through that process yet. It's too early to call success or failure. And especially non technical people have mostly not yet experienced any agentic tooling at all. We've yet to see how that will change companies. My guess is that some companies will be better at this than others. And we'll see a bit of darwinism play out.
The broader issue is the sheer number of businesses that build massively overcomplicated stacks, bought heavily into bandage solutions like AWS lambda, got on dumb tech bandwagons like big data, nosql etc. This is just another one.
I think you can engineer yourself into being leaner. In some businesses AI will help, but we've had over a decade of "we can just add more complexity" and it just does not work.
I'm a rails guy. People forget that for every unicorn there's ten 9-figure businesses just ticking away on some niche with a VPS, rails and like 4-10 devs.
For a while this is not a problem: I can work with my current mental model. But every generated PR erodes my expertise a little bit. Eventually my mental model won't fit anymore.
So how much of that model maintenance should I count into my productivity metric? Does that even matter or will the next model be able to reason well enough that my mental model doesn't matter?
Complexity.
In my experience (medium size businesses, i.e. 200 million to 2 billion annual revenue) we're trying to understand how a complex set of systems and business processes and different businesses (external partners) interact and then trying to morph all of that into a shape that now has capability X layered on top or in the middle.
Here's a concrete example: business X, which makes its own products and has retail stores as well as an ecom site, wanted to add the ability to put complementary items built by other companies on the website and have them drop shipped from the vendors to the consumers. The final solution involved 21 different interfaces between 4 different systems (ecom system, store system, omni channel system, external drop ship mgmt system) as well as a new internal system to manage this activity. It takes a significant amount of time to understand and solve for all of the low level details.
It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die. Anthropic failed to build even something as simple as a workable C compiler, not only with a perfect spec (and reference implementations, both of which the model trained on) but even with thousands of tests painstakingly written over many person-years. Today's models are not yet capable enough to build non-trivial production software without close and careful human supervision, even with perfect specs and perfect tests. Without a perfect spec and a perfect human-written test suite the task is even harder. Maybe in 2027.
If I got detailed specs, I'd just be a coding robot. I push that work off onto juniors.
In modern software development, there is no destination. On a 2-week basis, the business decides to change what the software is supposed to do. New features. New integrations. Changed features. Upgraded/replaced components. Larger scale. Different hosting.
Over years, the software is fundamentally altered. Quality and testing go out the window. There's a constant slog, not only of trying to deal with modifications in an ad-hoc way, but also of fighting entropy. The software becomes a living being, which gets injured, changes its lifestyle, ages. The company is a custodian of a monster, like a zoo keeper, trying to keep the depressed animal alive.
Since humans are creatures of habit, all the same problems will happen with AI. But everything will be a little bit faster, and code reviews will make code a little bit better. But simultaneously, a lack of good tests and the desire for faster deployment will make everything a little bit worse. This push and pull will result in about the same level of software quality, but moving slightly faster. So in the end we will have a faster process. But nobody will really notice, because the rest remains a slog. We will all probably get burnt out faster.
It's complex for a reason, and you can't remove the complexity without removing the reasons. You can't solve business problems with tools.
> Yes, AI can generate code quickly (whether that's a good thing is open for debate), but that doesn't mean it's generating the correct code.
It really depends on what you asked it to do. Add a new feature? I wouldn't touch that code with a 10 foot pole. Create a service with an example of another service in your project that does something similar? It is going to nail that pretty much every time in 2026.
Someone else put it really well: use LLMs as a fast typer, not a fast thinker. Don't have it generate any code you can't verify at a glance. Call in small completions that don't span more than a couple files, everything else is vibe coding.
So sure, if you have none of these things set up to back-pressure agents and help them better understand the system, then they will just be dumb LLM code writers. But you can definitely go a lot further than that with the improvements that are rapidly happening to models and harnesses.
Just learn something like balsamiq. You don't need code to build out a prototype. Just like you don't need actors and a camera when a few sketches can capture a scene.
The problem for model producers is that the revenue they get from this mode of work is tiny relative to what they need.
Work in large orgs long enough and you will recognize these creatures. Ladder climbing is a skill orthogonal to adding any value to the customer/company.
This is a big HN LLM discussion divide. I am in the same no-specs work background camp, and so the idea that the humans who input that into dev teams are suddenly going to get anything out of an LLM if they directly input the same is laughable. At most orgs in my career there has been no product person and we just talked directly to end users.
For that kind of org, it will accelerate some parts of the SWE's job at different multipliers, but all the non-dev work to get there with discussions, discovery, iteration, rework, etc remains.
If the input to your work is a 20 page specification document to accompany multi-paragraph Jira tickets with embedded acceptance criteria / test cases / etc, then yes there is a danger the person creating that input just feeds it into an LLM.
" It lacks the 16-bit x86 compiler that is necessary to boot Linux out of real mode. For this, it calls out to GCC (the x86_32 and x86_64 compilers are its own).
It does not have its own assembler and linker; these are the very last bits that Claude started automating and are still somewhat buggy. The demo video was produced with a GCC assembler and linker.
The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler. The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce."
For faffing about with a multi agent system that seems like a pretty successful experiment to me.
Source: https://www.anthropic.com/engineering/building-c-compiler
Edit: Like I think people don't realize not even 7 months ago it wasn't writing this at all.
This has significantly helped devs and made sure that requirements are very clear.
Honestly, with the first step, it seems the PMs are already halfway there to implementation of the feature so I wonder if in the future they'll just do everything themselves and a few devs will be around as SDETs rather than full blown implementers.
This was substantially predicted by Fred Brooks in 1986 in the classic No Silver Bullet [1] essay under the sections "Expert Systems" and "Automatic Programming".
In it, he lays out the core features of vibe coding and exactly the experience we are having now with it: Initial success in a few carefully chosen domains and then a reasonable but not ground breaking increase in productivity as it expands outside of those domains.
[1] https://worrydream.com/refs/Brooks_1986_-_No_Silver_Bullet.p...
I need a python script that
1) reads /etc/hosts
2) find values of specific configured hosts (read from a .conf which) eg server1, localhost, etc
3) it'll assign a name to those configs eg if the .conf has
[Env1]
192.168.0.1 production-read
192.168.0.2 production-write
192.168.0.27 amqp
[Env2]
192.168.0.101 production-read
192.168.0.201 production-write
192.168.1.127 amqp
Basically format:
[CONFIG_NAME]
<ip> <hostname>
Like an usual hosts file
4) And each of those will be stored in memory
5) if in /etc/hosts it matches one of those, it sets the "current env" as the configname
5) It'll create an icon on the top-right of ubuntu 22 default gnome with
6) that icon could be the text of the current config name or if nothing matches, "custom" text would show
7) When the user clicks the "tray"/appindicator(or whatever gnome is calling them) it'll list the config names in a simple gtk/gnome
8) When the user clicks one config, we create a backup of /etc/hosts in ~/.config/backups/ named hosts-%UNIX_TIMESTAMP%
9) we then apply it to hosts file (find only the line with the hostnames to change and modify only those)
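For illustration, the parsing, matching, and rewriting steps of this prompt (1 to 5 and 9) can be sketched in plain Python. This is not the code the LLM produced, which isn't shown in the thread. The function names are hypothetical, the GNOME indicator and the backup step are left out, and lines that list several hostnames are deliberately left untouched to keep the sketch short.

```python
import re

def parse_envs(conf_text):
    """Parse '[NAME]' sections of '<ip> <hostname>' lines into {name: {hostname: ip}}."""
    envs, current = {}, None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        section = re.fullmatch(r"\[(.+)\]", line)
        if section:
            current = envs.setdefault(section.group(1), {})
        elif current is not None:
            parts = line.split()
            if len(parts) >= 2:
                current[parts[1]] = parts[0]
    return envs

def parse_hosts(hosts_text):
    """Map every hostname in hosts-file text to its IP (first entry wins)."""
    mapping = {}
    for raw in hosts_text.splitlines():
        parts = raw.split("#", 1)[0].split()
        for hostname in parts[1:]:
            mapping.setdefault(hostname, parts[0])
    return mapping

def current_env(envs, hosts_text):
    """Return the config name whose entries all appear in the hosts text, else 'custom'."""
    hosts = parse_hosts(hosts_text)
    for name, entries in envs.items():
        if entries and all(hosts.get(h) == ip for h, ip in entries.items()):
            return name
    return "custom"

def apply_env(entries, hosts_text):
    """Rewrite only single-hostname lines that name a configured host; keep the rest as-is."""
    out = []
    for raw in hosts_text.splitlines():
        names = raw.split("#", 1)[0].split()[1:]
        if len(names) == 1 and names[0] in entries:
            out.append(f"{entries[names[0]]} {names[0]}")
        else:
            out.append(raw)
    return "\n".join(out) + "\n"
```

In a real tool, `apply_env` would be fed the contents of /etc/hosts after the timestamped backup from step 8 has been written, and `current_env` would supply the label for the indicator.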
And that one-shotted a simple gnome app indicator env switcher. Had to fix a few lines here and there but it mostly just worked. If you give the proper spec to the LLM, it'll do it right. You can even fake a DSL to describe what you want and it'll figure it out.
LLMs just take the same vague or poor requirements and make them look believable until you dig in to them.
An LLM will just say, "Sure! Here's the fully implemented code that gets the data and gives it to the user." and be done with it.
"Make a facebook clone" is the vague human promise to the end user. The reality is that it leads to so many assumptions which are insurmountable due to the vague interpretation so you have to change your requirements in the end to claim success.
Thus everything turns into a mediocre compromise. There is no exceptional outcome, which is what makes a marketable product. There are just corpses everywhere.
You need something better to both define requirements and implement them than this technology.
In several companies I have seen product managers joining teams and failing to even have a minor requirement ready for months during "onboarding" of the PM. And then code being ready but taking months to release because DevOps is busy or QA can't find time.
The pace of release of software has been disconnected from the coding part for the longest time, and we have been quiet about it.
That's why we write programs in programming languages and not English. Because they are much more efficient at giving precise instructions than natural language.
"what does X means? how will it work?"
while a programmer will ask, about all cases.
A typical example of trying to add a new significant capability involves many meetings (days, weeks, months, etc.) with the business to understand how their work flows between systems X, Y and Z as well as all of the significant exceptions (e.g. we handle subset A this way and subset B that way, but for the final step we blend those groups together, except for subset C which requires special process 97).
Then with that understanding comes the system solutioning across multiple systems that can be a blend of internal system or vendor's system, each with different levels of ability to customize, which pushes the shape of the final solution in different directions.
There is certainly value in speeding up coding, but it's just one piece of the puzzle and today LLMs can't help with gathering the domain information and defining a solution.
In fact, these disagreements and disbeliefs create opportunities and salients in the market.
So I am spending my days gardening and obsessively working on personal coding projects with these agentic tools. Y'know, building a high performance OLTP database from scratch, and a whole new logic relational persistent programming environment, a synthesizer based on some funky math, an FPGA soft processor. Y'know, normal things normal people do.
So I know what these tools are capable of in a single person's hands. They're amazing.
But I hear the stories from my friends employed at companies setting minimum token quotas or having leaderboards of people who are "star AI coders" telling people "not to do code reviews" and "stop doing any coding by hand" and I shake my head.
I dipped my toes into some contract work in the winter and it was fine but it mostly degraded into dueling LLMs on code reviews while the founder vibe coded an entire new project every weekend.
These tools suck for team work or any real team software engineering work.
I'll just let this shake out and sit out until the industry figures it out. The only places that are going to be sane to work at are places with older wiser people on staff who know how to say "slow down!" and get away with it.
In the meantime, quantities of cut rhubarb $5 a bunch in Hamilton, Ontario area for sale. Also asparagus. Lots and lots of asparagus.
What are the chances that this is the Gell-Mann amnesia effect? Sounds like the textbook definition of it.
Personally, I find the exact opposite to be true. LLMs only help me when I already know exactly what I'm doing.
That's not my experience, especially when the inputs are bugs or performance issues. It frequently hallucinates and misdiagnoses without a guiding hand. However, it can still RCA and analyze well and improve efficiency if you keep an eye on what it's doing and push it in the right direction.
> If you were to give human developers the same amount of feature/scope documentation you would also see your productivity skyrocket.
I think you run into a ceiling on how fast a person can digest and analyze the info compared to a machine.
This is a bold vague claim many on HN make, but never put back-of-napkin numbers on. e.g. do you think agentic Opus 4.7/GPT 5.5 are 95th percentile coders but you're 98th percentile? Or are you saying you're a middle-of-the-road 60th percentile coder and AI is 20th percentile so only 20% worst programmers should worry? Let's be specific about the claim being made.
Improved collaboration. Says every new CEO and manager. The notion that this is ever going to be solved, especially with different experience, views, agendas, etc., needs to die too. AI is surely not going to help, and with that roadblock iterating faster doesn't help, because then people want to try just for trying.
https://github.com/anthropics/claudes-c-compiler/issues/1
> Apparently compiling hello world exactly as the README says to is an unfair expectation of the software.
I am a very AI-forward person, but hallucinations are becoming more pernicious than ever even as they get less frequent, especially if the code actually works. A human absolutely has to guide these processes at a macro level for sustainability for SaaS as it evolves with business needs.
Maybe for one and done systems with no maintenance/no updates/no security patches you can reduce humans to SDETs, but systems like that are more the exception than the norm.
PMs turning their brain off and letting the LLMs extrapolate from quick and dirty bashing of text into a template (or, PMs throwing customer feedback at a slackbot to generate a jira ticket from it) can be better than PMs doing nothing but passing ill-defined reqs directly into the ticket, but that's a low bar. And it doesn't by itself solve the problems of the details that got generated for this ticket subtly conflicting with the details that got generated for (and implemented) in a different ticket 8 months ago.
I'm guessing they've tried (or been induced to try by upper management), but given up because they don't know how to debug any problems that arise due to the LLM working itself into a corner.
Coding-agent LLMs act a lot like junior devs. And junior devs are: eager to write code before gathering requirements; often reaching for dumb brute-force solutions that require more work from them and are more error-prone, rather than embracing laziness/automation; getting confused and then "spinning their wheels" trying things that clearly won't work instead of asking for help; not recognizing when they've created an X-Y problem, and have then solved for their Y but not actually solved for the original problem X; etc.
The way you compensate for those inexperience-driven flaws in junior devs' approach, is to have them paired with, or fast-iteration-code-reviewed by, senior devs.
Insofar as a PM has development experience, it's usually only to the level of being a "junior dev" themselves. But to compensate for LLMs-as-junior-devs, they really need senior-dev levels of experience.
The good PMs know all of this, and so they're generally wary to take responsibility for driving the actual coding-agent development process on all but the most trivial change requests. A large part of a PM's job is understanding task assignment / delegation based on comparative advantage; and from their perspective, it's obvious that wielding LLMs in solution-space (as opposed to problem-space, as they do) is something still best left to the engineers trained to navigate solution-space.
Just lol. Is this what you guys mean by productivity boost?
Comical. LLMs aren't all that great - it's more that most orgs are horribly inefficient. Like it's amazing how bad they are.
That's why Elon succeeded with SpaceX - he saw how horribly inefficient the industry was. And used that thinking to take a gamble and it's paid off.
The LLMs turn out fully formed clones of stuff for which there exists copious amounts of code openly searchable on the web doing the exact same thing.
LLMs require developer-like specification, task/subtask breakdown and detail where such example code already exists.
As a professional prior to LLMs, how many problems that you work on have many existing free solutions but you neglected to use that code and decided to spend days doing it yourself?
This is one of the reasons I like the OpenBSD and suckless projects. There are solutions that are technically correct, but are overengineered.
But yeah, this is not a "one shot" project, none of it is. One shot doesn't work even with humans - after all, this is exactly what killed waterfall as a methodology.
As an example, I did an exploratory attempt to add custom software over some genuinely awful Windows software for a scientific imaging station with a proprietary industrial camera. Five days later Claude and I had figured out how to USB-pcap sample images, and it's been operationalized and running smoothly for months now. 100% of the code was written by Claude, and it's all clean (I reviewed it myself). Pretty much all I did was unstick it at a few places ("hey, based on the file sizes it looks like the images are being sent as a 16-bit format").
For day to day work, I'll often identify a bug, "hey, when I shift click on this graphical component, it's not doing the right thing". I go tell Claude to write a RED (failing) integration test, then make it pass.
Zero lines of code manually written. Only occasionally do I have to intervene and rearchitect. Usually this involves me writing about ten lines of scaffold code, explaining the architectural concept, and telling it to just go.
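A minimal sketch of that red-then-green loop, assuming a hypothetical `SelectionModel` standing in for the real graphical component (the actual code isn't shown in the thread) and a unit test rather than a full integration test, to keep it self-contained:

```python
import unittest

class SelectionModel:
    """Hypothetical list-selection model; a stand-in for the real component."""

    def __init__(self):
        self.selected = set()
        self.anchor = None

    def click(self, index, shift=False):
        # Step 2 (GREEN): this shift branch is the fix. Before it existed,
        # every click fell through to the single-selection path below.
        if shift and self.anchor is not None:
            lo, hi = sorted((self.anchor, index))
            self.selected = set(range(lo, hi + 1))
        else:
            self.selected = {index}
            self.anchor = index

class ShiftClickTest(unittest.TestCase):
    # Step 1 (RED): written first, so it fails while click() ignores `shift`.
    def test_shift_click_extends_selection_from_anchor(self):
        model = SelectionModel()
        model.click(2)
        model.click(5, shift=True)
        self.assertEqual(model.selected, {2, 3, 4, 5})
```

The point of writing the test first is that the bug report becomes an executable check, so the agent's fix can be verified without reading every line it changed.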
My first thought when reading Anthropic's description of the experiment was that it is unrealistically easy. It's hard to come up with realistic jobs in the 10-50KLOC range that would be this easy for an LLM. That it failed only shows how much further we still have to go.
I can make a c compiler in a couple weeks just by looking up open source libraries and copying them.
I can't make any software that people will pay me money to use without taking months/years of development, research, experimentation and iteration.
Just because the original people who invented compilers had to be genius, doesn't mean anyone has to spend much time or thought in copying that work now.
The most difficult part of any non-trivial engineering is understanding the problem, and the first versions of a piece of software are how you reach that understanding.
That's why I do not think that AI-powered "software factories" will ever work. It's waterfall development all over again. An architect writing UML diagrams and handing them off to the team of programmers to do the essentially mundane task of implementing... the wrong thing.
AI is, however, very good at helping you go fast from the wrong first version to the less wrong second one. But you need to remember that your main task is to understand the problem that you are trying to solve.
I regularly get pieces of work some product guy has thought up in an afternoon. They only care about the happy path, and sometimes only part of the happy path. I work for a global company that has to abide by rules and regulations in each country we operate in. The product guy thinks up some feature, we implement the feature, then we're told "actually, we legally aren't allowed to do this in 90% of the markets we operate in". Cool, so we add an ability to disable it in those markets. Then they come back "We can do this in some of those markets if it's implemented with [regulatory bureaucracy], so can you do that please".
Then we have to hack away at the solution because the deadline is right around the corner.
This is not software engineering! None of this is related to the software. The job of a software engineer is to take a list of requirements and figure out the way we accomplish those requirements. Requirements gathering is NOT a software engineering problem. Software is implementation, product is behaviour. That's the split. The behaviour of the thing we're building needs to be known before we even try to seriously build it.
If someone had just held back for a week and done their due diligence, we would have been able to architect a solution that is scalable, extensible, easy to maintain and can make the future easier.
Anthropic said the experiment failed to produce a workable C compiler:
- I tried (hard!) to fix several of the above limitations but wasn't fully successful. New features and bugfixes frequently broke existing functionality.
- The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler.
(source: https://www.anthropic.com/engineering/building-c-compiler)
Software that cannot be evolved is dead software. That in some PR communications they misrepresented their own engineer's report is beside the point.
> It compiled multiple projects successfully albeit less optimized.
150,000x slower (https://github.com/harshavmb/compare-claude-compiler) is not "less optimised". It's unworkable.
> Like I think people don't realize not even 7 months ago it wasn't writing this at all.
There's no doubt that producing a C compiler that isn't workable and is effectively bricked as it cannot be evolved but still compiles some programs is great progress, but it's still a long way off of autonomously building production software. Can today's LLMs do amazing things and offer tremendous help in software development? Absolutely. Can they write production software without careful and close human supervision? Not yet. That's not disparagement, just an observation of where we are today.
And then someone copy pastes it into Claude and now those inaccuracies become part of the code and tests.
I have the feeling that every organization out there is, at least partially, focusing on process optimization, something that often happens when the market is down. These days there is also the AI angle to the entire thing, and the unrealistic expectations that follow it.
To come fully prepared for this, I've decided to re-read two absolute classics in this space: The Toyota Way & The Goal. I read both of these books in college, but re-reading them made me realize that a lot of these process optimization exercises are too simplistic in nature, and often misunderstand what to focus on.
Let me show what I mean.
gantt
title Project Timeline
dateFormat YYYY-MM-DD
section Scoping
Feature exploration :s1, 2024-01-01, 10d
Budget scoping :s2, after s1, 3d
Legal :s3, after s1, 10d
Documenting :s4, after s3, 5d
section Development
Exploration :d1, after s4, 25d
Software Development :d2, after d1, 70d
Documentation :d3, after d2, 5d
section Deployment
Deployment :dp1, after d2, 5d
Hyper-care :dp2, after dp1, 10d
This is a Gantt chart for demonstration purposes, normally you would look at BPMN. Showing a Gantt makes the point easier.
If you take a look at this Gantt chart you will immediately see what takes the most amount of time: software development. If your task was to improve project throughput, that would be your first stop. And that would be correct.
The problem, however, is how I typically see people go about it: throw people at the problem or just assume AI is going to make it so much faster.
What people typically don't do is look at why this is taking so long, and even more importantly: long duration does not automatically mean the problem originates there.
We are now talking about software development, but this is applicable to all processes that take longer than you would like.
Every software developer knows that you can't make projects go faster just by typing faster. If that were the case we would all be taking typing lessons.
Software development is about translating a problem into a solution that a computer can understand and automatically resolve. Preferably in a secure and scalable way.
To do something like that, you need a full overview of the problem. Either in feature or scope documents (if you're going more waterfall), or with constant iteration with the domain experts (more agile).
This is often the part that slows down software development. Trying to figure out what a vague, title only, feature request actually means.
What does "send mail to user once sale is completed" mean? Ok, we can send a mail, but what should be in the mail? What if there was an issue in the sales process, do we still send an error mail? When is a sale completed?
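To make the gap concrete, here is a rough sketch of what a developer has to pin down before that one-line request can be implemented. Every name in it (`SaleStatus`, `MailPolicy`, `mail_for`, the template names) is a hypothetical illustration, not something taken from a real system; each field simply corresponds to one of the open questions above.

```python
from dataclasses import dataclass
from enum import Enum

class SaleStatus(Enum):
    PENDING = "pending"
    PAID = "paid"
    FAILED = "failed"

@dataclass(frozen=True)
class MailPolicy:
    """Each field is a decision the one-line feature request leaves open."""
    completed_statuses: frozenset  # when is a sale "completed"?
    notify_on_failure: bool        # do we still send an error mail?
    success_template: str          # what should be in the mail?
    failure_template: str

def mail_for(status, policy):
    """Return the template to send for this status, or None if no mail goes out."""
    if status in policy.completed_statuses:
        return policy.success_template
    if status is SaleStatus.FAILED and policy.notify_on_failure:
        return policy.failure_template
    return None
```

None of these values can be guessed by the developer or by a code generator; they have to come from whoever owns the feature.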
An argument that I keep hearing about the automation of software development (AI generated code) is that you can just bypass the development part and the software developer becomes the project manager. AI discussions around software development actually illustrate this problem perfectly.
A lot of people expect the outcome of AI development to look like this:
gantt
title Project Timeline
dateFormat YYYY-MM-DD
section Scoping
Feature exploration :s1, 2024-01-01, 10d
Budget scoping :s2, after s1, 3d
Legal :s3, after s1, 10d
Documenting :s4, after s3, 5d
section Development
AI development :d1, after s4, 3d
section Deployment
Deployment :dp1, after d1, 5d
Hyper-care :dp2, after dp1, 10d
But that's not how this works. Here we face the exact same upstream issue as before.
Yes, AI can generate code quickly (whether that's a good thing is open for debate), but that doesn't mean it's generating the correct code.
In comparisons between human vs AI development they always ignore the handholding that is needed for AI to do its thing. It looks a lot more like this:
gantt
title Project Timeline
dateFormat YYYY-MM-DD
section Scoping
Feature exploration :s1, 2024-01-01, 10d
Budget scoping :s2, after s1, 3d
Legal :s3, after s1, 10d
Documenting :s4, after s3, 40d
section Development
AI development :d1, after s3, 40d
section Deployment
Deployment :dp1, after d1, 5d
Hyper-care :dp2, after dp1, 10d
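As a rough check on the three charts, the end-to-end duration each one implies can be computed from the task durations and their `after` dependencies. This is only a sketch: the `project_length` helper and the dictionary encoding are illustrative assumptions, and it counts plain calendar days with no weekends excluded.

```python
def project_length(tasks):
    """Total duration in days for tasks given as {id: (predecessor_id or None, days)}.

    Tasks must be listed after the task they depend on, as in the charts.
    """
    finish = {}
    for task_id, (after, days) in tasks.items():
        start = finish[after] if after else 0
        finish[task_id] = start + days
    return max(finish.values())

# The shared scoping tasks, with the original 5-day documenting step.
scoping = {"s1": (None, 10), "s2": ("s1", 3), "s3": ("s1", 10), "s4": ("s3", 5)}

# First chart: the original plan.
original = {**scoping, "d1": ("s4", 25), "d2": ("d1", 70), "d3": ("d2", 5),
            "dp1": ("d2", 5), "dp2": ("dp1", 10)}

# Second chart: what people expect from AI development.
expected_ai = {**scoping, "d1": ("s4", 3), "dp1": ("d1", 5), "dp2": ("dp1", 10)}

# Third chart: documenting grows to 40 days and runs alongside AI development.
handheld_ai = {**scoping, "s4": ("s3", 40), "d1": ("s3", 40),
               "dp1": ("d1", 5), "dp2": ("dp1", 10)}

for name, plan in [("original", original), ("expected AI", expected_ai),
                   ("hand-held AI", handheld_ai)]:
    print(f"{name}: {project_length(plan)} days")
```

Under these assumptions the charts come out at 135, 43 and 75 days respectively.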
Maybe this setup is faster compared to the old way of working. But I also think it's an unfair comparison. Working like this requires a much deeper involvement of domain and product experts. This involvement would mean writing out every feature and bug fix down to the tiniest detail.
This exact thing is what software developers have been begging for since the beginning of the profession: Receiving a detailed outline of the problem and what the end result should look like.
If you were to give human developers the same amount of feature/scope documentation you would also see your productivity skyrocket.
If you want to speed up processes, you need to make sure that the people that need to do the work have all the means to actually do the work.
This means that if your legal approval process is going slow, you take a look at what is needed to start a legal approval process. If they need to chase five different people for incomplete documents, you're not going to speed up said process by adding more lawyers to the department.
One of the big lessons of The Goal is: "bottlenecks should receive predictable, high-quality inputs".
I think that should be the first stop in process automation.
Here's a slightly more recent one focused more on comprehension/learning than productivity: https://www.anthropic.com/research/AI-assistance-coding-skil...
METR attempted to redo that first one to get trends over time, but couldn't recruit enough developers to get reliable results for it.
Super glad to have gotten out when I did...
Even still, other professions interact with the real social world which is not necessarily the case with programming. A lawyer will always be needed because judgments are and must be made by humans only. Software on the other hand can be built and tested in its own loop, especially now with human readable specifications. For example, I wanted to build an app and told Claude and it planned out the features, which I reviewed and accepted, then it built, wrote tests, used MCPs including the browser for interacting with the UI and taking screenshots of it, finding any bugs and regressions, and so on until an hour later it came back with the full app. Such a loop is not possible in other professions.
Considering that that's been a running complaint for like 50 years, it doesn't seem like project management is going to get better on its own at this point. So, yes, an LLM does represent a productivity boost in that area.
It's the equivalent of writer's block, and is why common advice given to writers is to put anything they can onto the page and then edit it later.
I've often reimplemented things at work that exist elsewhere. If I could just copy & paste whole solutions from GitHub and change the branding/naming slightly, I could make curl in an afternoon.
I can only think of hobby projects, like writing yet another emulator, expression parser or media processor in a new language I'm trying to master.
In a professional setting, you would always diligently explore libraries and only implement your own if there is no suitable alternative.
Only when the existing free solutions are licensed with something like GPL. Now I can just say, write me a C webserver library similar to mongoose and I get the functionality without the license burden.
I read how that'll read to VCs coming from Altman and Musk and, ow, the entire stock market just made sense for a second.
Do you have examples of (almost) entire software written by AI?
AI excels at making toy versions of software, prototypes and skeletons.
The closest things to fully functioning software created by AI that exists are all done by people that are experts in that particular field, ie software engineers.
I mean, no comment
It's the wrong thing for important things under the hood (like durability and security requirements) that are not tangible to them.
When we talk about "the" bottleneck being specs, it just isn't the case that it's the only thing LLMs do poorly. They're really bad at a lot of stuff in the SDLC.
They're also good at providing results which are bad but look ok if you either don't look too closely or don't know what you're looking for.
Anyone who thought that gap could be shrunk substantially lives in delululand.
Hence why we haven't seen this explosion of "really great" products come out.
Many will continue to parrot "bro but the models changed I swear". I'm sure they did. But you're missing the damn point.
The dudes in Eastern-Wherever not asking what something means is the scary part. You only find out at the end how deeply confused everyone was when making the thing. You can fix it with attention and management, but then only some projects sometimes are profitably outsourced and you still need competency.
Are they reasonably documented/audited/put into any sort of version control like a lot of internal tooling? Or are they the kind of thing that gets whacked together on the fly, in a "move spreadsheet data from A to B" or "I want a list of people's schedules with custom highlighting" kind of way?
Not doubting your productivity increase, I'm just curious how people quantify that when they say it.
Looks like orgs have to have engineers on for optics, like having a legal staff with no lawyers, or a cybersecurity staff with no IT or certified people. Software has famously not needed state licenses or industry certification, but maybe that's a direction to consider to give utility to company optics.
Anecdotally, I see a lot of problems/solutions content about AI that doesn't reflect at all the challenges I face. But trying to tell people that there are other ways of doing things, especially when it conflicts with token-maxxing, is a lost cause
1: When was the last time you worked on a project where you thought the average IQ was 140? I don't even think I have worked on a project where the maximum IQ was 140.
2: Who thinks the IQ of people on the project determines its success? There's so much more to it than just "high capability team members" (to give IQ a generous interpretation).
3: (math joke) A sequence like (AI IQ - Human IQ) can be negative and monotonically increasing and still never reach 0.
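To spell out the joke with one concrete (and purely illustrative) choice of sequence: take $a_n = -1/n$ as the gap. Every term is negative, each term is larger than the one before, and no term ever equals 0, even though the gap shrinks toward it.

```latex
a_n = -\frac{1}{n}, \qquad
a_n < 0 \ \text{for all } n \ge 1, \qquad
a_{n+1} - a_n = \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)} > 0 .
```

A sequence such as $b_n = -1 - 1/n$ makes the point even more strongly: it is also negative and increasing, but it never gets closer to 0 than $-1$.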
That's a theory but I've never seen this work in practice. A piece of software is unique. If it weren't, we'd just use the cp command.
What usually happens is you get a set of requirements that looks simple. Then you start thinking about a design and see 10 different possibilities, each corresponding to a slightly different interpretation of the requirements set. You iterate a few times, reviewing the designs with whoever set the requirements and a few peers, and see more possible variations to the requirements. You need to double check its parent requirements up to the master requirements. Then you need to make time/feature/quality tradeoffs, affecting the fulfillment of requirements.
Once starting to implement, you see dependencies to other software (framework, sdk, drivers, language features,...) and understand that other software is not what you thought, or has bugs. Or you see an issue with performance or see that one particular feature becomes unfeasible.
That's where all the complexity goes. AI doesn't change that, but can make prototyping iterations and bug hunting faster, as long as someone holds it on a leash and understands its decisions.
It has to be someone's job to push back on the Product Guy's stupid idea and answer all the awkward questions about the not-so-happy path with it. Unfortunately, because of the way we've ended up with this process, that person is often the engineer tasked with building it, without any effective political power to challenge the design process.
My senior year software engineering class had a whole section on requirements gathering.
Probably why I haven't ended up in any.
At least when the PM still wrote it you could outright tell it was bullshit and made no sense. Now that is just obfuscated.
Does anyone know how the 158000x slowdown happened? That's quite ridiculous.
I'm at a FAANG. My org is moving much more quickly, maybe between 3-10x more quickly than we were pre-AI. We aren't seeing a spike in reliability issues. Things just get done faster. An org as large as mine has no right to move as fast as it does.
When the org is misaligned, mismanaged, has poor customer feedback loops, bad product market fit, too much bureaucracy, etc., no amount of AI slop is going to make a meaningful impact on its bottom line. In fact, it will likely do the opposite through a combination of exponentially increasing complexity, workforce deskilling, layoffs, and rising token prices. The real bottleneck is and always has been communication & alignment.
It might make the employees _happier_ in the interim though, which, I believe, is what we're predominantly seeing during this AI mania. People fed up with the bullshit jobs of rewriting the same service for the 5th time in 2 years or creating TPS reports weekly just for their manager to throw them directly in the trash are absolutely giddy that they no longer have to do this manually. I think we need to question the economic value of these jobs in the first place, though.
I've worked at big tech prior to LLMs becoming a thing, and consistently saw projects of 20-50 people carried by 2-3 individuals that actually understood what needed to be done. I don't think this ratio will be any better with genAI, and I also don't think that tokenmaxxing has any meaningful correlation with impact. Bullshit jobs (and questionable personal projects) just get done faster now. Yay, I guess.
If your technology relies on humans using it in ways that go against the ways they are inclined to use them, then that is an issue with the technology.
The PM has historically often not had a detailed enough mental model of the implementation to spot the hard parts in advance or a detailed enough mental model of the customer desires to know if it's gonna be the right thing or not.
Those are the things that killed waterfall.
You can use LLM tools to help you improve both those areas. Synthesizing large amounts of text and looking for inconsistencies.
But the 80th-percentile-or-lower person who was already not working hard to try to get ahead of those things still isn't going to work any harder than the next person and so won't gain much of a real edge.
We see it with code too, right? It's harder to review code than to write it.
On top of that the LLM can work so fast that the amount of things that need validating grows!
This is where humans get lazy and the problems come in IMO. Whether it's a PM not validating their ticket, or a dev doing a bad code review.
Add on to that that the incentives currently are to move fast and trust the AI.
It becomes clear to me that a lot of that review work either won't be done at all, or won't be nearly thorough enough.
Hahahahahaha. Sorry, I couldn't help myself; this reads like satire. The answer is "real life experience says otherwise".
And you now own full responsibility for maintenance.
You make it sound like writing good requirements is easy.
If it were easy we wouldn't need all these concepts around PMF, product pivots and the like. And even before that was Peter Naur's paper "Programming as Theory Building" [1].
If you truly understand the problem you're solving with software then requirements can be easy. But usually we don't, not right away, and so we have to build up our understanding of the problem first in order to solve it.
Even then, the problem we solve may not have been the problem paying users will have, so you can have "good requirements" and still have a bad business, or even the opposite where you somehow build a working business despite bad requirements, because you hit upon a customer's need quite by mistake.
Nothing about any of this precludes LLMs being helpful, though nothing guarantees LLMs will be helpful either.
[1]: https://cekrem.github.io/posts/programming-as-theory-buildin...
> What data should I retrieve, and where should I get it from? Please specify at least: ...
And it then goes on to ask just exactly what is necessary, being all constructive about it.
Can't good marketing teams, backed up by World Class Product people, sell anything we build, more or less?
</devil's advocate>
I'm not saying this is the correct thing, but companies are implementing it and it is "working". I don't think keeping our head in the sand is helping.
Humanity knows how to solve starvation. Clear routes were laid out long ago. The work is in adoption.
I got the opportunity to rewrite our aging login page just as a fun experiment. I sat down with one of our analysts and we just went to town in a Zoom call trying out stuff with Claude until we made something pretty sweet. Ran it through all our systems for accessibility, performance, etc. and it came out clean. Made a PR and fired up a test that day in production. I haven't written a lick of our front end framework ever in my entire life and we were able to build something that has had a marked improvement in our user engagement in a day.
To wit, the answer pre-AI was to hire an expert on that thing, and you would then critically assess their work product, despite being unable to build it yourself.
It's happening about 10x faster than any other I've seen or read about.
Consider how long it took just to get barcode scanners rolled out in grocery stores. Or direct payment terminals. Or how many decades it's been getting robotics into the manufacturing of cars at scale. I worked through the .com boom and I can tell you that "webification" took 10 years or more for most businesses (and many of them have since given up and just have a Facebook page instead).
This is a little insane what's happening now. It really does change everything. People who don't work in software I don't think have any idea what's coming.
I get that it's "novel" creation vs porting, but given that they reported that the C compiler cost them $20k in API costs, the Bun rewrite must be at least $200k, maybe even closer to a million. Pure madness.
At least with concurrent and distributed systems stuff (which is really all I know nowadays), it is great at getting a prototype, but the code is generally mediocre-at-best and pretty sub-optimal. I don't know if it's because it is trained on a lot of mediocre and/or buggy code but for concurrency-heavy stuff I've been having to rewrite a lot of it myself.
I think that AI is great for getting a rough POC, and admittedly often a rough POC is good enough for a project (and a lot of projects never get beyond a rough POC), but I think software engineers will be needed for stuff that needs to be more polished.
I never claimed they could! I just view this as a successful experiment. I don't think Anthropic was making that claim with their experiment either.
It feels reflexive to the moment to argue against that claim, but I tend to operate with a bit more nuance than "all good" or "all bad".
I think this varies a lot. I find with a C++ project I'm working on that the LLM needs a lot of guardrails and guidance, and still gets a lot wrong. But with a Vite/JS project it often one-shots complex and intricate changes in large codebases.
Assembler and linker are not part of a compiler. They are separate tools. They are also generally much simpler.
Developers are unlikely to be doing only development these days. There's ops and support to do as well, so more back-and-forth means less time for those things and for development.
We need to meet in the middle about requirements otherwise developers will end up doing someone else's job for them.
If you mean a human has to provide the initial impetus or spec, then no, there is no software in the world entirely written by AI.
If you mean a human provides the impetus or spec and an AI takes care of the rest, this is happening. But it is expensive so this is only really happening at FAANGs and whatnot.
Of course. The point is that a full, detailed spec isn't enough (even in the rare situations it does exist, like for a C compiler). At least for the moment, you need expert humans to supervise and direct the agents.
Vibe coders usually also let the agents write the tests, which means that the only independent human validation of the software is some cursory manual inspection. That also obviously isn't enough to validate software.
> One shot doesn't work even with humans - after all, this is exactly what killed waterfall as a methodology.
You can one-shot a C compiler with humans. LLMs' software development ability is impressive and helpful, but it is not human-level yet, even if at some tasks the agents are better than most human programmers. And while many waterfall projects failed, many succeeded (although perhaps not as efficiently as they could have). So far I don't believe agents have been able to produce any non-trivial production software autonomously.
Honestly, I believe lower court judges will be the first job in the legal industry to become fully automated.
But the point still stands: in most contexts, the LLM will fill in the blanks with what it deems appropriate, like an overconfident intern at best and a bull in a china shop at worst.
But the LLM is not aware of how the business works and why, so someone needs to work with the business to extract the information. Typically it's not well documented.
It's highly salient to management, and being forced top-down by them at 10x speed, for sure, because they see a future cost save to reduce headcount.
For certain technical roles it's a force multiplier and already very saturated for sure.
On the other hand there's a lot of solution-looking-for-problem going on in large orgs where layers of management have been banging the table for 2-3 years on AI KPIs without any value being delivered.
In the weekly AI wins mail at a friend's company, multiple non-technical people were bragging about how AI has saved them 15 minutes a day by summarizing their morning inbox. This was the big game changer for them.
That's (as shown in my sample prompt) one great thing I've been using LLMs for: making GUIs for arcane Linux-based OS/userland settings that I have no interest in doing "sudo gedit yadda yadda" or learning man pages for. It's been 30+ years, we deserve a better desktop experience.
I've used suckless packages in the past, but it feels to me too close to the GNOME/Apple way of giving zero settings and having opinionated defaults whose opinions do not ring well for me. I have zero desire to change my shortcuts/hotkeys to something random devs chose based on their past computer experience, mostly unix-based. Muscle memory > *.
If you can truly write a C compiler in weeks then kudos to you. How many compilers have you written so far for how many languages?
I work for big tech and I would say a large % of developers are incapable of producing a working C compiler on any reasonable time scale, certainly not weeks, even with looking at open source. I'm sure they can download one and run it. Most developers today don't even know C or assembler. They don't know how to approach the C language spec. The top 5-10% of developers/engineers can do it but even for them it's non-trivial.
It's when you have to iterate to handle changing business needs, scale issues, and integrate with other systems where the entropy becomes a scary concern over a long enough timeline.
And it's not just "checking" - it's wholesale rejections of code, reframing prompts to target specific classes or approaches, etc... I don't think you will take the human out of planning any time soon.
Do you have any idea what has caused this engagement improvement and indeed do you actually have any metrics or is it hearsay?
It is much easier to knock something up in a day as you have done, but often the reason manual things take longer is that they are based on actual testing and research, which takes longer than a day however you do it. The manual way gives you much more data on the hows and whys, and will inform you much more in the future when you need to change again instead of just 'AI did it last time, let's use it again!'
I am certain I didn't say that. To be a good product owner one needs skill, care and understanding of the business intent. If you know the business intent but lack the skill to express it as a useful requirement then it's insufficient; if you have the skill but lack understanding or ability to understand the business intent then it's insufficient; if you have the skill and understand the business intent but you are careless in your work then it'll be insufficient too. If the problem space is emergent then having all three might not be good enough either.
It's certainly true that good engineering teams can deeply understand the problem space enough to get to a business outcome without requirement documents.
I just wouldn't bet that LLMs are going to make any of these realities any better, they might exacerbate those issues.
It's not that they're using the tool wrong, it's that the tool just isn't capable of what we see before our own eyes! I guess our eyes and ears are simply lying to us?
And then they ask for how we are managing to make things move faster. When you refuse to breach NDA and give up your competitive advantage on HN, this somehow confirms their belief that AI is useless.
Now the engineers I know that had the same skeptical tone as OP are the ones singing its praises and doing cool shit with it.
Anthropic can always fire the Opus/Mythos token machine gun on any problem (bugs, features, security) to ensure PR success, and there would be plenty of AI-sphere startups already drinking the kool-aid that would consider the whole vibe-coding thing to Bun's benefit.
This wasn't a half-assed test but a legitimate effort to improve something that we never prioritized.
We had a legitimate 25% reduction in users giving up on logging in, in a system that has millions of users.
We ran a 50-50 AB test for several weeks to confirm the data and then turned it on completely.
edit: If you haven't already read my post, I'd also like to say that the benefit AI gives us is that I worked on something I never get to work on, the analyst got to try a hunch he always had, and we got to see it go live in a day. If it didn't work out, we were out a day of work, which beats the few weeks of effort that, prior to AI, we would spend on something just to find out it didn't work.
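For anyone who wants to sanity-check a result like the 50-50 test described above, here is a minimal sketch of a two-proportion z-test in plain Python. The counts are made up for illustration (they are not the poster's data) and the function name is hypothetical; the numbers are only chosen so the relative reduction in abandonment comes out at 25%.

```python
import math

def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Compare abandonment rates between control (A) and variant (B).

    Returns (relative_reduction, z, two_sided_p_value).
    """
    p_a = fail_a / n_a                      # abandonment rate, control
    p_b = fail_b / n_b                      # abandonment rate, variant
    pooled = (fail_a + fail_b) / (n_a + n_b)
    # Standard error under the null hypothesis that both rates are equal
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_reduction = (p_a - p_b) / p_a
    return relative_reduction, z, p_value

# Hypothetical counts: 100,000 login attempts per arm,
# 8,000 abandoned in control vs 6,000 in the new page.
reduction, z, p = two_proportion_z_test(8_000, 100_000, 6_000, 100_000)
print(f"relative reduction: {reduction:.0%}, z = {z:.1f}, p = {p:.3g}")
```

With traffic in the millions even small differences become statistically significant, so the effect size (the relative reduction) is usually the more interesting number to report alongside the p-value.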
I had this same discussion at work the other day. I had an 80k line generated project dropped on my plate. It doesn't use anything built into the web framework or orm. It's a maintenance nightmare.
The overall impression given was inaccurate and the implicit claim of a fully working end-to-end generated compiler was inaccurate. The headlines were incomplete in a way that was intentionally misleading. It was an interesting experiment and somewhat impressive but the claims were overblown. It happens.
You can call that a success (as it did something impressive even though it failed to produce a workable C compiler) but my point in bringing this up was to show that today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.
Some people are lazy, plain and simple. If they want to blindly accept what the LLM tells them without critical analysis and review then that's on them.
Normally waterfall works where the scope is extremely well defined and articulated in design plans, which shortens dev time because, prior to AI, code was mostly deterministic. Here we have to do a waterfall level of documentation while iterating on a non-deterministic solution (code gen) to non-deterministic requirements (per usual).
It's bonkers.
I still think the technology is cool though.
And to answer the questioner: have you worked with a PM? Most of the ones I've worked with try to be simultaneously in charge yet not responsible for anything. Validating something implies skill and responsibility.
Also I was joking, I'd never do that; feels gross. But I suppose it is a legitimate "productive" use of AI.
Maybe if you include every application ever written, including every variation of "hello world", but if you are claiming that most serious production quality software could be written by a CS student who is simultaneously working on other classes, I'm gonna have to disagree with you.
That depends on how you count. By number of programs that may well be right, but that's not what matters in terms of impact on the industry, as software value roughly corresponds to the number of people working on a particular piece of software (or lines of code, if you wish). By number of people/LOC most software is not in the "simpler than a C compiler" category.
Are advanced calculators bad because a student could use the CAS to ace calculus homework, exams or the SAT without actually learning the material?
Is copy/paste bad because a person could use it to copy/paste code from one place to another without noticing some of the areas they need to update in the new location, adding bugs and missing a chance to learn some more subtleties of the system?
Is Git bad because a manager could use it to just measure performance by number of lines of code committed instead of doing more work to actually understand everyone's performance?
Many tools can be used lazily in ways that will directly work against a long term goal of improving knowledge and productivity.
This isn't actually an argument for or against anything, I don't know why people say this. It is entirely possible that people are using this brand new, historically unprecedented tool wrong.
Cars have been a huge success in spite of requiring people to learn a bunch of new things to use them.
Reviewing code is harder than reviewing text because code does something and has interdependencies, and therefore must be correct in its function. Do not mix the two. This is like saying that editing an article or novel is harder than actually writing it, which is blatantly incorrect.
I was pointing out that a simpler solution exists. I prefer simple solutions, because I want to test whatever idea I have in real world situation first before I go for a more complete one. Kinda like doodling before committing to do a sketch (or spend weeks doing a painting).
> It's been 30+ years, we deserve a better desktop experience
That desktop experience would need to be like Smalltalk (where it's trivial to modify the GUI). The nice power of Unix is having the userland being actually a userland. Meaning you can design a system for your workflow and let the computer take care of that. Current desktop environments don't allow for that kind of flexibility.
Also it's the nature of Unix that makes such basic utilities possible (and building them with raw Xlib or Tcl is easier than GTK). Imagine doing the same on macOS or Windows where everything is behind an opaque database where some other process fancies itself as its owner.
There are plenty of open source compilers that I can copy and paste whatever I need to. I don't get why you think this would have any level of difficulty?
Of course I couldn't make a brand new compiler that was better than what's out there...
Just like a game engine, I could clone one of the thousands of engines out there pretty easily - making something better or novel would be difficult. Just making a bare bones clone of what already exists by referencing documentation and pre-existing code is relatively easy now.
Yeah, when I made a mediocre 3d game engine 20 years ago, it was brain breaking difficult work. I can make one infinitely better in a micro fraction of the time now because most of the hard stuff is done and can just be looked up now.
Do you not agree?
- "CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI. However, the build failed at the linker stage with ~40,784 undefined reference errors."(https://github.com/harshavmb/compare-claude-compiler)
- Overall it's an interesting experiment, and shows the current bleeding edge of Claude's Opus 4.6 model. However the resulting product is also a clear example of the throwaway nature of projects generated almost entirely by AI code agents with little human oversight. The prototype is really impressive, but there is no real path forward for it to be further developed. It can build the Linux kernel [for RISC-V], which is impressive. It can also build other things... if you are lucky, but you really cannot rely on it to work. (https://voxelmanip.se/2026/02/06/trying-out-claudes-c-compil...)
Anthropic themselves said that the codebase was effectively bricked and that their agents could not salvage it.
Yes, that's certainly a fair assessment, especially the more it convinces software developers they can talk to the LLM rather than talking to users.
I'm pretty sure it still saves me time, and if nothing else it's an excuse to write TLA+, and that's fun.
Can they, though? They tried and failed to do it in their C compiler experiment. The experimenter wrote: "I tried (hard!) to fix several of the above limitations but wasn't fully successful. New features and bugfixes frequently broke existing functionality."
Does Firefox not have tests? Then how were over 200 CVEs found?
Are we going to be comfortable running a piece of software that has 1M lines, with who knows how many zero-days in it?
Yes, sure, they are going to use LLMs to find the CVEs, and so will the hackers. You need a day or two to fix the security issue, a hacker just needs to put it to use.
And good luck debugging a million line code base.
1M LOC == already failed.
In the long run these highly inefficient firms are going to get destroyed by people who have a vision and can do what 100+ firms are doing and package it together as one solution that is far superior on dimensions that matter to firms.
ok, so for some of the jobs we're doing plausible sounding goo is just fine. and that's kinda sad. but the 'just playing around' case is fine for PSG, this isn't a serious effort but just seeing how things might work out without much effort.
taking the remainder, where understanding and intent are important, the role of the ai is to produce PSG, but the intentional person now goes through everything and plucks out all the nonsense. this may take more or less time than simply writing it, but we should understand this is resulting in less real engagement by the ultimate author. where this is actually interesting is a parallel to Burroughs's cut-up method - where source text and audio were randomly scrambled and sometimes really clever and novel stuff pops out.
but to say the current model of vibe coding has much to offer in the second case is really quite unclear. to the extent to which coding is the production of boilerplate is really a problem with APIs and abstraction design. if we can get LLMs to mitigate some of that in the short term without causing too much distraction, that's fine, but we should really be using that to inform the solution to the fundamental problem.
so for me what's missing in your model is how LLMs are supposed to be used 'properly'. I don't think laziness is really the right cut here, make-work is make-work, and there's plenty of real work to be done. but in what sense does LLM usage for code actually improve our understanding of these systems and get us more agency?
The classic "you're holding it wrong" was about the iPhone 4: sure, people could learn to hold the iPhone in such a way that they didn't block the particular parts of the antenna that were (supposedly) the problem. But "holding an iPhone" is a fairly natural thing to do, and if the way that people are going to do it naturally doesn't allow its antenna to connect properly, then that's a technology problem, not a human problem.
If the selling point for AI is "you can just talk to it, and it will do stuff for you!" (which may or may not be yours, personally, but it is for a lot of people), then you have to be able to acknowledge that "describing a problem or desire using natural language" is something that humans already do naturally. Thus, if they have to learn to describe their problem in very specific ways in order to get the AI to do what they want, and most people are not doing that, then that's a failure of the technology.
For the specific case at hand, what's being described is similar to the problem of self-driving cars: you're selling the benefit as being the AI taking a lot of the work off your shoulders; all you have to do is constantly check its work just in case it makes a mistake. Which is something that we already know, empirically and with lots and lots of data, that humans are bad at.
Once again, it's a technology issue. Not a human issue.
This is why LLMs are really great for 'knocking off the todo/wishlist' of things you always meant to do. The problem, as far as broader discussions of 'productivity multipliers' or 'total factor productivity' go, is that there's a certain perverse diminishing returns to such wishlist items (if each item was all that important, why didn't it get done before?), they generally only apply to a small part of a large complicated whole (what % of your ecosystem/business/community as a whole is the login page, as pleasing and profitable as that fix is relative to the investment? Probably not a big %), and they are also finite (what happens when you have worked through your backlog of low-hanging fruit?).
Example: I got Claude to generate a language server for TLA+ so I could have nice integration with Neovim. It took like 45 minutes of arguing with Claude and then it worked fine. This is incredibly low-stakes stuff: realistically the worst case scenario is that the text in the file gets screwed up, and I'm somewhat protected by Git if that happens.
That said, I am a little concerned about how cavalierly people have been deploying AI code everywhere. I don't want pacemaker firmware to be written by some intern in an afternoon with Claude.
Edit: Maybe uncharitably is too strong, but we're talking past each other.
I don't think they tried to do that though.
> today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.
That's a good point anyway
Sure. You can clone gcc and build it. You can clone a game engine and use it.
> People who use AI because they are trying to avoid doing work fall into a completely different category than people using AI as a force multiplier and for skills/capabilities enhancements / quality improvement.
This statement is absolutely true. There are ways to use LLM tools to significantly improve the quality of your work instead of to avoid doing hard work. (And the result can easily become something that requires more hard thought, not less.)
Some that I frequently enjoy, and that are usable even if you don't want the machine to generate your actual code at all:
- consistency-check passes asking it to look for issues or edge cases
- evaluation of test coverage to suggest any missed tests or propose new ones
- evaluation of the feasibility of different refactoring approaches (chasing down dependencies and call trees much faster than I would be able to do by hand, etc.)
> to the extent to which coding is the production of boilerplate is really a problem with APIs and abstraction design. if we can get LLMs to mitigate some of that in the short term without causing too much distraction, that's fine, but we should really be using that to inform the solution to the fundamental problem.
I generally would disagree with this, though. I don't think there's solely a problem with abstraction design, I think the inherent complexity of many systems in the business world is very high (though obviously different implementations make it different levels of painful). If that's a problem, it's a people/social one, not a technology problem.
In my future we lean into the fact that people want features, they want complexity, for many things - everybody's ideal just-for-them workflow/tooling would look slightly different than the next person's - and use these tools to build things that do more, not less. Like the evolution of spellcheck from something you manually ran, to something that constantly ran, to something that can autocorrect generally-usefully when typing on a touchscreen.
Let's get back to finding more features/customization to delight users with.
Nobody "deserves" anything. They do have the jobs though. Thinking that the world isn't full of people doing what they need to do to get by who don't give a shit about fitting a fantasy ideal is wild.
Cars can take you from place to place much faster than a horse can, all you have to do is learn to drive and constantly keep your hand on the wheel.
Part of using a technology is, well, learning how to use it. It's not the technology's fault that humans are lazy or not able to pay attention and crash.
I feel compelled to point out to you that this is a completely unsustainable, unsupportable, unsubstantiable claim. You have met ~0% of PMs, and of the ones you've met maybe you've experienced a non-zero percentage of their work, but statistically that's also very unlikely.
If you think you can say what most PMs do or what PMs are likely to do, then, I'm sorry, but you are not even thinking like an engineer. You're thinking, actually, a lot more like a PM to many of us.
> just like good devs
I'm so sorry, my sides just can't handle the starry-eyed nature of these takes. This is just too much for me.
To many of us this reads like you've never met people before. But who knows, maybe you live in Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average! If so then we're jealous, but you still should be more careful about how unrigorous your mental model is because it will make you a worse engineer.
Experience with different PMs and developers aside, the older you get in the profession the more you will hopefully realize that none of your quality effort fantasy matters. Sales happen and money rolls in independently of whether you think the PMs or the people who call themselves engineers do a "good job". Businesses thrive on sales and marketing, not engineering.
> It's 2026 and the idea that even with detailed-enough requirements you can one-shot even a workable (let alone perfect) solution also needs to die.
and brought up the failed Anthropic experiment as proof of that. Yes, you are talking past each other, but that is not pron's fault. It is your fault.
Their compiler fails to compile (well, at least link) some C programs altogether, and in other cases it produces code that is 150,000x slower than a real C compiler with optimisations turned off (interestingly, the model was trained on the real compiler's source code). That's not "not competitive" but "cannot be used in the real world". But even more importantly, the compiler cannot be fixed or evolved. It's bricked (at least as far as today's models' capabilities go). For any kind of software, not being able to improve or fix anything or add any new feature means it's effectively dead.
You could not use it in production even if no other C compiler existed.
As to your latter point, not sure why you think I think business doesn't continue on even with bad employees, of course it does and I didn't say otherwise. But that does not mean they're doing a good job, those two are orthogonal concepts.
And I'm not sure how we even got to this, the original point was that I personally as a dev can physically see PM productivity increasing with AI, even as other devs in this thread seem not to. For a competent PM, a tool that automates a detailed first draft fundamentally changes the psychology of ticket creation. If your argument is just "bad PMs will still be bad," then sure, I agree, but that doesn't really engage with how the tooling changes the workflow for everyone else.
- John Carmack embedded a C compiler and interpreter/runtime into Quake back in the mid 1990s as a scripting language! It was that efficient that it could be used in a real time 3D shooter. That's a solo effort as a minor component of a much larger piece of software.
- I've seen university CS courses hand out "implement a C compiler" as a homework / project exercise for students. It's not particularly difficult.
Sure, a modern C compiler like GCC has to handle inline assembly, various extensions, pragmas, intrinsics, etc... but like you said, all of those are thoroughly documented and have open source implementations to reference.
Similarly, the Rust compiler is implemented in Rust and could be used as an idiomatic reference for a generic compiler framework with input handling, parsing, intermediate representations, and so forth.
I would bet that those things are also true of at least one expensive commercial C compiler.
Uh. We're not talking about knowing what good is, which is completely irrelevant to anything in this thread. You made a claim without qualification about what it is more likely for PMs to do. I can't tell if you've lost the chain or are engaging in some kind of motte and bailey fallacy. Either way it's a bad sign for this conversation.
I'm going to summarize the threads so far. I hope it highlights why what you've said sounds so silly:
Someone: "I see X failing to do Y."
You: "X definitely do Y. Why would you think that X aren't doing Y? Doing Y is the obvious thing for X to do."
Someone: "I literally am seeing it happen right now."
You: "Well then those X are bad."
Someone: "Yeah, no shit. They just said as much."
You: "But most X would do Y."
Someone: "In my experience that is false."
Someone else: "Mine too."
Someone else: "Mine as well."
Someone else: "Same."
You: "The bad ones shouldn't have their jobs."
Someone: "They do though."
You: "But we can tell which ones are the bad ones."
Someone: "Bartender, another drink please."