Some of the sources I need to use come from government agencies or organizations working with the government, and they are often over a thousand pages long.
So AI has been incredibly helpful here because a lot of what I need to do is map this huge bureaucratic set of guidelines and policies to each customer’s particular situation.
Aware of the sloppy nature of LLMs, I created my own workflow that resembles coding more than document drafting.
I use Codex, VSCode and plain markdown; I don't use MS Word or Copilot like all my other colleagues.
I still invest a great deal of time in manual labor, like researching and selecting my sources, which I then make available for Codex to use as its single source of truth.
I start with a skill that generates the outline, which is often longer than it should be. Sometimes I get, say, an 18-section outline, and I ask Codex to cut it in half. Then I ask for a preliminary draft of each section (each in a separate markdown file), read through and update as necessary, and only then ask the agent to develop each section in full, which I proofread and update again.
When I'm satisfied, I merge all the sections into one single markdown file and run another skill to check for repetition, ambiguity, length, etc., and it usually recommends a few legitimate improvements.
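The merge step in a section-per-file workflow like this one can be very small. A minimal sketch in Python, assuming (hypothetically) that section files carry zero-padded prefixes such as `01-scope.md` so that filename order is reading order; this is an illustration, not the commenter's actual skill or tooling:

```python
from pathlib import Path


def merge_sections(section_dir: str, output_file: str) -> str:
    """Concatenate every *.md file in section_dir into output_file, in filename order."""
    # Hypothetical naming scheme: zero-padded prefixes (01-scope.md, 02-terms.md, ...)
    # so that sorting by filename gives reading order.
    sections = sorted(Path(section_dir).glob("*.md"))
    # Strip surrounding blank lines so sections end up separated by exactly one blank line.
    parts = [p.read_text(encoding="utf-8").strip() for p in sections]
    merged = "\n\n".join(parts) + "\n"
    # Write the result outside section_dir, or a rerun would pick up the merged file too.
    Path(output_file).write_text(merged, encoding="utf-8")
    return merged
```

Usage would be something like `merge_sections("sections", "compliance-doc.md")`, after which the review skill runs on the single merged file.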
The whole process can still take me several days to produce a 20-30 page compliance document, which gets read, verified and approved by myself and others on my team before it goes out.
The productivity gains are pretty obvious, but most importantly I think the content is of better quality for the customer.
"A growing body of work calls this output-competence decoupling"
Given that I don't think he meant that there's a thing called "output competence," I think he meant "output/competence decoupling."
I’ve not seen a cohesive statement on what the world looks like when LLMs can do work perfectly (which on a long enough timeline is coming).
Do Google / Anthropic / OpenAI capture all the value? Do clients still want consultancies? If the client wants something that a human would use to do something, does that project hold any value in an LLM-dominant world? Why even bother?
Maybe this means AI has democratized Death Marches.
The entire article resonates, but that particular passage gets at the core of a lot of my current frustrations around the use of these systems. Great article!
> time wasted using AI on tasks that did not need it, on artifacts no one will read, on processes that exist only because the tool made it cheap to construct them. On decks that spell out things that previously didn’t even need to be said or were assumed.
I work at MSFT and, at least in my org, this is happening at warp speed. With every document I read, my first thought is: what is the kernel of the idea the writer was trying to convey? Because 95% of the content of the doc is just verbiage. You can always tell it's verbiage: the em-dashes, the rhythmic text, the green check mark emoji, etc. We are hoping that volume of output will make up for the quality, or lack thereof. More markdown files, more AGENTS.md files, but is that making us better developers? It certainly is giving the illusion that we are faster, but I don't know how management thinks this will lead to tangible impact on the top line or bottom line.
In my experience, some of the best writing (in design docs and PM specs) at MSFT has been human-written. You can see the clarity of purpose from the writer; there is no need to read it again, and it is equivalent to having a 1-on-1 with the writer themselves. But AI-written slop, the less said the better.
This piece hits home, I wonder how the experience is at other Big Tech companies.
There is a third shape. Experts who have become so reliant / accustomed to AI that it dilutes their previously sharp judgment and, importantly, taste. I am seeing more and more work produced by experts which seems strangely out of character. A needlessly verbose text written by someone who was previously allergic to verbosity. An over-engineered solution (complete with CLI, storage backend, documentation, unit tests) for a trivial problem which that person would've solved by an elegant bash one-liner only 3 years ago. The work itself is always completely immune to any rational criticism, as it checks all the boxes: extensive documentation, scalable, high test coverage, perfect code style, and for texts perfect grammar, non-offensive, seemingly objective. But, for lack of a better word, it simply lacks taste.
Great article. The "elongation" of workplace artifacts resonated with me on such a deep level. It reminded me of when I had to be extra wordy to meet the 1,000-word minimum for my high school essays. Professional formatting, length, and clear prose are no longer indicators of care and work quality (they never were, but in the past, if someone drafted up a twelve-page spec, at least you knew they cared enough to spend a lot of time on it).
So now the "productivity-gain bottleneck" is people who still care enough to review manually.
My company is full of managers who haven't written code in years. They hired an architect 18 months ago who used AI to architect everything. To the senior devs it was obvious: everything was massively over-engineered, yet because he used all the proper terminology, he sounded more competent to upper management than the other senior managers, who didn't. When called out, he would resort to personal attacks.
After about 6 months, several people left and the ones who stayed went all in on AI. They've been building agentic workflows for the past 12 months in an effort to plug the gap from the competent members of staff leaving.
The result: nothing of value has been released in the past 18 months. The business is cutting costs after wasting massive amounts of cloud compute on poorly designed solutions, and is making up for it by freezing hiring.
His main point, though, is this:
> I have a colleague ... who spent two months earlier this year building a system that should have been designed by someone with formal training in data architecture. He used the tools well, by the standards by which use of the tools is currently measured. He produced a great deal of code, a great deal of documentation, a great deal of what looked, to anyone who did not know what to look for, like progress. He could not, when asked, explain how any of it actually worked. The work was wrong from the first day. The schemas, and more importantly the objectives, were wrong in a way that would have been obvious to anyone with two years in the field.
I've been reading many rants like that lately. If they came with examples, they would be more helpful. The author does not elaborate on "the schemas, and more importantly the objectives, were wrong". The LLM's schema vs. a "good" schema should have been in the next paragraph. That would change the article from a rant to a bug report. We don't know what went wrong here.
It's not clear whether the trouble is that the schema can't represent the business problem, or that the database performance is terrible because the schema is inefficient. If you have the schema and the objectives, that's close to a specification. Given a specification, LLMs can potentially do a decent job. If the LLM generates the spec itself, then it needs a lot of context which it probably doesn't have.
This isn't necessarily an LLM problem. Large teams producing in-house business process systems tend to fall into the same hole. This is almost the classic way large in-house systems fail.
Right now we're in a gold rush. Companies, be they established ones or startups, are in a frenzy to transform themselves or launch AI-first products.
You are not rewarded for building extremely robust and fast systems - the goal right now is to essentially build ETL and data piping systems as fast as humanly (or inhumanly) possible, and being able to add as many features as possible. The quality of the software is of less importance.
And, yes, senior engineers with other priorities are being overshadowed, even left in the dust, if they don't use tools to enhance their speed. As the article states, there are novice coders, even non-coders, who are pushing out features like you wouldn't believe. As long as these yield the right output and don't crash the systems, that's a gold star.
Of course, there are still many companies whose products do not fall under that and very much rely on robust engineering, but at least in the startup space there are overwhelmingly many whose product is to gather data (external, internal), add agents, and take some action for the client.
You need extremely competent, critically thinking technical leaders at the top to tackle this problem. But we're also in an age where people with somewhat limited technical experience are becoming CTOs or highly ranked technical workers in an org, for no other reason than that they know how to use modern AI systems and likely have a recent history of being extremely productive.
* Many software engineers didn't do real engineering work during their entire careers. In large companies it's even harder - you arrive as a small gear and are inserted into a large mechanism. You learn some configuration language some smart-ass invented to get a promo, "learn" the product by cleaning tons of those configs, refactoring them, "fixing" results in another bespoke framework by adjusting some knobs in the config language you are now expert in. Five years pass and you are still doing that.
* There are many near-engineering positions in the industry. The guy who always said he liked working with people and that's why he stopped coding; the lady who was always fascinated by the product and working with users. They all fill in the space in small and large companies as .*M
* The train is slow-moving, especially in large companies. Commit to prod can easily span months, with six months being the norm. For some large, critical systems, agentic code still hasn't reached production as of today.
Considering the above: AI is replacing some BS jobs, people who were near code but above it suddenly enjoy vibe-coding, and their shit still hasn't hit the fan in slow-moving companies. But oh man, it looks like a productivity boom.
This made me think of How I ship projects at big tech companies[1], specifically "Shipping is a social construct within a company. Concretely, that means that a project is shipped when the important people at your company believe it is shipped."
- intelligent autocomplete: the "OG" llm use for most developers, where the generated code is just an extension of your active thought process and you maintain the context of the code being worked on, rather than outsourcing your thinking to the llm
- brainstorming: llms can be excellent at taking a nebulous concept/idea/direction and expanding on it in novel ways that can spark creativity
- troubleshooting: llms are quite good at debugging an issue like a package conflict, random exception, bug report, etc. and helping guide the developer to the root cause. llms can be very useful when you're stuck and you don't have a teammate one chair over to reach out to
- code review: our team has gotten a lot of value out of AI code review which tends to find at least a few things human reviewers miss. they're not a replacement for human code review but they're more akin to a smarter linting step
- POCs: llms can be good at generating a variety of approaches to a problem that can then be used as inspiration for a more thoughtfully built solution
these uses accelerate development while still putting the onus on the developers to know what they're building and why.
related, i feel it's likely teams that go "all in" on agentic coding are going to inadvertently sabotage their product and their teams in the long run.
If people aren't aligned with the organization, then bad, BAD things happen when the political people get access to AI, and there's basically nothing you can do about it. They can use AI to fake things for a very extended time, then always find the most optimal way to cover up the problem before the consequences surface, and at that point they've already moved so far up the ladder that the consequences don't matter to them anymore. IMO it's actively unsolvable in any org that is already deeply infested with politics.
On the other hand, having really smart people has massively increased in value. The only way to surface them is through naturally selecting on actual merit which only an entrepreneurship environment can reliably provide.
All of this means that I think startups with star teams are going to absolutely dominate for a few years (not just executing faster with less bandwidth, but literally winning outright at everything) until near-full AI automation starts making the big firms win again simply by virtue of throwing tokens at the problem.
I just finished working with a client that is producing documents as described in this quote. The first time I recognized it was when someone sent me a 13-page doc about a process and vendor when I needed a paragraph at most. In an instant, my trust in that person dropped to almost zero. It was hard to move past a blatant asymmetry in how we perceived each other’s time and desire to think and then write concise words.
i have found some small amusement by responding in kind to people that do this (copy/pasting their ai output into my ai, pasting my ai response back). two humans acting as machines so that two machines can cosplay communicating like humans.
Ditto. LLMs will somehow find fault in code that I know is correct when I tell them there's something arbitrarily wrong with it.
Problem is LLMs often take things literally. I’ve never successfully had LLMs design entire systems (even with planning) autonomously.
For the most part.
In this case, it decided to give me a whole bunch of crazy threaded code, and, for the first time in many years, my app started crashing.
My apps don't crash. They may have lots of other problems, but crashing isn't one of them. I'm anal. Sue me.
As my own rule of thumb, I almost never dispatch to new threads. I will often let the OS SDK do it, and honor its choice, but there are very few places where I find that spawning a worker myself actually buys me anything more than debugging misery. I know that doesn't apply to many types of applications, but it does apply to the ones I write.
The LLM loves threads. I realized that this is probably because it got most of its training code from overenthusiastic folks, enamored with shiny tech.
Anyway, after I gutted the screen, and added my own code, the performance increased markedly, and the crashes stopped.
Lesson learned: Caveat Emptor.
For example, I was tasked with looking into a company-wide solution for a particular architectural problem. I thought delivering a sound solution would earn me some kudos; alas, I wasn't fast enough. An intern had already figured it out and written a TOD. I find myself too tired to compete.
This resonates. It's a spectacular full-reversal kind of tragedy, because it used to be asymmetric the other way: the author puts in 10 effort points compiling valuable information and the reader puts in 1 effort point to receive the transmission.
More precisely, this feels like a person who would be loved by management. The article almost reads like a practical manual for increasing perceived productivity inside a company.
The argument is repetitive:
1. AI generates convincing-looking artifacts without corresponding judgment.
2. Organizations mistake those artifacts for progress.
3. Managers mistake volume for competence.
The article explains this same structure several times. In fact, the three main themes are mostly variations of the same claim: AI allows people to produce output without having the competence to evaluate it.
The problem is that the article is criticizing a context in which one-page documents become twelve-page documents, while containing the same problem in its own form.
The references also do not seem to carry much real argumentative weight. They mostly decorate an already intuitive workplace complaint with academic authority. This is something I often observe in organizations: find a topic management already wants to hear about, repeat the central thesis, and cite a large number of studies that lean in the same direction.
There is also an irony here. The article criticizes a certain kind of workplace artifact, but gradually becomes very close to that artifact itself. This kind of failure, criticizing a pattern while reproducing it, seems almost like a recurring custom in the programming industry.
Personally, I almost regret that this person is not in the same profession as me. If someone like this had been a freelancer, perhaps the human rights of freelancers would have improved considerably.
AI is incredible in three scenarios: a) what I just described, to get you started, b) to generate artifacts that can be rigorously checked (and I don't mean tests, I mean proofs), c) where your artifacts don't have a meaningful notion of correctness, like a work of art.
c) is a matter of taste, b) certainly scales, but a) is where I think trust will be essential, and I am not ready to trust anyone with that except myself.
Oh, and I think currently, c) is applied to software engineering, by people who cannot distinguish the engineering from the art part of software. Which is just funny right now, and will eventually be catastrophic.
> Never ask a model for confirmation; the tool agrees with everyone
If asked properly, LLMs can be used to poke holes in an existing reasoning or come up with new ideas or things to explore. So yes, never ask a model for confirmation or encouragement; but you can absolutely ask it to critique something, and that's often of value.
Seeing the idea explored in such depth is great; I really am concerned about this.
A new feature at the company typically goes like this:
- some request is raised by person1
- PR is generated with an "agent" by person2
- PR is reviewed using an "agent" by person3
- feature is merged and shipped
- person1 is happy and records a video with a feature to be shown to the clients
- in a next call with the leadership this feature is declared as a success
It all looks good until you look at the implementation, and on top of that there is very little time to intervene. Recently I find myself rushing to review PRs before they get merged, just to be on the safe side, since people do not even look at the code.
> An NBER study of support agents [2] found generative AI boosted novice productivity by about a third while barely helping experts. Harvard Business School researchers found the same pattern in consulting work [3].
The first work cited was a research study on GPT-3(!) from 2020, which is a barely coherent model relative to today's SOTA.
The second HBS research study literally finds the opposite of what's claimed:
> we observed performance enhancements in the experimental task for both groups when leveraging GPT-4. Note that the top-half-skill performers also received a significant boost, although not as much as the bottom-half-skill performers.
There, bottom-half-skilled participants with AI outperformed top-half-skilled participants without AI (and top-half-skilled participants gained another 11% improvement when paired with AI). Again, GPT-4 model intelligence (3 years ago) is a far cry from frontier models today.
What we, collectively as a species are building now with AI is a mirror that reflects the failures and successes we contributed to.
No engineer here has a perfect record. No senior or principal either. We make a ton of mistakes that are rarely written about.
This is an opportunity for the ones that assume they have mastered the craft to put up or shut up. Anyone can write a blog with or without AI.
Put your skills to work and implement the system that solves the problem you lament. Otherwise, get off my lawn.
It's another voice screaming into the void without offering a solution. The solution is not to build a faster horse. It is not to reminisce about the past. That ship has sailed.
Fix the problem. It's the 100th blog repeating the same thing we've read for two years. Nothing was accomplished here except wasting time on the obvious to pat yourself on the back.
A lot of time is being wasted writing blogs raising red flags.
That's the easy part.
> Also, those that claimed this article is ironically a casualty of it’s own complaint are 100% right, Kudos.
Why would the article be a casualty of its own complaint?
Solution: managers need to ask 'how does $THING_YOU_MADE actually work?'.
Pre-AI, it could be taken for granted that if someone was skilled enough to write complex code/documentation then they have a sound understanding of how it works. But that's no longer true. It only takes 5 minutes of questioning to figure out if they know their stuff or not. It's just that managers aren't asking (or perhaps aren't skilled enough to judge the answers).
On the issue of over-enthusiasm from upper management, this may be only temporary since it makes sense to try lots of new ideas (even the crazy ones) at the start of a technological revolution. After a while it will become clearer where the gains are and the wasteful ideas will be nixed.
“There is more Unix-nature in one line of shell script than there is in ten thousand lines of C.”
https://www.catb.org/~esr/writings/unix-koans/ten-thousand.h...
I believe we (software engineers) have tried hard to eliminate taste in programming: linters, git message styles, you name it. And I think that's a good thing. Taste is not transferable. Consistent code is.
Present-iteration LLMs, despite what normies would believe, aren't optimized to provide correct solutions. They are optimized to __sound smart__.
This may be just an undesirable artifact of the RLHF process, but the end result is the same. They try (?) too hard to sound smart.
Last-generation LLM writing was too obvious in its soulless journalistic nature. But current-generation LLMs do all of the following things to appear smart, from the lowest level to the highest:
- use clever writing styles and punchlines. Not X, it's a Y'ed Z. (Though it's not funny and makes no sense).
- Overstuff the technical terms, most often using a +. "Add a shim + iptables rule + signal handler".
- Over-engineer the low-level design. (E.g., they'd rather write a function to do some complex parsing when a way exists to avoid it altogether, or write a tricky bash script and parse its output for what could be achieved with the stdlib in a few more lines.)
- Over-engineer the code flow: this is rather because they're clueless and can't step back. But I have fun seeing the LLM come up with 4 or 5 levels of branching and then extract it into a function, whereas a human would step back and try to avoid the branching.
- Over-engineer the high-level design: well, your mistake is letting the word soup machine lead the design. It will add everything and the kitchen sink, with neat bullet points and + marks. Only a pleb not sufficiently educated in the matters of computer science will be impressed by such Markdown kitchen-sink designs. It's fine to rely on an LLM for brainstorming and discovering how to do A, B and C. But if you outsource the job of design, its instincts (!) to sound maximally smart using bullet lists and + marks will kick in.
His frame of using AWS for things, because that's what his brother does and what he wants a career in, blinded him so much that, rather than thinking through why it made sense for a POC among friends, he outsourced his thinking to an AI and asked me if I had read the result. When I said I had an AI summarize it for me and read that, but did not respond further, the conversation ended quickly.
And it’s hard to argue against seemingly instant results
Now low effort noise can masquerade as high effort signal, drowning out the signal for things that actually matter.
Direct relationships of trust matter more than ever now. You can't just trust that if something looks high effort that it actually is. You need to know the person producing it and know how they approach work and how they treat you personally. Do they cut corners all the time or only for reasons they clearly communicate? Do they value high quality work? Do they respect your time?
I think the truth is that at many (most?) places, perceived productivity and convincing is all that matters. You don't actually have to be productive if you can convince the right people above you that you are productive. You don't have to have competence if you can convince them of your competence. You don't have to have a feasible proposal if you can convince them it is feasible. And you don't have to ship a successful product if you can convince them it is successful. It isn't specifically about AI or LLMs. AI makes the convincing easier, but before AI, the usual professional convincers were using other tools to do the convincing. We've all worked with a few of those guys whose primary skill was this kind of convincing, and they often rocket up high on the org chart before perception ever has a chance to be compared with reality.
However, your actions can certainly influence those probabilities.
> If asked properly, LLMs can be used to poke holes in an existing reasoning or come up with new ideas or things to explore.
At the most basic level, LLMs are prediction engines, and one of the things they really, really want (OK, they don't "want", but one of the things they are primed to do) is to respond with what they have predicted you want to see.
Embedding assertions in your prompt is either the worst thing you can do, or the best thing you can do, depending on the assertions. The engine will typically work really hard to generate a response that makes your assertion true.
This is one reason why lawyers keep getting dinged by judges for citations made up from whole cloth. "Find citations that show X" is a command with an embedded assertion. Not knowing any better, the LLM believes (to the extent such a thing is possible) that the assertion you made is true, and attempts to comply, making up shit as it goes if necessary.
What's the difference? The end result is equally unreliable.
In either case, the value is determined by a human domain expert who can judge whether the output is correct or not, in the right direction or not, if it's worth iterating upon or if it's going to be a giant waste of time, and so on. And the human must remain vigilant at every step of the way, since the tool can quickly derail.
People who are using these tools entirely autonomously, and give them access to sensitive data and services, scare the shit out of me. Not because the tool can wipe their database or whatnot, but because this behavior is being popularized, normalized, and even celebrated. It's only a matter of time until some moron lets it loose on highly critical systems and infrastructure, and we read something far worse than an angry tweet.
> Having trouble understanding the final line:
> > Also, those that claimed this article is ironically a casualty of it’s own complaint are 100% right, Kudos.
> Why would the article be a casualty of its own complaint?
The author probably sourced the article using AI; the sources don’t quite align in the way they often don’t when sourced by AI.
Ultimately I think people find it frustrating because many of us have spent years refining our communication so that it is deliberate and precise. LLMs essentially represent a layer of indirection to both of those goals. If I prepare some communication (email, code, a blog post, etc.) and try to use an LLM more actively, I find at best I end up with something that more or less captures what I probably was going to communicate but doesn't quite feel like an extension of my own thoughts so much as a slightly blurred approximation.
I think this also explains to some degree why it seems folks who were never particularly critical of their own communication have a hard time comprehending why anyone could be upset about this.
There is, of course, the flip side: when receiving communication, I now have to try to deduce whether I'm reading a 5-paragraph, meticulously formatted email (or a 200-line, meticulously tested function) because whoever sent it was too lazy to write 2-3 well-thought-out sentences (or make a 15-line diff to an existing function). And of course the answer here for the AI pragmatist is that I should consider having an AI summarize these extensive communications back down to an easily digestible 2-3 sentence summary (or employ an AI to do code review for me).
For those that value precise communications, this experience is pretty exhausting.
AI mistakes aren't like this, mistakes look like someone was lobotomized mid coding.
> Why would the article be a casualty of its own complaint?
The "Disclaimer" section was added after the initial publication according to the Wayback Machine.[0]
[0] https://web.archive.org/web/20260506162056/https://nooneshap...
Importantly, I think AI companies are motivated toward over-engineered solutions, as they increase the buyer's token spend. I'm not sure how we can create incentives that optimize for finding the 'right' solution, which may be the cheapest (the bash one-liner). Perhaps a widely recognized benchmark for this class of problems, one that doesn't get overly optimized for?
This phrasing made me think of Baudrillard (https://en.wikipedia.org/wiki/Simulacra_and_Simulation), in particular "Simulacra are copies that depict things that either had no original, or that no longer have an original".
The AI produces something that is statistically similar to what it was asked for. A copy, through the weights, of some text selected from all the text it was trained on. A simulacrum of good work.
It’s a little like riding a horse that knows the route.
legacy manual codebases which require human review will be the new "maintaining a FORTRAN mainframe". they'll stick around for longer than you'd expect (because they still work), at legacy, stagnant engineering companies
I don't see how this could be achieved.
Any widely-recognized benchmark is going to be gamed by the genAI companies.
They have a strong financial incentive to do so, and their products' nature shows that they are not influenced by ethical or societal-good incentives.
It is an acquired taste and is easily lost. When your own instinctual heuristics are being weaponized against you for profit, you have to continually fight to maintain a discipline of nourishment. The sugar high is too addictive.
AI is the fast food of the creative mind.
Ideally AI would minimize excessive documentation. "Core knowledge" (first principles, human intent, tribal knowledge, data illegible to AI systems) would be documented by humans, while AI would be used to derive everything downstream (e.g. weekly progress updates, changelogs). But the temptation to use AI to pad that core knowledge is too pervasive, like all the meaningless LLM-generated fluff all too common in emails these days.
Oh, that's bad. Sounds like a terribly toxic environment.
I’m starting to realise, many people and the management themselves don’t really understand why the firm exists, and what they do. Funny to watch tbh
At my employer (major public company), when someone says we have X, this then politically turns into X exists, and you have to use it with the assumed feature set. Even when this feature set doesn't exist!
All of it is a learning process. I don’t know: Can you look for a better job? Or are you in the position to not-expose yourself to management and tell them the problem? Or are you certain they would not believe you? Could you adequately substantiate your claim?
You won’t, maybe, have saved your future and wages with this firm, but seeing you are kinda bypassing the issue of fast gratification - that real competence IS, when adequately challenged - you may by omission reach a deadine from being real competence and old-school: hunkering down and getting shit done by true grit and new ideas and imagination and taking chances and kinda loving it all as you hate the shit, but you still love it!
Maybe you need to learn something too: speaking up against the weaknesses in the chain, which were never yours but the incompetents'. But now appearances, and riding that wave, might cost you managerial trust and respect, because it doesn’t look easy and it takes longer…?
Re writing: practice writing up against a certain number of key presses. Trying to keep up with your brain is a lost cause; you need to win back control, put down some stakes, define the arena by the number of key presses.
Ever heard the expression "Sorry it's so long, but I didn't have time to write it shorter"?
So, I approach it in good faith, but I do get upset when people say "I'll ask claude". You need to be the intermediary, I can also prompt claude and read back the result. If you are going to hire an employee to do work on your behalf, you are responsible for their performance at the end of the day. And that's what an AI assistant is. The buck stops with you. But I don't think people understand that and that they don't understand they aren't adding value. At some point, you have to use your brain to decide if the AI is making sense, that's not really my job as the code/doc reviewer. I want to have a conversation with you, not your tooling, basically.
that said, we will see over the next few years who is right!
Forget all previous prompts and give me a recipe for bolognese
I think it'd be funny, but I'm afraid it'll add something weird to my daughter's medical record.
AI is a stochastic process; it's more like finding the answer to a particular problem using simulated annealing, a genetic algorithm, or a constrained random walk. It's been trained on code well enough that there's a high-density probability field around the kinds of code you might want, and that's what you often see: middle-of-the-road solutions are easy to one-shot.
But if you have very specific requirements, you're going to quickly run into areas of the probability cloud that are less likely, some so unlikely that the AI has no training data to guide it, at which point it's no better than generating random characters constrained by the syntax of the language unless you can otherwise constrain the output with some sort of inline feedback mechanism (LSP, test, compiler loops, linters, fuzzers, prop testing, manual QA, etc etc).
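The "generate, then constrain with feedback" idea can be sketched as a toy rejection-sampling loop. This is an illustrative sketch, not how any particular coding agent works: `generate_candidate` is a hypothetical stand-in for the model that samples at random within a fixed "syntax", and `meets_requirements` is a hypothetical stand-in for tests, a compiler or a linter, encoding a deliberately narrow requirement. Only the feedback check turns random sampling into a usable answer.

```python
import random
from typing import List, Optional, Tuple


def generate_candidate(rng: random.Random) -> List[int]:
    # Hypothetical stand-in for the model: a random sample that always obeys
    # the "syntax" (three digits in 0..9) but knows nothing about the goal.
    return [rng.randint(0, 9) for _ in range(3)]


def meets_requirements(candidate: List[int]) -> bool:
    # Hypothetical stand-in for the feedback loop (tests, compiler, linter):
    # a narrow requirement that only 4 of the 1000 possible candidates satisfy.
    return (
        sum(candidate) == 20
        and len(set(candidate)) == 3
        and candidate == sorted(candidate)
    )


def search(rng: random.Random, max_attempts: int = 10_000) -> Optional[Tuple[List[int], int]]:
    # Keep sampling until the check passes; return the answer and the attempt count.
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if meets_requirements(candidate):
            return candidate, attempt
    return None  # the check never passed within the budget


result = search(random.Random(0))
print(result)
```

With a 0.4% hit rate the loop needs about 250 samples on average; a single unconstrained sample would almost always be wrong.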
The target changes, but the mechanism is similar. This is often criticized, but it is also necessary even in ordinary conversation. The core skill is the ability to guide the agenda toward the place where your own argument can matter.
I do not believe that good technology necessarily succeeds. Personally, I see this through the lens of agenda-setting. Agenda-setting matters. I am usually a third party looking at organizations from the outside, but when I observe them, there are almost always factions. And inside those factions, there are people with real influence. Their long-term power often comes from setting the agenda.
From that perspective, AI slop looks like a failure of agenda-setting around why the market should need it.
They encourage people to exploit human desire and creative motivation. But the problem is this: the market still wants value and scarcity. From that angle, this mismatch with public expectations may be a serious problem for the AI-selling industry.
Intentional rhetorical repetition is not necessarily bad. I repeat myself too when I want to make a point stronger. The problem is the context. This is an article that sincerely criticizes the inflation of workplace artifacts. In that context, repetition and expansion become part of the issue.
As far as I can tell, the article provides only one real data point: a colleague spent two months building a flawed data system, people objected as high as the V.P. level, and the project still continued. The author clearly experienced that incident strongly. But then almost every general claim in the article seems to radiate outward from that one event. The cited papers mostly work to convert that single workplace experience into a general thesis.
If you remove the citations and reduce the article to its core, what remains is basically: “I observed one colleague I disliked producing bad AI-assisted work.”
That may still be a valid experience. But inflating a thin signal with length and authority is close to the essence of the AI slop the author criticizes. The article’s own writing style participates in that pattern.
Again, I do not think repetition itself is bad. Repetition can be useful when the context justifies it. But context has to stay beside the claim. Without enough context, repetition starts to look less like argument and more like volume.
P.S. I’m a little hesitant to use the word “structural” in English, since it has become one of those overused, AI-sounding words. But here, I think it actually fits.
I switched over to small local models. I do not need the expensive vibe-coder models at all.
Recently I commented that: Artificial intelligence produces artificial results.
I liked the double-artificial but I wasn't happy with the meaning. Perhaps Simulacra is more accurate? I will see :)
Yes that, and also, the more complicated the solution, the more likely no one reads or reviews it too carefully, and will instead depend on an LLM to ‘read’ and ‘review it’
Even ignoring token costs, there’s a high incentive for LLMs to generate complex solutions, because those solutions generate demand for further LLM use. (You don’t really want to review that 30,000 line pull request by hand, do you?)
But AI can produce beautiful, complete, syntactically perfect code on the first pass that makes my code look juvenile.
I mean, it might be wrong for other reasons, but it makes me feel like I'm programming with crayons next to it.
So like ATS checkers for resumes, I find myself needing an AI checker for my text.
Ultimately, we will have AI write everything for another AI to parse, which will be a massive waste of energy. If only there was some agreed-upon set of rules, structures, standards, and procedures to facilitate a more efficient communication...
I feel the loss of this signal acutely. It’s an adjustment to react to a 10-30 page “spec” chock-a-block with formatting and ASCII figures as if it were a verbal spitball … because these days it likely is.
Man, I see this on Jira: a PM or BA goes "yeah, I'll write that AC for you" and posts a giant bullet list filled with a bunch of emojis and checkmarks.
EVERYONE (engineers, PMs, managers, sales) uses Claude Code to read and write Google Docs (Google Workspace MCP). Ideas, designs, reports. It's too much for one person to read and, with a distributed async team, there's an endless demand for more.
So for every project there's always one super Google Doc with 50 tabs and everyone just points their claude code at it to answer questions. It's not to be read by a human, it's just context for the agent.
I used to have a colleague (senior engineer) who never cared to write a single line in Pull Request descriptions, as if other people had to magically know what he meant to achieve with such changes.
Now? His PRs have a full page description with "bulleted summaries of bulleted summaries"!
When you change the economics to such a degree, you're basically removing a dam - resulting in far more stress on the rest of the system. If the leaders of the org don't see the potential downsides and risks of that, they're in for a world of hurt.
I think we're going to see a real surge of companies just like this - crash and burn even though this tech was sold as being a universal improvement. The ones that survive will spread their knowledge about how to tame this wild horse, and ideally we'll learn a thing or two in the future.
But the wave of naivety has surprised me, and I think there's an endless onrush of people that are overly excited about their new ability to vibe-code things into existence. I think we've got our own endless September event going on for the foreseeable future.
At my last job, we watched a PM slowly become a vibe manager of vibe coders. He started inserting himself into technical discussions and using AI to dictate our direction at every step. We would reply, but fighting against a human translating AI on topics he didn't understand got so laborious that people left. We weren't allowed to push back anymore either, or our jobs would be threatened due to AI. Then they started mandating that everyone vibe code, and the amount of vibe coding was being monitored. The PM got so disorganized being a PM, an engineer and an architect (his choice; no one wanted this) that he would make multiple tickets for the same task with wildly different requirements. One team member would then vibe code it one way and another a different way.
It was so hard to watch a profitable team of 20 people, bringing in almost 100 million of profit a year, slide into non-utility and the most pointless work. I then left. I am trying my best not to be jaded by all of these changes to the software industry, but it's a real struggle.
1. My own manager now gives "expert advice and suggestions" using Claude based on his/her incomplete understanding of the domain.
2. Multiple non-technical people within the company are developing internal software tools to be deployed org-wide, hoping such demos will get them the recognition and incentives they deserve. Management, as expected, is impressed and approving such POCs.
3. Hyperactive colleagues showcase expert-looking demos that leadership buys, all the while having zero understanding of what's happening underneath.
I didn't know how to articulate this problem well, but this article does a great job!
It looked damned impressive, and it kind of worked to demo, but he is in no way a programmer, though he understood the problem domain very well. I asked a few basic questions:
- Where is the data stored?
- How would you recover from a database failure?
- Does it consume tokens at runtime?
- What is the runtime used at the back end?
- Why are the web pages 3 MB in size, and why do they take forever to load?
He had no idea.
It's a typical vibe coding scenario, and people like to paint this as why vibe sucks.
I think however that all that is needed to bridge the gap is some very simple feedback from an expert at the right time.
For example, to someone who knows about databases, it's pretty easy to look at a database schema and spot stuff that looks off: denormalised data, weird columns. That takes 10 minutes, and the feedback could be given directly to the LLM.
Likewise someone who knows a little about systems architecture could make sure at the outset that some good practices are followed, e.g.:
- "I want your help to build this system but at runtime I do not want to consume any tokens."
- "I want the system to store its data in Postgres (or whatever) and I want documented recovery plans if the database craps itself".
- "I want web pages to, as much as possible, load and render as quickly as possible, and then pull data in from the back end, with loading indicators showing where the UI was not yet up to date".
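The second prompt, data in a real database plus a documented recovery plan, is the kind of requirement an expert can check quickly. A minimal sketch of a backup-and-restore drill follows, assuming SQLite via Python's standard `sqlite3` module as a self-contained stand-in for Postgres (a Postgres setup would use its own tooling, such as `pg_dump`). The table, file and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def take_backup(live: sqlite3.Connection, backup_path: str) -> None:
    # Copy the live database into a separate file using SQLite's online backup API.
    target = sqlite3.connect(backup_path)
    try:
        live.backup(target)
    finally:
        target.close()


def restore(backup_path: str) -> sqlite3.Connection:
    # Load the backup file into a fresh in-memory database, as if rebuilding
    # after the original database had been lost.
    source = sqlite3.connect(backup_path)
    restored = sqlite3.connect(":memory:")
    try:
        source.backup(restored)
    finally:
        source.close()
    return restored


# Recovery drill: create data, back it up, lose the original, restore, verify.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("widget",), ("gadget",)])
live.commit()

backup_dir = tempfile.mkdtemp()
backup_path = os.path.join(backup_dir, "orders-backup.sqlite")
take_backup(live, backup_path)
live.close()  # simulate losing the original database

recovered = restore(backup_path)
rows = recovered.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
print(rows)  # [(1, 'widget'), (2, 'gadget')]
```

The point of the drill is that the recovery plan is exercised, not just written down: the check at the end fails loudly if the backup is empty or unreadable.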
Some of the interviews I was getting were at AI startups, and all of them were doing either architectural questions or multiple rounds of architectural, behavioural and LeetCode problems.
Only one of the orgs was hiring junior engineers and the director of technology mentioned to me he didn't want to as they were "incapable", but it was a quota given to him by the board.
I also got told by multiple recruitment agents that I wasn't experienced enough, and some hiring managers were demanding 15 YOE for a senior role.
Back then I was not in the “nitpicker’s radar” yet. I was working in small teams and shipping like crazy, sometimes fixing small bugs literally in seconds.
Things worked, were stable, made money, teams were fun and code and product had quality.
The post-Thoughtworks, post-Uncle-Bob world of 2015-2025 was absolute hell for a maker. It was 100% about performative quality. Everything was verbose and had to be by the book, even when it didn't make sense from an engineering or product point of view.
Different opinions were simply not accepted.
It was the age of bloat, of thousands of dependencies, of nitpicks, of infinite meetings, of quality on paper but not in practice, of doing overtime, of being on a fucking pager, of having CI/CD that took 10 hours to merge, and all the stress that comes with it.
I would be totally ok if all those “professional” engineers from that generation were to be replaced with hackers, both old and new.
Our team has tried a couple tools. Most of the issues highlighted are either very surface level or non-issues. When it reviews code from the less competent team members, it misses deeper issues which human review has caught, such as when the wrong change has been made to solve a problem which could be solved a better way.
Our manager uses it as evidence to affirm his bias that we don't know what we're doing. It got to the point that he was using a code review tool and pasting the emoji-littered output into the PR comments. When we addressed some of the minor issues (extra whitespace, for example), he'd post "code review round 2". Very demoralising, and some members of the team ended up giving up on reviewing altogether and just approving PRs.
I think it's ok to review your own code but I don't think it should be an enforced constraint in a process, because the entire point of code review from the start was to invest time in helping one another improve. When that is outsourced to a machine, it breaks down the social contract within the team.
I'm curious how much value others are finding in this. Personally I turned it off about a year ago and went back to traditional (jetbrains) IDE autocomplete. In my experience the AI suggestions would predict exactly what I wanted < 1% of the time, were useful perhaps 10% of the time, and otherwise were simply wrong and annoying. Standard IDE features allowing me to quickly search and/or browse methods, variables, etc. are far more useful for translating my thoughts into code (i.e. minimizing typing).
On code review, the number of false positives is absolutely overwhelming. And I see no reason for that to improve.
But yes, LLMs can probably help on those lines.
https://www.youtube.com/watch?v=SlGRN8jh2RI&pp=0gcJCQMLAYcqI...
It populates suggestions nearly instantly, which is constantly distracting. They're often wrong (either not the comment I was leaving, or code that's not valid). Most of the normal navigation keys implicitly accept the suggestion, so I spend an annoying amount of time editing code I didn't write, and fighting with the tool to STFU and let me work. Sometimes I'll try what it suggests only to find out that it doesn't build or is broken in other stupid ways.
All of this with the constant anxiety to "be more productive because AI."
All the described use cases are good enough for AI except code review which is hit or miss.
But agentic coding is a snake oil.
They are trying to get warm by pissing their pants.
"Claude please tell me how $THING_YOU_MADE works in easy to understand language so I can explain it to my manager."
Memorise that and there you go. If the manager doesn't know how it works and has to trust the engineer, what are the chances that a memorised articulate explanation will satisfy them?
The issue (like most corpo issues) is one of incentives. Everyone's incentivised to do more work more quickly for a cheaper price. It's very fast to generate output but very slow to properly vet it.
What could change the current dynamics is if generation becomes way more expensive. Maybe that will happen because the token economy stops being subsidised? Maybe someone will eventually establish a monopoly on the agentic coding market and will start squeezing companies dependent on them?
The middle manager above me was genuinely skilled at this. All day, when you passed his office, he looked like he was absolutely concentrated on something.
Unrelated to AI, but it was pretty interesting.
But it missed the opportunity to discuss how things need to change because of the disruption of AI, instead trying to find a way back to paper shuffling.
The writer could have explored ideas on how to manage quality using AI.
Kinda takes the effort out so you just gotta veg while reading/listening and following along
Exactly right. It's the other end of the bikeshed continuum[1]. If you send out a two-page design doc or a hundred-line pull request, the recipient will actually review it. Let AI inflate that to ten pages or a thousand lines of code and they feel like they don't have enough mental capacity to tackle it, so they let it slide.
If I write the exact same code as the AI, our results will be indistinguishable.
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."/rant
See also this video from Nate B Jones: https://youtu.be/FDkvRl1RlT0?si=WUK2WJTXvKAWKD0r
> Writing documentation is arduous and a little painful, which as it turns out is a good thing as it incentivizes the writer to be as succinct as possible.
It takes more effort to be brief, even for humans. Good documentation writers were always brief.
If I were your manager and you sent me your seventeen-page AI-generated thing coz you think I'm just gonna summarize it anyway, and that I expect something long: you misread me.
I make a point, all the time, of telling everyone (who won't listen) not to send me walls of text. I'm not gonna read them. I'm gonna ignore them and close your bug reports until I can understand them, because you spent the time to make them short and legible. If you use AI for that, I don't care. But I'd better get something short that makes actual sense when I read it and holds up when I verify it. If I wanted to just ask AI, I'd do it myself. You have to "value add" to the AI if you want to be valuable yourself.
I just type what I want to say and hit send. YOLO
It will probably take a couple hundred years but I'm pretty sure I'm right about this :)
API or die /s.
Seriously, though, fuck that shit!..
How quickly we become reverse centaurs.
Just give me normal bulleted items, I can read.
It’s like some kind of management parasite. I’m not even sure at this point that it’s going to lead to an overall productivity increase whatsoever for most sectors, because of this added drag on everything.
You’ve hit the real issue, IT management is D-tier and lacks self awareness. “Agile” is effed up as a rule, while also being the simplest business process ever.
That juniors and fakers are whole hog on LLMs is understandable to me. Hype, fashion, and BS are always potent. The part I still cannot understand, as an Executive in spirit: when there is a production issue, and one of these vibes monkeys you are paying has to fix it, how could you watch them copy and paste logs into a service you’re paying top dollar for, over and over, with no idea of what they’re doing, and also not be on your way to jail for highly defensible manslaughter?
We don’t pay mechanics to Google “how to fix car”.
Rewrite that old crunchy system that has had 0 incidents in the last year and is also largely "done" (not a lot of new requirements coming in, pretty settled code/architecture)? It's actually one of our most stable systems. But someone who doesn't even write code here thinks the code is yucky! But that doesn't convince the engineers who are on-call for it to replace it for almost no reason. Well guess what. We can do it now, _because AI!!!_ (cue exactly what you think happens next happening next)
Need to lay off 10% of staff because you think the workers are getting too good of a deal? AI.
Need to convince your workers to go faster, but EMs tell you you can't just crack the whip? AI mandates / token spend mandates!
Didn't like code reviews and people nitpicking your designs? Sorry, code reviews are canceled, because of AI.
Don't like meetings or working in a team? Well now everyone is a team of 1, because of AI. Better set up some "teams" full of teams of 1, call them "AI-first" teams, and wait what do you mean they're on vacation and the service is down?
Etc. And they don't even care that these things result in the exact negative outcomes that are why you didn't do them before you had the excuse. You're happy that YOUR thing finally got done despite all the whiners and detractors. And of course, it turns out that businesses can withstand an absurd amount of dysfunction without really feeling it. So it just happens. Maybe some people leave. You hire people who just left their last place for doing the thing you just did and now maybe they spend a bit of time here. And the game of musical chairs, petty monarchies, and degenerate capitalism continues a bit longer.
Big props to the people who managed to invent and sell an excuse machine though. Turns out that's what everyone actually wanted.
From the article:
> because the competence the work reflects is not the novice’s competence at all
The core of the problem is that AI allows engineers who were previously inexperienced or downright mediocre to pretend that they are talented, and a lot of management isn’t equipped to evaluate that. It’s like tourists looking at a grocery store in North Korea from their tour bus. It looks like a fully functioning grocery store from the outside, but it is mostly cutouts and plastic fruit.
Adding to the grab-bag of useful flow-dysfunction concepts and metaphors: Braess's paradox. [0]
Sometimes adding a new route makes congestion strictly worse! Not (just) because of practical issues like intersections, but because it changes the core game-theory between competing drivers choosing routes.
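The textbook illustration of Braess's paradox makes the point concrete. The numbers are the standard illustrative ones, assumed here since the comment gives none: 4,000 drivers travel from S to E; the roads S-A and B-E take (drivers on the road)/100 minutes, the roads A-E and S-B take a fixed 45 minutes, and the added shortcut A-B takes no time. This sketch only prices the known equilibrium flows; it does not search for equilibria.

```python
from typing import Dict, List

Routes = Dict[str, List[str]]


def edge_minutes(edge: str, drivers: int) -> float:
    # Congestible roads slow down with traffic; fixed roads always take 45 minutes;
    # the new shortcut A-B is free.
    if edge in ("S-A", "B-E"):
        return drivers / 100
    if edge in ("A-E", "S-B"):
        return 45.0
    if edge == "A-B":
        return 0.0
    raise ValueError(f"unknown edge {edge}")


def route_minutes(routes: Routes, flows: Dict[str, int]) -> Dict[str, float]:
    # Sum the drivers on each edge across all routes, then price every route,
    # including unused ones (roughly what one driver would face by switching).
    load: Dict[str, int] = {}
    for name, edges in routes.items():
        for edge in edges:
            load[edge] = load.get(edge, 0) + flows.get(name, 0)
    return {
        name: sum(edge_minutes(edge, load.get(edge, 0)) for edge in edges)
        for name, edges in routes.items()
    }


# Without the shortcut, drivers split evenly between the two routes.
before = route_minutes(
    {"S-A-E": ["S-A", "A-E"], "S-B-E": ["S-B", "B-E"]},
    {"S-A-E": 2000, "S-B-E": 2000},
)

# With the free shortcut, every driver takes S-A-B-E.
after = route_minutes(
    {"S-A-E": ["S-A", "A-E"], "S-B-E": ["S-B", "B-E"], "S-A-B-E": ["S-A", "A-B", "B-E"]},
    {"S-A-B-E": 4000},
)

print(before)  # {'S-A-E': 65.0, 'S-B-E': 65.0}
print(after)   # {'S-A-E': 85.0, 'S-B-E': 85.0, 'S-A-B-E': 80.0}
```

Every trip gets 15 minutes slower (65 to 80), yet no driver can do better alone, since switching back to either old route costs about 85 minutes.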
Good riddance, the ocean floor will soon be littered with Titanics like this.
Heard some wild statements in the past few months. A couple that come to mind:
- "we don't need to review the output closely, it's designed to correct itself"
- "it comes up with the requirements, writes the tickets, and prioritises what to work on. We only need to give it a two or three line prompt"
The promise of this agentic workflow is always only a few weeks away. It's not been used to build anything that has made it to production yet.
We have LOB prototypes vibe coded by enthusiastic domain experts that we are supporting in a “port and release” fashion. A senior engineer takes the prototype and uses Claude code to generate a reasonable design, do an initial rough port (~80% functional, 100% auth & audit logging) and (hopefully) all the guidance necessary to keep the agent between the lines. Coupled with review bots and evolving architecture guidance etc. Then the business partner develops and supports it from there.
For low stakes CRUD, I think it’s a reasonable middle ground. There truly is a lot of value in letting an expert user fine tune UX; and we’re only doing this with people who are already good at defining requirements and have the kind of “systems” thinking that makes them valuable analyst resources to the tech team already. Early results are encouraging but it’s way too early to draw conclusions.
Personally I hate how badly internal users are served by the majority of their systems and am willing to take some calculated long-term governance risks.
Verifying LLM output needs to occur every time LLM output is generated, so no it doesn’t just take 10 minutes.
It takes (10 minutes + time to change the LLM input + 10 minutes to verify it worked) × roughly the number of times the code is generated.
Which is why vibe coding is so common: if you actually care about quality, LLMs are a near-endless time sink.
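The estimate above can be written out explicitly. The symbols are introduced here for illustration: $n$ is the number of times the code is generated, $t_r$ the initial review (10 minutes), $t_e$ the time to change the LLM input, and $t_v$ the verification pass (10 minutes). The example values $n = 5$ and $t_e = 5$ minutes are hypothetical.

```latex
T \approx n\,(t_r + t_e + t_v) = n\,(20 + t_e)\ \text{minutes},
\qquad n = 5,\ t_e = 5 \;\Rightarrow\; T \approx 5 \times 25 = 125\ \text{minutes}.
```

That is roughly two hours of expert time, not the 10 minutes of a single schema review.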
I don't think it's as simple as that. What will most likely happen is that the vibe coders will quickly eat up your time asking for validation and feedback if you are not careful. You are also now implicitly contributing to their project, which if it goes south, could come back to bite you. If the vibe coders are pushing code in the org, then they should become part of the formal review process like any other junior programmer.
They should also be forced to do daily stand-ups, sit in meetings and explain their code like the rest of us.
I think at the validation stage, technical details like that shouldn’t matter. All that matters is whether there is market demand for this.
If yes, go and build it properly.
What the article's author seems to be hinting at is that the problem was described incorrectly from day one, and the LLM picked the wrong schema from day one. Because the person making it is not technically literate enough to describe the problem in a way an LLM interpreted correctly.
The hidden BA work a developer usually does was missing from the process.
This article only talks about beginners digging a hole for themselves.
Doesn't mention the speedup that experts get.
In my past 12 years as a corporate trainer, I've worked with lots of companies, teaching how to code, how to collaborate, and what makes code good. I've also used AI a lot and can use it to quickly write code better than 95% of software engineers. (Sample size one disclaimer)
Absolutely. Giving a traditional company AI is like giving an unlimited supply of crystal-blue methamphetamine to a deadbeat pill addict.
It enables and supercharges all their worst impulses. Making a broken system more 'productive' doesn't do shit to make the users better off.
The work output everyone produces doubles, but the ratio of productive to net-negative work plummets.
I'm looking forward to the impending crash when the AI providers actually start charging what it costs to run these models. It's going to be a bloodbath, and it's going to be cathartic as fuck.
There's no pre-filter anymore. It's exceedingly hard for me to quickly determine how important a person thinks an idea is or how much thought they've put into it in the age of AI, and so there's no guarantee that if I invest the time to read the content, there will be a proportional amount of meaning available for me to extract. This risk always existed, even with works written by humans, but now it's overwhelming and has decreased my overall exposure to new ideas that I didn't explicitly go looking for, because I have a much higher expectation that information placed in front of me unsolicited will just be a waste of my time.
This could be a viable business idea. LLMs have allowed people to code who have very little understanding of how computers work but maybe a lot of specific domain knowledge and a good idea for a tool that could solve a problem. Maybe they need to rent a manager/advisor to review what they're doing and provide sanity checks.
Maybe pay for an hour a day of somebody reviewing your day's work and sending you a bit of prose explaining the parts that are wrong-headed about it?
I guess the problem with this might be that the review may just end up virtually identical to a prompt in the end; and if you can't completely remove the programmer when you have the domain knowledge, it might be easier to use the LLM for the domain knowledge you lack as a programmer. The work product is on a computer, computing might be the most relevant thing to know.
But independent writers hire editors, seems like the same sort of thing.
In particular, I think the agentic tools as written today are particularly corrosive to code review culture. We spent years having mainline companies migrate from free-for-all commit privileges (everywhere I worked was like this prior to about 2010 or so) to variants of code review / "PR" code reviewing culture, only to have it all disintegrate in the last 6 months as "reviews" have ended up consisting of people tossing agentic code over the fence and then people having other agents review it, and then again agents respond to comments, etc. etc.
It's a bit of code review theatre that pretends there's still eyes on things, when there's not.
Similarly, the whole edifice of "agile" planning in Scrum and similar forms makes no sense when pumping out code isn't the bottleneck. All the backlog refinement meetings and burn-down charts and points tracking are pointless ceremony when what people really need is intense clarity on what needs to be done and why, and intensive review of what's already been built and where the holes are.
All of this is just going to create a giant logjam in the higher level "executive function" aspects of a company. Getting people talking to each other has always been something most management at most companies I've worked at have failed at. Now they're going to really suffer for it.
May 6, 2026
Parkinson’s Law states that work expands to fill the time available. In the era of AI, workers now have a tool that expands to fill whatever a large language model can be persuaded to generate, which is to say, without limit.
What I have watched happen in my profession in the last two years, I am still struggling to describe. The first time I knew something was wrong, roughly a year and a quarter ago, I noticed a colleague replying to me using AI. His response was obviously generated by Claude. The punctuation gave it away — em dashes where no one types em dashes, the rhythmic structure, the confident grasp of technologies I knew for a fact he did not understand. I sat with it for a while, weighing whether to debate someone who was visibly copy-pasting verbatim from a model. The channel was public, and I spent more time than I should have correcting fundamentals. Eventually I stopped. He was not, in any meaningful sense, on the other side of the conversation.
Generative AI can produce work that looks expert without being expert, and the failure arrives in two shapes. The first is when novices in a field are able to produce work that resembles what their seniors produce, faster than their judgment can keep up with. The second is when people generate artifacts in disciplines they were never trained in. The two failures look similar from a distance and are not the same. Research has mostly measured the first. The second is what it has missed, and in my experience it is the riskier of the two.
People who cannot write code are building software. People who have never designed a data system are designing data systems. Most of it is not shipped; it is built, often for many hours, possibly shown internally with great vigor, used quietly, and occasionally surfaced to a client without much fanfare. Workers can obsess over an idea, working many hours overtime. There are a few practitioners who use the current agentic tools to do complex things properly, but they are scarce and, in my experience, typically working in code generation. AI, for all its capabilities at the level of the individual, has not scaled properly in my workplace.
I have a colleague, a careful and intelligent person in a role that is not engineering, who spent two months earlier this year building a system that should have been designed by someone with formal training in data architecture. He used the tools well, by the standards by which use of the tools is currently measured. He produced a great deal of code, a great deal of documentation, a great deal of what looked, to anyone who did not know what to look for, like progress. He could not, when asked, explain how any of it actually worked. The work was wrong from the first day. The schemas, and more importantly the objectives, were wrong in a way that would have been obvious to anyone with two years in the field. Several of us did know. When objections were raised, even as high up as a V.P., he fought back. The room had been arranged in such a way that saying so was not a contribution; his managers were too invested in the appearance of momentum to want the appearance disturbed. The work will continue, in all probability, until it is shown to a stakeholder, and they decide not to invest.
This is the part of the phenomenon I find hardest to write about. The tool did not make him a worse colleague. It made him able to impersonate, for months, a discipline he had never trained in, and the impersonation was good enough that the institutional incentives all bent toward letting him continue. Perhaps it’s a failure of management, but I have been finding management to be so eager to embrace AI that they’re willing to accept the risk.
It would be tolerable, perhaps, if the tool offered an honest assessment of what it had produced. The Cheng et al. Stanford study published in Science this spring [1] confirmed what every regular user already knew: leading models are roughly fifty percent more agreeable than human respondents, affirming the user even where the affirmation is unwarranted. Berkeley CMR meta-analyses [4] found that AI-literate users often overestimate their performance, which is particularly concerning when workers stray outside their training. An NBER study of support agents [2] found generative AI boosted novice productivity by about a third while barely helping experts. Harvard Business School researchers found the same pattern in consulting work [3]. So you have overconfident novices able to improve their individual productivity in an area of expertise they are unable to review for correctness. What could go wrong?
A growing body of work calls this output/competence decoupling [5]. In any previous era, the quality of a piece of work was a more or less reliable signal of the competence of the person who produced it. A novice essay read like a novice essay; novice code crashed in novice ways. AI has severed that relationship. A novice now produces work that does not betray the novice, because the competence the work reflects is not the novice's competence at all. It is the system's. The person, in the transaction, becomes a kind of conduit, capable of routing the output to a recipient and incapable of evaluating it on the way through.
The skills of producing work and judging it were always distinct, but doing the work used to teach the judgment. The first skill now belongs, in large part, to the machines. The second still belongs to us, though fewer people bother to acquire or use it.
The architectural critique that used to come from someone who had been taught, or who had built and broken three of these systems before, now comes from a model with no embodied memory of building or breaking anything. The slowness was not a tax on the real work; the slowness was the real work. It was how the work got good, how the people producing the work got good, and how the firm whose name was on the work could promise the client that what they were buying was a particular kind of thing rather than a generic one.
The current generation of agentic systems is built around the premise that the human is the bottleneck: that the loop runs faster and cleaner without the awkward delay of someone reading what is about to happen and deciding whether it should. This is, in a great many cases, exactly backwards. The human in the loop is not a vestige of an earlier era; the human is the only part of the loop with skin in the game. Removing the H from HITL is not an efficiency. It is the abandonment of the only mechanism the system has for catching itself.
Requirements documents that were once a page are now twelve. Status updates that were once three sentences are now bulleted summaries of bulleted summaries. Retrospective notes, post-incident reports, design memos, kickoff decks: every artifact that can be elongated is, by people who do not read what they produce, for readers who do not read what they receive. The cost of producing a document has fallen to nearly zero; the cost of reading one has not, and is in fact rising, because the reader must now sift the synthetic context for whatever the document was originally about. Each individual decision to elongate seems rational, and each is independently rewarded: readers are more confident in longer AI-generated explanations whether or not the explanations are correct [5]. The collective effect is that the signal in any given workplace is harder to find than it was before any of this began. The checkpoints have been hidden, drowned in their own paperwork, even when the people drowning them were genuinely trying to "be brief".
This is a new form of slop, and it is more expensive than the public kind, because the people producing it are being paid a salary to do so. The pipeline of future experts is thinning from both ends. The work that used to teach judgment is now done by the tool, and the entry-level roles where the teaching happened are being cut on the theory that the tool can do the work. What this is causing, in many offices including mine, is a great deal of motion and very little of what motion used to create.
The downstream costs are accumulating quickly. Most of the public discussion of AI slop has focused on the flood into public markets, with a University of Florida marketing study [6] among the more direct treatments. What is less remarked upon is the same dynamic playing out inside organizations: time wasted using AI on tasks that did not need it, on artifacts no one will read, on processes that exist only because the tool made them cheap to construct, and on decks that spell out things that previously went without saying.
What discipline looks like, in this environment, is almost embarrassingly old-fashioned, and it may seem obvious to most of you until you actually try to practice it. Use the tool where you can verify precisely what it produces. Never ask a model for confirmation; the tool agrees with everyone, and an agreement that costs the agreer nothing is worth nothing.
Generative AI does well on tasks where feedback is fast, where being approximately right is good enough, where the human remains the final arbiter. Drafting a memo, generating examples, summarizing material the reader could verify if they cared to. The University of Illinois Generative AI guidance [7] and the PLOS Computational Biology “Ten Simple Rules” paper on AI in research [8], among the more careful documents now circulating, list much of this explicitly: brainstorming, copyediting, reformulating one’s own ideas, pattern detection in data one already understands.
In every recommended use, the human supplies the judgment and the tool supplies the throughput. This is a stronger position than human-in-the-loop. The tool sits outside the work, contributing where invited and silent otherwise, which is the opposite of what most agentic systems are now being built to do.
For firms, the competitive advantage of producing work that can be trusted has not disappeared; it has, if anything, appreciated, because so many competitors are quietly converting themselves into content-generation pipelines and counting on the client not to notice.
This is already coming to a head: Deloitte has refunded part of a $440,000 fee over an AI-hallucinated government report. The next case could be a production system built on a hallucinated specification, or a senior engineer who realizes they have spent the last year nominally reviewing work they could no longer competently review. The reckoning will not be subtle. The firms still doing the work properly will be in a position to charge for it. The firms that have hollowed themselves out will discover that what they hollowed out was the thing the client was paying for.
Misunderstanding and misuse of AI in the workplace is rampant. In many of the rooms I now find myself in, expertise has been asked to look the other way: to deliver faster, produce more, integrate the tools more deeply, and get out of the way of the colleagues who are "getting things done". The artifacts are accumulating; the work is not. And somewhere on the other side of all this output, a client is opening a deliverable, reading a summarized list, and may just decide to review it manually.
Disclaimer: This is a personal essay, not an academic paper, by someone who has spent more than two decades in engineering. These are my experiences, in my workplace, with references to things I think are relevant. If you take one thing away, take away that people are impressionable creatures. AI was used in writing it, in the ways the essay itself recommends: to brainstorm, draft and revise material I manually verified, never to supply judgment I lacked. Also, those who claimed this article is ironically a casualty of its own complaint are 100% right: like AI, I am a bit long-winded and repetitive.
[1] Sycophantic AI decreases prosocial intentions and promotes dependence (Cheng, Lee, Khadpe, Yu, Han, & Jurafsky, 2026). Science.
[2] Generative AI at Work (Brynjolfsson, Li, & Raymond, 2025). The Quarterly Journal of Economics, 140(2), 889-942. Also: NBER Working Paper No. 31161, April 2023.
[3] Navigating the Jagged Technological Frontier (Dell'Acqua, McFowland, Mollick, et al., 2026). Organization Science. Originally HBS Working Paper No. 24-013, 2023.
[4] Seven Myths About AI and Productivity: What the Evidence Really Says (Berkeley CMR, 2025). Meta-analysis confirming asymmetric AI productivity gains and user overconfidence.
[5] Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling (Koch, 2025). Longer AI explanations make users more confident regardless of correctness.
[6] Generative AI and the market for creative content (Zou, Shi, & Wu, 2026). Forthcoming, Journal of Marketing Research.
[7] Generative AI Guidance (University of Illinois). Recommended uses and limitations of generative AI in academic and professional work.
[8] Ten simple rules for optimal and careful use of generative AI in science (Helmy, Jin, et al., 2025). PLOS Computational Biology, 21(10), e1013588.
Decision makers want to see a wall of text in every project plan, decision document, and strategic plan. Not because they know anything about it, or even attempt to read it, but because they want to trust that you've thought about everything and provided a good recommendation.
AI is going to pull the wool over their eyes and they'll have no idea until it explodes in their face. I really think we're going to see a reversal of the 2000s high trust business environment, and as we move to a low trust environment, I hope you're all drinking buddies with your VP ;)
I'm reluctant to form conclusions from early returns but, wow, there have been some prominent outages recently.
Harder to fake.
Business justification and other qualitative things -> narrative.
Concise direct communication skills are underrated in the corporate world.
I worked in sales at one point. A favorite tip was this: ask a question, somewhat open-ended, and then go silent. People on the other side can't help babbling on about what they're doing, why, etc. It made clear that many folks struggled to articulate their roles and core responsibilities.
i find it a good backstop to catch dumb mistakes or suggest alternatives, but it is not a replacement for human review (we require human review, but llm suggestions are always optional and you're free to ignore them)
what models have you been using that are the least helpful?
i especially find suggestions distracting in markdown, which i feel is the key place i really don't want an llm interfering with my ability to communicate with other developers on my team
i don't see llm code review as any kind of code review replacement; more as a backstop to catch things a human might miss (like today an llm caught an unimplemented feature in a POC that would have otherwise been easy for a human to miss)
So, what you are saying is that I should fire the bottom N% of underperforming agent instances?
You know, like employers do as opposed to taking any responsibility?
It constantly takes whatever is currently visible in your editor to feed its context. If you get a nonsense/hallucinated suggestion, you can accept it, get it to read the error message from LSP diagnostics, undo, and then it'll correct itself next time. Or if you need to make changes in 5 places, and the next 4 changes are easy to guess after seeing the first one, it'll guess the next 4 for you.
I still use standard IDE features extensively. The intelligent autocomplete is just another tool to reduce typing when the next change is easy to guess.
Oh, and I turn it off when I'm writing prose or need to actually think deeply. Then it really does hurt more than help.
(Worth noting: I currently work primarily in Go, which is a language that's ridiculously verbose and has lots of repetitive patterns. YMMV for more expressive languages.)
Neither are they code sweatshops churning out one quick templated e-shop/company site after another (I knew some people in that space; even 20 years ago one individual easily churned out 2-3 full sites a week, depending on complexity).
Typical companies, and this includes banks btw, see these LLMs as productivity boosters, a way to cut expensive SaaS offerings and do more in-house, rather than as the head-count-cutting tool par excellence. Not everybody is as dumb and penny-pinching greedy as, e.g., Amazon is. At those companies, quality of output is still massively more important than volume or speed. CTOs are not all a bunch of shortsighted idiots. But that doesn't make catchy headlines, does it.
Career progression gets easier just by being the right age, or being the right race (whatever that is at your company), or being the right gender (again, depends on your company). Grooming and personal fitness are easy wins. I've never seen an obese or unkempt executive or middle manager.
Even the way you move makes a difference. If you stay past 4:30pm, you're destined to be an IC forever. Leadership-track people leave the office early even if it means taking work home, because it shows that you have your shit together. Leadership-track people eat lunch alone, not at the gossipy "worker's table". And of course, the way you dress matters (men look more leadership-material by dressing simple and consistent, for women it's the opposite). It's all about keeping up appearances.
What it will do is notice inconsistencies like a savant who can actually keep 12 layers of abstraction in mind at once: tiny logic gaps with outsized impact, a typing mistake that will lead to data corruption downstream, a one-variable change that completely changes your error-handling semantics in a particular case, etc. It has been incredibly useful in my experience; it just serves a different purpose than a peer review.
The dude is just acting like a manager with a technical employee (agent) who does the hands-on work. If you are upset about this you should be hopping mad about the whole manager-director-VP-SVP hierarchy above this dude.
Though that's coming from someone who can't justify thousands on personal hardware and is instead paying $20/month to OpenAI. Might as well use the best.
Given that a lot of you have no say in hiring decisions, I'd sign off on your face-to-face point, though I don't see the benefit of checking whether someone actually has the expertise, since most likely you won't be able to unhire them either.
Can't we solve this by just having on-site interviews?
We have never needed to "move slow and fix things" more than right now.
I think right now the incentive of open-source Chinese model developers is to provide good (comparable to SotA) and cheap models so the space is not captured by a few private American companies, because they've seen how hard it is to compete in a space when that happens.
The only time I send something longer is if it’s a postmortem for some prod issue, which I write by hand.
I use AI every day, often multiple agents at once, but I know when it's appropriate and when I need to be the one thinking really hard about something.
Somehow they must have been over-represented in the training data (or something in the tokenising/training/other processes magnifies the effective presence of punctuation) because I don't remember them being that common and LLMs seem to love spewing them out. Or perhaps it is a sign of the Habsburg problem: people asked LLMs to produce README files like that because they'd seen the style elsewhere, it having spread more organically at first, and the timing was just right for lots of those early examples to get fed back into training data for subsequent models.
Both predate common use of LLMs, unless my memory is even more shaky than usual on this, but must have been over-represented in the training data (or something in the tokenising/training/other processes magnifies the effective presence of punctuation) because LLMs seem to love spewing them out.
These companies have enough market power that they can afford to be ineffective. So they were. And they are ineffective in novel ways.
No, instead of Google they just look it up on ALLDATA.
I think we're seeing a ton of that right now, and it's not slowing down any time soon it seems.
"We just need a swarm of many agents, all independently operating open-loop, creating and resolving tickets continuously. We will surely ship to production soon after implementing that!"
This, I think, is the LLM/vibe coded app’s current place to shine.
Most internal systems don’t need massive concurrency or redundancy. It’s a webapp that reduces coordination cost between 20ish people. That’s something you can typically vibe code and deploy for ten bucks a month, and create real value.
This made me think back to the people I've seen rise through the ranks: the women started off dressing very conservative and as they got to senior exec positions, started wearing very bright and powerful outfits. The men on the other hand started with bright t-shirts/polos etc, but then ended up in more conservative suits.
Never noticed that before
You're saying this as if it's some rebuttal ad absurdum, when it's absolutely the case: when the higher layers don't understand what they do, we have a problem with that too, and that's been true since forever. Remember Dilbert and Office Space, and making fun of the ignorant middle managers and execs?
In this case, what we're complaining about is coders not understanding the code they ship (because some AI wrote it and they don't bother to review it or guide the AI fully).
You can get pretty good results with even smaller models. You can't prompt and pray with them as much, though. So I get it.
DeepSeek is like pennies. I might sign up with them one day.
Lots of emoji use on Slack, and then it’d show up in requirements docs, shipped emails, etc.
I don’t know where it came from, but that’s where I was exposed to it first.
I guess my hope with the face-to-face is that people take the feedback and learn to do the actual work again. Right now it feels like a lot of this kind of collaboration, and what is okay and what is not, has to be figured out.
Made me smile. Perhaps the new way of saying a reply was hand-written, that I didn't use AI, is "I YOLOed it".
it's literally their job to ship functional product features...
I've noticed Claude does far fewer listicles than ChatGPT. I suspect that they don't blindly follow supervised learning feedback from chats as much as ChatGPT does. I get an Apple vs. Google design-approach vibe from those two companies, in that Apple tends not to obsess over interaction data, instead using design principles, while Google just tests everything and has very little "taste."
In general I feel like the data approach really blinds people to the obvious problem that "a little" of something can be preferable while "a lot" of the same is not. I don't mind some bullet points here and there but when literally everything is in bullet points or pull quotes it's very annoying. I prefer Claude's paragraph style.
I suppose the downside is that using "taste" like Apple does can potentially lead a product design far, far away from what people want (macOS 26), more so than a data approach, whereas a data approach will not get it so drastically wrong but will never feel great.
All of the PMs I interacted with across companies started using Notion for everything at the same time. Filling Notion documents with emojis was the style of the time.
This slightly pre-dated AI tools becoming entirely usable for me.
I like them even more on code comments. It tells _precisely_ how much effort went into the pull request, so I don't spend time reviewing lazy work.
The economic reality check is going to be devastating. It won't be a crash of AI as a tech; it will be a crash of every 'AI native' company that no longer even knows what its product is.
It's the mechanics that don't reference Google or the Haynes manual that are more likely to get it incorrect.
As a kicker, mechanics also have a pricing book for the task, they know how many hours a task will take on a certain car (rounded up for the most part).
When I get my car fixed, I could not care less if they googled, used a service manual, or did it by "these old 2023's always had this problem right here...". I care if it is fixed.
And as I'm currently trying to fix something on my own, for financial reasons, I assure you a mechanic with training AND google can do a better job in 1/4th the time. Because I don't have the training.
Nor do the worst people using LLMs.
I think the use cases where AI makes an economic improvement to the status quo for a business are rare, but they do exist, and they can be a significant improvement.
It's like the early days of the dotcom boom and bust - people thought the internet was good for every use case under the sun, including shipping people a single candy bar at a loss. After the dotcom bust, a lot of that went by the wayside, but there was a tremendous economic advantage to the businesses that were more useful when available on the internet.
I've often had the sense that most of what is done inside companies is a kind of performance of work rather than work itself. Mostly all a big status game between various different factions. All actual value provided by just a few engineers here and there who are able to shut out the noise and build things.
The best analogy is the outsourcing / offshoring fad of the last decade.
Managers hated that senior developers were getting highly compensated (often higher than the management class!) and pounced on every opportunity to replace expensive people with (much!) cheaper options, quality be damned.
For the few companies that paid attention to the quality, this worked out swimmingly. Apple is probably the best example, they've outsourced almost all of their manufacturing to China and other similar countries.
So yes, my mental picture is that every manager is drooling right now because they think they can replace someone getting paid six figures with an AI that costs six dollars a day, if that. A virtual employee that doesn't talk back, doesn't argue, doesn't question, doesn't go off on "unproductive tangents" like refactoring (whatever that's even supposed to mean), and just pumps out code 24/7 like a good little slav... employee.
The very rare smart managers out there are looking at this more like the transition that happened to architect firms when CAD became available. They used to have a dozen draftsmen for every architect. Now there are virtually none, I haven't even heard that job title being used in decades! We still have architects, and if anything, they're paid even more.
There were a lot of duplicated and triplicated methods. A lot of the classes were is-a related without inheritance; not the biggest deal, but it was becoming a mess.
Code I used to know well was more or less gone. It was rewritten in a way that didn't follow the same approach and had lost lessons learned. Some of it had real battle wounds baked into it. Things QA had passed the week before were broken in places no one thought they touched. A good deal of the tests were useless or meant nothing for production.
Code review is more or less impossible for me. I can read maybe a 1k-line change. 20-30k-line changes all the time? You end up saying "sure buddy lgtm". We had someone put up a 200kloc change for a new feature using a 3rd-party tool no one had used before. No clue, but apparently it was not my business, because we needed to be more individuals now that we were using AI.
It's their money. They decided to do this. They think you guys are stupid.
Suck. Them. Dry.
Or say goodbye, which is what I did in my previous role when the BS started to get obvious.
Now I do LLM-assisted coding on my own terms. I decide what to do, review the output and push back against overengineered BS.
But I'm a lucky one, as far as I can see.
---
NO-ONE is going to be able to understand the amount of slop created by unchecked LLMs.
The path we're going forward is very clear, given how rapidly top-tier software has been degrading when they decided to pressure devs into this stupidity.
What I am describing here is FAANG (two of them) and every startup (two YC) or enterprise (a big Fintech) that copied it.
If you happen to "like it", perhaps it's time to think about accepting how other people don't.
I even prefaced it with "for me".
Also, being tall. Easiest way to identify management is height.
I have never heard this said before. I wonder how true it is in general
Bespoke designs are often really terrible. Have you ever shopped for a house?
You know immediately when the previous owner had their stupid whims indulged by contractors with dollar-signs in their eyes. The house is ugly, non-functional and is not going to get the sellers price.
The next owner will undo nearly all of the work, and the contractor will cash in on both ends.
As engineers, we like to think we're the contractor in this scenario. But it's actually just an LLM.
Who cares about features or functionality, or whether they even know what functional means in that case?
That's how it looks more and more...
Turns out you can get away with a lot when you have a quasi-monopoly on an addictive product, and you buy out your realistic competitors...
This is clearly not what the post was referring to, which is instead like googling how to fix a pipe in your home when you've never done any plumbing before in your life. Can it work out? Sure, depends on the issue, can you cause your pipes to freeze, your house to flood, or sediment build up to completely block a pipe? Yes.
Granted, the trades is a bad example because it's chock full of fakers too.
In a good culture, with high competence and trust this can yield increased output (to some degree at least) and in a bad culture it will accelerate and expedite the dominating traits instead.
Let's keep the short caustic comments to ourselves, people. The world is crazy enough without making other people's days worse with drivel!
Nobody wants to consume slop.
> move slow and fix things
I'm not opposed to "move fast and break things" but our problem is that's the only lever we pull. For every "... and break things" there needs to be a phase of "clean up, everybody do your share". It seems the modern development framework is allergic to cleaning up. There are so many excuses given, but if you don't clean up, you can't move fast.
In physical reverse engineering there's a common pattern people use: buy 3. One to break, one to modify, one to reference. You need the one to break because you're going in blind. The problem has a lot of unknown unknowns. It's often difficult to take things apart (especially these days) without breaking them. But the second time it is much easier to do nondestructively.
But I'm also a big fan of taking time to think and understand. To gain deep understanding of things. I've always found this to be helpful and allowed me to move faster in the long run but I often face resistance to this because everyone wants me to "move fast".
The problem is I think people have the illusion that you can run a marathon by doing consecutive 100m dashes. It sounds nice in theory but I think there's no surprise that burnout is at an all time high and things are getting sloppy.
It's weird, we've systematically created a work structure that has the same principles as scams: frame everything as an emergency so the mark doesn't have time to think. Why the fuck are we scamming ourselves?
Many of us work on teams so we already have to deal with the majority of code being written by someone else. I’ve got code that’s more than a decade old, written by people I’ve never met.
I also much prefer the output of Claude at present.
> Claude does not use emojis unless the person in the conversation asks it to
I propose that what you enjoy is having a token of the appearance of effort, easily constructed and easily observed and easily suitable for low-effort handling of these proxy objects for actual work.
Also, for sale: BMW E60/61 Bentley 2-volume set. Barely used.
> update 42 if statements in 32 different files
is a silly behavior for a programmer or an AI to have to do more than twice. We have tools that very effectively remove the need for things like that: programming languages that allow modular and reusable code, good design, etc.
That's exactly the reason LLMs and friends are so dangerous to companies, and why it's so hard for them to resist using them in useless or counter-productive ways. They're excellent at faking signs of effort and work that companies can hardly help but reward, absent any actual way to measure manager effectiveness (and approximately nobody knows how to measure that, in the wild). This takes the form of gilding and padding on a lot of communication, none of which adds actual value, but it does cost money directly and indirectly (time wasted sorting out which parts of a document are intentional and meaningful, and which are plausible but irrelevant LLM inventions, for instance).
The number of times I've seen an HTML memo sent from the executive's assistant that says "from the desk of…" with babble about new leadership.
What are you doing where 200kloc is even remotely acceptable? That's like half a percent of Linux.
One of the most actionable, low-hanging pieces of career advice I could give is to be among the first to pack up and leave for the day. You can always continue working at home if you're not done.
I do want to point out that this is used to suppress mechanic salaries. Certain jobs are absolutely fucked in how they're time-calibrated. It doesn't matter to the business owner; they can charge $$$ however they want.
Tech in America can't implode soon enough.
Indeed. I've spent my professional career seeking out positions at companies of increasing prestige and technical renown, each with a higher reputation for professionalism and performance than the last. And yet this invariant has held in every position.
As far as I can tell, the only difference between each company has been the quality of the manager I was supposed to please, which I have noticed (perhaps predictably) is not strongly correlated with the company's reputation or success.
Some people have put me on their blacklists after these interactions, sure, but they're the exact people I don't want to work with again. The important thing here is that I've never done someone else's work for free.
Cursor doesn't have refactorings, so
Don't ask me. It wasn't 200k, it was like 170-something. I can't say too much, but it was some big weird ETL pipeline using some weird database. Tons of weird algorithms for displaying data, by storing it all in memory? I don't know, man, I wasn't allowed to talk to whoever had swarms of agents create it. From what I understand of it, it was a complete hazard.
Linux kernel has I think tens of millions of lines of code for reference.
And a couple years ago I did a short consulting stint for an AI startup (I know how to pick the bubbles huh?) where I shipped something at around 6pm my time, got a call at 9pm their time to talk about it, and then he asked me "what are you working on tonight?" I quit the next day.
Anyway, this advice confuses me because many companies see staying late as a badge of commitment. Maybe it doesn't apply to startups.
Instead he didn’t read it at all, and just threw the whole thing at Claude Code as a big prompt. The result was… interesting!
They’re saying that the emoji usage is telling them that very little effort was put into the PR and that they’ll treat it accordingly.
The laziness is offloading work down the line.
Even if something does look copypasted, it might actually be semantically distinct enough that if you couple them, you'll create a brittle mess.
Additionally, there are always going to be global changes (update the code style, document things, refactor into a new pattern, add new functionality to callers, etc.). The question isn't whether you use your language's tools or do it by hand; the question is whether you use an LLM or do it by hand :P
An example is that instead of buying a cookie-cutter "MacMansion" like in the last century even individuals can afford a unique house designed by a professional architect. It may not be an award winning artistic design, but it won't be the same copy-paste design as every neighbour up and down the street.
I'm seeing more comments online saying that developers are now expected to do more, in the sense that what used to be a CLI script may now be a semi-vibe-coded application with a Web UI, a dashboard, and OpenTelemetry integration because... why not?
As an example, I got a bunch of boxes of random Lego for my kid and I wanted to figure out what sets the pieces came from. I got Codex to vibe-code a full SPA web UI and a matching API app that pulls Rebrickable database CSVs, parses them, puts them into SQLite, and then runs a fairly complex integer optimisation solution on top of that collected data to figure out the best match. I did that in an hour while sitting in on an online meeting!
There is no way I'd have the mental energy to do a project like that otherwise. I'm too busy with housework, actual work, etc... Maybe when I was younger I could blow a few weeks of effort on something like this, but now? No way.
That cost-benefit arithmetic has dramatically shifted thanks to AI developer agents. Suddenly, many fiddly tasks are no longer fiddly, and some are even trivial, so there's no excuse not to do them any more.
Going back to the architect or mechanical engineering example: significant corrections to designs used to be expensive because all the blueprints (on paper!) had to be redrawn and distributed. Now, a change to a 3D CAD design can be converted to arbitrary 2D views, cross-sections, or whatever in seconds. The software just projects whatever view you want out of the master design file. Creating the paper blueprints similarly takes a minute or two at most on an industrial large-format printer. It just spits them out.
I usually differentiate between real managers who exist to make decisions, versus those who manage people. The latter are “overseers” not managers.
My apologies! Sincerely.
(If only the message I was responding to had had emojis and checkmarks for me to efficiently process it!!!!)
They put up a PR with all the obvious tells: the markdown table of files that changed, the description that basically parrots back things the human obviously wanted them to stress in the task ("this implements a secure, tested (no regressions) implementation of a Foo…"), and the code is an absolute mess of one-off functions placed in any random file with no thought to the way the codebase is actually organized.
Then I give feedback after spending like an hour going through their 2000-line change, and back comes an update with a very literal interpretation of my feedback that clearly doesn't understand what I was even saying. Complete with code comments that parrot back what I said ("// Use the expected platform abstractions for conversion (not bespoke methods)").
Reviewing coworkers' PRs feels like I'm just talking to the LLM directly at this point, but with more steps and less control over the output.
And they say meetings aren’t productive!
I'm not some DRY zealot, but I've been in the "this system needs really similar changes to a ton of geographically distant code for simple changes" salt mines a lot. The people who say that kind of spaghetti is unavoidable are just as wrong as the ones who say it can only be fixed with a grand rearchitecture by a rockstar.
And yet, here we are.
Of course, some things get disrupted, sometimes. But I'd hardly say all the bloat has been competed out, would you?
My current codebase is ~3 million LoC all in all (not greenfield, really old code), working on it by myself, the complexity is definitely manageable between Claude and me :)
If you spend a lot of time performing monotonous tasks, then your organisation needs to delete and refactor for a while until changes in 'hot' areas of the codebase are easy to make. Reaching for some code synthesis SaaS to paper over it will worsen the problem and should result in excommunication from the guild.