And on Jess' comments on validating docs vs generating them... It's a traditional locking problem, with traditional solutions. And it's not as if the agent cannot read git and realize when one thing is done first in anticipation of the other, by convention.
I'm quite senior: in fact, I have been a teammate of a couple of the people mentioned in this article. I suspect they'd not question my engineering standards. And yet I've not seen any of that kind of debt in my LLM workflows: if anything, by most traditional forms of evaluating software quality, the projects I work on are better than they were 5 or 10 years ago, using the same metrics as back then. And it's not magic or anything, but a matter of making sure the agents I run share those quality priorities. But I am getting work done, instead of spending time looking for attention at conferences.
My counter is that technical intent, in the way he is describing it, only exists because we needed to translate human intent into machine language. You can still think deeply about problems without needing to formulate them as domain-driven abstractions in code. You could mind-map it, or journal about it, or put post-it notes all over the wall. Creating object-oriented abstractions isn't magic.
A concrete example: I had a set of Python models that defined a database schema for a given set of logical concepts.
I added a new logical concept to the system, very analogous to the existing logical set. Claude decided that it should just re-use the existing model set, which worked in theory, but caused the consumers to have to do all sorts of gymnastics to do type inference at runtime. It "worked", but it was definitely the wrong layer of abstraction.
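To make the shape of that mistake concrete, here is a minimal sketch with hypothetical names (the real models were database models and more involved): the "reused" model forces every consumer into runtime type checks, while a parallel model keeps the distinction at the type layer.

```python
from dataclasses import dataclass

# What the LLM did (illustrative names): shoehorn the new concept into the
# existing model, forcing consumers to branch on a discriminator at runtime.
@dataclass
class Device:
    name: str
    kind: str      # "sensor" or "actuator" -- every caller must check this
    payload: dict  # shape depends on kind, so type checkers can't help

def describe(d: Device) -> str:
    if d.kind == "sensor":  # the runtime "gymnastics" consumers end up doing
        return f"sensor reading {d.payload['unit']}"
    return f"actuator driving {d.payload['channel']}"

# The right layer of abstraction: give the analogous concept its own model,
# so the types carry the distinction and no runtime inference is needed.
@dataclass
class Sensor:
    name: str
    unit: str

@dataclass
class Actuator:
    name: str
    channel: int
```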
This lines up with YAGNI, but most people believe the opposite, often using YAGNI to justify NOT building the necessary abstractions.
If you are thinking through deterministic code, you are thinking through the manipulation of bits in hardware. You are just doing it in a language which is easier for humans to understand.
There is a direct mapping of intent.
I agree! You often see this realized when projects slowly migrate to using more and more ctypes code to try and back out of that pit.
In a previous job, a project was spun up using Python because it was easier and the performance requirements weren't understood at the time. A year or two later it had become a bottleneck for tapeout, and when it was rewritten most of the abstract architecture was thrown out with it, since it was all Pythonic in a way that required a different approach in C++.
I don't think what Fowler says here is in favor of saddling the early versions of your system with abstractions before you've actually seen its use in practice, and its needs over time as requirements and conditions change.
From this quote, "Laziness drives us to make the system as simple as possible (but no simpler!) — to develop the powerful abstractions that then allow us to do much more, much more easily," it's clear that when he talks of abstractions he means very basic, as-simple-as-possible building blocks: core, orthogonal principles of the system.
Not the kind of piled-up software and design-pattern abstractions that, e.g., Java land used to build in the past.
> if anything, by most traditional forms of evaluating software quality, the projects I work on are better than what they were 5, 10 years ago, using the same metrics as back then.
This side sentence introduces a lot of vagueness. Can you share insights to validate your claim? What metrics are you using, and how does your code from 10, 5, and 0 years ago perform against them?
I feel throwing in a vague claim like that unnecessarily dilutes your message and distracts from the point. But if you do have more to share, I'd be curious to learn more.
It’s similar to how doing math in natural language without math notation is cumbersome and error-prone.
Reading the Hacker News comments, I kept thinking that programming is fundamentally about building mental models, and that the market, in the end, buys my mental model.
If we start from human intent, the chain might look something like this:
human intent -> problem model -> abstraction -> language expression -> compilation -> change in hardware
But abstraction and language expression are themselves subdivided into many layers. How much of those layers a programmer can afford not to know has a direct effect on that programmer’s position in the market. People often think of abstraction as something clean, but in reality it is incomplete and contextual. In theory it is always clean; in practice it is always breaking down.
Depending on which layer you live in, even when using the same programming language, the form of expression can become radically different. From that point of view, people casually bundle everything together and call it “abstraction” or “intent,” but in reality there is a gap between intent and abstraction, and another gap between abstraction and language expression. Those subtle friction points are not fully reducible.
Seen from that perspective, even if you write a very clear specification, there will always be something that does not reduce neatly. And perhaps the real difference between LLMs and humans lies in how they deal with that residue.
Martin frames the issue in a way that suggests LLM abstractions are bad, but I do not fully agree. As someone from a third-world country in Asia, I have seen a great deal of bad abstraction written in my own language and environment. In that sense, I often feel that LLM-generated code is actually much better than the average abstractions produced by my Asian peers. At the same time, when I look at really good programming from strong Western engineers, I find myself asking again what a good abstraction actually is.
The essay talks about TDD and other methodologies, but personally I think TDD can become one of the worst methodologies when the abstraction itself is broken. If the abstraction is wrong, do the tests really mean anything? I have seen plenty of cases where people kept chasing green tests while gradually destroying the architecture. I have seen this especially in systems involving databases.
The biggest problem with methodology is that it always tends to become dogma, as if it were something that must be obeyed. SOLID principles, for example, do not always need to be followed, but in some organizations they become almost religious doctrine. In UI component design, enforcing LSP too rigidly can actually damage the diversity and flexibility of the UI. In the end, perhaps what we call intent is really the ability to remain flexible in context and search for the best possible solution within that context.
From that angle, intent begins to look a lot like the reward-function-based learning of LLMs.
In the olden days we used Duff's devices and manually unrolled loops with duplicated code that we wrote ourselves.
Now, the compiler is "smart" enough to understand your intent and actually generates repeated assembly code that is duplicated. You don't care that it's duplicated because the compiler is doing it for you.
I've had some projects recently, using an LLM, where I needed a few snippets of non-trivial computational geometry. In the old days, I'd have to go search for a library, get permission from compliance to import it, and then convert my domain representations of stuff into the formats that library needed. All of that would have been cheaper than me writing the code myself, but it was non-trivial.
Now the LLM can write for me only the stuff I need (no big extra library to import), and it will use the data in the format I stored it in (no need to translate data structures). The canon says the "right" way to do it would be to have a geometry library to prevent repeated code, but here I have a self-contained function that "just works".
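As an illustration of the kind of snippet meant here (the actual code was different), a self-contained point-in-polygon test via ray casting, working directly on plain `(x, y)` tuples rather than a geometry library's own types:

```python
def point_in_polygon(point, polygon):
    """Return True if point lies inside polygon (a list of (x, y) vertices).

    Classic ray-casting: shoot a horizontal ray from the point and count
    how many polygon edges it crosses; an odd count means "inside".
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the ray's y-coordinate?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

No imports, no conversion layer: the caller's existing tuples go straight in, which is exactly the trade-off described above.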
I'm a proponent of architectural styles like MVC, SOLID, hexagonal architecture, etc, and in pre-LLM workflows, "human laziness" often led to technical debt: a developer might lazily leak domain logic into a controller or skip writing an interface just to save time.
The code I get the LLM to emit is a lot more compliant with those, BUT there is a caveat: LLMs do have a habit of "forgetting" the specific concerns of the given file/package/etc., and I frequently have to remind them.
The "metric" improvement isn't that the LLM is a better architect than a senior dev; it's that it reduces the cost of doing things the right way. The delta between "quick and dirty" and "cleanly architected" has shrunk to near zero, so the "clean" version becomes the path of least resistance.
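A hypothetical sketch of that shrinking delta, with illustrative names only: the "quick and dirty" controller buries a business rule in the transport layer, while the "clean" version moves it into the domain at almost no extra cost.

```python
# Lazy version: domain logic (a discount rule) leaked into the controller.
def checkout_controller_lazy(cart_total: float, is_member: bool) -> dict:
    if is_member and cart_total > 100:  # business rule hidden in transport code
        cart_total *= 0.9
    return {"status": 200, "total": round(cart_total, 2)}

# Clean version: the rule lives in the domain; the controller only adapts it.
def member_discount(total: float, is_member: bool) -> float:
    """Domain rule (illustrative): members get 10% off orders over 100."""
    return total * 0.9 if is_member and total > 100 else total

def checkout_controller(cart_total: float, is_member: bool) -> dict:
    return {"status": 200,
            "total": round(member_discount(cart_total, is_member), 2)}
```

Both behave identically today; the difference only shows up when the rule needs to be reused or tested outside the controller.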
I'm seeing fewer "temporary" kludges because the LLM almost blindly follows my requests.
Using a formal language also helps you enter a kind of flow. And then details you did not think about before using the formal language may appear. Not everything can be prompted, just as Alex Honnold prepared his climb of El Capitan very carefully, but it was only when he was on the rock that he took the real decisions. Same for Lindbergh when he crossed the Atlantic. The map is not the territory.
That being said, even model-based design (MBD) has largely been a failure, despite it being about mapping formal models to (formal-language) program code.
If you invent a formal language that is easy to read and easy to write, it may look like Python... Then someone will probably write an interpreter.
We have many languages, senior people who know how to use them, who enjoy coding, and who don't have a "lack of productivity" problem. I don't feel the need to throw away everything we have to embrace what is supposed to be "the future". And since we need good devs to read and review LLM-generated code, how do you remain a good dev if you don't write code anymore? What's the point of being up to date in language X if we don't write code? Remaining good at something without doing it is a mystery to me.
It works relatively well but not always.
I realize that most researchers use AI to assist with writing, but when the topic of your paper is "cognitive surrender", I struggle to take any content in there seriously.
I think the "cognitive bottlenecks" in software engineering live between artifacts, and code is simply one of those artifacts.
outcome → requirements → spec → acceptance criteria → executable proof → review
I'm making experimental tooling that automates the boring parts around those transitions, while keeping humans focused on validating that intent survived each step.
I assure you, they do not.
This is disgusting
My most recent Claude Code fix consisted of one line: calling `third_party_lib._connect()`. It reaches into the internals of an external library. The fix worked, but it is improper to depend on the specific implementation. The correct fix was about 20 lines.
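To illustrate the difference (all names here are hypothetical; only the `third_party_lib._connect()` call is from the actual incident), the one-liner versus the shape of a fix that stays on the public API:

```python
# The one-line LLM "fix": reach into a private internal to force a reconnect.
# Works today; breaks whenever the library renames or reworks _connect().
def refresh_lazy(client):
    client._connect()

# The shape of the proper ~20-line fix: use only public entry points, and
# rebuild the client when the connection has gone stale.
def refresh_proper(make_client, client, is_alive):
    """Return a usable client, recreating it via the public factory if needed.

    make_client: public constructor/factory for the client (assumed to exist)
    is_alive:    public health check, e.g. a ping/status call
    """
    if is_alive(client):
        return client
    try:
        client.close()  # public teardown, if the library offers one
    except Exception:
        pass            # a stale client may fail to close cleanly
    return make_client()  # fresh client built through supported entry points
```

The longer version costs more keystrokes but survives library upgrades, which is the whole point of the "correct fix".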
(Tangentially, this is why I think LLMs are more useful for senior developers because junior developers tend to not have a sense for what's good quality and accept whatever works.)
Here's an image you can open on the phone: https://pbs.twimg.com/media/HGjHvSsWIAAkhHL?format=jpg&name=...
I also did a post explaining the reasoning behind this diagram: https://x.com/br11k_dev/status/2047105958451507268
But I'll make a proper post on HN once I have all ingredients ready!
- Minimal CLI tooling
- Jupyter Lab you can go through step by step, on an example greenfield project (URL shortener app)
- Blog post on what I've been doing for the last 2 months
I'm kidding. But yes, I explicitly didn't model it yet. The bigger vision is there's a reason for Spec to exist, right?
And that would be Outcome.
> "We observed that users share 100+ characters long links too often and they are frustrated when it doesn't work / crop / browser address bar limitations"
So the outcome is: "Users no longer have to worry about long URLs". And then you have an idea, a spec: "what if we let them create and use short URLs for sharing?" -> URL shortener app.
And yes, this ERD is easily expandable. I'd rather not add more fields but keep the "core" schema short and nice.
Things like outcome, observations, and analytics can simply be extra tables linking to Spec, ACs, etc.: Jira tickets, Datadog dashboards, Tableau analytics, whatever makes sense to teams. And it doesn't require you to set up a Postgres instance; the MVP would run on sqlite3.
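A minimal sketch of what that sqlite3-backed core could look like; all table and column names are my assumptions, not a published schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the MVP would use a single local file

conn.executescript("""
    -- The short "core" schema: specs and their acceptance criteria.
    CREATE TABLE spec (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE acceptance_criterion (
        id          INTEGER PRIMARY KEY,
        spec_id     INTEGER NOT NULL REFERENCES spec(id),
        description TEXT NOT NULL
    );
    -- Everything else hangs off the core as simple link tables.
    CREATE TABLE outcome_link (
        spec_id      INTEGER NOT NULL REFERENCES spec(id),
        source       TEXT NOT NULL,      -- e.g. 'jira', 'datadog', 'tableau'
        external_ref TEXT NOT NULL       -- ticket key, dashboard URL, ...
    );
""")

conn.execute("INSERT INTO spec (id, title) VALUES (1, 'URL shortener app')")
conn.execute(
    "INSERT INTO acceptance_criterion (spec_id, description) VALUES (1, ?)",
    ("Users no longer have to worry about long URLs",),
)
```

The point of the sketch is only that the "extra" concerns are plain link rows, so teams can attach whatever tooling they already use without touching the core tables.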
I've also seen a lot of effort spent trying to link different systems together specifically for simpler context access for agents. "RAG enterprise intelligent search" it is.
What's concerning to me is that even Sourcegraph hasn't thought about what I've been thinking since 2015: linking specs to code directly, via SCIP. I should be able to press a "find specs" button, in addition to "find references" and "find implementations". And I strongly believe they are sitting on a gold mine right now.
From my experience, it all comes down to code, and so code was the first-class artifact for a long time. Up until I realized that code is only a lossy representation of the spec artifacts. And if nobody ever records spec as an artifact...
What I'm saying is that the pain is real, I've been here for a long enough time. And I should be able to at least use something like this even if the industry doesn't want to.
As we see LLMs churn out scads of code, folks have increasingly turned to Cognitive Debt as a metaphor for capturing how a team can lose understanding of what a system does. Margaret-Anne Storey thinks a good way of thinking about these problems is to consider three layers of system health:
- Technical debt lives in code. It accumulates when implementation decisions compromise future changeability. It limits how systems can change.
- Cognitive debt lives in people. It accumulates when shared understanding of the system erodes faster than it is replenished. It limits how teams can reason about change.
- Intent debt lives in artifacts. It accumulates when the goals and constraints that should guide the system are poorly captured or maintained. It limits whether the system continues to reflect what we meant to build and it limits how humans and AI agents can continue to evolve the system effectively.
While I’m getting a bit bemused by debt-metaphor proliferation, this way of thinking does make a fair bit of sense. The article includes useful sections to diagnose and mitigate each kind of debt. The three interact with each other, and the article outlines some general activities teams should do to keep it all under control.
❄ ❄
In the article she references a recent paper by Shaw and Nave at the Wharton School that adds LLMs to Kahneman’s two-system model of thinking.
Kahneman’s book, “Thinking Fast and Slow”, is one of my favorite books. Its central idea is that humans have two systems of cognition. System 1 (intuition) makes rapid decisions, often barely-consciously. System 2 (deliberation) is when we apply deliberate thinking to a problem. He observed that to save energy we default to intuition, and that sometimes gets us into trouble when we overlook things that we would have spotted had we applied deliberation to the problem.
Shaw and Nave consider AI as System 3:
> A consequence of System 3 is the introduction of cognitive surrender, characterized by uncritical reliance on externally generated artificial reasoning, bypassing System 2. Crucially, we distinguish cognitive surrender, marked by passive trust and uncritical evaluation of external information, from cognitive offloading, which involves strategic delegation of cognition during deliberation.
It’s a long paper that goes into detail on this “Tri-System theory of cognition” and reports on several experiments they’ve done to test how well the theory can predict behavior (at least within a lab).
❄ ❄ ❄ ❄ ❄
I’ve seen a few illustrations recently that use the symbols “< >” as part of an icon to illustrate code. That strikes me as rather odd; I can’t think of any programming language that uses “< >” to surround program elements. Why that and not, say, “{ }”?
Obviously the reason is that they are thinking of HTML (or maybe XML), which is even more obvious when they use “</>” in their icons. But programmers don’t program in HTML.
❄ ❄ ❄ ❄ ❄
Ajey Gore asks: if coding agents make coding free, what becomes the expensive thing? His answer is verification.
What does “correct” mean for an ETA algorithm in Jakarta traffic versus Ho Chi Minh City? What does a “successful” driver allocation look like when you’re balancing earnings fairness, customer wait time, and fleet utilisation simultaneously? When hundreds of engineers are shipping into ~900 microservices around the clock, “correct” isn’t one definition — it’s thousands of definitions, all shifting, all context-dependent. These aren’t edge cases. They’re the entire job.
And they’re precisely the kind of judgment that agents cannot perform for you.
Increasingly I’m seeing a view that agents do really well when they have good, preferably automated, verification for their work. This encourages such things as Test Driven Development. That’s still a lot of verification to do, which suggests we should see more effort to find ways to make it easier for humans to comprehend larger ranges of tests.
While I agree with most of what Ajey writes here, I do have a quibble with his view of legacy migration. He thinks it’s a delusion that “agentic coding will finally crack legacy modernisation”. I agree with him that agentic coding is overrated in a legacy context, but I have seen compelling evidence that LLMs help a great deal in understanding what legacy code is doing.
The big consequence of Ajey’s assessment is that we’ll need to reorganize around verification rather than writing code:
If agents handle execution, the human job becomes designing verification systems, defining quality, and handling the ambiguous cases agents can’t resolve. Your org chart should reflect this. Practically, this means your Monday morning standup changes. Instead of “what did we ship?” the question becomes “what did we validate?” Instead of tracking output, you’re tracking whether the output was right. The team that used to have ten engineers building features now has three engineers and seven people defining acceptance criteria, designing test harnesses, and monitoring outcomes. That’s the reorganisation. It’s uncomfortable because it demotes the act of building and promotes the act of judging. Most engineering cultures resist this. The ones that don’t will win.
❄ ❄ ❄ ❄ ❄
One of the questions that comes up when we think of LLMs-as-programmers is whether there is a future for source code. David Cassel on The New Stack has an article summarizing several views of the future of code. Some folks are experimenting with entirely new languages built with the LLM in mind; others think that existing languages, especially strictly typed languages like TypeScript and Rust, will be the best fit for LLMs. It’s an overview article, one with lots of quotations but not much analysis of its own, yet it’s worth a read as a good survey of the discussion.
I’m interested to see how all this will play out. I do think there’s still a role for humans to work with LLMs to build useful abstractions in which to talk about what the code does - essentially the DDD notion of Ubiquitous Language. Last year Unmesh and I talked about growing a language with LLMs. As Unmesh put it:
Programming isn’t just typing coding syntax that computers can understand and execute; it’s shaping a solution. We slice the problem into focused pieces, bind related data and behaviour together, and—crucially—choose names that expose intent. Good names cut through complexity and turn code into a schematic everyone can follow. The most creative act is this continual weaving of names that reveal the structure of the solution that maps clearly to the problem we are trying to solve.