We are absolutely drowning in documentation and code that seems legit and the only recourse is to lean on AI to help process the sheer quantity of it. I have a feeling that the fallout from this phase of the industry is going to be an exotic form of technical debt that is remarkable mostly in its enormity.
When starting on a new codebase, how do you make yourself into a helpful contributor as quickly as possible? I go straight for the humans and their human docs. What problem was the system originally built to solve? What was the original design, and what were its biggest problems? Who is currently using it? If you know these, reading the code is much easier because you can guess why things were done the way they are.
Also, this blog post has gotten popular: https://blog.gpkb.org/posts/just-send-me-the-prompt/
I think Charity is observing a very old problem and expecting the new technology to lead to a new solution of some kind. I doubt she thinks even the current generation of tools is the end of the AI software development story. She's not saying we'll drop design docs right into Claude Code and walk away (design docs aren't complete either; that's why, when you're ramping up, you also have to talk to people, read old tickets and postmortems, etc.).
What she's observing is that, in prod, people don't like infra where it's hard to tell how it got into its current state, and so infra-as-code is what we do now. She's also observing that "it's hard to tell how it got into its current state" is the status quo with codebases, which other people have observed going back to "Programming as Theory Building" and earlier. And she's expecting that, analogous to infra, software development will somehow be done with tools focused on making "how the code got into its current state" clearer.
> Code becomes precious when it is the only place knowledge lives.
Reading AI code all day is _agonizing_. Just, a horrible way to live, and it melts people's brains at the moment you need them to be the most capable.
Manual programming has this really productive and gratifying feedback loop, where you read the code, write the code, and fix it until it compiles/runs/does what you want. AI code not only does half that for you, but it makes the "click" at the end uninspiring because you're never sure if it's cheated a bit to get to that moment.
Trying to operate with AI-generated code as the only durable artifact of programming is a dead end for the industry. Charity points to (and correctly discards) architecture diagrams/specs as an interesting space to work in. My suspicion is that it's closer to the thing that's hand-written: prompts, markdown plans, and other nudges. Focus on the thing that you, as a human, produce: that's the basis for the core loop of "did the AI follow my instructions", and it's also higher-leverage when you go to code review.
By the time you get to the PR, you've probably typed enough to Claude that you can regenerate the code, but the current industry default is to just throw away all those sessions and ship the code. That's backwards!
> A sufficiently detailed specification is runnable code.
In a way I think LLMs will enable the dream of 4gl and "sufficiently smart compilers"[c].
LLMs aren't smart, but they are capable. Especially capable of translation and transformation.
I can certainly see them help move the abstraction horizon at which we work - so that rigid high level descriptions of the desired logic/process along with the process for quality testing - become the relevant curated artifacts - and the generated go/rust/java/python/etc code become incidental and mutable; subject to constant rewriting as part of the deployment of systems.
[c] You know, the ones that take naive C/C++ and produce executables that fully leverage RISC/EPIC platforms to be better than CISC. See also: Intel Itanium
AI coding speed and performance now outperform most humans, but AI still needs a human to command it. Yes, you can let an AI agent manage sub-agents, but a human is still the manager at the top who tells the AI what should be written.
So the human must be in command and have the final say on when it's done.
Is laziness still a good virtue in AI era?
In general most developers are going to find themselves fighting incentives which will color their opinion. AI isn't there yet, but if you are going to base your whole worldview on a point on a graph and not on the trajectory, you are in for a bad time.
> That question was answered decisively last November.
It's easy to forget that people said this exact thing about every model after GPT 3.5. This is a standard trick the industry uses to invalidate negative experience with LLMs. 'You are prompting it wrong' becomes 'you are using Gemini, but you should use Claude' which then becomes 'well, all of your criticism is now irrelevant, because everything is fixed in this new version'.
This "discussion" about capabilities is set up to be asymmetrical and basically non-falsifiable.
Writing software begins with a solid design that is defensible. If you don't have that, the AI will produce slop.
Once you're happy with the design, you need a solid plan. If you don't have that, the AI will produce slop.
Once you're happy with the plan, you can set the AI loose, but don't get too complacent! Anything that you missed in the previous phases could very well lead to slop (although likely localized).
And then, as your project matures and you gain more understanding of the space, you start to notice deficiencies in your model. This is where AI really shines: design and code changes to adapt to reality.
LLMs are prolific and they love to add shit. Truly capable engineers are able to achieve more business outcomes with less code / fewer moving parts.
I've yet to have Opus 4.8 fail me with defensive explicit code. Often it'll write code that is better than what I might have done. I imagine it would be a nightmare to go through one of the OOP debug chains with implicit error handling, but when every function has a runtime assertion which is basically the contract for how it is supposed to work and exactly what to do if it encounters a corrupt state, then things are just so much easier with AI.
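To make the style concrete, here is a minimal sketch of a function whose runtime checks act as its contract. Everything in it (`Account`, `withdraw`, `CorruptStateError`) is a hypothetical illustration, not code from a real project, and the specific checks are only assumptions about what such a contract might contain.

```python
from dataclasses import dataclass


class CorruptStateError(Exception):
    """Raised when a contract check fails; callers decide how to recover."""


@dataclass(frozen=True)
class Account:
    # Hypothetical record used only for this illustration.
    account_id: str
    balance_cents: int


def withdraw(account: Account, amount_cents: int) -> Account:
    """Return a new Account with amount_cents removed from the balance."""
    # Preconditions: the explicit contract for what this function accepts.
    if amount_cents <= 0:
        raise ValueError(f"amount_cents must be positive, got {amount_cents}")
    if account.balance_cents < 0:
        # The input is already corrupt; refuse to build on top of it.
        raise CorruptStateError(f"{account.account_id} has a negative balance")
    if amount_cents > account.balance_cents:
        raise ValueError("insufficient funds")

    updated = Account(account.account_id, account.balance_cents - amount_cents)

    # Postcondition: the result must still satisfy the invariant.
    if updated.balance_cents < 0:
        raise CorruptStateError("withdrawal produced a negative balance")
    return updated
```

The point of writing it this way is that a reviewer (human or model) can read the checks at the top and bottom of the function and know what it promises without tracing callers.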
I do agree with you on documentation. The amount we have has exploded in the post AI world. Which is a little ironic since the assertion is frankly what you'll need to know and not the 10 pages of prose the AI autogenerated in the shared loop (microsoft's terrible confluence). It is what it is though, and at least it's easier to meet EU compliance rules now, since those are more about the bureaucracy than actual security.
Chalk up yet another echo of the 1920s Gilded Age? Between all these economic spasms and the simultaneous tilting towards fascism, I think there is way too much historical rhyming going on right now...
> When starting on a new codebase, how do you make yourself into a helpful contributor as quickly as possible? I go straight for the humans and their human docs. What problem was the system originally built to solve? What was the original design, and what were its biggest problems? Who is currently using it? If you know these, reading the code is much easier because you can guess why things were done the way they are.
This is the way but plenty of engineering teams don't have any human docs at all. Decisions are made in one engineer's head or in a chat that isn't saved. The spec was just a few notes in a ticket that was deleted during cleanup or lost when the team changed trackers. There's no map of the codebase or features, no ADRs, minimal observability. All you have is the code. You read the code to try and figure out what is going on then ping an engineer who made a recent commit to a specific area to ask if they remember why something was done the way it was. Someone makes a change and it breaks something on the other side of the codebase that they thought was totally unrelated, etc.
Was this article written by AI? It's certainly stupid enough!
- Schema validation with appropriate size limits on all relevant fields.
- Authentication.
- Access control.
- Backpressure management and rate limiting in case a (possibly malicious) user tries to perform too many computationally expensive actions in a short time.
- Ensuring that the actions of one user don't throttle another user who is connected to the same process/host, e.g. using async constructs to avoid freezing the main process.
- DDoS mitigation.
- Avoiding race conditions.
- Designing a good database schema, with well chosen indexes, with deterministic IDs/idempotency to avoid double-insertion scenarios. You don't want to be forced to rely on overly complex queries with a lot of joins. This doesn't scale well and is rarely necessary.
- Logging and error handling.
- Avoiding conflicts and accidental overwrite with old data when multiple users are editing different fields of the same resource concurrently.
- Efficient distribution of realtime messages.
- Scalability.
The list goes on and on... And every piece has to be implemented perfectly. This involves a huge number of carefully thought-out decisions.
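Two items from the list above, size-limited validation and deterministic IDs for idempotent inserts, can be sketched in a few lines. This is only an illustration under stated assumptions: `validate_note`, `deterministic_id`, `NoteStore`, and the size limits are all made up here, and the dictionary stands in for a real table with a unique key.

```python
import hashlib
import json

MAX_TITLE_LEN = 200    # assumed limit, for illustration only
MAX_BODY_LEN = 10_000  # assumed limit, for illustration only


def validate_note(payload: dict) -> dict:
    """Check types and size limits before the payload goes any further."""
    title = payload.get("title")
    body = payload.get("body")
    if not isinstance(title, str) or not 0 < len(title) <= MAX_TITLE_LEN:
        raise ValueError("title must be a non-empty string within the size limit")
    if not isinstance(body, str) or len(body) > MAX_BODY_LEN:
        raise ValueError("body must be a string within the size limit")
    return {"title": title, "body": body}


def deterministic_id(user_id: str, client_request_id: str) -> str:
    """Derive the same ID for the same logical request, so retries collide."""
    # JSON-encoding the pair keeps ("a", "b:c") distinct from ("a:b", "c").
    key = json.dumps([user_id, client_request_id]).encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:32]


class NoteStore:
    """In-memory stand-in for a table with a unique primary key."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def insert_once(
        self, user_id: str, client_request_id: str, payload: dict
    ) -> tuple[str, bool]:
        """Insert the note unless this request was already applied.

        Returns (note_id, created); created is False for a replayed request.
        """
        note = validate_note(payload)
        note_id = deterministic_id(user_id, client_request_id)
        if note_id in self._rows:
            return note_id, False
        self._rows[note_id] = {"user_id": user_id, **note}
        return note_id, True
```

A client that retries the same request after a timeout gets the same `note_id` back with `created` set to `False`, instead of a second row. In a real database the uniqueness check would be enforced by the key constraint rather than by an `in` test.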
But it's also the exact sort of thing that LLMs are literally perfect for in my experience so there's really no excuse anymore. I've never seen Claude fail to turn a 5k PR into a well-decomposed Graphite stack.
> perfect formatting and at least superficial plausibility
Basically, it is going to take time to see that a library full of books with nice covers is just filled with lorem ipsum. Before, they couldn't stand up a fake library.
The issue comes down to time and effort.
First product compares the code to the prompts and highlights places the agent made decisions you weren't involved in: https://tern.sh/docs/tours/
If you buy that, then it follows that the more work you accomplish with AI, the "lazier" of a dev you are.
The author makes the wrong assumption though that the majority of people who are doing engineering want to do even more engineering.
It’s my experience that most technology workers just want a high paycheck and to have some kind of association with being in tech and doing cool things.
1. What a C compiler was
2. What a C compiler looked like
3. What the C compiler had to do at runtime to pass gcc’s torture suite through some sort of collaborative iteration (compile, run, did it get stuck at some torture suite test or fail?)
Remove 1 and 2, or replace it with imperfect business logic, and you’re left with a system that is built to _only_ pass the tests you supply it, or in the most extreme case, print(“unit and functional tests pass!”)
> What happened in 2025 was this: the economics of code production were turned upside down. Instead of being very hard, time-consuming, and expensive to generate code, it became effectively free and instant. Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight.
It's not so much that "the economics [...] were turned upside down" as that a manufacturing process that used to be strictly additive (akin to 3D printing) is now complemented by a subtractive process (akin to CNC milling). The "shape" that is demanded hasn't really changed, and nor has the human effort (as long as you care about achieving certain tolerances). You still have to "treasure, reuse, care for, and curate" your product to whatever degree the market demands.
Also I disagree with:
> Lines of code are not the ideal artifact to review
What does "ideal" mean here? When I was growing up "show your work" was the rule for all examinations. Why? Because we're working to improve mental models and thought processes for the next generation, not just products we will release tomorrow.
Yeah, I can see how that is now mistaken for a definition of 'engineer' or 'hacker'.
I am sorry you never knew what engineering truly means.
I really don't know how I'm supposed to reply to stuff like this.
> What does "ideal" mean here? When I was growing up "show your work" was the rule for all examinations. Why? Because we're working to improve mental models and thought processes for the next generation, not just products we will release tomorrow.
They're saying that the mental models and thought processes are incredibly important but that code is not the place for that work to live.
What I meant is that, insofar as some work has been produced with a human mind involved and where imperfect abstractions are used, one should not for whatever idealistic reasons push for reviewing the work at some coarser granularity than the details which are readily available. That's a way to foster and encourage mistakes, in both the work and in the mental model.
So when you say that code is not the place for that work to live, you are essentially purporting that there is a perfect abstraction that can generally be trusted, which I disagree is currently the case for an LLM spec versus produced code.
They’re important for discussion and brainstorming. They’re also important for sharing context before reviewing. But code is the only perfect representation in terms of semantics of what the computer will do.
You can have all the diagrams and all the prose you want, but they’re still ambiguous.
A few days back I wrote a piece called “AI enthusiasts are in a race against time, AI skeptics are in a race against entropy.”
I have notes on a whole pile of AI-related topics that I’d like to cover in depth: AI mandates, communication norms, code review, AI art, and more. Unfortunately, I got too many interesting responses to my last piece, and now I have to address those before I can move on to other topics. 😉
There were two types of interesting responses: the first on the technical merits, the second on ethical grounds. I will respond to each of these separately. Let’s take the technical side first, because it’s easier.
Somehow, a subset of readers came away believing I was telling everyone to ditch code review and push their shittiest code straight into production without reading it, right now, tout suite.[1]
That is not what I am doing. That is not what I think you should do. But I did not pick that example at random, and I will tell you why.
It’s easy to forget, but for most of 2025, the idea that AI-generated code was slop and might always be slop was not only a reasonable position to hold, it was the default, mainstream position.[2]
That question was answered decisively last November. Ever since Opus 4.5 came out, AI has been able to generate code that is approximately as good as that of the median software engineer, at least for common patterns, and much faster and more cheaply. I came out of a book hole and realized this in January, and over the first few months of 2026, it seemed like everyone around me was having a similar realization.
But many saw it coming much sooner.
The popular narrative holds that Opus 4.5 was what changed. But Opus 4.5 was more like the tipping point. Agentic harnesses (the code that wraps the LLM in a loop with tools) became a real thing in mid 2025, with precursors building back to late 2024. Tool use, function calling, MCPs…all of this wave was building over the course of 2025, and crested into real general purpose usability at the end of the year.
That’s what the enthusiasts were trying to tell us last year. Not only “this is coming”, but “this is coming faster than you think.”
As it turns out, they were right.
As you may know, I come from the reliability side of the house. The compliment I will pay to myself and my people is that we do not struggle to adapt to new realities. As soon as a problem is real and in front of us, we adjust smoothly, even eagerly, thanks to an unwholesome zest for lapping up disgusting technical messes (and the campfire tales we get to tell later).
The un-compliment I will pay myself and my people is that we sometimes struggle to accept that progress is real, that the continued existence of bugs and edge cases does not diminish the fact that huge swaths of problem space do get more-or-less solved over time, to the point they can be taken for granted by most people.[3]
The speed at which code went from total crap to “ah damn, that’s not bad” is what I have in the back of my mind, as enthusiasts are telling us that harness engineering and AI validation is real, it’s already here, and it’s getting better astonishingly fast.
Holding out for “I’ll believe it when I see it” was forgivable the first time, but much less so the second time. This is what it feels like to be on the inside of an exponential change curve, turns out.[4]
I want to pause here and be very clear about what I think is happening. Then I’m going to tell you what specifically I am excited about, and why.
You are under no obligation to join me there. But there are way too many sweeping statements out there right now about “it was never X” — “it was always Y” — “the future belongs to xyzzy” 🤮 — and I want to be crystal clear how conditional and specific and contextual my claims are.
What happened in 2025 was this: the economics of code production were turned upside down. Instead of being very hard, time-consuming, and expensive to generate code, it became effectively free and instant. Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight.
For most of computing history, the primary way people have learned to understand software is by writing the code. Once you've achieved some mastery, reading and discussing code gets you most of the way there. (I might argue that software engineers have always relied far too heavily on the code instead of sensemaking the system through observability.)
Many great software engineers hold that the true product of every (good) software engineering team has always been a shared understanding of the software we own. That it gets stored as cache state in our fragile little meat brains, frequently flushed to disk, deployed to production, committed to github, but our minds are where meaning has always lived.
Is it any wonder that software has always been such a fiercely collectivist endeavor, exquisitely sensitive to relationship dynamics and manners and questions of fairness and emotional valence? It’s exactly what you’d expect when part of your brain lives in other people’s brains, and your collective interdependence is sky high.
It’s something that I love about this industry. But there’s no denying that minds have been a poor container for certain aspects of the software development model. We are forgetful, distractible, impatient. We are bad at spotting small details, we grow habituated to repetition. Worst of all, the model in our heads diverges massively and perpetually from the world our users interact with.
Anyway, SREs have never quite bought that explanation. To us, it’s clear that the true product of every (good) software engineering team is production.
Only prod is prod. Test in prod, or live a lie.
(This is all backstory. I am getting to the point, I promise.)
We issued our AI mandate last August.[5] I had seen enough to know that this was happening, and it was time to do the responsible thing. Honeycomb is a devtools company, and people come to us to help with hard problems on the forefront of technology. I was all in on AI, but I can’t say I was super excited about it, in my heart of hearts.[6]
Then I found Chad Fowler’s writings on Phoenix Architectures.
If you don’t know what I’m talking about, you should honestly stop reading my shit right now and go read his. Chad is the guy who coined the term “immutable infrastructure” in 2013. His best-known essay is “Relocating Rigor”, because Martin Fowler[7] mentioned it recapping a Thoughtworks meetup on the future of software. I replied with “Production Is Where the Rigor Goes”, complaining that they didn’t talk about production enough.
When I wrote that, I think “Relocating Rigor” was the only piece I had read. But soon I found the rest of it, and after reading two or three essays, it just clicked. I knew exactly what he was talking about. I could predict the rest of what he was going to say. And then, reader…then I got excited.
I am going to give you a small sample of Chad quotes, just enough to get the gist. Here’s one from “The Death and Rebirth of Programming”.
Immutable infrastructure. Stateless services. Containers. Blue-green deployments. Infrastructure as code.
These ideas all share a common premise: never fix a running thing. Replace it.
AI pushes this premise beyond infrastructure and into application code itself. When rewriting is cheap, editing in place becomes risky. Mutation accumulates entropy. Replacement resets it.
Another favorite: “The Deletion Test”.
Here’s a simple test you can apply to any software system you work on:
Imagine deleting the entire implementation.
Most engineers experience deletion as existential. Code feels like the thing. It’s what we write, review, version, deploy, and debug. Losing it feels like losing the system itself.
When people say, “We can’t just throw the code away,” what they usually mean is something more precise:
We don’t know exactly what behavior is required.
We don’t know which failures are unacceptable.
We don’t know what invariants must always hold.
We don’t know how to tell if a new version is correct.
We don’t know which bugs are intentional fixes for forgotten edge cases.
Those are not code problems. They are evaluation problems.
Code becomes precious when it is the only place knowledge lives.
and,
For most of software history, treating code as durable was reasonable.
We treated code as permanent because the labor to produce it was the bottleneck. Rewriting was expensive. Re-validation was risky. Implementations accumulated meaning over time. Structure, tests, comments, bug fixes, and tribal knowledge fused into something you learned not to disturb.
That made sense when production was the constraint.
When regeneration is easy, code stops being an asset and starts acting as a cache: a materialized view of understanding that is useful while current, disposable when stale.
“A materialized view of understanding that is useful while current, disposable when stale.” I think that might have been the exact line that made it click in my head.
I am just barely old enough that my first job title was “System Administrator”. I was a teenager, working at the university, with root on every machine in the days before they learned they should definitely not do that.[8]
I lived through the shift from handcrafted server pets to immutable infrastructure cattle. I didn’t really understand what was happening at the time, but I’ve contemplated it a lot in recent years. I wrote this in the final chapter of “Observability Engineering”, 2nd edition (available for download as of Wednesday, June 17th!):
The shift from handcrafted servers to immutable infrastructure taught us that mutability is the sworn enemy of understanding. Any artifact that is edited in place creates drift. Drift is what makes systems impossible to maintain.
Our ability to kill and regenerate infrastructure components is the reason we trust it. At Honeycomb, we kill the oldest Kafka node off via cron every Tuesday. That’s why we are confident in our bootstrapping and balancing processes: everything is repeatable, the data can be regenerated, the commitments live elsewhere.
The fact that we cannot regenerate our code in the same way is a sign that we do not understand it. We do not know which commitments we have made, we do not know which dependencies will break. We find them by breaking them, mostly.
Think of all the years of your working life you have wasted on painful migrations and rewrites. Think of replacing load-bearing legacy code. Think of all the strangler figs.
Lines of code have been doing too much. The code has been the bundled up repository of developer intent, user expectations, implicit and explicit behaviors, the only fossilized composite record we have of bugs gone by. It’s too much!
And look at all the domains that have been neglected due to the towering, all-consuming expense of maintaining and mutating lines of code. Where are the artifacts I can review and discuss to understand how our architecture is evolving? Where are our architecture artifacts, period? What if we could discuss and converge on an architecture diagram, and the code could be regenerated from changes to the architecture, instead of the architecture being kinda-sorta inferred from the code?
I am not asserting that all code will eventually be AI-generated to spec, bypassing human understanding. The feasibility of this whole endeavor hangs on the question of what a spec is, or what a spec could be. Anyone who has ever done a painful database migration should have learned some goddamn humility about our ability to extract and formalize users’ expectations in a replayable, automate-able way.
But I think that every step we can take in that direction will be good for us.
The tools to do this don’t exist yet, but many of the ideas do exist. Most come from operations and QA, two domains that software engineering has historically been rather snobbish about.
Those tests and techniques are not about testing for correctness or what ought to be happening, they are about observing and encoding what is happening. Behavioral tests, characterization tests, capture/replay, traffic splitters. Observability (the good kind).
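A characterization test in the capture/replay spirit might look roughly like this. The sketch records what a hypothetical existing function actually does, then replays those recordings against a hypothetical rewrite. `legacy_shipping_cents`, `regenerated_shipping_cents`, and the pricing quirk are invented for illustration, and real capture/replay would record production traffic rather than a hand-picked list of inputs.

```python
import json
from typing import Callable


def legacy_shipping_cents(weight_grams: int) -> int:
    """Hypothetical existing implementation whose behavior we want to pin down."""
    if weight_grams <= 0:
        return 0
    base = 500
    # Quirk: every started kilogram after the first costs extra.
    extra_kg = max(0, (weight_grams - 1) // 1000)
    return base + extra_kg * 150


def regenerated_shipping_cents(weight_grams: int) -> int:
    """Hypothetical rewrite that looks right but charges per full kilogram."""
    if weight_grams <= 0:
        return 0
    return 500 + (weight_grams // 1000) * 150


def capture(fn: Callable[[int], int], inputs: list[int]) -> str:
    """Record what fn actually returns for each input, as a JSON snapshot."""
    return json.dumps({str(i): fn(i) for i in inputs}, sort_keys=True)


def replay(fn: Callable[[int], int], snapshot: str) -> list[str]:
    """Run a candidate against the snapshot and list every behavioral difference."""
    recorded = json.loads(snapshot)
    diffs = []
    for raw_input, expected in recorded.items():
        actual = fn(int(raw_input))
        if actual != expected:
            diffs.append(f"input {raw_input}: recorded {expected}, got {actual}")
    return diffs
```

Here `replay(regenerated_shipping_cents, capture(legacy_shipping_cents, [500, 1000, 2000]))` reports differences at 1000 and 2000 grams, whether or not anyone remembers that the boundary behavior was intentional. The snapshot encodes what is happening, not what ought to be happening.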
Having nondeterministic code in production is finally forcing us to do the things we should have done all along. Instrumenting with traces. Tests and evals in production. Production is not what happens after development is over, production is a stage of development.
Human brains are not good at validation. The nitpickiness, the repetition. This is the worst thing to be clinging to, y’all. There are so many better things for us to want to preserve and assert for ourselves in the production and maintenance of software. We are never going to beat the machine when it comes to validation — we are literally the weakest link!
My money’s on humans for a good long time when it comes to creativity, inspiration, leaps of logic, and a lot of other things, but PLEASE do not rest your killer argument for humans in software on us being the best quality gate. OMG. 🙈
Alright. I’m almost done here. Just one more thing.
I think what many engineers have found so alienating and terrifying about the last two years of AI discourse has been the way so many prominent AI voices appear to be gleefully declaring that software is no longer an engineering problem. “SaaS is dead!” “Making AI great at coding was the strategy that unlocks everything else”, and so on. Even Adam Jacob, one of my dearest friends and someone who is rarely wrong about technology, seems to anticipate a bloodbath of software jobs.[9]
If 2025 was the year of vibe coding, where AI got as good at generating lines of code as the median software engineer, and the range of possible futures often felt destabilizingly, impossibly wide open, I feel like 2026 is shaping up to be a return to discipline.
The knowledge in our heads is unavailable to AI until we encode it into the system, after all. The returns on those investments will be massive and nonlinear. We might argue that they always would have paid for themselves in the long run. But now every CEO in existence is chomping at the bit to get some of those AI cookies, so let’s give it to them. Discipline first, cookies second.
The share of software engineering teams that work in short, fast feedback loops (the cardinal sign of discipline in my book) is, and always has been, appallingly small. Five percent, maybe? Definitely less than 10%. AI tooling brings this more within reach than ever before. Or it can. It could. The discontinuous returns on investment in engineering discipline are real enough that it just might happen.
I am not worried, at least in the near term, about AI creating massive, discontinuous returns on investment in the absence of engineering discipline. (Many will try, and it will be entertaining to watch.)
But value is backed by durability, not disposability, and I don’t see that changing. Bits are cheap and fast and governed by the rules of logic and language, but anything with value must ultimately resolve with physical systems: persistence on the one side, user experience on the other.
People do not want to wake up every day and log in to Slack and find the buttons and menus all subtly moved around. People do not want financial transactions that complete most of the time. Determinism is not going anywhere, my friends.
AI is not magic. This is still engineering. As Adam says, “it’s still technology, and technology needs technologists.” And I for one am looking forward to learning new and interesting engineering problems, reviewing different kinds of artifacts.
And never doing another sticky, picky, two year long API rewrite or strangler fig migration, ever, ever again.
~charity
P.S. Thanks to everyone who read a draft and gave me feedback: Dave Williams, Chad Fowler, Adam Jacob, Mark Ferlatte, Austin Parker, Erwin van der Koogh, Ankur Bhatt.
[1] I was not trying to be neutral or even-handed in my last piece, only to give a baseline of courtesy to everyone. But I think it’s revealing how many times I was accused of being “so overly hard on skeptics”, by skeptics, and “so overly hard on enthusiasts”, by enthusiasts, and sometimes simply “It’s sad how some people can’t accept reality” with no indication which side they meant. Lord.
[2] Fred Hebert and I gave the closing keynote at SRECon in March of 2025 where we told SREs they should get to know AI, maybe even try vibe coding (pause for laughs), because otherwise their critiques wouldn’t land as well.
Seriously, that was our big pitch. Learn AI so that you can complain more effectively.
[3] Infrastructure, for example. I think this is true of a lot of engineers, btw. I just think it’s really really true of the type of engineer that signs up to be an SRE. Technological pessimism and ADHD, our two most defining traits.
[4] There is a segment of AI enthusiasts who believe we are entering an era of eternal exponential growth, in which the machines begin to build better and better machines, in ways we cannot understand.
I think those people are bad at math. The only thing we know for certain about exponential growth is that it will end. It always does, either in an S curve or a crash. (For a good time, google Heinz von Foerster and “our great-great grandchildren will be squeezed to death.”)
I definitely think we will use machines to build the machines — duh, we already are — but that’s about recursion and specialization. I think the exponential curve we are on the inside of now was created by sloshy free money chasing high returns, plus the properties of software as a function of language and logic, plus the biggest discoveries always happen in the early days of a technology boom, because low hanging fruit gets picked first.
My personal sense — and keep in mind that I am no kind of expert on AI — is that the exponential advancement in AI models leveled out a while ago, and gains are becoming harder to earn and more incremental in nature. I may turn out to be very wrong, of course. But even if there were no more AI innovations moving forwards, the past year has unleashed enough pent-up force to radically reshape the software industry as we know it. Like a pig in a python, we will be dealing with the consequences for a long time to come.
[5] More on this coming EXTREMELY soon. Watch the Honeycomb blog!
[6] The tech is cool, but as a thinking, feeling, breathing human who cares about other people, it can be hard to get excited about anything that so many people are this upset about. It’s also hard to get excited about something when so many of the loudest voices are out there talking gleefully about putting everyone permanently out of work, and so many artists and writers and people from developing nations are talking openly about the impact on them.
Hold your desire to jump in and berate me here, I beg you. Like I said, I will deal with the ethics and morality of using AI in my very next post. Be honest, your attention span is no more up for reading a 10,000-word essay than mine is up for writing one. (Can we blame AI for that too?)
[7] “The Other Fowler.” I gather they’ve been making this joke for like.. fifty years.
[8] I share a longer version of this story in the second edition of “Observability Engineering”, chapter 32, downloadable later this week!!
[9] Adam is rarely wrong about technology, and I am 100% sure he is living and working in _a_ future of software engineering. I am less sure it is the future we will all be living in. If the hardest part of software has never been writing code, as is my belief, it logically follows that even if the economics of code production drop to zero, the hard parts will still be hard.