Prompts like "move the code relating to SQL query analysis into a new file", "look for opportunities to use pytest parametrize to remove duplication in that test", "rename method X to Y".
Early indications are that this is helping a lot with the problem where it's easy to churn out thousands of lines of code and not really have it stick in my head, even if I review every line of it.
Reviewing code and actively refactoring it is less tedious and more mentally engaging than reviewing code without changes.
If this was a human collaborator I'd be worried that I'm just creating busywork for them, but I don't care about busywork for LLMs!
The goal is to produce code that I understand and that I can remember just well enough that I get an updated mental model to help me productively make future decisions about the codebase.
One thing that I look at is pushback rate: what percentage of the agent's proposals are rejected or critiqued? If it's below 5% I have found I have gotten too credulous and I am no longer closely following. Danger! If it's above 50%, I have clearly not given the agents sufficient context to perform the task and need to update my harness and instructions.
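As a rough illustration, the pushback rate and the two thresholds mentioned above could be computed like this. It is only a sketch: the 5% and 50% cutoffs come from the comment, while the function names and the returned labels are hypothetical.

```python
# Thresholds taken from the comment above; all names here are hypothetical.
TOO_CREDULOUS_BELOW = 0.05
TOO_LITTLE_CONTEXT_ABOVE = 0.50


def pushback_rate(rejected_or_critiqued: int, total_proposals: int) -> float:
    """Fraction of agent proposals that were rejected or critiqued."""
    if total_proposals <= 0:
        raise ValueError("need at least one proposal to compute a rate")
    return rejected_or_critiqued / total_proposals


def diagnose(rate: float) -> str:
    """Map a pushback rate onto the two failure modes described above."""
    if rate < TOO_CREDULOUS_BELOW:
        return "too credulous"
    if rate > TOO_LITTLE_CONTEXT_ABOVE:
        return "insufficient context"
    return "ok"
```

For example, pushing back on 1 of 40 proposals gives a rate of 0.025, which falls under the 5% line.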
Who watches the watchers? I can imagine a guard dog process that halts the session to yell at the human if it detects complacency: if the human is providing too few tokens per minute of new context relevant to the task.
We have chatbots in a sidebar that will just generate code for you or, more helpfully, answer your questions. We also have inline LLM code completion, which I've turned off completely because it's incredibly noisy.
What I want is something between those. My ideal use of LLMs while coding would be this: I start writing a function and need to act on some data. I don't know what method to use, maybe I'm in an unfamiliar language/framework and don't know what my options are. I want the AI to explain what methods I can call to do X in this specific place, no more, no less. It would need to know what outcome I want, which would be hard to do without jumping out of the code and typing into the chat, but I basically want it to function like Intellisense on steroids. Something that doesn't break my focus.
Current LLMs are anti-flow. For me, that's poison.
I understand the rationale behind this, but can't help feeling that this is a downward spiral. The software industry has always been a hard place to build and sustain a career because of the pace of change. With these tools, the pressure to increase output is going to grow, jobs are going to be axed - so software devs need to work harder to stay relevant. Weren't these tools supposed to make our lives easier?!
I don't go along with their mitigations though.
In programming we have one tool for this: abstraction. Decomposition, pattern recognition, even data structures and algorithms are all downstream of abstraction. Collectively, we've never truly mastered abstraction, but it's what we have and we collectively wield it well enough that it's usually somewhat effective.
We are in dire need of a better abstraction.
Being conscious of what type of memory you're working in (or need to engage) may be the trick to building rhythm or flow, or whatever. Depending on the case the LLM may not even be necessary. Use something else.
The trap could be in trying to depend on and work with a model the same way we would work by ourselves, as the author describes, letting every type of memory unconsciously operate.
I'm really trying to do this too. The problem is it's *so easy* to let your standards slip, even for just a moment, and that piece of code suddenly becomes foreign.
I find more mental energy is spent on restraint than execution these days.
Second this. This is why Zuckerberg is dying to spend as much as he can to make Meta an AI company.
I hope the field moves out of the TUI with prompt + pull the lever paradigm soon, when it comes to agentic programming. And the Markdown paradigm too, tbh.
There hasn't been anything that really sticks yet for a shift to happen.
I think this is how we should be reading code as well.
First understand the top level. Then the next level of detail and so on. I treat my understanding as a graph of interconnected black boxes. If I don't understand a particular black box or a node in the graph, I click expand on it, grok the details and then collapse the node. Grokking the details of a particular sub-node follows the same structure as understanding the root node. You don't need to understand everything from the get-go; expand your understanding on a need-to-know basis.
I like to do the opposite, asking the LLM to give me relevant follow-up documentation, like the actual docs, where I can read and understand things myself. Data structures, techniques, etc. I still like to read that from the authors, much easier to grasp and more trustworthy.
I've found that it's much more rewarding to use LLMs as an aid to deep work instead of a substitute for it, and it's even helped me feel more optimistic about my place in this field after a couple of days of getting used to the mental friction again.
Now the question to the round: in your opinion, are LLMs ok to learn in this way? At least on the theoretical side of things?
Imho most professional coders trade their time for oblivion.
The problem is that you really don’t remember anything about the code. It is not your creation.
It’s like a monkey in front of a slot machine, just pulling the lever and waiting to see if it hits the jackpot.
At the end of the day, it remembers that it pulled the lever. And how many times it won :)
Agentic-based coding with /goal and multiple agents coding together is another level…
But the issue remains imho - if there is an error, who is going to repair it?
This made me chuckle a bit.
There’s no point in fooling ourselves about our own skill retention if this is the case.
The experience will feel uncannily similar to AI generated code. So treat slop the same way. Give it a good, well tested API, and file an issue or PR when something breaks.
Can we be 10x more productive though? Or is it more like 1.25x? Is it AGI or is it more like an advanced compiler?
Unfortunately the world is betting on 10x when the reality on the ground feels more like 1.25x.
It’s almost like a buffer space would be useful for code.
I’ve been using tuicr for agent code reviews and have been enjoying that. I think I’ll try your idea as part of my workflow.
By which I mean, we should -- as software engineers -- be insisting on tools that put us in the driver's seat more.
Instead we're letting the agent drive. (I'm as guilty of this as anybody.) But really we're letting Dario, Sam, Boris, etc. drive. And it should be clear from their public pronouncements and emissions that they don't have the best interests of our profession -- or the quality of software engineering generally -- in mind.
Yes, certainly, alter how you use the tools. But we need to fix the tools themselves.
There’s a lot of overlap there with the sorts of things traditional automated refactoring tools can do approximately instantly, locally, and for free.
lmao I hope I never use your products with anything sensitive ever
The next problem is few care about that, at any level: coders, managers, execs. Just want their feature churn.
The even worse problem (or maybe, a positive) is that most of that code and the products powered by it aren't needed either.
In a late stage capitalism market economy, their only actual requirements were to make profit for the shareholders and VCs.
If that means making our lives harder, firing most of us, making us stupider, being addictive, being used for surveillance to sell us shit or control us, or even being used to kill people, all of those are fine, if they fulfil that requirement.
And I'm not sure how this relates to TFA's point. Are you saying we collectively need to get better at abstraction so that LLMs get better at abstraction (either by training, or our prompting), so that their code is easier to read?
And the company says "fuck you then, I'll fire you, and keep fewer coders, willing to keep the AI dance".
The role of AI as a tech is to put you out of the driver's seat as much as possible. All the way to job elimination.
Solution?
Getting them to run ast-grep is really fun, especially when it saves me from having to memorize that syntax myself.
If there's this huge productivity boost, what is it being spent on? I know, many have been laid off, but that's not universally true. So we have a productivity boost that doesn't really deliver anything, and the overall quality of a lot of products/code/writing/communication is going down, yet we spend an ungodly amount of money on datacenters... for what, just spinning the wheels?
But of course AI is also making union/worker pressure matter even less, since its function is to cheapen the cost/leverage of workers.
So the only solution is fighting that at the political/legal/social level. Which I don't see happening anytime soon.
> Are you saying we collectively need to get better at abstraction so that LLMs get better at abstraction (either by training, or our prompting), so that their code is easier to read?
No - our current abstraction for coding agents is a loop where we express some freeform specification of a goal, then a sub-loop kicks off in which an LLM takes a stab at what good looks like for the next step (make an edit, search for info, run a command to cause some side effect, etc.). It iterates in this sub-loop, and when it's finished, it declares end of turn and the loop returns to the user for steering input.
That inner agent loop can make it quite hard to stay in control.
What if, instead of only these low-level freeform prompts, we additionally had some higher-level primitives to work with?
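To make the shape of that abstraction concrete, here is a minimal sketch of the two nested loops described in this comment. Everything in it is hypothetical: `propose_step` stands in for an LLM call, `scripted_model` is a fixed stub rather than a real model, and a real harness would execute each action instead of only recording it.

```python
from typing import Callable, Iterable, Optional

# An "action" is whatever the model proposes next: an edit, a search, a command.
# Returning None means the model declares end of turn.
ProposeStep = Callable[[str, list[str]], Optional[str]]


def run_turn(goal: str, propose_step: ProposeStep, max_steps: int = 20) -> list[str]:
    """Inner loop: the model iterates on its own until it declares end of turn."""
    history: list[str] = []
    for _ in range(max_steps):
        action = propose_step(goal, history)
        if action is None:
            break
        history.append(action)  # a real harness would execute the action here
    return history


def run_session(goals: Iterable[str], propose_step: ProposeStep) -> list[list[str]]:
    """Outer loop: control returns to the user for steering between turns."""
    return [run_turn(goal, propose_step) for goal in goals]


def scripted_model(goal: str, history: list[str]) -> Optional[str]:
    """Stand-in for an LLM: proposes three fixed steps, then ends the turn."""
    plan = ["search", "edit", "run tests"]
    return plan[len(history)] if len(history) < len(plan) else None
```

The only place the user can steer in this sketch is between calls to `run_turn`, which is the control problem the comment is pointing at.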
May 28 2026
Lately, I’ve been feeling like I’m losing control over the code I write when I work with agentic code generation.
When I finish an agentic session, I get all the outward signs of having written code, but none of the internal processes that happen when we write code by hand.
As a quick primer, the human brain has several types of memory: short-term, working, and long-term. Short-term memory gathers information temporarily and processes it quickly, like RAM. Long-term memory includes things you’ve learned previously and tucked away, like database storage. Working memory takes the information from short-term memory and long-term memory and combines them to synthesize, or process the information and come up with a solution.
When we’re working on code (and by working on, we mean most often reading someone else’s code), all of these processes are going on in our brain simultaneously to try to help us make sense of the programming environment.
It’s not surprising. Code generation, in its default mode, is antithetical to skill retention, particularly because its UX affordances are reminiscent of a slot machine’s: you pull the lever, you get a reward (a solution to your coding problem). In some ways, we’ve replaced the social media feed with a stream of tokens, and I look forward to reading those papers in ten years.
It really does take extra concerted effort to move from just generating answers to using the tool deliberately. When I posted on X that I came away from an agentic session with brain fog, one thing Oz suggested was rewriting portions of the code myself.
Inspired by that advice, the paper, thoughts on slowing down, and using AI to write better code more slowly, and Mitchell’s adoption journey, I’ve been working on using the tool more deliberately and adding friction back into development.
Here’s what’s worked for me so far:
All of these negate the supposed speed up effects of LLM-generated code in the short-term by adding friction, and yet, in the longer term, make me better at using the tool, because they solidify my own foundation instead of the foundation models'.
We should be more tired than the model.
This is technically true, but let's not act like we haven't seen immense improvement of both models and harnesses for these models in the past years. They may not be learning, but they are getting better.
It's not unlike some managers who tell their teams to do something trivially easy that they could have done themselves.
(I'm not saying this is ideal and I'm not defending my laziness. It's just the current state of things.)
Even the doFoo to performBar is tedious because you need to catch all instances and your find/replace script strategy might have unintended victims.
In this case indeed, it's just much more convenient.
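The "unintended victims" problem is easy to reproduce. The sketch below contrasts a naive find/replace with a word-boundary regex. The sample source string and both function names are made up for illustration, and even the bounded version is still purely textual: it cannot tell apart same-named methods on different classes, or occurrences inside strings and comments.

```python
import re

# Hypothetical snippet containing the target plus two lookalike identifiers.
SOURCE = "doFoo(); undoFoo(); doFooLater(); obj.doFoo()"


def naive_rename(text: str, old: str, new: str) -> str:
    """Plain substring replace: also rewrites identifiers that merely contain `old`."""
    return text.replace(old, new)


def bounded_rename(text: str, old: str, new: str) -> str:
    """Replace `old` only where it appears as a whole identifier."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


print(naive_rename(SOURCE, "doFoo", "performBar"))
# performBar(); unperformBar(); performBarLater(); obj.performBar()
print(bounded_rename(SOURCE, "doFoo", "performBar"))
# performBar(); undoFoo(); doFooLater(); obj.performBar()
```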
1. Find the code you want to change
2. Run the tests to confirm that test coverage is good for the starting point
3. Track down everywhere else that might call or interact with that code
4. Update the tests (red/green TDD)
5. Alter the code
6. Update the things that call the code
7. Run the tests again
8. Apply linters/formatters
9. Address any feedback from linters
10. Check to see if any documentation needs updating and do that
11. Land a commit with a descriptive commit message
I can get all of that done with a coding agent with a single sentence prompt - especially if it's already in a session where it knows that I do "red/green TDD".
... and then I can work on something else while the agent is churning through those steps.
Mature workflows for those kinds of tasks have been mostly ubiquitous across professional-grade engineering tools like those from JetBrains or Visual Studio itself for longer than many people here have even been working in the trade.
It's clearly not the case for simonw, but much of what many people task AI tools to do for them is only a novelty for the "VS Code"-type users who stubbornly refused to explore more professional-grade paid tools in the past.
Yet for many tasks, those mature paid tools provided reliable and efficient features that make the AI approach look like an expensive, slow, and dangerously nondeterministic regression.
And those execs will get their bonuses anyway, and will be drinking their champagne far away from their executive roles and the company by the time that's felt.
As a recent example, I had to abandon the multiple LLM reviewer/verifier model I was using because Zig 0.16 was released with major changes.
I actually reverted back to fully self-hosted because the foundation models were trying too hard to revert to the older versions of the language.
It is going to be a balancing act and there is fundamentally no way for LLMs to get around this.
We will have to develop methods to do so, most likely by focusing agents on problems that are more static.
I guess the difference may be in people's mode of AI working: Do you primarily develop in your IDE or a bunch of terminals running vim, and occasionally fire up Claude to do more complex things? Or do you primarily develop in a long-lasting Claude terminal, and occasionally tab over to the IDE to watch/code review? In other words: What dev tool is on your primary monitor and what's on your secondary monitor? It's getting hard for developers in one camp to discuss coding and see eye-to-eye with developers from the other camp.
Maybe, but to eat it you need to kill it first or it will stomp you. And this can only happen all at once :)
There are a lot of small refactorings that I wouldn't consider to be worth 15 minutes of my time, so I wouldn't do them.
Outsourcing those to an agent means I don't have to make that tradeoff, which means I can get better quality code.
But yes, for a lot of my work I'm now a Claude Code / Codex first developer. I run Zed so I can navigate the code and occasionally make small edits.
Getting the agent to grep std, example code, comments that reference inaccessible security or bugs, etc. helps a little.
But for my needs, not refactoring would just be stepping over dollars to pick up pennies.
But yes it is a problem.
I've never liked the larger IDEs - VS Code only won me over because it was indistinguishable from a lighter text editor at first, and the IDE tools then emerged slowly as I used it.
If you have zig installed, you can run ‘zig std’ to see that.
You still have the limitations of attention etc…
Even Zed’s agent will leverage that built-in tarball, but it doesn’t solve the problem, especially as some of the language's killer features are unavailable in C and other languages.
1 - Find the code
4 - Move the code
10 - Change the documentation
You don't do the other steps because it's deterministic and always correct.
I don't trust any refactors until I've seen the test suite pass.
(OK, sure, "rename method" might be OK, but most of my refactors and design changes are more interesting than that.)