We should be more tired than the model

Something I've been trying recently for non-throwaway code is extensive refactoring, without typing any code myself but by closely directing the coding agent.

Prompts like "move the code relating to SQL query analysis into a new file", "look for opportunities to use pytest parametrize to remove duplication in that test", "rename method X to Y".

Early indications are that this is helping a lot with the problem where it's easy to churn out thousands of lines of code and not really have it stick in my head, even if I review every line of it.

Reviewing code and actively refactoring it is less tedious and more mentally engaging than reviewing code without changes.

If this was a human collaborator I'd be worried that I'm just creating busywork for them, but I don't care about busywork for LLMs!

The goal is to produce code that I understand and that I can remember just well enough that I get an updated mental model to help me productively make future decisions about the codebase.
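For anyone unfamiliar with the pytest parametrize prompt mentioned above, here is a minimal sketch of the kind of change it asks for. The function `is_read_only` and its test cases are hypothetical, made up purely for illustration; only `pytest.mark.parametrize` itself is a real pytest feature.

```python
import pytest


# Hypothetical function under test; not from the original comment.
def is_read_only(sql: str) -> bool:
    """Toy check: treat a statement as read-only if it starts with SELECT."""
    return sql.lstrip().lower().startswith("select")


# Instead of three near-identical test functions, the cases are listed
# as data and a single test body runs once per case.
@pytest.mark.parametrize(
    "sql, expected",
    [
        ("SELECT * FROM users", True),
        ("  select 1", True),
        ("DELETE FROM users", False),
    ],
)
def test_is_read_only(sql, expected):
    assert is_read_only(sql) is expected
```

Reviewing a diff like this is one way the refactoring step forces a second, closer read of what each test actually checks.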

I don't know. I find that I'm moving up a level and improving my product-management skills while delegating most of the code to the agents. I'm still very much hands-on with the design and requirements, and I'm asking questions like, "What's our security story for XYZ?", "Are we accounting for colour-blindness?", etc. Not being down in the code allows me to prairie-dog a bit more and see the landscape better.

I clearly identify with the problem the author raises, which is: the bottleneck is understanding.

I don't go along with their mitigations though.

In programming we have one tool for this: abstraction. Decomposition, pattern recognition, even data structures and algorithms are all downstream of abstraction. Collectively, we've never truly mastered abstraction, but it's what we have, and we wield it well enough that it's usually somewhat effective.

We are in dire need of a better abstraction.

> We should be more tired than the model

I understand the rationale behind this, but can't help feeling that this is a downward spiral. The software industry has always been a hard place to build and sustain a career because of the pace of change. With these tools, the pressure to increase output is going to grow, jobs are going to be axed - so software devs need to work harder to stay relevant. Weren't these tools supposed to make our lives easier?!

> Using the agent to keep asking questions about pieces of the code I don’t understand instead and pull up relevant documentation and PRs.

I like to do the opposite: asking the LLM to point me to relevant follow-up documentation, like the actual docs, where I can read and understand things myself. Data structures, techniques, etc. I still like to read that from the authors; it's much easier to grasp and more trustworthy.

Skill atrophy is a real issue when it comes to creative skills. But I argue that not all of what we call coding belongs to these skills. I consider lots of it chores due to inefficiency of the languages and abstraction layers. Problem solving, hypothesizing, researching, running experiments, and designing solutions all require critical thinking and creative skills. If you're worried about losing coding skills, ask yourself this question: what are you trying to achieve?

https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPape...

I think this is how we should be reading code as well.

First understand the top level, then the next level of detail, and so on. I treat my understanding as a graph of interconnected black boxes. If I don't understand a particular black box (a node in the graph), I click expand on it, grok the details, and then collapse the node. Grokking the details of a sub-node follows the same structure as understanding the root node. You don't need to understand everything from the get-go; expand your understanding on a need-to-know basis.

The point about the UI affordances strikes me as very relevant. I find that the way I want to use LLMs in coding is not available.

We have chatbots in a sidebar that will just generate code for you or, more helpfully, answer your questions. We also have inline LLM code completion, which I've turned off completely because it's incredibly noisy.

What I want is something between those. My ideal use of LLMs while coding would be this: I start writing a function and need to act on some data. I don't know what method to use; maybe I'm in an unfamiliar language/framework and don't know what my options are. I want the AI to explain what methods I can call to do X in this specific place, no more, no less. It would need to know what outcome I want, which would be hard to do without jumping out of the code and typing into the chat, but I basically want it to function like Intellisense on steroids. Something that doesn't break my focus.

Current LLMs are anti-flow. For me, that's poison.

> adding friction back into development

I'm really trying to do this too. The problem is it's *so easy* to let your standards slip, even for just a moment, and that piece of code suddenly becomes foreign.

I find more mental energy is spent on restraint than execution these days.

>>In some ways, we’ve replaced the social media feed with a stream of tokens, and I look forward to reading those papers in ten years.

Second this. This is why Zuckerberg is dying to spend as much as he can to make Meta an AI company.

Lately I've been thinking about this a lot. I've slightly shifted my use of Claude from an implementation tool to a scaffold generator, leaving the hard parts for me to actually do. It's frustrating at first, because the impulse always is "I could get Claude to do this in minutes", but that's just the brain trying to spare some energy.

I've found that it's much more rewarding to use LLMs as an aid to deep work instead of a substitute for it, and it's even helped me feel more optimistic about my place in this field after a couple of days of getting used to the mental friction again.

Not a programmer, but I'm beginning to discover a rhythm similar to the author's that doesn't save time and effort as much as it fragments and redistributes them both.

Being conscious of what type of memory you're working in (or need to engage) may be the trick to building rhythm or flow, or whatever. Depending on the case the LLM may not even be necessary. Use something else.

The trap could be in trying to depend on and work with a model the same way we would work by ourselves, as the author describes, letting every type of memory unconsciously operate.

> particularly because its UX affordances are reminiscent of a slot machine’s: you pull the lever, you get a reward (a solution to your coding problem.)

I hope the field moves out of the TUI with prompt + pull the lever paradigm soon, when it comes to agentic programming. And the Markdown paradigm too, tbh.

There hasn't been anything that really sticks yet for a shift to happen.

This is how I treated LLMs from the beginning, maybe because of my impostor syndrome of not knowing if my understanding of _anything_ is correct, and going down the rabbit hole of the concepts that are presented there...

Now a question for the group: in your opinion, is it OK to learn from LLMs in this way? At least on the theoretical side of things?

I agree with the article, though I will say with an agentic workflow I feel more tired at the end of it than I would doing it by hand. Maybe it’s the constant reading and digging in the generated code, or the constant context switching while waiting for it to think/generate. Or it’s both.

Reading all the code the model generates will tire you out pretty quickly

OP’s approach is one in a thousand.

Imho most professional coders trade their time for oblivion.

The problem is that you really don’t remember anything about the code. It is not your creation.

It’s like a monkey in front of a slot machine, just pulling the lever and waiting to see if it hits the jackpot.

At the end of the day, it remembers that it pulled the lever. And how many times it won :)

Agentic-based coding with /goal and multiple agents coding together is another level…

But the issue remains imho: if there is an error, who is going to repair it?

For human-in-the-loop to be effective, the human needs to actually be performing some substantive action: giving real guidance, critique, and pushback. If the human only ever accepts the default plans, then not only is there no understanding, but the agent should learn to stop asking. It is not learning anything from the human, after all.

One thing that I look at is pushback rate: what percentage of the agent's proposals are rejected or critiqued? If it's below 5% I have found I have gotten too credulous and I am no longer closely following. Danger! If it's above 50%, I have clearly not given the agents sufficient context to perform the task and need to update my harness and instructions.
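As a rough sketch of the pushback-rate check described above: the 5% and 50% thresholds come from the comment, while the function names, the way proposals are counted, and the diagnostic labels are assumptions for illustration only.

```python
# Thresholds taken from the comment above; everything else is illustrative.
LOW_THRESHOLD = 0.05   # below this: probably accepting proposals uncritically
HIGH_THRESHOLD = 0.50  # above this: the agent probably lacks context


def pushback_rate(rejected_or_critiqued: int, total_proposals: int) -> float:
    """Fraction of the agent's proposals that were rejected or critiqued."""
    if total_proposals <= 0:
        raise ValueError("need at least one proposal to compute a rate")
    return rejected_or_critiqued / total_proposals


def diagnose(rate: float) -> str:
    """Map a pushback rate onto the two failure modes, or 'ok' in between."""
    if rate < LOW_THRESHOLD:
        return "too credulous"
    if rate > HIGH_THRESHOLD:
        return "insufficient context"
    return "ok"
```

For example, pushing back on 1 of 40 proposals gives a rate of 0.025, which falls in the "too credulous" band; 12 of 20 gives 0.6, which suggests the harness and instructions need more context.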

Who watches the watchers? I can imagine a guard dog process that halts the session to yell at the human if it detects complacency: if the human is providing too few tokens per minute of new context relevant to the task.
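A guard dog like that could be sketched as follows. This is purely hypothetical: the window length, the minimum rate, and the `(timestamp, token_count)` message format are all invented here, and it counts every human token rather than judging whether the context is actually relevant to the task, which is the hard part.

```python
def tokens_per_minute(messages, now, window_seconds=600):
    """Average human tokens per minute over the trailing window.

    messages: list of (timestamp_seconds, token_count) for human messages.
    """
    recent = sum(
        tokens for ts, tokens in messages if 0 <= now - ts <= window_seconds
    )
    return recent / (window_seconds / 60)


def should_halt(messages, now, window_seconds=600, min_tokens_per_minute=2.0):
    """True when the human has contributed too little recent context."""
    rate = tokens_per_minute(messages, now, window_seconds)
    return rate < min_tokens_per_minute
```

With a 10-minute window, a human who has typed only 4 tokens in that time averages 0.4 tokens per minute, so `should_halt` returns `True` under the assumed default of 2.0.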

“Using the agent after trying for 20 minutes”

This made me chuckle a bit.

There’s no point in fooling ourselves about our own skill retention if this is the case.

All of these posts are a replay of what Marx wrote about machinery and alienation from work and intensification of the workday.

I don’t know about being more tired than the model, but when I’ve had a particularly productive agent session, I feel more tired (brain fog) than after a session of coding by hand. Probably because I’ve replaced brainless typing with the cognitive load of decision-making and weighing plan approaches.

The solution is to move slower, not faster.

I encourage you to crack open a dependency tree for any project and ask: how many of these do I understand? Then open one and ask: do I really understand what's happening? How much of the code there do I even use?

The experience will feel uncannily similar to AI generated code. So treat slop the same way. Give it a good, well tested API, and file an issue or PR when something breaks.

The struggle here for many is expectation. We can certainly be more productive with these tools.

Can we be 10x more productive though? Or is it more like 1.25x? Is it AGI or is it more like an advanced compiler?

Unfortunately the world is betting on 10x when the reality on the ground feels more like 1.25x.

So many efforts out there to alter the usage of the tool to regain control, when it's clear to me that the tool is the problem?

By which I mean, we should -- as software engineers -- be insisting on tools that put us in the driver's seat more.

Instead we're letting the agent drive. (I'm as guilty of this as anybody.) But really we're letting Dario, Sam, Boris, etc. drive. And it should be clear from their public pronouncements and emissions that they don't have the best interests of our profession -- or the quality of software engineering generally -- in mind.

Yes, certainly, alter how you use the tools. But we need to fix the tools themselves.

Another option is just not using an LLM at all.

>Prompts like "move the code relating to SQL query analysis into a new file", "look for opportunities to use pytest parametrize to remove duplication in that test", "rename method X to Y".

There’s a lot of overlap there with the sorts of things traditional automated refactoring tools can do approximately instantly, locally, and for free.

I think the best approach is active code review as the agent does small batches. Or letting it come up with a solution, testing whether it achieves the desired outcome, then creating a separate fresh project and asking it to rewrite the solution in small parts, explaining what it's doing and why for each part.

Interesting idea.

It’s almost like a buffer space would be useful for code.

I’ve been using tuicr for agent code reviews and have been enjoying that. I think I’ll try your idea as part of my workflow.

> We should be more tired than the model

That’s only true if companies give some of the productivity gains back to the employee, but most companies don’t do that. They keep the profits purely for themselves. There are some exceptions.

I'm not convinced jobs will be axed in the long term. All the big tech companies frequently staff teams on projects that basically go nowhere, to spread bets across multiple projects in case one has legs. Once LLMs reach the point of commoditization and drop in price, it seems like the natural next step is more teams with smaller structures to spread bets even more. A 5-person team that is LLM-assisted is going to move faster and be more cohesive than a 10-person team that ends up stepping all over each other.

>Weren't these tools supposed to make our lives easier?

In a late-stage-capitalism market economy, their only actual requirement was to make a profit for the shareholders and VCs.

If that means making our lives harder, firing most of us, making us stupider, being addictive, being used for surveillance to sell us shit or control us, or even being used to kill people, all of those are fine, if they fulfil that requirement.