Using AI to write better code more slowly

I've hit this point with AI where it's not a simple process, but a long drawn out back and forth.

I'll use AI to design the implementation of a medium sized, cross cutting feature. Review all the details, maybe iterate on just that. Then implement with Claude 4.7 Max - which runs slower, but does a better job. Then review the implementation, then have Codex GPT 5.5 xhigh fast review it - which almost always finds corner cases. Have Claude fix those - Claude is better at writing intuitive maintainable code versus Codex overengineered/shortcut filled code. (Codex is better at finding/fixing bugs and doing reviews - it's annoyingly pedantic)

Then repeat with fresh Claude/Codex instances, having them both review the current staged changes, getting feedback, and handling the feedback. Then covering it in tests. I mean, overall I still implement the feature faster than coding it manually, but I spend a majority of the time going back and forth with reviews and handling corner cases, and at the end I have what feels like a really solid implementation of whatever feature I'm working on. The v1 feature feels more like a v3 given the amount of iteration it already went through.

This article doesn't address writing code with AI, just code review. My issue with agentic coding is that I make numerous micro-architectural decisions while programming. I almost never have a full spec up front and develop one as I consider what I am writing.

When using Claude Code or Codex, that is all gone. Claude Code is extremely eager to reach the end goal to the point that it feels like a fever dream to write code with it. In the end, I have low confidence about edge cases and fit into the project's architectural and design goals.

On top of that, I enjoy programming, reverse engineering, etc. and I feel that the LLMs, while able to solve some problems or deliver some features, take that fun away. I'm trying really hard to find a workflow with them that I'm confident in, but I fear that workflow is just chat, search, and being a rubber duck for my thoughts.

I find myself spending on average more time in LLM review/resolution loops than it would take for me to write the code by hand. Partially because once I'm in the flow I write very very quickly and the code pours out sometimes faster than I can type. But also because the LLM code on the first few tries is generally really really bad. What I find interesting though is that spending the time to personally review and direct the LLM through several iterations of review and revision on average results in higher quality code written in about the same time as I would have written it. This might be particular to me, but seeing several iterations of someone else's code helps me better understand holistically my objective as opposed to whatever happens to come out of my flow-state consciousness.

“A lot of people seem convinced that the point of AI coding is to write low-quality code as fast as possible.”

A lot of people think a lot of things, but I don’t think the majority of people think the point of using LLMs is so they can produce low-quality code. Do they produce low-quality code sometimes or often? Of course. But they also produce high-quality code very often. And sometimes they just do a “fine” job.

One of the promises - and there are plenty of cases where it’s met and where it falls drastically short - is that agentic coding tools can help us write code faster that is just as good as or better than what a human can write. One of the other big ideal payoffs is that agentic coding can allow non-programmers to create things that previously required programmers.

We can debate as to how successful we’ve been toward the two goals above, but I think it’s misguided to say that the majority of people think LLMs should produce lower quality code.

This is one of the most sane takes on shipping code using AI where it's being actively reviewed and it respects your colleagues' time and attention. I like it.

The linked article about getting LLMs to critique each others' code review[1], the magpie tool[2], and also this recent article from Cloudflare about their code review stack[3] are all quite compelling.

I'm fairly AI-skeptical not on grounds of "do they work" but "are they good for the world". I feel that getting AIs to do this kind of review work is a rare case that doesn't outsource thinking and deskill workers. It doesn't trigger the same alarm bells as having the AI write the code (including having the AI fix the issues it discovers). That's setting aside environmental and other ethical concerns, which are still significant to me.

I have been impressed by the recent quality of AI code reviews*, but the experience of interacting with 3 separate AI reviewers via GitHub PRs is pretty terrible. Having more local-oriented and jj/rebase-aware review rounds would be great.

*context: fairly large PHP/Laravel backend and Vue frontend

[1]: https://milvus.io/blog/ai-code-review-gets-better-when-model...

[2]: https://github.com/liliu-z/magpie

[3]: https://blog.cloudflare.com/ai-code-review/

Regardless of what model you use, agentic coding tools are indeed pretty good at finding issues if you target them a bit. And they have no respect for their own code or any sense of shame. So, you can just point them at their own code with a new thread.

Many AI models seem biased to cutting corners by default when generating code, even when you ask them not to. But a few simple follow up prompts can address that. Simply ask for covering corner cases with tests, test all the known non happy paths, look for weaknesses, verify adherence to SOLID principles, do security audits, etc. It will find issues. With bigger projects, you can actually make it file those issues in gh with labels and priorities. And then you can make it iterate on fixing issues with separate PRs.
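A minimal sketch of the "file those issues in gh with labels and priorities" step, assuming the review findings have already been collected into a list. The `Finding` fields, the example findings, and the `priority:` label naming are hypothetical; `gh issue create` with `--title`, `--body`, and a repeatable `--label` is the GitHub CLI command assumed here, and the labels would need to exist in the repository first.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    title: str
    body: str
    kind: str       # hypothetical label, e.g. "bug" or "testing"
    priority: str   # hypothetical label suffix, e.g. "high"


def gh_issue_command(finding: Finding) -> List[str]:
    """Build the `gh issue create` argument list for one finding."""
    return [
        "gh", "issue", "create",
        "--title", finding.title,
        "--body", finding.body,
        "--label", finding.kind,
        "--label", f"priority:{finding.priority}",
    ]


# Made-up findings, standing in for whatever the review pass reported.
findings = [
    Finding("Empty input crashes parser", "parse('') raises IndexError.", "bug", "high"),
    Finding("No test for timeout path", "The retry timeout branch is untested.", "testing", "low"),
]

for f in findings:
    # Print instead of executing, so the commands can be reviewed first.
    print(shlex.join(gh_issue_command(f)))
```

Once the printed commands look right, each argument list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.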

On a recent project, I made it implement a simple benchmark test for measuring throughput. I had a hunch it was doing very sub optimal things. I then asked it to look for potential performance bottlenecks and use the benchmark to verify improvements. At that point I already had a lot of end to end tests to verify correctness. So, these performance tweaks were relatively low risk. I got about two orders of magnitude improvement and a lot more graceful behavior when pushed to the limit.
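For illustration, a throughput benchmark of the kind described above might look roughly like this. The two de-duplication functions are made-up stand-ins for a slow baseline and an optimized candidate, not the commenter's code; the point is the shape: check that outputs match first, then compare items per second.

```python
import time


def slow_unique(xs):
    """Baseline: order-preserving de-duplication with a list scan, O(n^2)."""
    seen = []
    for x in xs:
        if x not in seen:
            seen.append(x)
    return seen


def fast_unique(xs):
    """Candidate: same result using dict key ordering, O(n)."""
    return list(dict.fromkeys(xs))


def throughput(fn, batch, repeats=3):
    """Best observed items per second over `repeats` runs of fn(batch)."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn(batch)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(batch) / elapsed)
    return best


batch = [i % 500 for i in range(5_000)]

# Correctness first: the optimization must not change the output.
assert fast_unique(batch) == slow_unique(batch)

print(f"baseline:  {throughput(slow_unique, batch):,.0f} items/s")
print(f"candidate: {throughput(fast_unique, batch):,.0f} items/s")
```

Taking the best of several runs reduces noise from other processes; the absolute numbers will differ per machine, so only the ratio between baseline and candidate is meaningful.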

If you have a bit of experience engineering systems, just treat these tools like they are junior developers. Competent but likely to skip some essential steps. So, just double check with a lot of pointed questions: "did you do X? If not, do it now". Anything that needs repeated asking, turn it into a guard rail / skill.

There's a bit of effort and skill involved with this. I imagine a lot of less experienced developers might struggle to get good results because they aren't asking for the right things.

The anchoring thing is what gets me. Once I've seen the AI's first try, even when it's wrong, I can't really write fresh in my head anymore. I end up editing instead of starting over. Code quality usually ends up fine. Time-wise it's a wash or worse, you just don't feel it until you look at the clock at end of day.

We may be in the last golden age of AI, where experienced professionals who can code manually still exist, AI that can code automatically already exists, and when the former use the latter skillfully, wonders happen. This magical intersection may not exist in the future, or may become very rare.

I find that it really is effective when you iterate and plan and review, but the problem is more psychological on the human side. It's just too easily available to take the lazy option and just let it do the thing, postpone the thorough reviewing and you end up in a similar situation as tech debt. In an ideal world with no deadline pressure and infinite discipline, AI can be used in productive ways for sure. But when you actually write the code, there is more of a "do you do it or not" switch, and with AI it's a smooth ramp, you can be just a bit less involved or just a bit more. And I end up feeling like I'm not fully involved, I'm halfway working and my whole mind isn't tuned into it properly. I'm not sure how to express it. Also, now several months in, I just don't get the same feeling of accomplishment from the little wins. It's too automatic, doesn't feel earned.

I’ve landed on a very similar usage in my last pet project. I’ve used the LLM mainly as a glorified refactoring tool/LSP/rubber duck. I can define custom skills that act as specific passes over the codebase that are hard to do with traditional tools. I am using Julia, so I have a skill that only does a semantic and type analysis pass to catch potential type instabilities, and another that is just about documentation reporting. The workflow for me is always: talk the problem to death/get a report. Triage: decide what I can and should do on my own, what can be left to the LLM as mundane boring refactoring tasks, and what instead needs me to figure out the correct shape first and then ask the LLM to propagate the new pattern in the codebase. Then act. A lot of the time I am implementing the LLM suggestion by hand on my own to get a feel for how the codebase is shifting under my feet and stay on top of things. This does make things slower, but allows for an overall higher quality codebase. Especially the refactoring part.

Title of this article suggested more depth and I was expecting actual code examples. But it is like other opinion pieces. It suggests a prompt (ask AI to find bugs) that works for the author advising everyone to do it that way.

I use these tools at both work and for personal side projects and I was expecting to watch and learn. But these opinion pieces without examples are way too many now.

One thing that's been interesting to me over the last few years is charting the edge of my coding laziness. As a coder, I'm lazy about boilerplate code -- I hate writing it, I hate maintaining it, etc. And so I design and architect (or used to) around that preference. Sometimes that's smart, sometimes that's not. But it was my preference, and I avoided something that was hard for me to do.

When LLMs started being somewhat useful for coding a few years ago, and I found they were in fact great at boilerplate, in fact pretty much only good at boilerplate ca 2023 or so, it got me thinking about all the accommodations we make in design and systems architecture that are sort of tacitly understanding who we're working with and their strengths and weaknesses.

The modern models have their own very different strengths and weaknesses compared to humans, and deploying them is a really interesting exercise of different architectural and engineering skills. I've enjoyed it, and hope I continue to.

Does anyone have a good recommendation for AI auto completion?

My goal is to draft the solution with AI, write it myself but faster with auto complete, then throw AI review at it.

Another way I'm "going slower" is to have the AI implement individual sub-steps of the current task, and review each one. It's slower than having it yolo out the whole thing, but it's much smaller incremental bits to review, so my brain doesn't glaze over in a huge review, like it did when I had it do the whole task.

I'm following an Ideas -> PRD -> Issues -> Tasks methodology, where each task has a bunch of sub-tasks. I have it just do one (or a few, I'm having it do Red/Green/Refactor as separate sub-steps, so I review the Red case, and then once that's good, do the Green and Refactor steps, and review those).

I think AI exists to make humans better, not to replace us (which it can’t anyway). I use LLMs to answer questions and tutor me on new topics (for instance, for a multivariable calculus course this spring I asked Claude to create 10 practice exercises, which I then did and it reviewed. The harder ones it did with me step by step), hopefully not needing them after a while, once I gain proficiency. Automating humans away is not going to work. There’s a reason why we are the apex predator and have ruled this planet for a million years.

As I read this, I'm also working through a pretty dense feature that took a fair bit of iteration. The end result is actually significantly less code than it was about halfway through. And I was wondering if the AI actually helped me at all, since surely I could have written the code in the same time it took to iterate

But! Because of AI I was able to rapidly hack out like 4 variants of this feature that I didn't like. And felt comfortable throwing them away just as quick.

Exactly. That is what we do. We build software that can kill people, and it is very sophisticated, like controlling robots. We prototype using LLMs and it is amazing.

People believe that you can only use LLMs for sloppy programming. But you can also use them to write ten times more code for Swiss cheese model tests and domain specific languages.

You write ten times more code than necessary and all that extra code is testing. Projects like SQLite do that because they need to be perfect.

Before LLMs we had to use engineers for that, and it was painful and repetitive work. They were always late and made many more mistakes than LLMs, especially because it was dull and tedious work for great engineers to spend their time on.

Now we write tests, and when all tests pass we write new tests to check the tests.
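One way to read "tests for checking the tests" is mutation testing: break the code on purpose and confirm the suite notices. The sketch below is a hand-rolled illustration under that assumption, not the commenter's tooling; `clamp`, the mutants, and both suites are invented for the example.

```python
def clamp(x, lo, hi):
    """Function under test: restrict x to the range [lo, hi]."""
    return max(lo, min(x, hi))


# Hand-written mutants: each one is a deliberately broken variant of clamp.
MUTANTS = {
    "drop lower bound": lambda x, lo, hi: min(x, hi),
    "drop upper bound": lambda x, lo, hi: max(lo, x),
    "swap bounds": lambda x, lo, hi: max(hi, min(x, lo)),
}


def weak_suite(fn):
    """Only checks a value inside the range and one above it."""
    return fn(5, 0, 10) == 5 and fn(11, 0, 10) == 10


def full_suite(fn):
    """Also checks a value below the range."""
    return weak_suite(fn) and fn(-1, 0, 10) == 0


def surviving_mutants(suite):
    """Names of mutants the suite fails to detect."""
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]


print(surviving_mutants(weak_suite))  # ['drop lower bound']
print(surviving_mutants(full_suite))  # []
```

A surviving mutant points at a missing test case: here the weak suite never exercises a value below the range, so removing the lower bound goes unnoticed.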

We divide each complex problem into small subproblems and guarantee each of them by formal means. We have multiple ways of solving the same problem, usually including one brute force solution that is simple and guaranteed to work but inefficient, which we can use to check the more efficient methods.
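The brute-force-as-oracle idea can be sketched as a differential test. The problem chosen here (maximum subarray sum) and all function names are assumptions made for the example; the pattern is to generate random inputs and require the fast implementation to agree with the obviously correct slow one.

```python
import random


def max_subarray_brute(xs):
    """O(n^2) reference: try every non-empty contiguous slice."""
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best


def max_subarray_fast(xs):
    """Kadane's algorithm, O(n)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def differential_test(trials=1000, seed=0):
    """Compare both implementations on random small inputs."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        expected = max_subarray_brute(xs)
        actual = max_subarray_fast(xs)
        assert actual == expected, f"mismatch on {xs}: {actual} != {expected}"
    return trials


print(differential_test(), "random cases agreed")
```

Small inputs are deliberate: the brute force version stays cheap, and any failing case printed by the assertion is short enough to debug by hand.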

Before machines could do that, the people doing it were burned out and exhausted, and always left work pending.

So I am figuring out how to let an LLM write code automatically as long as I clarify the requirements. I have made a set of skills to deal with this, called tdd-pipeline. I eat my own dog food, and after several rounds of iteration to fix bugs it works better and better. Now I feel much more relaxed while it is working.

I open sourced it on GitHub; you can search for alexwwang/tdd-pipeline if you are interested.

Input sequence mutation >>> novel token generation in LLMs. Why? IDK, there must be a good theory article someone could point me to.

Yep, it definitely can help with being an 0.1x developer, it's a long drawn out process but the output is actually good.

On the other hand, some companies are pushing the idea that engineers should build robust self-evaluating agent pipelines with human feedback in the loop so that agents write most of the production code. Creao's CEO said that they rearchitected their entire production systems in two weeks this January. He also claimed that their agents implemented so many features so fast that they had to wait for their business development to catch up.

I wonder how we can evaluate these two options: using AI to 100X the output versus using AI to advance one's craft.

In the meantime, the productivity gain of AI is real. Case in point: an engineering org at Snowflake met all its OKRs ahead of time in the first quarter, for the first time in the company's history. It had never happened before, and usually meeting 70% of the planned OKRs would be considered an achievement. I can imagine the stress of the engineers when they see such an outcome.

Optimizing for code quality over raw output speed is a great approach. The time 'lost' writing it slowly is easily made up by the time saved on debugging and maintenance later.

Would love to but my boss wants 15 features delivered yesterday

I think using speed to describe the rate of progress in software development is where the frustration comes from. Software isn't a velocity thing. It's a space thing. It's memory. Information in some media. You can transfer a billion bits in less than a second. The time domain is largely irrelevant in business terms.

Having taste and the ability to author high quality prompts is still the most important thing. It was always the most important thing if you think abstractly about how all of this works.

The main insight here I think is that LLMs are great tools for iterative development and iterative problem solving in general.

You can very effectively iterate alone using the LLM as a mirror, rephrasing what you put in and adding a bit.

You can use LLMs to quickly create prototypes to give to other human beings to help you with the next iteration.

If you get something from someone else to iterate on you can use the LLM to help you with understanding to rephrase things in a way more suitable for your understanding.

But instead everything anybody seems to be talking about seems to be one shoting things and AI iterating with other AI.

The big problem here is that the one thing AI does not have is agency. The naming AI agent is wishful thinking and marketing.

This is exactly the reason why I like to work with local models on a regular specced machine. The fact that the agent moves slower allows me to stay in the loop much better, compared to skimming through a huge amount of generated content and data and then going to the end really fast to make sense of it all, in the interest of time (and thus losing track and quality). The fact that I can run it locally makes it (much) cheaper too.

> But if you’re the kind of developer who uses agents to write multi-hundred-line PRs that you barely understand yourself, I’d invite you to slow down a bit and try this other, slower style of “vibe coding.” Ask an agent how your PR works and how it might fail. Have it write Markdown docs with Mermaid charts if necessary. Use Matt Pocock’s /grill-me skill until you understand the entire PR front-to-back.

Man so much work to retrofit something that obviously, simply, plainly - just does not work. How about just writing the code yourself? You can even consult AI on the libraries or whatever, but how about just building that model in your head YOURSELF and not loading up on AI slop and trying to memorise that crap. The names of the functions will ring different in your memory once you spend some time thinking over whether you picked the right and clear name vs. just going with whatever statistical median the slop machine picked for you.

I usually do this for complex features:

- Opus 4.7 writes the code
- I make GPT-5.5 in Codex review it (given context)
- I provide the review back to Opus and ask it to verify the review findings
- Make Opus plan the fixes then execute them
- Ask GPT-5.5 to review the fixes and check if they solve the problems
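The steps above can be sketched as a small control loop. This is only an illustration of the flow, with the writer and reviewer passed in as plain callables; the `fake_*` functions are stand-ins invented so the sketch runs without any API calls, and the `max_rounds` cap is an added assumption rather than part of the described workflow.

```python
from typing import Callable, List, Tuple

Code = str
Finding = str


def review_loop(
    task: str,
    write: Callable[[str], Code],
    review: Callable[[Code], List[Finding]],
    verify: Callable[[Code, Finding], bool],
    fix: Callable[[Code, List[Finding]], Code],
    max_rounds: int = 3,
) -> Tuple[Code, List[Finding]]:
    """Write, review, verify findings, fix, then re-review the fixes.

    Returns the final code and any confirmed findings still open.
    """
    code = write(task)
    confirmed = [f for f in review(code) if verify(code, f)]
    for _ in range(max_rounds):
        if not confirmed:
            break
        code = fix(code, confirmed)
        # The reviewer checks whether the fixes actually solved the problems.
        confirmed = [f for f in review(code) if verify(code, f)]
    return code, confirmed


# Stand-in "models": findings are modelled as TODO markers in the text.
def fake_write(task):
    return f"{task} TODO:empty-input TODO:style-nit"


def fake_review(code):
    return [word for word in code.split() if word.startswith("TODO:")]


def fake_verify(code, finding):
    # The writer rejects this finding as a false positive.
    return finding != "TODO:style-nit"


def fake_fix(code, findings):
    return " ".join(word for word in code.split() if word not in findings)


final_code, open_findings = review_loop(
    "parse(x)", fake_write, fake_review, fake_verify, fake_fix
)
print(final_code)
print(open_findings)
```

The verify step matters because it lets the writing model push back on review findings it considers wrong, instead of blindly "fixing" everything the reviewer flags.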

Yes! That's what I've been doing at work for the last few weeks! And while it doesn't appear to be super fast, I'm already pretty certain that the next round of testing will come back with fewer unexpected issues because together with my agent and the right usage, I was already able to catch stuff that I would have missed otherwise.

Also feels much better than pure vibe-coding (which I still do for personal projects that aren't mission critical for anyone).

100% agree after building a production-ready platform from the ground up. It took 3-4 months, but without AI I would never have been done with a team of 3. One thing to note is that AI is weak at front end, so we did the entire front end without AI.

The greatest part of this approach is that you actually become better in the process.

The downside is you use less tokens.

I used an LLM as a tutor to tackle unfamiliar terrain. That is, I write code that I know very likely doesn't work but is the best code that I could have written. The LLM will happily and tirelessly show me what I did wrong and what the correct code actually looks like. Then, at the end of it, I have code that runs. That's a tight feedback loop.

It's still very slow. It took me two hours to write code that generates JSON data and then to write a web page that displays a knowledge graph.

One thing you have to be aware of is that the LLM will happily generate code for you and you have to discipline it from time to time. I notice that my reading comprehension begins to suffer if I don't write the code myself and have to understand what the LLM wrote for me, as opposed to the LLM correcting where I went wrong.

One thing I would like to try with an LLM is understanding a large and complex existing codebase like OpenSCAD that doesn't leverage my existing skillset(high level programming languages with OpenSCAD as primary language in the past year). That has always been a barrier to contribution for me.

Hot take: barring special edge cases, I find using dumber models (like local Qwen 3.6) to be the best balance. Smart enough to do stuff but dumb enough that I don’t trust it and verify what it’s doing rather than letting it do the third whole code base refactoring of the day. It also forces me to know my code base and give very descriptive tasks rather than go “something is wrong, fix it”.

Another thing that I feel is underappreciated about agentic coding is that you can actually learn from it. I am a programmer with 25+ years of experience and I tend to do a lot of stuff according to fixed patterns/habits. Seeing how my coding agents do stuff helps me break out of these patterns, lets me consider new approaches, helps me pick up idioms and teaches me new hacks and tricks. That is very satisfying in its own right.

I think my current conclusion is that AI makes <foo> more important than ever.

I’m not exactly sure what <foo> is but I feel it. I think it’s quality and authenticity and craftsmanship. That difference between an expensive tool and a cheap one that you can’t easily describe but you just know it.

Is there a word for this? I bet the Japanese or Germans have a word for this.

I use AI a lot now. But I also do it in small steps. It isn’t a craftsman, but it can help me be one.

Finding bugs in PRs is exactly what GitHub Copilot, GitHub cursor bot and tonnes of other PR bots already do…

Love this. I use a similar "ralph-loop" approach that starts with an approved plan and then hand it off to a coordinator which does it across 2 sessions (build and review for simplicity), with each session getting its own model.

Basically you're using AI as very costly linters...

what's wrong with (depending on the language) checkstyle, sonarlint, ruff, mypy, xmllint, and/or eslint?

Very much agreed. Something specific that has helped me a lot (beyond just automatic formatting, linting and testing) was putting a hard fail on any file with more than 1500 lines or so, with an allowlist for specific files with specific reasons for their length. I realized the agents were squirreling away code without wanting to do any sort of refactor. Every time one of these rat's nests has turned up, the codebase has been much improved with a small refactor, to the point it doesn't feel like such a pile of slop anymore.
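A minimal sketch of such a hard fail, assuming a Python codebase and a relative-path allowlist. The 1500 limit comes from the comment above; the allowlist entry, the `.py` suffix filter, and the function names are hypothetical.

```python
from pathlib import Path

MAX_LINES = 1500  # "1500 lines or so", per the comment above

# Hypothetical allowlist: relative path -> reason the file may exceed the limit.
ALLOWLIST = {
    "src/generated/schema.py": "generated code, never edited by hand",
}


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in fh)


def find_oversized(root, suffixes=(".py",), max_lines=MAX_LINES, allowlist=ALLOWLIST):
    """Return (relative_path, line_count) for each file over the limit."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        if rel in allowlist:
            continue
        lines = count_lines(path)
        if lines > max_lines:
            offenders.append((rel, lines))
    return offenders


def check(root, **kwargs):
    """Print offenders and return a process exit code (1 means fail)."""
    offenders = find_oversized(root, **kwargs)
    for rel, lines in offenders:
        print(f"{rel}: {lines} lines")
    return 1 if offenders else 0
```

In CI or a pre-commit hook this could be wired up as `sys.exit(check("."))`. Keeping a written reason next to each allowlisted path makes it harder for exceptions to pile up silently.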

To me the blocker with using coding agents is having to rely on a paid external service. Are there any local models that are good enough to be used for coding?

Thank you. It is really important to remind people of this, especially those in upper management.

https://news.ycombinator.com/item?id=48246232

This reminds me of the article above. People now have diverse ideas on agentic coding. Some suggest human-in-the-loop while others suggest giving a detailed specification and letting the agent run freely; some suggest leveraging LLMs' high productivity, and here we get the opinion that LLMs can actually write good code slowly.

It's good to see more practical and varied opinions emerging, turning the LLM into literally a tool instead of something to be hated or hyped.

In my own practice, I find LLMs (SOTA ones) good at medium-level tasks, those that need some reasoning and planning. However, their design taste in architecture is unexpectedly disgusting. My best practice so far is sometimes writing the interfaces myself and asking the LLM to fill in the implementations, alongside context-completing tools like context7, deepwiki, docs.rs MCPs, etc., and giving it an escape hatch (e.g. encouraging it to use the AskUser tool in Claude Code).

Instead of using a skill and having the agent own the flow for this, I've been building an external orchestrator that handles the process.

By default it uses pi agent core + pi ai (from the excellent pi coding agent) as a multi model runtime but also supports a Claude Agent SDK runtime.

I can have an implementation and review process of an OpenSpec change run anywhere from 2 hours to 24+ hours going through review/fix/verification rounds automatically until the implementation matches the spec and any additional reviewers are done finding issues after the fix rounds.

it's going to be fully open sourced in the next two weeks and fully free to use

https://engine.build

Stop being reasonable! This is a hypecycle!

This is the approach I take, with many guardrails and nested CLAUDE.md's to keep things sane.

How profound! Talking points are changing from "vibe coding delivers bug free software" to "slow down and enjoy the AI".

Great how the promoters are mirroring the current anti-AI sentiment. The next step is canceling all subscriptions and not using AI at all. Maybe your mind will work again.

The bug-finding use case alone makes this worth it.

AI makes senior engineers slower in the same way code reviews make teams slower: locally inefficient, globally beneficial.

> This is the opposite of the “10x productivity” slop-cannon style of development that most people imagine when they think of vibe coding, but I find it very satisfying.

I can relate to this. When I spend time writing a unit test, even one that adds only 1% of code coverage, it is honestly a wholesome moment for me to ship it confidently.