The Short Leash AI Coding Method for Beating Fable

I feel like OP is still in the year 2025.

> The AI will have gone off the rails multiple times and you will only notice it later when you actually try to use the software.

Except that the AI can now use your software itself, find and fix bugs on its own, and even drive new features.

> Your agent might go “off the rails” and start doing something you don’t want it to do

This happens, but far less often than it used to, and the case for fully autonomous agents is getting stronger, not weaker.

> It is humanly impossible to build your own understanding of a codebase

This again feels outdated. I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

This “short leash” seems like more of a crutch to me, and a sign of not giving the AI enough detail on the problem to begin with, or not reviewing and iterating on its output.

I have much better results treating the AI like a peer, having discussions before implementation, discussions to review its results or the diff, and then iterating. Hand-holding great models like Fable through implementation is a waste of time, and a waste of Fable. The process of discussing designs and their implementations, questioning things that look weird to me, and actually reading the AI’s responses also helps me to find better solutions.

For example, one time I wanted to write a greedy solver for a problem, and Opus suggested using an existing MILP library to solve the problem exactly. I’d never even heard of MILP, but my final implementation ended up being better and simpler than what I’d have done alone.
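To make the greedy-versus-exact gap concrete, here is a minimal Python sketch on a made-up knapsack instance. The item data, the capacity, and the function names are all hypothetical, and the brute-force search only stands in for an exact solver. It is not the MILP library from the story above, which isn't named.

```python
from itertools import combinations

# Hypothetical toy data: (name, weight, value). The capacity is also made up.
ITEMS = [("a", 10, 60), ("b", 20, 100), ("c", 30, 120)]
CAPACITY = 50


def greedy(items, capacity):
    """Take items in order of value-per-weight until nothing else fits."""
    chosen, remaining = [], capacity
    for name, weight, _value in sorted(items, key=lambda i: i[2] / i[1], reverse=True):
        if weight <= remaining:
            chosen.append(name)
            remaining -= weight
    return sorted(chosen)


def exact(items, capacity):
    """Check every subset and keep the best feasible one.

    This is only viable for tiny inputs; an exact solver (MILP or otherwise)
    targets the same optimum without enumerating all subsets.
    """
    best, best_value = [], 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            weight = sum(i[1] for i in subset)
            value = sum(i[2] for i in subset)
            if weight <= capacity and value > best_value:
                best, best_value = [i[0] for i in subset], value
    return sorted(best)


def total_value(names, items):
    """Sum the values of the named items."""
    lookup = {name: value for name, _weight, value in items}
    return sum(lookup[name] for name in names)


if __name__ == "__main__":
    for solver in (greedy, exact):
        picked = solver(ITEMS, CAPACITY)
        print(solver.__name__, picked, total_value(picked, ITEMS))
```

On this instance the greedy pass grabs the two best-ratio items and then can't fit the third, while the exact search finds a heavier pair worth more. That is the kind of gap an exact formulation closes.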

I thought this was how everyone who can actually code uses AI for anything that’s actually important.

Am I wrong? Are you guys just YOLOing everything these days?

LLMs are still next-token predictors. Just because you can give one vaguer instructions and it still finds the right steps to follow doesn't mean it's intelligent. It means you're speaking the same language as the harness your model was trained on.

And that has a limit. If you are stuck at PoC level or simple apps, you have no idea how limited the current models still are. Past that point you really need to break tasks down, not just trust a token predictor to list steps that sound good. There has to be a human in the loop somewhere, because once you start skipping permissions, the best case is you hit the jackpot. More likely you get a suboptimal solution and wasted tokens. And what's genuinely still terrifying is when the model ignores instructions and does some stupid nonsense, ruining your day. It really is as sharp as a CNC machine. It's not that it isn't useful, but it can be dangerous, so maybe don't try to carve wood with a monster machine, or park your Ferrari in that cramped neighbourhood if you don't know how to parallel park.

Maybe I'm too optimistic, but given appropriate skills and references (not just for writing but also reviewing) and intelligent use of subagents for isolated reviews and checks, you can lengthen the leash a bit.

But you still need to properly review plans and PRs to keep a good mental model of the codebase. This effectively limits the number of tasks being done in parallel to maybe 2-3. Even then you'll be mentally exhausted and will probably start making mistakes or taking shortcuts in reviews yourself.

I find it hard to stay engaged doing this. I do get good results, but it's just hard to not get distracted when it's doing the work.

I <3 how everyone and their brother feels qualified to write advice to hundreds? thousands? of other developers about AI ... based on a couple months of experience as a personal user.

I mean, it's like writing a book about how to use React or Django or some other major software ... after you used it for one project for a month!

Authors: I know this is the Internet, and I know bloggers blog about whatever pops into their head ... but if you are going to act like an authority, how about you learn more than the average reader before you start telling them authoritatively what to do?

This post seems like some decent advice mixed in with a lot of overconfidence and unverifiable claims.

“expert developers whose skills have reached the point where they outclass any and all “frontier AI models” in their area of expertise”

Are any developers saying they outclass any and all frontier models? I’d say at best it’s mixed at this point. The best developers still do certain things better, but not even close to all things.

“The problem is that even code written and/or reviewed by Fable 5, will stink”

I’m skeptical. Example prompt and output please.

There really wasn't much substance to this article.

I'm curious whether Opus4.8 or similar can attain Mythos level through good system prompting and steering. You would expect this to work if it's true that the strength of Mythos is its unwillingness to quit before it gets the desired outcome.

This is probably slower than writing the code yourself. Doesn't make sense to me. Using an agent without YOLO mode is not worth it.

The way I'd rather do it is to tightly control the output with skills you write yourself, prompts, plans, etc., and get as close as possible to what you would write yourself.

Seems hella inefficient.

A better method starts with realizing that everything every program does is data transformation and/or movement.

Then you ask the LLM to subdivide the data into a tree along the domain model, classifying streaming vs. storing nodes.

Then for each node you discuss the best data structure with the AI.

Then you ask for an interface that fully encapsulates the structure, where every mutation can only go from a valid state to a valid state and nothing else is allowed to touch the state.

And that's mostly it: just connect all the interfaces until input goes to the monitor, to storage, to an API, or wherever the destination is.
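As a rough illustration of the steps above, here is a minimal Python sketch of one "storing" node behind an interface that only permits valid-to-valid mutations, fed by one "streaming" function. The `StockLevel` class, its capacity rule, and the `(op, n)` event format are assumptions invented for the example, not anything prescribed here.

```python
class InvalidTransition(Exception):
    """Raised when a mutation would leave the node in an invalid state."""


class StockLevel:
    """Hypothetical 'storing' node. Invariant: 0 <= count <= capacity."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._count = 0

    @property
    def count(self) -> int:
        # Read-only view; callers cannot assign to the state directly.
        return self._count

    def add(self, n: int) -> None:
        if n <= 0 or self._count + n > self._capacity:
            raise InvalidTransition(f"cannot add {n}")
        self._count += n

    def remove(self, n: int) -> None:
        if n <= 0 or n > self._count:
            raise InvalidTransition(f"cannot remove {n}")
        self._count -= n


def apply_events(events, stock):
    """Hypothetical 'streaming' node: route (op, n) events into the store.

    Returns the events that were rejected; the store is unchanged by those.
    """
    rejected = []
    for op, n in events:
        try:
            if op == "add":
                stock.add(n)
            elif op == "remove":
                stock.remove(n)
            else:
                raise InvalidTransition(f"unknown op {op!r}")
        except InvalidTransition:
            rejected.append((op, n))
    return rejected


if __name__ == "__main__":
    stock = StockLevel(capacity=5)
    events = [("add", 3), ("remove", 1), ("remove", 5), ("add", 4)]
    print(apply_events(events, stock), stock.count)
```

The checks run before any assignment, so a rejected mutation leaves the state untouched. Python's leading underscore is only a convention, so "nothing else is allowed to touch the state" is enforced by discipline here rather than by the language.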