09 May, 2026
This dev-log is getting a lot of attention on HN (scary!): HN Thread.
To those coming here from HN: this started as an investigation, or rather a question: "How far can I get with building a piece of software while keeping myself completely out of the loop?" The tl;dr of this dev log is that I still need to be in the loop to make anything meaningful. Takeaways:
Human intervention is still needed as of 10/05/2026. You can totally go back now!
Here is k10s: https://github.com/shvbsle/k10s/tree/archive/go-v0.4.0
234 commits. ~30 weekends. Built entirely in vibe-coded sessions with Claude, whenever my tokens lasted long enough to ship something.
I'm archiving my TUI tool and rewriting it from scratch.
k10s started as a GPU-aware Kubernetes dashboard (and my first foray into building something serious with AI). Think k9s but built for the people running NVIDIA clusters, people who actually care about GPU utilization, DCGM metrics, and which nodes are sitting idle burning $32/hr. I built it in Go with Bubble Tea [1] and it worked.
For a while... :(
What I learned over these 7 months is worth more than the 1690 lines of model.go I'm throwing away. And I think anyone doing serious vibe-coding can benefit from it, because this part doesn't surface much (I feel it gets buried under the demo reels and the velocity wins).
tl;dr: AI writes features, not architecture. The longer you let it drive without constraints, the worse the wreckage gets. The velocity makes you think you're winning right up until the moment everything collapses simultaneously.
I started k10s in late September 2025. The first few weeks were magic. I'd prompt Claude with "add a pods view with live updates" and boom, it worked. Resource list views, namespace filtering, log streaming, describe panels, keyboard navigation. Each feature landed clean because the project was small enough that the AI could hold the whole thing in context.
The basic k9s clone took maybe 3 weekends. Resource views for pods, nodes, deployments, services. A command palette. Watch-based live updates. Vim keybindings. All working, all vibe-coded in single sessions. I was building at maybe 10x my normal speed and it felt incredible.
Then I wanted the main selling point.
The whole reason k10s exists is the GPU fleet view. A dedicated screen that shows you every node's GPU allocation, utilization from DCGM, temperature, power draw, memory. Not buried in kubectl describe node output, but right there in a purpose-built table with color-coded status. Idle nodes in yellow. Busy in green. Saturated in red.

The fleet view on mock GPU nodes
And Claude one-shot it. I prompted for the fleet view, it generated the FleetView struct, the tab filtering (GPU/CPU/All), the custom rendering with allocation bars. It looked beautiful. I was riding the high.
Then I typed :rs pods to switch back to the pods view.
Nothing rendered. The table was empty. Live updates had stopped. I switched to nodes, it showed stale data from the fleet view's filter. I went back to fleet, the tab counts were wrong.
The god object had consumed itself.
This is the title of the blog post. This is where I intervened for the first time. For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote. I'd look at the diff, verify it compiled, test the happy path, move on. But now something was fundamentally broken and I couldn't just prompt my way out of it.
So I sat down and read model.go. All 1690 lines. I was horrified.
Here's what it looked like. One struct to rule them all:
```go
type Model struct {
    // 3rd party UI components
    table        table.Model
    paginator    paginator.Model
    commandInput textinput.Model
    help         help.Model

    // cluster info and state
    k8sClient         *k8s.Client
    currentGVR        schema.GroupVersionResource
    resourceWatcher   watch.Interface
    resources         []k8s.OrderedResourceFields
    listOptions       metav1.ListOptions
    clusterInfo       *k8s.ClusterInfo
    logLines          []k8s.LogLine
    describeContent   string
    currentNamespace  string
    navigationHistory *NavigationHistory
    logView           *LogViewState
    describeView      *DescribeViewState
    viewMode          ViewMode
    viewWidth         int
    viewHeight        int
    err               error
    pluginRegistry    *plugins.Registry
    helpModal         *HelpModal
    describeViewport  *DescribeViewport
    logViewport       *LogViewport
    logStreamCancel   func()
    logLinesChan      <-chan k8s.LogLine
    horizontalOffset  int
    mouse             *MouseHandler
    fleetView         *FleetView
    creationTimes     []time.Time
    allResources      []k8s.OrderedResourceFields // fleet's unfiltered set
    allCreationTimes  []time.Time                 // fleet's timestamps
    rawObjects        []unstructured.Unstructured
    ageColumnIndex    int
    // ...
}
```
UI widgets. K8s client. Per-view state for logs, describe, fleet. Navigation history. Caching. Mouse handling. All in one struct. And the Update() method was a 500-line function dispatching on msg.(type) with 110 switch/case branches.
This is the moment I stopped vibe-coding and started thinking.

Ok I guess I'll let you copy logs with your mouse. What could go wrong?
Here's what I extracted from 7 months of watching AI generate a codebase that slowly ate itself. Each of these is something I did wrong, why it happens with AI-assisted coding, and what you should actually put in your CLAUDE.md or agents.md to prevent it.
Tenet 1: AI builds features, not architecture.
Every time I prompted Claude for a feature, it delivered. Perfectly. The fleet view worked on the first try. Log streaming worked. Mouse support worked. The problem is that each feature was implemented in the context of "make this work right now" without any awareness of the 49 other features sharing the same state.
Here's what the resourcesLoadedMsg handler looks like. This is the code that runs every time you switch views:
```go
case resourcesLoadedMsg:
    m.logLines = nil       // Clear log lines when loading resources
    m.horizontalOffset = 0 // Reset horizontal scroll on resource change

    if m.currentGVR != msg.gvr && m.resourceWatcher != nil {
        m.resourceWatcher.Stop()
        m.resourceWatcher = nil
    }

    m.currentGVR = msg.gvr
    m.currentNamespace = msg.namespace
    m.listOptions = msg.listOptions
    m.rawObjects = msg.rawObjects

    // For nodes: store the full unfiltered set, classify, then filter
    if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil {
        m.allResources = msg.resources
        m.allCreationTimes = msg.creationTimes
        if len(msg.rawObjects) > 0 {
            m.fleetView.ClassifyAndCount(m.rawObjectPtrs())
        }
        m.applyFleetFilter()
    } else {
        m.resources = msg.resources
        m.creationTimes = msg.creationTimes
        m.allResources = nil
        m.allCreationTimes = nil
    }
```
See the if msg.gvr.Resource == k8s.ResourceNodes && m.fleetView != nil conditional? That's the fleet view being special-cased inside the generic resource loading path. Every new view that needed custom behavior got another branch here. And every branch needed to manually clear the right combination of fields or the previous view's data would bleed through.
How many = nil cleanup lines exist in this file? I counted:
```go
m.logLines = nil     // Clear log lines when loading resources
m.allResources = nil // Clear fleet data when not on nodes
m.resources = nil    // Clear resources when loading logs
m.resources = nil    // Clear resources when loading describe view
m.logLines = nil     // Clear log lines when loading describe view
m.resources = nil    // Clear resources when loading yaml view
m.logLines = nil     // Clear log lines when loading yaml view
m.logLines = nil     // ... two more in other handlers
m.logLines = nil
```
Nine manual nil assignments scattered across a 1690-line file. Miss one and you get ghost data from the previous view. This is what happens when there's no view isolation. AI can't see this pattern decaying over time because each prompt only touches one code path.
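For contrast, here's a minimal sketch of the alternative: per-view state held in a single enum, so switching views drops the old state wholesale and there's nothing to manually nil out. (This is Rust, since that's where the rewrite is headed; the view types are illustrative, not actual k10s code.)

```rust
struct PodsView  { /* pods table state */ }
struct LogsView  { /* log buffer, autoscroll flag */ }
struct FleetView { /* node rows, tab filter */ }

// Exactly one view's state exists at a time.
enum ActiveView {
    Pods(PodsView),
    Logs(LogsView),
    Fleet(FleetView),
}

// Replacing the variant drops the previous view's state wholesale;
// ghost data from the old view is unrepresentable.
fn switch_to_logs(active: &mut ActiveView, logs: LogsView) {
    *active = ActiveView::Logs(logs);
}
```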
What to do instead: Write the architecture yourself before any code. Not a vague design doc. A concrete set of interfaces, message types, and ownership rules. Then put those rules in your CLAUDE.md so the AI sees them on every prompt:
```markdown
# Architecture Invariants (CLAUDE.md)

- Each view implements the View trait. Views do NOT access other views' state.
- All async data arrives via AppMsg variants. No direct field mutation from background tasks.
- Adding a new view MUST NOT require modifying existing views.
- The App struct is a thin router. It owns navigation and message dispatch. Nothing else.
```
The AI will follow these if you write them down. It just won't invent them for you.
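To make "Views do NOT access other views' state" concrete, here's one possible shape for that View contract, sketched with ratatui/crossterm-style types. The signatures are illustrative assumptions, not the actual rewrite:

```rust
use crossterm::event::KeyEvent;
use ratatui::{layout::Rect, Frame};

// Typed messages from background tasks; variants are defined per data source.
enum AppMsg { /* ... */ }

trait View {
    // Keys are interpreted by the active view, never by a global switch.
    fn handle_key(&mut self, key: KeyEvent) -> Option<AppMsg>;
    // Async data arrives as typed messages; views never reach into each other.
    fn on_msg(&mut self, msg: &AppMsg);
    // Pure render: no side effects, no I/O, no channel operations.
    fn render(&self, frame: &mut Frame, area: Rect);
}
```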
Tenet 2: The god object is the default AI artifact.
AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony. But it gets worse. Because there's no view isolation, key handling becomes a nightmare. Here's the actual key dispatch for the s key:
```go
case m.config.KeyBind.For(config.ActionToggleAutoScroll, key):
    if m.currentGVR.Resource == k8s.ResourceLogs {
        m.logView.Autoscroll = !m.logView.Autoscroll
        if m.logView.Autoscroll {
            m.table.GotoBottom()
        }
        return m, nil
    }
    // Shell exec for pods and containers views
    if m.currentGVR.Resource == k8s.ResourcePods {
        // ... 20 lines to look up selected pod, get name, namespace ...
        return m, m.commandWithPreflights(
            m.execIntoPod(selectedName, selectedNamespace),
            m.requireConnection,
        )
    }
    if m.currentGVR.Resource == k8s.ResourceContainers {
        // ... container exec logic ...
        return m, m.commandWithPreflights(m.execIntoContainer(), m.requireConnection)
    }
    return m, nil
```
One keybinding. Three completely different behaviors depending on which view you're in. The s key means "autoscroll" in logs, "shell" in pods, and "shell into container" in containers. This is all in one flat switch because there are no per-view key maps. The AI generated this because I said "add shell support for pods" and it found the nearest key handler and jammed it in.
And look at how Enter works. This is the drill-down handler:
```go
case m.config.KeyBind.For(config.ActionSubmit, key):
    // Special handling for contexts view
    if m.currentGVR.Resource == "contexts" {
        // ... 12 lines ...
        return m, m.executeCtxCommand([]string{contextName})
    }
    // Special handling for namespaces view
    if m.currentGVR.Resource == "namespaces" {
        // ... 12 lines ...
        return m, m.executeNsCommand([]string{namespaceName})
    }
    if m.currentGVR.Resource == k8s.ResourceLogs {
        return m, nil
    }
    // ... 25 more lines of generic drill-down ...
```
Every view is a conditional in a flat dispatch. There are 20+ occurrences of m.currentGVR.Resource == used as a type discriminator in this single file. Not types. String comparisons. Every new view means touching every handler.
What to do instead: Put this in your CLAUDE.md:
```markdown
# State Ownership Rules

- NEVER add fields to the App/Model struct for view-specific state.
- Each view is a separate struct implementing the View trait/interface.
- Each view declares its own key bindings. The app dispatches keys to the active view.
- If you need to add a keybinding, add it to the relevant view's keymap, not a global one.
- Adding a view means adding a file. If your change requires modifying existing views, stop and ask.
```
The AI will always take the shortest path ("add another if-branch"). Your job is to make the shortest path also the correct path by putting guardrails in the file it reads on every invocation.
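As a sketch of what "the app dispatches keys to the active view" could look like, building on the View trait from Tenet 1 (again illustrative, not the actual rewrite):

```rust
// The App struct is a thin router: navigation and dispatch, nothing else.
struct App {
    views: Vec<Box<dyn View>>,
    active: usize,
}

impl App {
    fn on_key(&mut self, key: KeyEvent) -> Option<AppMsg> {
        // 's' can mean "autoscroll" in the logs view and "shell" in the pods
        // view without either view knowing the other exists.
        self.views[self.active].handle_key(key)
    }
}
```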
Tenet 3: Velocity illusion widens your scope.
This one's psychological, not technical, and I think it's the most dangerous.
When I started k10s, I wanted a GPU-focused tool. For people running training clusters. A niche audience that I'm part of. But vibe-coding made everything feel cheap. "Oh I can add pods view in one session? Let me add deployments too. And services. And a full command palette. And mouse support. And contexts. And namespaces."

damn I added everything in it dawg...
Suddenly I was building k9s. A general-purpose Kubernetes TUI. For everyone. Because the AI made it feel like each feature was free.
It wasn't free. Each feature was another branch in the god object. Here's the keybinding struct:
```go
type keyMap struct {
    Up, Down, Left, Right key.Binding
    GotoTop, GotoBottom   key.Binding
    AllNS, DefaultNS      key.Binding
    Enter, Back           key.Binding
    Command, Quit         key.Binding
    Fullscreen            key.Binding // log view
    Autoscroll            key.Binding // log view (also shell in pods!)
    ToggleTime            key.Binding // log view
    WrapText              key.Binding // log + describe view
    CopyLogs              key.Binding // log view
    ToggleLineNums        key.Binding // describe view
    Describe              key.Binding // resource views
    YamlView              key.Binding // resource views
    Edit                  key.Binding // resource views
    Shell                 key.Binding // pods (CONFLICTS with Autoscroll!)
    FilterLogs            key.Binding // log view
    FleetTabNext          key.Binding // fleet view only
    FleetTabPrev          key.Binding // fleet view only
}
```
One flat keymap for all views. Comments in parens show which view each binding applies to. Autoscroll and Shell are both s. This "works" because the dispatch checks m.currentGVR.Resource before acting. But it means you can't reason about keybindings locally. You have to trace through the entire 500-line Update function to know what a key does.
The complexity was accumulating invisibly while the velocity metric said "you're shipping!"
What to do instead: Write a vision doc that explicitly says who you're NOT building for, and put the scope boundary in your CLAUDE.md:
```markdown
# Scope (do NOT expand beyond this)

- k10s is for GPU cluster operators. Not all Kubernetes users.
- Supported views: fleet, node-detail, gpu-detail, workload. That's it.
- Do NOT add generic resource views (pods, deployments, services).
- Do NOT add features that duplicate k9s functionality.
- If a feature request doesn't serve someone running GPU training jobs, reject it.
```
Vibe-coding makes you feel like you have infinite implementation budget. You don't. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always. The architecture can only support so many features before it buckles, regardless of how fast you wrote them. The CLAUDE.md scope section is you saying no in advance, before the velocity high convinces you to say yes.
Tenet 4: Positional data is a time bomb.
Every resource in k10s was fetched from the Kubernetes API and immediately flattened:
```go
type OrderedResourceFields []string
```
Column identity was purely positional. Here's the sort function for the fleet view. Look at the index access:
```go
func sortFilteredResources(rows []k8s.OrderedResourceFields, times []time.Time, tab FleetTab) {
    sort.SliceStable(indices, func(a, b int) bool {
        ra := rows[indices[a]]
        rb := rows[indices[b]]

        switch tab {
        case FleetTabGPU:
            // Sort by Alloc column (index 3) ascending
            allocA, allocB := "", ""
            if len(ra) > 3 {
                allocA = ra[3]
            }
            if len(rb) > 3 {
                allocB = rb[3]
            }
            return allocA < allocB
        case FleetTabCPU:
            // Sort by Name column (index 0) ascending
            nameA, nameB := "", ""
            if len(ra) > 0 {
                nameA = ra[0]
            }
            if len(rb) > 0 {
                nameB = rb[0]
            }
            return nameA < nameB
        case FleetTabAll:
            // GPU nodes first, then CPU nodes.
            // Within GPU: sort by Alloc (index 3).
            // Within CPU: sort by Name (index 0).
            computeA, computeB := "", ""
            if len(ra) > 2 {
                computeA = ra[2]
            }
            if len(rb) > 2 {
                computeB = rb[2]
            }
            aIsGPU := strings.HasPrefix(computeA, "gpu")
            bIsGPU := strings.HasPrefix(computeB, "gpu")
            // ...
        }
    })
}
```
ra[3] is Alloc. ra[2] is Compute. ra[0] is Name. These are magic numbers. The only thing connecting index 3 to "Alloc" is a comment and the column order defined in resource.views.json:
{ "nodes": { "fields": [ { "name": "Name", "weight": 0.28 }, { "name": "Instance", "weight": 0.15 }, { "name": "Compute", "weight": 0.12 }, { "name": "Alloc", "weight": 0.12 }, ... ] } }
Add a column between Instance and Compute? Every sort, every conditional render, every place that says ra[2] or ra[3] is now silently wrong. The compiler can't help you because it's all []string. And the JSON config can't express sort behavior, conditional rendering, or custom drill targets, so those live in Go code that hardcodes the positional assumptions from the JSON.
AI generates this pattern because it's the shortest path from "fetch data" to "render table." A []string satisfies any table widget immediately. Typed structs require more ceremony upfront. So the AI picks the fast path, and six months later you're debugging why sort puts "Name" values in the "Alloc" column.
What to do instead: Put this directive in your CLAUDE.md:
```markdown
# Data Representation

- NEVER flatten structured data into []string, Vec<String>, or positional arrays.
- All data flows as typed structs (FleetNode, PodInfo, etc.) until the render() call.
- Column identity comes from struct field names, not array indices.
- Sort functions operate on typed fields, never on positional access like row[3].
- The ONLY place strings are created for display is inside render()/view() functions.
```
Then your typed struct makes impossible states impossible [2]:
```rust
struct FleetNode {
    name: String,
    instance_type: String,
    compute_class: ComputeClass,
    alloc: GpuAlloc,
}
```
You can't sort by the wrong column when columns are named fields. You can't accidentally compare Alloc strings as names. The compiler enforces this for you. AI will always pick Vec<String> because it satisfies the prompt faster. Your CLAUDE.md makes the typed path the path of least resistance.
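To show the contrast with the ra[3] version above, here's a hedged sketch of the same fleet sort over named fields. FleetTab and the derived orderings are my illustrative assumptions, not the actual rewrite:

```rust
#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum ComputeClass { Cpu, Gpu }

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct GpuAlloc { used: u32, total: u32 }

enum FleetTab { Gpu, Cpu, All }

fn sort_fleet(nodes: &mut [FleetNode], tab: FleetTab) {
    match tab {
        // A typo'd field name is a compile error; row[3] silently reads
        // whatever column happens to be there today.
        FleetTab::Gpu => nodes.sort_by(|a, b| a.alloc.cmp(&b.alloc)),
        FleetTab::Cpu => nodes.sort_by(|a, b| a.name.cmp(&b.name)),
        FleetTab::All => nodes.sort_by(|a, b| {
            // GPU nodes first, then by allocation, then by name.
            (b.compute_class == ComputeClass::Gpu)
                .cmp(&(a.compute_class == ComputeClass::Gpu))
                .then_with(|| a.alloc.cmp(&b.alloc))
                .then_with(|| a.name.cmp(&b.name))
        }),
    }
}
```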
Tenet 5: AI doesn't own state transitions.
The Bubble Tea architecture has a beautiful idea: Update() is the only place state mutates, driven by messages. But k10s violated this. The updateTableMsg handler spawned a closure that mutated Model fields from inside a goroutine:
```go
case updateTableMsg:
    return m, func() tea.Msg {
        // block on someone sending the update message.
        <-m.updateTableChan
        // Preserve cursor position across column/row updates so that
        // background refreshes don't reset the user's selection.
        savedCursor := max(m.table.Cursor(), 0)
        // run the necessary table view update calls.
        m.updateColumns(m.viewWidth)
        m.updateTableData()
        // Restore cursor, clamped to valid range.
        rowCount := len(m.table.Rows())
        if rowCount > 0 {
            if savedCursor >= rowCount {
                savedCursor = rowCount - 1
            }
            m.table.SetCursor(savedCursor)
        }
        return updateTableMsg{}
    }
```
This returned function (a tea.Cmd) is executed by Bubble Tea in a separate goroutine. It calls m.updateColumns(m.viewWidth) and m.updateTableData() which read and write m.resources, m.table, m.viewWidth. Meanwhile, View() is called on the main goroutine reading the same fields. There's no lock. No mutex. The channel <-m.updateTableChan blocks the goroutine until someone sends an update signal, but nothing prevents View() from reading half-written state.
This is a textbook data race. It worked 99% of the time. Corrupted the display 1% of the time in ways that made me think I was going insane.
AI generates this because "just mutate it in the closure" is the shortest path to working code. Proper message passing (send a message back to Update(), let Update() apply the mutation atomically on the main loop) requires more types, more plumbing. The AI is optimizing for the prompt, not for correctness under concurrency.
What to do instead: All mutations to render-visible state happen on the main loop. Period. Background workers produce data. They send it as a message. The main loop receives the message and applies it. This is the one rule you cannot break in concurrent UI code.
```rust
// Background task:
tx.send(AppMsg::FleetData(nodes)).await;

// Main loop:
match msg {
    AppMsg::FleetData(nodes) => {
        self.fleet_view.update_nodes(nodes);
    }
}
```
No shared mutable state. No data races. No "works 99% of the time." Put this in your CLAUDE.md:
```markdown
# Concurrency Rules

- Background tasks (watchers, scrapers, API calls) NEVER mutate UI state directly.
- Background tasks send results through a channel as typed messages.
- Only the main event loop applies state mutations from received messages.
- render()/view() is a PURE function. No side effects. No I/O. No channel operations.
- If you need to update state from async work, define a new AppMsg variant.
```
If your AI doesn't generate this pattern by default, the directive makes it the only legal option.
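A minimal end-to-end sketch of that wiring, assuming tokio's mpsc channels and a simplified AppMsg (not the actual k10s code):

```rust
use tokio::sync::mpsc;

// Simplified for the sketch; real variants would carry typed structs.
enum AppMsg {
    FleetData(Vec<String>),
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<AppMsg>(32);

    // Background task: produces data and sends it. It never touches UI state.
    tokio::spawn(async move {
        let nodes = vec!["node-a".to_string(), "node-b".to_string()]; // stand-in for a watcher/API call
        let _ = tx.send(AppMsg::FleetData(nodes)).await;
    });

    // Main loop: the ONLY place state mutates, one message at a time.
    let mut fleet_nodes: Vec<String> = Vec::new();
    if let Some(AppMsg::FleetData(nodes)) = rx.recv().await {
        fleet_nodes = nodes; // applied atomically from the renderer's point of view
    }
    println!("{} fleet nodes", fleet_nodes.len());
}
```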
I'm rewriting k10s in Rust. Not because Rust is better, but because it's the language I can steer. I've written enough of it to feel when something's wrong before I can articulate why. That instinct is the one thing vibe-coding can't replace. The AI hands you plausible-looking code. You need a nose for when it's garbage.
The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt. Whether that's enough to keep the rewrite from collapsing under its own weight... I'll find out.
[1] Bubble Tea is a TUI framework for Go based on The Elm Architecture. It's excellent. The architecture problems in k10s were mine, not Bubble Tea's.
[2] "Making impossible states impossible" is a phrase from Elm/Rust communities. The idea: design your types so that invalid states can't be constructed, rather than checking for invalid states at runtime.
Some states, for example, are meant to be inferred from the data shape rather than from explicit state fields, but damn do they like adding a state field.
7 months ago was early November. Coding assistants were getting very good back then, but in my experience they were still significantly worse at making good architectural decisions. They tended to just force features into the existing codebase without much thought or care.
Today I've noticed assistants tend to spot architectural smells while working and will ask you whether they should try to address it, but even then they're probably never going to suggest a full refactor of the codebase (which probably is generally the correct heuristic).
My guess is that if you built this today with AI, you wouldn't run into so many of these problems. That's not to say you should build blind, but the first thing that stood out to me was that you started building 7 months ago, when coding assistants were only just becoming decent, and undirected they would still generally generate total slop.
But here's the thing: you almost never know what the architecture is up front. If you do, you probably aren't the one writing the actual code anymore. Writing the code, with or without an AI, is part of the design process. For most people, it isn't until they've tried several times, fucked it up a bunch, and refactored or rewrote even more that they actually know what the architecture needs to be.
If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.
Let the AI do the implementation of your architecture.
Yea, that's why engineers are still very important for now (until models can do this type of longer-term design and stick to it).
It would have been easy to run a few AI agents to review the code and find these issues as well, and to architect it clearly.
Inb4 “you’re gonna be replaced” god damn it I hope so, I do not want to spend the rest of my life behind a computer screen…
This is what I was doing right from the beginning. AI just fills out methods and does other low-intelligence work. Both are happy. My architectures and code are really mine, easy to read and reason about. AI gets paid and doesn't get a chance to fuck me in the process. At no point did I feel any temptation to leave the "serious" parts to AI.
That trial and error process is still happening with a LLM, but much faster, and with instantaneous cross-references to various forms of documentation that I would be looking up myself otherwise. It produces code of a quality that is dependent on the engineer knowing what they want in the first place and prompting for it and refining its output correctly.
It's the exact same process of sculpting code that the majority of the industry was doing "by hand" prior to the release of LLMs, but faster, and the harnesses are only getting better. To "vibe code" is to prompt vaguely and ignore the quality of the output. You're coming to a forum full of professionals and essentially telling us that you're getting really frustrated with your Scratch project.
I don't know if you're trying to lead a charge or whatever but good luck with that. As a senior SWE, it is clear to me that this is the new paradigm until something better than LLMs comes along. My workflows and efficiency have been vastly improved. I will admit that I have never really been a "I made a SMTP server in 3k of Rust" kind of guy, though.
Do they write empty functions and let AI fill them in?
Or do they use some kind of specification language?
Are people designing those languages?
This. I definitely agree with this statement at this point in AI-assisted development. This gets at the "taste" factor that is still intrinsically human, especially in software engineering. If you can construct and guide the overall architecture of an application or system, AI can conceivably fill in the smaller feature bits, and do so well. But it must have a strong architecture and opinionated field in which to play.
1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).
2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.
3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.
Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.
I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.
Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.
This post is good for grasping the difference between "vibe-coding" and using the AI to help with design and architectural choices made by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.
Now I do feel lucky that I started learning coding about four years before the LLM revolution, but these things are really just natural language compilers, aren’t they? We’re just in that period - the 1980s, the greybeards tell me - where companies charged thousands of dollars per compiler instance, right? And now, I myself have never paid for a compiler.
This whole investor bubble will blow up in the face of the rentier-finance capitalists and I’ll be laughing my head off while it happens.
Hey, I don't want to oversimplify, I'm sure it was complicated, but did the author have functional tests for these broken views? As long as there are functional tests passing on the previous commit, I'd have thought that Claude could look at the end situation and work out how to get the desired feature without breaking the other stuff.
TUIs aren't an exception, it's still essential to have a way to end-to-end test each view.
This is a special case of a general fundamental point I'm struggling with.
Let's assume AI has reduced the marginal cost of code to zero. So our supply of code is now infinite.
Meanwhile, other critical factors continue to be finite: time in a day, attention, interest, goodwill, paying customers, money, energy.
So how do you choose what to build?
Like a genie, the tools give us the power to ask for whatever we want. And like a genie, it turns out we often don't really know what we want.
That's the hard part of coding. If you have an architecture, then writing the code is dead simple. If you aren't writing the code, you aren't going to notice when you architected an API that allows nulls but your database doesn't. Or that it does allow that, but you realize some other small issue you never accounted for.
I do not know how you can write this article and not realize the problem. It's not that you let the AI architect, but that you weren't paying attention to every single thing it does. It's a glorified code generator. You need to be checking everything it does.
The hard part of software engineering was never writing code. Junior devs know how to write code. The hard part is everything else.
> back to writing code by hand
But what they are doing is
> doing the __design work__ myself, by hand, before any code gets written.
So... Claude still is generating the code I guess?
And seriously, I can't understand that they thought their vibe-coded project worked fine and even bought a domain for it without ever looking at the source code it generated, FOR 7 MONTHS??
This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.
For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.
I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.
Another note, for me, was e2e tests: while AI can write them, it never comes up with even the basic organization or abstraction required to manage a large e2e test suite with hundreds of tests. It immediately starts to produce spaghetti code.
You can't test every permutation of app usage. You actually need good architecture so you can trust your tests, and trust changes to be local with minimal side effects.
But we don’t follow the same things for dependencies, work of colleagues, external services, all the layers down to the silicon when trying to work.
Why is AI suddenly different?
We just have to do this by risk and reward. What’s the downside if it’s wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it’s all fine? This is the same for libraries and external services.
A complex financial set of rules in a non-updatable crypto contract with no testing?
A viewer for your internal log data to visualise something?
For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.
And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully, because there are now little discrepancies in the new queries written (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.). You run a few more reviewer-agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.
As someone with experience in doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically using a series of rules. AI can write the codemod, AI can help me evaluate the results, but actually having it apply all of the few hundred changes automatically led to a lack of my ability to trust the output. And I suspect that will continue to be true for some time.
This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.
Had a project idea which I coded with the help of AI, and it became quite large, to the point that I was starting to have uncharted areas in the code. Mostly because I reviewed it too shallowly or moved too fast.
It was a good thing, as that project never floated, but if I were to do the same on my breadwinning project I would lose the joy.
But I will say... you have to know Golang. You have to have at least tried to make a Bubble Tea app yourself and tried to understand the Elm architecture. You have to look at the code and work with it incrementally.
It makes total sense for OP to switch to Rust and Ratatui if they don't know Golang well. But I don't think it's a better language for it. [Ratatui has brought me great inspiration though!]
Independent of framework, the LLMs get the spatial relationships. I say things like "the upper right panel's content is not wrapping inside and the panel's right edge should extend to the terminal edge" and the LLM will fix it. They can see the resultant text; I'm copy-pasting all the time.
TUI code is finicky; one mis-rendered component mucks everything up. The LLMs will decide on their own to make little, temporary Bubble Tea fixtures to help themselves understand when things aren't right.
The only real problem with LLMs and Bubble Tea is that on the first prompt, they insist on using Bubble Tea v1 versus Bubble Tea v2, released in December 2025. But then you just point them to the V2_UPGRADE.md and they get back on track. That will improve as training cutoffs expand.
I vibe-coded this TUI for Mom's last night. I actually started with Grok (who started with v1) and then moved into Claude Code after some iteration:
https://gist.github.com/neomantra/1008e7f2ad5119d3dd5716d52e...
I follow the plan -> red/green/refactor approach and it is surprisingly good, and the plans it produces all look super well reasoned and grounded, because the agent will slurp all the docs and forums with discussions and the like.
Trouble is once it starts working there would inevitably be a point where the docs and the implementation actually differ - either some combination of tools that have not been used in that way, some outdated docs, or just plain old bugs.
But if the goals of the project/feature are stated clearly enough it is quite capable of iterating itself out of an architectural dead end, that is if it can run and test itself locally.
It goes as deep as inspecting the code of dependencies and libraries and suggesting upstream fixes etc. all things that I would personally do in a deep debugging session.
And I'm super happy with that approach, as I'm more directing and supervising rather than doing the drudgery of it.
Trouble is, a lot of my teammates _don't_ actually go this deep when addressing architectural problems; their usual modus operandi is "escalate to the architect".
This will not end up well for them in the long run, I feel, but I'm not sure what they can do about it themselves - the window for being able to run and understand everything seems to be rapidly closing.
Maybe that's not super bad - I don't know exactly what the compiler is doing to translate things to machine code, and I definitely don't get how the assembly itself is executed to produce the results I want at scale - that is a level of magic and wizardry I can only admire (look-ahead branching strategies and caching on modern CPUs are super impressive - like, how is all of this even producing correct responses reliably at such a scale...)
Anyway - maybe all of this is ok - we will build new tools and frameworks to deal with all of this, human ingenuity and desire for improvement, measured in likes, references or money will still be there.
Now it is different in that now I don't have time to use those apps.
That’s a joke.
But I do believe it answers the question of "what to build?". If you didn't have time before LLM-assisted coding, you still don't have time for it. You most likely already know what is used and what isn't, by heart or from some measurements.
And I'm sure the rewrite is going to teach me a whole different set of lessons...
+1 on Opus 4.7 involving the user a lot more. Rn I'm trying to get to a state where I can codify my design + decision preferences as agent personas and push myself out of the dev loop.
It sounds like the author knows Rust, and might not be as familiar with Go.
A language that you are proficient in is always going to be easier to read than one you aren't, even if the latter is objectively an easier language to read in general.
You need to be checking every thing it does.
This is what seems to be lost on so many. As someone with relatively little coding experience, I find myself learning more than ever by checking the results and what went right/wrong. This is also why I don't see it getting better anytime soon. So many people ask me "how do you get your Claude to have such good output?" and the answer is always "I paid attention, spotted problems, and asked Claude to fix them." And it's literally that simple, but I can see their eyes already glazing over.
Just as google made finding information easier, it didn't fix the human element of deciphering quality information from poor information.
The developers that think coding is hard are the ones that absolutely love AI coding. It's changed their world because things they used to find hard are now easy.
Those that think coding is easy don't have such an easy time because coding to them is all about the abstractions, the maintainability and extensibility. They want to lay sensible foundations to allow the software to scale. This is the hard part. When you discover the right abstractions everything becomes relatively easy. But getting there is the hard part. These people find AI coding a useful tool but not the crazy amazing magical tool the people who struggle with coding do.
The OP is definitely in the second camp since they could spot and realise the shortcomings of the AI. They spotted the problem, and that problem is that the AI can't do the hard bit.
The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas. The second group are not, and those are the ones that I find a bit more worrisome.
I don’t think it’s that weird to not look at the code if it’s a side project and you follow along incrementally via diffs. It’s definitely a different way of working but it’s not that crazy.
You can skip that and go directly to writing code. But that just means you've replaced a few hours of planning with a few weeks of coding.
And the goal of the article is to draw attention to their project.
The rewrite is me sitting down with a blank doc and drawing the boxes before any code exists. Then the CLAUDE.md enforces what I already decided. Whether that actually holds up as the project grows, I genuinely don't know yet.
There are some programmers who treat the job as just plumbing together what is to them completely incomprehensible black boxes, who treat the computer as a mystery machine that just does things "somehow", but these programmers will almost always be hacks that spend their entire career producing mediocre code.
There are things such a programmer can build, but they are very limited by their lack of in depth understanding, and it is only a tiny fraction of what a more competent programmer can put together.
To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write (libraries, frameworks, and the OS), and including the hardware, the networking layers, and so forth. You don't have to be an expert at these things by any means, but you do need to understand them and be comfortable treating them as transparent boxes that you may have to go in and fiddle with at some point to get where you need to go. Sometimes you need to vendor a dependency and change it. Sometimes you need to drop it entirely and replace it with something more fit for purpose you built yourself.
In that situation you have two choices:
1. Tell claude to iterate until the tests for the new view and the old views are all passing.
2. git reset --hard back to the previous commit at which all tests are passing and tell claude to try again, making sure not to break any tests.
It's essential to use tests when vibecoding anything non trivial. Almost certainly in a TDD style.
If it’s beyond our ability to review and we blindly trust it’s correct based on a limited set of tests… we’re asking for trouble.
"PhD level" just means you finished a bachelor and masters degree and are now doing a bit of original research as an employed research assistant.
Claude isn't "PhD level" anything. This shows a complete lack of understanding here. Claude has read every single text book in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).
Since Claude has infinite patience, you can just keep asking until you get it.
... that can't even count.
I’m going to guess that this is Gell-Mann amnesia more than anything, and it’s going to get a lot of organizations into a lot of weird places.
Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
If management is reasonable, you can explain to them that there isn't time to check the work of the AI, and that it frequently makes obscure mistakes that need to be properly checked, and that takes time.
At this point, if they still insist you just give it the AI's work, they've made a decision that is their fault. You've done what you can.
And when the shit hits the fan, we're back to whether they're reasonable or not. If they are, you explained what could happen and it did. If they force responsibility on you, they aren't reasonable and were never going to listen to you. That time bomb was always going to go off.
Comprehension debt just sounds like there are things you don’t (yet) understand.
Cognition debt means your lack of understanding compounds and the cognition “space” required to clear it increases accordingly.
An increasing comprehension debt that can be paid off one bit at a time within reasonable cognition space takes linear time to clear.
Cognition debt takes exponential time to clear the more of it you have. If it reaches a point where you simply don’t have the space for the cognition overhead required to understand the problem, you probably need to start over from your specifications.
An outsourced developer isn't a "tool". They're a human being, and responsible for their actions. They're being paid, and they either act responsibly or they get replaced.
A vibe coder is a human using a tool. The human is responsible for code quality, and if it's not good enough, they need to keep using the tool to make it better. That means understanding the tool's output.
If an artist used Photoshop to create a billboard ad that was ugly, they don't get to blame Photoshop. They have to keep using the tool until their output is good.
There is one exception to this: If the AI also delivers the proof of why the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof). Then I would use it without hesitation.
Your manager is unknowingly helping you create a form of job security for yourself, with all the technical debt and bugs being accumulated.
He might not understand it, and it might not be the type of work you want to do, but someone is going to have to fix those issues. And the longer they wait, the bigger the task gets.
Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them--which I think is pretty much all the OP is saying.
Good architecture in any language is obvious to someone who is experienced and cares.
Go is actually great for bots to write if you’re actually thinking.
I’ve used AI tools to do i18n translations to Spanish and Portuguese (somewhat ashamed to admit this). I’ve grown more familiar with the structure of these languages, and come to recognize some of the common vocabulary for our agtech domain. If anything, I feel more clueless about both languages now than I did before, when it comes to any sort of proficiency.
Not sure why good coverage wouldn't mitigate risk in a refactor...
My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there...
If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...
It's the same thing here. AI has dropped the cost of software development, so developers are now fooling themselves into producing low or zero value software. Since the value of the software is zero or near zero, it doesn't really matter whether you get it right or not. This freedom from external constraints lets you crank up development velocity, which makes you feel super productive, while effectively accomplishing less than if you had to actually pay a meaningful cost to develop something.
Like, what is the purpose of Gas Town? It looks to me like the purpose of Gas Town is to build Gas Town.
I worry about the first group too, because interfaces and data structures are the map, not the territory. When you create a glossary, it is to compose a message that transmits a specific idea. I find invariably that people who focus that much on code often forget the main purpose of the program in favor of small features (the ticket). And that has accelerated with LLM tooling.
I believe most of us who are not so keen on AI tooling are always thinking about the program first, then the various parts, then the code. If you focus on a specific part, you make sure that you have well-defined contracts with the other parts that guarantee the correctness of the whole. If you need to change a contract, you change it with regard to the whole thing, not the specific part.
The issue with most LLM tools is that they're linear. They can follow patterns well, and agents can have feedback loops that correct them. But contracts are multi-dimensional forces that shape a solution. That solution appears more like a collapsing wave function than a linear prediction.
I’m not making a judgement call about which is better, but it was widely accepted in tech before the advent of LLMs that you just fundamentally lack a sense of understanding as a reviewer vs an author. It was a meme that engineers would rather just rewrite a complicated feature than fix a bug, because understanding someone else’s code was too much effort.
I can't speak for others, but I'd go further and say that LLMs allow me to go deeper on the design side. I can survey alternative data structures, brainstorm conversationally, play design golf, work out a consistent domain taxonomy and from there function, data structure and field names, draft and redraft code, and then rewrite or edit the code myself when the AI cost/benefit trade off breaks down.
I find it useful to not listen to people who just talk.
> Claude (c) by Anthropic (R) is the best thing since sliced bread and I'm Lovin' It(tm)! Here's a breakdown of you too can live a code free life for 10 easy payments of $99.99 a month if you subscribe now!
> Step one in your journey to code free life: code the whole damn project and put it together yourself
It's so much fluff and baloney, and every single article is identical. And every single one is just over-the-top praise of Claude that doesn't come off as remotely authentic. There are always mentions of Claude "one-shotting"(tm) something.
Additionally, they couldn't even bother to write their own blog post, so it's a little hard to take them seriously when they say they're going to write their own code...
I struggle to remember even relatively simple maths like working out "what percentage of X is Y" so if I write a formula like that I'll put in some simple values like 12 and 6 or 10,000 and 2,456 just to confirm I haven't got the values backwards or something. I've been shown sheets where someone put a formula in that they don't understand, checked it with numbers they can't easily eyeball and just assumed it was right as it's roughly in their ball park / they had no idea what the end result should be.
Then again I've also seen sheets where a 10% discount column always had a larger number than the standard price so even obviously wrong things aren't always checked.
I'd think that depends on the model of responsibility at play.
For example, suppose I hire a building contractor to build a house, and the electrician he subcontracts makes a mistake.
From my perspective, the prime contractor is equally responsible for that mistake regardless of whether he used a subcontractor, or did the work himself but used a broken tool.
This doesn't make the electrician any less of a "person" in the deeply important ways, but it's not a distinction that's relevant to my handling of the problem.
I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.
I don't see using gen AI for the (not so) “brute force” exploration of the solution space as that different from trial and error and post fact rationalization.
> Go reads fine whether the architecture is good or bad
Were you reading the Golang code all along and got fooled or did you review it after it failed? Sorry I admit I didn't read the whole article.
PMs can now cross reference and organize tickets with just a few keystrokes. Organisational knowledge, business knowledge, design systems and patterns, etc all of it is encoded in LLM consumable artefacts. For PMs it is the same switch - instead of having to do it by hand you direct lower level employees to handle the details and inconsistencies and you just do vibe and vision.
When all of the pieces successfully connect and execute reliably, what is left for humans to do? Just direct and consume?
And AI companies with their huge swaths of data are soon gonna be in the situation of being able to do the directing themselves
It's not weird to not look at the code, as long as you're looking at the code? (diffs?)
Uh, ok
In their mind they’ve already done the “architectural heavy lifting” and accelerated the team. More often than not it just adds cognitive overhead where you spend more time deciphering and cleaning up garbage than actually building the thing properly from scratch.
But we still hold good cards in hand.
Do they want their pile of steaming slop fixed, or not? Because no amount of complaints about the deadline being "yesterday" is going to change anything about the fact that time will be needed to fix the accrued technical debt, whether they like it or not. And if AI dug you in that deep to start with, the solution is not to dig deeper.
I suspect some companies are going to find that out the hard (costly) way.
A really screwed code base blows out your context window and just starts burning tokens as the AI works out a way to kill -9 itself to escape the hell you're subjecting it to.
It’s a valid direction to look in, it just doesn’t address the root issue of throwing slop across the wall and also having unrealistic expectations due to not knowing any better.
If there's one thing that's disturbing about AI proponents, it's how trusting they are of code. One change in the business domain and most of the code may turn from useful to actively harmful. Which you then have to rewrite. Good luck doing that well if you're not really familiar with the code.