Here is another one that goes in depth as well: www.markdown.engineering for anyone going deep on learning.
I looked at the leaked code expecting some "secret sauce", but honestly didn't found anything interesting.
I don't get the hype around Claude Code. There's nothing new or unique. The real strength are the models.
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows, A 500K line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
[0] https://pi.dev/
If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming; frustration regexes, context sanitizers, tool-retry loops, and state rollbacks just to stop the agent from drifting or silently breaking things.
The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.
Isn't it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
I love your implementation.
Here was my first stab:
Also I definitely want a Claude Code spirit animal
This deployment is temporarily paused
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
I use it all day and love it. Don't get me wrong. But it's a terminal-based app that talks to an LLM and calls local functions. Ooookay…
The utils directory should only contain truly generic, business-agnostic utilities (such as date retrieval, simple string manipulation, etc.).
We can see that the code produced by Vibe is not what a professional engineer would write. This may be due to the engineers using the Vibe tool.
0 - https://github.com/zackautocracy/claude-code/blob/main/src/u...
First command I looked at:
/stickers:
Displays earned achievement stickers for milestones like first commit, 100 tool calls, or marathon sessions. Stickers are stored in the user profile and rendered as ASCII art in the terminal.
That is not what it does at all - it takes you to a stickermule website.What is the motivation for someone to put out junk like this?
The only suggestion/nit I have is that you could add some kind of asterisk or hover helper to the part when you talk about 'Anthropic's message format', as it did make me want to come here and point out how it's ackchually OpenAI's format and is very common.
Only because I figure if this was my first time learning about all this stuff I think I'd appreciate a deep dive into the format or the v1 api as one of the optional next steps.
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
This is why I personally don't take technical debt arguments about how LLM maintained code bases deteriorate with size/age seriously; it presumes that at some point I'll give up with the LLM and be left with a mess to clean up by hand, but that's not going to happen, future maintenance is to be left to LLMs and if that isn't possible for some reason then the project is as good as dead anyway. When you start a project with a LLM the plan should be to see it through with LLMs, planning to have unaided humans take over maintenance at some point is a mistake.
That's how you get "oh this TUI API wrapper needs 68GB of RAM" https://x.com/jarredsumner/status/2026497606575398987 or "we need 16ms to lay out a few hundred characters on screen that's why it's a small game engine": https://x.com/trq212/status/2014051501786931427
I particularly valued the tool list. People in these comments are complaining about how bad the code is, but I found the client-side tools that the model actually uses to be pretty clean/general.
My takeaway was more that at a very basic level they know what they are doing - keep the client general, so that you can innovate on the server side without revving the client as much.
https://web.archive.org/web/20260331105051/https://www.cclea...
BTW, that's why you should use your own infrastructure and not depend on Vercel
If you prompt with little raw material and little actual specification of what you want to see in the end, eg you just say make a detailed breakdown dashboard-like site that analyzes this codebase, the result will have this uncanny character.
I'd describe it as a kind of "fanfic", it (and now I'm not just talking about this website but my overall impression related to this phenomenon) reminds me a bit like how when I was 15 or so, I had an idea about how the world works then things turned out to be less flashy, less movie-like, less clear-cut, less-impressive-to-a-teenage-boy than I had thought.
If you know the concept of "stupid man's idea of a smart man", I'd say AI made stuff (with little iteration) gives this outward appearance of a smart man from the Reddit-midwit-cinematic-universe. It's like how guns in movies sound more like guns than real guns. It's hyperreality.
Again this is less about the capabilities of AI and it's more connected to the people-pleasing nature of it. It's like you prompt it for some epic dinner and it heaps you up some hmmm epic bacon with bacon yeah (referring to the hivemind-meme). Or BigMac on the poster vs the tray, and the poster one is a model made with different components that are more photogenic. It's a simulacrum.
It looks more like your naive currently imagined thing about what you think you need vs what you'd actually need. It's like prompting your ideal girlfriend into AI avatar existence. I'm sure she will fit your ideal thought and imagination much better but your actual life would need the actual thing.
This relates to the Persona thing that Anthropic has been exploring, that each prompt guides the model towards adopting a certain archetypal fiction character as it's persona and there are certain attraction basins that get reinforced with post training. And in the computer world, simulated action can be easily turned into real action with harnesses and tools, so I'm not saying that it doesn't accomplish the task. But it seems that there are more sloppy personas, and it seems that experts can more easily avoid summoning them by giving them context that reflects more mundane reality than a novice or an expert who gives little context. Otherwise the AI persona will be summoned from the Reddit midwit movie.
I'm not fully clear about all this, but I think we have a lot to figure out around how to use and judge the output of AI in a productive workflow. I don't think it will go away ever, but will need some trimming at the edges for sure.
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
But you can do a lot of interesting things on top of this. I highly recommend writing an agent and hooking it up to a local model.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
it looks really interesting.
Getting something with a link to their GitHub onto the frontpage of HN. Because form matters much more in this world than substance.
The animated explanation at the top is also way too fast at 1x, almost impossible to follow; that immediately hinted at the author not fully reading/experiencing the result before publishing this.
It's inappropriate to label a free side project 'junk' or 'slop' even if it contains major errors.
Particularly when there's a disclaimer about possible inaccuracies on the page.
I have no idea why because in my experience Claude Code and the same models inside of Cursor behave almost identically. I think all the secret sauce is in the RLHF.
I had used pi and cc to analyze the unpacked cc to compare their design, architecture and implementation.
I guess your site was also coded with pi and it is very impressive. Wonderful if you can do a visualization for pi vs cc as well. My local models might not be powerful enough.
Thanks for the hard work!
My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).
Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking - we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them. That is where the secret sauce lives.
If it doesn't deliver on the promise we have bigger problems than "oh no the code is insecure". We went from "I think this will work" to "this has to work because if it doesn't we have one of those 'you owe the bank a billion dollars' situations"
The core architecture is not interesting? its an LLM tui, theres not much there to discuss architecturally. The code itself is the actual fascinating train wreck to look at.
If they fail, doesn't software and the giant companies that make it go back to owning the world?
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
I find LLMs very useful and capable, but in my experience they definitely perform worse when things are unorganized. Maintenance isn't just aesthetics, it's a direct input to correctness.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
(Yes, I know I can turn it off. I have.)
“Complete thyself.”
And I want an octopus. Who orchestrates octopuses.
Sincerely, someone running a team building similar things for analytics.
curious as i haven't gotten around to writing my own agent yet
However, I assume that usage data could be increasingly valuable as well. That will likely help the big commercial cloud models to maintain a head start for general use.
BECAUSE ITS WRONG! THE DATA IS WRONG!
The real value of Anthropic is in the models that they spent hundreds of millions training. Anyone can build a frontend that does a loop, using the model to call tools and accomplish a task. People do it every day.
Sure, they've worked hard to perfect this particular frontend. But it's not like any of this is revolutionary.
What is the naive implementation you're comparing against? Ssh access to the client machine?
Claude Code CLI is actually horrible: it's a full headless browser rendering that's then converted in real-time to text to show in the terminal. And that fact leaks to the user: when the model outputs ASCII, the converter shall happily convert it to Unicode (no latter than yesterday there was a TFA complaining about Unicode characters breaking Unix pipes / parsers expecting ASCII commands).
It's ultra annoying during debugging sessions (that is not when in a full agentic loop where it YOLOs a solution): you can't easily cut/paste from the CLI because the output you get is not what the model did output.
Mega, mega, mega annoying.
What should be something simple becomes a rube-goldberg machinery that, of course, fucks up something fundamental: converting the model's characters to something else is just pathetically bad.
Anyone from Anthropic reading? Get your shit together: if you keep this "headless browser rendering converted to text", at least do not fucking modify the characters.*
- Opencode (anomalyco/opencode) is about 670k LOC
- Codex (openai/codex) is about 720k LOC
- Gemini (google-gemini/gemini-cli) is about 570k LOC
Claude Code's 500k LOC doesn't seem out of the ordinary.
I guess I just find it weird because all the signals are messed up so whenever I see these sorts of layouts, I feel like I'm looking at the average where I don't think "gorgeous and interesting" at all. Instead, I'm forced to think "I should be skeptical of this based on the presentation because it presents as high quality but this may be hiding someone who is not actually aware of what they're presenting in any depth" as the author may have just shoved in a prompt and let it spin.
There's actually a similarly designed website (font weights, font styles etc) here in New Zealand (https://nzoilwatch.com/) where at a glance, it might seem like some overloaded professional-backed thing but instead it's just some guy who may or may not know anything about oil at all, yet people are linking it around the place like some sort of authoritative resource.
I would have way less of an issue if people just put their names by things and disclosed their LLM usage (which again, is fine) rather than giving the potentially false impression to unequipped people that the information presented is actually as accurate and trustworthy as the polish would suggest.
Last week we I was struggling to go from vague prompt to a OMG-it's-so-nice-looking web app, I remembered that example above and then decided to create my own component library, which I did in a couple days: https://www.substrateui.dev/. I was actually super happy that I was able to accomplish that, and then I realized I wanted to better understand the content that I had vibe coded into existence. So now I'm recreating that design system step by step w/ Claude code, filling in gaps in my knowledge & learning a bit about colors, typography, CSS, blah blah blah. It's actually a lot of fun because I'm able to explore all of the concepts and learn enough to build a front end that doesn't suck & is good enough for my use case without getting stuck for days on trying to center a stupid div by hand or play whack-mole-fix-something-and-break-something-else when trying to clean up AI slop.
Content resizing, needing to juggle a speed knob to read, and the overall presentation makes it feel like Edward Tufte flavored nightmare fuel.
Can you expand on this?
My experience is they require excessive steering but do not “break”
If you can download it client side you can likely place a copy in a folder and ask claude
‘decompile the app in this folder to answer further questions on how it works. As an an example first question explain what happens when a user does X’.
I do this with obscure video games where i want to a guide on how some mechanics work. Eg. https://pastes.io/jagged-all-69136 as a result of a session.
It can ruin some games but despite the possibility of hallucinations i find it waaay more reliable than random internet answers.
Works for apps too. Obfuscation doesn’t seem to stop it.
Aren't all the other products also vibe-coded? "All vibe-coded products look like this" doesn't really seem to answer the question "Why is it so damn large?"
It's a repl, that calls out to a blackbox/endpoint for data, and does basic parsing and matching of state with specific actions.
I feel the bulk of those lines should be actions that are performed. Either this is correct or this is not:
1. If the bulk of those lines implement specific and simple actions, why is it so large compared to other software that implements single actions (coreutils, etc)
2. If the actions constitute only a small part of the codebase, wtf is the rest of it doing?
I'm not saying that this is necessarily too much, I'm genuinely asking if this is a bloat or if it's justified.
edit: Claude is actually (TS) 395K. So Gemini is more bloat. Codex is arguable since is written in lower-level language.
I doubt it needs to be more than 20-50kloc.
You can create a full 3D game with a custom 3D engine in 500k lines. What the hell is Claude Code doing?
I don't know where you get this. you should ask folks at Meta. They are probably the biggest and happiest users of CC
" Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written. "
Personally, I don't think I will be putting any such disclaimers or disclosures on my work, unless I deem it relevant to the functionality.
I've created some chinese characters learning website and I took me typing 1/3 of LoTR to get there[1]. I would have typed like 1% of that writing code directly. It is a different process, but it still needs some direction.
Like that drunk uncle that takes half an hour and 20 000 words to tell you a 500 word story.
Correction: a code base of 500kLoC would take 23 engineers a year to write. There is no indication that the functionality needed in a TUI app that does what this app does needs 500kLoC.
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
Anything general is always going to be worse for specific use cases, and agents from these big providers are very general. They'll spend tons of tokens doing things that you might not need, including spend extra tokens on supporting MCP, etc., when you might not even need that.
Sure. You could have. But you're not the one playing football in the Champions League.
There were many roads that could have gotten you to the Champions League. But now you're in no position to judge the people who got there in the end and how they did it.
Or you can, but whatever.
What every developer learns during their “psh i could build that” weekendware attempt is that there is infinite polish to be had, and that their 20k loc PoC was <1% of the work.
That said, doesn't TFA show you what they use their loc for?
Engineers using LOC as a measure of quality is the inverse of managers using LOC as a measure of productivity.
I'm serious. The hype chasing clearly clearly matters. .
things like this: https://github.com/instructkr/claw-code I mean ok, serious people put in years of effort for 100 of those stars ...
it's continually wild how extremely irrelevant hard effortful careful work is.
I think that's the game. Get up, look at the headlines, figure out how you can exploit them with vibe coding, do some hyphy project and repeat.
Maybe some lobster themed bullshit between openclaw and the claudecode leak.
I'm not being a cynic here, I'm just telling you what I'm going to do tomorrow.
That just seems to be human nature unfortunately - the complainers are always louder.
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
I strongly disagree, but it made me chuckle a bit, thinking about labeling software as "handmade" or marketing software house as "artisanal".
> You're complaining about vibe coding while also complaining about how you "feel" about the code. Do you see the irony in that?
Where did I complain about how I feel about the actual code? I have feelings, negative ones, about the size of the code given the simple functionality it has, but I have no feelings on the code because I did not look at the code.
- Dark mode design with lots of colors
- Buttons that have vibrant, bright borders and duller backgrounds
- Excessive (IMO) usage of monospace fonts for stylistic reasons
None of this proves that it's AI (the other comments have covered that) but in my experience it's always correct.
Yeahhh strong disagree there, I find Codex and CC to be buggy as hell. Desktop CC is very bad and web version is nigh unusable.
For comparison, it takes me less time to load Chrome and go to gemini.google.com.
The more lines of code you have the more likely there is for one of them to be wrong and go unnoticed. It results in bugs, vulnerabilities,... and leaks.
But given that we know the functionality of Claude Code, we can guess how much complexity should be required. We could also be wrong.
>Why does it matter?
If there’s massively more code than there needs to be that does matter to the end user because it’s harder to maintain and has more surface area for bugs and security problems. Even with agents.
It's sloppy work
Does not matter. Sloppiness is unimportant
Those within well informed, technical circles will fall somewhere in between the for/against labels, myself included.
The GenAI hype cycle is finally starting to collapse as the general population starts to realize that these systems aren't the panacea for "everything" after all. They provide enormous utility in some domains like coding, but even then there are massive tradeoffs, footguns and the usual horse blinder ills that come with every hype cycle. I just hope we stop having to "learn the hard way" with respect to undisciplined use of current-gen LLM systems writ large, and cooler heads prevail sooner rather than later.
This is a two-pizza team sized project, so it's not a project that the code quality would inevitably spiral out of control due to communication problems.
A single senior architect COULD have kept the code quality under control.
The current training loop for coding is RL as well - so a departure from human coding patterns is not unexpected (even if departure from human coding structure is unexpected, as that would require development of a new coding language).
> It's a shame, because it's still the best coding agent, in my experience.
If it is the best, and if it delivers the value users are asking for, then why would they have an incentive to make further $$$ investments to make it of a "higher" quality if the value this difference could make is not substantial or hurts the ROI?
On many projects I found this "higher quality" not only to be false of delivering more substantial value but actually I found it was hurting the project to deliver the value that matters.
Maybe we are after all entering the era of SWE where all this bike-shedding is gone and only type of engineers who will be able to survive in it will be the ones who are capable of delivering the actual value (IME very few per project).
Now whether that’s actually possible is a second topic.
For example, I found an implementation of a PRNG, mulberry32 [1], in one of the files. That's pretty strange considering TS and Javascript have decent PRNGs built into the language and this thing is being used as literally just a shuffle.
[1] https://github.com/AprilNEA/claude-code-source/blob/main/src...
I think a lot of the people prasing Claude & co are on Macs.
The only reason people are using Claude Code is because it's the only way to use their (heavily subsidized) subscription plans. People who are okay with using and paying for their APIs often opt out for other, better, tools.
Also, analogies don't work. As we know for a fact that Claude Code is a bloated mess that these "champions league-level engineers" can't fix. They literally talk about it themselves: https://news.ycombinator.com/item?id=47598488 (they had to bring in actual Champions League engineers from bun to fix some of their mess).
The principles of good software don't suddenly vanish just because now it's a machine writing the code instead of a human, they still have to deal with the issues humans have for more than half a century. The history of programming is new developers coming up with a new paradigm, then rediscovering all the issues that the previous generation had figured out before them.
Because it's unmaintainable slop that they themselves don't know how to fix when something happens? https://news.ycombinator.com/item?id=47598488
Or that's why tgey had to buy bun with actual engineers to work on Claude Code to reduce memory peaks from 68 GB (yes, 68 gigabytes) to a "measely" 1.7? Because code quality doesn't matter?
Or that a year later they still cannot figure out how to render anything in the terminal without flickering?
The only reason people use Claude Code is because it's the only way to use Anthropic's heavily subsidized subscription. You get banned if you use it through other, better, tools.
This file is exactly what I'm talking about.
Take the loadInitialMessage function: It's encumbered with real world incremental requirements. You can see exactly the bolted-on conditionals where they added features like --teleport, --fork-session, etc.
The runHeadlessStreaming function is a more extreme version of that where a bunch of incremental, lateral subsystems are wired together, not an example of superfluous loc.
It's a completely different domain, e.g. very different integration surface area and abstractions.
Claude Code's source is dumped online so there's probably a more concrete analysis to be had than "that sounds like too many loc".
You just repeat the same statement.
That bloated mess is what got them to the Champions League. They did what was necessary to get them here. And they succeeded so far.
But hey, according to some it can be replicated in 50k lines of wrapper code around a terminal command, so for Anthropic it's just one afternoon of vibe coding to get rid of this mess. So what's the problem? /s
It turns out that there is a tradeoff in code between velocity and quality that smart businesses consider relative to hardware cost/quality. The businesses that are outcompeting others are rarely those who have the highest quality code, but rather those that are shipping quickly at a quality level that is satisfactory for current hardware.
At some point someone will probably take their LLM code and repoint it at the LLM and say 'hey lets refactor this so it uses less code is easier to read but does the same thing' and let it chrun.
One project I worked on I saw one engineer delete 20k lines of code one day. He replaced it with a few lines of stored procedure. That 20k lines of code was in production for years. No one wanted to do anything with it but it was a crucial part of the way the thing worked. It just takes someone going 'hey this isnt right' and sit down and fix it.
Since you keep putting words in my mouth that I never said, and keep being deliberately obtuse, this particular branch is over.
Go enjoy Win11 written by same level of champions or something.
Adieu.
That worked because of rapid advancements in CPU performance. We’ve left that era.
It’s about more than performance. Code is and always has been a liability. Even with agents, you start seeing massive slowdowns with code base size.
It’s why I can nearly one shot a simple game for my kid in 20 minutes with Claude, but using it at work on our massive legacy codebase is only marginally faster than doing it by hand.
When a TUI requires 68 GB of RAM to run, or when they spend a week not being able to find a bug that causes multiple people to immediately run out of tokens, it's not a "them" problem.
Also a AAA game (with the engine) with physics, networking, and rendering code is up there in terms of the most complex pieces of software.
Meanwhile I apparently need to change my persoective about this: https://news.ycombinator.com/item?id=47598488
For example, without looking at the code, the superstition also works in the opposite direction: Claude Code is an interface to using AI to do any computer task while a 3D game just lets you shoot some bad guys, so surely the 3D game must be done in fewer loc. That's equally unsatisfying.
You'd have to be more concrete than "sounds like a lot".
Shouldn't interfaces be smaller than the implementation?
Claude Code is quite literally a wrapper around a few APIs. At one point it needed 68GB of RAM to run and requires 11ms to "lay a scene graph" to display a few hundred characters on screen. All links here: https://news.ycombinator.com/item?id=47598488
> while a 3D game just lets you shoot some bad guys, so surely the 3D game must be done in fewer loc.
Yes, most games should be done in fewer loc
As the ZMachine interpreter (V3 games at least, enough for the mentioned example), even a Game Boy used to play Pokemon Red/Blue -and Crystal/Sylver/Blue, just slightly better specs than the OG GB- can run Tristam Island with keypad based input picking both selected words from the text or letter by letter as when you name a character in an RPG. A damn Game Boy, a pocket console from 1989. Not straightly running a game, again. Emulating a simple text computer -the virtual machine- to play it. No slowdowns, no-nothing, and you can save the game (the interpreter status) in a battery backed cartridge, such as the Everdrive. Everything under... 128k.
Claude Code and the rest of 'examples' it's what happens when trade programmers call themselves 'engineers' without even a CS degree.