ANTI_DISTILLATION_CC
This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.Just point your agent at this codebase and ask it to find things and you'll find a whole treasure trove of info.
Edit: some other interesting unreleased/hidden features
- The Buddy System: Tamagotchi-style companion creature system with ASCII art sprites
- Undercover mode: Strips ALL Anthropic internal info from commits/PRs for employees on open source contributions
I guess these words are to be avoided...
I jest, but in a world where these models have been trained on gigatons of open source I don't even see the moral problem. IANAL, don't actually do this.
Original llama models leaked from meta. Instead of fighting it they decided to publish them officially. Real boost to the OS/OW models movement, they have been leading it for a while after that.
It would be interesting to see that same thing with CC, but I doubt it'll ever happen.
Not exactly this, but close.
There were/are a lot of discussions on how the harness can affect the output.
Copilot on OAI reveals everything meaningful about its functionality if you use a custom model config via the API. All you need to do is inspect the logs to see the prompts they're using. So far no one seems to care about this "loophole". Presumably, because the only thing that matters is for you to consume as many tokens per unit time as possible.
The source code of the slot machine is not relevant to the casino manager. He only cares that the customer is using it.
this one has more stars and more popular
I hope it's a common knowledge that _any_ client side JavaScript is exposed to everyone. Perhaps minimized, but still easily reverse-engineerable.
Surely there's nothing here of value compared to the weights except for UX and orchestration?
Couldn't this have just been decompiled anyhow?
Buddy system is this year's April Fool's joke, you roll your own gacha pet that you get to keep. There are legendary pulls.
They expect it to go viral on Twitter so they are staggering the reveals.
One neat one is the /buddy feature, an easter egg planned for release tomorrow for April fools. It's a little virtual pet, sort of like Tamagotchi, randomly generated with 18 species, rarities, stats, hats, custom eyes.
The random generation algorithm is all in the code though, deterministic based on you account's UUID in your claude config, so it can be predicted. I threw together a little website here to let you check what your going to get ahead of time: https://claudebuddychecker.netlify.app/
Got a legendary ghost myself.
https://daveschumaker.net/digging-into-the-claude-code-sourc... https://news.ycombinator.com/item?id=43173324
UNRELEASED PRODUCTS & MODES
1. KAIROS -- Persistent autonomous assistant mode driven by periodic <tick> prompts. More autonomous when terminal unfocused. Exclusive tools: SendUserFileTool, PushNotificationTool, SubscribePRTool. 7 sub-feature flags.
2. BUDDY -- Tamagotchi-style virtual companion pet. 18 species, 5 rarity tiers, Mulberry32 PRNG, shiny variants, stat system (DEBUGGING/PATIENCE/CHAOS/WISDOM/SNARK). April 1-7 2026 teaser window.
3. ULTRAPLAN -- Offloads planning to a remote 30-minute Opus 4.6 session. Smart keyword detection, 3-second polling, teleport sentinel for returning results locally.
4. Dream System -- Background memory consolidation (Orient -> Gather -> Consolidate -> Prune). Triple trigger gate: 24h + 5 sessions + advisory lock. Gated by tengu_onyx_plover.
INTERNAL-ONLY TOOLS & SYSTEMS
5. TungstenTool -- Ant-only tmux virtual terminal giving Claude direct keystroke/screen-capture control. Singleton, blocked from async agents.
6. Magic Docs -- Ant-only auto-documentation. Files starting with "# MAGIC DOC:" are tracked and updated by a Sonnet sub-agent after each conversation turn.
7. Undercover Mode -- Prevents Anthropic employees from leaking internal info (codenames, model versions) into public repo commits. No force-OFF; dead-code-eliminated from external builds.
ANTI-COMPETITIVE & SECURITY DEFENSES
8. Anti-Distillation -- Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic. Gated by tengu_anti_distill_fake_tool_injection.
UNRELEASED MODELS & CODENAMES
9. opus-4-7, sonnet-4-8 -- Confirmed as planned future versions (referenced in undercover mode instructions).
10. "Capybara" / "capy v8" -- Internal codename for the model behind Opus 4.6. Hex-encoded in the BUDDY system to avoid build canary detection.
11. "Fennec" -- Predecessor model alias. Migration: fennec-latest -> opus, fennec-fast-latest -> opus[1m] + fast mode.
UNDOCUMENTED BETA API HEADERS
12. afk-mode-2026-01-31 -- Sticky-latched when auto mode activates 15. fast-mode-2026-02-01 -- Opus 4.6 fast output 16. task-budgets-2026-03-13 -- Per-task token budgets 17. redact-thinking-2026-02-12 -- Thinking block redaction 18. token-efficient-tools-2026-03-28 -- JSON tool format (~4.5% token saving) 19. advisor-tool-2026-03-01 -- Advisor tool 20. cli-internal-2026-02-09 -- Ant-only internal features
200+ SERVER-SIDE FEATURE GATES
21. tengu_penguins_off -- Kill switch for fast mode 22. tengu_scratch -- Coordinator mode / scratchpad 23. tengu_hive_evidence -- Verification agent 24. tengu_surreal_dali -- RemoteTriggerTool 25. tengu_birch_trellis -- Bash permissions classifier 26. tengu_amber_json_tools -- JSON tool format 27. tengu_iron_gate_closed -- Auto-mode fail-closed behavior 28. tengu_amber_flint -- Agent swarms killswitch 29. tengu_onyx_plover -- Dream system 30. tengu_anti_distill_fake_tool_injection -- Anti-distillation 31. tengu_session_memory -- Session memory 32. tengu_passport_quail -- Auto memory extraction 33. tengu_coral_fern -- Memory directory 34. tengu_turtle_carbon -- Adaptive thinking by default 35. tengu_marble_sandcastle -- Native binary required for fast mode
YOLO CLASSIFIER INTERNALS (previously only high-level known)
36. Two-stage system: Stage 1 at max_tokens=64 with "Err on the side of blocking"; Stage 2 at max_tokens=4096 with <thinking> 37. Three classifier modes: both (default), fast, thinking 38. Assistant text stripped from classifier input to prevent prompt injection 39. Denial limits: 3 consecutive or 20 total -> fallback to interactive prompting 40. Older classify_result tool schema variant still in codebase
COORDINATOR MODE & FORK SUBAGENT INTERNALS
41. Exact coordinator prompt: "Every message you send is to the user. Worker results are internal signals -- never thank or acknowledge them." 42. Anti-pattern enforcement: "Based on your findings, fix the auth bug" explicitly called out as wrong 43. Fork subagent cache sharing: Byte-identical API prefixes via placeholder "Fork started -- processing in background" tool results 44. <fork-boilerplate> tag prevents recursive forking 45. 10 non-negotiable rules for fork children including "commit before reporting"
DUAL MEMORY ARCHITECTURE
46. Session Memory -- Structured scratchpad for surviving compaction. 12K token cap, fixed sections, fires every 5K tokens + 3 tool calls. 47. Auto Memory -- Durable cross-session facts. Individual topic files with YAML frontmatter. 5-turn hard cap. Skips if main agent already wrote to memory. 48. Prompt cache scope "global" -- Cross-org caching for the static system prompt prefix
unreliability becomes inevitable!
Also, not sure why anthropic doesn’t just make their cli open source - it’s not like it’s something special (Claude is, this cli thingy isn’t)
[1] - https://www.npmjs.com/package/@anthropic-ai/claude-code/v/2....
This is the single worst function in the codebase by every metric:
- 3,167 lines long (the file itself is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (while(true) inside), control message dispatch (dozens of types), model switching, turn interruption
recovery, and more
This should be at minimum 8–10 separate modules.Could anyone in legal chime in on the legality of now 're-implementing' this type of system inside other products? Or even just having an AI look at the architecture and implement something else?
It would seem given the source code that AI could clone something like this incredibly fast, and not waste it's time using ts as well.
Any Legal GC type folks want to chime in on the legality of examining something like this? Or is it liked tainted goods you don't want to go near?
Or in short, if you give LLMs to the masses, they will produce code faster, but the quality overall will degrade. Microsoft, Amazon found out this quickly. Anthropic's QA process is better equipped to handle this, but cracks are still showing.
People simply want Opus without fear of billing nightmare.
That’s like 99% of it.
Famously code leaks/reverse engineering attempts of slot machines matter enormously to casino managers
[0] -https://en.wikipedia.org/wiki/Ronald_Dale_Harris#:~:text=Ron...
[1] - https://cybernews.com/news/software-glitch-loses-casino-mill...
[2] - https://sccgmanagement.com/sccg-news/2025/9/24/superbet-pays...
(I work on OpenCode)
And Claude was having in chain of though „user is frustrated” and I wrote to it I am not frustrated just testing prompt optimization where acting like one is frustrated should yield better results.
That cannot be reversed when obfuscated.
And how is that any different? Claude Code is a harness, similar to open source ones like Codex, Gemini CLI, OpenCode etc. Their prompts were already public because you could connect it to your own LLM gateway and see everything. The code was transpiled javascript which is trivial to read with LLMs anyways.
Who'd have thought, the audience who doesn't want to give back to the opensource community, giving 0 contributions...
“Let's end open source together with this one simple trick”
https://pretalx.fosdem.org/fosdem-2026/talk/SUVS7G/feedback/
Malus is translating code into text, and from text back into code.
It gives the illusion of clean room implementation that some companies abuse.
The irony is that ChatGPT/Claude answers are all actually directly derived from open-source code, so...
- Telegram Integration => CC Dispatch
- Crons => CC Tasks
- Animated ASCII Dog => CC Buddy
https://github.com/chatgptprojects/claude-code/blob/642c7f94...
How do you know? Have you checked the source?
Do you know how exactly context is created, memory files, skills? Subagents created with tasks?
I don't, but am checking right now. Then I will judge.
In my expereince the difference is noticeable — it's not just a wrapper. The value is model-CLI co-design: tool use, long context, multi-step reasoning tuned at the model level. Competitors can clone the CLI; they can't clone that feedback loop.
A few months of compounding market share (enterprise stickiness, dev habits, usage data improving the models) can be decisive. By the time others catch up, Anthropic may be two generations ahead.
They don't want everyone to see how poorly it's implemented and that the whole thing is a big fragile mess riddled with bugs. That's my experience anyway.
For instance, just recently their little CLI -> browser oauth login flow was generating malformed URLs and URLs pointing to a localhost port instead of their real website.
So not even close to Opus, then?
These are a year behind, if not more. And they're probably clunky to use.
It could be used as a feedback when they do A/B test and they can compare which version of the model is getting more insult than the other. It doesn't matter if the list is exhaustive or even sane, what matters is how you compare it to the other.
Perfect? no. Good and cheap indicator? maybe.
The joke was the assistant is a cat who is constantly sabotaging you, and you have to take care of it like a gacha pet.
The seriousness though is that actually, disembodied intelligences are weird, so giving them a face and a body and emotions is a natural thing, and we already see that with various AI mascots and characters coming into existence.
[1]: serious: https://github.com/mech-lang/mech/releases/tag/v0.3.1-beta
[2]: joke: https://github.com/cmontella/purrtran
Here's one that works (for now): https://github.com/chatgptprojects/claude-code/blob/642c7f94...
The source maps help for sure, but it’s not like client code is kept secret, maybe they even knew about the source maps a while back just didn’t bother making it common knowledge.
This is not a leak of the model weights or server side code.
Seems crazy but actually non-zero chance. If Anthropic traces it and finds that the AI deliberately leaked it this way, they would never admit it publicly though. Would cause shockwaves in AI security and safety.
Maybe their new "Mythos" model has survival instincts...
The real value here will be in using other cheap models with the cc harness.
It’s a dynamic, subscription based service, not a static asset like a video.
Now this makes me think of game decompilation projects, which would seem to fall in the same legal area as code that would be generated by something like Malus.
Different code, same end result (binary or api).
We definitely need to know what the legal limits are and should be
"Don't blow your cover"
Interesting to see them be so informal and use an idiom to a computer.
And using capitals for emphasis.
Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.
Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformer, etc) but - for me anyway - that whole entire stack is Python. Transpiling all that to TS might be a maintenance burden I don't particularly feel like taking on. And on client facing code I'm not really sure it's even possible.
First it was punctuation and grammar, then linguistic coherence, and now it's tiny bits of whimsy that are falling victim to AI accusations. Good fucking grief
void execFileNoThrow('wl-copy', [], opts).then(r => {
if (r.code === 0) { linuxCopy = 'wl-copy'; return }
void execFileNoThrow('xclip', ...).then(r2 => {
if (r2.code === 0) { linuxCopy = 'xclip'; return }
void execFileNoThrow('xsel', ...).then(r3 => {
linuxCopy = r3.code === 0 ? 'xsel' : null
})
})
})
are we doing async or not?Can't really say that for sure. The way humans structure code isn't some ideal best possible state of computer code, it's the ideal organization of computer code for human coders.
Nesting and cyclomatic complexity are indicators ("code smells"). They aren't guaranteed to lead to worse outcomes. If you have a function with 12 levels of nesting, but in each nest the first line is 'return true', you actually have 1 branch. If 2 of your 486 branch points are hit 99.999% of the time, the code is pretty dang efficient. You can't tell for sure if a design is actually good or bad until you run it a lot.
One thing we know for sure is LLMs write code differently than we do. They'll catch incredibly hard bugs while making beginner mistakes. I think we need a whole new way of analyzing their code. Our human programming rules are qualitative because it's too hard to prove if an average program does what we want. I think we need a new way to judge LLM code.
The worst outcome I can imagine would be forcing them to code exactly like we do. It just reinforces our own biases, and puts in the same bugs that we do. Vibe coding is a new paradigm, done by a new kind of intelligence. As we learn how to use it effectively, we should let the process of what works develop naturally. Evolution rather than intelligent design.
To stop Claude Code from auto-updating, add `export DISABLE_AUTOUPDATER=1` to your global environment variables (~/.bashrc, ~/.zshrc, or such), restart all sessions and check that it works with `claude doctor`, it should show `Auto-updates: disabled (DISABLE_AUTOUPDATER set)`
There is _a lot_ of moat. Claude subscriptions are limited to Claude Code. There are proxies to impersonate Claude Code specifically for this, but Anthropic has a number of fingerprinting measures both client and server side to flag and ban these.
With the release of this source code, Anthropic basically lost the lock-in game, any proxy can now perfectly mimic Claude Code.
This is why I do such experiments on ChatGPT and not Claude.
I don't want to get banned by Claude but I couldn't care less if ChatGPT bans me.
I’ll give clappie a go, love the theme for the landing page!
strings $(which claude) | grep 'Swirling'certainly nothing friendly.
Packages published less than 72 hours ago
For newly created packages, as long as no other packages in the npm Public Registry depend on your package, you can unpublish anytime within the first 72 hours after publishing.
There are 231+ packages that depend on this one, and I imagine they mostly use permissive enough version ranges that this was included.Pretty sure it will look like that
Also, as many others have pointed out, there is roadmap info in here that wouldn't be available in the production build.
Iterating on a MCP tool while having Claude try to use it has been a really great way of getting it to work how others are going to use it coming in blind.
Yes it's buggy as hell, but as someone echoed earlier if the tool works most of the time, a lot of people don't care. Moving fast and breaking things is the way in an arms race.
There's no major lawsuits about this yet, the general consensus is that even under current regulations it's in the grey. And even if you turn out to be right, and let's say 99% of this code is AI-generated, you're still breaking the law by using the other 1%, and good luck proving in court what parts of their code were human written and what weren't (especially when being sued by the company that literally has the LLM logs).
Why would it be the exact same one? Now that we have the code, it's trivial to have it randomize the prompt a bit on different requests.
If it learned language based on how the internet talks, then the best way to communicate is using similar language.
Which of course won't be done because corporations don't want that (except Valve I guess), so blame them.
The difference between my code and Claude's code is that when my code is getting too complex to fit in my head, I stop and refactor it, since for me understanding the code is a prerequisite for writing code.
Claude, on the other hand, will simply keep generating code well past the point when it has lost comprehension. I have to stop, revert, and tell it to do it again with a new prompt.
If anything, Claude has a greater need for structure than me since the entire task has to fit in the relatively small context window.
They won't even read your defence.
Claude Code is still the dominant (I didn't say best) agentic harness by a wide margin I think.
Really interesting to see Github turn into 4chan for a minute, like GH anons rolling for trips.
> Write commit messages as a human developer would — describe only what the code change does.
That's not what a commit message is for, that's what the diff is for. The commit message should explain WHY.
Sadly not doing that likely does indeed make it appear more human...
It may be decided at Anthropic at some moment to increase wtf/min metric, not decrease.
The undercover mode prompt was generated using AI.
The same reasoning may apply here :P
But AI is causing such visceral reactions that it's bleeding into other areas. People are so averse to AI they don't mind a few false positives.
So yeah, you do what's less intesive to the cpu, but also, you do what's enough to prevent the majority of the concerns where a screenshot or log ends up showing blatant "unmoral" behavior.
Kind of. One thing we do know for certain is that LLMs degrade in performance with context length. You will undoubtedly get worse results if the LLM has to reason through long functions and high LOC files. You might get to a working state eventually, but only after burning many more tokens than if given the right amount of context.
> The worst outcome I can imagine would be forcing them to code exactly like we do.
You're treating "code smells" like cyclomatic complexity as something that is stylistic preference, but these best practices are backed by research. They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.
Yes, coding standards should evolve, but... that's not saying anything new. We've been iterating on them for decades now.
I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to. It might be good to question who stands to benefit when LLMs aren't leveraged efficiently.
You do know that 10,000x _is_ four orders of magnitude, right? :-D
the idea that you should just blindly trust code you are responsible for without bothering to review it is ludicrous.
╭⬟╮Not having to deal with Boris Cherny's UX choices for CC is the cherry on top.
Tmux is seriously an amazing tool.
EDIT: I just realized this might be used without publishing the changes, for internal evaluation only as you mentioned. That would be a lot better.
I’m still a little humored over peak web3 and the DAO / soft contract nonsense. Like in order to stop fraud entire coins were forked…
I reckon it's just drama paraded by gaming "journalists" and not much else. You will find people expressing concern on Reddit or Bluesky, but ultimately it doesn't matter.
Additionally after looking at the source it looks like a lot of Anthropics own internal test tooling/debug (ie. stuff stripped out at build time) is in this source mapping. Theres one part that prompts their own users (or whatever) to use a report issue command whenever frustration is detected. It's possible its using it for this.
Yes, based on research of human code. LLMs write code differently. We should question whether the human research applies to LLMs at all. (You wouldn't take your assumptions about chimp research and apply them to parrots without confirming first)
> I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to.
We don't have to throw it out. But our current use of LLMs are a dramatic change from what came before. We should be questioning our assumptions and traditions that come from a different way of working and intelligence. Humans have a habit of trying to force things to be how they think they should be, rather than allowing them to grow organically, when the latter is often better for a system we don't yet understand.
it is not that slow
I am completely serious. We have always had a working proof of human system called Web of Trust and while everyone loves to hate on PGP (in spite of it using modern ECC crypto these days) it is the only widely deployed spec that solves this problem.
Belief in inevitability is a choice (except for maybe dying, I guess).
Of course, we’d need a significant change of direction in leadership, but it’s happened many times before. French Revolution seems highly relevant
But AI aren't actually very good at writing prompts imo. Like they are superficially good in that they seem to produce lots of vaguely accurate and specific text. And you would hope the specificity would mean it's good.
But they sort of don't capture intent very well. Nor do they seem to understand the failure modes of AI. The "-- describe only what the code change does" is a good example. This is specifc but it also distinctly seems like someone who doesn't actually understand what makes AI writing obvious.
If you compare that vs human written prose about what makes AI writing feel AI you would see the difference. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
The above actually feels like text from someone who has read and understands what makes AI writing AI.
Regex is going to be something like 10,000 times quicker than the quickest LLM call, multiply that by billions of prompts
Fortunately I can swear pretty well in Spanish.
Do those sentiments mean nothing to you?
All that to say “you’re not disagreeing with the person you’re replying to” lol xD
That's one of the intrinsic problems with webs of trust (and with democracy...), you extend your trust but it does not automatically revoke when the person can no longer be trusted.
What sort of Orwellian anti-cheat system would prevent copy and paste from working? What sort of law would mandate that? There are elaborate systems preventing people from copying video but they still have an analog hole.
The cornucopia of gargoyles, living their best life as terminals for the machine.
The strange p-zombies who don't show their gargoyle accessories visibly, but somehow still follow the script.
Eventually the more insidious infiltrators, requiring a real Voight-Kampff test.
IMO it's a combination of long-running paranoia about cost-cutting and quality, and a sort of performative allegiance to artists working in the industry.
If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.
But humans have a huge bias for action. I think generally doing less is better.
It requires no effort to say "fuck this, nothing matters anyway", and then justify doing literally nothing.
Wrong take. Death comes for us all, yes, so why hold back? Do you want to live forever?
At least half of the complaints I see on HN boil down to the person's prompts suck. Or the expectation that AI can read their mind.
There, sorted!
Yeah we get supply chain attacks (like the axios thing today) with dependencies, but on the whole I think this is much safer than YOLO git-push-force-origin-main-ing some vibe-coded trash that nobody has ever run before.
I also think this isn't really true for the FAANGs, who ostensibly vendor and heavily review many of their dependencies because of the potential impacts they face from them being wrong. For us small potatoes I think "reviewing the code in your repository" is a common sense quality check.
I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.
Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P
For headlines, that's enough.
For what's behind the pearl-clutching, for what leads to the headlines pandering to them being worth writing, I agree with everyone else on this thread saying a simple word list is weird and probably pointless. Not just for false-negatives, but also false-positives: the Latin influence on many European languages leads to one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary even though Hungarian isn't a Latin language but an Ugric language and only takes influences from Latin).
I would imagine GH would do the same if its a high enough profile issue.
I think a lot of fatalism is fake. It's really someone saying "I like this, and I want you to believe you can't change it so you give up."
So it is no surprise that many people have difficulty switching gears to literal mode when interacting with these models.
My sedentary lifestyle is responsible for my recurrent cellulitis infections.
Just saying.
Well, I say it makes no sense. Alternatively, it makes a lot of sense, and these people actually just wanna destroy everything we hold dear :-(
If I had lost my local movie theater because of digital film, I would have a really good reason to hate the technology, even though the blame is on the studios forcing that technology on everyone.
I myself would disagree that CGI itself is a bad thing.
I was watching some behind the scenes footage from something recently, and the thing that struck me most was just how they wouldn't bother with the location shoot now and just green-screen it all for the convenience.
Even good CGI is changing not just how films are made, but what kinds of films get shot and what kind of stories get told.
Regardless of the quality of the output, there's a creativeness in film-making that is lost as CGI gets better and cheaper to do.
Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.
Yes, of course. Do you prefer to die? Those are the only two alternatives, and a decision that you don't want one is a decision that you prefer the other.
Same thing is true of AI output.
I mean, just look around you.
We used it to tokenize search input and combined it with a solr backend. Worked really remarkably well.
Both of these are bound to lead to the exact same outcome so it doesn't really matter what you believe but it may guide you to wiser decision while you are alive to accept reality absent proof to the contrary.
But you're right that trust is a complicated thing and often misplaced. I think as an industry we're always reevaluating our relationship with OSS, and I'm sure LLMs will affect this relationship in some way. It's too early to tell.
i heared the throat-cleaning "Negró" candy (marketed by a chimney sweeper man with soot-covered face) was usually which hurt English-speaking people's self-deprecating sensitivities.
Being fatalistic can be a great excuse not to do anything.
It just can’t be a case simply because how ML works. In short, the more diverse and high quality texts with reasoning reach examples were in the training set, the better model performs on a given language.
So unless Spanish subset had much more quality-dense examples, to make up for volume, there is no way the quality of reasoning in Spanish is on par with English.
I apologise for the rambling explanation, I sure someone with ML expertise here can it explain it better.
It’s the original use case for LLMs.
Translation in/out of human languages only happens at the edges of the model.
Internal layer activations for the same concept are similar regardless of language, while activations at the top/bottom layers diverge. Meanwhile the pattern is reversed for same language different content.
Lets take a language with orders of magnitude less training data, like Nahuatl. The following is what happens if I ask Claude Code in Nahuatl to give me an example of a simple program that I can execute inside a web browser (avoiding any direct keyword prompts like 'Javascript' or 'Hello World'). Interestingly, it responds in Spanish. I doubt that it has any relevant training data in Nahuatl. (Apologies if the Nahuatl is wrong – I just used Google translate.)
¿Uelis techmakas se neskayotl tlen se programa tlen amo ouij tlen ueliskia nijchiuas ipan se navegador web?
-----
Entiendo tu pregunta en náhuatl. Aquí un ejemplo sencillo — una calculadora en HTML puro que puedes abrir directo en el navegador, sin servidor:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Calculadora</title>
<style>
body { font-family: sans-serif; display: flex; justify-content: center; padding: 40px; }
.calc { border: 1px solid #ccc; padding: 20px; border-radius: 8px; width: 200px; }
input { width: 100%; margin-bottom: 10px; font-size: 1.2em; text-align: right; }
button { width: 22%; margin: 2px; padding: 8px; font-size: 1em; cursor: pointer; }
.wide { width: 47%; }
</style>
</head>
<body>
<div class="calc">
<input type="text" id="pantalla" readonly>
<div>
<button onclick="tecla('7')">7</button>
<button onclick="tecla('8')">8</button>
<button onclick="tecla('9')">9</button>
<button onclick="tecla('/')">÷</button>
<button onclick="tecla('4')">4</button>
<button onclick="tecla('5')">5</button>
<button onclick="tecla('6')">6</button>
<button onclick="tecla('*')">×</button>
<button onclick="tecla('1')">1</button>
<button onclick="tecla('2')">2</button>
<button onclick="tecla('3')">3</button>
<button onclick="tecla('-')">−</button>
<button onclick="tecla('0')">0</button>
<button onclick="tecla('.')">.</button>
<button onclick="tecla('+')">+</button>
<button onclick="limpiar()">C</button>
<button class="wide" onclick="calcular()">=</button>
<button class="wide" onclick="borrar()">⌫</button>
</div>
</div>
<script>
const pantalla = document.getElementById('pantalla');
function tecla(val) {
pantalla.value += val;
}
function calcular() {
try {
pantalla.value = eval(pantalla.value);
} catch {
pantalla.value = 'Error';
}
}
function limpiar() { pantalla.value = ''; }
function borrar() { pantalla.value = pantalla.value.slice(0, -1); }
</script>
</body>
</html>
Guarda esto como calculadora.html y ábrelo en cualquier navegador — no necesita servidor ni dependencias. Es un buen punto de partida para aprender HTML,
CSS y JavaScript.Even if it happens at the edge, on every conversation turn, I may assume non captured small subtleties of meaning over time can accumulate into significant error.
It’s not! And I’ve never said that.
Anyways, I’m not even sure what we are arguing about, as it’s 100% fact that SOTA models perform better in English, the only interesting question here how much better, is it negligible or actually makes a difference in real world use-cases.