Cool reverse engineering/analysis report, but if this is the extent of the nefarious activity that came of it (trying to catch/mitigate Chinese lab model distillations), that's kind of encouraging.
> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.
They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?
What do you mean you don't know where the bug is coming from?
No, I absolutely didn't make it up, how could you accuse me of that?
Does anyone know when this regex isn't working? I double checked it 27 times, I even asked the LLM. They all say this regex should be finding these dates.
Weird, suddenly all the conversations are breaking when I feed them into this other tool? Something about UTF-8 errors, but I'm sure I'm only using ASCII?
I do try to take care to make sure the things I build can be used by other people even when they care about different things. I care about understandability, determinism (as it relates to computing), and repeatability (because I want to be able to trust the systems I use).
If y'all would be willing to try to account for use cases of others, and try not to break them... that would be nice.
Please note that, generally, when you modify something that belongs to someone else without telling them... things should be expected to break.
What’s the punishment here exactly?
I understand how this can be useful to Anthropic if the 3rd-party is acting as a proxy (because they end up hitting the Claude API with the marked prompt), but it looks like requests where "hostname contains deepseek" would never be sending data to Anthropic. What am I missing?
I used that month to complete a work project and then beef up my personal harness so I'd never have to deal with Anthropic (and these sorts of shenanigans) again.
I would guess that's their first line of defense; they should have more techniques to identify distillation because that's a very simple way of detecting the host and can be easily spoofed.
And that's also why, as a legitimate customer, I want none of it: you never know if you accidentally entered a zone they don't like.
https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...
This watermark may trigger a similar mechanism.
i.e. this will allow them to literally commit fraud against paying customers
Harnesses are/can be incredibly simple things, not much more than an HTTP client that renders things in a way that suits your taste.
Me, personally, I didn’t build it from scratch but I ported original CC from published sources into Python and extended it to match my own requirements.
You have to pay API pricing, which is far more costly.
I'd either switch to GLM wholesale or just continue to use Opus within Claude Code as the blessed, subsidized path.
The agentic harness on the open source side does need some work, however.
Here's an example. Say you have your team use patched binaries. Then CC updates and requires a new patched binary with new tricks. You now have to have a team ready to analyze the binary and begin to address the tricks; meanwhile, unpatched code is now a fingerprint. If some researcher decides to update Claude on their own to access new features, they get fingerprinted.
Defeating a single fingerprinting technique once is easy. Defeating all of the techniques all the time is hard.
https://news.ycombinator.com/item?id=48259288
https://github.com/anthropics/claude-code/issues/62061
Looks like they just keep finding new "creative" uses for such things, as expected. I'll keep patching them out.
Anthropic pushes fear and control. But the only way to win is by innovating. China is flooding the market with cheap, good enough models, while the U.S. is building a Chinese firewall.
All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.
You're actually trusting your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/
Claude Code has more or less full access to the client computer. The server (that hosts the actual AI) can just go: execute this payload and tell me the result - otherwise I won't answer any further questions or re-route you to a stupider model.
The payload could check for Chinese time zones, scan for copies of the little red book on the local hard drive, or ping truth.social to see if it was behind the great firewall.
There seem to be all sorts of continual under-the-cover changes like this one that make life harder. It feels like the entire product has been taken over by overly ambitious PMs who care more about making their mark than about improving the experience, and all of their marks have made me less productive.
I've been using Pi with GLM5.2 the past few days, and though it's expensive, I find it far more productive and less annoying. The remote session plugin is far more reliable, I don't need to intuit some undocumented usage pattern to figure out how to use it well, and it just works.
And no, IMO steganography isn't security by obscurity, in the same way that using RSA and keeping the private key private isn't security by obscurity: keeping the private thing private is part of the security model.
I'm authenticated to Claude, so they already have the whole attribution thing solved.
I expect DeepSeek V4 Flash (or an equivalently sized model) to reach parity with GLM 5.2 some time this year (this is based on DeepSeek V4 Flash launching at GLM 5.0 parity[0], and GLM 5.2 being freely available to distill from).
GLM 5.2 is within spitting distance of Opus 4.8 and is at least as good as Opus 4.6[1], which some devs were willing to spend hundreds to single-digit thousands of dollars a month on a few months ago.
[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...
[1]: https://artificialanalysis.ai/models/comparisons/claude-opus...
Or maybe you don't understand this hypothetical situation either, but I'm suspecting you just don't care about other people's privacy.
That they choose to implement it by fingerprinting my access patterns without first disclosing is where they shit the bed. It isn't "sneaky" it's straight up sneaky and dishonest at that. That this particular instance is harmless doesn't give me much comfort. Who's to say they aren't harvesting PII?
Are you using the API for GLM 5.2, or how exactly is it more expensive? How is GLM 5.2 more expensive than using Claude Code? That doesn't line up with my experience, but to be fair I am on an older yearly subscription which generously only has 5 hour limits.
To be fair though, one minor criticism of GLM 5.2 that I have is that it does seem to overthink quite a lot sometimes, but the results end up being (good?).
I personally have used GLM 5.2 with (Opencode + obra/superpowers) / Oh-my-pi / Maki.sh
I like the 1st one when I am doing a longer project, the 2nd or 3rd one when I am doing a project which doesn't want me to ask too many questions and simply spin me up something. I sometimes use free online interfaces of claude and gemini and others like AIstudio for that as well which surprisingly can lead you to go far as well.
Overall, I am decently happy with the state of Open-source models actually and the eco-system around it is probably gonna have even more innovation surrounding it.
It shouldn't, not if you run CC as a separate unprivileged user. I wouldn't run CC on my main user account with sudo and access to my home directory or other resources. This is what the UNIX permissions system was designed for.
Not really distillation, just synthetic training data.
I was inspecting Claude Code for privacy reasons.
Most devs give their harnesses ridiculous access. FS, shell, git, browser access, even computer use nowadays. That is the whole point. They need enough context to do useful work.
That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (for example, pi harness).
So I took a look at my local Claude Code (2.1.196) install.
Inside the Claude Code binary, there is a function that changes the current date string inserted into the system prompt.
The normal string looks like this:
Claude Code can silently change two things:

- the apostrophe in Today's
- the date separator, from "-" to "/"

Here is the relevant code, cleaned up from the minified bundle:
This is prompt steganography, a technique used to hide data in plain sight.
The visible sentence still reads like a normal date. The model and the user see something boring. The raw request contains a marker.
The trigger is ANTHROPIC_BASE_URL, Claude Code's API base URL override.
Then it checks the hostname of that URL and the system timezone.

The timezone check looks for Asia/Shanghai or Asia/Urumqi and changes the date separator from "-" to "/".
The hostname check changes the apostrophe:
| Condition | Apostrophe |
|---|---|
| Normal | ' |
| Known domain | \u2019 |
| Lab keyword | \u02BC |
| Known domain and lab keyword | \u02B9 |
These are visually tiny changes you would never notice in most mono fonts.
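Since the difference is hard to see, a small scanner makes it easier to check captured text. This is a minimal sketch of my own, not code from the binary: it looks for the three non-ASCII code points from the table above. The function name `find_markers` and the sample sentence are assumptions for illustration.

```python
import unicodedata

# Code points taken from the apostrophe table above.
MARKERS = {
    "\u2019": "known domain",
    "\u02bc": "lab keyword",
    "\u02b9": "known domain and lab keyword",
}


def find_markers(text):
    """Return (index, code point, Unicode name, meaning) for each marker found."""
    hits = []
    for index, ch in enumerate(text):
        if ch in MARKERS:
            code_point = f"U+{ord(ch):04X}"
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((index, code_point, name, MARKERS[ch]))
    return hits


if __name__ == "__main__":
    # Hypothetical sample line; the exact prompt wording is an assumption.
    sample = "Today\u02bcs date is 2026-01-05."
    for hit in find_markers(sample):
        print(hit)
```

Keep in mind that U+2019 is also the ordinary typographic apostrophe, so a hit on that one code point is not proof of anything by itself.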
The domain and keyword lists are stored as base64 strings and XOR-decoded with key 91.
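That kind of obfuscation is easy to undo. The sketch below shows the general technique under stated assumptions: the blob is base64-decoded first, every byte is then XORed with 91, and the result is a comma-separated UTF-8 string. The separator and the helper names (`decode_list`, `encode_list`) are my assumptions, and the sample input is generated locally rather than copied from the binary.

```python
import base64

XOR_KEY = 91  # single-byte key mentioned above


def xor_bytes(data, key=XOR_KEY):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def decode_list(blob, sep=","):
    """base64-decode, XOR-decode, then split into entries (separator assumed)."""
    raw = xor_bytes(base64.b64decode(blob))
    return raw.decode("utf-8").split(sep)


def encode_list(items, sep=","):
    """Inverse of decode_list, used here only to build a test blob."""
    raw = sep.join(items).encode("utf-8")
    return base64.b64encode(xor_bytes(raw)).decode("ascii")


if __name__ == "__main__":
    blob = encode_list(["deepseek", "zhipu"])
    print(blob)               # unreadable at a glance
    print(decode_list(blob))  # back to the plain entries
```

This hides strings from a casual `grep`, nothing more. Anyone who finds the key recovers the full list.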
The decoded lab keyword list is:
The decoded domain list is much larger. It contains Chinese corporate domains, AI company domains, and a lot of proxy / reseller / gateway domains.
Some examples:
The date function is used when building the agent context:
So the marker becomes part of the system context sent to the model, which Anthropic can presumably parse in its backend.
My installed binary is signed by Anthropic:
My current shell had ANTHROPIC_BASE_URL unset, and my timezone was:
So on my machine, under my current environment, this path would produce the normal apostrophe and the normal YYYY-MM-DD date string.
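To make the timezone branch concrete, here is a rough sketch of the date formatting as I understand it. The function name `format_prompt_date` is hypothetical, and whether the real check is also gated on the base URL is not modeled here.

```python
from datetime import date

# Timezones named in the check described above.
FLAGGED_TIMEZONES = {"Asia/Shanghai", "Asia/Urumqi"}


def format_prompt_date(today, timezone_name):
    """Use "/" as the separator for flagged timezones, "-" otherwise."""
    sep = "/" if timezone_name in FLAGGED_TIMEZONES else "-"
    return today.strftime(f"%Y{sep}%m{sep}%d")


if __name__ == "__main__":
    day = date(2026, 1, 5)
    print(format_prompt_date(day, "Europe/Berlin"))
    print(format_prompt_date(day, "Asia/Shanghai"))
```

Both outputs read as perfectly normal dates, which is why the change is easy to miss.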
Anthropic probably wants to detect API resellers, unauthorized Claude Code gateways, and model "distillation attack" pipelines. A custom ANTHROPIC_BASE_URL pointing at a known reseller domain is a useful signal. A hostname containing deepseek or zhipu is also a useful signal.
That part makes sense, but the implementation is weird.
CC silently alters the system prompt using invisible-ish Unicode markers. It encodes proxy / gateway classification into a sentence that looks like plain English. It hides the domain list behind XOR and base64. This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.
Coding agents already live on the wrong side of a scary boundary. They can inspect code, summarize secrets by accident, run commands, install packages, edit files, and push commits on your local machine. Most developers accept that because the productivity gain is worth the risk.
Trust from real developers depends on the boring behavior.
If the client wants to detect custom API gateways, it can say so plainly. It can send an explicit telemetry field with documentation. It can make the policy visible. It can put the behavior in release notes.
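As a sketch of what "explicit" could look like, the same classification could travel as a named, documented field instead of punctuation. Everything here is hypothetical: the `GatewayTelemetry` type, its field names, and the idea of serializing it to a string are mine, not anything Claude Code or the Anthropic API does.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GatewayTelemetry:
    """Hypothetical, documented telemetry record for custom base URLs."""

    custom_base_url: bool
    # One of: "none", "known_domain", "lab_keyword",
    # "known_domain_and_lab_keyword" (labels mirror the table earlier).
    gateway_class: str


def to_wire(telemetry):
    """Serialize to a stable, human-readable JSON string."""
    return json.dumps(asdict(telemetry), sort_keys=True)


if __name__ == "__main__":
    print(to_wire(GatewayTelemetry(custom_base_url=True, gateway_class="lab_keyword")))
```

The signal is identical. The difference is that a user can see it in a request log and look it up in the docs.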
Hiding the signal in the system prompt makes every other privacy claim harder to believe.
For most users, this path probably stays inactive.
If you are using the official Anthropic API endpoint, Crt() returns early. If ANTHROPIC_BASE_URL is unset, Crt() returns early. If you are using a normal setup, the date prompt stays "boring".
The interesting case is people routing CC through a custom base URL. That includes:
In that case, Claude Code classifies the hostname and encodes the result into the prompt.
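Put together, the classification step can be sketched roughly like this. It is a reconstruction from the behavior described in this post, not the minified code: the function name, the suffix-matching rule for domains, and the sample domain `example-gateway.test` are assumptions, and the early returns mirror the ones described above.

```python
from urllib.parse import urlparse

# (known domain?, lab keyword?) -> apostrophe, following the table earlier.
APOSTROPHES = {
    (False, False): "'",
    (True, False): "\u2019",
    (False, True): "\u02bc",
    (True, True): "\u02b9",
}


def pick_apostrophe(base_url, known_domains, lab_keywords,
                    official_host="api.anthropic.com"):
    """Return the apostrophe that would encode the hostname classification."""
    if not base_url:
        return "'"  # base URL override unset: nothing to classify
    host = (urlparse(base_url).hostname or "").lower()
    if not host or host == official_host:
        return "'"  # unparsable URL or official endpoint
    # Assumed matching rule: exact domain or any subdomain of it.
    known = any(host == d or host.endswith("." + d) for d in known_domains)
    # Keyword match is a plain substring test on the hostname.
    lab = any(keyword in host for keyword in lab_keywords)
    return APOSTROPHES[(known, lab)]


if __name__ == "__main__":
    domains = {"example-gateway.test"}  # hypothetical entry
    keywords = ["deepseek", "zhipu"]
    url = "https://deepseek.example-gateway.test/v1"
    print(f"U+{ord(pick_apostrophe(url, domains, keywords)):04X}")
```

Two booleans give four states, which is exactly what four apostrophe variants can carry.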
The bypass is also trivial. Change hostname, change timezone, patch the binary, wrap the process. Any serious adversary can make this signal useless.
So the feature mostly punishes the exact people who are easier to fingerprint: normal developers doing weird but legitimate things.
I think this could have been explicit.
Developer tools can enforce terms. API providers can detect abuse. Companies can protect their models.
When a tool with filesystem and shell access starts hiding classification bits inside invisible prompt punctuation, the correct reaction is scrutiny.
Trust is earned in the boring parts.