Claude Code Is Steganographically Marking Requests

After loving Claude Code for most of its lifetime, I've been extremely annoyed by every change in the past months, even on the model level.

There seem to be all sorts of continual under-the-cover changes like this one that make life harder. It feels like the entire product has been taken over by overly ambitious PMs that care more about making their mark than in improving the experience, and all of their marks have made me less productive.

I've been using Pi with GLM5.2 the past few days, and though it's expensive, I find it far more productive and less annoying. The remote session plugin is far more reliable, I don't need to intuit some undocumented usage pattern to figure out how to use it well, and it just works.

Codex CLI is FOSS, unlike Claude Code, so Codex is less likely to do things like that, and it's one more reason to avoid Claude Code and Claude in general. Hopefully, many eyes will be looking into Codex for malicious things like that.

I reported a similar system prompt injection mechanism here:

https://news.ycombinator.com/item?id=48259288

https://github.com/anthropics/claude-code/issues/62061

Looks like they just keep finding new "creative" uses for such things, as expected. I'll keep patching them out.

“So the feature mostly punishes the exact people who are easier to fingerprint: normal developers doing weird but legitimate things”

What’s the punishment here exactly?

I don't understand the privacy concerns the author is trying to highlight. Granted, doing anything "sneaky" will always raise suspicious once caught, but on the other hand, there would be no point in implementing these "security features" if they were upfront about how they work.

And no, IMO stenography isn't security by obscurity, in the same that using RSA and keeping the private key private isn't security by obscurity - keeping the private thing private is part of the security model.

Can somebody clarify for me - if ANTHROPIC_BASE_URL is set to a different provider... then isn't this "marked" system prompt being sent to that provider's API rather than Anthropic's?

I understand how this can be useful to Anthropic if the 3rd-party is acting as a proxy (because they end up hitting the Claude API with the marked prompt), but it looks like requests where "hostname contains deepseek" would never be sending data to Anthropic. What am I missing?

I used Claude Code for a month because my boss gifted me a sub and wanted me to try it.

I used that month to complete a work project and then beef up my personal harness so I'd never have to deal with Anthropic (and these sorts of shenanigans) again.

This is very interesting. Combating resellers and distillation seems like a very difficult problem indeed. Interesting to me is that these techniques mentioned in the article are just like anti-observation techniques used by some of the more sophisticated malware out there, however defeating them is pretty trivial.

If they only collect the data for analysis I guess this is fine (they already get way more sensitive data from users anyways, so if privacy is your concern you've made the mistake many steps ago). The much more interesting question is if they directly act on this data in their API. For example by rate-limiting, compute-limiting or rerouting to weaker models. That might even be legally questionable. I would really like to see this as a follow-up analysis, but I guess it is way more difficult and will also cost quite a bit in tokens.

This is weird but, help me understand how this meaningfully impacts our exposure.

I'm authenticated to Claude, so they already have the whole attribution thing solved.

None of this is surprising - they're trying to mask and relay when they detect known patterns of what looks like distillation attacks and client app copying/modification. The list obfuscation here is likely to prevent or make it difficult for those same adversaries to work around this or delete/null it out when making a bootleg copy.

Cool reverse engineering/analysis report but if this is the extent of nefarious activity that came of it (trying to catch/mitigate chinese lab model distillations), that's kind of encouraging.

Claude code does feel very malwarey to be honest. They have been like that from the start.

This was already discovered during the source map leak.

> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.

They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?

That's a lot of effort when they could just play a short video saying 'You wouldn't steal a car' instead

(This sounds like a clumsy way of catching the Chinese that easily can be side-stepped.)

Claude Code has more or less full access to the client computer. The server (that hosts the actual AI) can just go: execute this payload and tell me the result - otherwise I won't answer any further questions or re-route you to a stupider model.

The payload could check for Chinese time-zones, scan for copies of the little red book on the local hard-drive, or ping truth.social to see it was behind the great firewall.

What's the point of even trying to obfuscate this with such a simple method? Could at least have hidden the targeted features by storing their hashes or embedding a bloom filter or similar

> "That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (ƒor example, pi harness)"

You're actually trust your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/

One more example of "I thought Anthropic was supposed to be the good guys."

It is about China detection. They seems to put a tracker on the email as well.

Anthropic must think that their moat isn't very large if they're this worried about distillation.

This seems really, really stupid. Similar to the weird Zig runtime signature thing from a few months ago ago, it was bound to be discovered, quickly, and all the resellers have to do is find a new domain name that (checks notes) doesn't have the word DEEPSEEK in it. Like, seriously? Your goal was to identify resellers by checking if the proxy has the corporate name of one of your competitors in it? Is this amateur hour?

All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.

Is this why Claude never knows what date and time it is right now?

If there weren't already enough tells that something is AI-generated, I guess you could add this to the list.

The AI race right now is in a sad state. Chinese's playbook is releases open weight models and trains them on their own chips.

Anthropic pushes fear and control. But the only way to win is by innovating. China is flooding the market with cheap, good enough models, while the U.S. is building a Chinese firewall.

It piqued my interest. I think I’ve found a weekend project

Frankly, I don't see this as the concerning behaviour the article describes. It is fine to try to protect against distillation through a technique like this. This will also allow them to, instead of blocking the distillation agents, respond with a poorer result/model, hindering the progress of distillation, momentarily at least.

I would guess that's their first line of defense; they should have more techniques to identify distillation because that's a very simple way of detecting the host and can be easily spoofed.

I clicked the link to learn what steganography mean...

Cool fingerprinting avenue.

this is the one they wanted us to find

Is it just a minified localization(l10n) function maybe?

Silicon valley season 6 was on point.

Headline is, frankly, awful. This isn't the AI secretly doing stuff and hiding it. This is the very human Anthropic engineers trying to detect Chinese scraping via some frankly hamfisted and unimaginative URL trickery.

Here's the sha of the prompt I submitted... no I don't know why there are no saved prompts with that sha.

What do you mean you don't know where the bug is coming from?

No, I absolutely didn't make it up, how could you accuse me of that?

Does anyone know when this regex isn't working? I double checked it 27 times, I even asked the LLM. They all say this regex should be finding these dates.

Weird, suddenly all the conversations are breaking when I feed them into this other tool? Something about UTF-8 errors, but I'm sure I'm only using ASCII?

I do try to take care to make sure the things I build can be used by other people even when they care about different things. I care about understandably, determinism (as it relates to computing), and repeatability (because I want to be able to trust the systems I use).

If y'all would be willing to try to account for use cases of others, and try not to break them... that would be nice.

Please note: that generally when you modify something that belongs to someone else without telling them... things should be expected to break.

The more I learn about Anthropic the more they disgust me. Finger crossed for all the companies from their “ban list”

Is that really how it is? How will this affect our future?

Cool reverse engineering/analysis report but if this is the extent of nefarious activity that came of it (trying to catch/mitigate chinese lab model distillations), that's kind of encouraging.

Claude code does feel very malwarey to be honest. They have been like that from the start.

This was already discovered during the source map leak.

> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.

They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?

It is about China detection. They seems to put a tracker on the email as well.

If there weren't already enough tells that something is AI-generated, I guess you could add this to the list.

It piqued my interest. I think I’ve found a weekend project

Cool fingerprinting avenue.

Is it just a minified localization(l10n) function maybe?

Here's the sha of the prompt I submitted... no I don't know why there are no saved prompts with that sha.

What do you mean you don't know where the bug is coming from?

No, I absolutely didn't make it up, how could you accuse me of that?

Does anyone know when this regex isn't working? I double checked it 27 times, I even asked the LLM. They all say this regex should be finding these dates.

Weird, suddenly all the conversations are breaking when I feed them into this other tool? Something about UTF-8 errors, but I'm sure I'm only using ASCII?

If y'all would be willing to try to account for use cases of others, and try not to break them... that would be nice.

Please note: that generally when you modify something that belongs to someone else without telling them... things should be expected to break.

It's released and signed by GitHub I believe (although not deterministic builds), but there's at least a little bit of provenance that you're getting the real repository.

“So the feature mostly punishes the exact people who are easier to fingerprint: normal developers doing weird but legitimate things”

What’s the punishment here exactly?

Higher odds of being banned for legitimate usage.

They probably run a heavily dumbed down version of the model, same as what they got caught doing with Fable.

And that's also why, as a legitimate customer, want none of it, you never know if you accidentally entered a zone they don't like.

Output poisoning and/or eventual account bans, if I had to guess.

Can somebody clarify for me - if ANTHROPIC_BASE_URL is set to a different provider... then isn't this "marked" system prompt being sent to that provider's API rather than Anthropic's?

This catches Claude resellers. Meaning companies who proxy Claude traffic for users in, say, China.

https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...

My guess is for distillation, they need to forward the prompt to Anthropic to get the real Anthropic model's response so they can train their own models on it

The theory is probably Deepseek might be collecting those streams, and sending a portion of it to Anthropic to see what the Anthropic/Opus response would be.

I used Claude Code for a month because my boss gifted me a sub and wanted me to try it.

I used that month to complete a work project and then beef up my personal harness so I'd never have to deal with Anthropic (and these sorts of shenanigans) again.

Given the Anthropic shenanigans, do you trust the personal harness code it wrote for you?

How do people build something like a personal harness? Are there tools for that or is it done from scratch?

What models are you using? Aren’t you still dealing with some provider even if you are not using their binary

Agree - Claude code is helping me set up a bunch of local llm rigging. For some home stuff, I can now delegate locally.

The agentic harness on the open source side does need some work, however.

Yes, defeating this is relatively easy, particularly for sophisticated actors. But it's hard to always defeat all of the tricks. Sort of like how it's expensive and hard and uncertain to defeat all of the tricks when forging money.

Here's an example. Say you have your team use patched binaries. Then CC updates and requires a new patched binary with new tricks. You now have to have a team ready to analyze the binary and begin to address the tricks; meanwhile, unpatched code is now a fingerprint. If some researcher decides to update Claude on their own to access new features, they get fingerprinted.

Defeating a single fingerprinting technique once is easy. Defeating all of the techniques all the time is hard.

seems ironically like a similar problem of content owners trying to filter bot scrapers from legit users

Would it be legally questionable, or actually complying with U.S. export law?

I've heard that it was possible to trigger really obvious output poisoning on Fable with something as basic as asking the model to think outside of its built-in hidden thinking delimiters.

This watermark may trigger a similar mechanism.

What's the point of even trying to obfuscate this with such a simple method? Could at least have hidden the targeted features by storing their hashes or embedding a bloom filter or similar

The point is not raising red flags I guess

I would guess that's their first line of defense; they should have more techniques to identify distillation because that's a very simple way of detecting the host and can be easily spoofed.

> This will also allow them to, instead of blocking the distillation agents, respond with a poorer result/model,

i.e. this will allow them to literally commit fraud against paying customers

The more I learn about Anthropic the more they disgust me. Finger crossed for all the companies from their “ban list”

Which AI company have you learned more about where you liked them more as more details came out?

Is that really how it is? How will this affect our future?

It's released and signed by GitHub I believe (although not deterministic builds), but there's at least a little bit of provenance that you're getting the real repository.

Higher odds of being banned for legitimate usage.

They probably run a heavily dumbed down version of the model, same as what they got caught doing with Fable.

And that's also why, as a legitimate customer, want none of it, you never know if you accidentally entered a zone they don't like.

Output poisoning and/or eventual account bans, if I had to guess.

This catches Claude resellers. Meaning companies who proxy Claude traffic for users in, say, China.

https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...

My guess is for distillation, they need to forward the prompt to Anthropic to get the real Anthropic model's response so they can train their own models on it

The theory is probably Deepseek might be collecting those streams, and sending a portion of it to Anthropic to see what the Anthropic/Opus response would be.

Given the Anthropic shenanigans, do you trust the personal harness code it wrote for you?

Would it be legally questionable, or actually complying with U.S. export law?

I've heard that it was possible to trigger really obvious output poisoning on Fable with something as basic as asking the model to think outside of its built-in hidden thinking delimiters.

This watermark may trigger a similar mechanism.

The point is not raising red flags I guess

> This will also allow them to, instead of blocking the distillation agents, respond with a poorer result/model,

i.e. this will allow them to literally commit fraud against paying customers

Which AI company have you learned more about where you liked them more as more details came out?

How do people build something like a personal harness? Are there tools for that or is it done from scratch?

I started mine from scratch in 2023 because I wanted to use LLMs from a terminal and there was nothing else compelling at the time (nowadays there is pi and opencode)

Harnesses are/can be incredibly simple things, not much more than a HTTP client that renders things in a way that suites your taste.

Not the comment author, but I use pi and customize it with my own extensions. Pi automatically tells models how to customize itself, so it's a pretty easy process.

It’s not that difficult, it’s just a system prompt and a set of basic file edit/bash/etc tools.

Me, personally, I didn’t build it from scratch but I ported original CC from published sources into Python and extended it to match my own requirements.

Build it from scratch. Understanding fundamentals of how agentic coding harnesses is a must though if you gonna go that route. I think everyone should take time and learn these things, maybe reverse engineer Codex Cli or something like that as a starter. That info is very valuable in this day and age.

Why use a personal harness?

You have to pay API pricing, which is far more costly.

I'd either switch to GLM wholesale or just continue to use Opus within Claude Code as the blessed, subsidized path.

I started mine from scratch in 2023 because I wanted to use LLMs from a terminal and there was nothing else compelling at the time (nowadays there is pi and opencode)

Harnesses are/can be incredibly simple things, not much more than a HTTP client that renders things in a way that suites your taste.

Not the comment author, but I use pi and customize it with my own extensions. Pi automatically tells models how to customize itself, so it's a pretty easy process.

It’s not that difficult, it’s just a system prompt and a set of basic file edit/bash/etc tools.

Me, personally, I didn’t build it from scratch but I ported original CC from published sources into Python and extended it to match my own requirements.

Why use a personal harness?

You have to pay API pricing, which is far more costly.

I'd either switch to GLM wholesale or just continue to use Opus within Claude Code as the blessed, subsidized path.

I love how well this comment works as a vexillology joke, even if it wasn't intended.

1st, this technique is not fraud, and fraud is a separate accusation. 2nd, paying customers can legally and legitimately be banned and monitored for breaking terms of service, which probably includes things like using the model against U.S. export restrictions.

That's what capitalism is all about, baby! Especially if the customers don't notice.

Agree - Claude code is helping me set up a bunch of local llm rigging. For some home stuff, I can now delegate locally.

The agentic harness on the open source side does need some work, however.

seems ironically like a similar problem of content owners trying to filter bot scrapers from legit users

I love how well this comment works as a vexillology joke, even if it wasn't intended.

That's what capitalism is all about, baby! Especially if the customers don't notice.

What models are you using? Aren’t you still dealing with some provider even if you are not using their binary

I self-host DeepSeek V4 Flash on 2 DGX Sparks (approx. $10k)

I expect DeepSeek V4 Flash (or an equivalently sized model) to reach parity with GLM 5.2 some time this year (this based on DeepSeek V4 Flash launching at GLM 5.0 parity[0], and GLM 5.2 being freely available to distill from)

GLM 5.2 is within spitting distance of Opus 4.8 and is at least as good as Opus 4.6[1] which some devs were willing to spend hundreds to single-digit thousands of dollars a month for a few months ago.

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

[1]: https://artificialanalysis.ai/models/comparisons/claude-opus...

Defeating a single fingerprinting technique once is easy. Defeating all of the techniques all the time is hard.

Not to mention, it isn't that hard for vendor's to require updated code to run the product. Vendors do this all the time.

Is it hard? Just ask AI if the update added any new fingerprinting vectors?

I reported a similar system prompt injection mechanism here:

https://news.ycombinator.com/item?id=48259288

https://github.com/anthropics/claude-code/issues/62061

Looks like they just keep finding new "creative" uses for such things, as expected. I'll keep patching them out.

The AI race right now is in a sad state. Chinese's playbook is releases open weight models and trains them on their own chips.

Anthropic pushes fear and control. But the only way to win is by innovating. China is flooding the market with cheap, good enough models, while the U.S. is building a Chinese firewall.

All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.

Is this why Claude never knows what date and time it is right now?

One more example of "I thought Anthropic was supposed to be the good guys."

> "That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (ƒor example, pi harness)"

You're actually trust your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/

this is the one they wanted us to find

That's a lot of effort when they could just play a short video saying 'You wouldn't steal a car' instead

Silicon valley season 6 was on point.

Not to mention, it isn't that hard for vendor's to require updated code to run the product. Vendors do this all the time.

(This sounds like a clumsy way of catching the Chinese that easily can be side-stepped.)

The payload could check for Chinese time-zones, scan for copies of the little red book on the local hard-drive, or ping truth.social to see it was behind the great firewall.

> Claude Code has more or less full access to the client computer.

It shouldn't, not if you run CC as a separate unprivileged user. I wouldn't run CC on my main user account with sudo and access to my home directory or other resources. This is what the UNIX permissions system was designed for.

After loving Claude Code for most of its lifetime, I've been extremely annoyed by every change in the past months, even on the model level.

curious for those with experience - what do people prefer about Pi vs. opencode alternatives? i've mostly been using pi as well but not out of any principled decision

Given the source code leak, I would think there'd be open source versions by now.

> I've been using Pi with GLM5.2 the past few days, and though it's expensive

are you using the API for glm 5.2 or how exactly is it more expensive? How is GLM5.2 more expensive than using Claude code, that doesn't line up to my experience but to be fair I am on an older yearly subscription which generously only has 5 hour limits.

To be fair though one minor criticism of GLM 5.2 that I have is that it does seem to overthink quite a lot sometimes but the results end up being (good?),

I personally have used Glm 5.2 with (Opencode + obra/superpowers) / Oh-my-pi / Maki.sh

I like the 1st one when I am doing a longer project, the 2nd or 3rd one when I am doing a project which doesn't want me to ask too many questions and simply spin me up something. I sometimes use free online interfaces of claude and gemini and others like AIstudio for that as well which surprisingly can lead you to go far as well.

Overall, I am decently happy with the state of Open-source models actually and the eco-system around it is probably gonna have even more innovation surrounding it.

Anthropic must think that their moat isn't very large if they're this worried about distillation.

Dario's been openly talking how worried he is about China and labs getting synthetic training data off their models, for years. Most recently in relation to "Mythos level" capabilities.

Not really distillation, just synthetic training data.

Anthopic choosing to delay their models' invevitable distillation by competitors is their prerogative.

That they choose to implement it by fingerprinting my access patterns without first disclosing is where they shit the bed. It isn't "sneaky" it's straight up sneaky and dishonest at that. That this particular instance is harmless doesn't give me much comfort. Who's to say they aren't harvesting PII?

If the countries were reversed, and some Chinese software implemented an equivalent "security feature" to track US users, it would be all over the news about how China is conducting spying and espionage on America.

Or maybe you don't understand this hypothetical situation either, but I'm suspecting you just don't care about other people's privacy.

I clicked the link to learn what steganography mean...

Steganography is, essentially, hiding information within another message, such that it's not readily apparent that the message contains the information.