No, it doesn't cost Anthropic $5k per Claude Code user

If Anthropic's compute is fully saturated, then Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

Anthropic's models may be similar in parameter count to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.

The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.

> Qwen 3.5 397B-A17B is a good comparison

It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.

That's why OpenRouter prices and the official providers' prices aren't that different. Plus, who knows what OpenRouter providers do in terms of quantization? They may be getting 100x better efficiency, hence the competitive price.

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.

What people don't realize is that cache is *free*, well not free, but compared to the compute required to recompute it? Relatively free.

If you remove the cached token cost from pricing, the overall API usage drops from around $5000 to $800 (or $200 per week) on the $200 Max subscription. That's still 4x cheaper than the API, but not costing Anthropic money either; if I had to guess, it's break-even, as the compute is most likely going idle otherwise.
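To make the arithmetic in the comment above explicit, here is a minimal sketch using only the figures quoted there ($5000, $800, $200). The variable names are made up for illustration, and none of the numbers are measured costs.

```python
# Figures quoted in the comment above; they are the commenter's estimates,
# not measured values.
API_EQUIV_WITH_CACHE = 5000.0  # USD/month at API prices, cached tokens included
API_EQUIV_NO_CACHE = 800.0     # USD/month at API prices, cached-token charges removed
SUBSCRIPTION = 200.0           # USD/month, Max plan price


def times_cheaper(api_equivalent: float, plan_price: float) -> float:
    """How many times cheaper the plan is than paying API prices."""
    return api_equivalent / plan_price


# The comment treats a month as roughly four weeks.
weekly_no_cache = API_EQUIV_NO_CACHE / 4

print(times_cheaper(API_EQUIV_WITH_CACHE, SUBSCRIPTION))  # 25.0
print(times_cheaper(API_EQUIV_NO_CACHE, SUBSCRIPTION))    # 4.0
print(weekly_no_cache)                                    # 200.0
```

The 4x figure only says the plan is cheaper than the API price list; whether it is break-even depends on Anthropic's actual serving cost, which nobody in the thread knows.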

How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).

I calculated only last weekend that my team would cost, if we would run Claude Code on retail API costs, around $200k/mo. We pay $1400/month in Max subscriptions. So that's $50k/user... But what tokens CC is reporting in their json -> a lot of this must be cached etc, so doubt it's anywhere near $50k cost, but not sure how to figure out what it would cost and I'm sure as hell not going to try.

the openrouter comparison is interesting because it shows what happens when you have actual supply-side competition. multiple providers, different quantizations, price competition. the spread between cheapest and priciest for the same model can be 3-5x.

anthropic doesn't have that. single provider, single pricing decision. whether or not $5k is accurate, the more interesting question is what happens to inference pricing when the supply side is genuinely open. we're seeing hints of it with openrouter but it's still intermediated

not saying this solves anthropic's cost problem, just that the "what does inference actually cost" question gets a lot more interesting when providers are competing directly

Well, IDK, I have used CC with API billing pretty extensively and managed to spend ~$1000 in one month more or less. Moved to a Max 20x subscription and using it a bit less (I'm still scared) but not THAT less and I'm around 10% weekly usage. I'm not counting the tokens, though.

Good article! Small suggestions:

1. It would be nice to define terms like RSI or at least link to a definition.

2. I found the graph difficult to read. It's a computer font that is made to look hand-drawn and it's a bit low resolution. With some googling I'm guessing the words in parentheses are the clouds the model is running on. You could make that a bit more clear.

This is such a well-written essay. Every line revealed the answer to the immediate question I had just thought of

These margins are far greater than the ones Dario has indicated during many of his recent podcast appearances.

By the way, one of the charts in the article shows that Opus 4.6 is 10x costlier than Kimi K2.5.

I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.

Those "AI has no moat" opinions are going to be so wrong so soon.

Is it fair to say the Open Router models aren't subsidized though? They make the case that companies on there are running a business, but there are free models, and companies with huge AI budgets that want to gather training data and show usage.

This article is hilariously flawed, and it takes all of 5 seconds of research to see why.

Alibaba is the primary comparison point made by the author, but it's a completely unsuitable comparison. Alibaba is closer to AWS than Anthropic in terms of its business model. They make money selling infrastructure, not inference. It's entirely possible they see inference as a loss leader, and are willing to offer it at cost or below to drive people onto the platform.

We also have absolutely no idea if it's anywhere near comparable to Opus 4.6. The author is guessing.

So the article's primary argument is based on a comparison to a company that has an entirely different business model, running a model that the author is just making wild guesses about.

Nobody gets RSI typing “iterate until tests pass”

Was anyone under the impression that it does? Serious question. I've never heard that, personally.

Ok but so it does cost Cursor $5k per power-Cursor user?? Still seems pretty rough..

> I'm fairly confident the Forbes sources are confusing retail API prices with actual compute costs

Aren't they losing money on the retail API pricing, too?

> ... comparisons to artificially low priced Chinese providers...

Yeah, no this article does not pass the sniff test.

If Anthropic's compute is fully saturated, then Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

Anthropic's models may be similar in parameter count to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.

The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.

But opportunity cost is not actual cost. “If everyone just kept paying but used our service less we would be more profitable” is true, but not in any meaningful way.

Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

Don’t give them any ideas, please! I need my 100 USD subscription with generous Opus usage!

> If Anthropic's compute is fully saturated, then Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

I think it's the other way around? Sparse use of GPU farms should be the more expensive thing. Full saturation means that we can exploit batching effects throughout.

Opportunity cost is not the same thing as actual cost. They might have made more money if they were capable of selling the API instead of CC, but I would never tell my company to use CC all the time if I didn’t have a personal subscription.

> The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch on gear count

I mean... Rolex is an overpriced brand whose cost to consumers is mainly just marketing in itself. Its production cost is nowhere close to its selling price, and looking at gears is a fair way of evaluating that.

You can rent the GPUs and everything needed to run the model. Opportunity cost is not a real cost here.

> Qwen 3.5 397B-A17B is a good comparison

It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.

That's a tautology. People think chinese models are 10x more efficient because they're 10x cheaper, and then you use that to claim that they're 10x more efficient.

Opus isn't that expensive to host. Look at Amazon Bedrock's t/s numbers for Opus 4.5 vs. the Chinese models. They're around the same order of magnitude, which means that Opus has roughly the same number of active params as the Chinese models.

Also, you can select BF16 or Q8 providers on openrouter.

> That being said not all users max out their plan,

These are not cell phone plans which the average joe takes, they are plans purchased with the explicit goal of software development.

I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.

>It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.

I find it a good comparison because it is a good baseline since we have zero insider knowledge of Anthropic. They give me an idea that a certain size of a model has a certain cost associated.

I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less, which is exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. Two years from now, when Chinese models catch up with enough distillation attacks, they would be as good as Sonnet 4.6 and still be profitable.

What people don't realize is that cache is *free*, well not free, but compared to the compute required to recompute it? Relatively free.

> [...] if I had to guess it's break even as the compute is most likely going idle otherwise.

Why would it go idle? It would go to their next best use. At least they could help with model training or let their researchers run experiments etc.

How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).

Yeah, that's a massive assumption they're making. I remember Musk revealed Grok was multiple trillion parameters. I find it likely Opus is larger.

I'm sure Anthropic is making money off the API but I highly doubt it's 90% profit margins.

Even if it's larger, OpenRouter has DeepSeek v3.2 (685B/37B active) at $0.26/0.40 and Kimi K2.5 (1T/32B active) at $0.45/2.25 (mentioned in the post).

Also curious if any experts can weigh in on this. I would guess in the 1 trillion to 2 trillion range.

You can use `npx ccusage` to check your local logs and see how much it would have cost through the API.

Ask Opus to figure out how much it would cost. Lol.

This is such a well-written essay. Every line revealed the answer to the immediate question I had just thought of

I can’t get past all the LLM-isms. Do people really not care about AI-slopifying their writing? It’s like learning about bad kerning, you see it everywhere.

These margins are far greater than the ones Dario has indicated during many of his recent podcast appearances.

By the way, one of the charts in the article shows that Opus 4.6 is 10x costlier than Kimi K2.5.

I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.

Those "AI has no moat" opinions are going to be so wrong so soon.

Claude Code Max obviously doesn't cost 10x more than Kimi. The article even confirms that you can get $5k worth of computer for $200 with Claude Code Max.

So no, Claude would not be getting NEARLY as much usage as it's currently getting if it weren't for the $100/$200 monthly subscription. You're comparing Kimi to the price that most people aren't paying.

This article is hilariously flawed, and it takes all of 5 seconds of research to see why.

We also have absolutely no idea if it's anywhere near comparable to Opus 4.6. The author is guessing.

So the article's primary argument is based on a comparison to a company that has an entirely different business model, running a model that the author is just making wild guesses about.

What? AWS is a good comparison if you want only infra-level costs, which is what the post is talking about.

Was anyone under the impression that it does? Serious question. I've never heard that, personally.

Ed Zitron made that claim (in particular here: [1]). In the same article he admits he's not a programmer, and had to ask someone else to try out Claude Code and ccusage for him. He doesn't have any understanding of how LLMs or caching work. But he's prominent because he's received leaked financial details for Anthropic and OpenAI, e.g. [2].

[1] https://www.wheresyoured.at/anthropic-is-bleeding-out/ [2] https://www.wheresyoured.at/costs/

You would be surprised because there are lots of posters here who think that the cost is so enormous that this whole industry is unviable.

I mean, the very first paragraph of TFA is describing who is under that impression. Literally the first sentence:

> My LinkedIn and Twitter feeds are full of screenshots from the recent Forbes article on Cursor claiming that Anthropic's $200/month Claude Code Max plan can consume $5,000 in compute.

Nobody gets RSI typing “iterate until tests pass”

Recursive self improvement and Repetitive Strain Injury being the same initialism is really funny to me

Honest questions: have you never heard of hyperbole before, and are you on the spectrum?

> I'm fairly confident the Forbes sources are confusing retail API prices with actual compute costs

Aren't they losing money on the retail API pricing, too?

> ... comparisons to artificially low priced Chinese providers...

Yeah, no this article does not pass the sniff test.

> Aren't they losing money on the retail API pricing, too?

No, they aren't, and probably neither is anyone else offering API pricing. And Anthropic's API margins may be higher than anyone else.

For example, DeepSeek released numbers showing that R1 was served at approximately "a cost profit margin of 545%" (meaning 82% of revenue is profit), see my comment https://news.ycombinator.com/item?id=46663852
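As a quick check on how a "cost profit margin of 545%" maps to a share of revenue, the sketch below works through the two possible readings of that figure. The function names are made up for illustration; only the 545% number comes from the comment above. Which reading DeepSeek intended is not settled by this thread.

```python
def profit_share_from_markup(profit_over_cost: float) -> float:
    """Reading 1: 545% means profit / cost = 5.45.

    Revenue is cost + profit, so profit / revenue = m / (1 + m).
    """
    return profit_over_cost / (1.0 + profit_over_cost)


def profit_share_from_ratio(revenue_over_cost: float) -> float:
    """Reading 2: 545% means revenue / cost = 5.45.

    Profit / revenue = 1 - cost / revenue = 1 - 1 / r.
    """
    return 1.0 - 1.0 / revenue_over_cost


print(round(profit_share_from_markup(5.45), 3))  # 0.845
print(round(profit_share_from_ratio(5.45), 3))   # 0.817
```

The 82% quoted above corresponds to the second reading; under the first reading the share of revenue that is profit would be closer to 84-85%. Either way the margin is large.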

Ok but so it does cost Cursor $5k per power-Cursor user?? Still seems pretty rough..

Yes, you could turn it around to say that using Anthropic models in Cursor, Copilot, Junie, etc. is 'subsidising' Claude Code users.

$5 = $5

but $5 that I amortize over 7 years might end up being $1.7 maybe if I don't rapidly combust (supply chain risk)

I wonder how they are defining a power user. How many tokens, what could be the size the code base?

No, to use $5k in Cursor you have to pay $5k.

But opportunity cost is not actual cost. “If everyone just kept paying but used our service less we would be more profitable” is true, but not in any meaningful way.

Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

> Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

Absolutely! I'm currently paying $170 to Google to use Opus in Antigravity without limit in full agent mode, because I tried Anthropic's $20 subscription and busted my limit within a single prompt. I'm not gonna pay them $200 only to find out I hit the limit after 20 or even 50 prompts.

And after 2 more months my price is going to double to over $300, and I still have no intention of even trying the 20x Max plan, if it's really just 20x more prompts than Pro.

Opportunity costs are real. In many cases they are more real than 'actual costs'. However, I otherwise agree with you.

Don’t give them any ideas, please! I need my 100 USD subscription with generous Opus usage!

Google's Antigravity has Opus access, and I suspect it's subsidised.

You’re looking through the wrong end of the telescope. An investor is buying opportunity and it is a real cost to them.

> The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch on gear count

> production cost is nowhere close to selling price

When has production cost had anything to do with selling price?

> That being said not all users max out their plan,

These are not cell phone plans which the average joe takes, they are plans purchased with the explicit goal of software development.

I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.

I’m not maxing them out… I have issues that I need to fix, features I need to develop, and I have things I want to learn.

When I have a feeling that these tools will speed me up, I use them.

My client pays for a couple of these tools in an enterprise deal, and I suspect most of us on the team work like that.

If my goal was to max out every tool my client pays, I’d be working 24hrs a day and see no sunlight ever.

I guess it’s like the all you can eat buffet. Everybody eats a lot, but if you eat so much that you throw up and get sick, you are special.

My employer bought me a Claude Max subscription. On heavy weeks I use 80% of the subscription. And among software engineers that I know, I'm a relatively heavy user.

Why? Because in my experience, the bottleneck is in shareholders approving new features, not my ability to dish out code.

goal? yeah. but in reality just timing it right (starting a session at 7-8am, to get 2 sessions in a workday, or even 3 if you can schedule something at 5am), i rarely hit limits.

if i hit the limit usually i'm not using it well and hunting around. if i'm using it right i'm basically gassed out trying to hit the limit to the max.

There’s absolutely no way that’s true.

In SaaS this is not true. Most SaaS is highly profitable (or was, I suppose) because the vendors knew that most of their customers would never max out their plans.

> [...] if I had to guess it's break even as the compute is most likely going idle otherwise.

Why would it go idle? It would go to their next best use. At least they could help with model training or let their researchers run experiments etc.

inference compute is vastly different versus training, also it has to stay hot in vram which probably takes up most of it. There is limited use for THAT much compute as well, they are running things like claude code compiler and even then they're scratching the surface of the amount of compute they have.

Training currently requires Nvidia's latest and greatest for the best models (they also use Google TPUs now, which are also technically the latest and greatest? However, they're more dual-purpose than anything afaik, so that would be a correct assessment in that case).

Inference can run on a hot potato if you really put your mind to it

> Aren't they losing money on the retail API pricing, too?

No, they aren't, and probably neither is anyone else offering API pricing. And Anthropic's API margins may be higher than anyone else.

Weird that they're all looking for outside money then

I wonder how they are defining a power user. How many tokens, what could be the size the code base?

The $5k power user is the one that consistently uses all input and output tokens available under the Max subscription

> production cost is nowhere close to selling price

When has production cost had anything to do with selling price?

Not directly. But if production cost is above selling price, you typically tend to get less production. And if production cost is (way) below selling price, that tends to invite competition.

Yeah, that's a massive assumption they're making. I remember Musk revealed Grok was multiple trillion parameters. I find it likely Opus is larger.

I'm sure Anthropic is making money off the API but I highly doubt it's 90% profit margins.

> I find it likely Opus is larger.

Unlikely. Amazon Bedrock serves Opus at 120 tokens/sec.

If you want to estimate "the actual price to serve Opus", a good rough estimate is to find the price max(Deepseek, Qwen, Kimi, GLM) and multiply it by 2-3. That would be a pretty close guess to actual inference cost for Opus.

It's impossible for Opus to have something like 10x the active params of the Chinese models. My guess is something around 50-100B active params, 800-1600B total params. I can be off by a factor of ~2, but I know I am not off by a factor of 10.
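The "max price times 2-3" heuristic above can be sketched as follows. Only the two prices quoted elsewhere in this thread (DeepSeek v3.2 and Kimi K2.5) are used; they are assumed here to be USD per million input/output tokens, and Qwen and GLM are left out because no prices for them appear in the thread. The function name and multipliers are illustrative, not an established method.

```python
# (input price, output price) as quoted in the thread, assumed USD per 1M tokens.
LISTED_PRICES = {
    "DeepSeek v3.2": (0.26, 0.40),
    "Kimi K2.5": (0.45, 2.25),
}


def estimate_serving_price(prices, low=2.0, high=3.0):
    """Apply the commenter's heuristic: take the max listed price, multiply by 2-3.

    Returns ((input_low, input_high), (output_low, output_high)).
    """
    max_in = max(p[0] for p in prices.values())
    max_out = max(p[1] for p in prices.values())
    return (max_in * low, max_in * high), (max_out * low, max_out * high)


input_range, output_range = estimate_serving_price(LISTED_PRICES)
print([round(x, 2) for x in input_range])   # [0.9, 1.35]
print([round(x, 2) for x in output_range])  # [4.5, 6.75]
```

This is a guess about serving cost under the commenter's rule of thumb, not a statement about Anthropic's actual costs.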

You can estimate on tok/second

The Trillions of parameters claim is about the pretraining.

It’s most efficient in pre training to train the biggest models possible. You get sample efficiency increase for each parameter increase.

However those models end up very sparse and incredibly distillable.

And it’s way too expensive and slow to serve models that size so they are distilled down a lot.

Anthropic CEO said 50%+ margins in an interview. I'm guessing 50 - 60% right now.

Even if it's larger, OpenRouter has DeepSeek v3.2 (685B/37B active) at $0.26/0.40 and Kimi K2.5 (1T/32B active) at $0.45/2.25 (mentioned in the post).

Opus 4.6 likely has in the order of 100B active parameters. OpenRouter lists the following throughput for Google Vertex:

    42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
    143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
    70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct

For GLM 4.7, that makes 143 * 32B = 4576B parameters per second, and for Llama 3.3, we get 70 * 70B = 4900B, which makes sense since denser models are easier to optimize. As a lower bound, we get 4576B / 42 ≈ 109B active parameters for Opus 4.6. (This makes the assumption that all three models use the same number of bits per parameter and run on the same hardware.)
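The estimate above can be reproduced with a few lines. This is a sketch of the commenter's back-of-the-envelope method only: it assumes, as the comment does, that all three models use the same bits per parameter and run on the same hardware, and the helper name is made up for illustration.

```python
def param_throughput(tokens_per_second: float, active_params_b: float) -> float:
    """Billions of active parameters processed per second (tps * active params)."""
    return tokens_per_second * active_params_b


# Throughput figures and parameter counts as quoted in the comment above.
glm_throughput = param_throughput(143, 32)    # GLM 4.7, 32B active
llama_throughput = param_throughput(70, 70)   # Llama 3.3 70B, dense
OPUS_TPS = 42                                 # Claude Opus 4.6 on the same provider

# Implied active parameters for Opus, in billions, from each reference model.
opus_from_glm = glm_throughput / OPUS_TPS
opus_from_llama = llama_throughput / OPUS_TPS

print(glm_throughput, llama_throughput)  # 4576 4900
print(round(opus_from_glm))              # 109
print(round(opus_from_llama))            # 117
```

Using the GLM figure gives the lower of the two values, which is why the comment treats roughly 109B as a lower bound.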

Also curious if any experts can weigh in on this. I would guess in the 1 trillion to 2 trillion range.

Try 10s of trillions. These days everyone is running 4-bit at inference (the flagship feature of Blackwell+), with the big flagship models running on recently installed Nvidia 72-GPU Rubin clusters (and equivalent-ish world size for those rented Ironwood TPUs Anthropic also uses). Let's see, Vera Rubin racks come standard with 20 TB (Blackwell NVL72 with 10 TB) of unified memory, and NVFP4 fits 2 parameters per byte...

Of course, intense sparsification via MoE (and other techniques ;) ) lets total model size largely decouple from inference speed and cost (within the limit of world size via NVLink/TPU torus caps)

So the real mystery, as always, is the actual parameter count of the activated head(s). You can do various speed benchmarks and TPS tracking across likely hardware fleets, and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)

Comparing Opus 4.6 or GPT 5.4 thinking or Gemini 3.1 pro to any sort of Chinese model (on cost) is just totally disingenuous when China does NOT have Vera Rubin NVL72 GPUs or Ironwood V7 TPUs in any meaningful capacity, and is forced to target 8-GPU Blackwell systems (and worse!) for deployment.
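The memory arithmetic hinted at in the comment above works out as in the sketch below. It takes the rack memory figures (20 TB and 10 TB) and the 2-parameters-per-byte packing from the comment at face value, and it gives a ceiling only: real deployments also need memory for the KV cache and activations, so the usable parameter budget would be lower. The function name is illustrative.

```python
def max_params_trillions(memory_tb: float, params_per_byte: float = 2.0) -> float:
    """Upper bound on parameters (in trillions) if ALL memory held weights.

    1 TB = 1e12 bytes, so memory_tb * params_per_byte is already in trillions.
    """
    return memory_tb * params_per_byte


# Rack memory figures as stated in the comment above (not independently verified).
print(max_params_trillions(20))  # 40.0
print(max_params_trillions(10))  # 20.0
```

So the "10s of trillions" claim is a statement about what could fit in one rack at 4-bit precision, not evidence about what any specific model actually uses.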

I can’t get past all the LLM-isms. Do people really not care about AI-slopifying their writing? It’s like learning about bad kerning, you see it everywhere.

I had a similar reaction to OP for a different post a few weeks back - I think some analysis on the health economy. Initially as I was reading I thought - "Wow, I've never read a financial article written so clearly". Everything in layman's terms. But as I continued to read, I began to notice the LLM-isms. Oversimplified concepts, "the honest truth" "like X for Y", etc.

Maybe the common factor here is not having deep/sufficient knowledge on the topic being discussed? For the article I mentioned, I feel like I was less focused on the strength of the writing and more on just understanding the content.

LLMs are very capable at simplifying concepts and meeting the reader at their level. Personally, I subscribe to the philosophy of - "if you couldn't be bothered to write it, I shouldn't bother to read it".

I think you're just hallucinating because this does not come across as an AI article

It is certainly very obvious a lot of the time. I wonder if we revisited the automated slop detection problem we’d be more successful now… it feels like there are a lot more tells and models have become more idiosyncratic.

People care, when they can tell.

Popular content is popular because it is above the threshold for average detection.

In a better world, platforms would empower defenders, by granting skilled human noticers flagging priority, and by adopting basic classifiers like Pangram.

Unfortunately, mainstream platforms have thus far not demonstrated strong interest in banning AI slop. This site in particular has actually taken moderation actions to unflag AI slop on certain occasions...

I don’t see the usual tells in this essay