It really depends on how you use it. If you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents, it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, do automated clean-ups and performance optimization, etc., it could be more like $1500.
If you're just throwing it one-off questions like a better Stack Overflow, that is well under $100.
I've really gotten into /goal: if you can find something verifiable and leave it overnight, it's kinda like Christmas morning to see where it landed.
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inference costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"
Perhaps there is something to MLM vs LLM to create a FOMO effect.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
The reason I use F# & Clojure is that they target the CLR and the JVM, two popular enterprise stacks.
In my not so humble opinion Lisp (Clojure) still remains the language of AI.
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest & most flexible option, while also keeping all the data shared with the LLM private.
I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.
They won't necessarily need to close the gap - at least not yet -, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.
But, yeah, the prices will come down one way or the other.
At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.
Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that $1,500 - $1,000 = $500 monthly saving justify the 10% decrease in "AI productivity"? I guess not, in most cases.
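To make the amortization above concrete, here is a rough sketch. The hardware prices and ~2 year lifetime come from the comment; the number of developers sharing one rack is purely a hypothetical input, and power, hosting and staff costs are deliberately left out, so treat the output as a lower bound rather than a real quote.

```python
def amortized_monthly_cost(hardware_price: float,
                           lifetime_months: int,
                           developers: int) -> float:
    """Hardware cost per developer per month, ignoring power and staff."""
    return hardware_price / lifetime_months / developers

LIFETIME_MONTHS = 24        # "~2 years lifetime" from the comment
DEVELOPERS_PER_RACK = 8     # hypothetical: how many developers share one rack

for price in (100_000, 200_000):   # the 100-200k USD range from the comment
    cost = amortized_monthly_cost(price, LIFETIME_MONTHS, DEVELOPERS_PER_RACK)
    print(f"${price:,} rack -> ${cost:,.0f} per developer per month")
```

Under these assumptions the upper end of the price range lands near the ~1000 USD a month mentioned above; sharing the rack across more developers lowers the figure but makes the rate limits in rush hours tighter.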
For everyone that asks me, I'd say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big 2/3 coding assistant providers are the better choice.
No one got fired for licensing Claude Code.
It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is fanatical about not sending bytes to AWS.
You tried that on a personal machine for yourself once. It's a completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
Your other plans are fixed price with rate limits, where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend less on tokens, in $, than the plan costs. That majority subsidizes the gap for power users who spend multiple k$ monthly in API tokens.
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself? I would estimate that AI is worth far more than $18,000 a year in that situation, where you might reasonably decide to put off hiring an engineer.
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
china will be major token exporter soon. mark my words.
> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
Wait a minute. We didn’t save money by adding AI. We just added an expense.
Now we have to pay for employees AND AI.
This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
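A minimal sketch of the "which model does which work" idea, to show why routing matters for a token budget. Everything here is a hypothetical placeholder: the tier names, the per-token prices (set 10x apart only for illustration), the stage-to-tier mapping and the token counts are assumptions, not my actual orchestrator or any vendor's real rates.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices; not real models or real pricing.
@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float

SMALL = Tier("small-model", 1.0)
LARGE = Tier("large-model", 10.0)

# Hypothetical mapping of pipeline stages to tiers.
ROUTES = {
    "plan": LARGE,
    "design": LARGE,
    "code": SMALL,
    "build": SMALL,
    "test": SMALL,
}

def route(stage: str) -> Tier:
    """Return the tier assigned to a stage; unknown stages use the large tier."""
    return ROUTES.get(stage, LARGE)

def pipeline_cost(tokens_by_stage: dict[str, int]) -> float:
    """Total USD cost of one run, given tokens consumed at each stage."""
    return sum(
        tokens / 1_000_000 * route(stage).usd_per_million_tokens
        for stage, tokens in tokens_by_stage.items()
    )

# Hypothetical token usage for a single pipeline run.
usage = {"plan": 200_000, "design": 300_000, "code": 2_000_000,
         "build": 500_000, "test": 1_000_000}

routed = pipeline_cost(usage)
all_large = sum(usage.values()) / 1_000_000 * LARGE.usd_per_million_tokens
print(f"routed: ${routed:.2f}  vs  everything on the large tier: ${all_large:.2f}")
```

The point of the sketch is only the shape of the saving: the token-heavy stages are the ones that get the cheap tier. Whether quality holds up at each stage is what the LLM judges between stages are for.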
I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.
NFTs? My company had nothing to do with blockchain but I ended up working on NFT integration regardless.
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
Stop giving Anthropic money and figure out how to take the same money to buy some GPUs, and physically insert them into workstations. It is not that hard, I promise.
My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.
But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.
Who do you think would be paying me, and what would they expect in return?
Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.
Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.
If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
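Redoing the article's percentage against that estimate, as a quick sketch: the $36,000/year cap is the article's two-tools assumption, and the $450-500k range is my rough guess from above, not a published figure.

```python
ANNUAL_AI_CAP = 36_000                    # USD/year, from the article (2 tools x $1,500 x 12)
FULLY_LOADED_RANGE = (450_000, 500_000)   # USD/year, the rough estimate above

# Share of the fully-loaded cost that the AI cap represents, per estimate.
shares = [ANNUAL_AI_CAP / cost for cost in FULLY_LOADED_RANGE]

for cost, share in zip(FULLY_LOADED_RANGE, shares):
    print(f"${cost:,} fully loaded -> AI cap is {share:.1%} of cost")
```

So against fully-loaded cost the cap is closer to 7-8% than the ~11% you get from compensation alone.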
I definitely have written a goal file, and then just ran claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on an "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take a while.
It's no problem hitting hundreds of $ of API spend while sitting at a computer with 3 monitors and 6 windows of useful claude code interactive sessions, while working on 2 or 3 projects and using worktrees, and it's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid I manually edit code... which I did do for the first time in months.
You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which, as an engineer, makes building presentations much less painful.
This uber limit is not high compared to the big SV companies.
Just looked at my spend for the past 30 days; it didn't even come to $600. 95% of my tokens are from cache. To reach even $1500 I would have to let claude run unsupervised overnight (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet).
when looking at costs, the numbers make sense. however, for decisions as an org/company/solo founder, costs only help you set prices; to reach profitability you want to model around ROI.
now the question is: what's the ROI for a $36K investment per engineer, or $90M for the total org?
I bet the ROI is negative.
I'd guess there should be a few people Uber is basically allocating unlimited AI spending to and a large swath they're giving basically nothing.
Not necessarily, the bond holders could simply take a massive haircut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massively over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
> „[AI vendors are] paying for a fixed cost with a depreciating commodity“
That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
I think it's only accounting depreciation.
I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
> Review everything and point them in the right direction
Sorry upper management doesn't care. That's an engineering problem that you need to solve.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
Let me ask you this: is any technology worth so much break-neck adoption without first seeing clear evidence of ROI? No. The adoption is irrational.
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh per 1,000 tokens.
$1500/month gets you about 150M tokens.
At the aforementioned energy/token, that's 3750 kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (i.e Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
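Spelled out as a quick sketch: the 25 Wh per 1,000 tokens and the 150M tokens are my figures from above (and may well be off); the electricity rates are hypothetical round numbers chosen only to show the range, not quotes from any utility.

```python
WH_PER_1000_TOKENS = 25          # figure from above, not independently verified
TOKENS_PER_MONTH = 150_000_000   # roughly what $1500/month buys, per the estimate above

# Convert tokens -> Wh -> kWh.
energy_kwh = TOKENS_PER_MONTH / 1000 * WH_PER_1000_TOKENS / 1000

def monthly_energy_cost(usd_per_kwh: float) -> float:
    """Electricity cost in USD for one month of tokens at a given retail rate."""
    return energy_kwh * usd_per_kwh

print(f"Energy: {energy_kwh:,.0f} kWh per month")
for rate in (0.10, 0.20, 0.30):   # hypothetical retail rates, USD per kWh
    print(f"${rate:.2f}/kWh -> ${monthly_energy_cost(rate):,.0f} per month")
```

Plug in your own tariff; the only fixed part is the kWh figure, and even that depends on the per-token energy number holding for whatever model you run locally.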
You can ask the same about the median 330k salary in the US for Uber Engineering... and being a bit snarky, having attended Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.
(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
people self-limit when there are caps. if you give people unlimited usage they won't even use sonnet for easy things.
And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law having been dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than as GPUs, but that's much less the case today.
One organization, that is a software company
> which seems to be roughly inline with "normal" consumption for most full-time engineers
My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
Uber is not representative of any trend beyond big tech and VC over funded startups.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).
I've never worked at a company that didn't have a technical backlog measured in years.
Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.
Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.
The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.
AI isn't trained on the type of software style we'll need to create systems using AI, it's trained on how we used to write software. It doesn't reuse code or elegantly structure anything, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.
The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, forcing in AI functions as a value-add into the total cost without giving the option to exclude them (using their market position).
$1,500/mo * 14 months = $21,000.
If local models are 14mo behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars on tokens and buy hardware piece by piece.
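The same arithmetic as a small sketch, with one extra comparison. The $1,500/month and 14 month figures are the ones above; the $5-8k hardware prices are borrowed from the 128 GB machine comment elsewhere in this thread and are only there for scale, not as a claim that such a machine replaces the hosted models.

```python
import math

MONTHLY_SPEND = 1_500   # USD per month, the figure used above
LAG_MONTHS = 14         # the claimed gap between hosted and local models

def cumulative_spend(months: int, per_month: int = MONTHLY_SPEND) -> int:
    """Total token spend over a number of months."""
    return months * per_month

def months_to_cover(hardware_price: int, per_month: int = MONTHLY_SPEND) -> int:
    """Whole months of token spend needed to equal a hardware purchase."""
    return math.ceil(hardware_price / per_month)

total = cumulative_spend(LAG_MONTHS)
print(f"Spend over {LAG_MONTHS} months: ${total:,}")
for price in (5_000, 8_000):   # price range quoted elsewhere in the thread
    print(f"${price:,} machine = {months_to_cover(price)} months of spend")
```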
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
You can absolutely do this. It's even right most of the time.
There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.
Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.
The more things change, the more they stay the same.
The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype-driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
I have my concerns with current inference pricing in that there's a non-zero possibility of a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.
I have high hopes for local AI and open weight models, and that we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
3rd June 2026 - Link Blog
Uber Caps Usage of AI Tools Like Claude Code to Manage Costs. I wrote the other day about Uber blowing its 2026 AI budget in four months, and how that wasn't particularly surprising given they would have set that budget in 2025, before anyone could have predicted how popular token-burning coding agents were about to become. Natalie Lung for Bloomberg:
The rideshare giant is limiting all employees to $1,500 in monthly token spending per AI coding tool, an Uber spokesperson said in response to a Bloomberg News inquiry. That means spending on one tool doesn’t have a bearing on the budget for another. The limits, which have been instituted in recent months, only apply to agentic coding software such as Cursor or Anthropic PBC’s Claude Code.
A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending, and much more sensible than those tokenmaxxing leaderboards encouraging employees to compete for as much AI usage as possible.
It's also interesting in that it hints at a real dollar value for what Uber is getting out of these tools. If we assume two actively used tools per engineer that's $3,000 * 12 = $36,000 cap per engineer per year. Levels.fyi lists the median yearly compensation package for Uber software engineers in the USA at $330,000.
That means each employee's AI spending cap is ~11% of that median compensation package.
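Here's that back-of-envelope calculation as a few lines of Python, in case you want to plug in different numbers. The two-tools-per-engineer figure is my assumption from above, not something Uber has stated.

```python
MONTHLY_CAP_PER_TOOL = 1_500     # USD, the per-tool cap reported by Bloomberg
TOOLS_PER_ENGINEER = 2           # assumption: two actively used tools
MEDIAN_COMPENSATION = 330_000    # USD/year, the Levels.fyi figure quoted above

annual_cap = MONTHLY_CAP_PER_TOOL * TOOLS_PER_ENGINEER * 12
share_of_compensation = annual_cap / MEDIAN_COMPENSATION

print(f"Annual cap per engineer: ${annual_cap:,}")
print(f"Share of median compensation: {share_of_compensation:.1%}")
```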
I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers. Those plans are no longer available to larger companies like Uber.
Their new policy means if I were working at Uber I'd still have ~$500/month of tokens to spare for each of those tools, given my current usage patterns.
Saves like $2-3 per session. Same quality code.
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
If you build a 100MW data center with GPU compute and three years later a new data center opens with the same cost for GPUs and the same electricity cost as you, but can do twice as much compute, you quickly lose business unless the market is just so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half or a quarter of the performance.
So when you see someone talking about GPUs fully depreciating in value in 1-3 years, this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc... Then you draw 1/4 kWh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W, you earn at best about 14ct/h (closer to 10ct/h at the expensive end).
And remember: V100 hours are sometimes sold at 1/10th the price.
If I pick average conditions you need to start thinking of whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity.
It's barely worth it to run them in a pure "is it profitable" sense; if we also account for the opportunity cost of taking up a slot in your datacenter, it ceases to be worth it really quickly.
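The margin estimate above as a small sketch, using the same simplifications (GPU pinned at 250W, CPU/RAM drawing nothing, nothing ever breaking). The rental price and the electricity range are the figures from this comment; the function name is just for illustration.

```python
RENTAL_PRICE = 0.165   # USD per hour, the listing quoted above
TDP_KW = 0.25          # 250 W GPU, ignoring CPU/RAM as in the estimate above

def hourly_margin(price_per_hour: float, electricity_per_kwh: float) -> float:
    """Rental income minus GPU electricity for one fully loaded hour, in USD."""
    return price_per_hour - TDP_KW * electricity_per_kwh

for rate in (0.075, 0.25):                    # USD per kWh, range quoted above
    for price in (RENTAL_PRICE, RENTAL_PRICE / 10):   # listing price and 1/10th of it
        margin_ct = hourly_margin(price, rate) * 100
        print(f"{price:.4f} $/h at {rate * 100:.1f} ct/kWh -> {margin_ct:.2f} ct/h")
```

At a tenth of the listing price the margin is already negative at the cheapest electricity rate, before counting anything else in the box or the datacenter slot.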
Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
Just looked into it, seems like at most they have just 3.2, not 4: https://aws.amazon.com/bedrock/pricing/
Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, which has both a bunch of providers and almost any model for you to try.
Same with the MS Surface(?) tables (not tablets). I saw loads of companies buy into the hype and then discard them.
Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?
The idea of "if you add intelligence you make more money" is contradicted by the fact companies don't just always hire more people. Why doesn't Google just hire everyone?
This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.
That's still in the ballpark. A modest change in your usage habits or workload could easily get you there.
Hiring someone vs paying a vendor for a service:
- different level of commitment
- might tie your org to a physical location
- different legal risks
- shows investors a different picture (probably this would even influence a bank loan)
- manager has to fight a different bureaucracy
Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.
If one uses AI minimally and is able to out perform peers who are maxing out AI spend, one might want to use that in salary negotiations.
Where is the knowledge stored?
All of my knowledge typically gets stored in plans outside of the agent?
And each agent window gets archived regularly, anyways.
1. Their costs are so out of control that they need to impose a blanket cap immediately. Figuring out an allocation mechanism that can be deployed company-wide is time consuming and they need to staunch the bleeding now, despite it being obviously suboptimal.
2. The few people who should have unlimited tokens were given exactly that. No reason to introduce such nuance to a public PR move. The hard-cap limit is a great negotiating posture with token providers.
If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
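A quick way to sanity-check the "3 engineers for the price of 1.1" line: divide the claimed speedup by the cost multiplier. This is only a sketch under the comment's own assumption that tokens add about 10% to the cost of an engineer; `output_per_dollar_gain` is a hypothetical helper, not anyone's real accounting model.

```python
def output_per_dollar_gain(speedup, extra_cost_fraction):
    """How much more output you get per dollar, relative to no AI spend."""
    return speedup / (1.0 + extra_cost_fraction)


# Assumption from the comment: token spend adds roughly 10% to engineer cost.
extra = 0.10

break_even = output_per_dollar_gain(1.1, extra)  # 1.1x speedup just pays for itself
triple = output_per_dollar_gain(3.0, extra)      # the "3X engineer" case

print(f"1.1x speedup -> {break_even:.2f}x output per dollar")
print(f"3.0x speedup -> {triple:.2f}x output per dollar")
```

Under that assumption anything above a 1.1x speedup is a net win, so a hard cap suggests the measured speedup is closer to that break-even line than to 3X.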
I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
The solder joints are notorious for failing at a high rate too.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
In future, we might have fixed cost GPUs but not today.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
A paranoid part of me thinks that these models are all inherently biased and instructed to be pro CCP, with specific gaps in their training data related to undesirable historic events and political ideas.
Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.
You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.
I’m not a proponent of AI generating everything without any supervision as of now. But I'm willing to change my mind when it gets better.
Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, Figma-to-React pipelines, devops, etc. are what people get hired for. All those can be done much faster at the same or better quality by an experienced person supplemented with AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exists to juice promotion criteria. And that, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.
[1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
For a traditional software engineer? I retired last year after 3 decades and my salary was about the same as it was in the early 2000's at the last company I was at. Maybe I should have negotiated more but I thought only FAANG paid traditional pre-AI engineers more than $250K.
Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?
Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.
If you don't believe me, launch Codex and immediately start working on the same project(s). You might discover that all the knowledge accumulated means almost nothing.
And the obvious question: what is the cost of that revenue? Because it looks huge but ...
If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.
Can you expand on this?
**Lead with the answer when asked how/which/whether.** Name the command/mechanism first; a question seeking understanding isn't a go-ahead to execute. Answer, then offer to act.
* EDIT * What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).
Often that started with the macro recorder. Then you worked out what that "recorded" code/sludge did, removed the crud you didn't need or want, improved the logic and so on. I bought books to understand it better. Now you can ask a (different) LLM "what is this? why is it used? How would I?" etc which is probably a faster learning curve than books, newsgroups and old school personal home pages with good info.
I would have been quite surprised, when I first used a VBA macro in anger, to learn just how far I would go down the rabbit hole. C, asm, Verilog, Linux were no part of what I originally signed up for!
Some people will specialise in the equivalent of recording macros and go no further. And this will be fine for code that gets it done but doesn't matter too much in the other dimensions (security, reliability, usefulness without the authors' support, etc.) Much like VBA utilities inside companies that were useful way back when. Other people will want what they produce to be better, even good, and they will learn about floating point [1] and all the rest, much as I did. Probably learn pretty fast too. [2]
[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...
[2] Working out how to write an Excel VBA webserver and using it to collect and collate summary data from various divisions into reports was seedy as hell, solved the actual business problem (given ridiculous but intractable constraints) and isn't something you can record. We all have stories from a misspent youth that we're simultaneously ashamed and yet somehow proud of.
No, but you do need to know the answer to respond to that 3AM page about prod being down.
All companies who make this transition will be more or less at the mercy of model providers.
Claude thinks we use Laravel 100% of the time, even though the project is some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
My ongoing coverage of AI ethical issues: https://simonwillison.net/tags/ai-ethics/ - 308 posts
I've been the loudest voice about the fundamental insecurity of LLMs for several years: https://simonwillison.net/tags/prompt-injection/ - 150 posts
In https://simonwillison.net/2025/Aug/25/agentic-browser-securi... I said "I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely."
“I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that looks like a carefully considered project evolved over the course of many weeks... in less than an hour.
Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?”
https://simonwillison.net/2026/May/31/the-solution-might-be-...
Here is Simon questioning a fundamental belief held by the pro-LLM lobby. Would a paid shill question that?
Simon is, without question, an enthusiastic pro-LLM person. I disagree with what he says often, the product market fit post was a bad take. But I don’t believe he is shying away from sharing his thoughts when they’re not favorable to the industry.
as far as we know there's no evidence that they can produce any profits at all
The fact that Anthropic is rumoured to have a profitable quarter indicates that their margins on API priced inference are very strong.
The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over openrouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.
China bad by default, right?
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVIDIA 2080 era), why mess with it?
edit: Actually American inference providers are cheaper for Chinese models. There's way more competition here because the Chinese aren't idiots and aren't investing every last dollar they have into data centers for LLMs that don't make money.
Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.
As far as “boring systems are boring”, I can tell you from experience that I work on a pretty boring system, and AI is not all that meaningful in terms of its impact, and it’s not for a lack of trying.
Can it help me create a migration and add an endpoint and such? Sure. But those aren’t the hard problems. They never were.
It’s funny that you think the idea of slowing down is such a bad one, but it is another well-established truth. Slow is smooth, and smooth is fast. This notion of break/fixing your way to prosperity by way of 10,000 ill-conceived PRs is a fool’s game.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
For customer-facing production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
A lot of average people are producing gigantic messes. At least previous to this they were gated by their mediocrity.
Generally we've modified our timelines heavily, systems are working as intended, company is still making money. There are some AI-authored commits that had mistakes that we didn't catch, but I'm sure this could've been an issue even if all were human-authored. I know first-hand multiple other companies who are doing exactly the same thing.
I agree with "slow is smooth, and smooth is fast" for mission-critical systems. But a supermajority of systems are, indeed, not mission critical.
If 250k was the total comp (taking into account bonus/stocks/what have you) then yeah, you definitely should have negotiated.
No. There is no accurate number.
we've got product folks vibing out prototypes (not shippable but clickable) in our main front end in a few minutes to an hour. This would previously have involved 3 people and several weeks, or a ton of figma and documents to fill in the gaps. This saves weeks to months and lets them really experience the items.
Then they hand it off to someone who knows all that stuff who is also using AI and the impl also gets done faster.
The PMs are either moving infinitely faster, or at least 30x faster and not blocked constantly by others.
basically you're not comparing people who don't know much (tech) with those who do, you're comparing them before and after access to AI.
I set up k3s and tons of otherwise unnecessarily complicated stuff on my laptop for my side projects, plus additional home servers and smart-house stuff. Otherwise k8s and things like that would have been daunting to learn from theory alone, without constant professional exposure, etc.
Microservices in Go and Rust, which I didn't have any previous experience with; games in C and other languages. Didn't know anything about low-level memory management before. Was mainly just a TypeScript person. Just constantly building random fun stuff.
Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute that AI competence is growing steadily.
Which category of developer tool has on-premise as the more popular option?
Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.
I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
For the employer those employees cost between 2945 - 7736 EUR per month based on https://kalkulatori.lv/lv/algas-kalkulators (income and social taxes).
So on the lower end that's (1500 USD ~ 1300 EUR) close to half the total expenses of such a developer, on the high end here around 15-20%. That's quite significant, depends on whether their productivity also improves (if that's what the orgs care about).
And we’re not even the country with the worst pay out there, but pay the same for tokens, cause regional pricing isn’t a thing!
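The percentages above can be reproduced directly. This sketch uses only the numbers from the comment (a $1500/month token budget taken as roughly 1300 EUR, and employer cost of 2945 to 7736 EUR per month); the exchange rate is the commenter's approximation, not a live rate.

```python
TOKEN_BUDGET_EUR = 1300      # ~1500 USD, as approximated in the comment
EMPLOYER_COST_LOW = 2945     # EUR/month, lower end of total employer cost
EMPLOYER_COST_HIGH = 7736    # EUR/month, higher end of total employer cost


def budget_share(token_budget, employer_cost):
    """Token budget as a fraction of what the employee costs the employer."""
    return token_budget / employer_cost


share_low = budget_share(TOKEN_BUDGET_EUR, EMPLOYER_COST_LOW)
share_high = budget_share(TOKEN_BUDGET_EUR, EMPLOYER_COST_HIGH)

print(f"lower-paid developer:  {share_low:.1%} of employer cost")
print(f"higher-paid developer: {share_high:.1%} of employer cost")
```

So "close to half" and "15-20%" both hold up: about 44% at the low end and about 17% at the high end.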
Note that it's not surprising that he finds his own usage (described in the quote) negative, since his real job is as a blogger, not anything else.
e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.
This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.
It cost them far more than it made.
Also, there is a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI labs are based in the USA, the adoption is weirdly global. As a general sense of what's going on: you can see AI-related ads literally everywhere in Tokyo, almost all the time, on every single screen in public.
Why take risk when you can spend money and take no risk
Deepseek shot themselves in the foot because they never intended to serve V4 Pro for $0.80 per million output tokens; that was a promotional price that was meant to expire (and still might). They intended for V4 to cost $4.00 per million, but Western inference providers drove down the price because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money at ~80 cents.
My point is, it's Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/Western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek V4 Pro, which wasn't even supposed to be permanent pricing.
However, that's an absurd scenario.
This isn't something that is public knowledge, in the sense that you mean it.
Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.
If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.
I just wanted to take their number at face value. It's not like it needs more real information to make AI a bubble.
This means that the average engineer is efficient at (say) identifying the first 10 tasks they should do but there are diminishing returns after that? That seems like a weird pattern. Wouldn't it be more likely that certain tasks have an ROI based on how efficiently the task is generated?
Like I'm trying to imagine in my head, if you think an engineer is more efficient with the tool, why deny them more tokens? I guess so they think to use them more efficiently?
So I think your conclusion that it must be $1500 per engineer is flawed. And even if it were true, I don't think the benefit would be evenly distributed. I suspect this is a first pass at figuring out how to budget them and there will be a second pass.
While it certainly reeks of motivated reasoning, Jensen Huang's assertion that an expensive engineer should be using at least their salary in tokens feels more logically sound to me (assuming the average engineer is efficient at using tokens; I have a feeling it's a normal distribution).
Judging the ROI of an engineer is hard. Adding AI on top of that makes things worse, I think. I've heard AI makes engineers 3X, 5X, 10X and even 100X.
If I told my CEO that I was 4X more effective with AI, I am doubtful he would be willing to spend even 1X my salary on tokens. Even though he would be making out in the end.
At some point the ROI is pretty much vibes, man.
The question is, how quickly does a junior with no experience build intuition without trial and error?
I'm optimistic that the demand for AI accessibility will drive programmatic interfaces in places where companies were previously reluctant to.
We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
Of course though they are not necessarily a viable solution for companies with security requirements etc. given it is just a single person project, but they still serve as a proof it can be done.
1. Why it's not a bold assumption: it's a bit shocking now. But in two years or so, many/most companies will realize this is the cost of doing business. Just like people are ok with using Outlook, or Office 365, or (in the case of Wall Street) Bloomberg terminals, people will realize that developers will need AI coding assistants.
2. Why the conclusion does not follow from the assumption: if the limit is set at $1500/developer/month, it does not mean all developers will use it. Companies will set incentives for people to not be very wasteful. It is more likely that on average developers will consume $100-200 worth of tokens per month, and there will be some outliers who will consume 10, 100, or 1000 times as much, but they'll be few.
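To make point 2 concrete, here is a toy calculation of average spend under a cap. The team composition below is entirely made up for illustration (it is not Uber's or anyone's real usage data); the only number taken from the thread is the $1500/developer/month limit.

```python
def average_spend(team, cap):
    """Mean monthly spend when each developer's usage is clipped at the cap.

    `team` is a list of (uncapped_monthly_spend_usd, number_of_developers).
    """
    total = sum(min(spend, cap) * count for spend, count in team)
    headcount = sum(count for _, count in team)
    return total / headcount


CAP = 1500  # USD per developer per month, the limit discussed in the thread

# Hypothetical mix: most developers are light users, a few would blow past the cap.
hypothetical_team = [
    (100, 90),   # 90 developers who use about $100/month
    (500, 8),    # 8 heavier users
    (5000, 2),   # 2 outliers who would spend far more without a cap
]

mean_spend = average_spend(hypothetical_team, CAP)
print(f"average spend: ${mean_spend:.0f}/developer/month (cap is ${CAP})")
```

With a mix like this the average lands at a small fraction of the cap, which is why "limit of $1500" and "cost of $1500 per developer" are different claims.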
It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).
And with a bit of careful routing - there isn't a lot stopping you sending the hard stuff to a cloud model and the average stuff to an on prem model.
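The routing idea can be as simple as a rule that looks at the task before picking a backend. This is a rough sketch, not a real library: the `Task` fields, the `max_local_files` threshold, and the "cloud" / "on_prem" labels are all hypothetical stand-ins for whatever signals and endpoints you actually have.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    files_touched: int   # rough size of the change
    needs_design: bool   # architecture or cross-cutting decisions involved


def pick_backend(task, max_local_files=3):
    """Send hard tasks to a cloud model and everything else to the local one.

    The rule here is deliberately crude: anything that needs design work or
    touches many files counts as "hard".
    """
    if task.needs_design or task.files_touched > max_local_files:
        return "cloud"
    return "on_prem"


easy = Task("rename a variable and update its tests", files_touched=1, needs_design=False)
hard = Task("redesign the auth flow", files_touched=12, needs_design=True)

print(pick_backend(easy))
print(pick_backend(hard))
```

In practice the classification is the hard part; the point is only that if the tooling is provider-agnostic, swapping what sits behind each label is cheap.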
Maybe a $10k raise would be nice?
My experience was not with pure software houses; we had some labs, measurement and RF equipment, but even without the hardware component the offices, insurance, admin expenses, HR, janitors, conference travel and so on would easily bump the total employee cost to double the salary. My 2c.
I've seen those vision researchers want to train on H100s at the time and be told no, wait for the T4s.
I've seen T4s running BERT models for document classification.
When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.
There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.
And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough I can use a VLLM so the user can describe where they saw the insect too", or the McDonald's website support chatbot developers say "Hey, the bigger GPUs have gotten cheap enough we can upgrade our models to Qwen 2.5".
The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.
Completely agree with that.
But yeah, double is insane. When I saw prices for COBRA from Facebook, it was $3300 a month, and that was god-tier insurance - the insurance benefits were so good they had a custom list of what was covered that was probably way better than anything available on the market (e.g. you want brand name drugs? no problem. You don't want to try both Ambien and trazodone before taking a sleep medication doctors actually recommend? No problem - etc.) - but for my needs it was barely better than COBRA costing way less than half. $3300/mo, or even $1200/mo for an entry level ops worker is a lot of their salary, and probably where the double comes from. At SWE compensation most of it ceases to scale.
The fully loaded costs including proportional management costs isn't relevant to the true marginal engineer, but estimates I've gotten from higher-ups definitely factor into engineering decisions about "should we spend engineering time to save money/make more money - how much will doing this thing cost the company" (opportunity costs are also relevant, but usually less grounded, since most projects don't have concrete benefits like "we will save $x/yr in infra costs")
An enterprise license for O365 is something like $75 per person per month. Totally different order of magnitude.
And regarding Bloomberg terminals, Bloomberg only has 1 million users (semi random guess).
The reality will be that some places just won't pay for any licenses or will try to set up their own, local LLMs.
https://openai.com/index/codex-for-every-role-tool-workflow/
This kind of race-to-the-bottom logic needs to be rejected: by workers, business culture, and the government.
Unfortunately business culture embraces races to the bottom (for everyone but owners and executives), and uses its lobbying might to push the government into tolerating or even supporting it. And there are a lot of deluded workers who (for some reason) seem to feel smart when they parrot the ideas of people who want to screw them.
this is how it works: https://help.markbase.cloud/humans/collections/overview
https://arxiv.org/abs/2602.11988
For what it's worth, if you were considering building context out.
It's kind of like neuroscientists found the trigger to tell your brain "we're going to do a clean shutdown now, trigger transition to runlevel 0".
Quviviq, Dayvigo, Belsomra. All still on-patent, so they don't have generics and are pretty expensive (like $1000/mo if your insurance doesn't cover them). A lot of doctors won't recommend them in practice because most of their patients won't yet be able to get them covered.
Ask your doctor about them, look them up in your insurance's formulary to see what's required (e.g. if you have tried both Ambien and trazodone and can document it), and see what they can do, before writing it off!
The expectation is Belsomra will lose its patent in 2029 and then generic makers can try to get one approved - so it's not that far off!