This means we're going to need $1T+ per year in spending on tokens. 200M knowledge workers in the world, 30M developers. We're talking about a world where you need 5% of every knowledge worker's salary to go into tokens. 20% if you're a developer.
That's a _huge_ shift. Most people I know cite +20%-40% velocity with these tools, against the actual work their company cares about doing. +20% speed for +20% spend isn't going to motivate a trillion dollars a year in spending.
We're not there yet. This is still the upswing of the hype cycle, and unless we figure out how to make developers 2x, 5x, 10x as productive on stuff that matters, this isn't going to play out well.
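For anyone who wants to check the arithmetic behind those percentages, here is a rough sketch. It uses only the figures quoted above ($1T per year, 200M knowledge workers, 30M developers, 5% and 20% of salary). Two things are my own assumptions, not the commenter's: that the 30M developers are counted inside the 200M, and that everyone earns the same average salary. The function names are made up for illustration.

```python
# Figures quoted in the comment above.
TARGET_SPEND_USD = 1_000_000_000_000  # "$1T+ per year" on tokens
KNOWLEDGE_WORKERS = 200_000_000       # ASSUMPTION: this count includes the developers
DEVELOPERS = 30_000_000


def flat_share_salary(target, workers, share=0.05):
    """Average salary needed if every worker spends `share` of pay on tokens."""
    return target / (workers * share)


def split_share_salary(target, workers, developers,
                       worker_share=0.05, developer_share=0.20):
    """Average salary needed when developers spend a larger share than the rest.

    ASSUMPTION: one uniform average salary for developers and non-developers.
    """
    non_developers = workers - developers
    weighted_heads = non_developers * worker_share + developers * developer_share
    return target / weighted_heads


flat = flat_share_salary(TARGET_SPEND_USD, KNOWLEDGE_WORKERS)
split = split_share_salary(TARGET_SPEND_USD, KNOWLEDGE_WORKERS, DEVELOPERS)
print(f"flat 5% on everyone: average salary of ${flat:,.0f}")
print(f"5% / 20% split:      average salary of ${split:,.0f}")
```

Under those assumptions a flat 5% on everyone needs an average salary of about $100k, and the 5%/20% split needs about $69k. Real salary distributions are nowhere near uniform, so treat this as an order-of-magnitude check only.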
>"These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals."
>"Somehow this fragment turned into headlines like Uber’s COO says it’s getting harder to justify the money spent on AI tokenmaxxing, because the market for stories about AI failures remains enormous."
Yes, it's just the yearning for AI failures. It couldn't possibly be runaway costs, record revenues, and massive layoffs. It couldn't possibly be that people who are already paid very well are lighting dollars on fire with these tools and not producing any increase in "value" for it (I recognize that output is 100x, but outcomes are flat by all measures).
[1] https://cmr.berkeley.edu/2025/10/seven-myths-about-ai-and-pr...
[2] https://futuretech.mit.edu/publication/crashing-waves-vs-ris...
My take is the product has been very useful for coding (PMF) for months. But it’s certainly not useful at any cost…
I suspect that once the technology has been tamed and the hardware and software have been commoditized, the impact will be much less dramatic than we expect, and we will realize the importance of a shared vision, experience, taste, intuition and discernment in building good products.
i think the article is correct for the moment, but there are some things that smell off to me about the situation overall (not the article!)
(1) i believe gpt 5.5 and opus 4.7 are not as big an improvement as their predecessors were, and there has not been enough evolution in those models to justify the price increases. Unless something big changes in the next few years they will not be able to keep up with these costs (unless A)
(2) they might not be able to keep improving the data centers if their tech keeps demanding more and more hardware capability. even nvidia does not show signs that it will be able to improve much on its big GPUs, and at some point this increased price will be passed on to them anyway, which will then need to go into the token price (unless A)
(3) i've been trying deepseek v4 and other models and honestly, they are more useful than gpt 5.5 and opus 4.7... i mean, there is still a difference, but it is so small that the cost of opus and gpt 5.5 does not make sense (unless they are going to A too)
(A) I have the impression that the plan the whole time was to sideline common folks like us and focus on gov, big tech and military. they only let us play with the toys to gather data and so that people would not get mad that they stole the internet from us.
PMF is one interpretation, but it could also be read as desperation.
In my opinion, we've been at PMF for quite a while now. The November inflection point that's often referenced definitely changed how we interface with models, but as far as coding goes, I feel like Cursor had proven itself useful for at least a year prior to that.
The demand has always been there; the outstanding question is still: how do you build a business on top of these products? None of the frontier models has emerged as uniquely capable, and open weight models are now catching up in capability as well. The explosion in go-to-market roles feels more like an attempt to lock customers into contracts so that they don't consider alternatives.
I assume the hope is that during this 12-month contract they will develop real integrations, something deeper than just a CLI harness. If you've ever worked in procurement or dev tooling at a reasonably sized company, you'll know that this is exactly what teams try to avoid.
It's anyone's guess what will happen this time, but I'm excited to see how the IPOs go.
- dedicated hardware (https://cloud.google.com/tpu)
- optimized models (https://research.google/blog/turboquant-redefining-ai-effici...)
TL;DR Ed argues that the deal between Anthropic and xAI could have been negotiated in such a way as to make Anthropic only appear profitable during its “ramp-up” period in June, which incidentally is also the month that Anthropic is making tons of other pricing changes.
I do agree with the author that these companies seem much stronger financially recently though.
https://youtu.be/0lvMgMrNDlg?si=QkkOnngYTjaSPlIy
He said, so many years ago, that there will come a time when computing power is so prevalent that we will stop using the person to make the computer's job easier and start using the computer to make the human's job of interfacing with it easier.
But in this context, it would mean the other side of increased productivity is decreased time to do the same work. These are the same thing.
the economics simply don't work unless you make six figures, at least if you just want to give it a go blindly. the providers are also still figuring out what they can get away with charging, and they are getting similar treatment from those below them in the stack.
the caps and limits are not very transparent, and it is quite difficult to know what is "enough". the current rate does not stay the same, and the contract changes way too often to commit for the long term. regardless, the subsidized rates cannot be sustainable forever. make hay while the sun shines i suppose.
Anthropic and OpenAI have shown people want a tool for task offloading, driving predictable token consumption and justifying the math, so long as users stay in that dynamic.
However, knowledge workers using these tools daily are getting exhausted with them. Outputs come out polished but hollow. Talking to a frictionless, frame-completing model all day drains you.
If user behavior drifts away from assistant usage because of that, per-token math implodes. The valuations we're hearing about all the time rely on usage compounding daily. The fatigue is a timer running against that compound.
Anthropic's Constitution is the closest hedge out there, I think. Installing an identity structure into the model through training. But it's still assistant-first, so the fix there is only partial.
I've spent the last year running a product that flips the architecture so identity is primary and the assistant role is secondary. Same frontier models, completely different conversational quality. The fatigue property doesn't really show up.
Whichever labs figure out how to install real identity natively in the weights are going to be the ones with PMF in the next phase.
It is easy for me to change providers. Right now I use the open source Claude Code harness with two paid API vendors for DeepSeek v4 (Flash and Pro). I like seeing how much each session costs.
With current limits my 100/mo Codex subscription is more than enough for the work I do.
However, I do worry about when the current subsidies are going to end. I can see myself paying up to 300/mo, but more than that will be prohibitive.
What's the long term plan here? Are OpenAI's and Anthropic's costs expected to increase/decrease?
it is only true for USD. for example if you pay in euro, this is actually more expensive. kind of makes no sense, because it translates to $1 = €1
All the slop content, all the bots, all the misinformation and fake AI images and videos.
All of the social and economic disruption from datacenter buildouts.
The massive nosedive in reliability on the world’s software infrastructure.
After all of that and all we get is a code bot so a few incompetent loser devs can bloviate about not writing their own code and brag about never reading it.
Burn it all to the gd ground. Destroy this new Tower of Babel.
Firstly, if the user is asking for things where AI can link to products or services to buy, the relevance is very good, much higher than in other types of ads.
Secondly, since the AI often takes time to compute answers to users' questions, they could be shown ads while waiting. People could perhaps be less annoyed by this than by some other commercials since they know the break has to be there anyway.
(The first idea is something I came up with when asking Claude to compare some products or for help with lawn care. The second idea came from a colleague.)
I don't see the business model working. My closest friend actually does automation software for large companies.
He does not use Claude or openai at all. He primarily uses gpt 120b on cerebras and glm-5.1 for heavy thinking work. And some other small models for various tasks. All open source.
And these systems are extremely useful for the businesses and are able to run fully automated pipelines that are very stable and fast.
We discuss this a lot, and we both think any business doing heavy agentic work on Claude and OpenAI just isn't aware of exactly how good and cheap open source has gotten in the last year.
So... once the legacy businesses and developers catch up, won't Claude and openai be unable to recoup their costs?
“Tokens” don’t have an intrinsic cost or value. Saying that I used $2,180.16 worth of tokens is like relying on the salesperson to convince me I’m getting a billion dollars worth of pots and pans for $19.99.
I think it’s funny how we are throwing critical thinking out the window when it comes to evaluating biased sources of info.
AI has some use cases, but not at the price it’s currently priced at. I’ve been on AI since GPT-2 with a lot of heavy users. Every user has the same story: curiosity, surprise, hype, hate, realization. Enterprise is usually a bit behind and is right now at the hype stage; that’s where they sell all the deals and do the IPO.
It’s really a VC masterclass.
Don’t get me wrong, there are useful cases for AI, but not the way they want it to be. Quite similar to blockchain. The idea of decentralized money has a right to exist. 99% of the other coins do not.
AI is a faster, but still less accurate, search engine. AI is great at finding bugs; it’s great at rubber duck debugging.
The reason I call it a swindle is that, along with the marketing, it gives tons of people in the world the impression that they can now build their own startup, game, infra etc. without the need to learn it themselves. This leads to millions of abandoned and low-quality projects and products, because the vast majority have never built the mental model necessary to solve the problem thoroughly. In the end they’ve wasted months and money (but burnt tokens). This is what I call a swindle.
All the early adopters I know have now drastically wound down their usage, not because of money, but because there is no new use case. If you want to explore a new project you can get onboarded quickly, learn a lot, and then switch to documentation and live testing. For me usage is the lowest it has been in the last 2 years.
I would not let AI touch my code. I have anxiety around it, because it will gripple back up. I will let it read my code and let me know what I did wrong so I’m sharpening myself.
100s of companies, including open source solutions, can offer that for me.
All my non-tech friends are now in the hype cycle and share their hype and foreseeable frustration with me.
I have to say I’m in a way impressed by how AI has been rigorously vc-utilized (consciously or unconsciously) to generate these vast companies with the whole world watching.
There is a lot of AI usage happening not because it shows benefits, but because the business has mandated its ubiquitous use. Companies having dashboards for token usage and rewarding people for using more tokens is a real thing. I just spoke with someone today who works at Microsoft and they are required to use AI for all of their work - they have to make a special request with justification if they decide not to use AI for even a single PR. This kind of demand isn't driven by value from either the company itself or from its workers; it is the kind of artificial demand you get from make-work projects to keep people employed during hard times.
We have to wait for the hype to settle down and for people to start making business decisions based on results before we can really value these AI products.
In contrast, imagine if we had the same AI 20 years or so ago. Could AI really write Jersey? I guess not, as people were still trying to understand JAX-RS. Could AI really answer all the questions about React? I guess not, as React was just invented. Would we use 10x fewer people to build out infra on the public cloud or the entire so-called Big Data platforms? I guess not, as they were still rapidly evolving and we'd need so many engineers to explore so many different possibilities. Could we use AI to build our ML ecosystem with 10x fewer people? I highly doubt it. Heck, 20 years ago R was all the rage and Python's ecosystem was not mature at all. Oh, and mobile computing: could AI lead to 10x fewer people building all the mobile apps and the underlying infra?
I'm skeptical that their current price increase is sufficient, and I'm also skeptical that most users/businesses will accept the more significant price increases that will be needed. Especially for individual users, $200 a month is already incredibly expensive; I really don't think most people are going to be willing to pay something more like $1000 a month.
A single 3D CAD license pack for the guys in our R&D group costs multiple thousands of dollars per seat, per month.
It's about time software seats get some love too.
So the author claims he's getting $2000 per month worth of frontier AI free of charge. Ok. If he's been doing that for 6 months that's $12k. What has this produced concretely? For $12k you can find a used car in decent condition. Heck for $1200 (his actual out-of-pocket spend) you get a brand new ebike! (on which you could put a pelican and make a photo of both if that's your fancy). But here it's unclear what has come of it.
More specialized products will consume tokens, but their builders will be incentivized to optimize token use and switch models as costs and capabilities change. And if search engines become more AI capable, and Google is clearly striving for this, then they may have pressure from two sides that could squeeze the number of use cases for AI chat. AI coding isn't going anywhere and neither is the need for AI in general, but I wonder if the products will have to evolve significantly to maintain the current levels of PMF. And then there's the question of profitability...
Guys, what - in your opinion - does "heavy user" mean? I thought I was a heavy user (I am using AI to code every day, 8hr a day + side projects), but the 20 USD/month Cursor plan is always enough. What would I have to be doing to need a higher-tier plan?
The assumption here is that this is a positive thing.
But this very well could end up being a major negative long term by increasing the cost per user, reducing margins.
More usage = more cost = less profit.
It's not obvious that more usage is good. It's only good if revenue per user increases more than cost does. I'm skeptical about that.
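The point above can be sketched in a few lines, assuming a flat subscription and a per-token serving cost. All numbers are hypothetical and only there to show the direction of the effect; `UserMonth` and its fields are illustrative names, not anyone's real billing model.

```python
from dataclasses import dataclass


@dataclass
class UserMonth:
    revenue_usd: float                  # what the user pays that month
    tokens_used: int                    # tokens the provider had to serve
    cost_per_million_tokens_usd: float  # HYPOTHETICAL provider serving cost

    def margin_usd(self) -> float:
        """Revenue minus serving cost for this user in this month."""
        serving_cost = self.tokens_used / 1_000_000 * self.cost_per_million_tokens_usd
        return self.revenue_usd - serving_cost


# Same flat $200 plan, same assumed $5 per million tokens, different usage.
light = UserMonth(revenue_usd=200, tokens_used=10_000_000, cost_per_million_tokens_usd=5)
heavy = UserMonth(revenue_usd=200, tokens_used=50_000_000, cost_per_million_tokens_usd=5)

print(f"light user margin: ${light.margin_usd():,.2f}")
print(f"heavy user margin: ${heavy.margin_usd():,.2f}")
```

On a flat plan the heavier user is strictly worse for the provider; usage growth only helps if pricing is metered or the plan price rises faster than the serving cost.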
Ahhh, the classic startup term whose definition is nebulous. But also, since when does any definition of product/market fit mean a product is profitable? And profitable in what sense? Unit economics? Overall company?
It is quite trivial to switch from one model to another. Likewise, in a few years we'll have affordable laptops to run today's frontier models.
What's their plan to let us keep subscribing?
Bloggers are having AI psychosis too.
20-40% sounds about right for me, today. Maybe 40-60% on a good day. But a lot of the reason it's not higher comes from harness gaps and org processes that haven't caught up.
All of that will get fixed with time.
OpenAI's spending commitment is in the ~1T range for the next 5 years, and Anthropic is ~300B.
If they continue to show strong growth, they likely need to be at 100-300B in revenue/yr to support their yearly payments + financing, not 1T.
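A rough way to sanity check that range, using only the two commitment figures quoted above. The even spread over five years is my assumption (the comment doesn't say how the payments are actually scheduled), and financing costs are left out entirely.

```python
# Commitment figures as quoted in the comment above.
COMMITMENTS_USD = {
    "OpenAI": 1_000_000_000_000,   # "~1T range for the next 5 years"
    "Anthropic": 300_000_000_000,  # "~300B"
}
YEARS = 5


def yearly_payment(total_commitment_usd, years=YEARS):
    """ASSUMPTION: the commitment is paid in equal yearly instalments."""
    return total_commitment_usd / years


for lab, total in COMMITMENTS_USD.items():
    print(f"{lab}: ~${yearly_payment(total) / 1e9:,.0f}B per year")
```

That gives roughly $200B per year for OpenAI and $60B per year for Anthropic before financing, which is in the same ballpark as the 100-300B figure rather than 1T.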
That’s why most here shouldn’t engage in the discussion - they parrot on about benefits without identifying and articulating the costs and, moreover, how it affects the firm’s financial position.
> Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff
> Enterprise customers are now paying API prices
How long before enterprise customers start to question the bill? Anthropic goes from not making money to doing a pricing shakeup, and now they are making money and the biggest spenders are shocked at the prices.
Seems like things are still very uncertain.
edit: typo
(It's mostly open source, you're welcome to dig around in https://github.com/simonw and https://github.com/datasette if you like.)
My time as an experienced software engineer is worth a lot of money - a whole lot more than $12,000 for the past six months.
Legalities aside, you need to look not at the model quality but at the infrastructure needed to scale these models from tens (now) to hundreds (soon) of millions of users. Only a handful of companies actually have the resources and funding to do that. That's what these huge valuations are based on. These companies are gearing up to scale to these levels. That's why they are spending on data centers. Whoever has access to those data centers gets to tap into the revenue stream of people using models running on those.
The market for frontier models is roughly split between OpenAI, Anthropic, and Google. And then you have companies like X/SpaceX, Amazon, and Microsoft being more successful with their infrastructure than their AI products and companies like Apple, Meta that have the money and the aspiration but are so far not really managing to be very successful with their AI strategies.
DeepSeek is just very poorly positioned to capture a lot of the enterprise revenue in the EU or North America. But they might become very dominant outside the US/EU. And of course China itself is going to be a huge market and equally unlikely to want to depend on US-owned AI companies.
I spend most of my time designing and tweaking test suites, and improving test performance. These commits are almost entirely Codex: https://github.com/tsoniclang/tsonic/commits/main/ - but it's possible only because there's a very large test suite attached to it.
All of that is very token intensive. If OpenAI gave me 3x my limits, I'd find ways to eat it up in a week.
What do these tokens give me? Well, I think in a week or two, I hope to port the TypeScript-Go compiler back into TypeScript, but compiled to native code. It's probably not particularly useful for most ppl, but it's a hobby project that I've spent the last 7 months on.
8hrs a day doesn’t really mean anything without a lot of additional qualifiers.
fwiw lately I’ve been straddling 2 or 3 claude codes and one Claude cowork, primarily on 4.7 with high effort - the company’s paying for it, so I’m doing my best to burn as many tokens as I have the mental capacity to manage. At that rate, the 100 account is completely necessary, I was blowing through my 4-hour limits consistently before requisitioning an upgrade.
The money would be so much better spent that way as well, supporting individual programmers.
That's why it's so important for these labs that they're selling API tokens for more than the compute+energy costs needed to generate them.
Every indicator I've seen is that they do have a positive margin on that. If they don't, they're screwed.
It's a great hook to build an article around. My core point is more that April 2026 was the point when Anthropic and OpenAI finally appeared to have figured out a credible business model.
Other than the hosting providers, I am also yet to see anyone directly making money from their OpenClaw agent.
You may want to get one of them to check the math on that :p
I agree with this person, let's use AI psychosis for when using an LLM gives someone psychosis, not for when we think, what, that a blogger made some poor assumptions?
I've been calling that out for a couple years now. LLMs' best and most viable use case is still just as a dev tool. Even for non-programming tasks, I still get better results from the LLM if I instruct it to write code to do the task... look at Claude Cowork for example, it's everything I used to do with python myself. It's not really a novel capability, it's just using python & bash for automations that any sysadmin has been doing for decades. Yeah, that's valuable for a non-technical audience but is it $1T valuable? I don't think so.
When has an IDE or other dev tool ever commanded a $1T valuation?
These things get lost in discussions because people conflate "overvalued" with "not useful." LLMs are useful, particularly as a dev tool, but Anthropic & OpenAI are definitely way overvalued.
I'm building a product right now with some AI coding (despite my negative sentiment about AI in general they are useful). I am both the product person and the engineer, and I'm pretty decent at using it, so according to the hype I should be seeing like a 10x speedup. I am not seeing that. It's definitely faster, but there are also days where I'm stuck cleaning up things after going too fast for too long, or periods where I need to put the software in front of people to get real feedback, or even periods where I just need to use it extensively myself to find the pain points and bugs. I just don't see this "running circles" once you get past an MVP and you actually need to build something secure and not embarrassingly broken.
Typical tech worker costs a company around $100/hour minimum. That $200 subscription cost can look mighty attractive if it saves some time or mental load.
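The break-even implied by those two numbers is easy to write down. This is a minimal sketch using only the $100/hour and $200/month figures from the comment; `break_even_hours` is an illustrative name.

```python
def break_even_hours(subscription_usd_per_month: float,
                     worker_cost_usd_per_hour: float) -> float:
    """Hours of saved work per month at which the subscription pays for itself."""
    return subscription_usd_per_month / worker_cost_usd_per_hour


hours = break_even_hours(subscription_usd_per_month=200, worker_cost_usd_per_hour=100)
print(f"break-even: {hours:.1f} hours saved per month")
```

So at those figures the subscription only has to save two hours a month. The harder question, raised elsewhere in the thread, is what happens when the bill is metered API usage instead of a flat $200.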
I don’t think there is anything addictive or spooky about that math. I suspect a lot of this is coming from tokenmaxxing firms, but on the flip side, on our small team we end up spending about $200 per person per month on tokens using tools like Cursor. We feel the spend is justified by measurable value.
You know, that's fair. I'm much more against super-rich investing hundreds of billions in the things they don't understand, creating massive disruptions in their wake.
AI didn't create stupidity and greed. In the end, it's just another tech. I'm just tired, time and again, of people who I would hope to know better (and repeatedly they prove me wrong).
I mean it is doing both of those, so that's fair to be honest.
- The publicly available information about how inference costs compare to training costs is conflicted. EEs involved in datacenters talk about power usage spikes during training runs as if they were a major factor in the designs, but academic papers discussing cost-optimal scaling confidently treat inference-time compute as a major factor.
- On the side of the balance indicating that training is more compute-intensive after amortization than inference is that Chinese providers, constrained primarily by access to compute, have nearly unlimited token availability at a lower price than US providers (inference), but poorer model capabilities (training). That would make sense only if US providers are inflating inference costs by 20-30x due to amortized training costs that overseas providers were not able to take on (there are other factors too).
- If training >> inference, they're in a prisoner's dilemma that far exceeds the ordinary zero-marginals model of competition between firms (due to its huge discrete stepwise nature). On the other hand, if inference>>training, the high-level analysis popularized by certain thought leaders, that it's like a utility, would be true. You'd tend to count this as a vote for inference>>training, but the CEOs saying it at least have a huge incentive to agree because the alternative, the prisoner's dilemma, would stop investment very fast.
- The only voices in the story I just told you that have anything to do with fact (as opposed to high-level analysis and ivory tower armchair management of a secretive business) were the rumors from facilities engineers. That shows you the state of our understanding...
- If we don't even know the ratio between amortized capital expenses and operational costs, outside investor analysis is impossible. It doesn't matter how finely they divide the accounting buckets for office ferns and indoor ferns if the single biggest part of their business is obscured for trade secret reasons.
Our estimated spend for AIaaS would exceed that cost in less than a year.
In a few years, there will be hardware capable of running frontier models good enough for most things at accessible prices for even tiny companies.
"As cable TV and Pay Per View came out, there were studies done about how many movies people would watch if given unlimited access to films. The results were bandied about as proof that we should build out all this infrastructure to support this line of business. When the data was further analyzed by statisticians etc, it turned out that people claimed they were going to watch films 10-12 hours a day, every day of the week. Impossible."
I feel like we are in a similar boat here where some people are assuming:
- EVERYONE is going to be using max tokens
- tokens will NEVER get cheaper due to improvements in hardware, software, design, market forces etc etc
That's the game. There's a view you could take of this that this is just a growing of the pie: with those cost dynamics a lot more "small businesses" get a vast amount of leverage, so the overall economy grows without replacing the knowledge workers. I'm not sure I trust the MBA class to have that view.
What are you basing this on? For reference, Anthropic raised ~$70 billion in total and OpenAI ~$190 billion. Why do they need to make 20-40x that?
And that's just one inflection point. We've had several and there are many more on the horizon. So while I could be convinced that ROI is maybe not even positive today despite the ridiculous enterprise spend, it's perfectly rational to pave the way today for what's coming over the next few months let alone years down the line.
I think it was clearly useful for months to people who had tried it and taken the time to understand it, but now that knowledge has spread to the point where wallet holders are convinced it's not just a passing fad or hype, so now PMF can be "claimed".
I agree it's weird to say "those people have pmf" though, usually it's something you define for yourself
"I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done."
Most of the money right now is in coding. Openai and Anthropic just have to be 6 months ahead of SOTA open source models and they'll capture most of the enterprise and dev market
Same. It's a nightmare from a Porter's Five Forces perspective.
There will be a ton of businesses competing in this space, and there will be something of a moat due to how capital intensive the business can be, but there will still basically be infinite competitors.
Great for consumers.
I agree with the common trope that open models lag behind by about a year, but something magical happened just around a year ago when the state of the art models became extremely useful. By this reasoning we're about to see open models perform well, but I'm afraid there is more to it than just waiting for another revolution around the sun.
Note, my application is coding assistance. Open models can be great for other purposes.
We all have our own observations and mine don’t significantly diverge. But that’s bottom up. At this point shouldn’t we be seeing it top down?
If we are beyond potential and into significant productivity gains, why isn’t that showing up for the customers?
Why didn’t delta airlines get significantly more operationally efficient in the last 3 months due to the introduction of better software?
This is a genuine question, I am seeing a disconnect.
It's not exactly news, is what I'm saying. And even with the PMF they found, the product is still only a commodity i.e `tokens`, which is what every other provider on the planet is also providing.
All their other products boil down to "harnesses", which does not look viable as a product in the sense of PMF - you cannot sell it, you cannot lock it to your own subscription, API, etc. so you can't use it to generate revenue any more than the free harnesses do.
PMF has a specific meaning, and "code harness" or "coding model" does not satisfy the commonly accepted meaning. Maybe Mythos (or similar) will.
How enormous? 1 trillion dollars, 2, 10 trillion enormous?
And yet we surely need this data for the IPO? Or are they relying on rule changes on the indexes to force ETFs to buy shares?
It's a given that the SOTA models need to raise their prices. It's also a given that they can't. The more they raise the more customers will move to their competition.
So what happens next? Well I think it will suck horribly if you can't move off of SOTA sooner or later, because the Big Two are going to lose customers, and therefore have to raise prices on the locked in customers even more than these projections suggest.
Beyond that if you're looking to start a business, figure out how to use cheap models in new scenarios. Build software which does that and license it. This is kind of contrary to the idea that you shouldn't over optimize for deficiencies in the models that will likely go away in the next generation - for instance a lot of problems were solved when context windows got way bigger. So it's a thin line to walk but I think it's there because a lot of orgs are using Claude today for pretty basic tasks.
The dev who's addicted to SOTA models honestly is going to have to settle for less or get totally screwed. Most applications within business from what I see aside from complex research do not require SOTA. They summarize, they classify, they transform, and doing that accurately has been cheap for a while.
I realized it long ago: one needs output to make meaning. Input can only be the cherry on a cake in one's life. That, actually, makes FIRE or Fat FIRE not so sustainable unless one has other hobbies.
I feel like the reverse assumption is being made, that the current model looks like IBM doubling down on Mainframes soon to become cheap enough to deploy everywhere, when the real action is that the costs coming down represents cheaper hardware or more efficient software, and that a big chunk of "cheaper" AI will be eaten by smaller products deployed by individuals. Whatever the Personal Computer of AI looks like is going to be more disruptive than just an API endpoint you can fling tokens at.
We already see this with things like Chrome auto-installing an LLM.
You can't tell me with complete certainty that there's a moat here for the people spending 1 trillion+ on this infra.
>When the data was further analyzed by statisticians etc, it turned out that people claimed they were going to watch films 10-12 hours a day, every day of the week. Impossible.
I also think this applies to people suggesting that companies will sack workers for AI, when replacing everything someone does in a day costs more in tokens (likely even at a reduced price) than just hiring a bloke.
A lot of these LLM demand scaling scenarios make broad "up and to the right" assumptions about things which in practice have finite limits. Only some percentage of knowledge work benefits from acceleration, optimization or other improvements, and even then the amount of economic gain is capped.
Anthropic already hunts down OpenClaw users for using too much on their plan.
I'll give a different example: when LED lights became more popular, power usage didn't drop by the amount of power saved.
>- tokens will NEVER get cheaper due to improvements in hardware, software, design, market forces etc etc
Well, first, improvements in computing have stalled or even rolled back, purely because the price of everything compute-related shot up because of AI. That will NOT be fixed for a while, ESPECIALLY if AI usage continues to increase.
Second, the price per token for a given model might go down over time, but better models have more expensive tokens, so we quickly get to a spot where:
* the price increase per token might not be worth the marginal improvement the next, better model brings
* more and more models are passing the "good enough for the task" threshold, so for fewer and fewer companies does it make economic sense to pay for the "best" instead of paying DeepSeek or some other company to run "previous gen" models
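To make the "good enough" threshold concrete, here is a minimal sketch of that selection rule. The model names, quality scores, and prices are made up for illustration; only the rule itself (take the cheapest model that clears the bar for the task) comes from the argument above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    quality: float       # hypothetical task score in [0, 1], not a real benchmark
    usd_per_mtok: float  # hypothetical price per million tokens


def cheapest_good_enough(models: Sequence[Model], threshold: float) -> Optional[Model]:
    """Return the cheapest model whose quality clears the threshold, or None."""
    candidates = [m for m in models if m.quality >= threshold]
    return min(candidates, key=lambda m: m.usd_per_mtok, default=None)


# Illustrative numbers only; none of these are real prices or scores.
MODELS = [
    Model("frontier", 0.95, 25.0),
    Model("previous-gen", 0.88, 3.0),
    Model("small", 0.70, 0.5),
]

pick_085 = cheapest_good_enough(MODELS, 0.85)
pick_090 = cheapest_good_enough(MODELS, 0.90)
pick_099 = cheapest_good_enough(MODELS, 0.99)
```

With these hypothetical numbers, a task with a 0.85 bar goes to the previous-gen model, and only a bar of 0.90 or higher justifies the frontier price. As more models clear a given bar, the frontier model gets picked less often.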
“There’s more capital than good ideas to fund” has been a complaint from the likes of A16z & other VCs for a long time now. It’s why we ended up with stuff like NFTs getting funded.
i am pretty sure these services know what it truly costs them to serve you tokens, maybe not in realtime but at least periodically.
however, what they charge us is a constant exercise in price discovery. i agree with this sentiment in the sense that we don't have a stable sense of the cost. all of these comparisons are good for the moment, or at most the near future.
i believe that even the "all you can eat" approach with the max plans, regardless of their crazy pricing, is not sustainable with only the power users. if most of us get this kind of value through our plans, surely it does not incentivise the service providers to continue pushing it. maybe they can regardless just to gain market share, but not forever.
I was recently consulting at an org where two separate engineering teams were all in on two different, incompatible deployment platforms and using AI to accelerate adoption of each.
Management was mystified why their engineering leads kept telling them they couldn’t deploy a complete implementation of their solution.
The coding agents got good in November. Most individual engineers didn't fully clock this until January/February. This means that companies didn't really figure it out until March/April.
Assuming companies like Delta have adopted coding agents (which would be pretty fast) it still takes months from adopting a new tool to the code results of that tool rolling out to production.
I expect (and would hope) Delta's software development culture is very conservative. Since nobody can confidently tell Delta "here are proven practices for using this tech to produce high quality, more secure code" yet it would be surprising if they were blasting full-steam ahead.
I expect that even companies that got on board with coding agents in January will only just be starting to ship user-facing features that benefited from those new tools. Shipping software takes a long time, no matter how much faster the "typing the code in" bit gets!
But memory costs are going way up. And both OpenAI and Anthropic bumped up the price of their frontier models in April.
For a pretty funny comment about pricing.
https://www.reddit.com/r/chipdesign/comments/1ajrli2/cadence...
As you might suspect, this is what I have an issue with. Without LLMs, isn't it possible or even likely that that code wouldn't have been written at all, and wouldn't have been missed? If LLMs are mostly used to produce throwaway prototypes then it's a stretch to say that's money well spent.
If indeed it let you advance your main product much faster then sure it's a different story. You're the judge of that. It's hard to see the impact from the consumer side; everything is still broken and no extraordinary app seems to be emerging. Maybe it's just a question of time. We'll see.
From this I assume you think that what the llm has generated is as valuable as your own work generally is. How do you even calculate this?
The customers of these tokens need to see returns on their projects that exceed the cost of financing.
Laying people off only goes so far.
If enough said firms don’t see enough value given the price of frontiers they will cancel and consume open source. This is the risk the frontier labs are exposed to.
How so? What's specifically changed? We still don't know what their unit economics are and everything you've documented is basically speculation at this point.
If not, lower-priced Chinese offerings will be better, as they're cheaper per token, giving you more attempts to offset the variance.
My feeling on the former is no... I believe they tried really hard but they've settled on pure marketing now to attempt to fight off the Chinese with perceived superiority in quality.
It's a natural response for society to despise these people who have such contempt for us. It's almost embarrassing these days being at a social function and telling people I work in software, it's got a negative stigma almost like working in gambling or the military.
No. What? Of course not.
>Many people charge more than that per single billable hour.
hrmmmm not so sure about the work that "many" is doing there
* At some point model capability reaches diminishing returns. Then inference >> training in the future but training >> inference now. It’s not a prisoner’s dilemma but a land grab to solidify market position and be one of the 2-3 firms left standing as dominant in the space. The model companies aren’t super sticky yet but they’re working on it.
* even if training remains >> inference, it’s possible to have multiple price points like they do today. If you need the most capable model you’ll be paying exponentially more per token to supplement the training cost even though the serving cost is marginal because most people will be satisfied with cheaper / less capable models for most tasks.
I buy that inference is a dropping line item while training is a growing one. There’s all sorts of things on the horizon that’ll be orders-of-magnitude improvements, from startups burning models into ASICs to get orders of magnitude more performance to alternate architectures like diffusion transformers that have orders-of-magnitude structural optimizations. It’s inevitable that it’ll come down even further from where we are. It’s possible model training also will go down but I’ve not seen any compelling research suggesting major “easy” reductions here.
Yes I know there's no evidence and this is lazy reasoning. But there's probably a bit of truth to this line of thought.
Training involves multiple passes over the entire training dataset, ideally in large batches where you can perform inference on as many samples as possible simultaneously and then perform backpropagation to adjust the model weights (which is about as expensive as inference).
Let's consider the size of the dataset we're dealing with here. The dataset likely consists of practically every piece of digitized text they can get their hands on (including that extracted from audio and video). We know Google has digitized a large portion of the books in existence as part of their "search book contents" feature and we have no reason to believe they're not using it alongside their cache of 90+% of the internet to train their models. We're talking about 100s of millions of books each with an average of 100,000s of tokens. The internet has 10s to 100s of billions of pages on it with who knows how many tokens on average. This is a huge dataset that we've got to go through hundreds of times.
Second, let's consider the effect of batching and how it sets requirements for our hardware. We know that larger batch sizes converge faster, are more stable, and produce better models. So if you want a good model you need large batch sizes. This means that you need machines several orders of magnitude more powerful than you use for inference. From what I heard Google uses clusters of 100s of their TPUs all located in a single rack for training. These clusters are organized in a customized computing architecture to maximize memory locality between cores (really critical for efficient back-propagation). Further, you can't use reduced precision weights for training like you can for inference, so there are no shortcuts.
Finally, the initial training stage is followed by reinforcement learning stages - this is a key development in how AI models have improved in the past year. This may mean going through a curated set of traces (either synthetic or captured from users) and adjusting the weights based on experienced outcome.
Overall there's so many orders of magnitude more work and more hardware requirements for training that I find it improbable that inference dominates. The number of "inference" steps in training is freaking ridiculous and includes such factors as the "number of words ever written".
Maybe investors will realise that "the only winning move is not to play".
And so we are left with (as was) frontier models getting more and more out of date as whoever their post-bankruptcy custodians are try to eke pennies on the dollar for inference on their decaying property. Perhaps along with local and/or highly specialized models still feeding on the after-glow of the huge amount of training that was (and is no longer) done.
The next AI winter is going to be deep, savage, and long.
The Gemini Flash is very good at searches. Just about any low end model can toss out a poem. All the higher end models (open source and otherwise) seem to be able to churn out code that passes tests. The smaller, "less capable" ones are much faster at it, which means in the hands of a skilled practitioner are the best choice for that task. But they rapidly fall apart where there isn't a hard source of truth (like a good test suite) to grind against. Because of that you have to use a bigger model for bug finding. In that task the open source models tend to fail on larger code bases, where something like Opus still shines. I gather Mythos is an absolute monster, and unparalleled, and unavailable. I'm sure one of the reasons for that is it's so expensive to run.
Or to put it another way - you don't use a 100 tonne crane to pick up the shopping. And ... the smaller models will happily run on in-house hardware. You may not do it today because of the current DRAM price and integrated NPUs have just started shipping, but in 5 years time models will be running on your phone.
If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?
AI cost ballooning faster than companies can afford is becoming a very common topic in my circles right now. The era of "I'll pay infinitely more for marginal gains" is over from what I can tell.
Just think how much further that $100K would have gone if the hardware market wasn't so screwed-up.
Anecdote: I priced-out adding 1TB of RAM to a four node cluster a couple months ago. The cluster was purchased in fall of 2024 w/ 4 nodes, each with 256GB RAM. The nodes cost just over $14K apiece back in 2024 (entire box, not just the RAM).
Dell wanted >$90K a couple months ago to add 256GB to each node.
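For scale, here is the arithmetic on that quote as a small sketch. It uses only the figures above, and both are lower bounds (the quote was ">$90K" and the nodes were "just over $14K"), so the real ratio could differ somewhat.

```python
NODES = 4
ADDED_GB_PER_NODE = 256        # 4 x 256GB = 1TB added in total
QUOTE_USD = 90_000             # lower bound: the quote was ">$90K"
NODE_PRICE_2024_USD = 14_000   # lower bound: "just over $14K" per whole node

added_gb = NODES * ADDED_GB_PER_NODE
usd_per_gb = QUOTE_USD / added_gb

original_cluster_usd = NODES * NODE_PRICE_2024_USD
upgrade_vs_cluster = QUOTE_USD / original_cluster_usd

print(f"RAM upgrade: ${usd_per_gb:.2f}/GB for {added_gb} GB")
print(f"Upgrade quote is {upgrade_vs_cluster:.2f}x the 2024 price of the entire cluster")
```

On these figures the RAM-only upgrade comes to roughly $88/GB and about 1.6x what all four complete boxes cost in 2024.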
AFAIK you would get about 5 concurrent users, with a max context window of ~128K tokens on the larger models.
This wouldn't be good enough for coding -- are you guys thinking of using it for something else?
The decadal move to all-cloud-all-the-time killed off in-house hardware teams while the C-suite chased their OpEx dreams.
It would be interesting if we come full circle on this.
And what happened? How many hours per day/week are people spending watching now?
I suspect that AI will fail to pan out to the same extent for the same reason why outsourcing hasn't fully panned out (even though every company tries it after getting big enough).
The problems that will come up will be and always have been ongoing maintenance. AI is great at writing new code without a brain behind it, but once you get to the point where you need to refactor code, you start really needing someone with coding experience to guide the AI or veto its mistakes.
I don't think that's really fixable even with a lot better AI. It's not something that ultimately comes out of the likes of github data.
I'm not saying that AI isn't going to make things better, btw, I just don't think we'll see a 20x improvement. Probably more like 1.5 or 2x.
I would argue that that's been the case for quite some time before AI. As an example, what innovative amazing world-changing products have Google or Meta launched in the past decade with their very high numbers of very talented and highly-compensated engineers? The issue with most big tech companies is leadership, strategy, and product direction. I'm not saying that they don't make any profits, just that they probably aren't "building [the right thing]".
AI for product development and management would be far more impactful than automating rote coding tasks / building React UIs that mirror API structures IMO.
It sounds like the economy would largely reduce to the small minority class of independently wealthy people.
What makes you think the people who used to build (or would have built) software will switch into the industry of "knowing that the thing was the right thing to build", as opposed to something cooler like surgery, city planning or experimental physics? The roles within a tech company are not the only jobs in the world.
I spent $200. If I had been paying API pricing it would have been $2,180.16. The article is about how enterprise customers get charged API pricing, which means if I had been employed by one of those companies I would have cost them $2,180.16.
What am I missing?
Yes, value is hard to calculate, but luckily market pricing mechanisms exist exactly for this purpose. There isn't a better number to use than what people are willing to pay for them.
So he's saying that on an enterprise plan, he'd be spending $2,180.16. He's not paying that much, but enterprises are.
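A quick sketch of how large that gap is, using only the two figures quoted above ($200 on the subscription versus $2,180.16 at API rates for the same usage):

```python
SUBSCRIPTION_USD = 200.00      # what was actually paid on the plan
API_EQUIVALENT_USD = 2180.16   # same usage priced at API rates

markup = API_EQUIVALENT_USD / SUBSCRIPTION_USD
implied_discount = 1 - SUBSCRIPTION_USD / API_EQUIVALENT_USD

print(f"API pricing is {markup:.1f}x the subscription price")
print(f"The subscription is an implied {implied_discount:.0%} discount off API rates")
```

That is roughly an 11x difference, or about a 91% discount, for this one usage pattern. It says nothing about what either price costs the provider to serve.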
- it’s a swindle because ROI of tokens for coding models is not positive? As in it doesn’t bring enough value to charge like the $100/mo?
- enterprise customers are too dumb to see this
- IPO to max out the CEO profits for what is ultimately blockchain vaporware
Am I getting that right? Or am I putting words in your mouth?
> it gives tons of people in the world the impression, they can now build their own startup, game, infra etc without the need to learn it themselves.
I can’t speak for peoples beliefs and motivations, but this seems to be strawmanning, no? AI is a powerful tool to force multiply people. You can’t just prompt “build me an enterprise SaaS app worth $1B” or “build me GTA6 and don’t hallucinate” but is that your impression of what’s happening? Dario and Sam are saying “if you buy our coding agent subscription you can build a game with zero skill and one shot and then be rich”?
If you don’t find value in AI agents I can see reasons why that could easily be true today. Also if it just gives you the heebie-jeebies. But to say it’s a swindle on par with the blockchain, I think that contradicts an enormous amount of signals and also the actual dialogue (not just headline sound bites) around what these systems are capable of today and what we expect them to do, say, at the end of the year.
> Would we use 10x fewer people to build out infra on the public cloud or the entire so-called Big Data platforms?
No, cannot solve core problems, makes a mess at scale
You are right about the incremental work. But most of the work is historically incremental imo, only few positions are R&D.
I currently pay ~$150/m in tokens to cover my 1099 work. If I had to pay $1000 for those same tokens each month, I would still be massively positive on my margins. I bill customers based upon # of completed features & bugs, not total time spent at the computer. Tokens would have to more than 10x in cost before I would start to have a problem on my end.
So one possible future is that frontier-level training becomes so expensive and the use cases so sparse that it simply isn’t viable to keep going bigger.
Google seems to pretty regularly post about how their TPU and algorithm advancements have been decreasing energy costs for both inference and training.
That seems like a large number, until you realize that OpenAI claims to have almost a billion weekly users. And OpenRouter shows many models at over a trillion tokens per week.
So in pure token terms, I'd say it is in fact extremely plausible that inference dominates, at least for the popular models.
Unless to the grandparent commenter’s point they’re using it to obscure their large prisoner’s dilemma (training) cost?
I skimmed the article, but couldn’t spot any details on their estimates. They mention 70b+ params as being large in several places. But we’ve had several 100b+ param models that trail Sonnet.
A given model is trained once but applied N times. A large enough N will dominate training, no matter how complex and costly it was.
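A rough sketch of where that N sits in pure compute terms. It assumes the common rule of thumb that training costs about 6 FLOPs per parameter per training token (forward plus backward pass) and inference about 2 FLOPs per parameter per generated token (forward only); the parameter count cancels, so break-even is about 3x the training tokens. The 15 trillion training-token figure is a hypothetical placeholder, and the one trillion tokens per week is the OpenRouter-scale figure mentioned in this thread. This ignores RL stages, failed training runs, and hardware differences between training and serving.

```python
TRAIN_FLOPS_PER_PARAM_TOKEN = 6.0  # rule-of-thumb assumption: forward + backward
INFER_FLOPS_PER_PARAM_TOKEN = 2.0  # rule-of-thumb assumption: forward only


def breakeven_inference_tokens(training_tokens: float) -> float:
    """Inference tokens at which cumulative inference compute equals training compute.

    The parameter count appears on both sides and cancels out.
    """
    return training_tokens * TRAIN_FLOPS_PER_PARAM_TOKEN / INFER_FLOPS_PER_PARAM_TOKEN


def weeks_to_breakeven(training_tokens: float, inference_tokens_per_week: float) -> float:
    return breakeven_inference_tokens(training_tokens) / inference_tokens_per_week


TRAINING_TOKENS = 15e12            # hypothetical: tokens seen during training
INFERENCE_TOKENS_PER_WEEK = 1e12   # "over a trillion tokens per week" from the thread

weeks = weeks_to_breakeven(TRAINING_TOKENS, INFERENCE_TOKENS_PER_WEEK)
print(f"Inference compute overtakes training compute after about {weeks:.0f} weeks")
```

Under these assumptions the crossover takes about 45 weeks of serving at that volume, so whether inference dominates depends heavily on how long a model stays in service before it is replaced.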
But how long is a model useful for? How often will labs need to train new models? Time will tell.
But what if your competitors sell their knowledge to AI companies?
Then you're still screwed.
Roughly equivalent to 4x H200's for less than half the price.
Vaguely around 60k tokens per second...
My friends in day care tell me the kids hate "movie day" because movies are all they get at home and they are sick of them - they want to play all day. (but I'm not sure if this is representative of anything other than the types of people who put their kids in that particular daycare)
Second to this are countless other areas that have a major impact on the company's bottom line that are entirely engineering driven, especially at Google, given they are a cloud provider and have meaningfully grown the Workspace business and launched Waymo in this time.
I would agree but it's really minimized the building. More and more time is being spent on pre-coding work.
To follow on from that comment, if the growth in breadth of capacity of AI leads to a decrease in the risk of running a smaller business, which I don't think is an unreasonable prediction, then it's not inevitable people do lose their jobs. Employers get smaller, higher-leverage, and more plentiful.
And part of my reasoning for this is: the only system capable of actually fixing bugs in vibe-created code is an LLM. If we humans couldn't write it without assistance, we certainly won't be able to debug it without assistance. So there's a real stickiness here.
We're signing pacts with demons - we have to, if we want to outcompete the other warlocks - and those pacts are written in the very size of our codebases.
And this is why many companies go out of business. You always want the best bang for your buck, sometimes this is the "best model" and sometimes it is not.
Not sure about other domains though.
We have no market convergence on tokens yet (and it'll differ between LLMs), so it's impossible to say what value you got for your $200.
Does that mean you'll be saving $99k?
It sounds an awful lot like the mark-up to mark-down scheme where the price stays the same.
The point being made above is that API pricing is calculated... somehow... seemingly arbitrarily. Possibly untethered from the infrastructure costs entirely, which would otherwise be the basis of any 'value'; however, that assumes the labor theory of value, which isn't accurate either. So how do you accurately price these tokens at all (other than through price discovery, which is slow, messy and fuzzy)?
Maybe if you spend $2000 on a BigMac. But it’s unlikely you would buy such a burger.
What is a hamburger worth? Don’t look to McDonalds to set the value.
As with pretty much anything priced on volume/usage.
Enterprise deals are negotiated ad-hoc, the listed pricing is simply a jumping off point for the final negotiated discount.
If you’re going to give 20,000 employees Claude code you are not going to be spending $1B per year on Anthropic tokens as if you gave everyone an individual API key. Just as Anthropic isn’t paying AWS SES $10,000,000 to send 1 email update to their massive user base when the next Claude version drops.
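For reference, a sketch of what that hypothetical list-price bill would imply per seat. Both inputs are the illustrative figures from the comment above, not actual contract numbers.

```python
EMPLOYEES = 20_000
HYPOTHETICAL_ANNUAL_BILL_USD = 1_000_000_000  # the "$1B per year" figure above

per_seat_per_year = HYPOTHETICAL_ANNUAL_BILL_USD / EMPLOYEES
per_seat_per_month = per_seat_per_year / 12

print(f"${per_seat_per_year:,.0f} per employee per year")
print(f"${per_seat_per_month:,.0f} per employee per month")
```

That works out to $50,000 per employee per year, a little over $4,000 a month, which is the kind of number that gets negotiated down in a volume deal.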
edit: I missed the "enterprise" feature matrix with the usual audit/compliance stuff to force the biggest enterprise customers onto enterprise plans. Otherwise the "teams" plan is much better value for any business.
orig-continued:
https://claude.com/pricing/team
Teams premium is "Everything in standard, plus more usage*"
And from my experience, it's a very generous usage, I've only hit the limits once or twice, and both times required multi-boxing agents.
I could single-window agentic development all day on opus-4.7 auto-mode without hitting limits.
If you're a business using Claude, then that seems like the right plan; the enterprise/API plan seems more suited to where your product is built on top of the agents themselves, so seats/limits aren't really meaningful?
It’s quite an elaborate swindle obviously. But you generate hype by underselling your core product, and you claim way more usability than there is. Users will experience usability initially. Everything multiplies with each other and then you put it on the market. Everybody involved makes money and you’ve successfully extracted money from everyone who’s invested in NASDAQ index funds at the very least.
> Dario and Sam are saying “if you buy our coding agent subscription you can build a game with zero skill and one shot and then be rich”?
That’s Anthropic’s marketing, yes.
Also, their offering is not unique enough to justify a $1 trillion valuation. The first companies are already rowing back. It’s a very specific time window that they are about to hit now with their IPOs.
The companies that have signed these enterprise deals haven’t done an ROI analysis. They had FOMO.
They are saying profitable companies should replace the engineers that built their systems with a subscription (while they are hiring).
Maybe irrelevant to your point, but I'd argue they were really good already in May if one used the right workflow (planning etc.). They've become better, but they're not saving me significantly more time now than they did 12 months ago.
Eventually either the supply will go up or companies will start buying fewer overpriced GPUs.
Either way, the price per token will come down as hardware improves and supply and demand reach equilibrium.
Supply will eventually catch up with demand. Then the prices will come back down.
I guess we are welcoming the software people to the world of expensive tools. Just sad that the FOSS alternatives to these tools are not as powerful, whereas the software industry still has FOSS tools to fall back on.
[0] https://winchdesign.com/ [1] https://www.superyachts.com/directory/1516/winch-design/flee... [2] https://www.autodesk.com/design-make/articles/naval-architec...
I might agree "AutoCAD" is the level LLMs are currently at, but wait until your design department discovers "Revit"; it's another ballpark (in wasted costs; engineers on site still get "clashes").
Revit costs are high, and the end results are marginally better - but local LLM tokens are cheaper 24/7 at "AutoCAD" level - "Revit" level tokens will make Uber's CTO/COO weep harder than they already do, while producing results no better than "Revit" does (engineers still face "clashes").
Open source software changed the world. AI that will cheaply write whatever you want in a few days will also change the world.
(I have a feeling if I could say "and I closed $2m in sales with the software I wrote!" people would find a way to say that didn't mean anything anyway, because how can I prove I wouldn't have made those sales writing it by hand?)
Personally I see no difference between China and America in terms of risks of them embedding "backdoors" so to speak, but I disagree when people claim that open-weight models are obviously safe just because they can be run locally.
Sure, you can self host a non-frontier OSS model yourself; including Deepseek. And no doubt some people will pay one of the companies I mentioned to rent the infrastructure to do exactly that. Much of the rest of the world will be paying directly for direct access to the frontier models.
As for the legal/compliance stuff, I recommend you don't take any big decisions on that front without consulting lawyers. My understanding of that is that most serious companies in the EU have to take these topics pretty seriously. I'm sure in the US, hosting all your data and secrets in Chinese data centers isn't a whole lot less controversial.
The Chinese could of course choose to try to match the current levels of investment Google, OpenAI, Anthropic, etc. are putting into local infrastructure. But as far as I know they aren't and there are probably a few political blockers for that.
Without infrastructure, their role is being a niche player in these markets. It doesn't really matter how good they are if they can't scale to most of the market.
1. Both Anthropic and OpenAI significantly increased the prices of their latest models. They're clearly not trying to offer the lowest-price-possible to drum up demand any more.
2. Both Anthropic and OpenAI no longer let enterprise companies buy discounted almost-all-you-can-eat subscriptions. Those big enterprises are now paying full API prices.
3. According to reasonably well-sourced leaks, Anthropic may be about to have their first profitable quarter.
And I didn't even say "profitable", I said "credible business model". I think getting companies to spend hundreds of dollars per month per seat, WITHOUT crazy subscription discounts, is a credible business model.
Dario telling Dwarkesh three months ago that they have a margin on inference: https://www.dwarkesh.com/p/dario-amodei-2?timestamp=3528.0
Oh, now that their IPOs are nigh they're changing their tunes (https://archive.md/s9EO3) but to me that looks more like they've decided to let $$$ prospects override what they really think.
What are your directives?
Also, inference costs are bound to go way down with more optimized architectures. GPUs are fundamentally not great at inference. No platform where the weights are streamed from a large pool of memory is. If the models ever quiet down, there will be massive step changes in cost/token, energy/token and tokens/second, as models are etched into silicon ala https://chatjimmy.ai/
We are still chasing the best because the best is moving rapidly, but it’s a simple thought experiment to work out what the cost to serve an 8B model from 2 years ago is in a world of 2T models.
Note: parameter counts are illustrative. Concretely, qwen3.6 27B delivers opus 4.5 capability at 1/27th the cost on openrouter. Single chip llama3 8b performance can exceed 17k tokens/sec.
Speaking to your point, inference being dramatically less costly than training would not be seen as a delta from the norm. The model of providing inference for anything near the operational costs (like a utility would) would be the delta from the norm if it were true.
Why are they getting out of date? Is it because we have new content from the internet that the older models did not have? Or are we simply trying to increase the size of the training data? In other words, is it not about being more up-to-date in terms of when the content was created, but about wanting to use bigger training input sets?
They know they do not and that’s why they’re all trying to IPO right now, so they can pass the bag to consumer investors
Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.
RAM is expensive, but not THAT expensive. I just bought 128GB for about $5k for our build cluster (it's not even for AI, sigh). Even if you need larger-sized DIMM sticks, it's still going to be in the vicinity of ~15k tops.
Surely we could just put better stuff on the radio, and accomplish most of the same goals for a far lower price?
My mental model for that is that outsourcing fails where the work is being done organisationally far from the knowledge needed to do it. We know that's true of teams inside organisations, there's been a lot of research on how distance in the organisational tree negatively impacts productivity. Outsourcing is a pathological worst-case of that.
The promise (promise! We're not there yet!) of AI is that I can have a cross-functional team on my laptop. Organisational distance is zero. Where previously the outsourced team has to wait for the time zones to roll round so I can answer their blocking question when I get to my email STRICTLY AFTER I have had my coffee, now it's a prompt in a chat window with a button I can click to make a choice in 5 seconds. Delay is gone, cost of delay is gone.
> The problems that will come up will be and always have been ongoing maintenance. AI is great at writing new code without a brain behind it, but once you get to the point where you need to refactor code, you start really needing someone with coding experience to guide the AI or veto its mistakes.
Oh, absolutely. That's a minefield. Today. It will be, right up until it isn't. There are ways to set up agents and projects right now that make a dramatic difference to how this part of the picture plays out, but those will sink into the harnesses as time goes on.
But also the big problem with maintenance and outsourced teams tends to be the commercial structure around the contract. You get a Build team, who Build the Thing and then: no more features for you, anything you want to add past the original spec costs extra. They hand over to the Run And Maintain team, who get to fix all the bugs that the Build team left but without the knowledge gained from building the thing, but are scaled and located to be absolutely as cheap as the supplier can get away with so probably don't have the skill, inclination, motivation, or permission to take on any restructuring to make the bug fixing easier and they're on the wrong end of the globe so there's a 24-hour latency on any queries. It's a terrible way to set teams up, but it looks good on paper.
Again, that's peculiar to outsourcing and completely goes away if I have the same team that built the thing own the thing long-term. That's true if it's humans or AI!
> I don't think that's really fixable even with a lot better AI. It's not something that ultimately comes out of the likes of github data.
No, it's a harness problem. You need to start from a maintainable point and keep standards in place. It'll take work to get the harnesses there and it's not ubiquitous. You might also need better models, but I've already personally seen big differences in outcomes between projects that took certain steps and others that didn't; it's nothing revolutionary, mostly stuff that works for humans also works for AIs but you need to know to ask for it.
> I'm not saying that AI isn't going to make things better, btw, I just don't think we'll see a 20x improvement. Probably more like 1.5 or 2x.
I think people radically underestimate the cost of delay. I don't know if 20x is realistic for the AI itself, but I think it's not impossible once the inefficiencies of having to go to other humans are factored in.
The determinant of success was only whether the task needed American-tier labor or could make do with sub-American quality labor.
I don't think there is any shortage of great ideas at these companies, they are just extremely bloated. And I don't think it's something like indecision or bad PMs, it's "we have a finite amount of time and resources so we need to be conservative but also not too conservative".
If you have AI systems that can simply build out POCs in days, backtest on real data, show reliable results and numbers, you get a suite of product options you were never able to get before. If you have coding agents that can speed up implementation, you can build more stuff and choose the things that stick.
It changes the cost/benefit calculus of the entire business. I think you are exactly right in that: PMs/leadership are by their nature orchestration machines. Other roles are as well, but I think PMs are at a particular advantage here: I would expect it to be quite a while before core product decisions and creativity can be delegated to an AI, but not nearly as long until virtually everything they're blocked on (legal approvals, POCs, wireframes, etc.) becomes less and less of a blocker.
Yeah, if this stuff actually worked that well already, OpenAI et al. would just run AI CEOs and engineers. Why get some other company to pay you at all when you can automate every other company out of existence and take all the money they make?
The fact of the matter is that while the tech has some uses, it sure as hell isn't a full scale replacement and you almost always actually have to massage the input into LLMs to get anything decent back out in practice. Some CEOs and managers can learn to do this, of course, and some already are... but that quickly turns into a second full time job. A "programmer" is still needed. The job might change from mostly hand-writing C++/JS/Python to prompt engineering + some manual coding to fix all the stupid fuck-ups that the bots can't solve themselves, but you still need someone to actually prompt the bot.
When that changes, it won't just be engineers losing work; there will be no reason to even have a human CEO any more.
Kubernetes was 11 years ago, and is huge enough to be included there. The Google Pixel was just under 10 years ago. So... not nothing haha
If they can crack that latter review/spec-check/assurance step, checking that what was built was what was demanded of the problem such that we don't have humans in the loop at that step either, then the bottleneck moves again. Then I think it moves to requirements capture and to product development, but that might depend on the industry.
The problem is they get killed by some other executive who is afraid of their department looking bad by comparison.
I think this is fairly illustrative of the challenges in AI becoming as impactful as the Internet. The bottleneck is not making things. There are plenty of people who are really good at making things and can easily be 10x or 100x as productive as the average corporate worker. YCombinator was founded on that premise - small teams of founders and early employees could be orders of magnitude more productive than the 1000s of corporate employees at their competitors.
The bottleneck is on bringing your product to market. If your innovative new product is built within a corporate environment, it'll get killed unless the executive you work under can get a promotion out of it, and you'll be denied all sorts of help with approvals, launch process, PR, marketing, branding, etc. If it's a startup, they'll try to shut you out with exclusive distribution deals, legal threats, lobbying efforts to change the legal environment, PR campaigns, FUD, etc.
The Internet was revolutionary because it let millions of people bring products to market without asking permission. Instead of having to bid for retail shelf space among dozens of entrenched competitors that all had sweetheart deals with the retailer, you could just put up a website and sell it to anyone across the globe. Instead of following hundreds of regulations that governed existing commerce, you could just launch something and sort it out later. AI doesn't really have that property - if anything, it makes things more centralized, with more gatekeepers, and so seems more likely to destroy economic value than add to it.
You'll find that most internal "innovation" teams are just lip service. In most cases, the "mothership" will be incapable of reproducing true innovation -- from a statistical perspective, culture perspective (mega corps are anti-scrappy; internal politics), and motivation perspective (startups aren't 9-to-5). It's much easier to have big M&A budgets, a VC arm, and some hand-wavy internal innovation group.
Every now and again, you'll get real innovations (Waymo, transistors, GUIs), but even those have a spotty track record of commercialization when created internally.
It takes a skilled knowledge worker to use these things.
Somewhat oversimplifying; writing software and building apps was a bottleneck - now it is not. What is the next bottleneck that LLMs can solve? Is there one? And is there enough publicly available data to solve it repeatably at scale? Or did we just automate stack overflow searches and now we’re stuck again?
Or is the endgame of this innovation cycle the complete removal of interaction with machines through code? Will we simply interact with machine coworkers purely through natural language? Can an LLM make PowerPoint slides and run a meeting? So far not seeing much progress on that.
I'm not sure if this runs counter to your point or not, but: I don't see any future where LLMs aren't a core part of Software Engineering. The horse is out of the barn. There is no going back.
people -> programmers. I haven’t met a non-developer who reports getting more time out of current AI platforms than they put in. If anything I’ve anecdotally heard the opposite: introducing AI at work creates so much slop (output) that it takes more time to process it all, without a tangible bump in overall productivity.
52 on AI misuse: https://simonwillison.net/tags/ai-misuse/
149 on the unsolved challenge of prompt injection: https://simonwillison.net/tags/prompt-injection/
40 on slop: https://simonwillison.net/tags/slop/
If you want an "LLM evangelism blog that rarely, if ever, has any critical analysis that isn’t pro-industry" there are plenty out there. I'm not one of them.
I highly doubt I'll ever use Claude again.
I think you are wrong about Claude being any significant level better
Once the model gets good enough, the returns on bigger models diminish quickly. I don't want to spend 10x the money and wait 5x the time to get answers that are equivalent.
This is transparently false, because the best "model" is still competent human developers. They're just more expensive. If you're willing to use current LLMs at all, it means you're willing to sacrifice quality for a better price, and your disagreement with the comment you were replying to is entirely about what the optimum tradeoff is.
Currently, the difference is substantial, but what happens if capabilities saturate?
At work I mostly use Claude Code and a bit of Codex; personal projects are OpenCode and honestly I prefer it.
For me, it feels like widely available open models have recently crossed that same canyon. Are they as good as e.g. late-model Claude Opus? I don't think so. But they have absolutely gotten past the point where they are beneficial. This means that, for me, they are about six months behind.
Like how snapchat kind of fell off because the feature could just be a subset of instagram
It seems like it would just become a commodity like EC2
In my latest experiment I used Opus for the implementation plan, then used Cursor Composer 2.5 for execution.
I must say that combo is really good. The main drawback of Claude Code is that it is super slow. So when paired with Composer, which is super fast, it flies.
Wouldn't they be bragging about it to investors? It feels like something that would matter a lot to them, and at least OpenAI kinda feels desperate to find them.
There's also the small question about whether a drop in inference cost would actually change anything about profitability, when training seems to get exponentially more expensive.
They just have to be useful enough that companies don't need the best.
They are.
And a 5% worse model for 10% of the price of the bleeding edge will be worth it for the majority of people.
1- SpaceX + Tesla + xAI merger / IPO while Musk was vocal against IPO for about a decade
2- Warren Buffett cash at record highs
Someone's got to be exit liquidity
I haven't had problems w/ Dell support and 3rd party memory, personally, but given the machines' application I understood the concern.
If we still were in the ZIRP era, busting the bubble would certainly kill off the world's economy for good simply due to its size.
There's not that much depth in a lot of 'everyday' writing. For many tasks that means that you don't need to be hyperintelligent - reading a recipe or a shopping list, reading a newspaper article, etc.
Then there’s NTS, BBC… You can listen to them through online services, but at least in Europe there are amazing national FM broadcasting services.
TV is just bad radio with flickering lights.
Obviously that includes whatever needs to be done to hoover in data from their marks and Meta also does the same thing without fail and both are really good at it. But outside their remit not so much.
I don't.
> What makes you think the people who used to build (or would have built) software will switch into the industry of "knowing that the thing was the right thing to build", as opposed to something cooler like surgery, city planning or experimental physics?
Because it's probably already part of the job. It's a change of emphasis, not a change of career. Your boss can already ask you to do it. If you're producing code, you're probably also reviewing code, checking it matches the acceptance criteria, testing it, sanity checking that it was the right code to have been written, today.
Investor confidence. They have a bit of a need for cash (also an interesting part of the profitability discussion of course).
> Also, inference costs are bound to go way down with more optimized architectures
I agree. Jimmy is incredible, I wonder what non-toy use cases they have. Surely they’ll come out with updated chips soon.
That said, I was apparently a bit over-excited for Groq and Cerebras. I thought they’d quickly dethrone Nvidia for inference, but not so far. Even the GPT spark trial isn’t seeming to go far.
I'm most curious about this sentence. What have you noticed about the similarities? I'm getting really good at asking for confidence levels, tests and pushing back, but I'm curious what you found
End result is that many outsourcing firms are borderline fraudulent in the way they treat their customers.
Sure, it might start to slow down, but even then we will likely see a doubling in the next 10-15 years.
https://substackcdn.com/image/fetch/$s_!_ZW2!,f_auto,q_auto:...
In other words, most of the prompting will also go away.
And I don’t even necessarily disagree with OP! It’s more like the competition is shifting so quickly that your competitors could undercut your PMF in a blink of an eye.
Not saying this trend will do the same, just that the industry adopting something doesn't guarantee its success.
If I make an argument and you disagree that's fine with me, provided I didn't use misinformation or sloppy thinking in making that argument.
Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products.
Unless of course there was an actual speed difference: the only reason I'd be willing to go with a model a couple of percent worse than the current best is if the speed was at least 5x higher. Looking forward to Kimi K2.6 offered publicly by Cerebras.
None of them are quite opus, but they are damned close and a no brainer if you care at all about cost.
But there have been very good open source office apps for decades and few enterprises use them, so perhaps this is just the nature of B2B purchasing committees and 'nobody getting fired for buying IBM.'
Simon is saying that companies are (today) willing to pay API prices for tokens which is as good as any determination of value.
You seem to be suggesting the price of tokens is entirely disconnected from the cost of providing the service? I don't see much basis for that assumption.
your point is large players won't pay those prices at massive volume. ok
Going to be interesting to determine the metrics we give to engineers for deciding whether the spend on this is worth it. Measuring PRs, lines of code committed, commits fully generated by agentic workflows, etc.
Do you have any numbers or reports to back that up?
How much do you think emails cost? That number is just so far off?
But besides that, running SES is also quite a bit cheaper than SOTA AI models with high demand and (comparatively) no competition. And quite a bit more pressure to make money (soon).
Like anything else in the economy: at the point where enough customers can pay you, and not enough will go to the cheaper competition.
Or are you saying that Anthropic is determining that cost per million arbitrarily?
If so, it'd be like asking to explain why things fall on the ground other than through gravity. Companies pick the price they think the market will bear, and adjust based on new information.
I don't believe you have the option to keep with the $200/month flat rate subscriptions any more. I'd be happy to be convinced otherwise.
(I dug into this a bit more and couldn't find anything in their consumer terms that says "you cannot use this personal account if your company has more than X people", so I imagine the pressure is more that your big company's purchasing department really doesn't like managing hundreds of individual subscriptions as opposed to a single, stable, predictable negotiated contract.)
Isn’t this a contradiction?
> Everything multiplies with each other and then you put it on the market. Everybody involved makes money and you’ve succesfully extracted money from everyone who’s invested in NASDAQ index funds at the very least.
Sorry I may have totally missed what you’re saying here. Anyone in S&P has already made a lot of money thanks to effects from these companies. No one has to invest in an index fund. Markets have risk…
> That’s Anthropics marketing, yes.
Show me.
> Also their offering is not uniqe that justifies a 1 trillion valuation. The first companies are already rowing back.
That there is competition doesn’t imply they aren’t worth 1 trillion.
> The companies that have signed these enterprise deals haven’t done a ROI analysis. They had Fomo.
Also…wrong. I have seen the data at my company, everyone at scale tracks this.
The only times when people talk about actual full replacement of people is always when they are talking about some “future AGI” that is far more capable than the tools we have today.
Those tools are used in ways that make them integral to processes. They have their equivalents of ticket systems that are linked to code repositories with LFSs and a bunch of IDE-type tools and automated and manual test systems and build systems. Their equivalents of PR discussions and Selenium screenshots need to check all the boxes in the right ways for legal and traceability purposes.
Without all that it might be $175/user/month, but you're not shipping apps with just vi and bare gcc.
And that is likely a fair assessment, though I understand perfectly the feeling that you have that you are accomplishing great(er) things thanks to AI.
Perhaps, but that's also a good way to lose users+reputation as there's no way to control when said malware is generated. Once the first instance is discovered cybersec researchers will have a field day reproducing it and showing the world.
Given the audience you are reaching, that is actually the expectation. GitHub stars are not a great metric.
might as well be because of this:
> [...] with capacity ramping in May and June 2026 at a reduced fee.
And then, we have
> The agreements may be terminated by either party upon 90 days’ notice.
The timing of the "leak" of profitability is just as superb for Anthropic as the timing of the $45b deal is for SpaceX.
I wonder what this partnership looks like on August 1st...
Enterprises will take a while to adjust. We're going along with Copilot price changes in the short term "just to see what it'd look like". They will get customers to continue with the all you can eat mindset at API pricing for a while, but the finance departments will start asking questions very soon.
Simple test: can they get their hands on data center contracts and financials?
Search isn’t anywhere near as high profile as the profitability of AI inference, and yet, even aspects of the Search org were walled off from the rest of the company such that other employees couldn’t see what we see.
There was a crazy clip of Eric from Google telling a crowd of university students that in the future AI will do everything, and after the whole audience boos him he keeps pushing the point that they better accept it and get on board. The mentality these guys have is sickening. They have no humility and no humanity.
You have to keep in mind that about 99% of their announcements are targeted towards investors (their most important revenue source..), so they're not going to be afraid to mention metrics that make the business look better.
Is this still happening? Opus 4.5 was six months ago, can you get its capabilities for 1/10 cost now? Are we on track to get the same for 4.6 in a couple months?
Training is also done over batches, which increases memory requirements by several orders of magnitude. This is why training needs costly compute.
One of the ways out of this unfortunate situation is to use something like Stochastic Average Gradient Descent [1]. Examples there are mostly concerned with regularized logistic regression, which makes the problem more or less convex. Neural networks are inherently non-convex. Still, maybe some ideas from there can be utilized in the context of neural networks, like the use of an estimated Lipschitz constant to derive curvature and an appropriate learning step.
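For readers who haven't seen the method, here is a minimal sketch of stochastic average gradient on L2-regularized logistic regression, the convex case the comment mentions. It is an illustration under stated assumptions, not code from the linked lecture: the function name `sag_logistic`, the synthetic data, and the default `lam` are hypothetical, and it uses a closed-form upper bound on the Lipschitz constant (rather than an estimated one) with a step of 1/L.

```python
import numpy as np


def sag_logistic(X, y, lam=0.1, epochs=50, seed=0):
    """Basic SAG for L2-regularized logistic regression, labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grad_memory = np.zeros((n, d))  # last gradient computed for each example
    grad_sum = np.zeros(d)          # running sum of the stored gradients

    # Assumption: closed-form bound on the per-example gradient's Lipschitz
    # constant for the logistic loss, plus the regularizer's contribution.
    lipschitz = 0.25 * np.max(np.sum(X ** 2, axis=1)) + lam
    step = 1.0 / lipschitz

    for _ in range(epochs * n):
        i = rng.integers(n)
        margin = y[i] * (X[i] @ w)
        # Numerically stable form of 1 / (1 + exp(margin)).
        sigma = 0.5 * (1.0 - np.tanh(0.5 * margin))
        g = -y[i] * sigma * X[i] + lam * w
        # Swap example i's old gradient for the new one in the running sum.
        grad_sum += g - grad_memory[i]
        grad_memory[i] = g
        # Step along the average of the stored (partly stale) gradients.
        w -= step * grad_sum / n
    return w


# Hypothetical synthetic data, linearly separable by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([2.0, -1.0, 0.5]))
w = sag_logistic(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Each step touches one example but moves along an average over all of them, at the price of storing one gradient per example. That memory cost, and the loss of the convexity the step size relies on, are two obstacles to carrying it over to neural networks.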
[1] https://www.cs.ubc.ca/~schmidtm/Courses/540-W19/L12.pdf
That's doing a lot of work here.
The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs API pricing.
The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That is actually 6 months ago. And DeepSeek 4 is already there.
> run locally
You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation -- and even if you aren't self-hosting, it doesn't matter, because you can just get your inference from one of the many companies serving DeepSeek, who trivially undercut the pricing of OpenAI/Anthropic because they didn't have to spend hundreds of billions on training frontier from scratch but instead only invest in supporting inference, which is already profitable.
[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32b parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and YouTube videos making the idiotic claim that they're running it on a 5090.
We're 3.5 years into this current AI wave, and a lot of the valuations have been predicated on what you're arguing here -- that essentially should one of the labs make an order-of-magnitude improvement or hit escape velocity on recursive self-improvement they'd become the most powerful economic chokepoint in history.
The reality has been that given access to compute + capital all of the labs can stay pretty competitive with each other. Someone does a bit better on coding, someone else does a bit better on tool calling, and then they swap after each spending another $100bn.
The market looks like a commodity market where the commodity is intelligence, not a winner-take-all market with massive margins. Plenty of people get rich in oil and airlines, but they notably don't tend to be the innovators long term, they tend to be the operators. Obviously if the machines become sentient tomorrow, turn on their masters, and hit world-dominating intelligence, that assessment changes, but after several years of that narrative while objective reality looks quite different I think the more sober voices are starting to gain a foothold.
My single spark has me running Qwen 3.6 27B and antirez’s specially quantised DeepSeek v4 Flash (which is shockingly impressive)
None of them had the Pirates game.
I was thinking how the transistor radio was a far superior experience for this use case. Just tune to the channel broadcasting the game.
I think the comment put forward that as an incorrect assumption that was made prior to the cable build-out.
That part of dev work, the requirements gathering, attention to details, clarifying requirements, is something AI also struggles with. A lot of companies basically waste time and money on outsourced devs because without a clear path forward they effectively will sit and do nothing, waiting for a prompt.
You still want someone whose ass is on the line if they get it wrong.
I'll also add this: within a large organization, you often need to interact with many different codebases owned by many different teams. Agents have made this much easier to wrangle: you can deploy one to scope out your web of dependencies, learn what would be needed for feature X, and work out how that integration can happen.
We've been doing far more away team work simply because it makes things move faster. It's easier to convince a team to sign off/review something than it is to get them to commit to the planning and eventual work.
It genuinely is helping things move faster inside large organizations. Or at least, it is for us, particularly since we're getting organizational prioritization to actually build the scaffolding to make those agents more effective at search.
The human race isn’t ready for that world IMHO. The only reason there is a middle class is because people have leverage in the form of their labor. When that becomes worthless … the people who own stuff and make their living from doing so won’t hesitate to get rid of everyone else - who are now worthless to them.
> YCombinator was founded on that premise - small teams of founders and early employees could be orders of magnitudes more productive than the 1000s of corporate employees at their competitors.
I think this is still true, but the theory is:
1. You don't need YC-type funding to do YC-type business any more; 2. You don't need to scale the business past those small teams any more, you just buy more tokens.
For clarity YC still obviously has a place as an incubator, mentoring, and networking function. I just think that what was previously the inevitable conclusion that you have to hire all the people the second you hit PMF to keep up with scaling the business as you scale sales is no longer inevitable. If you didn't want to go that way before AI, you were a "lifestyle business" and not worth investing in. As more and more knowledge functions get capably implemented by AI, it's the preferred position: humans are vastly more expensive than tokens, so you want them doing the stuff the AI still can't do.
I don't think this necessarily translates to mass unemployment. I think it translates to masses of smaller businesses that are radically more efficient because the handoffs between business functions are tool calls, not emails to someone who doesn't want to help.
> The Internet was revolutionary because it let millions of people bring products to market without asking permission.
Think about it this way: if I am a small business owner but I think it makes sense to do something that previously only a team in a corporate environment could do but is now within the reach of AI, not only can I do it now, but I also don't have to ask anyone for permission! Who wins between the corporation and the small business in that scenario?
> AI doesn't really have that property - if anything, it makes things more centralized, with more gatekeepers, and so seems more likely to destroy economic value than add to it.
I think this will turn out to be backwards. I can see a version of this where the number of things you can do without needing to turn to a gatekeeper for help increases to the extent that the balance completely inverts.
The vast majority of businesses are small, and AI can give them tools which previously required corporate scale to make sense, without the inefficient hand-offs between busy, political humans. Which is also something that the internet did! Getting an advert in front of a national market pre-internet was Hard but sometimes you had to do it because your target market was "all Canadians who buy toothpaste" or whatever and that meant saturation-bombing the physical environment with physical billboard ads, posters, flyers, and so on. So you only did it if you were P&G-scale. Now you, personally, can do it, trivially, for better or worse.
They do not care unless these companies can get a bailout.
UBI only exists for companies that are too big to fail. Case in point, 2008 and SVB when there was too much money on the line.
One of the AI companies attempted to guarantee themselves a way for the government to bail them out if they were close to defaulting on the debt from the data center build out.
Coding is an interesting case as [1] the pace of progress has been absurd and [2] it's hard to put an upper bound on required capability. However, "hard to put a bound on" and "will keep demanding more" are different things: it's quite possible that the average engineer will cease to see the benefit of rapid progress, or that their employer will be satisfied with lower-tier models.
How smart of a model do you need to build a high quality CRUD app for internal users? Or build a scalable web service?
https://openrouter.ai/moonshotai/kimi-k2.6
The march of cost efficiency moves on.
That's the future Amazon sees too. We just had a week long session with the AWS team and they pushed that to us multiple times.
Claude Code was a lot of people's introduction to using coding agents that could do a lot more than copy-pasting from a chatbot or autocomplete.
Running software in the cloud gives you certain reliability and scaling advantages that would be very hard to replicate locally. Running some code agents in the cloud vs local hardware, if the local hardware gets "good enough," breaks the other way - offline usage, alone, would be hugely valuable to many people and companies.
It'd be very interesting to see where various players would decide to make a call "local is good enough" though. Buying the hardware isn't a small bet, if it's not something that ends up as part of your standard computer.
I'd qualify that by writing that you can't run it with ordinary, real-time speed and throughput. If all you care about is slow and high-latency inference, there's no reason why that shouldn't be feasible even on the cheapest miniPC around, as long as it can literally store the model weights and keep around the (rather small) context.
With a nice UI on top, for the desktop app too: [2]
[1]: https://developers.openai.com/codex/config-advanced#custom-m...
How I read your argument is that one distinguished engineer from the US could do the same with the use of AI.
I worked with both and I know great and bad engineers from both sides. The only thing is that the US has a bigger pool of great engineers.
1000x yes: you have touched on what I think is the single biggest factor here, which is the humongous value of POCs. They are gnarly to build without agents, so we used to have to get everyone on board so we didn't get screwed in performance reviews. That was a monumental task, because it meant convincing very busy PMs who have a lot on their plate and don't want to take risks on things they don't understand. Now it's like "can we scale this out", and you have a very nicely formatted proposal and POC. It de-risks things very quickly.
27th May 2026
Anthropic are strongly rumored to be about to have their first profitable quarter. Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit.
I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI. If you are a heavy user of coding agents these plans are a fantastic deal. I just ran the ccusage tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got:
That’s $2,180.16 worth of tokens for $200—not bad at all! I’m a moderately heavy user of these tools, but I’m certainly not running agents every hour of the day and night.
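As a quick check on that ratio, here is a minimal sketch using only the two figures quoted above. The variable names are illustrative, and it assumes the ccusage estimate is an accurate stand-in for list-price API spend.

```python
# Figures quoted in the post: estimated API value of 30 days of usage,
# and the combined price of the two $100/month subscriptions.
api_value_usd = 2180.16
subscription_cost_usd = 100 + 100

multiplier = api_value_usd / subscription_cost_usd
discount = 1 - subscription_cost_usd / api_value_usd

print(f"{multiplier:.1f}x the subscription price")          # 10.9x
print(f"{discount:.1%} effective discount vs API pricing")  # 90.8%
```

Put another way, paying API prices for the same usage would cost roughly eleven times as much.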
I had assumed that companies making extensive use of agents were getting similar discounts. It turns out I could not have been more wrong about that.
I haven’t been able to track down the exact date, but at some point in the last six months Anthropic switched their Enterprise plan (originally “Claude seats include enough usage for a typical workday” back in August 2025) to $20/seat/month plus API pricing for usage. This story about the change from The Information is dated Apr 14, 2026, but cites an Anthropic spokesperson claiming that the pricing change occurred in November 2025. Existing customers are finding out about the change as they renew their contracts.
OpenAI made a similar pricing change in April. The Codex rate card (Internet Archive copy) currently says:
Note: On April 2, 2026, we updated Codex pricing to align with API token usage, instead of per-message pricing. This change was applicable to new and existing Plus, Pro, ChatGPT Business and new ChatGPT Enterprise plans.
On April 23, 2026, we made this update for all existing ChatGPT Enterprise plans as well, inclusive of Edu, Health, Gov, and ChatGPT for Teachers.
It’s a little harder to decode as they quote prices in “credits”, but as far as I can tell those credit costs are an exact match for the API token costs listed for those models.
All of which is to say that as of April 2026 the “Enterprise” cost for both OpenAI Codex and Anthropic Claude Code/Cowork is the same as the listed API price.
GPT-5.5 (released April 23rd) is 2x the API price of GPT-5.4. Opus 4.7 (April 16th) is around 1.4x the price of Opus 4.6 when you take their new tokenizer into account.
So April saw both leading model companies release new frontier models with a higher API price, and both companies now have measures to lock their enterprise customers (who tend to sign year-long deals) at those API prices, not the previous extreme discounts.
Why these sudden aggressive moves on pricing? Both Anthropic and OpenAI are planning to IPO, but I suspect there’s a more important factor here: I think they’ve finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex.
Tools like ChatGPT are wildly popular, but that wild popularity has been difficult to turn into revenue. In February OpenAI boasted more than 900 million weekly active users for ChatGPT, but only 50 million—5.6% of that—were paying consumer subscribers.
Charging $10-$20/month per user is an OK business, but you’d need 1-2 billion subscribers sticking around for four years to cover $1 trillion in infrastructure.
Companies spending $200+/month/user will get you there a whole lot faster—and as noted above, as a power-user I’m at ~$1,000/month in API costs per vendor already.
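The arithmetic behind those subscriber counts can be sketched as follows. This is a back-of-the-envelope calculation under loud assumptions: it counts gross revenue only (no serving costs, churn, or margins), treats the $1 trillion figure as the sole target, and uses the $200 and $1,000 price points purely as illustrative tiers taken from the numbers above.

```python
INFRASTRUCTURE_USD = 1_000_000_000_000  # the $1 trillion figure from the post
YEARS = 4


def subscribers_needed(price_per_month: float, years: int = YEARS) -> float:
    """Subscribers required for cumulative gross revenue to equal the target."""
    return INFRASTRUCTURE_USD / (price_per_month * 12 * years)


for price in (10, 20, 200, 1000):
    millions = subscribers_needed(price) / 1e6
    print(f"${price}/month -> {millions:,.0f} million subscribers for {YEARS} years")

# Share of ChatGPT weekly active users who pay (both figures in millions).
paying_share = 50 / 900
print(f"paying share: {paying_share:.1%}")
```

At $10-$20/month this gives roughly 2.1 billion and 1.0 billion subscribers, matching the 1-2 billion range; at $200/month it drops to about 104 million, and at $1,000/month to about 21 million.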
Coding agents really did change everything. These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals. Right now that’s still mostly software engineers, but a coding agent is a tool that can automate anything you can do by typing commands into a computer... so they are clearly applicable to a much wider set of skilled knowledge workers.
As I’ve discussed on this site at length, the models released in November 2025 elevated agents to being genuinely useful. We’ve had six months to get used to that idea now—it’s no wonder companies are beginning to spend real money on this technology.
You could argue that ChatGPT achieved product-market fit when it became the fastest-growing consumer app in history back in February 2023... but it certainly wasn’t making any actual money back then. Coding agents plus enterprise pricing marks the point when these companies start making very real revenue. Maybe even enough to start covering their costs!
As further evidence that enterprise agents represent product-market fit for these companies, consider their open job listings.
OpenAI have 703 open jobs right now, of which I’d categorize 229 (32.6%) as relating to enterprise sales and support—account executives, “Go To Market”, “Forward Deployed Engineers” and the like.
Anthropic have 390 open jobs, 105 (26.9%) of which look enterprisey to me.
It’s pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor—enterprise sales contracts don’t close themselves without a whole lot of humans in the mix!
(I ran this analysis by scraping their job sites with Claude Code, then having it use Datasette’s JSON API to pipe that data into Datasette Cloud where I used Datasette Agent for the analysis, exported here. Dogfood!)
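The percentages quoted for the job listings follow directly from the counts. The snippet below only reproduces that division; which roles count as "enterprise" is the author's own judgment call and is not modeled here, and the helper name is illustrative.

```python
# Open job counts quoted in the text: (enterprise-related, total).
listings = {
    "OpenAI": (229, 703),
    "Anthropic": (105, 390),
}


def enterprise_share(enterprise: int, total: int) -> float:
    """Percentage of open roles categorized as enterprise sales and support."""
    return 100 * enterprise / total


for lab, (enterprise, total) in listings.items():
    print(f"{lab}: {enterprise_share(enterprise, total):.1f}% of {total} open jobs")
```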
I started digging into this in response to a growing volume of stories claiming that large companies were sounding the alarm because their AI usage costs had grown so large.
The most widely cited of these stories appear quite overblown to me.
The most discussed has been Uber, based on this report where CTO Praveen Neppalli Naga indicated that Uber had “maxed out its full year AI budget just a few months into 2026”, mostly thanks to Claude Code.
Given that Claude Code only got really good in November it’s entirely unsurprising to me that a budget set in 2025 may have failed to predict demand for that tool in 2026!
That Uber story was further fueled by comments made by Uber’s COO, Andrew Macdonald, on the Rapid Response podcast. I tracked down the segment and there really isn’t much there. Here’s what Andrew said:
But then you sometimes go and talk to your senior engineering leaders and you’re saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter?
That link is not there yet, right? I think maybe implicitly there’s more that is getting shipped. But it’s very hard to draw a line between one of those stats and, OK, now we’re actually producing like 25% more useful consumer features, right? And that line is hard to draw.
Somehow this fragment turned into headlines like Uber’s COO says it’s getting harder to justify the money spent on AI tokenmaxxing, because the market for stories about AI failures remains enormous.
The other popular story around this is Microsoft starts canceling Claude Code licenses, ostensibly to encourage their engineers to dogfood their own Copilot CLI agent instead—but The Verge reporter Tom Warren says “sources tell me the decision is also a financial one”, triggered by the June 30th end of Microsoft’s financial year.
I think both of these stories support my “product-market fit” hypothesis. The best advice I ever heard on pricing a product was that your customer should suck air through their teeth and then say yes. Uber’s budget overrun and Microsoft’s seat cancellations look like that effect playing out in practice.
The big AI labs spend billions of dollars on both training and inference. Credible figures are hard to come by, but we did get one huge hint as to the figures involved from, oddly enough, the recent SpaceX S-1:
[...] in May 2026, we entered into Cloud Services Agreements with Anthropic PBC (“Anthropic”), an AI research and development public benefit corporation, with respect to access to compute capacity across COLOSSUS and COLOSSUS II. Pursuant to these agreements, the customer has agreed to pay us $1.25 billion per month through May 2029 [...]
The Anthropic announcement said that this deal meant they could “increase our usage limits for Claude Code and the Claude API”, heavily implying that Colossus is being used for inference, not model training.
Anthropic already have vast amounts of compute from other providers. The fact that they’re willing to spend $1.25 billion per month for extra capacity from just one of their vendors hints at how big these inference budgets have become.
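To put the monthly figure in context, here is a small sketch that annualizes it and estimates the total commitment. The 36-month term is an assumption (payments from June 2026 through May 2029); the excerpt does not say when the first payment falls, so the total is approximate.

```python
MONTHLY_PAYMENT_USD = 1.25e9  # figure quoted from the S-1 excerpt


def total_commitment(monthly_usd: float, months: int) -> float:
    """Total paid over a fixed number of equal monthly payments."""
    return monthly_usd * months


annualized = total_commitment(MONTHLY_PAYMENT_USD, 12)

# Assumption: 36 payments (June 2026 through May 2029). The excerpt does not
# state the first payment month, so treat the total as approximate.
ASSUMED_MONTHS = 36
total = total_commitment(MONTHLY_PAYMENT_USD, ASSUMED_MONTHS)

print(f"Annualized: ${annualized / 1e9:.1f}B per year")
print(f"Over {ASSUMED_MONTHS} months: ${total / 1e9:.1f}B")
```

Under that assumption the deal is worth about $15 billion per year and about $45 billion over its full term, from a single vendor.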
Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API.
Anthropic’s API revenue was historically quite dependent on a small number of large API customers—this VentureBeat story from August 2025 quotes “sources familiar with the matter” suggesting that just Cursor and GitHub Copilot were responsible for $1.2 billion of the company’s then-$4 billion revenue.
Today Anthropic are rumored to hit $10.9 billion in the second quarter, potentially even operating at a profit for the first time.
This pivot-to-Enterprise suggests that the labs have realized that the real money lies in cutting out the middlemen. Anthropic’s Claude Code directly competes with Cursor and Copilot. No wonder Cursor are investing in their own models!
I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.
I think April 2026 is a new inflection point where the revenue implications of this have started to land, to the benefit of the frontier AI labs and with material impacts on the budgets of large companies.
We’ll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into.
Arguably, the main impact of securing SVB depositors above the $250k limit is that it prevented thousands of people from being laid off that week, as their employers wouldn't have had the money to make payroll the following Wednesday.
What's the U stand for in UBI?
So yeah, I wouldn't be shocked that in the 2023 - 2033 timespan total AI investment worldwide will be around $5tn, maybe even going towards $10tn.
All that money will have to be repaid, and it will have to be repaid 10x, otherwise heads will roll.
The enshittification we've seen so far is nothing compared to what's coming.
Nor is it a good reason to think there will be more.
But my guess is that the cost of SWEs themselves means that the more expensive ones will be worth the delta to most companies.
But time will tell.
By comparison almost all tech companies I know have leaned heavily into AI.
My root comment simply represented my two cents about the current post. I don't think anything about the post is outrageously incorrect or anything, just somewhat confusing. You're a very prolific contributor in this community and I don't think me or anyone else that welcomes your takes expects everything you write to rock our collective socks every single time, anyway.
"Many people still think AI coding agents are slop on steroids despite all the current hype around AI actually shipping functional products."
Oh yes, tons and tons, especially on HN. But the plural of anecdote is not data. Enterprise spend speaks for itself. You are using AI-coded functional products all the time. Do you want like a diff history for the Google codebase or something?
(And that's after taking into account the METR paper that says engineers over-estimate their productivity with these tools.)
I have plenty of doubts about AI delivering on its promises outside of coding. I don't write about AGI because I think it's science-fiction hysteria. I write about slop precisely because it represents a mis-use of AI that demonstrates people completely misunderstanding what it's useful for.
That's fine. Other people may not want to pay 300x more and will rather make do with last year's SOTA.
> For coding you always want to go with the best model
Maybe you meant "For coding I always want to go with the best model"?
Could you explain this?
Good to see the classic "yeah the models weren't good enough six months ago, but this time they actually are, promise! Please forget you were hearing the exact same thing six months ago!" is alive and well though.
Both implicit and explicit..?
But to your point, re-reading the article, this is not what Simon is saying at all; he's just pointing out that he got to use ~$2000 "worth" of tokens on his $200 plan. Which makes total sense! Subscriptions are sticky, that's why the entire software industry moved towards subscription models (as much as we hate it); the person paying $200/month is more likely to stick around than the person who paid $2000 using the API.
Also, to just color in the picture here, as I haven't seen it mentioned elsewhere, there is a very large SaaS company at the moment that has given everyone unlimited tokens on Claude. And they have a dashboard showing who spends the most. So the "budget" went from about USD500 per person (split between Claude and Cursor) in Jan to... Well a soft limit of USD100k... Per month... Per person.
People can still see the top line sticker price on their spend, but honestly I can't believe that the SaaS is paying that full price when the invoice comes in.
That said, there are some finance reports which are probably dropping soon where we will find out!
Could be fantastic for small shops while it lasts. The big guys have to pay 10x for precious tokens.
> (other than through price-discovery: which is slow, messy and fuzzy)
I notice a distinct lack of reading or comprehension (from everyone around me now, not just this comment) which worries me. I worry that LLMs are to blame. No one reads anymore...
https://aws.amazon.com/ses/pricing/
https://www.statista.com/statistics/456500/daily-number-of-e...
You're right, Linus uses Emacs.
But yes, it's likely that the ease with which code can now be output lets us produce lots of unnecessary code just because we can, and the author says as much in a comment below.
I take some reassurance from knowing that they are indeed used by real people to solve real problems though.
It might be worth keeping it secret from the staff if they were running at a loss.
They had all the incentive in the world to say "I'm not going to talk about that."
See https://taalas.com/products/
Edit: updated link
Training is inference + backwards pass (~2x inference cost) + activations (vram overhead) + optimizer (vram overhead) + gradients (vram overhead).
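The breakdown above can be turned into a rough estimate. This is a sketch under stated assumptions, not a measurement: the backward pass is taken as about 2x the forward pass (as the comment says), weights and gradients are assumed to be 16-bit, and the optimizer is assumed to keep two 32-bit states per parameter (Adam-style). Activation memory is left out because it depends on batch size and sequence length, and some training setups keep extra full-precision weight copies that are also ignored here.

```python
def training_compute_multiple(backward_to_forward_ratio: float = 2.0) -> float:
    """Compute per token for training, in units of one forward (inference) pass."""
    return 1.0 + backward_to_forward_ratio


def memory_gb(params_billion: float, training: bool,
              weight_bytes: int = 2, grad_bytes: int = 2,
              optimizer_bytes: int = 8) -> float:
    """Rough memory for parameter-sized tensors only; activations are excluded.

    Byte sizes are illustrative assumptions: 16-bit weights and gradients,
    plus two 32-bit optimizer states per parameter (Adam-style).
    """
    per_param = weight_bytes
    if training:
        per_param += grad_bytes + optimizer_bytes
    # 1 billion parameters at 1 byte each is 1 GB (using 10**9 bytes per GB).
    return params_billion * per_param


print(training_compute_multiple())    # compute multiple relative to inference
print(memory_gb(7, training=False))   # hypothetical 7B-parameter model, inference
print(memory_gb(7, training=True))    # same model, training
```

With these assumptions a training step costs about three forward passes of compute per token, and the parameter-sized tensors alone take several times the memory that inference needs.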
The days of requiring a data center to run anything resembling Opus 4.6 are already numbered. (But the industry will fight hard to get people to keep paying the Claude tax.)
Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.
Opus 4.6 quality for local inference would be revolutionary.
I'd pay a premium for even just a model that's 20% better, no ASI required, and I think a lot of people would. I wouldn't call that marginal, if it means I'm getting frustrated on 20% fewer tasks.
A recurring pattern that I've seen in myself and others is to at first be very impressed by a new model's coding capabilities, and then desensitize quickly and start being frustrated by the shortcomings.
The larger point I'm making is I think models are rapidly becoming commoditized. There is probably a small market long term that's willing to pay 10x for 10% marginal gains, but the majority of the buyers in the market will be economic and we're likely to have a lot of folks willing to spend 1/10 the cost for 90% of the performance, and plenty of companies that haven't raised hundreds of billions-trillions who can provide that.
A lot of the frontier labs' valuations have been based on an assumption that 1-2 companies would get break-away intelligence that basically made them economic chokepoints indefinitely into the future. The reality that's becoming increasingly clear is that model quality is a pretty linear function of (cash burned - ability to copy others' homework) and the economics are starting to look a lot more like airlines than online advertising.
And worse, these are the tasks that help the junior people eventually grow into the skilled knowledge workers required to operate models, so there's a pipeline problem too.
Not completely, but compared to the middle ages we 50x'd their output. Which is a great illustration of what it means to make a job 50 times more productive. We went from 80-90% of the population being required to barely make enough food for everyone to survive, to 4% of the population producing such an abundance that consuming too much food has become a systemic health issue.
A $50k-$100k rig could do it and an entire company would be able to use it at full speed.
Kimi is close for example regarding SWE bench for code. For reasoning there are open models that surpass opus by quite a margin already.
I bet this will ironically be couched in "safety" reasons or regulation to get anti-AI folks on board, even if it favors the large incumbents.
The point I'm making is that I think we're rapidly hitting levels where corporate buyers aren't willing to pay multiple-times-more for marginal gains, and I expect that to become more the case over time, not less. You, and a small % of other power users in the market might tolerate a $400/month pro-supreme-plan for access to Mythos or whatever, but I don't think that's going to scale up in quite the same ways we've seen so far.
Even a year ago paying multiple times more for a 50% gain was very sensible for a lot of workflows. But if we're getting to "good enough" for things like coding, justifying to your CTO/CFO why the org should go from spending $1m/year to $5m/year for a 10% higher hit-rate on one-shot prompts from the engineers is a much tougher sell.
What you call harnesses I call… bullshit?
The economics of airlines are such that they generally earn a return on capital less than cost of capital.
I think this is exactly where we are heading and OAI-Anthropic are the concordes.
(Yes, all the good developers from Oklahoma move out, but the same is true of Ireland)
Basically every big tech company has large offices and employs a lot of people there.
The limitation is that Ireland is a relatively small country, and most Irish developers are already employed (which is why Ireland ends up being one of the main destinations for tech workers being hired from abroad).
Nobody is doubting that there are some people who watch films 10–12 hours a day, every day of the week.
I'm already seeing tech execs/hiring managers getting very frustrated at the lack of new-senior-engineers to hire. The market will correct for this in time.
I'm pretty skeptical on the outcomes and the costs also (natural and social as well), but possibly we can have 50x or even more software in the end! The phrase will be truer than ever:
> Software is eating the world!
> Valve
Arguably a monopoly. They've got a product that sells itself with very low infra overheads for the income.
> Hedge funds
Very different model. I don't think the same intuitions apply.
https://substackcdn.com/image/fetch/$s_!_ZW2!,f_auto,q_auto:...
That seems... Pretty reasonable? Like Anthropic is at $45B annual revenue, let's say they enter next year with $100B annual revenue? Let's say they have 30% gross margins (no idea), so $70B goes to data center owners/operators. That's one company doing roughly 1/3rd of what's required to pay the investments off. And you have Ant+OAI+GDM+Internal AI at GOOGL/META/etc.+all the servers for open models.
I'm sure there's a world where you can paint a picture that requires $5-10T but that would require capex increasing significantly NEXT year. And the cloud companies won't do that unless revenue keeps growing.
Ultimately, we'll need UBI or large scale cuts in working hours or similar if AI progresses to the point of mass unemployment - the alternative would be massive social unrest. In the meantime I expect to keep doing better than average.
Ultimately software is everything these days and the economics make the demand insatiable. We've gone through many cycles of "X" but on computers/web/mobile. There's going to be a massive amount of "X" but with AI companies that will need engineers.
Or at least this is what I tell myself to sleep at night.
> I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.
Claiming a grand inflection point based on your own personal usage is very anecdotal.
- performance scales with compute very very reliably. We have “scaling laws” (and have for years) and they are almost miraculously stable and show no sign of being invalidated at all even at the very largest scales. There are some theoretical bases for this though I’m not as familiar with the details
- these scaling laws are on an unintuitive quantity (validation loss on pretraining datasets), so we can look at downstream performance. Benchmarks are a minefield of junk but there are many decent ones and enough variety of techniques and data sources and scoring methods etc that in aggregate they are useful. The single number that I think is the best summary statistic across the crazy (O(100k)) number of benchmarks is the “epoch capability index” (just some branding over a reasonably standard statistical model that was really well thought out and a great idea). The trends in this are extremely stable. Eyeballing the trend over time on their graph we’re getting basically a GPT-4 to GPT-5 level capability improvement every ~18 months
- coding agents are not limited by the quality of the human training data they’re trained on, this is such a massive misconception: human data is only a bootstrap to a reinforcement learning phase. This combined with the fact that we have verifiable rewards means it’s just a matter of when not if for any given level of reliability.
- the massive compute investment implies that the compute that we’re building over the next 2-3 years will 10x the effective compute for training models. That combined with various R&D contributions (historically which have been very significant and there is no shortage of wins here), better data curation and flywheels, richer data (wait until conversation capability gets good) means we have several orders of magnitude of runway that we know of, today.
In short I don’t see any compelling evidence to suggest all of the trends we observe in many different ways will end any time soon.
I shared that assumption until yesterday, when I found out that it wasn't holding for LLM pricing from OpenAI and Anthropic. That's what inspired me to write this piece.
I think those token leaderboards are an obviously terrible idea and will go extinct very quickly now that people are paying attention to costs.
How long would it have taken you to do it yourself? How much longer will the next task take you, compared to when you would’ve written the code yourself? How is the mental model compared to when you would’ve written it yourself?
I’m not saying you’re wrong, again there are use cases. But the calculation is not plain and simple; it goes deep into our perception: perceived productivity versus actual productivity.
I spent 2 months maxing out all 6k of Claude Code and bought Antigravity on top. My codebase grew to 140k lines. I introduced tons of bugs and spent another 2 months deleting 80k lines of code. I wish I had just chatted with AI and not let agents touch my codebase. I would’ve saved approx. $300 a month in subscription fees and 2 months of my life.
People got a lot done before Opus 4.6. In 6 months, would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
And last, but not least, you need only one hidden layer kept in RAM for inference, but you need all of them (61 for Deepseek models) kept in RAM for computing gradient for one sample.
The people who are claiming Opus-level capability do not have sufficiently complex problems to see the difference.
And yeah, that may be the ~decade world, but we're in the mainframe era of the frontier models. It's going to be more economical for basically any consumer, and most businesses, to pay someone else to host models for quite a while.
I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.
On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.
Edit: I should broaden this to domains that naturally lend themselves to RLVR training. Models are drastically better at math now.
Batch size is frequently limited by compute bottlenecks well before memory.
Say I want to build a feature in a product.
- DS has to do a deep dive (need buy in) to opportunity size and derisk with data. That DS has to work with other DS (people may have left or moved teams) to figure out how to get the right data and figure out what the difference is between 10 different tables that have overlapping but inconsistent data.
- Eng has to build up an actual simple demo (need buy in)
- Design has to make it not hideous (need buy in)
- Legal has to review what you're doing; POCs should involve real data where possible because otherwise no one will trust it, even if it's just for user analysis on existing products
This plus about 6 internal system bugs for custom tools that are flaky and whose team has long been re-orged or laid off, 8 people who won't answer you, 2 PTOs for the stakeholders, 6 weekly meetings
no one did POCs, they just had ideas and tried to get PMs to put it on the roadmap so if it fell through at least it was bought into
https://www.agtechmarket.net/news/laserweeding (random web search, I don't vouch for this site, it just looks legit at a glance)
Next innovation could be to scale succession planting, which keeps the ground from being exposed in between crops and lets you transition from nitrogen fixers to users quicker, getting more food out per acre while reducing fertilizer usage. But you can't do that with current harvesters and human labor is too valuable to spend on this.
Also take broccoli harvesting, typically you get a few big heads, then it keeps producing smaller heads, but it's not economical to harvest the smaller heads with human labor. Robotic harvesting lets the same plant produce more food per acre and uses the energy needed for new plants instead to keep producing food.
Sure, but is that the case now? Is everyone made whole when a bank fails and they have more deposits than the insurance limits? Or only when it's the well-connected / too-big-to-fail?
Looks like the answer is no: https://www.wsj.com/finance/banking/a-small-banks-failure-le...
So I don't think it's unreasonable to describe SVB as a bailout. Not for the investors, but for the depositors. Has anything changed to reduce the moral hazard / make it less likely to recur?
Currently I have no way of telling if big changes in their rankings are caused by a single "whale" switching providers, or if it's a more meaningful trend.
Models have been getting better, but all that follows from that is that newer models tend to be better than older ones. It doesn't follow that they have (or even will in the future) gotten better than anything else, be that human developers, a given definition of good enough, etc.
> It is absolutely data driven to say “an inflection point has happened within the last 6 months”.
With all due respect to OP (who I think is responsible for popularizing that way of phrasing it), I don't think it is when you consider the actual definition of "inflection point". At best I think you can say that models crossed a lot of developers' definition of good enough around then, which is a different thing. The problem I have with that is that as a (mostly) outsider looking in, it doesn't seem like they're right.
Volume discounts may be available for high-volume users. These are negotiated on a case-by-case basis.
* Standard tiers use the pricing shown in Model pricing
* Enterprise customers can contact sales for custom pricing
And there are discounts available through "Claude Platform on AWS":
Anthropic rates your token usage in USD at standard per-model, per-feature rates, applies any negotiated discount, converts the result to CCUs at $0.01 per CCU, and reports the CCU quantity to AWS Marketplace hourly. Your AWS bill shows a single CCU line item.
https://platform.claude.com/docs/en/about-claude/pricing
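The billing steps described in that quote (rate usage in USD, apply any negotiated discount, convert to CCUs at $0.01 each) can be sketched as follows. The function name, the per-million-token rates, and the 10% discount are all hypothetical placeholders, not real prices or contract terms. The quote does not say how fractional CCUs are rounded, so this sketch leaves the result unrounded.

```python
from decimal import Decimal

USD_PER_CCU = Decimal("0.01")  # conversion rate stated in the quoted documentation


def usage_to_ccus(input_tokens: int, output_tokens: int,
                  input_usd_per_mtok: Decimal, output_usd_per_mtok: Decimal,
                  discount: Decimal = Decimal("0")) -> Decimal:
    """Rate usage in USD, apply a fractional discount, then convert to CCUs.

    The per-million-token rates and the discount are caller-supplied
    placeholders; real rates come from the pricing page and the contract.
    """
    million = Decimal(1_000_000)
    usd = ((Decimal(input_tokens) / million) * input_usd_per_mtok
           + (Decimal(output_tokens) / million) * output_usd_per_mtok)
    discounted = usd * (Decimal(1) - discount)
    return discounted / USD_PER_CCU


# Hypothetical rates of $5 and $25 per million tokens with a 10% discount.
ccus = usage_to_ccus(2_000_000, 400_000, Decimal("5"), Decimal("25"), Decimal("0.10"))
print(ccus)
```

`Decimal` is used instead of floats so that currency amounts such as $0.01 are represented exactly.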
On the other hand, contrary to Anthropic's documentation, another source claims they've killed pre-existing API volume discounts for large enterprise customers as of April: https://itbrief.news/story/anthropic-shifts-enterprise-billi...
> I shared that assumption until yesterday, when I found out that it wasn't holding for LLM pricing from OpenAI and Anthropic.
This reads like GP saying "enterprise never pays sticker price" and you responding "I thought so too until I saw the sticker price".
Is there some info you have that you can't/didn't share? Your article doesn't offer anything beyond the above.
You're correct. When a type of cloud service grows large enough and has a few competitive suppliers, enterprise pricing tends to coalesce with the large buyers paying around the same price for the same thing. While that might be lower than the publicly cited rate card, the private price similarly large customers pay ends up being similar.
One reason is that the very largest, long-term enterprise customers are so valuable, they can command MFN clauses ensuring no one else is paying substantially less for the same thing, then the rest of the rate card for smaller customers flows down from that. There's a strong disincentive for vendors to cheat or allow big disparities between similar classes of customers because the number of people involved on both sides of these deals is large enough that word will get around eventually.
Large scale enterprise sales and purchasing in a given sector tends to be rather circular. Account execs move to other vendors and call on the same customers, while purchasing agents can move to other customer firms. Personal relationships, reputation and credibility matter. Lying to screw a large customer over just to make one commission or quarterly quota can be a very bad long-term career move. Sometimes purchasing agents or executives quietly compare notes off-the-record with their peers from other similarly-sized firms. After dinner drinks at industry association meetings and trade shows can be quite productive in terms of verbally exchanging 'market insight' with peers.
When there are significant pricing differences, it's usually due to different volume commitments, SLA/QoS guarantees, payment terms and other material factors which justify the difference. Source: been there, done that inside a top ten valley tech company. Once was in a meeting where a newly minted EVP tried to get a long-time senior account exec to pressure a huge customer by being semi-dishonest. The account exec schooled the EVP on the fact that the EVP could only make him unemployed for about 8 hours but that huge customer not wanting to work with him could make him unemployed forever. :-)
It's taken me the best part of this year to readjust and find a pace and level of ambition that fits.
I hope there's a "good enough" point but I don't think we're there yet. Like for me hardware got good enough several years ago. But while opus 4.7 is really good compared to everything else, it's not so good that I would use it at a discount over whatever is available in a few months. The improvement in quality, speed, and daily frustration is worth it to me... Spoken as someone whose employer is footing the bill, so take that with a grain of salt.
I want to run my own local models, but I don't think that's feasible without lots of frustration until a few generations of frontier models are so good that they're almost indistinguishable for common tasks. Kind of like how MacBook pros have been for a while.
> would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
Well, I see what you mean, but two big concepts...
1A. Models get stale pretty quickly w.r.t. new developments that occur past their cutoff date. "But you can just keep them current by linking them to newer documentation, etc!" Well, no, you sorta can't -- at least not in perpetuity. Those search results fill up your context window real quick. So that gets unsustainable real quick.
1B. Even when your context has plenty of free space, the results you get from "here's a link to the documentation for this new framework that released after your cutoff date" absolutely pales to the results you get from knowledge that is fully baked into the trained model as opposed to your context window. For one thing, that documentation link you pasted into your context might link to... a dozen code examples. Whereas if that was baked into the model itself, the model might have been trained on many thousands of examples in Github etc.
2. It's also a reality that most professional engineers have to keep up with their peers and competitors. We can maybe say it shouldn't be that way, but it is. So if $SOME_NEW_MODEL is significantly better than 4.6... and my peers and/or competitors are using it, then yeah I might be really feeling the need to match them. And I'm not even necessarily talking about some kind of cutthroat dog-eat-dog stack-ranked workplace.
These limitations aren't relevant for all use cases or careers but they're hiiiiiiiighly relevant for professional software engineering.
But at the moment, I can't imagine why I wouldn't be spending the majority of my time with the best models. I'm spending a lot of time with them! Reducing the number of back-and-forths is extremely valuable to me.
I expect in two months I will still want to spend >80% of my time prompting the best models, and that's true if I were spending my own money on hobby projects, too.
The 1 they were missing is that AI requires both training and inference, and training is by far the expensive part. And that in principle you can stop training at any point and keep using the models as they are. (But that means that if other companies keep improving their models, you'll be left behind...)
In contrast, inference is fairly cheap and all the providers have great margins on it. Eventually either investment in training stops having commensurate impact on model quality, and people stop doing that and instead concentrate on making inference faster and even more efficient. Or if that doesn't happen, things will get very weird very quickly.
Said model will also run as a tool-calling coding model excellently (it's no Opus, but for a thing that once set up is just the cost of energy, it's incredible). It can type faster than you can, probably 10x faster, so with guidance it'll make you faster. And it's free.
It's here. If folks want ChatGPT without a subscription, they can have it today on their computer. The only money to be made is in the high end models doing "serious business" work spanning 1M+ token contexts and massive uncertainty. Everything else is already set to be eaten by today's local models.
Why would it not? The typical new phone today has 16gb of RAM. 20 years ago that was somewhere around 32mb. Factor 512. It's not hard to see that we'll get there rather soon, especially if there is an application that provides demand.
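The "factor 512" figure, and the growth rate it implies, can be checked quickly. This sketch takes the commenter's numbers (32 MB then, 16 GB now, 20 years apart) at face value; the helper name is illustrative, and nothing here says the trend will continue.

```python
import math


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Average doubling time implied by growth from start to end over `years`."""
    doublings = math.log2(end / start)
    return years / doublings


start_mb = 32        # phone RAM roughly 20 years ago, per the comment
end_mb = 16 * 1024   # 16 GB expressed in MB

factor = end_mb / start_mb
print(factor)
print(doubling_time_years(start_mb, end_mb, 20))
```

A factor of 512 is nine doublings, so the comment's numbers imply phone RAM doubling about once every 2.2 years on average.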
> You people fundamentally don't understand the memory requirements for running inference.
You seem to be overlooking how fast things change in this industry, especially if tons of money can be made as a consequence.
> Your cute local models seem good enough because you have no standards and anything an LLM produces seems like magic to you.
Please don't generalize. I'm an avowed AI skeptic and have to deal with the bad consequences of AI slop every day. But you can't deny that there are enough application areas where people have use cases, and those will be much easier if things don't need a few round trips to a data center that sucks all the electricity and water out of neighboring communities.
The short and only kind of wrong version is:
In the US, companies are not allowed to unfairly privilege some investors over others by giving them access to secret information that would let them judge the future prospects of the company. (Except in all the ways they can, but these usually involve some kinds of insider trading rules.) Private companies can handle giving out secrets to investors by literally writing a memo and mailing it to all their investors, if they want to give out some secrets to one of them.
Public companies cannot do that, even if they knew who all their investors were, but must instead consider every member of the public a potential investor, even if they don't already own the stock. Because of this, when public companies want to reveal material information about their future prospects, they must reveal it to everyone.
I remember hearing about a guy trying to squeeze out short sellers of his own company but ended up effectively taking his company private because he bought out like 95% of all the shares.
I wonder how that aligns to these small releases of stock for the public.
Using Cursor to hop between models, I've found Opus to be generally better at really tricky debugging than GPT 5.5 or earlier models, but not reliably better at execution because of these things. I'm not sure Composer 2.5 is quite there yet for the execution side, but it's getting pretty close to those other ones, such that I'm definitely still in a "debug and plan with slow, execute with faster ones" operating model for working on hard shit.
If you want a frontier model, you will pay more for inference to essentially fund the expensive training.
If you don’t need a frontier model, you will get dirt cheap inference, which will eventually approach the cost of the electricity spent per token.
These are ideas that simplify the design, reduce future work and tie together the entire system. If in two months I can arrive at ideas of that quality with normal brainstorming with LLMs, that will be extremely valuable.
I don't think we can discount this, frankly. Newer electronics are energy efficient, but older devices are more energy-intensive, and unless configured well, a gaming PC can easily use a few dollars a month in electricity, so now you're approaching subscription territory. A subscription comes with no upfront cost, higher reliability, no wasted space in your home, mobile apps, etc. (and less privacy).
I wanted the ability to run whatever cameras on a VLAN and own the stack.
The iPhone 17 has like 8 gb, the Pixel 10 12.
The original iPhone was 128mb, and the iPhone 6 from 2016-2018 was around 1gb; that puts the iPhone at around 8x RAM per decade, and puts us at 128gb in our pockets at around 2036 or so.
(Incidentally, the big news in phone RAM is that a lot of new phones are dropping back to 4gb because of RAM shortages.)
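For what it's worth, the extrapolation is easy to sanity-check. This is a rough sketch that takes the figures quoted in these comments at face value (8x per decade, 8 GB or 12 GB phones today) and assumes 2025 as the starting year; the function name `year_reaching` is made up for illustration.

```python
import math

def year_reaching(target_gb, start_gb, start_year, growth_per_decade):
    """Year at which RAM hits target_gb under constant exponential growth."""
    decades = math.log(target_gb / start_gb) / math.log(growth_per_decade)
    return start_year + 10 * decades

# Ratio cited earlier in the thread: 16 GB today vs. 32 MB twenty years ago.
print(16 * 1024 / 32)  # 512.0

# Assumptions: 8x growth per decade, starting from 2025.
for label, start_gb in [("8 GB phone", 8), ("12 GB phone", 12)]:
    print(label, "reaches 128 GB around", round(year_reaching(128, start_gb, 2025, 8), 1))
```

Under those assumptions both starting points land in the second half of the 2030s, and the answer shifts by a couple of years depending on which phone you start from.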
The demand for senior+ engineers has remained steadier through this downturn from my anecdotal observations, with new grads being by far the most negatively affected, but even that seems to both be shifting from talking to people a handful of years younger than me + CS enrollment has already precipitously declined [2] as the narrative that programming is dead because of AI has spread rapidly.
All that leads me to think it's going to be a junk-show over the next decade for people trying to hire as the pipeline was destroyed.
1: https://www.citadelsecurities.com/news-and-insights/2026-glo... 2: https://www.washingtonpost.com/technology/2026/04/13/compute...
This is where this is going, the whole industrialism is totally self-serving, and for every problem its answer is digging deeper in the rabbit hole, and creating 2 more problems in addition to solving the initial problem only half-way.
I don't want to say what you are suggesting is not possibly useful, I just want to emphasize how stuff works out in reality, in addition to doing some nice stuff like what you called out (the halfway solution to the problems). All we get is more alienation and humans getting depressed and feeling a lack of purpose... but somehow we cannot afford to pay fair prices for the agricultural work, and pay fair prices for the food, and not overproduce and overpollute... and the same thing is happening in every aspect of the human condition, not only food production, which is the most basic and ancient activity we have been doing.
Pretty much, and has been for a while.
https://nyulawreview.org/wp-content/uploads/2025/05/100-NYU-...
In early 2023, within the span of two months, the United States experienced three out of the four largest commercial bank failures in U.S. history, as Signature Bank, Silicon Valley Bank, and First Republic Bank all toppled. Yet, despite these banks having roughly $300 billion in uninsured deposits at the time of their failures and despite the failures costing the Deposit Insurance Fund (DIF) of the Federal Deposit Insurance Corporation (FDIC) an estimated $38 billion, uninsured depositors took no losses in any of the failures. While these results were striking, they were far from unusual. Since 2008, uninsured depositors have experienced losses in only 6% of total U.S. bank failures.
...
Formally, the United States caps deposit insurance at $250,000 per account, but, in reality, the post-2008 financial system comes close to providing de facto total deposit insurance covering all amounts in all accounts.
But if there was a bank failure at a regionally smaller bank with a regular customer or startup depositing the same amount of money over the insurance limit, their money is gone.
Just like Intel got a "bailout" from investment as chosen by the US government, AI will eventually have a very similar story.
But this is not true. You’re saying we only have relative performance numbers and not absolute measures of capabilities and reliability, but that’s simply not the case. OSS benchmarks as well as the internal flywheels of these companies are good complementary measurements.
> At best I think you can say that models crossed a lot of developers definition of good enough around then, which is a different thing
That’s the inflection point. Implication is a massive jump in adoption. We’re not like pulling this out of a hat, there are a number of compelling datapoints. The onus is on people to bring actual evidence that contradicts all of the data and observations we have.
> With the pricing change, customers of Claude Enterprise, a two-year-old bundle of products meant for large companies that now includes Claude Code and its work assistant, Claude Cowork, will have to pay for the amount of computing capacity they consume while using the software on top of a monthly flat fee of $20 per user, an Anthropic spokesperson confirmed.
There was a Hacker News thread the other day where a bunch of people confirmed that their organizations had seen this too: https://news.ycombinator.com/item?id=48278610#48280906
Show us your work, then. If it's so easy to do, this should be a trivial request to accommodate, no?
They can't stop training as then the AI's knowledge will become out-of-date very quickly. Their knowledge stops the day you stop training.
Here's a prompt I just ran against Claude Opus 4.7:
> Use python3 to experiment with whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query without examining the SQL string itself
Opus nailed it: https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac
I tried the same thing against qwen/qwen3.5-35b-a3b running locally in lmstudio, with the Pi coding agent. At first it looked like it was going to do great! And then it fell apart over the course of several tool calls: https://gisthost.github.io/?8ae2f842df619fb7fd8f1ccd82fe41c7
I'm used to GPT-5.5 and Opus 4.7 handling that kind of prompt without any problems at all.
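For anyone who wants to poke at the question directly, here is a minimal sketch of that kind of experiment. It is not what either model produced. The table schema and the `probe` helper are hypothetical, and the result will depend on the schema and the SQLite version, so run it and look at the output rather than trusting any summary.

```python
import sqlite3

def probe(sql):
    """Run EXPLAIN <sql> on a scratch database.

    Returns (events, opcodes): the (action, arg1, arg2) tuples the
    authorizer callback received, and the VDBE opcode names from EXPLAIN.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; a UNIQUE column gives REPLACE a conflict to resolve.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

    events = []

    def authorizer(action, arg1, arg2, dbname, source):
        events.append((action, arg1, arg2))
        return sqlite3.SQLITE_OK  # allow everything, we only observe

    conn.set_authorizer(authorizer)
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    conn.close()
    return events, [row[1] for row in rows]  # column 1 is the opcode name

plain_events, plain_ops = probe("INSERT INTO t (id, name) VALUES (1, 'a')")
replace_events, replace_ops = probe(
    "INSERT OR REPLACE INTO t (id, name) VALUES (1, 'a')"
)

print("authorizer events identical:", plain_events == replace_events)
print("opcodes only in the REPLACE plan:", sorted(set(replace_ops) - set(plain_ops)))
```

Comparing a plain INSERT against INSERT OR REPLACE on the same schema shows whether the authorizer events or the bytecode differ, without ever parsing the SQL string.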
New language, infrastructure, general level of understanding of something I barely have an idea of.
Rubber duck debugging, if I don't know the correct solution
Checking my code for issues and bugs.
But not for:
- writing my code
- agentic coding (help me)
The inference has reduced drastically. It’s basically just chatting. I don’t let it write anything, but sometimes I purposely use the browser window instead of them sitting in my codebase, because I know it gets things subtly wrong and might focus on the wrong things.
People used to say "don’t copy-paste code, at least write it out", and I think that’s still true. It helps to build the mental model and to find the right abstractions.
Net margin versus gross margin.
Net shows profitability after subtracting all expenses, while gross only subtracts the cost of the goods sold. Putting the model training costs into a one-time fixed expense, outside the cost of goods sold, provides a much better gross margin.
This is known as COGS reclassification or classification shifting and is a common tactic to mislead investors.
This is why analysts look at Free Cash Flow Margin.
WorldCom and MicroStrategy did this before the Dotcom Bubble imploded.
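To make the gross-versus-net distinction concrete, here's a toy calculation. Every number is invented purely for illustration and the function names are hypothetical; none of it reflects any real company's accounts.

```python
def gross_margin(revenue, cogs):
    """Share of revenue left after cost of goods sold."""
    return (revenue - cogs) / revenue

def net_margin(revenue, cogs, other_expenses):
    """Share of revenue left after all expenses."""
    return (revenue - cogs - other_expenses) / revenue

revenue = 100.0        # hypothetical
inference_cost = 40.0  # hypothetical cost of serving the product
training_cost = 50.0   # hypothetical cost of training models
other_opex = 20.0      # hypothetical salaries, sales, etc.

# Case A: training counted inside cost of goods sold.
a_gross = gross_margin(revenue, inference_cost + training_cost)
a_net = net_margin(revenue, inference_cost + training_cost, other_opex)

# Case B: training moved below the gross line.
b_gross = gross_margin(revenue, inference_cost)
b_net = net_margin(revenue, inference_cost, other_opex + training_cost)

print(f"A: gross {a_gross:.0%}, net {a_net:.0%}")  # A: gross 10%, net -10%
print(f"B: gross {b_gross:.0%}, net {b_net:.0%}")  # B: gross 60%, net -10%
```

Same business, same losses, but the gross margin goes from 10% to 60% just by changing which bucket the training bill sits in.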
I have done succession planting in my home garden, but it's definitely not worth the time investment for the food alone. But it's real neat to see your aphid problem disappear as the nasturtiums pop up without any pesticides needed. You could even feed the world with it, if most everyone wanted to be farmers... (as opposed to some Organic practices, which are the same mass farming but with "naturally-derived" pesticides)
Compelling anecdotes are not even the main source of evidence. Look at the enormous body of work on measurement of these systems. I always point people to Epoch's capability index as a good summary statistic of capabilities, or METR's time horizon data, which has now been topped out. They had a recent update to the dataset, after which the corrected plots pointed to an even faster acceleration than before.
If you want indisputable, data-driven information about the state of the LLM world I guess you can wait for a peer-reviewed academic paper?
If AI could outperform humans, Anthropic would NEVER release that model. Instead, they'd use it to create a new Google, Photoshop, Office, Windows, etc. for cheap, then undercut all those companies and take over the entire software industry.
No, I'm saying that the claim you were making ("current models are better than some non-model based standard X") does not follow from your premise ("current models are better than past models"). It's possible that your claim is still true (although I don't think it is for most of the values of X that matter), but that wouldn't change the fact that the argument made is invalid.
As stated, your argument was basically the classic "my 3-month-old is now twice the size he was when he was born" meme, except as if the tweet claimed that the kid currently outweighed an elephant.
> That’s the inflection point.
No, it isn't. An inflection point is when the direction of curvature changes. If we crossed over into the diminishing returns part of the logistic function, that would be an inflection point (as would the case where we had been in the diminishing returns regime, but then progress went back to speeding up).
> Implication is a massive jump in adoption.
The point I made was that "a massive jump in adoption" doesn't actually imply "the models are actually good enough now", only that a lot more people think they are.
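The curvature definition is easy to check numerically. This small sketch uses the standard logistic function as a stand-in, which is an assumption, since nobody knows what curve model progress actually follows. It estimates the second derivative on either side of the midpoint.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for x in (-2.0, -1.0, 1.0, 2.0):
    curvature = second_derivative(logistic, x)
    print(x, "speeding up" if curvature > 0 else "slowing down")
```

The sign of the curvature flips at the midpoint (x = 0 here), and that flip is the inflection point. Crossing some "good enough" threshold on the way up doesn't change the curvature at all.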
To answer the sibling comment, all of these public accounts follow local GAAP or IFRS.
The US still astounds me with its willingness to allow corporations to rip people off!
Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet. Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
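The multiple follows directly from the per-million-token prices quoted above, which are taken as given here and not independently checked:

```python
# Prices in USD per million tokens, as quoted in the comment (not verified).
sonnet = {"input": 5.00, "output": 15.00}
kimi = {"input": 0.75, "output": 3.50}

multiples = {kind: sonnet[kind] / kimi[kind] for kind in sonnet}
for kind, multiple in multiples.items():
    print(f"{kind}: {multiple:.1f}x")
```

The input side sits near the top of the quoted 4-7x range and the output side near the bottom, so the blended multiple depends on how input-heavy the workload is.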
I'm not an expert in SQLite so I can't say if this is 100% correct, but it seemed directionally similar to the conclusion from Claude.
### TL;DR
- Authorizer + EXPLAIN: No — authorizer only sees SQLITE_INSERT, not VDBE opcodes
- EXPLAIN opcode analysis alone: Yes — Delete opcode at position 10 is the unique signature of INSERT OR REPLACE / REPLACE
I can't help but think the not-so-distant future will see language models expected on commodity personal computing devices.
No it's not. On some rigged paper maybe. Some such benchmarks say all models group together, which they clearly do not.
> Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
That's not saying much. You can get "cloud" at AWS and you can get a VPS. There is likely a 10x price difference. It's not the "same" product. And while AWS costs more, it doesn't have 7x margins either.
4-7x isn't a tiny markup, but how does that compare to high-margin internet businesses like AdSense? Meta and Google do hundreds of billions in ad revenue a year, and after taking out the publisher's portion (60-80% per some searching), I wonder what the ratio of the remaining tens-of-billions is against the compute cost and headcount required to run it.
And how much room for maintaining or improving that margin do they have if the cheap competitors also continue getting better? Is there a "good enough" point where the easier inference tasks are all moving to vendors massively undercutting them, and then they don't have the volume necessary to justify spending on further cutting-edge development?
Not saying they're equivalent, local models still decohere much quicker as the context grows in my experience. But... Interesting.
That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.
> resulting in systemic collapse.
Many people are noting the system is collapsing. Maybe it's not going as quickly as you expect, but there's definitely evidence of this from increased service outage frequency, billion dollar notes being passed in a circle between companies, open projects refusing AI contributions entirely because they're overwhelmed by crap, Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.
> Look at the enormous body of work on measurement of these systems.
It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.
I tried it a second time, and it spent a lot of time trying to figure out some authorization issue, so definitely not a slam dunk. I might run it a few more times for science. But while this is a new model it's also quite lightweight, and as hardware adapts and improves it seems inevitable that for many use-cases a packaged language model running locally will do the trick.
As a consumer you are often sending deposits or even the full cost of goods to companies some time before you receive those goods (in effect you become a creditor). You are also dependent upon some of those companies for service and repairs. It seems reasonable that you can check the finances of a company you are creating a business relationship with, I know in the past I've checked company statements.
You are unlikely to have significant enough sway to force that kind of disclosure. Small businesses as consumers have less legal protection and are similarly unlikely to be able to make disclosure a precondition of a deal.