Their "API pricing" is exactly the same as that of providers: https://docs.github.com/en/copilot/reference/copilot-billing...
Seems like folks would be better off with OpenRouter instead.
If there's no discount on credits (in terms of tokens per dollar) over other providers, I'm going to switch to a PAYG provider. If there's a month with little to no coding, I can pocket the $10. What incentive do they give me to stay on this plan?
Just got an email from GitHub saying they'll be raising prices for Copilot.
"To keep up with the way you use Copilot, we're transitioning to usage-based billing, and we want to give you enough time to prepare."
Man, it was fun having my tokens subsidized by Microsoft. If the prices go up too much, I guess I'll try Deepseek again.
Isn't this like saying "The Porsche you rented at $200/mo is now a Honda. But the price hasn't changed!"
Seems a massive loss for Microsoft. Presumably there's a further rugpull to come.
The interesting question is how long it takes enterprises to notice the capability/pricing tradeoff, and whether they respond by limiting access to the strongest models internally.
The part that worries me is that this market is still very early. Most developers and organizations are still learning how to use these tools effectively. Raising the experimentation cost this much may slow down the discovery process that makes the tools valuable in the first place.
On top of that, you've got 2,000 minutes of container runtime, so running cloud agents was included. As was the Anthropic Agent SDK mode via Copilot, which is very comparable with Claude Code - not identical, the Anthropic "modular prompt" is much leaner in the SDK version.
I can't say I'm mad, I got more value than I paid for. That said, going forward I'll probably go back to OpenRouter PAYG rather than a subscription.
I got a free 3 months of the Gemini £19 plan and I've been playing with it quite a bit. 3.1 Pro is a good model, I just find it slow. Flash, I think, I underappreciated until now.
> Plan prices aren’t changing
I was surprised it did not continue with an em dash followed by something profound that is changing.
Plan prices aren't changing -- the value you get out of it is.
I haven't been able to use my subscription much over the busy spring months, but I'm being charged every month.
I'd be tempted to keep the subscription if usage-based billing meant that I'd save money when I had less time.
But today, after hearing this, I cancelled my subscription.
"It" being the end of subsidization of tokens and plans (expected) but while lock-in to foundational models and cloud services is still lacking. Guess investors want their ROI sooner than later, given how big of a wrench the AI boom has thrown into global economics.
The background agents will also depreciate in value because their harness is a black box that isn't optimized for token usage at all. Rolling your own will be a better choice here.
How would that be? They are already charging as much as the underlying providers. They can hardly expect to have any customers if they are charging more.
I thought I was pretty familiar with available options, but no one in my circles ever mentions this product. It doesn’t seem to have much mindshare.
Has anyone used it? What’s your experience?
> In March 2026, Windsurf replaced the credit-based system with a quota-based usage system. Instead of buying and spending credits, your plan now includes a daily and weekly usage allowance that refreshes automatically.
With hindsight, per-request pricing makes no sense at all if an agent can burn a widely varying amount of tokens satisfying that request. These pricing plans were designed before coding agents changed the dynamics of token usage.
Usage-based paying for AI is 1000x crazier because you're not even getting a guarantee of the thing you pay for in the end. You have to keep feeding it prompts and hope it gives you the solution you want. You may end up with no expected result, yet you are paying for it. At least with texting, you got what you paid for.
I wonder how long it'll be before all AI costs are flat unlimited monthly fees or even free across the board, without compromise.
With this kind of pricing (Sonnet 4.6 has a 9x multiplier, previously 1x), it raises the question of why use Copilot to begin with.
You could easily just buy the tokens directly and have a lot more choice as well.
With this pricing change, I see no reason at all to stick with Copilot in principle, but I really need to solve this issue of IDE integration to move on.
> What is the benefit of using the Copilot Pro+ at 39$/month instead of using the Copilot Pro at 10$/month and paying for extra usage?
(I'm a copilot subscriber since 2022)
But what really surprised me most about Copilot is that it would bill you per question, with no regard for tokens. So if I managed to produce a prompt that gave me back an insane amount of tokens for something, which any Claude model would easily accomplish, you were giving me my money's worth at your own expense. The math is not gonna math out forever.
Opus 4.6 3x -> 27x
Opus 4.7 3x -> 27x
GPT 5.4 1x -> 6x
EDIT: only applies to annual plans
This is the project that I am working on: https://github.com/mohsen1/tsz
I would say it's a 1000x increase in price for agentic workflows.
The old plans were $0.033/request for Pro, $0.026/request for Pro+ and $0.04/request for pay-as-you-go. That discount is now gone. They even still advertise "5x the number of requests" for Pro+ over Pro.
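Working backwards from those per-request prices and the $10/$39 plan prices quoted elsewhere in the thread, the implied monthly request allotments line up with that "5x" claim; a quick sanity check using only the numbers already mentioned here:

```python
# Sanity check of the old per-request pricing quoted above, using only figures
# already mentioned in the thread ($10 Pro, $39 Pro+).
pro_price, pro_per_request = 10.00, 0.033
pro_plus_price, pro_plus_per_request = 39.00, 0.026

implied_pro_requests = pro_price / pro_per_request                  # ~300 requests/month
implied_pro_plus_requests = pro_plus_price / pro_plus_per_request   # ~1500 requests/month

print(round(implied_pro_requests), round(implied_pro_plus_requests))   # ~303, 1500
print(implied_pro_plus_requests / implied_pro_requests)                # ~4.95, roughly the advertised 5x
```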
It's a lot of stuff that makes me have to type less into the prompt, since it's already getting so much info from my editor
What we're seeing across the board is every software company tossing AI onto their name or sales pitch and no one understanding what that actually means. But we will spend money on it because of FOMO.
I really question whether we're reaching the end of the hype cycle. I wish I were brave enough to put money on it. It feels like there was a command from up top to 'do something with AI' and leadership is scrambling for resume-building projects instead of doing the hard work they should've done over the past two years at a people and process level.
I once asked it to do a comprehensive security review of our code. It churned for nearly an hour (and then produced 90% false positives). Insane that that usage was charged the same amount as me just saying "Hello".
It also helped build an intuition of what each model could do and which parts it was weaker at, because you could try them almost side by side, especially if one model's output wasn't great.
That said, these were all side projects so nothing truly consequential. OTOH, you might leave some extra perf on the table, but I found the models worked quite well with the Copilot harness.
Other than that Zed has a similar experience which is pretty decent.
* By which I mean the good one, whatever it's called now - the part of Copilot that used to be a plugin and is now part of VS Code, not the thing that has always been part of VS Code.
On my personal account, Copilot Pro+ still only gave me back Opus 4.7, whereas my work's Pro account still lets me use Opus 4.6.
So, my gut says, it's entirely possible that Pro+ will continue to have more segregation on model availability...
FTA
> Last week, we also rolled out temporary changes to Copilot Individual plans, including Free, Pro, Pro+, and Student, and paused self-serve Copilot Business plan purchases. These were reliability and performance measures as we prepare for the broader transition to usage-based billing. We will loosen usage limits once usage-based billing is in effect.
There's enough weasel wording here that I would expect only certain models get re-enabled on Pro.
e.g. lots of people seem to get good enough results from Opus 4.6; personally I prefer it over 4.7 in GH Copilot... locking that down to Pro+ would be, given this salvo of enshittification, a 'logical' move on their part.
> Users on annual Pro or Pro+ plans will remain on their existing plan with premium request-based pricing until their plan expires, however, model multipliers will increase on June 1 (see table).
Before:
- Opus 4.6: each premium request counted as 3 premium requests.
After:
- Opus 4.6: each dollar of usage consumes 27 dollars of Copilot AI Credits.
Given that you'll receive 19 dollars of AI Credits on the Business plan, that means you can probably say one "hi" to Opus per month.
I wouldn't mind a plan between Free and Pro that is just "all I care about is code completion and next edit suggestions".
1. Github could choose to grandfather in those plans and make no changes until those plans expire.
2. Github could offer, or the user could request, a pro-rated refund along with cancellation of the account.
3. Tough luck, those users agreed that Github could unilaterally change the ToS at any time.
But companies do lots of illegal things, and in general nobody takes them to court over it.
Turns out when a request can spawn tens of subagents and use millions of tokens over many turns of tool calls, suddenly GitHub Copilot has a massive financial problem on their hands.
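To put rough numbers on why that breaks, here's a back-of-the-envelope comparison of the old flat per-request price against what one heavy agentic request costs at token rates. The token counts and per-million rates below are illustrative assumptions, not anyone's published prices:

```python
# Back-of-the-envelope only: the token counts and $/million rates are assumptions
# for illustration, not GitHub's or any provider's published numbers.
flat_price_per_request = 0.04                       # old pay-as-you-go premium request price quoted upthread

input_tokens, output_tokens = 2_000_000, 150_000    # a plausible long agentic session
input_rate_per_m, output_rate_per_m = 3.00, 15.00   # assumed $/million tokens

token_cost = (input_tokens / 1e6) * input_rate_per_m + (output_tokens / 1e6) * output_rate_per_m
print(f"token cost ~ ${token_cost:.2f} vs flat ${flat_price_per_request:.2f} per request")
# token cost ~ $8.25 vs flat $0.04 per request
```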
If anything, these new multipliers are more transparent than anything OpenAI or Anthropic have communicated regarding actual costs and give us a more realistic understanding of what it's costing these providers.
The fact that we were able to get such a substantial amount of usage for $20/$100/$200 a month was never meant to last and to think otherwise was perhaps a bit naive.
This feels like a strategy from the ZIRP era of tech growth where companies burned investor capital and gave away their products and services for free (or subsidized them heavily) in order to prioritize user acquisition initially. Then once they'd gained enough traction and stickiness they'd then implement a monetization strategy to capitalize on said user base.
I've been wanting to get off MS more generally and this is good motivation. Will be playing round with OR this week.
I see statements like this as strong indicators that the sales people are wrapping up their work and the accountants are taking over. The land rush is switching to an operational efficiency play.
The only model I even used on Copilot was Sonnet, and now it's got a ridiculous multiplier.
At this point they might as well just charge per Million tokens like every other provider instead of having a subscription.
Also, the multiplier of 27 for Claude Opus 4.6/4.7 is way higher than the increase in API price would suggest.
I wonder why that is.
Provide cheap and unlimited access to Grok for programmers (hence the Cursor partnership/purchase for distribution).
-> This would drive massive revenue right before the IPO announcement, as if the company were growing like crazy
-> At a loss, but don't worry, we need these funds to build the biggest datacenter in the universe.
This announcement would create enough momentum to increase the valuation, and because of the merger of his companies, would save his X/Twitter investors from a tragedy.
-> Would also be a great service to Cursor investors and so on, who are stuck with their VSCode fork
Or if you're a business with multiple seats, these plans may be less efficient than raw API usage billing, since if anyone at your organization fails to use their full $19/$39 allotment each month, that's wasted money, whereas API credits are 100% utilized.
I don't think they've thought through the implications of this. Everyone should cancel and go usage-based billing with caps.
Also, Opus 4.7 seems like a model more intended to save Anthropic money than push the bar.
Not really sure why I would stick with Copilot after this, and increasing Sonnet from 1x to 9x for annual subscribers is highway fucking robbery. Very glad I didn't commit myself to an annual plan.
Will always be grateful for the greed of trillion dollar corporations that subsidized me.
I was using 100M+ tokens per day, $250 per day or so and only paying $160 per month to GitHub.
I cancelled my GHCP sub and switched to Codex last week, so far so good but I miss Gemini 3.1 Pro for UI work.
What's actually better in the CLI?
It's not turning consumption based because there are a ton of these licenses just sitting idle.
Due to data governance it will be difficult to move to a different provider.
At the same time, this price hike is so large that the ROI on copilot will be a net negative.
I think what will ultimately happen is that we will not pay Microsoft more than we currently do and we'll simply end up with less AI usage in the company and a reduction in productivity.
But they can't buy Cursor before their IPO, so that's that?
Perhaps they have too much compute because Musk overpromised and Twitter/Grok doesn't need that much compute after he nerfed the porn stuff?
One provider who was undercutting the market with non-standard billing model moving to a more standard billing and prices doesn't seem like that strong of a signal, other than that Copilot was underpriced.
I don't disagree with your other points though.
I don’t understand if this means they’re providing actual refunds or not. For them to straight up go back on their word this had to have been a major cost they didn’t exactly expect.
Save us Deepseek!
I don’t need the world’s greatest programmer for the types of vibe coding projects I actually build.
However, if compute keeps going up in cost, hiring skilled people who know how to utilize it becomes more important. This might save the tech economy.
Tbh I think it still works, but only because the new allowance will likely get used very quickly within a billing cycle - I'm expecting this change to increase our org's bill significantly, based on how many OpenRouter API credits I consume in a weekend using a single agent in a pairing style.
The pooling will only be useful if you have a bunch of infrequent/low usage users that you still want to have licenses.
I’ve “vibe-coded” some projects and when I start to find issues or go to refactor them I don’t have that memory of why decisions were made, because many decisions were never made.
Absolutely the cheapest way to get a lot of tokens through a solid harness for $10/month. Until now
I don't mind a PAYG model for a simple chat interface. But when it comes to actually producing things, you burn through TONS of tokens creating the wrong output.
Additionally, we got copilot for every user, including those that never write code or use AI tools.
Gosh, imagine getting to do that with your TV/Streaming subscription. Getting to pay one fee to access some set number of hours per month from any of the providers.
GitHub has the full power of Azure with their hosted models but it's not being passed to consumers.
Also heard of more and more people moving to Kilo Code or OpenChamber instead.
If I could run a local model comparable to even Sonnet 4.6 without shelling out $50K in hardware, I'd do it in a heartbeat. But all I have is 32 GB of RAM and an old RTX 4080.
Or am I not up to speed? Are there decent coding models that can run on dev laptops? Not that that's what you were suggesting by recommending a local model, necessarily; just curious.
If you are not on an annual plan, multipliers will be gone completely. You can see the rates that apply instead here: https://docs.github.com/en/copilot/reference/copilot-billing...
They explicitly stated that they won't be doing that: the multipliers go into effect in June for everyone, annual plan or not.
For example, the German Civil Code states:
Section 308 - Prohibited clauses with the possibility of valuation
In standard business terms, the following in particular are ineffective:
[...]
4. (Reservation of the right to modify) the agreement of a right of the user [TL note: this means beneficiary of the terms, e.g. the party or other subject of the contract] to modify the performance promised or deviate from it, unless the agreement of the modification or deviation reasonably can be expected of the other party to the contract when the interests of the user are taken into account;
I think VSCode only supports Copilot for "autocomplete" too
on top of that, you need GitHub Copilot for the PR reviewer functionality in GitHub
There's going to be a limit to how much they can raise prices, because someone can always build out a datacenter and fill it up with open source DeepSeek inference and undercut your prices by 10x while still making a very good ROI--and that's a business model right there. Right now I'm sure there's a lot of people who will protest that they couldn't do their jobs with lesser models, but as time goes on that will get less and less. Already right now the consumers who are using AI for writing presentations, cooking recipe generation and ELI5 answers for common things, aren't going to be missing much from a lesser model. That'll actually only start to get cheaper over time.
Also for business needs, as AI inference costs escalate there comes a point where businesses rediscover human intelligence again, and start hiring/training people to do more work to use lesser models--if that is more productive in the end than shelling out large amounts of cash for inference on the latest models. [Although given how much companies waste on AWS, there's a lot of tolerance for overspending in corporations...]
It has been years now, of cash injections, investors can't keep feeding the beast forever.
If/when it gets to the point where it can replace a skilled worker, the service can be sold for close to the same price as that skilled labour. But the AI can run 24/7, reliably, and scale up/down at a moments notice.
There's not going to be much competition to drive prices down, the barriers to entry are already huge. There'll likely to be one clear winner, becoming a near-monopoly, or maybe we'll get a duopoly at best.
Pretty sure that's what they will eventually do
Does it effectively bypass regional restrictions for you, so you can use something like the Claude API from unsupported regions such as Hong Kong, or does it still enforce the official providers' geo-restrictions?
I'm guessing they did that (and the 'temporary bonus credits') to make the pill easier to swallow for that side of customers.
How so? By all accounts I've read so far it uses more tokens overall for roughly the same results.
Inconsistent design patterns from page to page, half baked features, inconsistent documentation (but BOY is there ever a lot of it!), NIH ui component libraries that don't act like you'd expect. All that fun stuff.
It's like they speedran the worst parts of enterprise apps.
I use Claude Code, but I kept my Copilot subscription around mostly for really cheap usage of other models when I need to try a different one (which appears to be ending, in a sense) and also the autocomplete in Visual Studio Code which was really great across a bunch of files, I could make changes in one file and then just tab through some others.
I wonder what other good autocomplete is out there.
That's already the case if you can self-host an LLM; you don't even need a mythical H200: gamer-grade GeForce cards can get you a long way there (if this page is to be believed: https://www.runpod.io/gpu-compare/rtx-5090-vs-h200 )
...after RAM prices return to normalcy, of course - and then wait another 2 or 3 generations of GPU development for a 96GB HBM card to hit the streets - and also assuming SotA or cloud-only LLMs don't experience lifestyle-inflation, but I assume they must, because OpenAI/Anthropic/Etc's business-model depends on people paying them to access them, so it's in their interests to make it as difficult as possible to run them locally.
Give it 5 years from now and reassess.
And at some point even frontier model costs will hopefully come down (if there is still a meaningful difference between closed and open source models at that point) as all of the compute that's being built out right now comes online.
Yes, a lot of people (not me). Why? Well because that was the whole value proposition of these companies, relentlessly pushed by their PR and most of the media- rememmber it was something something Pocket PhDs, massive unemployment etc?
Based on what exactly? So far every time OpenAI, Anthropic or whatever has released a new top performing model, competitors have caught up quickly. Open source models have greatly improved as well.
I expect AI to be just like cloud computing in general - AWS, Azure, GCP being the main providers, with dozens of smaller competitors offering similar services as well.
Sometimes the multiplier increase is significant like for Claude Opus 4.6 from 3x to 27x (https://docs.github.com/en/copilot/reference/copilot-billing...), meaning using that model will use up a lot more „tokens“ (whatever the new word for it is)
TL;DR: Today, we are announcing that all GitHub Copilot plans will transition to usage-based billing on June 1, 2026.
Instead of counting premium requests, every Copilot plan will include a monthly allotment of GitHub AI Credits, with the option for paid plans to purchase additional usage. Usage will be calculated based on token consumption, including input, output, and cached tokens, using the listed API rates for each model.
This change aligns Copilot pricing with actual usage and is an important step toward a sustainable, reliable Copilot business and experience for all users.
To help customers prepare, we are also launching a preview bill experience in early May, giving users and admins visibility into projected costs before the June 1 transition. This will be available to users via their Billing Overview page when they log in to github.com.
Copilot is not the same product it was a year ago.
It has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions, using the latest models, and iterating across entire repositories. Agentic usage is becoming the default, and it brings significantly higher compute and inference demands.
Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable.
Usage-based billing fixes that. It better aligns pricing with actual usage, helps us maintain long-term service reliability, and reduces the need to gate heavy users.
Starting June 1, premium request units (PRUs) will be replaced by GitHub AI Credits.
Credits will be consumed based on token usage, including input, output, and cached tokens, according to the published API rates for each model.
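As a rough illustration of how that works out, here is a minimal sketch assuming credits map one-to-one to dollars of token usage at per-model API rates; the rates and the cached-token discount in the example are placeholders, not published numbers:

```python
# Minimal sketch of usage-based credit consumption, assuming credits track dollar
# cost at per-model API rates. The rates used below are placeholders, not real prices.
def credits_used(input_tokens, output_tokens, cached_tokens,
                 input_rate, output_rate, cached_rate):
    """All rates are in credits (~dollars) per million tokens."""
    return (input_tokens * input_rate
            + output_tokens * output_rate
            + cached_tokens * cached_rate) / 1_000_000

# Example: one mid-sized agentic task against a hypothetical model priced at
# $3/M input, $15/M output, and $0.30/M for cached reads.
print(credits_used(400_000, 30_000, 1_200_000, 3.00, 15.00, 0.30))  # ~2.01 credits
```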
A few important details:
Last week, we also rolled out temporary changes to Copilot Individual plans, including Free, Pro, Pro+, and Student, and paused self-serve Copilot Business plan purchases. These were reliability and performance measures as we prepare for the broader transition to usage-based billing. We will loosen usage limits once usage-based billing is in effect.
Copilot Pro and Pro+ monthly subscriptions will include monthly AI Credits aligned to their current subscription prices:
Users on a monthly Pro or Pro+ plan will automatically migrate to usage-based billing on June 1, 2026.
Users on annual Pro or Pro+ plans will remain on their existing plan with premium request-based pricing until their plan expires. Model multipliers will increase on June 1 (see table) for annual plan subscribers only. At expiration, they will transition to Copilot Free with the option to upgrade to a paid monthly plan. Alternatively, they may convert to a monthly paid plan before their annual plan expires, and we will provide prorated credits for the remaining value of their annual plan.
Copilot Business and Copilot Enterprise monthly seat pricing remains unchanged:
To support the transition, existing Copilot Business and Copilot Enterprise customers will automatically receive promotional included usage for June, July, and August:
We are also introducing pooled included usage across a business, which helps eliminate stranded capacity. Instead of each user’s unused included usage being isolated, credits can be pooled across the organization.
Admins will also have new budget controls. They will be able to set budgets at the enterprise, cost center, and user levels. When the included pool is exhausted, organizations can choose whether to allow additional usage at published rates or cap spend.
Plan prices aren’t changing. You’ll have full control over what you spend, tools to track your usage, and the option to purchase more AI Credits if and when you need them.
If you have questions, visit our documentation for individuals and for businesses and enterprises, and our FAQ and related discussion.
This is the VSCode autocomplete stuff right? Really enjoy this.
> Copilot code review will also consume GitHub Actions minutes, in addition to GitHub AI Credits. These minutes are billed at the same per-minute rates as other GitHub Actions workflows.
That sucks.
but now, you get literally nothing
In not-too-distant future we're going to be running better models on our phones than we can buy access to today in the cloud. Skate where the puck is going: soak the customers until that day comes.
* with a quota of 138 meters per hour, overage charges may apply
And yes, I need to find a solution for autocomplete. It used to be available in free tier of Copilot. Not sure anymore.
Personally I got CLI fatigue and am happy with Conductor for now, but things are moving fast in this space.
I am in the same boat. I tried looking for tab/auto-complete implementations ~ a year ago and it was pretty disappointing. If that has changed, would love to know!
Can you imagine ten months from now and you're still rolling Sonnet 4.6?
Cancel/refund is looking pretty good. They're doing refunds until May 20.
"To request a refund, go to Settings → Billing and licensing → Licensing, select Manage subscription, then choose Cancel and refund "subscription". (The phrasing varies slightly depending on your subscription ). This option will be available until May 20."
Not sure how it all works out. Currently trillion dollar companies can't make a native app for platforms. Everything is just JS/Electron because economics does not work for them.
And here companies can supposedly build GW data centers running very expensive GPUs and charge 1/10th of current prices. Sounds a little fanciful to me.
Even if SOTA models in the cloud are a few percentage points better, most work can be routed to local models most of the time. That leaves the cloud providers fighting over the most computationally intensive tasks. In the long term, I think models are going to be local-first.
(Unless providers can figure out a network effect that local models can't replicate).
You can pay with crypto though, which seems to be convenient for people under sanctions or with limited access, or if you are in low-tax jurisdiction (e.g. HK)
It still does make one wonder, why have seats at all though? If everyone is just in one big API credit pool - what do the seats/users accomplish?
We're putting other providers through the gauntlet. An M4 Studio or two running the latest Qwen3 or whatever counts for state of the art in open models is also looking a little more viable all the time.
Were you able to see AI-assisted coding savings proportional to the cost increase you're now going to get?
Companies removed people because AI-assisted coding was supposed to be cheaper, and now coding costs are going from a fixed $X to something non-deterministic. The post by Uber a few days back about spending 12 months' worth of money in 4 months tells a lot.
The only path forward seems to be open-source models, and since many companies won't use Chinese ones, that leaves Mistral as basically the only option.
Inference economics are going to be brutal in 2026 H2 when DeepSeek's new infra and model improvements come online, and Kimi launches K3. By brutal, I mean for OpenAI and Anthropic.
I do like the integrations with the IDE however, they are convenient for rapidly reviewing changes. I just need their terminals to actually work!
One of the largest employers publicly engaging in a project which has the outcome of depressing wages. It's easier to "get" if you don't take the trillion dollar gorilla at face value.
I had copilot mainly so I could write issues and throw agents at it, while I went off and did other things. Has been great for contained spot work.
At this point, I'll go ahead and leave it expire, and then consolidate between Codex and JetBrains AI. Especially since Xcode supports Codex with a first-party integration.
Which feels a bit like a kick in the pants for me as a developer that was primarily using Copilot for VS Code ghost text and very rarely used the Chat sidebar much less "agentic" tools.
Copilot Pro sort of made sense for my personal account when amortized across a year, but I don't want to "waste" $10/month on credits I won't use most months.
That said I think few people using openrouter are actually being selective about providers.
It took half a day to get my opencode setup working; it was not friendly. A lot of manual cross-referencing of models and providers. I was actually mainly optimizing for relatively fast providers. It's all super fragile and I'm sure half out of date; I have no idea if these picks are still fast, and no promise they are still the same price (pretty terrifying, honestly).
I'm mostly on coding plans so it doesn't super affect me. But man is it a bother to maintain.
It's a convenience cost, for sure, but it's not valueless in a fast-moving world. Certainly if you're comfortable with one provider and it's cheaper, do that.
But it's a really good UI for agentic coding. Not sure why more people don't use it. I've tried the others and keep coming back to Copilot Chat. It's a really good tool. Which is why the rugpull on pricing is so concerning.
In other words.
The bubble has burst. You're just in denial.
Why? There's an inherent efficiency advantage to scale, while the only real advantage for local models (privacy/secrecy) hasn't proven convincing for broader IT either.
Do inference providers have standardized endpoints, or at least endpoints compatible with Claude Code? Otherwise you're paying 5.5% on all your tokens just so it's slightly easier to swap providers (i.e., changing a few URLs)?
Apple still charges 30%. 5.5 seems pretty reasonable. /shrug I dunno.
One theory I think Matt Levine posited, is that SpaceX will go public with dual-class stock that gives Elon control even with a minority ownership stake, and will subsequently buy Tesla, which doesn't have dual class stock, making SpaceX the singular "Elon Musk company", with him having operational control despite being public.
So, lets do some honest evaluations:
1. The model itself is a non-deterministic engine of work with an unknown value; its real value is just magic.
2. The business model itself is a non-deterministic engine of profit with a known value; whatever the VCs have put into it _must_ be pulled out. If Ed Zitron's numbers are correct, circa 2030, it's several trillion dollars.
So do some matrix multiplication of non-determinism vs determinism, and realize that the value proposition for _you_ is only going to decrease because #1 can never outpace #2, ensuring enshittification captures a smaller and smaller whale.
We know this. This has been the last two decades of money extraction from software. It was OK when it was some 12 year old's parents' CC. But now it's you, or your business, that's going to either be squeezed for value or squeezed out of the market.
And everyone's squabbling about the color of the cost. ok
Yep, you can plug deepseek/kimi/minimax into claude code just fine. Or run everything through another harness like opencode instead.
Still worth it IMO to be able to switch from Provider A to Provider B if Provider A is having a bad day.
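For anyone weighing the "changing a few URLs" part: several providers expose Anthropic-compatible endpoints, so switching is mostly a base URL and API key swap. Here's a sketch with the Anthropic Python SDK; the base URL and model id below are placeholders, check your provider's docs for the real values. Claude Code itself reads the equivalent settings from the ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN environment variables, as far as I know.

```python
# Sketch: pointing the Anthropic SDK at a different, Anthropic-compatible provider.
# base_url and the model id are placeholders; substitute your provider's real values.
import os
from anthropic import Anthropic

client = Anthropic(
    base_url=os.environ.get("ANTHROPIC_BASE_URL", "https://api.example-provider.com/anthropic"),
    api_key=os.environ["ANTHROPIC_API_KEY"],
)

resp = client.messages.create(
    model="provider-model-name",   # placeholder model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this diff in one sentence: ..."}],
)
print(resp.content[0].text)
```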
Having some open weight deployment or vendor is also a good thing, because you may have domain specific tasks where you can get better results on domain specific problems with a quick finetune.
Unsloth makes it particularly easy. Open weight LLMs are incredibly powerful building blocks.
Considering most of the cost of producing a model is the upfront cost rather than the running one, I kinda still do.
The point was never to produce 4 frontier models per company a year.
That is not my experience. Each model since at least GPT-4 can fill up an entire context window. In fact, more powerful models can solve tasks faster, so their ratio of multiplier to API price should decrease, not increase.
For example, Claude Sonnet 4.6 has a multiplier of 9 and an API price of $15, which is 0.6 multiplier per dollar.
Claude Opus 4.7 has an API price of $25, so it should have a multiplier of 25 * 0.6 = 15 when extrapolating from Sonnet, but the multiplier is 27.
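Same arithmetic as above, just restated in one place:

```python
# Restating the comparison: multiplier per dollar of API price.
sonnet_multiplier, sonnet_api_price = 9, 15    # Claude Sonnet 4.6
opus_multiplier, opus_api_price = 27, 25       # Claude Opus 4.7

per_dollar = sonnet_multiplier / sonnet_api_price   # 0.6 multiplier per API dollar
expected_opus = opus_api_price * per_dollar         # 15, if Opus were priced like Sonnet
print(per_dollar, expected_opus, opus_multiplier)   # 0.6 15.0 27 -> ~1.8x steeper than the extrapolation
```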
> Also, they tend to use more thinking tokens.
That might be it. Is there any data on this somewhere?
We are paying for tens of thousands of those machines, although everyone knows they are stupidly expensive and incredibly slow.
Why does everyone care about gas prices I only ever pay $20 for gas?
When I see how fast Codex max thinking GPT 5.5 eats our enterprise seat credits almost anything else seems cheap (until we switch our live systems from 5.4 api to 5.5 api I guess)... good thing I'm not the one paying for those credits and tokens (which is probably how most of the money is going to be made on AI going forward, borderline free chatbots for normies are done)
Here's the oh-my-posh GH issue[0] in case your problem is similar but not solvable with a simple package update.
[0]: https://github.com/JanDeDobbeleer/oh-my-posh/issues/7029
That, and they have tool use issues.... https://www.reddit.com/r/LocalLLM/comments/1smzw6s/qwen35_a3...
I would check out the model mentioned in that thread, GGUF unsloth/qwen3.5-35b-a3b on Q4_K_M
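In case it saves someone a search, here's a minimal llama-cpp-python sketch for pulling a Q4_K_M GGUF like that straight from Hugging Face. The exact repo and file names are assumptions on my part, so check the actual unsloth listing before running it:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# repo_id/filename are guesses based on the model named in the thread; verify on Hugging Face.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/qwen3.5-35b-a3b-GGUF",  # assumed repo name
    filename="*Q4_K_M.gguf",                 # glob matching the Q4_K_M quant
    n_ctx=16384,                             # context window; raise it if memory allows
    n_gpu_layers=-1,                         # offload every layer that fits onto the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that deduplicates a list while preserving order."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```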
Deepseek API pricing is very low compared to Anthropic/OpenAI API pricing.
For many, the 300% difference in pricing may be difficult to justify, if the quality difference is very small. And there will be many tasks where the most expensive/the best model, is not needed. Currently many people end up using Opus 4.7/GPT 5.5 for many tasks without thinking about it.
That would be, even is, the smart thing to do.
Maybe in a world where these AI companies behaved with some semblance of ethics and user-friendliness they would be on even ground, but for anyone paying attention local models are obviously the future.
That said, it's worth noting I don't see how anything they expose will reliably help orgs plan costs for what is, AFAIK, a big shift in billing and cost planning.
> It forces you to pay at least $20
For better or worse that's public pricing, i.e. if you are coming in also negotiating VS for devs, windows/office licenses for the rest of the business and stuff like Azure Devops... a lot of their stuff gets cheaper if your company's IT procurement group is vaguely competent at negotiating. Not even talking bigcorp here I'm talking 500-1000 employee range.
Of course, very small orgs will suffer, but it does tie in with the theme over the last two weeks; anyone with a personal account is basically subsidizing the credits for the business accounts during the transition period.
Don't you still need to handle tokens with them? Also that's trivial.
> billing
Yes but you'd be paying for billing anyway.
> reliability
They increase reliability?
> middleware
Which you wouldn't need if you paid directly.
I'm not saying they shouldn't get 5.5%, but that list is mostly non-convincing.
> Apple still charges 30%.
3 of the 30 is for billing, with the rest mostly being gatekeeping with a fake justification on top.
Should we be blamed about uber destroying the taxi business, or airbnb the hotel one? Oh sorry, "disrupting".
Uber was dirt cheap, now it is the same price as taxis, and the people working for it (the "partners", not employees) have no social benefits.
Airbnb was cheap and humane, now it is THE cause for housing crises and massive residential property "investment".
The playbook of silicon valley is destructive, not disruptive.
It is by design aimed towards wealth accumulation. The ones with most money can capture the market, and make even more. It really is late stage capitalism.
And the more wealth inequality there is, the more pain, poverty and instability will be as well. AI will only exacerbate this.
Now, I have this high-resolution shiny object that can near instantaneously get any information I want along with _streaming HD video to it_ *anywhere*.
Even 15 years ago feels like the stone age. I can't fathom what it has to feel like for people in their 60s and 70s.
The only sad thing is trying to use tools in a VS developer prompt (and how could this not have been fixed ages ago, it's literally YOUR OWN flagship product). It knows how to launch the .cmd for it, but that's incredibly slow for single commands. Would be nice if I could tell it to just use an open terminal.
That's very close to a normal day in 1996. The biggest difference is I read the news on my phone instead of a physical newspaper. The news was not any more interesting or informative because of that. I guess I can also still do the loop reasonably well, but I'm a lot slower than I was in 1996 when I was a cross-country state champion.
My parents are closing in on 70 and I guess I can't speak for them, but I'm at least aware of the daily routines of their lives, too. Walk the dog, do housework, DIY building projects, visit kids and grandkids. Seems much the same, too, with the biggest difference being they're now teaching my sister's sons to play baseball rather than me, but shit, one of her sons even looks exactly the same way I looked when I was 7! The more things change, the more they stay the same.
5.3-Codex is really good enough, Sonnet 4.6 is good enough.
but surely the issue is on VS Code side, to do things in a way that work with people's shells as they are
other agent harnesses don't have the same problems with my shell
Housework meant no laundry machine, no dishwasher, and possibly no vacuum cleaner. That means hand washing everything, and beating rugs with sticks and brushes to get the dust off of them.
I am just over 50 myself and I agree with your points. Technology has changed, but life is largely very similar to where it was in the 90s. At least day to day. Attitudes are way worse now.
Don't think I'll be renewing though. The usage limits are low enough that I don't think this is worth it. One complex prompt while Americans are awake will wipe out your allotted tokens, it seems.
I'm finding Google's Gemma 4 even better though - seems to hold up the agentic loop better than Qwen.
All will load into 20 GB of VRAM. None are amazing, but they do just about work.
But that gives me a good while to determine if it's worth it or not. I've heard good and bad, so here's hoping for good or close to it.
I wasn't going to fork out $1000 on a chance it might be enough with a rough return strategy.
I agree though, it can't get cheaper than the cost of hardware; it's just that without sufficient documentation of the actual costs to run the cloud models, we can't really know what the "true" cost of each token is. I assume there's an economist out there somewhere who could figure it out though. Certainly, the cost should approach, at a minimum, that of an open-weights model running on a local machine.
I've successfully got Qwen3-coder-next to loop and generate sufficiently competent code, and from what I can tell, the difference between this and the cloud is how quickly the generation happens and perhaps how interactive it has to be.
When using opencode or copilot CLI, the error messages are displayed normally and it's possible to see what's going on. Under Pi, it sometimes just hangs, or Pi crashes with some bun stacktrace and that's it.
Copilot has introduced additional limits for Claude models in past month, and it's rather easy to hit it. Pi often doesn't show anything when this limit hits (although sometimes it shows the error, I guess it depends on Pi version).
Even with overhead and scaling for peak use and a large profit margin, any company with an ounce of competition will be vastly cheaper than self-hosting. And for models you can run yourself, there will be plenty of competition.
If people wouldn't use their services, nothing would happen. They would just go bankrupt.
So yeah, I'd say it's entirely people's fault. Because people just wanted to use their services without thinking what they're causing.
Customers who think only about themselves and no one else.
But - now they are easier - I can read books on an e-ink screen and pretty much instantly find what I want to read next. I get my news on a phone. I used to watch TV/movies broadcast or on tape rentals. Now, I have just about everything I could ever want available - without ADs... those were such a time-waster.
What has changed is that I have access to MORE information than my local (or school) libraries could ever provide - in a variety of more accessible formats. Whatever tools I need to get "work done", I can find a myriad of free and open-source options.
But - the overall days and household family routines are the same - now, instead of reading a paper book while waiting to pickup my kids (or other family members) "back-in-the-day", I can read my device, or connect with my DIY communities online on my phone - or learn something new. I don't have to schedule life around major broadcast events, I can easily do many tasks while I am "out-and-about".
Friction has been reduced.
I always wonder the views of older people. My parents are very technology forward and have been my entire life so it is difficult to gauge how different life is compared to when they were growing up.
It's easy to hear "Oh well I only had 640kb of memory and typed programs out of a magazine I got in the mail!" and see as distinct from having 'unlimited' resources and the internet.
Your insight is good ("The biggest difference is I read the news on my phone instead of a physical newspaper") that life sort of stays the same but the modality changes. People still go to the store like they did in the mid-1800s but now it is by car.
I wonder what our "industrial revolution" will be where the previous generation lived (ie: out in the country on a farm) totally different lives to the current (ie: in the city in a factory). Maybe when space travel and multi-planetary living is normalized?
Since I was there (young, but there), I want to point out that this crosses three eras which all felt quite different:
1978: typed programs in from a magazine or loaded from a cassette (16kB, TRS-80)
1983: loaded programs from a floppy (64kB, Apple ][ and C64 etc)
1988: loaded programs from a hard disk (640kB, IBM PC and Mac).
Exact years vary but these eras were only about 5 years each. Nobody had a floppy in 1978 but almost every computer user did by 1983; nobody had a hard drive in 1983 but almost everyone did by 1988.
When was this ever different? And do you expect it to ever change?
I am not shilling China, this is just what is happening right now.
This is the role of legislation, educated experts creating policies so that you don't have to do business analysis before making a purchase.
Would I pay 10x the price for tokens and be outcompeted by other companies, hoping that OpenAI will go out of business? This is entirely unrealistic.
Even if we argue that we can't require from every human being to understand what they're doing, I'd still argue that there are more people who perfectly understand it and don't care than people who have no idea how such a business operates.
> You cannot expect every consumer to be fully educated and aware of the consequences of their purchasing power.
Huh? I cannot expect that people understand consequences of their actions? What are we, animals? Of course sometimes things aren't simple, and we cannot predict that using some service will create some longterm effects that in the end will be harmful. Some things are hard to predict.
But some things are easy to predict and my point is that this was exactly this case.
I mean, now we all know what Uber and AirBnb did, and we still use them because we don't care (generally speaking, I've used uber maybe 3 times in my life, AirBnb never).
I do NOT want to have to research the business model of companies before I buy their products or services. I would like to outsource that to the government, and spend my time actually enjoying life.
Am I supposed to be invested in every change that happens around me?
What if I am a baker, using chatGPT to experiment with recipes and develop them. Am I supposed to read about LLMs, tokens, and the silicon valley playbook ?
No. I should not have to do any of those things.
I think the Chinese government works differently than the US government. I think China has been subsidizing their electricity grid for decades and leading the world on sustainable electricity, namely solar, while the US has let its infrastructure rot and laughed at government inefficiency for about half that time. The US has data centers running on gas right now while waging wars that blow up gas infrastructure worldwide. It would be comical if it wasn't an environmental disaster. Most of them have no hope of even getting enough power in well-established areas short term.
I realize what I am saying may come off as propaganda because the US holds net negative views on China so here are some links.
https://www.technologyreview.com/2025/07/10/1119941/china-en...
https://www.wired.com/story/data-centers-are-driving-a-us-ga...
I think because OpenAI spent so much money upfront showing how it was possible to do this and laid out a product roadmap, China got to get on board much more cheaply and easily. I see no reason not to believe any of these companies when they say they didn't squander tons of money to do what they did, because I don't know how OpenAI has even spent all the money they have; it's actually ridiculous to think about.
https://the-decoder.com/openai-adds-111-billion-to-its-cash-...
Unlike the US, China's focus is on research and sustainable building. China also has really good infrastructure for energy, etc. It is also to their advantage to drop 5 billion instead of 2 trillion and beat the US while turning a profit.
China's focus in AI is less flashy, and because they are the biggest manufacturing superpower in the world right now, it directly feeds their economy. They aren't looking for applications or to replace thought workers with slop bots; they have natural needs for this technology. US manufacturers can't compete, so they have to keep companies from selling their goods there, see BYD. China sees it as commoditizing their complement; the US is risking its entire economy and its environment and resources, kind of scary.