We'll begin restoring access tomorrow, and will share an update soon.
We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
From Anthropic on Twitter
All aboard the hype train!
Source: https://x.com/AndrewCurran_/status/2072103733715194048?s=20
-------
June 30, 2026
Tom Brown
Chief Compute Officer
Anthropic
548 Market Street
San Francisco, CA 94104
Dear Mr. Brown:
Since the issuance of my previous letters, dated June 12, 2026 and June 26, 2026, Anthropic has taken steps in close coordination with the U.S. government to address the risks associated with Claude Mythos 5 and Claude Fable 5. Among other things, Anthropic has agreed to proactively detect and address security risks associated with the models; to work diligently with the U.S. government on protocols and standards and releases for Mythos, Fable, and future models; and to inform the U.S. government of any malicious activity.
In light of these actions and commitments, as well as the Bureau of Industry and Security's evaluation of the diversion risks now presented by Claude Mythos 5 and Claude Fable 5, the controls in the June 12 letter are withdrawn. A license is no longer required for the export, reexport, or in-country transfer, including deemed export or deemed reexport, of the Mythos or Fable models.
Commerce reserves the right to reevaluate the decisions made in this letter and the necessity of reimposing a license requirement, should circumstances change or should Anthropic fail to adhere to its commitments.
If you have any questions about this letter, please contact me or the Under Secretary of Commerce for Industry and Security, Jeffrey Kessler, at (202) 255-1864.
Sincerely,
Howard W. Lutnick
------
In the end, we need actual laws that tell the market what kinds of models get paused / analyzed, how long that pause can be, etc.
Otherwise there’s no standard and it will be easily abused and prevent investment in US AI companies.
> After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8.
Edit: the above was from their tweet announcement at https://x.com/AnthropicAI/status/2072163884430229756 ... the associated blog post at https://www.anthropic.com/news/redeploying-fable-5 suggests it was just poorly written and coding can still be done with Fable, just with overeager bouncing of "some routine coding and debugging tasks" to Opus.
"Anthropic has agreed to proactively detect and address security risks associated with the models" LOL, this was already happening.
This clown car administration just keeps making shit up and then backpedalling in a way that just leaves everything worse.
are export controls the right thing ? Probably not.
but the american economy is over-exposed on "A.I" - the capital expenditure, while the Chinese are proving you don't need to spend tons of capital to get close to the frontier.
the Chinese have better building capacity & cheaper energy. that means the market has to correct at some point.
> Fable 5 will be available starting tomorrow, Wednesday, July 1, to users globally... Fable 5 will be included for up to 50% of weekly usage limits through July 7, after which it will be available via usage credits.
https://www.anthropic.com/news/redeploying-fable-5
It's nice that the restriction is going to get lifted, but I hope this doesn't make anyone complacent about the fact that their coding work is going to be scrutinized by the US government, with AI, when using these models.
Opus 4.8, you did a lot of good work for me, but in the name of all things holy... I will not miss your communication style. So long and thanks for the fish.
Now whether AI tech is in the same league as say Nuclear tech and therefore by any reasonable standard should be regulated is a different question.
We hit the slippery slope on a random day in June 2026 and there is no putting the genie back in the bottle. Any exec or manager that puts load bearing weight on top of Anthropic/OpenAI/Google/AmericanCorp frontier model deserves the stress.
So you use the frontier model, then when you can’t you accept things are less efficient. The alternative (right now) is to be less efficient all the time, I don’t see any advantage to that.
Yes 1000%, please, all my European competition please don't use mythos whatever you do it's total USA trash and the Chinese models work better anyway.
Hmm? The linked tweet was posted at 16:52.
If the Trump administration wants him to say something, he says it. Maybe what he is saying is true, maybe it isn’t. There is no way to know.
The story they are telling is exactly the same whether it was true or they were just shaking down Anthropic for no reason.
In past Empires kings bet their entire nation's future on the words of soothsayers, people who said they could predict the future. It seems like Machine Learning engineers are the magicians of the Empire of the modern age.
On a lark, I asked Claude to compare AI to the wild west a while ago. It raised three points of similarity:
- Land-grab economics
- Lack of regulation
- Changing social and professional attitudes.
Whatever it is, it's a wild ride regardless.
Looks like Anthropic paid the Danegeld. Now they'll never get rid of the Dane.
They almost definitely mean "you will notice even more false positives during seemingly routine coding/debugging tasks than you did at the initial launch". Which is not surprising, given the ordeal they've been put through. Hopefully it won't be too bad.
The main depressing thing for me is it's now only 7 days on the subscription, and then full API pricing, with no mention of even a plan to bring it back to the subscription in the future. (The initial launch mentioned two weeks of subscription, then API pricing, then a hope to return it back to the subscription not long after.)
"In the near term" is doing some heavy lifting.
> The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks. As with all our safeguards, we’ll continue to refine this to better distinguish genuine misuse from legitimate requests and reduce false positives.
There are many different factions within the administration. Sacks was part of the "deregulate the tech sector" faction, which on this issue is aligned with the "beating China overrides anything" faction.
That's distinct from the Pete Hegseth faction (I don't really know how to characterize his faction other than anti-woke maybe?).
Sometimes these factions agree, sometimes they don't.
In general your approach is right - you can't trust most things coming out of this administration. But you can try to unpick what actually happened by looking at who is saying what, and when. That is useful even without liking the people.
Depends on how economically useful AI turns out to be. It will be useful, but it needs to be VERY useful for the current valuations.
>In past Empires kings bet their entire nation's future on the words of soothsayers
I think AI's rise is much closer to the story of factory machines and computers than to soothsayers and emperors.
> The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks.
Here's Fable 5, the strongest model. Actually try to use it to harden your code and it turns into Opus 4.8. You have seven days to use it, and only half of that time's worth in actual usage. Enjoy.
Looks like it's going to be a thoroughly frustrating experience, even worse than initial rollout. For subscription users, the situation is almost indistinguishable from the export ban.
> Fable 5 will be available starting tomorrow, Wednesday, July 1, to users globally on the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. For Pro, Max, Team, and select Enterprise plans, Fable 5 will be included for up to 50% of weekly usage limits through July 7, after which it will be available via usage credits. We will re-enable access on AWS, Google Cloud, and Microsoft Foundry as quickly as possible.
EU is looking and charting its course already. Yeah, we can joke about it, we can mock it but it is in momentum already, one step at a time.
Of course, it's possible that Fable remains drastically better than 5.6, but to whatever extent Fable is the true frontier (if temporarily)... it makes me wonder if external commitments on compute put a hard deadline on how long they could run Fable on the subscriptions.
Honestly, why bother with it? They are effectively just releasing the model in-name, but we just get Opus 4.8.
Reading the full blog post, I think the summary was just poorly written (because it's hard not to read that sentence like all coding is redirected to Opus).
The west just doesn't know how to compete in the long run. The greed is eating itself up.
Fable 5 might not be accessible for sub in the future despite their "best effort".
And 5.6-sol is as expensive as 5.5, so highly probable to be kept in sub.
So what's the plan? Hoping people stay on ClaudeCode because of Sonnet 5 while Codex offers 5.6-sol to subscription peasants?
Seems risky
I only realised late that I had an algorithm problem that existing models were struggling with, and Fable had made progress with. It created a 14 phase plan, which I was able to execute with Opus after the restriction.
The SOTA frontier models have value elsewhere, not monetarily perhaps, but certainly per user. Quite a few cool things have come out of that brief Fable window. There should be more.
Looks like it's gonna be even harder to use than before, if not impossible. Subscription users only get it for a week, and only for 50% of that week's usage.
https://news.ycombinator.com/item?id=48466313
Just a code review of my own project. Downgraded to Opus 50% of the time while evaluating the critical I/O and memory safety parts, the exact thing I wanted it to do.
And now it's gonna be even worse.
I don’t agree with this at all. IMO Anthropic has shown that they are willing to take even significant financial hits in order to stand up for their values and mitigate what they consider to be dangers and risks. Some people don’t like that or think it’s just marketing. But that’s exactly what Incorruptible is about: companies that are willing to take a stand, even in the face of overwhelming pressure from competitors, shareholders and naysayers.
The switching costs of changing LLM providers are as low as it gets. All the individuals and startups I know try different models all of the time, even down to the level of choosing which provider to use based on the task. Bigger companies move slower but only because they have lawyers and teams negotiating contracts, not because there is a technical reason that it's hard to switch.
Companies have dealt with supply chain unpredictability by having multiple providers and switching between them since forever. It's infinitely easier to switch LLM providers than it is to deal with physical supply chain uncertainty.
But, as a frustrated EU resident lamenting the lack of a European option (Mistral is just not competitive enough), I will spread my money towards the Chinese models as well. Thank you Murica! You achieved your soft power by pushing us towards the Chinese :-)
This protectionism and hypocrisy (free markets and freedom!! Until it is us who needs to practice what we preach) is so tiring. I wish European nations would come together closer and put their differences aside and realise larger things together. Become the new power that the US is clearly stumbling away from being.
Nobody should be putting loadbearing weight on Amazon or Microsoft with their ruthless monopoly ambitions, yet here we are
A week or so pause from seemingly legitimate cyber security concerns isn’t cause for panic. But it should be backed by laws that describe what that process should be. That would put the market at ease
The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.
China has no flywheel for long-form agentic traces like Claude Code and its telemetry over its userbase (no one uses the Chinese harnesses yet). Most Chinese models are forced to price themselves significantly below cost to compete with the huge demand for bootleg claude tokens, because they're that much worse.
> routine tasks like coding and debugging will fall back to Opus 4.8.
It's pretty clear that they didn't want this anyway, despite what the conspiracy theorists want to believe.
https://www.mixvale.com.br/2026/06/26/fbi-warns-brazilian-po...
But, it is a big own goal, because once you invest in building evals for your internal use-case, 1) it’s easier to switch your model to whatever is cheapest, and 2) it’s way easier to fine-tune an oss model.
Evals are annoying to build and most companies were fine to rest on vibes. Now many companies have to do the work for insurance.
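For what it's worth, a first eval harness doesn't have to be much. A minimal sketch in Python, assuming a hypothetical `call_model(model, prompt)` wrapper around whatever provider SDK you actually use (stubbed here with canned answers so the snippet runs) and made-up model names `model-a` / `model-b`:

```python
from typing import Callable

# Hypothetical eval cases for an internal use case: (prompt, checker) pairs.
EvalCase = tuple[str, Callable[[str], bool]]

CASES: list[EvalCase] = [
    ("Classify: 'refund not received'", lambda out: "billing" in out.lower()),
    ("Classify: 'app crashes on login'", lambda out: "bug" in out.lower()),
]

def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real provider call (an assumption, not a real API)."""
    canned = {
        "model-a": {"refund": "Billing", "crashes": "Bug"},
        "model-b": {"refund": "Billing", "crashes": "Feature request"},
    }
    for key, answer in canned[model].items():
        if key in prompt:
            return answer
    return ""

def pass_rate(model: str, cases: list[EvalCase]) -> float:
    """Fraction of cases whose model output satisfies its checker."""
    passed = sum(1 for prompt, check in cases if check(call_model(model, prompt)))
    return passed / len(cases)
```

Once something like `pass_rate` exists per model, "switch to whatever is cheapest that still passes" becomes a comparison of numbers rather than vibes.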
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
If you make money from doing anything like "produce software with as little human involvement as possible", then sure, you need SOTA models. In that case, though, the value you add is very little and you probably don't have a sustainable business.
OTOH, if you make money by getting clients to pay for features, there is very little difference in time-savings from using Anthropic/OpenAI SOTA over GLM-latest.
IOW, if your business can only make money by one-shotting software, you probably don't have a business in the first place.
Regards, another small business owner.
Example. Yesterday I listened to the technical lead of a customer of mine digging himself into a hole by not understanding what it would mean to expose AWS EFS to their on-premise server over NFS. It was just too many unknown unknowns for him and he had no time to ask the AI (and even if he did I'm not sure that he could understand.) His boss, who actually used NFS, had to stop him. I didn't speak a word.
So, he could have coded the migration of a server from AWS to on premise, asked Claude to write also all the configuration scripts and policies but then what?
It's almost identical to the possibility of one model getting shut down for a business that doesn't care about SOTA.
At $JOB I have warned higher ups we should try to keep our expenditure under control, educate people that document slinging doesn't require Fable every time and demo the capabilities of the cheaper models, and been snubbed for it. When Fable is available once again our bill is going to be eye watering, relative to what it should be.
https://archive.is/9k7qt#selection-2001.41-2001.49 https://archive.is/dybOE
https://openai.com/index/previewing-gpt-5-6-sol/
I assume they did something to the model itself.
Either way, I do hope they lift those draconian bans. Using the model was a terrible experience because of the constant downgrades. I didn't manage to harden my own projects before Fable got banned.
When there isn't a zero-risk option, the question becomes which risk is smaller.
You could legitimately argue this is a unique situation, a brief window where cybersecurity is being disrupted by new harnesses + a strong model. But that will be fleeting as other models and products adapt very quickly, and the long term benefits of keeping it from the market are questionable at best.
It's not a coincidence the export control was dropped after Dario (who is a hardcore AI safety activist much like Ilya Sutskever) was replaced by Tom Brown in the government negotiations.
I agree that the US is falling behind in the areas you mention but that analysis fails to recognize the value of markets the US is dominant in.
Sonnet 5 today was incredibly slow for example
All the while you fight with its broken new classifier that triggers if the model is even thinking about writing secure code.
Apparently Anthropic cares nothing for their private users. This is insulting, and I hope they bankrupt after losing enterprise share to OpenAI's more efficient models.
https://en.wikipedia.org/wiki/Anthropic%E2%80%93United_State...
Anthropic was correct in their assessment and early warning of Mythos's capabilities, and they did this rollout pretty well. They were not hype marketing. They were being genuinely cautious and honest.
The Trump admin was largely unreasonable with the sudden export control. (Though not entirely unreasonable.) The export control also had not much to do with Anthropic's pre-release warnings. See: GPT-5.6 currently being held up by the federal government.
I expect the strong cybersecurity model to help me strengthen the cybersecurity of my project.
> not allowed to cover security topics
They said it wouldn't be usable for offensive purposes. This is the opposite of that.
I think the Fable ban happened because Anthropic was first to release a capable enough model.
if fugu/fugu ultra was good, why aren’t we hearing about how good it is? seems super slow and expensive, and everyone i’ve talked to who tried it gave up
For cyber and bio related requests it just refuses.
Sure… but which ones? How can you know ahead of time?
I just did a “simple” upgrade project where both me and the AI kept tripping over dead code, subtle typos, and difficult-to-trace live versus dead code.
Many times when I used “Medium” thinking I got bitten, but not every time, and I couldn’t predict when.
So “Extra high” it was, for the entire project.
Far fewer nasty surprises!
But for what I work on I mostly need high or xhigh SOTA model quality output. I don't have the time to deal with anything less.
There's no congress. There's no policy (they've been making noises about not allowing AI regulation and now they're not-regulating it like a child playing with an on/off switch). The law is whatever Dear Leader's mood is today. It overrides any contract you sign with private companies, and they roll over and take it, because that's how oligarchies work.
Fable will literally sabotage you if it thinks you're trying to compete with Anthropic.
I'm hoping that some relatively cost-effective self-hosting solutions come about as a result of Hopper hardware being sold off as they're retired from DC use.
A Chinese cybersecurity company, "360", has announced "China's version of Mythos".
I'm extremely left leaning myself but I'd rather not be able to tell who won the last election cycle by looking at HN and seeing whether comments containing phrases like this for the president are upvoted or [dead]. The only thing it aids is convincing people the guidelines are for selective application. Everyone who doesn't like ${currentPresident} will be unchanged and those who do aren't going to be convinced by constant casual name calling across the site - probably the opposite.
I usually also expect to get called out as only saying this when ${currentParty} is in power or when it only benefits ${awfulThingsAboutCurrentParty}, regardless which that is and what those are at the moment. I've started including this note and the searchable token "reallynotpartyrelated" when commenting such things for later reference - this paragraph can otherwise be ignored :).
I don't know what I was thinking.
There's a lot of subjectivity in determining this, but I'm 100% sure that 10 months is wrong.
I don't know whether the gap is currently growing, but I'm not sure it matters. There are thresholds where models reach certain levels of usefulness. Opus 4.8, for example, is at a level where I can give it relatively vague input, and it can go for half an hour on its own and produce a high-quality PR.
If GLM reaches that level of capability and can do that task more cheaply than Anthropic's model, I will use GLM for that task, because that's a specific type of task I use models for. It doesn't really matter whether Anthropic also has a better model, because what does "better" mean in this context? It's a clearly defined task, and Opus 4.8 already does it at a very high level of quality.
#1 I've had use cases where it was clearly obvious the Chinese models were behind.
#2 I've also had use cases where I couldn't tell a difference at 1/20th of the price.
The problem is that #1 is the use case where the American frontier is gated behind saboteur classifiers, and it is a tiny minority anyway. The vast majority of work is #2.
The gap doesn't matter anymore.
Yes.
If.
Man I hope this tech FOMO eventually stops.
Companies generally fail because either their product doesn't meet a market need, or the market doesn't exist in the first place (possibly because of bad timing), and not because their competitors simply outran them.
These aren't things fixed by using a frontier model to vibe code faster in lieu of one 5 months behind.
I think it’s excessively charitable to assume businesses are uber-competent ROI-chasers. The expense people are eventually going to win on AI too, this blip of unrestricted AI budgets will be gone soon.
> say GPT-4o to GPT-5.2, a transition I just finished on a not too complicated application
Neither of which is close to SOTA, because tasks like these are typically built in a cost-conscious manner which tries to keep token costs in check.
I’m primarily responding to all of the commenters who are acting like nobody is going to use American SOTA models for anything because the government interfered with them for a couple weeks. It’s obviously not true, and I expect these models to be oversubscribed instead of avoided like some are claiming.
- https://vgtimes.com/tech-and-hardware/159377-glm-5.2-open-ch...
- https://www.business-standard.com/technology/artificial-inte...
- https://www.timesnownews.com/technology-science/china-brings...
The cybersecurity model is Mythos, which was never made publicly available. It is only available to a list of US government approved companies.
> They said it wouldn't be usable for offensive purposes
No, they said Fable would refuse for cybersecurity and offensive purposes. You are conflating Fable with Mythos.
Fable 5, harden my openssl project. Then you use the diffs/summary to find out what the bug is for your exploit.
> Speed.
Speed of what?
Speed of understanding what needs to be done? I highly doubt it.
Speed of LoC checked into git? Sure, I'll give you that.
But one can use any number of tools to generate hundreds of thousands of lines of code. See any build tools which support specifications such as RAML, OpenAPI, CORBA, etc.
So I ask again; speed of what?
For actually building software, I'm starting to suspect a human with a dumber (but faster) model is going to get the job done quicker than Fable (and possibly even cheaper). Bug-finding and vulnerability detection is a different story.
Overview: https://www.bbc.co.uk/history/ancient/vikings/overview_vikin...
https://en.wikipedia.org/wiki/Northumbrian_Revolt_of_1065
He likely does not have the domain knowledge nor is authorized to be the recipient of such a letter.
And that's ok. His role is to hire others competent in export matters. It's a learning experience for them.
If you're the one-shotting type, obviously then Fable might be useful, but I think only marginally. You don't need to bring a MANPADS to a duel at high noon.
Even if you won't be able to use some model tomorrow, you can still make money by using it today!
And in the age of limited compute, spiky workloads and constant outages, building a mechanism to fallback to a weaker model when your primary choice isn't available is smart anyway.
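A fallback mechanism like that can be quite small. A rough sketch in Python, where the `ModelUnavailable` exception and the `primary` / `backup` client functions are hypothetical stand-ins for real provider calls, not any vendor's actual API:

```python
from typing import Callable, Sequence

class ModelUnavailable(Exception):
    """Raised by a client when its model cannot serve the request."""

def complete_with_fallback(
    prompt: str,
    clients: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, client) in priority order; return (name, output)."""
    errors = []
    for name, client in clients:
        try:
            return name, client(prompt)
        except ModelUnavailable as exc:
            # Remember why this model was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models unavailable: " + "; ".join(errors))

# Stub clients for illustration only.
def primary(prompt: str) -> str:
    raise ModelUnavailable("access paused")

def backup(prompt: str) -> str:
    return f"[backup] {prompt}"
```

Returning the name of the model that actually answered matters here: downstream code (and your bill) usually wants to know when it got the weaker model.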
Not trivial, you would need to do lots of evals and prompt tuning when you switch models.
imagine what happens when you optimize your agent skills to the current model, and new model starts breaking. you would need to have versioning for your skills, serving different skills based on the model while you do A/B testing
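To make the versioning idea concrete, here is one way it could look in Python. Everything in it is an assumption for illustration: the `SKILLS` registry layout, the skill and model names, and the hash-based A/B split are invented, not taken from any real harness.

```python
import hashlib

# Hypothetical registry: skill name -> {model name -> prompt version}.
SKILLS: dict[str, dict[str, str]] = {
    "code_review": {
        "default": "Review this diff and list bugs.",
        "model-b": "You are a strict reviewer. List bugs as bullets.",
    },
}

def pick_skill(skill: str, model: str) -> str:
    """Return the prompt tuned for `model`, else the default version."""
    versions = SKILLS[skill]
    return versions.get(model, versions["default"])

def ab_bucket(user_id: str, rollout_percent: int) -> str:
    """Deterministically assign a user to 'new' or 'old' by hashing the id."""
    digest = hashlib.sha256(user_id.encode()).digest()
    # First byte scaled to 0..99, so the same user always lands in the same bucket.
    return "new" if digest[0] * 100 // 256 < rollout_percent else "old"
```

A caller would use `ab_bucket` to decide which model a user gets during the migration, then `pick_skill` to serve the prompt version tuned for that model.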
Does anyone know why? I was really excited when they emerged, but their models and targets don't seem to be quite in the same market.
On top of that, the intelligence is being dialed down. Sonnet 5 is living proof of this. Fable has strong guardrails, but the new Sonnet is a dumbed down expensive model, which already falls behind GLM 5.2 and Kimi 2.7. I might go back to Claude since I know Fable is just a limited offer, and I am not going to pay for API usage. But what they are signaling with Sonnet will also come to Opus. A lobotomized more expensive model.
I am honestly baffled how the current administration is giving the whole world, on a golden plate, to China. And they don't seem too bothered about it. They are living in their own bubble and reality distortion field I guess.
I could go on an endless rant about Dario, but I feel I am so strongly biased now that my judgement might be clouded.
Time to move on
It is interesting to hear a European exclaim they would rather depend on a selection of models from companies in China with concomitant strings attached, rather than be dependent on a selection of models from companies in America.
Isn't it better to simply stick to whatever is best and then, should it be pulled from under you, simply switch out to the new best model that IS available? I don't know that models have a moat and you can easily swap out should you need to.
Pre-emptively betting on which is going to be least susceptible to government intervention seems like premature optimization to me.
I feel like EU could start a company, start from available open weight models, feed 2bln a year into it (1% of the EU budget) and make a compelling almost SOTA model for the EU market. This company could partner with datacenter providers and sell it hosted in the EU or somewhere else with EU protection terms. The budget for this company would easily double with the added revenues and you are creating an ecosystem of providers that can compete with US big-techs and have a 500 million people market that can't wait to ditch US companies for them, given the current mood.
The model can be open weight and it's an easy way to compound the efforts we are seeing in China without even having to talk to each other. Maybe there is a way to make it work not open weights but I am not sure how would that work.
These are the kind of decisions that seem like such no-brainers to me, which probably means I am completely out of touch with reality.
Until it goes down, or Anthropic raises prices again.
Fable is already expensive to use compared to GLM and they want you to use the API as much as possible so you get a worse deal.
The reality is this is world-ending technology and absolutely nobody knows what to do or can even agree that the problem exists.
The H200 was released Nov 2024.
Even allowing for Jensen exaggerating the risk there is no way China is 7-10 years behind.
Looking at manufacturing process nodes, SMIC N+3 is a 5nm process. 5nm was introduced by Samsung and TSMC in 2020 so at most that is 6 years.
But the chips they can produce on it are "roughly level with Android flagships from three years ago"[2]
TL;DR: China is more like 2-4 years behind than 7-10 years. If China developed EUV lithography then all bets are off.
[1] https://www.reddit.com/r/LocalLLaMA/comments/1kxw6b9/nvidia_... - see video.
[2] https://www.tomshardware.com/tech-industry/semiconductors/se...
We got the first news about Mythos in March, so it is likely that it was already close to ready by the time Opus 4.6 was released.
So the actual gap is the time elapsed between March (or April for the official announcement) and whenever Chinese models can match Mythos.
How is this different than any business with something to lose saying a competitor isn't as good? Not saying it's false, but it would seem to me that it's more important how customers feel about the issue.
I've heard half a dozen people talk about how a less advanced model coupled with a better harness outperforms a smarter model in the last few weeks.
If the USA wanted to shoot its AI industry in the foot it achieved its goal.
And you seem to think "no one uses" DeepSeek's v4, z.AI's GLM 5.2 or Xiaomi's MiMo 2.5 from their official APIs when they probably dwarf Anthropic's usage and are widening the gap due to conquering a chunk of Western market too.
I know it's hard for some to comprehend there's an entire Eastern hemisphere in the globe with billions of people, so it's worth reminding. And some seem to think the world is basically silicon valley even.
Assessing quality of output is often not trivial, either. Typically, problems that are solved by offloading something to an LLM are super subjective, and the customer's “feel” that something is different is a vulnerability.
We try to quantify output differences by many different similarity metrics. But a lot of energy goes into subjectively evaluating if something still works.
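As one example of such a metric, here is a sketch using Python's standard-library `difflib`. The `flag_drift` helper and the 0.8 threshold are arbitrary illustrations, and plain string similarity is only a crude proxy for whether output "still works":

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

def flag_drift(old_outputs, new_outputs, threshold=0.8):
    """Return indices where the new model's output diverges from the old one's."""
    return [
        i
        for i, (old, new) in enumerate(zip(old_outputs, new_outputs))
        if similarity(old, new) < threshold
    ]
```

The flagged indices are the ones that still need a human to look at them, which is where most of the energy goes anyway.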
Why would you expect a typical policy decision to be reversed within 3 weeks? If policies are going to be reversed within 3 weeks just don't do them in the first place.
I wonder where the market sizes will shake out for these different types of use cases? I am guessing right now 1 is bigger than 2 but not for long (by token volume)?
For example, I have software that summarizes articles and classifies links on webpages to build a synthetic RSS feed, both of which use LLMs, neither of which need a SOTA model.
I'll probably use LLMs to bootstrap a dataset of native ads in articles, and there again, I don't really need a SOTA model.
If it's for more open ended tasks like writing code though, I agree that at this point SOTA models make more sense to use.
fixing more serious regression also easier. connect honeycomb mcp, ask agent to debug while i walk to coffee and get some pistachio rose dates. by time im back with my oat latte ive got a full report on what happened and can send the next slack message to fix.
life is good
In some cases they do. I work in a B2B vertical SaaS company and there’s both features that competitors build or rough edges around our features that make clients go „either we get X or we sign with someone else”. I agree though with the general sentiment that you don’t need SOTA models to build those - humans or humans + mid pack strong model will do.
Couldn’t we just train smaller models to “translate” what the harness user wants to what the worker model expects? I mean, if models understand caveman, it seems like just a small stretch
The AI model people choose today has no bearing on the ultimate trajectory of the competition. Both the US and China understand this. EU simply can't move quick enough to be competitive in this type of game, which I think they also recognize.
Everyone is betting that the model you use will be a Hobson's Choice[1] over a long enough time horizon. They are likely correct.
What strings? The Chinese models are open weight, you don't have to spend your money directly with those labs. They can be hosted within the EU, by EU companies without sending a dime to China.
The bigger question is does the EU have the appetite to invest in building out data centers/hosting infrastructure for this, and that's where I have my doubts.
Huh? Sonnet 5 is a strict improvement over Sonnet 4.6 at the same price.
In terms of cost-benefit, they are already the best models I could find.
This "only super special corporations get the model" nonsense is dividing society into haves and have-nots.
I guess the underlying issue is that there is this model that is very capable, but it's being hobbled because of a fear of abuse. It may well be justified, but for a legitimate user any restriction just makes it a worse product and after all the puffery around how good it is (and some practical experience of how good it is) it's a pretty shit experience. "Here's our best model, no you can't really use it".
I can only imagine what people are doing at their jobs with unlimited token budgets.
The opportunities available for these people are rapidly, rapidly shrinking. I believe it's possible to be a developer today who's EXCEPTIONAL and never uses AI. Most opponents are not exceptional, though, and even these opportunities are shrinking.
Most exceptional developers in my org adopted AI in their workflows and went from 10x developers to 20x developers.
If you refuse to adapt, you're going to be out of a job complaining about the kids and their newfangled technology REAL quick. You have a few years remaining, maybe less.
I am appalled none of this is clicking with you anti-AI folks. This is all so exciting -- alarming even! --, and software careers are never going to be the same.
I don't know how you just metaphorically stand there and act like nothing at all is happening. We've never seen anything like this in our entire lives.
Some of you are standing right in front of the steam roller, yelling to all of us that steam rollers aren't real.
It sounds like your business is selling completely agent-coded products. I don't know how long that will be viable, or even if it is right now.
In my part of the world, I am completely unable to sell completely agent-coded products, so even a SOTA model is useless. The majority of my time is spent on analysis outside of coding anyway, so when I bill it's not based on how many lines of code I've added, it's based on whether the goal of the customer is satisfied.
One of the contributing factors that led to this control in the first place was that the commerce department couldn't get Dario on a call immediately:
"Then White House started reaching out to Anthropic to speak with Dario Amodei, who was at a wellness retreat.... When Amodei was finally available past 1pm, he had three tense phone calls with a combo of ppl including Cairncross, Bessent, Lutnick, Kessler, Will Scharf, Richard Walters, and Walker Barrett."
https://x.com/SophiaCai99/status/2065942612293365948
Anthropic has disputed that Dario was at a wellness retreat but both sides seem to agree that it was a problem (and it is very apparent that Dario's response made things worse).
> Europe becoming more reliant on the Chinese is not the answer, and will, if continues, isolate the EU from the US
There are sound reasons to avoid reliance on China, but the risk of isolation from a fading superpower - who befriends the EU's enemies, agitates in EU politics, inflicts needless damage on the EU's economy, and insults EU leaders - isn't one of them.
China should be dealt with as a normal country. There's no need for undue anxiety there.
The EU as a trade bloc should exercise reciprocity and protect its own interests accordingly though.
As for LLMs, I see no issue in using Chinese models. With the talk of digital sovereignty, you can run open source models on EU datacenters without necessarily having to spend the money to train them.
> isolate the EU from the US.
That is not a bad thing. In fact, I hope this separation grows stronger.
It was about time European countries lifted themselves from the US shadow.
Mistral focuses on long term b2b contracts and their proposition is that they fine tune their model to your needs with an added bonus of 'not dependent on America' in a politically tumultuous time.
The USA is far more dangerous a "friend" than China is an acquaintance. China has not been threatening military annexation, China does not randomly start trade (or real) wars. China doesn't just turn away from international commitments.
Bottom line: China is a far better international partner than the USA.
If that's a comfortable position for you, all good.
We held the US in higher regard, that's all.
That's not how market-based economies work...
> feed 2bln a year into it and make a compelling almost SOTA model
...and the reason is, if you give a bunch of people €2b a year and tell them "go try and make something", they'll make a ton of paperwork covering their asses and very little actual output.
This is irrespective of whether those people are European ("european google killer"), American ("cost plus" old US aerospace companies) or Chinese (which is why they do it a little differently).
If there are no incentives to really try really hard, they won't do it.
In many high-tech cases in Europe, the formula for "let's subsidise the hell out of research and hope a commercially-viable business comes out" has a really poor track record.
Your second option - and possibly the best bet - is to find an existing company that has already shown it is capable, and shower it with money, which is what the French are doing with Mistral.
The reality is that the "people in power" believe it is "world-ending technology" and will therefore use it in world-ending ways. People are absolutely 100% the danger here, not the technology.
Let's face it - all bans were dumb. They just gave China the legal (per WTO rules) justification to start producing everything domestically. The bans work as a reverse tariff, as a protectionist measure that actually protects your competitor. If China did those, others could bring China to court at the WTO. But the US did that, so nobody can sue China.
Why would Anthropic get the benefit of pre-release models counting toward their lead, if nobody else gets to count their pre-release models?
If you’ve got a product where the budget allows for Fable level token costs, I doubt you wouldn’t have the budget to run your evals again on a cheaper model if Fable was unavailable. I mean it wouldn’t even take that much token volume to turn it into a money saving proposition to do the engineering work to switch to a cheaper model.
Fable is primarily used for human in the loop tasks like coding or office work, not in some backend app unless the company has money to burn and doesn’t care about anything other than using the best model available at the time.
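The break-even argument above can be sketched as simple arithmetic. This is only an illustration: the function name `breakeven_tokens` and every figure below are hypothetical, not real prices or real migration costs.

```python
def breakeven_tokens(migration_cost: float,
                     price_expensive: float,
                     price_cheap: float) -> float:
    """Token volume (in millions) at which switching models pays for itself.

    Prices are per million tokens; migration_cost is the one-off engineering
    cost of re-running evals and adapting prompts for the cheaper model.
    """
    saving_per_million = price_expensive - price_cheap
    if saving_per_million <= 0:
        raise ValueError("the cheaper model must actually be cheaper")
    return migration_cost / saving_per_million


# Hypothetical figures, not taken from any real price list.
volume = breakeven_tokens(migration_cost=20_000.0,
                          price_expensive=50.0,
                          price_cheap=10.0)
print(f"switching pays off after {volume:.0f} million tokens")
```

Under these made-up numbers the switch pays for itself after 500 million tokens; the bigger the price difference, the sooner it pays off.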
Even the premier EU companies such as ASML are heavily reliant on US supply chain.
But why can't we be bitter?
This may change in the future as AI gets more commoditized and the current US admin keeps shooting itself in the foot but they are still very far ahead right now
There's enough money and scale on the line that software affinity like CUDA is no longer the deciding factor and there's margin for custom stacks.
Even more so after the USA GPU exports ban which is proving to have backfired by speeding up China's tech growth.
First the US blocked China from buying NVIDIA's H100, but allowed NVIDIA to sell them a China-special nerfed H100, the H800
Then the US blocked the H800
Then the US realized that China was indeed accelerating their US independence, so it did a U-turn and has now approved the H200 (more powerful than both the H100 and H800) for sale to China, on a case-by-case basis
However - and here is the real kicker - China themselves are now blocking H200 purchases since they want the acceleration towards Chinese homegrown solutions to continue, and now we have Chinese models being served on Huawei Ascend chips, with next generation Ascend 750 chips (using CXMT made memory) targeting training currently in testing.
Now we have Apple asking the US government for permission to buy memory from CXMT given the global shortage!
Deepseek and Kimi are writing paper after paper with substantial architecture improvements for efficiency, because they can't just throw more hardware at the problem.
And China is now doing something on the hardware axis; which it may have never explored were it not for the sanctions.
But exactly which point in time is z.ai compared to claude.ai? Consistently being "6 months behind" in an exponentially accelerating evolution means the gap is growing exponentially wider, not constant.
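One way to check that claim, assuming progress really is exponential (which is itself an assumption, as are the growth rate and the single "capability" number used here): with a fixed time lag, the absolute gap grows exponentially, while the leader-to-follower ratio stays constant. A minimal sketch:

```python
import math


def gap_and_ratio(t: float, lag: float, rate: float) -> tuple[float, float]:
    """Leader capability is exp(rate * t); the follower trails by `lag`.

    Returns (absolute gap, leader/follower ratio).
    """
    leader = math.exp(rate * t)
    follower = math.exp(rate * (t - lag))
    return leader - follower, leader / follower


# Hypothetical numbers: time in months, a 6-month lag, growth rate 0.1/month.
for month in (0, 12, 24):
    gap, ratio = gap_and_ratio(month, lag=6, rate=0.1)
    print(f"month {month:2d}: gap={gap:8.3f}  ratio={ratio:.3f}")
```

So whether a constant lag counts as "falling further behind" depends on whether you measure the gap in absolute or relative terms.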
A couple: usually 2, though not always
A few: 3, 4, 5
Several: 4, 5, 6, or 7.
Can you comprehend that Anthropic is winning because it is both cheap (subscriptions) and better SOTA? People are cheering China providers when in reality they would rugpull open weights the moment they are competitive.
China models are trash, that's why they are giving them away for free.
For individuals and small companies, subscriptions are the best deal; for big companies, China models are a big no unless they can host them.
TBF I do burn 200k tokens just preloading the context with onboarding, not including any code, just document trees of development policy documents, style and architectural standards, code and documentation review processes, company ethos and culture, etc. it’s a token fire, but it really works for us.
Also, documentation driven development all the way down.
(not necessarily implying it's conscious strategy, authoritarians tend to be actual incoherent dumbasses)
They are overused in sitcoms because it’s easy for actors to mimic on demand unlike several other reactions.
If you want to climb into Xi Jinping's garden where he has absolute uncontested unilateral control for life, well, be warned.
But I think what you’ll see is people making sure the model they use can just be plugged into their workflow.
I used to use Gemini-cli until they did a Google and cancelled it in favour of anti-gravity.
That was my fault. Fool me the 10th time shame on me.
So I picked more open source offerings that I can use with any model. Once the other models are good enough, I just need to jump ship.
I've stared at ugly LLM code that I had just generated and that worked well enough for my purposes (generally, some quick recursion into a nested Python dictionary in order to dig out some property, especially for linting or quick data analysis).
And I wanted something better, sure, something a bit more readable ...but I just needed it to work well enough to recurse through a yaml file for config file linting, not be battle-hardened against every test case.
So to deal with the mess, I shoved it in a pure function, threw a few basic sanity unit tests around it, put a comment with a disclaimer of "#this is LLM generated code, it is lightly tested, do not use it for anything truly load-bearing without a lot more tests" and I moved on to something else.
Not everything has to be bulletproof.
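For illustration, a minimal sketch of the kind of helper described above, not the commenter's actual code. The name `find_key` and the sample config are hypothetical; the config is shaped like what a YAML loader would return.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested dicts/lists.

    Pure function: it only reads `node` and never mutates it.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain more matches.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical config, as a parsed YAML file might look.
config = {
    "service": {"name": "api", "timeout": 30},
    "workers": [{"name": "a", "timeout": 5}, {"name": "b"}],
}

timeouts = list(find_key(config, "timeout"))
print(timeouts)  # [30, 5]
```

A couple of sanity assertions around a function like this are cheap precisely because it is pure: same input, same output, nothing else touched.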
That's irrelevant. What's the increase in revenue?
I can’t turn 10x work into 20x work because my Product Manager thinks changing fundamental premises of tasks I already spent two weeks on (mostly removing human blockers) is very simple. After all, when he asked Claude to update his prototype, it only took it 10 minutes.
I can’t turn 10x work into 20x work because the company dedicated entire teams to write company-wide skills for everything. They suck, but if I don’t use them, I’m not following the new “golden path for engineering”, and I lose points in my performance review.
I can, however, turn 10x work into 20x work, or even much more than that, if AI actually did what it’s promising and eliminated most of my team, the product manager, and the middle managers. Or me. I could use a break.
Speed of what?
With ad hominems and a non sequitur. How about I narrow the question with the hope it engenders a relevant response: How do LLMs increase the speed of a person understanding what needs to be done?
0 - https://en.wikipedia.org/wiki/Straw_man
For all of you people who think these LLM models are “earth shattering” how the hell do you reconcile that it’s a net positive for anyone but those who want to consolidate knowledge and power.
We are really looking at idiocracy in the making.
It's shocking to me that Anthropic seems to be run with the same managerial chaos as depicted in early seasons of Entourage.
Dario may be a genius, but when it comes to running a big business — which involves dealing with governments and regulators — it's like he just fell off a turnip truck.
Maybe not in Europe, but ask their Asian neighbors.
> The USA is far more dangerous a "friend" than China is an acquaintance.
That's true and will continue to be true for 2.5 more years. European countries too have had bad leaders (like Germany), but have recovered. So too will the US.
> China is a far better international partner than the USA.
China is not a democracy and does not share western values.
there's no shortage of talent in Europe or France, it's just an issue of available capital
They've been doing military annexation right now in the South China Sea.
> China does not randomly start trade (or real) wars.
The invasion of Vietnam? The subsidization of industry and pegging their FX?
> China doesn't just turn away from international commitments.
Abandoning Ukraine despite being a signatory to an agreement that assures their defense?
This is not an anti-China post. I don't like anti-XYZ country posts that create tension and make people defensive. I am not particularly against China more than other major powers. They have their interests and they pursue them selfishly, like other countries do. This is just a basic lesson about the world you live in.
The only difference between using a slower chip such as H100 (or Huawei's Ascend 750) vs NVIDIA's newer Blackwell chips (B200 etc) is that you need more of the slower chips to achieve the same total FLOPs in your cluster. It has zero effect on what models you can run on it.
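A back-of-the-envelope sketch of that trade-off. The per-chip throughput values are hypothetical placeholders in arbitrary units, not vendor specs, and the sketch deliberately covers only raw aggregate FLOPs: it ignores interconnect bandwidth, memory capacity, and power, which also matter when building a real cluster.

```python
import math


def chips_needed(target_flops: float, flops_per_chip: float) -> int:
    """Smallest number of chips whose combined peak throughput meets the target."""
    return math.ceil(target_flops / flops_per_chip)


# Hypothetical per-chip throughput in arbitrary units, NOT vendor specs.
FAST_CHIP = 4.0
SLOW_CHIP = 1.0
TARGET = 1000.0  # desired aggregate throughput, same units

fast_count = chips_needed(TARGET, FAST_CHIP)
slow_count = chips_needed(TARGET, SLOW_CHIP)
print(fast_count, slow_count)  # 250 1000
```

With a chip a quarter as fast, you need four times as many of them (and roughly the power and floor space to match) for the same nominal total.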
Oh? Exponentially accelerating, huh? That's quite a surprise, to me.
The tradeoff is worth it. They’re even publishing papers which blows me away — their efficiency gains quickly become incorporated into frontier models because they are open sourcing them. They would be aggressively pursuing the same chip pipeline strategy as they are today.
US is lagging in efficiency work because the ROI is better elsewhere for us. We have the same tier of talent, once the script flips so can the research.
HN is full of contrarians and folks who don't know what they're talking about in regards to AI.
Gaining parity on the semiconductor fab front has been official government policy as part of their Five Year Plan for at least the last decade, straight from the Politburo. They were always going to go down this path, and with AI playing front and center on their upcoming plan, there’s even more pressure.
There was never a possibility of them not exploring it.
Of course they have ways around this -- you can get black market GPUs and also API costs are SUPER cheap there -- they hack the subscription model, bundle a bunch of user accounts, and route API requests through them.
And yes they are getting to parity with US technology and will get there in a few years, they have decent chips but still not the quality of NVIDIA.
It's really a very complex situation
I had to explain this to my German friend. In my understanding this isn't about the actual number, it's about the certainty. If it's absolutely and definitely two, then I say two. If I'm uncertain but it's probably two, or if a non-integer, somewhere around two, then I say couple.
And few is more likely to be 3 than 5, because 5 is getting close to a "half-dozen or so", or (as you say) several.
Many is very context-sensitive, as the meme has it.
So I would agree that the open models are a few months behind, definitely more than a couple of months behind, possibly several months behind, maybe a half-dozen months or so behind, but not many months behind.
It's not just statistics either. I know for a fact that I made major progress by using LLMs. Here's a summary from around a month ago:
https://news.ycombinator.com/item?id=48407642
AI is world changing technology as far as I'm concerned.
its a lot of features that feel half complete, with the llm pretending that the job is done rather than actually being done
True, but neither is the US.
What exactly is there in the USA's destruction of the economic norms that have always served it, or in the pointless dumping of its hard-won soft power, alienation of its allies, deliberate weakening of its intelligence gatherers, rampant open corruption from its leadership, or in any other of the innumerable harms it's inflicted on itself the last 18 months, that you think is conducive to the US maintaining its superpower status?
Even if the US isn't fading, the message is still clear: the country is adopting a more isolationist stance and has no problems alienating its allies. Why would you want to continue to tie yourself to a nation like that?
You can try, but where I am there's literally no point - anything I offer that I bill based on how long my agent will take will be counter-offered by an even cheaper person using the same agent.
I've been through this cycle a few times already. It's pointless.
I sell outcomes, not lines of code. When I can get paid for unlocking revenue or reducing costs, SOTA makes not one bit of difference.
In practice, this means that I now don't even engage with clients who lead with "we want this program written" or "we want this feature added to this code we own". Those types of clients, their expectation is that you'll never need to bill more than the time you used to meet with them and maybe an hour of "labour".
You can, of course, continue as normal, but the expectation from clients now is that code is, for practical purposes, free. I've had one client last year vibe-code a ping program using Claude Code just to "prove" to me that my custom board+design+code for their industrial flow controller could have been done by their AI subscription.
If your business is "selling code", you aren't gonna win. If your business is "selling solutions" then you don't need SOTA anyway.
This is more complicated than you paint it. Countries like UAE have enough capital to throw at things and little-to-no taxation, yet they don't attract as much talent as they would like to.
Preexisting centers of excellence like Silicon Valley are attractive for young talented people precisely because a lot of older talented people are already there. The same reason why a young talented painter in the 15th century would prefer Florence to some rich, but boring place elsewhere.
You can only really do meaningful work in a "heavy" field by tightly cooperating with others, and physical proximity still matters.
What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?
A: The sky is blue! B: No it's not. A: Yes, it is, please look up. B: No, you must prove it to me through reason. A: But, if you would just pretty please look up. B: No.
I run a company, I've been running it for 10 years, we do alright. I'm a shitty manager. Every time I've hired developers, the business freezes. The business isn't anything super important, the main consequence of bugs is that my family loses money. Everything has always rested on my shoulders. In theory there is some path for me to become a good manager, but I never landed on it. But now, with Claude, it's great. So far Claude has paid itself off in real profits at least 20x over, and that's with significant API usage on top of the monthly sub. I can prototype new features in an afternoon that before were on my giant "maybe somedays if I ever get to breathe" list. Our user experience has improved in so many ways that I knew were probably worth it, if I could just find the time. Now I can.
There are situations where yeah, it probably isn't ready yet. But, there are so many where it's amazing. Seriously, it's worth looking up.
Without access to ASML EUV machines, the Chinese will be stuck on older less-dense chip manufacturing nodes, but in terms of building a cluster this is just a cost/efficiency issue - it means you need more chips, more electricity!
3 or 4 would likely be a few, or some. 1 is, well, one.
They have always been able to do this, but this time they did have the option to pass.
If Western AI gets too advanced, it can take over the world. So better to go to war now on a level playing field than later, when you need to fight against an SGI.
Instead, the US banned China from chips and lithography machines, giving China the legal excuse to start producing them domestically without violating WTO rules. Now China produces cheap chips and uses them with cheap electricity.
This was a dumb move by the US. Brought upon it by dumbf*ck aristocratic elites who grew up in isolated mansions and then received law degrees, with absolutely no understanding of technology and technology ecosystems. They thought they'd just make the rules and everybody would have to obey. It turns out in technology, they don't have to...
That's rare, though. If they could not untangle their own code after 4 months, it's because they were not making enough money to pay a team to untangle it - that's not a code problem, it's a revenue problem.
IOW, the startup failed because their revenue was too low.
For a change, I let DeepSeek V4 Pro implement it on Max thinking level. Nothing too out there - some DB migrations, some Django back end changes and Vue SPA front end changes.
Implementation time in total including tests was a few hours, so nothing too egregious. However, one of the migrations would break with pre-existing data, one of the column references in the entity was wrong, the API endpoint wasn't made consistently with the others in adjacent code (e.g. permission checks) and the front end had a Pinia state related issue and submitting one of the forms didn't work.
Tooling was run: ruff, ty, Oxfmt, Oxlint, also Docker build was green across the board, but the overall feature just didn't work. In both cases, sub-agents with clear context would review the code for serious/critical issues, at least three in parallel and do review loops until they spot nothing. Both harnesses have LSP integration.
Opus spent another hour fixing it, needed a few iterations, because I couldn't be bothered there.
> What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?
The difference largely was not needing to waste time in fixing all sorts of subtle bugs that sub-optimal models will produce, worse yet if it was some sort of a serious project and those wouldn't have been spotted but instead that slop would have gotten shipped.
That said, Opus isn't ideal either and messed up a whole bunch when I was training some neural nets and trying to process a bunch of satellite data and configure Garage to store them so that tiles can be served from a slow HDD and stuff like that. Obviously, it also needs a lot of babysitting in regards to UI looks, but it's better at the rest of development.
I think that DeepSeek V4 Pro and GLM 5.2 are cool though, it's just that you want as many checks and tests as you can throw at any given problem, or use languages that make shipping completely broken code increasingly unlikely.
People need to get to grips with that fast.
Distribution, relationships, processes, mindshare, marketing, and politics matter. Code is just ephemeral glue and implementation detail.
Nowhere near the value of having access to chips, at any cost. They have extremely deep pockets. They already pay 6x the cost per FLOP.
> Instead, the US banned China from chips and lithography machines, giving China the legal excuse to start producing them domestically without violating WTO rules. Now China produces cheap chips and uses them with cheap electricity.
You think without export restrictions China wouldn't be doing the exact same thing? China needs absolutely zero legal excuse. I mean sure they have compute available on grey market / domestically but at 6x the cost per FLOP. Access to NVIDIA chips would make it dramatically cheaper for them. Yes you get chip income but that is not even close to what you lose. The strategy is doing what it was always supposed to do: slow them down, bleed their resources to force them to spin their wheels catching up. China is doing a great job with this but they are fundamentally constrained by these export controls.
You are right that this greases the wheels, they are further along than they would have been without export restrictions, but they are still delayed even with the reduced friction. The alternative is that they move slightly slower _while having the same compute infrastructure available_ and at dramatically lower energy costs. That is a far worse position for the US to be in.
> This was a dumb move by the US. Brought upon it by dumbf*ck aristocratic elites who grew up in isolated mansions and then received law degrees, with absolutely no understanding of technology and technology ecosystems. They thought they'd just make the rules and everybody would have to obey. It turns out in technology, they don't have to...
I think this is too cynical. Neither one of us is in the room to actually observe the real decision making, but export restrictions as a strategy are not some "dumbf*ck aristocratic elite" thing. They are perfectly rational from a strategic standpoint and arguably doing what they're supposed to do.
My point is and remains:
A) GenAI did not give you this understanding.
B) GenAI can only assist in your expressing this preexisting understanding.
C) GenAI is a statistical token (text) generator and cannot, by definition, "make" a person understand what they want/need to do.
They’ve also invested in AI separately (before LLMs) in that time period but I’m less familiar with that sector.
To quote Pat Toulme:
There’s a big misconception about how GLM 5.2 was trained. Yes, they distilled Claude and GPT 5.5 — but distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL.
RLing an agentic coding model isn’t rocket science. In simplified terms:
1. RL needs trajectories — rollouts where the model actually completed a task in some env
2. No successful trajectory on a task = zero gradient = you can’t RL it. This is the cold start problem
3. Distillation solves it. You seed your model with knowledge from a smarter one (Claude, GPT) on tasks it can’t do yet
4. Now it produces positive trajectories on those tasks
5. RL on those trajectories and hill climb agentic coding
6. At that point you no longer need to distill and can solely hill climb RL to better models
This is an interesting curve. I’d argue it’s harder to get to Opus 4.8 from scratch than to go from Opus 4.8 → Fable/Mythos tier.
GLM 5.2 is already producing positive trajectories, so they have plenty to RL on — they’ll keep climbing to Mythos quality without distilling any further. They no longer need American models.
https://x.com/PatrickToulme/status/2069211575437627743
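The cold start point in the quoted thread (step 2: no successful trajectory means zero gradient) can be seen in a toy example. This is a sketch under heavy assumptions, not how any lab trains: the policy has a single parameter `theta`, each rollout succeeds with probability `sigmoid(theta)` and earns reward 1 on success and 0 otherwise, and "distillation" is stood in for by simply starting from a larger `theta`. All names are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_gradient(theta: float, n_rollouts: int, rng: random.Random) -> float:
    """Score-function (REINFORCE) gradient estimate for a one-parameter policy."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_rollouts):
        success = rng.random() < p
        reward = 1.0 if success else 0.0
        # d/dtheta log pi(outcome): (1 - p) for a success, -p for a failure.
        grad_log_prob = (1.0 - p) if success else -p
        # Failed rollouts have reward 0, so they contribute nothing.
        total += reward * grad_log_prob
    return total / n_rollouts


rng = random.Random(0)
# Cold start: success is so unlikely that no rollout ever earns a reward.
cold = reinforce_gradient(theta=-40.0, n_rollouts=1000, rng=rng)
# "Seeded" policy: succeeds often enough to produce a learning signal.
seeded = reinforce_gradient(theta=-1.0, n_rollouts=1000, rng=rng)
print(cold, seeded)
```

With no successful rollouts every term in the sum is multiplied by a zero reward, so the estimate is exactly zero and there is nothing to climb; once some rollouts succeed, the estimate is positive and pushes `theta` upward.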
Not exactly sure what the finish line in "the race to superintelligence" looks like and even moreso it's unclear why you think being there first is a critical benefit.
Of all the "concise" and "beautiful" code I worked hard to produce, I was the only one to ever lay eyes on it. It didn't actually matter, and nobody cared but me. The people in charge of my raises could never perceive quality of code, because it wasn't their area of expertise. They only cared (rightly so) that it did what it was supposed to, and all the elegant abstractions didn't practically help that purpose. It was, literally, wasted life that I should have spent just getting off work early, like most of my colleagues.
Just 99.999%.
If you are, in fact, "a technical product manager", I would hope you understand that "bad code" is identified as such specifically because it "impacts the business."
Get over yourself. We're all ephemeral, dead and recycled in the blink of an eye. Our species doesn't even clock on the geologic timespan.
If you think your code (or any of your artifacts or possessions) matter beyond their immediate utility, you're mistaken. Work will either fall into disuse or be replaced. It's scaffolding for what comes next along a well-traversed path.
The engineers I have worked with most definitely define "bad code" as having intrinsic limitations and/or latent defects which impact successful system functionality/operation. Indicators provided to stakeholders such as yourself which support this assessment include, but are not limited to:
- the system doesn't work that way
- the system lacks test coverage, so changes take longer
- adding feature "X" is not feasible
- there is no repeatable way to onboard team members
- the backlog grows exponentially
- that "one point task" is going to take a couple weeks
All of the above impacts a business.
It is up to you, the "technical product manager", to understand what your team is trying to tell you.
Everything you're saying is true, sometimes. Assume I'm still right, and that you might be able to learn something from someone else.
I do not see how I was being rude, unless it was my use of quotations around the title you claim.
> I'm a human being ...
I did not doubt this.
> ... I'm a very experienced product manager and engineer ...
Again, if it was my use of quotations which you found to be rude, then I do not know what to say about that.
> ... and the way you are behaving sucks.
I respect your perspective and support your right to express yourself. And no, I do not think you are being rude by doing so.
> Assume I'm still right ...
Why would I? You responded to:
>> This is a site full of developers who are convinced that "proper software engineering" is 100% of what makes a business successful, and everything and everyone else is useless.
With:
> As a technical product manager, this 1000%.
Finally, you write:
> ... you might be able to learn something from someone else.
Maybe you can learn something from someone else as well.