This field is moving at an incredible pace; the providers release a new model every quarter or so. The amount of criticism is a bit overblown in my opinion. The benchmarks still look very good to me. I’ve used GLM-5 (latest is GLM-5.1) and Kimi K2.5; they’re decent and get the job done, so seeing how this Qwen model performs compared to them is kinda impressive.
Also, why are so many people pointing out that this model is not open-weight, as if this were the first time Qwen has done so? Qwen-3.5-Plus and Qwen-3-Max are also closed-source. This is not something new.
I think Qwen trying to catch up to the SOTA models is still healthy for us, the consumers. Sure, it's sad news that this version is closed-weight, but I won’t downplay their progress.
Comparing to Opus 4.5 instead of the current 4.6 and other last-gen models is clearly an attempt to deceive, which isn’t winning them any points either.
I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper. I don’t know how successful they’ll be in the race to the bottom in this market niche, though. Most users of cheap API tokens are not loyal to any brand and will change providers overnight each time someone releases a slightly better model.
I can remember how good Opus 4.5 was. If I'm considering using this, it's most informative to me to compare to the model it's closest to that I have familiarity with.
I'm obviously not switching to this if I want the best model. I'm switching if I'm hopeful that the smaller versions are close to it, or if I want to have more options for providers, or for any other reasons unrelated to getting the highest quality responses possible.
I used the https://modelstudio.alibabacloud.com/ API to generate that one, which required signing up for an account and attaching PayPal billing - but it looks like OpenRouter are offering it for free right now so I could have used that: https://openrouter.ai/qwen/qwen3.6-plus:free
As always, we'll have to try it and see how it performs in the real world, but Qwen's open-weight models were pretty decent for some tasks, so I'm still excited to see what this brings.
Opus 4.5 is $25/m output tokens.
This is at most $6/m output tokens.
That's ~1/4 the price.
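That ratio checks out. A quick sketch of the arithmetic, using only the output-token prices quoted above and ignoring input tokens for simplicity (the 50M-token workload is an illustrative number, not from the thread):

```python
def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Cost of generating `output_tokens` at a given $/1M-output-token price."""
    return output_tokens / 1_000_000 * price_per_million

# Hypothetical workload: 50M output tokens in a month.
tokens = 50_000_000
opus = output_cost_usd(tokens, 25.0)   # $1250.00
qwen = output_cost_usd(tokens, 6.0)    # $300.00
print(opus, qwen, qwen / opus)         # ratio = 0.24, i.e. ~1/4
```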
I like Qwen local for its privacy, but I trust the privacy of Google/OpenAI/Anthropic more than Alibaba's.
Right, they state that they'll release "smaller" variants openly at some point, with few details as to what that means. Will there be a ~300B variant as with Qwen 3.5? The blog post doesn't say.
The naivety around this has been staggering, quite frankly. All of a sudden, people think that Meta etc. are releasing free models because they believe in open access and the distribution of knowledge. No, they just suck comparatively. There is nothing to sell. Using it to recruit and generate attention is the best play for them.
Sure they are not cheap to train. But if open weight models continue to be trained and continue to become available on cheaper hardware, how do dedicated AI companies protect their margins?
There isn't, pretty much everyone wants the best of the best.
There's nothing really strange about not competing directly with the best, but rather showing who you're as good as.
"[...] In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation. [...]"
This means a 100k token request counts the same as a 100-token one. I’ve made about 8000 requests in the last two weeks, averaging around 80k tokens per request. It feels like they’re subsidizing this just to gather data on agentic workflows.
On the downside, the speed is mediocre (15–30 tg/s for GLM-5), and I’ve seen the model glitch or produce broken output about 10 times out of those 8k requests.
None should be trusted, unless you are running them locally.
This is how I view that the public can fund and eventually get free stuff, just like properly organized private highways end up with the state/society owning a new highway after the private entity that built it got the profits they required to make the project possible.
There are a lot of data science problems that benefit from running the dataset through an LLM, which becomes bottlenecked on per-token costs. For these you take a sample subset and run it against multiple providers and then do a cost versus accuracy tradeoff.
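A minimal sketch of that sample-then-compare approach. Everything here is illustrative: `run_provider` is a simulated stand-in for a real API call, and the accuracy and price numbers are made up, not measured:

```python
import random

random.seed(0)

def run_provider(name: str, accuracy: float, example: str) -> bool:
    """Simulated call: did the provider label this example correctly?
    A real version would call the provider's API and compare to ground truth."""
    return random.random() < accuracy

providers = {
    # name: (simulated accuracy, $ per 1M output tokens) -- illustrative only
    "sota-model":  (0.95, 25.0),
    "cheap-model": (0.88, 6.0),
}

sample = [f"record-{i}" for i in range(500)]   # sampled subset of the dataset
tokens_per_record = 2_000                       # rough output budget per record

for name, (acc, price) in providers.items():
    correct = sum(run_provider(name, acc, ex) for ex in sample)
    cost = len(sample) * tokens_per_record / 1_000_000 * price
    print(f"{name}: {correct/len(sample):.1%} on sample, ${cost:.2f} for the sample")
# Extrapolate the sample cost to the full dataset, then pick the cheapest
# provider whose accuracy clears your threshold.
```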
The market for API tokens is not just people using OpenCode and similar tools.
Coding is a rung on the ladder of model capability. Frontier models will grow to take on more capabilities, while smaller, more focused models become the economical choice for coding.
They posted charts with logos for Claude and others. You had to read the fine details to realize they weren’t comparing to the latest offerings from those companies. They were counting on you not noticing.
There’s zero reason to compare to old models unless you’re trying to mislead.
Now they show their true colors. They want to train models on our engineering to replace us, while simultaneously giving nothing back? No thanks. I'd rather fund the shitty US hyperscalers. At least that leads to jobs here.
If there's a company willing to develop and foster large-scale weights in the open, I'll adopt their tooling 100%. It doesn't matter if they're a year behind. Just do it open and build an entire ecosystem on top of it.
The re-AOLization of the internet into thin clients is bullshit, and all it takes is one player to buck the rules to topple the whole house of cards.
Now, is it mildly deceptive because all of the companies use incredibly confusing naming conventions for their models? Maybe!
Apparently that wasn't actually the play here.
For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against datasets, or as sub-agents called from expensive SOTA models.
For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents is a popular usage "hack." That only gets more powerful as the Sonnet-equivalent model gets cheaper. Now you can spawn entire teams of small specialized sub-agents with small context windows but limited scope.
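A conceptual sketch of that orchestrator/sub-agent pattern. `call_model`, the model names, and the task decomposition are all hypothetical stand-ins, not a real API:

```python
def call_model(model: str, prompt: str) -> str:
    # Stub so the sketch runs; a real version would hit the provider's API.
    return f"[{model}] result for: {prompt[:40]}"

def orchestrate(task: str) -> str:
    # The expensive model decomposes the task once (stand-in for a real plan)...
    subtasks = [f"{task} -- part {i}" for i in range(1, 4)]
    # ...then cheap sub-agents, each with a small focused context, do the legwork.
    results = [call_model("cheap-sub-agent", st) for st in subtasks]
    # The expensive model only ever sees the summaries, not the raw context.
    return call_model("expensive-orchestrator", "synthesize: " + " | ".join(results))

print(orchestrate("audit the auth module"))
```

The economics only work because the cheap model handles the bulk of the tokens while the expensive one sees a compressed view.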
In other words, like GP said, this Qwen3.6-Plus model is not open-weight unlike the other Qwen models.
- Qwen3.5-Plus
- Qwen3-Max
- Qwen2.5-Max
etc. Nothing really changed so far.
In any case, setting aside Claude fanboyism, having other players inch closer to similar performance is always useful. Even if they are "6 months behind" as the pace slows down, this guarantees that there's no huge moat and they'll eventually either get to where the SOTA is, or the difference won't be that big.
I'd rather put fewer eggs in 2-3 big player baskets.
That's a very reasonable stance. It doesn't change the fact that we do have plenty of local models (up to and including Qwen 3.5) that are still quite useful.
I don't think any org doing this is necessarily being deceptive, so long as there's some reasonable basis for the chosen comparable(s).
For example, comparing a new iPhone to a prior Android phone might make sense if the installed base is considerable and Apple is targeting that cohort for user acquisition. (~"These benchmarks are not for you.")
The community will always run the numbers and get the clicks for the benchmarks the first party didn't fill in. I noticed what appeared to be some movement from Apple to get ahead of this in recent product content.
Qwen is not the only Chinese lab, and the others have shown no change in their commitment to open source. Allegedly Qwen hasn't either, if their recent statements are to be believed. They're just hoping to capture market share with *-claw customers before releasing an open-weights version. We'll have to wait and see how long it takes before they decide to release that.
The US actually has laws around this, and they aren't sharing very much with the US government today. China shares 100% as required by law. And neither cares much about "how long do I cook eggs for", but they do care about code generation a lot.
I'd prefer them to be open weight, but I'd love to sub a decent competitive coding plan from a European or Chinese provider. Right now they're not quite there. If closing it and charging for it brings them closer to competitive, that's ok.
If the US tech and AI industry long term wants customers and a broad market outside of their own domestic base, they need to reconsider who they are bending the knee to, and how they are defining their policies in relation to the Trump administration.
Bring on the Chinese competition.
Laziness? Lack of time? It's not like the latest generation of the SOTA models were released yesterday.
I did create my own MCP with custom agents that combine several tools into a single one. For example, WebSearch, WebFetch, and Context7 all exposed as a single "web research" tool, backed by the cheapest model that passes evaluation. The same for codebase research.
Using it with both Claude and OpenCode saves a lot of time and tokens.
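The tool-collapsing idea can be sketched without any MCP machinery: one dispatcher function presents a single tool surface, and the sub-tools behind it are stubs here (a real version would call the actual WebSearch/WebFetch/Context7 backends and a cheap model). All names below are hypothetical:

```python
def web_search(query: str) -> str:
    return f"search results for {query!r}"   # stub for a real search backend

def web_fetch(url: str) -> str:
    return f"contents of {url}"              # stub for a real fetcher

def docs_lookup(library: str) -> str:
    return f"docs for {library}"             # stub for a docs provider

def web_research(request: dict) -> str:
    """Single tool surface: the agent sends one structured request instead of
    choosing among three tools, which saves tool-selection tokens and context."""
    kind = request["kind"]
    if kind == "search":
        return web_search(request["query"])
    if kind == "fetch":
        return web_fetch(request["url"])
    if kind == "docs":
        return docs_lookup(request["library"])
    raise ValueError(f"unknown request kind: {kind}")

print(web_research({"kind": "search", "query": "rate limiting patterns"}))
```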
There are many simpler tasks that would work fine with a simpler, local model.
However, my hope is that there will be at least somewhat competitive big and open models as well, from an ethical/ideological perspective. These things were trained on data that was provided by people without their consent, so they should at least be publicly accessible or even public domain.
Almost all means there have been ones before that were not open. So, no contradiction there.
Please send the download link for qwen 3.5-plus.
Also, who cares? If you have the hardware to run a ~400b model, I don't think you count as a home user anymore.
At least from my experience and friends of mine, we use OpenRouter for cases where we want to use smaller LLMs like Qwen, but when I've used ChatGPT and Claude, I use those APIs directly.
I have no affiliation with DeepInfra. I use them, because they host open-source models that are good.
products entirely disappearing or significantly changing will be more and more common in the llm arena as things move toward companies shutting down, bubbles deflating, brand priorities drastically reshifting, etc...
i think we're at, or at least close to, a time to really put some thought into which pieces of our flow could be done entirely with an open/local model, and be honest with ourselves about which pieces truly need sota or closed models that may entirely disappear or change. in the long run, putting a little bit of thought into this now will save a lot of headache later.
Seems like a huge waste of money and electricity for processes that can be implemented as a traditional deterministic program. One would hope that tools would identify recurrent jobs that can be turned into simple scripts.
I wouldn't call this totally accurate, especially as of late. What's closer to the truth however is that there's lots of second-rate players in China doing open models, that will be getting a lot more attention from local AI proponents if the big names seriously slow down their AI releases. The local AI scene as a whole is quite healthy.
It's not that, it's about relative risk to your own life. Asking questions about "DEI" for example is much more likely to have adverse effects on your life if you ask Grok or an OpenAI chatbot, though still not that likely.
And the US government has repeatedly shown that it is very interested in collecting all the data available, just like China. In China this is simply done in the open, while the US has a veneer of protection for citizens. But where the data collection is forbidden by law, they either ignore the law or ask another Five Eyes member to do the spying and share the results. Both are well documented.
But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
For example: "Here our dataset that contains customer feedback comment fields; look through them, draw out themes, associations, and look for trends." Solving that with a deterministic program isn't a trivial problem, and it is likely cheaper solved via LLM.
I don't know whether you really believe this or it was an off the cuff remark. China is not going to tell you why they plan to arrest you. China is not a benevolent dictatorship.