To be honest, it's sort of what I expected governments to be funding right now, but I suppose Chinese companies are a close second.
the qwen is dead, long live the qwen.
I've been testing Qwen3.5-35B-A3B over the past couple of days and it's a very impressive model. It's the most capable agentic coding model I've tested at that size by far. I've had it writing Rust and Elixir via the Pi harness and found that it's very capable of handling well-defined tasks with minimal steering from me. I tell it to write tests and it writes sane ones, ensuring they pass without cheating. It handles the loop of responding to test and compiler errors while pushing towards its goal very well.
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to evals and not broadly capable.
Wild times!
4th March 2026
I’m behind on writing about Qwen 3.5, a truly remarkable family of open weight models released by Alibaba’s Qwen team over the past few weeks. I’m hoping that the 3.5 family doesn’t turn out to be Qwen’s swan song, seeing as that team has had some very high profile departures in the past 24 hours.
It all started with this tweet from Junyang Lin (@JustinLin610):
me stepping down. bye my beloved qwen.
Junyang Lin was the lead researcher building Qwen, and was key to releasing their open weight models from 2024 onwards.
As far as I can tell, a trigger for this resignation was a re-org within Alibaba where a new researcher hired from Google’s Gemini team was put in charge of Qwen, but I’ve not confirmed that detail.
More information is available in this article from 36kr.com. Here’s Wikipedia on 36Kr confirming that it’s a credible media source established in 2010 with a good track record reporting on the Chinese technology industry.
The article is in Chinese—here are some quotes translated via Google Translate:
At approximately 1:00 PM Beijing time on March 4th, Tongyi Lab held an emergency All Hands meeting, where Alibaba Group CEO Wu Yongming frankly told Qianwen employees.
Twelve hours ago (at 0:11 AM Beijing time on March 4th), Lin Junyang, the technical lead for Alibaba’s Qwen large model, suddenly announced his resignation on X. Lin Junyang was a key figure in promoting Alibaba’s open-source AI models and one of Alibaba’s youngest P10 employees. Amidst the industry uproar, many members of Qwen were also unable to accept the sudden departure of their team’s key figure.
“Given far fewer resources than competitors, Junyang’s leadership is one of the core factors in achieving today’s results,” multiple Qianwen members told 36Kr. [...]
Regarding Lin Junyang’s whereabouts, no new conclusions were reached at the meeting. However, around 2 PM, Lin Junyang posted again on his WeChat Moments, stating, “Brothers of Qwen, continue as originally planned, no problem,” without explicitly confirming whether he would return. [...]
That piece also lists several other key members who have apparently resigned:
With Lin Junyang’s departure, several other Qwen members also announced their departure, including core leaders responsible for various sub-areas of Qwen models, such as:
Binyuan Hui: Led Qwen code development, principal contributor to the Qwen-Coder series models, responsible for the entire agent training process from pre-training to post-training, and recently involved in robotics research.
Bowen Yu: Led Qwen post-training research; graduated from the University of Chinese Academy of Sciences; led the development of the Qwen-Instruct series models.
Kaixin Li: Core contributor to Qwen 3.5/VL/Coder, PhD from the National University of Singapore.
Besides the aforementioned individuals, many young researchers also resigned on the same day.
Based on the above it looks to me like everything is still very much up in the air. The presence of Alibaba’s CEO at the “emergency All Hands meeting” suggests that the company understands the significance of these resignations and may yet retain some of the departing talent.
This story hits particularly hard right now because the Qwen 3.5 models appear to be exceptionally good.
I’ve not spent enough time with them yet but the scale of the new model family is impressive. They started with Qwen3.5-397B-A17B on February 17th—an 807GB model—and then followed with a flurry of smaller siblings in 122B, 35B, 27B, 9B, 4B, 2B, 0.8B sizes.
I’m hearing positive noises about the 27B and 35B models for coding tasks that still fit on a 32GB/64GB Mac, and I’ve tried the 9B, 4B and 2B models and found them to be notably effective considering their tiny sizes. That 2B model is just 4.57GB—or as small as 1.27GB quantized—and is a full reasoning and multi-modal (vision) model.
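Those file sizes line up with simple back-of-the-envelope arithmetic. Here's a sketch (the bits-per-parameter figures are assumptions for illustration, not the exact on-disk packing, which includes tokenizer, vision tower, and metadata overhead):

```python
# Approximate on-disk size of a model from its parameter count and precision.
# The quoted 4.57GB / 1.27GB figures for the 2B model roughly match bf16
# (16 bits/param) plus overhead, and a ~5-bit quantization, respectively.

def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight-file size in GB."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(model_size_gb(2, 16))  # bf16: 4.0 GB before overhead
print(model_size_gb(2, 5))   # ~5-bit quant: 1.25 GB
```

The same arithmetic explains the 397B model's 807GB download: roughly two bytes per parameter at bf16.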
It would be a real tragedy if the Qwen team were to disband now, given their proven track record in continuing to find new ways to get high quality results out of smaller and smaller models.
If those core Qwen team members either start something new or join another research lab I’m excited to see what they do next.
It's also driving itself crazy with the deadpool & deadpool-r2d2 crates that it chose during the planning phase.
That said, it does seem to be doing a very good job in general; the code it has created is mostly sane other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than other models I'm able to self-host so far.
The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked, and I find it has stripped all the preliminary support infrastructure for the new feature out of the code.
They also struggle to translate very broad requirements into a set of steps that I find acceptable. Planning helps a lot.
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.
I think this is part of the model’s success. It’s cheap enough that we’re all willing to let it run for extremely long times. It takes advantage of that by being tenacious. In my experience it will just keep trying things relentlessly until eventually something works.
The downside is that it’s more likely to arrive at a solution that solves the problem I asked but does it in a terribly hacky way. It reminds me of some of the junior devs I’ve worked with who trial and error their way into tests passing.
I frequently have to reset it and start it over with extra guidance. It’s not going to be touching any of my serious projects for these reasons but it’s fun to play with on the side.
That's likely coming from the 3:1 ratio of linear to quadratic attention usage. The latest DeepSeek also suffers from it which the original R1 never exhibited.
I can live with this on my own hardware. Opus 4.6, on the other hand, has developed a tendency to happily chew through the entire 5-hour allowance on the first instruction, going in endless circles. I’ve stopped using it for anything except the extreme planning now.
That sounds too close to what I feel on some days xD
If AI could effectively replace people, you wouldn’t need CEOs to keep trying to convince people.
> Blah blah blah (second-guesses its own reasoning half a dozen times, then goes) Actually, it would be simpler to just ...
Specifically on Antigravity, I've noticed it doing that trying to "save time" to stay within some artificial deadline.
It might have something to do with the system messages and the reinforcement/realignment messages that are interwoven into the context (but never displayed to end-users) to keep the agents on task.
I’m trying to use local models whenever possible. Still need to lean on the frontier models sometimes.
This is my experience with the Qwen3-Next and Qwen3.5 models, too.
I can prompt with strict instructions saying "** DO NOT..." and it follows them for a few iterations. Then it has a realization that it would be simpler to just do the thing I told it not to do, which leads it to the dead end I was trying to avoid.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rivckt/visuali...
Probably good to send alerts early, but they might be going a bit too early.
> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
Probably not yet, but it's really good at composing shell commands. For scripting or one-liner generation, the A3B is really good. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers - very roughly 35GB of data at one byte per parameter - which are all loaded into memory at once.
But... on any given pass through the model weights, only 3 billion of those parameters are "active", i.e. actually have matrix arithmetic applied to them.
This speeds up inference considerably because the computer does fewer operations for each token that is processed. It still needs the full amount of memory, though, as the 3B active parameters are likely different on every pass.
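To make "active parameters" concrete, here is a toy mixture-of-experts forward pass. The sizes, routing, and names are purely illustrative, not Qwen's actual architecture:

```python
import numpy as np

# Toy mixture-of-experts layer: every expert's weights stay resident in
# memory, but each token only runs through the top-k experts it is routed to.
rng = np.random.default_rng(0)
n_experts, d, k = 8, 16, 2
experts = rng.standard_normal((n_experts, d, d))  # all weights loaded at once
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    logits = x @ router
    topk = np.argsort(logits)[-k:]        # pick k of n_experts for this token
    w = np.exp(logits[topk])
    w /= w.sum()                          # softmax over the selected experts
    # Only k expert matmuls execute; the other n-k experts sit idle this pass.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, topk))

y = moe_forward(rng.standard_normal(d))
print(y.shape)  # (16,)
```

The memory footprint is all `n_experts` weight matrices, but the per-token compute scales with `k / n_experts` - the same shape as the 35B-total / 3B-active split described above.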
That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path.
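The greedy-vs-sampling tradeoff can be seen in a few lines (toy logits standing in for a real model's next-token scores):

```python
import numpy as np

# Greedy decoding (temperature -> 0) always picks the argmax, so the output
# is fully deterministic. A nonzero temperature occasionally picks a locally
# "worse" token - the wiggle room described above.
rng = np.random.default_rng(42)
logits = np.array([2.0, 1.5, 0.3])  # scores for three candidate tokens

def sample(logits, temperature):
    if temperature == 0:
        return int(np.argmax(logits))            # deterministic
    p = np.exp(logits / temperature)
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))     # stochastic

greedy = {sample(logits, 0) for _ in range(100)}
sampled = {sample(logits, 1.0) for _ in range(100)}
print(greedy)   # always {0}
print(sampled)  # typically more than one distinct token
```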
In any case, two nines of reliability is not impressive.
There would never be an Anthropic/Pentagon situation in China, because in China there isn't actually separation between the military and any given AI company. The party is fully in control.
But I'll be running this locally for note summarization, code review, and OCR. Very coherent for its size.
I wonder if determinism will be less harmful to diffusion models, because they perform multiple iterations over the response rather than taking a single, no-lookahead shot at each position. I'm looking forward to finding out and have been playing with a diffusion model locally for a few days.
I'm aligning on Agent for the combination of harness + model + context history (so after you fork an agent you now have two distinct agents)
And orchestrator means the system that runs multiple agents together.
This terminology is still very much undefined though, so my version may not be the winning definition.
It's really easy to set up with any OpenAI-compatible API. I self-host Qwen Coder 3 Next on my personal MBP using LM Studio and just dial in from my work laptop with Zed and Tailscale, so I can connect from wherever I might be. It's able to do all sorts of things: run linting checks and tests, look for issues, refactor code, create files, and so on. I'm definitely still learning, but it's a pretty exciting jump from just talking to a chatbot and copying and pasting things manually.
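"Any OpenAI-compatible API" just means the local server accepts the standard chat-completions request shape. A minimal sketch of that body (the model id is hypothetical, and `localhost:1234` is LM Studio's default server address; check what your own server reports):

```python
import json

# A client like Zed or curl would POST this to
# http://localhost:1234/v1/chat/completions on the local server.
payload = {
    "model": "qwen3-coder-next",  # hypothetical local model identifier
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Refactor this function and add tests."},
    ],
    "temperature": 0.2,
    "stream": True,  # stream tokens back as they are generated
}
body = json.dumps(payload)
print(body)
```

Because every local server and frontier provider speaks this same shape, swapping the self-hosted model for a hosted one is usually just a base-URL and model-name change.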
For creative things or exploratory reasoning, a temperature of 0.8 leads to all sorts of excursions down the rabbit hole. However, when coding and needing something precise, a temperature of 0.2 is what I use. If I don’t like the output, I’ll rephrase or add context.