GLM-5.2 is a step change for open agents

Open weight models from Chinese labs tend to be significantly cheaper.

I think theyre absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.

It's increasingly feeling, to me, that theres a gap building up between haves and have nots. But then, we get news of these open weight models that are reasonably priced in inference with reasonable capabilities. Yes, they take maybe 6-9 months to get there, tbh, that's not a bad trade off at all.

GLM-5.2 has been a step change in how fast i can burn through tokens.

I subscribed to their max plan to try it out. It counted me 700M tokens and drained my weekly quota in under 2 days.

Quota just reset less than 24h ago and i'm already >60% weekly quota usage.

For reference the kind of work i did would have used somewhere between 3% and 5% of Codex max or Claude max.

The model is good, the plan is a scam

I've been working with Deepseek V4 Flash (with opencode as the harness). It's been almost indistinguishable from Codex / Claude Code for me. I'm sure I'll run into problems when I get to a stickier ticket to tackle. But so far, it's been quite good, and I find it writes straightforward code.

I do think the Chinese models are good enough for an 80/20 rule use case.

I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.

Can people share their GLM and open model setups in general please? What provider do you use. Why do you trust it with serving full quality? What harness do you use? Why do you trust it not to have malware (most harnessed are TS apps). I am just trying GLM 5.1 from Nvidia build in open code would love to hear how you all do it, thanks.

I know very little about the current state of replacability of Opus but I do sometimes imagine a reality where Opus has been rebuilt as an open model. What plan does Anthropic have when it does happen?

Will they still rent out their own model, will they support the open model and become a resource provider? Will they be able to repay the billions of dollars ?

This is probably the first question I would ask someone from Anthropic, if I ever meet one.

The idea of an open-weight Mythos model is not scary at all. This space is moving so quickly that it'll looked at in 1-2 years as childs play.

It feels like the gap is closing from an intelligence perspective. Or at least doing some kind of log flattening.

Been playing with GLM 5.2 in different contexts. It's less good if you don't max out thinking, but as xhigh it's been able to solve most problems I was throwing at Opus in the about the same amount of time (via OpenRouter).

Wild time to be alive.

Here are the numbers from their bar chart:

    1. SWE-bench Pro
    Model Score (%)
    GLM-5.2 62.1
    GLM-5.1 58.4
    Claude Opus 4.8 69.2
    GPT-5.5 58.6
    Gemini 3.1 Pro 54.2

    2. Terminal-Bench 2.1
    Model Score (%)
    GLM-5.2 81.0
    GLM-5.1 63.5
    Claude Opus 4.8 85.0
    GPT-5.5 84.0
    Gemini 3.1 Pro 74.0
    
    3. NL2Repo
    Model Score (%)
    GLM-5.2 48.9
    GLM-5.1 42.7
    Claude Opus 4.8 69.7
    GPT-5.5 50.7
    Gemini 3.1 Pro 33.4
    
    4. DeepSWE
    Model Score (%)
    GLM-5.2 46.2
    GLM-5.1 18.0
    Claude Opus 4.8 58.0
    GPT-5.5 70.0
    Gemini 3.1 Pro 10.0
    
    5. ProgramBench
    Model Score (%)
    GLM-5.2 63.7
    GLM-5.1 50.9
    Claude Opus 4.8 71.9
    GPT-5.5 70.8
    Gemini 3.1 Pro 39.5
    
    6. MCP-Atlas
    Model Score (%)
    GLM-5.2 77.0
    GLM-5.1 71.8
    Claude Opus 4.8 77.8
    GPT-5.5 75.3
    Gemini 3.1 Pro 69.2
    
    7. Tool-Decathlon
    Model Score (%)
    GLM-5.2 48.2
    GLM-5.1 40.7
    Claude Opus 4.8 59.9
    GPT-5.5 55.6
    Gemini 3.1 Pro 48.8
    
    8. Humanity's Last Exam
    Model Base Score (%) Score w/ Tools (%)
    GLM-5.2 40.5 54.7
    GLM-5.1 31.0 52.3
    Claude Opus 4.8 49.8 57.9
    GPT-5.5 41.4 52.2
    Gemini 3.1 Pro 45.0 51.4

Seems to be handily beating Gemini 3.1 Pro. What _is_ Google DeepMind doing (other than bleeding talent to A\ ) ?

While I agree with the post in its entirety, I think it would have been worth mentioning DeepSeek V4 Flash as well, which, in my view, had already reached a sufficient, if not high-level of agentic coding before GLM 5.2 (see DwarfStar).

if someone has any tutorial on how to run GLM-5.2 from a Rasberry Pi 5 (AI hat), I want it !

I've been using GLM 5.2 recently (company hosted, for non-coding tasks) and it's been strong and reliable. There are areas where GPT 5.5 and Opus 4.x still feel marginally better but only marginally. For most tasks if GLM 5.2 is the only model I have to use I'm productive and happy. This was not true before GLM 5.2. No doubt in my mind that the gap is closing quickly and for most tasks that are not very specialized open models will be usably on par on flagship closed models and have an edge factoring in cost.

For coding I still use 5.5 w/ Codex and prefer that to other models + harness combinations.

I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad, it was like listening to the internal monologue of someone with anxiety disorder.

It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.

Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.

American AI labs really need to start releasing good open-weight models.

What's the current best for ablation? Specifically chemistry and red-team/netsec?

Honestly, glm is staying quiet close to claude but it can save tons of tokens either than anthropic model

It's by far the most competent open model I've tried yet. It's a bit slower than Claude, but in terms of coding capability it seems to get comparable results at least for the work I'm doing.

5.1 and Qwen 3.6 are great too IMO

A question I always have is, how to the AI labs safeguard the leak of their model? Training a cutting edge model basically cost a minimum of hundreds of millions of dollars. And its all contained within a file. Okay, that file might be 500GB large, but its still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks, have lots of people with access to it to debug it, run inference etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldnt that bankrupt Anthropic?

Is z.ai

Is 2 better than x.ai

Ive been using glm5 since its release and still prefer it to glm5.1 and so far to glm5.2

Perhaps it is just my harness and workflow, but the older model still seems to work better. Also the token cost is significantly lower. I rarely spend more than $20 a week with $50 cap. Not even half claudes ambiguous minimum $200 a month plan.

Once open Chinese models look like they’re about to overtake closed US models, watch the US government push imperialism hidden behind increasingly hyperbolic national security concerns.

At the end of the day, open weights should be seen as nothing more than information (just more just numbers afterall), and so organisations like the EFF should sue for any restricting of the 1st Amendment

I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.

Open weight models from Chinese labs tend to be significantly cheaper.

I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.

Even as a GLM z.ai fan, I wouldn't pay for their plans. They are just way worse values than gpt or anthropic plans, in terms of both usage and capabilities.

Opencode Go subscription has served me well.

Self-promo but you should try our service synthetic.new. We generally have up-to-date open-source LLMs on the sub, and we have GLM-5.2 :) Perf+stability should be wayyy better than zai.

same here. Barely usable due to API connections issues.

And when i can use it, it just drains the quota 5 times faster than codex or claude.

Their plan is a scam

My experience as well unfortunately :(

Pi is great, set it up with a system prompt to give the model more direction and think less, and it crushes anything I give it

Next to my Claude Pro plan, I have subbed to OpenCode Go. I find the OpenCode UX much better than in Claude Code CLI. As for models, I started a few months ago with GLM 5.1 and it was solid and could archive near sonnet-level tasks. It weirdly sputtered out Chinese characters sometimes. Then I switched to Kimi K2.6, which is the Chinese model I used the most until now. It used way too many reasoning tokens (improved in k2.7). But executed Claude created plans reliably. Now I’m back with GLM 5.2 and it’s really solid (among other things it’s good at design) and I get good usage with the $10 plan. Still the Claude models have less hiccups but the Chinese models are getting really close.

> What provider do you use?

1. My own harness + Local (which usually means Qwen3.6-35B-A3B), I use this fairly often for research gathering on topics, info gathering on code bases, etc.

2. My own harness + DeepSeek v4 Flash served by DeepSeek, I added $20 quite some time ago and somehow still have $18.77 in there after I don't know how many prompts. I use this pretty often, slightly less than my local setup, it's great and what I'm planning on running locally (eventually).

3. My own harness + OpenRouter with whichever model I want to try out. I use this very rarely.

4. Pi + OpenAI Codex $20 subscription. I don't use this almost at all anymore, but I keep the Codex subscription for testing things out to see how GPT-5.5 will handle a problem the other setups have issues with.

> Why do you trust it with serving full quality?

The only thing I've noticed seems unbearably useless sometimes versus what I noticed before was GPT-5.5 which has had some of the weirdest degradations I've seen. It's not to Anthropic levels but it definitely had some service issues a few times where I was wondering if they had accidentally (or purposefully) lobotomized it.

Everything else has mostly just been the same, except DeepSeek I noticed had some speed issues a few days ago.

> What harness do you use? Why do you trust it not to have malware (most harnessed are TS apps)?

I pretty much only use my own, agents are trivial to make and it's definitely not hard to make one that's better than Claude Code or Codex for whatever you're doing.

    > What provider do you use.

OpenRouter with pinned DeepSeek provider or OpenCode Go

    > Why do you trust it with serving full quality?

Quality seems good so far.

    > What harness do you use? Why do you trust it not to have malware (most harnessed are TS apps).

I wrote my own. A minimal harness without dependencies is only 65 lines of Python.

I use both the openai subscription and the opencode go subscription. I use the go subscription for my personal work and the openai subscription for my consulting work.

The differences between the models are minimal, but I usually stick with gpt-5.4-mini, gpt-5.4, mimo-pro-2.5, deepseek-v4-pro. These latter ones have way more usage than even using 5.4-mini so I tend to use them in personal projects for that reason.

My harness is https://github.com/can1357/oh-my-pi. I trust it...enough. It updates very frequently so as a safe guard I run it sandboxed with https://github.com/containers/bubblewrap so it can only access the project folder and some whitelisted config files

For work, I mostly use Codex and some Claude. For personal use, I’ve started using Chinese models directly through their respective providers, mostly for automation tasks and experiments so far, either via the API directly or through the Pi harness.

I do not trust any of them. Everything runs inside virtual machines, not just the sandboxes provided by the harnesses. I also do not run Claude or Codex directly on the host machine. Not just because of supply chain fears, but also because of how incredibly user hostile the VC funded companies are when it comes to installing random stuff on your machine.

Synthetic.new and Claude Code using GLM-5.2. Great model, but the harness will error out if using subagents. The base plan only allows one concurrent request at a time. Also, GLM will burn through your weekly quota in a day if you're not precise with your scope.

Local using Qwen3.6-27B; 2xRTX 5070Ti graphics cards; VS Code with Cline at the moment and Ollama back-end (will get to trying the others soon).

GLM 5.2 coding plan- I'll post the agent as soon as I can! But opencode works and their own zcode is really good as well.

I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

I think the self-doubt might actually be a very crucial part of it's capability. I often feel compelled to interrupt when I'm watching it think (which thank the stars it let's us do, unlike the big American models!!), but usually it makes the right pick!

Being willing and able to reconsider seems very good. Going around and around, pulling in more thinking, integrating it: maybe that's why it is as good as it's good.

I want to emphasize again how excellent it is that we can see the thinking. I think this makes GLM so much better an experience for me. It gives me such insight into what is being considered, helps me see where things go wrong. It grounds me, gives me the notion of where the results come from. It was so jarring to switch to GPT and Opus and find that they won't discuss with me, won't reveal their thinking: that feels fundamentally unsafe, for me, for society, to have such a severe black box. I don't think it should be allowed, honestly.

Many thanks to this recent submission, which is the first time I've seen anyone blog about this core difference: The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535

Ive been using glm5 since its release and still prefer it to glm5.1 and so far to glm5.2

You made me realize something. I routinely spend upwards of 500$ per month on LLMs for coding (expensed towards clients). However I live in a place where 500$ is around the avg. salary. I’m lucky that I know my way around western clients. Clients who pay these expenses and are happy to work with me because I am still about 50% cheaper than local talent in EU/US, while my salary at home converts to an upper class income at the highest tax bracket.

Which of course causes some unfairness on both ends. Nobody here can compete with me. I often use left over tokens on local client projects; which despite lower pay, still pays off because they now take hours not days or weeks to complete. And nobody in the local clients talent pool can compete with me; unless they charge about half the market rate.

Take away my 500$ monthly grant; and I’d be more or less screwed. Better open models will more or less start to reduce this advantage. It’s not like I positioned myself here on purpose. But it’s definitely a „right place, right time“ situation.

As much as I don't like Mark Zuckerberg, part of me wishes he would get his head in the game and compete with these models, he's literally got all the capability to do so, and he could easily sell the model through deals with GCP, AWS, and Azure. Hell, Amazon needs a hot model they can host that's exclusive to them I feel like, maybe he can work something out with them, whatever the case, it seems so glaringly obvious to me, I'm not sure why he hasn't taken a stab at competing with Claude Code or at least frontier open models and then cutting a deal with cloud providers to recoup the costs of maintaining said models.

He's sitting on a frontier model letting it burn a hole in his wallet that could actually pay for itself.

If we can agree that the AI model is at least as capable as a junior engineer or new contractor, how’s that different to saying “software engineering isn’t worth $200 a month”?

Has a very race-to-the-bottom feel to it.

Though in the grand scheme of it, $200/mo probably isn’t the real price either. Also looking at it not just in a vacuum - paying for a product that can change what you get from under you doesn’t seem great anyway.

At least with a locally-hosted model you know what you’re getting.

DeepSeek through their own API has saved me tons of tokens honestly. Even though it is not as smart as Kimi or Claude, their level of entry is very low with a top up of 2$ and Pay as you go compared to the subscription of Claude or 20$ top up of Kimi

Someone else on this forum put it well, U.S. is trying to achieve AGI at all costs, while Chinese models are seeking widespread adoption.

Significantly cheaper than comparable models if you are using openrouter [0]. Just yesterday I spent roughly 13 cents centering some divs using Deepseek in a personal project. It would have been north of $1 to do that with a US frontier model.

0. https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...

The tokens cost the same everywhere on earth. This does hurt some cost advantages of outsourcing when tokens start to become a bigger part of development costs.

I read these stories and I can never figure out how people are managing to use these $200 plans. If I really go full bore, I can sometimes max out the $20 plan. Even then, it already produces more code than I can reasonably review and merge.

> It's increasingly feeling, to me, that theres a gap building up between haves and have nots.

People speak of a permanent underclass.

https://www.nytimes.com/2026/04/30/opinion/ai-labor-work-for...

With open weight models there is true inference competition. Whoever can serve the model at the lowest price. And the consumer wins. Capitalism, served by China.

Even as a GLM z.ai fan, I wouldn't pay for their plans. They are just way worse values than gpt or anthropic plans, in terms of both usage and capabilities.

The reasoning traces always look terrible and they’re frustrating to watch. It’s the same with Kimi. What’s interesting is that the end result is then good. I think it’s just some sort of devils advocate trick to get better output.

I have a hilarious theory why GLM (and Kimi) have this thinkslop,

apparently Chinese language as token is more information dense than English, so having these wasteful thinkslop in Mandarin isnt that damaging. So the developer focus mostly in Mandarin and didnt think of handling these thinkslop while American AI labs do.

Being willing and able to reconsider seems very good. Going around and around, pulling in more thinking, integrating it: maybe that's why it is as good as it's good.

Now that's a tremendous pointer, I'm going to have to try that.

Do you full on let GLM5 get stuff done on its own or is it more like a guided workflow? The former's what the point releases doubled down on and is also something that uses a lot of juice.

>Right now it sounds like the US's export ban is not slowing them down a whole lot.

Just costing them a lot more money as they pay multiples more buying on the underground grey market.

There does not seem to be a big penalty for going slow anyways. People seem to just switch on cost as soon as a model can do a task well enough. There do not seem to be strong network effects or vendor lock in.

Seems to me that going slow is the better long term tactic. China can just let the USA pay the high R&D costs to figure out what works, then just copy what works.

> Right now it sounds like the US's export ban is not slowing them down a whole lot.

It may wind up being a massive boost to them in the long run, even.

Necessity is the mother of invention.

With subsidization from the Chinese government they will probably be equal to or better than the models here. I mean, have you looked at the author list of any given AI paper published within, say, the past 5 years? I wouldn't be surprised if half or more AI researches are from China.

If we can agree that the AI model is at least as capable as a junior engineer or new contractor, how’s that different to saying “software engineering isn’t worth $200 a month”?

Has a very race-to-the-bottom feel to it.

At least with a locally-hosted model you know what you’re getting.

The appropriate price is what the output is worth to you. Some people could pay $10,000/month, some $5 and feel like they were breaking even. There is a big jump between convenience and curiosity uses versus business critical.

OpenAI already charges enterprise users a premium purely for that title over on-demand, no-contract usage. Retail users get a good deal. People make a lot of hay about subsidies but this is a very sane approach if you want exposure to these three different types of customers.

Yeah. There's no way to verify what these providers are doing. The real future is running these models at home. Opus level inference on our own hardware would be a dream come true.

Someone else on this forum put it well, U.S. is trying to achieve AGI at all costs, while Chinese models are seeking widespread adoption.

> U.S. is trying to achieve AGI at all costs

If that was true, they would be collaborating with each other and opening up all the results from their work.

I don't think anthropic/openai/google aren't also seeing widespread adoption. In fact they already have they already have the marketshare.

None of the AI companies in the US are on the path to AGI. They are, however, on the path to claiming they have AGI, then subsequently not releasing it and only giving it to the US government to make drones that can bomb the homes of political dissidents.

Everyone wants widespread adoption, of course. I'm sure that China is also working on more expensive frontier intelligence models behind doors, but they're lagging behind America on that front. Going for cost-optimized open weight models is their bet to stay relevant in a market where they can't compete for the "luxury" segment. It is important for them to get a foot in the door and maintain a presence in the press to attract future customers, given the general animosity towards China in the west that they need to overcome. Similarly, European providers like Mistral are hopelessly outclassed in every respect and thus try to carve out a niche in the market with regulation and anti-American fearmongering. They position themselves as "privacy-conscious" not out of goodwill but because it is their only chance to survive as a company with an utterly inferior product.

For personal use I’m considering using the frontier models from openai or anthropic to create a plan with research and brainstorming etc with enough details for cheap models to be able to follow (glm, deepseek etc) - with openrouter - will monitor how cheap and effective that turns out to be.

0. https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...

For centering divs the free models opencode offers can easily handle that work. DeepSeek V4 Flash is pretty decent.

Now that's a tremendous pointer, I'm going to have to try that.

Do you full on let GLM5 get stuff done on its own or is it more like a guided workflow? The former's what the point releases doubled down on and is also something that uses a lot of juice.

My experience as well unfortunately :(

GLM 5.2 coding plan- I'll post the agent as soon as I can! But opencode works and their own zcode is really good as well.

Your post made me laugh because I experienced the same as you but the other way around. I switched from Claude to a multi model harness a couple of days ago and the first model I tried was GLM5.2.

I gave it some simple code porting exercises and watched dumbfounded at the reasoning, which was more like the ravings of a lunatic - but lo and behold, after much confusion and a dizzying number of eureka moments the task was completed very successfully.

I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.

To be clear, I’m not surprised the results were good because they’re not GPT or Claude, but because the line of reasoning was so bonkers. Coming from Claude, I was just not used to seeing this, but I’ll bet it’s just as nuts with the frontier models and we’re just not allowed to see it (I’m about to read the links you shared).

Agree wholeheartedly that transparency is of grave importance.

> Right now it sounds like the US's export ban is not slowing them down a whole lot.

It may wind up being a massive boost to them in the long run, even.

Necessity is the mother of invention.

If this pans out, you're not at all kidding: https://www.youtube.com/watch?v=8ekndZwyOzo

Trump allowed more advanced chips (H200s) to be sold after his visit, because some people in the admin still believe the US can "addict" China to the hardware. It seems China is only letting a token few in, the ban is more on their side now, as Xi really wants indiginous capability.

I use both the openai subscription and the opencode go subscription. I use the go subscription for my personal work and the openai subscription for my consulting work.

Thanks. I was looking at open code go yesterday and I couldn't figure out if the base pricing is including usage or if that's just base pricing and then you have to pay for usage too. How does it work? It is very cheap.

You should try out the cheaper models first. I find Deepseek v4 models pretty comparable to sonnet 4.6 but at a fraction of the cost. You might find you just don't need to use the American models at all.

If this pans out, you're not at all kidding: https://www.youtube.com/watch?v=8ekndZwyOzo

For coding I still use 5.5 w/ Codex and prefer that to other models + harness combinations.

Is z.ai

Is 2 better than x.ai

OpenCode Go looked intriguing and I spent time reading their docs and pricing but didn’t purchase services. Do you think they are running it at a loss to get market share? (Probably not.) I have been happy buying tokens directly from DeepSeek (I am retired and everything I do is open source code and writing open content books (the manuscript files are available along with the source code) so I have no privacy issues). I also use FireWorks.ai to try different models. Both API services are excellent, but I may try OpenCode Go for a month or two to support the devs of OpenCode.

Seconding the recommendation to use Deepseek directly via the API. I've burnt 287 million tokens in the last couple of days, costing me a whopping $5.77 USD.

For my case Openrouter breaks Deepseek caching and charges me multiple times over what I pay for Deepseek's API, with 2$ I was able to get around 120M tokens from deepseek easily when Openrouter could only barely do 250k

I call this the reviewer/implementer pattern.. Opus for planning then ds4/qwen/kimi for.implementation then opus for PR review

Your post made me laugh because I experienced the same as you but the other way around. I switched from Claude to a multi model harness a couple of days ago and the first model I tried was GLM5.2.

I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.

Agree wholeheartedly that transparency is of grave importance.

If you look at the "thinking" traces as ways of expressions of uncertainty rather than literal thinking they make more sense.

Consider debugging - you start off in one place, think you have worked out what is happening, and then there is a "oh but what about xxx" thing that happens and you explore another branch. Then you "have it for sure" until you find another edge case.

The LLM is doing something analogous. It's writing circuits to try to emulate your program. Each time it gets one that seems right it is very sure that circuit is correct, but then it finds another thing.

At any point you can stop and go "write code now" and it will, and the code will seems fine provided it hasn't hit one of these edge cases.

Turning up thinking time is literally forcing more exploration.

The words that come out are amusingly dramatic, but... TBH when I debug I often are like "WTF" and throwing my hands up in the air at some gotcha I didn't expect.

Yeah isn't that thinking weird?

Now I see the issue clearly! But wait... now I have the full picture! But wait... Found it!

I gave up a few times because of it at first until I realized I just had to let GLM get on with it and what came out was great!

But once it was outright endearing- challenging bug, it said: I have been very thorough. Then it escalated where to look and aced it. Built in confucian values