LE: Someone said this is how the tiers are now counted:
"Essentially if old plus is 1x then new limits are: Plus - 0.3x Pro $100 - 1.5x Pro $200 - 6x (unchanged)"
5x=$100 20x=$200
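Taken together, the two quoted scalings are actually consistent with each other. A quick sketch, treating the quoted multipliers as the commenters' claims rather than official figures:

```python
# Reconciling the two quoted scalings. All multipliers are the
# commenters' claims, not official figures. Baseline: old Plus = 1.0x.
NEW_PLUS_VS_OLD_PLUS = 0.3  # claimed: new Plus is 0.3x the old Plus

# Each tier expressed relative to the *new* Plus ("5x=$100 20x=$200")
tiers_vs_new_plus = {"Plus": 1, "Pro $100": 5, "Pro $200": 20}

# Convert to old-Plus units; this reproduces
# "Plus - 0.3x, Pro $100 - 1.5x, Pro $200 - 6x"
tiers_vs_old_plus = {
    name: mult * NEW_PLUS_VS_OLD_PLUS
    for name, mult in tiers_vs_new_plus.items()
}
print(tiers_vs_old_plus)  # {'Plus': 0.3, 'Pro $100': 1.5, 'Pro $200': 6.0}
```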
>Our existing $200 Pro tier still remains our highest usage option.
And that includes usage of the API with any agent without risking being banned. OpenAI is also very supportive of open source software.
I've been using GPT-5.4 with Swival (https://swival.dev) for a while, alongside local models, and it's absolutely fantastic.
For my money, on the code side at least, GitHub Copilot on VSCode is still the most cost effective option, 10 bucks for 300 requests gets me all I need, especially when I use OpenAI models which are counted as 1x vs Opus which is 3x. I've stopped using all other tools like Claude Code etc.
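The per-request math behind that claim can be sketched as follows; the plan price and multipliers are as quoted above, so check GitHub's current Copilot pricing for real figures:

```python
# Back-of-envelope Copilot request math from the comment above.
# Plan price and multipliers are as quoted there, not verified pricing.
PLAN_PRICE = 10.00      # dollars per month
PREMIUM_REQUESTS = 300  # included premium requests per month

def effective_requests(multiplier: float) -> float:
    """Requests you actually get when each call burns `multiplier` credits."""
    return PREMIUM_REQUESTS / multiplier

def cost_per_request(multiplier: float) -> float:
    return PLAN_PRICE / effective_requests(multiplier)

# OpenAI models at 1x: 300 requests; Opus at 3x: only 100 requests
print(effective_requests(1.0), round(cost_per_request(1.0), 3))  # 300.0 0.033
print(effective_requests(3.0), round(cost_per_request(3.0), 3))  # 100.0 0.1
```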
xhigh will gather all the necessary context; low gathers the minimum necessary context.
That doesn't work as well for me with Opus. Even at max effort it'll overlook files necessary for understanding implementations. It's really annoying when you point that out and get hit with a "you're absolutely right".
Codex isn’t the greatest one shot horse in the race but, once you figure out how to harness it, it’s hard to go back to other models.
Same for the $200 plan, it's still 2x its normal usage until that date.
For me, they are just a means to an end and disposable.
The population on Hacker News is heavily skewed towards tech workers, so I wouldn't draw a conclusion from that.
I wouldn't mistake this for any kind of capability plateau. There is a massive push towards making transformers the engine of humanoid (and other kinds of) robotics, we just haven't reached the hype moment for those yet.
And everyone serious uses the API rate billing anyway.
Opus 4.6 is the L5 new hire SWE keen to prove their chops and quickly turn out totally reasonable code with putatively defensible reasons for doing it that way (that are sometimes tragically wrong) and then catch an after-work yoga class with you.
Regarding speed, I don't use xhigh that often, and surprisingly for me GPT 5.4 high is faster than Claude 4.6 Opus high (unless you enable fast mode for Opus).
Of course I still use Opus for frontend, for some small scripts, and for criticizing GPT's code style, especially in Python (getattr).
I use both OpenAI and Anthropic models, though for different purposes, what surprises me is how underrated GPT still feels (or, alternatively, how overhyped Anthropic models can be) given how capable it is in these scenarios. There also seems to be relatively little recognition of this in the broader community (like your recent YouTube video). My guess is that demand skews toward general codegen rather than the kind of deep debugging and systems work where these differences really show.
Claude on the other hand can be creative. It understands that examples are for reference purposes only. But there are times it decides to go off on a tangent on its own and not follow instructions closely. I find it useful for bouncing ideas around or testing something new.
The other thing I notice is Claude has slightly better UI design sensibilities even if you don’t give instructions. GPT on the other hand needs instructions otherwise every UI element will be so huge you need to double scroll to find buttons.
Not to mention a billion times more usage than you get with claude, dollar for dollar.
GPT doesn't know how to get creative, you need to tell it exactly what to do and what code you want it to write.
For Claude you can be more general and it will look up solutions for you outside of the scope you gave it.
I personally prefer Claude.
Personally, it seems like I have to redirect Opus/Sonnet much less often. GPT felt pretty "dense", it was more likely to ignore earlier instructions in the session, I had to remind it more often, and when I reviewed the code it produced I had to make more corrections that seemed obvious.
Entirely subjective, but I also find I prefer Claude's "personality" to ChatGPT, but I couldn't point to any specific differences.
My guess is that 5.5 will come out soon and be significantly better so you'd want to be using Codex then, but then when Opus 5 comes out probably back to claude code
Also, 5.4 has fast mode and higher usage limits, since it's cheaper.
In general I view VS Code and VS.NET Community + SQL Server free universe as the most effective option :) I think these products are great actually.
On the other hand, the benchmark of Plus usage seems to be all over the place, so it's difficult to say now how the usage compares to the old Pro.
Both Pro plans include the same core capabilities. The main difference is usage allowance: Pro $100 unlocks 5x higher usage than Plus (and 10x Codex usage vs. Plus for a limited time), while Pro $200 unlocks 20x higher usage than Plus.
From their FAQ:
20x more usage than Plus is the $200 plan.
I see this when I try to upgrade my Plus subscription.
With this new VIP membership that comes with 5x or 20x usage: spend $100 and you get 5x; spend $200 and you get 20x, and you get to spin the wheel and use the slot machines unlimited times, even at peak hours, more than most, without any restrictions, 24/7, no waiting for hours, with priority.
So spend more to get more abundance and more simultaneous spins at the wheel.
Except if you're trying to abuse the slot machines themselves, or sharing or reselling your membership to other customers who want a spin at the roulette wheel but were previously banned. [0]
[0] https://help.openai.com/en/articles/9793128-about-chatgpt-pr...
Problem is that the fuel to get this train going relies on investors money. Investors aren't going to be happy with the quote I took from your message.
And that's the real bet really, can the industry turn the spark into fire before the investor money runs out?
This myth about the inferiority of ChatGPT and Codex is becoming a meme.
I have active subscriptions to both. I am throwing at Codex all kinds of data engineering, web development and machine learning problems, have been working on non-tech tasks in the "Karpathy Obsidian Wiki" [1] style before he posted about it.
Not only does Codex crush Claude on cost, it's also significantly better at adherence and overall quality. Claude is there on my Mac, gathering dust, to the point I am thinking of not renewing the sub.
There are plenty of fellow HNers here who feel the same, from what I read in the flamewars. I suspect none of us really has a horse in this race, and many are at least half-competent (in other threads they mention doing things like embedded programming, distributed DL systems, etc.).
I'm starting to suspect a vast majority of people pushing the narrative that Claude is vastly better haven't even tried the 5.3 / 5.4 models and are doing it out of sheer tribalism.
[1] https://gist.github.com/karpathy/442a6bf555914893e9891c11519...
That's cute, but do you mean something concrete by this? That is, are there some non-coding prompts you use it for that you're referring to, or is it simply a throwaway line about L5 SWEs (at a FAANG)?
(FWIW, I find myself using ChatGPT rather than Claude for non-coding prompting, like random questions such as whether oil is fungible, for some reason.)
I do turn to Anthropic for ideation and non-tech things. But I find little reason to use it over codex for engineering tasks. Sometimes for planning, but even there, 5.4 is more critical of my questionable ideas, and will often come up with simpler ways to do things (especially when prompted), which I appreciate.
And I don't do hard-tech things! I've chosen a b2b field where I can provide competent products for a niche that is underserved and where long term relationships matter, simply because I'm not some brilliant engineer who can completely reinvent how something is done. I'm not writing kernels or complex ML stacks. So I don't really understand what everyone is building where they don't see the limits of Opus. Maybe small greenfield projects with few users.
Claude is noticeably poor for my use case on this particular issue. That said, I imagine I’m not alone in refusing to continue paying OpenAI. We’re in for a wild ride.
It's probably never been the case that a plurality of views meant anything, since online is a bubble to begin with, filtered by endless biases wherever we happen to be reading, making it an even more fringe bubble; but the advent of AI has pushed it all over the edge, to the point that perceived pluralities are just completely and utterly meaningless. Somewhat depressing for one who enjoys online chat as a pastime, but it's the reality of the world now.
Just curious as I've often heard that Claude was superior for planning/architecture work while ChatGPT was superior for actual implementation and finding bugs.
FWIW it feels like GH Copilot is a cheaper version of OpenRouter but with trade-offs like being locked into VSCode and the Microsoft ecosystem overall. I already use VSCode though and otherwise I don't see much downside to using GH Copilot outside of that.
Cancelled the plan I had with them and happily went back to just coding like normal in VSCode with occasional dips into Copilot when a need arose or for rubber ducking and planning. Feels much better as I'm in full control and not trusting the magic black box to get it right or getting fatigue from reading thousands of lines of generated code.
Anyone who says they're able to effectively review the thousands of lines that Claude might slop out in a day is lying to themselves.
Of course it is. Returns are diminishing, AGI isn't happening with current techniques but it is good enough to sell, so it's time to monetize. I just got an email from OpenAI as well about ads in their free tier (I signed up once out of curiosity).
Codex is closer to my taste, as it is at least a native app and not typescript slop. But the model is just not up to snuff.
Grok makes sense if you want something less censored that is not biased towards woke ideology.
I don't see how this matters for coding though. I only use it to give me a summary of recent news (so I don't have to actually read the bs newspapers and X posts myself).
With an honest evaluation of your own capabilities you are already far above average. Also, it's hard to see the insane amount of work that was often necessary to invent the brilliant stuff, and most people cannot shit that out consistently.
The user's conversation happens at level 0. Any actual tool use is only permitted at stack depths > 0. When the model calls the Return tool at stack depth 0 we end that logical turn of conversation and the argument to the tool is presented to the user. The user can then continue the conversation if desired with all prior top level conversation available in-scope.
It's effectively the exact same experience as ChatGPT, but each time the user types a message an entire depth-first search process kicks off that can take several minutes to complete each time.
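A toy sketch of how such a depth-first turn loop might look. Everything here is a hypothetical stand-in, not a real API: `SCRIPT` plays the model, `"tool"` mimics a tool call, and `"return"` mimics the Return tool described above.

```python
# Toy sketch of the depth-first conversation scheme described above.
# SCRIPT stands in for the model; all names are hypothetical.
SCRIPT = {
    "user question": ("tool", "look something up"),  # depth 0 spawns a sub-task
    "look something up": ("return", "tool result"),  # depth 1 returns its result
    "tool result": ("return", "final answer"),       # depth 0 ends the turn
}

def call_model(prompt: str, depth: int):
    """Assumed stub: yields ('tool', sub_prompt) or ('return', text)."""
    return SCRIPT[prompt]

def run_turn(prompt: str, depth: int = 0, max_depth: int = 4) -> str:
    """Tool calls recurse (depth-first); the Return tool pops one level.

    At depth 0, Return ends the logical turn and its argument is what
    the user sees; actual tool use only happens at depths > 0.
    """
    while True:
        kind, payload = call_model(prompt, depth)
        if kind == "tool" and depth < max_depth:
            # descend one stack level; the sub-turn's Return value
            # becomes the next prompt at this level
            prompt = run_turn(payload, depth + 1, max_depth)
        else:
            return payload

print(run_turn("user question"))  # final answer
```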
Basically the classic prisoner's dilemma: the other devs with fewer scruples can then outperform you.
It could be a valid strategy if you can increase your credibility with this relinquishment.
Do you think mentioning that Grok created CSAM is a holier-than-thou attitude? Do you not think the people who ignore that are worse?
It compensates for most of that during implementation if you make it use TDD, via superpowers et al., or by just telling it to do so.
GPT 5.4 makes more simple plans (compared to superpowers - a plugin from the official claude plugin marketplace - not the plan mode), but can better fill the details while implementing.
Plan mode in Claude Code got much better in the last months, but missing details in the plan cannot be compensated for by the model during implementation.
So my workflow has been:
Make Claude plan with superpowers:brainstorm, review the spec, make updates, give the spec to GPT, usually to witness grave errors found by GPT, spec gets updated, another manual review, (many iterations later), final spec is written, write the plan, GPT finds mind-boggling errors, (many iterations later), Claude agent swarm implements, GPT finds even more errors, I find errors, fix fix fix, manual code review and red tests from me, tests get fixed, (many iterations later) finally something usable with stylistic issues at most (human opinion)!
This happens with the most complex features, the ones that would be a nightmare to implement even for the most experienced programmers, of course. For basic things, most SOTA models can one-shot anyway.
I also wouldn’t say you’re locked into Microsoft’s ecosystem. At work we just have skills that allow for interaction with Bitbucket and other internal tooling. You’re not forced to use GitHub at all.
https://github.blog/changelog/2026-01-16-github-copilot-now-...
You can use GH Copilot with most of Jetbrains IDEs.
It's likely you didn't learn how to use the tool properly, and I'd suggest 'trying again', because soon not using AI will be tantamount to digging holes with shovels instead of using construction equipment. Yes, we still need our 'core skills', but we're not going to be able to live without the leverage of AI.
Yes - AI can generate slop, and probably too many Engineers do that.
Yes - you can 'feel a loss of control' but that's where you have to find your comfort zone.
It's generally a bad idea to produce 'huge amounts of code', unless it's perfectly consistent with a design and the architecture is derived from well-known conventions.
Start by using it as an 'assistant' aka research, fill in all the extra bits, and get your testing going.
You'll probably want to guide the architecture, and at least keep an eye on the test code.
Then it's a matter of how much further 'up' you can go.
There are few situations in which we should be 'accepting' large amounts of code, but some of it can be reviewed quickly.
The AI, already now in 2026, can write better code than you at the algorithmic level: it will be tight, clean, 'by the book', and far less likely to have errors.
It fails at the architectural and modular level still, that will probably change.
The AI 'makes a clean cut' in the wood, tighter to the line than any carpenter could - like a power tool.
A carpenter that does not use power tools is an 'artisanal craftsperson', not really building functional things.
This is the era of motor cars, there is really no option - I don't say that because I'm pro or anti anything, AI is often way over-hyped - that's something else entirely.
It's like the web / cloud etc. it's just 'imminent'.
So try again, experiment, stay open minded.
The amount you can review before burning out is now the reasonable limit, for the same reason that a car is supposed to stay at the speed you can handle and not the max speed of the engine.
Of course, many people are secretly skipping reviews and some dare to publicly advocate for getting rid of them entirely.
The codebase disconnect is real.
We are like blue collar workers that need to hit the gym to maintain the body that our cavemen ancestors could maintain by doing their daily duties.
Codebase gym sessions might become a thing.
It's true AGI is 'not happening' but it doesn't matter.
Demand for AI is explosive, sales are skyrocketing.
We have another 5-8 years of this crazy investment stuff.
Altman will step aside before they turn into a 'normal company'.
Like they did at Uber.
Or perhaps it was a scam in the first place for an IPO.
Its only use case now is when you can walk away for an hour.
And plenty of very wealthy folks see the writing on the wall wrt robotics.
Aren't you saying here that the LLM personality matters to you, too? Being critical of you is a personality attribute, not a capabilities one.
(Of course, strictly speaking, LLMs have neither temperament, "personality", nor intellect, but we understand these terms are used in an analogical or figurative fashion.)
Nerd-sniping as a weapon of oppression
What's a good way to think about this? The billions of dollars at play do cross my mind; at the same time, I'm not a pessimist. I think my middle ground is the usual one: taking things with a grain of salt. I mean, I chose to reply to this comment in good faith that it's human to human, commenter to unpaid/unaffiliated commenter.
I hope I keep that faith. I hope our billions of neighbors on the web enable me to keep that faith over the coming years. Definitely uncertain about the future of the web but want to love it like I've loved it 1990s-today. (Guess I should volunteer w/the EFF while job hunting, try for for-purpose jobs...)
I believe the rule around here is to not assume everyone who disagrees with you or has opinions you don’t understand is a shill. Perhaps there’s a bit of that in the post you replied to, but to me seems mostly about mourning the loss of quality conversations online.
Gotta say, I agree. Not that things were ever great, but it’s really in the crapper now.
Just curious as I'm trying to branch out from using Claude for everything, and I've been following a somewhat similar workflow to yours, except just having Claude review and re-review its plan (sometimes using different roles, e.g. system architect vs SWE vs QA eng) and it will similarly identify issues that it missed originally.
But now I'm curious to try this while weaving in more GPT.
I realized this is the crux of our moment, because a variant of Amdahl's law applies to AI code gen.
{time gained} = {time saved via gen AI} - {time spent in human review}
There's no way that results in a positive number with 100% human review coverage, which means that human review coverage is headed to < 100% (ideally as low as possible).
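A back-of-envelope version of that formula. The rates below are made-up illustrative assumptions; under the comment's premise that careful review is slower than hand-writing, full coverage indeed comes out negative:

```python
# Illustrative model of {time gained}; the rates are made-up assumptions,
# not measurements. Premise (per the comment): careful review of generated
# code is slower than writing the same code by hand.
WRITE_RATE = 10.0   # lines/min you'd write by hand (assumed)
REVIEW_RATE = 8.0   # lines/min you can review carefully (assumed)

def time_gained(lines: float, review_coverage: float) -> float:
    """Minutes gained: hand-writing time avoided minus review time spent."""
    time_saved = lines / WRITE_RATE
    time_reviewing = (lines * review_coverage) / REVIEW_RATE
    return time_saved - time_reviewing

print(time_gained(1000, 1.0))  # -25.0 -> 100% review coverage loses time
print(time_gained(1000, 0.5))  # 37.5  -> skipping half the review "wins"
```

Which is exactly the pressure the comment describes: the only way to make the number positive is to push coverage below 100%.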
As we know with driving, sensible drivers stick to the speed limit most of the time, but there's a good percentage of knuckle draggers who just love speeding, some people get drunk, some they just drive the wrong way down the highway entirely. Either way it's usually the sensible people who end up suffering.
But I write this on a mostly US forum full of FAANG employees and the like, so I don't expect strong agreement.
Not sure why you felt the need to switch the topic to Grok. As for its nudification incident, it seems a bit far-fetched to say that malicious actors bypassing its safety controls was not an accident.
Initially, the image features were restricted to paying subscribers to prevent abuse by anonymous actors; this obviously happened while they were tightening safety controls to stop abuse.
If you're going to bring up that old topic, at least try to get the facts straight.
To use your own analogy, there's plenty of carpenters still around for when someone needs something doing properly and bespoke, even though we can all go to Ikea, or any other flat pack furniture company, to get wobbly furniture cheaply at any time.
I'd rather be the last carpenter charging a liveable wage, working on interesting problems for clients who appreciate a human touch than just pumping out mountains of slop to keep up with the broligarchy. If that makes me ignorant that's fine, but I'll be happily enjoying the craft while you're worrying about your metrics.
- Plus is still the same $20
- 20x Pro is still the same $200
- The 5x tier at $100 is new
https://help.openai.com/en/articles/9793128-what-is-chatgpt-... is probably a better direct comparison of the 3
The $200 Pro plan still exists, and does give access to the pro model.
What is new is a $100 Pro plan that does give access to the pro model, with lower usage limits than the $200 Pro plan.
Plan details:
- 5x more usage than Plus: $120/month
- 20x more usage than Plus: $200/month

To me it seems a LOT of a stretch to think that the people behind Grok believed their safety controls worked, but you can believe that if you wish. Deepfakes of non-consenting adults were trending on X all the time, and Elon even appears to have shared them himself, which is pretty bad even if they're all just adults. And I'm sure you believe that they believed the AI could tell the difference between an underage person and an adult perfectly, although it seems clear they didn't test it very much.
Or, in other words: 'non-existent'.
It is arrogant and luddite to suggest that 'using AI is not doing it properly' or that anyone will care.
They care that it's done well - that's it.
FYI, the code that AI produces is probably better than what you produce, at least at a functional level.
'Artisanality' is worthless in 'code': there are no 'winding staircases' for us to custom build, as a master carpenter would.
Where you can continue to 'write code by hand' is for very arcane things, but even then you're still going to have to use AI for a lot of things in support of that.
So if you want to get into compiler design - sure.
But still - without mastery of AI, you'll be left behind.
At least with horses there's a naturalist component; with 'code', nobody cares at all. There's zero interest in it, there's no 'organic' angle to sell.
In 2005, Tim Bryce wrote that programmers were by and large a lazy, discipline-averse lot who are of average intelligence at best but get very precious about their "craft", not realizing that it's only a small part of a greater whole and it's the business people who drive actual value in a company. AI is proving him 100% correct.
Edit: I wonder if this is actually compute-bound as the impetus
This just adds a $100 plan that's 1/4 the usage of the $200 plan.
Pricing strategy is always a bit of an art, without a perfect optimum for everyone:
- pay-per-token makes every query feel stressful
- a single plan overcharges light users and annoyingly blocks heavy users
- a zillion plans are confusing / annoying to navigate and change
This change mostly just adds a medium-sized plan for people doing medium-sized amounts of work. People were asking for this, and we're happy to deliver.
(I work at OpenAI.)
GPT 5.4 Pro is extremely slow but thorough, so it's not meant for the usual agentic work, rather for research or solving hard bugs/math problems when you provide it all the context.
And do you mean to say that you don't really use GPT 5.4 Pro unless it's for a hard bug? Curious which models you use for system design/architecture/planning vs execution of a plan/design.
TIA! I'm still trying to figure out an optimal system for leveraging all of the LLMs available to us as I've just been throwing 100% of my work at Claude Code in recent months but would like to branch out.
- internally the same best-of-N architecture
- not available in a code harness like Codex, only in the UI (GPT itself has an API)
- GPT-5.4 Pro is extremely expensive: $30.00 input vs $180.00 output
- both DT and Pro are really good at solving math problems
You can't just say because they've added more things the old things are over - the old things actually have to go away first. Eventually they may get there (or not). It may be another few years (or not). Nothing is actually now over though any more than it was now over in 2024.
ChatGPT
See pricing for our individual, business, and enterprise plans.
*Usage must be reasonable and comply with our _policies_
**Enterprise and Business can purchase credits for more access
***ChatGPT manages a shared context window to understand your request, track the conversation, retrieve relevant information, and generate responses. The portion available for user input is smaller than the total window, as space is also used for system instructions (including tools and personality), memories (if enabled), and internal processing (reviewing information, reasoning, and response generation). The reported space for user input is an approximation and may change dynamically based on features in use and any memory content.
The free version of ChatGPT is available to everyone. Paid plans (Go, Plus, Business, and Enterprise) are priced per user per month. We offer monthly plans for Go, Plus and Business and annual plans for Business and Enterprise.