- workspace agent runner apps (like Conductor) are becoming more and more obsolete
- "vibe working" is becoming a thing - people use folder based agents to do their work (not just coding)
- new workflows seem to be evolving into folder based workspaces, where agents can self-configure MCP servers and skills + memory files and instructions
kinda interested to see if OpenAI has the ideas & shipping power to compete with Anthropic going forward. Anthropic doesn't only have an edge over OpenAI because of how OP their models are at coding, but also because they innovate on workflows and AI tooling standards; OpenAI so far has only followed in adoption (MCP, skills, now Codex desktop) but rarely pushed the SOTA themselves.
To me this still feels like the wrong way to interact with a coding agent. Does this lead people to success? I've never seen it not go off the rails in some way unless you provide clear boundaries as to what the scope of the expected change is. It's gonna write code if you don't even want it to yet, it's gonna write the test first or the logic first, whichever you don't want it to do. It'll be much too verbose or much too hacky, etc.
Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" but on a remote machine, I'll test it.
One cool thing about this: upon installing, it immediately found all previous projects I've used with Codex and shows those projects in the sidebar with all of the "threads" (sessions) I've had with Codex on them!
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
Begs the question if Anthropic will follow up with a first-class Claude Code "multi agent" (git worktree) app themselves.
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.
To Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blindspots on plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you get the right execution plan with the right code snippets, Claude is essentially a very fast typer. That's how I prefer to do AI-assisted development personally.
That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxxed out a couple times, but I do find the volume to quality to be better with Claude. So far it's worth the money.
The Open button and then codex resume --last works, but it's a waste and The Wrong Abstraction not to offer instantiable conversation windows from the get-go.
So much valuation, so much internal competition and shenanigans that the creatives left.
From the video, I can see how this app would be useful in:
- Creating branches without having to open another terminal, or creating a new branch before the session.
- Seeing diffs in the same app.
- Working on multiple sessions at once without switching CLIs.
- I quite like the “address the comments” feature; I can see how this would be valuable.
I will give it a try for sure.
So many of the things that pioneered the way for the truly good (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new and flashy that should get the attention of all of us. And yet, they only launch on Apple devices?
Then, there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, a couple of examples: "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Apple is great, but this is OpenAI devs showing their disconnect from the mainstream. It's complacent at best, contemptuous at worst.
SamA or somebody really needs to give the product managers here a kick up the arse.
Here's the Codex tech stack in case anyone was interested like me.
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
What I like is that the sessions are highly configurable from their plan.md which translates a md document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen around hooks and such -- but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes to dynamically add steps and even add "hooks" as needed based on the problem.
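A minimal sketch of that idea, assuming a plan.md with ordinary markdown checkboxes. The format and helper names here are my own illustration, not the app's actual mechanism:

```python
import re

# Treat a plan.md checklist as a mutable process: unchecked items are
# pending steps, and a wrapper (or the agent itself) can append new steps
# as the problem evolves.
STEP = re.compile(r"^- \[( |x)\] (.+)$")

def parse_plan(text):
    """Return (done, description) pairs for every checklist item."""
    return [
        (m.group(1) == "x", m.group(2))
        for line in text.splitlines()
        if (m := STEP.match(line))
    ]

def add_step(text, description):
    """Append a pending step, mirroring the 'dynamically add steps' idea."""
    return text.rstrip("\n") + f"\n- [ ] {description}\n"

plan = "# Plan\n- [x] Read the failing test\n- [ ] Fix the off-by-one\n"
plan = add_step(plan, "Run the full suite")
```

Because the plan stays plain markdown, both the human and the agent can edit it with the same tools.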
I know/hope some OpenAI people are lurking in the comments and perhaps they will implement this, or at least consider it, but I would love to be able to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I will get the same prompt as if I had typed it on my keyboard, including the file pill UI element shown in the input box. Same for slash commands. Voice is a great input modality, please make it a first class input. You are 90% there, this way I don't need my dictation app (Handy, highly recommended) anymore.
Also, I see myself using the built-in console often to ls, cat, and rg to still follow old patterns, and I would love to pin the console to a specific side of the screen instead of having it at the bottom. And pls support terminal tabs, or I need to learn tmux.
It's slow and stupid. It does not do proper research. It does not follow instructions. It randomly decides to stop being agentic and instead just dumps the code for me to paste. It has the extremely annoying habit of just doing stuff without understanding what I meant, making a mess, then claiming everything is fine. The outdated training data is extremely annoying when working with Nuxt 4+. It is not creative at solving problems. It doesn't show the thinking. The undo feature does not give proper feedback on the diff or whether it actually did "undo." And I hate the personality. It HAS to be better than it comes off, because I am actually in a bad mood after having worked with it. I would rather YOLO code with Gemini 3 Flash, since it's actually smarter in my assessment, and at least I can iterate faster, and it feels like it has better common sense.
Just as an example, I found an old, terrible app I made years ago for our firm that handles room reservations. I told it to update from Bootstrap to Flowbite UI. Codex just took forever to make a mess, installed version 2.7 when 4.0.1 is the latest, even when I explicitly stated that it should use the absolute latest version. Then it tried to install it and failed, so it reverted to the outdated CDN.
I gave the same task to Claude Code. Same prompt. It one-shotted it quickly. Then I asked it to swap out ALL the fetch logic to have SPA-like functionality with the new beta 4 version of HTMX, and it one-shot that too in the time Codex spent just trying to read a few files in the project.
This reminds me of the feeling I had when I got the Nokia N800. It was so promising on paper, but the product was so bad and terrible to use that I knew Nokia was done for. If that was their take on what an acceptable smartphone could be, the whole foundation was doomed. If this is OpenAI's take on what an agentic coding assistant should be—something that can run by itself and iterate until it completes its task in an intelligent and creative way—then OpenAI is doomed.
Last week, I decided to try building the site myself using Codex (the terminal one). I chose Astro as the framework because I wanted to learn about it. I fed it some marketing framework materials (positioning statements and whatnot) and showed it some website designs that we like. I then asked it to produce a first cut and it one-shotted a pretty decent bit of output.
AGI is definitely a few more years away, because I’ve since invested probably 30 hours of iteration to make the site into something that is closer to what I eventually want to launch. But here’s the thing: I never intended for Codex to produce THE final website. But now I’m thinking, “maybe we can?” On my team, we have just enough expertise and design know-how to at least know what looks good and we are developers so we definitely know what good code looks like. And Codex is nailing it on both those fronts.
As I said, we’re far from AGI. There’s no way I can one-shot something like this. It requires iteration with humans who have years of “context” built up. But maybe the days of hiring a designer and just praying that they somehow get it right are behind us.
Is there more information about it? For how long and what are the limits?
I created this using PLANS.md and it basically replicates a kanban/scrum process with gated approvals per stage, locked artifacts when it moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.
I am glad not to depend on AI. It would annoy me to no end how it tries to assimilate everything. It's like systemd on roids in this respect. It will swallow up more and more tasks. Granted, in a way this is saying "then it was not necessary to have these things anymore now that AI solves it all", but I am skeptical of "the promised land" here. Skynet was not trusted back in 1982 or so. I don't trust AI either.
My experience with Cursor is generally good and I like that it gives me UX of using VS Code and also allows selection of multiple models to choose if one model is stuck on the prompt and does not work.
I keep coming back to my basic terminal with tmux running multiple sessions. Recently, though, I forked this https://github.com/tiann/hapi and have been loving using tailscale to expose my setup on my mobile device for convenience (plus the voice input there).
Claude yes, but Codex is much better than Gemini in every way that matters except speed in my experience.
Gemini 3 Flash is an amazing model, but Gemini 3 Pro isn't great. It can do good work, but it's pretty random if it will or it will go off the rails and do completely the wrong thing. OTOH GPT 5.2 Codex with high thinking is the best model currently available (slightly better than Opus 4.5)
Codex gets complex tasks right and I don't keep hitting usage limits constantly. (this is comparing the 20$ ChatGPT to the 200$ Claude Pro Max plans fwiw)
The tooling around ChatGPT and Codex is thinner, but their models are far more dependable imo than Anthropic's at this very moment.
Saying gpt-3.5-turbo is better than gpt-5.2 makes me think you've got some of them hidden motives.
https://code.claude.com/docs/en/common-workflows#use-extende...
Who told you that? You can write entire C libraries and call them from Electron just fine. The browser is a native application, after all. All this "native applications" debate boils down to the UI implementation strategy. Maintaining three separate UI stacks (WinUI, SwiftUI, GTK/Qt) is dramatically more expensive and slower to iterate on than a single web-based UI with shared logic.
We already have three major OSes, all doing things differently. The browsers, on the other hand, use the same language, same rendering model, same layout system, and same accessibility layer everywhere, which is a massive abstraction win.
You don't casually give up massive abstraction wins just to say "it's native". If "just build it natively" were actually easier, faster, or cheaper at scale, everyone would do just that.
- UE5 has its own custom UI framework, which definitely does not feel "native" on any platform. Not really any better than Electron.
- You can easily call native APIs from Electron.
I agree that Electron apps that feel "web-y" or hog resources unnecessarily are distasteful, but most people don't know or care whether the apps they're running use native UI frameworks, and being able to reassign web developers to work on desktop apps is a significant selling point that will keep companies coming back to Electron instead of native.
On macOS it's much better. But most teams either end up locked in to Mac-only or go cross-platform with Electron.
This is what you get when you build with AI, an electron app with an input field.
My ire was provoked by this following on from the Windows ChatGPT app that was just a container for the webpage compared to the earlier bells and whistles Mac app. Perceptions are built on those sorts of decisions.
1. Turing-test UXs, where a chat app is the product and the feature (Electron is fine).
2. The class of things LLMs are good at that often do not need a UI, let alone a chat app, and need automation glue (Electron may cause friction).
Personally, I feel like we're jumping on capabilities and missing a much larger issue of permissioning and security.
In an API or MCP context, permissions may be scoped via tokens at the very least, but within an OS context, that boundary is not necessarily present. Once an agent can read and write files or execute commands as the logged-in user, there's a level of trust and access that goes against most best practices.
This is probably a startup waiting to be hatched, but it seems to me that getting agents properly scoped and kept in bounds, just like Cursor has rules, would be a prerequisite before giving them access to an OS at all.
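One way to picture that kind of scoping is an allowlist gate in front of every file operation, similar in spirit to Cursor's rules. This is purely an illustrative sketch with made-up paths, not any existing product's API:

```python
from pathlib import Path

# Hypothetical workspace roots the agent is allowed to touch.
ALLOWED_ROOTS = [Path("/workspace/project")]

def check_path(candidate: str) -> Path:
    """Resolve a path and refuse anything outside the allowed roots."""
    resolved = Path(candidate).resolve()
    if not any(resolved.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS):
        raise PermissionError(f"{resolved} is outside the agent's scope")
    return resolved
```

Because the path is resolved before checking, a "../" escape that starts inside the workspace is rejected too. Command execution would need an analogous gate, which is the harder half of the problem.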
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
What I struggle with is the legal overhead of e.g. collecting money for an app/website. I have a semi-finished app which I know I could deploy within a few hours, but collecting money while living in Germany is a minefield from what I understand. I don't want my name made public with the app. GmbHs (LLCs) cost thousands (?). The whole GDPR minefield, the Google Fonts usage scam, etc. makes me hold back.
Googling/reddit only gives so much insight.
If someone has a good reference about starting a SaaS/App from within EU/Germany with all the legalities etc. I'd be super interested!
Importantly I want full voice control over the app and interactions not just dictating prompts.
What I didn't realize, though, is that the limit doesn't reset every 5 hours as is the case for Claude. I hit the limit of the free tier about 2 hours in, and while I was expecting to be able to continue later today, it alerts me that I can continue in a week.
So my hype for the amount of tokens one gets compared to claude was a bit too eager. Hitting the limit and having to wait a week probably means that we get a comparable token amount vs the $20 claude plan. I wonder how much more I'd get when buying the $20 plus package. The pricing page doesn't make that clear (since there was no free plan before yesterday I guess): https://developers.openai.com/codex/pricing/
the big boys probably don't want people who don't know sec deploying on their infra lol.
I wonder what it was doing with all those tokens?
For people suggesting it’s a skill issue: I’ve been using Claude Code for the past 6 months and I genuinely want to make Codex work - it was highly recommended by peers and friends. I’ve tried different model settings, explicitly instructed it to plan first and only execute after my approval, tested it on both Python and TypeScript backend codebases. Results are consistently underwhelming compared to Claude Code.
Claude Code just works for me out of the box. My default workflow is plan mode - a few iterations to nail the approach, then Claude one-shots the implementation after I approve. Haven’t been able to replicate anything close to that with Codex
But what is your concept of "stages"? For me, the spec files are a MECE decomposition, each file is responsible for its unique silo (one file owns repo layout, etc), with cross references between them if needed to eliminate redundancy. There's no hierarchy between them. But I'm open to new approaches.
If you need it to be up to date with your version of a framework, then ask it to use the context7 mcp server. Expecting training data to be up to date is unreasonable for any LLM and we now have useful solutions to the training data issue.
If you need it to specify the latest version, don't say "latest". That word would be interpreted differently by humans as well.
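Codex can be pointed at such a server via its config. As a hedged sketch (the exact file location and package name may have changed; check the Codex and context7 docs for the current form), registering context7 in ~/.codex/config.toml looks roughly like:

```toml
# ~/.codex/config.toml -- register the context7 MCP server so the agent
# can fetch current framework docs instead of relying on training data
[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp"]
```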
Claude is well known for its one-shotting skills. But that comes at the expense of strict instruction-following and thinner context (it doesn't spend as much time gathering context in larger codebases).
Is it in the main Codex build? There doesn’t seem to be an experiment for it.
I recently had an issue, "add VNC authentication", which covers adding VNC password auth to our in-house VNC server at work.
This is not hard, just a bit of tedious work: getting the plumbing done, adding some UI for the settings, fiddling with some bits according to the spec.
But it's (at least to me) not very enjoyable: there is nothing to learn, nothing new to discover, not much creativity necessary, etc., and this is where Codex comes in. As long as you give it clearly scoped tasks in an environment where it can use existing structures and conventions, it will deliver. In this case it implemented 85% of the feature perfectly and I only had to tweak minor things like refactoring 1-2 functions. Obviously I read, understood, and checked everything it wrote; that is an absolute must for serious work.
So my point is: use AI as the "code monkey". I believe most developers enjoy the creative aspects of the job, but not the "type C++ on your keyboard" part. AI can help with the latter; it will type what you tell it, and you can focus on the architecture and the creative part of the whole thing.
You don't have to trust AI in that sense, use it like autocompletion, you can program perfectly fine without it but it makes your fingers hurt more.
Then there will be the AI wranglers who act almost like DevOps engineers for the AI - producing software in a different way ...
also this feels like a unique opportunity to take some of that astronomical funding and point it towards building the right tooling for building a performant cross-platform UI toolkit in a memory-safe language—not to mention a great way for these companies to earn some goodwill from the FOSS community
The browser is compiled to native code. It wasn't that long ago that we had three separate web browsers that couldn't agree on the same set of standards either.
Try porting your browser to Java or C# and see how much faster it is then. The OSes the browser and the server run on are compiled to native code. Sun gave up on the HotJava web browser in the 1990s because it couldn't do 10% or 20% of what Netscape or IE could, and was 10x slower.
Not everybody is running a website selling internet widgets. Some of us actually have more on the line if our systems fail or are not performant than "oooh our shareholders are gonna be pissed".
People running critical emergency response systems day in, day out.
The very system you typed this bullshit on is running native code. But oh no, that's "too hard" for the webdev crowd. Everyone should bend to accommodate them. The OS should be ported to run inside the browser, because the browser is "so good".
Good one. It's hilarious to see this Silicon Valley/Bay Area, chia-seed eating bullshit in cheap spandex riding their bikes while the trucks shipping shit from coast to coast passing them by.
Exactly. Years go by and HN keeps crying about this despite it being extremely easy to understand for anyone. For such a smart community, it's baffling how some debates are so dumb.
The only metric really worth reviewing is resource usage (and perhaps appearance). These factors aren't relevant to the general population as otherwise, most people wouldn't use these apps (which clearly isn't the case).
Use native for macOS. Use the .NET Framework for Windows. Use whatever on Linux.
It's just being lazy and ineffective. I also do not care about whatever "business" justification anyone can come up with for half-assing it.
I guess you get an Electron app if you don't prompt it otherwise. Probably because it's learned from what all the humans are putting out there these days.
That said, unless you know better, it's going to keep happening. Even more so when folks aren't learning the fundamentals anymore.
E.g. just say "write a c++ gui widget library using dx11 and win32 and copy flutters layout philosophy, use harfbuzz for shaping, etc etc"
LLM output is called slop for a reason.
[1] https://firebase.google.com/docs/ai-assistance/mcp-server
UPDATE: without AI usage at all (just to clarify).
E.g. I wouldn't be surprised if identifying the lack of touch screen support on the menu, feeding it in, and then regenerating the menu code sometime between 800k and 7MM took a lot of tokens.
You are correct in that this mode isn't "out of the box" as it is with Claude (but I don't use it in Claude either).
My preference is to have smart models generate a plan with provided source. I wrote (with AI) a simple python tool that'll filter a codebase and let me select all files or just a subset. I then attach that as context and have a smart model with large context (usually Opus, GPT-5.2, and Gemini 3 Pro in parallel), give me their version of a plan. I then take the best parts of each plan, slap it into a single markdown and have Codex execute in a phased manner. I usually specify that the plan should be phased.
I prefer out-of-CLI planning because frankly it doesn't matter how good Codex or Claude Code dive in, they always miss something unless they read every single file and config. And if they do that, they tip over. Doing it out of band with specialized tools, I can ensure they give me a high quality plan that aligns with the code and expectations, in a single shot (much faster).
Then Claude/Codex/Gemini implement the phased plan - either all at once - or stepwise with me testing the app at each stage.
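For what it's worth, the "simple python tool" part is easy to approximate. Here is my own hedged sketch, not the commenter's actual script (the extension and skip lists are arbitrary choices): walk the repo, keep the interesting source files, and emit one markdown blob to hand to a large-context model as planning input.

```python
from pathlib import Path

KEEP = {".py", ".ts", ".tsx", ".md", ".toml"}          # source files worth sending
SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}

def gather(root: Path, subset=None) -> str:
    """Concatenate selected files under root into one markdown context blob.

    If subset is given, only relative paths listed in it are included,
    mirroring the 'all files or just a subset' selection step.
    """
    chunks = []
    for path in sorted(root.rglob("*")):
        if path.is_dir() or set(path.parts) & SKIP_DIRS:
            continue
        if path.suffix not in KEEP:
            continue
        rel = path.relative_to(root)
        if subset is not None and str(rel) not in subset:
            continue
        chunks.append(f"## {rel}\n```\n{path.read_text()}\n```")
    return "\n\n".join(chunks)
```

From there, the output can be piped to the clipboard and pasted into each model in parallel.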
But yeah, it's not a skill issue on your part if you're used to Plan -> Implement within Claude Code. The Experimental /collab feature does this but it's not supported and more experimental than even the experimental settings.
00: Iterate on requirements with ChatGPT outside of the IDE. Save as a markdown requirements doc in the repo
01: Inside the IDE; Analysis of current codebase based on the scope of the requirements
02: Based on 00 and 01, write the implementation plan. Implement the plan
03: Verification of implementation coverage and testing
04: Implementation summary
05: Manual QA based on generated doc
06: Update global STATE.md and DECISIONS.md that documents the app, and the what and why of every requirement
Every stage has a single .md as output and after the stage is finished the doc is locked. Every stage takes the previous stages' docs as input.
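The "locked" part of that scheme can be as simple as dropping write permission on the finished doc. A hedged sketch of that gate (stage filenames and function names are illustrative, not the actual setup):

```python
import stat
from pathlib import Path

def lock(doc: Path) -> None:
    """Mark an approved stage document read-only."""
    doc.chmod(doc.stat().st_mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def is_locked(doc: Path) -> bool:
    """A stage doc counts as approved once its owner-write bit is cleared."""
    return not doc.stat().st_mode & stat.S_IWUSR

def start_next_stage(previous: Path) -> str:
    """Gate: a stage may only start from a locked (approved) input doc."""
    if not is_locked(previous):
        raise RuntimeError(f"{previous.name} is not approved yet")
    return previous.read_text()
```

The read-only bit also keeps the agent from quietly rewriting earlier artifacts mid-run, which is most of the value of the gate.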
I have a half-finished draft with more details and a benchmark (need to re-run it since a missing dependency interrupted the runs)
https://dilemmaworks.com/implementing-recursive-language-mod...
I got invites to seven AI-centered meetings late last week.
Value prop of product quality aside, isn't the AI claim that it helps you be more productive? I would expect that OpenAI would run multiple frontends and that they'd use Codex to do it.
I.e., are they using their own AI (I would assume it's semi-vibe-coded) just to get a new product out, or using AI to create a new product, with the productivity gains letting them produce higher quality?
You reduce development effort by a third. It is OK to debate whether a company so big should invest in a better product anyway, but it is pretty clear why they are doing this.
I haven't touched desktop application programming in a very long time and I have no desire to ever do so again after trying to learn raw GTK a million years ago, so I'm admittedly kind of speaking out of my ass here.
It's weird that we don't have a unified "React Native Desktop" that would build upon the react-native-windows package and add similar backends for macOS and Linux. That way we could be building native apps while keeping the stuff developers like from React.
A full fledged app, that does everything I want, is ~ 10MB. I know Tauri+Rust can get it to probably 1 MB. But it is a far cry from these Electron based apps shipping 140MB+ . My app at 10MB does a lot more, has tons of screens.
Yes, it can be vibe coded and it is especially not an excuse these days.
Microsoft Teams, Outlook, Slack, Spotify? Cursor? VsCode? I have like 10 copies of Chrome in my machine!
You don't need to use microsoft's or apple's or google's shit UI frameworks. E.g. see https://filepilot.tech/
You can just write all the rendering yourself using metal/gl/dx. if you didn't want to write the rendering yourself there are plenty of libraries like skia, flutter's renderer, nanovg, etc
Maybe an app as complex as Outlook needs the pixel-perfect tweaking of every little button, so they need to ship their own browser for an exact version match. But everything else can use the *system native browser*. Use Tauri or Wails or many other solutions like these.
That said, I do agree on the other comments about TUIs etc. Yes, nobody cares about the right abstractions, not even the companies that literally depend on automating these applications
c Do this programming task for me.
Right in the shell. Even a full-featured TUI like Claude Code is highly limited compared to a visual UI. Conversation branching, selectively applying edits, flipping between files: all are things a visual UI does fine that are extremely tedious in a TUI.
Overall it comes down to the fact that people have to use the TUI, and that matters more than it being easy to train on; there's a reason we use websites and not terminals for rich applications these days.
I think most people start off overusing these tools, then they find the few small things that genuinely improve their workflows which tend to be isolated and small tasks.
Moltbot et al., to me, seem like a psyop by these companies to get token consumption back to levels that justify the investments they need. The clock is ticking; they need more money.
I'd put my money on token prices doubling to tripling over the next 12-24 months.
It would be nice if someone made a way to write desktop apps in JavaScript with a consistent, cross-platform modern UI (i.e. swipe to refresh, tabs, beautiful toggle switches, not microscopic check boxes) but without resorting to rendering everything inside a bloated WebKit browser.
It's very obvious that while their AI teams are top notch, their product teams are very middle of the road. Including design. Even though they apparently engaged Jony Ive, I can't actually see his 'touch' on anything they have. You'd expect them to have a much higher level of ambition when it comes to their own products. But they seem stuck getting even the basics shipping. I use Chat GPT for Desktop. It's alright but it seems to have stagnated a bit and it has some annoying bugs and flakiness. Random shit seems to break regularly with releases as well.
Another good example of the lack of vision/product management is the #1 and oldest use case for LLMs since day 1: generating text. You'd expect somebody to maybe have come up with the genius idea of "eh, hmm, you know, I wonder if we can do better than pasting blobs of markdown rendered HTML to and/from a word processor from a f**ing sidebar".
Where's the ultimate agentic word processor? The ultimate writing experience? It's not there. Chat GPT is hopelessly clumsy doing even the most basic things in word processors. It can't restructure your document. It can't insert/delete bits of text. It can't use any of the formatting and styling controls. It can't do that in the UI. It can't do that at the file level. It's just not very good at doing anything more than generating bits of text with very basic markdown styling that you might copy paste to your word processor. It won't match the styling you have. Last time I checked Gemini in Google docs it was equally useless. I don't have MS Office but I haven't heard anything that suggests it is better.
For whatever reason, this has not been a priority (bad product management?) or they simply don't have the creativity to see the rather obvious integration issues in front of them.
Yes making those is a lot of work and requires a bit of planning. But wasn't the point of agentic coding that that's now easy? Apparently not.
Our IDE does this: common code / logic, then a native macOS layer and a WPF layer. Yes, it takes a little more work (less than you'd think!) but we think it is the right way to do it.
And what I hope is that AI will let people do the same -- lower the cost and effort to do things like this. If Electron was used because it was a cheap way to get cross-platform apps out, AI should now be the same layer, the same intermediate 'get stuff done' layer, but done better. And I don't think this prevents doing things faster because AI can work in parallel. Instead of one agent to update the frontend, you have two to update both frontends, you know?
We're building an AI agent, btw. Initially targeting Delphi, which is a third party's product we try to support and provide modern solutions for. We'll be adding support for our own toolchains too.
What I fear is that people will apply AI at the wrong level. That they'll produce the same things, but faster: not the same things, but better (and faster.)
It comes back to fundamental programming guidelines like DRY (Don't Repeat Yourself): if you have three separate implementations in different languages for everything, changes will become harder and you will move slower. These golden guidelines still stand in a vibe-code world.
Sorry to nitpick, but this should be "by three" or "by two thirds", right?
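For the record, the arithmetic behind the nitpick, with a purely illustrative number:

```python
effort = 90  # person-days, purely illustrative

reduced_by_a_third = effort * (1 - 1 / 3)     # one third removed: 60 remain
reduced_by_two_thirds = effort * (1 - 2 / 3)  # two thirds removed: 30 remain
reduced_to_a_third = effort / 3               # "by three", i.e. divided by 3: 30 remain
```

"Reduce by a third" leaves two thirds of the work, while collapsing three UI stacks into one would leave roughly a third of it.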
These strategies are fine for toy apps but you cannot ship a production app to millions or even thousands of people without these basics.
Even though OpenAI has a lot of cash to burn, they're not in a good position now and getting butchered by Anthropic and possibly Gemini later.
If any major player in this AI field has the power to do it, it's probably Google. But again, they've done the Flutter part, and the result is somewhat mixed.
At the end of the day, it's only HN people and a fraction of Redditors who care. Electron is tolerated by the silent majority. Nice native or local-first alternatives are often separate, niche value propositions when developers can squeeze themselves in over-saturated markets. There's a long way before the AI stuff loses novelty and becomes saturated.
Eric Schmidt has spoken a lot recently about how it's one of the biggest advances in human history and it's hard to disagree with him, even if some aspects make me anxious.
Done by the company which sells software which is supposed to reduce it tenfold?
That’s actually why we're working on Slint (https://slint.dev): It's a cross-platform native UI toolkit where the UI layer is decoupled from the application language, so you can use Rust, JavaScript, Python, etc. for the logic depending on what fits the project better.
Yes, it takes more disk space, but whether that's 50 MB or 500 MB isn't noticeable for most users. The same goes for memory: there is a gain for sure, but unless you open your system monitor you wouldn't know.
So even if it's something the company could afford, is it even worth it?
Also it's not just about cost but opportunity cost. If a feature takes longer to implement natively compared to Electron, that can cause costly delays.
Qt is also pretty memory-hungry; maybe rich, declarative (QML), skinnable, adaptable UIs with full a11y support just require some RAM no matter what. And it also looks a wee bit "non-native" to purists, except on Windows, where the art of a uniform native look is lost.
Also, if you ever plan extensions / plugin support, you already basically have it built-in.
Yes, a Qt-based program may be wonderfully responsive. But an Electron-based app can be wonderfully responsive, too. And both can feel sluggish, even on great hardware. It all depends on a right architecture, on not doing any (not even "guaranteed fast") I/O in the GUI thread, mostly. This takes a bit of skill and, most importantly, consideration; both are in short supply, as usual.
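The "no I/O in the GUI thread" point above applies to any toolkit. A minimal sketch of one common pattern (all names here are mine, not from any particular app): slice long work into chunks and yield back to the event loop between them, so input events are never starved.

```typescript
// Sketch: process a large array without blocking the UI/event loop.
// Works the same in an Electron renderer or a plain Node process.
async function processInChunks<T, R>(
  items: T[],
  fn: (item: T) => R,
  chunkSize = 500,
): Promise<R[]> {
  const out: R[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    // Do a bounded amount of synchronous work...
    for (const item of items.slice(i, i + chunkSize)) out.push(fn(item));
    // ...then yield, so clicks and keystrokes are handled between chunks.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return out;
}
```

The same consideration (not the framework) is what makes either a Qt or an Electron app feel responsive.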
The biggest problem with Electron apps is their size. Tauri, which relies on the system-provided web view component, is the reasonable way.
One of Electron's main selling points is that you control the browser version. Anything that relies on the system web view (like Tauri and Wails) will either force you to aggressively drop support for out-of-date OS versions, or constantly check caniuse.com and ship polyfills like you're writing a normal web app. It also forces you to test CSS that touches form controls or window chrome on every supported major version of every browser, which is just a huge pain. And you'll inevitably run into bugs with the native -> web glue that you wouldn't hit with Electron.
It is absolutely wasteful to ship a copy of Chrome with every desktop app, but Tauri/Wails don't seem like viable alternatives at the moment. As far as I can tell, there aren't really any popular commercial apps using them, so I imagine others have come to the same conclusion.
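The "constantly check caniuse.com" burden concretely means feature-gating at runtime, since a system webview gives you no pinned engine version. A rough sketch, with an illustrative regex and threshold rather than an authoritative compat table:

```typescript
// Sketch: sniff the WebKit engine major version from the UA string,
// the kind of check a Tauri/Wails app may need where Electron would not.
function webkitMajorFromUA(ua: string): number | null {
  const m = ua.match(/AppleWebKit\/(\d+)/);
  return m ? parseInt(m[1], 10) : null;
}

function needsPolyfill(ua: string, minWebKitMajor: number): boolean {
  const major = webkitMajorFromUA(ua);
  // Unknown engine: assume the worst and ship the polyfill.
  return major === null || major < minWebKitMajor;
}
```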
So if you want a multiplatform desktop app also supporting Linux, React Native isn't going to cut it.
It's not about money. It's not a tradeoff in cost vs quality - it's a tradeoff in development speed. Shipping N separate native versions requires more development time for any given change: you must implement everything (at least every UI) N times, which drastically increases the design & planning & coordination required vs just building and shipping one implementation.
Do you want to move slower to get "native feel", or do you want to ship fast and get N times as much feature dev done? In a competitive race while the new features are flowing, development speed always wins.
Once feature development settles down, polish starts to matter more and the slowdown becomes less important, and then you can refocus.
I insist on good UI as well, and, as a web developer, have spent many hours hand rolling web components that use <canvas>. The most complicated one is a spreadsheet/data grid component that can handle millions of rows, basically a reproduction of Google Sheets tailored to my app's needs. I insist on not bloating the front-end package with a whole graph of dependencies. I enjoy my NIH syndrome. So I know quality when I see it (File Pilot). But I also know how tedious reinventing the wheel is, and there are certain corners that I regularly cut. For example there's no way a blind user could use my spreadsheet-based web app (https://github.com/glideapps/glide-data-grid is better than me in this aspect, but there's no way I'm bringing in a million dependencies just to use someone else's attempt to reinvent the wheel and get stuck with all of their compromises).
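The trick that lets a canvas grid like the one described above handle millions of rows is row virtualization: only the rows intersecting the viewport get drawn, so cost is O(visible), not O(total). A minimal sketch of the core calculation (names are illustrative, assuming fixed-height rows):

```typescript
// Sketch: which rows are visible for a given scroll position?
function visibleRowRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
): { first: number; last: number } {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight));
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1,
  );
  return { first, last };
}
```

The render loop then draws only rows `first..last` to the canvas each frame; variable-height rows need a prefix-sum of heights instead of a division, but the idea is the same.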
The answer to your original question about why these billion dollar companies don't create artisanal software is pretty straightforward and bleak, I imagine. But there are a few actually good reasons not to take the artisanal path.
You will be outcompeted if you waste your time reinventing the wheel and optimizing for stuff that doesn't matter. There is some market for highly optimized apps like e.g. Sublime Text, but you can clearly see that the companies behind them are struggling.
- Good cross platform support (missing in filepilot)
- Want applications to feel native everywhere. For example, all the obscure keyboard shortcuts for moving around a text input box on mac and windows should work. iOS and Android should use their native keyboards. IME needs to work. Etc
- Accessibility support for people who are blind and low vision. (Screen readers, font scaling, etc)
- Ergonomic language bindings
Hitting these features is more or less a requirement if you want to unseat electron.
I think this would be a wonderful project for a person or a small, dedicated team to take on. It's easier than it ever used to be, thanks to improvements in font rendering, cross-platform graphics libraries (like WebGPU, Vulkan, etc.), and layout engines (Clay), plus how much users have dropped their standards for UI consistency ever since Electron got popular and Microsoft gave up on having a consistent UI toolkit in Windows.
There are a few examples of teams doing this in-house (e.g. Zed). But we need a good open-source project.
That's only for Windows though, it seems? Maybe the whole "just write all the rendering yourself using metal/gl/dx" is slightly harder than you think.
edit: [link](https://github.com/simonw/llm)
I suspect the final(*) UI will be much more like a TUI: conversational (human <> AI). The current GUIs provided by your bank etc. are much less effective and useful than a conversation: "show me / do exactly what I need". Not to mention the walled-garden effect (or rather the lack of it), and attention-grabbing that isn't in the user's interest (popups, self-promo, nagging). There's also the age factor, and not having to learn yet another GUI (try teaching a new bank app to your mom ;). So at least four distinct and important advantages for the TUI.
My bet: TUI/conversation win (*).
*) There will be some UIs where graphical information density matters (air traffic control?), especially in time-critical environments. Yet even there, I suspect it's more like a conversation with a dynamic image/report/graph generated on the fly, not the UI per se.
All I see is hype blog posts and pre-IPO marketing by AI companies, not much being shipped though.
Chinese open weights models make this completely infeasible.
Part of this (especially the CPU) is teams under-optimizing their Electron apps. See the multi-X speedup examples when they look into it and move hot code to C et al.
Replacing workers with things you can’t beat, sue, intimidate, or cajole? Someone is gonna do something to make that not cool in MBA land. I think if one of my employees LL-MessedUp something, and I were upset, watching that same person stealing my money haplessly turn to an LLM for help might land me in jail.
I kinda love LLMs, I’ve always struggled to write emails without calling people names. There’s some clear coding tooling utility. But some of this current hype wave is insano-balls from a business perspective. Pets.com X here’s-my-ssh-keys. Just wild.
- Native apps integrate well with the native OS look and feel and native OS features. I'd say it's nice to have, but not a must have, especially considering that the same app can run on multiple platforms.
- Native apps use much less RAM than Electron apps. I believe this one is a real issue for many users. Running Slack, Figma, Linear, Spotify, Discord, Obsidian, and others at the same time consumes a lot of memory for no good reason.
Which makes me wonder: is there anything that could be removed from Electron to make it lighter, similar to what Qt does?
This is a new era where “if it works more or less well, ux/dx is fine, let’s ship it” has more moat than ever. Everything else is really secondary.
Just look at this TreeView in WinUI2 (w/ fluent design) vs a TreeView in the good old event viewer. It just wastes SO MUCH space!
https://f003.backblazeb2.com/file/sharexxx/ShareX/2026/02/mm...
And imo it's just so much easier to write a webapp, than fiddle with WinUI. Of course you can still build on MFC or Win32, but meh.
But sure, you could have some specific need, but I find it hard to believe for these simple apps.
Value is value, and levers are levers, regardless of the resources you have or the difficulty of the problem you're solving.
If they can save effort with Electron and put that effort into things their research says users care about more, everyone wins.
I have a MacBook with 16GB of RAM and I routinely run out of memory from just having Slack, Discord, Cursor, Figma, Spotify and a couple of Firefox tabs open. I went back to listening to mp3s with a native app to have enough memory to run Docker containers for my dev server.
Come on, I could listen to music, program, chat on IRC or Skype, do graphic design, etc. with 512 MB of DDR2 back in 2006, and now you couldn't run a single one of those Electron apps with that amount of memory. How can a billion-dollar corporation doing music streaming not have the resources to make a native app, when the Songbird team could do it for free back in 2006?
I’ve shipped cross platform native UIs by myself. It’s not that hard, and with skyrocketing RAM prices, users might be coming back to 8GB laptops. There’s no justification for a big corporation not to have a native app other than developer negligence.
I'm not saying this is a huge problem for me even if it bothers me personally. But if you're here on HN advocating native over Electron, then it seems logical to me that you would care about being truly native instead of merely "using native controls while feeling off".
This is even before getting to the point that Qt isn't truly native. They just draw controls in a style that looks native, they don't actually use native controls. wxWidgets uses native controls but they don't behave better despite that.
[1] https://www.qt.io/blog/speed-up-qt-development-with-qml-hot-...
Nothing is worse than reading something like this. A good software developer cares. It’s wrong to assume customers don't care simply because they don't know what's going on under the hood. Considering the downsides and the resulting side effects (latency, more CPU and RAM consumption, fans spinning etc.), they definitely do care. For example, Microsoft has been using React components in their UI, thinking customers wouldn’t care, but as we have been seeing lately, they do care.
They are not.
A good iOS app is not 1:1 equivalent to what a good Android app would be for the same goal. Treating them as such just gives users a lowest common denominator product.
> reinventing the wheel
what exactly are you inventing by using a framework "invented" decades ago and used by countless apps in all those years?
When was the last time complaining about this did anything?
[1] https://survey.stackoverflow.co/2025/technology/#1-computer-...
Doesn't this get thrown out the window now that everyone claims you can be 10x, 50x, 100x more productive with AI? Hell people were claiming you can ask a bunch of AI agents to build a browser from scratch, so surely the dev speed argument no longer applies.
React Native Skia allegedly runs on Linux too
I see complaints about RAM and sluggishness against Slack and countless other Electron apps every fucking day, same as with Adobe forcing web-rendered UI parts into Photoshop, and other such cases. Forums are full of them; colleagues always complain about it.
I've got a medical doctor handwriting decipherer, a board game simulator that takes a PDF of the rulebooks as input and an accounting/budgeting software that can interface with my bank via email because my bank doesn't have an API.
None of that is of any use to you. If you happen to need a similar software, it will be easier for you to ask your own AI to make a custom one for you rather than adapt the ones I had my AI make for me.
Under the circumstances, I would feel bad shipping anything. My users would be legitimately better off just vibe coding their own versions.
Would genuinely love your thoughts if you try it. Early users have been surprised by how native it feels!
Some things aren't common sense yet so I'm trying my part to make them so.
But using the same model through pi, for example, it's super smart because pi just doesn't have ANY safeguards :D
If you add a dialectic between Opus 4.5 and GPT 5.2 (not the Codex variant), your workflow - which I use as well, albeit slightly differently [1] - may work even better.
This dialectic also has the happy side-effect of being fairly token efficient.
IME, Claude Code employs much better CLI tooling+sandboxing when implementing while GPT 5.2 does excellent multifaceted critique even in complex situations.
[1]
- spec requirement / iterate spec until dialectic is exhausted, then markdown
- plan / iterate plan until dialectic is exhausted, then markdown
- implement / curl-test + manual test / code review until dialectic is exhausted
- update previous repo context checkpoint (plus README.md and AGENTS.md) in markdown
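The "iterate until the dialectic is exhausted" loop in the steps above can be sketched as a plain control structure, with the two models stubbed as functions (hypothetical signatures, not any vendor's API):

```typescript
// Sketch: one model critiques, the other revises, until no critiques remain
// or a round budget runs out. Real implementations would call LLM APIs here.
type Critique = string[];

function iterateUntilExhausted(
  draft: string,
  critic: (d: string) => Critique,   // e.g. GPT-class reviewer
  revise: (d: string, c: Critique) => string, // e.g. Claude-class implementer
  maxRounds = 5,
): string {
  for (let round = 0; round < maxRounds; round++) {
    const critiques = critic(draft);
    if (critiques.length === 0) break; // dialectic exhausted
    draft = revise(draft, critiques);
  }
  return draft;
}
```

The round budget is what keeps the token cost bounded, which is likely why the author finds the pattern token-efficient.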
Anthropic and OpenAI could open source their models and it wouldn't make it any cheaper to run them. You still need $500k in GPUs and a boatload of electricity to serve something like 3 concurrent sessions at a decent tok/s.
There are no open source models, Chinese or otherwise, that can be run profitably while giving you productivity gains comparable to a foundation model. No matter what, running LLMs is expensive; the capex required per tok/s is only increasing, and the models are only getting more compute-intensive.
The hardware market would literally have to crash for this to make sense from a profitability standpoint, and I don't see that happening; therefore prices have to go up. You can't just lose billions year after year forever. None of this makes sense to me. This is simple math, but everyone is literally delusional atm.
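The "simple math" both sides are arguing over is roughly a memory-bandwidth calculation: decode speed for a bandwidth-bound dense LLM is about bandwidth divided by bytes touched per token. A back-of-envelope sketch only, with assumed numbers (not vendor specs), ignoring batching, KV cache, and MoE sparsity, all of which change the result considerably:

```typescript
// Rough decode-speed estimate for a single dense-model stream:
// bytes per token ≈ parameter count × bytes per weight.
function estimateTokensPerSecond(
  paramsBillions: number,
  bytesPerWeight: number, // 2 for fp16, ~0.5 for 4-bit quantization
  memoryBandwidthGBps: number,
): number {
  const bytesPerToken = paramsBillions * 1e9 * bytesPerWeight;
  return (memoryBandwidthGBps * 1e9) / bytesPerToken;
}

// E.g. a 70B model at 4-bit on ~800 GB/s of bandwidth:
// 800e9 / (70e9 * 0.5) ≈ 23 tokens/s on a single box.
```

Whether that supports the "$500k in GPUs" claim or refutes it depends entirely on which model size and hardware you plug in, which is exactly the disagreement in the replies below.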
Argument feels more like FUD than something rooted in factual reality.
For teams comfortable with C++ or with existing C++ libraries to integrate, it can of course still be a strong choice, just not the preferred one for most current teams.
Which parts in particular do you think electron misses from this list?
What a weird comparison: one is free, the other is a premium app. Of course a lot of people prefer some suffering over paying money.
i agree that CC seems like a better harness, but I think GPT is a better model. So I will keep it all inside the Codex VSCode plugin workflow.
The LLM gets the data from Sentry using Sentry MCP.
Would I love to see SwiftUI on macOS, WPF/WinUI on Windows, whatever Qt hell it is on Linux? Sure. But it is what it is.
I am glad the codex-cli is Rust and native, because Claude Code and opencode are not: React, SolidJS, and what have you for the terminal UI layer.
—
Then again, if codex builds codex, let it cook and port if AI is great. Otherwise, it’s claim chowder
Microsoft makes a new UI framework every couple of years, Apple ships Liquid Glass, and GNOME has a new GTK version every so often.
https://openrouter.ai/moonshotai/kimi-k2.5
It's a fantasy to believe that every single one of these 8 providers is serving at incredibly subsidized dumping prices 50% below cost and once that runs out suddenly you'll pay double for 1M of tokens for this model. It's incredibly competitive with Sonnet 4.5 for coding at 20% of the token price.
I encourage you to become more familiar with the market and stop overextrapolating purely based on rumored OpenAI numbers.
Ask Linus Torvalds.
It seems odd to me that the software world has gone in the direction of "quick to write - slow to run". It should be the other way around. Things of quality (eg. paintings by Renaissance masters) took time to create, despite being quick to observe.
It also seems proven that releasing software quickly ("fast iteration") doesn't lead to quality - see how many releases of the YouTube app or Netflix there are on iOS or Android; if speedy releases are important, it is valuing rush to production over quality, much like a processed food version of edible content.
In a world that is also facing energy issues, sluggish and inefficient performance should be shunned, not welcomed?
I suppose this mentality is endemic, and why we see a raft of cruddy slow software these days, where upcoming developers ("current teams") no longer value performance over ease of their job. It can only get worse if the "it's good enough" mentality persists. It's quite sad.
The productivity comparison must be made between how long it takes to ship a certain amount of stuff.
Do you think I, as a software engineer, like using Jira? Outlook? etc.? Heck, even the trendy stuff is broken. Anthropic took 6 months to fix flickering in Claude Code. -_-
There's plenty of competition for VSCode too.
Don't forget that these Electron apps outcompeted native apps. Figma and VSCode were underdogs to native apps at one point. This is why your supply side argument doesn't make any sense.
Edit: I'm not going to keep addressing your comment if you keep editing it. You asked for an example & I found two very easily. I am certain there are many others so at this point the onus is on you to figure out what exactly it is you are actually arguing.
I don't quite understand the obsession with shipping fancy enterprise B2B SaaS solutions. That was the correct paradigm back when developing custom code was expensive. Now it is cheap.
Why pay for Salesforce when you only use 1% of Salesforce's features? Just vibe code the 1% of features that you actually need, plus some custom parts to handle some cursed bespoke business logic that would be a pain in the ass to do in Salesforce anyway.
Or "create an extension to..." and it'll write the whole-ass extension and install it :D
https://openai.com/index/unrolling-the-codex-agent-loop/ https://platform.openai.com/docs/guides/conversation-state#c...
Context management is the new frontier for these labs.
Shock horror, the waste adds up, and it adds up extremely quickly.
But there isn't, not if you include all the extensions and remember the price
The second example is twitter post of a crypto bro asking people to build something using his crypto API. Nothing shipped.
Literally nothing shipped, just twitter posts of people selling a coding bootcamp and crypto.
That's an incredibly bold claim that would need quite a bit of evidence, and just waving "$500k in gpus" isn't it. Especially when individuals are reporting more than enough tps at native int4 with <$80k setups, without any of the scaling benefits that commercial inference providers have.
I know you need to cope because your competency is 1:1 correlated with the quality and quantity of tokens you can afford, so have fun with your think-for-me SaaS while you can afford it. You have no clue how much engineering goes into providing inference at scale. I wasn't even including the cost of labor.
> You still need $500k in GPUs and a boatload of electricity to serve like 3 concurrent sessions at a decent tok/ps.
as being patent bullshit, after which the burden is squarely on you to back up the remainder of your claims.
If it was a hindrance, why did it win?
Seems clear to me that Electron's higher RAM usage did not affect adoption. Instead, Electron's ability to write once and ship in any platform is what allowed VSCode to win.
No, differently
> If it was a hindrance, why did it win?
Because reality is not as primitive as you portray it to be: you can have hindrances and boosts where the overall effect is still positive, even a winning one. That shouldn't be that hard!
> Seems clear to me that Electron's higher RAM usage did not affect adoption.
Again, it only seems clear because you ignore all the dirt, including basic things (like here: it's not just RAM, it's also disk use and startup speed; and as before, the competition) and strangely don't consider many factors.
> Instead, Electron's ability to write once and ship in any platform is what allowed VSCode to win.
So nothing to do with it using the most popular web stack, meaning the largest pool of potential contributors to the editor or extensions??? What about other cross platform frameworks that also allowed that??? (and of course it's not any platform, just 3 desktop ones where VSc runs)
I'm not even sure what you're arguing at this point. Are you arguing that Electron helped VSCode win, or what? Because Electron being able to use a popular web stack is also a benefit.
What is your point?