Codex for almost everything

Just reading the comments here, it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already do all of this. Codex isn't pioneering these features, it's mostly just catching up.

There seems to be a fair amount of enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non-developers, I'm just not sure 'code' is the right term for it.

Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming it's the garbage renderer.

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, people's agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting them at the top of search results and taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand its own new features well enough to invoke them (without quite a bit of steering).

4. I cannot get it to show me the in app browser

5. Generating image mockups of websites and then building them is nice

Confusingly, Codex, their agentic programming thing, and Codex, their GUI which only works on Mac and Windows, have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

They felt the pressure of posting something after Claude 4.7

Just got Computer Use working and honestly it feels really, really good. This is going to enable so many high-quality cross-application workflows in non-browser applications.

Maybe they could use Codex to build a Linux app...

Codex is my favorite UX for anything as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However lately the limits seem to have got extremely tight, I keep spending out the daily limits way too quickly. The weekly limits are also often spent out early so I switch to Claude or Gemini or something.

Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).

I wonder if there’s something off the shelf that does this?

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

Is there anyone that feels that LLMs are wrong for computer use? It feels robotic; I find LLMs alone are really slow for this task.

SSH to devboxes is the exact use case for services like https://shellbox.dev: create a box using ssh... and ssh into it. No web, no subs. Codex can create its own boxes via ssh.

Just commenting here to impact the controversy score.

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

if it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

The first example is tic tac toe. Why would anyone bother? None of those easy things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, being creative to that degree. They want to hit buttons and see the computer do things and get a dopamine rush.

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is a thing that a software engineer does on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

OpenClaw acquisition at work.

First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

cursor has been doing this for months, welcome to 3 months ago

"Our mission is to ensure that AGI benefits all of humanity. "

They have AGI now?

I don't think this one did it. Time for the real release.

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

Using Claude and Codex side by side now. Would love to just use one eventually.

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

Tool for everything does nothing really good.

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

Is it OpenAI Cowork?

I'm sure it's been said before, but more and more our development work is encroaching on personal compute space. Even for personal projects. A reminder to me to air gap those two spaces with separate hardware [:cringe:]

Can't help but think the surface area for security issues is becoming massive with these tools

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

I wish Codex App was open source. I like it, but there are always a bunch of little paper cuts that, if you were using codex cli, you could have easily diagnosed and filed an issue for. Now, the issues in the codex repo are slowly becoming claude codish – i.e. a drawer for people's feelings with nothing concrete to point to.

Man this progress is fast.

It's clear that it will go in this type of direction, but Anthropic announced managed agents just a week ago, and this, again with all the built-in connections and tools, will help so many non-computer people do a lot more, faster and better.

I'm waiting for the open source ai ecosystem to catch up :/

Codex is HN's darling now because Anthropic lowered rate limits for individuals due to compute constraints. OAI has so few enterprise users they can afford to subsidize compute for this group a lot more than Anthropic.

Eventually once they have more users they'll do the same thing as Anthropic, of course.

It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

I don't think Claude has this part yet:

> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an electron app

IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

I mean table stakes stuff, why isn't an agent going through all my slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had what to do items assigned. Basic stuff that is already possible but that no one is doing.

I swear none of the AI companies have any sense of human centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

Yeah, it’s probably very similar to my experience: I tried Codex because I had a ChatGPT subscription, found it to be quite powerful, and then, because I was used to it, ended up getting the Pro subscription. So I am guessing folks like me have never really used Claude.

> Eventually once they have more users they'll do the same thing as Anthropic, of course.

> It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

There's a systematic marketing campaign from oai on reddit and HN - there's a huge uptick of "codex is better than claude code" comments and posts this last week which is perfectly timed with the claude code increased limits

This is true. But Anthropic did us dirty most recently and so it’s their turn on the pitch fork. Sam will do us too. Just not yet.

So Anthropic degraded their product. OAI updated their product to meet or exceed Anthropic's old product.

This is normal behavior and not a cause for such a hyperbolic response.

everyone seems to unconditionally love anthropic, but openai has always had the best models… it just requires a bit more effort on behalf of the user to actually leverage it.

There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

Not only that, but anthropic is now forcing users to give their biometric information to palantir

They're doing a slow rollout

What do you expect from an app that’s built by not looking at the code?

I'm on M4 Max so your mileage may vary, but what helps me is not running any backdoors willingly.

I agree. As a long time linux user, coding assistants as interface to the OS has been a delight to discover. The cryptic totality of commands, parameters, config files, logs has been simplified into natural language: "Claude, I want to test monokai color scheme on my sway environment" and possibly hours of tweaking done in seconds. My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

After 25 years of writing code in vim, I've found myself managing a bunch of terminal sessions and trying to spot issues in pull requests.

I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.

Maybe not a popular opinion but I really do believe...

- code quality as we previously understood will not be a thing in 3-5 years

- IDEs will face a very sharp decline in use

> lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

This is the real "computer use". We will always need GUI-level interaction for proprietary apps and websites that aren't made available in machine-readable form, but everything else you do with a computer should just be mapped to simple CLI commands that are comparatively trivial for a text-based AI.

It’s marginally better than Microsoft naming things.

Programmers mostly don't. Ordinary people see figuring out how to use the computer as a hindrance rather than empowering, they want Star Trek. They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas regardless of their level of refinement

I want it, yes. I already feel like I'm the one doing the dumb work for the AI of manually clicking windows and typing in a command here or there it can't do.

I've also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

giving these things control over your actual computer is a nightmare waiting to happen – i think it's irresponsible to encourage it. there ought to be a good real sandbox sitting between this thing and your data.

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a docker.

It repaired an astonishingly messed-up permission issue on my Mac.

I don’t think people want that, but they are willing to accept that in order to get stuff done.

can't test pygame otherwise :D

> There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up.

I've finally started getting into AI with a coding harness but I've taken the opposite approach. usually I have the structure of my code in my mind already and talk to the prompt like I'm pairing with it. while it's generating the code, I'm telling it the structure of the code and individual functions. it's sped me up quite a lot while I still operate at the level of the code itself. the final output ends up looking like code I'd write minus syntax errors.

The fact that the Codex app is still unavailable on Linux makes me think the target audience isn't people who understand code.

The "people" in "power to the people" are not us, the developers and coders.

We know how to do a lot of things, how to automate etc.

A billion people do not know this and probably benefit initially a lot more.

When I did some PowerPoint presentation, I browsed around and dragged images from the browser to the desktop, then I dragged them into PowerPoint. My colleague looked at me and was bewildered by how fast I did all of that.

Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

It reminds me of what happened with FrontPage: ultimately people are going to learn the same lesson, there's no replacement for the source code.

Yes, the code is still important. For example, I had tasked Codex to implement function calling in a programming language, and it decided the way to do this was to spin up a brand new sub interpreter on each function call, load a standard library into it, execute the code, destroy the interpreter, and then continue -- even though a partial and much more efficient solution was already there, but in comments. The AI solution "worked", passed all the tests the AI wrote for it, but it was still very very wrong. I had to look at the code to understand it did this. To get it right, I guess you have to indicate how to implement it, which requires a degree of expertise beyond prompting.

Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

Like we did with phones that nobody phones with.

(I work at OpenAI) Heya, in reality it's much more organic than that. We build stuff, ship it internally, then work crazy hard to quickly ship it externally. When we put something out on a given day, it's usually been in the works and scheduled for a while.

One concrete example: setting up a launch like today's, where press, influencers, etc. all came out at 10a PT. That's all coordinated well in advance!

As much as I like them, I don't think you need much of a tinfoil hat for that at this point. Just look at the timing of recent releases, it's no coincidence.

Their company literally runs on hype. This is all part of the strat.

Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

They did acquire TBPN, this barely needs tin foil.

Credit to them for being media savvy.

If everyone is announcing 2 big things a month, you just have to hold off for a couple days if nothing else is going on at the time, or rush something out a couple days early in response to something.

I think it's a given. OpenAI's product is their hype.

Does that even matter nowadays?

These announcements happen so often

It's not magic. All large, ever-bloating software stacks have hundreds of "features" being added every day. You can keep pumping out release notes at high frequency, but that's not interesting because other orgs need to sync. And sync takes its own sweet time.

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

This is a pretty important issue given that the new update adds "computer use" capabilities. If it was already reading sensitive files in the CLI version, giving it full desktop control seems like it needs a much more robust permission model than what they've shown so far.

the awkward part isn't just about reading sensitive files.

search, listings, direct reads, browser and computer use all sit behind different boundaries.

hard to tell what any given approval actually buys or exposes.

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

tldr Claude pwned the user, then berated the user's poor security. (Bonus: the automod, who is also Claude, rubbed salt in the wound!)

I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

ran into this literally yesterday. so im gonna assume yes.

They felt the pressure of posting something after Claude 4.7

It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

Fuck, i've been using it wrong.

Maybe they could use Codex to build a Linux app...

Linux users are probably too smart to actually use these kinds of tools right now.

OpenClaw acquisition at work.

Any particular evidence for this other than the conjecture that it might be related?

To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

Bunch of startups need to pivot today after this announcement including mine

how? was this not a thing with claude cowork?

I wonder if there’s something off the shelf that does this?

Claude Desktop / CoWork already does this.

North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is a thing that a software engineer does on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

Slides, publications and tech reports, very handy for figures !

I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.

For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

This is me!

I’m semi-normie (MechEng with a bit of Matlab now working as a ceo).

I spend most of my day in Claude code but outputs are word docs, presentations, excel sheets, research etc.

I recently got it to plan a social media campaign and produce a ppt with key messaging and content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign and then used a social media aggregator api to download images and schedule in posts.

In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

I’ve vibe coded an interface to run multiple agents at once that have full access via apis and MCPs.

With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes and then sends me a message with a summary of what it has done.

Most knowledge work output is delivered as code (e.g. xml in word docs) so it shouldn’t be that surprising that it can do all this!

I am starting to use Codex heavily on non-coding tasks. But I am realizing it works because I work and think like a programmer - everything is a file, every file and directory should have very precise responsibilities, versioning is controlled, etc. I don't know how quick all of this will take to spread to the general population.

Most knowledge workers aren't willing to put in the effort to get their work done efficiently.

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.
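As a rough illustration of the two-tool setup described above, here is a minimal sketch in Python. It assumes SQLite via the standard-library `sqlite3` module; the function names `list_tables` and `run_query`, the `demo_connection` helper, and the `orders` table are all hypothetical and not taken from any particular agent framework. The read-only check is a deliberately naive prefix test, so a real deployment would more likely rely on a read-only database role.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Tool 1 (hypothetical name): list the tables the agent can query."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def run_query(conn: sqlite3.Connection, sql: str, max_rows: int = 50) -> list[dict]:
    """Tool 2 (hypothetical name): run a query and return rows as dicts.

    Only statements starting with SELECT are accepted. This is a naive
    guard for the sketch, not a substitute for database permissions.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    # Cap the result size so a broad query cannot flood the model's context.
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("acme", 120.0), ("globex", 75.5)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(list_tables(conn))
    print(run_query(conn, "SELECT customer, total FROM orders ORDER BY id"))
```

An agent harness would expose these two functions as tools and let the model decide which tables to inspect and which queries to run; the row cap and the read-only guard are the kind of limits that keep that loop from getting out of hand.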

I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

Maybe, but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to act across a heterogeneous set of enterprise systems/tools. A runtime environment must be developed to do that, but where that of the agent ends and that of the enterprise systems begins is a totally open question.

I think the coding market will be much larger. Knowledge work is kind of like the leaf nodes of the economy where software is the branches. That's to say, making software easier and cheaper to write will cause more and more complexity and work to move into the software domain from the "real world", which is much messier and more complicated.

Totally agree, AI interfaces will become the norm.

Even all the websites, desktop/mobile apps will become obsolete.

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I disagree. There is a major gap between awesome tech and market uptake.

At this point, the question is whether LLMs are going to be more useful than Excel. AI enthusiasts are 100% sure that they're already more useful than Excel, but on the ground, non-technical users do not share that view.

All the interviews and real-life interactions I have seen indicate that a narrow band of non-technical experts gain durable benefits from AI.

GenAI is incredible for project starts. A relative with zero coding experience went from mockup to MVP webapp in 3 days, for something he just had an idea about.

GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that using it at scale would guarantee litigation.

Mileage depends entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

Experts constantly decide trade-offs which novices don’t even realize matter. Something as innocuous as the placement of switches when you enter the room can be made inconvenient.

You know what happens to a predator who makes its prey go extinct?

AI is doing the same.

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

A tool for everything does nothing really well.

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure, we can read the characters on the screen, but accessibility information is usually structured. TUI apps are going to be far less interesting & capable without accessibility built in.

Can't help but think the surface area for security issues is becoming massive with these tools

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

Is it OpenAI Cowork?

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

This is true. But Anthropic did us dirty most recently and so it’s their turn on the pitchfork. Sam will do us too. Just not yet.

There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

everyone seems to unconditionally love anthropic, but openai has always had the best models… it just requires a bit more effort on behalf of the user to actually leverage it.

Not only that, but anthropic is now forcing users to give their biometric information to palantir

They're doing a slow rollout

What do you expect from an app that’s built by not looking at the code?

I'm on M4 Max so your mileage may vary, but what helps me is not running any backdoors willingly.

I want it, yes. I already feel like I'm the one doing the dumb work for the AI of manually clicking windows and typing in a command here or there that it can't do.

I've also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a Docker container.

I don’t think people want that, but they are willing to accept that in order to get stuff done.

can't test pygame otherwise :D

Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

One concrete example: to set up a launch like today, where press, influencers, etc, all came out at 10a PT. That's all coordinated well in advance!

Man this progress is fast.

I'm waiting for the open source ai ecosystem to catch up :/

>background computer use

How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.

THANK YOU. I keep thinking this as well. I'm rolling my own skills to actually make my job easier, which is all about gathering, surfacing, and synthesizing information so I can make quick informed decisions. I feel like nobody is thinking this way and it's bizarre.

Disclaimer: I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls. Then a separate one runs at the end of every week to give me feedback.

The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

Reasoning deltas add additional traffic, especially if running many subagents etc. So on large scale, those deltas maybe are just dropped somewhere.

Saying that, sometimes the GPT reasoning summary is funny to read, in particular when it's working through a large task.

Also, the summaries can reveal real issues with logic in prompts and tool descriptions+configuration, so they allow debugging.

e.g. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Let's just assume..."

It has previously allowed me to adjust prompts, etc.

It's useful when using prism, and for exploratory research & code.

I do want to see it, as it allows me to course correct.

Using Claude and Codex side by side now. Would love to just use one eventually.

Competition forever, ideally

What's the benefit of using both?

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump to warehouse, resuming) claude is performing night and day better than codex. we're still continuing tinkering with codex here to see if we're happy with it but it's taking a lot more human-in-the-loop to keep it from going down the wrong path and we're finding that we're constantly prompt-nudging it to the end result. for the most part after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it's succeeding at the stuff you want it to do.

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.
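A minimal sketch of the AGENTS.md part, assuming the intent is that AGENTS.md is the real file and CLAUDE.md is just a symlink pointing at it (the comment can be read either way). The helper name `link_agent_instructions` and the placeholder file contents are made up for illustration; on Windows, creating symlinks may need extra privileges.

```python
import os
import tempfile
from pathlib import Path


def link_agent_instructions(repo: Path) -> Path:
    """Keep one instructions file (AGENTS.md) and expose it as CLAUDE.md via a symlink."""
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.exists():
        # Placeholder content; a real repo would already have its own instructions.
        agents.write_text("# Agent instructions\n")
    if not claude.exists() and not claude.is_symlink():
        # Relative target, so the link survives moving or cloning the repo.
        claude.symlink_to("AGENTS.md")
    return claude


# Demo in a throwaway directory standing in for a repository checkout.
with tempfile.TemporaryDirectory() as tmp:
    link = link_agent_instructions(Path(tmp))
    print(os.readlink(link))
    print(link.read_text())
```

Git stores the symlink itself, so both tools read the same text and there is only one file to edit.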

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.

Honestly, just try it. I used both and there's no reason to not try depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time) even though Claude Code had a slicker UI for awhile.

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

This is why both companies are in an SF bubble.

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an electron app

Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.

Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up, that are significantly more discounted, and that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

Interesting to see your observation where I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

Threads here and on AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the degree that I’ve suspected astroturfing.

So it’s refreshing to see maybe that’s just a coincidence or confirmation bias on my end.

> Competition is bad? Who cares - let the big players subsidize and compete between each other.

Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone was competing on a level playing field they would all price their products above cost.

All these tech oligarch asshat companies need to be regulated to hell and back.

I agree but I’d like to add that people are definitely falling for PR, people are always falling for PR or no one would bother with PR

Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

Big players operating at a loss to distort the market is not a good thing overall.

Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.

Thing is, Codex 5.3 is a better and more consistent model than anything Anthropic have come out with. It can deal with larger codebases, has compaction that works, and has much less of a tendency to resort to sycophantic hallucination as it runs out of ideas. I also appreciate their approach to third party harnesses like opencode, which is obviously the complete opposite to Anthropic and their scramble to keep their crumbling garden walls upright.

Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

So Anthropic degraded their product. OAI updated their product to meet or exceed Anthropic's old product.

This is normal behavior and not a cause for such a hyperbolic response.

These are the benefits of competition in action.
