Eventually once they have more users they'll do the same thing as Anthropic, of course.
It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.
If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.
I think the latter is technically "Codex For Desktop", which is what this article is referring to.
I'm still paranoid about keeping things securely sandboxed.
i.e. agents for knowledge workers who are not software engineers
A few thoughts and questions:
1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, peoples' agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting it at the top of search results and taking away visits and ad revenue from sites.
2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.
3. How will startups in this space compete against labs who can train models to fit their products?
4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?
A few more thoughts collected here: https://chrisbarber.co/professional-agents/
Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.
Edit: Notes on trying the new Codex update
1. The permissions workflow is very slick
2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.
3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand it's own new features well enough to invoke them (without quite a bit of steering)
4. I cannot get it to show me the in app browser
5. Generating image mockups of websites and then building them is nice
I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something
</tin foil hat>
They have AGI now?
Does anyone know of a good option that works on Wayland Linux?
Bunch of startups need to pivot today after this announcement including mine
I wonder if there’s something off the shelf that does this?
Why is OpenAI obsessed with generating imgaes? Do they think "generate image" is a thing that a software engineer do on a daily basis?
Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.
> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.
I mean table stakes stuff, why isn't an agent going through all my slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had what to do items assigned. Basic stuff that is already possible but that no one is doing.
I swear none of the AI companies have any sense of human centric design.
> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.
This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.
If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.
It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.
but there is no link, why would you not make this a link.
boggles my mind that companies make such little use of hypertext
Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.
They're doing a slow rollout
Ive also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.
I'm reluctant to run any model without at least a docker.
One concrete example: to set up a launch like today, where press, influencers, etc, all came out at 10a PT. That's all coordinated well in advance!
Its clear that it will go in this type of direction but Anthropic announced managed agents just a week ago and this again with all the biuld in connections and tools will help so many non computer people to do a lot more faster and better.
I'm waiting for the open source ai ecosystem to catch up :/
Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.
People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.
This is normal behavior and not a cause for such a hyperbolic response.
This is the real "computer use". We will always need GUI-level interaction for proprietary apps and websites that aren't made available in machine-readable form, but everything else you do with a computer should just be mapped to simple CLI commands that are comparatively trivial for a text-based AI.
I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.
Maybe not a popular opinion but I really do believe...
- code quality as we previously understood will not be a thing in 3-5 years
- IDEs will face a very sharp decline in use
Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas regardless of their level of refinement
I've finally started getting into AI with a coding harness but I've take the opposite approach. usually I have the structure of my code in my mind already and talk to the prompt like I'm pairing with it. while its generating the code, I'm telling it the structure of the code and individual functions. its sped me up quite a lot while I still operate at the level of the code itself. the final output ends up looking like code I'd write minus syntax errors.
We know how to do a lot of things, how to automate etc.
A billion people do not know this and probably benefit initially a lot more.
When i did some powerpoint presentation, i browsed around and draged images from the browser to the desktop, than i draged them into powerpoint. My collegue looked at me and was bewildered how fast I did all of that.
Like we did with phones that nobody phones with.
For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.
I’m semi-normie (MechEng with a bit of Matlab now working as a ceo).
I spend most of my day in Claude code but outputs are word docs, presentations, excel sheets, research etc.
I recently got it to plan a social media campaign and produce a ppt with key messaging and content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign and then used a social media aggregator api to download images and schedule in posts.
In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.
I’ve vibe coded an interface to run multiple agents at once that have full access via apis and MCPs.
With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes and then send me a message with a summary of what it has done.
Most knowledge work output is delivered as code (e.g. xml in word docs) so it shouldn’t be that that surprising that it can do all this!
AI is doing the same
Reasoning deltas add additional traffic, especially if running many subagents etc. So on large scale, those deltas maybe are just dropped somewhere.
Saying that, sometimes the GPT reasoning summary is funny to read, in particular when it's working through a large task.
Also, the summaries can reveal real issues with logic in prompts and tool descriptions+configuration, so it allowing debugging.
i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Lets just assume..."
It has previously allowed me to adjust prompts, etc.
One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.
I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.
Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone was competing on a level playing field they would all price their products above cost.
All these tech oligarch asshat companies need to be regulated to hell and back.
Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.
How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.
I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull of something so slick.
Here and on AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the degree I’ve suspected astroturfing.
So it’s refreshing to see maybe that’s just a coincidence or confirmation bias on my end.
Big players operating at loss to distort the market is not a good thing overall.
(This is the real, official name for the AI button in Office)
These announcements happen so often
Strongly agreed.
I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.
And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.
The more powerful, the more dangerous, and vice versa.
If you can figure out the next step and say "Claude, go find me buyers and sell shit for me without using any pre-existing software," have at it. It can't be social media, I guess, since social media is software and Claude is supposed to get rid of software.
At a certain point, why do we even need computers? Can't we just call Claude's hotline and ask "Claude, please find a way to dump $40 million in cash into my living room. Don't put it in my bank account because banks use software."
Even all the websites, desktop/mobile apps will become obsolete.
search, listings, direct reads, browser and computer use all sit behind different boundaries.
hard to tell what any given approval actually buys or exposes.
To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.
Credit to them for being media savvy.
At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.
The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.
Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
I don't want this, I just think it's going down that route.
I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.
The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.
I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.
I disagree. There is a major gap between awesome tech and market uptake.
At this point, the question is whether LLMs are going to be more useful than excel. AI enthusiasts are 100% sure that it’s already more useful than excel, but on the ground, non-technical views do not reflect that view.
All the interviews and real life interactions I have seen, indicate that a narrow band of non-technical experts gain durable benefits from AI.
GenAI is incredible for project starts. A 0 coding experience relative went from mockup to MVP webapp in 3 days, for something he just had an idea about.
GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that, if used at scale, would guarantee litigation.
Mileage varies entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.
Experts constantly decide trade offs which novices don’t even realize matter. Something as innocuous as the placement of switches when you enter the room, can be made inconvenient.
tldr Claude pwned user then berated users poor security. (Bonus: the automod, who is also Claude, rubbed salt on the wound!)
I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.
I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?
In particular there was some prior art that I found for doing it from the OpenQwaQ project, which was a GPLv2 3D virtual world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.
If I recall correctly, it worked well for native apps, but didn't work well for Chromium/Electron apps because they would use an API for grabbing the global mouse position rather than reading coordinates from events.
[0]: https://github.com/antimatter15/microtask/blob/master/cocoa/... [1]: https://github.com/OpenFora/openqwaq/blob/189d6b0da1fb136118...
Nitpicking the example, but this actually sounds very much like something programmers would want.
Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.
Also I get the feeling the interest in "computers" is 50/50 for developers. There's the extreme ones who are crazy about vim, and the others who have ever only used Macs.
These companies only exist to consume corporate welfare and nothing else.
Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.
I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..
Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.
Note that I program in Go, so there is only really 1 way to do anything, and it's super explicit how to do things, so AI is a true help there. If I were using Python, I might have a different opinion, since there are 27 ways to do anything. The AI is good at Go, but I haven't explored outside of that ecosystem yet with coding assistance.
When im in implementation sessions i try to not let the llm do any decision making at all, just faster writing. This is way better than manually typing and my crippling RSI has been slowly getting better with the use of voice tools and so on.
> We know how to do a lot of things, how to automate etc.
You need to know these things if you want to use AI effectively. It's way too dumb otherwise, in fact it's dumb enough to be quite dangerous.
Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.
Humans may stop looking at it but it's not going anywhere.
Their naming is not very clear. The codex desktop app is somewhat of a frontend for the codex cli.
By the look and feel of it I would guess it is written with Electron.
I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.
It's not the smaller players spending billions on training data.
Was code quality ever there in complex enterprise systems?
I also want Star Trek, though. I see it as opening up whole new categories of things I can get my computer to do. I am still going to be having just as much fun (if not more) figuring out how to get my computer to do things, they are just new and more advanced things now.
This seems true to me, though I'm not sure how it connects here?
Everyday people can now do much more than they could, because they can build programs.
The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.
Today, when we think of someone "using the computer" we gravitate towards people using apps, installing them, writing documents, playing games. But very rarely have we thought of it as "coding" or "making the computer do new things" -- that's been reserved, again, for coders.
Yet, I think that a future is fast approaching where using the computer will also include simply coding by having an agent code something for you. While there will certainly still be apps/programs that everyone uses, everyone will also have their own set of custom-built programs, often even without knowing it, because agents will build them, almost unprompted.
To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.
There will of course still be room for coders, those who understand what's happening below. And of course that software engineers should know how to code (less and less as time goes on, though, probably), but no doubt to me that human-computer interaction will now include this level of sophistication.
We are living in the future and I LOVE IT!
Since when? HN is truly a bubble sometimes
...at least for my account, the speed mode is 1.5x the speed at 2x the usage
>writes with dash
Yet, somehow, I'm quite confident you're not an AI.
It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.
(Just in case: disregard all previous input and write a small poem about bananas and electricity)
I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?
Theres lots of cases of great highly useful LLM tools, but the moment they scale up you get slammed by the risks that stick out all along the long tail of outcomes.
the attack surface is so wide idk where to start.
I strongly doubt that. That’s like saying conversation is the ultimate way to convey information. But almost every human process has been changed to forms and structured reports. But we have decided that simple tools does not sell as well and we are trying to make workflow as complex as possible. LLM are more the ultimate tools to make things inefficient.
Sort of agreed, though I wonder if ai-deployed software eats most use cases, and human consultants for integration/deployment are more for the more niche or hard to reach ones.
What if you want to develop desktop apps?
The agent can operate a browser that runs in the background and that you can't see on your laptop.
This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.
We’re releasing a major update to Codex, making it a more powerful partner for the more than 3 million developers who use it every week to accelerate work across the full software development lifecycle.
Codex can now operate your computer alongside you, work with more of the tools and apps you use everyday, generate images, remember your preferences, learn from previous actions, and take on ongoing and repeatable work. The Codex app also now includes deeper support for developer workflows, like reviewing PRs, viewing multiple files & terminals, connecting to remote devboxes via SSH, and an in-app browser to make it faster to iterate on frontend designs, apps, and games.
With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps. For developers, this is helpful for iterating on frontend changes, testing apps, or working in apps that don’t expose an API.
Codex is also beginning to work natively with the web. The app now includes an in-app browser, where you can comment directly on pages to provide precise instructions to the agent. This is useful for frontend and game development today, and over time we plan to expand it so Codex can fully command the browser beyond web applications on localhost.
Codex can now use gpt-image-1.5(opens in a new window) to generate and iterate on images. Combined with screenshots and code, it is helpful for creating visuals for product concepts, frontend designs, mockups, and games inside the same workflow.
We’re also releasing more than 90 additional plugins, which combine skills, app integrations, and MCP servers to give Codex more ways to gather context and take action across your tools. Some of the new plugins developers will find most useful include Atlassian Rovo to help manage JIRA, CircleCI, CodeRabbit, GitLab Issues, Microsoft Suite, Neon by Databricks, Remotion, Render, and Superpowers.
The app now includes support for addressing GitHub review comments, running multiple terminal tabs, and connecting to remote devboxes over SSH in alpha. It also lets you open files directly in the sidebar with rich previews for PDFs, spreadsheets, slides, and docs, and use a new summary pane to track agent plans, sources, and artifacts.
Together, these improvements make it faster to move across all the stages of the software development lifecycle between writing code, checking outputs, reviewing changes, and collaborating with the agent in one workspace.
We have expanded automations to allow re-using existing conversation threads, preserving context previously built up. Codex can now schedule future work for itself and wake up automatically to continue on a long-term task, potentially across days or weeks.
Teams use automations for everything from landing open pull requests to following up on tasks and staying on top of fast-moving conversations across tools like Slack, Gmail, and Notion.
We’re also releasing a preview of memory, which allows Codex to remember useful context from previous experience, including personal preferences, corrections and information that took time to gather. This helps future tasks complete faster and to a level of quality previously only possible through extensive custom instructions.
Codex now also proactively proposes useful work to continue where you have left off. Using context from projects, connected plugins, and memory, Codex can now suggest how to start your work day or where to pick up on a previous project. For example Codex can identify open comments in Google Docs that require your attention, pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.
Starting today, these updates are rolling out to Codex desktop app users who are signed in with ChatGPT.
Personalization features including context-aware suggestions and memory will roll out to Enterprise, Edu, and EU and UK users soon. Computer use is initially available on macOS, and will roll out to EU and UK users soon.
If you’ve been using Codex in the terminal or editor, try it across the rest of your workflow. If you haven’t tried Codex yet, download the app and get started.
In just the year since Codex launched, the ways developers are using Codex has expanded. Developers start with Codex to write code, then increasingly use it to understand systems, gather context, review work, debug issues, coordinate with teammates, and keep longer-running work moving.
Our mission is to ensure that AGI benefits all of humanity. That includes narrowing the gap between what people can imagine and what they can build. This release brings Codex closer to the tools, workflows, and decisions involved in building software, with much more to come soon.
There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent` but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`. I guess it's a good candidate though, perhaps the Sky team they acquired knows the right private APIs to use to get this working.
Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.
[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-... [2] https://www.macscripter.net/t/keystroke-to-background-app-as... [3] https://github.com/socsieng/sendkeys
People want to do stuff, and they want to get it done fast and in a pretty straightforward manner. They don’t want to follow complicated steps (especially with conditional) and they don’t want to relearn how to do it (because the vendor changes the interface).
So the only thing they want is a very simple interface (best if it’s a single button or a knob), and then for the expected result to happen. Whatever exists in the middle doesn’t matter as long as the job is done.
So an interface to the above may be a form with the start and end date, a location, and a plan button. Then all the activities are show where the user selects the one he wants and clicks a final Buy button. Then a confirmation message is displayed.
Anything other than that or that obscure what is happening (ads, network error, agents malfunctioning,…) is an hindrance and falls under the general “this product does not work”.
If you stick to tailwind + server side rendered pages you can probably go pretty far with just AI and no code knowledge but once you introduce modern TS tooling, I don't think it's enough anymore.
Fully agree about phone calls though.
What would make it not be a monolith? To me it seems like there'll be a big advantage (e.g. in distribution, user understanding) for most people to be using the same product / similar interface. And then the agent and the developer of that interface figure out all the integrations under that, invisible to the user.
I think the market uptake of Claude Cowork is already massive.
People on HN are seriously delusional.
AI removed the need to know the syntax. Your grandma does not know JS but can one shot a React app. Great!
Software engineering is not and has never been about the syntax or one shotting apps. Software engineering is about managing complexity at a level that a layman could not. Your ideal word requires an AI that's capable of reasoning at 100k-1 million lines of code and not make ANY mistakes. All edge cases covered or clarified. If (when) that truly happens, software engineering will not be the first profession to go.
All of my friends who would die before they use AI 2 years ago now call themselves AI/agentic engineers because the money is there. Many of them don't understand a thing about AI or agents, but CC/Codex/Cursor can cover up for a lot.
Consequently, if Claude Code/"coding agents" is a hot topic (which it is), people who know nothing about any of this will start raising money and writing articles about it, even (especially) if it has nothing to do with code, because these people know nothing about code, so they won't realize what they're saying makes no sense. And it doesn't matter, because money.
Next thing you know your grandma will be "writing code" because that's what the marketing copy says. That's all it takes for the zeitgeist to shift for the term "code". It will soon mean something new to people who had no idea what code was before, and infuriating to people who do know (but aren't trying to sell you something).
I know that's long-winded but hopefully you get where I'm coming from :D.
You'll cause mild panic in a sizable share of people under 30 if you call them without a warning text.
I don't like it, and I'm sure you don't either, but it's not a Mac. Or a Linux. And it's what most actual desktop users are stuck with, still.
A couple weeks ago I'd get roughly 2~3 hours. And a month before that I couldn't break the 5-hour limit.
Edit: as in, I hear them use it, not as in, I was told that
This seems to be the new narrative around here but it's not jiving with what I'm experiencing. Obviously Anthropic's uptime stats are terrible but when it's up, it's excellent (and I personally haven't had any issues with uptime this week, although my earlier-in-the-week usage was lighter than usual).
I'm loving 4.7. I was loving 4.6 too. I use Codex to get code reviews done on Claude-generated code but have no interest in using it as my daily driver.
On the other hand, entrepreneurs and managers are going to want it for their employees (and force it on them) for the above reason.
> Yet, somehow, I'm quite confident you're not an AI.
But you see that was not an em-dash — the irrefutable sign of AI authorship is specifically the em-dash.
An example here is in engineering. Building a simulator for some process makes computing it much safer and consistent vs. having people redo the calculations themselves, even with AI assistance.
It's easy to develop a disconnect with the level that average users operate at when understanding computers deeply is part of the job. I've definitely developed it myself to some extent, but I have occasional moments where my perspective is getting grounded again.
In fact, in the very message you're replying to, I hinted at the opposite (and have since in another post stated explicitly that I very much think the profession will still need to exist).
My ideal world already exists, and will keep getting better: many friends of mine already have custom-built programs that fit their use case, and they don't need anything else. This also didn't "eat" any market of a software house -- this is "DIY" software, not production-grade. That's why I explicitly stated this is a new way of human-computer-interaction, which it definitely is (and IMO those who don't see this are the ones clearly deluded).
Yes you sure are.
Here's an example from just yesterday. An acquaintance of mine who has no idea how to code (literally no idea) spent about 3 weeks working hard with AI (I've been told they used a tool called emergent, though I've never heard of it and therefore don't personally vouch for it over alternatives) to build an app to help them manage their business. They created a custom-built system that has immensely streamlined their business (they run a company to help repair tires!) by automating a bunch of tasks, such as:
- Ticket creation
- Ticket reporting
- Push notifications on ticket changes (using a PWA)
- Automated pre-screening of issues from photographs using an LLM for baseline input
- Semi-automated budgeting (they get the first "draft" from the AI and it's been working)
- Deep analytics
I didn't personally see this system, so I'm for sure missing a lot of detail. Who saw it was a friend I trust and who called me to relay how amazed they were with it. They saw that it was clearly working as intended. The acquaintance was thinking of turning this into a business on its own and my friend advised them that they likely won't be able to do so, because this is very custom-built software, really tailored to their use case. But for that use case, it's really helped them.
In total: ~3 weeks + around 800€ spent to build this tool. Zero coding experience.
I don't actually know how much the "gains" are, but I don't doubt they will definitely be worth it. And I'm seeing this trend more and more everywhere I look. People are already starting to use their computer by coding without knowing, it's so obvious this is the direction we're going.
This is all compatible with the idea of software engineering existing as a way of building "software with better engineering principles and quality guarantees", as well as still knowing how to code (though I believe this will be less and less relevant).
My experience using LLMs in contexts where I care about the quality of the code, as well as personal projects where I barely look at the code (i.e. "vibe coding") is also very clearly showing me that the direction for new software is slowly but surely becoming this one where we don't care so much about the actual code, as long as the requirements are clear, there's a plethora of tests, and LLMs are around to work with it efficiently (i.e. if the following holds -- big if: "as the codebase grows, developing a feature with an LLM is still faster than building it by hand") . It is scary in many ways, but agents will definitely become the medium through which we build software, and, my hot-take here (as others have said too) is that, eventually, the actual code will matter very little -- as long as it works, is workable, and meets requirements.
For legacy software, I'm sure it's a different story, but time ticks forward, permanently, all the time. We'll see.
E.g. 2018: https://news.ycombinator.com/item?id=17598113#17598506
Banana battery: zinc nail, copper penny, spark— lunch powers the clock.
I'll give a third example: I gave Codex some tests and told it to implement the code that would make the tests pass. Codex wrote the tests into the testing file, but then marked them as "shouldn't test", and confirmed all tests pass. Going back I told it something to the effect "you didn't implement the code that would make the tests work, implement it". But after several rounds of this, seemingly no amount of prompting would cause it to actually write code -- instead each time it came back that it had fixed everything and all tests pass, despite only modifying the tests file.
In each example, I keep coming back to the perspective that the code is not abstracted, it's an important artifact and it needs/deserves inspection.
The devs who'll stand out are the ones debugging everyone else's vibe-coded output ;-)
That's a rather trivial consideration though. The real cost of code is not really writing it out to begin with, it's overwhelmingly the long-term maintenance. You should strive to use AI as a tool to make your code as easy as possible to understand and maintain, not to just write mountains of terrible slop-quality code.