The model labs are converging on managing the whole agent experience, blurring the line between model and agent.
They’re slowly baking the harness into their models to the point where you can’t abstract it away: you become too dependent on exactly how they handle subagent spawning, context compaction, and spawning thousands of instances on demand.
And it’s probably better this way too.
The compute, the models, and the harness. Everything else is just an application layer on top.
The model labs own pretty much all of the compute at this point, causing global shortages of fab capacity and memory chips.
SOTA models basically belong to Anthropic and OpenAI, with Gemini serving some more niche use cases.
Compute became commoditized back during the cloud revolution of the 2010s.
The models became commoditized in ~2024-2025.
But you can’t so easily commoditize the harness if it depends on the model.
The labs are now creating the best harnesses for their models.
Codex significantly outperforms Cursor when using gpt-5.3-codex or gpt-5.4. It explores the codebase better, builds more cohesive plans, doesn’t go off on infuriating tangents, and is more accessible than a full-fledged IDE. It’s also significantly less bloated.
Startups are spending so much of their engineering effort on the harness that they’re starting to resort to running Codex or Claude Code in a sandbox instead of building their own.
That makes sense: wouldn’t the most qualified people to build the harness be the ones who made the model? The ones who can RL the model alongside the harness?
Of course Codex and Claude Code are going to be better than anything that you can make with the Vercel AI SDK poorly glued to a DB constantly desyncing with your Sandbox and instrumented with Datadog.
This is what happened with ElevenLabs, but at 5x speed because the market for voice AI is substantially smaller.
ElevenLabs pretty quickly hit the limits of what they could do with just the models, and had to start walking downstream to the application layer to keep growing at the rate expected of them.
And who better to build voice AI agents than the people who made the voice AI?
They can provide unbeatable latencies, cheaper per-character (token) prices because they don’t carry application-layer margins, and subsidized early adoption to stop startups from competing with them.
Effectively, they can become their best API customer.
Being upstream at the model layer, they had incredible early-market indicators that the application layer was printing large margins, and that the API layer was a commoditized race to the bottom.
The only way to compete was to make an unbeatable, vertically integrated experience that only a model lab could produce. Just like how Apple can demand such loyalty and high prices.
And we’re already starting to see it in the big labs, just look at Claude Cowork.
Just like cloud providers created data lock-in with egress fees, the model labs have their own lock-in dimension: your context.
The APIs for accessing the models are already becoming more managed. The Claude API introduces managed compaction so you don’t have to worry about it (and now you depend on it).
The OpenAI Responses API will persist your context for you, so you only need to send the new messages (now you can’t easily move that context to another inference API).
They’re now building the features that lock your agent into their API specifically, and as a result, into their models.
They can provide lower latencies to the models, and subsidize the tokens if they know as a result you can’t leave.
And if you do find something novel, they’ll probably just clone or acquire it within 8 months, because why would they let you benefit more than them from their models?
This is how they prevent you from using abstraction layers like the Vercel AI SDK to switch models: your agents become dependent on their infrastructure, their features, their harness.
The worst part: They can give you early access to models if you use their harness, but it won’t be available on the API “for a few more weeks”.
We know that works, because that’s exactly what OpenAI did with gpt-5.3-codex: they forced you to use their harness, so you could see “Oh… this is better, I think I’ll just stick with this instead of going back to Cursor.”
Instead of having to glue together an agent loop, context compaction, memory, a sandbox, and a persistence layer: what if you just store a session ID and make HTTP requests to send messages to an instance of your agent?
No worrying about provisioning and hibernating sandboxes: you just get magical machines with magically bottomless disks.
What if every agent got its own email inbox, and emails were automatically routed as follow-up or steer messages to the right place?
What if subagents automatically shared the same sandbox as the parent agent, but you could also launchSubagentInNewSandbox: true?
Every application layer company would prefer that. It allows us to spend our focus and energy on business logic, and not re-building the same infrastructure pieces that every AI company is gluing together right now.
Your agent doesn’t need to run super fast? Just flip batch: true and now they give you discounted tokens on your agents.
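Stitched together, that wish list might look like the toy client below. Every name in it (`ManagedAgentService`, `route_email`, `launch_subagent`, `batch`) is invented for illustration; no lab ships this exact API today.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSession:
    """One managed agent: just a session ID from the caller's point of view."""
    session_id: str
    sandbox_id: str
    batch: bool = False                      # opt in to discounted, slower tokens
    messages: list[str] = field(default_factory=list)

class ManagedAgentService:
    """Toy stand-in for a lab-hosted agent layer; sandboxes, compaction,
    and persistence would all live server-side."""
    def __init__(self) -> None:
        self._sessions: dict[str, AgentSession] = {}
        self._inboxes: dict[str, str] = {}   # email address -> session ID

    def start(self, session_id: str, batch: bool = False) -> AgentSession:
        session = AgentSession(session_id, sandbox_id=f"sbx-{session_id}", batch=batch)
        self._sessions[session_id] = session
        self._inboxes[f"{session_id}@agents.example"] = session_id
        return session

    def send(self, session_id: str, message: str) -> None:
        # Would be a single HTTP POST against a stored session in a real service.
        self._sessions[session_id].messages.append(message)

    def route_email(self, address: str, body: str) -> None:
        # Inbound email becomes a follow-up/steer message for the right agent.
        self.send(self._inboxes[address], body)

    def launch_subagent(self, parent_id: str, child_id: str,
                        new_sandbox: bool = False) -> AgentSession:
        # Subagents share the parent's sandbox unless explicitly isolated.
        parent = self._sessions[parent_id]
        child = self.start(child_id)
        if not new_sandbox:
            child.sandbox_id = parent.sandbox_id
        return child

svc = ManagedAgentService()
agent = svc.start("billing-42", batch=True)
svc.send("billing-42", "reconcile last month's invoices")
svc.route_email("billing-42@agents.example", "customer replied: use the Q3 rates")
helper = svc.launch_subagent("billing-42", "helper-1")
assert helper.sandbox_id == agent.sandbox_id  # shared sandbox by default
```

The appeal is obvious: every method here is infrastructure that application-layer teams currently glue together themselves, and every method is also a surface you’d depend on.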
But once they’ve grabbed all the land, what’s to stop them from slowly creeping prices up?
If you look at OpenAI and Anthropic with all that money and all these data center build-outs: the economies of scale simply do not allow anyone to compete with them.
The models will be more expensive if you run them over the API directly vs just using their agent layer.
The compute time will be more expensive for anyone running on a cloud provider, or racking servers in colos that can’t match the scale of Stargate.
The market for Sandboxes will simply collapse.
Where does this go from here? I have no idea. We’re probably all just doing market research for them at this point.
This is probably 1-2 years out, depending on how much the model labs already believe this, and how quickly they can converge on it.
Or maybe I’m totally wrong, but I guess we’ll see.