MicroVMs: Run isolated sandboxes with full lifecycle control

There are sooooo many sandbox providers out there.

They do spike on different features like:

    - snapshotting and forking
    - good SSH and VPN access for end-users
    - agent-friendly features, like obscuring secrets at network layer

Then there's also the option to use libkrun to run local sandboxes on your own computer. That doesn't scratch the itch for hosted services, but works if your goal is to run agents inside isolated environments for your own work.

I've been working on some open-core stuff[1] to coordinate sandboxes, and we're reworking it into a library that lets people coordinate any number of remote or local sandboxes, git repos, and coding agents using any provider, kinda like how the Docker CLI works for managing containers. Flue[2] is another player in this space; it's more of a pure framework, while we're building ours as an interactive product for using sandboxed agents and workflows.

[1] https://github.com/gofixpoint/amika/blob/main/ROADMAP.md

[2] https://flueframework.com/

It's about time AWS got into the agent sandbox game.

The startups in this space right now don't provide much value on top of the cloud providers they're wrapping. They don't tend to be run by experienced infra people either so they seem very vibecoded, insecure, janky, etc. They're also significantly overpriced because they're marking up already expensive providers.

Something surprising from my own experience is that while there's certainly a huge role for async agents in cloud sandboxes, async agents running locally seem more useful in many cases.

For those looking to run agents: the short lifecycle of the typical “sandbox” seems surprisingly limiting to me. I have no actual workflow where I want one of these products. Sometimes a VM can live for 30 minutes, but it also might need to live for a month, and I don’t know beforehand.

This is why I have been avoiding the word sandbox for exe.dev. I don’t think developers’ agents need something “sandbox” shaped.

What's the best provider to self-host Firecracker? I feel that AWS is not a safe or cost-effective option for a self-funded startup or small business. Although, is anything cost-effective anymore? Hetzner just had a massive price hike.

Part of it might just be that I am old and inflation is catching up with my understanding of prices.

But as for AWS, I still have to say no thanks. Imagine some group actually started using my hosted AI agent service for something compute- and network-intensive. It could turn into $2000 overnight, and if I didn't account for one of the numerous types of AWS charges, I might have only collected $500 in credit purchases.

Or it could easily be ten times that. But who am I kidding. No one is going to use my agents. So it doesn't matter if it's gvisor or Firecracker or whatever.

  > Containers launch in seconds, yet their shared-kernel architecture requires significant custom hardening to safely contain untrusted code

That's literally why they made Fargate. It's managed firecracker VMs with containers. They invented firecracker for this purpose. This new product is competing with Fargate, but they don't mention Fargate at all in the announcement.

  > you create a MicroVM Image by supplying a Dockerfile and code packaged as a zip artifact in Amazon S3
  > 
  > MicroVMs support up to 8 hours of total runtime

So you're already using containers with this new thing, same as Fargate! And on top of that, it's more limited in runtime than Fargate! The only real difference with this service is stateful file storage, which is actually a problem you end up having to engineer around later; that's why containers are stateless.

This smells like a competing team building something to capitalize on AI hype, but the product isn't differentiated enough for this to make sense long term. If this was a service called managed AI agents, and you added features specific to AI agents, that has value. But "here's Fargate with a different name" isn't gonna last.

Shouldn’t the title be “AWS Lambda MicroVMs”? MicroVMs are an existing concept.

We have this page which compares a whole bunch of sandbox providers in different categories

https://engine.build/lab/agent-sandboxes

Will add MicroVMs there today (and any others that are missing if you let me know!)

I don’t get it. We're paying at least hundreds, maybe thousands, per month on AI costs. Just get a regular VM?

> MicroVMs support up to 8 hours of total runtime

Does this mean you effectively can't use them as long-lived developer environments? It sounds like even if you suspend them, this is the hard limit on the total time it can run.

Interesting, I have recently started working on a project which is similar and fully open source, maybe interesting to some here:

https://github.com/mitos-run/mitos

This seems roughly similar to Google's Cloud Run gen2 instance types. My understanding is with the second generation, they are running microvms which are bootstrapped from a container image.

What's the point of microVMs for running agents?

Are you guys literally spinning up agents where a 100 ms boot time vs. a 3-second boot time makes a difference?

I'm asking because I understand the appeal of microVMs, but every time the subject comes up, people talk about "isolating agents": what's wrong with isolating agents in a regular VM (or in a container which, itself, is in a VM)?

FWIW I've got my stuff nicely isolated in regular VMs that are regularly up for hours and hours.

It's like the microVM boots in 100 ms, then the agent does... what? And exits after another 100 ms, and now you need to launch another one?

What's the use case of "microVMs to isolate agents"?

Not so subtle plug from yet another sandbox provider, https://instavm.io :

Apart from the above features:

  1. We support more than 32GB disk (as a detachable device, ideal for agentic memory)

  2. We provide egress control

  3. We provide vault for secret injection (to counter prompt injection)

  4. Snapshot / forking.

  5. Long-lived sandboxes.

Everything supported in APIs and CLI for agents.

Can be used via: npx skills add instavm/skills

What does the actual startup latency look like? Does it depend on the size of the resulting image?

How does this compare to Fly.io?

Which is cheaper for me?

Maybe self-hosting would be better?

How does this compare to E2B?

Does it have GPU support?

How's this different from Firecracker?

It's just a time limit on the life of a single MicroVM.

Using this for a long-lived "developer environment" would be extraordinarily expensive anyhow. Scaling the vCPU + RAM pricing of these up to the same shape as a compute-optimized Graviton On-Demand EC2 instance (16 vCPU x 32 GB RAM) comes out to about 4x the cost.

So don't do that. Just use an EC2 instance.

They are long-lived if you're a mayfly.

But I think the point is that they should be cheap to set up, and because of the short life, never really contain anything except the potential to compute when needed, not important data.

You can use them for dev environments.

You just have to finish development in 8 hours.

Lambdas are ephemeral on compute, but couldn't you connect up EFS for your long-lived data?

Then when you launch the next one, it's like you're still there?

I'm assuming you can launch them again after 8 hours.
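A minimal sketch of the EFS idea above, assuming the 8-hour cap applies per MicroVM and that a shared directory (for example an EFS mount; any path you use is hypothetical) survives between launches. The job checkpoints its progress to one JSON file and, when the runtime budget is nearly spent, stops so a fresh MicroVM can pick up where it left off. The cap, margin, and function names are illustrative assumptions, not part of any AWS API.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable

# Assumptions for illustration: the per-MicroVM cap from the announcement and
# a safety margin so the final checkpoint lands before the VM is stopped.
RUNTIME_CAP_S = 8 * 60 * 60
SAFETY_MARGIN_S = 10 * 60


def load_state(state_dir: Path) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    path = state_dir / "checkpoint.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"next_item": 0, "results": []}


def save_state(state_dir: Path, state: dict) -> None:
    """Write the whole state to a temp file, then rename it over the old one."""
    tmp = state_dir / "checkpoint.json.tmp"
    tmp.write_text(json.dumps(state))
    tmp.replace(state_dir / "checkpoint.json")


def run(state_dir: Path, items: list, work: Callable[[Any], Any],
        started_at: float, now: Callable[[], float] = time.monotonic) -> bool:
    """Process items until done or the runtime budget is nearly spent.

    Returns True when every item is finished, False when a relaunch is needed.
    """
    state = load_state(state_dir)
    deadline = started_at + RUNTIME_CAP_S - SAFETY_MARGIN_S
    while state["next_item"] < len(items):
        if now() >= deadline:
            save_state(state_dir, state)  # hand off to the next MicroVM
            return False
        state["results"].append(work(items[state["next_item"]]))
        state["next_item"] += 1
    save_state(state_dir, state)
    return True
```

Writing one whole checkpoint file rather than many small ones is deliberate, since small random I/O is where network filesystems tend to hurt. A real job would probably also checkpoint periodically, not only at the deadline, so an unexpected stop loses less work.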

E2B supports UDP, and the pricing structure is different.

I’d say what AWS released looks closer to a bare compute primitive. E2B is further up the stack and ships everything around the VM, like snapshots, networking, and integrations.

Also, there’s no lock-in: E2B is open-source and can be hosted on any cloud (AWS included).

Plus, it supports bigger boxes, higher concurrency, and longer timeouts (24hr).

Disclaimer: I work at E2B.

Check this out: https://github.com/smol-machines/smolvm

A hosted platform with GPU support (Vulkan) is coming soon.

It is supposed to be a sandbox that you can invoke from agent, langchains of the world, coding agents etc.

No, it doesn’t seem like it.

Not that I can find in the docs anywhere. Compute only.

Pretty sure they invented Firecracker for Lambda. IIRC they were previously using a hot pool of EC2 instances behind the scenes, with each customer getting their own instances and lambdas sharing capacity on an instance. Firecracker made it possible to spin up VMs in real time instead of having spare capacity lying around.

That said, Fargate does kind of seem like a superior option

Edit: I guess this supports suspend and fast resume so invocation time should be somewhat better than Fargate.

Fargate does not use Firecracker; it is simply EC2 instances.

AWS AgentCore runtime has been around for about a year: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguid... (spoiler, it's the same underlying technology as the Lambda MicroVMs).

Major sandbox providers (e.g. Modal) run on non-hyperscaler bare metal rather than AWS, so they don't need to add their markup on top of AWS's markup. Thus, their prices are comparable to or better than AWS.

Agreed.

Most of the startups are just wrappers around AWS and significantly more expensive.

Agents need cheaper sandboxes so that you can run thousands of them.

I feel that AWS, GCP and all the other cloud providers can provide this natively.

But still it would be nice to self host.

The best part of self-hosting is that you own it as well: no rug pulls from the laundry list of reselling providers that could go away at any time.

It would be nice to have a one-click sandbox agent on a self-hosted instance that is free, fast (you can pay a bit more for more intensive operations), and open source.

Yeah, I'm surprised Justin posted this like it was new(s). Wasn't it doing the rounds on the 22nd when it launched?

Setting up your own is not that hard and if you bought some compute before the Altman squeeze, very cheap.

Why isn't libkrun good enough for hosted stuff? I use it as a podman backend in a microservice architecture.

What people aren't getting with `firecracker` is utilization. Don't get me wrong: `firecracker` is great software, and it's what I'm using for lightweight virtualization. But workloads are now really bursty over really short periods of time. Even with the snapshot and restore you can get if you're willing to hack on `firecracker` substantially, you hit walls where it's clear this is too much against the grain: it wasn't designed to bounce seamlessly from 1 core to 32 to 8 to 16 to 4 to 32 to 1, and that's what it takes to get extreme utilization, even with extremely good ML doing the prediction.
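To put rough numbers on the utilization point, here is a back-of-the-envelope sketch using the burst pattern from the comment (1, 32, 8, 16, 4, 32, 1 cores), assuming each step lasts the same amount of time. It compares a VM sized for the peak, a VM that scales up instantly but only shrinks one step late (an assumed, optimistic model, not how any particular VMM behaves), and perfect tracking.

```python
def utilization(demand: list[int], provisioned: list[int]) -> float:
    """Used core-time divided by provisioned core-time over equal time slices."""
    if len(demand) != len(provisioned):
        raise ValueError("need one provisioned value per time slice")
    if any(d > p for d, p in zip(demand, provisioned)):
        raise ValueError("demand exceeds provisioned cores in some slice")
    return sum(demand) / sum(provisioned)


def lagged(demand: list[int], lag: int) -> list[int]:
    """Scale up instantly, but only shrink `lag` slices after demand drops."""
    return [max(demand[max(0, i - lag): i + 1]) for i in range(len(demand))]


burst = [1, 32, 8, 16, 4, 32, 1]  # cores demanded per time slice

static = utilization(burst, [32] * len(burst))         # 94 / 224 ~ 0.42
one_step_late = utilization(burst, lagged(burst, 1))   # 94 / 161 ~ 0.58
perfect = utilization(burst, lagged(burst, 0))         # 1.0
```

Even the optimistic one-step-late model leaves roughly 40% of the provisioned core-time idle, which is the gap the comment is describing.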

I am quite sure I'm not the only person working on post-firecracker KVM.

Thanks for sharing these!

You absolutely can run agents on a regular VM. But if you want to build multi-tenant and multi-agent systems with strong security boundaries, then having a VM or MicroVM per agent session (or session with a group of agents) really simplifies things.

When we did AWS AgentCore Runtime last year we introduced session isolation, with MicroVMs per session. You can think of Lambda MicroVMs as the same stack, but generalized to fit a larger number of application patterns.

Isn’t the point that you wanna be able to spin up and down thousands of VMs on demand (literally a VM just to run a tool and then shut it down until the next tool call)?

Fly.io doesn't set a maximum of 8 hours of alive time on your instance.

Also, MicroVMs can't be exposed directly to the web. Your code running in them can only be executed via API calls with attached auth tokens - so if you wanted to host a public facing API or website with them you'd need to implement your own additional layer in front.

Something I appreciate about Fly (disclaimer: they support my work) is that the pricing is fixed - you pay $1.94/month (less if you suspend your machine) for the smallest instance, up to $976.25/month for the largest (16 CPUs, 128GB) plus predictable costs for volume storage.

The only variable outside your control is bandwidth, and that's unlikely to cause a nasty shock.

Contrast that with the more "elastic" hosting providers (Vercel, Cloud Run): with Fly you're much less likely to get a horrifying bill if something gets overly crawled or goes viral.

MicroVMs are better for the VM provider. They use less memory and have a smaller attack surface. Also, starting in 100ms means you don't need to add a bunch of async machinery when launching the VMs.

I imagine you can have a situation where you let an agent run in a shared env, but to access certain tools you spin up a VM just for the duration of the tool call and then shut it down again. Let’s say you wanna allow the agent to write and run code; then you need to run that code somewhere safe.
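A sketch of the VM-per-tool-call pattern described above. `SandboxBackend` and `FakeBackend` are hypothetical stand-ins, not any provider's API; the point is the shape: launch a fresh VM for one tool call, run the untrusted code there, and tear it down in a `finally` so a failing tool call cannot leak a running VM.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class SandboxBackend(Protocol):
    """Hypothetical provider interface; not any real vendor's API."""

    def launch(self, image: str) -> str: ...
    def run(self, vm_id: str, command: list[str]) -> str: ...
    def destroy(self, vm_id: str) -> None: ...


@contextmanager
def ephemeral_vm(backend: SandboxBackend, image: str) -> Iterator[str]:
    """Launch a VM for exactly one tool call and always tear it down."""
    vm_id = backend.launch(image)
    try:
        yield vm_id
    finally:
        backend.destroy(vm_id)


def run_tool_call(backend: SandboxBackend, image: str, command: list[str]) -> str:
    # The agent's own process never executes `command`; only the throwaway VM does.
    with ephemeral_vm(backend, image) as vm_id:
        return backend.run(vm_id, command)


class FakeBackend:
    """In-memory stand-in so the pattern can be exercised without a provider."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self.launched = 0

    def launch(self, image: str) -> str:
        self.launched += 1
        vm_id = f"{image}-{self.launched}"
        self.live.add(vm_id)
        return vm_id

    def run(self, vm_id: str, command: list[str]) -> str:
        if vm_id not in self.live:
            raise RuntimeError("VM is not running")
        if command and command[0] == "fail":
            raise RuntimeError("tool call failed")
        return " ".join(command)

    def destroy(self, vm_id: str) -> None:
        self.live.discard(vm_id)
```

With a real provider, the fast boot and teardown are what make this pattern affordable; with multi-second boots you would more likely keep a warm pool instead.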

This is for people who want both faster execution, and better security isolation for agents/subagents. It is a different use case than yours

The simplest worthwhile DIY sandbox you can have is to layer two tools: bwrap and gvisor.

    bwrap <bwrap args> -- runsc <runsc args> do -- <sandboxee>

bwrap sets up the environment, and then gVisor (via its runsc binary) elevates it into a true sandbox.

Standalone gVisor (not the 'do' subcommand) used to be a mess with the OCI JSON requirement, but recently they began work on presenting their own bwrap interface (likely to pursue AI agent uses), though I wouldn't use it myself yet.

People often look down on gVisor because they think it's some kind of syscall filter; it is not. It can use one of ptrace, seccomp, or even KVM to intercept ALL syscalls and service them with its own logic. Basically, it's a VMM and kernel in one.
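A sketch of the bwrap-plus-gVisor layering as a Python argv builder, assuming gVisor's binary is `runsc` and that its `do` subcommand is available on the host. The specific bwrap and runsc flags are assumptions chosen for illustration (read-only host, one writable work directory, no network); this exact combination has not been verified to work inside every bwrap namespace setup.

```python
def layered_sandbox_argv(command: list[str], workdir: str) -> list[str]:
    """Build a `bwrap ... -- runsc ... do -- command` argv (flags are assumptions)."""
    bwrap = [
        "bwrap",
        "--ro-bind", "/", "/",        # read-only view of the host filesystem
        "--dev", "/dev",              # minimal /dev
        "--proc", "/proc",            # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",            # private scratch space
        "--bind", workdir, workdir,   # one writable directory for the agent
        "--unshare-all",              # unshare every namespace bwrap can
        "--die-with-parent",          # kill the sandbox if the launcher dies
    ]
    runsc = [
        "runsc",                      # gVisor's runtime binary
        "--rootless",                 # run without root privileges
        "--network=none",             # no network inside the gVisor sandbox
        "do",                         # run a single command under gVisor
    ]
    # Each "--" stops the preceding tool from parsing the next tool's arguments.
    return bwrap + ["--"] + runsc + ["--"] + command
```

Usage would be something like `subprocess.run(layered_sandbox_argv(["python3", "agent.py"], "/srv/agent"), check=True)`, with the paths being placeholders. The writable bind comes after `--tmpfs /tmp` so a work directory under `/tmp` is not hidden by the tmpfs.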

Are you looking for highly ephemeral nodes, where you are writing automation that will use the API to orchestrate it? Or do you just want small microVMs that you launch and kill?

Firecracker just has a RESTful Unix socket with a defined API and launches KVM VMs with limited options.
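For a sense of what driving that socket looks like, here is a minimal Python sketch that speaks HTTP over the Unix socket and issues the usual configuration calls (`/machine-config`, `/boot-source`, `/drives/{id}`, then `/actions` with `InstanceStart`). It assumes Firecracker was started with `--api-sock` pointing at the socket path; the kernel and rootfs paths are placeholders, and error handling is minimal.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over a Unix domain socket (Firecracker's --api-sock)."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_plan(kernel: str, rootfs: str, vcpus: int = 1,
              mem_mib: int = 512) -> list[tuple[str, str, dict]]:
    """The PUT requests that configure and start one microVM, in order."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def apply_plan(socket_path: str, plan: list[tuple[str, str, dict]]) -> None:
    """Send each request; stop at the first non-2xx response."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            detail = resp.read().decode()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} -> {resp.status}: {detail}")
        finally:
            conn.close()
```

Usage would look like `apply_plan("/tmp/firecracker.socket", boot_plan("vmlinux", "rootfs.ext4", vcpus=2, mem_mib=1024))`. Keeping the plan as plain data makes it easy to log or diff before anything touches the socket.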

For custom SMB I still think libvirt is a lower entry cost and may have transferable use cases to longer lived VMs, so you can just launch a qemu microvm[0] and use virsh and/or libvirt xml to set up the networking.

The ~400ms boot time of a qemu microvm vs ~120ms for firecracker may not be an issue for some loads, but qemu will also allow you a bit more density of placement than firecracker. qemu microvms will use a bit more memory individually, but they will also tend to use less real system memory with a larger number of microVMs.
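For comparison with the Firecracker API, a sketch of launching a QEMU microvm directly, modeled loosely on the example in the QEMU microvm documentation ([0]). It assumes an x86-64 host with KVM and a guest kernel built with virtio-mmio and virtio console support; the file names are placeholders, and networking is left out (the comment suggests libvirt for that).

```python
def qemu_microvm_argv(kernel: str, rootfs: str, mem: str = "512m", cpus: int = 2) -> list[str]:
    """Build a qemu-system-x86_64 command line for a minimal microvm guest."""
    return [
        "qemu-system-x86_64",
        "-M", "microvm,x-option-roms=off,isa-serial=off,rtc=off",
        "-enable-kvm", "-cpu", "host",
        "-m", mem, "-smp", str(cpus),
        "-nodefaults", "-no-user-config", "-nographic",
        # virtio console on stdio instead of an emulated serial port
        "-chardev", "stdio,id=virtiocon0",
        "-device", "virtio-serial-device",
        "-device", "virtconsole,chardev=virtiocon0",
        # root disk attached as a virtio-mmio block device
        "-drive", f"id=root,file={rootfs},format=raw,if=none",
        "-device", "virtio-blk-device,drive=root",
        # direct kernel boot; the guest sees the console as hvc0 and the disk as vda
        "-kernel", kernel,
        "-append", "console=hvc0 root=/dev/vda",
    ]
```

The trade-off the comment describes shows up here: QEMU needs more flags and boots slower, but the same tooling carries over to full VMs.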

It is all tradeoffs, and kata containers are yet another option that may apply depending on your use case.

You can run your own firecracker or qemu/kvm microvms on most instances that allow nested hypervisors, or on a local host. If cost containment is critical to you this is one possible way forward.

Really, it just depends on whether you want/need RESTful control, need to support short-lived serverless functions, or whether CLIs fit better and you may want to support full VMs.

They both are just Virtual Machine Monitors that targeted different use cases and decided on different tradeoffs.

Just be careful about hosting traditional containers and microVMs on the same system; that config is going to be problematic due to fundamental reasons that are too complex to properly address here.

[0] https://www.qemu.org/docs/master/system/i386/microvm.html

Why do you want to self-host vs. using one of the many providers out there?

Daytona, E2B, OpenComputer, Freestyle, Blaxel, Vercel, Modal, Cloudflare, Tensorlake, Superserve, etc. etc.

Some of them work by pre-purchasing credits, so you can control the blast radius of spend.

Also, if you want a more embedded sandbox runtime as a library instead of a daemon + REST API, you can check out libkrun (and friendly layers on top of it like https://microsandbox.dev/ and https://smolmachines.com/)
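A sketch of how prepaid credits bound the blast radius in the "$2000 overnight vs. $500 collected" scenario from earlier in the thread: hold the worst-case cost before launching work, refuse when the balance can't cover it, and return the unused part afterwards. The `CreditLedger` class and the rates in the example are hypothetical, not any provider's billing API.

```python
from dataclasses import dataclass


def cost_cents(seconds: int, rate_cents_per_hour: int) -> int:
    """Cost of `seconds` of compute, rounded up to a whole cent."""
    return -(-(seconds * rate_cents_per_hour) // 3600)


@dataclass
class CreditLedger:
    """Hypothetical prepaid balance; work is refused once it can't be paid for."""

    balance_cents: int

    def reserve(self, est_seconds: int, rate_cents_per_hour: int) -> int:
        """Hold the worst-case cost before launching; raise if credits run out."""
        cost = cost_cents(est_seconds, rate_cents_per_hour)
        if cost > self.balance_cents:
            raise RuntimeError("insufficient prepaid credits")
        self.balance_cents -= cost
        return cost

    def settle(self, reserved_cents: int, used_seconds: int, rate_cents_per_hour: int) -> int:
        """Return the unused part of a reservation to the balance."""
        refund = max(0, reserved_cents - cost_cents(used_seconds, rate_cents_per_hour))
        self.balance_cents += refund
        return refund
```

For example, with $500 of credit at a hypothetical $100/hour, a one-hour job holds $100, and if it finishes in 30 minutes, $50 comes back; an 8-hour request on the remaining $450 is refused rather than billed.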

Hetzner is still cheap compared to AWS.

For self-hosting, have a look at what we're building with SlicerVM.com (disclosure: I'm the founder). Also runs just as well on Apple Silicon.

We run quite a few Slicer instances on mini PCs and Ryzen builds - also on Hetzner (and yes ouch 120 EUR / mo up to ~ 550 EUR / mo for 16core / 128GB RAM feels almost unfair)

This reminds me of Fly.io's model off the top of my head, though it's not a self-hosted Firecracker as such.

Cloudflare is cost effective for certain types of workloads, I've heard of businesses getting surprisingly far on the $5/mo worker plan.

  > Didn't mean to highjack for self advertisement.
  > 
  > As the topic matches, .... my project might be appealing to some here

That's exactly what you intended to do. That is the definition of advertising. It is true, many people might like it, so own it. Don't lie about it, even to yourself.

I tried this a few days ago. Once you have an image built and ready startup time is fast, but building that original image took 5-10 minutes.

I think it's designed for building an image once and then reusing it many, many times.

Presumably it is Firecracker. It's just a different shape of offering, along with Lambda and Fargate, which are also Firecracker.

The literal first paragraph has a highlighted link that says this runs on Firecracker

It's a product that runs on top of Firecracker.

But these have near-instant suspend/resume, and they even have vertical scaling of the RAM, which is a great feature that’s not very common.

EFS is extremely slow for many workloads. We tried it for builds and various other common use cases for coding agents and the performance just isn't there. I'm guessing lots of small random reads/writes just isn't going to ever work well.
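A rough way to check this on your own mount: time many small fsync'd file writes against one large file holding the same total bytes. This is a sketch, not a rigorous benchmark (no cache dropping, no reads, no parallelism); point it at a directory on EFS and at one on local disk and compare the `slowdown` ratios. The directories are up to you.

```python
import os
import time
from pathlib import Path


def small_vs_large_writes(directory: str, n_files: int = 200, size: int = 4096) -> dict:
    """Time many small fsync'd files against one file holding the same bytes."""
    root = Path(directory)
    payload = os.urandom(size)

    start = time.perf_counter()
    for i in range(n_files):
        with open(root / f"small-{i}.bin", "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force each small write to storage
    small_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(root / "large.bin", "wb") as f:
        for _ in range(n_files):
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # one sync for the whole sequential write
    large_s = time.perf_counter() - start

    return {
        "small_files_s": small_s,
        "one_large_file_s": large_s,
        "slowdown": small_s / max(large_s, 1e-9),
    }
```

Builds are dominated by the first pattern (lots of small files, each opened, written, and closed), which is why a filesystem that handles the second pattern well can still feel slow for coding agents.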

I understand that but micro VMs don't provide better security isolation than regular VMs.

So that leaves faster boot times.

Faster boot times and then the agent does what? And at how many token/s? And what's the "time to first token" anyway?

How do the time to first token and then the token/s inherent limitations of LLMs not totally dominate the running time?

I just don't get the use case.
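One way to frame the question is as arithmetic. The numbers below are illustrative assumptions, not measurements: 1 s to first token, 500 output tokens at 50 tokens/s, and the 100 ms vs. 3 s boot times from upthread; boots are assumed to be sequential with the LLM work, not overlapped. If one VM lasts a whole session, boot time is noise; if a VM is launched per tool call, as suggested upthread, a 3 s boot becomes a visible share of each turn.

```python
def boot_overhead(boot_s: float, ttft_s: float, output_tokens: int,
                  tokens_per_s: float, boots_per_turn: float) -> float:
    """Fraction of one agent turn's wall-clock time spent booting VMs."""
    llm_s = ttft_s + output_tokens / tokens_per_s  # time to first token + generation
    boot_total_s = boot_s * boots_per_turn
    return boot_total_s / (boot_total_s + llm_s)


# Illustrative assumptions: 1 s to first token, 500 output tokens at 50 tokens/s.
# One VM per 20-turn session:
per_session_3s = boot_overhead(3.0, 1.0, 500, 50.0, boots_per_turn=1 / 20)     # ~0.013
per_session_100ms = boot_overhead(0.1, 1.0, 500, 50.0, boots_per_turn=1 / 20)  # ~0.0005
# One VM per tool call, one tool call per turn:
per_tool_call_3s = boot_overhead(3.0, 1.0, 500, 50.0, boots_per_turn=1)       # 3 / 14 ~ 0.21
per_tool_call_100ms = boot_overhead(0.1, 1.0, 500, 50.0, boots_per_turn=1)    # 0.1 / 11.1 ~ 0.009
```

So both sides of the thread can be right: for long-lived sessions the LLM dominates, and fast boots only start to matter once VMs are created at tool-call granularity.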

self host = better spec machine for same price.

Even with the Hetzner price increase, self-hosting is still far cheaper than all of them.

Thanks. I just looked into qemu microvms. Might be an option but I already have gvisor set up.

You can't run firecracker on AWS.

Yeah, the big 3 cloud markup is so high that most VPS providers can hike price 10x and they are still cheaper.