I had originally thought this would be ok, as we could review everything in the git diff. But it later occurred to me that there are all kinds of files the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file, for instance, files in .venv, .git hook files.
ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.
My conclusion from that is the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. the file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before transfer outside the sandbox.
I'd really like to see people talking more about this. The solution isn't that hard: keep CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git, and maybe some that are in git but are known to be potentially dangerous (bin files). Obviously, there would need to be some kind of configuration option here.
1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
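To make the filtering idea concrete, here is a rough sketch of the decision such a proxy might make for each file modified in the sandbox. Everything in it is an assumption for illustration: the names `is_transferable` and `filter_changes`, the deny lists, and the idea that the caller already has the set of git-tracked paths (for example from `git ls-files`). It is not how jai or any existing tool does it.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical deny rules: places whose contents tend to get executed
# implicitly on the host (hooks, virtualenvs, bytecode caches, binaries).
DENY_DIR_PARTS = {".git", ".venv", "__pycache__", "node_modules"}
DENY_GLOBS = ("*.pyc", "*.so", "*.exe")


def is_transferable(path: str, tracked: set[str]) -> bool:
    """Return True if a sandbox-modified file may be copied back to the host."""
    p = PurePosixPath(path)
    # Reject anything that is not a plain project-relative path.
    if p.is_absolute() or ".." in p.parts:
        return False
    # Only files already under version control are eligible.
    if str(p) not in tracked:
        return False
    # Tracked, but living in a directory we never want to sync back.
    if DENY_DIR_PARTS.intersection(p.parts):
        return False
    # Tracked, but of a file type considered dangerous.
    return not any(fnmatch(p.name, pattern) for pattern in DENY_GLOBS)


def filter_changes(changed: list[str], tracked: set[str]) -> tuple[list[str], list[str]]:
    """Split changed paths into (allowed, blocked), preserving order."""
    allowed = [c for c in changed if is_transferable(c, tracked)]
    blocked = [c for c in changed if not is_transferable(c, tracked)]
    return allowed, blocked
```

The blocked list is what you would want surfaced to the dev for an explicit decision, and the two deny lists are the obvious candidates for the configuration option.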
I've been using claude code daily for months and the worst thing that happened wasn't a wipe (yet). It needed to save an svg file so it created a /public/blog/ folder. Which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd and I spent like an hour debugging before I figured it out. Nothing got deleted and it's not a permission problem, the agent just put a file in a place that made sense to it.
jai would help with the rm -rf cases for sure but this kind of thing is harder to catch because it's not a permissions problem, the agent just doesn't know what a web server is.
E.g. if I have a VM to which I grant only access to a folder with some code (let's say open-source, and I don't care if it leaks) and to the Internet, if I do my agent-assistant coding within it, it will only have my agent credentials it can leak. Then I can do git operations with my credentials outside of the VM.
Is there a more convenient setup than this, which gives me similar security guarantees? Does it come with the paid offerings of the top providers? Or is this still something I'd have to set up separately?
I wonder if and how jai managed to address these limitations of overlayfs. Basically, the same dir should not be mounted as an overlayfs upper layer by different overlayfs mounts. If you run 'jai bash' twice in different terminals, do the two instances get two different writable home dir overlays, or the same one? In the second case, is the second 'jai bash' command joining the mount namespace of the first one, or create a new one with the same shared upper dir?
This limitation of overlays is described here: https://docs.kernel.org/filesystems/overlayfs.html :
'Using an upper layer path and/or a workdir path that are already used by another overlay mount is not allowed and may fail with EBUSY. Using partially overlapping paths is not allowed and may fail with EBUSY. If files are accessed from two overlayfs mounts which share or overlap the upper layer and/or workdir path, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.'
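For what it's worth, the "same or partially overlapping paths" condition from the kernel docs is easy to check before mounting. This is only a sketch of that check, not a claim about what jai does; `paths_overlap` and `conflicts` are made-up names, and it assumes the paths are already absolute and normalized (no `..`, no symlinks).

```python
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True if a and b are the same directory or one contains the other.

    Assumes both paths are absolute and already normalized; this does a
    purely lexical comparison and does not resolve symlinks or '..'.
    """
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def conflicts(new_dirs: list[str], in_use: list[str]) -> list[tuple[str, str]]:
    """Pair each proposed upper/work dir with every in-use dir it overlaps."""
    return [(n, u) for n in new_dirs for u in in_use if paths_overlap(n, u)]
```

A launcher could call `conflicts()` with its intended upperdir and workdir against those of already running instances, and either pick fresh directories or join the existing mount namespace when the result is non-empty.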
The bot should also be instructed that it gets 3 strikes before being removed, meaning it should generate a report of what it wants access to and get verbal approval or denial. That should not be so difficult with today's bots. If it wants to act like a human then it gets simple rules like a human. Ask the human operator for permission. If the bot starts "doing its own thing, aka going rogue" then it gets punished. Perhaps another bot needs to act as a dominatrix to be a watcher over the assistant bot.
It works pretty well: whichever agent I choose to run can only see and write to the current working directory (and subdirectories), as well as the pnpm/npm etc. software development files. It cannot access anything in my home directory other than the mounted directories.
Now, some evil command could in theory write commands into those shared ~/.npm-global directories that I then inadvertently run outside the container, but that is pretty unlikely.
File system isolation is easy now, it’s not worth HN front page space for the n’th version. It’s a solved problem (and now included in Claude Code).
I’ve found it to be a good balance for letting Claude loose in a VM running the commands it wants while having all my local MCPs and tools still available.
I've been struggling to find what new thing AI has intrinsically solved that gives us the chance to completely change workflows, other than these weird things occurring.
*I played with codex a few months ago, but I don't even work in IT.
It has left my project in a complete mess, but never my entire computer.
git reset --hard && git clean -fd
That's all it takes.
I think this is turning into a good example of security theatrics. If the agent was actually as nefarious as the marketing here suggests, the solution proposed is not adequate. No solution is. Not even a separate physical computer. We need to be honest about the size of this problem.
Alternatively, maybe Claude is unusually violent to the local file system? I've not used it at all, so perhaps I am missing something here.
Ignoring the confidentiality arguments posed here, I can’t help but think about snapshotting filesystems in this context. Wouldn’t something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn’t protect against all the issues the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.
I created https://github.com/jrz/container-shell which basically launches a persistent interactive shell using docker, chrooted to the CWD
CWD is bind mounted so the rest is simply not visible and you can still install anything you want.
{
"sandbox": {
"enabled": true,
"filesystem": {
"allowRead": ["."],
"denyRead": ["~/"],
"allowWrite": ["."],
"denyWrite": ["/"]
}
}
}
You can change the read part if you're ok with it reading outside. This feature was only added 10 days ago fwiw but it's great and pretty much this.
We've been securing our systems in all ways possible for decades and then one day just said: oh hello unpredictable, unreliable, Turing-complete software that can exfiltrate and corrupt data in infinite unknown ways -- here's the keys, go wild.
It looks both more convenient and slightly more secure than my solution, which is that I just give them a separate user.
Agents can nuke the "agent" homedir but cannot read or write mine.
I did put my own user in the agent group, so that I can read and write the agent homedir.
It's a little fiddly though (sometimes the wrong permissions get set, so I have a script that fixes it), and keeping track of which user a terminal is running as is a bit annoying and error prone.
---
But the best solution I found is "just give it a laptop." Completely forget OS and software solutions, and just get a separate machine!
That's more convenient than switching users, and also "physically on another machine" is hard to beat in terms of security :)
It's analogous to the mac mini thing, except that old ThinkPads are pretty cheap. (I got this one for $50!)
> jai itself was hand implemented by a Stanford computer science professor with decades of C++ and Unix/linux experience. (https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-...)
Run <ai tool of your choice> under its own user account via ssh. Bind mount project directories into its home directory when you want it to be able to read them. Mount command looks like
sudo mkdir /home/<ai-user>/<dir-name>
sudo mount --bind <dir to mount> --map-groups $(id -g <user>):$(id -g <ai-user>):1 --map-users $(id -u <user>):$(id -u <ai-user>):1 /home/<ai-user>/<dir-name>
I particularly use this with vscode's ssh remotes.
I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.
Good DX, straightforward permissions system, starts up instantly. Just remember to disable CC’s auto-updater if that’s what you’re using. My sandbox ranking: nono > lima > containers.
So couldn't this be done with an appropriate shell alias - at least under linux.
I want AI to have full and unrestricted access to the OS. I don't want to babysit it and approve every command. Everything that is on that VM is a fair game and the VM image is backed up regularly from outside.
This is the only way.
Please release binaries if you're making a utility :(
More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...
I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.
People might genuinely want some other software to do the sandboxing. Something other than the fox.
It's awe-inspiring the levels of complexity people will re-invent/bolt-on to achieve comparable (if not worse) results.
> Just remember to disable CC’s auto-updater if that’s what you’re using.
Why?
If it wants to do system-level tests, then I make sure my project has Qemu-based tests.
It does something very simple, and it’s a POSIX shell script. Works on Linux and macOS. Uses docker to sandbox using bind mount
P.S. Everything old is new again <3
I'm not saying it is broken for everyone, but please do verify it does work before trusting it, by instructing Claude to attempt to read from somewhere it shouldn't be allowed to.
From my side, I confirmed both bubblewrap and seatbelt to work independently, but through claude-code they don't even though claude-code reports them to be active when debugging.
You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.
"env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },
> Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command.
It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.
[0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...
I think the actual data flow here is really hard to grasp for many users: Sandboxing helps with limiting the blast radius of the agent itself, but the agent itself is, from a data privacy perspective, best visualized as living inside the cloud and remote-operating your computer/sandbox, not as an entity that can be "jailed" and as such "prevented from running off with your data".
The inference provider gets the data the instant the agent looks at it to consider its next steps, even if the next step is to do nothing with it because it contains highly sensitive information.
/rant
David has done some great work and some funny work. Sometimes both.
I guess the "Future of Digital Currency Initiative" had to pivot to a more useful purpose than studying how Bitcoin is going to change the world.
The way I'd do it right now:
* git worktree to have a specific folder with a specific branch to which the agent has access (with the .git in another folder)
* have some proper review before moving the commits there into another branch, committing from outside the sandbox
* run code from this review-protected branch if needed
Ideally, within the sandbox, the agent can go nuts to run tests, do visual inspections e.g. with web dev, maybe run a demo for me to see.
jai's -D flag captures the right data; the missing piece is surfacing it ergonomically. yoloAI uses git for the diff/apply so it already feels natural to a dev.
One thing that's not fully solved yet: your point about .git/hooks and .venv being write vectors even within the project dir. They're filtered from the diff surface but the agent can still write them during the session. A read-only flag for those paths (what you're considering adding to jai) would be a cleaner fix.
I don't think the file sync is actually that hard. Famous last words though. :)
I feel like an integration with bubblewrap, the sandboxing tech behind Flatpak, could be useful here. Have all executed commands wrapped with a BW context to prevent and constrain access.
It works well. Git rm is still allowed.
For example:
Bash(swift build 2>&1 | tail -20)
⎿ warning:
/Users/enduser/Library/org.swift.swiftpm/configuration is not accessible or not writable, disabling user-level cache
features. warning: /Users/enduser/Library/org.swift.swiftpm/security is not accessible or not writable, disabling user-level cache feat
… +26 lines (ctrl+o to expand)
Build hit sandbox restriction. Retrying outside sandbox.
Bash(swift build 2>&1 | tail -20)
⎿ [35/52] Compiling MCP Resources.swift
[36/52] Emitting module MCP
[37/52] Compiling MCP Client.swift
… +17 lines (ctrl+o to expand)
⎿ (timeout 3m)
And nothing big has happened despite all the risks and problems that came up with it. People keep chasing speed and convenience, because most things don’t even last long enough to ever see a problem.
Industry caught on quick though.
Also. Agents are very good at hacking “security penetration testing”, so “separate user” would not give me enough confidence against malicious context.
The web site is... let's say not in a million years what I would have imagined for a little CLI sandboxing tool. I literally laughed out loud when claude pooped it out, but decided to keep, in part ironically but also since I don't know how to design a landing page myself. I should say that I edited content on the docs part of the web site to remove any inaccuracies, so the content should be valid.
> Stop trusting blindly
> One-line installer scripts,
Here are the manual install instructions from the "Install / Build" page:
> curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz | tar xzf -
> cd jai
> makepkg -i
So, trust their jai tool, but not _other_ installer scripts?
My biggest question skimming over the docs is what a workflow for reviewing and applying overlay changes to the out-of-cwd dirs would be.
Also, a bit tangential, but if anyone has slightly more in-depth resources for grasping the security trade-offs between these kinds of Linux-leveraging sandboxes, containers, and remote VMs I'd appreciate it. The author here implies containers are still more secure in principle, and my intuition is that there are simply fewer unknowns from my perspective, but I don't have a firm understanding.
Anyhow, kudos to the author again, looks useful.
https://github.com/pkulak/nix/tree/main/common/jai
Arg, annoying that it puts its config right in my home folder...
EDIT: Actually, I'm having a heck of a time packaging this properly. Disregard for now!
EDIT2: It was a bit more complicated than a single derivation. Had to wrap it in a security wrapper, and patch out some stuff that doesn't work on the 25.11 kernel.
I've been building an independent benchmarking platform for AI agents. The two approaches are complementary. Sandbox the environment, verify the agent.
Especially because everybody can ask chatgpt/claude how to run some agents without any further knowledge I feel we should handle it more like we are handling encryption where the advice is to use established libraries and don't implement those algorithms by yourself.
These are generally (but not always) 2 different sets of people.
Wrong layer. You want the deletion to actually be impossible from a privilege perspective, not be made practically harder to the entity that shouldn't delete something.
Claude definitely knows how to reimplement `rm`.
And when that fails for some reason it will happily write and execute a Python script bypassing all those custom tools
Nowadays I only run Claude in Plan mode, so it doesn’t ask me for permissions any more.
People are already reporting lost files, emptied working trees, and wiped home directories after giving AI tools ordinary machine access.
There's a gap between giving an agent your real account and stopping everything to build a container or VM. jai fills that gap. One command, no images, no Dockerfiles — just a light-weight boundary for the workflows you're already running: quick coding help, one-off local tasks, running installer scripts you didn't write.
Use AI agents without handing over your whole account. jai gives your working directory full access and keeps the rest of your home behind a copy-on-write overlay — or hidden entirely.
One-line installer scripts, AI-generated shell commands, unfamiliar CLIs — stop running them against your real home directory. Drop jai in front and the worst case gets a lot smaller.
No images to build, no Dockerfiles to maintain, no 40-flag bwrap invocations. Just jai your-agent. If containment isn't easier than YOLO mode, nobody will bother.
One command. No setup required.
1
Prefix your command
jai codex, jai claude, or just jai for a shell.
2
CWD stays writable
Your working directory keeps full read/write access inside the jail.
3
Home is an overlay
Changes to your home directory are captured copy-on-write. Originals are untouched.
4
Rest is locked down
/tmp and /var/tmp are private. All other files are read-only.
Pick the level of isolation that fits your workflow.
| | Casual | Strict | Bare |
|---|---|---|---|
| Home directory | Copy-on-write overlay | Empty private home | Empty private home |
| Process runs as | Your user | Unprivileged jai user | Your user |
| Confidentiality | Weak — most files readable | Strong — separate UID | Medium — your UID, but home hidden |
| Integrity | Overlay protects originals | Full isolation | Full isolation |
| NFS home support | Yes | No | Yes |
jai is free software, brought to you by the Stanford Secure Computer Systems research group and the Future of Digital Currency Initiative. The goal is to get people using AI more safely.
jai is not trying to replace containers. It fills a different niche.
Great for reproducible, image-based environments. Heavier to set up for ad-hoc sandboxing of host tools. No overlay-on-home workflow.
Powerful namespace sandbox. Requires explicitly assembling the filesystem view — often turns into a long wrapper script, which is the friction jai removes.
Not a security mechanism. No mount isolation, no PID namespace, no credential separation. Linux documents it as not intended for sandboxing.
jai is a casual sandbox — it reduces the blast radius, but does not eliminate all the ways AI agents can harm you or your system. Casual mode does not protect confidentiality. Even strict mode is not equivalent to a hardened container runtime or VM. When you need strong multi-tenant isolation or defense against a determined adversary, use a proper container or virtual machine. Read the full security model →
Many of the projects I work on follow this pattern (and I’m not able to make bigger changes in them) and sandboxing breaks immediately when I need to docker compose run sometask.sh
I have seen the AI break out of (my admittedly flimsy) guards, like doing simply
safepath/../../stuff or something even more convoluted like symlinks.
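For anyone rolling their own guard, the usual fix for both the `..` and the symlink trick is to resolve the path first and only then compare it against the allowed root. A minimal sketch, with `is_inside` and `demo` being names made up for illustration; it is a check at one moment in time (so still racy), and it assumes a POSIX system where creating symlinks needs no special privilege. It does not replace enforcement at the OS level.

```python
import os
import tempfile


def is_inside(root: str, candidate: str) -> bool:
    """True if candidate, after resolving '..' and symlinks, stays under root."""
    real_root = os.path.realpath(root)
    # os.path.join discards real_root when candidate is absolute, so absolute
    # paths are checked on their own merits rather than silently re-rooted.
    real_path = os.path.realpath(os.path.join(real_root, candidate))
    return os.path.commonpath([real_root, real_path]) == real_root


def demo() -> dict[str, bool]:
    """Build a throwaway tree with a symlink escape and probe a few paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "safepath")
        outside = os.path.join(tmp, "stuff")
        os.mkdir(root)
        os.mkdir(outside)
        # A symlink inside the root that points out of it.
        os.symlink(outside, os.path.join(root, "link"))
        probes = ["src/main.py", "../stuff", "link/secret", "/etc/passwd"]
        return {p: is_inside(root, p) for p in probes}
```

A plain string prefix check on the unresolved path passes both `safepath/../../stuff` and the symlink case, which is likely what a flimsy guard ends up doing.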
That would make it far less useful in general.
> These restrictions are enforced at the OS level (Seatbelt on macOS, bubblewrap on Linux), so they apply to all subprocess commands, including tools like kubectl, terraform, and npm, not just Claude’s file tools.
They look a lot like daemons to me: they're a program that you want hanging around ready to respond, and maybe act autonomously through cron jobs or similar. You want to assign any number of permissions to them, but you don't want them to have access to root or necessarily any of your personal files.
It seems like the permissions model broadly aligns with how we already handle a lot of server software (and potentially malicious people) on unix-based OSes. It is a battle-tested approach that the agent is unlikely to be able to "hack" its way out of. I mean we're not really seeing them go out onto the Internet and research new Linux CVEs.
Have them clone their own repos in their own home directory too, and let them party.
Openclaw almost gets there! It exposes a "gateway" which sure looks like a daemon to me. But then for some reason they want it to live under your user account with all your privileges and in a subfolder of your $HOME.
Are you confident it would still work against sophisticated prompt injection attacks that override your "strongly worded message"?
Strongly worded signs can be great for safety (actual mechanisms preventing undesirable actions from being taken are still much better), but are essentially meaningless for security.
Escaping it is something that does not take too much effort. If you have ptrace, you can escape without privileges.
A number of these supply chain compromises had incredibly high stakes and were seemingly only noticed before paying off by lucky coincidence.
Only a matter of time before this type of access becomes productized.
It's pretty neat, screen sharing app is extremely high quality these days, I can barely notice a diff unless watching video. Almost feels like Firefox containers at OS level.
Have thought that could be a pretty efficient way to have restricted unrestricted convenient AI access. Maybe I'll get around to that one day.
The entire idea of Openclaw (i.e., the core point of what distinguishes it from agents like Claude Code) is to give it access to your personal data, so it can act as your assistant.
If you only need a coding agent, Openclaw is the completely wrong tool. (As a side note, after using it for a few weeks, I'm not convinced it's the right tool for anything, but that's a different story.)
1) more guardrails in place
2) maybe more useful error messages that would help LLMs
3) no friction with needing to get any patches upstreamed
External tool calling should still be an option ofc, but having utilities that are usable just like what's in the training data, but with more security guarantees and more useful output that makes what's going on immediately obvious would be great.
Kinda reminds me of this: https://m.xkcd.com/932/
I'm not a web UI guy either, and I am so, so happy to let an AI create a nice looking one for me. I did so just today, and man it was fast and good. I'll check it for accuracy someday...
I've already shipped this and use it myself every day. I'm the author of yoloAI (https://github.com/kstenerud/yoloai), which is built around exactly this model.
The agent runs inside a Docker container or containerd vm (or seatbelt container or Tart vm on mac), against a full copy of your project directory. When it's done, `yoloai diff` gives you a unified diff of everything it changed. `yoloai apply` lands it. `yoloai reset` throws it away so you can make the agent try again. The copy lives in the sandbox, so your working tree is untouched until you explicitly say so.
The merge step turned out to be straightforward: just use git under the hood. The harder parts were: (a) making it fast enough that the copy doesn't add annoying startup overhead, (b) handling the .pyc/.venv/.git/hooks concern you raised (they're excluded from the diff surface by default), and (c) credential injection so the agent can actually reach its API without you mounting your whole home dir.
Leveraging existing tech is where it's at. Each does one thing and does it well. Network isolation is done via iptables in Docker, for example.
Still early/beta but it's working. Happy to compare notes if you're building something similar.
I fiddled with transferring the saved token from my keychain to the agent user keychain but it was not straightforward.
If someone knows how to get a subscription to Claude to work on another user via command line I’d love to know about it.
Configure Overrides:
1. Allow unsandboxed fallback
2. Strict sandbox mode (current)
Allow unsandboxed fallback: When a command fails due to sandbox restrictions, Claude can retry with dangerouslyDisableSandbox to run outside the sandbox (falling back to
default permissions).
Strict sandbox mode: All bash commands invoked by the model must run in the sandbox unless they are explicitly listed in excludedCommands.
Docker sandboxes use microvms (i.e. hardware level isolation)
Bubblewrap uses the same technology as containers
I am unsure about seatbelt.
The fun part is, there have been a lot of non-misses! Like a lot! A ton of data have been exfiltrated, a lot of attacks, and etc. In the end... it just didn't matter.
Your analogy isn't really apt either. My argument is closer to "given in the past decade+, nothing of worth has been harmed, should we require airbags and seatbelts for everything?". Obviously in some extreme mission critical systems you should be much smarter. But in 99% cases it doesn't matter.
A good example of why is project-local .venv/ directories, which are the default with uv. With Lima, what happens is that macOS package builds get mounted into a Linux system, with potential incompatibility issues. Run uv sync inside the VM and now things are invalid on the macOS side. I wasn't able to find a way to mount the CWD except for certain subdirectories.
Another example is network filtering. Lima (understandably) doesn't offer anything here. You can set up a firewall inside the VM, but there's no guarantee your agent won't find a way to touch those rules. You can set it up outside the VM, but then you're also proxying through a MITM.
So, for the use case of running Claude Code in --dangerously-skip-permissions mode, Lima is more hassle than Nono
Unreliable, unpredictable AI agents (and their parent companies) with system-wide permissions are a new kind of threat IMO.
You need to rewrite all the text and replace it with text YOU would actually write, since I doubt you would write in that style.
But that's also the most damaging actions it could take. Everything on my computer is backed up, but if Claude insults my boss, that would be worse.
curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz | tar xzf - && cd jai && makepkg -i
"Not a security mechanism. No mount isolation, no PID namespace, no credential separation. Linux documents it as not intended for sandboxing."
To your actual point, the people that would take the landing page being written by an LLM negatively tend to be able to evaluate the project on its true merits, while another substantial portion of the demographic for this tool would actually take that (unfortunately, imo) as a positive signal.
Lastly, given the care taken for the docs, it’s pretty likely that any real issues with the language have been caught and changed.
No they don't. The text is very clearly conveying what this project is about. Not everyone needs to cater to weirdos who are obsessed with policing how other people use LLM.
I do my best to keep off site back ups and don't worry about what I can't control.
By now, getting a car without airbags would probably be more costly if possible, and the seatbelt takes 2s every time you're in a car, which is not nothing but is still very little. In comparison, analyzing all the dependencies of a software project, vetting them individually or having less of them can require days of efforts with a huge cost.
We all want as much security as possible until there's an actual cost to be paid, it's a tradeoff like everything else.
You can always narrow down a capability (get a new capability pointing to a subdirectory or file, or remove the writing capability so it is read only) but never make it more broad.
In a system designed for this it will be used for everything, not just file system. You might have capabilities related to network connections, or IPC to other processes, etc. The latter is especially attractive in microkernel based OSes. (Speaking of which, Redox OS seems to be experimenting with this, just saw an article today about that.)
See also https://en.wikipedia.org/wiki/Capability-based_security
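The "narrow but never broaden" rule is easier to see in a toy model. The class below is purely illustrative: `DirCap` is a hypothetical name, not an API of Redox or any real capability OS, and a Python object is not a security boundary at all (real systems enforce this in the kernel). It only shows the shape of the interface, where every derivation yields a capability covering a subset of what its parent covered.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DirCap:
    """Toy directory capability: a root path plus a write bit (illustration only)."""

    root: PurePosixPath
    writable: bool

    def subdir(self, rel: str) -> DirCap:
        """Narrow to a subdirectory; refuse anything that could climb back out."""
        rel_path = PurePosixPath(rel)
        if rel_path.is_absolute() or ".." in rel_path.parts:
            raise PermissionError(f"cannot widen capability via {rel!r}")
        return DirCap(self.root.joinpath(*rel_path.parts), self.writable)

    def read_only(self) -> DirCap:
        """Drop the write bit. There is deliberately no method to add it back."""
        return DirCap(self.root, False)

    def allows(self, path: str, write: bool = False) -> bool:
        """Check an absolute, normalized path against this capability."""
        p = PurePosixPath(path)
        if ".." in p.parts:
            return False
        inside = p == self.root or self.root in p.parents
        return inside and (self.writable or not write)
```

So an agent handed `home.subdir("project").read_only()` can read under the project directory and nothing else, and no sequence of calls on that object gets it back to the home directory or to write access.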
Oh, I'm totally not arguing for cutting off other capabilities, I like tool use and find it to be as useful as the next person!
Just that the shell tools that will see A LOT of usage have additional guardrails added on top of them, because it's inevitable that sooner or later any given LLM will screw up and pipe the wrong thing in the wrong command - since you already hear horror stories about devs whose entire machines get wiped. Not everyone has proper backups (even though they totally should)!
Author would, indeed, be wise to rewrite all the text appearing on the front page with text that he wrote himself.
Admittedly, there’s a little more friction and agent confusion sometimes with this setup, but it’s worth the benefit of having zero worries about permissions and security.
Yes, I'm saying it's pretty much as bad as antivirus software.
> Are you sure that you haven't made some mistake in your dev box setup that would allow a hacker to compromise it?
Different category of error: Heuristically derived deterministic protection vs. protection based on a stochastic process.
> much more annoying to recover from that an accidental rm rf.
My point is that it's a different category, not that one is on average worse than the other. You don't want your security to just stand against the median attacker.
the scs.stanford.edu domain and stanford-scs github should help with that.