Node.js needs a virtual file system

Setting aside the question of whether this would be a useful addition to Node.js core, it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter, which in my opinion is against the spirit of the project and directly violates the terms of the Developer's Certificate of Origin set out in the project's CONTRIBUTING.md.

I'm not convinced that allowing Node to import "code generated at runtime" is actually a good thing. I think it should have to go through the hoops to get loaded, for security reasons.

I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.

The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.

Most of the 4 justifications mentioned sound like mitigations of otherwise bad design decisions. JavaScript in the browser went down this path for the longest time where new standards were introduced only to solve for stupid people instead of actually introducing new capabilities that were otherwise unachievable.

I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.

As an aside I think JavaScript would benefit from an in-memory database. This would be more of language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third party databases for stuff that you don't want to keep in offline storage on a disk.

Would be nice if node packages could be packed up in ZIP files, so as to avoid the security/metadata tax for small file access on Windows.

How does electron do this with its packaged files? I suppose it does not work with module resolution?

    You can’t import or require() a module
    that only exists in memory.

You can convert it into a data url and import that, can't you?
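A minimal sketch of the data URL approach, assuming a recent Node.js where dynamic `import()` accepts `data:` URLs. The module source and its exports (`answer`, `greet`) are made up for illustration.

```javascript
// Sketch: load a module that exists only as a string, via a data: URL.
// The source text and its exports are hypothetical.
const source = `
  export const answer = 6 * 7;
  export function greet(name) { return 'hello ' + name; }
`;

// Base64 avoids having to percent-encode the source text.
const url = 'data:text/javascript;base64,' + Buffer.from(source).toString('base64');

// Dynamic import() accepts the data: URL and resolves to the module namespace.
const loaded = import(url);

loaded.then((mod) => {
  console.log(mod.answer, mod.greet('vfs'));
});
```

This only covers a single self-contained module; nothing is written to disk, but nothing else can import the module by path either.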

Yarn, pnpm, webpack all have solutions for this. Great to see this becoming a standard. I have a project that is severely handicapped by the FS. Running 13k tests takes 40 minutes, where a virtual file system that Node just worked with would cut the run time to 3 minutes. I experimented with some hacks and decided to stay with the slow but native FS solution.

What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like

     node --use-vfs --vfs-cache=BIG_JSON_FILE

So basically Node never touches the disk and loads everything from memory.
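A rough userland sketch of the "load everything from a big JSON file" idea. The flags above are the commenter's wish, not real Node options, and the snapshot shape (`{ "<path>": "<contents>" }`) and the `createSnapshotReader` helper are assumptions for illustration. It does not hook into Node's `fs` or module loader, which is the part that would need core support.

```javascript
// Sketch: serve reads from an in-memory snapshot instead of the disk.
// The snapshot format and helper name are hypothetical.
function createSnapshotReader(snapshotJson) {
  const files = new Map(Object.entries(JSON.parse(snapshotJson)));
  return {
    existsSync: (path) => files.has(path),
    readFileSync(path) {
      if (!files.has(path)) {
        // Mimic the error code that fs.readFileSync uses for a missing file.
        const err = new Error(`ENOENT: no such file or directory, open '${path}'`);
        err.code = 'ENOENT';
        throw err;
      }
      return files.get(path);
    },
  };
}

// In practice the JSON would be read from disk once at startup.
const snapshot = JSON.stringify({ '/app/config.json': '{"port":3000}' });
const vfs = createSnapshotReader(snapshot);
console.log(JSON.parse(vfs.readFileSync('/app/config.json')).port);
```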

yarn pnp is currently broken on Node v25.7+;

- https://github.com/yarnpkg/berry/issues/7065

- https://github.com/nodejs/node/issues/62012

This is because yarn patches fs in order to introduce virtual file path resolution of modules in the yarn cache (which are zips), which is quite brittle and was broken by a seemingly unrelated change in 25.7.

The discussion in issue 62012 is notable - it was suggested yarn just wait for vfs to land. This is interesting to me in two ways: firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken, and suggests relying on what I'm assuming will be an experimental API when it does land; secondly, it implies a lot of confidence that this feature will land before LTS.

I could see something like this being useful if it could be passed to workers to replace any fs access inside the worker.

I'm not convinced this needs to be in core Node, but being able to have serverless functions access a file system without providing storage would definitely have some use cases. Had some fun with video processing recently that this would be perfect for.

the multi-tenant sandboxing use case is interesting but mount() being process-global feels like a footgun. has anyone explored per-isolate or per-worker VFS scoping? seems like the kind of thing that would matter a lot for platform companies running untrusted code.

>Let me be honest: a PR that size would normally take months of full-time work. This one happened because I built it with Claude Code.

The node.js codebase and standard library have a very high standard of quality, hope that doesn't get washed out by sloppy AI-generated code.

OTOH, Matteo is an excellent engineer and the community owes a lot to him. So I guess the code is solid :).

Is node::vfs the new solution for JupyterLite filesystems?

From https://github.com/jupyterlite/jupyterlite/issues/949#issuec... :

> Ideally, the virtual filesystem of JupyterLite would be shared with the one from the virtual terminal.

emscripten-core/emscripten > "New File System Implementation": https://github.com/emscripten-core/emscripten/issues/15041#i... :

> [ BrowserFS, isomorphic-git/lightningfs, ]

pyodide/pyodide: "Native file system API" #738: https://github.com/pyodide/pyodide/issues/738 re: [Chrome,] Filesystem API :

> jupyterlab-git [should work with the same VFS as Jupyter kernels and Terminals]

pyodide/pyodide: "ENH Add API for mounting native file system" #2987: https://github.com/pyodide/pyodide/pull/2987

Are people still building new projects on Node.js? I would have thought the ecosystem was moving to deno or bun now

The Node team has lost the plot IMO.

By far the most critical issue is the over reliance on third party NPM packages for even fundamental needs like connecting to a database.

Worth noting that mcollina is a member of the Node.js Technical Steering Committee

Large PRs could follow the practices that the Linux kernel dev lists follow. Sometimes large subsystem changes could be carried separately for a while by the submitter for testing and maintenance before being accepted in theory, reviewed, and if ready, then merged.

While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.

With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how would they want to systematically review it in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.

Do as I say, not as I do.

On a more serious note, I think that this will be thoroughly reviewed before it gets merged and Node has an entire security team that oversees these.

How exactly does it violate the Developer's Certificate of Origin clause?

I'm not convinced that allowing Node to import "code generated at runtime" is actually a good thing. I think it should have to go through the hoops to get loaded, for security reasons.

I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.

    node -e "new Function('console.log(\"hi\")')()"

or more to the point

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c + 'console.log(exports.say({ text: \"like this\"}))')())"

that one is particularly bad, because umd messes with the global object - so this works

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c)()).then(() => console.log(exports.say({ text: 'oh no'})))"

But then you go "hang on, doesn't ESM exist?" and you realize that argument 4 isn't even true. You can literally do what this argument says you can't, by creating a blob instead of "writing a temp file" and then importing that using the same dynamic import we've had available since <checks his watch> 2020.

I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.

Why would you want a language enhancement for that, rather than just writing it in JS code? (or perhaps WASM)
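For a sense of scale, here is a rough sketch of what "just writing it in JS" might look like: an in-memory record store queried with ordinary predicates. `MemoryStore` and its methods are hypothetical names, not an existing API, and indexing or persistence are left out.

```javascript
// Sketch: an in-memory record store queried with plain JS predicates.
// `MemoryStore`, `insert`, and `query` are made-up names for illustration.
class MemoryStore {
  #records = [];

  // Stores a shallow copy so later changes to the caller's object don't leak in.
  insert(record) {
    this.#records.push({ ...record });
    return this;
  }

  // Returns copies of every record for which the predicate returns true.
  query(predicate) {
    return this.#records.filter(predicate).map((r) => ({ ...r }));
  }
}

const users = new MemoryStore()
  .insert({ name: 'ada', age: 36 })
  .insert({ name: 'linus', age: 19 });

const adults = users.query((u) => u.age >= 21);
console.log(adults);
```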

Would be nice if node packages could be packed up in ZIP files, so as to avoid the security/metadata tax for small file access on Windows.

The number of files in the node modules folder is crazy, any amount of organization that can tame that chaos is welcomed.

There are alternative package managers like Yarn that use zip files as a way to store each Node package.[0]

[0] https://yarnpkg.com/advanced/pnp-spec#zip-access

Would accessing deps directly from a zip really be faster? I'd be a little surprised but not terribly, given that it's readonly on an fs designed for RW. If not, maybe just tar?

I remember when Firefox started putting everything into jars for similar reasons.

https://web.archive.org/web/20161003115800/https://blog.mozi...

It’s insane to me that node works how it does. Zip files make so much more sense, I really liked that about Yarn.

Would it work to run a bundler over your code, so all (static) imports are inlined and tree shaken?

    You can’t import or require() a module
    that only exists in memory.

You can convert it into a data url and import that, can't you?

What happens to relative imports?

Yeah but Claude didn't suggest that when it wrote this blog post and did all the work so...

What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like

     node --use-vfs --vfs-cache=BIG_JSON_FILE

So basically Node never touches the disk and loads everything from memory.

The way to do this today is to do it outside of node. Using an overlay fs with the overlay being a ramfs. You can even chroot into it if you can't scope the paths you need to be just downstream from some directory. Or, just use docker.

yarn pnp is currently broken on Node v25.7+;

- https://github.com/yarnpkg/berry/issues/7065

- https://github.com/nodejs/node/issues/62012

Strong rec to choose PNPM over yarn. I just posted this in a peer comment: https://news.ycombinator.com/item?id=47415173

Not spamming, not affiliated, just trying to help others avoid so much needless suffering.

The Node team has lost the plot IMO.

By far the most critical issue is the over reliance on third party NPM packages for even fundamental needs like connecting to a database.

What would a Node-native database connection layer look like? What other platforms have that?

Databases are third party tech, I don’t think it’s unreasonable to use a third party NPM module to connect to them.

Outside of sqlite, what runtimes natively include database drivers?

Are people still building new projects on Node.js? I would have thought the ecosystem was moving to deno or bun now

I don't really understand what the value proposition of Bun and Deno is. And I see huge problems with their governance and long-term sustainability.

Node.js on the other hand is not owned or controlled by one entity. It is not beholden to the whims of investors or a large corporation. I have contributed to Node.js in the past and I was really impressed by its rock-solid governance model and processes. I think this is an under-appreciated feature when evaluating tech options.

loud people on twitter are always switching to the new hotness. i personally can't see myself using bun until its reputation for segfaults goes away after a few more years of stabilizing. deno seems neat and has been around for longer, but its node compatibility story is still evolving; i'm also giving it another year before i try it.

Yes people are using Node.js, most likely the majority.

The delusion in this comment is insane.

Do as I say, not as I do.

On a more serious note, I think that this will be thoroughly reviewed before it gets merged and Node has an entire security team that oversees these.

As someone who was a part of the aforementioned security team, I'm not sure I'd be interested in reviewing such a volume of machine-generated code, expecting a trap at every corner. The implicit assumption that I observed in many OSS projects I've been involved with is that first-time contributions are rarely accepted if they are too large in volume, and the "core contributor" designation exists to signal "I put effort into this code, stand by it, and respect everyone's time in reviewing it". The PR in the post violates this social contract.

While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.

> With AI blowing up the line counts on PRs,

Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.

How exactly does it violate the Developer's Certificate of Origin clause?

The submitted code must adhere to one of (a), (b), (c), and separately to clause (d) of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...

If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), the code was taken from another place with clear license terms compatible with the project's license. If (c), the contribution was written by someone else who asserted (a) or (b) and is submitted without changes.

Since LLM-generated output is based on public code but lacks attribution and the license of the original, it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter's disclaimer in the PR body.

There are alternative package managers like Yarn that use zip files as a way to store each Node package.[0]

[0] https://yarnpkg.com/advanced/pnp-spec#zip-access

Strong recommendation to use PNPM instead of yarn or npm. IME (webdev since 1998) it's the only sane tool for stewardship of an npm dependency graph.

See https://pnpm.io/motivation

Also, while popularity isn't necessarily a great indicator of quality, a quick comparison shows that the community has decided on pnpm:

https://www.npmcharts.com/compare/pnpm,yarn,npm

... and of course JAR files in Java are just ZIP files with a little extra metadata and the JVM can unpack them in realtime just fine.

A virtual filesystem makes it possible for the ESM you import to statically import other files in the virtual filesystem, which isn't possible by just dynamically importing a blob. Anything your blob module imports has to be updated to dynamically import its dependencies via blobs.
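The limitation can be sketched in Node with a `data:` URL, used here as a stand-in for the blob case (an assumption, since the comment talks about blobs). The `./helper.js` specifier is hypothetical: there is no hierarchical base URL to resolve it against, so the import is expected to reject.

```javascript
// Sketch: a module loaded from a data: URL cannot use relative specifiers,
// because "./helper.js" has nothing to be resolved against.
const entry = 'import { helper } from "./helper.js"; export default helper;';
const entryUrl = 'data:text/javascript,' + encodeURIComponent(entry);

const outcome = import(entryUrl).then(
  () => 'resolved',
  (err) => {
    console.log('import failed:', err.message);
    return 'rejected';
  },
);
```

With a virtual filesystem, both files would live at real-looking paths, so the static import would resolve the usual way.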

There's also a module expression proposal, that would remove the need to use blob imports.

https://github.com/tc39/proposal-module-expressions

The number of files in the node modules folder is crazy, any amount of organization that can tame that chaos is welcomed.

And if you thought malware hiding in a mess of files was bad, just wait till you see it in two layers of container files.

Outside of sqlite, what runtimes natively include database drivers?

Bun, .NET, PHP, Java

What would a Node-native database connection layer look like? What other platforms have that?

Databases are third party tech, I don’t think it’s unreasonable to use a third party NPM module to connect to them.

Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.

Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:

https://www.graalvm.org/latest/security-guide/sandboxing/

N.B. there's a NodeJS compatible mode, but you can't use VFS+sandboxing and NodeJS compatibility together because the NodeJS mode actually uses the real NodeJS codebase, just swapping out V8. For combining it all together you'd want something like https://elide.dev which reimplemented some of the Node APIs on top of the JVM, so it's sandboxable and virtualizable.

Bun provides native MySQL, SQLite, and Postgres drivers.

I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.

When using Postgres in Node you either rely on the old pg which pulls 13 dependencies[1] or postgres[2] which is much better and has zero deps but mostly depends on a single guy.

[1] https://npmgraph.js.org/?q=pg

[2] https://github.com/porsager/postgres

Perl has DBI. PHP has PDO.

Wow, I thought you were exaggerating, but no: https://github.com/oven-sh/bun/issues?q=is%3Aissue%20state%3...

Open 80, closed 492.

I don't really understand what the value proposition of Bun and Deno is. And I see huge problems with their governance and long-term sustainability.

Note that Bun was recently acquired by Anthropic.

Faster, no transpilation, dev-ex sugar.

Deno has some pretty nice unique features like sandboxing that, afaik, don't exist in other runtimes (yet). It's enough of a draw that it's the recommended runtime for projects like yt-dlp: https://github.com/yt-dlp/yt-dlp/issues/14404

If one gets nothing from them directly, they've at least been a good kick to get several features into Node. It's almost like neovim was to vim, perhaps to a lesser extent.

I agree about the governance and long-term sustainability points, but if you don't see any value in Bun or Deno it is probably because (no offense) you are not paying attention.

The submitted code must adhere to one of (a), (b), (c), and separately to clause (d) of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...

It would be considered (a) since the author would own the copyright on the code.

> With AI blowing up the line counts on PRs,

Go slow to go fast. Breaking up the PR this way also allows later humans and AI alike to understand the codebase. Slowing down the PR process with standards lets the project move faster overall.

If some bug slips by review, having the PR broken down semantically allows quicker analysis and recovery later, for one. Even if you have AI reviewing new Node.js releases to decide whether to take in the new version, the commit log will be more analyzable by the AI with semantic commits.

Treating the code as throwaway is valid in a few small contexts, but that is not the case for PRs going into maintained projects like Node.js.

And if you thought malware hiding in a mess of files was bad, just wait till you see it in two layers of container files.

Or worse yet, the performance load of anti-malware software that has to look inside ZIP files.

Look, most of us realized around 2004 or so that if you had a choice between Norton and the virus you would pick the virus. In the Windows world we standardized around Defender because there is some bound on how much Defender degrades the performance of your machine, which was not the case with competing antivirus software.

I've done a few projects which involved getting into container file formats like ZIP and PDF (e.g. you know it's a graph of resources in which some of those resources are containers that contain more resources, right?). Now that I think of it, you ought to be able to virus scan ZIP files quickly and intelligently, but the whole problem with the antivirus industry is that nobody ever considers the cost.

making that work cross platform is pure pain

I just use npm because I like to stay as vanilla as possible.

Bun provides native MySQL, SQLite, and Postgres drivers.

I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.

When using Postgres in Node you either rely on the old pg which pulls 13 dependencies[1] or postgres[2] which is much better and has zero deps but mostly depends on a single guy.

[1] https://npmgraph.js.org/?q=pg

[2] https://github.com/porsager/postgres

Node has sqlite, though I have not had any issues using better-sqlite3 and worker processes for long running ops

Bun, .NET, PHP, Java

For Bun you're thinking of simple key / values, hardly a database. They also have a SQLite driver which is still just a package.

Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.

Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:

https://www.graalvm.org/latest/security-guide/sandboxing/

> Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.

So it's an external dependency that is not part of Java. It doesn't really matter if the code comes from the vendor or not. Especially for OpenSource databases.

Node has sandboxing these days: https://nodejs.org/api/permissions.html

It would be considered (a) since the author would own the copyright on the code.

Citation needed.

Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.

Even if it does, it hasn't been settled yet what the impact of your AI having been trained on copyrighted material is on its output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.

Fact is, the matter isn't settled yet, which means any open-source project should assume the worst possible outcome - which in practice means a massive AI-generated PR like this should be treated like a nuke which could go off at any moment.

Yes and no. Waiting 40 mins for every test run is pure pain; platform-specific ramfs-type mounting is quite scriptable. Yes, some devs might need to install a dependency, but it's not a complex script.

Citation needed.

Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.

The two main points are that:

1. Copyright cannot be assigned to an AI agent.

2. Copyrighted works require human creativity to be applied in order to be copyrighted.

For point 2, this would apply to times where AI one-shots a generic prompt. But for these large PRs, where multiple prompts are used and a human has decided what the design should be and how the API should look, you get the human creativity required for copyright.

In regards to being a derivative work I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it came up with an exact duplicate of a piece of code it would be hard to prove that it was a copy and not an independent recreation from scratch.

>the worst possible outcome

The worst possible outcome is they get sued and Anthropic defends them from the copyright infringement claim due to Anthropic's indemnity clause when using Claude Code.

Node has sandboxing these days: https://nodejs.org/api/permissions.html

No it doesn't, unfortunately.

> The permission model implements a "seat belt" approach, which prevents trusted code from unintentionally changing files or using resources that access has not explicitly been granted to. It does not provide security guarantees in the presence of malicious code. Malicious code can bypass the permission model and execute arbitrary code without the restrictions imposed by the permission model.

Deno's permissions model is actually a very nice feature. But it is not very granular so I think you end up just allowing everything a lot of the time. I also think sandboxing is a responsibility of the OS. And lastly, a lot of use cases do not really benefit from it (e.g. server applications).