You can just set up a separate child process for that. The main event loop which handles connections should just coordinate and delegate work to other programs and processes. It can await their completion asynchronously; that way the event loop is not blocked.
I recall people have been able to get up to around a million (idle) WebSocket connections handled by a single process.
I was able to comfortably get 20k concurrent sockets per process each churning out 1 outbound message every 3 to 5 seconds (randomized to spread out the load).
It is a good thing that Node.js forces developers to think about this, because most other engines that try to hide this complexity tend to impose a significant hidden cost on the server in the form of context switching. With Node.js there is no such cost: your process can basically have a whole CPU core to itself, and it can orchestrate other processes in a maximally efficient way if you write your code correctly, which Node.js makes very easy to do. Spawning child processes and communicating with them in Node.js is a breeze.
1. https://docs.platformatic.dev/docs/overview/architecture-ove...
Almost none of them treat these consistently (if they consider these at all) and all require you to work around them in strange ways.
It feels like there is a lot they could help with in the web world, especially in complex UIs and moving computation off the main thread, but they are just so clunky to use that almost nobody bothers.
The ironic part is that if bundlers, transpilers, compilers, etc. weren't used at all, they would probably see much more widespread use.
e.g., this certainly helps when the event loop is blocked, but so could FFI calls to another language for the CPU bound work. I’d only reach for a new Node thread if these didn’t pan out, because there’s usually a LOT that goes into spinning up a new node process in a container (isolating the data, making sure any bundlers and transpilers are working, making sure the worker doesn’t pull in all the app code, etc.).
Sidecar processes aren’t free, either. Now your processes are contending for the same pool of resources and can’t share anything, which IME means more likelihood of memory issues, especially if there isn’t anything limiting the workers your app can spawn.
Still, good article! Love seeing the ways people tackle CPU bound work loads in an otherwise I/O bound Node app.
And you can make it thread-like if you prefer by creating a “load balancer” setup to begin with to keep them CPU bound.
require('os').cpus().length
Spawn a process for each CPU, bind the data you need, and it can feel like multithreading from your perspective. More here: https://github.com/bennyschmidt/simple-node-multiprocess
Worker threads can be more convenient than FFI, as you don't need to compile anything, you can reuse the main application's functions, etc.
Most other ways are just hiding the context switching costs and complicating monitoring IMO.
https://github.com/tc39/proposal-module-declarations
Unfortunately the JS standards folks have refused so far to make this situation better.
Ex. it should just be `new Worker(module { ... })`.
Having a separate isolate in each thread spawned via worker threads, with a minimal footprint of ~10MB, does not seem like a high price to pay. It's not like you're going to spawn hundreds of them anyway, is it? You will very likely spawn no more threads than your CPU cores can run concurrently. You typically don't run hundreds of OS threads; you use a thread pool and cap the concurrency by setting a maximum number of threads to spawn.
This is also how goroutines work under the hood: they are "green threads", an abstraction that operates on top of a much smaller OS thread pool.
Worker threads have constraints but most of them are intentional, and in many cases desirable.
I’d also add that SharedArrayBuffer doesn’t limit you to “shared counters or coordination primitives”. It’s just raw memory; you can store structured data in it using your own memory layout. There are libraries out there that implement higher-level data structures this way already.
[1] https://docs.google.com/presentation/d/1inTcnb4hugyAvKrjFX_X...
Move backpressure handling onto the task producer and use a SharedArrayBuffer between the producer and worker, where the worker atomically updates a work-count or current work item ID in that SharedArrayBuffer that the producer can read (atomically) to determine how far along the worker has gotten.
They’re heavy, they don’t share the entire process memory space (ie can’t reference functions), and I believe their imports are separate from each other (ie reparsed for each worker into its own memory space).
In many ways they’re closer to subprocesses in other languages, with limited shared memory.
It’s not “clean” to spin up thousands of threads, but it does work, and sometimes it’s easier to write and reason about than a whole pipeline for distributing work to worker threads. I probably wouldn’t do it in a server, but in a CLI I would totally do something like spawn a thread for each file a user wants to analyze and let the OS do task scheduling for me. If they give me a thousand files, they get a thousand threads. That overhead is pretty minimal with OS threads (on Linux; Windows is a different beast).
A worker thread or Web Worker runs in its own isolate, so it needs to initialise it by parsing and executing its entry point. I'm not quite sure whether that's something that already happens but you could imagine optimising this by caching or snapshotting the initial state of an isolate when multiple workers use the same entry point, so new workers can start faster.
That cannot be done with the original main thread isolate because usually the worker environment has both different capabilities than the main isolate and a different entry point.
If I have to handle 1000 files in a small CLI I would probably just use Node.js asynchronous IO in a single thread and let it handle platform specifics for me! You’ll get very good throughput without having to handle threads yourself.
However, apart from atomics over "plain" memory, no objects are directly shared (for Node/V8 they live in so-called Isolates, IIRC), so from a logical standpoint they're kinda like a process.
The underlying reason is that in JavaScript, objects are by default open to modification, i.e.:
const t = {x:1,y:2};
t.z = 3;
console.log(t); // => { x: 1, y: 2, z: 3 }
To get sane performance out of JS, there are a ton of tricks the runtime does under the hood; the bad news is that those are all either slow (think Python GIL) or heavily exploitable in a multithreaded scenario. If you've done multithreaded C/C++ work and touched upon Erlang, the JS Worker design is the logical conclusion: message passing works for small packets (work orders, structured cloning), whilst shipping large data can be problematic with cloning.
This is why SharedArrayBuffers allow no-copy sharing: the plain memory arrays they expose don't offer any security surprises in terms of code execution (Spectre-style attacks are another story), and they also allow for work subdivision if needed.
In terms of tradeoffs, if you’re coming from the single event loop model, they’re pretty consistent with the rest of JS. Isolation-first, explicit sharing, fewer footguns. So I think the tradeoffs are the right tradeoffs.
FWIW, traditional threads have their own tradeoffs (especially around IO). In JS that’s mostly a non-issue, so the "I need 1000s of threads" case just doesn’t come up very often.
Node.js runs on a single thread. That's usually fine. The event loop handles I/O concurrency without you thinking about locks, races, or deadlocks. But "single-threaded" has a cost that only shows up under pressure: if your JavaScript monopolizes the CPU, nothing else runs. No timers fire. No network callbacks execute. No I/O completes.
We ran into this with Inngest Connect, a persistent WebSocket connection between your app and the Inngest server. Connect is an alternative to our HTTP-based model (our serve function) that reduces TCP handshake overhead and avoids long-lived HTTP requests during long-running steps. Workers send heartbeats over the WebSocket so the server knows they're alive.
But getting there taught us a few things about how worker threads actually work in Node.js, and how they compare to threading models in other languages.
Connect's worker thread isolation is a new feature in v4 of the Inngest TypeScript SDK. Upgrade to get it automatically.
This post focuses on Node.js, but Bun and Deno also support worker threads.
Node's event loop processes callbacks in phases: timers, I/O polling, setImmediate, close callbacks. Between each phase, it checks for microtasks (resolved promises, queueMicrotask). The critical property: the loop can only advance when the current JavaScript execution yields.
A synchronous function that runs for 30 seconds blocks everything for 30 seconds. That includes setTimeout callbacks, incoming network data, and any other scheduled work. The timers don't fire at all until the thread is free.
Consider a heartbeat scheduled with setInterval every 10 seconds. The callback is queued, ready to fire. Then a CPU-heavy function starts and runs for 30 seconds straight. The heartbeat callback sits in the timer queue the entire time, and multiple intervals pass without a single one going out. By the time the function returns, the server has already timed out and marked the worker as dead.
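The effect is easy to reproduce. A sketch with a 10 ms interval standing in for the heartbeat and a one-second busy-wait standing in for the CPU-heavy function:

```javascript
let beats = 0;
const timer = setInterval(() => beats++, 10);

// Synchronous CPU-bound work: the event loop cannot advance while this runs.
const start = Date.now();
while (Date.now() - start < 1000) {}

// Roughly 100 ticks should have fired by now, but the missed intervals
// coalesce: only a handful of callbacks ever run.
setTimeout(() => {
  clearInterval(timer);
  console.log(`heartbeats fired: ${beats}`);
}, 50);
```

Note that when the loop finally unblocks, setInterval fires once for the whole missed stretch rather than replaying every overdue tick.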
The standard advice is "don't block the event loop." But sometimes you're running user code and you don't control what it does. You need the critical path (heartbeats, connection management) isolated from user workloads. That's where worker threads come in.
The worker_threads module lets you spin up additional JavaScript execution contexts within the same process. Each worker gets its own V8 isolate, its own heap, and its own event loop. Critically, one worker's CPU-bound code does not block another worker's event loop.
Let's look at the basic API. You'll notice that the two threads communicate by sending messages to each other.
main.js
```javascript
import { Worker } from "node:worker_threads";

// Spawns a new thread running worker.js
const worker = new Worker("./worker.js", {
  workerData: { greeting: "hello" }, // Initial data, cloned into the worker
});

// Receive messages from the worker
worker.on("message", (msg) => {
  console.log("from worker:", msg);
});

// Send a message to the worker
worker.postMessage({ type: "ping" });
```
worker.js
```javascript
import { parentPort, workerData } from "node:worker_threads";

// workerData is the cloned initial payload
console.log(workerData.greeting); // "hello"

// parentPort is the channel back to the main thread
parentPort.on("message", (msg) => {
  if (msg.type === "ping") {
    parentPort.postMessage({ type: "pong" });
  }
});
```
Two independent event loops, each able to run JavaScript without blocking the other.
Worker threads solve the isolation problem, but they come with constraints that feel jarring if you've used concurrency primitives in other languages.
In Go, you spawn a goroutine with a function:
```go
go func() {
    fmt.Println("hello from goroutine")
}()
```
In Rust, you spawn a thread with a closure:
```rust
std::thread::spawn(|| {
    println!("hello from thread");
});
```
In Python, you pass a function to a new thread:
```python
threading.Thread(target=lambda: print("hello from thread")).start()
```
In each of these languages, you hand arbitrary logic directly to the concurrency primitive. The function (or closure, or lambda) carries its logic and captured state together.
A caveat on Python: the GIL (Global Interpreter Lock) prevents threads from executing Python bytecode in parallel, so threads won't speed up CPU-bound work. They're still useful for protecting I/O from being blocked by other threads, which is similar to our heartbeat problem. Python 3.13 introduced an experimental free-threaded mode that removes the GIL, so this limitation is on its way out.
Node.js worker threads don't work this way. You can't pass a function to new Worker(). The structured clone algorithm, which serializes data between threads, can't serialize functions. Instead, you point the worker at a file:
```javascript
const worker = new Worker("./my-worker.js");

// There is no equivalent of this:
// const worker = new Worker(() => { doStuff() });
```
Think of the structured clone algorithm as JSON.stringify/JSON.parse with broader type support (Map, Set, Date, ArrayBuffer, circular references). The key similarity: neither can serialize functions.
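You can see both halves of that with the global structuredClone (available since Node 17), which runs the same algorithm:

```javascript
// Richer types than JSON, including circular references:
const obj = { when: new Date(), tags: new Set(["a", "b"]) };
obj.self = obj;
const copy = structuredClone(obj);
console.log(copy.self === copy); // true
console.log(copy.tags.has("a")); // true

// But functions are not cloneable:
try {
  structuredClone({ fn: () => {} });
} catch (err) {
  console.log(err.name); // "DataCloneError"
}
```

The same DataCloneError is what you'd hit trying to postMessage a function to a worker.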
This means every worker thread is an independent program with its own entry point, imports, and initialization. You design the communication protocol up front and exchange serialized messages, which makes the experience closer to writing a microservice than spawning a concurrent task.
In Go, goroutines share the same address space, and in Python, threads share the same heap. In both languages, passing data between concurrent tasks is cheap because it stays in memory.
Node.js workers are isolated by default. They communicate via postMessage and event listeners. Data is serialized using the structured clone algorithm, meaning most JavaScript values (objects, arrays, typed arrays, Map, Set) are deep-copied between threads. Every message is serialized on one side and deserialized on the other. For small messages this is negligible, but large payloads (big JSON blobs, deeply nested objects) pay a real cost in both CPU time and memory, since the data exists in both heaps simultaneously.
If you need shared state instead of message passing, SharedArrayBuffer lets threads share raw memory, and Atomics provides thread-safe operations on it. This avoids serialization entirely, but you're limited to typed arrays of numbers. It's well suited for shared counters, flags, or coordination primitives, but not for passing structured messages like the ones Connect exchanges between threads.
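As a sketch of the coordination side (the one-slot flag layout here is invented): a worker can flip a shared flag and wake the main thread, with no serialization involved.

```javascript
import { Worker } from "node:worker_threads";

const sab = new SharedArrayBuffer(4);
const flag = new Int32Array(sab); // flag[0]: 0 = pending, 1 = done

// Worker inlined via eval for brevity: publish the flag, then wake any waiter.
new Worker(
  `
  const { workerData } = require("node:worker_threads");
  const flag = new Int32Array(workerData);
  Atomics.store(flag, 0, 1);
  Atomics.notify(flag, 0);
  `,
  { eval: true, workerData: sab }
);

// Block (up to 2 s) until flag[0] leaves 0. "not-equal" means the worker got
// there before we started waiting; either way, no data was copied.
const outcome = Atomics.wait(flag, 0, 0, 2000);
console.log(outcome, Atomics.load(flag, 0));
```

Atomics.wait blocks the calling thread (Node allows it even on the main thread), so in a real server you'd more likely use Atomics.waitAsync or plain messages for this kind of signaling.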
This one is subtle and annoying. Bundlers (webpack, esbuild, Rollup) perform static analysis to discover imports and produce optimized bundles. When they encounter a standard import or require, they follow the dependency graph automatically. But new Worker("./worker.js") isn't an import. It's a string argument to a constructor.
Modern bundlers recognize a specific pattern:
```javascript
new Worker(new URL("./worker.js", import.meta.url));
```
If you use this exact syntactic form with a string literal, webpack 5+ will detect it, resolve the file, and emit it as a separate bundle. But any indirection breaks the detection. None of the following examples are detected by webpack:
```javascript
// A variable path breaks the static analysis:
const path = "./worker.js";
new Worker(new URL(path, import.meta.url));

// So does a template literal:
new Worker(new URL(`./workers/${name}.js`, import.meta.url));

// And so does constructing the URL outside the constructor call:
const url = new URL("./worker.js", import.meta.url);
new Worker(url);
```
TypeScript adds another layer. The file extension you reference in the new URL() call depends on your toolchain: webpack resolves .ts through its loader pipeline, plain tsc expects .js (because the URL resolves at runtime against compiled output), and esbuild doesn't auto-detect workers at all. Each requires a different approach.
If you're building a library that uses worker threads internally, this gets worse. Your library's worker file needs to survive your consumer's bundler, which you don't control. The file must be explicitly included in the build output as a separate entry point, typically through a build script or bundler plugin.
For our TypeScript SDK, we added the worker file as an explicit entry point in our tsdown config so it gets compiled and included in the package output. We also had to make the file extension dynamic (.js or .ts) based on the file type of the caller, since consumers using ts-node or tsx run against .ts files directly while compiled environments expect .js.
Each worker thread is a full V8 isolate with its own heap and event loop. That means roughly 10 MB of memory overhead per worker and a startup cost in the tens of milliseconds. Goroutines start at a few KB, and a Go program can comfortably run thousands of them. OS-level threads in Rust, C, or Python are an order of magnitude cheaper than a V8 isolate. You won't be running a pool of hundreds of Node.js workers the way you might with goroutines or threads.
This makes worker threads best suited for long-lived workers that justify the overhead, not short-lived tasks you spin up and tear down frequently.
Back to the original problem. Connect maintains a persistent WebSocket to the Inngest server. The server pushes function invocations over this connection, and the SDK executes them. Heartbeats flow back to confirm the worker is alive. Any user running CPU-heavy functions with Connect could hit the starvation problem, since a single long computation would block heartbeats and cause the server to drop the connection.
The architecture before worker threads looked like this:
Main Thread
  User code execution
  Connect internals
    ‣ WebSocket conn
    ‣ Heartbeats
    ‣ Reconnection
    ‣ Auth handshakes
Everything shared one event loop, so when a user's function did something CPU-intensive (heavy computation, data transformation, image processing) the heartbeat timer couldn't fire and the server timed out.
After worker threads:
Main Thread
  User code execution
      ←→ messages
Worker Thread
  Connect internals
    ‣ WebSocket conn
    ‣ Heartbeats
    ‣ Etc.
The Connect internals (WebSocket, heartbeats, etc.) live in a worker thread, while user code execution stays on the main thread. The two communicate via message passing.
Now a CPU-heavy function can saturate the main thread's event loop for as long as it wants. The worker thread's event loop keeps ticking independently, so heartbeats go out on time and the server knows the worker is alive.
The main traffic is forwarded WebSocket frames: invocations flow from the worker thread to the main thread for execution, and results flow back.
Logging unexpectedly became a message passing problem, too. The Inngest SDK lets users pass a custom logger (Winston, Pino, any compatible logger). That logger lives on the main thread, and since it's an object with methods, the structured clone algorithm can't serialize it. The worker thread can't call logger.info() directly.
So log messages became part of the protocol. The worker thread posts structured log entries (level, message, context) to the main thread, which feeds them into the user's logger. From the user's perspective, logs from the worker thread look like logs from anywhere else in the SDK. From our perspective, it's another message type in the protocol.
This pattern generalizes: anything that relies on user-provided objects (loggers, callbacks, configuration with function values) has to stay on the main thread. The worker thread can only request that the main thread use them on its behalf.
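A stripped-down sketch of that log protocol (the message shape and the logger object are invented for illustration, not the SDK's actual types):

```javascript
import { Worker } from "node:worker_threads";

// The user's logger lives on the main thread; it can't be cloned into the worker.
const userLogger = {
  info: (message, context) => console.log("[info]", message, context),
};

// Worker inlined via eval for brevity: instead of calling the logger,
// it posts a structured log entry over the message channel.
const worker = new Worker(
  `
  const { parentPort } = require("node:worker_threads");
  parentPort.postMessage({ type: "log", level: "info", message: "connected", context: { attempt: 1 } });
  `,
  { eval: true }
);

// Main thread: replay worker log entries through the user's logger.
worker.on("message", (msg) => {
  if (msg.type === "log") {
    userLogger[msg.level](msg.message, msg.context);
    worker.terminate();
  }
});
```

The "log" entry is just one more message type alongside invocations and results.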
We also had to handle the worker thread dying. A bug in the connection logic, an unhandled exception, or a V8 out-of-memory error could crash the worker. If the main thread doesn't notice and respawn it, the user's app silently loses its connection to Inngest. So the main thread watches for the worker's exit event and spins up a replacement.
But naive respawning is dangerous. If the worker hits a pathological error (a bad server response it can't parse, a misconfiguration that crashes on startup) it could die immediately after spawning, over and over. Without a backoff, the main thread would enter a tight respawn loop, burning CPU and flooding logs. We added exponential backoff to the respawn logic: the first restart is immediate, the second waits a short interval, and each subsequent restart doubles the delay up to a cap. A successful startup resets the backoff. This keeps the system self-healing under transient failures without spiraling under persistent ones.
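That schedule reduces to a tiny pure function (the specific delays and cap here are illustrative, not the SDK's actual numbers):

```javascript
const INITIAL_DELAY_MS = 250;
const MAX_DELAY_MS = 30_000;

// previous === null means the worker has not crashed since its last healthy run.
function nextDelay(previous) {
  if (previous === null) return 0; // first restart is immediate
  if (previous === 0) return INITIAL_DELAY_MS; // second waits a short interval
  return Math.min(previous * 2, MAX_DELAY_MS); // then double, up to the cap
}

// Repeated crashes walk the schedule; a successful startup resets to null.
let delay = null;
const schedule = [];
for (let i = 0; i < 6; i++) {
  delay = nextDelay(delay);
  schedule.push(delay);
}
console.log(schedule); // [ 0, 250, 500, 1000, 2000, 4000 ]
```

In the respawn path, the main thread's 'exit' handler would call setTimeout with nextDelay(previous) before constructing the replacement Worker, and reset the state to null once the new worker reports a healthy startup.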
Every constraint described above applied.
But the result was worth it. The "no available worker" errors stopped.
Worker threads are a genuine tool for isolating work in Node.js, but the programming model is fundamentally different from goroutines, Rust threads, or Python's threading module. You're not spawning a concurrent task with a closure. You're launching a separate program and communicating over a serialized message channel. That's more work up front, but it gives you hard isolation, which is exactly what you need when the main thread can't be trusted to yield.
If you're hitting event loop starvation in your own Node.js apps, worker threads are worth considering. If you're building with Inngest, Connect handles all of this for you. And if you're new to Inngest, start here to see how it works.