PostgreSQL and the OOM Killer: Why We Use Strict Memory Overcommit

(Ozgun from Ubicloud)

I agree with the blog post's technical contents, but I feel we came across too strongly in the title. At Ubicloud, as a managed Postgres provider, we use strict memory overcommit. Our experience operating Postgres at scale taught us that it's better to enable this than to go with the defaults.

However, I can see many other scenarios where using strict memory overcommit would have unanticipated side effects. That's why Linux doesn't use strict memory overcommit as its default.

This has bitten me multiple times. The problem I have is that at work we deploy the application (written in Go) and PostgreSQL on the same machine. The backend app allocates a lot of virtual memory, and initially we had overcommit set to 0 (heuristic). This caused crashes on big queries in PostgreSQL, so we set it to 2. The whole system became a bit unstable because the backend would still allocate a lot of virtual memory, and at some point we ran into errors when allocating.

For now, we have overcommit_ratio set to a value that is stable from experience, but there really seems to be no silver bullet. Go is very happy to allocate a lot of virtual memory, but so are most managed languages. The best solution would probably be to host the backend and the database on separate servers.

They allude to this in the article, but I would emphasize caution when using mode 2, especially if one has already adjusted overcommit ratios, as one can prevent forks. Test this in a QA/perf environment first, including restarts of all applications. Load test and run full QA tests before deploying to production. Even then, when deploying to production I would change the setting dynamically via app deployment scripts until confidence is high, instead of putting it in the sysctl config files.

I've gone through this exercise in the past on much older kernels, which they cover as well. Personally, I ran into fewer issues by leaving overcommit at 0, dropping the overcommit ratio to 0, and setting the oom_score_adj for programs as high as 1000 if I wanted vmscan to leave them alone, and of course using the Red Hat formulas for setting vm.min_free_kbytes, vm.admin_reserve_kbytes, and vm.user_reserve_kbytes. And of course, be vigilant in disallowing app owners from using every last bit of memory.

Nothing worse than memory management on Hyperscaler VMs which do not use Swap :|

Took k8s ages to get Swap support.

We lost something when we accepted that Hyperscalers just tell you to use more memory. It was shitty 5 years ago and still is today, especially after the RAM price increases.

I read this article about 3 weeks ago when this bit me. Really great write-up, some tricky details.

I think this is also a good lesson on why it's best to isolate mission-critical services like databases on their own compute nodes.

I have disabled overcommit both on Windows and on Linux. I hate having random programs being killed.

Unfortunately, many programs commit 2x more memory than they actually use. Often I see ~32GB committed and ~16GB resident.

(Furkan, submitter) Hmm, I haven’t thought about that. I updated the title to better reflect Ubicloud Postgres' position.

I'm not sure if you are aware but there are relatively recent environment variables you can set to help contain Go memory to a fixed size.

GOMEMLIMIT works very well if you set it to around 90% of available memory as a rough heuristic. You should definitely profile your application to fine-tune this number (e.g. if you link with C libraries that hold large memory pools, Go doesn't account for that) but also to identify sources of spiky/leaky allocations. For example, encoding/json is notorious for its inner sync.Pool hanging on to outsized buffers. There's usually a lot of low-hanging fruit.

In my experience Go can be extremely stable in terms of memory footprint at both small (~O(1MiB)) and large (~O(256GiB)) scales, and it takes only a small amount of effort.

As far as GC languages go, it is by far the easiest to work with.

Yes, it would. Basically every serious database tries to allocate everything and more - back in the day we'd just allocate VMs on the machine even with the overhead because knowing it cannot leave its constraints and would work within them was worth the cost.

Correcting a rather significant typo: setting the oom_score_adj for programs as high as 1000 should be -1000 to be left alone. 1000 would make it a prime candidate for an OOM kill. Positive integers should be used on sacrificial superfluous programs. [1] As an example OpenSSH sets the sshd to -1000 by default.

[1] - https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj...

My guess would be: it's because memory management before MGLRU was really not good and required different userspace solutions and tinkering. You either got killed by the OOM killer (no swap) or got into thrashing (swap).

And now, with PSI + MGLRU, the situation is much better, but there are still missing features/subsystems which would be nice to have. For example, there's no simple way to lock memory mlockall-style to ensure that a rarely used daemon does not face long no-cache latency on its first access after a long idle time.

Does this result in programs more frequently erroring/crashing because they can't allocate? I don't know how well many of the programs I frequently use on my desktop (Firefox, GNOME desktop, JVM + IntelliJ, Slack, etc.) handle allocation failures. I'm not sure they would do much better than crash, but I know the default OOM killer settings work well for me. About once a year a real runaway process (usually a throwaway program I'm working on) gets OOM-killed, and that's fine with me.

How exactly did you disable it on Windows?

I don't think it has an option for that.

Is this an AI response?

There are many reasons to use a dedicated host (or VM) for a DB server, but if only the accessible memory needs to be limited, a container is the simpler, more efficient tool. That said, I would expect to be able to configure how much memory a DB process is allowed to allocate. I distinctly remember that PostgreSQL allows this. But of course both can be configured simultaneously, a belt-and-suspenders approach if you will.

Whether failed transactions are actually so much more desirable than an OOM-killed process isn't quite obvious, but it might be easier to troubleshoot.

> Does this result in programs more frequently erroring/crashing because they can't allocate?

I run Firefox, VSCodium with LSP, Discord, Signal and there's still space left for a game like CS2. I'm not a heavy user by any means.

> I'm not sure they would do much better than crash

I have yet to see a program that silently handles allocation failures and doesn't crash. These days everything is coded to crash if no memory :(

> About once a year a real runaway process (usually a throwaway program I'm working on) gets OOM-killed

In my case it killed system critical processes with no way to recover. With disabled overcommit, it freezes for a while (usually for a minute or two), I close some random program of my choosing and then see in Resource Monitor what's eating my ram.

I don't think it has overcommit at all, at least that's the default. That would be why you don't have Windows OOM killer stories.

Not overcommitting is Windows's default and only behavior

A memory allocator can implement overcommit, because you can separate reserving virtual memory and having it backed by physical memory into two different system calls. But from the point of view of the kernel, any time it promises to give you physical memory that memory is backed either by RAM or by space reserved in the swap file

Settings -> View advanced system settings -> Performance (Settings) -> Advanced -> Virtual memory (Change...) -> No paging file

No.. But I have been using a lot of AI, recently. It might have impacted how I form my phrases? maybe?

The reason you hear less about Windows's OOM killer is simply because it works well.

The Linux Kernel OOM killer kills random things. Userspace OOM killers are meant to improve this, and they work well in a server situation when you already know in advance what is likely to go haywire and what is safe to kill. But they don't work well on desktop (some of them are improving but it doesn't seem to be a priority).

The Windows OOM killer by comparison usually kills something sensible (i.e. the program that is actually using all the memory), and asks the user for permission before killing it (when possible). You do see a lot of memes of situations where it fails.

That's disabling swap, not overcommit. Windows doesn't overcommit. It's one of the reasons why it handles low memory situations so much more gracefully than Linux.

This is almost always a bad idea.

If no memory is available where a page file would make a difference, this leads to application crashes instead. A crash is (usually) worse than paging.

Certain applications, Photoshop being the historical example, will outright fail to run with no page file present.

damn, good observation, when my data analysis python script goes wrong and allocates 24 GB of RAM on a 32 GB computer, it crashes (gets killed) with "out of memory" error. I've never seen something else getting killed

    The purpose of the system commit limit and commit charge is to track all uses of these resources to ensure they are never overcommitted — that is, that there is never more virtual address space defined than there is space to store its contents, either in RAM or in backing store (on disk).

- Windows Internals, 7th Edition

> this leads to application crashes instead

Same happens if the page file is full. In that case, why don't those programs use disk directly instead?

No such problem would've ever occurred if programs hadn't allocated more than they actually use.

By default, windows uses an expandable page file.

Typically, performance drops enough that the user kills the program or reboots before the page file expands to fill the disk. And other threads here suggest there is something that will prompt users to kill programs in states like this.

> No such problem would've ever occurred if programs hadn't allocated more than they actually use.

That's part of the issue, but sometimes things do in fact use too much memory as well as allocate too much.

Another part of the issue is that few programs are built to handle allocation failures.

And then you have a metrics issue. There's not really a good metric to know when you're out of memory, other than performance collapse. If your applications don't use disk, it's not too hard; but when they do use disk, performance will collapse once there's insufficient memory to provide the disk caching needed. In my experience, adding a small swap and monitoring swap i/o can be pretty helpful, and a small swap doesn't tend to allow long thrashing when memory use grows. But that's not universal and everybody loves to hate swap these days.

Your argument falls flat when a page file can be multi-GB and automatically grow. And if your application admin was competent, memory monitoring would be part of the application monitoring stack.

An application that grows in such a way (besides having backing stores for memory-mapped files, as well) will often perform so poorly that it requires addressing (adding RAM, looking for application faults, etc).

A page file is insurance, one that can last you much longer than available system memory.

April 27, 2026 · 10 min read

Burak Yucesoy

Principal Software Engineer

Our team members built and operated five managed PostgreSQL services over the past 15 years. Across all of them, one configuration has remained constant: strict memory overcommit.

In this blog post, we will explain how strict memory overcommit protects your database from catastrophic OOM (out of memory) kills. We will also share how a three-character kernel bug forced us to temporarily disable this setting. Finally, we will explain our heuristic for determining the right memory overcommit limit. Hopefully, this will help you find the right setting for your workloads.

Why PostgreSQL Can't Tolerate the OOM Killer

Linux allows processes to allocate more virtual memory than what is physically available. When a process allocates memory, for example with malloc(), the kernel reserves virtual address space for it. However, the kernel does not immediately back that space with physical memory. Physical pages are only consumed when the process actually touches the memory.

The kernel relies on the assumption that not all allocated memory will be actively used at the same time. Usually, this assumption holds. When it doesn’t, the kernel invokes the OOM killer to free memory by terminating a process.

For most processes, handling an OOM kill is simple: the process restarts, reconnects, and picks up where it left off. PostgreSQL is different.

PostgreSQL's postmaster (its main supervisor process) forks a backend process for each connection. These backends share memory segments that hold shared buffers, WAL buffers, lock tables, and other shared state. The OOM killer doesn't understand this architecture. It simply picks a process based on a heuristic (usually the process that uses the most memory) and terminates it. If that backend was modifying a shared memory segment, the segment may be left in an inconsistent state. Shared memory has no transactional guarantees at the OS level. A half-written page in shared buffers means silent data corruption.

PostgreSQL's postmaster knows this. When it detects that any of its child processes has been killed, it assumes the worst: shared memory may be corrupted. When shared memory is corrupted, there is a risk of corrupting the stored data as well. To prevent this, the postmaster terminates all remaining backends. Every active connection is dropped. Every in-flight transaction is aborted. On its next start, the database goes through crash recovery.

This is the correct behavior. PostgreSQL is protecting your data. But it means a single OOM kill doesn't just affect one connection. It takes down every connection on the server. On top of that, if the write volume was high, replaying all WAL files for crash recovery can take a long time. This means a single out of memory case can cause long outages.

Strict Overcommit: Fail Early, Not Catastrophically

It is possible to configure how the kernel behaves when processes ask for memory. Linux provides three overcommit policies via vm.overcommit_memory:

Mode 0 (Heuristic): The default. The kernel refuses any single allocation larger than what the system could realistically provide (roughly free memory + swap + reclaimable page cache and slab), but otherwise allows overcommitting freely. In practice, this only blocks absurd requests like a single process asking for more memory than the entire system memory.
Mode 1 (Always): The kernel never refuses an allocation request, regardless of how large it is or how much memory has already been committed. Every malloc() and mmap() succeeds. If processes later fault in more physical memory than the system can actually provide, the OOM killer steps in to free memory by terminating a process.
Mode 2 (Strict): The kernel tracks the total committed virtual memory across all processes in Committed_AS and enforces an upper bound called CommitLimit. Any allocation that would push Committed_AS past CommitLimit is refused immediately with ENOMEM.

Under strict overcommit, the kernel has two knobs to set CommitLimit: overcommit_kbytes and overcommit_ratio. The CommitLimit is calculated as:

CommitLimit = overcommit_kbytes + swap

Or, if overcommit_kbytes is not set:

CommitLimit = overcommit_ratio / 100 * available_memory + swap
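The two formulas above can be sketched as a single function. This is a minimal illustration, not kernel code: the function name and parameter names are hypothetical, all values are in kB, and the default ratio of 50 matches the kernel's default vm.overcommit_ratio.

```python
def commit_limit_kb(available_memory_kb, swap_kb, overcommit_kbytes=0, overcommit_ratio=50):
    """Return CommitLimit in kB following the two formulas above."""
    if overcommit_kbytes > 0:
        # An explicit kB budget takes precedence over the ratio.
        return overcommit_kbytes + swap_kb
    # Otherwise the limit is a percentage of memory, plus swap.
    return overcommit_ratio * available_memory_kb // 100 + swap_kb


# Hypothetical 8 GB machine with no swap and the default ratio.
print(commit_limit_kb(8 * 1048576, 0))
```

With the default ratio and no swap, only half of memory can be committed, which is one reason the ratio usually needs tuning before strict mode is practical.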

When an allocation fails with the ENOMEM error code, PostgreSQL handles it gracefully. A backend that cannot allocate memory reports an error to the client, cancels the transaction, and continues. The postmaster stays up. Other connections remain unaffected. This is a routine error, not a catastrophe. The trade-off is that strict overcommit converts late, destructive failures into early, graceful ones.

This trade-off works best when the machine is dedicated to PostgreSQL and a small set of known sidecar processes. In that scenario, the committed memory profile is predictable and the limit can be tuned with confidence. On shared machines running diverse workloads, committed memory becomes harder to predict. An unrelated process can use up the commit budget. This can make PostgreSQL get an ENOMEM error, even if the database load is fine.

A Kernel Bug and 648 GB of Phantom Memory

We always favored strict overcommit for PostgreSQL. We used it in previous managed PostgreSQL services we built and also in Ubicloud PostgreSQL. However, after enabling it this time, we quickly ran into trouble. A few weeks after we turned on strict memory overcommit, we started to get failures on some of the databases. They showed out of memory errors, even though there was plenty of free physical memory on the machines. We disabled strict memory overcommit and started investigating.

Discovery

The first clue came from a routine check of /proc/meminfo on one of our servers with 8 GB memory:

$> cat /proc/meminfo | grep "Committed_AS"

Committed_AS: 683547672 kB

651 GB of committed memory on an 8 GB machine! For comparison, a healthy server of the same size showed:

$> cat /proc/meminfo | grep "Committed_AS"

Committed_AS: 2703940 kB

The counter was off by orders of magnitude.
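A check like this is easy to script. The sketch below parses /proc/meminfo-style text and computes the ratio of Committed_AS to MemTotal. The function names are hypothetical, and the MemTotal value in the sample is an assumed round figure for an 8 GB machine, not a number from our servers.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def committed_ratio(fields):
    """Ratio of committed virtual memory to physical memory."""
    return fields["Committed_AS"] / fields["MemTotal"]


# Committed_AS is the value shown above; MemTotal is an assumed placeholder.
sample = "MemTotal: 8388608 kB\nCommitted_AS: 683547672 kB\n"
info = parse_meminfo(sample)
print(f"ratio = {committed_ratio(info):.1f}")
```

On a live machine the same functions could be fed the contents of /proc/meminfo; a ratio far above 1.0 on a dedicated database host would be worth investigating.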

Narrowing It Down

We first looked at ps output.

$> ps -C postgres -o pid,vsz,rss,cmd --sort=-vsz

PID   VSZ     RSS   CMD
96622 2242244 95416 postgres: 18/main: postgres postgres...
95721 2241668 94708 postgres: 18/main: postgres postgres...
96414 2241436 94892 postgres: 18/main: postgres postgres...
96619 2241076 93308 postgres: 18/main: postgres postgres...
96417 2240900 94300 postgres: 18/main: postgres postgres...
95728 2240736 93864 postgres: 18/main: postgres postgres...
96620 2240736 92852 postgres: 18/main: postgres postgres...
95727 2240428 93640 postgres: 18/main: postgres postgres...
96623 2239840 93164 postgres: 18/main: postgres postgres...

VSZ is the total virtual address space a process has mapped and RSS is the physical memory it's actually using. In the output above, each backend shows ~2 GB of VSZ covering its entire mapped address space, but a much smaller RSS (~95 MB) reflecting the memory it is actively using. On this 8 GB VM we configure 2 GB of shared_buffers, and if you think ~2 GB VSZ is suspiciously close to the shared_buffers size, you are right. Most of each backend's VSZ is actually the shared memory segment that holds shared_buffers. Every backend maps the same 2 GB region into its own address space, so it shows up in each backend's VSZ. With many backends, the VSZ numbers add up quickly.

That said, none of this should inflate Committed_AS. The shared memory segment appears in every backend's address space but physically exists only once, so it should be counted only once. On top of that, we run PostgreSQL with huge_pages = on, so shared_buffers is allocated from hugetlb. Hugetlb mappings have their own separate reservation accounting and are not supposed to count toward Committed_AS at all. Still, the 2 GB hugetlb region was by far the largest mapping in each backend, and hugetlb accounting is a special case in the kernel. That made it the most natural place to start looking, so our first hypothesis was that the kernel was somehow miscounting these mappings. For example, charging them once per process instead of ignoring them.

To verify, we checked the VMA (Virtual Memory Area) flags on the hugetlb mapping via /proc/<pid>/smaps. Each VMA has a set of flags, and the ac flag (VM_ACCOUNT) indicates that the region counts toward committed memory:

$> sudo cat /proc/321784/smaps | grep -A 25 "hugepage"

7fce75000000-7fcef0c00000 rw-s 00000000 00:10 10723551  /anon_hugepage (deleted)
Size:            2027520 kB
Shared_Hugetlb:   393216 kB
Private_Hugetlb:       0 kB
...
...
VmFlags: rd wr sh mr mw me ms de ht sd

No ac flag. Huge pages were correctly excluded from committed memory accounting. The hypothesis was ruled out.

We then summed accountable memory (VMAs with the ac flag) across all processes on the machine:

$> sudo awk '/^Size/{size=$2} /VmFlags:/ && / ac/{sum+=size} END{printf "%.2f GB\n", sum/1048576}' /proc/[0-9]*/smaps

2.43 GB

2.43 GB accountable vs 651 GB reported; 648 GB of phantom committed memory. The vm_committed_as counter was leaking. We suspected that the memory was being charged on allocation but was never recredited. This made us consider a potential kernel bug in committed memory calculation.
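For readers who find the awk one-liner terse, the same summation can be sketched in Python. The function name is hypothetical. In the sample, the first mapping is the hugetlb region shown earlier and the second (a small heap mapping carrying the ac flag) is made up for illustration.

```python
def accounted_kb(smaps_text):
    """Sum the Size of every VMA whose VmFlags line contains the 'ac' flag."""
    total_kb = 0
    size_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith("Size:"):
            # Remember the size of the VMA currently being described.
            size_kb = int(line.split()[1])
        elif line.startswith("VmFlags:"):
            # VmFlags closes the VMA's entry; count it only if it is accounted.
            if "ac" in line.split()[1:]:
                total_kb += size_kb
    return total_kb


sample = """\
7fce75000000-7fcef0c00000 rw-s 00000000 00:10 10723551  /anon_hugepage (deleted)
Size:            2027520 kB
VmFlags: rd wr sh mr mw me ms de ht sd
55d0c0a00000-55d0c0a21000 rw-p 00000000 00:00 0  [heap]
Size:                132 kB
VmFlags: rd wr mr mw me ac sd
"""
print(accounted_kb(sample))
```

Only the second mapping is counted, because the hugetlb region has no ac flag.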

Fleet-Wide Analysis

At that time, we had two different kernels being used on our fleet. We checked our entire fleet of PostgreSQL servers and compared the ratio of Committed_AS to MemTotal against kernel version and uptime:

Metric	Kernel 6.5.0	Kernel 6.8.0
Median Ratio	0.55	0.27
Mean Ratio	24.97	0.32
Max Ratio	3,405	1.86
Servers with a ratio > 1.0	23%	< 1%

We also ran a statistical analysis and found that a server running the 6.5 kernel was 52x more likely to have inflated committed memory.

On 6.5 servers, uptime was positively correlated with inflation. The leak grew at roughly 4.7% compound per week, proportional to uptime. On 6.8 servers, no correlation existed.

This analysis significantly strengthened our hypothesis that this was a kernel bug.

The One-Character Bug

To have definitive proof, we tasked an LLM to look into every commit between 6.5.0 and 6.8.0 to find possible bug fixes in committed memory calculations. It quickly found the following.

The bug was introduced in Linux 6.5 by commit 408579c. This commit changed the return convention of do_vmi_align_munmap():

Before: 0 = success, 1 = success with lock downgraded, negative = error
After: always 0 for success, negative = error

The commit updated callers throughout the mm subsystem. However, in mm/mremap.c, inside move_vma(), the error check was converted incorrectly:

BEFORE (correct): error handler runs on negative return (on error)

if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {
   /* OOM: unable to split vma, just get accounts right */
   if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
       vm_acct_memory(old_len >> PAGE_SHIFT);
}

AFTER (broken): error handler runs when return is 0 (on success)

if (!do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false)) {
   /* OOM: unable to split vma, just get accounts right */
   if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
       vm_acct_memory(old_len >> PAGE_SHIFT);
}

The change from < 0 to ! inverted the condition. To understand why this matters, consider what move_vma() does. It first decrements Committed_AS for the old region as part of the move, then calls do_vmi_munmap() to actually unmap it. If the unmap fails, the kernel needs to increment the counter back to keep accounting correct. After all, unmap has failed and the old region still exists. Its charge must be restored. With the inverted condition, this re-increment runs on every successful mremap instead of only on failure. The counter grew monotonically with every memory remap operation.
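The effect of the inverted check can be shown with a toy model. This is not kernel code: the function is hypothetical, it tracks a single counter in pages, and it assumes each move charges the new region and drops the old region's charge, so that a correct move nets out to zero.

```python
def simulate_moves(moves, pages, buggy):
    """Return the net change in the committed-page counter after successful moves."""
    committed = 0
    for _ in range(moves):
        committed += pages  # the new region is charged (assumption of this model)
        committed -= pages  # the old region's charge is dropped as part of the move
        munmap_result = 0   # 0 means the unmap succeeded
        if buggy:
            restore = not munmap_result  # mirrors `if (!do_vmi_munmap(...))`
        else:
            restore = munmap_result < 0  # mirrors `if (do_vmi_munmap(...) < 0)`
        if restore:
            committed += pages  # meant to run only when the unmap failed
    return committed


print(simulate_moves(1000, 256, buggy=False))
print(simulate_moves(1000, 256, buggy=True))
```

In the correct variant the counter returns to its starting value. In the buggy variant every successful move leaves one region's worth of pages charged forever, which matches the slow, uptime-proportional drift we observed.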

The bug was reported here and bisected here. Linus himself analyzed the root cause and fixed it with a one-line change, reverting the condition back to < 0:

- if (!do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false)) {
+ if (do_vmi_munmap(&vmi, mm, old_addr, old_len, uf_unmap, false) < 0) {

As Linus Torvalds wrote in the fix:

This didn't change any actual VM behavior _except_ for memory accounting when 'VM_ACCOUNT' was set on the vma. Which made the wrong return value test fairly subtle, since everything continues to work.

Or rather - it continues to work but the "Committed memory" accounting goes all wonky (Committed_AS value in /proc/meminfo), and depending on settings that then causes problems much much later as the VM relies on bogus statistics for its heuristics.

This is the kind of bug that hides in plain sight. Under heuristic overcommit (the default), Committed_AS is purely informational. The kernel doesn't use it to gate allocations. The bug only causes failures under non-default strict overcommit mode, so it went unnoticed. The failure is also indirect. The accounting drifts silently for weeks before Committed_AS finally crosses CommitLimit and allocations start failing.

Setting the Commit Limit

With the kernel bug behind us, we can gradually go back to enabling strict memory overcommit. This is a good point to explain our heuristic in deciding the commit limit in case you want to enable it for your workloads as well.

We use the formula:

overcommit_kbytes = total_memory_kb × 0.8 + 2 × 1048576

In plain terms: 80% of total physical memory plus 2 GB.
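As a worked example, the formula can be written as a small function. This is a sketch with hypothetical names; the optional hugepage_fraction parameter is an addition for illustration, modeled on the 25% hugepage holdback that appears in the implementation listing later in this post.

```python
KB_PER_GB = 1048576


def overcommit_kbytes(total_memory_kb, hugepage_fraction=0.0):
    """80% of the memory not reserved for hugepages, plus a fixed 2 GB, in kB."""
    usable_kb = total_memory_kb * (1 - hugepage_fraction)
    return round(usable_kb * 0.8 + 2 * KB_PER_GB)


# Hypothetical 8 GB server, first as the plain formula, then with 25% hugepages.
print(overcommit_kbytes(8 * KB_PER_GB))
print(overcommit_kbytes(8 * KB_PER_GB, hugepage_fraction=0.25))
```

The fixed 2 GB term dominates on small machines and becomes marginal on large ones, which is the behavior the formula is designed to have.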

Why 80%

The 20% holdback covers memory used by kernel data structures not seen in userspace. This includes items like page tables, slab caches, network buffers, and the kernel's own allocations.

It is important to note that this 20% is not wasted. The kernel still uses it for page cache (i.e., free physical memory used to cache file I/O). This is the biggest consumer and directly benefits PostgreSQL read performance. Page cache doesn't count toward Committed_AS because it's reclaimable. The kernel can evict cached pages anytime a process actually needs the memory.

Why +2 GB

Every PostgreSQL server in our fleet runs several sidecar processes. Some examples are prometheus, node_exporter, postgres_exporter and wal-g. These are Go programs, and Go's runtime reserves large virtual memory regions upfront via mmap but only faults in pages as needed. Their committed memory contribution is far larger than their actual resident memory.

We surveyed the committed memory of these sidecar processes across our fleet:

Sidecar Committed Memory	Percentage of Servers
0.0 – 0.5 GB	~64%
0.5 – 1.0 GB	~32%
1.0 – 1.5 GB	~1%
1.5 – 2.0 GB	~1%
2.0 – 2.5 GB	~1%
2.5 – 3.0 GB	~1%
3.0 – 3.5 GB	~1%

96% of servers fall under 1 GB. We found a weak positive correlation between vCPU count and sidecar committed memory (r = 0.22). This is likely driven by Go's runtime scaling with available CPUs but it is not strong enough to justify proportional scaling.

The fixed 2 GB covers >99% of servers. It is deliberately generous. If this offset is too small, sidecars can silently consume the remaining commit budget, and PostgreSQL, not the sidecar, is the one that hits ENOMEM.

Implementation

If you are curious about how we implemented this, it is actually pretty straightforward. You can read the code in our GitHub repo here. I’m also adding the core part of it below for convenience.

def configure_memory_overcommit(strict: false)
  if strict
    total_mem_kb = File.read("/proc/meminfo").match(/MemTotal:\s+(\d+)/)[1].to_i
    # 25% of memory is reserved for hugepages, which do not count towards the
    # commit limit, so only the remaining 75% is available for overcommit.
    non_hugepage_mem_kb = total_mem_kb * 0.75
    overcommit_kbytes = (non_hugepage_mem_kb * 0.8 + 2 * 1048576).round
    safe_write_to_file("/etc/sysctl.d/99-overcommit.conf", "vm.overcommit_memory=2\nvm.overcommit_kbytes=#{overcommit_kbytes}\n")
  else
    r "sudo rm -f /etc/sysctl.d/99-overcommit.conf"
  end

  r "sudo sysctl --system"
end

Note that we use vm.overcommit_kbytes instead of vm.overcommit_ratio. We need overcommit_kbytes because our formula includes a fixed 2 GB component that can't be expressed as a percentage. On a 4 GB server, the 2 GB buffer is 50% of the physical memory; on a 64 GB server, it's 3%. A single ratio can't capture both.

Conclusion

Strict memory overcommit is a small configuration change that provides a meaningful safety improvement for PostgreSQL. It converts catastrophic OOM kills into graceful allocation failures. This way, each backend can manage the issue without disrupting the whole system. Even though we had to disable it for a while due to a kernel bug, it remains a key configuration for healthy PostgreSQL deployments.

If you run PostgreSQL in production, we recommend enabling vm.overcommit_memory=2. However, it is important to configure this carefully. If CommitLimit is set too low, your application may experience frequent OOM errors. On the other hand, if it is set too high, you will not fully benefit from the protection that strict memory overcommit provides. Our recommendation is to monitor your memory usage over time and enable this setting only after you have a solid understanding of the memory characteristics of your workload.
