Can't help but feel this is one of the subtle traps hidden beneath the advice that contexts aren't supposed to be stored. I know it's not always that easy, of course.
lostcancel: check cancel func returned by context.WithCancel is called
I'm not 100% sure why `go vet` didn't catch this issue, but storing the cancelFn in the struct is probably part of the reason. Any Go experts know if that's the case?
This is a serious tripping point with Go. There's no way to express: "this is a root context that I _want_ to store and only use to create derived contexts". Goroutines are also a source of problems: you can't easily say "I'm passing the ownership of this context to a goroutine".
How I tracked down a tiny Go lifecycle bug in Kubernetes kubelet which leaked a context on every startPodSync.
A few weeks ago, I started getting alerts from a tiny Kubernetes cluster: a single-node Kubernetes test cluster, not running any of our production workload. I’d recently upgraded this cluster to v1.36, hosted on DigitalOcean’s managed DOKS service. My frugality paid unexpected dividends: this tiny (ahem, cheap!) 2 GiB RAM node had so much memory pressure that it quickly revealed a deeper issue in Kubernetes 1.36, which would have taken longer to show up if memory were abundant.
Investigating the alerts revealed that Pods were being restarted, but kubectl top pods didn’t show any unusually large pods. The applications running on the node weren’t experiencing memory growth and were nowhere near their memory limits.
Peeling back the Kubernetes facade, I opened up a root shell on the node itself, and a short htop and M later, quickly discovered that the kubelet process itself had grown and was growing!
A quick systemctl restart kubelet on the node made the cluster happy again, but the underlying leak was still there, and would come back soon unless I determined the origin of the leak.
Kubernetes is written in Go, and kubelet is a core component of how Kubernetes works: it runs on every node, and is responsible for keeping that node’s containers in sync with the desired cluster state.
Go’s pprof package lets you capture a heap memory profile from a running process, which I saved to a file:
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/debug/pprof/heap?debug=0" > "kubelet_pprof_heap.pb.gz"
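For readers who want the same kind of endpoint in a Go service of their own, here is a minimal sketch using the standard net/http/pprof package. The helper names newDebugMux and fetchHeapProfile are made up for this illustration, and the request is served in-process through httptest rather than over the network; it does not show how kubelet itself wires up its debug handlers.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux (hypothetical helper) returns a mux serving Go's standard
// pprof index handler, which also serves named profiles such as
// /debug/pprof/heap.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	return mux
}

// fetchHeapProfile (hypothetical helper) requests the heap profile from the
// handler in-process and returns the raw response body.
func fetchHeapProfile(h http.Handler) []byte {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/heap?debug=0", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Body.Bytes()
}

func main() {
	profile := fetchHeapProfile(newDebugMux())
	// With debug=0 the profile is a gzip-compressed protobuf, which is why
	// it is conventionally saved with a .pb.gz suffix.
	isGzip := bytes.HasPrefix(profile, []byte{0x1f, 0x8b})
	fmt.Println("gzip-compressed profile:", isGzip)
}
```

In a real service the mux would sit behind http.ListenAndServe on an internal-only address, and the saved bytes could be fed to go tool pprof just like the kubelet profile.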
After that, go tool pprof -top can be used to see what’s going on, both by total size and by object count.
By object count:
go tool pprof -top -sample_index=inuse_objects kubelet_pprof_heap.pb.gz
flat flat% sum% cum cum%
642456 45.52% 45.52% 918672 65.09% context.(*cancelCtx).propagateCancel
380137 26.93% 72.45% 380195 26.94% context.withCancel (inline)
276216 19.57% 92.02% 276216 19.57% context.(*cancelCtx).Done
10923 0.77% 92.80% 10923 0.77% container/list.(*List).insertValue (inline)
10923 0.77% 93.57% 10923 0.77% container/list.New (inline)
10923 0.77% 94.34% 10923 0.77% golang.org/x/net/http2.(*clientConnReadLoop).handleResponse
10923 0.77% 95.12% 10923 0.77% google.golang.org/protobuf/internal/impl.consumeStringValueValidateUTF8
10923 0.77% 95.89% 10923 0.77% k8s.io/api/core/v1.(*VolumeMount).Unmarshal
10923 0.77% 96.67% 10923 0.77% os.(*File).readdir
4681 0.33% 97.00% 16833 1.19% k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive
19 0.0013% 97.00% 21865 1.55% k8s.io/utils/internal/third_party/forked/golang/golang-lru.(*Cache).Add
0 0% 97.00% 10923 0.77% container/list.(*List).PushFront (inline)
0 0% 97.00% 380195 26.94% context.WithCancel
0 0% 97.00% 918614 65.08% context.WithDeadline (inline)
0 0% 97.00% 918614 65.08% context.WithDeadlineCause
0 0% 97.00% 918614 65.08% context.WithTimeout
...
0 0% 97.00% 918206 65.06% k8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout
...
0 0% 97.00% 918215 65.06% k8s.io/kubernetes/pkg/kubelet.(*Kubelet).SyncPod
...
0 0% 97.00% 22742 1.61% k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoopIteration
0 0% 97.00% 1309333 92.77% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod.func1
0 0% 97.00% 1309333 92.77% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop
0 0% 97.00% 929138 65.83% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop.func1 (inline)
0 0% 97.00% 380195 26.94% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).startPodSync
...
0 0% 97.00% 907283 64.28% k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount
By total heap memory usage:
go tool pprof -top -sample_index=inuse_space kubelet_pprof_heap.pb.gz
flat flat% sum% cum cum%
86.33MB 51.28% 51.28% 115.83MB 68.80% context.(*cancelCtx).propagateCancel
29.50MB 17.52% 68.80% 29.50MB 17.52% context.(*cancelCtx).Done
29MB 17.23% 86.03% 30.54MB 18.14% context.withCancel (inline)
1.66MB 0.99% 87.02% 1.66MB 0.99% google.golang.org/grpc/mem.(*sizedBufferPool).Get
1.50MB 0.89% 87.91% 2.50MB 1.49% github.com/google/cadvisor/container/libcontainer.newContainerStats
1MB 0.59% 88.50% 1MB 0.59% reflect.unsafe_New
1MB 0.59% 89.10% 1MB 0.59% github.com/google/cadvisor/container/libcontainer.diskStatsCopy
1MB 0.59% 89.69% 1MB 0.59% k8s.io/apimachinery/pkg/util/sets.Set[go.shape.string].Insert (inline)
1MB 0.59% 90.28% 1MB 0.59% internal/bytealg.MakeNoZero
...
0 0% 91.48% 30.54MB 18.14% context.WithCancel
0 0% 91.48% 114.29MB 67.89% context.WithDeadline (inline)
0 0% 91.48% 114.29MB 67.89% context.WithDeadlineCause
0 0% 91.48% 114.29MB 67.89% context.WithTimeout
...
0 0% 91.48% 103.52MB 61.49% k8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout
...
0 0% 91.48% 104.04MB 61.80% k8s.io/kubernetes/pkg/kubelet.(*Kubelet).SyncPod
...
0 0% 91.48% 135.08MB 80.24% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).UpdatePod.func1
0 0% 91.48% 135.08MB 80.24% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop
0 0% 91.48% 104.54MB 62.10% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).podWorkerLoop.func1 (inline)
...
0 0% 91.48% 103.02MB 61.19% k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount
Despite never before peeking at kubelet’s implementation, I was immediately surprised to find almost a million contexts taking up the majority of its memory usage! That doesn’t sound right.
Jumping into unfamiliar new codebases and being able to ask questions is one of the more powerful superpowers of the recent generation of AI coding tools. While I was suspicious of the kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount line (maybe exec-based readiness and liveness probes were acting up?), Codex immediately steered me to the correct issue: a change in Kubernetes 1.36 introduced on 2026-02-19 in which this code:
// initialize a context for the worker if one does not exist
if status.ctx == nil || status.ctx.Err() == context.Canceled {
	status.ctx, status.cancelFn = context.WithCancel(context.Background())
}
ctx = status.ctx
was replaced by:
ctx, status.cancelFn = context.WithCancel(parentCtx)
This runs on every single startPodSync, which is the core reconciliation loop for each Pod. On its own, the new line looks like it might be harmless: it creates a new cancelable context and stores the cancel function.
The problem is what happens on the second pass:
If status.cancelFn already points to the previous cancel function, this assignment overwrites it. If the old cancel function was never called (and it isn’t in the typical success case), the old child context remains attached to its parent. Go’s context docs explicitly say that calling the CancelFunc removes the parent’s reference to the child, and failing to call it leaks the child until the parent is canceled.
This runs on every single Pod reconciliation pass, so over a few days the kubelet accumulated almost a million leaked contexts, and it would have been more on a busier cluster!
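The difference between the two patterns can be sketched in a few lines of standalone Go. This is an illustration only, not kubelet code: podStatus, startSyncLeaky, startSyncReuse, and liveChildren are hypothetical names, and the reuse check is simplified to Err() != nil. It counts how many distinct child contexts are still un-canceled, and therefore still referenced by the parent, after five sync passes.

```go
package main

import (
	"context"
	"fmt"
)

// podStatus is a hypothetical stand-in for the kubelet's per-pod sync status.
type podStatus struct {
	ctx      context.Context
	cancelFn context.CancelFunc
}

// startSyncLeaky follows the problematic pattern: every pass derives a new
// child from the parent and overwrites cancelFn without calling the old one.
func startSyncLeaky(parent context.Context, s *podStatus) context.Context {
	ctx, cancel := context.WithCancel(parent)
	s.cancelFn = cancel // the previous cancel func, if any, is lost here
	return ctx
}

// startSyncReuse follows the older pattern: keep the existing context and
// only create a new one if none exists or the old one has ended.
func startSyncReuse(parent context.Context, s *podStatus) context.Context {
	if s.ctx == nil || s.ctx.Err() != nil {
		s.ctx, s.cancelFn = context.WithCancel(parent)
	}
	return s.ctx
}

// liveChildren runs n sync passes and counts the distinct child contexts
// that have not been canceled and so remain attached to the parent.
func liveChildren(start func(context.Context, *podStatus) context.Context, n int) int {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()

	var s podStatus
	seen := make(map[context.Context]struct{})
	for i := 0; i < n; i++ {
		seen[start(parent, &s)] = struct{}{}
	}

	live := 0
	for ctx := range seen {
		if ctx.Err() == nil {
			live++
		}
	}
	return live
}

func main() {
	fmt.Println("leaky:", liveChildren(startSyncLeaky, 5)) // leaky: 5
	fmt.Println("reuse:", liveChildren(startSyncReuse, 5)) // reuse: 1
}
```

Note that the sketch uses a cancelable parent on purpose: a child of context.Background() has no parent that can hold a reference to it, so the garbage collector can reclaim it once nothing else points at it.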
I’d never committed to the Kubernetes project before, but my experience as a newcomer was that they’re running a great process that helped quickly triage the issue and support me in getting patches merged.
Adding another layer of complexity, my original patch attempt passed local tests, but failed the E2E integration tests that run in Kubernetes CI environments. These revealed another issue: prober workers, which handle readiness and liveness probes, weren’t using contexts quite correctly either! In the interest of solving the memory leak, the team steered me toward simply reverting the change that caused the immediate issue, leaving the broader context cleanup for later.
I left a few “Be careful” comments in the code for the next person brave enough to attempt a deeper cleanup:
// Be careful not to leak contexts (see #139823).
// Be careful that long-lived goroutines (such as prober workers) outlive
// the lifetime of a single startPodSync cancellation context.
startPodSync
accepted, important-soon, regression
lgtm
approved and merged into master branch (which feeds into v1.37)
release-1.36 branch (to get this into a v1.36 patch release)
The Kubernetes team was great to work with.
Heap memory profiling is a superpower.
Memory leaks look different than they used to.
Long-running production systems have a time dimension that short-running tests often lack.
Sometimes the issue is not in the application-level workload, but in the infrastructure underneath it.
“Turn it off and back on again” remains undefeated. :)
kubelet memory use
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/metrics" | grep process_resident_memory_bytes
Before:
# HELP process_resident_memory_bytes Resident memory size in bytes.
process_resident_memory_bytes 1.021227008e+09
After a systemctl restart kubelet:
# HELP process_resident_memory_bytes Resident memory size in bytes.
process_resident_memory_bytes 1.15019776e+08
That’s a drop from 974 MiB to 110 MiB.
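The conversion is simply bytes divided by 2^20. As a small sketch, assuming a sample line in the same text format as the metrics output above (residentMiB is a hypothetical helper written for this example):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// residentMiB (hypothetical helper) parses a sample line such as
// "process_resident_memory_bytes 1.021227008e+09" and returns the value
// converted from bytes to MiB.
func residentMiB(line string) (float64, error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return 0, fmt.Errorf("unexpected sample line: %q", line)
	}
	b, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, err
	}
	return b / (1 << 20), nil // 1 MiB = 2^20 bytes
}

func main() {
	for _, line := range []string{
		"process_resident_memory_bytes 1.021227008e+09",
		"process_resident_memory_bytes 1.15019776e+08",
	} {
		mib, err := residentMiB(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%.0f MiB\n", mib)
	}
}
```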