August 17th, 2024

Linux Memory Overcommit (2007)

Linux's memory overcommit behavior can cause applications to crash long after allocation, when the memory they were granted is actually used. Adjusting the `vm.overcommit_memory` and `vm.overcommit_ratio` settings can enforce stricter accounting and prevent unexpected terminations.

The article discusses Linux's memory overcommit behavior, comparing it to airlines selling more tickets than there are seats in the hope that some passengers won't show up. By default, Linux allows memory allocation requests to succeed even when the memory to back them isn't available; physical pages are only committed when the application actually touches them. As a result, a process can allocate successfully and then crash much later, when using that memory pushes the system into a low-memory situation and the Out Of Memory (OOM) Killer terminates processes based on their OOM scores. The author shares test results from two different systems, highlighting that this behavior is especially problematic in environments running multiple large Java Virtual Machines (JVMs). To mitigate these issues, the author suggests adjusting the `vm.overcommit_memory` and `vm.overcommit_ratio` settings to enforce stricter memory accounting, so that applications fail gracefully at startup rather than being killed unexpectedly later. The article emphasizes the importance of understanding and configuring these settings to prevent seemingly random process terminations in memory-constrained environments.

- Linux's default memory overcommit behavior can lead to unexpected application crashes.

- The OOM Killer terminates processes based on their OOM scores.

- Adjusting `vm.overcommit_memory` and `vm.overcommit_ratio` can improve memory management.

- Stricter memory management allows applications to fail gracefully at startup.

- Understanding memory allocation behavior is crucial for environments with large JVMs.
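
As a hypothetical illustration of the behavior summarized above (not code from the original article), the sketch below allocates far more memory than most machines have. Under the default overcommit settings the `malloc()` usually succeeds and trouble only starts when the pages are actually touched, whereas with `vm.overcommit_memory=2` the allocation is likely to fail cleanly up front.

```c
/* overcommit_demo.c: hypothetical demo of Linux overcommit behavior
 * (not from the original article). With vm.overcommit_memory=0 or 1 the
 * malloc below usually succeeds even without 64 GiB of RAM + swap; with
 * vm.overcommit_memory=2 it is likely to fail immediately instead. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t size = (size_t)64 * 1024 * 1024 * 1024;  /* 64 GiB */

    char *p = malloc(size);
    if (p == NULL) {
        /* Strict accounting (mode 2): we fail here, gracefully, at startup. */
        perror("malloc");
        return 1;
    }
    printf("malloc of 64 GiB succeeded; nothing is committed yet\n");

    /* Physical memory is only needed once pages are written. On an
     * overcommitted system this loop is where the OOM killer may step in
     * and terminate this process (or an unrelated one). */
    for (size_t off = 0; off < size; off += 4096)
        p[off] = 1;

    printf("touched every page\n");
    free(p);
    return 0;
}
```

Running the touch loop on a default-configured machine can invoke the OOM killer, so it is best tried in a throwaway VM.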

Related

For the Love of God, Stop Using CPU Limits on Kubernetes

Using CPU limits on Kubernetes can lead to CPU throttling, causing more harm than good. Setting accurate CPU requests is crucial to avoid throttling. Memory management best practices are also discussed, along with a tool for resource recommendations.

Debugging an evil Go runtime bug: From heat guns to kernel compiler flags

Crashes in node_exporter on a laptop were traced to a single bad RAM bit. The post emphasizes the importance of ECC RAM for server reliability, marks the bad RAM region using a GRUB 2 feature, and heats the RAM to test its behavior under stress.

Prometheus metrics saves us from painful kernel debugging

Prometheus host metrics revealed steadily increasing slab memory usage after an Ubuntu 22.04 kernel upgrade. AppArmor being disabled was identified as the cause, and reverting the change averted out-of-memory crashes; a solid monitoring setup was crucial for swift resolution.

The challenges of working out how many CPUs your program can use on Linux

Working out how many CPUs a program can actually use on Linux poses challenges. Methods such as /proc/cpuinfo, sched_getaffinity(), and cgroup limits are discussed; programs that overlook these restrictions can run into performance issues. taskset(1) is mentioned as a way to manage which CPUs a program runs on.

Phantom Menace: memory leak that wasn't there

The author's investigation into a perceived memory leak in a Rust application revealed it was a misunderstanding of misleading Grafana metrics, emphasizing the importance of accurate metric calculation in debugging.

7 comments
By @__turbobrew__ - 6 months
If you want to optimize for throughput, one option is to overcommit resources (CPU/MEM/IO/NET) but use backpressure mechanisms to reduce load during times of saturation.

Kubernetes does this through node pressure eviction but it is pretty easy to hook into the pressure stall information and have the application handle this as well (for example, start returning 429 HTTP responses when PSI goes over a certain level).
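
A minimal sketch of what hooking into PSI could look like (assuming a kernel with PSI enabled, 4.20 or later; the 10.0 threshold is an arbitrary placeholder, not a recommendation):

```c
/* psi_check.c: hypothetical sketch that reads the "some" memory-pressure
 * average over the last 10 seconds from /proc/pressure/memory (PSI,
 * kernel 4.20+) and decides whether to start shedding load. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/pressure/memory", "r");
    if (!f) {
        perror("fopen /proc/pressure/memory");
        return 1;
    }

    /* First line: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
    double avg10 = 0.0;
    if (fscanf(f, "some avg10=%lf", &avg10) != 1) {
        fprintf(stderr, "unexpected /proc/pressure/memory format\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    /* Threshold chosen arbitrarily for illustration; a real service would
     * tune it and, e.g., start returning HTTP 429s above it. */
    if (avg10 > 10.0)
        printf("memory pressure high (avg10=%.2f): shed load\n", avg10);
    else
        printf("memory pressure ok (avg10=%.2f)\n", avg10);
    return 0;
}
```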

At the end of the day the optimal overcommit is workload dependent; good metrics and an iterative approach are needed.

By @nabla9 - 6 months
Then as now, you can manage it.

vm.overcommit_memory == 0: heuristic overcommit (the default).

vm.overcommit_memory == 1: always overcommit; allocations are allowed to exceed RAM + swap.

vm.overcommit_memory == 2: never overcommit; an allocation only succeeds if it fits within the commit limit (swap plus the share of RAM set by vm.overcommit_ratio).
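
For reference, a small hypothetical sketch that reads the current values through their /proc files; the same files can be written as root (or set via sysctl) to change the mode:

```c
/* show_overcommit.c: print the current overcommit settings via /proc.
 * Writing to the same files as root, or running sysctl, changes them. */
#include <stdio.h>

static void print_setting(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return;
    }
    int value;
    if (fscanf(f, "%d", &value) == 1)
        printf("%s = %d\n", path, value);
    fclose(f);
}

int main(void) {
    print_setting("/proc/sys/vm/overcommit_memory"); /* 0, 1, or 2 */
    print_setting("/proc/sys/vm/overcommit_ratio");  /* % of RAM counted in mode 2 */
    return 0;
}
```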

By @pcwalton - 6 months
Reminder: if overcommit as a concept seems distasteful, the real ire should be directed at the Unix fork syscall, an API that will always fail on a process using over 50% of the available memory without overcommit. Ideally apps would use vfork or posix_spawn instead, but that isn't the world we live in. Overcommit is a sensible way to help apps that use a lot of memory "just work". You can always turn it off if you don't need apps using a lot of memory to be able to fork.
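
A minimal sketch of what the posix_spawn route looks like (a hypothetical example, with the command and arguments chosen arbitrarily); unlike fork(), it doesn't require the kernel to account for a full copy of a large parent's writable memory just to run a small child:

```c
/* spawn_demo.c: hypothetical sketch that runs a child via posix_spawnp()
 * instead of fork()+execve(), so a large parent's address space never
 * needs to be duplicated (or accounted for twice) just to exec. */
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

extern char **environ;

int main(void) {
    pid_t pid;
    char *argv[] = { "echo", "hello from the child", NULL };

    int err = posix_spawnp(&pid, "echo", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return 1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return 1;
    }
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```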
By @CAP_NET_ADMIN - 6 months
The truth is that if you're having issues with the default overcommit configuration (namely overcommit_memory == 0 and overcommit_ratio == 50), your application probably sucks and you have to actually diagnose it and fix it.

"We run a pretty java-heavy environment, with multiple large JVMs configured per host. The problem is that the heap sizes have been getting larger, and we were running in an overcommitted situation and did not realize it. The JVMs would all start up and malloc() their large heaps, and then at some later time once enough of the heaps were actually used, the OOM killer would kick in and more or less randomly off one of our JVMs."

I know that 2007 may have been different times, but I'd argue that max heaps for all your JVMs running on a system probably shouldn't exceed around 88% of the total system memory. (percentage goes up as the total system memory goes up from 128GB -> 256GB -> 512GB)

By @sylware - 6 months
I have been a Linux user/dev for a while now, and it seems I have had the wrong idea of what memory overcommit is:

For me, on a modern large CPU implementation, it greatly simplifies the userland software stack to be able to reserve huge virtual address spaces in userland, which are populated on page faults. If really needed, I can "release" some ranges of that virtual address space (Linux has madvise/munmap for just that).

I do realize I am working more and more with an 'infinite memory' model (modulo some "released" ranges from time to time).

For instance, I would be more than happy to reserve a few 64 GiB virtual address ranges on my small 8 GiB workstation.

I am talking about anonymous memory, not file-backed mmapping, which is another story.
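
A minimal sketch of the pattern described in this comment, assuming 64-bit Linux: MAP_NORESERVE reserves address space without committing swap, and madvise(MADV_DONTNEED) is one way to hand touched ranges back to the kernel.

```c
/* reserve_demo.c: hypothetical sketch that reserves a large anonymous
 * virtual address range which only consumes physical memory on first
 * touch, then hands a used range back to the kernel. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t reserve = (size_t)64 * 1024 * 1024 * 1024;  /* 64 GiB of address space */
    size_t used    = 16 * 1024 * 1024;                 /* 16 MiB actually touched */

    /* MAP_NORESERVE: don't reserve swap for the mapping; pages are
     * allocated lazily on first write (page fault). */
    char *base = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Only these pages end up backed by physical memory. */
    memset(base, 0xAB, used);

    /* "Release" the range: the address space stays reserved, but the
     * physical pages can be reclaimed and will read back as zeroes. */
    if (madvise(base, used, MADV_DONTNEED) != 0)
        perror("madvise");

    munmap(base, reserve);
    return 0;
}
```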

By @crest - 6 months
The problem with overcommitting beyond the sum of RAM and swap is that userspace processes can't protect themselves against the kernel's lies. You allocate memory, handle the result, access the address space and someone dies. The Linux default behaviour is an insane over-optimisation, but without it forking large processes becomes painfully expensive. The proper solution is to spawn directly through suitable system calls instead of the traditional fork() + execve() way.

IMO a good compromise to this day is to overcommit RAM with swap as a fallback for correctness, even if performance falls off a cliff when you actually use it. All of that is on top of resource limits and usage monitoring.

By @johnea - 6 months
Is any of this still applicable in 2024?

One has to doubt...