Mimalloc Cigarette: Losing one week of my life catching a memory leak (Rust)
The article details a memory leak issue in a pricing engine using mimalloc, revealing that its internal bookkeeping caused memory retention. Restructuring to a single-threaded approach improved memory management.
The article discusses the challenges faced while debugging a memory leak in a RAM-bound pricing engine application that utilizes the mimalloc memory allocator. The author describes the technical complexities involved in managing hotel data and the unexpected out-of-memory (OOM) errors that arose despite the dataset fitting comfortably in memory. The investigation revealed that the choice of memory allocator significantly impacted the program's memory characteristics. While mimalloc is designed for performance, it led to excessive memory allocation during data refresh operations. The author spent considerable time analyzing the code, suspecting issues with the Rust implementation rather than the allocator itself. Ultimately, the problem was traced back to mimalloc's internal bookkeeping, which failed to release memory when threads went to sleep. The solution involved restructuring the program to keep all refreshing operations on a single thread, thereby ensuring proper memory management. The experience was both frustrating and enlightening, highlighting the importance of understanding allocator behavior in performance-sensitive applications.
- The author faced a memory leak issue in a pricing engine application using mimalloc.
- The choice of memory allocator significantly affected the application's memory usage.
- Debugging revealed that mimalloc's bookkeeping could lead to memory not being released when threads were inactive.
- The solution involved restructuring the code to manage memory refresh operations on a single thread.
- The experience underscored the complexities of memory management in performance-critical applications.
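The "keep all refreshing on a single thread" fix can be sketched roughly as follows. This is not the author's code: `Dataset`, `Command`, `build_dataset`, and `spawn_refresher` are hypothetical names, and the sketch uses only the standard library, with no allocator configured. The idea is that one long-lived thread owns the dataset, so each refresh allocates the new data and drops the old data on that same thread.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the pricing dataset.
type Dataset = Vec<u64>;

/// Messages understood by the dedicated refresh thread (hypothetical).
enum Command {
    /// Rebuild the dataset from a seed value.
    Refresh(u64),
    /// Ask for the sum of the current dataset.
    Sum(mpsc::Sender<u64>),
    /// Stop the thread.
    Shutdown,
}

/// Hypothetical "load fresh data" step.
fn build_dataset(seed: u64) -> Dataset {
    (0..1_000).map(|i| seed + i).collect()
}

/// Spawns the single thread that owns the dataset.
/// The join handle yields the number of refreshes performed.
fn spawn_refresher() -> (mpsc::Sender<Command>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<Command>();
    let handle = thread::spawn(move || {
        let mut data: Dataset = Vec::new();
        let mut refreshes = 0;
        for cmd in rx {
            match cmd {
                Command::Refresh(seed) => {
                    // The old dataset is dropped here, on the same
                    // thread that allocated it.
                    data = build_dataset(seed);
                    refreshes += 1;
                }
                Command::Sum(reply) => {
                    let _ = reply.send(data.iter().sum());
                }
                Command::Shutdown => break,
            }
        }
        refreshes
    });
    (tx, handle)
}

/// Drives two refreshes and one query, then shuts the thread down.
fn run_demo() -> (u64, usize) {
    let (tx, handle) = spawn_refresher();
    tx.send(Command::Refresh(1)).unwrap();
    tx.send(Command::Refresh(10)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Command::Sum(reply_tx)).unwrap();
    let sum = reply_rx.recv().unwrap();
    tx.send(Command::Shutdown).unwrap();
    let refreshes = handle.join().unwrap();
    (sum, refreshes)
}

fn main() {
    let (sum, refreshes) = run_demo();
    println!("sum = {sum}, refreshes = {refreshes}");
}
```

Other threads only ever send commands and receive small replies, so the bulk of the allocation and deallocation traffic stays with one thread's view of the allocator.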
Related
Malloc() and free() are a bad API (2022)
The post delves into malloc() and free() limitations in C, proposing a new interface with allocate(), deallocate(), and try_expand(). It discusses C++ improvements and emphasizes the significance of a robust API.
Debugging an evil Go runtime bug: From heat guns to kernel compiler flags
Encountered crashes in node_exporter on laptop traced to single bad RAM bit. Importance of ECC RAM for server reliability emphasized. Bad RAM block marked, GRUB 2 feature used. Heating RAM tested for stress behavior.
The Process That Kept Dying: A memory leak murder mystery (node)
An investigation into a recurring 502 Bad Gateway error on a crowdfunding site revealed a memory leak caused by Moment.js. Updating the library resolved the issue, highlighting debugging challenges.
Phantom Menance: memory leak that wasn't there
The author's investigation into a perceived memory leak in a Rust application revealed it was a misunderstanding of misleading Grafana metrics, emphasizing the importance of accurate metric calculation in debugging.
Linux Memory Overcommit (2007)
Linux's memory overcommit behavior can cause application crashes due to delayed memory access. Adjusting `vm.overcommit_memory` and `vm.overcommit_ratio` settings can improve management and prevent unexpected terminations.
After almost six months, I finally found a spot where I could monkey patch a function to wrap it with a short circuit if the coordinates were out of bounds. Not only fixed the bug but made drag and drop several times faster. Couldn’t share this with the world because they weren’t accepting PRs against the old widgets.
I’ve worked harder on bug fixes, but I think that’s the longest I’ve worked on one.
Level 2 systems programmer: "oh no, my memory allocator is a garbage collector"
(https://github.com/jemalloc/jemalloc/issues/1317 Unlike what the title says, it's not Windows-specific.)
(*): The application uses libc malloc normally, but at some places it allocates pages using `mmap(non_anonymous_tempfile)` and then uses jemalloc to partition them. jemalloc has a feature called "extent hooks" where you can customize how jemalloc gets underlying pages for its allocations, which we use to give it pages via such mmap's. Then the higher layers of the code that just want to allocate don't have to care whether those allocations came from libc malloc or mmap-backed disk file.
If there were 20 million rooms in the world with a price for each day of the year, we’d be looking at around 7 billion prices per year. That’d be, say, 4 TB of storage without indexes.
The problem space seems to have a bunch of options to partition - by locality, by date etc.
I’m curious if there’s a commonly understood match for this problem?
FWIW with that dataset size, my first experiments would be with SQL Server because that data will fit in RAM. I don’t know if that’s where I’d end up - but I’m pretty sure it’s where I’d start my performance testing grappling with this problem.
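A quick back-of-envelope check of the figures in this comment, as a sketch: the room count and the one-price-per-day assumption come from the comment, while reading "4 TB" as 4 × 10^12 bytes is an assumption made here.

```rust
/// Rooms worldwide, per the comment's hypothetical.
const ROOMS: u64 = 20_000_000;
/// One price per room per day of the year.
const DAYS: u64 = 365;
/// "4 TB" read as decimal terabytes (an assumption).
const STORAGE_BYTES: u64 = 4_000_000_000_000;

/// Total number of prices per year.
fn total_prices() -> u64 {
    ROOMS * DAYS
}

/// Storage budget per price implied by the estimate (integer division).
fn bytes_per_price() -> u64 {
    STORAGE_BYTES / total_prices()
}

fn main() {
    println!("prices per year: {}", total_prices());
    println!("implied bytes per price: {}", bytes_per_price());
}
```

The product is 7.3 billion prices, which matches the "around 7 billion" figure, and the storage estimate works out to a budget of a few hundred bytes per price.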
[1]: https://github.com/microsoft/mimalloc/blob/dev/src/heap.c#L1...
“C programmers think memory management is too important to be left to the computer. LISP programmers think memory management is too important to be left to the user.”
The concept of memory that is allocated by a thread and can only be deallocated by that thread is useful and valid, but as TFA demonstrates, it can also cause problems if you're not careful with your overall architecture. Even if the language you're using allows you to use this concept, it almost certainly will not protect you from having to get the architecture correct.
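One way to respect that constraint at the architecture level is to hand buffers back to the thread that allocated them instead of dropping them wherever they end up. The following is a minimal sketch under that assumption, using only the standard library; `Tagged` and `round_trip` are hypothetical names, and it only covers the payload buffers, not the channel's own internal allocations.

```rust
use std::sync::mpsc;
use std::thread::{self, ThreadId};

/// A buffer tagged with the thread that allocated it (hypothetical type).
struct Tagged {
    bytes: Vec<u8>,
    owner: ThreadId,
}

/// Allocates `n` buffers on a producer thread, lends them to the calling
/// thread, and frees them on the producer once they are handed back.
/// Returns (total bytes seen by the consumer, buffers freed by the owner).
fn round_trip(n: usize) -> (usize, usize) {
    let (lend_tx, lend_rx) = mpsc::channel::<Tagged>();
    let (return_tx, return_rx) = mpsc::channel::<Tagged>();

    let producer = thread::spawn(move || {
        let me = thread::current().id();
        for _ in 0..n {
            let buf = Tagged { bytes: vec![0u8; 1024], owner: me };
            lend_tx.send(buf).unwrap();
        }
        // Closing the lending channel lets the consumer's loop finish.
        drop(lend_tx);

        let mut freed_here = 0;
        for buf in return_rx {
            // Every returned buffer was allocated by this thread.
            assert_eq!(buf.owner, me);
            drop(buf); // deallocated on the allocating thread
            freed_here += 1;
        }
        freed_here
    });

    let mut seen = 0;
    for buf in lend_rx {
        seen += buf.bytes.len();
        // Hand the buffer back rather than dropping it on this thread.
        return_tx.send(buf).unwrap();
    }
    drop(return_tx);

    let freed = producer.join().unwrap();
    (seen, freed)
}

fn main() {
    let (seen, freed) = round_trip(3);
    println!("consumer saw {seen} bytes; owner freed {freed} buffers");
}
```

Nothing in the type system enforces the hand-back here; a consumer could still drop a `Tagged` itself, which is the commenter's point about the language not getting the architecture right for you.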
Interestingly, it would seem that Java programmers play with garbage collectors while Rust programmers play with memory allocators.