August 21st, 2024

Mimalloc Cigarette: Losing one week of my life catching a memory leak (Rust)

The article details a memory leak issue in a pricing engine using mimalloc, revealing that its internal bookkeeping caused memory retention. Restructuring to a single-threaded approach improved memory management.

The article discusses the challenges of debugging a memory leak in a RAM-bound pricing engine that uses the mimalloc memory allocator. The author describes the technical complexities of managing hotel pricing data and the unexpected out-of-memory (OOM) errors that arose despite the dataset fitting comfortably in memory. The investigation revealed that the choice of memory allocator significantly shaped the program's memory behavior: while mimalloc is designed for performance, it led to excessive memory retention during data refresh operations. The author spent considerable time analyzing the code, suspecting issues with the Rust implementation rather than the allocator itself. Ultimately, the problem was traced to mimalloc's internal bookkeeping, which deferred releasing memory owned by threads that had gone to sleep. The solution was to restructure the program so that all refresh work ran on a single thread, ensuring allocations and frees happened on the same heap. The experience was both frustrating and enlightening, highlighting the importance of understanding allocator behavior in performance-sensitive applications.
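
As a rough sketch (not the article's actual code), this is roughly the shape being described, with the real `mimalloc` crate installed as the global allocator; `refresh_dataset` and the dataset type are invented for illustration:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

// Invented stand-in for "rebuild the dataset and swap it in"; the old
// snapshot is dropped (freed) by whichever thread performs the swap.
fn refresh_dataset(data: &RwLock<Vec<u8>>) {
    let fresh = vec![0u8; 256 * 1024 * 1024];
    *data.write().unwrap() = fresh;
}

fn main() {
    let data = Arc::new(RwLock::new(Vec::new()));

    // Problematic shape: every refresh runs on a new thread, so the old
    // snapshot is always freed by a different thread than the one that
    // allocated it. mimalloc only queues such frees for delayed reclaim by
    // the owning thread, so resident memory can keep growing.
    for _ in 0..3 {
        let d = Arc::clone(&data);
        thread::spawn(move || refresh_dataset(&d)).join().unwrap();
    }

    // Shape of the fix described in the article: keep all refresh work on one
    // long-lived thread, so each allocation is freed by the thread that made it.
    let d = Arc::clone(&data);
    thread::spawn(move || {
        for _ in 0..3 {
            refresh_dataset(&d);
        }
    })
    .join()
    .unwrap();
}
```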

- The author faced a memory leak issue in a pricing engine application using mimalloc.

- The choice of memory allocator significantly affected the application's memory usage.

- Debugging revealed that mimalloc's bookkeeping could lead to memory not being released when threads were inactive.

- The solution involved restructuring the code to manage memory refresh operations on a single thread.

- The experience underscored the complexities of memory management in performance-critical applications.

16 comments
By @hinkley - 6 months
We had learned helplessness on a drag and drop bug in jQuery UI. I had like three hours every second or third Friday and would just step through the code trying to find the bug. That code was so sketchy the jQuery team was trying to rewrite it from scratch one component at a time, and wouldn’t entertain any bug discussions on the old code even though they were a year behind already.

After almost six months, I finally found a spot where I could monkey-patch a function to wrap it with a short circuit if the coordinates were out of bounds. Not only did it fix the bug, it made drag and drop several times faster. Couldn’t share this with the world because they weren’t accepting PRs against the old widgets.

I’ve worked harder on bug fixes, but I think that’s the longest I’ve worked on one.

By @kibwen - 6 months
Level 1 systems programmer: "wow, it feels so nice having control over my memory and getting out from under the thumb of a garbage collector"

Level 2 systems programmer: "oh no, my memory allocator is a garbage collector"

By @Arnavion - 6 months
jemalloc also has its own funny problem with threads - if you have a multi-threaded application that uses jemalloc on all threads except the main thread, then the cleanup that jemalloc runs on main thread exit will segfault. In $dayjob we use jemalloc as a sub-allocator in specific arenas. (*) The application itself is fine in production because it allocates from the main thread too, but the unit test framework only runs tests in spawned threads and the main thread of the test binary just orchestrates them. So the test binary triggers this segfault reliably.

(https://github.com/jemalloc/jemalloc/issues/1317; unlike what the title says, it's not Windows-specific.)

(*): The application uses libc malloc normally, but at some places it allocates pages using `mmap(non_anonymous_tempfile)` and then uses jemalloc to partition them. jemalloc has a feature called "extent hooks" where you can customize how jemalloc gets underlying pages for its allocations, which we use to give it pages via such mmaps. Then the higher layers of the code that just want to allocate don't have to care whether those allocations came from libc malloc or an mmap-backed disk file.
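
A minimal sketch of the mmap half of that arrangement in Rust (the path, sizes, and function name are invented, it uses the libc crate, and the actual extent-hook registration that hands these pages to jemalloc is deliberately omitted):

```rust
use std::fs::OpenOptions;
use std::os::unix::io::AsRawFd;
use std::ptr;

// Map a non-anonymous temp file with MAP_SHARED so the pages are disk-backed,
// then let a sub-allocator carve them up. This only shows where the pages
// would come from, not how jemalloc is told to use them.
fn map_tempfile_pages(path: &str, len: usize) -> *mut u8 {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)
        .expect("open temp file");
    file.set_len(len as u64).expect("size temp file");

    let addr = unsafe {
        libc::mmap(
            ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED,
            file.as_raw_fd(),
            0,
        )
    };
    assert_ne!(addr, libc::MAP_FAILED);
    addr as *mut u8
}
```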

By @CraigJPerry - 6 months
Tangent: what’s the ideal data structure for this problem?

If there were 20 million rooms in the world with a price for each day of the year, we’d be looking at around 7 billion prices per year. That’d be, say, 4TB of storage without indexes.

The problem space seems to have a bunch of options to partition - by locality, by date etc.

I’m curious if there’s a commonly understood match for this problem?

FWIW, with that dataset size, my first experiments would be with SQL server, because that data will fit in RAM. I don’t know if that’s where I’d end up, but I’m pretty sure it’s where I’d start my performance testing when grappling with this problem.
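
Not from the thread, but as one hypothetical point of reference for the arithmetic: a dense flat array with one integer price per room per day (fixed-point minor currency units, in the spirit of the no-floats-for-money advice elsewhere in the thread) would look roughly like this; the type and sizes are made up:

```rust
// Hypothetical compact layout: one u32 price in minor currency units per
// room per day. 20,000,000 rooms * 366 days * 4 bytes ≈ 29 GB of raw prices,
// so anything much larger would come from keys, indexes, and per-row overhead.
struct PriceGrid {
    days: usize,      // e.g. 366 slots, one per calendar day
    prices: Vec<u32>, // row-major: prices[room_index * days + day_index]
}

impl PriceGrid {
    fn price(&self, room_index: usize, day_index: usize) -> u32 {
        self.prices[room_index * self.days + day_index]
    }
}
```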

By @loeg - 6 months
Sort of tl;dr: mimalloc doesn't actually free memory in a way that it can be reused on threads other than the one that allocated it; the free call marks regions for eventual delayed reclaim by the original thread. If the original thread calls malloc again, those regions are collected (roughly one in every N malloc calls). Or (C) you can explicitly invoke mi_collect[1] in the allocating thread (the Rust crate does not seem to expose this API).

[1]: https://github.com/microsoft/mimalloc/blob/dev/src/heap.c#L1...
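
If you did want to call it from Rust, a hand-written extern declaration is one way, assuming the mimalloc library is already linked into the binary (for example via libmimalloc-sys building it for the global allocator); the declaration and the surrounding function here are written for illustration, not taken from any crate:

```rust
// mi_collect(force) is part of mimalloc's C API (see the linked heap.c);
// this extern block is hand-written for the sketch.
extern "C" {
    fn mi_collect(force: bool);
}

fn refresh_on_this_thread() {
    // ... allocate and free the refreshed data on this thread ...

    // Reclaim this thread's deferred frees before it goes idle. Per the
    // comment above, this only helps when called on the allocating thread.
    unsafe { mi_collect(true) };
}
```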

By @rurban - 6 months
The Annotated C++ Reference Manual:

“C programmers think memory management is too important to be left to the computer. LISP programmers think memory management is too important to be left to the user.”

By @IceTDrinker - 6 months
PSA: do not use floating point for monetary amounts
By @zokier - 6 months
I wonder if there is something that could be done at the language-design level to have better "sympathy" for memory allocation, i.e. built upon having mmap/munmap as primitives instead of malloc/free, where language patterns are built around allocating pages instead of arbitrarily sized objects. Probably not practical for general high-level languages, but for e.g. embedded or high-performance stuff it might make sense?
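
A toy sketch of that idea (not a real allocator design; Unix-only, uses the libc crate, and ignores alignment for brevity): whole pages come from mmap, objects are bump-allocated out of them, and the only release path is munmap of the whole arena.

```rust
use std::ptr;

struct PageArena {
    base: *mut u8,
    len: usize,
    used: usize,
}

impl PageArena {
    fn new(pages: usize) -> PageArena {
        let len = pages * 4096; // assume 4 KiB pages for the sketch
        let base = unsafe {
            libc::mmap(
                ptr::null_mut(),
                len,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                -1,
                0,
            )
        };
        assert_ne!(base, libc::MAP_FAILED);
        PageArena { base: base as *mut u8, len, used: 0 }
    }

    // Bump-allocate a slice out of the mapped pages (no per-object free).
    fn alloc(&mut self, size: usize) -> *mut u8 {
        assert!(self.used + size <= self.len, "arena exhausted");
        let p = unsafe { self.base.add(self.used) };
        self.used += size;
        p
    }
}

impl Drop for PageArena {
    fn drop(&mut self) {
        // The whole arena goes back to the OS at once: munmap, not free().
        unsafe { libc::munmap(self.base as *mut libc::c_void, self.len) };
    }
}
```
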
By @PaulDavisThe1st - 6 months
A perfect demonstration of how many of the harder problems we face writing (especially non-browser-based) software are in fact not addressed by language changes.

The concept of memory that is allocated by a thread and can only be deallocated by that thread is useful and valid, but as TFA demonstrates, can also cause problems if you're not careful with your overall architecture. If the language you're using even allows you to use this concept, it almost certainly will not protect you from having to get the architecture correct.

By @znpy - 6 months
> Allocators have different characteristics for a reason - they do some things differently between each other. What do you think mimalloc does that could account for this behavior?

Interestingly, it would seem that Java programmers play with garbage collectors while Rust programmers play with memory allocators.

By @malkia - 6 months
By @Exuma - 6 months
I really love the design of this blog
By @bsder - 6 months
Welcome to systems programming. Allocators are invisible--until they aren't.
By @om8 - 6 months
TLDR: use shitty allocators, win shitty memory leaks