Malloc() and free() are a bad API (2022)
The post delves into the limitations of malloc() and free() in C, proposing a new interface with allocate(), deallocate(), and try_expand(). It discusses C++ improvements and emphasizes the significance of a robust API.
The blog post discusses the limitations and issues with the malloc() and free() functions in C for memory allocation. It highlights problems such as lack of custom alignment support, metadata storage inefficiencies, wasted space, and limitations of realloc(). The post proposes a new interface with functions like allocate(), deallocate(), and try_expand() to address these shortcomings. It also mentions improvements in C++ standards like aligned allocation and sized deallocation. The author concludes by emphasizing the importance of a good API that provides necessary information and returns useful data, noting that C++23 has made strides in addressing these issues. The post reflects on the evolution of memory allocation interfaces and the advancements made in modern languages like Rust.
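The post only names the proposed functions (allocate(), deallocate(), try_expand()), so the signatures below are assumptions. This is a minimal sketch of the shape of such an interface, layered on top of the standard library: the caller supplies size and alignment on both allocation and deallocation, and in-place growth is a separate operation that is allowed to fail without moving the block.

```c
/* Sketch of the proposed interface, layered on malloc-family functions.
 * Signatures are illustrative assumptions, not the post's exact API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Caller states size AND alignment up front, unlike malloc(). */
static void *allocate(size_t size, size_t alignment) {
    /* aligned_alloc (C11) requires size to be a multiple of alignment. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

/* Caller hands the size back, so the allocator need not store it. */
static void deallocate(void *ptr, size_t size, size_t alignment) {
    (void)size;
    (void)alignment; /* a real allocator would use these to find the block */
    free(ptr);
}

/* Grow in place or fail -- never move the block behind the caller's back.
 * malloc gives us no portable way to do this, so the stub always fails;
 * the caller then allocates a new block and copies, exactly as it would
 * after a failed try_expand in the proposed API. */
static bool try_expand(void *ptr, size_t old_size, size_t new_size) {
    (void)ptr;
    (void)old_size;
    (void)new_size;
    return false;
}
```

The key contrast with realloc() is that a failed try_expand leaves the original block untouched, so the caller keeps control over when and whether data is moved.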
Related
How much memory does a call to 'malloc' allocate?
The malloc function in C allocates memory on the heap. Allocating 1 byte incurs an 8-byte overhead. Memory alignment may result in 16-24 bytes. Avoid small allocations for efficiency; consider realloc for extensions.
How much memory does a call to 'malloc' allocates? – Daniel Lemire's blog
The malloc function in C allocates memory on the heap. Allocating 1 byte may result in 16-24 bytes due to overhead. Avoid small allocations and focus on broader concepts for efficient memory management.
Learning C++ Memory Model from a Distributed System's Perspective (2021)
The article explores the C++ memory model in distributed systems, emphasizing std::memory_order for synchronization. It covers happens-before relationships, release-acquire ordering, and memory_order_seq_cst for total ordering and synchronization across threads.
Some Tricks from the Scrapscript Compiler
The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.
Atomicless Per-Core Concurrency
The article explores atomicless concurrency for efficient allocator design, transitioning from per-thread to per-CPU structures on Linux. It details implementing CPU-local data structures using restartable sequences and the rseq syscall, addressing challenges in Rust.
The cost of storing the size of every allocation is relatively high, at least when the size isn't already implied by the usage. Meanwhile, the allocator's caching system can store it very efficiently: a 4KB block of 8-byte allocations will contain over 500 allocations that can all share their metadata. Once they're handed out by the allocator their shared origin is obscured, so they'd need individual tracking.
I do acknowledge that when the size is inherent to the context (new, or allocating for a specific struct), an allocator that doesn't track size could allow for some clever optimisations, though I'm doubtful it could overcome the loss of shared metadata, which is so much more efficient.
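The shared-metadata point above can be made concrete with a toy slab: one header describes every 8-byte slot in a 4 KB page, so each allocation costs a single bit of bookkeeping plus one shared size field. The layout and names here are illustrative, not any particular allocator's design.

```c
/* Toy slab: one shared header covers all slots of a 4 KB page, so
 * per-allocation metadata is ~1 bit instead of a per-block size word. */
#include <stdint.h>
#include <string.h>

#define SLAB_BYTES 4096
#define SLOT_BYTES 8
#define SLOT_COUNT (SLAB_BYTES / SLOT_BYTES) /* 512 slots */

typedef struct {
    uint16_t slot_size;               /* shared by all 512 allocations */
    uint16_t used;                    /* live-slot count */
    uint64_t bitmap[SLOT_COUNT / 64]; /* 1 bit of bookkeeping per slot */
    unsigned char data[SLAB_BYTES];
} slab_t;

static void slab_init(slab_t *s) {
    memset(s, 0, sizeof *s);
    s->slot_size = SLOT_BYTES;
}

/* Returns the first free slot index, or -1 if the slab is full.
 * The slot's address would be &s->data[i * SLOT_BYTES]. */
static int slab_alloc(slab_t *s) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!(s->bitmap[i / 64] & (1ULL << (i % 64)))) {
            s->bitmap[i / 64] |= 1ULL << (i % 64);
            s->used++;
            return i;
        }
    }
    return -1;
}

static void slab_free(slab_t *s, int slot) {
    s->bitmap[slot / 64] &= ~(1ULL << (slot % 64));
    s->used--;
}
```

Note that freeing a slot only works because the slab can be found from the pointer (e.g. by masking off the low address bits); once a pointer leaves the allocator, that shared origin is exactly what a size-taking free() would have to rediscover.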
But I concur that realloc is mostly pointless. For code that wants to grow or shrink a buffer, I think it's much better for it to know the block size itself. There's very little opportunity to happen to have free memory next to your allocation that can be "grown into", at least with slab-like allocators, so the growing room is minimal.
It's a bit difficult to unify all APIs because data will be needlessly passed around, when in most cases you don't care. Aligned allocation may also need a slightly different implementation anyway.
realloc and calloc are warts in my book...
I remember one of the Windows allocation functions doing this, but I believe that behavior was eliminated because it led to crashes in old applications that didn't handle it correctly.
That is the danger with, say, adding a length value to free: sometimes an off-by-one value will work fine, until someone tweaks how the allocator works.
The sorts of things the author wants are indeed valuable and important, but also belong at a higher level of abstraction. The malloc() subsystem would even be a reasonable base to implement that on top of.
func alloc(numBytes: usize) -> (ptr: void *, cookie: uword) | Error
func free(cookie: uword, numBytes: usize) -> void
where free() maybe also should take ptr, strictly for validation purposes.
A design like this encourages segregation of allocator metadata and the allocated memory, though it is possible to achieve such a design with the classic C malloc/free API.
However, a design like this is even more helpful against use-after-free because cookies can be unique for the lifetime of a program, whereas pointers naturally get reused when a block of memory is reallocated. So the traditional API can never be fully resilient against UAF, whereas an API like this can.
The underlying observation here is that malloc/free couples two different things (access to memory and identifying a previously made allocation) in a way that creates an API which is far less able to mitigate misuse in a safe way. IMO, these functions should be separated in new designs.
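A minimal sketch of the cookie idea described above, assuming a fixed-capacity table for brevity: allocation hands back a monotonically increasing integer handle alongside the pointer, and deallocation validates the handle. Because cookies are never reused, a stale cookie is detected even after the underlying address has been recycled (all names here are hypothetical).

```c
/* Sketch: allocation identity (the cookie) is decoupled from memory
 * access (the pointer). Cookies increase monotonically and are never
 * reused, so double-free and use-after-free via a stale cookie are
 * detectable errors rather than undefined behavior. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 1024

static void *live_ptr[MAX_LIVE];     /* NULL = table slot free */
static size_t live_cookie[MAX_LIVE];
static size_t next_cookie = 1;       /* never decremented, never reused */

static bool cookie_alloc(size_t n, void **ptr, size_t *cookie) {
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live_ptr[i] == NULL) {
            live_ptr[i] = malloc(n);
            if (!live_ptr[i])
                return false;
            live_cookie[i] = next_cookie++;
            *ptr = live_ptr[i];
            *cookie = live_cookie[i];
            return true;
        }
    }
    return false; /* table full */
}

static bool cookie_free(size_t cookie) {
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live_ptr[i] != NULL && live_cookie[i] == cookie) {
            free(live_ptr[i]);
            live_ptr[i] = NULL;
            return true;
        }
    }
    return false; /* stale or bogus cookie: caught, not UB */
}
```

A production version would index the table by cookie bits instead of scanning, but the property of interest survives: even if malloc hands a later allocation the same address, its cookie differs, so the old handle can never free the new block.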
There is now a variant of free which takes a size: free_sized, introduced in the C23 draft.
There is aligned_alloc, evidently since C11.
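For reference, aligned_alloc has been standard since C11, with one catch the comments above touch on: the requested size must be a multiple of the alignment. A small wrapper handling that requirement (the function name here is made up for illustration; free_sized is omitted because most libcs do not ship C23 yet):

```c
#include <stdlib.h>

/* Allocate n bytes aligned to a 64-byte boundary (a typical cache-line
 * size). aligned_alloc (C11) requires the size to be a multiple of the
 * alignment, so round n up first. */
static void *alloc_cacheline(size_t n) {
    const size_t a = 64;
    size_t rounded = (n + a - 1) / a * a;
    return aligned_alloc(a, rounded);
}
```

The returned pointer is released with plain free(), which is exactly the asymmetry the article complains about: the caller knew the size and alignment at both ends, but the classic API only accepts them on one.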
The article should be called: ANSI C89 memory allocation sucks, and I'm forever upset.