Malloc() and free() are a bad API (2022)
The post delves into the limitations of malloc() and free() in C, proposing a new interface with allocate(), deallocate(), and try_expand(). It discusses C++ improvements and emphasizes the significance of a robust API.
The blog post discusses the limitations and issues with the malloc() and free() functions in C for memory allocation. It highlights problems such as lack of custom alignment support, metadata storage inefficiencies, wasted space, and limitations of realloc(). The post proposes a new interface with functions like allocate(), deallocate(), and try_expand() to address these shortcomings. It also mentions improvements in C++ standards like aligned allocation and sized deallocation. The author concludes by emphasizing the importance of a good API that provides necessary information and returns useful data, noting that C++23 has made strides in addressing these issues. The post reflects on the evolution of memory allocation interfaces and the advancements made in modern languages like Rust.
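The post only names the proposed functions (allocate(), deallocate(), try_expand()), so the signatures below are assumptions. This is a minimal sketch of the shape of such an interface, layered on top of the standard library: the caller supplies size and alignment on both allocation and deallocation, and in-place growth is a separate operation that is allowed to fail without moving the block.

```c
/* Sketch of the proposed interface, layered on malloc-family functions.
 * Signatures are illustrative assumptions, not the post's exact API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Caller states size AND alignment up front, unlike malloc(). */
static void *allocate(size_t size, size_t alignment) {
    /* aligned_alloc (C11) requires size to be a multiple of alignment. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

/* Caller hands the size back, so the allocator need not store it. */
static void deallocate(void *ptr, size_t size, size_t alignment) {
    (void)size;
    (void)alignment; /* a real allocator would use these to find the block */
    free(ptr);
}

/* Grow in place or fail -- never move the block behind the caller's back.
 * malloc gives us no portable way to do this, so the stub always fails;
 * the caller then allocates a new block and copies, exactly as it would
 * after a failed try_expand in the proposed API. */
static bool try_expand(void *ptr, size_t old_size, size_t new_size) {
    (void)ptr;
    (void)old_size;
    (void)new_size;
    return false;
}
```

The key contrast with realloc() is that a failed try_expand leaves the original block untouched, so the caller keeps control over when and whether data is moved.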
Related
How much memory does a call to 'malloc' allocate?
The malloc function in C allocates memory on the heap. Allocating 1 byte incurs an 8-byte overhead. Memory alignment may result in 16-24 bytes. Avoid small allocations for efficiency; consider realloc for extensions.
How much memory does a call to 'malloc' allocates? – Daniel Lemire's blog
The malloc function in C allocates memory on the heap. Allocating 1 byte may result in 16-24 bytes due to overhead. Avoid small allocations and focus on broader concepts for efficient memory management.
Learning C++ Memory Model from a Distributed System's Perspective (2021)
The article explores the C++ memory model in distributed systems, emphasizing std::memory_order for synchronization. It covers happens-before relationships, release-acquire ordering, and memory_order_seq_cst for total ordering and synchronization across threads.
Some Tricks from the Scrapscript Compiler
The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.
Atomicless Per-Core Concurrency
The article explores atomicless concurrency for efficient allocator design, transitioning from per-thread to per-CPU structures on Linux. It details implementing CPU-local data structures using restartable sequences and the rseq syscall, addressing challenges in Rust.
The cost of storing the size of every allocation is relatively high, at least when the size isn't already implied by the usage. Meanwhile, the allocator's caching system can store it very efficiently: a 4KB block of 8-byte allocations will contain over 500 allocations that can all share their metadata. Once they're handed out by the allocator their shared origin is obscured, so they'd need individual tracking.
I do acknowledge that when the size is inherent to the context (new, or allocating for a specific struct), an allocator that doesn't track size could allow for some clever optimisations, though I'm doubtful it could overcome the loss of shared metadata, which is so much more efficient.
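The shared-metadata point above can be made concrete with a toy slab: one header describes every 8-byte slot in a 4 KB page, so each allocation costs a single bit of bookkeeping plus one shared size field. The layout and names here are illustrative, not any particular allocator's design.

```c
/* Toy slab: one shared header covers all slots of a 4 KB page, so
 * per-allocation metadata is ~1 bit instead of a per-block size word. */
#include <stdint.h>
#include <string.h>

#define SLAB_BYTES 4096
#define SLOT_BYTES 8
#define SLOT_COUNT (SLAB_BYTES / SLOT_BYTES) /* 512 slots */

typedef struct {
    uint16_t slot_size;               /* shared by all 512 allocations */
    uint16_t used;                    /* live-slot count */
    uint64_t bitmap[SLOT_COUNT / 64]; /* 1 bit of bookkeeping per slot */
    unsigned char data[SLAB_BYTES];
} slab_t;

static void slab_init(slab_t *s) {
    memset(s, 0, sizeof *s);
    s->slot_size = SLOT_BYTES;
}

/* Returns the first free slot index, or -1 if the slab is full.
 * The slot's address would be &s->data[i * SLOT_BYTES]. */
static int slab_alloc(slab_t *s) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!(s->bitmap[i / 64] & (1ULL << (i % 64)))) {
            s->bitmap[i / 64] |= 1ULL << (i % 64);
            s->used++;
            return i;
        }
    }
    return -1;
}

static void slab_free(slab_t *s, int slot) {
    s->bitmap[slot / 64] &= ~(1ULL << (slot % 64));
    s->used--;
}
```

Note that freeing a slot only works because the slab can be found from the pointer (e.g. by masking off the low address bits); once a pointer leaves the allocator, that shared origin is exactly what a size-taking free() would have to rediscover.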
But I concur that realloc is mostly pointless. For code that wants to grow or shrink a buffer, I think it's much better for it to know the block size itself. There's very little opportunity to happen to have free memory next to your allocation that can be "grown into", at least with slab-like allocators, so the growing room is minimal.
It's a bit difficult to unify all APIs because data will be needlessly passed around, when in most cases you don't care. Aligned allocation may also need a slightly different implementation anyway.
realloc and calloc are warts in my book...
I remember one of the Windows allocation functions doing this, but I believe that behavior was eliminated because it led to crashes in old applications that didn't handle it correctly.
That is the danger with, say, adding a length value to free: sometimes an off-by-one value will work fine, until someone tweaks how the allocator works.
The sorts of things the author wants are indeed valuable and important, but also belong at a higher level of abstraction. The malloc() subsystem would even be a reasonable base to implement that on top of.
func alloc(numBytes: usize) -> (ptr: void *, cookie: uword) | Error
func free(cookie: uword, numBytes: usize) -> void
where free() maybe also should take ptr, strictly for validation purposes.
A design like this encourages segregation of allocator metadata and the allocated memory, though it is possible to achieve such a design with the classic C malloc/free API.
However, a design like this is even more helpful against use-after-free because cookies can be unique for the lifetime of a program, whereas pointers naturally get reused when a block of memory is reallocated. So the traditional API can never be fully resilient against UAF, whereas an API like this can.
The underlying observation here is that malloc/free couples two different things (access to memory and identifying a previously made allocation) in a way that creates an API which is far less able to mitigate misuse in a safe way. IMO, these functions should be separated in new designs.
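A minimal sketch of the cookie idea described above, assuming a fixed-capacity table for brevity: allocation hands back a monotonically increasing integer handle alongside the pointer, and deallocation validates the handle. Because cookies are never reused, a stale cookie is detected even after the underlying address has been recycled (all names here are hypothetical).

```c
/* Sketch: allocation identity (the cookie) is decoupled from memory
 * access (the pointer). Cookies increase monotonically and are never
 * reused, so double-free and use-after-free via a stale cookie are
 * detectable errors rather than undefined behavior. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 1024

static void *live_ptr[MAX_LIVE];     /* NULL = table slot free */
static size_t live_cookie[MAX_LIVE];
static size_t next_cookie = 1;       /* never decremented, never reused */

static bool cookie_alloc(size_t n, void **ptr, size_t *cookie) {
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live_ptr[i] == NULL) {
            live_ptr[i] = malloc(n);
            if (!live_ptr[i])
                return false;
            live_cookie[i] = next_cookie++;
            *ptr = live_ptr[i];
            *cookie = live_cookie[i];
            return true;
        }
    }
    return false; /* table full */
}

static bool cookie_free(size_t cookie) {
    for (int i = 0; i < MAX_LIVE; i++) {
        if (live_ptr[i] != NULL && live_cookie[i] == cookie) {
            free(live_ptr[i]);
            live_ptr[i] = NULL;
            return true;
        }
    }
    return false; /* stale or bogus cookie: caught, not UB */
}
```

A production version would index the table by cookie bits instead of scanning, but the property of interest survives: even if malloc hands a later allocation the same address, its cookie differs, so the old handle can never free the new block.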
There is now a variant of free which takes a size: free_sized, introduced in the C23 draft.
There is aligned_alloc, evidently since C11.
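For reference, aligned_alloc has been standard since C11, with one catch the comments above touch on: the requested size must be a multiple of the alignment. A small wrapper handling that requirement (the function name here is made up for illustration; free_sized is omitted because most libcs do not ship C23 yet):

```c
#include <stdlib.h>

/* Allocate n bytes aligned to a 64-byte boundary (a typical cache-line
 * size). aligned_alloc (C11) requires the size to be a multiple of the
 * alignment, so round n up first. */
static void *alloc_cacheline(size_t n) {
    const size_t a = 64;
    size_t rounded = (n + a - 1) / a * a;
    return aligned_alloc(a, rounded);
}
```

The returned pointer is released with plain free(), which is exactly the asymmetry the article complains about: the caller knew the size and alignment at both ends, but the classic API only accepts them on one.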
The article should be called: ANSI C89 memory allocation sucks, and I'm forever upset.