July 12th, 2024

Atomicless Per-Core Concurrency

The article explores atomicless concurrency for efficient allocator design, moving from per-thread to per-CPU structures on Linux. It shows how to implement CPU-local data structures using restartable sequences and the rseq syscall, and covers the difficulties of writing the critical sections in Rust.

The article discusses atomicless concurrency as a way to build allocators that serve many threads efficiently. It explains the shift from per-thread caches to per-CPU data structures, which reduces contention and keeps atomic operations out of the fast path. The post then shows how to implement CPU-local data structures on modern Linux using restartable sequences and the rseq syscall: registering rseq for each thread, writing critical sections that the kernel restarts if the thread is preempted or migrated, and managing the thread-local state so that registration and cleanup happen correctly. It also covers the difficulty of setting up critical sections in Rust, where inline assembly limits how labels can be referenced. The target audience is anyone tuning concurrency in performance-critical code on Linux.
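To make the per-CPU idea concrete, here is a minimal Rust sketch of a counter sharded by CPU. It is not the article's implementation and does not use rseq. The type `PerCpuCounter` and its methods are hypothetical names chosen for illustration, and the sketch assumes Linux with a libc that exports `sched_getcpu` and a toolchain recent enough to accept `unsafe extern` blocks. Because a thread can migrate between reading its CPU number and updating the slot, the sketch falls back to relaxed atomics to stay correct. A restartable sequence is what would let the fast path drop that atomic.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicU64, Ordering};

unsafe extern "C" {
    // libc wrapper: returns the CPU the calling thread is running on,
    // or -1 on error. Assumed to be available (Linux with glibc or musl).
    fn sched_getcpu() -> c_int;
}

/// One counter per slot, aligned to 64 bytes so that slots used by
/// different CPUs do not share a cache line.
#[repr(align(64))]
struct Slot(AtomicU64);

/// Hypothetical per-CPU sharded counter (illustration only).
pub struct PerCpuCounter {
    slots: Vec<Slot>,
}

impl PerCpuCounter {
    /// `nslots` would normally be the number of CPUs; here it is a parameter.
    pub fn new(nslots: usize) -> Self {
        assert!(nslots > 0);
        Self {
            slots: (0..nslots).map(|_| Slot(AtomicU64::new(0))).collect(),
        }
    }

    /// Map the current CPU to a slot index, falling back to slot 0 on error.
    fn current_slot(&self) -> usize {
        let cpu = unsafe { sched_getcpu() };
        let cpu = if cpu < 0 { 0 } else { cpu as usize };
        cpu % self.slots.len()
    }

    /// Add to the slot of the CPU we were on when we asked. The thread may
    /// have migrated since, so an atomic add keeps the update correct;
    /// an rseq critical section would make a plain store safe instead.
    pub fn add(&self, n: u64) {
        let i = self.current_slot();
        self.slots[i].0.fetch_add(n, Ordering::Relaxed);
    }

    /// Sum all slots. Exact once all writers have finished.
    pub fn sum(&self) -> u64 {
        self.slots.iter().map(|s| s.0.load(Ordering::Relaxed)).sum()
    }
}
```

Threads running on different cores mostly touch different cache lines, so the atomic adds are rarely contended. Calling `add(1)` from several threads and then `sum()` after they have all joined returns the total number of calls.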

2 comments
By @jiehong - 4 months
The article seems to use “CPU” to mean “CPU core”. This isn’t about multisocket systems.

HN title is more accurate!

By @tithos - 4 months
Your mini map is amazing. I'm stealing it.