July 6th, 2024

Properly Testing Concurrent Data Structures

The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.

Read original articleLink Icon
Properly Testing Concurrent Data Structures

The article discusses testing concurrent data structures, focusing on a Rust library called loom for testing lock-free data structures. It explores testing a concurrent counter implementation for atomicity using property-based testing (PBT) and managed threads. The author demonstrates how to create a property test using managed threads that can pause and unpause, simulating concurrent behavior. The implementation involves coordinating between the main thread and managed threads using thread_local! and Arc, along with mutexes and condition variables for synchronization. The article delves into the challenges of pausing threads between atomic operations and the design considerations for creating a robust testing framework for concurrent data structures. The approach involves creating a shared context for managing thread states and implementing pause and unpause functionalities to control the flow of execution in concurrent scenarios. The article showcases the intricacies of Rust's API for condition variables and provides insights into handling synchronization in concurrent testing environments.

Link Icon 12 comments
By @reitzensteinm - 7 months
I've been building a library in Rust with a very similar approach called Temper [1], although in order to model the bizarre implications of the full Rust memory model you have to go quite a bit further, with bookkeeping tracking which writes each thread has perceived. Based on atomic memory ordering, read and write fences etc, perceiving write X may guarantee that write Y must be perceived.

It contains what I believe to be the largest set of test cases of the C++/Rust memory model, which is essentially everything I could find in books, the C++ standard, Stack Overflow, blogs, etc. E.g. this is the file for Rust Atomics and Locks by Mara Bos [2]

Loom [3], referenced in the article, is a similar and much more complete library, which enables you to exhaustively test higher level constructs like mutexes and queues. Although it doesn't model the memory model as exhaustively as Temper (I've been meaning to port my test cases over to it).

I was inspired by a Foundation DB testing talk by Will Wilson [4], who is now over at Antithesis [5], building a hypervisor based solution to perform this style of testing on arbitrary Docker containers.

I strongly believe we're going to see much more in this domain in the coming decade. WebAssembly is at the perfect intersection of being a platform that's complete enough to be able to compile arbitrary software to, but simple enough that building something like Antithesis isn't a half decade long project from an elite team that's already shipped a database.

[1] https://github.com/reitzensteinm/temper/tree/main [2] https://github.com/reitzensteinm/temper/blob/main/memlog/tes... [3] https://github.com/tokio-rs/loom [4] https://www.youtube.com/watch?v=4fFDFbi3toc [5] https://antithesis.com/

By @kaymanb - 7 months
Very cool! I implemented some shared-memory atomic snapshots in Rust [0] and also did my best to take automated testing very seriously. I started out using loom [1], the library mentioned in the article, but latter switched to shuttle [2].

Shuttle's approach is to be randomized, instead of exhaustive like loom. However, the scheduler does still give probabilistic guarantees about finding bugs. I found that shuttle was faster and scaled to more complicated test scenarios.

Similar to the article, shuttle also lets you save the random seed if a particular schedule causes the test suite to fail. Being able to quickly reproduce failing tests is really important and enables you write explicit test cases for bugs that were previously caught and fixed [3].

[0] https://github.com/kaymanb/todc/tree/main/todc-mem [1] https://github.com/tokio-rs/loom [2] https://github.com/awslabs/shuttle [3] https://github.com/kaymanb/todc/blob/0e2874a70ec8beed8fae773...

By @yafetn - 7 months
JetBrains’ Lincheck[0] is a good library in the Kotlin/Java world for this stuff. I especially like that it’s declarative, and also the way it outputs the linearizability results.

[0]: https://github.com/JetBrains/lincheck

By @idle_cycles - 7 months
This is great, is there a "loom" like library for C++? I have a set of lock-free data structures that I would like to test.
By @o11c - 7 months
If I understand this correctly, this approach has limitations regarding soft forward-progress guarantees.

Consider a cmpxchg loop where the body computation is not quite trivial, but on real hardware (and with real schedulers) is exceedingly unlikely to be interrupted on a given CPU. Thus, if `n` is the number of CPUs, it has a worst-case `1/n` chance of making progress. But with this testing approach, it instead has a `1/t^p` chance, where `t` is the number of tasks (which may be far larger than the number of CPUs) and `p` is the number of pauses (easily at least 3) within the not-quite-trivial loop body; this is sufficient to turn a working algorithm into a broken one.

OTOH, if you do want the tester to detect soft forward-progress as a bug (because you want hard forward-progress), this doesn't seem to provide useful tools either.

(Regardless this is certainly useful for a lot of concurrency problems).

By @ahepp - 7 months
> Well, if I am being honest, there is a bit of up-front knowledge here. I don’t think we can avoid spawning real threads here, unless we do something really cursed with inline assembly. When something calls that pause() function, and we want it to stay paused until further notice, that just has to happen in a thread which maintains a stack separate from the stack of our test.

Is there a reason we couldn't use some kind of async runtime? It seems like you're instrumenting atomic operations to achieve cooperative multitasking. Maybe I need to drink more coffee, but it seems simpler without threads.

By @thrwyexecbrain - 7 months
One downside of this approach is that the tested code itself has to be modified to accomodate the testing code.

I think the same could be achieved by launching two threads and single stepping them with ptrace to "randomly" interleave the execution of their instructions. Something like rr's chaos mode.

Some instructions may not be atomic though, so we would need a way to single step on "atomic microcodes" if that's even possible without emulation?

By @skybrian - 7 months
It looks like you need to use conditional compilation to use Loom, which probably works okay for testing one library, but is pretty intrusive:

    #[cfg(loom)]
    pub(crate) use loom::sync::atomic::AtomicUsize;

    #[cfg(not(loom))]
    pub(crate) use std::sync::atomic::AtomicUsize;
I wonder if there are any languages that do a better job of letting you use your own scheduler?
By @zokier - 7 months
I suppose if you want to be really thorough, you could run the tests with ptrace and single-step the threads to make different interleavings at instruction level. Has anyone seen that sort of thing done? Is there any alternatives for black-box testing, if you are not able to instrument the code like what is done here?
By @a_t48 - 6 months
This is really neat, I’ll have to try it out. It won’t catch all classes of errors though - won’t each call to pause() cause a synchronization between the threads and hide some data race issues? maybe this is a nonissue in Rust.
By @317070 - 7 months
I have been stuck on this exact question a few weeks ago. Does someone know how the same could be done in python? Could a thread class be made that allows this kind of testing? Could it be written in C with e.g. pybind?
By @clearprop - 7 months
I thought Rust was thread-safe and I don't see any "unsafe" blocks. What am I missing?