Tracing garbage collection for arenas
Tracing garbage collection is compared to reference counting in systems programming languages like C++, Rust, and Ada. A simplified tracing approach inspired by mark-and-sweep is proposed for languages like Zig or Odin.
Tracing garbage collection for arenas is explored in the context of systems programming languages like C++, Rust, and Ada, which tend to rely on reference counting as their form of garbage collection. The article discusses the advantages of reference counting over typical tracing garbage collection, highlighting its deterministic nature and compatibility with C and scope-based resource management. The author proposes a simplified form of tracing garbage collection, inspired by the basic mark-and-sweep algorithm as seen in the Yices2 SMT solver. This approach uses arenas or bump allocators to delay deallocation until convenient, similar to how a tracing GC operates. By introducing a collect function alongside free, data can be copied and compacted within the arena, improving memory locality. The article suggests implementing this strategy in languages like Zig or Odin, where allocators can be passed to functions, enabling manual garbage collection when needed. The proposed system aims to offer a lightweight, optional garbage collection mechanism with minimal overhead, potentially suitable for custom programming languages.
Related
My experience crafting an interpreter with Rust (2021)
Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.
Eight million pixels and counting: improving texture atlas allocation in Firefox (2021)
Improving texture atlas allocation in WebRender with the guillotiere crate reduces texture memory usage. The guillotine algorithm was replaced due to fragmentation issues, leading to a more efficient allocator. Visualizing the atlas in SVG aids debugging. Rust's simplicity and Cargo fuzz testing are praised for code development and robustness. Enhancements in draw call batching and texture upload aim to boost performance on low-end Intel GPUs by optimizing texture atlases.
Reference Counting with Linear Types
The `reference-counting` library in the `ghengin` engine manages resources through `Alias` datatype, supporting alias creation, sharing, and deallocation. It's experimental, functional, and seeks feedback, used effectively in `ghengin`.
Memory Model: The Hard Bits
This chapter explores OCaml's memory model, emphasizing relaxed memory aspects, compiler optimizations, weakly consistent memory, and DRF-SC guarantee. It clarifies data races, memory classifications, and simplifies reasoning for programmers. Examples highlight data race scenarios and atomicity.
Optimizing the Roc parser/compiler with data-oriented design
The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.
Let’s look at the example from the docs:
SWL_GCArena gc = swl_gc_arena_new(<starting-capacity>);
int* foo = swl_gc_alloc(&gc, 5000, alignof(int), NULL);
int* bar = swl_gc_alloc(&gc, 16000, alignof(int), NULL);
int* baz = swl_gc_alloc(&gc, 3000, alignof(int), NULL);
// foo and baz are roots
SWL_GC_COLLECT(&gc, SWL_ROOT(foo), SWL_ROOT(baz));
// bar is gone, poof
If you try to access bar at the end you get a use-after-free. I am pretty sure that if you tried to access foo again it would trigger the same problem, because it is a pointer directly to an int, not an indirection. What happens if you pass a non-arena pointer to SWL_ROOT? What happens if you have multiple arenas and you mix and match them? There are many ways you could trivially shoot yourself in the foot in a large program. It is a lot less "idiot-proof" than simple reference counting. The fact that you create each piece of memory and then have to manually pass it back to the root macro on each collection is very similar to the "malloc/free" dance we want to avoid.
The rooting issue is something I tried to tackle in my GC for rust[1]. Essentially making low level GC "idiot proof".
[1] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...
Yes, because it was a failure: mixing a conservative GC with the underlying C semantics in Objective-C meant applications would regularly, randomly crash due to pointer misuse.
The alternative was to have the compiler take care of Cocoa's retain/release message pattern, as VC++ can also do for COM AddRef/Release, while selling the pivot as a success, in good old Apple fashion.
Then Swift, naturally had to use a similar approach, the alternative would be a complex marshaling layer like the CCW/RCW that .NET uses to talk to COM, and was redone in .NET Core with COM Wrappers source generators.
Nowadays everyone who wasn't around for the party draws the wrong conclusion about why Objective-C moved away from a tracing collector.
Cheney's algorithm and conservative roots don't mix too easily, as you cannot move conservative roots: an integer which looks like a pointer shouldn't be changed by moving. https://dl.acm.org/doi/10.1145/2660193.2660198 has an overview of algorithms which don't move conservative roots (including never moving, as in Boehm-Demers-Weiser).
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
My approach with Valk is to allocate per type size and in incrementally sized blocks. These blocks work a lot like tiny bump allocators and can be reset with one line of code without losing the objects in them that are still used.
If the overall memory usage has increased by a certain percentage, we trace the stack, clean up the blocks belonging to long-lived objects, and free blocks that are completely empty.
My benchmark performs 2x faster than Rust and 4x faster than Go. We use 8 bytes per allocation.
No. 24 bytes per allocation and having to perform a copy and point fixup is too expensive.
The problem with GC is that it’s fine until it’s not. And when it becomes not fine you need to radically change your entire architecture. There is no incremental fix.
That said, Unreal Engine does in fact use a mark-and-sweep GC for certain types of objects. Which could be something for the author to look into.