June 24th, 2024

Tracing garbage collection for arenas

Tracing garbage collection in systems programming languages like C++, Rust, and Ada is compared to reference counting. A simplified tracing garbage collection approach inspired by Mark-and-Sweep is proposed for languages like Zig or Odin.

Read original articleLink Icon
Tracing garbage collection for arenas

Tracing garbage collection for arenas is explored in the context of systems programming languages like C++, Rust, and Ada, which utilize reference counting as a form of garbage collection. The article discusses the advantages of reference counting over typical tracing garbage collection, highlighting its deterministic nature and compatibility with C and scope-based resource management. The author proposes a simplified form of tracing garbage collection, inspired by the basic Mark-and-Sweep algorithm, as seen in the Yices2 SMT solver. This approach involves using arenas or bump allocators to delay deallocation until convenient, similar to how a tracing GC operates. By introducing a collect function alongside free, data can be efficiently copied and compacted within the arena, improving memory locality. The article suggests implementing this strategy in languages like Zig or Odin, where allocators can be passed to functions, enabling manual garbage collection when needed. The proposed system aims to offer a lightweight and optional garbage collection mechanism with minimal overhead, potentially suitable for custom programming languages.

Related

My experience crafting an interpreter with Rust (2021)

My experience crafting an interpreter with Rust (2021)

Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.

Eight million pixels and counting: improving texture atlas allocation in Firefox (2021)

Eight million pixels and counting: improving texture atlas allocation in Firefox (2021)

Improving texture atlas allocation in WebRender with the guillotiere crate reduces texture memory usage. The guillotine algorithm was replaced due to fragmentation issues, leading to a more efficient allocator. Visualizing the atlas in SVG aids debugging. Rust's simplicity and Cargo fuzz testing are praised for code development and robustness. Enhancements in draw call batching and texture upload aim to boost performance on low-end Intel GPUs by optimizing texture atlases.

Reference Counting with Linear Types

Reference Counting with Linear Types

The `reference-counting` library in the `ghengin` engine manages resources through `Alias` datatype, supporting alias creation, sharing, and deallocation. It's experimental, functional, and seeks feedback, used effectively in `ghengin`.

Memory Model: The Hard Bits

Memory Model: The Hard Bits

This chapter explores OCaml's memory model, emphasizing relaxed memory aspects, compiler optimizations, weakly consistent memory, and DRF-SC guarantee. It clarifies data races, memory classifications, and simplifies reasoning for programmers. Examples highlight data race scenarios and atomicity.

Optimizing the Roc parser/compiler with data-oriented design

Optimizing the Roc parser/compiler with data-oriented design

The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.

Link Icon 11 comments
By @celeritascelery - 4 months
Interesting write up. I am curious about the usage of this framework. One thing in reference counting’s favor is that it is hard to screw up. You just use the rc type and when it is no longer accessible it will magically get freed. That is how GC works in a high level language too, you almost never need to think about it.

Let’s look at the example from the docs:

    SWL_GCArena gc = swl_gc_arena_new(<starting-capacity>);
    int* foo = swl_gc_alloc(&gc, 5000, alignof(int), NULL);
    int* bar = swl_gc_alloc(&gc, 16000, alignof(int), NULL);
    int* baz = swl_gc_alloc(&gc, 3000, alignof(int), NULL);
    // foo and baz are roots
    SWL_GC_COLLECT(&gc, SWL_ROOT(foo), SWL_ROOT(baz));
    // bar is gone, poof
If you try to access bar at the end you get a use after free. I am pretty sure if you tried to access foo again it would trigger the same problem, because it is a pointer directly to int and not an indirection. What happens if you pass a non-arena pointer to SWL_ROOT? What happens if you have multiple arenas and you mix and match them?

There are many ways you could trivially shoot yourself in the foot in large program. It is a lot less “idiot-proof” than simple reference counting. The fact that you create each piece of memory and then have to manually pass it back to the root macro on each collection is very similar to the “malloc/free” dance we want to try and avoid.

The rooting issue is something I tried to tackle in my GC for rust[1]. Essentially making low level GC "idiot proof".

[1] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...

By @efitz - 4 months
Despite the title “tracing garbage collection for arenas”, the article is not about analysis of sanitation practices at sports facilities.
By @pjmlp - 4 months
> Interestingly Swift’s predecessor Objective-C, older versions of Nim, and early versions of Rust all supported tracing garbage collection! Yet they all abandoned it.

Yes, because it was a failure mixing a conservative GC with the underlying C semantics in Objective-C, so the applications would regularly randomly crash due to pointer misuse.

The alternative was to have the compiler take care of the Cocoa's retain / release message pattern, like VC++ can also do for COM AddRef/Release, while selling the pivot as a success, in good old Apple fashion.

Then Swift, naturally had to use a similar approach, the alternative would be a complex marshaling layer like the CCW/RCW that .NET uses to talk to COM, and was redone in .NET Core with COM Wrappers source generators.

Nowadays everyone not around for the party, always takes the wrong conclusion why Objective-C moved away from a tracing collector.

By @hayley-patton - 4 months
> In a custom language, this can be implemented very easily following the approach outlined in this paper: Accurate garbage collection in an uncooperative environment.

Cheney's algorithm and conservative roots don't mix too easily, as you cannot move conservative roots: an integer which looks like a pointer shouldn't be changed by moving. https://dl.acm.org/doi/10.1145/2660193.2660198 has an overview of algorithms which don't move conservative roots (including never moving, as in Boehm-Demers-Weiser).

By @zozbot234 - 4 months
Interesting discussion. If you want to experiment with 'lightweight' traced GC for systems programming, you might like https://github.com/chc4/samsara (for Rust) or https://github.com/pebal/sgcl (for C++) both implementing a concurrent tracing algorithm which is a bit more involved than what's directly discussed in OP.
By @moonchild - 4 months
greenspun's tenth rule:

> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

By @ctxcode - 4 months
Interesting article. 16-24 bytes per allocation is pretty big and it sounds like it would a bit too much overhead, an actual demo would be nice.

My approach with Valk is to allocate per type size and in incremental sized blocks. These blocks work alot like tiny bump allocators and can be reset with 1 line of code without losing the objects in it that are still used.

If the overal memory usage increased a certain percent, we trace the stack, clean up the blocks from long existing objects and free blocks that are completely empty.

My benchmark performes 2x faster than rust, and 4x golang. We use 8 bytes per allocation.

By @moonchild - 4 months
i would do a variable-size header 4 or 8 bytes for most small objects (8 if you want to provide alignment, 4 if not or if you can squish your pointers). one bit identifies if the header is small or large; if small, next few bits give the size of the object in words, and the remainder are a bitmap telling which words are pointers. forwarding pointer stomps on the first word of the object
By @jkcxn - 4 months
How do you move memory and have all the pointers update to the new position? I didn't understand that part.
By @hencq - 4 months
In the Zig/Odin example, I would imagine that any pointers from other (non-GC) arenas into the default (GC) arena, would need to be added to the roots for this to work right?
By @forrestthewoods - 4 months
> Can we make “the simplest form” of tracing garbage collection work for systems programming?

No. 24 bytes per allocation and having to perform a copy and point fixup is too expensive.

The problem with GC is that it’s fine until it’s not. And when it becomes not fine you need to radically change your entire architecture. There is no incremental fix.

That said, Unreal Engine does in fact use a mark-and-sweep GC for certain types of objects. Which could be something for the author to look into.