July 30th, 2024

Crafting Interpreters with Rust: On Garbage Collection

Tung Le Vo discusses implementing a garbage collector for the Lox programming language using Rust, addressing memory leaks, the mark-and-sweep algorithm, and challenges posed by Rust's ownership model.

Read original articleLink Icon
AppreciationCuriosityFrustration
Crafting Interpreters with Rust: On Garbage Collection

Tung Le Vo discusses his journey in implementing a garbage collector (GC) for a bytecode interpreter of the Lox programming language using Rust. Initially, he faced memory leaks due to reference counting, prompting him to revisit the project with a better understanding of Rust. His goal is to implement a mark-and-sweep GC comparable to C implementations, acknowledging the challenges posed by Rust's strict ownership and borrowing rules.

The article explains the concept of managed memory and the role of a GC in automatically handling heap memory allocation and deallocation. It outlines the mark-and-sweep algorithm, which consists of two phases: marking reachable objects and sweeping to free unmarked objects. Vo contrasts the semantics of Lox, which allows for shared mutable access to objects, with Rust's ownership model, which enforces strict borrowing rules to prevent data races.

He explores two approaches for implementing a GC in Lox without relying on unsafe Rust features. The first approach involves using Rust's reference counting types, Rc and Arc, to manage shared ownership of heap-allocated values. However, this method introduces challenges, such as the potential for reference cycles, which Lox allows. Vo also mentions the use of RefCell for enabling interior mutability, allowing mutable access to immutable references at runtime, albeit with the risk of runtime errors if borrowing rules are violated. Overall, the article highlights the complexities of integrating Lox's semantics with Rust's memory management paradigms.

Related

My experience crafting an interpreter with Rust (2021)

My experience crafting an interpreter with Rust (2021)

Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.

Tracing garbage collection for arenas

Tracing garbage collection for arenas

Tracing garbage collection in systems programming languages like C++, Rust, and Ada is compared to reference counting. A simplified tracing garbage collection approach inspired by Mark-and-Sweep is proposed for languages like Zig or Odin.

Spending too much time optimizing for loops

Spending too much time optimizing for loops

Researcher Octave Larose shared insights on optimizing Rust interpreters, focusing on improving performance for the SOM language. By enhancing loop handling and addressing challenges, significant speedups were achieved, balancing code elegance with efficiency.

Spending too much time optimizing for loops

Spending too much time optimizing for loops

Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.

Properly Testing Concurrent Data Structures

Properly Testing Concurrent Data Structures

The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.

AI: What people are saying
The comments on Tung Le Vo's article about implementing a garbage collector for Lox in Rust reflect a variety of perspectives and experiences related to the topic.
  • Several commenters share their own experiences implementing Lox or similar projects in Rust, highlighting common challenges faced, such as memory management and garbage collection.
  • There is a strong appreciation for the "Crafting Interpreters" book, with many recommending it as a valuable resource for learning and implementing interpreters.
  • Some comments emphasize the importance of proper citation and crediting original works, particularly regarding figures and ideas from other authors.
  • Discussions around Rust's ownership model and the use of unsafe code are prevalent, with insights on how to navigate these complexities effectively.
  • Links to additional resources, projects, and articles are shared, fostering a collaborative atmosphere among developers interested in similar topics.
Link Icon 13 comments
By @scott_s - 4 months
In case the author reads this: please explicitly cite all of Nystrom's figures. A link is not enough.

Even with a citation, I'm not quite comfortable just reusing someone else's figures so many times when they're doing so much heavy lifting. But an explicit citation is the minimum.

By @jlewallen - 4 months
Crafting Interpreters is such an amazing work.

There's at least one other Rust implementation of lox that I know of (https://github.com/tdp2110/crafting-interpreters-rs) (no unsafe)

It's always interesting to see how different people approach the problems in their own language or relative isolation. I agree with others here, the real value of the original work lies in avoiding copy and paste.

By @ceronman - 4 months
Nice article. A couple of years ago I also implemented Lox in Rust. And I faced the exact same issues that the author describes here, and I also ended up with a very similar implementation.

I also wrote a post about it: https://ceronman.com/2021/07/22/my-experience-crafting-an-in...

I ended up having two implementations: One in purely safe Rust and another one with unsafe.

Note that if you're interesting in the "object manager" approach mentioned, I did that in my safe implementation, you can take a look at https://github.com/ceronman/loxido

By @jcdavis - 4 months
Fun writeup. When I went through and implement the book in rust (https://github.com/jcdavis/rulox), I just used Rc and never really solved the cycle issue.

I'll +1 and say I highly recommend going through Crafting Interpreters, particularly as a way of building non-trivial programs in languages you are less familiar with - If you just follow the java/C example you are tempted to lean into copy/pasting the samples, but figuring things out in other languages is a great experience.

I spent a longass time tracking down an issue with how the reference interpreter implementation defines tokens: https://github.com/munificent/craftinginterpreters/issues/11... which was frustrating but good debugging practice.

By @nestorD - 4 months
Note that there is quite a bit of thinking and available implementations of GC implementations available in Rust: https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...
By @harpiaharpyja - 4 months
I did something very similar to this myself. My GC implementation was pretty heavily inspired by manishearth's Rust GC.

My swing at the Crafting Intepreters in Rust can be found here: https://github.com/mwerezak/sphinx-lang

Although, at the time I was really interested in Lua's syntax so for extra credit I designed a completely different syntax for Lox closer to Lua's.

I'd love to revisit this project sometime and implement a register-based VM similar to Lua's.

By @samsartor - 4 months
I have run into a lot of similar problems writing a state management framework for Rust (wip at https://gitlab.com/samsartor/hornpipe) and at one point despaired that I would have to abandon Rust and make a whole new reactive language. But the last couple years I've got it working really nicely with weak references and software transactional memory. Every reference is `Copy + Send + Sync + 'static`. And you can mutate objects by having a transaction make a local bitwise copy, which will get atomically merged and swapped on commit. The old copy gets kept around for undo/redo purposes. I've still got a boatload of algorithmic puzzles to solve to provide all the MVP features, which will take a while because it isn't my full-time job. But the details seem technically sound.

I did write a blog post about some of my design thoughts, although it doesn't dig into the technical guts: https://samsartor.com/guis-3/

By @kaspermarstal - 4 months
> Don’t ask me about covariance since I honestly don’t know the answer. I recommend reading the section on subtyping and variance and The Rustonomicon. I went through them a few times but still don’t entirely get all the details. Then, we can define each type of object like so:

Haha this is gold. We ALL do this ALL the time.

By @ajeetdsouza - 4 months
This is really well-written!

Shameless plug: you may want to check out Loxcraft: https://github.com/ajeetdsouza/loxcraft

I too followed the path of "ignore safe Rust for maximum performance". It got pretty close to the C version, even beating it on some benchmarks.

By @lowbloodsugar - 4 months
TL;DR: Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.

Great journey. That's the thing about "unsafe" and "questionable parts"... if they really are questionable, then don't do 'em. In this case, if it is the case that holding a reference to a GC object guarantees that that object can't be freed, then it's not questionable. Proving that to be case can be fun tho.

The question is: does my unsafe code allow violating the borrow rules?

Cell, RefCell, RwLock etc give you interior mutability, but there is no way to violate the borrow rules with them. On the surface, it looks like you are violating them because "I have a non-mut thing that I can mutate".

If you build a thing that allows two distinct &mut T's to the same thing, then you are basically asking for a bug. Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.

By @chc4 - 4 months
The other major issue with the Deref implementation is that `&mut` needs to be an exclusive reference, and if you're doing mark/sweep of GC objects via references you break that invariant if you hold any `&mut` across a GC call and are causing UB. In practice this probably doesn't affect your program, but I suspect Miri would yell at you and there's the off chance the compiler gets really tricky with inlining somewhere and you are inflicted with horrible pain.
By @Daffodils - 4 months
Just wanted to share that the Build your own Interpreters challenge on CodeCrafters is based on the Crafting Interpreters book and is free at the moment.

Link: https://app.codecrafters.io/courses/interpreter/overview

(Disclaimer: I work for CodeCrafters, and built this challenge).

By @bombela - 4 months
> How much memory is used for the stack and when it is freed can always be determined at compile-time.

Technically, not always. You can allocate on the stack at runtime, see "alloca" in C.

But is alloca even that useful in practice? I struggle to find a good example.