Crafting Interpreters with Rust: On Garbage Collection
Tung Le Vo discusses implementing a garbage collector for the Lox programming language using Rust, addressing memory leaks, the mark-and-sweep algorithm, and challenges posed by Rust's ownership model.
Read original articleTung Le Vo discusses his journey in implementing a garbage collector (GC) for a bytecode interpreter of the Lox programming language using Rust. Initially, he faced memory leaks due to reference counting, prompting him to revisit the project with a better understanding of Rust. His goal is to implement a mark-and-sweep GC comparable to C implementations, acknowledging the challenges posed by Rust's strict ownership and borrowing rules.
The article explains the concept of managed memory and the role of a GC in automatically handling heap memory allocation and deallocation. It outlines the mark-and-sweep algorithm, which consists of two phases: marking reachable objects and sweeping to free unmarked objects. Vo contrasts the semantics of Lox, which allows for shared mutable access to objects, with Rust's ownership model, which enforces strict borrowing rules to prevent data races.
He explores two approaches for implementing a GC in Lox without relying on unsafe Rust features. The first approach involves using Rust's reference counting types, Rc and Arc, to manage shared ownership of heap-allocated values. However, this method introduces challenges, such as the potential for reference cycles, which Lox allows. Vo also mentions the use of RefCell for enabling interior mutability, allowing mutable access to immutable references at runtime, albeit with the risk of runtime errors if borrowing rules are violated. Overall, the article highlights the complexities of integrating Lox's semantics with Rust's memory management paradigms.
Related
My experience crafting an interpreter with Rust (2021)
Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.
Tracing garbage collection for arenas
Tracing garbage collection in systems programming languages like C++, Rust, and Ada is compared to reference counting. A simplified tracing garbage collection approach inspired by Mark-and-Sweep is proposed for languages like Zig or Odin.
Spending too much time optimizing for loops
Researcher Octave Larose shared insights on optimizing Rust interpreters, focusing on improving performance for the SOM language. By enhancing loop handling and addressing challenges, significant speedups were achieved, balancing code elegance with efficiency.
Spending too much time optimizing for loops
Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.
Properly Testing Concurrent Data Structures
The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.
- Several commenters share their own experiences implementing Lox or similar projects in Rust, highlighting common challenges faced, such as memory management and garbage collection.
- There is a strong appreciation for the "Crafting Interpreters" book, with many recommending it as a valuable resource for learning and implementing interpreters.
- Some comments emphasize the importance of proper citation and crediting original works, particularly regarding figures and ideas from other authors.
- Discussions around Rust's ownership model and the use of unsafe code are prevalent, with insights on how to navigate these complexities effectively.
- Links to additional resources, projects, and articles are shared, fostering a collaborative atmosphere among developers interested in similar topics.
Even with a citation, I'm not quite comfortable just reusing someone else's figures so many times when they're doing so much heavy lifting. But an explicit citation is the minimum.
There's at least one other Rust implementation of lox that I know of (https://github.com/tdp2110/crafting-interpreters-rs) (no unsafe)
It's always interesting to see how different people approach the problems in their own language or relative isolation. I agree with others here, the real value of the original work lies in avoiding copy and paste.
I also wrote a post about it: https://ceronman.com/2021/07/22/my-experience-crafting-an-in...
I ended up having two implementations: One in purely safe Rust and another one with unsafe.
Note that if you're interesting in the "object manager" approach mentioned, I did that in my safe implementation, you can take a look at https://github.com/ceronman/loxido
I'll +1 and say I highly recommend going through Crafting Interpreters, particularly as a way of building non-trivial programs in languages you are less familiar with - If you just follow the java/C example you are tempted to lean into copy/pasting the samples, but figuring things out in other languages is a great experience.
I spent a longass time tracking down an issue with how the reference interpreter implementation defines tokens: https://github.com/munificent/craftinginterpreters/issues/11... which was frustrating but good debugging practice.
My swing at the Crafting Intepreters in Rust can be found here: https://github.com/mwerezak/sphinx-lang
Although, at the time I was really interested in Lua's syntax so for extra credit I designed a completely different syntax for Lox closer to Lua's.
I'd love to revisit this project sometime and implement a register-based VM similar to Lua's.
I did write a blog post about some of my design thoughts, although it doesn't dig into the technical guts: https://samsartor.com/guis-3/
Haha this is gold. We ALL do this ALL the time.
Shameless plug: you may want to check out Loxcraft: https://github.com/ajeetdsouza/loxcraft
I too followed the path of "ignore safe Rust for maximum performance". It got pretty close to the C version, even beating it on some benchmarks.
Great journey. That's the thing about "unsafe" and "questionable parts"... if they really are questionable, then don't do 'em. In this case, if it is the case that holding a reference to a GC object guarantees that that object can't be freed, then it's not questionable. Proving that to be case can be fun tho.
The question is: does my unsafe code allow violating the borrow rules?
Cell, RefCell, RwLock etc give you interior mutability, but there is no way to violate the borrow rules with them. On the surface, it looks like you are violating them because "I have a non-mut thing that I can mutate".
If you build a thing that allows two distinct &mut T's to the same thing, then you are basically asking for a bug. Don't use unsafe to break the rules. Use unsafe to enforce the rules in new ways.
Link: https://app.codecrafters.io/courses/interpreter/overview
(Disclaimer: I work for CodeCrafters, and built this challenge).
Technically, not always. You can allocate on the stack at runtime, see "alloca" in C.
But is alloca even that useful in practice? I struggle to find a good example.
Related
My experience crafting an interpreter with Rust (2021)
Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.
Tracing garbage collection for arenas
Tracing garbage collection in systems programming languages like C++, Rust, and Ada is compared to reference counting. A simplified tracing garbage collection approach inspired by Mark-and-Sweep is proposed for languages like Zig or Odin.
Spending too much time optimizing for loops
Researcher Octave Larose shared insights on optimizing Rust interpreters, focusing on improving performance for the SOM language. By enhancing loop handling and addressing challenges, significant speedups were achieved, balancing code elegance with efficiency.
Spending too much time optimizing for loops
Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.
Properly Testing Concurrent Data Structures
The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.