October 30th, 2024

Lessons learned from a successful Rust rewrite

The transition from C++ to Rust improved code performance and safety but revealed challenges like undefined behavior, memory management issues, and tooling limitations, highlighting the need for a stable ABI.

Reflecting on the transition from C++ to Rust, the author describes the lessons learned during a successful rewrite of a codebase. The incremental approach allowed for the addition of new features without duplicating efforts, resulting in a simpler and more manageable codebase. The rewrite led to the removal of a significant amount of dead code and improved performance by optimizing data structures. Rust's built-in safety features reduced common issues like out-of-bounds accesses and arithmetic overflows, enhancing overall code correctness. However, challenges arose, particularly with undefined behavior due to the use of raw pointers and unsafe blocks, which can lead to runtime errors. Tools like Miri were helpful but not always applicable, necessitating the use of Valgrind for certain checks. Memory management issues persisted, especially when interfacing with C APIs, complicating resource cleanup. Cross-compilation and the use of tools like cbindgen presented additional hurdles, often requiring workarounds. The author emphasizes the need for a stable ABI in Rust to ease integration with C/C++ and highlights the limitations of Rust's memory allocation features. Overall, while the transition to Rust brought many benefits, it also revealed areas for improvement in tooling and language features.

- Incremental rewrites in Rust can simplify code and improve performance.

- Rust's safety features help prevent common programming errors found in C++.

- Undefined behavior and memory management issues remain challenges in Rust.

- Cross-compilation and tooling limitations can complicate the development process.

- A stable ABI in Rust would facilitate better integration with C/C++ libraries.

13 comments
By @steveklabnik - 3 months
Incidentally, the first code sample can work; you just need to use the new raw borrow syntax (&raw mut), or addr_of_mut! on older Rusts:

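    fn main() {
        let mut x = 1;
        unsafe {
            // &raw mut x (Rust 1.82+) yields a *mut i32 directly, with no
            // intermediate &mut i32, so both pointers stay valid.
            let a = &raw mut x;
            let b = &raw mut x;
    
            *a = 2;
            *b = 3;
        }
    }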
The issue is that, as the code was written before, you'd be creating a temporary &mut T to a location where a pointer already exists. The new syntax gives you a way to create a *mut T without the intermediate &mut T.
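On compilers older than 1.82, where &raw mut is not available, the same idea can be sketched with std::ptr::addr_of_mut!, which also builds a raw pointer from a place without going through a &mut. The helper name write_through_two_ptrs is made up for illustration.

```rust
use std::ptr::addr_of_mut;

// Hypothetical helper: writes through two raw pointers to the same local
// and returns the final value.
fn write_through_two_ptrs() -> i32 {
    let mut x = 1;
    // addr_of_mut! creates a *mut i32 directly from the place `x`,
    // without an intermediate &mut i32 (stable since Rust 1.51).
    let a = addr_of_mut!(x);
    let b = addr_of_mut!(x);
    unsafe {
        // Neither pointer was derived from a reference that the other
        // invalidates, so both writes are allowed.
        *a = 2;
        *b = 3;
    }
    x
}

fn main() {
    println!("{}", write_through_two_ptrs());
}
```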

That said, this doesn't mean that the pain is invalid; unsafe Rust is tricky. But at least in this case, the fix isn't too bad.

By @ubj - 3 months
I've recently seen a lot of Rust rewrite projects that have talked about how much they've been required to use unsafe blocks. I'm currently in the process of my first C++-to-Rust rewrite, and I haven't needed to reach for unsafe at all yet.

What kinds of projects or C++ features are requiring such high usage of unsafe? I'm not implying that this is bad or unnecessary; I'm genuinely curious as to what requires unsafe to be used so frequently. Since by all accounts unsafe Rust can be harder to use than C++, this may help me decide whether to attempt using Rust in future rewrites.

By @WhatIsDukkha - 3 months
This seems like a weird use of Rust.

There is no mention of how much of the codebase is even in safe Rust after all this work, so there's no clear value to the migration?

Frequently, when people get their code ported, they then begin a process of reducing the unsafe surface area, but not here.

The author seems to place little or no value on safe Rust? It isn't evident from reading/skimming his 4 articles on the process.

There are interesting mechanical bits to read, for sure, so it's still a useful read more broadly.

It's unsurprising that the author would go use Zig next time since they didn't seem to have any value alignment with Rust's core safety guarantees.

By @mmastrac - 3 months
This post is subtly wrong: "multiple read-only pointers XOR one mutable pointer" is actually "multiple read-only references XOR one mutable reference".

It _is_ valid to have multiple mutable pointers, just as C and C++ allow. It's when you have multiple live, mutable references (including pointers created from live mutable references) that you end up in UB territory.
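A minimal sketch of the distinction: copying a *mut pointer and writing through both copies is fine, while the equivalent with two simultaneously used &mut references is rejected by the borrow checker (shown commented out). The helper name two_mut_pointers is hypothetical.

```rust
// Hypothetical helper illustrating multiple live mutable raw pointers.
fn two_mut_pointers() -> i32 {
    let mut x = 0;
    // Coerce a short-lived &mut into a raw pointer, then copy the pointer.
    let p: *mut i32 = &mut x;
    let q = p; // raw pointers are Copy; p and q now alias
    unsafe {
        *p += 1;
        *q += 1; // allowed: aliasing *mut pointers are not UB by themselves
    }
    x
}

// The reference version does not compile: two live &mut to the same place.
// fn two_mut_refs() {
//     let mut x = 0;
//     let r1 = &mut x;
//     let r2 = &mut x; // error[E0499]: cannot borrow `x` as mutable more than once at a time
//     *r1 += 1;
//     *r2 += 1;
// }

fn main() {
    println!("{}", two_mut_pointers());
}
```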

By @hyperman1 - 3 months
As someone who likes what Rust brings to the table, I am pleasantly surprised with the honesty of this review.

Interfacing with the C world, both as caller and callee, happens a lot in real-world code. All the C bugs come right back at that point.

By @pjmlp - 3 months
> Many, many hours of hair pulling would be avoided if Rust and C++ adopted, like C, a stable ABI.

What people mistakenly take for the C ABI is in reality the OS ABI, as expressed in C.

Two C binary libraries might fail to link, or exhibit strange behaviours/crashes, when compiled with different C compilers, for anything that isn't clearly defined as part of the OS ABI.
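On the Rust side, the stable interface being discussed is whatever the platform's C ABI defines: #[repr(C)] lays a struct out by the platform's C rules, and extern "C" selects the C calling convention. A minimal sketch with hypothetical names (Point, point_sum); a symbol actually exported to C would usually also need #[no_mangle], spelled #[unsafe(no_mangle)] in the 2024 edition.

```rust
// #[repr(C)] lays the fields out as the platform's C compiler would;
// without it, rustc is free to reorder fields.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// extern "C" uses the platform's C calling convention for this function.
pub extern "C" fn point_sum(p: Point) -> i32 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 2, y: 5 };
    println!("{}", point_sum(p));
}
```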

By @showsomerespect - 3 months
The "arena allocator" hyperlink links to localhost:8000.

By @kelnos - 3 months
I feel like some of the "what didn't go so well" sections were essentially because their rewrite was incomplete:

> I am still chasing Undefined Behavior. Doing an incremental rewrite from C/C++ to Rust, we had to use a lot of raw pointers and unsafe{} blocks. And even when segregating these to the entry point of the library, they proved to be a big pain in the neck.

These sound like an artifact of the rewrite itself, and I suspect many of these unsafe blocks can be rewritten safely now that there is no C++ code left.
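One common way to shrink that unsafe surface, sketched here with hypothetical names (checksum, lib_checksum): keep the internal logic in safe Rust that takes slices, and confine unsafe code to a thin C-facing entry point that turns the raw (pointer, length) pair into a slice once.

```rust
use std::slice;

// Safe internal helper (hypothetical): no raw pointers, no unsafe.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// Hypothetical C-facing entry point: the only unsafe code lives here.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes, or `len` must be 0.
pub unsafe extern "C" fn lib_checksum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // Convert the raw pointer/length pair into a slice once, at the boundary.
    let data = unsafe { slice::from_raw_parts(ptr, len) };
    checksum(data)
}

fn main() {
    let bytes = [1u8, 2, 3];
    let sum = unsafe { lib_checksum(bytes.as_ptr(), bytes.len()) };
    println!("{sum}");
}
```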

> I am talking about code that Miri cannot run, period: [some code that calls OpenSSL (mbedtls?) directly]

This should be replaced by a safe OpenSSL (mbedtls?) wrapper or, if it wouldn't change the behavior of their library in incompatible ways, by rustls.
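A rough sketch of what such a safe wrapper typically looks like. The C library is replaced by hypothetical stand-in functions (c_ctx_new, c_ctx_update, c_ctx_free) so the example runs on its own; the real OpenSSL or mbedtls functions are not shown. The wrapper owns the handle, exposes only safe methods, and frees the handle exactly once in Drop.

```rust
use std::ptr::NonNull;

// Hypothetical stand-ins for a C library's handle API, implemented in Rust
// only so the sketch runs; real code would declare them in an `extern "C"`
// block and link them from the C library.
struct RawCtx {
    bytes_seen: usize,
}

unsafe fn c_ctx_new() -> *mut RawCtx {
    Box::into_raw(Box::new(RawCtx { bytes_seen: 0 }))
}

unsafe fn c_ctx_update(ctx: *mut RawCtx, _data: *const u8, len: usize) {
    unsafe { (*ctx).bytes_seen += len };
}

unsafe fn c_ctx_free(ctx: *mut RawCtx) {
    drop(unsafe { Box::from_raw(ctx) });
}

// Safe wrapper: callers never touch the raw handle.
pub struct Ctx {
    raw: NonNull<RawCtx>,
}

impl Ctx {
    pub fn new() -> Option<Self> {
        // A null return from the C side becomes None instead of a bad pointer.
        NonNull::new(unsafe { c_ctx_new() }).map(|raw| Ctx { raw })
    }

    pub fn update(&mut self, data: &[u8]) {
        // The slice guarantees pointer and length agree.
        unsafe { c_ctx_update(self.raw.as_ptr(), data.as_ptr(), data.len()) }
    }

    pub fn bytes_seen(&self) -> usize {
        unsafe { (*self.raw.as_ptr()).bytes_seen }
    }
}

impl Drop for Ctx {
    fn drop(&mut self) {
        // Runs exactly once, so the handle cannot leak or be double-freed.
        unsafe { c_ctx_free(self.raw.as_ptr()) }
    }
}

fn main() {
    let mut ctx = Ctx::new().expect("allocation failed");
    ctx.update(b"hello");
    println!("{}", ctx.bytes_seen());
} // ctx is freed here automatically
```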

> I am still chasing memory leaks. Our library offers a C API, something like this: [init()/release() C memory management pattern]

Not sure what this has to do with Rust, though. Yes, if you're going to test your library using the exposed C API interface, your tests may have memory leaks. And yes, if your users are expected to use the library using the C API, they will have to be just as careful about memory as they were before.
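For reference, a C API of the init()/release() shape is commonly implemented in Rust by handing out a boxed object as an opaque pointer. The names below (Context, lib_init, lib_release) are hypothetical; the article's actual API is not shown. Nothing in this scheme lets the compiler check that a C caller eventually calls lib_release, which is why leaks remain possible at this boundary.

```rust
// Hypothetical library state; C callers only ever see a pointer to it.
pub struct Context {
    buffer: Vec<u8>,
}

// Allocates a Context and transfers ownership to the caller.
pub extern "C" fn lib_init() -> *mut Context {
    Box::into_raw(Box::new(Context { buffer: Vec::with_capacity(1024) }))
}

/// # Safety
/// `ctx` must be null or a pointer returned by `lib_init` that has not
/// already been released.
pub unsafe extern "C" fn lib_release(ctx: *mut Context) {
    if !ctx.is_null() {
        // Reclaim ownership so the Box (and its Vec) is dropped here.
        drop(unsafe { Box::from_raw(ctx) });
    }
}

fn main() {
    let ctx = lib_init();
    // ... use ctx through other C API functions ...
    // Skipping this call leaks the Context, and Rust cannot detect it.
    unsafe { lib_release(ctx) };
}
```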

The benefit of this rewrite in Rust would be about them not misusing memory internally inside the library. If that benefit isn't useful enough, then they shouldn't have done this rewrite.

> Cross-compilation does not always work

I've certainly run into issues with cross-compilation with Rust, but it is always so much easier than with C/C++.

> Cbindgen does not always work. [...] Every time, I thought of dumping cbindgen and writing all of the C prototypes by hand. I think it would have been simpler in the end.

I'm skeptical of the idea that an automated tool is going to generate something that you'll want to use as your public API. I would probably use cbindgen to get a first draft of the API, modify and clean up the output, and use that as the first version, and then manually add/change things from there as the API changes.

I don't want to silently, accidentally change the API (or worse, ABI) of my library because a code generator changed behavior in a subtle way based on either me upgrading it, or me changing my code in a seemingly-innocuous way.

> Unstable ABI

This is a bummer, but consider that they are not exposing a Rust API to their customers: they're exposing a C API. Why would they expect to be able to expose Rust types through the API?

And they actually can do this: while it is correct that standard Rust types could have a different layout depending on what version of rustc is used to build it, that doesn't actually matter for a pre-built, distributed binary, as long as access to those types from the outside code (that is, through the C API) is done only through accessors/functions and never through direct struct member access. Sure, that requires some overhead, but I would argue that you should never expose struct/object internals in your public API anyway.
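A sketch of the accessor approach, with hypothetical names (Session, session_id, session_retries): the C side only receives an opaque pointer, and every field read goes through an extern "C" function, so the struct's layout can change between rustc versions without affecting C callers. The struct deliberately has no #[repr(C)].

```rust
// Rust-layout struct: field order and padding are up to rustc.
pub struct Session {
    id: u64,
    retries: u32,
}

/// # Safety
/// `s` must point to a live `Session`.
pub unsafe extern "C" fn session_id(s: *const Session) -> u64 {
    unsafe { (*s).id }
}

/// # Safety
/// `s` must point to a live `Session`.
pub unsafe extern "C" fn session_retries(s: *const Session) -> u32 {
    unsafe { (*s).retries }
}

fn main() {
    let s = Session { id: 42, retries: 3 };
    // &s coerces to *const Session at the call site.
    let (id, retries) = unsafe { (session_id(&s), session_retries(&s)) };
    println!("{id} {retries}");
}
```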

By @IshKebab - 3 months
Hmm yeah I'm not surprised that interfacing safe Rust with an existing unsafe C/C++ API is painful. That's really true in every language. (Although I haven't tried Zig tbf.)

I'm also not totally convinced that rewrite from scratch is always the wrong thing. For small projects the total work rewriting can be much less than dealing with this kind of FFI.

By @happyweasel - 3 months
The only real comparison would be a rewrite in modern C++, compared against the rewrite in Rust. Also, the author mentioned that the original code had no tests at all. Well, good luck.

By @nesarkvechnep - 3 months
What do we expect from someone who says C/C++?

By @tharne - 3 months
That's a confusing title. I was under the impression that on Hacker News, every Rust rewrite is a successful rewrite.