Debugging a rustc segfault on illumos
The author debugged a segmentation fault in the Rust compiler on illumos while compiling `cranelift-codegen`, using various tools and collaborative sessions to analyze the issue within the parser.
The article discusses the author's experience debugging a segmentation fault in the Rust compiler while working on the illumos operating system, specifically within the context of the Helios distribution used at Oxide. The author encountered a consistent crash (SIGSEGV) when attempting to compile the `cranelift-codegen` library. To address the issue, the author utilized various debugging tools available in illumos, including the Modular Debugger (mdb) to analyze core dumps generated during the crash. The debugging session was collaborative, involving colleagues during a virtual meetup. The author explained the bootstrapping process of the Rust compiler, which is self-hosting and requires careful management of compiler versions. The investigation revealed that the crash occurred within the Rust compiler's parser, specifically during a recursive descent parsing operation. The author highlighted the importance of examining CPU registers and the call stack to understand the state of the program at the time of the crash. The article serves as a guide for technologists interested in systems programming and debugging, providing insights into the tools and methodologies used in the process.
- The author debugged a segmentation fault in the Rust compiler on illumos.
- The crash was consistent and occurred while compiling `cranelift-codegen`.
- Collaborative debugging sessions were held with colleagues to address the issue.
- The Rust compiler's bootstrapping process was explained, emphasizing its self-hosting nature.
- The crash was traced to the Rust parser, highlighting the challenges of recursive descent parsing.
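Why recursive descent parsing is sensitive to stack depth can be shown with a toy example. The sketch below is not rustc's parser: `parse_groups`, `max_nesting`, `ParseError`, and the `MAX_DEPTH` limit are all hypothetical names chosen for illustration. It parses nested parentheses, where each level of nesting in the input costs one recursive call and therefore one more stack frame, and it returns an error at a fixed depth instead of running off the end of the stack.

```rust
/// Errors the toy parser can report.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Nesting went past MAX_DEPTH (a hypothetical limit for this sketch).
    TooDeep,
    /// A parenthesis had no matching partner, or an unexpected byte appeared.
    Unbalanced,
}

/// Hypothetical recursion limit; a real parser would size this against its stack.
const MAX_DEPTH: usize = 128;

/// Parses zero or more `( ... )` groups starting at `pos`.
/// Returns the deepest nesting level seen and the position after the last group.
fn parse_groups(input: &[u8], mut pos: usize, depth: usize) -> Result<(usize, usize), ParseError> {
    let mut deepest = depth;
    while pos < input.len() && input[pos] == b'(' {
        if depth + 1 > MAX_DEPTH {
            return Err(ParseError::TooDeep);
        }
        // One nested group means one recursive call, i.e. one more stack frame.
        let (inner, next) = parse_groups(input, pos + 1, depth + 1)?;
        if next >= input.len() || input[next] != b')' {
            return Err(ParseError::Unbalanced);
        }
        deepest = deepest.max(inner);
        pos = next + 1;
    }
    Ok((deepest, pos))
}

/// Returns the maximum nesting depth of a parenthesis-only string.
fn max_nesting(src: &str) -> Result<usize, ParseError> {
    let bytes = src.as_bytes();
    let (deepest, pos) = parse_groups(bytes, 0, 0)?;
    if pos != bytes.len() {
        return Err(ParseError::Unbalanced);
    }
    Ok(deepest)
}
```

Calling `max_nesting("(()())")` yields `Ok(2)`. Without the depth check, a sufficiently nested input would exhaust the thread's stack, which is the general failure mode a recursive parser has to guard against.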
Related
My experience crafting an interpreter with Rust (2021)
Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.
Mix-testing: revealing a new class of compiler bugs
A new "mix testing" approach uncovers compiler bugs by compiling test fragments with different compilers. Examples show issues in x86 and Arm architectures, emphasizing the importance of maintaining instruction ordering. Luke Geeson developed a tool to explore compiler combinations, identifying bugs and highlighting the need for clearer guidelines.
Rust for Filesystems
At the 2024 Linux Summit, Wedson Almeida Filho and Kent Overstreet explored Rust for Linux filesystems. Rust's safety features offer benefits for kernel development, despite concerns about compatibility and adoption challenges.
How to Compile Your Language – Guide to implement a modern compiler for language
This guide introduces programming language design and modern compiler implementation, emphasizing language purpose, syntax familiarity, and compiler components, while focusing on frontend development using LLVM, with source code available on GitHub.
Crafting Interpreters with Rust: On Garbage Collection
Tung Le Vo discusses implementing a garbage collector for the Lox programming language using Rust, addressing memory leaks, the mark-and-sweep algorithm, and challenges posed by Rust's ownership model.
- Many readers found the article informative and well-structured, praising the author's ability to explain complex topics.
- There is a consensus on the importance of understanding discrepancies in debugging, with some comments highlighting the challenges of post-mortem analysis.
- Concerns were raised about the default behavior of core dumps on Unix systems, suggesting a need for better security practices.
- Several commenters reminisced about past experiences with debugging, noting it as a "lost art" in modern development.
- Technical discussions included comparisons between different operating systems' handling of stack management and debugging tools.
> [the core dump is supposed to be in the CWD, and named core, but isn't; what gives?]
Followed by,
$ find / -name core -type f
Is a sort of hilarious brute force solution. But it demonstrates a particular kind of problem, where

   /-- requires -- evidence
   |                  ^
   v                  |
 answer -- requires --/

These are pesky. The brute force search is a good idea, in that it breaks that cycle of almost needing to know the answer in order to discover it. (Unless you can surmise that the CWD is the crate dir, but let's assume that we don't want to depend on having such a moment of sheer "eureka!".)
> But there are also other bits of evidence that this theory doesn’t explain, or even cuts against. (This is what makes post-mortem debugging exciting! There are often contradictory-seeming pieces of information that need to be explained.)
I wish more people appreciated this; too many people are apt to sweep such discrepancies under the rug. This post does a good job of not just following through on them, but also showing how figuring some of them out ("why is our stack weird?") leads to the key insights: "oh we're using stacker and … $the_bug".
I do wonder how the author managed to notice that line in a 1.5k line stack trace, though. The "abrupt" address change would have probably gone unnoticed by me. (The only saving grace being a.) it's close to the bottom b.) most of the rest is repetitive, an artifact of a recursive descent parser recursing, and if we just consider that repetition "one chunk", it gets a lot smaller. I still dunno if I'd've seen it, though.)
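For readers wondering why stack addresses can jump abruptly in the middle of a trace: when code continues on a separately allocated stack, its frames live in a different memory region. The sketch below is a std-only stand-in for that idea, not the stacker crate's API. It runs deep recursion on a thread whose stack size is set explicitly via `std::thread::Builder::stack_size`, and compares the address of a local on two different stacks. The function names and the sizes used are assumptions for illustration.

```rust
use std::thread;

/// Non-tail recursion: each call keeps a frame alive until the one below returns.
fn count_frames(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + count_frames(n - 1)
    }
}

/// Runs `count_frames(depth)` on a fresh thread with `stack_bytes` of stack.
fn run_on_big_stack(depth: u64, stack_bytes: usize) -> u64 {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || count_frames(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Address of a local variable, i.e. roughly where the current stack frame lives.
fn local_address() -> usize {
    let marker = 0u8;
    &marker as *const u8 as usize
}

/// Distance between a local on the calling thread's stack and one on a new thread's stack.
fn stack_gap() -> usize {
    let here = local_address();
    let there = thread::spawn(local_address)
        .join()
        .expect("worker thread panicked");
    here.abs_diff(there)
}
```

For example, `run_on_big_stack(100_000, 32 * 1024 * 1024)` completes a recursion depth that might not fit on a small default stack, and `stack_gap()` reports how far apart the two stacks are. In a stack trace, that kind of distance shows up as a sudden change in frame addresses.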
Definitely a fun read. Debugging crashes has, in the last decade or so, become something a bit like a "lost art". No one looks at coredumps in the cloud ...
I don't want to outdo you on Solaris debugging (plenty of old-time Solaris folks at Oxide who are totally capable to show how to get things like open files and their contents from a coredump, or how to configure the system to include those should it not be there ... etc ... etc ... Solaris has the best coredumps for all that's worth ...).
A note on the fix side of things though, while adding pthread_get_attr_np() for stack location/size gives Solaris the Linux interface, it already has its own for those - pthread_attr_getstack{size,addr}(), see https://docs.oracle.com/cd/E19455-01/806-5257/6je9h032l/inde... - I happen to remember this because I used this decades ago somewhere in the Solaris name lookup code to choose at runtime between using alloca() and malloc() ... don't ask. Those were different times.
One minor meta point if the author is (still) around: there is something strange with the styling of the hexadecimal literals in the code. Instead of having the prefix "0x", they look like "0×" even though they seem to be normal x:es in the source.
Edit: Firefox 128.0.3 on Linux, btw.
Sounds like a horrible default. That's a security risk (working directory might be readable by untrusted users), and pollutes a random directory with a file that could cause problems for other applications processing files in that directory.
A fixed location inside the user's home directory feels like a much better choice to me.
... mdb sure has a full-on "oldskool" CLI. I don't think that's a good thing, from a perspective of tool accessibility to developers...
This isn't some passive aggressive gotcha, I'm actually curious what people prefer about the Solaris distros nowadays. I know Zones and ZFS are cool, but FreeBSD supports Jails and ZFS out of the box, but maybe there are cool features I'm not aware of.