How GCC and Clang handle statically known undefined behaviour
A discussion of how compilers handle statically known undefined behavior (UB) in C code offers insight into compiler optimizations. Compilers like gcc and clang optimize on the assumption that undefined semantics never occur, which can crash programs or silently discard problematic code. Avoiding UB is crucial for program predictability and security. gcc and clang differ in how they handle UB, both in crash behavior and in the warnings they emit. LLVM's 'poison' values allow optimizations to proceed despite UB, reflecting diverse compiler approaches. How a compiler responds to UB is ultimately a judgment call, shaped by its developers and its users' requirements.
The discussion on how compilers handle statically known undefined behavior (UB) in C code reveals insights into compiler optimizations and behaviors. Compilers like gcc and clang make assumptions based on undefined language semantics to optimize programs. When faced with UB, compilers may take different approaches, such as crashing the program or ignoring the problematic code. UB can lead to unpredictable program behavior and security vulnerabilities, which underscores the importance of avoiding it. Compilers may optimize away code exhibiting UB when its result is unused, highlighting the role of dead code elimination. The handling of UB varies between compilers like gcc and clang, with differences in crash behavior and warning generation. The use of 'poison' values in LLVM enables optimizations even in the presence of UB, showcasing different compiler philosophies. Ultimately, the choice between crashing and continuing compilation in the face of UB is subjective and depends on compiler developers' preferences and user needs.
Related
My experience crafting an interpreter with Rust (2021)
Manuel Cerón details creating an interpreter with Rust, transitioning from Clojure. Leveraging Rust's safety features, he faced challenges with closures and classes, optimizing code for performance while balancing safety.
Own Constant Folder in C/C++
Neil Henning discusses precision issues in clang when using the sqrtps intrinsic with -ffast-math, suggesting inline assembly for instruction selection. He introduces a workaround using __builtin_constant_p for constant folding optimization, enhancing code efficiency.
Memory Model: The Hard Bits
This chapter explores OCaml's memory model, emphasizing relaxed memory aspects, compiler optimizations, weakly consistent memory, and DRF-SC guarantee. It clarifies data races, memory classifications, and simplifies reasoning for programmers. Examples highlight data race scenarios and atomicity.
Optimizing the Roc parser/compiler with data-oriented design
The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.
Getting 100% code coverage doesn't eliminate bugs
Achieving 100% code coverage doesn't ensure bug-free software. A blog post illustrates this with a critical bug missed despite full coverage, leading to a rocket explosion. It suggests alternative approaches and a 20% coverage minimum.
GCC leaves the print there because it must. While undefined behaviour famously can time travel, that’s only if it would actually have occurred in the first place. If the print blocks indefinitely then that division will never execute, and GCC must compile a binary that behaves correctly in that case.
Are there any reasons why that is so? Do compilers not reuse the information they gather during compilation for diagnostics? Or is it a deliberate decision?
Serious logic error, surely. "Can never happen" does not follow.
This is subtly misleading: unspecified and undefined behavior are not quite the same thing.
UB means (simplified) that the spec allows the compiler to do whatever it wants. More concretely, in most cases the compiler is allowed to assume that a specific thing is impossible when optimizing, to the point where, if it does happen, pretty much anything can happen, including an int appearing to hold two different values at the same time (which might still be one of the more harmless WTFs that can occur).
int i = 0;
// nullptr dereference
int ub = *(int*)i;
It is implementation defined I think, not UB. 0 could be a valid address.

Developers must avoid UB in their code, at all costs. Wondering what a specific compiler will do with your UB code is useless: as soon as you realise you have UB, go and fix it!
I really don't understand why the C standards body can't just define what the intended failure behavior should be for specific UB cases, and have compiler developers implement that spec. There should be no impact on backwards compatibility, because:
a. this would be a new C version;
b. no program should ever be defined around UB behavior.
But UB still exists in 2024 and likely will until I am too old to whine on the internet about it.