Clang vs. Clang
The blog post critiques compiler optimizations in Clang, arguing that they often introduce bugs and security vulnerabilities, deliver diminishing performance gains, and create timing channels, and it urges a reevaluation of current practices.
The blog post discusses the challenges and issues surrounding compiler optimizations, particularly focusing on Clang. It highlights that compiler writers often evade responsibility for bugs introduced through optimizations, attributing them to "undefined behavior" in code written by programmers. The author argues that these optimizations, rather than enhancing performance, often lead to more bugs and security vulnerabilities. They reference Proebsting's Law, which suggests that compiler advancements yield only marginal improvements in computing power, and recent benchmarks indicate that the performance gains from optimizations are diminishing. The post emphasizes that many critical software systems rely heavily on assembly language and intrinsics, as compiler optimizations do not adequately address performance needs.
Moreover, the author raises concerns about security, noting that compiler optimizations can inadvertently create timing channels that expose sensitive information. They cite a 2018 paper that discusses how compiler upgrades can introduce vulnerabilities without warning. The post concludes with a specific example of a timing attack against the Kyber reference code compiled with Clang, illustrating the real-world implications of these optimization issues. The author calls for a reevaluation of the optimization practices in compilers, suggesting that the current approach may be detrimental to both performance and security in software development.
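As an illustration of the kind of rewrite the post worries about (this sketch is not taken from the article, and the function name is made up): cryptographic code is often written in a bit-masking "constant-time" style, but C's semantics only pin down the value computed, not its timing, so an optimizer that recognizes the pattern is free to lower it to a branch on the secret.

#include <stdint.h>

/* A "constant-time" select in the common masking style. The C semantics only
   require the right value; they say nothing about timing, so a compiler that
   recognizes this pattern may lower it to a conditional branch on secret_bit,
   which is exactly the kind of timing channel the post describes. */
uint32_t ct_select(uint32_t secret_bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - secret_bit;   /* all ones if secret_bit == 1, zero otherwise */
    return (a & mask) | (b & ~mask);
}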
Related
How GCC and Clang handle statically known undefined behaviour
Discussion on compilers handling statically known undefined behavior (UB) in C code reveals insights into optimizations. Compilers like gcc and clang optimize based on undefined language semantics, potentially crashing programs or ignoring problematic code. UB avoidance is crucial for program predictability and security. Compilers differ in handling UB, with gcc and clang showing variations in crash behavior and warnings. LLVM's 'poison' values allow optimizations despite UB, reflecting diverse compiler approaches. Compiler responses to UB are subjective, influenced by developers and user requirements.
Mix-testing: revealing a new class of compiler bugs
A new "mix testing" approach uncovers compiler bugs by compiling test fragments with different compilers. Examples show issues in x86 and Arm architectures, emphasizing the importance of maintaining instruction ordering. Luke Geeson developed a tool to explore compiler combinations, identifying bugs and highlighting the need for clearer guidelines.
Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior
The study delves into compiler behavior when given extra information for program optimization. Surprisingly, more data can sometimes lead to poorer optimization due to intricate compiler interactions. Testing identified 59 cases in popular compilers, emphasizing the need for better understanding.
Beyond Clean Code
The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.
Better Firmware with LLVM/Clang
LLVM and Clang are gaining traction in embedded software development, particularly for ARM Cortex-M devices. The article advocates integrating Clang for better static analysis, error detection, and dual compiler usage.
- Many commenters argue that undefined behavior (UB) in C/C++ is a significant issue, often leading to unexpected results and security vulnerabilities.
- Some believe that blaming compiler optimizations for bugs is misguided, emphasizing that programmers should be aware of language standards and UB.
- There is a call for better languages or compilers that can express semantics more clearly, reducing reliance on optimizations that may introduce errors.
- Several users highlight the trade-offs between performance gains from optimizations and the potential for introducing bugs, particularly in security-critical applications.
- Some suggest using specific compiler flags to manage optimizations, indicating a desire for more control over the compilation process.
But that's not an excuse for having a bug; it's the exact evidence that it's not a bug at all. Calling the compiler buggy for not doing what you want when you commit Undefined Behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.
A big chunk of the essay is about a side point: how large the gains from optimization really are, which, even with data, would be a use-case-dependent question.
But the bulk of his complaint is that C compilers fail to take into account semantics that cannot be expressed in the language. Wow, shocker!
At the very end he says “use a language which can express the needed semantics”. The entire essay could have been replaced with that sentence.
But blaming the compiler devs for this is just misguided.
https://www.intel.com/content/www/us/en/developer/articles/t...
Look at DOITM in that document — it is simply impossible for a userspace crypto library to set the required bit.
I have seldom seen someone discredit their expertise that fast in a blog post. (Especially if you follow the link and realize it's just basic, fundamental C stuff: UB does not merely mean the expression produces an "arbitrary" value.)
They'll probably need some kind of specialized compiler of their own if they want to be serious about it. Or carry on with asm.
For instance, in Python you can write something like:
result = [something(value) for value in set_object]
Because Python's set objects are unordered, it's clear that it doesn't matter in which order the items are processed, and that the order of the results doesn't matter. That opens up a whole lot of optimizations at the language level that don't rely on brilliant compilers inferring what the author meant. Similar code in a language with immutable data can go one step further: since something(value1) can't possibly affect something(value2), it can execute those in parallel with threads or processes or whatever else makes it go fast.

Much of the optimization in C compilers is about looking at patterns in the code and trying to find faster ways to do what the author probably meant. Because C lacks the ability to express much intent compared to pretty much any newer language, compilers have the freedom to guess, but also have to make those kinds of inferences to get decent performance.
On the plus side, this might be a blessing in disguise like when the Hubble telescope needed glasses. We invented brilliant techniques to make it work despite its limitations. Once we fixed its problems, those same techniques made it perform way better than originally expected. All those C compiler optimizations, applied to a language that's not C, may give us superpowers.
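To make the pattern-matching point above concrete, here is a hedged C sketch: at higher optimization levels, GCC and Clang typically recognize this byte-clearing loop as an idiom and replace it with a call to memset (or an equivalent vectorized sequence), because C gives the author no direct way to state that intent.

#include <stddef.h>

/* At -O2 and above, GCC and Clang typically recognize this loop as
   "clear n bytes" and emit a memset call or a vectorized equivalent;
   the compiler is inferring intent that C cannot express directly. */
void clear_bytes(unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}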
Wildly false, and I have no idea where the author is getting this idea from. If you regress people's code in LLVM, your patch gets reverted.
Before reading this, I thought that a simple compiler could never usefully compete against optimizing compilers (which require more manpower to produce), but perhaps there is a niche use-case for a compiler with better facilities for manual optimization. This article has inspired me to make a simple compiler myself.
Compilers are not your enemy. Optimizing compilers do the things they do because that's what the majority of people using them want.
It also mixes in things that have nothing to do with optimizing compilers at all, like expecting emulation of 64-bit integers on 32-bit platforms to be constant time when neither the language nor the library in question has ever promised such a guarantee. The same goes for the constant references to bool, as if it were some kind of magical data type whose avoidance gives you whatever guarantees you wish. Sounds more like magical thinking than programming.
I'd file this under "why can't the compiler read my mind and do what I want instead of just what I asked it to".
The "compiler"'s job would then be to assert that the behaviour of the source matches the behaviour of the provided assembly. (This is probably a hard/impossible problem to solve in the general case, but I think it'd be solvable in enough cases to be useful)
To me this would offer the best of both worlds - readable, auditable source code, alongside high-performance assembly that you know won't randomly break in a future compiler update.
> LLVM 11 tends to take 2x longer to compile code with optimizations, and as a result produces code that runs 10-20% faster (with occasional outliers in either direction), compared to LLVM 2.7 which is more than 10 years old.
Yes, C code is expected to benefit less from optimizations, since it is already close to assembly. But compiler optimizations in the past decades had enormous impact - because they allowed better languages. Without modern optimizations, C++ would have never been as fast as C, and Rust wouldn't be possible at all. Same arguments apply to Java and JavaScript.
#include <stdlib.h>
#include <string.h>

/* Copies the input buffer and appends 'a' and 'b'. */
char* strappend(char const* input, size_t size) {
    char* ptr = malloc(size + 2);   /* size + 2 wraps if size is within 2 of SIZE_MAX */
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
}
This function is undefined if size is SIZE_MAX: the size + 2 computation wraps, and the memcpy and the two stores then run past the undersized allocation. Many pieces of code have these sorts of "bugs", but in practice no one cares, because the input required, while theoretically possible, cannot physically occur.
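For completeness, a minimal sketch (not from the comment, function name made up) of how the wraparound case could be rejected explicitly, if one did care about that input:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Guarded variant: reject sizes where size + 2 would wrap,
   so the memcpy and the two stores cannot run past the allocation. */
char* strappend_checked(char const* input, size_t size) {
    if (size > SIZE_MAX - 2) return 0;   /* size + 2 would overflow size_t */
    char* ptr = malloc(size + 2);
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
}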
I bet it's roughly none.
And both are just a major headache now, and are among the reasons why few people start new projects in C.
I wonder how many such design decisions, relevant today, but with a potential to screw up future humanity, we are making right now.
How do Firefox and Chrome perform if they are compiled at -O0?
Basically like today's "-Og/-Odebug" or "-fno-omit-frame-pointer", but for this specific niche.
I'd be interested to see a post comparing the performance and vulnerability of the mentioned crypto code with and without this (hypothetical) -Obranchless.
#pragma GCC push_options
#pragma GCC optimize ("O0")
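For context, GCC's option pragmas are normally used as a push/pop pair around the functions in question; a minimal sketch (function name hypothetical):

#include <stddef.h>

/* Sketch: compile just this function with optimization disabled under GCC.
   (Clang has its own "#pragma clang optimize off" / "on" pair.) */
#pragma GCC push_options
#pragma GCC optimize ("O0")

void wipe(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; i++)   /* these stores are kept as written at -O0 */
        buf[i] = 0;
}

#pragma GCC pop_options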
Exploiting UB in the optimizer can be annoying, but most projects with bad practices from the 1990s have figured it out by now. UBSan helps, of course.

I'm pretty grateful for aggressive optimizations. I would not want to compile a large C++ codebase with g++ that has itself been compiled with -O0. Even a 20% speedup helps.
The only annoying issue with C/C++ compilers is the growing list of false positive warnings (usually 100% false positives in well written projects).
This makes it difficult to read the rest of the article. Really? All compiler authors, as a blanket statement, act in bad faith? Whenever possible?
> As a cryptographic example, benchmarks across many CPUs show that the avx2 implementation of kyber768 is about 4 times faster than portable code compiled with an "optimizing" compiler.
What? This is an apples-to-oranges comparison. Compilers optimize all the code they parse; hand-optimizing a single algorithm will of course speed up implementations of that specific algorithm, but what about the 99.9999999% of code which is not your particular hand-optimized algorithm?
C also has other issues related to undefined behavior and its use for what I call "extreme optimizations" (e.g. not emitting code for an if branch that checks for a null pointer). Rust is emerging as an alternative to C that aims to fix many of its problems, but how does it fare when it comes to writing constant-time code? Is it similar to C, easier, or more complicated?
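The null-check example mentioned above usually looks something like this minimal sketch: because the pointer is dereferenced before it is tested, the compiler may assume it is non-null and drop the test as dead code.

#include <stdio.h>

/* The dereference happens before the check, so the compiler is allowed to
   assume p != NULL (a null dereference would be UB) and, at higher
   optimization levels, GCC and Clang may delete the if branch entirely. */
int read_value(const int *p) {
    int value = *p;
    if (p == NULL) {                  /* often optimized away */
        fprintf(stderr, "null pointer\n");
        return -1;
    }
    return value;
}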
Both gcc and clang are orders of magnitude better tested than all the closed-source applications that are developed under tight timelines and that we essentially trust our lives with.
To be very clear, there are compiler bugs, but those are almost never the problem in the first place. In the vast majority of cases it starts with buggy user code. And now back to handwritten assembly…
Somehow it took me long minutes to infer this.
Find out on next week's episode of “let's blame compilers rather than my choice of language”!