August 3rd, 2024

Clang vs. Clang

The blog post critiques compiler optimizations in Clang, arguing that they often introduce bugs and security vulnerabilities, deliver diminishing performance gains, and open timing channels; it urges a reevaluation of current optimization practices.

The blog post discusses the challenges and issues surrounding compiler optimizations, particularly focusing on Clang. It highlights that compiler writers often evade responsibility for bugs introduced through optimizations, attributing them to "undefined behavior" in code written by programmers. The author argues that these optimizations, rather than enhancing performance, often lead to more bugs and security vulnerabilities. They reference Proebsting's Law, which suggests that compiler advancements yield only marginal improvements in computing power, and recent benchmarks indicate that the performance gains from optimizations are diminishing. The post emphasizes that many critical software systems rely heavily on assembly language and intrinsics, as compiler optimizations do not adequately address performance needs.

Moreover, the author raises concerns about security, noting that compiler optimizations can inadvertently create timing channels that expose sensitive information. They cite a 2018 paper that discusses how compiler upgrades can introduce vulnerabilities without warning. The post concludes with a specific example of a timing attack against the Kyber reference code compiled with Clang, illustrating the real-world implications of these optimization issues. The author calls for a reevaluation of the optimization practices in compilers, suggesting that the current approach may be detrimental to both performance and security in software development.

Related

How GCC and Clang handle statically known undefined behaviour

Discussion on compilers handling statically known undefined behavior (UB) in C code reveals insights into optimizations. Compilers like gcc and clang optimize based on undefined language semantics, potentially crashing programs or ignoring problematic code. UB avoidance is crucial for program predictability and security. Compilers differ in handling UB, with gcc and clang showing variations in crash behavior and warnings. LLVM's 'poison' values allow optimizations despite UB, reflecting diverse compiler approaches. Compiler responses to UB are subjective, influenced by developers and user requirements.

Mix-testing: revealing a new class of compiler bugs

A new "mix testing" approach uncovers compiler bugs by compiling test fragments with different compilers. Examples show issues in x86 and Arm architectures, emphasizing the importance of maintaining instruction ordering. Luke Geeson developed a tool to explore compiler combinations, identifying bugs and highlighting the need for clearer guidelines.

Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior

The study delves into compiler behavior when given extra information for program optimization. Surprisingly, more data can sometimes lead to poorer optimization due to intricate compiler interactions. Testing identified 59 cases in popular compilers, emphasizing the need for better understanding.

Beyond Clean Code

The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.

Better Firmware with LLVM/Clang

LLVM and Clang are gaining traction in embedded software development, particularly for ARM Cortex-M devices. The article advocates integrating Clang for better static analysis, error detection, and dual compiler usage.

AI: What people are saying
The comments reflect a diverse range of opinions on compiler optimizations and their implications.
  • Many commenters argue that undefined behavior (UB) in C/C++ is a significant issue, often leading to unexpected results and security vulnerabilities.
  • Some believe that blaming compiler optimizations for bugs is misguided, emphasizing that programmers should be aware of language standards and UB.
  • There is a call for better languages or compilers that can express semantics more clearly, reducing reliance on optimizations that may introduce errors.
  • Several users highlight the trade-offs between performance gains from optimizations and the potential for introducing bugs, particularly in security-critical applications.
  • Some suggest using specific compiler flags to manage optimizations, indicating a desire for more control over the compilation process.
36 comments
By @josephcsible - 2 months
> compiler writers refuse to take responsibility for the bugs they introduced, even though the compiled code worked fine before the "optimizations". The excuse for not taking responsibility is that there are "language standards" saying that these bugs should be blamed on millions of programmers writing code that bumps into "undefined behavior"

But that's not an excuse for having a bug; it's the exact evidence that it's not a bug at all. Calling the compiler buggy for not doing what you want when you commit Undefined Behavior is like calling dd buggy for destroying your data when you call it with the wrong arguments.
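
To illustrate with the textbook case (my sketch, not from the article): because signed overflow is UB, a compiler may assume it never happens and fold away checks that rely on it.

  #include <limits.h>

  /* Signed overflow is UB, so a compiler may assume x + 1 never
     wraps and compile this whole test to "return 0" at -O2. */
  int wraps(int x) {
      return x + 1 < x;
  }

  /* The well-defined way to ask the same question: */
  int wraps_safe(int x) {
      return x == INT_MAX;
  }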

By @gumby - 2 months
I like Bernstein but sometimes he flies off the handle in the wrong direction. This is a good example, which he even half-heartedly acknowledges at the end!

A big chunk of the essay is about a side point — how good the gains of optimization might be, which, even with data, would be a use-case dependent decision.

But the bulk of his complaint is that C compilers fail to take into account semantics that cannot be expressed in the language. Wow, shocker!

At the very end he says “use a language which can express the needed semantics”. The entire essay could have been replaced with that sentence.

By @leni536 - 2 months
C and C++ are unsuitable for writing algorithms with constant-time guarantees. The standards have little to no notion of real time, and compilers don't offer additional guarantees as extensions.

But blaming the compiler devs for this is just misguided.
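
For illustration, here is the standard mask-based constant-time select (a common idiom, sketched here, not code from the post). Nothing in the C standard stops an optimizer from recognizing the pattern and emitting a secret-dependent branch, which is exactly the failure mode being discussed.

  #include <stdint.h>

  /* Return a if bit == 1, b if bit == 0, without branching on `bit`.
     The standard makes no timing promises, so a compiler is free to
     turn this back into a conditional branch. */
  uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b) {
      uint32_t mask = (uint32_t)0 - bit;  /* 1 -> 0xFFFFFFFF, 0 -> 0 */
      return (a & mask) | (b & ~mask);
  }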

By @amluto - 2 months
It’s worth noting that, on Intel CPUs, neither clang nor anything else can possibly generate correct code, because correct code does not exist in user mode.

https://www.intel.com/content/www/us/en/developer/articles/t...

Look at DOITM in that document — it is simply impossible for a userspace crypto library to set the required bit.

By @dathinab - 2 months
> [..] whenever possible, compiler writers refuse to take responsibility for the bugs they introduced

I have seldom seen someone discredit their expertise that fast in a blog post. (Especially if you follow the link and realize it's just basic, fundamental C: UB does not mean the code merely produces an "arbitrary" value.)

By @Conscat - 2 months
Fwiw clang has a `clang::optnone` attribute to disable all optimizations on a per-function basis, and GCC has the fantastic `gnu::optimize` attribute which allows you to add or remove optimizations by name, or set the optimization level regardless of compiler flags. `gnu::optimize(0)` is similar to Clang's `optnone`. Clang also has `clang::no_builtins` to disable specifically the memcpy and memset optimizations.
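
A sketch of how those attributes look in plain C, using the GNU `__attribute__` spelling (the bracketed `[[clang::optnone]]` form needs C23/C++ attribute syntax; the constant-time compare bodies are just illustrative):

  #include <stddef.h>

  /* Clang: compile this one function with all optimizations disabled. */
  __attribute__((optnone))
  int ct_equal_clang(const unsigned char *a, const unsigned char *b, size_t n) {
      unsigned char d = 0;
      for (size_t i = 0; i < n; i++) d |= a[i] ^ b[i];
      return d == 0;
  }

  /* GCC: pin this function to -O0 regardless of the command-line level. */
  __attribute__((optimize("O0")))
  int ct_equal_gcc(const unsigned char *a, const unsigned char *b, size_t n) {
      unsigned char d = 0;
      for (size_t i = 0; i < n; i++) d |= a[i] ^ b[i];
      return d == 0;
  }
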
By @TNorthover - 2 months
I'm vaguely sympathetic to these crypto people's end goals (talking about things like constant time evaluation & secret hiding), but it's really not what general purpose compilers are even thinking about most of the time so I doubt it'll ever be more than a hack that mostly works.

They'll probably need some kind of specialized compiler of their own if they want to be serious about it. Or carry on with asm.

By @kstrauser - 2 months
I can't help but feel we're going to think of these as the bad old years, and that at some point we'll have migrated off of C to a language with much less UB. It's so easy to express things in C that compile but that the compiler couldn't possibly guess the intent of because C doesn't have a way to express it.

For instance, in Python you can write something like:

  result = [something(value) for value in set_object]
Because Python's set objects are unordered, it's clear that it doesn't matter in which order the items are processed, and that the order of the results doesn't matter. That opens a whole lot of optimizations at the language level that don't rely on brilliant compilers implying what the author meant. Similar code in another language with immutable data can go one step further: since something(value1) can't possibly affect something(value2), it can execute those in parallel with threads or processes or whatever else makes it go fast.

Much of the optimization of C compilers is looking at patterns in the code and trying to find faster ways to do what the author probably meant. Because C lacks the ability to express much intent compared to pretty much any newer language, they have the freedom to guess, but also have to make those kinds of inferences to get decent performance.

On the plus side, this might be a blessing in disguise like when the Hubble telescope needed glasses. We invented brilliant techniques to make it work despite its limitations. Once we fixed its problems, those same techniques made it perform way better than originally expected. All those C compiler optimizations, applied to a language that's not C, may give us superpowers.

By @AndyKelley - 2 months
If you don't like C's semantics then how about using a different programming language instead of getting angry at compiler engineers.
By @krackers - 2 months
Refreshing post that conveys a perspective I haven't seen voiced often. See also: https://gavinhoward.com/2023/08/the-scourge-of-00ub/
By @zokier - 2 months
It's free software; they are completely free to fork it and give it whatever semantics they want if they don't like the ISO C semantics. They can't really expect someone else to do that for them for free, and this sort of post is not exactly the kind of thing that would get any of the compiler people to come to djb's side.
By @lapinot - 2 months
Demonstrating how some languages and some compilers are bad at tasks such as writing constant-time crypto routines is fine. Concluding that all compilers and non-asm languages are bad is a non sequitur. Just because you don't want non-branching code to change into branching code doesn't mean you should have to do register allocation by hand. Write simple domain-specific compilers and languages people.
By @pcwalton - 2 months
> (As a side note, I would expect this conditional branch to slow down more code than it speeds up. But remember that compiler writers measure an "optimization" as successful if they can find any example where the "optimization" saves time.)

Wildly false, and I have no idea where the author is getting this idea from. If you regress people's code in LLVM, your patch gets reverted.

By @quohort - 2 months
Very interesting article and much-needed criticism of the current standard of heuristic optimization.

Before reading this, I thought that a simple compiler could never usefully compete against optimizing compilers (which require more manpower to produce), but perhaps there is a niche use-case for a compiler with better facilities for manual optimization. This article has inspired me to make a simple compiler myself.

By @ziml77 - 2 months
Why does the code need to rely on hacks to get around optimizations? Can't they be disabled per-unit by just compiling different files with different optimization flags?
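
(They can; optimization level is a per-translation-unit decision. A sketch of the idea, with illustrative file names:)

  # Build the hot code at -O2 but the constant-time unit at -O0,
  # then link the objects together as usual.
  clang -O2 -c app.c -o app.o
  clang -O0 -c ct_kernels.c -o ct_kernels.o
  clang app.o ct_kernels.o -o app
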
By @account42 - 2 months
Surprised to see such an incoherent and trite rant from djb.

Compilers are not your enemy. Optimizing compilers do the things they do because that's what the majority of people using them want.

It also mixes in things that have nothing to do with optimizing compilers at all, like expecting emulation of 64-bit integers on 32-bit platforms to be constant-time when neither the language nor the library in question has ever promised such a guarantee. The same goes for the constant references to bool, as if it were some kind of magical data type where avoiding it gives you whatever guarantees you wish. That sounds more like magical thinking than programming.

I'd file this under "why can't the compiler read my mind and do what I want instead of just what I asked it to".

By @Retr0id - 2 months
What I'd really like is a way to express code in a medium/high level language, and provide hand-optimized assembly code alongside it (for as many target architectures as you need). For a first-pass, you could machine-generate that assembly, and then manually verify that it's constant time (for example) and perform additional optimizations over the top of that, by hand.

The "compiler"'s job would then be to assert that the behaviour of the source matches the behaviour of the provided assembly. (This is probably a hard/impossible problem to solve in the general case, but I think it'd be solvable in enough cases to be useful)

To me this would offer the best of both worlds - readable, auditable source code, alongside high-performance assembly that you know won't randomly break in a future compiler update.

By @afdbcreid - 2 months
A point of the post that I didn't see discussed here is this:

> LLVM 11 tends to take 2x longer to compile code with optimizations, and as a result produces code that runs 10-20% faster (with occasional outliers in either direction), compared to LLVM 2.7 which is more than 10 years old.

Yes, C code is expected to benefit less from optimizations, since it is already close to assembly. But compiler optimizations in the past decades had enormous impact - because they allowed better languages. Without modern optimizations, C++ would have never been as fast as C, and Rust wouldn't be possible at all. Same arguments apply to Java and JavaScript.

By @mgaunard - 2 months
Let's consider this function:

  #include <stdlib.h>   /* malloc */
  #include <string.h>   /* memcpy */

  /* Copies `size` bytes of `input` and appends 'a' and 'b'. */
  char* strappend(char const* input, size_t size) {
    char* ptr = malloc(size + 2);
    if (!ptr) return 0;
    memcpy(ptr, input, size);
    ptr[size] = 'a';
    ptr[size + 1] = 'b';
    return ptr;
  }
This function is undefined if size is SIZE_MAX: size + 2 wraps around, so the memcpy overflows the undersized allocation.

Many pieces of code have these sorts of "bugs", but in practice no one cares, because the required input, while theoretically expressible, is physically impossible.
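
(For completeness, a one-line guard before the malloc closes the theoretical hole; my addition, not part of the original snippet:)

  /* Reject sizes where size + 2 would wrap (SIZE_MAX is in <stdint.h>). */
  if (size > SIZE_MAX - 2) return 0;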

By @saagarjha - 2 months
I was already rolling my eyes but then I saw the unironic link to “The Death of Optimizing Compilers” and they might as well have fallen out of my head. Someone please explain to the crypto people that designing a general-purpose language around side-channel resistance is actually stupid since most people don’t need it, optimizations actually do help quite a lot (…if they didn’t, you wouldn’t be begging for them: -O0 exists), and the model of UB C(++) has is not going away. If you want to make your own dedicated cryptography compiler that does all this stuff I honestly think you should and I would support such a thing, but when you think the whole world is conspiring against your attempts to write perfect code, maybe it’s you.
By @wolf550e - 2 months
> It would be interesting to study what percentage of security failures can be partly or entirely attributed to compiler "optimizations".

I bet it's roughly none.

By @inglor_cz - 2 months
Like the absence of array-bounds checking, undefined behavior was introduced in the name of efficiency, back when the performance difference really mattered.

Both are just a major headache now, and they are among the reasons why few people start new projects in C.

I wonder how many such design decisions, relevant today, but with a potential to screw up future humanity, we are making right now.

By @jancsika - 2 months
Ok, as far as the efficacy/importance/tradeoff of optimizing compilers...

How do Firefox and Chrome perform if they are compiled at -O0?

By @quuxplusone - 2 months
The author's Clang patch is interesting, but I wonder if what he really wants is, like, a new optimization level "-Obranchless" which is like O2/O3 but disables all optimizations which might introduce new conditional branches. Presumably optimizations that _remove_ branches are fine; it's just that you don't want any deliberately branchless subexpression being replaced with a branch.

Basically like today's "-Og/-Odebug" or "-fno-omit-frame-pointers" but for this specific niche.

I'd be interested to see a post comparing the performance and vulnerability of the mentioned crypto code with and without this (hypothetical) -Obranchless.

By @fhgag - 2 months
Timing attacks are a very specialized problem. If you don't care about performance, why not wrap the critical section in:

  #pragma GCC push_options
  #pragma GCC optimize ("O0")
  /* ... critical section ... */
  #pragma GCC pop_options
Exploiting UB in the optimizer can be annoying, but most projects with bad practices from the 1990s have figured it out by now. UBsan helps of course.

I'm pretty grateful for aggressive optimizations. I would not want to compile a large C++ codebase with g++ that has itself been compiled with -O0. Even a 20% speedup helps.

The only annoying issue with C/C++ compilers is the growing list of false positive warnings (usually 100% false positives in well written projects).

By @johnfn - 2 months
> The bugs admitted in the compiler changelogs are just the tip of the iceberg. Whenever possible, compiler writers refuse to take responsibility for the bugs they introduced, even though the compiled code worked fine before the "optimizations".

This makes it difficult to read the rest of the article. Really? All compiler authors, as a blanket statement, act in bad faith? Whenever possible?

> As a cryptographic example, benchmarks across many CPUs show that the avx2 implementation of kyber768 is about 4 times faster than portable code compiled with an "optimizing" compiler.

What? This is an apples to oranges comparison. Compilers optimize all code they parse; optimizing a single algorithm will of course speed up implementations of that specific algorithm, but what about the 99.9999999% of code which is not your particular hand-optimized algorithm?

By @GTP - 2 months
As someone who knows C but isn't familiar with compiler internals, I ask: would the disruptive optimizations discussed here kick in even when compiling with optimizations turned off (-O0)?

C also has other issues related to undefined behavior and its use for what I call "extreme optimizations" (e.g. not emitting code for an if branch that checks for a null pointer). Rust is emerging as an alternative to C that aims to fix many of its problems, but how does it fare in terms of writing constant-time code? Is it similar to C, easier, or more complicated?

By @qalmakka - 2 months
I'm sick and tired of people expecting non-standard behaviour from C/C++ compilers when there are long-established standards that clearly state what is allowed and what is not. If you are writing something like Unreal Engine and you resort to UB to get all of the performance you can without writing assembly, then you also need to know you'll have to commit to a certain version of a certain compiler if you want deterministic behaviour.
By @_orz_ - 2 months
What an interesting discussion. Especially the claim that writing it in asm would be the solution if you want secure code.

Both gcc and clang are orders of magnitude better tested than the closed-source applications developed under tight timelines that we essentially trust our lives with.

To be very clear, there are compiler bugs, but those are almost never the problem in the first place. In the vast majority of cases it starts with buggy user code. And now back to handwritten assembly…

By @gok - 2 months
Computer security is not a serious field. There is no other group that honestly feels "do what I meant, not what I said" is a sign of someone else's bug.
By @red_admiral - 2 months
So, should we be compiling security-critical code with `-O0` then?
By @tomcam - 2 months
UB means undefined behavior

Somehow it took me long minutes to infer this.

By @e40 - 2 months
Was hoping the title was a pun on Spy vs Spy[0].

[0] https://en.wikipedia.org/wiki/Spy_vs._Spy

By @orf - 2 months
Man attempts to write constant-time algorithms using a language that does not support constant-time algorithms, but who’s really at fault here?

Find out on next week’s episode of “let’s blame compilers rather than my choice of language”!

By @ndesaulniers - 2 months
Compile your code with `-O0` and shut up already.
By @o11c - 2 months
Complains about branching, but doesn't even mention `__builtin_expect_with_probability`.
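
(For reference, a sketch of that builtin, available in GCC 9+ and Clang; the helper functions are hypothetical:)

  void handle_common_case(void);  /* hypothetical */
  void handle_rare_case(void);    /* hypothetical */

  void process(long n) {
      /* Hint to the optimizer: this branch is taken ~90% of the time. */
      if (__builtin_expect_with_probability(n > 0, 1, 0.90))
          handle_common_case();
      else
          handle_rare_case();
  }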