December 10th, 2024

Common Misconceptions about Compilers

The article clarifies misconceptions about compilers: they aim to improve performance without guaranteeing optimality. It also covers the nuances of optimization levels, the difficulty of optimizing for data locality, and the importance of runtime type information.

The article discusses common misconceptions about compilers, particularly focusing on large-scale, general-purpose compilers like LLVM, GCC, and ICX. It clarifies that optimization does not guarantee an optimal program, as compilers aim to improve performance rather than achieve the best possible outcome. The article highlights that while compilers can optimize for instruction cache locality, they often do not optimize for data locality due to the complexity of making intrusive changes to the program's structure. It also addresses the misconception that higher optimization levels, such as -O3, always produce significantly faster code than -O2, noting that the differences can be negligible. Additionally, it explains the role of branch weights in modern compilers and the importance of runtime type information in Just-In-Time (JIT) compilation for languages like JavaScript. The article concludes by emphasizing that while -O0 is often perceived as providing fast compilation, it primarily offers debuggable and predictable code rather than speed advantages.

- Compilers aim to improve performance, not necessarily to produce optimal programs.

- Higher optimization levels do not always result in significantly faster code.

- Compilers typically do not optimize for data locality due to the complexity of required changes.

- Runtime type information is crucial for JIT compilation in languages like JavaScript.

- The -O0 optimization level is more about debuggability than compilation speed.

AI: What people are saying
The comments on the article reveal several key themes regarding compilers and their optimization processes.
  • There is a discussion about the role of interpreters and compilers in language bootstrapping and the implications for reproducibility and behavior consistency.
  • Several commenters highlight the complexities of optimization levels and the impact of different settings on performance, emphasizing that optimizations can vary significantly.
  • Data locality and layout optimization are noted as challenging problems, with suggestions for improving performance through better data structures.
  • Concerns about the reliability of compiler outputs and the potential for non-optimal code generation are raised, stressing the importance of understanding compiler behavior.
  • Some comments touch on the historical issues with C/C++ compilers and the need for innovation in compiler design for new languages.
16 comments
By @mshockwave - 4 months
> As an example, in my desktop, it takes ~25min to compile a Debug version of LLVM, but ~23min to compile a Release version.

Oh, I think I know what might cause this: TableGen. The `llvm-tblgen` run time accounts for a good chunk of LLVM build time, and in a debug build `llvm-tblgen` is itself unoptimized, hence the long run time generating those .inc / .def files. You can enable the CMake variable `LLVM_OPTIMIZED_TABLEGEN` to build `llvm-tblgen` in release mode while leaving the rest of the project as a debug build (or whatever `CMAKE_BUILD_TYPE` you choose).
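
Roughly like this (the flag is real; the generator and paths are whatever your setup uses):

```
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
      -DLLVM_OPTIMIZED_TABLEGEN=ON ../llvm
```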

By @mshockwave - 4 months
> The compiler optimizes for data locality

> So, we have a single array in which every entry has the key and the value paired together. But, during lookups, we only care about the keys. The way the data is structured, we keep loading the values into the cache, wasting cache space and cycles. One way to improve that is to split this into two arrays: one for the keys and one for the values.

Recently someone proposed this on LLVM: https://discourse.llvm.org/t/rfc-add-a-new-structure-layout-...

Also, I think what you meant by data locality here is really optimizing data layout, which, as you also mentioned, is a hard problem. But if it's just about optimizing cache locality, I think the classic loop interchange also qualifies, though it's not enabled by default in LLVM despite having been there for quite a while.
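
To make the keys/values split concrete, a minimal sketch (a hypothetical container, not the proposed LLVM pass):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs: a key lookup also pulls each value into cache.
struct Entry { std::uint64_t key; std::uint64_t value; };
using AosTable = std::vector<Entry>;

// Struct-of-arrays: lookups scan a dense array of keys, so every
// cache line holds roughly twice as many keys.
struct SoaTable {
    std::vector<std::uint64_t> keys;
    std::vector<std::uint64_t> values;  // values[i] pairs with keys[i]

    // Linear scan, purely for illustration.
    long find(std::uint64_t k) const {
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == k) return static_cast<long>(i);
        return -1;
    }
};
```

And loop interchange, for reference: the loop order decides whether the inner loop walks memory contiguously.

```cpp
#include <cstddef>
#include <vector>

// Before: the inner loop jumps a whole row per step, thrashing the
// cache. (Assumes a non-empty, rectangular matrix.)
double sum_cols_first(const std::vector<std::vector<double>>& a) {
    double s = 0;
    for (std::size_t j = 0; j < a[0].size(); ++j)
        for (std::size_t i = 0; i < a.size(); ++i)
            s += a[i][j];
    return s;
}

// After interchange: the inner loop walks each row contiguously.
double sum_rows_first(const std::vector<std::vector<double>>& a) {
    double s = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            s += a[i][j];
    return s;
}
```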

By @Animats - 4 months
About C/C++ compilers, mostly.

C++ has terrible template and include compilation problems, for historical reasons.

By @rkagerer - 4 months
How about reproducibility?

I always took it for granted that most compilers will output consistent binary code given identical inputs. Then the other day I saw a post here about optimization timeouts and such.

By @eru - 4 months
> Knowing which paths are hot in any Javascript bytecode is not enough to produce optimized code; you also need to know the types. And you only know the types at runtime. This is the main reason why JIT compilers compile code at runtime.

You can often make good guesses statically. Especially if your JavaScript was produced from something like a TypeScript source.

By @timeforcomputer - 4 months
Why does optimal substructure work for code size? Couldn't you take a compiled, non-minimal function and break each instruction into its own function call (with some assumptions about the ISA, maybe)? Then, assuming each single-instruction function is minimal by itself, you could infer that the whole is minimal, which is a contradiction.
By @ykonstant - 4 months
Nice write-up and overall very informative blog. Kind of off topic, but are you Stefanos, doing your PhD at UIUC? That's my alma mater, I hope you find the academic* environment there stimulating!

*Let's not talk about the social environment of UC.

By @dspillett - 4 months
> -O3 produces much faster code than -O2

Also “-Ox gives similar results all the time”. Any optimisation is a gamble, and some are quite long-odds gambles. Different cache sizes and management strategies between processing units mean that what is a small benefit on one chip can be a detriment on another, and even if you specifically tune the optimisations for a given set of chips, you are at the mercy of the incoming data, which can invalidate any educated guesses made about branch likelihood and so forth.
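
For what it's worth, the standard branch attributes exist precisely because the compiler is guessing. A tiny sketch with C++20's [[unlikely]] (the attribute is standard; the scenario is made up):

```cpp
#include <cstdio>

// We promise the compiler that errors are rare, so it can lay out the
// hot path as straight-line fall-through code. If the incoming data
// disagrees, the hint becomes a pessimization: exactly the gamble
// described above.
int process(int value) {
    if (value < 0) [[unlikely]] {
        std::puts("error path");
        return -1;
    }
    return value * 2;  // hot path
}
```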

By @kazinator - 4 months
If you have both an interpreter and a compiler, then bootstrapping the (self-hosted) language is a lot simpler. The interpreter is written in some widely available language like C. The interpreter runs the compiler and standard library code of your language in order to compile that compiler and standard library. After that, those parts execute as native machine code or byte code (with or without a JIT).

Without the interpreter, you have to ship the compiled versions of those parts, even to someone building your language from scratch. Or perhaps ship an entire prebuilt package of the language.

An interpreter creates a third opinion on what code should do, resulting in a documentation-interpreter-compiler triangle: when it is unclear what the right behavior is, weight can be given to whichever behavior two out of those three pieces agree on.

Optimized versus unoptimized isn't the same kind of check, because there is a dependency: if the unoptimized translation has a certain behavior and optimization doesn't wreck it (the prevalent case), then of course the two are going to agree, and their agreement carries no weight against the documentation. An interpreter and a compiler are almost totally independent, except where they share run-time support and library functions.
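
A sketch of that bootstrap flow, with all names hypothetical:

```
cc -O2 -o mylang-interp interp.c           # build the interpreter with a C compiler
./mylang-interp compiler.myl -o mylangc    # interpreter runs the compiler source,
                                           #   producing a native-code compiler
./mylangc compiler.myl -o mylangc-stage2   # the compiled compiler rebuilds itself
```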

By @mshockwave - 4 months
Re: Why does link-time optimization (LTO) happen at link-time?

I think maybe LLVM's ThinLTO is what you're looking for, where whole-program optimization (more or less) happens in the middle end.
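
In case it helps, enabling it in clang is just a flag (on Linux you may also want an LTO-aware linker such as lld; file names here are placeholders):

```
clang -flto=thin -O2 -c a.c b.c                    # emit bitcode with ThinLTO summaries
clang -flto=thin -O2 -fuse-ld=lld a.o b.o -o prog  # cross-module optimization at "link"
```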

By @norir - 4 months
Good observations, but a more accurate title would be "Common Misconceptions about C/C++ Compilers". I point this out because I think there is a lot of room for compiler innovation that will come from developing new compilers for new languages, and I don't want us to continue being anchored in the mistakes of the past.
By @ben-schaaf - 4 months
> For example, minijava-cpp is a MiniJava compiler written in C++ which has some pretty heavy code in it. It compiles in ~0.7s in my laptop, while linking all these separate files takes much longer.

I can't reproduce this result at all. Using compile.sh it takes my 7950x 0.9s to compile the whole project. A quick meson/ninja setup took 0.8s from a clean build directory, and just 0.043s to link using gnu-ld. Using faster linkers yields even better results: 0.025s for gold and 0.013s for mold. I'm curious how you got your results?

> SQLite, too, compiles everything into a single .c program.

It takes my computer 25 seconds to compile sqlite when making incremental changes. My work's similarly sized C++(!) repo links 3 executables in 0.07s using mold.

I'm having a really hard time seeing how unity builds could possibly be faster. Maybe with an exceedingly terrible linker or spinning rust?
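
If anyone wants to repeat the comparison, swapping linkers is a single flag on recent clang/gcc (files hypothetical):

```
c++ -O2 -c a.cpp b.cpp main.cpp                      # one .o per translation unit
time c++ a.o b.o main.o -o minijava -fuse-ld=mold    # time only the link step
```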

By @baziotis - 4 months
Hey folks, author here. I'm really happy the article has gotten such attention. I hope it's good attention and that you folks learned something! Unfortunately, though, it's _very_ hard to keep track of all the comments on Hacker News. So, if you have a comment you'd like me to reply to and I haven't, please either use the Reddit post: https://www.reddit.com/r/Compilers/comments/1hapetl/common_m...

or head over to my website: https://sbaziotis.com/ for more contact info.

Best, Stefanos

By @eru - 4 months
If the author is here: the link from the table of contents to '-O0 gives you fast compilation' is broken.
By @high_na_euv - 4 months
>Separate compilation is always worth it

>Separate compilation is the idea that you compile a different object file for each source file. The reasoning of why this is beneficial is that if you change one source file, you only have to re-compile that one file and "just" link it with the other object files; you don't have to re-compile the whole project.

That reflects my feeling that the linker is a bullshit step
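
Bullshit step or not, it is what makes the incremental workflow in the quote possible. In its simplest form (hypothetical files):

```
c++ -c a.cpp            # a.o
c++ -c b.cpp            # b.o
c++ a.o b.o -o prog     # link

# after editing only a.cpp:
c++ -c a.cpp            # recompile just a.o
c++ a.o b.o -o prog     # re-link; b.cpp is never recompiled
```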