Common Misconceptions about Compilers
The article clarifies misconceptions about compilers: they improve performance without guaranteeing optimality, optimization levels and data locality are subtler than they appear, and runtime type information is central to JIT compilation.
The article discusses common misconceptions about compilers, particularly focusing on large-scale, general-purpose compilers like LLVM, GCC, and ICX. It clarifies that optimization does not guarantee an optimal program, as compilers aim to improve performance rather than achieve the best possible outcome. The article highlights that while compilers can optimize for instruction cache locality, they often do not optimize for data locality due to the complexity of making intrusive changes to the program's structure. It also addresses the misconception that higher optimization levels, such as -O3, always produce significantly faster code than -O2, noting that the differences can be negligible. Additionally, it explains the role of branch weights in modern compilers and the importance of runtime type information in Just-In-Time (JIT) compilation for languages like JavaScript. The article concludes by emphasizing that while -O0 is often perceived as providing fast compilation, it primarily offers debuggable and predictable code rather than speed advantages.
- Compilers aim to improve performance, not necessarily to produce optimal programs.
- Higher optimization levels do not always result in significantly faster code.
- Compilers typically do not optimize for data locality due to the complexity of required changes.
- Runtime type information is crucial for JIT compilation in languages like JavaScript.
- The -O0 optimization level is more about debuggability than compilation speed.
Related
Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior
The study delves into compiler behavior when given extra information for program optimization. Surprisingly, more data can sometimes lead to poorer optimization due to intricate compiler interactions. Testing identified 59 cases in popular compilers, emphasizing the need for better understanding.
What are the ways compilers recognize complex patterns?
Compilers optimize by recognizing patterns like popcount, simplifying code for efficiency. LLVM and GCC use hardcoded patterns to match common idioms, balancing compile-time speed with runtime gains in critical code sections.
Clang vs. Clang
The blog post critiques compiler optimizations in Clang, arguing they often introduce bugs and security vulnerabilities, diminish performance gains, and create timing channels, urging a reevaluation of current practices.
Optimisation-dependent IR decisions in Clang
Clang's Intermediate Representation varies with optimization levels; disabling optimization adds debugging aids, while enabling it introduces performance enhancements like lifetime markers and TBAA metadata, impacting compiler usage and performance tuning.
Compiler Optimization in a Language You Can Understand
The article explains compiler optimizations, focusing on their types, purposes, and methods. It emphasizes the significance of understanding these optimizations for writing efficient code and contrasts optimized versus unoptimized builds.
- There is a discussion about the role of interpreters and compilers in language bootstrapping and the implications for reproducibility and behavior consistency.
- Several commenters highlight the complexities of optimization levels and the impact of different settings on performance, emphasizing that optimizations can vary significantly.
- Data locality and layout optimization are noted as challenging problems, with suggestions for improving performance through better data structures.
- Concerns about the reliability of compiler outputs and the potential for non-optimal code generation are raised, stressing the importance of understanding compiler behavior.
- Some comments touch on the historical issues with C/C++ compilers and the need for innovation in compiler design for new languages.
oh I think I know what might cause this: TableGen. The `llvm-tblgen` run time accounts for a good chunk of LLVM build time. In a debug build `llvm-tblgen` is also unoptimized, hence the long run time generating those .inc / .def files. You can enable the CMake variable `LLVM_OPTIMIZED_TABLEGEN` to build `llvm-tblgen` in release mode while leaving the rest of the project as a debug build (or whatever `CMAKE_BUILD_TYPE` you choose).
> So, we have a single array in which every entry has the key and the value paired together. But, during lookups, we only care about the keys. The way the data is structured, we keep loading the values into the cache, wasting cache space and cycles. One way to improve that is to split this into two arrays: one for the keys and one for the values.
Recently someone proposed this for LLVM: https://discourse.llvm.org/t/rfc-add-a-new-structure-layout-...
Also, I think what you meant by data locality here is really optimizing data layout, which, as you also mentioned, is a hard problem. But if it's just optimizing cache locality, I think classic loop interchange also qualifies, though it's not enabled by default in LLVM despite having been there for quite a while.
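To make the keys/values split from the quote concrete, here is a minimal sketch of the array-of-structs versus struct-of-arrays trade-off (names are illustrative, not the layout proposed in the RFC):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Array-of-structs: scanning the keys also drags each neighbouring
// value into cache, even though values are never read during lookup.
struct PairAoS { std::uint64_t key; std::uint64_t value; };

// Struct-of-arrays: keys are packed contiguously, so the same scan
// touches roughly half as many cache lines.
struct TableSoA {
    std::vector<std::uint64_t> keys;
    std::vector<std::uint64_t> values;  // values[i] belongs to keys[i]

    std::optional<std::uint64_t> find(std::uint64_t key) const {
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == key) return values[i];
        return std::nullopt;
    }
};
```

This is exactly the kind of intrusive, whole-program change the article says compilers rarely make automatically: every use site of the structure has to be rewritten.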
C++ has terrible template and include compilation problems, for historical reasons.
I always took it for granted that most compilers will generally output consistent binary code given identical inputs. Then the other day I saw a post here about optimization timeouts and such.
You can often make good guesses statically. Especially if your JavaScript was produced from something like a TypeScript source.
Also “-Ox gives similar results all the time”. Any optimisation is a gamble, and some of them are quite uncertain gambles. Different cache sizes and management strategies between processing units mean that what is a small benefit on one chip could be a detriment on another, and even if you specifically tune the optimisations for a given set of chips, you are at the mercy of the incoming data, which could invalidate any educated guesses made about branch likelihood and so forth.
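As a minimal sketch of such a gamble, consider a branch-likelihood hint (C++20's `[[likely]]` here; profile-derived branch weights behave the same way): the code layout it produces only helps if the real input distribution matches the guess.

```cpp
// If most inputs really are non-negative, the compiler can make the
// "likely" branch the fall-through path; if the data distribution
// flips at run time, the same hint pessimizes the actual hot path.
int classify(int x) {
    if (x >= 0) [[likely]] {   // guessed hot path
        return 1;
    } else {                   // guessed cold path
        return -1;
    }
}
```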
Without the interpreter, you have to ship the compiled versions of those parts, even to someone building your language from scratch. Or perhaps ship an entire prebuilt package of the language.
An interpreter creates a third opinion on what code should do, resulting in a documentation-interpreter-compiler triangle: when it is unclear what the right behavior is, weight can be given to whichever two of the three agree.
Optimized versus unoptimized isn't the same kind of check, because there is a dependency. If the unoptimized translation has a certain behavior, and optimization doesn't wreck it (the prevalent case), then of course the two are going to agree, and their agreement doesn't carry weight against the documentation. An interpreter and a compiler are almost totally independent, except where they share run-time support and library functions.
I think LLVM's ThinLTO may be what you're looking for: whole-program optimization (more or less) happening in the middle end.
I can't reproduce this result at all. Using compile.sh it takes my 7950x 0.9s to compile the whole project. A quick meson/ninja setup took 0.8s from a clean build directory, and just 0.043s to link using gnu-ld. Using faster linkers yields even better results: 0.025s for gold and 0.013s for mold. I'm curious how you got your results?
> SQLite, too, compiles everything into a single .c program.
It takes my computer 25 seconds to compile sqlite when making incremental changes. My work's similarly sized C++(!) repo links 3 executables in 0.07s using mold.
I'm having a really hard time seeing how unity builds could possibly be faster. Maybe with an exceedingly terrible linker or spinning rust?
Or head over to my website: https://sbaziotis.com/ for more contact info.
Best, Stefanos
> Separate compilation is the idea that you compile a different object file for each source file. The reasoning of why this is beneficial is that if you change one source file, you only have to re-compile that one file and "just" link it with the other object files; you don't have to re-compile the whole project.
That reflects my feeling that the linker is a bullshit step.
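For reference, here is separate compilation in miniature (file names and commands are illustrative); the link step's only job is to stitch independently compiled objects together:

```cpp
// add.cpp — compiled on its own:   c++ -c add.cpp   -> add.o
int add(int a, int b) { return a + b; }

// main.cpp — compiled on its own:  c++ -c main.cpp  -> main.o
int add(int a, int b);   // declaration only; resolved at link time
int main() { return add(1, 2); }

// Link step combines the objects:  c++ add.o main.o -o prog
// Touch only main.cpp and only main.o is rebuilt; add.o is reused.
```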