Optimizers need a rethink
The article emphasizes the need for improved optimizers in compiler and database development, advocating for better documentation, debugging tools, and regression testing to enhance performance and maintainability.
The article discusses the need for a reevaluation of optimizers in compiler and database development, highlighting common issues such as contingent criticality, limited control, limited testability, and limited visibility. It emphasizes that optimizers often operate with little margin for error, which can lead to significant performance issues if optimizations are missed. The author argues that developers require better tools and documentation to understand and control optimizers effectively. This includes the need for clear documentation on optimizations, better debugging tools, and mechanisms for providing guidance to optimizers that cannot be ignored. The article also suggests that missed optimization bugs should be treated with the same severity as other bugs, and that regression testing infrastructure should be established to prevent future issues. The author advocates for a more transparent and predictable approach to optimizers, which would enhance maintainability and performance in software development.
- Optimizers in compilers and databases often lack sufficient control and visibility for developers.
- There is a need for better documentation and debugging tools to help users understand optimizer behavior.
- Missed optimization bugs should be treated seriously and reported similarly to other software bugs.
- Developers should be able to provide guidance to optimizers that cannot be ignored, improving performance predictability.
- Establishing regression testing for optimizations can help prevent future performance issues.
Related
Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior
The study delves into compiler behavior when given extra information for program optimization. Surprisingly, more data can sometimes lead to poorer optimization due to intricate compiler interactions. Testing identified 59 cases in popular compilers, emphasizing the need for better understanding.
Beyond Clean Code
The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.
The human typewriter, or why optimizing for typing is short-sighted
The article highlights the drawbacks of optimizing code for typing speed, advocating for readability and clarity over efficiency, as obfuscated code increases mental strain and confusion among developers.
Clang vs. Clang
The blog post critiques compiler optimizations in Clang, arguing they often introduce bugs and security vulnerabilities, diminish performance gains, and create timing channels, urging a reevaluation of current practices.
An approach to optimizing TypeScript type checking performance
The article outlines strategies to optimize TypeScript's type checking performance, addressing issues like sluggish IDE responsiveness and compile times, particularly due to performance regressions in TypeScript 5.3.
A great example of when winning in the average works is register allocation. It’s fine there because the cost of any particular variable getting spilled is so low. So, all that matters is that most variables are in registers most of the time. If spill heuristics change for the better, it usually means some of your variables that previously got spilled now are in registers while others that were in registers are now spilled - and the compiler writer declares victory if this is a speedup in some overall average of large benchmarks. Similar thinking plays out in stuff like common subexpression elimination or basically any strength reduction. (In fact, most of those optimizations have the peculiar property that you’ll always be able to craft a program that shows the optimization to be a bad idea; we do them anyway because on average they are a speedup.)
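For flavor, here is a minimal C++ sketch of the strength reductions being described (function names are illustrative, not from the comment):

    // Strength reduction: replace an expensive operation with a cheaper
    // equivalent. Usually a win on average, occasionally a loss.
    unsigned scale_mul(unsigned x) { return x * 8; }   // what you wrote
    unsigned scale_shl(unsigned x) { return x << 3; }  // what the optimizer emits

    // Loop strength reduction: the per-iteration multiply in the index
    // computation is replaced by an addition carried across iterations.
    void clear_every_fourth(int *a, int n) {
        for (int i = 0; i < n; ++i)
            a[i * 4] = 0;  // becomes a pointer bumped by 4 * sizeof(int) each pass
    }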
In my view, if a compiler optimization is so critical that users rely on it reliably “hitting” then what you really want is for that optimization to be something guaranteed by the language using syntax or types. The way tail calls work in functional languages comes to mind. Also, the way value types work in C#, Rust, C++, etc - you’re guaranteed that passing them around won’t call into the allocator. Basically, relying on the compiler to deliver an optimization whose speedup from hitting is enormous (like order of magnitude, as in the escape analysis to remove GC allocations case) and whose probability of hitting is not 100% is sort of a language design bug.
This is sort of what the article is saying, I guess. But for example on the issue of the optimizer definitely removing a GC allocation: the best design there is for the GC’d language to have a notion of value types that don’t involve allocation at all. C# has that, Java doesn’t.
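A minimal C++ sketch of that guarantee (the struct is illustrative): passing a plain value type around can never touch the allocator, by construction rather than by optimizer goodwill.

    // A value type: its layout is inline, so passing it around is guaranteed
    // never to call into an allocator -- by the type system, not the optimizer.
    struct Point { double x, y; };

    double dot(Point a, Point b) {      // copied in registers/on the stack
        return a.x * b.x + a.y * b.y;   // no heap allocation can occur here
    }

    int main() {
        Point p{1.0, 2.0}, q{3.0, 4.0};
        return dot(p, q) > 0 ? 0 : 1;   // no GC, no malloc, guaranteed
    }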
It's grossly false for the vast majority of code, where the sludge written by thousands of engineers gets completely rewritten by the compiler before it executes. Compilers are the reason unskilled devs manage to get tolerable performance out of frameworks.
The "CPython is slow" complaints? That's what life without a good compiler looks like.
1. You’re not gonna get any guarantees that the optimization will happen. That makes it High Level. Just write code. We won’t force you to pollute your code with ugly annotations or pragmas.
2. In turn: check the assembly, or whatever concrete artifact reveals whether the optimization you wished for in your head actually went through
There’s some kind of abstraction violation in the above somewhere.
However, what I hate is the lack of transparency (and I feel like this article tries to pinpoint just this). When I execute a query locally I get a different plan vs. staging vs. prod. A plan that can also change depending on parameters, load, or data size.
I don't care about understanding all the underlying optimizations, I just care that the query plan I saw is still the same in prod, and that I can be warned when it changes. PG does not return the hash of the query plan or metrics alongside the data, which is imo a mistake. With this you could track it in your favorite metrics store and be able to pinpoint when and why queries start executing differently.
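The hash is straightforward to bolt on client-side. A minimal sketch, assuming libpq; the DSN, table, and query are placeholders:

    #include <libpq-fe.h>
    #include <cstdio>
    #include <functional>
    #include <string>

    // Postgres won't hand you a plan hash, so fetch EXPLAIN output and hash it
    // yourself, then ship the hash to your metrics store next to the query's
    // latency. COSTS OFF keeps the hash stable across cost re-estimates.
    int main() {
        PGconn *conn = PQconnectdb("dbname=app");           // placeholder DSN
        if (PQstatus(conn) != CONNECTION_OK) return 1;

        PGresult *res = PQexec(conn,
            "EXPLAIN (FORMAT JSON, COSTS OFF) "
            "SELECT * FROM orders WHERE user_id = 42");     // placeholder query
        if (PQresultStatus(res) == PGRES_TUPLES_OK) {
            std::string plan = PQgetvalue(res, 0, 0);       // whole plan as JSON
            std::printf("plan_hash=%zx\n", std::hash<std::string>{}(plan));
        }
        PQclear(res);
        PQfinish(conn);
        return 0;
    }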
It never occurred to me that this would be considered a hint to the optimizer. It doesn't affect code generation. What it does do is flag any use of the GC in the function and in any functions it may transitively call.
Optimizers have been likened to turning a cow into a hamburger. If you're symbolically debugging optimized code, you're looking at the hamburger. Nobody has been able to solve that problem.
It's true that optimizers themselves are hard to show being correct. The one in the D compiler is a conventional DFA optimizer that uses data flow equations I learned from Hennessy and Ullman in a 1982 seminar they taught. So it has been battle tested for 42 years now(!) and it's pretty rare to find a problem with it, unless it's a new pass I added, like SROA. The idea is that any time a problem is identified and corrected, it goes into the test suite. This has the effect of always ratcheting it forward, never regressing.
The GC dates from around 2000, when I wrote it for a Javascript engine. It was brutally tested for that, and has been pretty solid ever since. People complain about the GC, but not about it being buggy. A buggy GC is a real horror show as it is painfully difficult to debug.
Once an optimization becomes part of the interface and is guaranteed, is it really an optimization? Or did it just become part of the language/library/database/whatever?
One example is return value optimization in C++. In C++17 the "optimization" became mandatory in some contexts. What really happened though is that the rules of temporary materialization changed, and in those contexts it just never happens prematurely by the language rules. This ceased to be an optimization and became a mechanism in the language.
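A short sketch of why it is now a mechanism rather than an optimization: a type with no copy or move constructor can still be returned by value in C++17, because no temporary is ever materialized.

    struct Pinned {
        Pinned() = default;
        Pinned(const Pinned&) = delete;  // no copies
        Pinned(Pinned&&) = delete;       // no moves either
    };

    Pinned make() {
        return Pinned{};  // C++17: constructed directly in the caller's storage
    }

    int main() {
        Pinned p = make();  // legal by language rule, not by optimizer's grace;
        (void)p;            // pre-C++17 this required an (elidable) copy/move
    }                       // constructor and did not compile for this type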
What I'm getting at is that unreliability is a defining quality of optimizations.
Sure, there are certain optimizations that become load-bearing, in which case it would be better if they became part of the language's semantics and guarantees, at which point they cease to be optimizations.
I think that the compilation and optimization step, as a black box, is a disservice for highly reliable software development. Compiler and optimizer bugs are definitely a thing. I was bitten by one that injected timing attacks into certain integer operations by branching on the integer data in order to optimize 32-bit multiplications on 8-bit microcontrollers. Yeah, this makes perfect sense when trying to optimize fixed point multiplication, but it completely destroys the security of DLP or ecDLP based cryptography by introducing timing attacks that can recover the private key. Thankfully, I was fastidious about examining the optimized machine code output of this compiler, and was able to substitute hand coded assembler in its place.
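The original compiler output isn't shown, but the hazard is easy to sketch. Below, a hypothetical shift-and-add lowering of a 32-bit multiply whose running time depends on the operands' bits (exactly the property that leaks key material), next to a fixed-time variant:

    #include <cstdint>

    // Hypothetical lowering for a target with no hardware multiplier.
    // Functionally correct, but the loop exits once b runs out of set bits,
    // so execution time depends on the data: a timing side channel.
    uint32_t mul_branchy(uint32_t a, uint32_t b) {
        uint32_t r = 0;
        while (b != 0) {           // iteration count tracks the magnitude of b
            if (b & 1) r += a;     // branch on a data bit: another leak
            a <<= 1;
            b >>= 1;
        }
        return r;
    }

    // Constant-time variant: always 32 iterations, branch replaced by a mask.
    uint32_t mul_fixed(uint32_t a, uint32_t b) {
        uint32_t r = 0;
        for (int i = 0; i < 32; ++i) {
            r += a & (0u - (b & 1));  // all-ones mask iff the low bit is set
            a <<= 1;
            b >>= 1;
        }
        return r;
    }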
I've been very frustrated at times that you can't just tell it to use a certain index or a certain query plan.
But at the same time, the data in the table can change over time.
So for that particular problem, PostgreSQL's superpower is that it can optimize and reoptimize your query at runtime as the data changes!
Not an easy thing to do in most programming languages.
But I do agree with the rest of the article to a large extent.
For some cases, you need a way to say "do this optimization, fail to compile if you cannot".
Most of the time, though, I just want the optimizer to do its best, and I don't have anything in particular in mind.
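Clang already has one instance of exactly that contract: [[clang::musttail]] makes a tail call mandatory, and the build fails if it can't be honored. A minimal sketch, assuming a recent Clang:

    // "Do this optimization, fail to compile if you cannot":
    // if this call can't be emitted as a tail call, Clang raises an error
    // instead of silently growing the stack by O(n).
    unsigned long fact_acc(unsigned n, unsigned long acc) {
        if (n <= 1) return acc;
        [[clang::musttail]] return fact_acc(n - 1, acc * n);
    }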
What if the compiler could output the optimizations or transformations it performs to a file? A file that was checked into source control.
Think of lockfiles for dependencies. Or snapshot testing.
I don't mean output the entire IR. I mean more a list of applied transformations.
When those change, you could then be aware of those changes.
Maybe it could act as an actual lockfile and constrain the compiler? Could it also double as a cache and save time?
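Clang gets part of the way there today: -fsave-optimization-record dumps a YAML list of applied and missed transformations, which you could in principle check in and diff like a lockfile. A sketch (the flag is real; treating the record as a lockfile is the speculative part):

    // Compile with:  clang++ -O2 -fsave-optimization-record sum.cpp
    // Clang writes a .opt.yaml file beside the output, listing each pass that
    // fired or was missed, with reasons. Under strict FP semantics the record
    // reports the reduction below as a missed vectorization (FP adds can't be
    // reordered); adding -ffast-math flips it to applied. Diffing that file
    // across compiler upgrades is the "lockfile" idea in miniature.
    float sum(const float *a, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }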
In the case of SQL, I'd love access to a flavor where I do the joins on indices explicitly, the query is executed as written, and each join (or filter) can be annotated with a strategy (btree lookup, etc). (The most difficult part about using indices by hand is correctly writing all the synchronous triggers on updates, not the queries, IMO.)
This layering makes the order of the passes important and very sensitive. The passes usually don't have a grand plan, they just keep shuffling code around in different ways. A pass may only be applicable to code in a specific form created by a previous simplification pass. One pass may undo optimizations of a previous pass, or optimize-out a detail required by a later pass.
Separation into passes makes it easier to reason about correctness of each transformation in isolation, but the combined result is kinda slow and complicated.
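A toy C++ illustration of that ordering sensitivity; each simplification below only becomes possible after the previous pass has run:

    static int square(int v) { return v * v; }

    int f() {
        int a = square(3);  // (1) inlining rewrites this as 3 * 3
        int b = a + 1;      // (2) constant folding, enabled by (1): b = 10
        if (b != 10)        // (3) now provably false...
            return -1;      //     ...so dead-code elimination removes the branch
        return b;           // the whole function collapses to "return 10"
    }
    // Schedule DCE before inlining and folding, and none of this simplifies.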
If we think of the high-level source code (whether in SQL, JS, or Python) as being a functional spec that the low-level implementation should provably follow, maybe we should check them both in, and at build time, run a proof assistant to verify that the implementation still complies with the spec? Rather than running an optimizing compiler every time. I think this is what nanolith is suggesting in https://news.ycombinator.com/item?id=41965701.
Maybe you run the optimizing compiler once when you first write the code, and again when the proof assistant tells you an optimization is no longer valid, but you check the output before you check it in. Or maybe you have the compiler running all the time on a GPU cluster looking for new optimizations.
Well, sure, sometimes, to an extent. But if it's load-bearing, maybe that's a bug? You might have written non-portable code that won't last, because it depends on an implementation detail that isn't standardized.
There are widespread applications where performance isn't critical. For example, any time you do a network request. Web pages shouldn't break because the network is slow today, if you can possibly avoid it.
The web provides no performance guarantees, but it tries pretty hard to provide compatibility guarantees. Your code should, usually, work on new browser versions and on devices that haven't been released yet. New browsers will have different JIT compilers with different performance cliffs. And yet, there are websites written many years ago that still work.
When standardizing things, we need to be precise about what's standardized and what isn't, or protocols "rust shut" and can't be changed without breaking lots of stuff. Not standardizing performance is often a win. (With some exceptions like video game consoles where the hardware is standardized.)
Hyrum's law suggests that all compiler optimizations will eventually be load-bearing for someone, but we should usually try to avoid it. To make sure that the code is robust, perhaps performance should vary. Maybe it would be useful to have something like a chaos monkey for programs, where optimizations vary based on a random seed?
Which leaves me with the strong intuition that what might benefit more from a rethink is not optimizers, but our focus on programmer productivity:
> Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.
[1] https://typesanitizer.com/blog/rethink-optimizers.html#closi...
In Halide, you specify the algorithm at a high level, then specify the "schedule" (execution order, vectorization, storage) separately. Then, it verifies that the two match. For tight loops like image kernels, it's often counter-intuitive which execution plan will give the best performance, so being able to experiment with the guarantee that you are still implementing the algorithm correctly is valuable.
For a database, it would be as if you could send a high level SQL query and an execution plan together.
Halide is very specific, but I could imagine a general-purpose language with a Python-like high level representation for code behavior, with the option to specify a lower-level implementation with a lot of performance annotations.
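A minimal Halide sketch of that split (Halide is embedded in C++; the toy input stands in for a real image, and exact API details may vary by Halide version):

    #include "Halide.h"
    using namespace Halide;

    int main() {
        Var x("x"), y("y");

        // The algorithm: *what* to compute.
        Func in("in"), blur_x("blur_x");
        in(x, y) = cast<float>(x + y);  // toy input standing in for an image
        blur_x(x, y) = (in(x - 1, y) + in(x, y) + in(x + 1, y)) / 3.0f;

        // The schedule: *how* to compute it. Changing these lines cannot
        // change the result, only the execution plan -- that's the split.
        blur_x.vectorize(x, 8).parallel(y);

        Buffer<float> out = blur_x.realize({256, 256});
        return 0;
    }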
I.e., the optimization had violated a rule we were implicitly relying on (that each non-inlined function should start at a distinct address, so that address-to-symbol mapping could be done easily). But that's not an explicit guarantee, and optimizers don't seem to think about it much. (Inlining seems to have received some thought here, though it still sucks; in any case this situation doesn't fit the inlining pattern.)
I find it hard to say anyone is dead wrong in this case… but I would turn off that LTCG optimization any time I could, except where proven necessary.
The query planner for SQL databases is a good example of this. For small queries, the odds of the planner making a bad choice are rather low. Large queries, on the other hand, are comparatively easy to lead down a bad path. This is why the "no sql" family of databases see more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.
Worth trying if you have the programs to play with. Run it with and without -O3 and see how the compiler does for you.
Still not an easy task (performance is often highly dependent on the input, and regressions can hide in very specific cases), but it makes more sense to me than writing high-level code and then worrying about the details of how the compiler translated that high-level code.
> Have a good mental model of what the optimizer can and cannot do.
Most DB query planner designers and implementers have little imagination, and their mental model of what optimizers can and cannot do is, well, extremely narrow-minded. There is a huge unexplored space of what query planning can be (at least for analytic queries, and if we think in columnar terms) - if we just stop insisting on treating DBMS operations as black boxes.
Julia is by far the worst language about this. It would be vastly more usable with the addition of @assert_type_stable, @assert_doesn’t_allocate, and @assert_doesn’t_error macros.