October 23rd, 2024

Optimizers need a rethink

The article emphasizes the need for improved optimizers in compiler and database development, advocating for better documentation, debugging tools, and regression testing to enhance performance and maintainability.

The article discusses the need for a reevaluation of optimizers in compiler and database development, highlighting common issues such as contingent criticality, limited control, limited testability, and limited visibility. It emphasizes that optimizers often operate with little margin for error, which can lead to significant performance issues if optimizations are missed. The author argues that developers require better tools and documentation to understand and control optimizers effectively. This includes the need for clear documentation on optimizations, better debugging tools, and mechanisms for providing guidance to optimizers that cannot be ignored. The article also suggests that missed optimization bugs should be treated with the same severity as other bugs, and that regression testing infrastructure should be established to prevent future issues. The author advocates for a more transparent and predictable approach to optimizers, which would enhance maintainability and performance in software development.

- Optimizers in compilers and databases often lack sufficient control and visibility for developers.

- There is a need for better documentation and debugging tools to help users understand optimizer behavior.

- Missed optimization bugs should be treated seriously and reported similarly to other software bugs.

- Developers should be able to provide guidance to optimizers that cannot be ignored, improving performance predictability.

- Establishing regression testing for optimizations can help prevent future performance issues.

27 comments
By @pizlonator - 3 months
Compiler optimizers are designed from a “win in the average” mindset - so whether a particular optimization succeeds or not is really not something the user ought to rely on. If you’re relying on it then you’re playing with fire. (I say that as someone who has played with this particular kind of fire. Sometimes I’m ok, other times I get burned.)

A great example of when winning in the average works is register allocation. It’s fine there because the cost of any particular variable getting spilled is so low. So, all that matters is that most variables are in registers most of the time. If spill heuristics change for the better, it usually means some of your variables that previously got spilled now are in registers while others that were in registers are now spilled - and the compiler writer declares victory if this is a speedup in some overall average of large benchmarks. Similar thinking plays out in stuff like common subexpression elimination or basically any strength reduction. (In fact, most of those optimizations have the peculiar property that you’ll always be able to craft a program that shows the optimization to be a bad idea; we do them anyway because on average they are a speedup.)

In my view, if a compiler optimization is so critical that users rely on it reliably “hitting” then what you really want is for that optimization to be something guaranteed by the language using syntax or types. The way tail calls work in functional languages comes to mind. Also, the way value types work in C#, Rust, C++, etc - you’re guaranteed that passing them around won’t call into the allocator. Basically, relying on the compiler to deliver an optimization whose speedup from hitting is enormous (like order of magnitude, as in the escape analysis to remove GC allocations case) and whose probability of hitting is not 100% is sort of a language design bug.

This is sort of what the article is saying, I guess. But for example on the issue of the optimizer definitely removing a GC allocation: the best design there is for the GC’d language to have a notion of value types that don’t involve allocation at all. C# has that, Java doesn’t.
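
A minimal sketch of the "guarantee it in the language" idea from this comment, using Clang's musttail attribute (Clang 13+); an illustration under that assumption, not pizlonator's code:

```cpp
// With [[clang::musttail]], a tail call stops being a hoped-for optimization:
// if the compiler cannot emit it as a genuine tail call, compilation fails
// instead of silently producing code that grows the stack.
unsigned long count_up(unsigned long i, unsigned long n) {
    if (i == n) return i;
    [[clang::musttail]] return count_up(i + 1, n);
}
```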

By @JonChesterfield - 3 months
The thing where people quote optimisers as only eking out tiny percentages of performance over decades really annoys me. It's true for the numerical kernels written in C and Fortran where the code is essentially optimal already, and all the compiler has to do is not screw it up.

It's grossly false for the vast majority of code, where the sludge written by thousands of engineers gets completely rewritten by the compiler before it executes. Compilers are the reason unskilled devs manage to get tolerable performance out of frameworks.

The "CPython is slow" complaints? That's what life without a good compiler looks like.

By @keybored - 3 months
The standard optimization user story is absurd.

1. You’re not gonna get any guarantees that the optimization will happen. That makes it High Level. Just write code. We won’t force you to pollute your code with ugly annotations or pragmas.

2. In turn: check the assembly, or whatever other concrete artifact reveals whether the optimization you wished for in your head actually went through.

There’s some kind of abstraction violation in the above somewhere.
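
A minimal sketch of what step 2 above looks like in practice, assuming a hypothetical hot loop and the usual GCC/Clang workflow:

```cpp
// sum.cpp -- a loop you might hope gets auto-vectorized.
int sum(const int* v, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += v[i];
    return s;
}

// Under the "no guarantees" contract, the only way to know is to look:
//   g++ -O2 -S sum.cpp       # read sum.s and look for vector instructions
//   clang++ -O2 -S sum.cpp   # or paste the function into godbolt.org
```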

By @h1fra - 3 months
I don't hate query planning; some people are better than me at writing algorithms, and the database knows my data better than I do.

However, what I hate is the lack of transparency (and I feel like this article tries to pinpoint just this). When I execute a query locally I get a different plan vs staging vs prod. A plan that can also change depending on parameters, load, or data size.

I don't care about understanding all the underlying optimizations, I just care that the query plan I saw is the same and is still the same in prod, and that I can be warned when it changes. PG does not return a hash of the query plan or metrics along with the data, which is IMO a mistake. With that you could track it in your favorite metrics store and be able to pinpoint when and why things started executing differently.

By @WalterBright - 3 months
> D has first-class support for marking functions as “no GC”.

It never occurred to me that this would be considered a hint to the optimizer. It doesn't affect code generation. What it does do is flag any use of the GC in the function and in any functions it may transitively call.

Optimizers have been likened to turning a cow into a hamburger. If you're symbolically debugging optimized code, you're looking at the hamburger. Nobody has been able to solve that problem.

It's true that optimizers themselves are hard to show to be correct. The one in the D compiler is a conventional DFA optimizer that uses data flow equations I learned from Hennessy and Ullman in a 1982 seminar they taught. So it has been battle-tested for 42 years now(!) and it's pretty rare to find a problem with it, unless it's a new pass I added like SROA. The idea is that any time a problem is identified and corrected, it goes into the test suite. This has the effect of always ratcheting it forward and never regressing.

The GC dates from around 2000, when I wrote it for a Javascript engine. It was brutally tested for that, and has been pretty solid ever since. People complain about the GC, but not about it being buggy. A buggy GC is a real horror show as it is painfully difficult to debug.
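
For readers unfamiliar with the term, the forward data-flow equations such a DFA optimizer iterates to a fixed point look roughly like this (a textbook formulation, not necessarily the exact one in the D compiler):

```latex
\mathrm{in}[B]  \;=\; \bigcup_{P \,\in\, \mathrm{pred}(B)} \mathrm{out}[P]
\qquad
\mathrm{out}[B] \;=\; \mathrm{gen}[B] \;\cup\; \bigl(\mathrm{in}[B] \setminus \mathrm{kill}[B]\bigr)
```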

By @leni536 - 3 months
I think this is mostly a philosophical question, rather than one of optimizer quality.

Once an optimization becomes part of the interface and is guaranteed, is it really an optimization? Or did it just become part of the language/library/database/whatever?

One example is return value optimization in C++. In C++17 the "optimization" became mandatory in some contexts. What really happened, though, is that the rules of temporary materialization changed, and in those contexts the temporary simply never materializes prematurely under the language rules. This ceased to be an optimization and became a mechanism of the language.

What I'm getting at is that unreliability is a defining quality of optimizations.

Sure, there are certain optimizations that become load-bearing, in which case it would be better if they became part of the language's semantics and guarantees, at which point they cease to be optimizations.
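
A minimal, self-contained illustration of the C++17 change described above; it compiles under -std=c++17 but not -std=c++14, precisely because the "optimization" became a semantic guarantee:

```cpp
#include <cstdio>

// Neither copyable nor movable, so the code below only compiles because the
// returned prvalue is materialized directly in the caller's storage.
struct Pinned {
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

Pinned make() { return Pinned{}; }   // no copy/move constructor required

int main() {
    Pinned p = make();               // constructed in place
    std::puts("no copy, no move");
}
```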

By @nanolith - 3 months
One area that I have been exploring is building equivalence proofs between high-level specifications, an implementation in C, and the machine code output from the compiler. I'm still very early in that work, but one of my hopes is to at least demonstrate that the output still meets the specifications, and that we can control things like timing (e.g. no branching on secret data) and cache behavior in this output.

I think that the compilation and optimization step, as a black box, is a disservice to highly reliable software development. Compiler and optimizer bugs are definitely a thing. I was bitten by one that injected timing attacks into certain integer operations by branching on the integer data in order to optimize 32-bit multiplications on 8-bit microcontrollers. Yeah, this makes perfect sense when trying to optimize fixed-point multiplication, but it completely destroys the security of DLP- or ECDLP-based cryptography by introducing timing attacks that can recover the private key. Thankfully, I was fastidious about examining the optimized machine code output of this compiler, and was able to substitute hand-coded assembler in its place.
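
A sketch (not nanolith's actual code) of the branch-free shape that such equivalence proofs would need to see preserved in the emitted machine code:

```cpp
#include <cstddef>
#include <cstdint>

// Compares two secret buffers without branching on their contents, so the
// running time does not depend on where (or whether) they differ. Nothing in
// the language guarantees an optimizer keeps it branch-free, which is why the
// emitted machine code has to be inspected or proven equivalent.
bool ct_equal(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < n; ++i)
        diff |= static_cast<std::uint8_t>(a[i] ^ b[i]);
    return diff == 0;
}
```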

By @CalChris - 3 months
We are nearing the death of Proebsting's Law. AMD CEO Lisa Su's HotChips’19 keynote said that compilers had accounted for an 8% performance increase over the decade. That means compilers are now only doubling performance every 90 years.

https://www.youtube.com/watch?v=nuVBid9e3RA

By @f33d5173 - 3 months
The optimizer should be a magic black box. As soon as you start demanding a particular optimization, it shouldn't be an optimization anymore. In C you have inline assembly and intrinsics and pragmas and macros and so on. If you want the compiler to compile your code a particular way, you should be using these, not trying to wrangle the optimizer into invoking a particular optimization.
By @looneysquash - 3 months
The postgresql query optimization one is an interesting one.

I've been very frustrated at times that you can't just tell it to use a certain index or a certain query plan.

But at the same time, the data in the table can change over time.

So for that particular problem, postgresql's superpower is it can optimize and reoptimize your query at runtime as the data changes!

Not an easy thing to do in most programming languages.

But I do agree with the rest of the article to a large extent.

For some cases, you need a way to say "do this optimization, fail to compile if you cannot".

Most of the time though, I just want the optimizer to do its best, and don't have something in particular in mind.

What if the compiler could output the optimizations or transformations it applies to a file? A file that was checked into source control.

Think of lockfiles for dependencies. Or snapshot testing.

I don't mean output the entire IR. I mean more of a list of applied transformations.

When those change, you could then be aware of those changes.

Maybe it could act as an actual lockfile and constrain the compiler? Could it also double as a cache and save time?

By @andyferris - 3 months
If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right? I've even seen interpreters for C... so while I 100% have felt the pain in this article, I don't know where that leaves us? These are more like compilation assertions/strategies than language features. (Not that these shouldn't be written inline with the code, e.g. it can be nice to write unit tests in the same files as the code in some languages.)

In the case of SQL, I'd love access to a flavor where I do the joins on indices explicitly, the query is executed as written, and each join (or filter) can be annotated with a strategy (btree lookup, etc). (The most difficult part about using indices by hand is correctly writing all the synchronous triggers on updates, not the queries, IMO).

By @pornel - 3 months
Optimizations in compilers like LLVM are done by many individual code transformation passes, one applied to the result of the previous.

This layering makes the order of the passes important and very sensitive. The passes usually don't have a grand plan, they just keep shuffling code around in different ways. A pass may only be applicable to code in a specific form created by a previous simplification pass. One pass may undo optimizations of a previous pass, or optimize-out a detail required by a later pass.

Separation into passes makes it easier to reason about correctness of each transformation in isolation, but the combined result is kinda slow and complicated.

By @jonstewart - 3 months
At least databases have Explain. I'd love to get feedback from clang or gcc about why particular optimizations were not applied.
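
Both compilers expose at least part of this today through optimization remarks; a minimal sketch of how one might ask, with flag names taken from the Clang and GCC documentation (output varies by version):

```cpp
// hot.cpp -- a loop whose vectorization you might want explained.
int dot(const int* a, const int* b, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// clang++ -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
//         -Rpass-analysis=loop-vectorize -c hot.cpp
// clang++ -O2 -fsave-optimization-record -c hot.cpp   # emits hot.opt.yaml
// g++     -O2 -fopt-info-vec-all -c hot.cpp
```
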
By @kragen - 3 months
Daniel Bernstein has given an interesting talk about "the death of optimizing compilers" (PDF slides at https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf) claiming that less and less of our code is in that middle ground where it's performance-critical enough to risk running it through an optimizer but not so performance-critical that it needs to be rewritten in assembly.

If we think of the high-level source code (whether in SQL, JS, or Python) as being a functional spec that the low-level implementation should provably follow, maybe we should check them both in, and at build time, run a proof assistant to verify that the implementation still complies with the spec? Rather than running an optimizing compiler every time. I think this is what nanolith is suggesting in https://news.ycombinator.com/item?id=41965701.

Maybe you run the optimizing compiler once when you first write the code, and again when the proof assistant tells you an optimization is no longer valid, but you check the output before you check it in. Or maybe you have the compiler running all the time on a GPU cluster looking for new optimizations.

By @skybrian - 3 months
> The optimizer’s behavior is in some contexts load-bearing and has little margin for error (e.g. missing an optimization).

Well, sure, sometimes, to an extent. But if it's load-bearing, maybe that's a bug? You might have written non-portable code that won't last, because it depends on an implementation detail that isn't standardized.

There are widespread applications where performance isn't critical. For example, any time you do a network request. Web pages shouldn't break because the network is slow today, if you can possibly avoid it.

The web provides no performance guarantees, but it tries pretty hard to provide compatibility guarantees. Your code should, usually, work on new browser versions and on devices that haven't been released yet. New browsers will have different JIT compilers with different performance cliffs. And yet, there are websites written many years ago that still work.

When standardizing things, we need to be precise about what's standardized and what isn't, or protocols "rust shut" and can't be changed without breaking lots of stuff. Not standardizing performance is often a win. (With some exceptions like video game consoles where the hardware is standardized.)

Hyrum's law suggests that all compiler optimizations will eventually be load-bearing for someone, but we should usually try to avoid it. To make sure that the code is robust, perhaps performance should vary. Maybe it would be useful to have something like a chaos monkey for programs, where optimizations vary based on a random seed?

By @froh - 3 months
in the "closing thoughts" [1] of the OP article there's a pointer to "Proebstrings law" [2] which got me to a benchmark [3] how LLVM has improved over 10 years from 2.7 to 11

which leaves me with the strong intuition that indeed what might benefit more from a rethink is not optimizers but a focus on programmer productivity:

> Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.

[1] https://typesanitizer.com/blog/rethink-optimizers.html#closi...

[2] https://proebsting.cs.arizona.edu/law.html

[3] https://zeux.io/2022/01/08/on-proebstings-law/

By @russellsprouts - 3 months
This reminds me of Halide (https://halide-lang.org/), which is a really interesting design in this direction.

In Halide, you specify the algorithm at a high level, then specify the "schedule" (execution order, vectorization, storage) separately. Then, it verifies that the two match. For tight loops like image kernels, it's often counter-intuitive which execution plan will give the best performance, so being able to experiment with the guarantee that you are still implementing the algorithm correctly is valuable.

For a database, it would be as if you could send a high level SQL query and an execution plan together.

Halide is very specific, but I could imagine a general-purpose language with a Python-like high level representation for code behavior, with the option to specify a lower-level implementation with a lot of performance annotations.
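
To make the algorithm/schedule split concrete, here is roughly the canonical blur example from the Halide documentation (exact API details may differ between releases):

```cpp
#include "Halide.h"
using namespace Halide;

Func blur_3x3(Func input) {
    Func blur_x, blur_y;
    Var x, y, xi, yi;

    // The algorithm: what is computed, with no storage or execution order.
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

    // The schedule: order, locality, vectorization. It can be tweaked freely;
    // the results remain those of the algorithm above.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    return blur_y;
}
```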

By @mcfig - 3 months
Great article. One real case I encountered that I find thought-provoking is where a bunch of test failures were bucketed into the same bucket because link-time code generation had noticed that a bunch of C++ getter functions had the same output code and combined them all. So stack traces became confusing, because the address-to-symbol mapping was more complicated than the logic we had in place was prepared for.

i.e. optimization had violated a rule we were implicitly relying on (that each non-inlined function should start at a distinct address, so that address-to-symbol mapping could be done easily). But that’s not an explicit guarantee and optimizers don’t seem to think about it much. (Well for inlining it seems to have had some thought, still sucks, but anyway this case doesn’t fit the pattern of inlining).

I find it hard to say anyone is dead wrong in this case… but I would turn off that LTCG optimization any time I could, except where proven necessary.
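
A minimal sketch of the pattern, with hypothetical types, assuming an identical-code-folding linker option such as MSVC's /OPT:ICF or lld's --icf=all:

```cpp
struct Point {
    int x;
    int y;

    // Two getters whose generated code is byte-for-byte identical. An
    // identical-code-folding linker may keep a single copy, so both symbols
    // share one address and a naive address-to-symbol map reports whichever
    // name it happens to find first.
    int getX() const { return x; }
    int first() const { return x; }
};
```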

By @taeric - 3 months
I think this is misplacing the need for a rethink. Compilers are amazing at optimizing small programs. What they are less amazing at is optimizing large programs.

The query planner for SQL databases is a good example of this. For small queries, the odds of the planner making a bad choice are rather low. Large queries, on the other hand, are comparatively easy to lead down a bad path. This is why the NoSQL family of databases sees more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.

Worth trying if you have the programs to play with: run them with and without -O3 and see how the compiler does for you.

By @account42 - 3 months
I'm not sure if this is the right way to look at it. Do you really care whether some optimization was performed or not, or do you only care about the performance characteristics of the resulting program? Maybe a more sensible solution would be to have better tests for performance regressions.

Still not an easy task (performance is often highly dependent on the input, and regressions can hide in very specific cases), but it makes more sense to me than writing high-level code and then worrying about the details of how the compiler translated that high-level code.

By @Mikhail_Edoshin - 3 months
One of the reasons SQL is declarative is that queries run concurrently yet give the illusion of working in isolation. And unlike the concurrency of independent processes, which is mostly mechanical, this one involves detailed coordination, because database queries work on the same data. With these constraints it may not be that easy to come up with an imperative model that is as simple as the current declarative one.
By @yxhuvud - 3 months
No. Language designers need to think about what their language promises, and when some users have differing needs, they may need to fulfill those needs.
By @einpoklum - 3 months
From the article:

> Have a good mental model of what the optimizer can and cannot do.

Most DB query planner designers and implementers have little imagination, and their mental model of what optimizers can and cannot do is, well, extremely narrow-minded. There is a huge unexplored space of what query planning can be (at least for analytic queries, and when we think in columnar terms) - if we just stop insisting on thinking of DBMS operations as black boxes.

By @gpm - 3 months
Arguing against query planning by pointing at a quote about databases is wild. Automatic query planning is ubiquitous and hugely successful in databases.
By @sebmellen - 3 months
I like that the author included their intended audience up front. Definitely not me, but it helped me read the article with a different perspective.
By @QuadmasterXLII - 3 months
Frankly, the problem is that (generally, across languages and their various compiler hints) @inline sometimes fails to inline. At this point I’ve given up on ever having an @inline that reliably inlines, and I would very happily settle for an @assert_inline that doesn’t change the generated assembly at all but reliably crashes out if the function isn’t inlined.

Julia is by far the worst language about this. It would be vastly more usable with the addition of @assert_type_stable, @assert_doesn’t_allocate, and @assert_doesn’t_error macros.
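
The closest existing C/C++ knob to the @assert_inline idea above is probably GCC/Clang's always_inline, which forces rather than merely asserts, but which (at least with GCC) does fail the build when inlining is impossible:

```cpp
// GCC documents that failure to inline an always_inline function is an error,
// not a silent fallback to an ordinary call.
__attribute__((always_inline)) inline int square(int x) { return x * x; }

int use(int v) { return square(v) + 1; }
```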