C++'s `noexcept` Can (Sometimes) Help (Or Hurt) Performance
The article examines the mixed performance effects of the `noexcept` keyword in C++, revealing that its impact varies by compiler, OS, and code context, urging cautious application by developers.
The article discusses the performance implications of using the `noexcept` keyword in C++ programming, particularly in the context of the PSRayTracing (PSRT) project. The author reflects on initial beliefs that `noexcept` would universally enhance performance by eliminating exception handling overhead. However, after conducting extensive A/B testing, the results were mixed. While some configurations showed a slight performance increase (up to 1%), others experienced a decrease of around 2%. The author notes that the impact of `noexcept` is highly dependent on various factors, including the compiler, operating system, and specific code context. The article highlights that while `noexcept` can help optimize certain operations, such as enabling move semantics in the Standard Library, it can also complicate debugging and testing due to its restrictive nature. The author concludes that while `noexcept` has its benefits, its overall effect on performance is nuanced and not as significant as initially thought. The findings suggest that developers should be cautious when applying `noexcept` indiscriminately across their code.
- The `noexcept` keyword can provide performance benefits but is not universally effective.
- Performance impact varies significantly based on compiler, OS, and code context.
- Some configurations showed up to a 1% increase in performance, while others saw a decrease of around 2%.
- Using `noexcept` can complicate debugging and testing processes.
- Developers should apply `noexcept` judiciously rather than as a blanket solution.
Related
Optimizing JavaScript for Fun and for Profit
Optimizing JavaScript code for performance involves benchmarking, avoiding unnecessary work, string comparisons, and diverse object shapes. JavaScript engines optimize based on object shapes, impacting array/object methods and indirection. Creating objects with the same shape improves optimization, cautioning against slower functional programming methods. Costs of indirection like proxy objects and function calls affect performance. Code examples and benchmarks demonstrate optimization variances.
Spending too much time optimizing for loops
Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.
C++ Design Patterns for Low-Latency Applications
The article delves into C++ design patterns for low-latency applications, emphasizing optimizations for high-frequency trading. Techniques include cache prewarming, constexpr usage, loop unrolling, and hotpath/coldpath separation. It also covers comparisons, datatypes, lock-free programming, and memory access optimizations. Importance of code optimization is underscored.
Beyond Clean Code
The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.
Clang vs. Clang
The blog post critiques compiler optimizations in Clang, arguing they often introduce bugs and security vulnerabilities, diminish performance gains, and create timing channels, urging a reevaluation of current practices.
- Many commenters emphasize that `noexcept` can introduce performance costs, particularly when functions that can throw are incorrectly marked as `noexcept`.
- There is a consensus that the benefits of `noexcept` are context-dependent, with specific scenarios like move constructors showing potential performance improvements.
- Several users express skepticism about the article's conclusions, suggesting that the benchmarks may not accurately reflect the performance impact of `noexcept`.
- Some commenters advocate for better compiler diagnostics to prevent misuse of `noexcept`, particularly in cases where functions can throw.
- There is a call for deeper understanding and analysis of how `noexcept` interacts with compiler optimizations, with suggestions to use tools like Godbolt for examining generated code.
I work on the MSVC backend. I argued pretty strenuously at the time that noexcept was costly and being marketed incorrectly. Perhaps the costs are worth it, but nonetheless there is a cost.
The reason is simple: there is a guarantee here that noexcept functions don't throw, and if an exception tries to escape one, std::terminate has to be called. That has to be implemented. There is some cost to that: conceptually, every noexcept function (or worse, every call to a noexcept function) is surrounded by a giant try/catch(...) block.
Yes, there are optimizations here. But it's still not free.
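For illustration, here is a minimal sketch of that as-if behavior (my own example, not MSVC's actual codegen): a noexcept function whose body can throw still has to funnel any escaping exception into std::terminate.

    #include <vector>

    // Conceptually, the compiler must act as if the body were wrapped in
    //   try { v.push_back(x); } catch (...) { std::terminate(); }
    void append(std::vector<int>& v, int x) noexcept {
        v.push_back(x);   // may throw std::bad_alloc; if it does, the program terminates
    }

    int main() {
        std::vector<int> v;
        append(v, 1);
    }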
Less obvious: how does inlining work? What happens if you inline a noexcept function into a function that allows exceptions? Do we now have "regions" of noexcept-ness inside that function (answer: yes)? How do you implement that? Again, this is implementable, but it is even harder than the whole-function case, and a naive/early implementation might prohibit inlining across degrees of noexcept-ness to be correct/as-if. And guess what, this is what early versions of MSVC did, and this was our biggest problem: a problem which grew release after release as noexcept permeated the standard library.
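A hypothetical sketch of that inlining scenario (the function names are made up): `log_size` is noexcept but calls code that can throw, so once it is inlined into `handle`, its body becomes a region where an escaping exception must still reach std::terminate rather than the surrounding catch handler.

    #include <iostream>
    #include <string>
    #include <vector>

    void record(const std::string& line) { std::cout << line << '\n'; }

    void log_size(const std::vector<int>& v) noexcept {
        record("size=" + std::to_string(v.size()));   // allocation here may throw
    }

    void handle(std::vector<int>& v) {
        try {
            log_size(v);        // candidate for inlining into this throwing function
            v.push_back(42);
        } catch (...) {
            // must not observe anything escaping log_size's (inlined) body
        }
    }

    int main() {
        std::vector<int> v;
        handle(v);
    }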
Anyway. My point is, we need more backend compiler engineers on WG21 and not just front end, library, and language lawyer guys.
I argued then that if instead noexcept violations were undefined, we could ignore all this, and instead just treat it as the pure optimization it was being marketed as (ie, help prove a region can't throw, so we can elide entire try/catch blocks etc). The reaction to my suggestion was not positive.
Platforms with setjmp-longjmp based exceptions benefit greatly from noexcept as there’s setup code required before calling functions which may throw. Those platforms are now mostly gone, though. Modern “zero cost” exceptions don’t execute a single instruction related to exception handling if no exceptions are thrown (hence the name), so there just isn’t much room for noexcept to be useful to the optimizer.
Outside of those two scenarios there isn’t any reason to expect noexcept to improve performance.
Here's what has jumped out at me: the `noexcept` qualifier is not free in some cases, particularly when a qualified function could actually throw but is marked `noexcept`. In that case, a compiler still must set something up to fulfil the main `noexcept` promise: call `std::terminate()` if an exception is thrown. That means that putting `noexcept` on each and every function blindly, without any regard to whether the function could really throw or not (for example, `std::vector::push_back()` could throw on reallocation failure, so if a `noexcept`-qualified function calls it, the compiler must take that into account), doesn't actually test/benchmark/prove anything, since, as the author correctly said, you won't ever do this in a real production project.

It would be really interesting to take a look at the full code of the cases that showed very bad performance. However, here we're approaching the second issue: if this is the core benchmark code: https://github.com/define-private-public/PSRayTracing/blob/a... then unfortunately it's totally invalid, since it measures time with `std::chrono::system_clock`, which isn't monotonic. Given how long the code takes to run, it's almost certain that the clock has been adjusted several times...
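On the clock point, a minimal sketch of a measurement that avoids the problem, using the monotonic std::chrono::steady_clock (the loop is just a stand-in for the rendering workload):

    #include <chrono>
    #include <iostream>

    int main() {
        const auto start = std::chrono::steady_clock::now();
        volatile double sink = 0.0;
        for (int i = 0; i < 100000000; ++i)
            sink = sink + 1.0;                 // stand-in workload
        const auto stop = std::chrono::steady_clock::now();
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
        std::cout << "elapsed: " << ms.count() << " ms\n";
    }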
I think this is genuinely my biggest complaint about the C++ standard library. There are countless scenarios where you want deterministic random numbers (for testing if nothing else), so std's distributions are unusable. Fortunately you can just plug in Boost's implementation.
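A small sketch of the reproducibility gap being described (assuming a conforming standard library): the mt19937 engine's raw output sequence is fully specified by the standard, but the algorithms behind the distributions are not, so the distribution's values may differ between libstdc++, libc++, and the MSVC STL.

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 eng(42);
        std::cout << eng() << '\n';                      // identical on every conforming implementation
        std::uniform_int_distribution<int> dist(1, 6);
        std::cout << dist(eng) << '\n';                  // may differ between implementations
    }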
It seems the selected example function may not be exercising noexcept. I suppose the assumption is that operator[] is something that can throw, but... perhaps the machinery lives outside the function (so we should really examine function calls), or it is never emitted without a try/catch, or operator[] (though not marked noexcept...) doesn't throw because OOB is undefined behavior, or...?
That said, I'm still confused by the perf results of the article, especially the Perlin noise vs. MSVC one. It's a sufficiently weird outlier that it makes me wonder if something in the compiler has a noexcept path that adds checks that aren't usually on (i.e., imagine the code has a "debug" mode that did bounds checks or something, but the function resolution you hit in the noexcept path always does the bounds check). I'm really not sure exactly how you'd get that to happen, but "non-default path was not benchmarked" is not exactly an uncommon occurrence.
The OP has this as in the fuzz (i.e., within the noise), which it may be for that particular workload. But across a giant distributed system like YouTube or Google search, it is a real gain.
Programs can be quite sensitive to how code is laid out, because of cache line alignment, cache conflicts, etc.
So random changes can have a surprising impact.
There was a paper a couple of years ago explaining this and how to measure compiler optimizations more reliably. Sadly, I do not recall the title/author.
noexcept helps in some cases that the author doesn't seem to be using, and any performance gain or loss is basically due to some (unrelated?) optimization decisions the compiler takes differently in noexcept builds, if I am understanding correctly?
With the project's CMake being:
if (WITH_NOEXCEPT)
    message(STATUS "Using `noexcept` annotations (faster?)")
    target_compile_definitions(PSRayTracing_StaticLibrary PUBLIC USE_NOEXCEPT)
else()
    message(STATUS "Turned off use of `noexcept` (slower?)")
endif()
Instead, the CMake could just be:
if (WITH_NOEXCEPT)
    message(STATUS "Using `noexcept` annotations (faster?)")
    target_compile_definitions(PSRayTracing_StaticLibrary PUBLIC USE_NOEXCEPT=noexcept)
else()
    message(STATUS "Turned off use of `noexcept` (slower?)")
    target_compile_definitions(PSRayTracing_StaticLibrary PUBLIC USE_NOEXCEPT=)
endif()
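For illustration, a hypothetical sketch (the function and the fallback #define are mine, not from PSRT) of how such a USE_NOEXCEPT compile definition would be consumed in the C++ sources, with no shared config header involved:

    // Fallback only so this sketch builds standalone; in the project the macro
    // would come from target_compile_definitions as suggested above.
    #ifndef USE_NOEXCEPT
    #define USE_NOEXCEPT noexcept
    #endif

    #include <vector>

    double first_or_zero(const std::vector<double>& v) USE_NOEXCEPT {
        return v.empty() ? 0.0 : v[0];
    }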
No need for these shared "common" config headers.

Back on topic, this doesn't surprise me. There's this idea that C++ is fast, and that people who work with C++ are focused on optimisation, but in my experience there are just as many of these theoretical ideas about performance which aren't backed up by numbers yet are now ingrained in people. See https://news.ycombinator.com/item?id=41095814 from last week for another example of dogmatic guidelines having the wrong impact.