Limitations of Frame Pointer Unwinding
Recent changes in Linux distributions have re-enabled frame pointers (by disabling the frame-pointer-omission optimization) to improve profiling, but limitations remain. Alternatives like the SFrame project aim to enhance profiling accuracy without frame pointers.
Recent changes in Linux distributions such as Fedora and Ubuntu have re-enabled frame pointers, by disabling the compiler's frame-pointer-omission optimization, so that profiling tools can produce stack traces more reliably. However, this article discusses the limitations of frame pointer unwinding and why simply enabling frame pointers is not a comprehensive solution for profiling. Modern compilers can generate code with or without a frame pointer; without one, stack trace analysis becomes harder because profiling tools must rely on call-frame information instead. The Linux kernel's perf_events framework can only use frame pointer unwinding for user-space code, which limits its effectiveness. Key issues include a performance cost that falls unevenly across user groups, gaps in profiling data around function prologues and epilogues, and inaccuracies caused by hand-written assembly code in libraries. Alternatives to frame pointer unwinding are being explored, such as the SFrame project and hardware shadow stack support, which could improve profiling accuracy without relying on frame pointers. These initiatives aim to improve the quality of profiling information available to developers.
- Some Linux distributions now build with frame pointers enabled (frame-pointer omission disabled) to improve profiling.
- Enabling frame pointers does not fully resolve profiling issues due to inherent limitations.
- Profiling accuracy is affected by gaps in data around function prologues and epilogues and by hand-written assembly code in libraries.
- New projects and hardware advancements are being developed to improve profiling without frame pointers.
- The article emphasizes the need to balance runtime performance for users against profiling needs.
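To make the prologue gap mentioned in the summary concrete, here is a minimal sketch of frame pointer unwinding over a simulated stack. It is an illustration, not how perf or any real unwinder is implemented: it assumes an x86-64-style layout (saved frame pointer at `[fp]`, return address at `[fp + 8]`), and every address and function name (`start`, `main`, `parse`, `leaf`) is hypothetical.

```python
WORD = 8  # assumed word size (x86-64 style)

# Hypothetical code layout: each function occupies one 0x1000-byte page.
FUNCTIONS = {0x1000: "start", 0x2000: "main", 0x3000: "parse", 0x4000: "leaf"}


def symbolize(addr):
    """Map a code address to the hypothetical function that contains it."""
    return FUNCTIONS.get(addr & ~0xFFF, "?")


def unwind(memory, pc, fp, max_depth=64):
    """Follow the chain of saved frame pointers and collect code addresses."""
    trace = [pc]
    while fp != 0 and len(trace) < max_depth:
        saved_fp = memory.get(fp)      # caller's frame pointer
        ret = memory.get(fp + WORD)    # return address into the caller
        if saved_fp is None or ret is None:
            break                      # unreadable stack memory: stop
        trace.append(ret)
        if saved_fp != 0 and saved_fp <= fp:
            break                      # chain must move up the stack, else it is corrupt
        fp = saved_fp
    return trace


# Simulated stack for the call chain start -> main -> parse -> leaf.
memory = {
    0x7FF0: 0,      0x7FF8: 0x1010,  # main's frame: end of chain, returns into start
    0x7FD0: 0x7FF0, 0x7FD8: 0x2020,  # parse's frame: returns into main
    0x7FB0: 0x7FD0, 0x7FB8: 0x3030,  # leaf's frame: returns into parse
}

# Sample taken in the body of leaf: the frame pointer refers to leaf's own frame.
body = [symbolize(a) for a in unwind(memory, pc=0x4010, fp=0x7FB0)]
print(body)      # ['leaf', 'parse', 'main', 'start']

# Sample taken on leaf's first instruction, before it has saved and updated
# the frame pointer: the register still refers to parse's frame.
prologue_memory = dict(memory)
del prologue_memory[0x7FB0]          # leaf has not pushed the old frame pointer yet
prologue = [symbolize(a) for a in unwind(prologue_memory, pc=0x4000, fp=0x7FD0)]
print(prologue)  # ['leaf', 'main', 'start']
```

In the second sample the unwinder still attributes the cost to `leaf`, but the immediate caller `parse` drops out of the trace, because its return address is only reachable through the stack pointer at that moment, which a plain frame pointer walk does not consult.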
Related
Profiling with Ctrl-C
Ctrl-C profiling is an effective method for identifying performance issues in programs, especially in challenging environments, despite its limitations in sampling frequency and multi-threaded contexts.
What is the best pointer tagging method?
Pointer tagging methods show varying performance based on architecture and use case, with nan boxing being efficient for floating-point languages. Overall performance is more influenced by memory access patterns and cache efficiency.
What is the best pointer tagging method?
The article analyzes pointer tagging methods for optimizing memory and performance, noting that practical performance varies by architecture, compiler optimizations are crucial, and untagged pointers often outperform tagged ones.
A Time Consuming Pitfall for 32-Bit Applications on AArch64
Running 32-bit applications on 64-bit AArch64 Linux requires separate GCC toolchains and proper configuration to avoid performance issues, particularly ensuring vDSO support for efficient system calls.
Linus Torvalds Lands 2.6% Performance Improvement with Minor Linux Kernel Patch
Linus Torvalds merged a patch improving Linux kernel performance by 2.6% by modifying the copy_from_user() function. It will be included in the upcoming Linux 6.12-rc6 release in November.
9X% of users do not care about a <1% drop in performance. I suspect we get the same variability just by going from one kernel version to another. The impact from all the Intel mitigations that are now enabled by default is much worse.
However I do care about nice profiles and stack traces without having to jump through hoops.
Asking people to recompile an _entire_ distribution just to get sane defaults is wrong. Those who care about the last drop should build their custom systems as they see fit, and they probably already do.
The overhead is about 10% of samples. But at least you can unwind on systems without frame-pointers. Personally I'll take the statistical anomalies of frame-pointers which still allow you to know what PID/TID are your cost center even if you don't get perfect unwinds. Everyone seems motivated towards SFrame going forward, which is good.
https://blogs.gnome.org/chergert/2024/11/03/profiling-w-o-fr...
Some of the statements in the post seem odd to me though.
- 5% of system-wide cycles spent in function prologues/epilogues? That is wild, it can't be right.
- Is using the whole 8 bytes right for the estimate? Pushing the frame pointer is the first instruction in the prologue and it's literally 1 byte. Epilogue is symmetrical.
- Even if we're in the prologue, we know that we're in a leaf call, we can still resolve the instruction pointer to the function, and we can read the return address to find the parent, so what information is lost?
When it comes to future alternatives, while frame pointers have their own problems, I think that there are still a few open questions:
- Shadow stacks are cool but aren't they limited to a fixed number of entries? What if you have a deeper stack?
- Is the memory overhead of lookup tables for very large programs acceptable?
In any event I don't understand why frame pointers need to be in by default instead of developers enabling where needed.
Having Kitten include frame pointers by default seems reasonable enough, since Kitten is a devel system.
FYI, if you happen to be running on an Intel CPU, --call-graph lbr uses some specialized hardware and often delivers a far superior result, with some notable failure modes. Really looking forward to when AMD implements a similar feature.
Wait, are we really that close to the maximum of what a compiler can optimize that we're getting barely 1% performance improvements per year with new versions?