November 4th, 2024

Limitations of Frame Pointer Unwinding

Some Linux distributions now build their packages with frame pointers enabled (that is, with the frame-pointer-omission optimization turned off) to improve profiling, but limitations remain. Alternatives such as the SFrame project aim to produce accurate stack traces without frame pointers.

Linux distributions such as Fedora and Ubuntu have recently stopped omitting frame pointers in their builds so that profiling tools can produce usable stack traces. The article discusses the limitations of frame pointer unwinding and argues that enabling frame pointers is not a complete solution for profiling. Modern compilers can generate code with or without a frame pointer. Without one, a profiler has to reconstruct the stack from call-frame information, which is more complex. The Linux kernel's perf_events framework supports only frame pointer unwinding for user-space code, which limits its effectiveness. The key issues are that the performance cost falls unevenly across groups of users, that samples taken in function prologues and epilogues produce gaps in the stack trace, and that hand-written assembly code in libraries causes inaccurate traces. Alternatives under development include the SFrame project and hardware shadow stack support, both of which could improve profiling accuracy without relying on frame pointers.

- Some Linux distributions have turned off frame-pointer omission to improve profiling.

- Enabling frame pointers does not fully resolve profiling issues due to inherent limitations.

- Profiling accuracy suffers from gaps around function prologues and epilogues and from hand-written assembly code in libraries.

- New projects and hardware advancements are being developed to improve profiling without frame pointers.

- The article emphasizes balancing end-user performance against profiling needs.
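
As an illustration of the unwinding and the prologue gap described in the summary above, here is a minimal Python sketch that simulates an x86-64 stack and walks the frame-pointer chain. It is not taken from the article: the addresses, the function names (`_start`, `main`, `parse`, `leaf`), and the helpers `symbolize` and `unwind` are all made up for the example. It assumes the usual layout in which the saved frame pointer sits at `[fp]` and the return address at `[fp + 8]`.

```python
# Hypothetical code layout: (start address, function name), sorted by address.
FUNCTIONS = [(0x0500, "_start"), (0x1000, "main"), (0x2000, "parse"), (0x3000, "leaf")]


def symbolize(address):
    """Map a code address to the last function that starts at or below it."""
    name = "?"
    for start, candidate in FUNCTIONS:
        if address >= start:
            name = candidate
    return name


def unwind(memory, ip, fp, max_depth=64):
    """Walk the frame-pointer chain.

    memory[fp] holds the caller's saved frame pointer and memory[fp + 8]
    holds the return address into the caller. A saved value of 0 ends the chain.
    """
    trace = [symbolize(ip)]
    while fp != 0 and len(trace) < max_depth:
        trace.append(symbolize(memory[fp + 8]))
        fp = memory[fp]
    return trace


# Simulated stack for the call chain _start -> main -> parse -> leaf.
STACK = {
    0x7000: 0x0000, 0x7008: 0x0500,  # main's frame: end of chain, return into _start
    0x6FE0: 0x7000, 0x6FE8: 0x1010,  # parse's frame: saved fp of main, return into main
    0x6FC0: 0x6FE0, 0x6FC8: 0x2020,  # leaf's frame: saved fp of parse, return into parse
}

# Sample taken inside leaf's body: the frame pointer refers to leaf's own frame.
in_body = unwind(STACK, ip=0x3010, fp=0x6FC0)

# Sample taken on leaf's first instruction, before its frame is set up:
# the frame pointer register still refers to parse's frame.
in_prologue = unwind(STACK, ip=0x3000, fp=0x6FE0)

print(in_body)      # ['leaf', 'parse', 'main', '_start']
print(in_prologue)  # ['leaf', 'main', '_start']
```

In the second trace the immediate caller, `parse`, is missing. The instruction pointer still identifies `leaf`, but a plain frame-pointer walk has no way to know that it should read the return address from the top of the stack instead of from `[fp + 8]`.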

Profiling with Ctrl-C

Ctrl-C profiling is an effective method for identifying performance issues in programs, especially in challenging environments, despite its limitations in sampling frequency and multi-threaded contexts.

What is the best pointer tagging method?

Pointer tagging methods show varying performance based on architecture and use case, with NaN boxing being efficient for floating-point languages. Overall performance is more influenced by memory access patterns and cache efficiency.

What is the best pointer tagging method?

The article analyzes pointer tagging methods for optimizing memory and performance, noting that practical performance varies by architecture, compiler optimizations are crucial, and untagged pointers often outperform tagged ones.

A Time Consuming Pitfall for 32-Bit Applications on AArch64

Running 32-bit applications on 64-bit AArch64 Linux requires separate GCC toolchains and proper configuration to avoid performance issues, particularly ensuring vDSO support for efficient system calls.

Linus Torvalds Lands 2.6% Performance Improvement with Minor Linux Kernel Patch

Linus Torvalds merged a patch improving Linux kernel performance by 2.6% by modifying the copy_from_user() function. It will be included in the upcoming Linux 6.12-rc6 release in November.

16 comments

By @elteto - 6 months

Didn’t really get the point of the post as it just presents something without a conclusion.

9X% of users do not care about a <1% drop in performance. I suspect we get the same variability just by going from one kernel version to another. The impact from all the Intel mitigations that are now enabled by default is much worse.

However I do care about nice profiles and stack traces without having to jump through hoops.

Asking people to recompile an _entire_ distribution just to get sane defaults is wrong. Those who care about the last drop should build their custom systems as they see fit, and they probably already do.

By @audidude - 6 months

I added support to Sysprof this weekend for unwinding using libdwfl and DWARF/CFI/eh_frame/etc techniques that Serhei did in eu-stacktrace.

The overhead is about 10% of samples. But at least you can unwind on systems without frame-pointers. Personally I'll take the statistical anomalies of frame-pointers which still allow you to know what PID/TID are your cost center even if you don't get perfect unwinds. Everyone seems motivated towards SFrame going forward, which is good.

https://blogs.gnome.org/chergert/2024/11/03/profiling-w-o-fr...

By @ot - 6 months

I broadly agree with the thesis of the post, which if I understand correctly is that frame pointers are a temporary compromise until the whole ecosystem gets its act together and manages to agree on some form of out-of-band tracking of frame pointers, and it seems that we'll eventually get there.

Some of the statements in the post seem odd to me though.

- 5% of system-wide cycles spent in function prologues/epilogues? That is wild, it can't be right.

- Is using the whole 8 bytes right for the estimate? Pushing the frame pointer is the first instruction in the prologue and it's literally 1 byte. Epilogue is symmetrical.

- Even if we're in the prologue, we know that we're in a leaf call, we can still resolve the instruction pointer to the function, and we can read the return address to find the parent, so what information is lost?

When it comes to future alternatives, while frame pointers have their own problems, I think that there are still a few open questions:

- Shadow stacks are cool but aren't they limited to a fixed number of entries? What if you have a deeper stack?

- Is the memory overhead of lookup tables for very large programs acceptable?

By @Brian_K_White - 6 months

Is this a response to Alma Kitten?

In any event I don't understand why frame pointers need to be in by default instead of developers enabling where needed.

Having Kitten include pointers by default seems reasonable enough, since Kitten is a devel system.

By @jeffbee - 6 months

Complaining about frame pointers is like complaining about the budget of the Bureau of Labor Statistics. Yes, it's pure overhead, but also yes, it's good to know what is going on.

By @fooblaster - 6 months

I have always had issues with perf call trace sampling with frame pointers, even when virtually everything in userspace is compiled with -fno-omit-frame-pointer. It doesn't look like any of the failure modes listed in the article to me though. Shrug.

FYI, if you happen to be running on an Intel CPU, --call-graph lbr uses some specialized hardware and often delivers a far superior result, with some notable failure modes. Really looking forward to when AMD implements a similar feature.

By @tempfile - 6 months

The JS constantly grabbing the anchor and updating it is absolutely appalling UX. It took me something like 11 back button presses to get back to where I was. Borderline malware.

By @clausecker - 6 months

The “function prologue is at least 8 bytes long” bit only applies if CET is used. If it is not used, the endbr64 instruction is not emitted and the prologue is only 4 bytes long.
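
The byte counts in this comment can be checked with a short Python sketch. It assumes the conventional frame-pointer prologue (`push %rbp` followed by `mov %rsp,%rbp`), optionally preceded by `endbr64` when CET indirect branch tracking is enabled. The helper name `prologue_length` is made up for the example, and real prologues may contain further instructions.

```python
# Standard x86-64 machine-code encodings of the prologue instructions.
ENDBR64 = bytes.fromhex("f30f1efa")       # endbr64, emitted for CET
PUSH_RBP = bytes.fromhex("55")            # push %rbp
MOV_RSP_RBP = bytes.fromhex("4889e5")     # mov %rsp,%rbp


def prologue_length(with_cet):
    """Return the size in bytes of the minimal frame-pointer prologue."""
    prologue = PUSH_RBP + MOV_RSP_RBP
    if with_cet:
        prologue = ENDBR64 + prologue
    return len(prologue)


print(prologue_length(with_cet=True))   # 8
print(prologue_length(with_cet=False))  # 4
```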

By @laserbeam - 6 months

"enabling frame pointers is a 1-2% performance loss, which translates to the loss of about 1 or 2 years of compiler improvements"

Wait, are we really that close to the maximum of what a compiler can optimize that we're getting barely 1% performance improvements per year with new versions?

By @dap - 6 months

This reads to me like FUD. Isn’t the fraction of profile samples in a prologue heavily workload dependent? And whichever way you go on frame pointers, there are winners and losers to including them by default.

By @dingi - 6 months

Isn't this just another case of RH (and Canonical to some extent) screwing Fedora users to make their paid offerings better? As far as I understand, this is not gonna be enabled by default on RHEL. If this is a must, it shouldn't be hard to produce two different builds or two different ISOs. People who need profiling can grab the ISO with profiling enabled. Us regular folks can use the one without.
