September 15th, 2024

A Time Consuming Pitfall for 32-Bit Applications on AArch64

Running 32-bit applications on 64-bit AArch64 Linux requires separate GCC toolchains and proper configuration to avoid performance issues, particularly ensuring vDSO support for efficient system calls.

Running 32-bit legacy applications on 64-bit AArch64 Linux systems can cause performance problems if the kernel is not configured correctly. On x86_64, a single GCC can target both 32-bit and 64-bit code via -m32. AArch64 is different: it requires separate GCC toolchains for 32-bit and 64-bit applications.

When building the Linux kernel, you must enable CONFIG_COMPAT to run 32-bit binaries, and you must also set the CROSS_COMPILE_COMPAT variable to point to a 32-bit toolchain. If that variable is missing, the kernel is built without a virtual dynamic shared object (vDSO) for 32-bit applications. Calls such as gettimeofday(2) then have to enter the kernel as full system calls instead of completing in user space. A benchmark showed that querying the current time was over 25 times slower without the vDSO.

The author ran into this issue on a Yocto-based project and had to patch the kernel build process to pass CROSS_COMPILE_COMPAT. To check whether a process has a vDSO, Linux records its address in the auxiliary vector under the AT_SYSINFO_EHDR entry, which glibc's getauxval(3) can read. A return value of 0 means no vDSO was mapped. Developers should make sure their build systems support these settings to avoid performance pitfalls when running legacy applications.

- AArch64 requires separate GCC toolchains for 32-bit and 64-bit applications.

- Failure to configure CROSS_COMPILE_COMPAT can lead to significant performance degradation.

- The vDSO lets certain time-related calls, such as gettimeofday(2), run entirely in user space without entering the kernel.

- In the author's benchmark, querying the current time was over 25 times slower without the vDSO.

- Developers should verify vDSO availability (for example, with getauxval(AT_SYSINFO_EHDR)) and make sure their build systems support the necessary configuration.

Related

Arm64EC – Build and port apps for native performance on Arm

Arm64EC is a new ABI for Windows 11 on Arm devices, offering native performance benefits and compatibility with x64 code. Developers can enhance app performance by transitioning incrementally and rebuilding dependencies. Specific tools help identify Arm64EC binaries and guide the transition process for Win32 apps.

Do not taunt happy fun branch predictor

The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.

CPU Dispatching: Make your code both portable and fast (2020)

CPU dispatching improves software performance and portability by allowing binaries to select code versions based on CPU features at runtime, with manual and compiler-assisted approaches enhancing efficiency, especially using SIMD instructions.

Fedora 42 On 64-bit ARM Might Make It Seamless To Run x86/x86_64 Programs

Fedora 42 is considering integrating the FEX emulator to enhance x86 application support on AArch64 systems, aiming for seamless usability similar to macOS on Apple Silicon, pending committee approval.

5 comments
By @leni536 - 7 months
Ouch. I think the blame is partly on the build configuration. IMO build configurations shouldn't degrade silently this way. If the user is OK without a 32-bit vDSO then they should explicitly specify that.
By @o11c - 7 months
It's worth noting that even on x86, -m32 isn't as complete as a real i?86 build of gcc. It's "complete enough" for the kernel and many other things, but, say, I found it very difficult to build a 32-bit program that didn't rely on SSE since the 64-bit version assumes it.
By @WhyNotHugo - 7 months
What’s the use case for running a 32bit binary in a 64bit cpu/os? Is there any advantage? Or is it simply to avoid having to recompile twice to support two architectures?
By @winter_blue - 7 months
I wonder how vDSO works for an x32 ABI program (i.e., a program with 32-bit pointers, but access to the rest of the x86-64 feature set).
By @andirk - 7 months
Anyone able to give a TL_IDID_R (I did read) but didn't understand a damn thing?

> ensure that CROSS_COMPILE_COMPAT is directed to a 32-bit toolchain. Failure to do so might lead to performance issues.

OK so set up your CONSTs. I agree.