A Time Consuming Pitfall for 32-Bit Applications on AArch64
Running 32-bit applications on 64-bit AArch64 Linux requires separate GCC toolchains and proper configuration to avoid performance issues, particularly ensuring vDSO support for efficient system calls.
Running 32-bit legacy applications on 64-bit AArch64 Linux systems can lead to performance issues if the kernel is not built correctly. Unlike x86_64, AArch64 requires separate GCC toolchains for 32-bit and 64-bit applications. When building the Linux kernel, enabling the CONFIG_COMPAT option is necessary, but it must be paired with the CROSS_COMPILE_COMPAT variable pointing to a 32-bit toolchain. If this is overlooked, the kernel will not provide a virtual dynamic shared object (vDSO) for 32-bit applications, so calls like gettimeofday(2) fall back to real system calls, each of which has to switch into the kernel. A benchmark demonstrated that querying the current time was over 25 times slower without the vDSO. The author encountered this issue while working on a project using Yocto and had to patch the kernel build process to incorporate CROSS_COMPILE_COMPAT support. To check for vDSO availability, a process can look for the AT_SYSINFO_EHDR entry in its auxiliary vector, for example with the glibc function getauxval(3). Developers should make sure their build systems support the necessary configuration to avoid this pitfall when running legacy applications.
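The availability check described above can be sketched in C as follows. This is a minimal illustration, not code from the original article: the function name `vdso_is_mapped` is hypothetical, and the extra ELF magic comparison is an added sanity check. To test the compat case, the program would need to be built with the 32-bit toolchain and run on the AArch64 kernel in question.

```c
#include <elf.h>       /* ELFMAG, SELFMAG */
#include <stdbool.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval(), AT_SYSINFO_EHDR */

/*
 * Hypothetical helper: returns true if the kernel mapped a vDSO into
 * this process. getauxval() returns 0 when the auxiliary vector has
 * no AT_SYSINFO_EHDR entry, which is the case when no vDSO exists.
 */
bool vdso_is_mapped(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    if (base == 0)
        return false;

    /* Sanity check (an addition): the mapping should start with the ELF magic. */
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

Calling `vdso_is_mapped()` from a 32-bit binary and getting `false` would suggest the kernel was built without a compat vDSO.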
- AArch64 requires separate GCC toolchains for 32-bit and 64-bit applications.
- Failure to configure CROSS_COMPILE_COMPAT can lead to significant performance degradation.
- The vDSO lets calls such as gettimeofday(2) run in user space, avoiding the switch into the kernel.
- In a benchmark, querying the current time was over 25 times slower without the vDSO.
- Developers should verify vDSO availability, for example via the AT_SYSINFO_EHDR auxiliary vector entry or getauxval(3), and ensure their build systems support the necessary configuration.
Related
Arm64EC – Build and port apps for native performance on Arm
Arm64EC is a new ABI for Windows 11 on Arm devices, offering native performance benefits and compatibility with x64 code. Developers can enhance app performance by transitioning incrementally and rebuilding dependencies. Specific tools help identify Arm64EC binaries and guide the transition process for Win32 apps.
Do not taunt happy fun branch predictor
The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.
CPU Dispatching: Make your code both portable and fast (2020)
CPU dispatching improves software performance and portability by allowing binaries to select code versions based on CPU features at runtime, with manual and compiler-assisted approaches enhancing efficiency, especially using SIMD instructions.
Fedora 42 On 64-bit ARM Might Make It Seamless To Run x86/x86_64 Programs
Fedora 42 is considering integrating the FEX emulator to enhance x86 application support on AArch64 systems, aiming for seamless usability similar to macOS on Apple Silicon, pending committee approval.
Fedora 42 On 64-bit ARM Might Make It Seamless To Run x86/x86_64 Programs
Fedora 42 is considering integrating the FEX emulator to enable x86 application compatibility on AArch64 systems, pending approval from the Fedora Engineering and Steering Committee, enhancing usability for ARM device users.
> ensure that CROSS_COMPILE_COMPAT is directed to a 32-bit toolchain. Failure to do so might lead to performance issues.
OK so set up your CONSTs. I agree.