December 15th, 2024

Linux Fixing a "Hilarious/Revolting Performance Regression" Around Intel KVM

Recent updates to KVM in Linux address a performance regression in Intel's Emerald Rapids processors, improving nested virtualization latency by caching CPUID outputs, with further enhancements expected in Linux 6.14.

Recent updates to the Kernel-based Virtual Machine (KVM) in Linux address a significant performance regression affecting Intel's newer Xeon processors, particularly the Emerald Rapids series. The regression, described as "hilarious/revolting," causes a 3x-4x increase in latency during nested virtualization transitions because executing CPUID instructions is so expensive on these CPUs. The interim fix caches CPUID outputs during module initialization, so the transitions no longer pay that cost each time. The caching fix is being applied to the current Linux 6.13, while the rest of the patch series targets the upcoming Linux 6.14 release.

The underlying problem is the increased complexity of handling XSAVE features on newer CPUs, which triggers multiple runtime CPUID updates that are often unnecessary. The patches streamline these updates by deferring them until they are actually needed. The work highlights the ongoing challenge of optimizing virtualization performance on modern hardware and the collaborative efforts of engineers, particularly from Google, to resolve these issues.
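
To make the caching idea concrete, here is a minimal simulation in Python. It is not KVM code: `SlowCpu`, `CachedCpuid`, and the two transition helpers are hypothetical names, the CPUID output is faked, and the choice of leaf 0xD (the XSAVE state enumeration leaf) is only an illustrative assumption. The sketch simply counts how many expensive queries are issued with and without a cache filled once at initialization.

```python
# Illustrative simulation only; none of these names exist in the kernel.

class SlowCpu:
    """Stands in for hardware where every CPUID query is expensive."""

    def __init__(self):
        self.cpuid_calls = 0

    def cpuid(self, leaf, subleaf):
        self.cpuid_calls += 1
        # Fake, deterministic output; real CPUID fills four registers.
        return (leaf * 1000 + subleaf, 0, 0, 0)


class CachedCpuid:
    """Queries a fixed set of (leaf, subleaf) pairs once, at init time."""

    def __init__(self, cpu, keys):
        self._cache = {key: cpu.cpuid(*key) for key in keys}

    def lookup(self, leaf, subleaf):
        return self._cache[(leaf, subleaf)]


def transition_uncached(cpu, keys):
    # Every simulated transition re-executes the expensive queries.
    return [cpu.cpuid(*key) for key in keys]


def transition_cached(cache, keys):
    # Every simulated transition reads the values saved at init.
    return [cache.lookup(*key) for key in keys]


# Assumed for illustration: four sub-leaves of leaf 0xD.
KEYS = [(0xD, subleaf) for subleaf in range(4)]

cpu = SlowCpu()
for _ in range(100):
    transition_uncached(cpu, KEYS)
uncached_calls = cpu.cpuid_calls  # 100 transitions * 4 queries

cpu2 = SlowCpu()
cache = CachedCpuid(cpu2, KEYS)
for _ in range(100):
    transition_cached(cache, KEYS)
cached_calls = cpu2.cpuid_calls  # only the 4 queries made at init

print(uncached_calls, cached_calls)
```

The cache is valid here only because the simulated values never change after initialization, which is the same property that makes caching at module load a reasonable interim fix in the scenario the summary describes.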

- A significant performance regression in KVM virtualization on Intel processors has been identified and addressed.

- The regression is particularly pronounced in Intel's Emerald Rapids processors, affecting nested virtualization transitions.

- An interim fix involves caching CPUID outputs to reduce latency during these transitions.

- The full resolution is expected in the upcoming Linux 6.14 release.

- The issue underscores the complexities of virtualization on modern CPUs and the collaborative efforts to enhance performance.
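
The fuller fix expected in Linux 6.14 is described as deferring runtime CPUID updates until they are needed. A minimal sketch of that deferral pattern, under stated assumptions, is below: `GuestCpuid` and its fields are hypothetical, and the "64 bytes per enabled feature" cost model is invented purely so the example has something to compute. The point is only that a stale flag lets several state changes share one recomputation.

```python
# Illustrative sketch of lazy (deferred) recomputation; not kernel code.

class GuestCpuid:
    """Hypothetical guest state with one lazily recomputed value."""

    def __init__(self):
        self.enabled_features = 0
        self.recomputes = 0
        self._xsave_size = None
        self._dirty = True

    def set_features(self, mask):
        # Record the change and mark the derived value stale,
        # but do not recompute it yet.
        self.enabled_features = mask
        self._dirty = True

    def xsave_size(self):
        # Recompute only when someone actually asks for the value.
        if self._dirty:
            self.recomputes += 1
            # Made-up model: 64 bytes per enabled feature bit.
            self._xsave_size = 64 * bin(self.enabled_features).count("1")
            self._dirty = False
        return self._xsave_size


guest = GuestCpuid()
for mask in (0b1, 0b11, 0b111):
    guest.set_features(mask)  # three changes, zero recomputations so far

size = guest.xsave_size()     # first read triggers the single recomputation
print(size, guest.recomputes)
```

An eager design would have recomputed after each of the three `set_features` calls; the deferred one does the work once, on first use.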

2 comments
By @not_your_vase - about 2 months
It sounds like the regression comes from a recent hardware change (which wasn't present when the original code was written), and not from inherently buggy code.