August 22nd, 2024

Intel Further Speeds Up Strnlen() in the GNU C Library for Recent Intel/AMD CPUs

Intel has optimized the strnlen() function in glibc for better performance on modern CPUs, unifying implementations and showing significant improvements in benchmark tests. The update will be in glibc 2.41.

Intel has further optimized the strnlen() function in the GNU C Library (glibc), improving performance on recent Intel and AMD CPUs. The update, merged this week, unifies the EVEX and EVEX512 implementations of strnlen(), which returns the length of a string while examining at most a caller-specified number of bytes, so it is safe to use on fixed-size buffers that may lack a terminator. Matthew Sterrett of Intel noted that the unification reduces the number of implementations to maintain and introduces minor optimizations that improve performance. Benchmarks run on an Intel Core i9 7900X (Skylake-X) showed geometric-mean ratios of 0.881 for strnlen-evex and 0.953 for strnlen-evex512 relative to the previous versions. If, as is usual for glibc benchmark reports, the ratio is new time over old time, that corresponds to roughly 12% and 5% less time respectively. The optimized code will ship in the upcoming glibc 2.41 release, scheduled for stable release in February 2025.
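To make the semantics of strnlen() concrete, here is a minimal portable sketch in C. It is not glibc's code (the merged patch is hand-written EVEX assembly); `bounded_len` is a hypothetical name chosen for illustration, and the sketch only assumes the standard `memchr()` from `<string.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference helper, NOT the glibc implementation.
   Returns the number of bytes before the first NUL in s, but never
   examines or counts more than maxlen bytes. */
size_t bounded_len(const char *s, size_t maxlen)
{
    assert(s != NULL);
    /* memchr stops at the first match, or after maxlen bytes. */
    const char *nul = memchr(s, '\0', maxlen);
    return nul ? (size_t)(nul - s) : maxlen;
}
```

For an 8-byte field that is completely filled and has no terminator, `bounded_len(field, 8)` returns 8 rather than reading past the end of the buffer, which is the case a plain strlen() cannot handle safely.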

- Intel has optimized the strnlen() function in the GNU C Library for better performance on modern CPUs.

- The update unifies the EVEX and EVEX512 implementations, reducing complexity.

- Benchmarks on a Core i9 7900X show geometric-mean ratios of 0.881 (strnlen-evex) and 0.953 (strnlen-evex512) against the previous implementations.

- The optimized code will be part of the glibc 2.41 release in February.

- Intel continues to focus on software optimizations to enhance CPU performance.

tolower() with AVX-512

Tony Finch's blog post details the implementation of the tolower() function using AVX-512-BW SIMD instructions, optimizing string processing and outperforming standard methods, particularly for short strings.

An SVE backend for astcenc (Adaptive Scalable Texture Compression Encoder)

The implementation of a 256-bit SVE backend for astcenc shows performance improvements of 14% to 63%, utilizing predicated operations and scatter/gather instructions, with future work planned for SVE2.

Zen5's AVX512 Teardown and More

AMD's Zen5 architecture implements AVX512 natively, reaching 4 x 512-bit throughput, though it faces thermal throttling challenges. It shows significant performance gains, especially in high-performance computing.

Intel Clear Linux: 16% more Ryzen 9 9950X performance

Recent tests of the AMD Ryzen 9 9950X on various Linux distributions revealed that Intel's Clear Linux provided a 16% performance boost, highlighting the importance of software optimizations for hardware performance.

Strlcpy and how CPUs can defy common sense

The article compares the performance of `strlcpy` in OpenBSD and glibc and finds that glibc's version runs faster even though it traverses the string twice. It attributes the result to instruction-level parallelism and advocates sized strings for efficiency.
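The double-traversal approach mentioned in the `strlcpy` summary can be sketched as follows. This is an illustrative version written for this digest, not the actual glibc or OpenBSD source; `two_pass_strlcpy` is a hypothetical name used to avoid clashing with any `strlcpy` the C library may already declare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-pass strlcpy sketch, NOT the glibc source.
   Copies at most size - 1 bytes of src into dst, always terminates
   dst when size > 0, and returns strlen(src) so that callers can
   detect truncation (return value >= size). */
size_t two_pass_strlcpy(char *dst, const char *src, size_t size)
{
    assert(src != NULL);
    size_t src_len = strlen(src);      /* pass 1: measure the source */
    if (size != 0) {
        assert(dst != NULL);
        size_t n = src_len < size ? src_len : size - 1;
        memcpy(dst, src, n);           /* pass 2: bulk copy */
        dst[n] = '\0';
    }
    return src_len;
}
```

Both passes delegate to routines the library can vectorize, whereas a single byte-at-a-time loop has to test for the terminator and the size limit on every iteration. That is one plausible reading of why walking the string twice can still come out ahead.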
