Unsafe Read Beyond of Death
The article details the "Unsafe Read Beyond of Death" optimization for the GxHash algorithm, enhancing performance through SIMD instructions and achieving over tenfold speed increases for small payloads while ensuring safety.
The article discusses an optimization technique called "Unsafe Read Beyond of Death" (URBD) developed for the GxHash non-cryptographic hashing algorithm. The author, while working on GxHash during parental leave, aimed to enhance performance by utilizing SIMD (Single Instruction, Multiple Data) instructions, which allow parallel processing of multiple data elements. Traditional methods for handling uneven input lengths involve slower scalar operations, but GxHash avoids this by loading uneven parts into zero-padded SIMD registers.
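The zero-padded load described above can be sketched in portable, safe Rust. This is a minimal illustration, not GxHash's actual code: the 16-byte width (one 128-bit register) and the names `VECTOR_SIZE`, `load_zero_padded`, and `split_into_blocks` are assumptions made for the example, and plain byte arrays stand in for SIMD registers.

```rust
/// Assumed register width in bytes (128-bit SIMD); hypothetical constant.
const VECTOR_SIZE: usize = 16;

/// Copies up to VECTOR_SIZE bytes into a zero-filled block.
/// This is the safe way to build the padded block: it costs a copy,
/// which is the overhead the over-read trick tries to avoid.
pub fn load_zero_padded(tail: &[u8]) -> [u8; VECTOR_SIZE] {
    assert!(tail.len() <= VECTOR_SIZE, "tail must fit in one block");
    let mut block = [0u8; VECTOR_SIZE];
    block[..tail.len()].copy_from_slice(tail);
    block
}

/// Splits an input into full blocks followed by one zero-padded block
/// for the uneven remainder, if there is one.
pub fn split_into_blocks(input: &[u8]) -> Vec<[u8; VECTOR_SIZE]> {
    let mut blocks = Vec::new();
    let mut chunks = input.chunks_exact(VECTOR_SIZE);
    for chunk in &mut chunks {
        blocks.push(load_zero_padded(chunk));
    }
    let rest = chunks.remainder();
    if !rest.is_empty() {
        blocks.push(load_zero_padded(rest));
    }
    blocks
}
```

For a 3-byte input this yields a single block whose first three bytes are the input and whose remaining 13 bytes are zero; a 20-byte input yields one full block and one padded block.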
To further improve speed, the author experimented with reading beyond the end of the input buffer, loading a full SIMD register even when fewer valid bytes remain. This resulted in over a tenfold speed increase for small payloads, but it is risky: if the extra bytes fall in memory the process is not allowed to access, the read can crash the program. To mitigate this, a safety check verifies that the whole read stays within the same memory page as its starting address; pages are typically 4KB in size.
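The same-page check can be expressed as plain address arithmetic. The sketch below assumes a 4096-byte page and a 16-byte read; `PAGE_SIZE`, `VECTOR_SIZE`, and `read_stays_in_page` are illustrative names rather than GxHash's API, and the over-read itself is deliberately left out because it requires unsafe code.

```rust
/// Assumed page size in bytes; 4 KiB is typical but not universal.
const PAGE_SIZE: usize = 4096;
/// Assumed width of one SIMD load in bytes (hypothetical constant).
const VECTOR_SIZE: usize = 16;

/// Returns true when a VECTOR_SIZE-byte read starting at `addr`
/// ends inside the page that `addr` belongs to.
pub fn read_stays_in_page(addr: usize) -> bool {
    // PAGE_SIZE is a power of two, so the mask extracts the offset
    // of `addr` within its page.
    let offset_in_page = addr & (PAGE_SIZE - 1);
    // The read covers offsets offset_in_page ..= offset_in_page + 15,
    // so the last byte must not exceed offset PAGE_SIZE - 1.
    offset_in_page <= PAGE_SIZE - VECTOR_SIZE
}
```

A caller would pass the address of the first remaining byte, for example `slice.as_ptr() as usize`, and take the slower copying path whenever the function returns false.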
Additionally, the article describes how to mask out the invalid bytes picked up by the over-read and how to incorporate the length of the input into the hashing process, so that inputs differing only in trailing zero bytes do not hash identically. The final implementation of URBD allows for a safe and efficient way to handle uneven input lengths while maximizing performance. According to the author's benchmarks, GxHash is now the fastest non-cryptographic hashing algorithm, with the largest margin over other algorithms on small payloads. The source code and benchmarks are available in the GxHash repository.
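Masking and length mixing can be illustrated on a plain byte array. This is a sketch under stated assumptions: the 16-byte block, the name `mask_and_tag`, and the choice of a wrapping add of the length to every byte are illustrative, and the summary does not specify how GxHash actually combines the length with the data.

```rust
/// Assumed block width in bytes (hypothetical constant).
const VECTOR_SIZE: usize = 16;

/// Zeroes every byte at index >= len (the bytes an over-read may have
/// filled with unrelated memory), then adds the length to each byte so
/// that inputs differing only in trailing zeros produce different blocks.
/// The wrapping add is an assumed mixing step, chosen for simplicity.
pub fn mask_and_tag(block: [u8; VECTOR_SIZE], len: usize) -> [u8; VECTOR_SIZE] {
    assert!(len <= VECTOR_SIZE, "len must fit in one block");
    let mut out = [0u8; VECTOR_SIZE];
    for i in 0..VECTOR_SIZE {
        // Keep valid bytes, discard anything past the real input.
        let kept = if i < len { block[i] } else { 0 };
        out[i] = kept.wrapping_add(len as u8);
    }
    out
}
```

Without the length step, the inputs `[1, 2]` and `[1, 2, 0]` would yield the same zero-padded block and therefore the same hash; with it, the two blocks differ in every byte.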
Related
Group Actions and Hashing Unordered Multisets
Group actions are used to analyze hash functions for unordered sets and multisets, ensuring order-agnostic hashing. By leveraging group theory, particularly abelian groups, hash functions' structure is explored, emphasizing efficient and order-independent hashing techniques.
Benchmarking Perfect Hashing in C++
Benchmarking perfect hashing functions in C++ using clang++-19 and g++-13 reveals mph as the fastest with limitations. Various hash function implementations are compared for lookup time, build time, and size, aiding system optimization.
Do not taunt happy fun branch predictor
The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.
A hash table by any other name
Matthew Wilcox introduced the rosebush data structure, a scalable hash table for the Linux kernel, designed to improve performance by using fixed-size arrays and supporting concurrent access, though feedback remains mixed.
tolower() with AVX-512
Tony Finch's blog post details the implementation of the tolower() function using AVX-512-BW SIMD instructions, optimizing string processing and outperforming standard methods, particularly for short strings.