July 4th, 2024

Th64: Tiny Hash Function in C

A compact hash function "th64" on GitHub excels in performance, passing SMhasher3 tests. It achieves 43 cycles per hash for small keys and 2.61 bytes per cycle for larger keys, operating at 8.52 GiB per second. The function's C code demonstrates efficient data processing with a versatile design for different key sizes. It is available under public domain or MIT-0 license for flexible usage.

Read original articleLink Icon
Th64: Tiny Hash Function in C

The "th64" project on GitHub presents a compact hash function in C language that successfully passes all tests in SMhasher3. This function demonstrates high performance, achieving 43 cycles per hash for small keys (1-32 bytes) and 2.61 bytes per cycle for larger keys, equivalent to 8.52 GiB per second. The provided C code for the "th64" hash function showcases its implementation, utilizing a specific algorithm to process data efficiently. The function is designed to operate swiftly for various key sizes, enhancing its versatility. Additionally, the project offers the hash function under either the public domain or MIT-0 license, allowing for flexible usage and modification within the specified licensing terms.

Related

Arm64EC – Build and port apps for native performance on Arm

Arm64EC – Build and port apps for native performance on Arm

Arm64EC is a new ABI for Windows 11 on Arm devices, offering native performance benefits and compatibility with x64 code. Developers can enhance app performance by transitioning incrementally and rebuilding dependencies. Specific tools help identify Arm64EC binaries and guide the transition process for Win32 apps.

Own Constant Folder in C/C++

Own Constant Folder in C/C++

Neil Henning discusses precision issues in clang when using the sqrtps intrinsic with -ffast-math, suggesting inline assembly for instruction selection. He introduces a workaround using __builtin_constant_p for constant folding optimization, enhancing code efficiency.

Group Actions and Hashing Unordered Multisets

Group Actions and Hashing Unordered Multisets

Group actions are used to analyze hash functions for unordered sets and multisets, ensuring order-agnostic hashing. By leveraging group theory, particularly abelian groups, hash functions' structure is explored, emphasizing efficient and order-independent hashing techniques.

Benchmarking Perfect Hashing in C++

Benchmarking Perfect Hashing in C++

Benchmarking perfect hashing functions in C++ using clang++-19 and g++-13 reveals mph as the fastest with limitations. Various hash function implementations are compared for lookup time, build time, and size, aiding system optimization.

Do not taunt happy fun branch predictor

Do not taunt happy fun branch predictor

The author shares insights on optimizing AArch64 assembly code by reducing jumps in loops. Replacing ret with br x30 improved performance, leading to an 8.8x speed increase. Considerations on branch prediction and SIMD instructions are discussed.

Link Icon 10 comments
By @aappleby - 5 months
Hi, MurmurHash and SMHasher author here.

With the code-golfing obfuscation unwound, Th64 is structured very similarly to MurmurHash. The source data is pre-mixed via "rol(x * r + i, 31)", the hash is mixed via "hash = rol(hash * r ^ x, 31)", the tail bytes are mixed via "hash = hash * r ^ x", and the finalization mix is three "hash = (hash ^ (hash >> 31)) * r" rounds.

I see nothing wrong with it if it passes SMHasher3, but it's also not radically different than other hashes - it's just smooshed down into four lines.

By @tidwall - 5 months
Hi HN. I’m the author of this hash. It’s just something I whipped up last night. It’s really not much different than the other hashes out there. I wanted to see if I could make a simple quality hash that easy to cut & paste, under 80 chars lines, and self contained.

Otherwise just an experiment.

By @ronsor - 5 months
This is cool, but note:

* ~~It requires two passes over the data to hash (notice the two while loops).~~ Actually it doesn't.

* The first loop casts the data pointer to a 64-bit int, so you will get different results depending on endianness. I don't remember if this is undefined behavior or not.

By @jacknews - 5 months
Is the obfuscated-C style really necessary? Last I checked, compacting C source did not directly result in tighter object code.
By @mnurzia - 5 months
This is super cool. Any insight into how this may have been designed? High-quality, compact hash functions are really useful in C.
By @Yoofie - 5 months
Slightly off topic: Does anyone know of a good/fast hash function (similar to this) for use in 32bit embedded systems (such as Cortex-M, TriCore, C2000, etc)?
By @IChooseY0u - 5 months
why not just use fnv64
By @EPWN3D - 5 months
I get that expressing the function in 6 lines is cool and all, but it'd be nice if it was readable.
By @tpoacher - 5 months
Great! Now do it in brainfuck!
By @_V_ - 5 months
Neat, but I'm personally missing some key info here - most importantly "why?". What problem does this solve or why is this particular function good or better than other "standard" hashing functions?

And I have to admin that this kind of obfuscation/minification is especially dangerous in languages such as C as there are quite a lot of footguns already as it is. I don't think that minifying C produces faster programs :)