September 21st, 2024

Questioning the Criteria for Evaluating Non-Cryptographic Hash Functions

The article evaluates non-cryptographic hash functions, emphasizing efficiency and uniform output distribution. It discusses the avalanche criterion, performance testing, and the importance of selecting hash functions based on application needs.

Read original article

Questioning the Criteria for Evaluating Non-Cryptographic Hash Functions

The article discusses the evaluation criteria for non-cryptographic hash functions, emphasizing their widespread use in computing, such as in dictionaries and load balancers. Unlike cryptographic hash functions, which prioritize security and resistance to collisions, non-cryptographic hashes focus on efficiency and uniform distribution of outputs. The authors highlight the importance of testing these functions through practical applications, such as populating hash tables, and demonstrate how different datasets can affect performance. They introduce the avalanche criterion, which states that changing one bit of input should ideally affect every output bit with a 50% probability. The performance of various hash functions, including Murmur2 and FNV, is analyzed against this criterion, revealing that while Murmur2 excels in avalanche performance, it may not always provide optimal collision resistance. The article concludes that every hash function has specific datasets that can challenge its performance, suggesting that the choice of hash function should be tailored to the application's requirements.

- Non-cryptographic hash functions are essential for efficiency in computing tasks.

- The avalanche criterion is a key measure for evaluating hash function performance.

- Practical testing with various datasets reveals strengths and weaknesses in hash functions.

- Murmur2 shows superior avalanche performance but may struggle with collision resistance.

- The choice of hash function should align with specific application needs and data characteristics.

Group Actions and Hashing Unordered Multisets

Group actions are used to analyze hash functions for unordered sets and multisets, ensuring order-agnostic hashing. By leveraging group theory, particularly abelian groups, hash functions' structure is explored, emphasizing efficient and order-independent hashing techniques.

Benchmarking Perfect Hashing in C++

Benchmarking perfect hashing functions in C++ using clang++-19 and g++-13 reveals mph as the fastest with limitations. Various hash function implementations are compared for lookup time, build time, and size, aiding system optimization.

`noexcept` affects libstdc++'s `unordered_set`

The article examines the impact of the `noexcept` specifier on `std::unordered_set` performance in libstdc++, highlighting optimization opportunities and advocating for improvements to handle hash function efficiency better.

Immutable Data Structures in Qdrant

Qdrant's article highlights the benefits of immutable data structures in vector databases, improving performance and memory efficiency, while addressing challenges through mutable segments and techniques like perfect hashing and defragmentation.

When Bloom filters don't bloom (2020)

The author explored Bloom filters for IP spoofing but faced performance issues due to memory access times. A simpler hash table approach ultimately provided better performance for large datasets.

1 comments

By @PeterWhittaker - 3 months

I’ve read the first several paragraphs and skimmed the article a few times, and the motivation still eludes me: given the existence of cryptographic hash functions that are well designed and thoroughly tested, why spend so much time on non-cryptographic hash functions?

What benefits do they provide that are not provided by their secure “brethren”?

Group Actions and Hashing Unordered Multisets

Benchmarking Perfect Hashing in C++

`noexcept` affects libstdc++'s `unordered_set`

Immutable Data Structures in Qdrant

When Bloom filters don't bloom (2020)

The author explored Bloom filters for IP spoofing but faced performance issues due to memory access times. A simpler hash table approach ultimately provided better performance for large datasets.

Questioning the Criteria for Evaluating Non-Cryptographic Hash Functions

Related

Group Actions and Hashing Unordered Multisets

Benchmarking Perfect Hashing in C++

`noexcept` affects libstdc++'s `unordered_set`

Immutable Data Structures in Qdrant

When Bloom filters don't bloom (2020)

Related

Group Actions and Hashing Unordered Multisets

Benchmarking Perfect Hashing in C++

`noexcept` affects libstdc++'s `unordered_set`

Immutable Data Structures in Qdrant

When Bloom filters don't bloom (2020)