July 26th, 2024

A hash table by any other name

Matthew Wilcox introduced the rosebush data structure, a scalable hash table for the Linux kernel, designed to improve performance by using fixed-size arrays and supporting concurrent access, though feedback remains mixed.

Read original article

On June 25, Matthew Wilcox introduced a new data structure called rosebush, designed as a scalable, cache-aware hash table optimized for the Linux kernel's directory-entry cache (dcache). This structure aims to improve performance compared to the existing rhashtable by avoiding linked lists, which can lead to inefficient memory access patterns. Rosebush maintains a similar API to rhashtable but utilizes fixed-size arrays for buckets, allowing for more efficient cache usage during lookups. The design includes a two-level structure where the hash selects a bucket from an array, followed by a linear scan within that bucket. This method reduces cache misses, particularly during dependent cache accesses.

Rosebush supports concurrent access through read-copy-update (RCU), allowing for efficient reads while minimizing locking overhead during resizing operations. Unlike rhashtable, which uses intrusive linked lists, rosebush's design permits items to be stored in multiple structures without additional memory overhead. The patch set has received mixed feedback, with some reviewers questioning the performance claims without empirical data. Wilcox's assertions about rosebush's efficiency stem from the assumption that rhashtable's average bucket length is longer than expected, which may not hold true in practice. The ongoing discussion among developers highlights the complexities of hash table performance and the importance of empirical validation in evaluating new data structures.

Optimizing the Roc parser/compiler with data-oriented design

The blog post explores optimizing a parser/compiler with data-oriented design (DoD), comparing Array of Structs and Struct of Arrays for improved performance through memory efficiency and cache utilization. Restructuring data in the Roc compiler showcases enhanced efficiency and performance gains.

How to implement a hash table in C (2021)

This article explains implementing a hash table in C, covering linear/binary search, hash table design, simple hash function, collision handling, resizing, and API design. Code snippets and GitHub repository link provided.

Atomicless Per-Core Concurrency

The article explores atomicless concurrency for efficient allocator design, transitioning from per-thread to per-CPU structures on Linux. It details implementing CPU-local data structures using restartable sequences and rseq syscall, addressing challenges in Rust.

Binary Search Tree with SIMD

Clément Jean presents a cache-friendly algorithm for binary search trees in Go, optimizing memory access with SIMD operations. The approach enhances search efficiency, demonstrated through benchmark tests, and suggests broader applications in data structures.

Compact Fenwick trees for dynamic ranking and selection (2019)

The paper presents Compact Fenwick trees, enhancing array storage and operations efficiency. It introduces optimized variants for faster searches and reduced space usage, aiming to create a dynamic bit vector with logarithmic time complexity. Additionally, it proposes solutions to leverage CPU cache structures for enhanced performance.

5 comments

By @metadat - 10 months

Related reading:

What is RCU, Fundamentally? - https://lwn.net/Articles/262464/

Amusingly, the first several hits for "Golang RCU" point to GitHub repos which contain a bunch of Mutexes (locks), import "unsafe", and haven't been updated in 7+ years. Perhaps this is not the most well-understood area of computer science.

Let's face it, when was the last time you needed RCU for a web app?

By @stoperaticless - 10 months

In lwn comments a guy insists that “O(f(n)) is by definition a bound over the worst-case case input to a function.”

Is that actually true?

Mathematically: O(f(n)) is asymptotic bound when n approaches infinity, which could be used to describe worst, best, average and other cases.

Have programmers redefined O to specifically mean worst case?

By @commandlinefan - 10 months

Interestingly, I was just reading the book "Java Concurrency in Practice" where the author measures the performance of an array-based hash table like the one described here with a linked-list based hash table and finds that the linked list comes out ahead performance-wise.

By @pphysch - 10 months

> "I've assumed an average chain length of 10 for rhashtable in the above memory calculations."

> Reviewers were, as ever, skeptical in the face of assertions about performance without corresponding measurements. Peng Zhang asked "how much performance improvement is expected if it is applied to dcache?" Wilcox didn't reply to Zhang's question.

Am I reading this right that the author supplied no actual (benchmarking) evidence to support their proposed performance optimization?

By @o11c - 10 months

Is it just me, or is this unsafely assuming that collisions can only happen in low bits, not for the full hash (despite it being only 32 bits)