July 12th, 2024

Binary Search Tree with SIMD

Clément Jean presents a cache-friendly algorithm for binary search trees in Go, optimizing memory access with SIMD operations. The approach enhances search efficiency, demonstrated through benchmark tests, and suggests broader applications in data structures.

Read original articleLink Icon
Binary Search Tree with SIMD

Clément Jean explores a cache-friendly algorithm for data structures like trees, focusing on binary search trees (BST) implemented in Go. The traditional array representation of a BST lacks cache efficiency, prompting the adoption of a new layout resembling parent-children triangles. This layout enables Single Instruction, Multiple Data (SIMD) operations on parent-child pairs for efficient searching. The article delves into the implementation details, including ARM64 assembly code for SIMD instructions. By loading elements into vectors and applying SIMD operations, the algorithm efficiently searches for elements in the BST. The process involves comparing elements, mapping indices, and updating search parameters iteratively. Benchmark tests demonstrate the performance benefits of the SIMD-based binary search compared to a traditional binary search. The article concludes by highlighting the potential applications of this cache-friendly BST algorithm within a frozen AVL Tree data structure. Readers are encouraged to explore further details on the implemented data structure and provide feedback for future improvements.

Link Icon 2 comments
By @rgovostes - 6 months
This article doesn't make a lot of sense to me. It avoids using computer science terminology (like "tree traversal") that would clarify the author's intent here.

I don't follow at all the example of searching for the number 62, deciding that the triplet 41-23-61 contains all numbers greater than 62, and then deciding to "access the 4th child" 73-67-79 — what is the 4th child in a binary tree? How do we determine the index to move to in our flattened array?

It would probably be better to read the original paper "FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs" (https://doi.org/10.1145/1807167.1807206) if this is of interest.

By @dragontamer - 6 months
This is non-intuitive to me.

I'd expect that the natural SIMD-extension to a binary search tree would be a 4-BTree (assuming 4-elements compared per SIMD unit).

For AVX512, I'd expect a 16-BTree to be used.

----------

EDIT:

> The paper mentionned at the beginning, propose to have a binary tree layed out like the following:

> And if you take time to understand how this maps back to the binary tree, you will notice that we are storing parent-children triangles. With this, we can now apply SIMD operations on both the parent and the children in order to either dtermine if the data we are looking for is in the triangle, or if we should continue our search.

Hmmmmmmm. So maybe they are making a B-Tree here except not calling it one?