June 28th, 2024

The CIDR trie data structure

This blog post explores CIDR trie data structure in Rust for efficient IP address management. It introduces CIDR trie for fast country lookup, explaining implementation details and time complexity benefits, suggesting optimizations. It mentions trie_rs crate for advanced trie use.

Read original articleLink Icon
The CIDR trie data structure

This blog post discusses the concept of CIDR trie data structure and its application in managing IP addresses and subnets efficiently in Rust projects. The problem addressed is the need to restrict access to a service based on specific IP addresses without slowing down the system. By using CIDR notation to represent IP address ranges, a specialized trie data structure, the CIDR trie, is introduced. This trie structure optimizes the search process based on the prefix of an element, providing a fast and efficient way to determine the country associated with a given IP address. The implementation details of the CidrCountryMap struct, insert and search methods are explained, showcasing how the trie is built and utilized for IP address lookups. The blog concludes by highlighting the time complexity benefits of using a CIDR trie for IP address searches and suggests potential improvements for further optimization. Additionally, it mentions the availability of the trie_rs crate for more complex trie implementations but emphasizes the importance of keeping dependencies minimal for better project maintenance.

Link Icon 4 comments
By @vlmutolo - 7 months
Also see “poptrie” [0] for an efficient way to do routing table-style lookups for CIDRs (longest prefix match). They basically use the “popcnt trick” to implement sparse arrays, use “direct indexing” for the first however many bits to reduce the number of indirections, and have a cool optimization specific to longest prefix match (“hole punching” in the paper).

[0]: https://dl.acm.org/doi/10.1145/2829988.2787474

By @toast0 - 7 months
I've never really groked tries, so bad for me. But I've built a few IP lookup tables.

For one, I only needed to know a true/false given an ipv4 address (does geolocation of the ip indicate within a specific set of countries), and the lookup needed to be fast and easy.

If you look at every address, a simple boolean array will be too big, 2^29 bytes with 2^3 values per byte covers the address space, but half a gig of memory is too much. Otoh, geolocation by country is unlikely to be more specific than a /24, 2^21 bytes with 2^3 values per byte is only 2 meg of memory devoted to a very fast lookup; less in actual use if you mmap without large pages, because parts of the range will never page in.

That approach isn't great if your value is larger than a boolean or you need ipv6 (you can use /48s as your maximum prefix length, but there's quite a lot of /48s).

If you want a compact table, you can store an ordered set of (low ip, value), and access the table by finding the nearest value equal or less than the observed ip. I used Erlang's ets with ordered set and the ets:prev(Table, IP + 1) which uses a binary tree under the hood, but anything that can handle 128 bit integer should work.

Building the table is a bit of work, because you've got to take inputs of like X.Y.0.0/16 -> A and then X.Y.6.0/24 -> B and turn that into X.Y.0.0 -> A, X.Y.6.0 -> B, X.Y.7.0 -> A ... but all that transformation happens offline.

I think you're more likely to have a language built in for a binary tree than a trie, but some languages have neither.

By @seligman99 - 7 months
I ended up using the same basic layout for the database behind a little IP lookup tool I wrote to make lookups somewhat responsive from JavaScript [1]. It ends up working out pretty well.

[1] https://cloud-ips.s3-us-west-2.amazonaws.com/index.html

By @fmajid - 7 months
In networking, these are known as Patricia trees, a special case of the Radix tree/trie.