Ask HN: Fast data structures for disjoint intervals?
The author seeks recommendations for innovative data structures to improve read speeds for managing disjoint time intervals, noting that existing solutions often do not outperform simple ordered maps.
The author discusses challenges faced in modeling disjoint time intervals for applications that require efficient insertion, removal, and range queries. They have previously used an ordered map or tree structure, specifically in Rust with `BTreeMap<u32, u32>`, which effectively represents time intervals but lacks the speed needed for real-time applications with heavy read demands. The author notes that many specialized interval libraries do not outperform a simple ordered map, as they often rely on similar underlying structures and do not prioritize cache efficiency or SIMD, leading to slow pointer traversal.
The author has experimented with various data structures, including adaptive radix trees, dense bitsets, spatial indexes, and fixed grid hierarchies, but has not found a solution that significantly improves performance over the ordered map. They are seeking recommendations for innovative data structures or hybrid approaches that could enhance read speeds while accepting a trade-off of slower insertion and removal operations. The inquiry highlights a need for more efficient methods to manage time intervals in applications with high-frequency queries.
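For concreteness, a minimal sketch of the baseline the author describes, assuming the map is keyed by interval start with the end as the value (the exact encoding is a guess):

```rust
use std::collections::BTreeMap;

// Baseline: disjoint, half-open intervals [start, end), keyed by start.
// A point lookup is a single predecessor query on the ordered map.
fn contains(map: &BTreeMap<u32, u32>, t: u32) -> bool {
    map.range(..=t)
        .next_back() // last interval starting at or before t
        .map_or(false, |(_, &end)| t < end)
}
```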
Related
The CIDR trie data structure
This blog post explores the CIDR trie data structure in Rust for efficient IP address management. It introduces a CIDR trie for fast country lookup, explaining implementation details and time-complexity benefits, and suggests optimizations. It mentions the trie_rs crate for advanced trie use.
Properly Testing Concurrent Data Structures
The article explores testing concurrent data structures using the Rust library loom. It demonstrates creating property tests with managed threads to simulate concurrent behavior, emphasizing synchronization challenges and design considerations.
Binary Search Tree with SIMD
Clément Jean presents a cache-friendly algorithm for binary search trees in Go, optimizing memory access with SIMD operations. The approach enhances search efficiency, demonstrated through benchmark tests, and suggests broader applications in data structures.
Compact Fenwick trees for dynamic ranking and selection (2019)
The paper presents Compact Fenwick trees, enhancing array storage and operations efficiency. It introduces optimized variants for faster searches and reduced space usage, aiming to create a dynamic bit vector with logarithmic time complexity. Additionally, it proposes solutions to leverage CPU cache structures for enhanced performance.
Implementing and Improving Skiplists
Matt Hall discusses skiplists as an alternative to balanced binary trees, focusing on implementation and improvement. Skiplists offer efficiency in data insertion and retrieval but face challenges in optimizing cache utilization with proposed solutions showing mixed performance results.
> The slow cases usually tend to be trying to find wider slots and skipping through many smaller slots. There are often a lot of bookings up to the first wide enough slot too though, so it's a little bit of both.
This right here is the first critical issue. It doesn't matter how fast the data structure is if you're then going to do a linear search over the available slots to find the next one wide enough.
A good data structure for doing this efficiently is a range tree (https://en.wikipedia.org/wiki/Range_tree), where in each node you store the maximum width of the available slots covered by the node. That lets you find the first available slot after some time t and wider than some duration w in O(log(n)), where n is the number of available slots. (It might not be obvious if you're not familiar with range trees; I'm happy to provide more details.)
For the implementation, there are a few options:
A. The simplest is to use a complete range tree, where the leaf nodes are all the possible times. You can lazily create nodes to avoid massive memory usage for sparse ranges. The advantage is that you don't need to do any balancing; the disadvantage is that the time complexity is O(log(T)), where T is the total number of possible times, so it's going to be a little slower on very sparse datasets. (A sketch follows this list.)
B. The O(log(n)) implementation that's being called for is a self-balancing binary tree (e.g. red-black), modified to also store the maximums in each node. Unfortunately most libraries don't give you low-level control over the tree's nodes, so you'd likely need to copy the code and modify it (or implement the self-balancing tree from scratch).
C. If those are still too slow (and you're certain that your implementation is really O(log(n))), you'll need to improve the cache efficiency. That basically comes down to using larger nodes. The obvious approach is to switch to a B-tree; but you could also keep your nodes binary and just change the way they are allocated to emulate the memory layout of a B-tree (this is simpler but slower because it still uses lots of pointers). Another idea is to replace the first few layers of the tree with a hash table (or a simple array if your data is dense enough). Likewise you can replace the leaf nodes with small arrays.
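A minimal sketch of option A, with the complete tree flattened into an array-backed segment tree; here each leaf holds the width of the free slot starting at that time (0 if none), which is one illustrative way to store the maximums (names and encoding are assumptions):

```rust
/// Option A flattened into an array-backed segment tree: leaf i holds the
/// width of the free slot starting at time i (0 if none); internal nodes
/// hold the max over their range.
struct FreeSlotTree {
    n: usize,       // leaf count, power of two
    tree: Vec<u32>, // 1-based heap layout, tree[1] = root
}

impl FreeSlotTree {
    fn new(num_times: usize) -> Self {
        let n = num_times.next_power_of_two();
        Self { n, tree: vec![0; 2 * n] }
    }

    /// Record that a free slot of `width` starts at time `i` (0 clears it).
    /// Requires i < number of possible times.
    fn set(&mut self, i: usize, width: u32) {
        let mut node = self.n + i;
        self.tree[node] = width;
        while node > 1 {
            node /= 2;
            self.tree[node] = self.tree[2 * node].max(self.tree[2 * node + 1]);
        }
    }

    /// First time >= t whose free slot is at least `w` wide.
    fn first_fit(&self, t: usize, w: u32) -> Option<usize> {
        self.descend(1, 0, self.n, t, w)
    }

    fn descend(&self, node: usize, lo: usize, hi: usize, t: usize, w: u32) -> Option<usize> {
        if hi <= t || self.tree[node] < w {
            return None; // range entirely before t, or nothing wide enough
        }
        if hi - lo == 1 {
            return Some(lo);
        }
        let mid = (lo + hi) / 2;
        self.descend(2 * node, lo, mid, t, w)
            .or_else(|| self.descend(2 * node + 1, mid, hi, t, w))
    }
}
```

A booking or cancellation then only touches the one or two leaves whose free-slot widths change, each an O(log T) `set`.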
I’m not sure how applicable my solution is to your problem, but it might work well. I implemented a custom btree with the semantics I needed, since btrees are fast and have great cache locality. The trick is making one that understands ranges, not just singular keys like the std `BTreeMap`.
The code is here if you’re curious. It’s implemented on top of a pair of vecs - which makes it 100% safe rust, and curiously faster than the equivalent tree-of-raw-allocations approach. Please forgive the poor documentation - it’s an internal only type and the code is new enough that the paint hasn’t dried yet.
https://github.com/josephg/diamond-types/blob/master/src/ost...
There are things like:
* in-advance-known closeouts
* sporadic closeouts (e.g. due to bad weather conditions)
* sell-outs
* varying availability by date/start time
* links to partner services who can partially reduce the availability count
* allotments (availability quotas for different sellers)
* resources linked to availabilities (which is another dimension)
* the list goes on and on...
Anyway, back to the data structures.
After many iterations I've settled with using Guava's Table (yes, we use Java). There are many ways to model this, e.g. you can have start times as rows and dates as columns.
It might not sound as sexy as you'd expect but it's super easy to visualise the model in your head and read/maintain the code later.
Then you can use Guava Range functionality for dates or times and do intersections etc. Hope this helps.
This approach would indeed be slow. You can improve your approach by switching to a `BTreeMap<u32, bool>` with the key being a border between intervals and the value telling you whether the next interval is 'in' or 'out'. (And the interval starting at logical minus infinity being eg 'out' by default.)
You can test whether a specific point is 'in' or 'out' by using something like `upper_bound` [0] to find the key just before your query point (and the value associated with that key tells you whether your query point is 'in' or 'out'.)
See https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...
Insert and remove are pretty simple to code up too, and looking for next availability after a query point is almost trivial: take the cursor you get from `upper_bound`, and if it's not already `out`, advance the cursor by one. (This is assuming you keep the invariant that `in` and `out` intervals interleave. Ie you merge any adjacent intervals of the same kind, by just removing the key-value pair for the second interval from the BTreeMap. Otherwise, you might have to advance your cursor a few more times, perhaps cleaning up your intervals as you go to restore the invariant.) You can also implement merging of two interval maps represented this way pretty efficiently.
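A minimal sketch of this boundary encoding; it uses the stable `range` API, which is equivalent here to the cursor-based `upper_bound`, and shows only the queries (insert/remove follow the same pattern):

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Boundary encoding: each key is a time where the state flips; the value
/// says whether the interval *starting* at that key is booked. Times before
/// the first key are free by default.
struct IntervalMap {
    bounds: BTreeMap<u32, bool>,
}

impl IntervalMap {
    /// Is time `t` inside a booked interval?
    fn is_booked(&self, t: u32) -> bool {
        self.bounds
            .range(..=t)
            .next_back() // boundary at or just before t
            .map(|(_, &booked)| booked)
            .unwrap_or(false)
    }

    /// First free time at or after `t`, assuming booked/free intervals
    /// interleave (adjacent same-state intervals are always merged).
    fn next_free(&self, t: u32) -> Option<u32> {
        if !self.is_booked(t) {
            return Some(t);
        }
        self.bounds
            .range((Bound::Excluded(t), Bound::Unbounded))
            .find(|&(_, &booked)| !booked)
            .map(|(&k, _)| k)
    }
}
```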
I read elsewhere that you also want eg the next free block that's at least n items long, or the next occupied block of at least some minimum size.
You can do these operations by augmenting your tree nodes with caches of, e.g., the longest free block and the longest occupied block beneath them.
You can use that cache to find the next free interval of at least some specified size quickly, and you can rebuild that cache quickly, too, whenever you make any modification to the tree. (Quickly here meaning in O(number of layers).)
An RTree is the first structure that comes to mind, but the way you describe the problem, it sounds like the intervals never overlap, so I have my doubts. Sounds like you might be looking to optimize the query "what is the first interval of at least N days?" Maybe look at priority trees. They're good at queries that are bounded in one dimension and half-open in the other.
For a wide variety of time slots, keep separate start-time-ordered maps of free slots of size 2^n to 2^(n+1)-1 and search the smallest viable size class first, which should be O(log2(buckets)) for earliest-possible or O(log N) for an arbitrary desired start time, assuming the number of buckets isn't too huge.
I'm curious if a heap would be faster than a btree; I think the speed tradeoff is how long it takes to re-add the portion of the free space back to the data structure. With luck, a btree should only need to move a free space entry within a single node which should be very fast.
If appointments are cancelled, insert an appropriately-sized free space into the data structure. Coalesce adjacent free space (probably fastest in a btree) and move to a larger size bucket if necessary.
For high concurrency, also use a rotating series of buckets by time interval so that you can drop entire structures once they have aged out (maximum interval end is less than now()). Let the free space at the end of each bucket overlap the next bucket by at least the maximum appointment length, with special-case handling to allow an appointment to span two buckets (shrinking free space in both). This should allow fully multithreaded processing unless every reservation is soonest-available-appointment.
Appointments can live in a hash by their key or ID with their (start,end) interval as the value for O(1) lookups on appointment deletion.
As others have pointed out it's a lot like malloc algorithms with the constraint that you care about address order (time). So linked lists of free buckets of size 2^k become time-ordered maps, and rotating ephemeral maps to account for unbounded time plus the analog of thread-local storage for concurrency.
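A sketch of the bucketed scheme under the stated assumptions (slot lengths > 0; one start-ordered `BTreeMap` per power-of-two size class):

```rust
use std::collections::BTreeMap;

/// Free slots bucketed by size class: bucket k holds slots whose length
/// satisfies 2^k <= len < 2^(k+1), each keyed by start time.
struct BucketedFreeList {
    buckets: Vec<BTreeMap<u32, u32>>, // start -> length
}

impl BucketedFreeList {
    fn new() -> Self {
        Self { buckets: vec![BTreeMap::new(); 32] }
    }

    fn insert(&mut self, start: u32, len: u32) {
        // len must be > 0 (ilog2 panics on zero).
        self.buckets[len.ilog2() as usize].insert(start, len);
    }

    /// Earliest free slot starting at or after `t` with length >= `want`.
    fn first_fit(&self, t: u32, want: u32) -> Option<(u32, u32)> {
        let mut best: Option<(u32, u32)> = None;
        // Only buckets at or above want's size class can contain a fit.
        for bucket in &self.buckets[want.ilog2() as usize..] {
            // The first slot in start order with len >= want is the
            // earliest viable slot within this bucket.
            if let Some((&s, &l)) = bucket.range(t..).find(|&(_, &l)| l >= want) {
                if best.map_or(true, |(bs, _)| s < bs) {
                    best = Some((s, l));
                }
            }
        }
        best
    }
}
```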
Part of your issue is doing the calculation on read. If you can get away with slower writes you can store all available slots for the day on write at minute/5min etc granularity, then remove used slots from the tree as they are taken.
- I'd like to acquire <resource> at time T for N units of time.
- I'd like to acquire <resource> for N units of time, give me possible T's.
First, it was a great benefit to identify what the "unit of time" was. For my case it was 30 minutes; this helped tremendously, as you'll see.
Next, for the first question (I want <resource> at time T), once I had enough data that searching became slow enough that a return of "resource not available" became problematic, I found a bloom filter seriously improved things. It not only let the common failing case fail really fast, but bit-shifting the filter also allowed some very fast parallel queries of "nearby" time slots. I was able to rapidly return things like "2pm isn't available, but 1:30pm is".
Last, for the second question (I'd like <resource> for N units of time), this is _very_ similar to a heap memory allocator. At its simplest it's just a b-tree of slots, etc. Grabbing a big slot, splitting it up, etc. This gets even easier if there are time boundaries that cannot be crossed (eg, a day).
To be clear, the allocator approach isn't about you managing a BTree like you have been. It's about a tree of _available_ time slots that are _free_. Initially it begins as a tree with a single node covering all time. Make a request, scan the tree for a time slot large enough to meet the length needed, subdivide it, etc. If a time slot is no longer needed, you put it back into the tree by scanning for the node it belongs to (or fits between), coalescing, etc.
Hope this helps!
BTreeMap should be nearly optimal, up to fiddling with the layout, changing the number of entries stored in one node, and vectorizing your within-node query function.
For people who need to store sorted sparse records and have memory to spare, or who know the keys of their sorted sparse records take on few values (like only a few million?), they can use a robinhood hash table without any hash function. The whole thing may not fit in a relevant cache, but you generally won't hit more than 1 cache line for the types of queries described. Again, I can't really recommend a library, but it's pretty easy to write this yourself.
Edit: Here's a link to one robinhood hash table: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...
For queries like "next/previous free interval with length at least K", scanning within the appropriate level of a Fenwick tree may be better than scanning within a list of occupied intervals. Maybe you could use a layout that keeps all the entries for a level together, rather than the usual layout that mixes in less useful information and puts useful information powers-of-2 apart just to make the cache behavior as bad as possible. For example, if you have K=100, then you could look in the level that has sums of runs of 64. It's necessary but not sufficient to see 2 consecutive entries with at most 28 missing or 3 consecutive entries with at most 92 missing and the middle entry completely empty. Generalizing to arbitrary K is left as an exercise for the reader.
A few thousand elements is still small for a computer. I would be curious to see a benchmark comparing sorted arrays of intervals against the fancier structures.
The improved cache behavior alone can make an order-of-magnitude difference. The cost you pay, of course, is on inserts and removals, but perhaps these can be amortized in a read-heavy workload (build up a list of changes, and then rebuild in one go).
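As a baseline for such a benchmark, a sketch of the sorted-array representation (half-open intervals, binary search via `partition_point`):

```rust
/// Disjoint half-open intervals [start, end) in a flat sorted Vec: binary
/// search for queries, O(n) insert/remove. At a few thousand elements the
/// whole array often stays in cache.
struct FlatIntervals {
    ivals: Vec<(u32, u32)>, // sorted by start, non-overlapping
}

impl FlatIntervals {
    /// Interval containing `t`, if any.
    fn find(&self, t: u32) -> Option<(u32, u32)> {
        // Index of the first interval starting after t...
        let i = self.ivals.partition_point(|&(s, _)| s <= t);
        // ...so the only candidate is the one just before it.
        match i.checked_sub(1) {
            Some(j) if self.ivals[j].1 > t => Some(self.ivals[j]),
            _ => None,
        }
    }

    /// Start of the first gap at or after `t` of length >= `want` (want > 0).
    fn next_gap(&self, t: u32, want: u32) -> u32 {
        let mut cursor = t;
        let i = self.ivals.partition_point(|&(s, _)| s <= t);
        for &(s, e) in &self.ivals[i.saturating_sub(1)..] {
            if s.saturating_sub(cursor) >= want {
                break; // the gap before this interval fits
            }
            cursor = cursor.max(e);
        }
        cursor // free space after the last interval is unbounded
    }
}
```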
https://en.wikipedia.org/wiki/Interval_tree
However, that was mainly for the case where I needed to detect overlapping intervals. Depending on your application, perhaps a k-d tree with K=2 (one dimension for the start time, the other for the end time)? If you know that all your intervals are entirely disjoint (no overlap at all), I don't think you'll be able to do better than just an ordered map or list.
Each interval is represented by a particle of radius half the interval's length, positioned at the middle of the interval.
Because the radii vary, you can either use an adaptive algorithm, or split each interval into multiple small particles.
There is not a lot to gain when doing a single query, but when you are batching queries, you can share the computation and memory accesses. Using radix sort for sorting the integers you get a complexity for iterating over all collisions of O( NbKeys + NbQueries + NbCollisions ).
To improve queries that are contentious, I'd try building something like an R-tree or range/interval tree of the unbooked intervals (the complement of the booked intervals). One invariant will be that availability intervals are placed at the highest possible layer in the tree (the smallest region they entirely fit inside). Then query the R-tree (or interval tree) only to the depth corresponding to the interval size requested. (Any lower levels would contain only availability windows that are too small, so we can skip searching them.)
That would avoid scanning all the small intervals. Not sure how much overhead this would add and if it would work out significantly faster than your B-tree though...
This is useful for determining maximum concurrent use of a resource; for instance, if there are only 3 projectors, a maximum of 3 lessons that require projectors can overlap at any given time.
The basic idea is to use a sorted multi-set of each interval's end points, doing case analysis on insertion to determine what interval clusters to merge; removal re-computes the entire cluster the removed range was in, splitting the cluster into two or more when gaps are encountered.
For the implementation, see https://github.com/TimefoldAI/timefold-solver/blob/main/core...
As a side effect of keeping track of connected clusters, the data structure also keeps track of gaps between clusters (which could be used to find the next availability).
For how it is used in practice, there an example in Timefold's docs: https://docs.timefold.ai/timefold-solver/latest/constraints-...
A simpler algorithm can be used when each interval represents a time grain (fixed length, no two intervals overlap). See https://github.com/TimefoldAI/timefold-solver/tree/main/core... for that.
I didn't think it through fully, but did you consider using a bit set that stores only the gap/non-gap transitions, then compressing it into a rank-select data structure? The even-odd result of a rank query will tell you whether you are in a gap. It should also be possible to use select to find the closest hole.
The problem is that, AFAIK, rank-select structures cannot be modified, so you would need to regenerate the structure from scratch at every update. That (or a hybrid scheme) will work only if updates are rare enough.
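A sketch of the rank half of that scheme, over an uncompressed transition bitmap; a real rank-select structure would add compression and `select`, but the even-odd trick already shows up here:

```rust
/// Minimal static rank structure over a transition bitmap: bit i is set when
/// position i starts a new interval (a gap/non-gap flip). The parity of
/// rank(i) tells you whether i falls in a gap, assuming position 0 starts
/// inside a gap. Rebuilt from scratch on every update, as noted above.
struct RankBits {
    words: Vec<u64>,
    prefix: Vec<u32>, // popcount of all words before each word
}

impl RankBits {
    fn new(words: Vec<u64>) -> Self {
        let mut prefix = Vec::with_capacity(words.len());
        let mut acc = 0u32;
        for w in &words {
            prefix.push(acc);
            acc += w.count_ones();
        }
        Self { words, prefix }
    }

    /// Number of set bits strictly before position `i`.
    fn rank(&self, i: usize) -> u32 {
        let (word, bit) = (i / 64, i % 64);
        self.prefix[word] + (self.words[word] & ((1u64 << bit) - 1)).count_ones()
    }

    /// Is position `i` inside a gap? (Even number of flips so far.)
    fn in_gap(&self, i: usize) -> bool {
        self.rank(i) % 2 == 0
    }
}
```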
You mentioned shop scheduling below as well. If your intervals represent booked time, you could also insert 'unavailable' time into the coalesced set, so that, when fully booked, the interval set is reduced to a single interval.
node->spanning = union(node->interval, node->left->spanning, node->right->spanning)
On rotations, only two spanning intervals need to be recomputed. You can search for an interval and the result is the subset of nodes whose intervals overlap with it. And of course, rotations keep the treap balanced at all times.
Something to look into would be ensuring that starts of intervals sort after ends of intervals when they fall at the same time. If you have that, then the starts and ends of intervals simply alternate, and most queries easily translate to one or two lookups.
Make a binary tree where node sizes are powers of 2. I'm assuming you can define some "zero" time. To add an interval, first check if it exceeds the tree size and if so, double the size of the tree until it fits inside. Doubling involves making a new root node and inserting the existing tree under it (this was extra fun in 3D). Then decide what node size your interval should reside in. I would use the first power of 2 less than or equal to the interval size. Then insert that interval into all nodes of that determined size. Yes, I allow more than one object in a node, as well as allowing smaller nodes under one containing an object. An object may also span more than one node. In other words the objects (intervals) determine their own position in this tree regardless of others in the tree. It's not perfectly optimal but it allows inserts and deletes in O(logN) as well as intersection checks.
If this sounds not too terrible for the 1D case, I can elaborate on the data structure a bit.
Edit: From another comment "quickly finding the nearest gaps of at least duration X is most important." That possibly invalidates this whole approach.
One that came to mind which might be interesting is the Bounding Interval Hierarchy[2], where you partition the space but also keep the min/max intervals of all children in each node. The "Instant ray tracing" paper[3] details some interesting techniques for speeding up construction.
Or perhaps it's a rubbish idea, haven't really thought it through.
[1]: https://en.wikipedia.org/wiki/Binary_space_partitioning
[2]: https://en.wikipedia.org/wiki/Bounding_interval_hierarchy
--
1: https://or-tools.github.io/docs/cpp/classoperations__researc...
Maybe you can adapt it for your use case and add those new constraints in?
Keep in mind though that this was not written to be in the hot-path itself, you could probably do significantly better by pouring your soul into the SIMD rabbit hole (though SIMD in Rust is usually very annoying to write)
Best of luck, hope this helps!
[0] https://github.com/bytecodealliance/wasmtime/blob/7dcb9bd6ea...
1st - Put all the actual data in a sorted / linked list. Keep this updated as you always have
2nd - As a separate thread, keep an ordered set of lists, each list matching a time in the past, present, or future, with increasing increments as you move away from the present. Each of those lists holds all the items available at that time. The past is for reporting on old events, etc.
-365 days
-180 days
-90 days
-45 days
-30 days
-29 days...
-1 day
-23 hours... -1 hour
NOW
+1 hour... +23 hours
+1 day... +30 days
+45, 90, 180, 365 days
In each of those lists, have pointers to all the relevant records in the main list. If there's a pointer error, scan the whole list, the slow but reliable way.
It turns out that, doing such algebra, you can use any structure as the underlying representation for the set of intervals (it is trivially replaceable), and I believe that using a vector would be most efficient.
I do have some prototyping code lying around; let me know if you want to play around with it.
You probably need second-level precision for scheduling; but if minute-level granularity is enough, a u16 will cover a 45-day range.
Example use case: https://chrlschn.dev/blog/2022/11/concurrent-processing-dotn...
Queries require O(log n + m) time, with n being the total number of intervals and m being the number of reported results. Construction requires O(n log n) time, and storage requires O(n) space.
I imagine the answer depends on exactly what queries you need. If you have a bunch of different standard event sizes, then managing multiple different sorted freelists for the next open 1 hour slot, the next open 3 hour slot, etc might work. If you need high concurrency and "next available slot" is allowed to be fuzzy, then managing four distinct "event heaps" and parallelizing across them might work.
"Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility" by George Varghese and Tony Lauck
http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing...
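As a toy illustration of the paper's core structure, a single-level wheel (the hashed and hierarchical variants in the paper extend this to long horizons with bounded memory; names are illustrative):

```rust
/// Toy single-level timing wheel: time is bucketed into fixed-width slots
/// arranged in a ring; an event at time t lands in slot
/// (t / granularity) mod num_slots. Advancing a tick drains one slot.
struct TimingWheel {
    granularity: u64,
    slots: Vec<Vec<u64>>, // slot index -> event ids
}

impl TimingWheel {
    fn new(granularity: u64, num_slots: usize) -> Self {
        Self { granularity, slots: vec![Vec::new(); num_slots] }
    }

    fn schedule(&mut self, time: u64, id: u64) {
        let slot = (time / self.granularity) as usize % self.slots.len();
        self.slots[slot].push(id);
    }

    /// Drain everything scheduled for the tick containing `now`.
    fn expire(&mut self, now: u64) -> Vec<u64> {
        let slot = (now / self.granularity) as usize % self.slots.len();
        std::mem::take(&mut self.slots[slot])
    }
}
```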
If it's not worth doing that pre-computation, as your trials have shown and others say, ordered maps might be the best available.
Your description of the problem smells a bit like job-shop scheduling. Optimizing job-shop scheduling is NP-Hard, so at best one general purpose algorithm might be better than another, but you can't determine which by simple inspection. Efficiency will vary based on the actual details of your data.
The standard approach (after managing IO) is to tune for the regularities in your data. Regularities are what makes your arbitrary data arbitrary data and not random values. Tuning the algorithm can get you a long way in the time domain. No data structure alone can get you out of NP. Good luck.
There's the Discrete Interval Encoding Tree (Diet). The basic idea is that, in a BST of intervals, if an insertion fills the hole between two intervals, they get merged into a single node.
The paper: https://web.engr.oregonstate.edu/~erwig/diet/
A Scala implementation example: https://typelevel.org/cats-collections/diet.html
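A sketch of the Diet insert rule, using a `BTreeMap<u32, u32>` (start -> inclusive end) in place of a hand-rolled BST so the merge logic stays visible:

```rust
use std::collections::BTreeMap;

/// Diet-style insert: inserting a point that fills the hole between two
/// intervals merges them, so the map always holds one entry per maximal
/// run. Assumes x < u32::MAX.
fn diet_insert(set: &mut BTreeMap<u32, u32>, x: u32) {
    // Predecessor: the interval starting at or before x.
    if let Some((&s, &e)) = set.range(..=x).next_back() {
        if x <= e {
            return; // already covered
        }
        if e + 1 == x {
            // x extends the predecessor; does it now touch the successor?
            if let Some((&s2, &e2)) = set.range(x + 1..).next() {
                if s2 == x + 1 {
                    set.remove(&s2);
                    set.insert(s, e2); // hole filled: merge the two runs
                    return;
                }
            }
            set.insert(s, x);
            return;
        }
    }
    // No touching predecessor; x may still touch the successor's start.
    if let Some((&s2, &e2)) = set.range(x + 1..).next() {
        if s2 == x + 1 {
            set.remove(&s2);
            set.insert(x, e2);
            return;
        }
    }
    set.insert(x, x); // isolated point becomes a new unit interval
}
```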
Have you tried roaring bitmaps?, it will internally use either a sparse or dense representation based on the bits set and uses SIMD for all its operations.
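A minimal illustration with the `roaring` crate's `RoaringBitmap` (the 30-minute-slot encoding and the crate version are assumptions):

```rust
// Cargo.toml: roaring = "0.10"  (version illustrative)
use roaring::RoaringBitmap;

fn main() {
    let mut booked = RoaringBitmap::new();
    // Mark slots 100..160 as booked, one bit per 30-minute slot.
    for t in 100..160 {
        booked.insert(t);
    }
    // Is the 30-slot window starting at slot 80 fully free?
    let free = (80..110).all(|t| !booked.contains(t));
    assert!(!free); // overlaps the booked block at 100
}
```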
Do consider explicitly using 16(-ish)-bit pointers instead of native-sized ones, and go for a SIMD-friendly binary trie, with some degree of implicitness: address it as if it stored a set of mutually prefix-free variable-length bitstrings (capped to something, like 32 in your example; no entry is a prefix of another entry), but don't bother distinguishing `{00, 01}` from `{0}`.
Operations are restricted, but it should be able to be very fast for "find first empty of at least X large starting at or after Y" as well as "clear from X to Y inclusive" and "add/mark from X to Y inclusive".
- If you're only querying one of these things frequently, just do something better than a linear scan and ensure the backing memory is contiguous. It'll fit in the L1 cache and be fast enough.
- If you're querying hundreds of thousands of these (e.g., checking different pieces of equipment?), don't do that. Go up one level of abstraction so that you don't have to do hundreds of thousands of queries. Going to the equipment example, some of that equipment will be equivalent for a given query (maybe all of it, maybe each piece is truly unique for a given application, but for any incoming query there exists a category describing which subset of the equipment you care about). The simplest thing that could possibly work is if you have a low enough category count that you can just have an ordered multiset of available intervals for each category, with some auxiliary data for the bookkeeping on mutations. Any time you're looking at hundreds of thousands of disjoint things in a loop though, that's suggestive of the wrong level of abstraction (from a performance perspective). Even throwing it into in-memory sqlite with a multi-column index is probably better.
- It's almost always worth looking at your desired performance for each possible action on a data structure. E.g., I'm imagining "insert 1" is an atomic thing you don't want to feel laggy to the user, same with "delete 1", and same with "query all." If that's a good enough approximation of your requirements then it buys you a boatload of flexibility since you can move work from query time (which, if you have to do a linear scan of subsets of 100k biggish things is going to be pushing uncomfortably close to the limits of random-access main memory loads no matter which data structure you choose) to mutation time. You can linearly scan the entire L1 cache a thousand times in 10ms, which is a lot of budget to compact a data structure, duplicate its indices hierarchically, or otherwise manipulate the data to make queries faster.
- You can make that idea more flexible with some sort of "deferred maintenance" approach. E.g., if you're inserting `n` things, you can almost certainly afford to defer maintenance till after they're all inserted, hopefully saving some work if a few of those were in the same structure. E.g., if the only maintenance you're doing is compaction for cache reasons, you can maintain a counter of how many inserts have happened (or how slow recent queries were), and the next insert once the scale is tipped will be the one to actually do that maintenance -- allowing a simple strategy of blindly rebuilding these things every time to actually scale to reasonable volumes of bulk inserts within a 10ms window (to keep users from noticing lag).
- You gain a lot of power by your intervals being disjoint. You gain more if they're never adjacent (i.e., a booked block is always an endpoint in the data structure or bounded by unbooked blocks, and vice versa). You can then represent each contiguous block of bookings as a single unit referencing the actual blocks (that unit could be some sort of balanced tree if you want; it's easy to split in half on deletions; just hold a pointer to it and its endpoints, and basically never peek into the details until you decide to actually render that contiguous block as being comprised of its parts).