August 14th, 2024

Show HN: SlateDB – An embedded storage engine built on object storage

SlateDB is an embedded storage engine for object storage systems, featuring batch writes and asynchronous operations. It is in early development, with basic functionality implemented, and is open source under the Apache 2.0 license.

SlateDB is an embedded storage engine that uses a log-structured merge-tree (LSM-tree) architecture and writes its data to object storage systems such as S3, GCS, ABS, and MinIO. This design offers extensive storage capacity, high durability, and straightforward replication, at the cost of higher latency and higher API costs than traditional local disk storage. Key features of SlateDB include batch writes to minimize write API costs, an asynchronous `put` method that lets callers trade durability against latency, and various caching techniques to improve read performance. To integrate SlateDB into a Rust project, users can add it as a dependency in their `Cargo.toml` file and follow a simple example for basic operations like put, get, and delete. SlateDB is currently in early development: basic functionality, including a simple API and persistence, is implemented, while more advanced features such as range queries and transactions are still under development. The project is licensed under the Apache License, Version 2.0.

- SlateDB is designed for object storage systems, offering high durability and scalability.

- It features batch writes and asynchronous operations to optimize performance and cost.

- The engine is still in early development, with some features implemented and others in progress.

- Rust projects can integrate SlateDB by adding it as a dependency in `Cargo.toml`.

- The project is open-source and licensed under the Apache License, Version 2.0.
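The batch-write idea summarized above can be illustrated with a minimal sketch. This is not SlateDB's API or implementation: `MockObjectStore`, `BatchWriter`, and `demo_put_requests` are hypothetical names, the store is an in-memory stand-in that only counts PUT requests, and flushing on an entry-count threshold is an assumption chosen to keep the example small.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for an object store client.
/// It only records objects and counts PUT requests.
#[derive(Default)]
pub struct MockObjectStore {
    pub objects: Vec<Vec<u8>>,
    pub put_requests: usize,
}

impl MockObjectStore {
    pub fn put_object(&mut self, bytes: Vec<u8>) {
        self.put_requests += 1;
        self.objects.push(bytes);
    }
}

/// Hypothetical write buffer: collects key-value pairs in memory and
/// writes them out as a single object once `max_entries` is reached.
pub struct BatchWriter {
    buffer: BTreeMap<Vec<u8>, Vec<u8>>,
    max_entries: usize,
}

impl BatchWriter {
    pub fn new(max_entries: usize) -> Self {
        Self { buffer: BTreeMap::new(), max_entries }
    }

    /// Buffer one write; flush when the batch is full.
    pub fn put(&mut self, store: &mut MockObjectStore, key: &[u8], value: &[u8]) {
        self.buffer.insert(key.to_vec(), value.to_vec());
        if self.buffer.len() >= self.max_entries {
            self.flush(store);
        }
    }

    /// Serialize all buffered entries (length-prefixed, sorted by key)
    /// into one object and issue a single PUT request for it.
    pub fn flush(&mut self, store: &mut MockObjectStore) {
        if self.buffer.is_empty() {
            return;
        }
        let mut bytes = Vec::new();
        for (k, v) in std::mem::take(&mut self.buffer) {
            bytes.extend_from_slice(&(k.len() as u32).to_le_bytes());
            bytes.extend_from_slice(&k);
            bytes.extend_from_slice(&(v.len() as u32).to_le_bytes());
            bytes.extend_from_slice(&v);
        }
        store.put_object(bytes);
    }
}

/// Write `writes` distinct keys with the given batch size and return
/// how many PUT requests reached the mock store.
pub fn demo_put_requests(writes: usize, batch_size: usize) -> usize {
    let mut store = MockObjectStore::default();
    let mut writer = BatchWriter::new(batch_size);
    for i in 0..writes {
        let key = format!("key-{i:04}");
        writer.put(&mut store, key.as_bytes(), b"value");
    }
    writer.flush(&mut store); // write out any partial final batch
    store.put_requests
}
```

Under these assumptions, ten writes with a batch size of four reach the store as three PUT requests instead of ten. The trade-off is that a buffered write is not durable until its batch has been flushed.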

3 comments
By @Reubend - 4 months
It's a very very cool idea, but I'm still not clear on the main benefits.

Bottomless storage: yes, but couldn't you theoretically achieve this with plenty of cloud DB services? Amazon Aurora goes up to 128 TB, and once your DB gets to that size, it's likely that you can hire some dedicated engineers to handle more complicated setups.

High durability: yes, but couldn't this be achieved with a "normal" DB that has a read replica using object storage, rather than the entire DB using object storage?

Easy replication: arguably not easier than normal replication, depending on which cloud DB you're considering as an alternative.

By @dangoodmanUT - 4 months
I've been working on something super similar, but some of the arch decisions here are curious considering the clear tradeoffs made.

For example, if you have a durability flush interval, what is the WAL for? L0 is the WAL now.

By @Already__Taken - 4 months
Seems analogous to putting seaweedfs in front of a cloud S3, then adding a database. We use (unrelated) Zenoh and Loki, keeping state on S3, so it would be interesting to have a KV engine.