March 24th, 2025

Open-source Rust database tops JSONBench using DataFusion

GreptimeDB outperformed ClickHouse and VictoriaLogs in JSONBench, ranking first in cold-run query speed on 1 billion JSON documents, and positions itself as a cost-effective, efficient option for large-scale observability data.

GreptimeDB outperformed competitors such as ClickHouse and VictoriaLogs in JSONBench, a benchmark of analytical queries over JSON documents. The benchmark runs queries on datasets ranging from 1 million to 1 billion JSON documents. GreptimeDB ranked first in query speed in the cold run on 1 billion documents and also showed superior storage efficiency. Its cloud-native architecture uses object storage as the primary data store, which significantly reduces cost while maintaining high performance. A built-in Pipeline (ETL) engine provides native JSON support, which makes the database easier to use for observability data, and in-database streaming supports efficient real-time analytics and complex queries. Overall, the benchmark results highlight GreptimeDB's potential as a cost-effective option for enterprises handling large volumes of observability data.

- GreptimeDB outperformed ClickHouse and VictoriaLogs in the JSONBench benchmark.

- It ranked first in query speed for 1 billion JSON documents during cold runs.

- The database utilizes object storage to reduce costs while maintaining performance.

- GreptimeDB features a built-in ETL engine for efficient JSON data handling.

- Its in-database streaming capabilities enhance performance for real-time analytics.
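
To make "analytical queries over JSON documents" concrete, here is a minimal sketch of a group-and-count aggregation, the general shape of query such a benchmark exercises. The sample documents, the field path `commit.collection`, and the helper names `get_path` and `count_by` are all hypothetical and chosen for illustration; this plain-Python loop does not reflect how GreptimeDB or DataFusion execute queries.

```python
import json
from collections import Counter

# Hypothetical sample documents (illustrative only, not benchmark data).
RAW_DOCS = [
    '{"kind": "commit", "commit": {"collection": "app.bsky.feed.post"}}',
    '{"kind": "commit", "commit": {"collection": "app.bsky.feed.like"}}',
    '{"kind": "commit", "commit": {"collection": "app.bsky.feed.post"}}',
    '{"kind": "identity"}',
]


def get_path(doc, path):
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def count_by(raw_docs, path):
    """Count documents per value found at `path`, most frequent first."""
    counts = Counter()
    for raw in raw_docs:
        value = get_path(json.loads(raw), path)
        if value is not None:  # skip documents that lack the field
            counts[value] += 1
    return counts.most_common()


if __name__ == "__main__":
    for value, n in count_by(RAW_DOCS, "commit.collection"):
        print(value, n)
```

A database engine answers the same question with a SQL `GROUP BY` over columnar data rather than by parsing each document in a loop, which is where differences in query speed and storage efficiency come from.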

4 comments
By @killme2008 - about 1 month
In the past, I also had doubts about whether a Rust database built on open-source components would be performance-limited, but this evaluation has dispelled our concerns. Apache DataFusion + Arrow + Parquet + OpenDAL, as a new data stack, have proven their potential.
By @k_bx - about 1 month
The big question I have is: should I invest in a DeltaLake/Iceberg-based solution, or something like this?