October 26th, 2024

Apache Doris: open-source data warehouse for real-time data analytics

Apache Doris is an open-source data warehouse for real-time analytics. It offers a compute-storage decoupled mode, optimizations for high-concurrency workloads, and support for several data ingestion methods, and it is used by enterprises such as TikTok.

Apache Doris is an open-source data warehouse designed for real-time analytics, enabling rapid data processing and analysis at scale. The latest version, 3.0, introduces a compute-storage decoupled mode that makes deployment more flexible. Doris supports several data ingestion methods, including push-based micro-batch and pull-based streaming, so data can be updated in real time. Its architecture is optimized for high concurrency and throughput, combining a columnar storage engine with an MPP (Massively Parallel Processing) design.

The platform also offers federated querying, which allows it to integrate with data lakes and other databases, and it supports semi-structured data types such as JSON. Doris is compatible with the MySQL protocol and ANSI SQL, which eases integration with business intelligence tools and external compute engines. The system targets a range of analytics use cases, from real-time reporting to log analytics and customer data platforms. Notable deployments include TikTok's real-time data architecture and MiniMax's PB-scale logging system. The community around Apache Doris is active, with numerous contributors and enterprises using the platform.
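To make the columnar-plus-MPP idea concrete, here is a toy Python sketch. It is not how Doris is implemented: the table, the helper names (`split_into_shards`, `partial_sum`, `merge`, `parallel_group_sum`), and the use of threads as stand-ins for worker nodes are all assumptions made for illustration. It shows only the general pattern: data is stored column by column, each worker aggregates its own slice while reading just the columns the query needs, and the partial results are merged at the end.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy columnar table: each column is stored as its own list (illustrative data).
events = {
    "country": ["US", "DE", "US", "FR", "DE", "US"],
    "latency_ms": [120, 80, 100, 95, 70, 110],
}

def split_into_shards(table, num_shards):
    """Split a columnar table into contiguous row ranges, one per hypothetical worker."""
    n = len(next(iter(table.values())))
    step = max(1, -(-n // num_shards))  # ceiling division, at least 1
    return [
        {name: col[start:start + step] for name, col in table.items()}
        for start in range(0, n, step)
    ]

def partial_sum(shard, key_col, value_col):
    """Aggregate one shard, touching only the two columns the query needs."""
    sums = defaultdict(int)
    for key, value in zip(shard[key_col], shard[value_col]):
        sums[key] += value
    return dict(sums)

def merge(partials):
    """Combine per-shard partial sums into the final result."""
    totals = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)

def parallel_group_sum(table, key_col, value_col, num_shards=3):
    """Rough equivalent of SELECT key, SUM(value) ... GROUP BY key over shards."""
    shards = split_into_shards(table, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(lambda s: partial_sum(s, key_col, value_col), shards))
    return merge(partials)

totals = parallel_group_sum(events, "country", "latency_ms")
print(totals)
```

In a real MPP engine the shards would live on separate nodes and the merge step would be a network exchange; the threads here only stand in for that parallelism.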

- Apache Doris is an open-source data warehouse for real-time analytics.

- Version 3.0 introduces a compute-storage decoupled mode for flexible deployment.

- It supports push-based micro-batch and pull-based streaming ingestion and is optimized for high concurrency.

- The platform allows federated querying and integration with multiple data sources.

- Apache Doris is used by enterprises like TikTok and MiniMax for real-time data solutions.
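As a rough illustration of what push-based micro-batch ingestion means on the client side, the Python sketch below buffers rows and pushes them to a sink a few at a time. The `MicroBatcher` class, its `sink` callback, and the newline-delimited JSON payload are hypothetical choices for this example, not part of any Doris API; in practice the sink would be whatever load interface the warehouse exposes.

```python
import json

class MicroBatcher:
    """Buffers rows and hands them to `sink` in small batches (hypothetical helper)."""

    def __init__(self, sink, max_rows=3):
        self.sink = sink          # callable that receives one serialized batch
        self.max_rows = max_rows  # flush threshold
        self.buffer = []

    def add(self, row):
        """Queue a row and flush automatically once the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        """Serialize buffered rows as newline-delimited JSON and push them."""
        if not self.buffer:
            return
        payload = "\n".join(json.dumps(row) for row in self.buffer)
        self.sink(payload)
        self.buffer = []

# Stand-in sink: collect batches in a list instead of sending them anywhere.
sent = []
batcher = MicroBatcher(sink=sent.append, max_rows=3)
for i in range(7):
    # Rows carry a nested, semi-structured field to mirror JSON-style data.
    batcher.add({"id": i, "attrs": {"source": "web"}})
batcher.flush()  # push the final partial batch

batch_sizes = [len(payload.splitlines()) for payload in sent]
print(batch_sizes)
```

A production client would usually also flush on a timer so that a slow trickle of rows does not sit in the buffer indefinitely.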

1 comment
By @reval - 6 months
Does anyone have experience with this? How does it compare to, say, Redshift?