October 26th, 2024

Apache Doris: open-source data warehouse for real-time data analytics

Apache Doris is an open-source data warehouse for real-time analytics. It features a compute-storage decoupled mode, optimizations for high concurrency, and support for various data ingestion methods, and it is used by enterprises such as TikTok.

Apache Doris is an open-source data warehouse designed for real-time analytics, enabling fast data processing and analysis at scale. The latest version, 3.0, introduces a compute-storage decoupled mode that adds deployment flexibility. Doris supports several data ingestion methods, including push-based micro-batch and pull-based streaming, so data can be updated in real time. Its architecture is optimized for high concurrency and throughput, combining a columnar storage engine with an MPP (Massively Parallel Processing) architecture. The platform also supports federated queries against data lakes and other databases, as well as semi-structured data types such as JSON. Apache Doris is compatible with the MySQL protocol and ANSI SQL, which makes it easy to integrate with business intelligence tools and external compute engines. It targets diverse analytics use cases, from real-time reporting to log analytics and customer data platforms. Notable deployments include TikTok's real-time data architecture and MiniMax's PB-scale logging system. The project has an active community, with numerous contributors and enterprise users.
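
The summary credits Doris's throughput to a columnar storage engine and an MPP architecture. The Python sketch below is a toy model of both ideas, not Doris internals: data is laid out per column, each hypothetical "node" aggregates only the columns a query needs, and a coordinator merges the partial results. The sample rows, the two-way sharding, and every function name here are assumptions made for illustration.

```python
"""Toy model of columnar layout plus MPP-style partial aggregation.
Not Doris code; all names and data are hypothetical."""
from collections import defaultdict

# Row-oriented input: each record keeps all of its fields together.
rows = [
    {"region": "eu", "user_id": 1, "revenue": 10.0},
    {"region": "us", "user_id": 2, "revenue": 5.0},
    {"region": "eu", "user_id": 3, "revenue": 7.5},
    {"region": "us", "user_id": 4, "revenue": 2.5},
]


def to_columnar(records):
    """Pivot records into one list per column (a columnar layout)."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def partial_sum_by(columns, key_col, value_col):
    """Local aggregation on one node; reads only the two columns it needs."""
    partial = defaultdict(float)
    for key, value in zip(columns[key_col], columns[value_col]):
        partial[key] += value
    return partial


def merge_partials(partials):
    """Coordinator step: combine per-node partial sums into the final result."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Hypothetical sharding: split the rows across two "nodes".
shards = [to_columnar(rows[:2]), to_columnar(rows[2:])]
partials = [partial_sum_by(shard, "region", "revenue") for shard in shards]
print(merge_partials(partials))  # {'eu': 17.5, 'us': 7.5}
```

The `user_id` column is never read by the aggregation, which is the practical payoff of a columnar layout; the merge step is what lets the per-node work run in parallel.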

- Apache Doris is an open-source data warehouse for real-time analytics.

- Version 3.0 introduces a compute-storage decoupled mode for flexible deployment.

- It supports various data ingestion methods and is optimized for high concurrency.

- The platform allows federated querying and integration with multiple data sources.

- Apache Doris is used by enterprises like TikTok and MiniMax for real-time data solutions.
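
To make "push-based micro-batch" ingestion concrete: the client buffers rows and pushes them in small batches, instead of sending one request per row or waiting to assemble one large bulk file. The sketch below is a generic client-side batcher, not a Doris API. `MicroBatcher`, its size and age thresholds, and the `flush` callback (a stand-in for whatever load request a real client would send) are all hypothetical.

```python
"""Generic client-side micro-batcher (hypothetical; not a Doris API)."""
import time
from typing import Callable, List, Optional


class MicroBatcher:
    def __init__(self, flush: Callable[[List[dict]], None],
                 max_rows: int = 1000, max_wait_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush            # stand-in for a real load request
        self._max_rows = max_rows      # flush when this many rows are buffered
        self._max_wait_s = max_wait_s  # ...or when the oldest row is this old
        self._clock = clock
        self._buffer: List[dict] = []
        self._first_at: Optional[float] = None  # arrival time of oldest row

    def add(self, row: dict) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(row)
        # The age check only runs when a row arrives; a production client
        # would also need a background timer to flush idle buffers.
        if (len(self._buffer) >= self._max_rows
                or self._clock() - self._first_at >= self._max_wait_s):
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._flush(batch)


sent = []
batcher = MicroBatcher(sent.append, max_rows=3, max_wait_s=60.0)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # push the remaining partial batch
print([len(b) for b in sent])  # [3, 3, 1]
```

Bounding both batch size and batch age is the usual trade-off: larger batches cost fewer requests, while the age limit caps how stale the data can get before it reaches the warehouse.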

Related

Show HN: Denormalized – Embeddable Stream Processing in Rust and DataFusion

Denormalized is a stream processing engine under development, built on Apache DataFusion and supporting Kafka. Users can get started with Docker and Rust/Cargo, and more features are planned.

Apache Cassandra 5.0 Is Generally Available

Apache Cassandra 5.0 has been released, featuring usability improvements, Storage Attached Indexes, Trie optimizations, JDK 17 support, a Unified Compaction Strategy, and vector search; users still on version 3.x are encouraged to upgrade.

Datomic and Content Addressable Techniques

Latacora has developed a data collection system using Datomic, focusing on deduplication and efficient querying. It supports dynamic schema inference, real-time analysis, and visualizations for tracking client environment changes.
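
The core idea behind content-addressable deduplication fits in a few lines: a record's storage key is a hash of its content, so re-collecting an unchanged record maps to the same key and is stored only once. This is a generic Python sketch, not Latacora's Datomic schema; the `ContentStore` class, SHA-256 as the hash, and JSON canonicalization are all assumptions made for illustration.

```python
"""Content-addressable deduplication sketch (hypothetical, not Latacora's code)."""
import hashlib
import json


class ContentStore:
    def __init__(self):
        self._blobs = {}  # content hash -> record

    @staticmethod
    def address(record: dict) -> str:
        # sort_keys and compact separators yield one canonical string per
        # record, so dict key order does not change the hash.
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def put(self, record: dict) -> str:
        key = self.address(record)
        self._blobs.setdefault(key, record)  # identical content is stored once
        return key

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
a = store.put({"host": "web-1", "port": 443})
b = store.put({"port": 443, "host": "web-1"})  # same content, new key order
c = store.put({"host": "web-2", "port": 443})
print(a == b, len(store))  # True 2
```

Canonicalizing before hashing matters: without `sort_keys=True`, two logically identical records could serialize differently and defeat deduplication.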

Show HN: Oodle – serverless, fully-managed, drop-in replacement for Prometheus

Oodle.ai has created a cost-efficient metrics observability system that processes over 1 billion time series per hour, enhancing scalability and performance while integrating easily with existing tools and protocols.

A FLOSS platform for data analysis pipelines that you probably haven't heard of

Arvados is an open-source platform for managing large datasets. It includes Keep for storage and Crunch for workflow orchestration, along with features for data security. Users can access it via the web, the command line, or an API.

1 comment

By @reval - 6 months

Does anyone have experience with this? How does it compare to, say, Redshift?