April 16th, 2025

KIP-1150: Diskless Kafka Topics

KIP-1150 proposes Diskless Topics for Apache Kafka: topics that store data in object storage to reduce costs, enable multi-region active-active topics with automatic failover, and keep Kafka competitive in the market.


KIP-1150 proposes introducing Diskless Topics in Apache Kafka to optimize storage and reduce operational costs, particularly in cloud environments. The motivation is that Kafka workloads keep growing while existing replication methods are expensive, so operators need a more cost-effective option. Diskless Topics would let Kafka operators use object storage instead of block storage, eliminating inter-zone data transfer costs and enabling multi-region active-active topics with automatic failover. The feature is meant to keep Kafka competitive with alternative protocols that already leverage object storage.

The proposal itself requires no immediate changes to the codebase or documentation; it seeks consensus that the feature is needed. Future KIPs will detail the implementation of related functionality, such as producer rack-awareness and garbage collection for diskless objects, and the KIP also outlines potential follow-up features. It argues that keeping this feature under the Apache 2.0 license will benefit the community and keep Kafka relevant in the market. Overall, KIP-1150 aims to position Apache Kafka as a versatile streaming engine that balances cost and performance across diverse workloads.
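To make the cost argument concrete, here is a rough back-of-the-envelope sketch of how much traffic crosses availability zones with classic replication. It is not taken from the KIP. The function names, the three-zone layout, the even spread of producers and leaders, the placement of each follower in a different zone, and the price per GB are all illustrative assumptions, and the price is a placeholder rather than a quoted cloud rate.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def cross_zone_mb_per_sec(produce_mb_per_sec, zones=3, replication_factor=3):
    """Estimate cross-zone traffic (MB/s) for a classic replicated topic.

    Assumptions (hypothetical, for illustration only):
    - producers and partition leaders are spread evenly across zones
    - each follower replica lives in a different zone than the leader
      (so replication_factor <= zones)
    - consumers read from a replica in their own zone (no cross-zone reads)
    """
    # Producer -> leader: only 1/zones of writes land on a same-zone leader.
    producer_share = (zones - 1) / zones
    # Leader -> followers: every follower copy crosses a zone boundary.
    follower_copies = replication_factor - 1
    return produce_mb_per_sec * (producer_share + follower_copies)


def monthly_transfer_cost(cross_zone_rate_mb, price_per_gb):
    """Convert a cross-zone rate in MB/s into a monthly cost.

    price_per_gb is a placeholder parameter, not a real provider price.
    """
    gb_per_month = cross_zone_rate_mb * SECONDS_PER_MONTH / 1000  # decimal GB
    return gb_per_month * price_per_gb


if __name__ == "__main__":
    rate = cross_zone_mb_per_sec(100.0)  # 100 MB/s of produce traffic
    cost = monthly_transfer_cost(rate, price_per_gb=0.02)  # placeholder price
    print(f"cross-zone traffic: {rate:.1f} MB/s, monthly cost: {cost:.0f}")
```

Under these assumptions, every produced byte generates more than two and a half bytes of cross-zone traffic. If producers instead write to a broker in their own zone and object storage handles durability, as the summary describes, that inter-zone term drops out of the estimate.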

- KIP-1150 introduces Diskless Topics to optimize storage in Apache Kafka.

- The proposal aims to reduce operational costs by utilizing object storage instead of block storage.

- Diskless Topics will enable features like multi-region active-active topics and automatic failover.

- Future KIPs will detail the implementation of related functionalities.

- The proposal seeks to maintain Apache Kafka's competitiveness in the market against alternative protocols.

The Essence of Apache Kafka

Apache Kafka is a distributed event-driven architecture that enables efficient real-time data streaming, ensuring fault tolerance and scalability through an append-only log structure and partitioned topics across multiple nodes.

Jepsen: Bufstream 0.1.0

Bufstream 0.1.0, a Kafka-compatible streaming system using object storage, faced safety and liveness issues addressed in version 0.1.3. It aims for cost efficiency and scalability while sharing Kafka's unresolved problems.

Kafka at the low end: how bad can it get?

The blog post outlines Kafka's challenges as a job queue in low-volume scenarios, highlighting unfair job distribution, increased latency, and recommending caution until improvements from KIP-932 are implemented.

Apache Kafka 4.0 Released

Apache Kafka 4.0.0 has been released, eliminating ZooKeeper, enhancing scalability, and introducing new features like improved consumer performance and queue semantics support, while requiring updated Java versions and removing deprecated APIs.

Conflict-Free Distributed Architecture for Append-Only Writes to Apache Iceberg

Apache Iceberg is enhancing scalability by separating data writing from metadata commits, allowing high-throughput real-time ingestion while addressing challenges like writer failures and centralized committer reliability.

4 comments

By @NortySpock - 9 days

I presume this is part of the knock-on effects of Confluent (managed Kafka) buying WarpStream (Kafka emulated on S3 object storage).

Also a shout-out to Bento, the fork of Benthos after RedPanda acquired it.

By @chtefi - 9 days

Full disaggregation of compute and storage is the right direction. Let storage handle replication, it's getting good, global, low latency, cheaper (like with S3 Express). Kafka becomes a smart data ingester and router: it moves bytes, enforces ordering, does minimal buffering. That's it. Do one thing well.

You get a system simpler to operate, to scale, and more flexible; data could be consumed outside of Kafka itself (in a batch way typically), without duplicating the data, that's a big win.
