Jepsen: Bufstream 0.1.0
Bufstream 0.1.0 is a Kafka-compatible streaming system built on object storage. A Jepsen analysis found safety and liveness issues in it, which were addressed in version 0.1.3. Bufstream aims for cost efficiency and scalability, but shares several unresolved problems with Kafka itself.
Bufstream 0.1.0 is a Kafka-compatible streaming system that stores records directly in object storage services like S3. A recent analysis identified three safety issues and two liveness issues in Bufstream, including stuck consumers and producers, spurious zero offsets, and the loss of acknowledged writes in healthy clusters. These issues were addressed in version 0.1.3. The report also highlighted four broader issues related to Kafka, such as inadequate documentation of transaction semantics, deadlocks in the Java client, and various transaction-related problems stemming from the lack of message ordering constraints. These issues affect not only Kafka but also Bufstream and potentially other Kafka-compatible systems. Bufstream aims to provide a cost-efficient alternative to Kafka by utilizing object storage, allowing for stateless, auto-scaled virtual machines. The system consists of three subsystems: an agent for Kafka API interaction, an object store for record storage, and a coordination service for managing record order and commitment. Testing of Bufstream covered various configurations and injected faults to assess the system's resilience and safety. The findings indicate that while Bufstream offers promising features, it shares some unresolved issues with Kafka, particularly regarding transaction isolation and documentation clarity.
- Bufstream is a Kafka-compatible streaming system that uses object storage for data management.
- Version 0.1.3 addressed several safety and liveness issues found in the initial release.
- The analysis revealed broader issues with Kafka's transaction semantics and documentation.
- Bufstream aims to enhance cost efficiency and scalability compared to traditional Kafka setups.
- Testing involved fault injection to evaluate system resilience and safety under various configurations.
Related
BlockQueue: SQLite-powered pub/sub for lean, fast messaging
BlockQueue is a lightweight, cost-effective messaging system using a pub/sub mechanism, built on SQLite3 and NutsDB, supporting Turso Database and PostgreSQL, with high performance and open-source availability.
Show HN: Denormalized – Embeddable Stream Processing in Rust and DataFusion
Denormalized is a developing stream processing engine based on Apache DataFusion, supporting Kafka. Users can start with Docker and Rust/Cargo, with future features planned for enhanced functionality.
BuffDB is a Rust library to simplify multi-plexing on edge devices
BuffDB is a lightweight, high-performance persistence layer for gRPC, developed in Rust. It supports SQLite, DuckDB, and RocksDB, and is ideal for offline data access applications.
Warpstream Joined Confluent
WarpStream is integrating with Confluent to enhance data streaming services, introducing new features like Kafka transaction support and expanding to GCP and Azure while maintaining its existing team and brand.
The Essence of Apache Kafka
Apache Kafka is a distributed event-driven architecture that enables efficient real-time data streaming, ensuring fault tolerance and scalability through an append-only log structure and partitioned topics across multiple nodes.
- There are significant safety and liveness issues in Kafka, with calls for further testing and investigation into its transaction protocol.
- Users express confusion over Bufstream's pricing model and its claim of complete data control without additional fees.
- Concerns are raised about Kafka's auto-commit feature potentially leading to data loss.
- Some commenters inquire about comparisons with other streaming systems like NATS Jetstream and Warpstream.
- There is a desire for more thorough testing frameworks, such as a collaboration between Jepsen and Antithesis, to enhance database safety.
Seems like Jepsen should do another Kafka deep-dive. Last time was in 2013 (https://aphyr.com/posts/293-call-me-maybe-kafka, Kafka version 0.8 beta), and it seems like they're on the verge of discovering a lot of issues in Kafka itself. Things like "causing writes to be acknowledged but silently discarded" sound very scary.
> [with the default enable.auto.commit=true] Kafka consumers may automatically mark offsets as committed, regardless of whether they have actually been processed by the application. This means that a consumer can poll a series of records, mark them as committed, then crash—effectively causing those records to be lost
That's never been my understanding of auto-commit; that would be a crazy default, wouldn't it?
The docs say this:
> when auto-commit is enabled, every time the poll method is called and data is fetched, the consumer is ready to automatically commit the offsets of messages that have been returned by the poll. If the processing of these messages is not completed before the next auto-commit interval, there’s a risk of losing the message’s progress if the consumer crashes or is otherwise restarted. In this case, when the consumer restarts, it will begin consuming from the last committed offset. When this happens, the last committed position can be as old as the auto-commit interval. Any messages that have arrived since the last commit are read again. If you want to reduce the window for duplicates, you can reduce the auto-commit interval
I don't find it amazingly clear, but overall my understanding from this is that offsets are committed _only_ if the processing finishes. Tuning the auto-commit interval helps with duplicate processing, not with lost messages, as you'd expect for at-least-once processing.
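The difference between the two readings above can be sketched with a minimal simulation (plain Python, no real Kafka client; the batching, offsets, and crash point are illustrative assumptions). It contrasts marking offsets committed as soon as they are polled, per the report's reading, against committing only after processing finishes, per the at-least-once reading: the first loses records on a crash, the second only reprocesses some.

```python
# Toy single-partition log; each entry stands for one record.
LOG = list(range(10))

def run(commit_at_poll, crash_at=7):
    """Consume LOG, crashing once just before record `crash_at` is processed.
    Returns the list of records the application actually processed,
    including any reprocessing after the restart."""
    processed = []
    committed = 0            # offset a restarted consumer resumes from
    crashed = False
    offset = 0
    while offset < len(LOG):
        batch = LOG[offset:offset + 3]       # "poll()" returns up to 3 records
        if commit_at_poll:
            committed = offset + len(batch)  # offsets marked committed at poll time
        for rec in batch:
            if rec == crash_at and not crashed:
                crashed = True               # consumer dies mid-batch...
                offset = committed           # ...and restarts from the last commit
                break
            processed.append(rec)
        else:
            offset += len(batch)
            if not commit_at_poll:
                committed = offset           # commit only after the batch is processed
    return processed

lossy = run(commit_at_poll=True)    # records 7 and 8 are never processed
safe = run(commit_at_poll=False)    # every record processed; record 6 twice
```

Under commit-at-poll, the crash loses records 7 and 8 outright; under commit-after-processing, nothing is lost but record 6 is reprocessed, which is exactly the duplicate window the docs say the auto-commit interval tunes.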
> Bufstream runs fully within your AWS or GCP VPC, giving you complete control over your data, metadata, and uptime. Unlike the alternatives, Bufstream never phones home.
> Bufstream pricing is simple: just $0.002 per uncompressed GiB written (about $2 per TiB). We don't charge any per-core, per-agent, or per-call fees.
Surely they wouldn’t run their entire business on the honor system?
Ouch. Great investigation work and write-up, as ever!
After reading through the relevant blog posts and docs, my understanding is that Kafka defines "exactly-once delivery" as a property of what they call a "read-process-write operation", where workers read from topic 1 and write to topic 2, with both topics in the same logical Kafka system. Is that correct? If so, isn't that better described as a transaction?
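It is indeed implemented as a transaction: the produced output records and the consumed input offsets commit atomically, or not at all. A toy model (plain Python, not the Kafka API; the crash point and uppercase "processing" step are illustrative assumptions) shows why the offset commit has to be inside the transaction:

```python
# Toy read-process-write pipeline: read from topic1, uppercase, write to topic2.
topic1 = ["a", "b", "c"]

def run(atomic, crash_once_at=1):
    """Process topic1 into a fresh topic2, crashing once at offset `crash_once_at`.
    `atomic=True` commits the output record and input offset as one transaction."""
    topic2, committed, crashed = [], 0, False
    while committed < len(topic1):
        out = topic1[committed].upper()      # read + process
        if atomic:
            if committed == crash_once_at and not crashed:
                crashed = True               # crash before commit: nothing becomes visible
                continue
            topic2.append(out)               # output record and offset commit together
            committed += 1
        else:
            topic2.append(out)               # output written first...
            if committed == crash_once_at and not crashed:
                crashed = True               # ...crash before the offset commit
                continue
            committed += 1                   # restart reprocesses, duplicating output
    return topic2
```

With the atomic commit, the restarted worker redoes the whole step and topic2 ends up as `["A", "B", "C"]`; without it, the output for the crashed step is written twice.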
> Transactions may observe none, part, or all
Should, I think, read:
> Consumers may observe none, part, or all
For those unaware, Antithesis was founded by some of the folks who worked on FoundationDB - see https://youtu.be/4fFDFbi3toc?si=wY_mrD63fH2osiU- for some of their handiwork.
A Jepsen + Antithesis team up is something the world needs right now, specifically on the back of the Horizon Post Office scandal.
Thanks for all your work highlighting the importance of database safety, Aphyr.