Show HN: Denormalized – Embeddable Stream Processing in Rust and DataFusion
Denormalized is an in-development stream processing engine built on Apache DataFusion, with Kafka support. Getting started requires Docker and Rust/Cargo; planned features include checkpointing, session windows, and further integrations.
Denormalized is a fast embeddable stream processing engine built on Apache DataFusion, aimed at real-time stream processing with support for Kafka as both a source and sink. The project is currently in development, and the team is seeking design partners to collaborate on specific use cases. Users can engage with the developers through GitHub issues or email. To get started, users need Docker and Rust/Cargo installed. The quickstart guide includes instructions for running Kafka in Docker, emitting sample data, and performing simple streaming aggregations. Additional examples, such as a Kafka ridesharing scenario, are also provided. The roadmap outlines completed features like stream aggregation and joins, with future plans for checkpointing, session windows, a stateful UDF API, and integrations with DuckDB, PostgreSQL, Python, and TypeScript, along with a user interface. The project is maintained by a team in San Francisco, and inquiries can be directed to hello@denormalized.io or through GitHub.
- Denormalized is a stream processing engine built on Apache DataFusion.
- It supports Kafka for real-time data processing and is currently in development.
- Users can start using it with Docker and Rust/Cargo.
- Future features include checkpointing, session windows, and various integrations.
- The project team is open to collaboration and inquiries via GitHub or email.
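For readers unfamiliar with the "simple streaming aggregations" the quickstart mentions, here is a minimal sketch of the underlying idea in plain Rust: grouping timestamped events into fixed-size (tumbling) windows and keeping a running count, min, max, and sum per key. It does not use Denormalized's or DataFusion's API; the `Event` and `Aggregate` types and the `tumbling_window_aggregate` function are hypothetical names chosen for illustration, and it processes a finite batch rather than an unbounded Kafka stream.

```rust
use std::collections::BTreeMap;

/// One reading from a hypothetical sensor stream (illustrative type, not from Denormalized).
#[derive(Debug, Clone)]
pub struct Event {
    pub timestamp_ms: u64,
    pub sensor: String,
    pub reading: f64,
}

/// Running aggregate for one (window start, sensor) pair.
#[derive(Debug, Clone, PartialEq)]
pub struct Aggregate {
    pub count: u64,
    pub min: f64,
    pub max: f64,
    pub sum: f64,
}

impl Aggregate {
    fn new(reading: f64) -> Self {
        Aggregate { count: 1, min: reading, max: reading, sum: reading }
    }

    fn update(&mut self, reading: f64) {
        self.count += 1;
        self.min = self.min.min(reading);
        self.max = self.max.max(reading);
        self.sum += reading;
    }

    /// Mean of all readings folded into this aggregate.
    pub fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Assigns each event to the tumbling window containing its timestamp and
/// folds it into that window's per-sensor aggregate.
/// The result is keyed by (window start in ms, sensor name).
pub fn tumbling_window_aggregate<I>(
    events: I,
    window_ms: u64,
) -> BTreeMap<(u64, String), Aggregate>
where
    I: IntoIterator<Item = Event>,
{
    assert!(window_ms > 0, "window size must be positive");
    let mut windows = BTreeMap::new();
    for event in events {
        let Event { timestamp_ms, sensor, reading } = event;
        // Round the timestamp down to the start of its window.
        let window_start = timestamp_ms - timestamp_ms % window_ms;
        windows
            .entry((window_start, sensor))
            .and_modify(|agg: &mut Aggregate| agg.update(reading))
            .or_insert_with(|| Aggregate::new(reading));
    }
    windows
}
```

Calling `tumbling_window_aggregate(events, 1000)` returns one aggregate per sensor per one-second window. A real streaming engine would additionally emit each window's result incrementally once it closes and handle late or out-of-order events; this sketch leaves that out.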
Related
The Ultimate Database Platform
AverageDB, a database platform for developers, raised $50 million in funding. It offers speed, efficiency, serverless architecture, real-time data access, and customizable pricing. The platform prioritizes data privacy and caters to diverse user needs.
DuckDB Meets Postgres
Organizations shift historical Postgres data to S3 with Apache Iceberg, enhancing query capabilities. ParadeDB integrates Iceberg with S3 and Google Cloud Storage, replacing DataFusion with DuckDB for improved analytics in pg_lakehouse.
Show HN: I made a TUI for kafka (kaskade)
The GitHub repository "Kaskade" offers a text user interface for Apache Kafka, providing admin features and consumer functionalities. It includes installation guidelines, configuration examples, development guidance, and screenshots. Visit [sauljabin/kaskade] for more details.
Show HN: Pg_replicate – Build Postgres replication applications in Rust
pg_replicate is a Rust crate for PostgreSQL data replication, supporting logical streaming replication. It offers easy integration, a quickstart guide, and plans for future enhancements and additional data sinks.
Launch HN: Synnax (YC S24) – Unified hardware control and sensor data streaming
Synnax is a platform that connects sensors and actuators for real-time telemetry and data analysis, featuring a scalable time series database, supporting multiple programming languages, and offering free usage for up to 50 channels.
- Users express excitement about the project's potential and ease of setup.
- Several commenters inquire about specific features, such as support for OLAP use cases and pluggable data sources.
- There is curiosity about how Denormalized compares to existing solutions like Arroyo and Flink.
- Founders and developers from related projects show interest in collaboration and integration with Denormalized.
- Many users are eager for future features, including a Python SDK and TypeScript bindings.
Ideally, you'd support an api similar to Polars (which I have found to be the nicest thus far).
It'd also be important/useful to support Python udfs (think numpy/jax/etc.).
It'd be very cool if you could collaborate with or even tap into the polars frontend. If you could execute polars logical plans but with a streaming source, that would be huge.
All the descriptions of Denormalized use the term, so if you don't know it, it's kind of impossible to understand what Denormalized is or what it's trying to solve.
Bookmarked for future projects!
Will reach out! Congrats on the ship.