Show HN: Pg_replicate – Build Postgres replication applications in Rust
pg_replicate is a Rust crate for PostgreSQL data replication, supporting logical streaming replication. It offers easy integration, a quickstart guide, and plans for future enhancements and additional data sinks.
pg_replicate, hosted on GitHub, is a Rust crate for building replication solutions for PostgreSQL. It streamlines the development of data pipelines that continuously copy data from PostgreSQL to other systems using PostgreSQL's logical streaming replication protocol. Setup is covered by examples and a quickstart guide: users start replication by creating a publication in PostgreSQL and running a provided example command. To use pg_replicate in a Rust project, add it to the Cargo.toml file. The repository is split into several components: a REST API for cloud hosting, common configuration types, the core library crate, and a binary crate available as a Docker container. The roadmap lists more data sinks, such as Snowflake and ClickHouse, along with performance improvements. The project is distributed under the Apache-2.0 License, and additional examples and usage instructions are available in the repository's examples folder.
- pg_replicate is designed for PostgreSQL data replication using Rust.
- It supports logical streaming replication and offers a quickstart guide.
- Users can easily integrate it into their Rust projects via Cargo.toml.
- Future developments aim to include more data sinks and performance improvements.
- The project is licensed under Apache-2.0.
Related
Using short lived Postgres servers for testing
Database servers like PostgreSQL can be quickly set up for short-lived environments or CI/CD pipelines by creating new data directories and using pg_basebackup for efficient data population. This method simplifies testing and demo setups.
Rust FSM-based Resumable Postgres tasks
The "pg_task" project on GitHub manages FSM-based Resumable Postgres tasks. It features granular state machines, error handling, single-table task scheduling, task definition, execution, stopping, and updating guidelines. Licensed under MIT.
PostgREST – Serve a RESTful API from Any Postgres Database
PostgREST creates RESTful APIs from PostgreSQL databases, offering high performance, security via JWT, multiple API versions, self-documentation with OpenAPI, and community support for contributions and sponsorships.
SQLite vs. PostgreSQL
SQLite is ideal for simple, self-hosted projects with low latency, while PostgreSQL is better for applications needing advanced features and geographical distribution. The choice depends on project requirements.
Pgzx: Postgres Extensions with Zig
Xata has launched pgzx, an open-source framework for building PostgreSQL extensions with Zig, enhancing code maintainability and efficiency, and includes features for memory safety, error handling, and testing.
- Users appreciate the simplicity and focus of pg_replicate compared to existing tools like Debezium.
- There is enthusiasm for the integration of Rust with PostgreSQL, seen as a powerful combination for data replication.
- Some users express interest in specific use cases, such as continuous backups and asynchronous processing of database operations.
- Feedback on documentation and performance is noted, indicating areas for improvement as the tool develops.
- Several commenters share their own experiences and projects related to data replication, fostering a sense of community and collaboration.
For the past few months, as part of my job at Supabase, I have been working on pg_replicate. pg_replicate lets you very easily build applications which can copy data (full table copies and CDC) from Postgres to any other data system.
Around six months back I was figuring out what can be built by tailing Postgres' WAL. pg_replicate grew organically out of that effort. Many similar tools, like Debezium, already exist and do a great job, but pg_replicate is much simpler and focussed only on Postgres. Rust was used in the project because I am most comfortable with it.
pg_replicate abstracts over the Postgres logical replication protocol[0] and lets you work with higher level concepts. There are three main concepts to understand in pg_replicate: source, sink and pipeline.
1/ A source is a Postgres db from which data is to be copied.
2/ A sink is a data system into which data will be copied.
3/ A pipeline connects a source to a sink.
Currently pg_replicate supports BigQuery, DuckDB local files, and MotherDuck as sinks. More sinks will be added in the future. To support a new data system, you just need to implement the BatchSink trait (the older Sink trait will be deprecated soon).
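To make the source, sink and pipeline idea concrete, here is a minimal in-memory sketch. Every name in it (ChangeEvent, ToySource, ToyBatchSink, ToyPipeline and so on) is hypothetical and invented for illustration; it is not the pg_replicate API, and the crate's actual BatchSink trait may look quite different. It only shows the shape of the idea: a pipeline pulls changes from a source and hands them to a sink in batches.

```rust
// Illustrative sketch only. These names are hypothetical and are NOT the
// pg_replicate API; they model the source / sink / pipeline idea in memory.

/// A change captured from a source table.
#[derive(Debug, Clone, PartialEq)]
pub enum ChangeEvent {
    Insert { id: i64, value: String },
    Update { id: i64, value: String },
    Delete { id: i64 },
}

/// Where changes come from (in pg_replicate, a Postgres database).
pub trait ToySource {
    /// Returns the next change, or None when the stream is exhausted.
    fn next_event(&mut self) -> Option<ChangeEvent>;
}

/// Where changes go, one batch at a time.
pub trait ToyBatchSink {
    fn write_batch(&mut self, batch: Vec<ChangeEvent>) -> Result<(), String>;
}

/// In-memory source that replays a fixed list of events.
pub struct VecSource {
    events: std::vec::IntoIter<ChangeEvent>,
}

impl VecSource {
    pub fn new(events: Vec<ChangeEvent>) -> Self {
        Self { events: events.into_iter() }
    }
}

impl ToySource for VecSource {
    fn next_event(&mut self) -> Option<ChangeEvent> {
        self.events.next()
    }
}

/// In-memory sink that records each batch it receives.
#[derive(Default)]
pub struct MemorySink {
    pub batches: Vec<Vec<ChangeEvent>>,
}

impl ToyBatchSink for MemorySink {
    fn write_batch(&mut self, batch: Vec<ChangeEvent>) -> Result<(), String> {
        self.batches.push(batch);
        Ok(())
    }
}

/// Connects a source to a sink, flushing every `batch_size` events.
pub struct ToyPipeline<S: ToySource, K: ToyBatchSink> {
    source: S,
    sink: K,
    batch_size: usize,
}

impl<S: ToySource, K: ToyBatchSink> ToyPipeline<S, K> {
    pub fn new(source: S, sink: K, batch_size: usize) -> Self {
        // A batch size of zero would never flush, so clamp it to one.
        Self { source, sink, batch_size: batch_size.max(1) }
    }

    /// Drains the source into the sink and returns the sink for inspection.
    pub fn run(mut self) -> Result<K, String> {
        let mut batch = Vec::with_capacity(self.batch_size);
        while let Some(event) = self.source.next_event() {
            batch.push(event);
            if batch.len() == self.batch_size {
                self.sink.write_batch(std::mem::take(&mut batch))?;
            }
        }
        // Flush whatever is left over after the source runs dry.
        if !batch.is_empty() {
            self.sink.write_batch(batch)?;
        }
        Ok(self.sink)
    }
}
```

Calling ToyPipeline::new(VecSource::new(events), MemorySink::default(), 2).run() drains the events into the sink in groups of at most two. In this toy model, supporting a new destination would mean writing one more ToyBatchSink implementation, which is the property the post describes for real sinks.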
pg_replicate is still under heavy development and is a little thin on documentation. Performance is another area which hasn't received much attention. We are releasing this to get feedback from the community and are still evaluating how (or if) we can integrate it with the Supabase platform. Comments and feedback are welcome.
[0] Postgres logical replication protocol: https://www.postgresql.org/docs/current/protocol-logical-rep...
There's external tooling like this project, but Postgres extensions in Rust are exciting too.
Full extensions via pgrx have been cool to see, but plrust + pg_tle is also starting to show up.
If you aren't familiar with TLE (Trusted Language Extensions), it is a postgres extension from AWS that created some privileged interfaces for procedural languages (used for user-defined functions) to do some extra stuff. Right now it's mostly auth-related hooks but my hope is that it expands in the future.
Plrust is a procedural language extension for Rust, allowing user defined functions written in Rust.
The combination of those two could open up a world of rich extensions usable in managed hosted environments like RDS.
Windmill (https://windmill.dev) used to only support webhooks to trigger code and flow jobs. We have just added email support by building our own MX server, and wanted to add CDC as well. We were going to do it with Debezium, but this will let us drop the need for a third-party service and just add this as a crate. Thank you Supabase for open-sourcing this.
My specific use-case would be: Single Postgres in my cluster, replicated via something based on pg_replicate running as a side-car and writing to my NAS running Minio.
Might be a good idea to have it documented or have the default level set to info for the stdout example.
Maybe this is common Rust knowledge and I just don't know what I'm doing though.
The issue we've run into is that some team at work decides to rewrite an entire table, and things get backed up until they stop updating rows.
I'm exploring an alternative way to run logic asynchronously after db operations without the overhead, and I think using cdc to export jobs into an external queue is the way to go here. Essentially a lightweight alternative to Debezium with a better developer experience that is easier to manage. This crate could serve as the core of such a service.
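A rough sketch of that idea, under stated assumptions: the types and the table name below are hypothetical, and a std::sync::mpsc channel with a worker thread stands in for the external queue. Nothing here comes from pg_replicate itself. Each captured insert is mapped to a job and pushed onto the queue, so the work happens outside the database transaction.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical types for illustration; not part of pg_replicate.

/// A row insert as it might arrive from a CDC stream.
#[derive(Debug, Clone, PartialEq)]
pub struct RowInserted {
    pub table: String,
    pub id: i64,
}

/// A unit of asynchronous work derived from a change.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub kind: String,
    pub row_id: i64,
}

/// Maps a change to a job, skipping tables that need no follow-up work.
/// The "orders" table and "send_receipt" job are made-up examples.
pub fn job_for(change: &RowInserted) -> Option<Job> {
    match change.table.as_str() {
        "orders" => Some(Job {
            kind: "send_receipt".to_string(),
            row_id: change.id,
        }),
        _ => None,
    }
}

/// Pushes jobs onto a channel (the stand-in for an external queue) and
/// returns the jobs the worker thread handled, in the order it saw them.
pub fn process(changes: Vec<RowInserted>) -> Vec<Job> {
    let (tx, rx) = mpsc::channel::<Job>();

    // The worker drains the queue until every sender has been dropped.
    let worker = thread::spawn(move || {
        let mut handled = Vec::new();
        for job in rx {
            handled.push(job);
        }
        handled
    });

    for change in &changes {
        if let Some(job) = job_for(change) {
            tx.send(job).expect("worker hung up");
        }
    }
    drop(tx); // Close the queue so the worker loop ends.

    worker.join().expect("worker panicked")
}
```

A real service would swap the channel for a durable queue and would need to decide how to acknowledge progress back to Postgres; both are left out of this sketch.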