Postgres Webhooks with Pgstream
pgstream is a CDC tool for PostgreSQL that triggers webhooks on data and schema changes, delivering tailored notifications for inserts, updates, deletes, and schema alterations with detailed event information.
pgstream is a Change Data Capture (CDC) tool for PostgreSQL that calls webhooks in response to data and schema changes in a Postgres database, so you can be notified whenever rows are inserted, updated, or deleted, or when tables are altered. To use it, you need a Postgres instance with logical replication enabled and the wal2json output plugin installed. Setup involves cloning the pgstream repository, starting a Postgres instance with Docker, and initializing pgstream; configuration lives in a .env file that specifies the database connection details. A sample webhook listener can then be run to receive notifications and log the JSON payloads (a minimal listener sketch follows the summary points below).

Subscriptions can target specific tables or schemas, so event notifications can be tailored. Webhook events include detailed information about the action performed, such as inserts, updates, and deletes, along with the relevant data; pgstream can also include old values on updates and can resolve large values stored in TOAST. It can additionally deliver notifications about schema changes, addressing a common limitation of Postgres logical replication. Overall, pgstream provides a robust solution for monitoring and responding to changes in PostgreSQL databases.
- pgstream enables webhook notifications for data and schema changes in PostgreSQL.
- Users must set up a Postgres instance with logical replication and configure pgstream.
- Webhook events provide detailed information about database actions, including old values.
- pgstream can handle large values stored in TOAST and supports schema change notifications.
- Subscriptions can be tailored to specific tables or event types for customized notifications.
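The listener side is just an HTTP endpoint that accepts pgstream's POSTs. Here is a minimal sketch in Go (the language pgstream itself is written in); the port and path are arbitrary choices for illustration, not pgstream defaults:

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

func main() {
	// Accept pgstream webhook deliveries and log the JSON payload.
	// Point your pgstream subscription at this listener's address.
	http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
		defer r.Body.Close()
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "read error", http.StatusBadRequest)
			return
		}
		// Pretty-print so inserts/updates/deletes are easy to inspect.
		var event map[string]any
		if err := json.Unmarshal(body, &event); err != nil {
			log.Printf("non-JSON payload: %s", body)
		} else {
			pretty, _ := json.MarshalIndent(event, "", "  ")
			log.Printf("received event:\n%s", pretty)
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Println("listening on :9910")
	log.Fatal(http.ListenAndServe(":9910", nil))
}
```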
Related
Pgzx: Postgres Extensions with Zig
Xata has launched pgzx, an open-source framework for building PostgreSQL extensions with Zig, enhancing code maintainability and efficiency, and includes features for memory safety, error handling, and testing.
Show HN: Pg_replicate – Build Postgres replication applications in Rust
pg_replicate is a Rust crate for PostgreSQL data replication, supporting logical streaming replication. It offers easy integration, a quickstart guide, and plans for future enhancements and additional data sinks.
Postgres.new: In-browser Postgres with an AI interface
postgres.new is an in-browser Postgres sandbox that integrates AI assistance for managing databases, supporting features like CSV imports, report generation, and semantic search, with future cost-effective deployments planned.
Change Data Capture (CDC) Tools should be database specialized not generalized
PeerDB focuses on Postgres for Change Data Capture, minimizing pipeline failures and load impacts. Their customers manage data sizes from 300GB to 20TB, necessitating ongoing improvements as Postgres evolves.
Beyond logical replication: pg_easy_replicate Supports Tracking DDL Changes
pg_easy_replicate now supports tracking schema changes in PostgreSQL, allowing replication of DDL changes like adding columns. It enhances flexibility and ensures schema synchronization during database migrations and upgrades.
For deployments at more serious scale, it seems they support buffering WAL events into Kafka, similar to Debezium (the current leader in change data capture), to decouple the throughput of the replication slot reader on the Postgres side from the event handlers the events are delivered to.
Pgstream seems more batteries-included than using Debezium to consume the PG log: Kafka is optional with pgstream, and things like webhook delivery and OpenSearch indexing are packaged in, rather than being a "choose your own adventure" game through the Kafka Streams middleware ecosystem jungle. If their offering's constraints work for your use case, why not prefer it over Debezium, since it seems easier? I'd rather write Go than Java or another JVM language if I need to plug into the pipeline.
However, at any serious scale you'll need the Kafka in there, and then I'm less sure you'd use the other plug-and-play pieces like the OpenSearch indexer or webhooks at all. Certainly as volume grows, webhooks start to feel like a bad fit for CDC events at the granularity of a single row change; at least in my brief Debezium CDC experience, my consumer pulls batches of 1000+ changes at once from Kafka (a batch-consumer sketch follows below).
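For a concrete sense of that pattern, here's a hedged sketch of a batch consumer using the segmentio/kafka-go client; the broker address, topic name, group ID, and batch size are all illustrative, and the shape of the CDC payload depends on your connector:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Illustrative broker/topic/group values; topic naming
	// conventions vary by CDC connector.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "cdc.public.employees",
		GroupID: "cdc-consumer",
	})
	defer r.Close()

	const batchSize = 1000
	batch := make([]kafka.Message, 0, batchSize)
	ctx := context.Background()

	for {
		// FetchMessage does not auto-commit, so offsets only advance
		// once a whole batch has been applied downstream.
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		batch = append(batch, msg)

		if len(batch) >= batchSize {
			applyBatch(batch)
			if err := r.CommitMessages(ctx, batch...); err != nil {
				log.Fatal(err)
			}
			batch = batch[:0]
		}
	}
}

// applyBatch stands in for whatever the sink does with a batch of
// CDC events: a bulk upsert, an index update, and so on.
func applyBatch(msgs []kafka.Message) {
	log.Printf("applying %d changes", len(msgs))
}
```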
The other thing I don't see is transaction metadata. Maybe I shouldn't worry much about it (not many people seem concerned), but I'd like my downstream consumer to have delayed consistency with my Postgres upstream, which means consuming record changes with the same transactional grouping as in Postgres; otherwise I'll probably never be consistent in practice: https://www.scattered-thoughts.net/writing/internal-consiste...
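One way to recover that grouping downstream, sketched under the assumption that each event carries the transaction ID (xid) that produced it and that a commit marker closes the transaction (Debezium can emit such transaction metadata; pgstream's payload may differ):

```go
package cdc

// Event and its xid/commit fields are assumptions for illustration;
// real transaction-metadata shapes vary by tool.
type Event struct {
	Xid    uint32 // Postgres transaction ID that produced the change
	Commit bool   // true for the transaction's commit marker
	Change any    // the row-level change payload
}

// TxnBuffer holds row changes per transaction and applies them
// atomically on commit, so the downstream store only ever observes
// complete Postgres transactions (delayed consistency).
type TxnBuffer struct {
	pending map[uint32][]Event
	apply   func(changes []Event) error // applies one whole transaction
}

func NewTxnBuffer(apply func([]Event) error) *TxnBuffer {
	return &TxnBuffer{pending: make(map[uint32][]Event), apply: apply}
}

func (b *TxnBuffer) Consume(e Event) error {
	if !e.Commit {
		b.pending[e.Xid] = append(b.pending[e.Xid], e)
		return nil
	}
	// Commit marker: flush everything buffered for this transaction.
	changes := b.pending[e.Xid]
	delete(b.pending, e.Xid)
	return b.apply(changes)
}
```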
1. Can it handle deliveries to slow or unreliable endpoints? In particular, what happens if the external server is deliberately slow to respond? Someone could try to crash your system by forcing it to deliver to slow-loading endpoints.
2. How are retries handled? If the server returns a 500, a good webhooks system will queue things up for re-delivery with an exponential backoff and try a few more times before giving up completely.
Point 1 only matters if you are delivering webhooks to untrusted endpoints - systems like GitHub where anyone can sign up for hook deliveries.
Point 2 is more important.
https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... shows that the HTTP client can be configured with a timeout (which defaults to 10s https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... )
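For reference, a hard client-side timeout in Go (which pgstream's HTTP client configuration presumably maps to) covers the whole exchange, including a deliberately slow response body. A minimal sketch, with the 10s value mirroring the default linked above:

```go
package webhook

import (
	"bytes"
	"net/http"
	"time"
)

// deliver posts one webhook payload with a hard client-side timeout.
// Timeout covers connecting, sending, and reading the full response
// body, so a deliberately slow endpoint can't hold a worker forever.
// The 10s value mirrors the default linked above.
func deliver(url string, payload []byte) (*http.Response, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	return client.Post(url, "application/json", bytes.NewReader(payload))
}
```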
From looking at https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... it doesn't look like this system handles retries.
Retrying would definitely be a useful addition. PostgreSQL is a great persistence store for recording failures and retry attempts, so that feature would be a good fit for this system.
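A minimal sketch of the backoff policy described in point 2, in Go; the attempt count and base delay are arbitrary, and the single-attempt delivery function is passed in (e.g. the timeout-wrapped deliver from the previous sketch). Persisting each failed attempt to a Postgres table, as suggested above, is noted in a comment:

```go
package webhook

import (
	"fmt"
	"net/http"
	"time"
)

// deliverFunc is any single-attempt delivery, such as the
// timeout-wrapped deliver from the previous sketch.
type deliverFunc func(url string, payload []byte) (*http.Response, error)

// deliverWithRetry retries failed deliveries with exponential backoff
// (1s, 2s, 4s, ...) and gives up after maxAttempts. A production
// version might record each failed attempt in a Postgres table, as
// suggested above, instead of holding state in memory.
func deliverWithRetry(d deliverFunc, url string, payload []byte) error {
	const maxAttempts = 5
	backoff := time.Second

	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := d(url, payload)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 500 {
				return nil // delivered, or a non-retryable client error
			}
			lastErr = fmt.Errorf("server returned %d", resp.StatusCode)
		} else {
			lastErr = err
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}
```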
Are there real-world use cases being solved with this, or is it just a hobby tool?