September 1st, 2024

Postgres Webhooks with Pgstream

pgstream is a CDC tool for PostgreSQL that triggers webhooks for data and schema changes, allowing tailored notifications for inserts, updates, deletes, and schema alterations with detailed event information.

Read original article

pgstream is a Change-Data-Capture (CDC) tool designed for PostgreSQL that enables the calling of webhooks in response to data and schema changes within a Postgres database. This functionality allows for notifications whenever rows are inserted, updated, or deleted, or when tables undergo various alterations. To utilize pgstream, users must have a Postgres instance with logical replication enabled and the wal2json output plugin. The setup involves cloning the pgstream repository, starting a Postgres instance using Docker, and initializing pgstream. Configuration is done through a .env file that specifies database connection details. A sample webhook listener can be implemented to receive notifications, which can be set up to log JSON payloads. Users can create subscriptions to specific tables or schemas, allowing for tailored event notifications. The webhook events include detailed information about the actions performed, such as inserts, updates, and deletes, along with the relevant data. Additionally, pgstream supports including old values during updates and can handle large values stored in TOAST. It also allows for receiving notifications about schema changes, addressing a common limitation in Postgres replication. Overall, pgstream provides a robust solution for monitoring and responding to changes in PostgreSQL databases.

- pgstream enables webhook notifications for data and schema changes in PostgreSQL.

- Users must set up a Postgres instance with logical replication and configure pgstream.

- Webhook events provide detailed information about database actions, including old values.

- pgstream can handle large values stored in TOAST and supports schema change notifications.

- Subscriptions can be tailored to specific tables or event types for customized notifications.

Pgzx: Postgres Extensions with Zig

Xata has launched pgzx, an open-source framework for building PostgreSQL extensions with Zig, enhancing code maintainability and efficiency, and includes features for memory safety, error handling, and testing.

Show HN: Pg_replicate – Build Postgres replication applications in Rust

pg_replicate is a Rust crate for PostgreSQL data replication, supporting logical streaming replication. It offers easy integration, a quickstart guide, and plans for future enhancements and additional data sinks.

Postgres.new: In-browser Postgres with an AI interface

postgres.new is an in-browser Postgres sandbox that integrates AI assistance for managing databases, supporting features like CSV imports, report generation, and semantic search, with future cost-effective deployments planned.

Change Data Capture (CDC) Tools should be database specialized not generalized

PeerDB focuses on Postgres for Change Data Capture, minimizing pipeline failures and load impacts. Their customers manage data sizes from 300GB to 20TB, necessitating ongoing improvements as Postgres evolves.

Beyond logical replication: pg_easy_replicate Supports Tracking DDL Changes

pg_easy_replicate now supports tracking schema changes in PostgreSQL, allowing replication of DDL changes like adding columns. It enhances flexibility and ensures schema synchronization during database migrations and upgrades.

7 comments

By @jitl - 5 months

The GitHub repo README gives a better sense of the capabilities of the pgstream system: https://github.com/xataio/pgstream?tab=readme-ov-file#archit...

For deployments at a more serious scale, it seems they support buffering WAL events into Kafka, similar to Debezium (the current leader for change data capture), to de-couple the replication slot reader on the Postgres side throughput from the event handlers you deliver the events to.

Pgstream seems more batteries-included compared to using Debezium to consume the PG log; Kafka is optional with pgstream and things like webhook delivery and OpenSearch indexing are packaged in, rather than being a “choose your own adventure” game with Kafka Streams middleware ecosystem jungle. If their offerings constraints work for your use-case, why not prefer it over Debezium since it seems easier? I’d rather write Go than Java/JVM-language if I need to plug into the pipeline.

However at any sort of serious scale you’ll need the Kafka in there, and then I’m less sure you’d make use of the other plug and play stuff like the OpenSearch indexer or webhooks at all. Certainly as volume grows webhooks start to feel like a bad fit for CDC events at the level of a single row change; at least in my brief Debezium CDC experience, my consumer pulls batches of 1000+ changes at once from Kafka.

The other thing I don’t see is transaction metadata, maybe I shouldn’t worry much about it (not many people seem to be concerned) but I’d like my downstream consumer to have delayed consistency with my Postgres upstream, which means I need to consume record changes with the same transactional grouping as in Postgres, otherwise I’ll probably never be consistent in practice: https://www.scattered-thoughts.net/writing/internal-consiste...

By @simonw - 5 months

Implementing webhook deliveries is one of those things that's way harder than you would initially imagine. The two things I always look for in systems like this are:

1. Can it handle deliveries to slow on unreliable endpoints? In particular, what happens if the external server is deliberately slow to respond - someone could be trying to crash your system by forcing it to deliver to slow-loading endpoints.

2. How are retries handled? If the server returns a 500, a good webhooks system will queue things up for re-delivery with an exponential backoff and try a few more times before giving up completely.

Point 1. only matters if you are delivering webhooks to untrusted endpoints - systems like GitHub where anyone can sign up for hook deliveries.

2. is more important.

https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... shows that the HTTP client can be configured with a timeout (which defaults to 10s https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... )

From looking at https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... it doesn't look like this system handles retries.

Retrying would definitely be a useful addition. PostgreSQL is a great persistence store for recording failures and retry attempts, so that feature would be a good fit for this system.

By @canadiantim - 5 months

Xata putting out some cool pg things. Is PG roll still looking good these days? Seemed pretty sexy last time I looked at it

By @adamcharnock - 5 months

A quick google didn’t reveal much about pgstream’s delivery semantics. Do these webhooks get called at most once, or at least once? I hope the latter, but in absence of information I’d guess it is the former.

By @newb76 - 5 months

This is very nice! I came across this through an internal ref. It appears that there is a lot of movement in the CDC space lately - pgstream, sequinstream, bemi, pg_flo (Saw some guy on X posting videos of CDC tools too)

Are there real world use cases being solved with this, or is it just hobby tool?

By @tazu - 5 months

Curious if there's a reason this extension is written in Go and not Zig? The blog posted [1] a few months ago about their cool Zig library for making extensions and I've been playing around with it. Is Zig just not mature enough?

[1]: https://xata.io/blog/introducing-pgzx

By @oars - 5 months

Postgres webhooks could be very usefulness for me. Thanks for sharing.

Postgres Webhooks with Pgstream

Related

Pgzx: Postgres Extensions with Zig

Show HN: Pg_replicate – Build Postgres replication applications in Rust

Postgres.new: In-browser Postgres with an AI interface

Change Data Capture (CDC) Tools should be database specialized not generalized

Beyond logical replication: pg_easy_replicate Supports Tracking DDL Changes

Related

Pgzx: Postgres Extensions with Zig

Show HN: Pg_replicate – Build Postgres replication applications in Rust

Postgres.new: In-browser Postgres with an AI interface

Change Data Capture (CDC) Tools should be database specialized not generalized

Beyond logical replication: pg_easy_replicate Supports Tracking DDL Changes