August 20th, 2024

Change Data Capture (CDC) Tools should be database specialized not generalized

PeerDB focuses on Postgres for Change Data Capture, which has minimized pipeline failures and load on source databases. Its customers manage roughly 300GB to 20TB of data, and ongoing improvements remain necessary as Postgres evolves.

Change Data Capture (CDC) is complex and has many potential failure points. PeerDB has chosen to concentrate solely on Postgres, which has allowed it to handle many edge cases and to implement performance and reliability optimizations native to Postgres. As a result, pipeline failures have become infrequent, and its pipelines have not placed harmful load on source databases. Most of its customers manage between 300-400GB and 15-20TB of data, which has provided valuable testing for the product and helps ensure it performs well on larger datasets. Even so, the author believes CDC is not yet a solved problem: Postgres continues to evolve and present new challenges, so ongoing improvements will be needed to keep pace. The key takeaway is that a CDC tool specialized for a single database, or a small set of databases, can offer a more reliable CDC experience.

- PeerDB focuses exclusively on Postgres to enhance CDC reliability.

- The company has successfully minimized pipeline failures and load impacts on source databases.

- Most customers operate within a data size range of 300-400GB to 15-20TB.

- Continuous evolution and improvement are necessary as Postgres develops.

- Specialized CDC tools can provide a more reliable experience than generalized ones.

1 comment
By @yen223 - 6 months
One of the interesting things about databases is that unlike most other markets, there is no single database that completely owns a majority of market share.

MySQL, Oracle, SQL Server and Postgres are very close to each other in terms of market share (however you choose to define it).

The effect is that most database tooling folks almost never specialise in one database. They'd be giving up too much TAM (total addressable market) for that.

I think this is a big part of why database tooling in general is generalised, not specialised. It's also a big part of why database tooling isn't as good as it should be.