July 11th, 2024

DuckDB Meets Postgres

Organizations shift historical Postgres data to S3 with Apache Iceberg, enhancing query capabilities. ParadeDB integrates Iceberg with S3 and Google Cloud Storage, replacing DataFusion with DuckDB for improved analytics in pg_lakehouse.

Read original articleLink Icon
DuckDB Meets Postgres

A growing number of organizations are moving historical data from Postgres to S3 and adopting Apache Iceberg, a table format enabling S3 data to be queried like SQL tables. While query engines like Trino, Spark, or Flink integrate with Iceberg, Postgres does not. ParadeDB has introduced Iceberg table support for S3 and Google Cloud Storage, replacing DataFusion with DuckDB for accelerated engineering efforts. Iceberg, designed for analytics over large datasets, organizes metadata around files like Parquet, offering features like ACID transactions and schema evolution. pg_lakehouse adds Iceberg support to Postgres using the foreign data wrapper API, pushing most queries down to DuckDB for enhanced analytical performance. Initially preferring DataFusion for its extensibility and growing adoption, ParadeDB switched to DuckDB due to its out-of-the-box integrations, familiarity to end users, and better performance in benchmarks. Future plans for pg_lakehouse include adding write support for copying Postgres tables into external object stores. Iceberg is supported on pg_lakehouse version 0.8.0 or later, available for installation or through a Docker image.

Link Icon 8 comments
By @mritchie712 - 3 months
The `duckdb_iceberg`[0] extension doesn't support partitioning / predicate pushdown. Is that what you're using for pg_lakehouse?

0 - https://github.com/duckdb/duckdb_iceberg/issues/2

By @nitinreddy88 - 3 months
I really wish they resume work on pg_analytics which is in IMO the key differentiator in current PG Market
By @PaulWaldman - 3 months
How does this compare to the Parquet S3 Foreign Data Wrapper?

https://github.com/pgspider/parquet_s3_fdw

By @lolive - 3 months
Can you have a single transaction write both to a pg table and an iceberg table?
By @chuckhend - 3 months
Congrats to the paradedb team!
By @rubenfiszel - 3 months
Why not call it pg_duckdb ?
By @nhudson2367 - 3 months
congrats paradedb team!