pg_parquet: An extension to connect Postgres and Parquet
Crunchy Data released pg_parquet, an open-source PostgreSQL extension for reading and writing Parquet files, enabling data export/import, cloud storage integration, and schema inspection for enhanced analytical capabilities.
Crunchy Data has announced pg_parquet, an open-source extension for PostgreSQL that reads and writes Parquet files directly from the database. The extension lets users export tables or query results to Parquet files, ingest data from Parquet files into PostgreSQL, and inspect the schema and metadata of existing Parquet files. Parquet is a columnar file format with efficient compression, which makes it well suited to analytics and to sharing data between systems. pg_parquet extends PostgreSQL so that it integrates with Parquet without additional data pipelines: the COPY command moves data between PostgreSQL and Parquet files stored locally or in cloud object storage such as S3, and dedicated functions describe Parquet schemas and retrieve detailed metadata for data management and analytics. With pg_parquet, PostgreSQL aims to expand its role beyond transactional workloads toward analytical tasks.
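The workflow described above reduces to a few COPY invocations. A minimal sketch based on the announcement's examples (the table and bucket names are hypothetical, and the assumption that S3 credentials are picked up from the standard AWS configuration follows the announcement):

```sql
-- Enable the extension (assumes pg_parquet is installed on the server).
CREATE EXTENSION pg_parquet;

-- Export a table to a local Parquet file.
COPY products TO '/tmp/products.parquet' WITH (format 'parquet');

-- Export directly to S3; credentials come from the standard AWS configuration.
COPY products TO 's3://mybucket/products.parquet' WITH (format 'parquet');

-- Ingest a Parquet file back into a table with a matching schema.
COPY products FROM 's3://mybucket/products.parquet' WITH (format 'parquet');
```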
- pg_parquet is an open-source extension for PostgreSQL to work with Parquet files.
- It allows exporting and importing data between PostgreSQL and Parquet files.
- The extension supports cloud storage integration, particularly with S3.
- Users can inspect Parquet file schemas and metadata directly from PostgreSQL (see the sketch after this list).
- pg_parquet enhances PostgreSQL's capabilities for both transactional and analytical workloads.
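For the inspection piece, the announcement exposes Parquet introspection as set-returning functions in a `parquet` schema. A sketch of how that looks (the URI is hypothetical; treat the exact function names as indicative of the announcement rather than a stable API):

```sql
-- Column names, types, and structure of a Parquet file.
SELECT * FROM parquet.schema('s3://mybucket/products.parquet');

-- Row-group and column-chunk metadata (row counts, compressed sizes, and so on).
SELECT * FROM parquet.metadata('s3://mybucket/products.parquet');
```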
Related
DuckDB Meets Postgres
Organizations shift historical Postgres data to S3 with Apache Iceberg, enhancing query capabilities. ParadeDB integrates Iceberg with S3 and Google Cloud Storage, replacing DataFusion with DuckDB for improved analytics in pg_lakehouse.
Major Developments in Postgres Extension Discovery and Distribution
The article covers advancements in Postgres extension discovery and distribution. Postgres extensions enhance database capabilities with features like query hints and encryption. PGXN facilitates extension access. A summit in Vancouver will address extension challenges, encouraging developer involvement for ecosystem enhancement.
Does PostgreSQL respond to the challenge of analytical queries?
PostgreSQL has advanced in handling analytical queries with foreign data wrappers and partitioning, improving efficiency through optimizer enhancements, while facing challenges in pruning and statistical data. Ongoing community discussions aim for further improvements.
PostGIS Meets DuckDB: Crunchy Bridge for Analytics Goes Spatial
Crunchy Data's update to Crunchy Bridge for Analytics introduces geospatial analytics, allowing users to create analytics tables from datasets via URLs, supporting formats like GeoParquet, and integrating with DuckDB and QGIS.
Show HN: Squey, an open-source GPU-accelerated data visualization software
Squey 5.0 is an open-source visualization software that introduces a Parquet plugin, enabling data import/export, real-time visualizations, and tools for data quality assessment and anomaly detection across various fields.
Comments
A lot of other commenters are talking about `pg_duckdb`, which maybe could also have solved my problem, but this looks quite simple and clean.
I hope for some kind of near-term future where there's some standardish analytics-friendly data archival format. I think Parquet is the closest thing we have now.
Also, how does it compare to pg_duckdb (which adds DuckDB execution to Postgres, including reading Parquet and Iceberg), or duckdb_fdw (which wraps a DuckDB database, which can be in-memory and only pass through Iceberg/Parquet tables)?
What would be the recommended way to regularly export old data to S3 as Parquet files? A cron job that launches a second Postgres process connecting to the database and extracting the data, or using the regular database instance? Doesn't that slow down the instance too much?
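One way to answer this within Postgres itself: since the export is just a COPY, it can be scheduled off-peak with pg_cron instead of a separate process. A hedged sketch, not the extension's documented method (the table, bucket, and retention window are hypothetical; assumes both pg_parquet and pg_cron are installed):

```sql
-- Nightly at 02:00: archive rows older than 90 days to S3 as Parquet.
-- In practice the object key would include a date; shown fixed for brevity.
SELECT cron.schedule(
  'archive-old-events',
  '0 2 * * *',
  $$COPY (
      SELECT * FROM events
      WHERE created_at < now() - interval '90 days'
    ) TO 's3://mybucket/archive/events.parquet' WITH (format 'parquet')$$
);
```

Since the COPY is read-only, pointing the job at a read replica is another way to keep the export load off the primary, which speaks to the slowdown concern.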