Postgres Just Cracked the Top 10 Fastest Databases for Analytics
PostgreSQL has been optimized for analytics with pg_mooncake, a new extension that has achieved a Top 10 ClickBench ranking. The extension pairs a columnstore format with DuckDB execution to rival specialized analytics databases.
PostgreSQL has emerged as a strong contender in the analytics database space following the development of pg_mooncake, which has achieved a Top 10 ranking on ClickBench, a benchmark traditionally dominated by specialized analytics databases. Databases like ClickHouse have historically outperformed general-purpose databases such as PostgreSQL and MySQL by significant margins. The creators of pg_mooncake, however, have optimized PostgreSQL to close that gap. They built pg_mooncake as a native PostgreSQL extension that stores data in a columnstore format, so analytical scans read only the columns they need in large sequential chunks rather than row by row. They also embedded DuckDB as the execution engine to provide vectorized execution, processing column data in batches for better throughput. The architecture keeps metadata management directly within PostgreSQL, which speeds up query planning and execution. These advancements show that PostgreSQL can deliver analytics performance comparable to specialized databases while keeping its flexibility and ecosystem benefits. The initial version, v0.1, launched recently and is available on platforms such as Neon Postgres and Supabase.
- PostgreSQL has been optimized to perform well in analytics, achieving a Top 10 ranking on ClickBench.
- pg_mooncake is a native PostgreSQL extension that utilizes a columnstore format for improved data access.
- DuckDB is embedded as the execution engine to enhance query performance through vectorized execution.
- Efficient metadata management within PostgreSQL contributes to faster query planning and execution.
- The advancements position PostgreSQL as a viable alternative to specialized analytics databases.
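As a rough illustration of the workflow described above, here is a minimal sketch of creating and querying a pg_mooncake columnstore table from Python. The "USING columnstore" clause reflects the extension's v0.1 documentation, but the connection string and the events table/columns are hypothetical, not details from the article; check the pg_mooncake README for the exact syntax.

```python
# Minimal sketch: creating and querying a pg_mooncake columnstore table.
# Assumptions: pg_mooncake v0.1 is installed on the server; the DSN and the
# "events" table/columns are illustrative placeholders.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=postgres")  # hypothetical DSN
conn.autocommit = True

with conn.cursor() as cur:
    # Enable the extension (it must already be installed on the server).
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_mooncake;")

    # Columnstore table: data is laid out by column, so analytical scans
    # read only the columns they need, in large sequential chunks.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id    bigint,
            event_type text,
            created_at timestamptz
        ) USING columnstore;
    """)

    # A typical analytical query; with pg_mooncake, execution is expected to
    # be vectorized (DuckDB processes the column data in batches).
    cur.execute("""
        SELECT event_type, count(*) AS n
        FROM events
        WHERE created_at >= now() - interval '7 days'
        GROUP BY event_type
        ORDER BY n DESC;
    """)
    for event_type, n in cur.fetchall():
        print(event_type, n)

conn.close()
```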
Related
Does PostgreSQL respond to the challenge of analytical queries?
PostgreSQL has advanced in handling analytical queries with foreign data wrappers and partitioning, improving efficiency through optimizer enhancements, while facing challenges in pruning and statistical data. Ongoing community discussions aim for further improvements.
pg_duckdb: Splicing Duck and Elephant DNA
MotherDuck launched pg_duckdb, an open-source extension integrating DuckDB with Postgres to enhance analytical capabilities while maintaining transactional efficiency, supported by a consortium of companies and community contributions.
ClickHouse Data Modeling for Postgres Users
ClickHouse acquired PeerDB to enhance PostgreSQL data replication. The article offers data modeling tips, emphasizing the ReplacingMergeTree engine, duplicate management, ordering key selection, and the use of Nullable types.
Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
Crunchy Data has launched Crunchy Data Warehouse, a PostgreSQL-based analytics database that integrates Apache Iceberg, supports ACID transactions, offers over 10x performance improvement, and is available as a managed AWS service.
Reliably Replicating Data Between PostgreSQL and ClickHouse
The article explains how to replicate data from PostgreSQL to ClickHouse using PeerDB, focusing on Change Data Capture for efficient synchronization and offering both self-managed and managed solutions.
We tried using Postgres for large-scale analytics at my previous company https://threekit.com but it was an absolute pain. Basically we started collecting detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past 500 million records it turned out to be impossible to query this table in anything close to real-time - it was basically untouchable because it was so slow.
I know I could have used some type of daily aggregation combined with a weekly aggregation, etc., to roll up the data incrementally. A dev tried this and yeah, it hid the slow queries, but then it became inflexible in terms of reporting. And writing and maintaining these cronjobs is a lot of work.
But then I tried using BigQuery on my https://web3dsurvey.com side project and I just recorded raw data and then wrote queries to do real-time aggregation (with a caching layer) in a large variety of ways. And it was near instant and easy and also very cheap.
So then I deployed this strategy over at Threekit.com and it was also easy and fast and cheap. Even more flexible queries than Web3DSurvey at basically no developer or infrastructure cost. No more need for aggregating cron-jobs or trying to decide ahead of time how users want to slice and dice the data. Real-time, flexible queries on the fly via BigQuery is the way.
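For readers unfamiliar with the pattern this comment describes (store raw events, aggregate on demand, cache the result), a rough sketch might look like the following. The project/dataset/table names, the schema, and the cache TTL are hypothetical, not details from the comment.

```python
# Rough sketch of "raw events + on-the-fly aggregation + caching" on BigQuery.
# The project/dataset/table names, columns, and cache TTL are hypothetical.
import time
from google.cloud import bigquery

client = bigquery.Client()          # uses default application credentials
_cache = {}                         # query text -> (timestamp, rows)
CACHE_TTL_SECONDS = 300

def cached_query(sql: str):
    """Run a BigQuery query, reusing a recent result if one is cached."""
    now = time.time()
    hit = _cache.get(sql)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    rows = [dict(row) for row in client.query(sql).result()]
    _cache[sql] = (now, rows)
    return rows

# Slice the raw event table any way the report needs, at query time.
rows = cached_query("""
    SELECT browser, COUNT(*) AS sessions
    FROM `my-project.analytics.raw_events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY browser
    ORDER BY sessions DESC
""")
for r in rows:
    print(r["browser"], r["sessions"])
```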
Also, the BigQuery bill for https://web3dsurvey.com is something like $0.25/month and it is dealing with millions of records in its 3-month window of stored data. Whereas just running the cheapest Postgres server on Google Cloud is >$25/month, and it is a slow one.
I would never go back to traditional SQL for analytics - it was hard, slow, expensive and inflexible. Worst of all worlds.
For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at a production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability and observability issues with logical replication.
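For context on the setup this refers to, a bare-bones logical replication pipeline between two Postgres instances looks roughly like the sketch below. The connection strings and the orders table are placeholders, and the production concerns (slot monitoring, conflict handling, initial sync of large tables) are exactly the parts the comment says are non-trivial.

```python
# Bare-bones Postgres logical replication between two instances (sketch).
# Connection strings and the "orders" table are placeholders; the publisher
# must have wal_level=logical and a user with replication privileges.
import psycopg2

# Transactional (publisher) side: publish the tables to replicate.
pub = psycopg2.connect("host=oltp-db dbname=app user=postgres")
pub.autocommit = True
with pub.cursor() as cur:
    cur.execute("CREATE PUBLICATION analytics_pub FOR TABLE orders;")

# Analytics (subscriber) side: the table must already exist with a matching
# schema; the subscription then streams changes continuously.
sub = psycopg2.connect("host=olap-db dbname=analytics user=postgres")
sub.autocommit = True  # CREATE SUBSCRIPTION cannot run inside a transaction
with sub.cursor() as cur:
    cur.execute("""
        CREATE SUBSCRIPTION analytics_sub
        CONNECTION 'host=oltp-db dbname=app user=replicator password=...'
        PUBLICATION analytics_pub;
    """)
```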
One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, RCTEs, triggers, and more. While it seems there’s ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.
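To make the coverage question concrete, these are the kinds of constructs in play: a correlated subquery and a recursive CTE over hypothetical users/events tables. Running EXPLAIN is one way to inspect the plan; whether a given query is handled by the DuckDB engine or falls back to the row-based Postgres executor is extension-specific and not claimed here.

```python
# Sketch of the "advanced" SQL whose coverage the comment questions.
# The users/events tables are hypothetical; EXPLAIN only shows the plan,
# and which executor handles it depends on the extension.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=postgres")
with conn.cursor() as cur:
    # Correlated subquery: latest event per user.
    cur.execute("""
        EXPLAIN
        SELECT u.user_id,
               (SELECT max(e.created_at)
                FROM events e
                WHERE e.user_id = u.user_id) AS last_seen
        FROM users u;
    """)
    print("\n".join(row[0] for row in cur.fetchall()))

    # Recursive CTE: a 30-day date series joined against the events table.
    cur.execute("""
        EXPLAIN
        WITH RECURSIVE days AS (
            SELECT date_trunc('day', now()) - interval '30 days' AS d
            UNION ALL
            SELECT d + interval '1 day' FROM days
            WHERE d < date_trunc('day', now())
        )
        SELECT d, count(e.*)
        FROM days
        LEFT JOIN events e ON date_trunc('day', e.created_at) = d
        GROUP BY d;
    """)
    print("\n".join(row[0] for row in cur.fetchall()))
conn.close()
```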
The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance; both are complex and constantly moving targets. This was the key reason Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, and SingleStore. While DuckDB didn't exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres (as is Snowflake's), retrofitting two distinct databases, each evolving independently toward the world's best performance, isn't trivial.
In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or adopt a more generalized approach?
Anyway, I come from a different school of thought — using the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!
So is it Postgres or DuckDB that cracked the analytics top 10?
I'm trying to understand if there is any potential performance impact on the OLTP workload from running OLAP in the same process.
Unfortunately, using an extension that's not "part" of Postgres (the way, say, pg_stat_statements is) isn't trivial for most people, since there are both technical and legal issues.