September 9th, 2024

DuckDB 1.1.0 Released

DuckDB 1.1.0, codenamed "Eatoni," adds new SQL features, community extensions, and performance improvements, along with a few breaking SQL changes.

DuckDB has released version 1.1.0, codenamed "Eatoni," three months after version 1.0.0. The release combines a few breaking SQL changes with community extensions, new SQL features, and performance improvements. The most visible breaking change is that floating-point division by zero now returns infinity instead of NULL, in line with the IEEE-754 standard. Community extensions let users build and distribute their own extensions more easily. New SQL features include a histogram function for data analysis, SQL variables, and dynamic unpacking of columns. On the performance side, dynamic filter pushdown during joins, automatic materialization of common table expressions (CTEs), and parallel streaming queries speed up query execution. The release also improves foreign-key handling and parallelizes union-by-name operations.
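
The division-by-zero change is easier to see side by side. The following is a minimal Python sketch that models the two behaviours described above; it does not call DuckDB, and the function names `divide_null_on_zero` and `divide_ieee754` are hypothetical. The results for negative and zero numerators follow the general IEEE-754 rules rather than anything stated in the summary.

```python
import math
from typing import Optional


def divide_null_on_zero(a: float, b: float) -> Optional[float]:
    """Model of the older behaviour: dividing by zero yields NULL (None here)."""
    if b == 0:
        return None
    return a / b


def divide_ieee754(a: float, b: float) -> float:
    """Model of IEEE-754 division: x/0 is +/-infinity, and 0/0 is NaN."""
    if b != 0:
        return a / b
    if a == 0 or math.isnan(a):
        return math.nan
    # The sign of the result combines the signs of both operands,
    # so dividing by negative zero flips the sign.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return math.copysign(math.inf, sign)


print(divide_null_on_zero(1, 0))  # None
print(divide_ieee754(1, 0))       # inf
print(divide_ieee754(-1, 0))      # -inf
```

Queries that relied on NULL to signal a zero denominator would need an explicit check under the new semantics, which is why the change is listed as breaking.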

- DuckDB 1.1.0, codenamed "Eatoni," has been released with numerous new features.

- Key changes include updated handling of division by zero and improved community extension capabilities.

- New SQL functionalities include histogram calculations and support for SQL variables.

- Performance enhancements focus on dynamic filtering, automatic CTE materialization, and parallel streaming.

- The release aims to improve user experience and efficiency in data analysis tasks.
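
To make "dynamic filter pushdown during joins" concrete, here is a simplified Python sketch of the general idea: while building the hash table for the smaller side of a join, record the range of its keys, then use that range as a cheap filter on the larger side before probing. This is an illustration under assumptions, not DuckDB's implementation; the function name `hash_join_with_dynamic_filter` and the choice of a min/max range filter are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def hash_join_with_dynamic_filter(
    build_rows: List[Row], probe_rows: List[Row]
) -> Tuple[List[Tuple[int, str, str]], int]:
    """Join on the key; return the joined rows and how many probe rows
    passed the range filter and reached the hash lookup."""
    # Build phase: hash the smaller table.
    table: Dict[int, List[str]] = {}
    for key, payload in build_rows:
        table.setdefault(key, []).append(payload)
    if not table:
        return [], 0

    # The filter is derived at run time from the build side's keys.
    lo, hi = min(table), max(table)

    joined: List[Tuple[int, str, str]] = []
    probed = 0
    for key, payload in probe_rows:
        # Rows outside [lo, hi] cannot match, so skip them early.
        if key < lo or key > hi:
            continue
        probed += 1
        for build_payload in table.get(key, []):
            joined.append((key, build_payload, payload))
    return joined, probed


small = [(k, f"a{k}") for k in (100, 105, 109)]
large = [(k, f"b{k}") for k in range(1000)]
joined, probed = hash_join_with_dynamic_filter(small, large)
print(len(joined), probed)  # 3 10
```

Only 10 of the 1,000 rows on the large side reach the hash lookup. The range filter is conservative: it lets through keys 101 to 104 and 106 to 108, which then fail the exact lookup, but it never drops a row that would have matched.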

What Happens When You Put a Database in the Browser?

WebAssembly (Wasm) enhances browser capabilities, enabling high-performance apps like DuckDB for ad-hoc queries and Python environments. DuckDB Wasm boosts performance in interfaces like lakeFS, Evidence, and Count. MotherDuck enables local querying, emphasizing efficient data processing.

DuckDB Community Extensions

The DuckDB team launched the DuckDB Community Extensions repository for easy extension installation. Users benefit from a simplified process, while developers can streamline publication tasks. Security measures include code vetting options.

Memory Management in DuckDB

DuckDB optimizes query processing with effective memory management, using a streaming execution engine and disk spilling for large datasets. Its buffer manager enhances performance by caching frequently accessed data.

pg_duckdb: Splicing Duck and Elephant DNA

MotherDuck launched pg_duckdb, an open-source extension integrating DuckDB with Postgres to enhance analytical capabilities while maintaining transactional efficiency, supported by a consortium of companies and community contributions.

Farewell Pandas, and thanks for all the fish

Ibis will remove its pandas and Dask backends in version 10.0, favoring DuckDB for better performance and ease of use, while still allowing pandas DataFrames for data transfer.

14 comments

By @wenc - 8 months

Lots of automatic performance optimizations on what is already a very fast engine. (I’ve stopped using Pandas)

I know most software folks feel some type of way about SQL (most don’t grok it beyond a simple SELECT) but this is one of the advantages of declarative languages like SQL and a plan-execute programming paradigm where a plan is created before instructions are run, making it amenable to plan optimization.

Maybe the syntax of the language could be improved (e.g. Linq) but conceptually SQL is historically where we’ve blundered into the right thing. Data operations are often done in sets rather than loops and it’s a worthwhile investment for software engineers to learn to think in this way if they want to work with data correctly at scale.

Stonebraker was right in that people who avoid SQL are doomed to reinvent it poorly.

By @simonw - 8 months

I feel like R-Tree spatial indexes are potentially the most exciting new feature in this release, but they're buried right down at the bottom of the announcement: https://duckdb.org/2024/09/09/announcing-duckdb-110.html#r-t...

By @ashkankiani - 8 months

Love the expanded C API support! Also those performance improvements are massive! Pushing through filters and the streaming optimization for fetchone() is great! This makes it more viable to use duckdb in smaller queries from python.

I'm pretty excited for variables too! I really wanted them for when I'm using the CLI. Same with query/query_table! I appreciate the push for features that make people's lives easier while also still improving performance.

Everyone who I've introduced duckdb to (at work or outside of work) eventually is blown away (some still have lingering SQL stigma)

By @ZeroCool2u - 8 months

Damn, GeoParquet and R-Tree for spatial indexes is huge!!! ESRI better watch their back!

By @adwf - 8 months

Big fan of DuckDB!

Has saved us a number of times when having to deploy at a remote client with limited on-prem customisation for security reasons (i.e. no to installing a big Postgres or other RDBMS solution).

Powerful tooling; all local to the environment and the data being worked on; SQL, so it's pretty close to a drop-in replacement compared to our old solution. Really great stuff and I was very happy to see the project gain the confidence to hit 1.0 a while back and now 1.1.

Congrats to everyone!

By @beingflo - 8 months

I've been eyeing DuckDB for a metric collection hobby project. Quick benchmark showed promising query performance over SQLite (unsurprising considering DuckDB is column oriented), but quite a bit slower for inserts. Does anyone have experience using it as an "online" backend DB as opposed to a data analytics engine for interactive use? From what I gather they are trying to position themselves more in the latter use case.

By @log4shell - 8 months

Congratulations to duckdb team! Can't wait to try some of the newly released features and performance improvements.

I am quite curious about the plans for python dataframe like API for duckdb, and python ecosystem in general.

By @tlavoie - 8 months

This episode of Kris Jenkins' Developer Voices podcast talks with a couple authors of a new book on DuckDB, and does a great job of explaining the sorts of things that make it so unusual: https://www.youtube.com/watch?v=_nA3uDx1rlg

By @fforflo - 8 months

The C extensions API is a big big very big thing. As someone who routinely writes small PG extensions, I'd love to be able to kinda use the same code for multiple backend DBs. And I guess lots of inspiration has come from all the efforts that embed DuckDB in Postgres as a mini OLAP.

By @mrbonner - 8 months

I wanted to use ibis with Duck as the backend. But I am afraid of the chicken-and-egg problem: existing ML libraries default to pandas as the dataframe API. Does anyone have a workaround?

By @RandomCitizen12 - 8 months

The namesake: https://en.wikipedia.org/wiki/Eaton%27s_pintail

By @openplatypus - 8 months

Perfect timing. We are currently building a spike to validate migration to DuckDB. We are extremely optimistic based on community feedback.

By @BurnGpuBurn - 8 months

Today I learned:

SELECT 1 / 0 AS var_name;

yields a double with value infinite. Which is SQL spec. Must be fun times to actually use that :-)

By @lakomen - 8 months

"In process analytics database" none the wiser
