September 9th, 2024

DuckDB 1.1.0 Released

DuckDB 1.1.0, codenamed "Eatoni," introduces significant updates including new SQL functionalities, improved community extensions, and performance enhancements, aiming to enhance user experience and efficiency in data analysis.

DuckDB has announced the release of version 1.1.0, codenamed "Eatoni," which introduces several significant updates and features. It follows version 1.0.0, released three months earlier. The main changes fall into three groups: breaking SQL changes, community extensions, and performance improvements.

Notably, division by zero now returns infinity instead of NULL, in line with IEEE-754 floating-point semantics. The release also introduces community extensions, which make it easier for users to create and distribute their own extensions. New SQL features include a histogram function for data analysis, support for SQL variables, and the ability to unpack columns dynamically.

Performance optimizations include dynamic filter pushdown during joins, automatic materialization of common table expressions (CTEs), and parallel streaming queries, all of which speed up query execution. The update also improves the handling of foreign keys and parallelizes union-by-name operations. Overall, DuckDB 1.1.0 aims to improve both usability and performance for data analysis tasks.
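To make "dynamic filter pushdown during joins" more concrete, here is a minimal Python sketch of the general idea, not DuckDB's implementation. It assumes the probe side of a hash join is stored in row groups that carry min/max statistics for the join key (a zonemap). Once the build side's hash table is complete, its key range is pushed into the probe-side scan, so row groups that cannot possibly match are skipped. `RowGroup` and `hash_join_dynamic_filter` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """A chunk of probe-side rows plus precomputed min/max of the join key (a zonemap)."""
    rows: list[dict]
    key: str
    min_key: int = field(init=False)
    max_key: int = field(init=False)

    def __post_init__(self) -> None:
        keys = [r[self.key] for r in self.rows]  # assumes a non-empty group
        self.min_key, self.max_key = min(keys), max(keys)


def hash_join_dynamic_filter(build: list[dict], probe: list[RowGroup], key: str):
    """Inner hash join that derives a [lo, hi] key filter from the build side at run time.

    Returns (joined_rows, number_of_probe_row_groups_skipped)."""
    # Build phase: hash the (usually smaller) build side.
    table: dict[int, list[dict]] = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    if not table:
        return [], len(probe)  # empty build side: no probe row can match
    lo, hi = min(table), max(table)

    joined, skipped = [], 0
    for group in probe:
        # Dynamic filter: skip whole row groups whose key range misses [lo, hi].
        if group.max_key < lo or group.min_key > hi:
            skipped += 1
            continue
        for row in group.rows:
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined, skipped


orders = [{"cust": 7, "amount": 10}, {"cust": 8, "amount": 5}]
visits = [
    RowGroup([{"cust": c, "visit": c * 10} for c in range(start, start + 100)], "cust")
    for start in range(0, 1000, 100)
]
rows, skipped = hash_join_dynamic_filter(orders, visits, "cust")
print(len(rows), skipped)  # 2 9
```

In a real engine, the range check lets the scan avoid reading those row groups from storage at all. The sketch only shows the decision itself.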

- DuckDB 1.1.0, codenamed "Eatoni," has been released with numerous new features.

- Key changes include updated handling of division by zero and improved community extension capabilities.

- New SQL functionalities include histogram calculations and support for SQL variables.

- Performance enhancements focus on dynamic filtering, automatic CTE materialization, and parallel streaming.

- The release aims to improve user experience and efficiency in data analysis tasks.
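The histogram bullet can be illustrated with equi-width binning. This plain-Python sketch assumes that each bin is identified by its upper boundary and that a value lands in the first bin whose boundary is greater than or equal to it. Values above the last boundary go into an overflow bin keyed by infinity. The exact binning rules of DuckDB's `histogram` should be checked against its documentation. `equi_width_boundaries` and `histogram` here are hypothetical helpers.

```python
import math
from bisect import bisect_left


def equi_width_boundaries(lo: float, hi: float, bins: int) -> list[float]:
    """Upper boundaries of `bins` equally wide bins covering [lo, hi]."""
    width = (hi - lo) / bins
    # Use `hi` itself for the last boundary to avoid floating-point drift.
    return [lo + width * (i + 1) for i in range(bins - 1)] + [float(hi)]


def histogram(values, boundaries: list[float]) -> dict[float, int]:
    """Count each value in the first bin whose upper boundary is >= the value."""
    counts = dict.fromkeys(boundaries, 0)
    for v in values:
        i = bisect_left(boundaries, v)  # first index with boundaries[i] >= v
        key = boundaries[i] if i < len(boundaries) else math.inf  # overflow bin
        counts[key] = counts.get(key, 0) + 1
    return counts


print(histogram([1, 25, 26, 50, 99, 100, 101], equi_width_boundaries(0, 100, 4)))
# {25.0: 2, 50.0: 2, 75.0: 0, 100.0: 2, inf: 1}
```

The bisect lookup keeps per-value cost logarithmic in the number of bins, which matters little for four bins but keeps the sketch honest for larger ones.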

14 comments
By @wenc - 2 months
Lots of automatic performance optimizations on what is already a very fast engine. (I’ve stopped using Pandas)

I know most software folks feel some type of way about SQL (most don’t grok it beyond a simple SELECT) but this is one of the advantages of declarative languages like SQL and a plan-execute programming paradigm where a plan is created before instructions are run, making it amenable to plan optimization.

Maybe the syntax of the language could be improved (e.g., LINQ), but conceptually SQL is one case where we’ve historically blundered into the right thing. Data operations are often done on sets rather than in loops, and it’s a worthwhile investment for software engineers to learn to think this way if they want to work with data correctly at scale.

Stonebraker was right in that people who avoid SQL are doomed to reinvent it poorly.
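The plan-then-execute point can be shown with a toy example. The sketch below is hypothetical and far simpler than any real optimizer. A query is a small tree of `Scan`, `Filter`, and `Join` nodes, and a single rewrite rule (predicate pushdown) moves a filter that sits above a join down to the join input that owns the filtered column. Because the query only describes the result, the rewrite can shrink the join input without changing the answer.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Scan:
    rows: list[dict]


@dataclass
class Filter:
    column: str
    pred: Callable[[object], bool]
    child: "Plan"


@dataclass
class Join:
    key: str
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Filter, Join]


def columns(plan: Plan) -> set[str]:
    """Column names produced by a plan node."""
    if isinstance(plan, Scan):
        return set(plan.rows[0]) if plan.rows else set()
    if isinstance(plan, Filter):
        return columns(plan.child)
    return columns(plan.left) | columns(plan.right)


def push_down(plan: Plan) -> Plan:
    """Rewrite rule: move a Filter sitting on a Join into the join input owning its column."""
    if isinstance(plan, Join):
        return Join(plan.key, push_down(plan.left), push_down(plan.right))
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Join):
            if plan.column in columns(child.left):
                return Join(child.key, push_down(Filter(plan.column, plan.pred, child.left)), child.right)
            if plan.column in columns(child.right):
                return Join(child.key, child.left, push_down(Filter(plan.column, plan.pred, child.right)))
        return Filter(plan.column, plan.pred, child)
    return plan


def execute(plan: Plan, stats: dict) -> list[dict]:
    """Naive executor; counts how many rows flow into joins."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child, stats) if plan.pred(r[plan.column])]
    left, right = execute(plan.left, stats), execute(plan.right, stats)
    stats["join_input_rows"] = stats.get("join_input_rows", 0) + len(left) + len(right)
    index: dict = {}
    for rrow in right:
        index.setdefault(rrow[plan.key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[plan.key], [])]


users = Scan([{"uid": i, "country": "NL" if i % 10 == 0 else "US"} for i in range(100)])
events = Scan([{"uid": i % 100, "event": i} for i in range(1000)])
query = Filter("country", lambda c: c == "NL", Join("uid", users, events))

naive_stats, opt_stats = {}, {}
naive = execute(query, naive_stats)
optimized = execute(push_down(query), opt_stats)
print(naive == optimized, naive_stats, opt_stats)
# True {'join_input_rows': 1100} {'join_input_rows': 1010}
```

Real optimizers apply many such rules plus cost estimates. The point here is only that a declarative plan can be rewritten before any row is touched.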

By @simonw - 2 months
I feel like R-Tree spatial indexes are potentially the most exciting new feature in this release, but they're buried right down at the bottom of the announcement: https://duckdb.org/2024/09/09/announcing-duckdb-110.html#r-t...
By @ashkankiani - 2 months
Love the expanded C API support! Also, those performance improvements are massive! Filter pushdown and the streaming optimization for fetchone() are great! This makes it more viable to use DuckDB for smaller queries from Python.

I'm pretty excited for variables too! I really wanted them for when I'm using the CLI. Same with query/query_table! I appreciate the push for features that make people's lives easier while also still improving performance.

Everyone I've introduced DuckDB to (at work or outside of work) is eventually blown away (some still have a lingering SQL stigma).
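The fetchone() point is about streaming. If results are pulled lazily, fetching the first row only costs the work needed to produce that row, while a fully materialized result pays for every row up front. The Python sketch below uses generators to show the difference. `StreamingCursor`, `MaterializingCursor`, and `expensive_query` are hypothetical stand-ins, not DuckDB's Python API.

```python
from typing import Iterable, Iterator, Optional


def expensive_query(n: int, produced: list[int]) -> Iterator[tuple[int, int]]:
    """Hypothetical query pipeline; produced[0] counts rows actually computed."""
    for i in range(n):
        produced[0] += 1
        yield (i, i * i)


class StreamingCursor:
    """Pulls rows on demand: fetchone() runs the pipeline just far enough for one row."""

    def __init__(self, rows: Iterable[tuple]) -> None:
        self._it = iter(rows)

    def fetchone(self) -> Optional[tuple]:
        return next(self._it, None)


class MaterializingCursor:
    """Computes the whole result when created, then hands rows out one at a time."""

    def __init__(self, rows: Iterable[tuple]) -> None:
        self._rows = list(rows)
        self._pos = 0

    def fetchone(self) -> Optional[tuple]:
        if self._pos >= len(self._rows):
            return None
        self._pos += 1
        return self._rows[self._pos - 1]


lazy_count, eager_count = [0], [0]
StreamingCursor(expensive_query(100_000, lazy_count)).fetchone()
MaterializingCursor(expensive_query(100_000, eager_count)).fetchone()
print(lazy_count[0], eager_count[0])  # 1 100000
```

The trade-off is that a streaming cursor keeps the query pipeline alive between calls, whereas the materialized version can release it immediately.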

By @ZeroCool2u - 2 months
Damn, GeoParquet and R-Tree for spatial indexes is huge!!! ESRI better watch their back!
By @adwf - 2 months
Big fan of DuckDB!

It has saved us a number of times when we've had to deploy at a remote client with limited on-prem customisation for security reasons (i.e., no installing a big Postgres or other RDBMS solution).

Powerful tooling; all local to the environment and the data being worked on; SQL, so it's pretty close to a drop-in replacement compared to our old solution. Really great stuff and I was very happy to see the project gain the confidence to hit 1.0 a while back and now 1.1.

Congrats to everyone!

By @beingflo - 2 months
I've been eyeing DuckDB for a metric collection hobby project. A quick benchmark showed promising query performance compared to SQLite (unsurprising, since DuckDB is column-oriented), but quite a bit slower inserts. Does anyone have experience using it as an "online" backend DB, as opposed to a data analytics engine for interactive use? From what I gather, they are positioning themselves more for the latter use case.
By @log4shell - 2 months
Congratulations to duckdb team! Can't wait to try some of the newly released features and performance improvements.

I am quite curious about the plans for a Python DataFrame-like API for DuckDB, and for the Python ecosystem in general.

By @tlavoie - 2 months
This episode of Kris Jenkins' Developer Voices podcast talks with a couple of authors of a new book on DuckDB, and does a great job of explaining what makes it so unusual: https://www.youtube.com/watch?v=_nA3uDx1rlg
By @fforflo - 2 months
The C extensions API is a big, big, very big thing. As someone who routinely writes small PG extensions, I'd love to be able to use more or less the same code for multiple backend DBs. And I guess a lot of inspiration has come from all the efforts to embed DuckDB in Postgres as a mini-OLAP engine.
By @mrbonner - 2 months
I wanted to use Ibis with DuckDB as the backend, but I'm afraid of the chicken-and-egg problem: existing ML libraries default to pandas as the DataFrame API. Does anyone have a workaround?
By @RandomCitizen12 - 2 months
By @openplatypus - 2 months
Perfect timing. We are currently building a spike to validate migration to DuckDB. We are extremely optimistic based on community feedback.
By @BurnGpuBurn - 2 months
Today I learned:

SELECT 1 / 0 AS var_name;

yields a DOUBLE with the value infinity, which follows the IEEE-754 spec. Must be fun times to actually use that :-)
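For reference, the IEEE-754 rules behind this change are simple to state. A nonzero number divided by zero gives an infinity whose sign is the XOR of the operands' signs (so a divisor of -0.0 matters), and 0/0 gives NaN. The SQL standard, by contrast, treats division by zero as an error. Python's own `/` raises ZeroDivisionError, so the sketch below spells the IEEE rules out explicitly. The `legacy_null` flag is a hypothetical stand-in for DuckDB's earlier behavior of returning NULL (modelled here as None).

```python
import math
from typing import Optional


def divide(a: float, b: float, legacy_null: bool = False) -> Optional[float]:
    """Float division with IEEE-754 results for a zero divisor.

    legacy_null=True models the older NULL-on-division-by-zero behavior (returns None)."""
    if b != 0:
        return a / b  # ordinary case (also covers NaN divisors, which yield NaN)
    if legacy_null:
        return None
    if a == 0 or math.isnan(a):
        return math.nan  # 0/0 and NaN/0 are NaN
    # Infinity whose sign is sign(a) XOR sign(b); copysign picks up the sign of -0.0.
    return math.copysign(math.inf, a) * math.copysign(1.0, b)


print(divide(1, 0), divide(-1, 0), divide(1, -0.0), divide(0, 0), divide(1, 0, legacy_null=True))
# inf -inf -inf nan None
```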

By @lakomen - 2 months
"In-process analytics database." I'm none the wiser.