January 1st, 2025

Databases in 2024: A Year in Review

In 2024, Redis Ltd. and Elastic N.V. changed their licensing models amid competition, while Databricks and Snowflake intensified rivalry, focusing on ecosystem integration. DuckDB gained popularity for analytics.

In 2024, the database landscape changed significantly, particularly in licensing and in competition among major players. Redis Ltd. moved from the permissive BSD-3 license to a dual source-available license model, prompting backlash and the emergence of forks like Valkey and Redict. This shift was part of Redis Ltd.'s strategy to consolidate control and prepare for an IPO. Elastic N.V. moved the other way: having dropped its open-source license for a dual source-available model in 2021, it returned Elasticsearch to open source by adding AGPL as a license option, following competitive pressure from Amazon's OpenSearch. The rivalry between Databricks and Snowflake intensified, with both companies investing heavily in open-source large language models (LLMs) and competing for dominance in data management ecosystems. Databricks acquired Tabular for $2 billion, while Snowflake announced its Polaris catalog service. The competition has shifted from raw performance to the broader ecosystem surrounding databases, emphasizing compatibility and user experience. Additionally, DuckDB emerged as a popular choice for analytical queries, with various extensions developed to integrate it into existing systems like Postgres. Overall, the year highlighted the challenges that open-source database vendors face against cloud giants and the evolving dynamics of database technology.

- Redis Ltd. faced backlash after changing its license, leading to the creation of forks.

- Elastic N.V. returned Elasticsearch to an open-source license by adding AGPL as an option, amid competition with Amazon.

- Databricks and Snowflake intensified their rivalry, focusing on ecosystem integration and open-source LLMs.

- DuckDB gained popularity for analytical queries, with extensions for Postgres released.

- The database market is increasingly influenced by cloud vendors, challenging open-source ISVs.

54 comments
By @atombender - 4 months
The article mentions Greenplum, but it's worth noting that when the code was closed, several of the original developers created an open-source fork, Cloudberry, which seems to be thriving. Cloudberry was accepted into the Apache Incubator this year, and has synced with Postgres 14, whereas the closed-source Greenplum is still stuck on Postgres 12.

The architecture is quite ancient at this point, but I'm not sure it's completely outdated. It's single-master shared-nothing, with shards distributed among replicas, similar to Citus. But the GPORCA query planner is probably the most advanced distributed query planner in the open source world at this point. From what I know, Greenplum/Cloudberry can be significantly faster than Citus thanks to the planner being smarter about splitting the work across shards.

By @mrtimo - 4 months
Enjoyed his roundup in the "Shoving Ducks into Everything" section.

DuckDB is a great tool. In April 2020, the creator of DuckDB gave a talk at CMU. At the beginning he makes a convincing argument (in 5 minutes) for why data scientists don't use RDBMSs and how this was the genesis of DuckDB. Here is a video that starts 3 minutes into the talk (where his argument starts): https://youtu.be/PFUZlNQIndo?si=ql9n2QuBlAEuGIqo&t=204

By @gigatexal - 4 months
This take screams of something more personal than technical criticism. “I'll be blunt: I don't care for Redis. It is slow, it has fake transactions, and its query syntax is a freakshow. Our experiments at CMU found Dragonfly to have much more impressive performance numbers (even with a single CPU core). In my database course, I use the Redis query language as an example of what not to do.” (From the article)

Of course it’s not meant to be used as a general-purpose DB; it’s keys and values, used for caches and things like that. In my experience, in real-world scenarios and loads, vanilla single-threaded Redis is stable, fast, and nigh bulletproof.

By @codeulike - 4 months
Weird how SQL Server and its Azure variants get no mention. It dominates in certain sectors. DB-Engines ranks it third most popular overall: https://db-engines.com/en/ranking
By @PeterZaitsev - 4 months
I think one thing Andy misses about why people were pissed about Elastic and Redis, but not as much about MongoDB and some others, is their license and the size of their contributor community.

When the original license is as restrictive as AGPL, it is unlikely there is much embedded use... so fewer people are impacted in a truly catastrophic way.

Also, if there is no contributor community to speak of... who is going to do the fork?

I put some thoughts about it in my post about ScyllaDB https://peterzaitsev.com/thoughts-on-scylladb-license-change...

By @antirez - 4 months
Wow, the reasons why the Redis command API sucks in Andy's video (linked in the post) are the weakest ever. It is possible to make a case against the Redis API (I would not agree of course but... it's totally legitimate), but you gotta have stronger arguments than those, particularly if you are a teacher of some kind. Especially: you need to be somewhat fluent in Redis and how developers use Redis in order to understand why so many people like it, and then elaborate on what is wrong with it (if you believe there is something wrong). The video shows a general feeling of "I don't really use / know this, but I don't like how NON-SQL it is".
By @bcoates - 4 months
"I've never met anybody that used Alteryx"

I have! It's a pretty good no-code/minimal-code graphical ELT+Analytics in one tool. It's one of those alternate-universe tools that has its own way of doing things, different from everything else in the industry, but it’s pragmatic and the people who use it tend to love it.

The one thing that makes it viable is that it has/had (pre-acquisition) very aggressive compatibility with anything else that can hold data, so you can use it as a bolt-on to whatever other databases or files your company has.

Despite what the PE press release about the acquisition says, it has virtually nothing to do with AI, at least in the modern big NN sense.

If you're looking to fix your giant pile of Alteryx workbooks or migrate them to something else, hmu

By @mebcitto - 4 months
A couple of spicy things:

> OtterTune. Dana, Bohan, and I worked on this research project and startup for almost a decade. And now it is dead. I am disappointed at how a particular company treated us at the end, so they are forever banned from recruiting CMU-DB students. They know who they are and what they did.

Ouch.

> Lastly, I want to give a shout-out to ByteBase for their article Database Tools in 2024: A Year in Review. In previous years, they emailed me asking for permission to translate my end-of-year database articles into Chinese for their blog. This year, they could not wait for me to finish writing this one, so they jocked my flow and wrote their own off-brand article with the same title and premise.

Also sounds like he's preparing a new company:

> I hope to announce our next start-up soon (hint: it’s about databases).

By @ak_111 - 4 months
Wow his database startup that raised 12M died this year after only three years.

If anything this shows how insanely difficult it must be to succeed as a database startup (when was the most recent startup success in this space?), as the founding team is stellar.

On the other hand, I am surprised it died this quickly and would be interested to know if they did a proper postmortem. Not only did they raise way more than is needed to survive for three years, but the idea is about utilising AI to improve DB performance, and I find it hard to imagine they couldn't find more investors to throw them a lifeline with all the AI hype.

By @dig1 - 4 months
> There was no major effort to fork off MongoDB, Neo4j, Kafka, or CockroachDB when they announced their license changes.

AFAIK people didn't take MongoDB seriously from the start, especially with the "web scale database" joke circulating. The Neo4j Community version has been under GPLv3 for quite some time, while the Enterprise version has always been somewhat closed, regardless of whether the source code was available on GitHub (the mentioned license change affected the Enterprise version).

Regarding CockroachDB, I must admit that I've only heard about it on HN and don't know anyone who seriously uses it. As for Kafka, there are two versions: Apache Kafka, the open-source version that almost everyone uses (under the Apache license), and Confluent Kafka, which is Apache Kafka enhanced with many additional features from Confluent, and the license change affected Confluent Kafka. In short, maybe the majority simply didn't care about these projects very much, so there is no major fork.

> It cannot be because the Redis and Elasticsearch install base is so much larger than these other systems, and therefore, there were more people upset by the change since the number of MongoDB and Kafka installations was equally as large when they switched their licenses.

I can’t speak for MongoDB, but the Confluent Kafka install base is significantly smaller than that of Apache Kafka, Redis and ES.

> Dana, Bohan, and I worked on this research project and startup for almost a decade. And now it is dead. I am disappointed at how a particular company treated us at the end, so they are forever banned from recruiting CMU-DB students. They know who they are and what they did.

Call me a skeptic, but I can't see this as a fair approach. If your company fails for whatever reason, you should not recruit the university department/group/students against your peers (I can't find that CMU-DB was one of the founders of OtterTune).

Wrt Andy, here are some somewhat interesting views [1] from (presumably) former employees.

[1] https://www.reddit.com/r/Database/comments/1dgaazw/comment/l...

By @nwatson - 4 months
Interesting seeing the death of blockchain-based AWS QLDB mentioned.

I worked at a company for a while that used QLDB as the primary system of record. The idea is great, but the problem is that, due to performance and other QLDB limitations, all data had to be mirrored to an RDBMS via a streaming/queuing system, and there were always programmatic errors in interpreting data arriving for import into the RDBMS: text field too long for the RDBMS field; wrong data type or overflowing integer; invalid text encoding; etc. These errors had to be noticed, debugged, and fixed, and the data had to be re-streamed. In the meantime, official transactions were missing from the RDBMS side, which was used for reporting, driving the UI, deriving monetary obligations, etc. It was not worth the trouble. (I was lucky to not be involved in that design or implementation.)

By @softwaredoug - 4 months
On the “Amazon can just offer your DB as a service”

Yes this can happen. But a lot of people don’t want an AWS-managed service. They're like 30% cheaper for 30% less value. They can develop a bad reputation and feel like weird forks (Kinesis vs Kafka) that have weird undocumented gotchas and edge cases that never get fixed. Many teams want to host on k8s anyway, and you’ll probably have better k8s support from the main project. Another example is the success of Flink over hosted Google Dataflow. Seems eventually the teams I know trend to the most mainstream OSS implementation over time, maybe after early prototyping on a managed system.

IMO it might not be the highest growth market anymore. Those who want to pay for a managed service will. But many are just figuring out a k8s based solution to their infra needs as k8s knowledge becomes more ubiquitous.

By @polishdude20 - 4 months
After interviewing at OtterTune a while back and being bombarded with multiple rounds of leetcode questions, I somehow knew OtterTune wouldn't make it.
By @the_arun - 4 months
This person started with news on DB - reviewing all prominent DBs & finally ended talking about love of Larry Ellison. A perfect human in the days of LLMs. Amazing write up.
By @Upvoter33 - 4 months
Pretty funny.

One factual issue: "The university had previously announced that this player was transferring from Louisiana State to Michigan." This is not true. Underwood had committed to LSU but then switched his commitment to Michigan. He was still in high school at the time, and has never attended LSU.

But, do you really expect a funny database prof to know much about football?

By @m_ke - 4 months
Andy is a treasure, if only we had more professors like him
By @ksec - 4 months
>Six years after MySQL v8 went GA, the team turned v9 out on the streets. ......Oracle is putting all its time and energy into its proprietary MySQL Heatwave service.

Oracle actually released 9.1 already in 2024. [1] And expect another release this month, and every quarter. So I think MySQL continues to get new features, bug fixes, and support like it used to, contrary to what most people think, which is that it is all going to Heatwave. I just hope Vector will later be open-sourced as an official part of MySQL rather than kept behind Heatwave.

[1] https://dev.mysql.com/doc/relnotes/mysql/9.1/en/news-9-1-0.h...

By @badindentation - 4 months
The section on Larry Ellison is amusing.
By @maeil - 4 months
Good read!

> Postgres' support for extensions and plugins is impressive. One of the original design goals of Postgres from the 1980s was to be extensible. The intention was to easily support new access methods and new data types and operations on those data types (i.e., object-relational). Since 2006, Postgres' "hook" API. Our research shows that Postgres has the most expansive and diverse extension ecosystem compared to every other DBMS.

Greenhorn developers don't even know that there are non-Postgres databases which have extensions too - such is the gap! I wouldn't be surprised if Postgres had as many as all others combined.

By @RedShift1 - 4 months
I've been using plain postgres for over 5 years now, reading this I feel like I'm in the eye of a storm...
By @CT4u8798 - 4 months
I love SQL. I'm not a full-time developer but always use SQL over other abstractions, which I find extremely confusing and way more complicated than plain SQL.
By @memhole - 4 months
Love the style! CMU making databases cool. Sorry to hear about OtterTune.
By @kwillets - 4 months
I spent the past year puzzling over the DB market as well, but I don't feel like I'm much closer to understanding it.

It appears that a lot of attention is now directed at the folks doing 100 MB queries, and the high end has moved past everybody's radar. My idea of an exciting product is Ocient, who have skipped over Cloud and gone for hyperscale on-prem hardware. Yellowbrick is also a contender here.

I have a lot of experience with Vertica, and they seem to have gotten stuck in this niche as well, with sales tilted towards big accounts, but less traction in smaller shops, and a difficult road to get a SaaS or similar easy-start offering.

There's a crossover point where self-managed is cheaper than cloud, but nobody seems to have any idea where it is. Snowflake will gladly tell you that your sub-$1M Vertica cluster should be replaced by $10M of sluggish SaaS, and that you are saving money by doing so. These decisions seem more in the realm of psychology or political science.

DHH's cloud exit was a refreshing take on the expense issue, even if it wasn't strictly in the database space -- the cost per VCPU and so forth that he documented is a good start for estimating savings, and he debunked a lot of the "hidden costs" that cloud maximalists claim.

In the business/financial space the biggest news to me was the correction in Snowflake's stock price, which seemed to indicate that investors were finally noticing metrics like price-performance, but they added a little more AI and went back into irrationality.

I'm heavily in favor of DuckDB, Hudi, Iceberg, S3 tables, and the like. Mixing high-end and low-end tools seems like the best strategy (although settling on one high-end DWH has also worked IME), and the low end is getting better and cheaper, squeezing out the mid-range SaaS vendors.

In research I found Goetz Graefe's work in offset-value coding exciting -- he's wired it into query operators in a way that saves a lot of CPU on sorting and joins/aggregation. This is a technique that I've applied favorably in string sorting, and it was discovered in the DB community decades ago but largely forgotten. (This work precedes 2024, but I'm a slow study.)
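
For readers who haven't seen the technique, here is a rough sketch of the basic idea in Python. It assumes rows are fixed-length tuples sorted ascending and that both rows being compared sort at or after a shared base row; the function names `ovc` and `less_than` are made up for illustration and are not from Graefe's papers. Each row is coded as (offset of the first column where it differs from the base, value at that column), and most comparisons are then decided from the codes alone, without touching the rows.

```python
def ovc(base, row):
    """Offset-value code of `row` relative to `base`.

    Assumes base <= row and equal length. The offset is the length of the
    prefix shared with the base; the value is the row's column at that
    offset. A row equal to the base gets (len(row), None).
    """
    for i, (b, r) in enumerate(zip(base, row)):
        if b != r:
            return (i, r)
    return (len(row), None)


def less_than(a, code_a, b, code_b):
    """Return True if row a sorts strictly before row b.

    Both codes must be relative to the same base, with base <= a and
    base <= b. Full column comparison happens only on a code tie.
    """
    off_a, val_a = code_a
    off_b, val_b = code_b
    if off_a != off_b:
        # The row sharing the longer prefix with the (smaller) base is smaller.
        return off_a > off_b
    if off_a == len(a):
        # Both rows are equal to the base, hence equal to each other.
        return False
    if val_a != val_b:
        # Same offset: the value at that offset decides.
        return val_a < val_b
    # Codes tie: fall back to comparing the remaining columns.
    return a[off_a + 1:] < b[off_b + 1:]


base = (1, 2, 3)
a, b = (1, 2, 5), (1, 4, 0)
# Decided by the offsets alone: a shares a longer prefix with the base.
print(less_than(a, ovc(base, a), b, ovc(base, b)))
```

Real implementations usually pack the code into a single integer so that one integer comparison does the work; the tuple form here is only meant to make the logic readable.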

By @quotemstr - 4 months
There's one QOL extension that I haven't seen anyone else implement: dimensional analysis. I can declare a column is an integer. Why not an integer that expresses feet? Why shouldn't I be able to write SELECT 1inch + 1cm and get a correctly computed length? Why can't the query parser help me avoid nonsense like SELECT 1kg + 1hr? All this stuff is pretty straightforward to add and would help avoid avoidable mistakes.
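
A minimal sketch of what such checking could look like, written in Python rather than as a database extension. Everything here is hypothetical: the `Quantity` class, the `q` helper, and the `UNITS` table are invented for illustration and do not correspond to any existing DBMS feature. Values are stored in SI base units alongside a dimension vector, so adding lengths works across units while adding a mass to a time is rejected.

```python
from dataclasses import dataclass

# Dimensions as exponents of (length, mass, time).
LENGTH = (1, 0, 0)
MASS = (0, 1, 0)
TIME = (0, 0, 1)

# Hypothetical unit table: name -> (factor to the SI base unit, dimension).
UNITS = {
    "inch": (0.0254, LENGTH),
    "cm": (0.01, LENGTH),
    "ft": (0.3048, LENGTH),
    "kg": (1.0, MASS),
    "hr": (3600.0, TIME),
}


@dataclass(frozen=True)
class Quantity:
    value: float  # magnitude in SI base units (metres, kilograms, seconds)
    dim: tuple    # exponents of (length, mass, time)

    def __add__(self, other):
        # Addition is only meaningful between identical dimensions.
        if self.dim != other.dim:
            raise TypeError(f"cannot add dimensions {self.dim} and {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __mul__(self, other):
        # Multiplication adds the dimension exponents (e.g. length * length = area).
        dim = tuple(x + y for x, y in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)

    def to(self, unit):
        # Convert back to a named unit, checking that the dimension matches.
        factor, dim = UNITS[unit]
        if dim != self.dim:
            raise TypeError(f"cannot express dimension {self.dim} in {unit}")
        return self.value / factor


def q(amount, unit):
    """Build a Quantity from an amount and a unit name in UNITS."""
    factor, dim = UNITS[unit]
    return Quantity(amount * factor, dim)


# The analogue of SELECT 1inch + 1cm: roughly 3.54 when expressed in cm.
print((q(1, "inch") + q(1, "cm")).to("cm"))

# The analogue of SELECT 1kg + 1hr: rejected before any arithmetic happens.
try:
    q(1, "kg") + q(1, "hr")
except TypeError as err:
    print("rejected:", err)
```

In a database the same check would presumably live in the type system or query analyzer, so the nonsense query fails at planning time rather than at runtime.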
By @phartenfeller - 4 months
There is a lot about Larry Ellison, but not a single word about this year's rather big Oracle release (23ai)?
By @dandan7 - 4 months
Thanks for the article. On the topic of redis: 3 executives from redis built FalkorDB (succeeded redisgraph) raising 3m to build a graphdb for better rag (ref:https://github.com/FalkorDB/GraphRAG-SDK)
By @roark_howard - 4 months
DuckDB dominating over DataFusion could fuel the ongoing language war with a great half baked argument!
By @wahnfrieden - 4 months
2024 was also the year that Realm died
By @hdesh - 4 months
> The upcoming year is going to be the test of strength for many database startups. Nobody wants to be the next MariaDB Corporation,

The link for "MariaDB corporation" points to an empty image with white colour background. Can anyone explain the context here?

By @rednafi - 4 months
Loved the overview. Hated the shade toward Redis. Redis has arguably the best key-value query syntax, and there’s a reason so many people swear by it. True, the decision-makers at Redis Ltd are absolute pieces of trash, but Redis itself is a delightful piece of engineering.

I don’t care about the billion-dollar drama behind a piece of tech, but Redis defined the key-value query API for many similar databases. Trashing it just because it isn’t SQL-like feels unjustified.

By @samanthasu - 4 months
would love to see Andy's take on GreptimeDB https://github.com/GreptimeTeam/greptimedb
By @rozenmd - 4 months
I loved ottertune, it's a shame it died the way it did.
By @sigbottle - 4 months
These year in review posts are really neat, I really liked the AI in review posts.

Maybe algorithms review or TCS review or some specific math topic review next?

By @uncomplexity_ - 4 months
thank you for your work andy!

for a moment i got reminded of the rap music in your courses

im glad that tigerbeetle got here, really impressive team they have.

there are a lot of other missing alien technologies i've discovered recently too like quickwit which is like elasticsearch but s3-compatible, and typesense which is like elasticsearch but memory-based

By @travisgriggs - 4 months
The funding section had me thinking “one of these is not like the others”. Both the amount and count of successive rounds.
By @lvl155 - 4 months
I highly recommend their Youtube series on databases. They have great guest speakers.
By @pbrunoster - 4 months
oracle, sqlserver, db2, informix, teradata, for example, do not exist ... ok
By @osigurdson - 4 months
More like: "Database license drama - a year in review".
By @zhousun - 4 months
It's such an honor that our https://pgmooncake.com/ is covered in the review!

A little sad Andy didn't share more of his thoughts on the intersection between Data and AI, and how that's going to evolve.

By @kopirgan - 4 months
Wow, so much to learn reading this. Thanks
By @bionhoward - 4 months
Redis is slow?
By @wslh - 4 months
Great heads up. I wonder about graph databases. He mentioned <https://umbra-db.com/> and <https://cedardb.com/> both include the graph use case and I wonder how they compare to <https://neo4j.com/>.
By @mihirrd - 4 months
Quite informative
By @Beefin - 4 months
TL;DR SQL is king
By @swyx - 4 months
> I need to figure out to juice my stats because in September 2024, Wikipedia removed the article about me over not having enough citations.

guys, what are we doing here. this is ridiculous. andy pavlo cannot get an article on wikipedia? have you seen his work?