July 22nd, 2024

Is an All-in-One Database the Future?

Specialized databases are emerging to tackle complex data challenges, leading to intricate infrastructures. A universal, all-in-one database remains unfulfilled due to optimization issues and unique challenges of different database types.

Is an All-in-One Database the Future?

As data challenges become more complex, specialized databases have emerged to address specific problems, leading to intricate data infrastructures often reliant on third-party data pipelines and event streaming products. Companies typically utilize multiple databases to meet diverse needs, resulting in a complicated network of interconnected systems. The pursuit of a generalist, all-in-one database that is scalable, efficient across various contexts, and commercially viable remains unfulfilled. Different database types, including relational OLTP, non-relational document stores, and memory-based caches, are optimized for particular use cases, each facing unique challenges that hinder effective consolidation. While Object-Relational Mappers (ORMs) aim to simplify database interactions, they struggle with non-relational types, highlighting the difficulty of creating a unified solution. Postgres is noted for its extensibility through plugins, making it a close contender for a comprehensive solution, yet it still does not fully meet the criteria of a one-stop-shop. The technical and practical challenges associated with developing an all-in-one database render it impractical, maintaining the modern data stack as a complex necessity. Theoretically, a universal database would face issues with optimization, data model overhead, and latency, as each type is designed for specific applications, making a singular solution inefficient. Although Postgres can be enhanced with plugins like pg_vector for vector search, it is not designed to address every problem effectively, as evidenced by the complex data stacks in organizations utilizing it.

Link Icon 5 comments
By @iamlucaswolf - 6 months
I think this question mixes up two orthogonal dimensions: transactionality (i.e., OLTP vs. OLAP) and data model (relational, document-based, graph, vector, ...).

Regarding transactionality: There is an entire area of research on "hybrid transactional and analytical processing" (HTAP) systems that unifies OLAP and OLTP systems. Hyper [1] pioneered this path at TU Munich, it's successor Umbra [2] recently incorporated as CedarDB [3]. There are lots of others. Most of these systems, AFAIK, are relational.

Regarding data model: What we've seen in the past few decades is that non-relational DBMS (excluding key-value stores) only make sense in rare edge cases that require huge scale. There has, e.g. been research [4] that shows that graph databases are still, well, lacking, compared to relational systems. The common pattern seems to be: unless you need to service very specific workloads at huge scales, SQL is probably enough [5]. Then again, it really comes down to intrinsics. If you were to, for example, implement distributed locking using Postgres, you would likely run into problems with MVCC and Xids very quickly.

So, as you already mentioned, there is no silver bullet. But even today, unless you are Meta or Google, SQL is probably enough for a long time and lots of use cases.

(Full disclosure: I'm working on Hyper full-time).

[1]: https://hyper-db.de/ [2]: https://umbra-db.com/ [3]: https://cedardb.com/ [4]: https://homepages.cwi.nl/~boncz/edbt2022.pdf [5]: https://www.youtube.com/watch?v=VxKt245X_ws

By @Etai - 6 months
By @lifeinthevoid - 6 months
It all depends on the use-case, a generalist database can never achieve the performance of specialized databases once the workload gets large enough. It's like asking for a car that's great off-road but also beats an F1 car on the circuit.
By @tacostakohashi - 6 months
Hmm, ever used Oracle?

I'm not here to defend it, but in practical terms, it does everything anyone is likely to need out of the box. Relational, XML, JSON, replication, document...

By @yen223 - 6 months
I would be curious to know if anyone tried using one of the newer data-warehouse databases (Clickhouse, Snowflake, etc) as an OLTP database, to run their crud app.