August 26th, 2024

Predicting the future of distributed systems

Object storage is increasingly integrated into transactional and analytical systems, enhancing reliability. Future programming models may shift code management to infrastructure, despite skepticism about the sustainability of emerging technologies.

[...] significant ways to reduce perceived risks and demonstrate clear value. The evolution of distributed systems is marked by the integration of object storage into transactional and analytical frameworks, which is seen as a step-change in value. However, the adoption of new programming models remains challenging due to the perception of high investment risks and the difficulty in identifying one-way-door versus two-way-door decisions. Object storage has matured and is increasingly utilized across various systems, offering features that enhance reliability and simplicity. The future of programming models may involve a shift towards extracting code from applications into infrastructure, allowing for better security, management, and portability. This transition could lead to a more efficient development process, where auxiliary code is managed independently, thus reducing the burden on developers. Despite the potential benefits, skepticism remains regarding the longevity and viability of emerging technologies, as well as the control engineers have over their stacks. The path forward is complex, but the opportunities for innovation in distributed systems and programming models are significant.

- Object storage is becoming integral to both transactional and analytical systems.

- The distinction between one-way-door and two-way-door decisions is crucial for technology adoption.

- New programming models may shift code management from applications to infrastructure.

- There is skepticism about the sustainability of emerging technologies in distributed systems.

- The evolution of programming models could enhance security and portability of business logic.

5 comments
By @leventov - 5 months
I don't buy this "object storage + Iceberg is all you need for OLAP" hype. If the application has sustained query load, it makes sense to provision servers rather than go pure serverless. If there are already provisioned servers, it makes sense to cache data on them (either in their memory or SSDs) to avoid round-trips to object storage. This is the architecture of the latest breed of OLAP databases such as Databend and Firebolt, as well as the latest iterations of Redshift's and Snowflake's architecture. Also, this is the approach of the newest breed of vector databases, such as Turbopuffer.
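The caching argument above can be illustrated with a minimal read-through cache. This is only a sketch under stated assumptions: `FakeObjectStore` and `ReadThroughCache` are hypothetical names, the store is an in-memory stand-in rather than a real object-storage client, and the LRU eviction policy is an illustrative choice, not how any of the databases named above actually implement their caches.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store; counts round-trips."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.get_calls = 0

    def get(self, key):
        self.get_calls += 1  # each call models one network round-trip
        return self._objects[key]


class ReadThroughCache:
    """Byte-bounded LRU cache placed in front of the object store."""

    def __init__(self, store, capacity_bytes):
        self._store = store
        self._capacity = capacity_bytes
        self._entries = OrderedDict()  # key -> bytes, least recently used first
        self._used = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = self._store.get(key)  # cache miss: go to object storage
        self._admit(key, data)
        return data

    def _admit(self, key, data):
        if len(data) > self._capacity:
            return  # too large to cache; serve it without storing
        while self._used + len(data) > self._capacity:
            _, evicted = self._entries.popitem(last=False)  # drop LRU entry
            self._used -= len(evicted)
        self._entries[key] = data
        self._used += len(data)
```

Repeated reads of a hot object are served from the server's memory, so only cold or evicted objects pay the round-trip to storage.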

For OLAP use cases with real-time data ingestion requirements, an object-storage-only approach also leads to write amplification. Therefore, I don't think that architectures like Apache Pinot, Apache Paimon, and Apache Druid are going anywhere.
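To make the write-amplification point concrete, here is a toy model, assuming small ingested batches are each written as one immutable file and that every `fanout` files at a level are merged into one file at the next level. The function name and the tiered-merge policy are assumptions for illustration; real table formats and compaction strategies differ.

```python
def write_amplification(num_batches, batch_bytes, fanout):
    """Ratio of bytes written to storage over bytes ingested (toy model)."""
    if num_batches == 0:
        return 0.0
    levels = [[]]  # levels[i] holds the sizes of files currently at level i
    ingested = 0
    written = 0
    for _ in range(num_batches):
        ingested += batch_bytes
        written += batch_bytes  # initial write of the small file
        levels[0].append(batch_bytes)
        i = 0
        # Merge whenever a level accumulates `fanout` files.
        while len(levels[i]) >= fanout:
            merged = sum(levels[i])
            levels[i] = []
            written += merged  # the merge rewrites every byte it touches
            if i + 1 == len(levels):
                levels.append([])
            levels[i + 1].append(merged)
            i += 1
    return written / ingested
```

In this model, 64 batches with a fanout of 4 pass through three merge levels, so each ingested byte is written four times in total. Smaller, more frequent batches add levels and push the ratio higher.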

Another problem with "open table formats" like Iceberg, Hudi, and Delta Lake is their slow innovation speed.

I've recently argued about this at greater length here: https://engineeringideas.substack.com/p/the-future-of-olap-t...

By @jschrf - 5 months
I dunno, Postgres seems pretty solid.

By @seattleeng - 5 months
A new programming model for distributed computing is desperately needed. Something like a full operational system a la Temporal, but without the extreme operational overhead, plus a sane cooperative runtime like Golang's.

I think we're probably too early to build this today. Ray is used at my current job for scaling subroutines in our distributed job system. It's the closest I've seen.

By @jamesblonde - 5 months
This was on the front page 3 days ago #deng

https://news.ycombinator.com/item?id=41363499

By @gslaller - 5 months
Having only superficial knowledge of data storage/DBMS, perhaps I am just spewing nonsense. But I always imagine that a compaction layer for long-tail/old data should be standard, where less-accessed data (touched perhaps once a month) is pushed to S3-like storage that is still queryable.
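The tiering idea in this comment can be sketched as follows. This is a minimal illustration, not a real DBMS design: `TieredTable`, its two in-memory lists, and the timestamp cutoff are all hypothetical stand-ins, with the `cold` list playing the role of S3-like storage.

```python
from dataclasses import dataclass, field


@dataclass
class TieredTable:
    """Rows are (timestamp, value) tuples split across a hot and a cold tier."""

    hot: list = field(default_factory=list)   # stands in for local SSD/memory
    cold: list = field(default_factory=list)  # stands in for S3-like storage

    def insert(self, ts, value):
        self.hot.append((ts, value))

    def compact(self, cutoff_ts):
        """Move rows older than cutoff_ts to the cold tier; return how many moved."""
        old = [row for row in self.hot if row[0] < cutoff_ts]
        self.hot = [row for row in self.hot if row[0] >= cutoff_ts]
        self.cold.extend(old)
        self.cold.sort()  # keep cold data ordered by timestamp
        return len(old)

    def query(self, start_ts, end_ts):
        """Scan both tiers, so compacted rows stay queryable."""
        rows = [r for r in self.cold + self.hot if start_ts <= r[0] < end_ts]
        return sorted(rows)
```

A query that spans the cutoff returns the same rows before and after `compact` runs; only the tier each row is read from changes.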