August 27th, 2024

Predicting the Future of Distributed Systems

Object storage is increasingly integrated into transactional and analytical systems, enhancing reliability. Organizations face challenges in adopting new programming models due to perceived investment risks and uncertainty about technology longevity.

The evolution of distributed systems is marked by the integration of object storage into transactional and analytical systems, which the article treats as a step-change in value. Object storage has matured and is now used across a wide range of systems, offering features that improve reliability and simplicity. Adopting new programming models remains harder: the investment is perceived as risky, it is difficult to tell one-way-door decisions from two-way-door decisions, and new approaches need significant ways to reduce that perceived risk and demonstrate clear value. The article suggests that future programming models may extract code from applications into infrastructure, where it can be managed and secured more easily, leaving business logic that is more portable, more secure, and simpler to update and maintain. Many technologies are emerging, but uncertainty about their longevity and effectiveness complicates decision-making for organizations.

- Object storage is becoming integral to both transactional and analytical systems.

- The distinction between one-way-door and two-way-door decisions is crucial for effective technology adoption.

- New programming models may shift code from applications to infrastructure for better management.

- The future of distributed systems is marked by innovation but also uncertainty regarding technology longevity.

- Organizations face challenges in rationalizing investments in new technologies due to perceived risks.

A Eulogy for DevOps

DevOps, introduced in 2007 to improve development and operations collaboration, faced challenges like centralized risks and communication issues. Despite advancements like container adoption, obstacles remain in managing complex infrastructures.

Is it time to version observability?

The article outlines the transition from Observability 1.0 to 2.0, highlighting structured logs for better data analysis, improved debugging, and enhanced software development, likening its impact to virtualization.

Ask HN: Pragmatic way to avoid supply chain attacks as a developer

The article addresses the security risks of managing software dependencies, highlighting a specific incident of a compromised package. It debates the effectiveness of containers versus VMs and seeks practical solutions.

Don't Believe the Big Database Hype, Stonebraker Warns

Mike Stonebraker critiques the hype around new database technologies, asserting many are not beneficial, while emphasizing the enduring relevance of the relational model and SQL amidst evolving cloud architectures.

Continuous reinvention: A brief history of block storage at AWS

Marc Olson discusses the evolution of Amazon Web Services' Elastic Block Store (EBS) from basic storage to a system handling over 140 trillion operations daily, emphasizing the need for continuous optimization and innovation.

AI: What people are saying

The comments reflect a diverse range of opinions on the article's discussion of object storage and programming models in distributed systems.

- Several commenters emphasize the importance of industry adoption of specific APIs to facilitate the transition to distributed systems.

- There is a recognition that economic factors, rather than purely technological advancements, drive the dominance of object storage solutions like S3.

- Some contributors express concerns about the marketing and recognition of new programming models and tools, highlighting the challenges faced by developers.

- Discussions around the potential for smarter storage solutions and the integration of synchronous and asynchronous APIs are prevalent.

- Commenters also touch on the future of AI in infrastructure, suggesting a shift towards more abstracted and user-friendly programming paradigms.

14 comments

By @purpleidea - 8 months

> Programming Models

If you read this section, the author gets a lot of things right, but clearly doesn't know the space that well since there have been people building things along these lines for years. And making vague commentary instead of describing the nitty-gritty doesn't evoke much confidence.

I work on one such language/tool called mgmt config, but I have had virtually no interest and/or skill in marketing it. TBQH, I'm disenchanted by the fact that, to get any recognition, it seems you need to have VCs, a three-year timeline, short-term goals, and a plan to be done by then or move on.

If you're serious about future infra, then it's all here:

https://github.com/purpleidea/mgmt/

Looking for coding help for some of the harder bits that people might wish to add, and for people to take it into production and find issues that we've missed.

By @awkii - 8 months

I think the author has a point about one-way doors slowing down the adoption of distributed systems. The best way to build two-way doors is to push for industry adoption of a particular API. In theory the backend of these APIs matters little to me, the developer, so long as it is fast and consistent. Some examples that come to mind: Apache Beam is a "programming model" for data pipelines, Akka is a "programming model" for stateful distributed systems, OpenTelemetry for logging/telemetry, and Kubernetes for orchestration. Oh, and local development is a strong preference.
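To illustrate the point about a shared API acting as a two-way door, here is a minimal Python sketch. The `BlobStore` interface, both backends, and `save_report` are hypothetical names invented for this example; they are not taken from any of the projects mentioned above. The application function depends only on the interface, so swapping the backend (including a purely local one for development) does not touch application code.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal storage API that application code targets."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class MemoryStore:
    """In-memory backend, handy for local development and tests."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirStore:
    """Directory-backed backend; stands in for a remote service."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_report(store: BlobStore, name: str, text: str) -> bytes:
    """Application logic: knows the API, not the backend."""
    store.put(name, text.encode("utf-8"))
    return store.get(name)


# The same application code runs unchanged against either backend.
with tempfile.TemporaryDirectory() as tmp:
    for store in (MemoryStore(), DirStore(tmp)):
        print(type(store).__name__, save_report(store, "report.txt", "hello"))
```

The decision to use `DirStore` is reversible here precisely because nothing outside the backend class knows it exists.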

By @jamesblonde - 8 months

I really enjoyed this article. The one point I have issue with is that the dominance of object storage in today's distributed systems is very much due to economics, not technology. There's basically cheering every little step S3 takes towards a POSIX-like distributed file system like HDFS - "consistent listing of files, yeah!". Last week it was preconditions for writing files. There's still huge gymnastics needed in Iceberg/Delta to work with S3 given the lack of atomic rename.
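As a rough illustration of why write preconditions matter when there is no atomic rename, the sketch below uses a toy in-memory store whose `put_if_absent` method mimics a "create only if the key does not exist" precondition. Everything here (`ToyObjectStore`, `PreconditionFailed`, `commit`, the `log/` key layout) is a hypothetical stand-in, not the S3 API and not how Iceberg or Delta actually implement commits. The idea shown is only that a conditional create lets concurrent writers agree on who owns a given log slot.

```python
import threading


class PreconditionFailed(Exception):
    """Raised when the target key already exists (hypothetical error type)."""


class ToyObjectStore:
    """In-memory stand-in for an object store with conditional writes."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, data):
        # Check and write happen atomically, as a real precondition would.
        with self._lock:
            if key in self._objects:
                raise PreconditionFailed(key)
            self._objects[key] = data

    def keys(self):
        with self._lock:
            return sorted(self._objects)


def commit(store, writer, payload, max_attempts=10):
    """Claim the first free log slot and return its version number."""
    version = 0
    for _ in range(max_attempts):
        key = f"log/{version:08d}.json"
        try:
            store.put_if_absent(key, f"{writer}:{payload}".encode())
            return version
        except PreconditionFailed:
            # Another writer owns this slot; try the next one.
            version += 1
    raise RuntimeError("too many conflicting commits")


store = ToyObjectStore()
print(commit(store, "writer-a", "add file 1"))
print(commit(store, "writer-b", "add file 2"))
print(store.keys())
```

A real table format would also re-read the winning commit and check for logical conflicts before retrying; that step is omitted here.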

By @buro9 - 8 months

Things I have come to know about distributed systems:

The S3 API (object storage) is the accepted storage API, but you do not need AWS (but they are very good at this).

The Kafka API is the accepted stream/ buffer/ queue API, but you do not need Confluent.

SQL is the query language, but you do not need a relational database.

By @willvarfar - 8 months

Something I anticipate is smarter storage that can do some filtering on pushed-down predicates. There's compute on the storage nodes that is being wasted today.

I was kinda expecting BigQuery to do this under the hood, but it seems like they don't, which is a shame. BigQuery isn't faster than, say, Trino on GCS, even though Google could do some major optimisations here.
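A toy sketch of what predicate pushdown buys: the hypothetical `StorageNode` below counts how many rows it sends back, as a stand-in for network traffic. It is not modeled on BigQuery, Trino, or any real storage service. Filtering on the client and filtering on the node return the same rows, but the second approach ships far fewer of them.

```python
class StorageNode:
    """Hypothetical storage node that can optionally filter before sending."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_sent = 0  # stand-in for bytes over the network

    def scan(self, predicate=None):
        if predicate is None:
            out = list(self._rows)
        else:
            # Pushdown: the filter runs next to the data.
            out = [row for row in self._rows if predicate(row)]
        self.rows_sent += len(out)
        return out


def is_se(row):
    return row["country"] == "SE"


rows = [{"id": i, "country": "SE" if i % 10 == 0 else "US"} for i in range(1000)]

# Without pushdown: fetch everything, filter on the client.
node_a = StorageNode(rows)
client_side = [row for row in node_a.scan() if is_se(row)]

# With pushdown: the node applies the predicate itself.
node_b = StorageNode(rows)
pushed_down = node_b.scan(is_se)

print(node_a.rows_sent, node_b.rows_sent, client_side == pushed_down)
```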

By @pjdesno - 8 months

Something missing here in the discussion of object storage and databases is any mention of the declining importance of the file system.

From the 70s through the 90s or 00s everything was file system-based, and it was just assumed that the best way to store data in a distributed system - even a globally-distributed one - was some sort of distributed file system (e.g. Andrew File System, or research projects like OceanStore).

Nowadays the file system holds applications and configuration, but applications mostly store data in databases and object stores. In distributed systems this is done almost exclusively through system-specific network connections (e.g. port 3306 to MySQL, or HTTP for S3) rather than OS-level mounting of a file system.

(not counting HPC, where distributed file systems are used to preserve the developer look and feel of early non-distributed HPC systems)

By @jensneuse - 8 months

I'd like to add that I'm seeing more and more companies unifying synchronous and asynchronous APIs. With the concept of GraphQL Federation, it's possible to "extend" Entities by defining their (primary) keys in a GraphQL Schema. If we're combining this with Async APIs, e.g. NATS or Kafka, we can enable teams to build APIs around events, while still being able to distribute the implementation of how certain fields can be resolved. The Federation Router then joins the Stream with additional data from synchronous services, a very powerful pattern I believe. I wrote a bit more on the topic here: https://wundergraph.com/blog/distributed_graphql_subscriptio...
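The pattern described above, a router that joins a stream of events with data from synchronous services, can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: `order_events`, `resolve_user`, and `router` are hypothetical names, the generator stands in for a NATS or Kafka subscription, and nothing here reflects the GraphQL Federation specification or any router's actual API.

```python
def order_events():
    """Stand-in for a stream subscription: events carry only entity keys."""
    yield {"order_id": 1, "user_id": "u1"}
    yield {"order_id": 2, "user_id": "u2"}


# Data owned by a separate, synchronous service (hypothetical).
USERS = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}


def resolve_user(user_id):
    """Stand-in for a synchronous call to the service that owns users."""
    return USERS[user_id]


def router(events, resolvers):
    """Join each event with fields resolved from other services.

    `resolvers` maps an output field to (key field in the event, resolver).
    """
    for event in events:
        enriched = dict(event)
        for field, (key_name, resolve) in resolvers.items():
            enriched[field] = resolve(event[key_name])
        yield enriched


result = list(router(order_events(), {"user": ("user_id", resolve_user)}))
for item in result:
    print(item)
```

The team publishing order events never needs to know how the `user` field is resolved; that stays with whoever owns the resolver.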

By @__turbobrew__ - 8 months

Pushing as much as possible down to the infra sounds like AWS Lambda and friends. You basically upload a zip or container and say, “just run this business code somewhere, I don’t care”. OCI bundles are basically a two-way door at this point: you can build them with many tools, and run them with many other tools.

It works great for stateless things, but not so great for stateful things. I guess this plays into state being persisted in object storage or DBs, which allows the application to be stateless.

By @nyrikki - 8 months

On an unrelated note, does anyone know the origins of the one-way vs two-way door analogy?

In this post it is attributed to Jeff Bezos, but it was popular in the Pacific Northwest before his rise.

By @BraveNewCurency - 8 months

> One-Way-Door and Two-Way-Door Decisions

See also the "Linux kernel management style" document that's been in the kernel since forever: https://docs.kernel.org/6.1/process/management-style.html

By @notverysubtle - 8 months

This is the right time to mention Designing Data-Intensive Applications by Martin Kleppmann. Amazing book explaining distributed systems concepts in digestible language.

By @samstave - 8 months

>>the biggest opportunity for a new programming model is extracting the majority of the code from an application and moving it into the infrastructure instead. The second biggest opportunity is for the remaining code—what people refer to as the business logic, the essence of the program—to be portable and secure.

This was such a well-put statement that it truly made me grok the entire article.

---

Infrastructure needs to be invisible, and that is where the future of AI-enabled orchestration/abstraction will allow development to be more poetry than code - whereby we can describe complex logic paths/workflows in a language of intent - and all the components required to accomplish that desired outcome will become a reality much more quickly and elegantly.

The real challenge ahead is the divide between those who have the capability and power of all the AI tools available to them, and those who are subjugated by those who do.

For example, an individual can build a lot with the current state of the available tool universe... but a more sophisticated and well-funded organization will have a lot more potential capability.

What I am really interested to know is whether there is a dark Dystopian Cyberpunk AI underworld happening yet.

What's the state of BadActor/BigCorpo/BigSpy's capability and covert actions currently?

While we are distracted by AI_ClipArt and celebrity voice squabbles, and seemingly Top AI Voices are being ignored after founding organizations for Alignment/Governance/Humane/etc and warning of catastrophe - define The State of Things?

But yeah - extracting the code and letting logic just be handled, yet easily portable, clonable, and refactorable, is where we are already headed. It's amazing and terrifying at the same time.

I'm thankful for all my Cyberpunk Fantasy reading, thinking, and imagining, for my tiny part in the overall evolution of the world of tech today, and for the opportunity to be here, to have worked with it and built on it, and now to see the birth of AI and use it daily in my actual IRL interactions.

Such an amazing moment in Human History to be here through this.
