August 12th, 2024

Distributed == Relational

The article explores how distributed systems can utilize relational database principles, advocating for parallel data gathering, triggers for function invocations, and user-friendly alternatives to SQL for efficient software development.


The article discusses the relationship between distributed systems and relational databases, proposing that distributed architectures can be effectively modeled using relational principles. It illustrates a scenario where one system (A) needs to gather data from two other systems (B and C) before invoking a function on a fourth system (D). The traditional sequential approach is contrasted with a more efficient parallel method, where A sends requests to B and C simultaneously, allowing for better resource utilization. The author suggests that this model can be implemented using existing technologies that allow for message passing and caching of function arguments. The discussion extends to the use of triggers in relational databases, which can facilitate function invocations based on data changes. The article also critiques SQL, advocating for more user-friendly alternatives for relational querying and function invocation. The author envisions a system where API clients can create custom endpoints and utilize functional programming concepts, leading to more flexible and efficient software development. The conclusion emphasizes that efficient distributed systems are best expressed through functions triggered by upsert operations on addressable relations, paving the way for innovative applications, including reactive user interfaces.

- Distributed systems can be modeled using relational database principles.

- Efficient data gathering can be achieved through parallel requests rather than sequential ones.

- Triggers in relational databases can facilitate function invocations based on data changes.

- There is a need for more user-friendly alternatives to SQL for relational querying.

- The proposed model allows for custom API endpoints and functional programming operations.
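The difference between the sequential and parallel gathering steps summarized above can be sketched with Python's `asyncio`. This is only a minimal illustration, not the article's implementation: the functions `fetch_b`, `fetch_c`, and `call_d`, the `DELAY` constant, and the returned values are all hypothetical stand-ins for network calls to systems B, C, and D. It shows A issuing both requests at once; it does not model B and C forwarding their results to D directly.

```python
import asyncio
import time

DELAY = 0.05  # hypothetical per-call latency in seconds (assumption)


async def fetch_b() -> int:
    """Stand-in for a request from A to system B."""
    await asyncio.sleep(DELAY)
    return 2


async def fetch_c() -> int:
    """Stand-in for a request from A to system C."""
    await asyncio.sleep(DELAY)
    return 3


async def call_d(b: int, c: int) -> int:
    """Stand-in for invoking a function on system D with both results."""
    await asyncio.sleep(DELAY)
    return b * c


async def sequential() -> int:
    # A waits for B, then for C, then calls D: three latencies in a row.
    b = await fetch_b()
    c = await fetch_c()
    return await call_d(b, c)


async def parallel() -> int:
    # A asks B and C at the same time, then calls D: roughly two latencies.
    b, c = await asyncio.gather(fetch_b(), fetch_c())
    return await call_d(b, c)


def timed(coro_fn):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (sequential, parallel):
        result, elapsed = timed(fn)
        print(f"{fn.__name__}: result={result}, elapsed={elapsed:.3f}s")
```

Both variants compute the same value; only the waiting pattern differs, so any real speedup depends on how much of the latency of B and C can actually overlap.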

7 comments
By @hughesjj - 5 months
> Truly efficient distributed systems are most naturally expressed through functions as triggers invoked from upsert operations on addressable relations

O_o

Just because you can express something as an 'upsert' doesn't make it 'relational'. Transactions exist outside the concept of RDBMSs. The article doesn't mention relational algebra once.

Yes, a lot of the terminology and math in RDBMSs is useful in distributed computing, but you have the causality backwards.

I don't get all the author's hate for SQL. It's one of the most successful declarative languages ever.

Nothing prevents you from modelling a distributed system as a set of key-value stores (as we often do today). The idea of a message queue is independent of using a database as the mechanism to implement it. Using Postgres or another RDBMS doesn't mean your entire system is 'relational'.

Wouldn't it be better for D to lazily request information from B and C on your behalf in the 'maximally efficient' case, given that D is where the computation runs and it could cache the results? From an auth perspective that seems simpler than cross-wiring connections between all nodes as proposed.

All this talk and nothing about n-phase commits, Byzantine generals, or any tie-back to the typical way of talking about distributed computing; the author dances around these subjects.

IDK. Sorry man. Didn't like the article, which feels bad because you seem passionate about the presentation of it.

Edit: looked through the Substack's other posts. They do kinda talk about relational algebra in a subsequent post, but overall I'm curious whether the author has looked into dataflow programming before. It seems like the author is trying to describe that concept, but with a vocabulary consisting mostly of RDBMS terminology and history.

https://en.m.wikipedia.org/wiki/Dataflow_programming
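
The lazy alternative raised earlier in this comment, where D itself pulls from B and C on demand and caches the outcome, might look roughly like the sketch below. Everything here is an assumption made for illustration: `fetch_from_b`, `fetch_from_c`, the `calls` counter, and the arithmetic are hypothetical, and the cache is simply Python's `functools.lru_cache` applied to D's result rather than anything the article or the commenter specifies.

```python
from functools import lru_cache

# Counts how often each upstream system is contacted (illustration only).
calls = {"b": 0, "c": 0}


def fetch_from_b(key: str) -> int:
    """Hypothetical stand-in for D requesting data from system B."""
    calls["b"] += 1
    return len(key)


def fetch_from_c(key: str) -> int:
    """Hypothetical stand-in for D requesting data from system C."""
    calls["c"] += 1
    return len(key) * 2


@lru_cache(maxsize=None)
def d(key: str) -> int:
    # D pulls its own inputs only when asked, and the result is memoized
    # per key, so repeated calls do not contact B or C again.
    return fetch_from_b(key) + fetch_from_c(key)


if __name__ == "__main__":
    print(d("abc"), d("abc"), calls)
```

In this arrangement A only needs a connection to D, which is the auth simplification the comment points at; the cost is that D's requests to B and C are not started until D is invoked.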

By @LudwigNagasena - 5 months
In the first diagram I see well-encapsulated services B, C and D and their orchestrator A.

In the second diagram I see ill-defined responsibilities and a looming callback hell.

What if we could look at more concrete examples with a dozen services that model a real process? Maybe that would elucidate the benefits.

By @xpe - 5 months
> A distributed computing system, then, is naturally expressed as a set of relational stores with triggers.

1. This statement is too big of a leap; it doesn’t follow from the setup, which only required an upsert operation and key/value storage. Nothing in the setup example requires relational algebra. Agree?

2. This seems like a hasty generalization. Is the author claiming that the toy example is representative of distributed systems? If so, what aspects?

IMO, the post started strong but fizzled; this is why I’m giving pointed feedback.
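
To make point 1 concrete, here is a minimal sketch of the setup using nothing more than a key-value store, an upsert, and a trigger, with no relational algebra involved. The `TriggerStore` class, its `on_ready` method, and the function `d` are hypothetical names invented for this illustration; they are not taken from the article.

```python
from typing import Callable, Dict, List, Tuple


class TriggerStore:
    """In-memory key-value store that fires callbacks on upsert."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}
        self._triggers: List[Tuple[Tuple[str, ...], Callable[..., None]]] = []

    def on_ready(self, keys: Tuple[str, ...], fn: Callable[..., None]) -> None:
        """Register fn to run whenever one of `keys` is upserted and all exist."""
        self._triggers.append((tuple(keys), fn))

    def upsert(self, key: str, value: object) -> None:
        """Insert or overwrite a value, then fire any trigger that is now ready."""
        self._data[key] = value
        for keys, fn in self._triggers:
            if key in keys and all(k in self._data for k in keys):
                fn(*(self._data[k] for k in keys))


results: List[int] = []


def d(b: int, c: int) -> None:
    """Stand-in for the function on system D; records its output."""
    results.append(b + c)


store = TriggerStore()
store.on_ready(("b", "c"), d)
store.upsert("b", 1)  # only b is present, so d is not called yet
store.upsert("c", 2)  # both arguments are present, so d(1, 2) runs
```

After the two upserts, `results` holds a single entry from the one invocation of `d`. A later upsert of either key would invoke `d` again with the updated arguments.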

By @brandonbloom - 5 months
The request/response optimization discussed in the first half of this post has been explored quite a bit in the context of Object-Oriented Programming and Actors, where the desired feature is called "Promise Pipelining":

http://www.erights.org/elib/distrib/pipeline.html

Outside of the E programming language and in the realm of language-agnostic tooling, you can find promise pipelining in some RPC frameworks, such as Cap'n Proto:

https://capnproto.org/rpc.html

Generally, this work comes from the Object-Capabilities community.

By @two_handfuls - 5 months
This article claims that the sequential messages of the first examples can be replaced by database operations for performance.

I would like to see someone run the experiment. Databases come with their own overheads, after all.

By @xpe - 5 months
If the author is here, I’d suggest naming the arguments for D to be (b, c) so as to correspond with B, C.