July 7th, 2024

Synchronization Is Bad for Scale

Challenges of synchronization in scaling distributed systems are discussed, emphasizing issues with lock contention and proposing alternatives like sharding and consistent hashing. Mailgun's experiences highlight strategies to avoid synchronization bottlenecks.


The article discusses the challenges of synchronization in scaling distributed systems, highlighting issues with lock contention and its impact on horizontal scaling. It emphasizes the drawbacks of using locks in high-concurrency environments and presents alternatives like sharding, eliminating locks, consistent hashing, reservation queues, and the Saga Pattern to address synchronization needs efficiently. The author shares experiences from working on a distributed lock service at Mailgun, ultimately abandoning it due to performance limitations. Examples from Mailgun's use of MongoDB showcase strategies like spreading writes across multiple collections to avoid synchronization bottlenecks. The article concludes by cautioning against over-reliance on databases for synchronization, advocating for design approaches that embrace eventual consistency and data normalization to reduce the need for locks.
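One of the alternatives the article lists, consistent hashing, can be sketched briefly. The following is a minimal, illustrative Go example (node names, the virtual-node count, and all identifiers are hypothetical, not taken from the article or Mailgun's code): each node is hashed onto a ring many times, and a key is owned by the first virtual node clockwise from the key's hash, so ownership can be decided locally without any lock.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a minimal consistent-hash ring: a sorted list of
// virtual-node hashes and a map from each hash back to its node.
type ring struct {
	hashes []uint32
	nodes  map[uint32]string
}

// hash32 hashes a string with FNV-1a (any stable hash works here).
func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places each node on the ring vnodes times to smooth
// out the key distribution.
func newRing(nodes []string, vnodes int) *ring {
	r := &ring{nodes: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			h := hash32(fmt.Sprintf("%s#%d", n, i))
			r.hashes = append(r.hashes, h)
			r.nodes[h] = n
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// lookup returns the node owning key: the first virtual node at or
// after the key's hash, wrapping around to the start of the ring.
func (r *ring) lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0
	}
	return r.nodes[r.hashes[i]]
}

func main() {
	r := newRing([]string{"node-a", "node-b", "node-c"}, 64)
	fmt.Println(r.lookup("customer-42"))
}
```

Because only keys adjacent to a removed node's virtual nodes move when membership changes, this avoids the global reshuffle (and the coordination) that naive modulo sharding requires.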

10 comments
By @kccqzy - 3 months
The author abandoned the distributed lock service they were writing; it might be worth reading the original Google paper on Chubby, their distributed lock service, to understand why it has endured at Google: https://static.googleusercontent.com/media/research.google.c...

> I abandoned the project after it very quickly become apparent that despite having written the service in this super fast, brand new language called golang, the service just wasn’t fast enough to handle the scale we threw at it.

This makes me think the author wishes to use the distributed lock service for some purpose that's not well served by distributed locks. It's not that distributed locks are bad, it's just that the author seems to have a particular use case already in mind that's poorly suited to a distributed lock service.

By @ibash - 3 months
Did I miss it, or did he never actually say what problem he was trying to solve?

Any solution can be bad at scale if it doesn’t fit your problem… unfortunately there’s no way to know without stating the problem.

By @metadat - 3 months
> The answer to the synchronization problem for Gubernator was to remove the need for a lock by using a S3-FIFO cache. We did attempt to shard, but the overhead of calculating the hash to shard with, and the increased number of queues and threads needed (thus increased context switching) made this a unviable solution for our use case.

Why would the sharding hash computation be expensive? MD5 would be suitable here; it's blazing fast even at tens of thousands of operations per second on large keys. It will work fine.
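To make the commenter's point concrete, here is a minimal Go sketch of MD5-based shard selection (the function name, key, and shard count are illustrative assumptions, not from Gubernator): hash the key, take the first eight bytes of the digest as an integer, and reduce it modulo the shard count.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shardForKey maps a key to one of n shards using the first
// 8 bytes of its MD5 digest. Identifiers here are illustrative.
func shardForKey(key string, n uint64) uint64 {
	sum := md5.Sum([]byte(key)) // 16-byte digest
	return binary.BigEndian.Uint64(sum[:8]) % n
}

func main() {
	fmt.Println(shardForKey("example-key", 16))
}
```

One MD5 over a short key is on the order of a microsecond or less on modern hardware, and a non-cryptographic hash such as FNV or xxHash would be cheaper still, which is why the claimed hashing overhead is surprising.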

I wish the author had given firmer details on why they abandoned approaches that are proven to work across a wide variety of use-cases.

By @dig1 - 3 months
Without more details about the project requirements, my initial impression is that the author could get most of these features from Zookeeper, which is designed for stuff like distributed locks and synchronization.

However, I've seen the pattern of using the database for whole-system synchronization in multiple projects. Needless to say, it's a bad idea: probably easy to start with, but a nightmare to scale later, just as the author mentions in the article.

By @23B1 - 3 months
I'm curious if HN readers see articles like this and transpose the logic to other areas – like management or leadership. Open question, just curious.
By @exabrial - 3 months
Good news: your problem is not scale.
By @tigron - 3 months
Synchronization scales for redundancy purposes unless you are single core.
By @temporarely - 3 months
> Synchronization is bad for scale

In other news, water is wet.

By @sitkack - 3 months
I would use Aerospike; it has strong consistency or session consistency, and it will soon get multi-record transactions.

https://dbdb.io/db/aerospike

Also see the second bullet point from this Guy Steele talk at Strange Loop 2010: synchronization, ordering, and a lack of idempotency need to be heavily justified.

"How to Think about Parallel Programming: Not!" - Guy L. Steele Jr. (Strange Loop 2010) https://www.youtube.com/live/dPK6t7echuA?app=desktop&t=1747s

transcript https://github.com/matthiasn/talk-transcripts/blob/master/St... thanks matthiasn!

Previous discussion https://news.ycombinator.com/item?id=2105661

By @neonsunset - 3 months
Because Go is ill-suited for projects like these :)

Just like synchronization, Go scales poorly with project complexity and ambition. It is unable to efficiently take advantage of beefy many-core nodes, its GC is bad at achieving high throughput, and the language itself cannot provide zero-cost abstractions, while its non-zero-cost abstractions end up more expensive than in other compiled languages.

Should have picked .NET for scalability and low-level tuning. Or Rust if higher engineering cost is acceptable.