August 30th, 2024

Behind AWS S3's Scale

AWS S3, launched in 2006, supports 100 million requests per second, stores 280 trillion objects, utilizes over 300 microservices, and offers strong durability and data redundancy features for cloud storage.

AWS S3, launched in 2006, has evolved into a highly scalable multi-tenant storage service that handles 100 million requests per second and stores 280 trillion objects across 31 regions. Initially designed for backups and media storage, it now serves as a backbone for analytics and machine learning. S3's architecture comprises over 300 microservices, organized into four main components: a front-end REST API, a namespace service, a storage fleet, and a management fleet. The storage fleet runs on millions of hard drives and uses a newer backend called ShardStore for storing and retrieving data. For redundancy, S3 uses erasure coding, which allows recovery from multiple shard failures with less extra capacity than full replication. Data placement is designed to mitigate hot spots and balance I/O demand across drives. S3 also leans on parallelism, encouraging clients to open multiple connections to increase throughput. In 2020, S3 introduced strong read-after-write consistency, so a read issued after a successful write returns the latest data. S3 is designed for 11 nines (99.999999999%) of durability, which makes data loss very unlikely.

- AWS S3 supports 100 million requests per second and stores 280 trillion objects.

- The architecture consists of over 300 microservices and employs millions of hard drives.

- S3 uses Erasure Coding for efficient data redundancy and recovery.

- Strong read-after-write consistency was introduced in 2020, enhancing data access.

- S3 offers 11 nines of durability, ensuring minimal data loss.
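To make the erasure coding point concrete, here is a minimal sketch under stated assumptions. The summary does not say which code or shard counts S3 uses, so this uses single XOR parity, a deliberately simplified stand-in that can repair only one lost shard. Codes in the Reed-Solomon family tolerate multiple losses, which is the property the summary describes. The function names and the 5 data + 4 parity figures in the usage note are assumptions for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Rebuild at most one missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        shards = list(shards)
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data + m parity shards."""
    return (k + m) / k
```

Under the assumed 5 + 4 layout, `storage_overhead(5, 4)` is 1.8 bytes stored per byte of data, compared with 3.0 for three full copies (`storage_overhead(1, 2)`).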

7 comments
By @cduzz - 8 months
The article states:

    1956: a 3.75MB drive cost $9k
    2024: 26TB drives exist, where 1TB costs $15
I think that radically understates the cost of storage in 1956, when people used mercury delay lines, drum drives, core memory, Williams tubes, etc. 1956 was a long time ago, and storage back then was physically huge, microscopic in "modern" units, and enormously expensive. Thank goodness for photolithography and being able to scale semiconductor transistors...

It apparently cost $3200 per month to lease one of them [1] so actually a storage payment model akin to S3...

[1] https://www.dataclinic.co.uk/history-snapshot-1956-the-world...
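Taking the quoted figures at face value, the arithmetic can be sketched as follows. Decimal units (1 TB = 1,000,000 MB) are an assumption, and the variable names are illustrative.

```python
MB_PER_GB = 1_000        # decimal units, an assumption
MB_PER_TB = 1_000_000


def dollars_per_tb(price_usd: float, capacity_mb: float) -> float:
    """Scale a price for a given capacity in MB up to a price per TB."""
    return price_usd / capacity_mb * MB_PER_TB


# 1956: a 3.75 MB drive for $9,000 (figure quoted in the article).
cost_1956_per_tb = dollars_per_tb(9_000, 3.75)   # 2.4 billion dollars per TB

# 2024: $15 per TB (figure quoted in the article).
cost_2024_per_tb = 15.0

# How many times cheaper a TB is in 2024 than in 1956, in nominal dollars.
price_drop = cost_1956_per_tb / cost_2024_per_tb  # 160 million times

# Lease model: $3,200 per month for the same 3.75 MB drive.
lease_per_gb_month = 3_200 / 3.75 * MB_PER_GB     # about $853,333 per GB-month
```

None of these numbers are adjusted for inflation, so the real-terms gap is larger than the nominal one.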

By @mannyv - 8 months
The thing that people don't realize about AWS is that the hard part, and the thing they do really well, is authorization and billing.

Every call is authenticated. Changes to authorization ripple across AWS in realtime. If you revoke a priv, things stop working immediately. That's incredibly hard to do, especially when you're authenticating billions of requests a second.

For billing and telemetry, everything is logged. There are companies that are built on the idea of logging, and at AWS it's just something they do - without slowing anything down.

AWS just might be one of the most complicated things humanity has ever built, which is a weird thought.

By @hobs - 8 months
There's very little "behind" here, just a bunch of speculation on public articles, many of which are recent and not about how S3 scaled.

By @bushbaba - 8 months
> When you aggregate on a large enough scale, a single workload cannot influence the aggregate peak.

This is also the “power” behind snowflake and bigquery.
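The aggregation effect in the quote can be illustrated with a small simulation, offered as a rough sketch rather than a model of any real system. It assumes independent tenants that each burst at random (5% of time steps by default) and compares the peak-to-mean ratio of one tenant's demand with that of many tenants summed together. All names and parameters are hypothetical.

```python
import random
from typing import List


def bursty_workload(steps: int, burst_prob: float, rng: random.Random) -> List[int]:
    """One tenant: demand is 1 during a burst and 0 otherwise."""
    return [1 if rng.random() < burst_prob else 0 for _ in range(steps)]


def peak_to_mean(series: List[int]) -> float:
    """Ratio of the highest demand to the average demand."""
    mean = sum(series) / len(series)
    return max(series) / mean if mean else float("inf")


def aggregate_ratio(n_workloads: int, steps: int = 1000,
                    burst_prob: float = 0.05, seed: int = 0) -> float:
    """Peak-to-mean ratio of the summed demand of n independent workloads."""
    rng = random.Random(seed)
    total = [0] * steps
    for _ in range(n_workloads):
        for t, demand in enumerate(bursty_workload(steps, burst_prob, rng)):
            total[t] += demand
    return peak_to_mean(total)
```

Comparing `aggregate_ratio(1)` with `aggregate_ratio(1000)` shows the ratio falling toward 1 as tenants are added: the summed demand is far smoother than any single tenant's, so capacity can be sized closer to the average. The independence assumption matters; correlated bursts would not smooth out this way.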

By @bosky101 - 8 months
Wow.

Also happy that highscalability is still up and about.

By @mickael-kerjean - 8 months
One of the crazy parts about S3 which is not touched on in this post is how it's becoming a file transfer protocol in its own right. Every cloud vendor now has an S3-compatible interface, but when you look deeper into the actual HTTP contract behind the S3 spec, I don't understand how one can shit on FTP and WebDAV as protocols and S3 not receive worse treatment. I don't want to be reminiscent of the Dropbox FTP deal on HN, but I hope one day people will steer toward open protocols and stop shitting on open ones for reasons that quite frankly 99% of the people couldn't give much of a shit about.