August 30th, 2024

Behind AWS S3's Scale

AWS S3, launched in 2006, supports 100 million requests per second, stores 280 trillion objects, utilizes over 300 microservices, and offers strong durability and data redundancy features for cloud storage.

AWS S3, launched in 2006, has evolved into a highly scalable multi-tenant storage service that supports 100 million requests per second and stores 280 trillion objects across 31 regions. Initially designed for backups and media storage, it now serves as a backbone for analytics and machine learning. S3's architecture comprises over 300 microservices, organized into four main components: a front-end REST API, a namespace service, a storage fleet, and a management fleet. The storage fleet runs on millions of hard drives and uses a new backend called ShardStore to store and retrieve data. S3 uses erasure coding for redundancy, which lets it recover from multiple shard failures with less extra capacity than full replication would need. The design spreads load across drives to mitigate hot spots and balance I/O demand. S3 also relies on parallelism and encourages clients to open multiple connections to increase throughput. In 2020, S3 introduced strong read-after-write consistency, so a read that follows a successful write returns the latest data. S3 is designed for 11 nines (99.999999999%) of durability, which makes data loss extremely unlikely.

- AWS S3 supports 100 million requests per second and stores 280 trillion objects.

- The architecture consists of over 300 microservices and employs millions of hard drives.

- S3 uses Erasure Coding for efficient data redundancy and recovery.

- Strong read-after-write consistency was introduced in 2020, so a read that follows a successful write returns the latest data.

- S3 is designed for 11 nines (99.999999999%) of durability, which makes data loss extremely unlikely.
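The summary does not say which erasure code S3 uses, so the following is only a minimal sketch of the general idea, using the simplest possible scheme: `k` data shards plus one XOR parity shard. This toy code survives the loss of any single shard, whereas a production system would use a code that tolerates several failures, as the summary describes. The function names (`encode`, `decode`, `storage_overhead`) are made up for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards (zero-padded) and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one shard (data or parity) is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity XOR can only repair one lost shard")
    repaired = list(shards)
    if missing:
        # XOR of all surviving shards equals the missing shard.
        survivors = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity shard, then strip the zero padding.
    return b"".join(repaired[:-1])[:original_len]


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data and m parity shards."""
    return (k + m) / k


payload = b"hello erasure coding"
stored = encode(payload, k=4)
stored[2] = None  # simulate one failed drive
restored = decode(stored, len(payload))
```

With 4 data shards and 1 parity shard, `storage_overhead(4, 1)` is 1.25 bytes stored per user byte, compared with 2.0 for keeping two full copies. That gap is the capacity saving the summary refers to.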

Show HN: S3HyperSync – Faster S3 sync tool – iterating with up to 100k files/s

S3HyperSync is a tool, hosted on GitHub, for efficient file synchronization between S3-compatible services. It optimizes for performance, memory use, and cost, which makes it suited to large backups. Features include fast iteration, a UUID Booster, and installation via a JAR file or sbt assembly.

Using S3 as a Container Registry

Adolfo Ochagavía discusses using Amazon S3 as a container registry, noting its speed advantages over ECR. S3's parallel layer uploads enhance performance, despite lacking standard registry features. The unconventional approach offers optimization potential.
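As a rough illustration of why parallel uploads help, the sketch below splits a payload into fixed-size parts and sends them concurrently. It does not call S3: `upload_part` is a hypothetical callable supplied by the caller, standing in for whatever per-part upload request a real client would make, and the part size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_parts(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets covering `size` bytes in part_size chunks."""
    return [(start, min(start + part_size, size)) for start in range(0, size, part_size)]


def parallel_upload(
    data: bytes,
    part_size: int,
    upload_part: Callable[[int, bytes], str],  # hypothetical: (part number, bytes) -> receipt
    max_workers: int = 8,
) -> List[Tuple[int, str]]:
    """Upload every part concurrently and return (part number, receipt) pairs in order."""
    numbered = list(enumerate(split_parts(len(data), part_size), start=1))

    def send(item: Tuple[int, Tuple[int, int]]) -> Tuple[int, str]:
        number, (start, end) = item
        return number, upload_part(number, data[start:end])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order even if parts finish out of order.
        return list(pool.map(send, numbered))


# Stand-in "remote" used to exercise the sketch: records each part in a dict.
received = {}

def fake_upload_part(number: int, chunk: bytes) -> str:
    received[number] = chunk
    return "part-%d" % number

layer = bytes(range(256)) * 10
receipts = parallel_upload(layer, part_size=1000, upload_part=fake_upload_part)
```

Because each part is independent, total upload time is bounded by the slowest part and the available bandwidth rather than by the sum of all parts sent one after another.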

AWS powered Prime Day 2024

Amazon Prime Day 2024, on July 17-18, set sales records with millions of deals. AWS infrastructure supported the event, deploying numerous AI and Graviton chips, ensuring operational readiness and security.

Amazon S3 now supports conditional writes

Amazon S3 has introduced conditional writes to prevent overwriting existing objects, enhancing reliability for concurrent updates in applications. This feature is free and accessible via AWS SDK, API, or CLI.
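To make the semantics concrete, here is a small in-memory model of a "write only if the key does not already exist" operation. It is a sketch, not AWS code: `InMemoryBucket`, `PreconditionFailed`, and `claim` are hypothetical names. It assumes the real feature is driven by an `If-None-Match: *` precondition on the write and that a failed precondition surfaces as HTTP 412 Precondition Failed.

```python
import threading
from typing import Dict, Optional


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class InMemoryBucket:
    """Toy object store that models put-if-absent; not a real S3 client."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put_object(self, key: str, body: bytes, if_none_match: Optional[str] = None) -> None:
        # The existence check and the write happen atomically under one lock,
        # so two concurrent conditional writers cannot both succeed.
        with self._lock:
            if if_none_match == "*" and key in self._objects:
                raise PreconditionFailed(key)
            self._objects[key] = body

    def get_object(self, key: str) -> bytes:
        with self._lock:
            return self._objects[key]


def claim(bucket: InMemoryBucket, key: str, owner: bytes) -> bool:
    """Try to create `key`; return False if another writer already created it."""
    try:
        bucket.put_object(key, owner, if_none_match="*")
        return True
    except PreconditionFailed:
        return False


bucket = InMemoryBucket()
first = claim(bucket, "jobs/42.lock", b"worker-a")   # creates the object
second = claim(bucket, "jobs/42.lock", b"worker-b")  # rejected, object exists
```

An unconditional `put_object` call (no `if_none_match`) still overwrites as before, so only writers that opt in to the precondition are protected from clobbering each other.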

Continuous reinvention: A brief history of block storage at AWS

Marc Olson discusses the evolution of Amazon Web Services' Elastic Block Store (EBS) from basic storage to a system handling over 140 trillion operations daily, emphasizing the need for continuous optimization and innovation.

7 comments

By @cduzz - 8 months

The article states:

    1956: a 3.75MB drive cost $9k
    2024: 26TB drives exist, where 1TB costs $15

I think that radically understates the cost of storage in 1956, when people used mercury delay lines, drum drives, core memory, Williams tubes, etc. 1956 was a long time ago, and the hardware of that era was physically huge, microscopic in capacity by "modern" units, and enormously expensive. Thank goodness for photolithography and being able to scale semiconductor transistors...

It apparently cost $3200 per month to lease one of them [1] so actually a storage payment model akin to S3...

[1] https://www.dataclinic.co.uk/history-snapshot-1956-the-world...

By @mannyv - 8 months

The thing that people don't realize about AWS is that the hard part, and the thing they do really well, is authorization and billing.

Every call is authenticated. Changes to authorization ripple across AWS in realtime. If you revoke a priv, things stop working immediately. That's incredibly hard to do, especially when you're authenticating billions of requests a second.

For billing and telemetry, everything is logged. There are companies that are built on the idea of logging, and at AWS it's just something they do - without slowing anything down.

AWS just might be one of the most complicated things humanity has ever built, which is a weird thought.

By @hobs - 8 months

There's very little "behind" here, just a bunch of speculation on public articles, many of which are recent and not about how S3 scaled.

By @bushbaba - 8 months

> When you aggregate on a large enough scale, a single workload cannot influence the aggregate peak.

This is also the “power” behind snowflake and bigquery.

By @bosky101 - 8 months

Wow.

Also happy that highscalability is still up and about.

By @mickael-kerjean - 8 months

One of the crazy parts about S3 which is not touched on in this post is how it's becoming a file transfer protocol in its own right. Every cloud vendor now has an S3 compatible interface, but when you look deeper into the actual http contract behind the S3 spec, I don't understand how one can shit on FTP and webdav as a protocol and S3 not receive worse treatment. I don't want to be reminiscent of the Dropbox FTP deal on HN, but I hope one day people will steer toward open protocols and stop shitting on open ones for reasons that quite frankly 99% of the people couldn't give much of a shit about.
