Behind AWS S3's Scale
AWS S3, launched in 2006, supports 100 million requests per second, stores 280 trillion objects, utilizes over 300 microservices, and offers strong durability and data redundancy features for cloud storage.
AWS S3, launched in 2006, has evolved into a highly scalable multi-tenant storage service, supporting 100 million requests per second and storing 280 trillion objects across 31 regions. Initially designed for backups and media storage, it now serves as a backbone for analytics and machine learning. S3's architecture comprises over 300 microservices, organized into four main components: a front-end REST API, a namespace service, a storage fleet, and a management fleet. The storage system spans millions of hard drives and employs a newer backend called ShardStore, which optimizes how data is stored and retrieved. S3 uses erasure coding for data redundancy, allowing recovery from multiple shard failures while minimizing extra capacity. The system's design mitigates hot spots and balances I/O demand across drives, enhancing performance. S3 also rewards parallelism, encouraging users to open multiple connections to maximize throughput. In 2020, S3 introduced strong read-after-write consistency, ensuring reads immediately reflect the latest write. Designed for 11 nines (99.999999999%) of durability, S3 minimizes the probability of data loss, making it a reliable choice for cloud storage.
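The erasure-coding idea behind that redundancy can be illustrated with a toy single-parity scheme; the shard count and XOR parity below are illustrative assumptions, and S3's actual codes tolerate multiple simultaneous shard losses rather than just one:

```python
# Toy illustration of the erasure-coding idea: split an object into k data
# shards plus one XOR parity shard, then rebuild any single missing shard.
# S3's real codes are more elaborate and survive multiple simultaneous losses.
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_shards(data: bytes, k: int = 4) -> list:
    """Split data into k equal-size shards (zero-padded) and append a parity shard."""
    shard_len = -(-len(data) // k)                    # ceiling division
    padded = data.ljust(k * shard_len, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))          # parity = XOR of all data shards
    return shards


def rebuild_missing(shards: list) -> list:
    """Recover a single lost shard (marked None) by XOR-ing the survivors."""
    missing = shards.index(None)
    survivors = [s for s in shards if s is not None]
    shards[missing] = reduce(xor_bytes, survivors)
    return shards


if __name__ == "__main__":
    obj = b"hello, durable object storage"
    shards = make_shards(obj, k=4)
    shards[2] = None                                   # simulate a failed drive
    restored = rebuild_missing(shards)
    assert b"".join(restored[:4]).rstrip(b"\0") == obj
```

The general form of the technique lets any k of the k + m shards reconstruct the object, so redundancy costs roughly m/k extra capacity instead of a full replica, which is the "minimizing extra capacity" point in the summary.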
- AWS S3 supports 100 million requests per second and stores 280 trillion objects.
- The architecture consists of over 300 microservices and employs millions of hard drives.
- S3 uses erasure coding for efficient data redundancy and recovery.
- Strong read-after-write consistency, introduced in 2020, guarantees that reads return the latest version of an object.
- S3 offers 11 nines of durability, ensuring minimal data loss.
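The parallelism mentioned in the summary is usually exploited with ranged GETs over multiple connections. A minimal boto3 sketch follows; the bucket name, key, part size, and worker count are hypothetical, and boto3's built-in transfer manager achieves the same effect without hand-rolling the ranges:

```python
# Minimal sketch: download one large S3 object over several parallel
# connections using ranged GETs. Names and sizes are illustrative only.
import concurrent.futures

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "example-bucket", "big/object.bin"       # hypothetical names
PART_SIZE = 8 * 1024 * 1024                            # 8 MiB per connection


def fetch_range(start: int, end: int) -> bytes:
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    return resp["Body"].read()


size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
ranges = [(off, min(off + PART_SIZE, size) - 1) for off in range(0, size, PART_SIZE)]

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(lambda r: fetch_range(*r), ranges))

data = b"".join(parts)
```

Spreading requests across connections lets S3 serve the ranges from different drives and front-end hosts, which is why multiple connections tend to beat one fast stream.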
Related
Show HN: S3HyperSync – Faster S3 sync tool – iterating with up to 100k files/s
S3HyperSync is a GitHub tool for efficient file synchronization between S3-compatible services. It optimizes performance, memory, and costs, ideal for large backups. Features fast speeds, UUID Booster, and installation via JAR file or sbt assembly. Visit GitHub for details.
Using S3 as a Container Registry
Adolfo Ochagavía discusses using Amazon S3 as a container registry, noting its speed advantages over ECR. S3's parallel layer uploads enhance performance, despite lacking standard registry features. The unconventional approach offers optimization potential.
AWS powered Prime Day 2024
Amazon Prime Day 2024, on July 17-18, set sales records with millions of deals. AWS infrastructure supported the event, deploying numerous AI and Graviton chips, ensuring operational readiness and security.
Amazon S3 now supports conditional writes
Amazon S3 has introduced conditional writes to prevent overwriting existing objects, enhancing reliability for concurrent updates in applications. The feature is free and accessible via the AWS SDK, API, or CLI (a minimal sketch follows this list).
Continuous reinvention: A brief history of block storage at AWS
Marc Olson discusses the evolution of Amazon Web Services' Elastic Block Store (EBS) from basic storage to a system handling over 140 trillion operations daily, emphasizing the need for continuous optimization and innovation.
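As a rough illustration of the conditional-write entry above: the precondition rides on an If-None-Match header, so the PUT only succeeds if no object exists at the key. The bucket and key here are hypothetical, and exposing the parameter requires a recent boto3 release:

```python
# Sketch of an S3 conditional write: fail the PUT if the key already exists.
# Bucket/key names are hypothetical; the IfNoneMatch parameter needs a boto3
# release that includes S3 conditional-write support.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    s3.put_object(
        Bucket="example-bucket",
        Key="jobs/lock",
        Body=b"owner=worker-1",
        IfNoneMatch="*",          # only succeed if nothing exists at this key
    )
    print("acquired")
except ClientError as err:
    if err.response["Error"]["Code"] == "PreconditionFailed":
        print("someone else wrote it first")
    else:
        raise
```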
1956: a 3.75MB drive cost $9k
2024: 26TB drives exist, where 1TB costs $15
I think that radically understates the cost of storage in 1956, when people used mercury delay lines, drum drives, core memory, Williams tubes, etc. 1956 was a long time ago: everything built back then was physically huge compared to today's microscopic units, and enormously expensive. Thank goodness for photolithography and the ability to scale semiconductor transistors... It apparently cost $3,200 per month to lease one of them [1], so actually a storage payment model akin to S3...
[1] https://www.dataclinic.co.uk/history-snapshot-1956-the-world...
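Taking the figures quoted above at face value, the per-megabyte price gap works out to roughly eight orders of magnitude, before even accounting for the lease-only pricing in [1]:

```python
# Back-of-the-envelope check of the numbers quoted above (face value only).
cost_1956_per_mb = 9_000 / 3.75        # ~$2,400 per MB in 1956
cost_2024_per_mb = 15 / 1_000_000      # $15 per TB ~= $0.000015 per MB in 2024
print(cost_1956_per_mb / cost_2024_per_mb)  # ~1.6e8, i.e. ~8 orders of magnitude
```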
Every call is authenticated. Changes to authorization ripple across AWS in realtime. If you revoke a priv, things stop working immediately. That's incredibly hard to do, especially when you're authenticating billions of requests a second.
For billing and telemetry, everything is logged. There are companies built entirely on the idea of logging, and at AWS it's just something they do - without slowing anything down.
AWS just might be one of the most complicated things humanity has ever built, which is a weird thought.
This is also the “power” behind Snowflake and BigQuery.
Also happy that highscalability is still up and about.