December 3rd, 2024

New Amazon S3 Tables: Storage optimized for analytics workloads

Amazon has launched S3 Tables to enhance storage for analytics, utilizing Apache Iceberg for faster queries and improved transaction rates, with automatic maintenance and integration with AWS Glue Data Catalog.

Amazon has introduced S3 Tables, a new feature designed to optimize storage for analytics workloads, particularly tabular data such as purchase transactions and sensor data. Tables are stored in the Apache Iceberg format, so they can be queried efficiently with popular engines like Amazon Athena and Apache Spark. AWS claims significant performance improvements: up to three times faster queries and up to ten times more transactions per second compared with self-managed tables.

The new table buckets are a specialized S3 bucket type that provides S3's durability, scalability, and performance while automatically optimizing storage for cost and query efficiency. Each table bucket is unique within its AWS Region and can contain multiple tables, organized into namespaces for easier management. The service handles maintenance automatically, including compaction, snapshot management, and removal of unreferenced files, relieving users of these tasks. Integration with the AWS Glue Data Catalog, currently in preview, makes the tables queryable across AWS analytics services. S3 Tables are available in select AWS Regions, and users are charged for storage, requests, and maintenance operations.

- Amazon S3 Tables optimize storage for analytics workloads using the Apache Iceberg format.

- AWS claims up to 3x faster queries and up to 10x more transactions per second than self-managed tables.

- Automatic maintenance features include compaction and snapshot management.

- Integration with the AWS Glue Data Catalog (in preview) makes the tables queryable from AWS analytics services such as Athena.

- S3 Tables are currently available in select AWS regions.
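A minimal sketch of the table bucket, namespace, and table hierarchy described above, assuming the boto3 `s3tables` client and its `create_table_bucket`, `create_namespace`, and `create_table` operations. The parameter and response field names reflect our reading of the boto3 documentation and should be checked against the current version; the helper `create_bucket_namespace_table` and every bucket, namespace, and table name used with it are hypothetical. The function takes the client as an argument so it can be exercised without AWS credentials.

```python
from typing import Any, Dict


def create_bucket_namespace_table(
    client: Any, bucket_name: str, namespace: str, table_name: str
) -> Dict[str, str]:
    """Create a table bucket, a namespace inside it, and an Iceberg table.

    Hypothetical helper. `client` is assumed to be a boto3 S3 Tables
    client, e.g. boto3.client("s3tables"); any object exposing the same
    methods (such as a test stub) also works.
    """
    # 1. The table bucket is the new S3 bucket type that stores tables.
    bucket = client.create_table_bucket(name=bucket_name)
    bucket_arn = bucket["arn"]

    # 2. A namespace groups related tables inside the bucket.
    #    (Assumption: this operation takes the namespace as a list of strings.)
    client.create_namespace(tableBucketARN=bucket_arn, namespace=[namespace])

    # 3. Tables are created in Apache Iceberg format. No schema is passed
    #    here; it would normally be written later by an Iceberg-aware engine.
    table = client.create_table(
        tableBucketARN=bucket_arn,
        namespace=namespace,
        name=table_name,
        format="ICEBERG",
    )
    return {"bucket_arn": bucket_arn, "table_arn": table["tableARN"]}
```

Against a real account you would pass `boto3.client("s3tables")` as the client. Loading data, and the compaction and snapshot maintenance that S3 Tables then performs, are outside the scope of this sketch.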

2 comments
By @exergy - 5 months
A managed Iceberg service, marketed as a new S3 bucket type.

I thought AWS was dropping the ball on open table formats, but this one is a smart move and a direct shot fired at Databricks with Delta, simply by making it the default for software engineers who want a big data table on S3.

By @jamesblonde - 5 months
They are like S3 buckets, but you write Iceberg tables to them. AWS does the maintenance for you and makes the tables available in AWS Glue for querying via Athena, etc.

Instead of $23.5/TB/month for normal S3, it's $26.5/TB/month for S3 Tables. That's cheap. They charge for compaction and other table services, but those are cheap too.

AWS have been falling behind Databricks and Snowflake at the higher AI layers, but by going low (metaphorically and literally), they have put the cat amongst the pigeons. At what point should the competition lawyers look at this?
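To put the storage prices quoted in the comment above in perspective, here is a small worked calculation. It uses only the commenter's figures ($23.5 versus $26.5 per TB-month), which are assumptions rather than official pricing, and it covers storage alone: the separate request and maintenance charges mentioned in the article are ignored. The function names are hypothetical.

```python
# Storage prices as quoted in the comment (USD per TB-month).
# These are the commenter's figures, not official pricing; check current
# Amazon S3 pricing for your Region before relying on them.
S3_STANDARD_PER_TB = 23.5
S3_TABLES_PER_TB = 26.5


def storage_premium(tb_stored: float) -> float:
    """Extra monthly storage cost (USD) of S3 Tables over standard S3.

    Storage only: request charges and maintenance operations
    (compaction, snapshot management, etc.) are billed separately
    and are not included here.
    """
    return tb_stored * (S3_TABLES_PER_TB - S3_STANDARD_PER_TB)


def relative_premium() -> float:
    """Storage price increase as a fraction of the standard S3 price."""
    return (S3_TABLES_PER_TB - S3_STANDARD_PER_TB) / S3_STANDARD_PER_TB


for tb in (1, 10, 100):
    print(f"{tb:>4} TB: +${storage_premium(tb):,.2f}/month")
print(f"relative premium: {relative_premium():.1%}")
```

With these figures the storage premium is $3 per TB-month, about 12.8% over standard S3, so 100 TB would cost roughly $300 more per month before any request or maintenance charges.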