New Amazon S3 Tables: Storage optimized for analytics workloads
Amazon has launched S3 Tables to enhance storage for analytics, utilizing Apache Iceberg for faster queries and improved transaction rates, with automatic maintenance and integration with AWS Glue Data Catalog.
Amazon has introduced S3 Tables, a new feature designed to optimize storage for analytics workloads, particularly tabular data such as purchase transactions and sensor readings. S3 Tables store data in the Apache Iceberg format, enabling efficient querying with popular engines such as Amazon Athena and Apache Spark. Amazon claims up to three times faster query performance and up to ten times more transactions per second compared with self-managed Iceberg storage. The new table buckets are a specialized storage type within S3 that provides durability, scalability, and performance while automatically optimizing storage for cost and query efficiency. Each table bucket name is unique within its AWS region, and a bucket can contain multiple tables organized into namespaces for easier management. The service performs automatic maintenance, including compaction, snapshot management, and removal of unreferenced files, relieving users of these tasks. Integration with the AWS Glue Data Catalog, currently in preview, extends querying across AWS analytics services. S3 Tables are available in select AWS regions, and users are charged for storage, requests, and maintenance operations.
- Amazon S3 Tables optimize storage for analytics workloads using Apache Iceberg format.
- Users can expect up to 3x faster queries and 10x more transactions per second.
- Automatic maintenance features include compaction and snapshot management.
- Integration with AWS Glue Data Catalog enhances data querying capabilities.
- S3 Tables are currently available in select AWS regions.
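The bucket/namespace/table hierarchy described above can be sketched in code. This is a minimal, hypothetical provisioning sketch using the boto3 `s3tables` client; the call and parameter names are assumptions based on the launch announcement, so verify them against your SDK version before use. By default it only returns the planned calls; pass `do_it=True` to execute them against a real AWS account.

```python
def provision_table(bucket_name: str, namespace: str, table_name: str,
                    region: str = "us-east-1", do_it: bool = False):
    """Return the planned S3 Tables API calls; execute only when do_it=True."""
    # The plan mirrors the hierarchy: table bucket -> namespace -> table.
    plan = [
        ("create_table_bucket", {"name": bucket_name}),
        ("create_namespace", {"namespace": [namespace]}),
        ("create_table", {"namespace": namespace, "name": table_name,
                          "format": "ICEBERG"}),
    ]
    if do_it:
        import boto3  # imported lazily so a dry run needs no AWS SDK installed
        client = boto3.client("s3tables", region_name=region)
        # create_table_bucket returns the bucket ARN, which the later
        # namespace and table calls reference (parameter names assumed).
        arn = client.create_table_bucket(name=bucket_name)["arn"]
        client.create_namespace(tableBucketARN=arn, namespace=[namespace])
        client.create_table(tableBucketARN=arn, namespace=namespace,
                            name=table_name, format="ICEBERG")
    return plan

plan = provision_table("analytics-bucket", "sales", "orders")
for call, params in plan:
    print(call, params)
```

Once a table exists, engines like Athena or Spark address it through the catalog rather than by S3 key, which is what lets the service compact and re-layout files underneath without breaking queries.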
Related
DuckDB Meets Postgres
Organizations shift historical Postgres data to S3 with Apache Iceberg, enhancing query capabilities. ParadeDB integrates Iceberg with S3 and Google Cloud Storage, replacing DataFusion with DuckDB for improved analytics in pg_lakehouse.
AWS powered Prime Day 2024
Amazon Prime Day 2024, on July 17-18, set sales records with millions of deals. AWS infrastructure supported the event, deploying numerous AI and Graviton chips, ensuring operational readiness and security.
Behind AWS S3's Scale
AWS S3, launched in 2006, supports 100 million requests per second, stores 280 trillion objects, utilizes over 300 microservices, and offers strong durability and data redundancy features for cloud storage.
New physical AWS Data Transfer Terminals let you upload to the cloud faster
AWS launched the Data Transfer Terminal in Los Angeles and New York to expedite data uploads to the cloud, allowing users to reserve slots for secure, high-throughput data transfers.
Amazon Aurora DSQL
Amazon Aurora DSQL is a serverless distributed SQL database offering high availability, strong data consistency, and automatic updates. It is PostgreSQL-compatible and supports various applications, including cloud-native and SaaS solutions.
I thought AWS was dropping the ball on Open Table Formats, but this one is a smart move and a direct shot fired at Databricks with Delta, simply by making it the default for software engineers who want a big-data table on S3.
Instead of $23.50/TB/month for standard S3, it's $26.50/TB/month for S3 Tables. That's cheap. They charge for compaction and other table maintenance services, but those are cheap too.
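A quick sanity check on the storage premium in those figures (which are the commenter's numbers, not an official AWS quote):

```python
# Per-TB monthly storage prices as cited in the comment above.
standard_s3 = 23.50   # $/TB/month, standard S3
s3_tables = 26.50     # $/TB/month, S3 Tables

# Relative premium for using S3 Tables over standard S3 storage.
premium = (s3_tables - standard_s3) / standard_s3
print(f"{premium:.1%}")  # → 12.8%
```

A roughly 13% storage markup in exchange for managed compaction and snapshot expiry is the trade-off being weighed here.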
AWS has been falling behind Databricks and Snowflake at the higher AI layers, but going low (metaphorically and literally) like this puts the cat amongst the pigeons. At what point should the competition lawyers take a look at this?