November 22nd, 2024

Amazon S3 now supports the ability to append data to an object

Amazon S3 Express One Zone now allows users to append data to existing objects, benefiting applications like log-processing and media-broadcasting, and is accessible via AWS SDK, CLI, or Mountpoint.


Amazon S3 Express One Zone has introduced a new feature that allows users to append data to existing objects. This enhancement is particularly beneficial for applications that require continuous data input, such as log-processing and media-broadcasting applications. Previously, these applications had to store data locally before uploading the final object to S3. With the new capability, users can now directly append data to existing objects and read them immediately within S3 Express One Zone. This feature is available in all AWS Regions where the storage class is offered, and users can utilize the AWS SDK, AWS CLI, or Mountpoint for Amazon S3 (version 1.12.0 or higher) to get started. For further details, users are directed to the S3 User Guide.

- Amazon S3 Express One Zone now supports appending data to existing objects.

- This feature is useful for applications that continuously receive data, like log-processing and media-broadcasting.

- Users can append data directly without needing to combine it in local storage first.

- The feature is available in all AWS Regions where S3 Express One Zone is offered.

- Users can access this functionality through the AWS SDK, AWS CLI, or Mountpoint for Amazon S3.

AI: What people are saying
The introduction of the append feature in Amazon S3 Express One Zone has generated a mix of excitement and skepticism among users.
  • Many users appreciate the potential for real-time data processing applications, such as log processing and media workflows.
  • Concerns are raised about the limitations of the feature, including the requirement to specify a write offset and the 10,000 parts limit.
  • Some users express disappointment over the higher costs and lower availability of the Express One Zone compared to standard S3.
  • Comparisons are made with other cloud storage solutions, highlighting existing features in competitors like Google Cloud Storage and Azure.
  • There are discussions about the implications of Amazon's changes on the broader ecosystem and compatibility with third-party services.
23 comments
By @simonw - 5 months
Wrote some notes on this here: https://simonwillison.net/2024/Nov/22/amazon-s3-append-data/

Key points:

- It's just for the "S3 Express One Zone" bucket class, which is more expensive (16c/GB/month compared to 2.3c for S3 standard tier) and less highly available, since it lives in just one availability zone

- "With each successful append operation, you create a part of the object and each object can have up to 10,000 parts. This means you can append data to an object up to 10,000 times."

That 10,000 parts limit means this isn't quite the solution for writing log files directly to S3.
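
To put that limit in numbers, here is a rough back-of-the-envelope sketch. It assumes every log flush is exactly one append and that the only constraint is the quoted 10,000-append cap; the flush intervals are example values, not anything from the announcement.

```python
# Back-of-the-envelope: how long until one object runs out of appends?
# MAX_APPENDS comes from the quoted limit; the flush intervals are example values.
MAX_APPENDS = 10_000


def hours_until_limit(flush_interval_seconds: float, max_appends: int = MAX_APPENDS) -> float:
    """Hours of continuous logging before the append limit is reached."""
    return flush_interval_seconds * max_appends / 3600


def min_flush_interval(target_hours: float, max_appends: int = MAX_APPENDS) -> float:
    """Smallest flush interval (in seconds) that lets one object last target_hours."""
    return target_hours * 3600 / max_appends


if __name__ == "__main__":
    for interval in (1, 10, 60):
        print(f"flush every {interval:>2}s -> {hours_until_limit(interval):.1f} hours per object")
    print(f"one object per day needs a flush interval >= {min_flush_interval(24):.2f}s")
```

Under those assumptions, flushing once a second fills an object in under three hours, and keeping a single object alive for a full day means flushing no more often than roughly every 8.64 seconds.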

By @electroly - 5 months
The original title is "Amazon S3 Express One Zone now supports the ability to append data to an object" and the difference is extremely important! I was excited for a moment.
By @teractiveodular - 5 months
For comparison, while GCS doesn't support appends directly, there's a hacky but effective workaround in that you can compose existing objects together into new objects, without having to read & write the data. If you have existing object A, upload new object B, and compose A and B together so that the resulting object is also called A, this effectively functions the same as appending B into A.

https://cloud.google.com/storage/docs/composite-objects#appe...
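
The compose trick described above might look roughly like this in Python. This is a sketch, not a tested recipe: it assumes `bucket` behaves like a `google.cloud.storage.Bucket`, and `append_via_compose` and its argument names are made up for illustration.

```python
from typing import Any


def append_via_compose(bucket: Any, target_name: str, chunk_name: str) -> None:
    """Emulate an append by composing target + chunk back into target.

    `bucket` is assumed to behave like a google.cloud.storage.Bucket;
    the function name and arguments are hypothetical.
    """
    target = bucket.blob(target_name)
    chunk = bucket.blob(chunk_name)
    # Source order matters: existing data first, newly uploaded chunk second.
    target.compose([target, chunk])
    # The temporary chunk object is no longer needed once it has been composed.
    chunk.delete()
```

The chunk still has to be uploaded as its own object first, so each "append" costs an upload, a compose call, and a delete.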

By @sureIy - 5 months
It's crazy to me that anyone would still consider S3 after R2 was made available, given the egress fees. I regularly see people switching to R2 and saving thousands or hundreds of thousands by switching.
By @ChrisArchitect - 5 months
Please fix title: Amazon S3 Express One Zone now supports the ability to append data to an object
By @supermatt - 5 months
This doesn't seem very useful for many cases, given that you NEED to specify a write offset in order for it to work. So you need to either track the size (which becomes more complex if you have multiple writers), or need to first request the size every time you want to do a write and then race for it using the checksum of the current object... Urghhh.
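
A minimal sketch of the two options described here (track the size yourself, or ask for it before each write). It assumes `client` behaves like `boto3.client("s3")` and that `WriteOffsetBytes` is the `PutObject` parameter used for appends, which is worth checking against the current SDK documentation; `append_bytes` and `known_size` are hypothetical names.

```python
from typing import Any, Optional


def append_bytes(client: Any, bucket: str, key: str, data: bytes,
                 known_size: Optional[int] = None) -> int:
    """Append data to an existing object and return the new object size.

    `client` is assumed to behave like boto3.client("s3").
    """
    if known_size is None:
        # Extra round trip: ask S3 for the current size to use as the offset.
        known_size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    # Assumption: WriteOffsetBytes must equal the current object size. If another
    # writer appends between the size lookup and this call, the offset is stale
    # and the request is expected to be rejected, so the caller has to retry.
    client.put_object(Bucket=bucket, Key=key, Body=data, WriteOffsetBytes=known_size)
    return known_size + len(data)


# Hypothetical usage with a single writer that remembers the size between calls:
#   size = append_bytes(s3, bucket_name, "app.log", b"first line\n")
#   size = append_bytes(s3, bucket_name, "app.log", b"second line\n", known_size=size)
```

With a single writer, passing the returned size back in avoids the extra `head_object` call; with multiple writers the race described above remains and this sketch does nothing to resolve it.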
By @thecleaner - 5 months
I don't understand the bashing in the comments. I imagine this is a tough distributed systems challenge (as with anything S3). Of course AWS is charging more since they've cracked it.

Does anybody know if appending still has that 5TB file limit?

By @taeric - 5 months
I'm curious on the different use cases for this? Firehose/kinesis whatever the name seems to have the append case covered in ways that I would think has fewer foot guns?
By @styx31 - 5 months
I am surprised it was not supported until now? How does it compare to Azure blob append (which has existed for years)?

I have been using azure storage append blob to store logs of long running tasks with periodic flush (see https://learn.microsoft.com/en-us/rest/api/storageservices/u...)

By @merek - 5 months
This is specifically for S3 "Express One Zone"
By @exac - 5 months
I wonder what the implications for all the S3-like APIs are going to be.
By @from-nibly - 5 months
Logs are a terrible use case for this. Loki already existed and it uses the cheaper, more highly available S3.
By @kylegalbraith - 5 months
I got excited until I saw the one zone part. That is a critical difference in terms of cost.
By @wood_spirit - 5 months
Will be exciting to see what adaptations are needed and how performance and cost changes for delta lake and iceberg and other cloud mutable data storage formats. It could be really dramatic!

S3 is often used as a lowest common denominator, and a lot of the features of azure and gcs aren’t leveraged by libraries and formats that try to be cross platform so only want to expose features that are available everywhere.

If these days all object stores do append then perhaps all the data storage formats and libs can start leveraging it?

By @crest - 5 months
I wonder at which point they'll admit they've added back all the complexity of a hierarchical filesystem.
By @klysm - 5 months
This sounds like a thin wrapper over an underlying object store
By @msoad - 5 months
Does it really work for livestreams? Can I stream read and write on the same video file? That is huge if true!

Edit: oh it’s only in one AZ

By @chx - 5 months
If you want to know the differences between Express One Zone and normal S3, check this blog post: https://www.vantage.sh/blog/amazon-s3-express-one-zone I had no idea this even existed. tl;dr: it's 7x as expensive.
By @andrewstuart - 5 months
There are many, many S3 compatible storage services out there provided by other companies.

Most of them cheaper, some MUCH cheaper.

By @water9 - 5 months
Incredible breakthrough. What will they come up with next, the ability to remove data from an object? It’s clear that not working from home is really working out for them.
By @maryndisouza - 5 months
This is a fantastic addition to Amazon S3 Express One Zone! The ability to directly append data to existing objects opens up new possibilities for real-time data processing applications. Whether it's continuously adding log entries or appending video segments in a media workflow, this feature will streamline workflows and improve efficiency for many use cases. It's great to see AWS continuing to innovate and make data management even more flexible and user-friendly. Excited to see how this feature enhances the scalability of applications across various industries!
By @andrewstuart - 5 months
Amazon has no right to do this - it no longer owns the S3 standard and should respect the ecosystem and community.

S3 has stagnated for a long time, allowing it to become a standard.

Third parties have cloned the storage service and a vast array of software is compatible. There’s drivers, there’s file transfer programs and utilities.

What does it mean that Amazon is now changing it?

Does Amazon even really own the standard any more, does it have the right to break the long standing standards?

I’m reminded of IBM when they broke compatibility of the PS/2 computers just so it could maintain dominance.