July 8th, 2024

Show HN: S3HyperSync – Faster S3 sync tool – iterating with up to 100k files/s

S3HyperSync is a tool hosted on GitHub for efficient file synchronization between S3-compatible services. It is optimized for performance, memory use, and cost, which makes it well suited to large backups. It offers fast iteration and copy speeds and a UUID Booster feature, and can be installed from a release JAR file or built with sbt assembly. Visit GitHub for details.

S3HyperSync is a tool available on GitHub designed for efficient file synchronization between S3-compatible storage services. It is optimized for high performance, memory efficiency, and cost-effectiveness. The tool is particularly useful for creating daily backups of large S3 buckets with millions of files and terabytes of data to a separate AWS account. S3HyperSync reduces the need for expensive GetObject requests and minimizes costly MultiPart uploads. It offers fast iteration and copy speeds on AWS Fargate, along with a UUID Booster feature for quick bucket comparisons. Users can install the tool by downloading the JAR file from the Release Section or building it with sbt assembly. S3HyperSync provides various options for syncing S3 buckets efficiently. For more information on installation, usage guidelines, contributing, licensing, and acknowledgments, visit the S3HyperSync GitHub Repository.
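
To illustrate how a sync can avoid GetObject requests, here is a minimal sketch that decides what to copy purely from listing metadata. It is not S3HyperSync's actual implementation: the names `ObjectMeta` and `plan_sync` are hypothetical, and the sketch assumes each bucket listing has already been collected into a dictionary of key to size and ETag (fields that an S3 ListObjectsV2 response includes for every object).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ObjectMeta:
    """Metadata available from a bucket listing, without fetching the object."""
    size: int
    etag: str


def plan_sync(
    source: Dict[str, ObjectMeta],
    target: Dict[str, ObjectMeta],
) -> Tuple[List[str], List[str]]:
    """Return (keys_to_copy, keys_to_delete) by comparing two listings.

    A key is copied when it is missing from the target or when its
    size or ETag differs. A key is deleted when it exists only in the target.
    """
    to_copy = sorted(k for k, meta in source.items() if target.get(k) != meta)
    to_delete = sorted(k for k in target if k not in source)
    return to_copy, to_delete


if __name__ == "__main__":
    # Hypothetical listings for illustration only.
    src = {
        "a.txt": ObjectMeta(10, "etag-a"),
        "b.txt": ObjectMeta(20, "etag-b-new"),
        "c.txt": ObjectMeta(30, "etag-c"),
    }
    dst = {
        "a.txt": ObjectMeta(10, "etag-a"),
        "b.txt": ObjectMeta(20, "etag-b-old"),
        "d.txt": ObjectMeta(40, "etag-d"),
    }
    copy_keys, delete_keys = plan_sync(src, dst)
    print("copy:", copy_keys)
    print("delete:", delete_keys)
```

One caveat with this approach: the ETag of an object uploaded via multipart upload is not a plain MD5 of its content and depends on the part layout, so the same data can carry different ETags in two buckets. A real tool would need to account for that, for example by falling back to size and modification time.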

Related

Show HN: High-frequency trading and market-making backtesting tool with examples

The GitHub URL leads to the "HftBacktest" project, a Rust framework for high-frequency trading. It offers detailed simulation, order book reconstruction, latency considerations, multi-asset backtesting, and live trading bot deployment.

Show HN: Synapse – TypeScript Toolchain for Cloud Apps

Synapse is a full-stack TypeScript toolchain with resource-driven programming, cloud-agnostic libraries, and fine-grained permissions. It includes a TypeScript compiler, a fast package manager, and a testing framework for local or AWS deployment. Installation instructions vary by OS. It builds on TypeScript, esbuild, Node.js, Terraform, and the AWS SDK. Detailed documentation on GitHub covers Custom Resources, Environments, Packages, and Tests, with a Quick Start guide available.

Resilient Sync for Local First

The "Local-First" concept emphasizes empowering users with data on their devices, using Resilient Sync for offline and online data exchange. It ensures consistency, security, and efficient synchronization, distinguishing content changes and optimizing processes. The method offers flexibility, conflict-free updates, and compliance documentation, with potential enhancements for data size, compression, and security.

Show HN: Standard Webhooks – simplifying 3rd party API's

Syncd simplifies webhook integrations by offering real-time data connectivity. Users can tunnel webhooks to different endpoints, test locally, and manage incoming data efficiently. The platform streamlines API integration, providing features like logging, debugging, and local testing. Join the waitlist for early access.

Combine multiple RSS feeds into a single feed, as a service

The GitHub URL provides details on "RSS Combine," a tool that merges multiple RSS feeds into one. It guides users through local setup, configuration via YAML or environment variables, and generating a static RSS file in S3.

6 comments
By @iknownothow - 6 months
How does it compare with s5cmd [1]? s5cmd is my go-to tool for fast S3 sync, and they have the following at the top of their GitHub page:

> For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli. For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmd and aws-cli can only reach 85 MB/s and 375 MB/s respectively.

[1] https://github.com/peak/s5cmd

By @kapilvt - 6 months
For large buckets, key space enumeration is a significant portion of most bulk operations, especially on a potentially non-optimized key space (i.e., one with hotspots). There are a few heuristics that can be used, but running an S3 Inventory allows skipping enumeration and focusing on the transfer with significantly fewer API calls, although it requires bucket preparation.

By @sam_goody - 6 months
How can I be confident that everything was synced correctly? Is there a way to compare the SHA or whatever key S3 provides?

Also, would this work well when there is not a lot of room on the disk it is syncing from? I have had serious issues with the S3 CLI in such a scenario.

Also, how would this compare to something like rclone?

By @toomuchtodo - 6 months
Can you target buckets at different providers, such as syncing from AWS to Backblaze (assuming S3 compatible target)?

By @asyncingfeeling - 6 months
Very nice!

Seemingly not the intended use case, and I might be overlooking something, but these are nice-to-have features which the s3 sync tool has and I'd personally miss:
- profiles
- local sync

By @jayzalowitz - 6 months
Anyone have time to dig in and say what tricks it's using most?