September 19th, 2024

We accidentally burned through 200GB of proxy bandwidth in 6 hours

Skyvern's AI agent consumed 200GB of proxy bandwidth in six hours, costing $500, due to repeated downloads of a Google machine learning model. Solutions include local caching and URL blocking.


Skyvern, an AI agent that automates browser workflows, unexpectedly consumed 200GB of proxy bandwidth in just six hours, at a cost of approximately $500. The founder discovered the incident after noticing a spike in failure rates alongside bandwidth alerts. A review of usage stats ruled out the initial suspicion of account abuse. Further investigation traced the traffic to repeated calls to a Google URL that serves a machine learning model download. Because Skyvern did not persist browser state between sessions, every new session downloaded the model again. The team decided on two fixes: running Chrome locally to save the user data directory, so that the model stays cached, and blocking the specific Google URL to prevent future downloads.

- Skyvern consumed 200GB of proxy bandwidth in six hours, costing around $500.

- The excessive usage was due to repeated downloads of a Google machine learning model.

- The lack of persistent browser state led to continuous uncached downloads.

- Solutions included caching the model locally and blocking the problematic URL.

- The incident highlights the importance of monitoring and managing proxy bandwidth effectively.
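
The URL-blocking fix and the monitoring point above can be illustrated with a minimal Python sketch. It is not Skyvern's implementation: the blocked host is a hypothetical placeholder (the summary does not name the actual Google URL), and `should_block` and `BandwidthBudget` are illustrative names. Wiring them into a browser automation library's request hooks is left out.

```python
from urllib.parse import urlparse

# Hypothetical host standing in for the model-download endpoint;
# the summary does not name the real URL.
BLOCKED_HOSTS = {"model-downloads.example.com"}


def should_block(url: str, blocked_hosts=BLOCKED_HOSTS) -> bool:
    """Return True when the URL's host is on the blocklist or is a subdomain of an entry."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in blocked_hosts)


class BandwidthBudget:
    """Running total of proxied bytes with a hard cap (an assumed policy, not from the source)."""

    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def record(self, n_bytes: int) -> bool:
        """Add n_bytes to the total; return True while usage is still within the cap."""
        self.used_bytes += n_bytes
        return self.used_bytes <= self.limit_bytes
```

A caller would check `should_block(url)` before forwarding a request through the proxy and stop or alert once `record(...)` returns `False`.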

AI: What people are saying
The discussion around Skyvern's bandwidth issue reveals several key themes and insights.
  • Many commenters suggest exploring alternative proxy solutions, including unlimited bandwidth options and static residential ISP proxies.
  • There is a consensus on the need for better management of external dependencies, particularly regarding reliance on Google services.
  • Some users express skepticism about the cost of bandwidth, questioning the pricing models of cloud services.
  • Several comments highlight the importance of implementing measures to prevent unauthorized downloads and manage bandwidth usage effectively.
  • Technical misunderstandings about bandwidth and data measurement are noted, with calls for clearer definitions.
21 comments
By @patmcc - 7 months
I'm now expecting we'll see a couple things in the next few years:

1. An explosion of residential proxy networks and other stuff to circumvent blocking of cloud IP ranges, for all the various AI scraping tools to use.

2. A corresponding explosion of countermeasures to the above. Instead of blocking suspicious IPs, maybe they get a 3GB file on their request to /scrape-target.html

By @metadat - 7 months
200GB is nothing since 2018 when AT&T mass introduced their 1-gig symmetric fiber. Any single common gigabit link can run 200GB in 15 minutes.

On any gig link, over the course of 6 hours you can transmit a little more than 4TB one way.. which is 40x more.
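
As a rough check on these figures, the arithmetic for a 1 Gbit/s link can be sketched as follows, assuming decimal units (1 GB = 10^9 bytes) and ignoring protocol overhead. Under those assumptions the results come out lower than the round numbers quoted above: roughly 27 minutes for 200GB and about 2.7TB in six hours. The function names are illustrative.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # 1 Gbit/s, decimal units (assumption)
GB = 1_000_000_000                    # bytes in one decimal gigabyte
TB = 1_000 * GB                       # bytes in one decimal terabyte


def seconds_to_transfer(n_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Time to move n_bytes over a fully saturated link, with no overhead."""
    return n_bytes * 8 / bits_per_second


def bytes_transferred(seconds, bits_per_second=LINK_BITS_PER_SECOND):
    """Bytes moved over a fully saturated link in the given number of seconds."""
    return seconds * bits_per_second / 8


minutes_for_200gb = seconds_to_transfer(200 * GB) / 60   # about 26.7 minutes
tb_in_six_hours = bytes_transferred(6 * 3600) / TB       # about 2.7 TB
```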

By @omoikane - 7 months
The discussion linked in the post is from 2022, and the corresponding issue has already been fixed:

https://issues.chromium.org/issues/40220332

I wonder if there is a more recent bug related to this?

By @sam0x17 - 7 months
Gosh I regularly burn through that much just updating games in Steam :D. Not proxy bandwidth of course but isn't it funny that the line between regular usage and $$$ can be what is using the bandwidth. Or rather, isn't it funny that regular consumers expect to be able to use multiple terabytes of data for < $100/mo but the same can still be thousands in other enterprise domains
By @perks_12 - 7 months
200GB for $500? What cloud is this?
By @tristor - 7 months
I would have liked to see a bit more of 5 Whys here. It seems like a consistent lesson that startups have to learn over and over is how to manage external dependencies, and particularly the dangers of having Google as a dependency. This is new Chrom(e|ium) behavior, and it has a real cost, both for this company and for users, which may or may not be worth the ROI, but this is what happens when you have a large scale external dependency: stuff moves without your knowledge, consent, or control.

Instead of Always. Be. Closing. it should be Always. Be. Mitigating. Dependencies. for startups.

By @8organicbits - 7 months
What infrastructure is this using? Bandwidth seems pretty pricy
By @dusted - 7 months
"200GB of proxy bandwidth was approximately $500 burned over the course of 6 hours"

The fuck? So Internet is literally more expensive than buying a drive at Amazon, paying for shipping, filling it up, and putting it on a truck towards a destination anywhere in the world.

By @hkon - 7 months
Literally means cloudprotection in Norwegian. Thought for a second we had gotten our own cloudflare.
By @tcfhgj - 7 months
Please, gigabyte isn't a unit of bandwidth.

Bandwidth is measured in data/time
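
The distinction can be made concrete with the incident's own numbers: 200GB is a volume, and dividing it by the six-hour window gives an average rate. A minimal sketch, assuming decimal units and a perfectly even spread of traffic over the six hours (which the source does not claim):

```python
GB = 1_000_000_000  # bytes in one decimal gigabyte (assumption)

volume_bytes = 200 * GB      # data transferred, from the article
window_seconds = 6 * 3600    # six hours
cost_usd = 500               # approximate cost, from the article

# Average rate in megabits per second: a true bandwidth figure (data / time).
average_mbit_per_s = volume_bytes * 8 / window_seconds / 1_000_000

# Effective price per gigabyte of proxy traffic.
usd_per_gb = cost_usd / (volume_bytes / GB)

print(f"{average_mbit_per_s:.1f} Mbit/s average, ${usd_per_gb:.2f} per GB")
```

That works out to an average of about 74 Mbit/s sustained for six hours, at an effective $2.50 per GB.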

By @bradley13 - 7 months
"We run leverage proxy networks and run headful browser instances"

Um...say what? I'm pretty broadly based in IT, and I have no idea what that means.

By @elphinstone - 7 months
Skyvern is a great name, very evocative. Typical arrogant Google, downloading trash to the user without consent.
By @olliej - 7 months
Honestly given many of these stories, $500 seems to be getting off pretty lightly.

It’s still absurd to me that many (most?) of these hosting/bandwidth providers don’t seem to allow automatic cut offs and such

By @tim_at_ping - 7 months
Hello,

A (different) proxy company owner here. This sucks! Sorry that you lost out on so much bandwidth.

Feel free to reach out to me at tim@pingproxies.com and I'd be happy to get you set up on our service and credit you with 100GB of free bandwidth to help soften the blow. I'll also be able to get you pricing a little better than you're currently on if you are interested ;)

Within the next few months we're also releasing a bunch of tools to help stop things like this happening on our residential network such as some intelligent routing logic, spend controls and a few other things.

You may also want to look into Static Residential ISP Proxies - we charge these per IP address rather than bandwidth and they often end up more economical. We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.

@ everyone else in the thread; if you run a start-up and need proxies then email me - happy to credit you with 50GB free residential bandwidth + give some advice on infra if needed.

Cheers, Tim at Ping

By @ang_cire - 7 months
Blocking Google from downloading anything onto your computer without consent is always a good idea.
By @meindnoch - 7 months
>200GB of proxy bandwidth

Gigabyte is a measure of information.

Bandwidth is information transmitted over time.

By @keepamovin - 7 months
You shouldn’t be paying by the terabyte. Colocate and just pay for the maximum throughput. Far better rates.