My Cloud Billing Screw-Up
Matt Gowie recounts a cloud billing error from a Dockerfile change that led to nearly $1000 in AWS charges. He emphasizes validating changes before deployment and suggests using a Terraform module for cost management.
Matt Gowie shares a personal experience of a significant cloud billing mistake he made while working as a solo consultant. He describes how a code change to a Dockerfile prevented a container from starting in an AWS ECS cluster. Because the cluster kept retrying, it repeatedly pulled the container image from the internet into a private subnet through a NAT Gateway, and he incurred nearly $1000 in data processing charges over a weekend. Fortunately, he was able to explain the situation to his client, who was understanding, and he obtained a credit from AWS to cover the unexpected costs. Gowie emphasizes the importance of validating changes before deployment and shares a tip about managing cloud costs, particularly in test environments, by using a Terraform module designed to remove unnecessary resources.
- Matt Gowie experienced a significant cloud billing error due to a code change that caused repeated container failures.
- The incident resulted in nearly $1000 in charges from AWS for data processing over a weekend.
- Gowie was able to resolve the issue with his client and received a credit from AWS.
- He highlights the importance of validating code changes before deployment to avoid similar mistakes.
- Gowie offers a solution for managing cloud costs in test environments through a Terraform module.
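To get a feel for how repeated image pulls through a NAT Gateway add up, here is a rough back-of-the-envelope sketch. The per-GB rate is an assumption (the commonly published us-east-1 data processing price; check your own region), and the image size and pull frequency are hypothetical, not figures from the article. It also ignores layer caching, so treat it as an upper-bound style estimate.

```python
# Assumption: NAT Gateway data processing price in USD per GB.
# Not taken from the article; verify against current AWS pricing for your region.
NAT_PROCESSING_USD_PER_GB = 0.045


def nat_pull_cost(image_size_gb, pulls_per_hour, hours,
                  rate=NAT_PROCESSING_USD_PER_GB):
    """Estimate the data processing cost of pulling an image repeatedly.

    Simplification: every pull transfers the full image through the gateway.
    """
    total_gb = image_size_gb * pulls_per_hour * hours
    return total_gb * rate


def gb_for_bill(bill_usd, rate=NAT_PROCESSING_USD_PER_GB):
    """Invert the estimate: how many GB would produce a given bill?"""
    return bill_usd / rate


if __name__ == "__main__":
    # Hypothetical inputs: a 1 GB image, 100 failed starts per hour, 60 hours.
    print(f"estimated cost: ${nat_pull_cost(1.0, 100, 60):.2f}")
    print(f"GB behind a $1000 bill: {gb_for_bill(1000):.0f}")
```

Under the assumed rate, a bill near $1000 corresponds to roughly 22 TB of traffic, which a crash-looping service can plausibly reach over a weekend if several tasks restart every few seconds.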
Related
Is Cloudflare overcharging us for their images service?
Jérôme Petazzoni reported unexpectedly high charges for Cloudflare's Images service, exceeding $400 instead of the anticipated $110, due to confusing billing practices. He is considering alternatives like Amazon S3.
How to save $13.27 on your SaaS bill
The author discusses managing costs with Vercel's analytics, converting images to reduce charges, and building a custom API using SQLite. They faced deployment challenges but plan future enhancements.
How HashiCorp evolved its cloud infrastructure
Michael Galloway discusses HashiCorp's cloud infrastructure evolution, emphasizing the need for clear objectives, deadlines, and executive buy-in to successfully redesign and expand their services amid growing demands.
We survived 10k requests/second: Switching to signed asset URLs in an emergency
Hardcover experienced a surge in Google Cloud expenses due to unauthorized access to their public storage. They implemented signed URLs via a Ruby on Rails proxy, reducing costs and enhancing security.
Admins wonder if the cloud was such a good idea after all
Many organizations find cloud services from major providers have not met cost-saving expectations, with significant price increases attributed to rising electricity and labor costs, prompting calls for better ROI assessments.
Sounds perfectly fine until you realize the internet is a vast space for people constantly scraping. I too left it over the weekend and came back to 70k unique series in our cloud account, pushing the bill well over $1k. What's worse is that Grafana is kind enough to not charge for these spikes, if you catch them before 48hrs. I caught it approx 50 hours later.
Like the OP, though, Grafana was nice enough to make it fall off after I explained the situation. Lesson learned!
I had deployed basic websites / servers with more managed platforms before, but I needed more control to be able to host the C++ server.
So I found GCP, created a docker image, and got the server up and running somehow. We played for maybe 10 minutes before we ran out of stuff to do, and stopped playing. What I didn't realise at the time was that auto-scaling was a concept. I thought when there was no traffic then the server wouldn't work, and I forgot I ever deployed it.
Anyways, a month later I got a $400 bill, not nearly as much as some people have lost but for a broke college student it was a lot - especially considering I only used it for 10 minutes.
Thankfully, they forgave the bill (thanks jdt!), but it still scared me. I was still at the point back then where a bill like that could've killed my company, or at the very least got me into a lot of trouble.
After that, I pretty much ruled out usage-based billing for my company as too risky. This was quite a few years ago, but to this day I still have no major dependencies that offer usage-based billing.
1. "It was late and I was done for the evening so I didn't validate the change." - if I had to explain my value as a DevOps engineer in one sentence, it would be putting these safety pins in place. You shouldn't need to validate anything by hand - it should be a part of the pipeline.
2. AWS is using extortion fees for things like NAT Gateway processing, egress traffic etc. Knowing that, and being aware that container images need to be pulled frequently, it does make sense to use ECR or any other internally hosted container registry. If you don't do that, you will spend that $1000 anyway, just over a longer period than a weekend.
3. Any changes on Friday evening - just don't.
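As one way to picture the "part of the pipeline" idea from point 1, here is a minimal sketch of a smoke test a CI step could run before deploying: start the container command and fail the build if it exits during a short grace period. The function name `stays_up` and the grace period are hypothetical choices for illustration, not something the commenter or the article describes, and a real pipeline would likely also check a health endpoint.

```python
import subprocess


def stays_up(cmd, grace_seconds=10.0):
    """Return True if `cmd` is still running after `grace_seconds`.

    A process that exits within the grace period (crash or immediate exit)
    is treated as a failed start. A process that survives is stopped again.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: count it as a healthy start,
        # then shut it down so the pipeline step can finish.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        return True
    # The process ended on its own before the grace period was over.
    return False
```

In a pipeline this could be called as `stays_up(["docker", "run", "--rm", image])`, where `image` is the freshly built tag, with the step failing when it returns `False`. That catches a container that cannot start before it ever reaches the cluster and begins retrying.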
Turned out it was a billing error on their side that they would have probably completely ignored if we didn't notice it.
Luckily, my credit card had a reduced limit, and later Google Cloud forgave the debt as long as I promised not to do it again.