Firing Myself
Noormar, a developer, accidentally cleared a production database at a social gaming startup, causing revenue losses and customer complaints. The incident led to guilt, a tarnished reputation, and his eventual resignation.
The article recounts a personal experience of a developer, Noormar, who made a critical mistake while working at a social gaming startup in 2010. Assigned to implement a feature inspired by World of Warcraft, he accidentally cleared the USERS table in the production database, wiping out character stats for thousands of paying customers. The error led to a crisis, with the CEO estimating millions in revenue losses. Despite efforts to salvage the data and manage customer complaints, the blame was initially attributed to a 'junior engineer' before it became known that Noormar was responsible. The incident caused a shift in how colleagues perceived him, leading to feelings of guilt and ultimately prompting his resignation and departure from the company. The narrative serves as a cautionary tale about the consequences of oversight and the impact of mistakes in a high-stakes environment.
Related
An arc welder in the datacenter: what could possibly go wrong?
A former IBM engineer fixed a cracked metal frame on a stock exchange's printer in the 1960s. A later inexperienced repair attempt caused chaos, emphasizing the need for expertise in critical system maintenance.
Whose bug is this anyway?? (2012)
Patrick Wyatt shares bug experiences from game development, including issues in StarCraft and Guild Wars. Compiler problems caused bugs, emphasizing the need for consistent tools and work-life balance in development.
A couple years into my career, I was trying to get my AWS keys configured right locally. I hardcoded them into my .zshrc file. A few days later on a Sunday, forgetting that I'd done that, I committed and pushed that file to my public dotfiles repo, at which point those keys were instantly and automatically compromised.
After the dust settled, the CTO pulled me into the office and said:
1. So that I know you know: explain to me what you did, why it shouldn't have happened, and how you'll avoid it in the future.
2. This is not your fault - it's ours. These keys were way overpermissioned and our safeguards were inadequate - we'll fix that.
3. As long as it doesn't happen again, we're cool.
Looking back, 10 years later, I think that was exactly the right way to handle it. Address what the individual did, but realize that it's a process issue. If your process only works when 100% of people act perfectly 100% of the time, your process does not work and needs fixing.
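For illustration, a minimal sketch (not the commenter's actual setup) of the usual alternative to hardcoding keys in a dotfile: let the AWS SDK resolve credentials from its default chain, so nothing secret ever lands in a file you might push to a public repo.

```python
# A sketch assuming boto3: the SDK resolves credentials at call time from its
# default chain (environment variables, ~/.aws/credentials written by
# `aws configure`, or an attached instance/role provider), so no access key
# ever appears in a dotfile or a repository.
import boto3

s3 = boto3.client("s3")  # nothing hardcoded; the default credential chain applies

# Any call now uses whatever credentials the environment provides.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```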
Hahaha. I still see this being done today every now and then.
> The CEO leaned across the table, got in my face, and said, "this, is a monumental fuck up. You're gonna cost us millions in revenue". His co-founder (remotely present via Skype) chimed in "you're lucky to still be here".
this type of leadership needs to be put on blast. 2010 or 2024, doesn’t matter.
If it’s going to cost “millions in revenue”, then maybe it would have been prudent to invest time in proper data access controls, proper backups, and rollback procedures.
Absolutely incompetent leadership should never be hired ever again. There should be a public blacklist so I don’t make the mistake of ever working with such idiocy.
The only people who should ever be “fired” are leadership, unless the act was intentional, in which case you should be subject to jail time.
It should never be like this, and in this case especially I blame the OP 0%. This is something that could happen to anybody in such circumstances. I have not deleted a full database, but I have had to restore things a few times. I have made mistakes myself and have rushed to fix problems caused by others' mistakes, and every single time the discussion was about improving our processes so that it does not happen again.
That is the issue, not what the author did. It was only a matter of time before the database got accidentally deleted somehow.
I'm not sure I even believe that part of the story. This was either a very dysfunctional company or a looooong time ago.
The most egregious case involved an incompetent configuration that resulted in hundreds of millions of dollars in lost data and a six-month automated recovery project. Fortunately, there were traces of the data across the entire stack - from page caches in a random employee's browser, to automated reports and OCR dumps. By the end of the project, all data was recovered. No one from outside ever found out or even realized anything had happened - we had redundancy upon redundancy across several parts of the business, and the entire company basically shifted the way we did ops to work around the issue for the time being. Every department had a scorecard tracking how many of their files were recovered, and we had little celebrations when we hit recovery milestones. To this day only a few people know who was responsible (wasn't me! lol)
Blame and derision are always inevitable in situations like this. It's how it's handled afterwards that really marks the competence of the company.
I was hired by a little photo development company doing both walk-in jobs and electronic B2B orders. I was brought in to take over the maintenance and development of the B2B order placement web service the previous developer had written.
Sadly, the previous dev designed the DB schema and software under the assumption that there would only ever be one business customer. When that ceased to be the case, he decided to simply create another database and spin up another process.
So here I am on my first day, tasked with creating a new empty database to bring on another customer. I used the Microsoft SQL Server admin GUI to generate the DDL from one of the existing tables, created (and switched the connection to) a pristine, new DB, and ran the script.
Little did I know that, in the middle of many thousands of lines of SQL, the script switched the connection back to the DB from which the DDL was generated and then proceeded to drop every single table.
Oops.
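As a hedged aside (not something the commenter describes doing), a generated script can be sanity-checked for context switches before it is ever run; a few lines of scanning would flag a buried USE statement like the one that caused this.

```python
# A minimal sketch: refuse to run a generated DDL script if it contains a
# USE statement, since that would silently switch the connection to another
# database partway through the script.
import re
import sys

def find_use_statements(sql_text: str) -> list[str]:
    """Return every 'USE <db>' statement found at the start of a line."""
    return re.findall(r"^\s*USE\s+\[?\w+\]?", sql_text,
                      flags=re.IGNORECASE | re.MULTILINE)

if __name__ == "__main__":
    script = open(sys.argv[1], encoding="utf-8").read()
    switches = find_use_statements(script)
    if switches:
        print("Refusing to run; the script changes database context:", switches)
        sys.exit(1)
    print("No USE statements found; the script will only touch the connected DB.")
```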
Of course, the previous dev had disabled backups a couple of months before I joined. My one saving grace was that he had some strange fixation on logging every single thing that happened in a bunch of XML log files; I managed to quickly write some code to rebuild the state of the DB from those logs.
I was (and am) grateful to my boss for trusting my ability to resolve the problem I had created, and placing as much value as he did in my ownership of the problem.
That was about 16 years ago. One of the best working experiences of my career, and a time of rapid technical growth for me. I would have missed out on a lot if that had been handled differently.
I was not the one who set up the backups, nor did I perform the restore. I just told my senior I made a big mistake, and he said thanks for saying so right away and we're going to handle it. Our client company told their people to take the rest of the day off. It must have been costly. I learned a lesson, but I also internalized some guilt about it.
Reflecting on this, I was in an environment where it was the norm to edit production live, imagining we could be careful enough. I'm suggesting that the error the author publicly took the fall for was not all their fault. Everyone up the chain was responsible for creating that risky situation. How could they not have backups?
There is no part of this story that’s the protagonist’s fault. What a mess.
It's on the company for cancelling their backups.
But individuals will always make mistakes; systems and processes are what prevent individual mistakes from doing damage. That's what was lacking here, not your fault at all. I just hope lessons were learned.
If a software company is making money, but developers develop against the production database and there are no backups, then it's the leadership that is at fault. The leadership deserves severe criticism.
As usual, a company with legitimately moronic processes experiences the consequences of those moronic processes when a "junior" person breaks something. Whoever turned off those backups as well as whoever thought devs (especially "junior" devs) should be mutating prod tables by hand are ultimately accountable.
I'm not implying there are only a few of them; more like they each have a dedicated finger, and I remember them and the cold-sweat feel of each specific one.
The first one was decades ago. There was a server room. They had a hardcopy printer in there, and behind it was a big red mushroom button that shut down the whole room. It was about head level if you were looking behind the printer...
Edit: Also, the user name 'drericlevi' seems to be used on pretty much all other social media outlets by a very online ENT doctor from Melbourne. I think somebody got the password to an unused Substack and is just posting blogspam.
Edit again: Also, the story in the post is taken from the 2023 book 'Joy of Agility' by Joshua Kerievsky and modified (presumably by AI) to be told from a first-person perspective:
https://books.google.com/books?id=0pZuEAAAQBAJ&pg=PA96&lpg=P...
My biggest foible: Our 400-person company had issued BlackBerrys and was pulled into a project where all the work was in India. I covered EMEA, and phone rates were like a quarter a minute, or something like that. There was no concern about using the phone/data.
I ended up spending a significant amount of time over there, and one thing I noticed was that it looked like I had a different carrier every time I glanced at the device. I did not think much of it in the couple of months I was there. When I got back, one of our Ops guys called me and we discovered it was closer to $9/min, with absurd data charges. I sat down with the CFO, as the $12k charge was not projected. I had no idea if I was about to be canned or not.
Instead, leadership and sales got brand-new BlackBerrys with an unlocked SIM card. Celebrations by all the brass... I was very happy to still have a job.
We immediately got an email from GitHub because they scan for such things. We rotated our keys within a few minutes, and he was gone the next day.
Obviously we should have been more careful with our keys, so all would have been forgiven if not for the fact that he leaked keys while looking for another job on company time.
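For what it's worth, rotating a leaked key is largely mechanical. A hedged sketch (assuming boto3 and IAM permissions the comment doesn't describe) of the usual create, deactivate, then delete sequence:

```python
# A minimal sketch: rotate a leaked access key by issuing a replacement,
# deactivating the compromised key, and deleting it once nothing depends on it.
import boto3

def rotate_access_key(user_name: str, leaked_key_id: str) -> str:
    iam = boto3.client("iam")

    # Issue the replacement first so dependent systems can switch over.
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]

    # Deactivate (not yet delete) the compromised key; anything still using
    # it fails loudly instead of silently keeping the leaked credential alive.
    iam.update_access_key(UserName=user_name,
                          AccessKeyId=leaked_key_id,
                          Status="Inactive")

    # Delete once you've confirmed nothing legitimate still uses it.
    iam.delete_access_key(UserName=user_name, AccessKeyId=leaked_key_id)
    return new_key["AccessKeyId"]
```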
Every single usage of secrets or privileged action of consequence must go through code review, followed by multiple signatures, where an automated system does the dangerous things on your behalf with a paper trail.
Never follow any instructions to place privileged secrets or access in plaintext on a development OS, or you might well be the one a CEO chooses as a sacrificial lamb.
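One way to read the "automated system acts on your behalf with a paper trail" advice, sketched with boto3 and a purely illustrative role ARN rather than anything from the thread: the job requests short-lived credentials from STS at run time, each assumption of the role is auditable, and no long-lived secret ever sits in plaintext on a developer machine.

```python
# A sketch assuming boto3: obtain short-lived credentials for a privileged
# action via STS instead of keeping long-lived keys in plaintext. The role
# ARN and session name are hypothetical examples.
import boto3

def short_lived_session(role_arn: str, session_name: str) -> boto3.Session:
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,   # shows up in the audit trail
        DurationSeconds=900,            # credentials expire after 15 minutes
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```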
A little bit of room for error is essential for learning, but this is insane. I'm so glad the only person who has ever put me in that kind of position is me, haha. This career would have seemed so much scarier if the people I worked with early on were willing to trust me with such terrifying error margins.
Should expose the CEO’s name. Between this and forcing you to work 3 days straight, that was the least professional way to handle this situation.
Uhh, there's the problem, not that someone accidentally deleted something lol
Engineers must never feel guilty if the company was run in such a way as to make that possible.