Let's blame the dev who pressed "Deploy"
The article addresses accountability in software engineering after incidents like the CrowdStrike outage. It criticizes blaming developers, advocates for holding leaders accountable for systemic issues, and emphasizes respecting software engineers' expertise.
The article discusses accountability in the software engineering industry following incidents like the CrowdStrike outage. The author argues against blaming developers for bugs and outages, pointing fingers instead at CEOs, customers, hospitals, governments, middle management, and boards. The piece criticizes the lack of accountability among leaders who prioritize profit over public service and highlights the importance of respecting software engineers' expertise and decisions. It emphasizes that holding developers responsible without addressing underlying organizational issues is counterproductive. The author draws parallels with professions like structural engineering and anesthesiology, whose practitioners receive respect for their work, and suggests that software engineers should be treated similarly to ensure accountability. The article concludes by condemning the practice of scapegoating developers for systemic failures in the industry.
Related
Tech's accountability tantrum is pathetic
Silicon Valley tech giants criticized for lack of accountability and ethical behavior. Companies like Uber, Amazon, Google, and individuals like Elon Musk prioritize innovation over laws and ethics. Resistance to oversight and lobbying against regulations noted. Importance of accountability stressed to prevent societal harm.
Bad habits that stop engineering teams from high-performance
Engineering teams face hindering bad habits affecting performance. Importance of observability in software development stressed, including Elastic's OpenTelemetry role. CI/CD practices, cloud-native tech updates, data management solutions, mobile testing advancements, API tools, DevSecOps, and team culture discussed.
The IT Industry is a disaster (2018)
The IT industry faces challenges in IoT and software reliability. Concerns include device trustworthiness, complex systems, and security flaws. Criticisms target coding practices, standards organizations, and propose accountability and skill recognition.
CrowdStrike fail and next global IT meltdown
A global IT outage caused by a CrowdStrike software bug prompts concerns over centralized security. Recovery may take days, highlighting the importance of incremental updates and cybersecurity investments to prevent future incidents.
> Neither the offerings nor crowdstrike tools are for use in the operation of aircraft navigation, nuclear facilities, communication systems, weapons systems, direct or indirect life-support systems, air traffic control, or any application or installation where failure could result in death, severe physical injury, or property damage.
(Originally in all-caps, presumably to make it sound more legally binding)
I thought this blog might have some substance about proper postmortem investigations and how to evaluate and address the circumstances that led to a failure like this, but it has none of that. It’s just a very angry rant about CEOs and middle management. The premise is that engineers can’t bear any responsibility for their actions because they don’t get “respect”.
This has to be the 10th time I’ve seen arguments that “blame” is the right action in this case, but with the key exception that we’re only allowed to blame people other than the engineers. The last article was a lengthy rant about how it’s actually QA’s fault and engineers shouldn’t be expected to ensure their own code is correct, therefore engineers are blameless.
This is empty calories for people who like ragebait, but nothing more.
And from now on, every engineer has the autonomy to refuse to use certain libraries and software stacks they are unfamiliar with, can refuse to submit a change, will have control over whether to push out software they worked on, etc.
And a huge pay bonus as well since they have all this CEO-like risk/responsibility now.
This in a way covers the issue. Never in all my decades of developing have I had an estimate accepted. All the developers get is "It must be done by ...., no exceptions".
So you end up pulling all-nighters for weeks, and because you are tired, errors or bad decisions creep in. So yes, the issue is fully with upper management.
I left a company because a big project was about to start and I could see it would be a big cluster**. People who stayed told me that is what happened.
Major logic error. See how much chaos ensued when that display went down? That's why it needs protection to keep the display up. Dumb or not is irrelevant. It is mission-critical regardless.
… arguments for 100% test coverage or SOLID principles are more like philosophies and anecdotes without a lot of hard data supporting them.
Software engineering looks less like engineering and more like a lot of bike-shedding conversations.
Blaming the developers (or any specific individual/group for that matter) is a cop-out. It’s easy and lazy and doesn’t get to the root of the problem, which is more often than not a lack of processes, tools, and information, and a lack of time/desire from leadership to address “technical debt” (for lack of a better term), no matter how many times the devs bring that up.
When you blame an individual or a group you can shut the case on the postmortem and not get to any substantive improvements, meaning this can and will happen again, just to somebody else.
That’s why blameless postmortems and a blameless culture are so important. This is a good article about that philosophy:
> My summary of blameless culture is: when there is an outage, incident, or escaped bug in your service, assume the individuals involved had the best of intentions, and either they did not have the correct information to make a better decision, or the tools allowed them to make a mistake.
There is incompetence and complacency showing at every level after the CrowdStrike outage. Both the people selling CrowdStrike and the ones implementing it.
> I remember times when leaders had dignity and self-respect. They would go on stage and apologize
Etiquette rules change over time, but a constant throughout history is that people in power don't take responsibility voluntarily. The "good old days" where leaders had dignity and self-respect never existed.
> [...] delusional claim how software engineers should bear the responsibility for bugs and outages
The /opinion/ that people /should/ be held responsible can't be delusional. Apparently the author believes that software engineers should bear zero responsibility -- even when their software kills people -- because we don't get enough respect. I don't agree with that opinion, but it's a bit rich to call other people delusional while making one unfounded claim after another.
The post as a whole is way too angry and too cynical for me.
I mean, it is a bit of a cluster fuck.
Most of the issues I've seen are due to speed or cost. We don't have time for tests. We don't have time to record the business requirements in a collective place (just look through multiple JIRA stories and piece it together). We don't have money for dedicated QA roles. So of course problems happen. Luckily my team only works on a lower-criticality site.
None of this is really a dev's fault. It's leadership and the culture they incentivise.
The article says:
> You want software engineers to be accountable for their code, then give them the respect they deserve.
The problem is, respect is something that's taken as much as it's something given. We can't even decide for ourselves if software is worthy of respect. Can anyone learn to code from a coding bootcamp, where getting a job is all that matters? Or is it a discipline that takes years to master, where mistakes can and will bring down the global economy? Are we glorified plumbers, or are we mathematicians and civil engineers combined?
If you see yourself as a code monkey, of course you can't be "held responsible" for the results of your work. Coding bootcamps don't teach infosec. It's up to your company to set good practices, and your job is just to follow them.
It's only if you personally want to take your role in society seriously that it makes sense to consider not just your job, but the effect your job has on the wider world. I'm personally of the opinion that this mindset is almost always long-term positive for your career. It's less "blame the dev for hitting deploy" and more "I'm the dev. No, I won't hit deploy on that code in the state it's in."
I didn't go to a coding bootcamp. I went to university. There they forced all of us engineering & CS students to do an ethics course - which was actually fantastic. It taught us about the Therac-25: a computer-controlled radiation therapy machine which killed a bunch of people. The engineers on the ground knew that it needed more testing, but the company insisted it was fine and pushed it out the door.
Here's the question: If you were one of those engineers, what would you do? If you knew, or suspected, that a bug in your code could bring down 911 services and hospitals, ground planes or give people a lethal dose of radiation, do you really trust your manager to make the engineering call? Do you think the CEO understands the risk that is being taken by hitting deploy?
Of course, if we're playing the blame game, ultimately the blame falls on the CEO of the company or something.
But forget the blame game. You won't be fired. You won't lose your cushy job. The question is: Who do you want to be in situations like this? Think about this question now. You won't have time in the moment when your boss tells you to hit deploy, and you have second thoughts.
Me? I want to be someone who would say no.
This is so naive that it is impossible to take the author seriously.