September 2nd, 2024

How CrowdStrike Stopped Everything

On July 19, 2024, a CrowdStrike software update caused a global IT outage affecting 8.5 million Windows devices, disrupting essential services and highlighting vulnerabilities in modern interconnected systems.

On July 19, 2024, a significant global IT outage caused by a software update from cybersecurity firm CrowdStrike rendered millions of Windows computers inoperable, disrupting essential services across various sectors. The incident was triggered by a mismatch in input fields during a content update to the CrowdStrike Falcon Windows Sensor, leading to system crashes and the infamous Blue Screen of Death (BSOD). The outage affected approximately 8.5 million devices, impacting hospitals, airlines, emergency services, and everyday applications, resulting in canceled surgeries, grounded flights, and inaccessible emergency call centers. Experts noted that while such outages are difficult to prevent entirely, their effects could be mitigated through improved system resilience and better software deployment practices. The incident also highlighted vulnerabilities in data management, as critical information was lost during the downtime, and it provided opportunities for cybercriminals to exploit the situation through phishing attempts. The CrowdStrike outage serves as a reminder of the interconnectedness of modern systems and the potential consequences of IT failures.

- A software update from CrowdStrike caused a global IT outage affecting millions of Windows devices.

- The outage disrupted critical services, including healthcare, transportation, and emergency response.

- Experts emphasize the need for improved resilience and better software deployment practices to mitigate future outages.

- The incident resulted in lost data and increased phishing attempts by cybercriminals.

- The event underscores the vulnerabilities in modern interconnected systems and their potential impact on society.
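
As a rough illustration of the "mismatch in input fields" described in the summary above, a content-update parser can check the field count of every record before anything downstream consumes it. This is a hypothetical sketch; the field counts, names, and file format here are illustrative and not CrowdStrike's actual sensor code.

```python
# Hypothetical sketch of validating a content update before the consuming code
# touches it; field counts and record layout are illustrative only.

EXPECTED_FIELD_COUNT = 21  # number of fields the consuming parser expects per record

def validate_content_update(records: list[list[str]]) -> None:
    """Reject the whole update if any record's field count doesn't match expectations."""
    for i, record in enumerate(records):
        if len(record) != EXPECTED_FIELD_COUNT:
            raise ValueError(
                f"record {i}: expected {EXPECTED_FIELD_COUNT} fields, got {len(record)}"
            )

# Usage: validate at build time and again before loading on the host, so a
# malformed update is rejected instead of crashing the machine.
update = [["field"] * 20]  # one record with too few fields
try:
    validate_content_update(update)
except ValueError as err:
    print(f"rejected update: {err}")
```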

6 comments
By @jmclnx - 8 months
I know many people suffered with this, and in some cases the suffering was real. For most people it was just an inconvenience.

And I am a bit ashamed to say, the panic from users who could not get to Office 365 or whose Windows PC could not boot still brings a bit of a smile to my face. I think there is a German word for this :)

A relative worked from my house during those days because their internet was also down. We worked for two different companies: me on a Linux workstation, him on Windows 11, spending time with his help desk with me "translating".

One lesson learned: the Help Desk people should be trained to avoid tech words when dealing with people whose workflow is just Email and Excel. Both the Help Desk and my relative had nothing but a high level of frustration dealing with each other.

By @nitinreddy88 - 8 months
Any new information available here? Sorry, I couldn't find anything beyond what the world already knows so far.
By @duxup - 8 months
No real insights here, just a high-level explanation and empty quotes from folks.

> The root cause analysis (RCA) means that a CrowdStrike programmer(s) did not check their inputs before pushing an update to the CrowdStrike Falcon Windows Sensor in production.

How is it that this isn't just automated, and/or that the update isn't automatically run in a VM or something so that when it crashes the rollout is prevented?

It’s not a new concept…
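
The comment above gestures at a standard mitigation: run the update in a throwaway environment first, then roll it out in stages and halt automatically if an early ring crashes. A minimal, hypothetical sketch of that gating logic follows; the ring names, threshold, and telemetry/deploy hooks are all made up for illustration.

```python
# Minimal sketch of a staged rollout: push an update ring by ring and halt
# automatically if an early ring reports an elevated crash rate.
# Ring names, the threshold, and the hooks are hypothetical.

ROLLOUT_RINGS = ["sandbox-vms", "canary-1pct", "early-adopters", "general"]
CRASH_RATE_THRESHOLD = 0.001  # abort if more than 0.1% of hosts in a ring crash

def staged_rollout(update_id, deploy, get_crash_rate) -> bool:
    """Deploy to each ring in order; stop the rollout if any ring looks unhealthy."""
    for ring in ROLLOUT_RINGS:
        deploy(update_id, ring)
        rate = get_crash_rate(ring)
        if rate > CRASH_RATE_THRESHOLD:
            print(f"halting rollout of {update_id}: ring '{ring}' crash rate {rate:.2%}")
            return False
    return True

# Usage with stand-in hooks; real crash telemetry and push mechanisms would go here.
staged_rollout(
    "example-content-update",
    deploy=lambda update, ring: print(f"deploying {update} to {ring}"),
    get_crash_rate=lambda ring: 0.0,
)
```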

By @cosmin800 - 8 months
Not everything; there are companies that do not own or use a single Windows machine, precisely because this was an accident waiting to happen. Was this worse than MSBlaster? I remember that one, painful.
By @mindslight - 8 months
The article didn't even touch on the social factor of insurers and other types of middle managers uniformly pushing everyone to install an unnecessary RCE-based piece of software to check off their finely crafted bullet points that demand centralized legibility at the expense of everything else. When Microsoft does something that knocks some large amount of systems out, it's at least understandable why such a monoculture exists. But this state of affairs was entirely self inflicted. And an article in CACM should really be addressing these factors, because everybody already knows Crowdstroke itself was supremely incompetent. The question is not how Crowdstroke can prevent this type of software bug from happening again, but rather how we as a society can prevent the creation of more centralizing companies like Crowdstroke, especially ones that leverage the regulatory apparatus to drive adoption of their top-down version of "security".