CrowdStrike fixes start at "reboot up to 15 times" and get more complex from there
A faulty update to CrowdStrike's Falcon security software caused Windows crashes, impacting businesses. Microsoft and CrowdStrike advise rebooting affected systems multiple times or restoring from backups to resolve issues. CrowdStrike CEO apologizes and promises support.
A buggy update to CrowdStrike's Falcon security software caused Windows-based systems to crash, leading to major disruptions for various businesses. Microsoft and CrowdStrike have pulled the affected update and are advising IT admins on various fixes. The first recommendation is to reboot affected machines multiple times, in the hope that the sensor grabs a non-broken update before the faulty driver causes the blue screen of death (BSOD). If rebooting doesn't work, admins can restore systems from a backup taken before the buggy update was released, or manually delete the problematic file. Deleting the file is particularly time-consuming on systems using BitLocker encryption, since the recovery key is needed to unlock the encrypted disk. CrowdStrike CEO George Kurtz apologized for the disruption and assured impacted customers of support in restoring their systems. Both Microsoft and CrowdStrike are continuing to update their recommended fixes as the situation evolves.
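For the manual route, the widely circulated workaround (see also the Safe Mode note in the related items below) comes down to deleting the offending channel file from the CrowdStrike driver directory. The sketch below is illustrative only: the path and file pattern follow public guidance rather than anything verified here, and on BitLocker-protected machines the volume must be unlocked with its recovery key first.

```python
# Sketch of the manual-deletion workaround described above (run from Safe Mode /
# the recovery environment, with admin rights). The path and file pattern follow
# widely circulated guidance and may not match every deployment -- verify before use.
import glob
import os

CROWDSTRIKE_DIR = r"C:\Windows\System32\drivers\CrowdStrike"  # assumed default install path
BAD_CHANNEL_PATTERN = "C-00000291*.sys"  # channel file pattern implicated in the crash

def remove_bad_channel_files(directory: str = CROWDSTRIKE_DIR,
                             pattern: str = BAD_CHANNEL_PATTERN) -> list[str]:
    """Delete channel files matching the faulty pattern and return what was removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        os.remove(path)  # requires the volume to be unlocked (BitLocker recovery key)
        removed.append(path)
    return removed

if __name__ == "__main__":
    deleted = remove_bad_channel_files()
    print(f"Removed {len(deleted)} file(s): {deleted}" if deleted else "No matching files found.")
```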
Related
Latest Crowdstrike Update Causes Blue Screen of Death on Microsoft Windows
Crowdstrike update causes BSOD on Windows, affecting many users with various sensor versions. Company investigating, advises waiting for official Technical Alert for details and workarounds. Users urged to monitor forum for updates.
Cybersecurity platform Crowdstrike down worldwide, users logged out of systems
CrowdStrike, a cybersecurity platform, faced a global outage affecting users in countries like India, Japan, Canada, and Australia due to a technical error in its Falcon product. Users encountered disruptions, including BSOD errors. CrowdStrike is actively working on a fix.
CrowdStrike code update bricking PCs around the world
CrowdStrike's Falcon Sensor update triggers Windows crashes with Blue Screen of Death due to csagent.sys file issues. Workaround involves file deletion in Safe Mode. CrowdStrike is addressing the problem.
Major Windows BSOD issue takes banks, airlines, and broadcasters offline
A global outage caused by a faulty update from CrowdStrike led to Windows machines experiencing Blue Screen of Death issues, affecting banks, airlines, and broadcasters worldwide. Recovery efforts are ongoing.
Global IT Collapse Puts Cyber Firm CrowdStrike in Spotlight
A faulty patch from CrowdStrike Holdings Inc. caused a global IT collapse, impacting various sectors. CrowdStrike's shares dropped by 15%, losing $8 billion. The incident emphasized the importance of endpoint protection software.
As in, what exactly is wrong in these C00000291-*.sys files that triggers the crash in csagent.sys, and why?
Seriously though - this entire outage is the poster child for why you NEVER have software that updates without explicit permission from a sysadmin. If I were in Congress, I would make it illegal; it's an obvious national security issue.
They exist solely to tick the box. That’s it. Nobody who pushes for them gives a shit about security or anything that isn’t “our clients / regulators are asking for this box to be ticked”. The box is the problem. Especially when it’s affecting safety critical and national security systems. The box should not be tickable by such awful, high risk software. The fact that it is reflects poorly on the cybersecurity industry (no news to those on this forum of course, but news to the rest of the world).
I hope the company gets buried into the ground because of it. It’s time regulators take a long hard look at the dangers of these pretend turnkey solutions to compliance and we seriously evaluate whether they follow through on the intent of the specs. (Spoiler: they don’t)
If I can't commit code to our app without a branch, pull requests, code review...why can the infrastructure team just send shit out willy-nilly?
"Always allow new updates" must have been checked, or someone just goes through a dashboard and blindly clicks "Approve"
And the Crowdstrike CTO has either been given the ammunition to get __whatever they ask for, ever again__ with regard to appropriate allocation of resources for devops *or* they'll be fired (whether or not it's their fault).
And let me be very clear. This is absolutely, positively and wholly not the person that pressed the button's fault. Not even a little. At a company as integral as CrowdStrike, the number of mistakes and errors that had to have happened long before it got to "Joe the Intern Press Button" is huge and absurd. But many of us have been in (a much, much, *MUCH* smaller version of) Joe's shoes, and we know the gut sinking feeling that hits when something bad happens. A good company and team won't blame Joe and will do everything they can to protect Joe from the hilariously bad systemic issues that allowed this to happen.
I see my org's SCCM admins have been consulted
I thought it was BSOD'ing on boot? I don't understand how this works. It auto-updates on boot? From the internet?
"the truth is everything is breaking all the time, everywhere, for everyone"
I feel like we are better off running open-source software. Everyone can see where the mistakes are instead of running around like a chicken with its head cut off.
Automatic updates should be considered harmful. At a minimum, there should be staged rollouts, with a significant gap (days) for issues to surface in the consumer case. Ideally, in the banks/hospitals/... example, their IT should be reading release notes and pushing the update only when necessary, starting with their own machines in a staged manner. As one '90s IT guy I worked with used to say, "you don't roll out a new Windows version before SP1 comes out".
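As a rough illustration of the ring-based rollout this comment argues for, here is a minimal sketch; the ring names, soak periods, and health signal are hypothetical placeholders, not how any particular vendor actually ships updates.

```python
# Toy model of a staged ("ring") rollout with soak periods between rings.
# Ring names, soak durations, and the health check are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Ring:
    name: str
    soak_days: int  # observation window before promoting to the next ring

RINGS = [
    Ring("vendor-internal", 1),
    Ring("canary-customers", 2),
    Ring("general-consumer", 3),
    Ring("critical-infrastructure", 7),  # banks/hospitals last, after the longest soak
]

def rollout(update_id: str, is_healthy: Callable[[str, Ring], bool]) -> str:
    """Promote an update ring by ring, halting at the first ring that reports problems."""
    for ring in RINGS:
        print(f"deploying {update_id} to {ring.name}, soaking {ring.soak_days} day(s)")
        if not is_healthy(update_id, ring):
            print(f"health check failed in {ring.name}; halting and rolling back")
            return f"halted at {ring.name}"
    return "fully rolled out"

if __name__ == "__main__":
    # Example: an update that starts crashing machines once it reaches canary customers.
    crashes_in_canary = lambda update, ring: ring.name != "canary-customers"
    print(rollout("channel-update-291", crashes_in_canary))
```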
SkyNet, according to the story, was a lot like CrowdStrike. This makes me think about how it could have broken out of its sandbox. Everybody is using AI coding assistants, automated test cases, automated integration testing and deployment. Its objective is to pass all the tests and deploy. But now it has learned economic and military effects, so it has to triage and optimize for those, at which point it starts controlling the machines it’s tasked with securing.
That sounds like there is either:
- some kind of upstream issue with deploying a fix (so most of the reboots are effectively no-ops relative to the fix)
- some kind of local reboot threshold before the system bypasses the bad driver file somehow.
The former I can see because of the complexity of update deployment on the internet, but if it's the latter then that's very non-deterministic behavior for local software.
I'm trying to understand how there is such a serious issue at this scale.
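For what it's worth, the article summary above frames it as the former: each reboot is effectively a race between the sensor pulling non-broken content and the faulty driver crashing the machine. A toy model of that race, with a made-up per-boot success probability:

```python
# Toy model of the "reboot up to 15 times" advice: each boot races the download of
# fixed content against the faulty driver's crash. The 0.3 per-boot success
# probability is invented purely for illustration.
import random

def boot_once(p_fetch_before_crash: float = 0.3) -> bool:
    """Return True if this boot manages to fetch the fix before the BSOD."""
    return random.random() < p_fetch_before_crash

def reboot_until_fixed(max_reboots: int = 15) -> int | None:
    """Reboot repeatedly; return the boot number that succeeded, or None."""
    for attempt in range(1, max_reboots + 1):
        if boot_once():
            return attempt
    return None

if __name__ == "__main__":
    random.seed(0)
    results = [reboot_until_fixed() for _ in range(10_000)]
    fixed = sum(r is not None for r in results)
    print(f"{fixed / len(results):.1%} of simulated machines recovered within 15 reboots")
```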
Pretty sure nothing will change though
Seems like people need to be at the physical box to fix it, and it's complex even then.
Seriously. Software should NOT be so bad that the fix begins with "reboot up to X times".
- our IT wizard says the fixes won't work on lathes/CNC systems. we may need to ship the controllers back to the manufacturer in Wisconsin.
- AC is still not running. sent the apprentice to get fans from the shop floor.
- building security alarms are still blaring, need to get a ladder to clip the horns and sirens on the outside of the building. still can't disarm anything.
- still no phones. IT guy has set up two "emergency" phones...one is a literal rotary phone. stresses we still cannot call 911 or other offices. fire sprinklers will work, but no fire department will respond.
- no email, no accounting, nothing. I am going to the bank after this to pick up cash so I can make payday for 14 shop technicians. was warned the bank likely would either not have enough, or would not be able to process the account (if they open at all today.)
Consider the recent npm supply chain attack a few weeks ago, or the attempted SSH attack before that, or the SolarWinds attack before that.
This type of thing is institutionally supported, and in some cases when you're working with the government, practically required.
We’re going to see more of this.