CrowdStrike Timeline Mystery
On July 19, 2024, a faulty CrowdStrike update caused system crashes globally, affecting 8.5 million devices and leading to significant disruptions, including 5,000 canceled flights, prompting recovery efforts.
On July 19, 2024, a faulty update to the CrowdStrike Falcon sensor configuration for Windows systems caused widespread system crashes, affecting millions globally. The update, intended to enhance security, inadvertently triggered a logic error, leading to blue screens of death (BSOD) on impacted machines. Bitsight estimates that this incident resulted in a 15% to 20% drop in the number of systems connected to CrowdStrike Falcon servers. The timeline of events began with the release of the update at 04:09 UTC, followed by identification of the issue and a reversion of changes by 05:27 UTC. However, by then, approximately 8.5 million devices had already been affected, disrupting various sectors, including airlines and healthcare, with over 5,000 flights canceled.
CrowdStrike and Microsoft collaborated to provide remediation steps, requiring manual intervention to delete the faulty file from affected machines, complicating recovery efforts. Bitsight's analysis of traffic data revealed a significant drop in unique IPs contacting CrowdStrike servers, particularly after the update. Notably, a traffic spike was observed on July 16, three days prior to the outage, raising questions about potential correlations between these events. As organizations increasingly rely on external software, the incident underscores the importance of proper technology hygiene, including staged updates and operational disruption plans. Bitsight continues to investigate the traffic patterns and implications of this outage as organizations work to recover.
Related
Cybersecurity platform Crowdstrike down worldwide, users logged out of systems
CrowdStrike, a cybersecurity platform, faced a global outage affecting users in countries like India, Japan, Canada, and Australia due to a technical error in its Falcon product. Users encountered disruptions, including BSOD errors. CrowdStrike is actively working on a fix.
CrowdStrike fixes start at "reboot up to 15 times", gets more complex from there
A faulty update to CrowdStrike's Falcon security software caused Windows crashes, impacting businesses. Microsoft and CrowdStrike advise rebooting affected systems multiple times or restoring from backups to resolve issues. CrowdStrike CEO apologizes and promises support.
Technical Details on Today's Outage
CrowdStrike faced a temporary outage on July 19, 2024, caused by a sensor update on Windows systems, not a cyberattack. The faulty update was reverted by 05:27 UTC. Systems running Falcon sensor for Windows version 7.11+ that were online between 04:09 and 05:27 UTC may have been impacted by a logic error triggered by an update targeting malicious named pipes. Linux and macOS systems were unaffected. CrowdStrike is investigating the root cause and supporting affected customers.
Global CrowdStrike Outage Proves How Fragile IT Systems Have Become
A global software outage stemming from a faulty update by cybersecurity firm CrowdStrike led to widespread disruptions. The incident underscored the vulnerability of modern IT systems and the need for thorough testing.
Microsoft says 8.5M systems hit by CrowdStrike BSOD, releases USB recovery tool
Microsoft addressed issues caused by a faulty CrowdStrike security update affecting 8.5 million Windows systems. A USB recovery tool was released to delete the problematic file, emphasizing the need for thorough update testing.
> While we can not infer what the root cause of the change in traffic patterns on the 16th can be attributed to, it does warrant the foundational question of “Is there any correlation between the observations on the 16th and the outage on the 19th?”. As more details from the event emerge, Bitsight will continue investigating the data.
Interested to know how they're capturing sample data for IPs accessing Crowdstrike Falcon APIs and the corresponding packet data.
EDIT: Not to mention that they're able to distill their dataset to group IPs by their respective organizations. Since they have that info, I feel a proper analysis would include actually analyzing which orgs (types, country of origin, etc.) started dropping off on the 16th. Alas, since this seems like just a marketing fluff piece, we'll never get anything substantial :(
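To make that suggestion concrete, here is a minimal Python sketch of a per-group drop-off comparison between a baseline day and the 16th. It assumes a hypothetical list of rows holding a date, an organization type, a country and a unique-IP count; Bitsight's actual schema and data are not public, so every field name and number below is invented for illustration.

```python
from collections import defaultdict

def drop_by_group(rows, baseline_day, target_day, key):
    """Fractional change in unique IPs between two days, per group.

    rows: hypothetical dicts with "date" and "ips" fields plus whatever
    attributes `key` reads (e.g. "org_type", "country").
    Returns {group: change}, where -0.2 means a 20% drop.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [baseline IPs, target IPs]
    for r in rows:
        if r["date"] == baseline_day:
            totals[key(r)][0] += r["ips"]
        elif r["date"] == target_day:
            totals[key(r)][1] += r["ips"]
    # Skip groups with no baseline traffic (the change is undefined for them).
    changes = {g: (t - b) / b for g, (b, t) in totals.items() if b > 0}
    # Largest drops first.
    return dict(sorted(changes.items(), key=lambda kv: kv[1]))

rows = [  # toy data, not real measurements
    {"date": "07-15", "org_type": "airline", "country": "US", "ips": 100},
    {"date": "07-16", "org_type": "airline", "country": "US", "ips": 80},
    {"date": "07-15", "org_type": "hospital", "country": "DE", "ips": 50},
    {"date": "07-16", "org_type": "hospital", "country": "DE", "ips": 50},
]
print(drop_by_group(rows, "07-15", "07-16", lambda r: r["org_type"]))
# {'airline': -0.2, 'hospital': 0.0}
```

Passing lambda r: r["country"] as the key instead gives the country-of-origin breakdown.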
Just a random security company with a fluff piece with "CrowdStrike" in the title trying to get in the headlines.
The latter works better with organizations that release often and have reasonable surety that their updates are not going to cause disruption: releases become a normal part of the day, most of them cause no noticeable disruption at all, and so it makes sense not to have eng / ops working late hours for them. This surety can come from different sources, but the one I've seen is a very methodical rollout: first a smoke test on a very small subset of "production" (not internal or lab machines, so in CRWD's case it would be customers' machines), then a rollout to a random percentage of machines starting at 1%, followed by whatever schedule your level of confidence allows, reaching 100% before the end of business for your easternmost co-workers.
Some additional things to gain confidence can include a 1% rollout to a set of machines that is picked to ideally provide exposure to every type of machine in the fleet, and 100% rollout to customers who have agreed to be at the cutting edge (how you get them to accept that risk is an exercise for the reader, but maybe cut them a deal like 30% off their license).
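A staged rollout like the one described can be sketched in Python with deterministic hash bucketing. This is one common way to select a stable random percentage of a fleet, not a description of CrowdStrike's (or anyone's) actual system; the stage fractions after the initial 1%, the early_adopters opt-in set, and the machine and release ids are all assumptions for illustration.

```python
import hashlib

# Fraction of the fleet enabled at each stage (illustrative). Stage 0 reaches
# only opt-in early adopters; 1% follows the comment; later steps are assumed.
STAGES = [0.0, 0.01, 0.10, 0.50, 1.0]

def bucket(machine_id: str, release_id: str) -> float:
    """Stable pseudo-random value in [0, 1) for a (release, machine) pair."""
    digest = hashlib.sha256(f"{release_id}:{machine_id}".encode()).digest()
    # Keep the top 53 bits so the float is exact and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def should_receive(machine_id: str, release_id: str, stage: int,
                   early_adopters: frozenset = frozenset()) -> bool:
    """Whether a machine gets `release_id` at stage index `stage`."""
    if machine_id in early_adopters:
        return True  # customers who agreed to be at the cutting edge
    return bucket(machine_id, release_id) < STAGES[stage]

fleet = [f"host-{i}" for i in range(10_000)]  # hypothetical machine ids
for stage in range(len(STAGES)):
    enabled = sum(should_receive(m, "release-42", stage) for m in fleet)
    print(stage, enabled)
```

Because a machine's bucket is fixed for a given release, anything enabled at one stage stays enabled at every later stage, while hashing in the release id selects a different 1% cohort for each release.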
The reason I'm curious about the distribution of channel file drops, for the case of Crowdstrike, is that if it's an atypically-timed release, that could indicate that it's a response to whatever caused the dip in traffic on the 16th mentioned in the Bitsight article.
Edit: From what I understand, Crowdstrike does have at least some segmentation of releases for the kernel extension, but the configuration file / channel file updates appear to be "Oh well, fire ze missiles".
Not sure if that would make them more or less incompetent...
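One crude way to check whether a channel file drop was "atypically timed" is to compare its UTC hour against the hours of past releases. The sketch below assumes a history of past release timestamps, which the discussion does not provide; the history shown is hypothetical and the 5% threshold is arbitrary. Only the 04:09 UTC candidate comes from the timeline above.

```python
from collections import Counter
from datetime import datetime

def is_atypical_hour(history, candidate, threshold=0.05):
    """Return True if few past releases shipped in the candidate's UTC hour.

    history: past release datetimes (hypothetical data, assumed UTC).
    threshold: fraction below which the hour counts as unusual (arbitrary).
    This is a crude hour-of-day frequency check, not a model of release cadence.
    """
    if not history:
        raise ValueError("need at least one past release to compare against")
    hour_counts = Counter(t.hour for t in history)
    return hour_counts[candidate.hour] / len(history) < threshold

# Hypothetical history: releases at 09:00 and 13:00 UTC on July 1-14.
history = [datetime(2024, 7, d, h) for d in range(1, 15) for h in (9, 13)]
print(is_atypical_hour(history, datetime(2024, 7, 19, 4, 9)))   # True
print(is_atypical_hour(history, datetime(2024, 7, 19, 9, 30)))  # False
```

Hour buckets treat 23:00 and 00:00 as unrelated; a finer analysis would use a circular distance on time of day.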
Something happened, but the nature of that something might be unrelated to the BSOD crash. It could just be another piece of software updating at a different frequency that sometimes shifts the timing of the CrowdStrike update.
You'd need a longer term view of data searching for beat patterns to detect that.
If the something was a one-off effect, like admins taking sick days to watch the Euro final, I'm not sure how you could positively identify the cause.
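For the beat-pattern idea: if two jobs run on fixed integer periods, the alignment between their firings repeats every lcm of the two periods, so the observation window has to span several of those intervals before a recurring collision stands out from noise. A minimal Python sketch, assuming idealized fixed-period schedules (real update jobs jitter); the 7-day and 3-day periods are hypothetical.

```python
from math import gcd

def beat_period(period_a: int, period_b: int) -> int:
    """Interval after which the relative phase of two fixed-period schedules repeats."""
    return period_a * period_b // gcd(period_a, period_b)

def coincidences(period_a, offset_a, period_b, offset_b, horizon):
    """Time units in [0, horizon) where both schedules fire together."""
    fires_a = set(range(offset_a, horizon, period_a))
    return [t for t in range(offset_b, horizon, period_b) if t in fires_a]

# Hypothetical: a weekly job and a job every 3 days, both starting on day 0.
print(beat_period(7, 3))             # 21
print(coincidences(7, 0, 3, 0, 60))  # [0, 21, 42]
```

With collisions 21 days apart, any two-week window contains at most one of them, which is why a longer-term view is needed. Offsets matter too: with periods 4 and 6 and offsets of different parity, coincidences(4, 1, 6, 0, 100) returns an empty list because the schedules never line up.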
It's incredibly creepy that they A) are collecting this much data from customers B) are comfy drilling into it by IP/organization and C) have enough spare time to do so for a marketing blog post.
Also, for god's sake, you're a company, you're supposed to look professional. If you're going to use AI art for your blog at least don't be lazy: load up Photopea and either fix the broken text or magic wand it out. It'll take you 5 minutes.