Microsoft has serious questions to answer after the biggest IT outage in history
The largest IT outage in history stemmed from a faulty software update by CrowdStrike that crashed Windows computers around the world; Windows runs on over 70% of the world's desktop computers. Mac and Linux systems were unaffected. Concerns arise over responsibility and prevention measures.
The article discusses what could be the largest IT outage in history, caused by a faulty software update from cybersecurity company CrowdStrike that took down machines running Microsoft Windows, the operating system on over 70% of the world's desktop computers. The economic impact of the outage is significant, with disruption reported globally. Fortunately, other operating systems such as macOS and Linux were not affected. The article raises questions about CrowdStrike's responsibility and about what Microsoft could do to prevent such outages in the future. The incident highlights the risks of over-reliance on a single system and the importance of redundancy in critical digital infrastructure. While emergency services and essential sectors seem to have weathered the outage, the event prompts a broader discussion about the reliability of the software underpinning global operations.
Related
Worldwide BSOD outage caused by Crowdstrike
A widespread IT outage affects Australian institutions and global companies due to a software issue with CrowdStrike. Major sectors experience disruptions, with ongoing efforts to resolve the outages.
Cybersecurity platform Crowdstrike down worldwide, users logged out of systems
CrowdStrike, a cybersecurity platform, faced a global outage affecting users in countries like India, Japan, Canada, and Australia due to a technical error in its Falcon product. Users encountered disruptions, including BSOD errors. CrowdStrike is actively working on a fix.
Microsoft outage: Chaos as internet down and flights grounded around the world
A global IT outage, possibly linked to CrowdStrike antivirus software, caused chaos worldwide. Windows crashes affected sectors like healthcare and transportation. CrowdStrike's shares dropped. Various services faced disruptions, prompting calls for system modernization.
Major Windows BSOD issue takes banks, airlines, and broadcasters offline
A global outage caused by a faulty update from CrowdStrike led to Windows machines experiencing Blue Screen of Death issues, affecting banks, airlines, and broadcasters worldwide. Recovery efforts are ongoing.
Microsoft/Crowdstrike outage ground planes, banks and the London Stock Exchange
A failed cybersecurity software update caused global disruptions affecting businesses and services like United Airlines, McDonald’s, and the London Stock Exchange. The problem was traced to a faulty CrowdStrike update running on Windows machines, not to a cyberattack. CrowdStrike's shares dropped 20%, and Microsoft's fell 2.9%. The incident ranks among the largest IT outages on record.
I guess the only question they could answer is why they don't provide a framework for third-party vendors to use, as Apple does with Endpoint Security.
There, fixed it for you.
MS can’t prevent a software vendor from breaking the machine.
But one thing I don't get about this: what was the role of the enterprise admins?
Most administrators at large companies are cautious about rolling out new software versions to their employees. They (normally?) test before broad deployment.
Seems like one of three things would have had to happen for this to be missed:
1. Admins skipped testing this update prior to enterprise rollout.
2. Crowdstrike forced the update on unwilling users.
3. Crowdstrike does not provide a framework for such pre-rollout testing, and enterprises chose to use it anyway.
Can anyone offer insight?
[Disclosure: I'm a Microsoft employee, but not an enterprise admin]
Reporting from the US and elsewhere seems to be a bit more on point. Is it just because the Brits went to press earlier in the day before the problem was understood?
- why automatic, silent upgrades
- why no boot environments/generations at boot to roll back to a previous snapshot of the system (NTFS does have snapshots, after all), i.e. why no integration between storage and system management
- why a massive rollout instead of a partitioned test rollout that propagates slowly
For the rest, it's a third-party tool not mandated by the vendor, so... it's the user's choice.
Near complete vertical integration of security, like with Apple.
> A software update from cybersecurity company CrowdStrike has now taken a large number of those machines offline.
So Tom opens the article with the admission that it is CrowdStrike, not Microsoft.
> Thankfully, the update that caused the Microsoft meltdown did not affect these other software families - if it had, the impacts could have been catastrophic.
This is such a strawman (like the rest of the article, honestly) that I don't know where to begin. Inflammatory language.
A fucking "meltdown"? A meltdown of Microsoft, no less? Putting aside the fact that Microsoft and Windows are not the same thing, there is, again, nothing "meltdown"-like that Microsoft did or could have done.
> There are serious questions of course for CrowdStrike. As a leading provider of security software for large companies like Microsoft.
Tom, were you paid by CrowdStrike or what? What do you mean "of course"? It is literally the only party that should be answering questions here. I suspect that even if Tom were to "question" Microsoft, their answer about kernels, drivers, privileges, and why shipping seemingly untested code into the core of an operating system is a bad idea wouldn't even be comprehensible to him.
> The situation may also lead to calls from Microsoft users about what more the company could do to ensure products made for their software aren't going to cause major outages like this one.
This is getting absurd now, and I just can't give more energy to this. OK, it could now insist that only memory-safe languages such as Rust are allowed for drivers. Or permanently blacklist drivers from certain vendors outright. The bitching and moaning from manufacturers would then, of course, have people like Tom writing articles like "Microsoft is making manufacturers' lives harder, think of the poor IT professionals!".
> Any engineer will tell you over-reliance on one system leaves you open to a "single point of failure". Critical digital infrastructure has to have redundancy - back up systems - built in to ensure it is resilient.
Please, Tom, tell us more about your thoughts on memory-safe languages, failure recovery modes, the unikernel vs microkernel debate, and how it's just a simple matter of making operating systems "not a single point of failure" overnight.
This entire article is some kind of exercise in getting everything wrong while meeting a minimum word count, and I'd bet there's some ChatGPT thrown in there too.
I flagged this post because I think it's far below even the minimum quality level for HN. It's outright clickbait drivel.