Microsoft technical breakdown of CrowdStrike incident
The blog discusses a CrowdStrike outage caused by a memory safety issue with the CSagent driver, emphasizing the importance of Windows' security features and future enhancements for better security integration.
Windows is a widely used platform for businesses requiring high security and availability. It offers various operating modes that allow users to restrict execution to approved software and drivers, enhancing security and reliability. Users can utilize integrated security monitoring and detection features or opt for third-party solutions from a diverse ecosystem of vendors. The blog discusses the recent CrowdStrike outage, attributing it to a memory safety issue linked to the CSagent driver. Microsoft analyzed the incident using Windows Error Reporting (WER) kernel crash dumps, identifying global crash patterns and specific faulting details related to the CSagent module. The analysis revealed a read out-of-bounds access violation, meaning the driver attempted to read memory outside the region it was entitled to access. The blog emphasizes the importance of leveraging Windows' integrated security capabilities to improve overall security and reliability. It also highlights future enhancements in Windows' extensibility for security products, aiming to provide better support for both customers and security vendors. The discussion includes technical insights into the crash dump analysis, showcasing the use of the Microsoft WinDbg kernel debugger for troubleshooting. This incident serves as a reminder of the critical need for robust security practices and the potential risks associated with third-party security solutions. By understanding these challenges, organizations can better integrate and manage their security tools within the Windows environment, ensuring a more secure operational framework.
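To make the "read out-of-bounds" failure class concrete, here is a minimal sketch in Python. It is not CrowdStrike's or Microsoft's code, and every name in it (`read_field_unchecked`, `read_field_checked`, `supplied`) is hypothetical. It only illustrates the general pattern of trusting an index versus validating it against the real length of the data. In kernel-mode C code the unchecked variant would read invalid memory and can crash the whole system; Python raises `IndexError` instead.

```python
from typing import Optional, Sequence


def read_field_unchecked(fields: Sequence[int], index: int) -> int:
    # Hypothetical helper: trusts the caller-supplied index.
    # In C-like kernel code the equivalent pointer arithmetic could read
    # past the end of the buffer; Python raises IndexError instead.
    return fields[index]


def read_field_checked(fields: Sequence[int], index: int) -> Optional[int]:
    # Hypothetical helper: validates the index against the actual length
    # before reading, and reports "no value" rather than faulting.
    if 0 <= index < len(fields):
        return fields[index]
    return None


if __name__ == "__main__":
    supplied = [10, 20, 30]  # illustrative data only
    print(read_field_checked(supplied, 1))  # index inside the buffer
    print(read_field_checked(supplied, 7))  # index outside the buffer
```

The checked variant turns a potential fault into a value the caller has to handle, which is the kind of defensive validation a memory-safety review would look for.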
Related
Crashes and Competition
The article explores Windows OS design, kernel access impact on security firms, CrowdStrike crash consequences, Microsoft's limitations due to agreements, and regulatory implications for system security and functionality balance.
Microsoft calls for Windows changes and resilience after CrowdStrike outage
Microsoft is reconsidering security vendor access to the Windows kernel after a CrowdStrike update outage affected 8.5 million PCs, emphasizing the need for improved resilience and collaboration in security practices.
- Many commenters question how the faulty driver passed Microsoft's quality control and WHQL verification processes.
- There is a strong sentiment that Microsoft shares responsibility for the incident, with suggestions for better user-mode security implementations.
- Some commenters express skepticism about CrowdStrike's technical competency and the influence of non-technical leadership.
- Legal implications and potential negligence lawsuits against CrowdStrike and Microsoft are discussed.
- Several users advocate for improved user feedback mechanisms in Windows to prevent similar issues in the future.
> Providing safe rollout guidance, best practices, and technologies to make it safer to perform updates to security products.
> Reducing the need for kernel drivers to access important security data.
They are being as diplomatic as they can, but it's definitely a slap to CS. Read as "they don't know how to roll things out, they need guidance on basic QA practices, we'll happily teach them...". Then, they list a set of facilities running in user-mode to avoid needing to run as many things in kernel mode.
I would be interested to know what the water cooler discussion about CS was like inside Microsoft. Especially in the teams that had to respond to customers saying "Your Windows OS is broken, our hospital patients are suffering...".
This post explains why security software has historically run in kernel-mode, and really seems to be pushing new technology that Microsoft has that would push security vendors into user-mode (with APIs that attempt to assist with many of the reasons why they have historically used kernel-mode).
CrowdStrike already runs in user-mode on both Mac and Linux (from what I can tell), and it seems like running in user-mode on Windows would significantly lessen the risk of catastrophic failures like a blue-screen-of-death. I know the bulk of the failures here belong to CrowdStrike, but I can't help but think about the fact that Apple kicked security vendors out of kernel-mode a while back, and that if Windows had done the same, an issue like this probably wouldn't have been possible. By even offering kernel-mode options to external vendors, I believe Microsoft is creating risk for themselves.
That's the sort of thing a negligence lawyer focuses on. Partner at Brown Rudnick: "The most likely legal theory will be one of negligence. [Congress] will drag the guy over the coals, they'll maybe implicate him and his company and put in place a negligence action. There'll maybe be a couple of plaintiffs lawyers who dig up some exceptional theory on negligence, and get some class action lawsuits going. Again, we still don't know all the facts in this case, and there are other dimensions which have not yet been fully explored, including how CrowdStrike had access to kernel level updates on the Microsoft operating system? How come Microsoft didn't have any control over these updates being pushed on their kernel?"
The first two class actions are already starting.
[1] https://learn.microsoft.com/en-us/windows-hardware/drivers/d...
[2] https://www.channele2e.com/analysis/crowdstrike-legal-and-li...
Yep. You just have to pretend that everyone who deployed Windows had an actual competitive choice available to them.
> A second benefit of loading into kernel mode is tamper resistance.
I guess availability is negotiable after all.
If you want to decide which OS/distros to avoid for critical stuff, look to see who's learning from the incident (even if not bitten by it) compared to those saying "it wasn't our fault" (and that's not just MS).
?!?!
But they didn’t.
And Microsoft, I argue, also has blood on their hands for every hospital this hit. Giving users a prompt to disable the driver, after three successive failed boots, would have saved lives.
a simple plastic covering of your new dyson has more legal scrutiny and action (see the "children may choke" warnings they all need to come with) than software that we otherwise block in the name of "national security".
given how overvalued tech companies are in this region, i believe it is high time to start legally recognizing the real-life impact of digital tech. to hell with the "but muh innovation" argument.
You are the clowns of the world, that's all ... xD