CrowdStrike's impact on aviation
On July 19, 2024, a CrowdStrike software update caused the largest IT outage in history, affecting 8.5 million Windows computers, disrupting services, and grounding flights for major airlines, particularly Delta and United.
On July 19, 2024, CrowdStrike released a software update that resulted in the largest IT outage in history, affecting approximately 8.5 million Windows computers. This incident disrupted critical services, including hospitals, banks, and emergency systems, while systems running on Linux, Macs, and mobile devices remained unaffected. The aviation sector experienced significant turmoil, with major airlines like Delta, United, and American Airlines grounding flights due to the outage. A timelapse video from FlightRadar24 illustrated the drastic reduction in air traffic, particularly among these airlines.
Data analysis revealed that Delta Air Lines was the most severely impacted, with a 46% reduction in flights, followed by United Airlines at 36% and American Airlines at 16%. Southwest Airlines, however, reported a slight increase in flights, attributed to its outdated Windows systems that were not affected by the update. Delta's prolonged recovery was linked to its lack of a robust disaster recovery plan, requiring manual fixes for numerous digital terminals, while other airlines had prepared plans that allowed for quicker restoration of services. American Airlines resumed normal operations by the end of the day following the outage, while United Airlines was back on track by Saturday morning. The incident highlighted vulnerabilities in IT infrastructure and the importance of having effective disaster recovery strategies in place for critical operations.
Related
Microsoft has serious questions to answer after the biggest IT outage in history
The largest IT outage in history stemmed from a faulty software update by CrowdStrike, impacting 70% of Windows computers globally. Mac and Linux systems remained unaffected. Concerns arise over responsibility and prevention measures.
2024 CrowdStrike incident: The largest IT outage in history
A faulty update by CrowdStrike led to a global computer outage affecting airlines, banks, hospitals, and government services. Over 3,200 flights were canceled, emphasizing the need for strong cybersecurity.
Southwest Air Saved from Global IT Outage Thanks to Never Upgrading from Win 3.1
Many airlines globally faced software outages due to a faulty update by CrowdStrike. Southwest Airlines, using Windows 3.1, avoided disruptions, prompting discussions on system reliability and modernization in aviation.
CrowdStrike global outage to cost US Fortune 500 companies $5.4B
A global technology outage from a faulty CrowdStrike update is estimated to cost US Fortune 500 companies $5.4 billion, affecting banking, healthcare, and airlines, with significant operational disruptions reported.
List of Companies Affected by the Global Microsoft-CrowdStrike Outage
On July 19, 2024, a software defect in CrowdStrike's Falcon sensor caused a global outage affecting 8.5 million Windows PCs, disrupting businesses across various sectors and highlighting the need for better cyber resilience.
- Many commenters express confusion over how social media platforms maintained functionality while critical systems failed, highlighting a perceived disparity in technology reliability.
- There is speculation about the reasons for Delta's slower recovery compared to other airlines, with some attributing it to inadequate disaster recovery plans.
- Several comments discuss the outdated technology used by some airlines, particularly the claim that Southwest Airlines operates on Windows 3.1, raising concerns about the risks of legacy systems.
- Commenters question the effectiveness of CrowdStrike's services and express skepticism about the company's future following the outage.
- There is a broader discussion about the need for modernizing IT infrastructure and the potential consequences of relying on outdated systems.
I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.
(Edited) source: https://news.delta.com/update-delta-customers-ceo-ed-bastian
“… and in particular one of our crew tracking-related tools was affected and unable to effectively process the unprecedented number of changes triggered by the system shutdown…”
https://www.marketwatch.com/story/delta-hires-law-firm-seeki...
> To give you an idea of just how outdated this operating system is, Windows 3.1 was originally launched in 1992, and Microsoft ended support for it on December 31, 2001, except for the embedded version, which was officially retired in 2008.
I keep hearing the Windows 3.1 story repeated. I mean here it comes from TechRadar and even has the "Pro" in the name, they can't possibly make stuff up, right? But still don't quite believe it.
Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?
https://finance.yahoo.com/news/delta-air-lines-seek-compensa...
> Southwest wasn’t affected because they don’t use CrowdStrike
The article quotes https://www.reddit.com/r/delta/comments/1edtfbh/why_did_delt... (with improper attribution)
topgun966Platinum wrote on Reddit """ These "experts" are completely wrong. The core issue was Delta did NOT have a proper DR plan ready and did NOT have a proper IT business continuity plan ready. UA, AA, and F9 recovered so fast because they had plans on stand-by and engaged them immediately. After the SWA IT problem, UA and AA put in robust DR plans staged everywhere from the server farms, to cloud solutions, to end-user stations at airports. They had plans on how to recover systems. DL outsources a lot of their IT. UA and AA engaged those plans quickly. They did not hold back paying OT for staff. UA and AA have just as much reliance on Windows as Delta. AA was recovered by end of day Friday and resumed normal operations Saturday. UA was about 12 hours behind them having it resolved by Saturday morning resuming normal schedules Saturday afternoon. The ONUS is 100% on DL C+ level in their IT decisions. The problem is that the lower level IT staff is going to get the brunt of the blame and the consequences. """
From CrowdStrike's terms and conditions [1]: […] THERE IS NO WARRANTY THAT THE OFFERINGS OR CROWDSTRIKE TOOLS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF CUSTOMER’S PARTICULAR PURPOSES OR NEEDS. THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer’s responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations. CROWDSTRIKE DOES NOT WARRANT ANY THIRD PARTY PRODUCTS OR SERVICES.
[1] section 8.6 of https://www.crowdstrike.com/terms-conditions/
At this point, using Windows for these tasks seems like using legacy software, either because training people to use an iPad or a web browser seems too complicated or because no one wants to move their age-old systems to a more modern web-based system because of costs. Native apps work great, but I think the world is moving to the cloud, and that means web-based everything should be the norm. Yes, AWS or Azure outages can still happen, but those can be fixed by spinning up a VM in a different cloud.
This is also why software jobs aren’t going anywhere for a while. Many systems need to be changed to more modern and robust clouds. It might take decades for this transformation across the globe.
People need to stop believing everything they read on the Internet and have a little bit of skepticism.
With all of the angry customers, lots of incoming lawsuits, and the fact that their "protection" is provably more costly than no protection at all now - I can't imagine why investors aren't dumping it like mad.
this is pretty damning both ways
on the one hand, it's insane, unfathomable and inconceivable that anyone can run anything critical on windows 3.1 (!!!)
on the other hand, it's equally insane, unfathomable and inconceivable that those who do are actually better off - 30 years of "progress" is actually just bs? what are we as an industry "even doing here"???? is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?
OMFG, does this mean we need to be prepared for a (juicy) “IT failure” that brings down Southwest at some point?
A lot of code lives on much longer than you think. The general attitude we took was that most of the code we were writing would be running for at least 30 years. And that was the attitude at an R&D branch, arguably a side of that industry where we were working on the new tech.
Edit: Win 3.1 or something else, the point still stands. There is a lot of old software running out there that will continue to run our core services. Legacy software doesn't just mean v1 versus v2, it can mean v1 versus v41.
The “ingenious” strategy saved them from a week's worth of downtime this year. But that same “ingenious” strategy was the primary reason for their meltdown in 2022.
[1] https://www.npr.org/2022/12/30/1146377342/5-things-to-know-a...
[2] https://www.nytimes.com/2022/12/28/travel/southwest-airlines...