July 29th, 2024

CrowdStrike's impact on aviation

On July 19, 2024, a CrowdStrike software update caused the largest IT outage in history, affecting 8.5 million Windows computers, disrupting services, and grounding flights for major airlines, particularly Delta and United.

On July 19, 2024, CrowdStrike released a software update that resulted in the largest IT outage in history, affecting approximately 8.5 million Windows computers. This incident disrupted critical services, including hospitals, banks, and emergency systems, while systems running on Linux, Macs, and mobile devices remained unaffected. The aviation sector experienced significant turmoil, with major airlines like Delta, United, and American Airlines grounding flights due to the outage. A timelapse video from FlightRadar24 illustrated the drastic reduction in air traffic, particularly among these airlines.

Data analysis revealed that Delta Air Lines was the most severely impacted, with a 46% reduction in flights, followed by United Airlines at 36% and American Airlines at 16%. Southwest Airlines, by contrast, reported a slight increase in flights, which some reports attributed to its outdated Windows systems being unaffected by the update. Delta's prolonged recovery was linked to its lack of a robust disaster recovery plan, which meant numerous digital terminals had to be fixed manually, while other airlines had prepared plans that let them restore services more quickly. American Airlines had recovered by the end of Friday and resumed normal operations on Saturday, while United Airlines was back on track by Saturday morning. The incident highlighted vulnerabilities in IT infrastructure and the importance of effective disaster recovery strategies for critical operations.
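The flight-reduction figures are percentages measured against some baseline, which the article does not name. A minimal sketch of the arithmetic, assuming the baseline is a comparable window (for example the same hours one week earlier) and using hypothetical flight counts rather than the article's data:

```python
def percent_reduction(baseline_flights: int, outage_flights: int) -> float:
    """Percentage drop in flights relative to a baseline period.

    Assumption: the baseline is a comparable window (e.g. the same hours
    one week earlier); the article does not state which baseline it used.
    A negative result means flights increased relative to the baseline.
    """
    if baseline_flights <= 0:
        raise ValueError("baseline_flights must be positive")
    return 100.0 * (baseline_flights - outage_flights) / baseline_flights


# Hypothetical counts, not the article's data: 1,000 baseline flights and
# 540 during the outage correspond to a 46% reduction.
print(percent_reduction(1000, 540))  # 46.0

# A slight increase (as reported for Southwest) shows up as a negative value.
print(percent_reduction(100, 105))  # -5.0
```

Under this definition, "followed by United at 36%" simply means United flew 64% of its baseline flights during the same window.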

AI: What people are saying
The comments reflect a range of opinions and insights regarding the CrowdStrike software update outage and its implications for technology and airlines.
  • Many commenters express confusion over how social media platforms maintained functionality while critical systems failed, highlighting a perceived disparity in technology reliability.
  • There is speculation about the reasons for Delta's slower recovery compared to other airlines, with some attributing it to inadequate disaster recovery plans.
  • Several comments discuss the outdated technology used by some airlines, particularly the claim that Southwest Airlines operates on Windows 3.1, raising concerns about the risks of legacy systems.
  • Commenters question the effectiveness of CrowdStrike's services and express skepticism about the company's future following the outage.
  • There is a broader discussion about the need for modernizing IT infrastructure and the potential consequences of relying on outdated systems.
31 comments
By @feyman_r - 4 months
>> Why were other airlines able to get back to normal so much faster than Delta?

I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.

(Edited) source: https://news.delta.com/update-delta-customers-ceo-ed-bastian

“… and in particular one of our crew tracking-related tools was affected and unable to effectively process the unprecedented number of changes triggered by the system shutdown…”

By @Zigurd - 4 months
I would like to know if a solid, up to date, well-rehearsed disaster recovery plan saved anyone's butt, or if we're all just raw dogging our machines whether IT is paying for backup and recovery or not?
By @pimlottc - 4 months
One thing I don't understand from these graphs - why was there a relative uptick in takeoffs starting a short time /before/ the CrowdStrike update was pushed? It's in the overall graph, as well as the graphs for United, American, and especially Delta. I can't think of any reason for this, maybe it's just random noise, or maybe there was something unusual about the previous week at the same time?
By @jandrusk - 4 months
It will be most interesting to see how this lawsuit by Delta against Microsoft & CrowdStrike plays out:

https://www.marketwatch.com/story/delta-hires-law-firm-seeki...

By @rdtsc - 4 months
From the included link: https://www.techradar.com/pro/security/southwest-airlines-av...

> To give you an idea of just how outdated this operating system is, Windows 3.1 was originally launched in 1992, and Microsoft ended support for it on December 31, 2001, except for the embedded version, which was officially retired in 2008.

I keep hearing the Windows 3.1 story repeated. I mean here it comes from TechRadar and even has the "Pro" in the name, they can't possibly make stuff up, right? But still don't quite believe it.

Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?

By @camillomiller - 4 months
Berlin Brandenburg got hit hard. As a disgruntled BER user, I am NOT surprised they had one of the worst repercussions.
By @beambot - 4 months
Lawsuits inbound. Delta appears to be gearing up for one already:

https://finance.yahoo.com/news/delta-air-lines-seek-compensa...

By @bandyaboot - 4 months
Anyone know why Minneapolis-St Paul began experiencing cancellations much earlier than other US airports?
By @miohtama - 4 months
How to avoid getting rekt

> Southwest wasn’t affected because they don’t use CrowdStrike

By @roshankhan28 - 4 months
I really don't understand: how can my social media have better backup and infrastructure than an OS that is used worldwide?
By @firtoz - 4 months
Is there a similar global analysis?
By @jijji - 4 months
Basically, any airline using Linux is not on that list.
By @aftbit - 4 months
One interesting feature of this outage was that "PROD" was generally fine, on account of mostly running on Linux and/or ancient proprietary software, while "CORP" was generally wrecked, on account of mostly running Windows. In other words, the bank systems responsible for moving money mostly worked, while the systems responsible for allowing humans to interact with them (to issue approvals, change configuration, or other ops things) often did not.
By @mjevans - 4 months
Outsourcing a core business competency and surely also cutting the contracts to the bone as well to pocket the savings embrittled Delta and I seriously hope the compensation to customers costs more than any savings or profits they made in the interim. It MUST be painful enough that they do not repeat this mistake again.

The article quotes https://www.reddit.com/r/delta/comments/1edtfbh/why_did_delt... (with improper attribution)

topgun966Platinum wrote on Reddit """ These "experts" are completely wrong. The core issue was Delta did NOT have a proper DR plan ready and did NOT have a proper IT business continuity plan ready. UA, AA, and F9 recovered so fast because they had plans on stand-by and engaged them immediately. After the SWA IT problem, UA and AA put in robust DR plans staged everywhere from the server farms, to cloud solutions, to end-user stations at airports. They had plans on how to recover systems. DL outsources a lot of their IT. UA and AA engaged those plans quickly. They did not hold back paying OT for staff. UA and AA have just as much reliance on Windows as Delta. AA was recovered by end of day Friday and resumed normal operations Saturday. UA was about 12 hours behind them having it resolved by Saturday morning resuming normal schedules Saturday afternoon. The ONUS is 100% on DL C+ level in their IT decisions. The problem is that the lower level IT staff is going to get the brunt of the blame and the consequences. """

By @skrebbel - 4 months
I love that “CrowdStrike” is now a synonym for “global outage”. Not some cute hihi name like “heartbleed”, just the name of the company that did the screwup. Seems fair.
By @ks1723 - 4 months
I found it quite interesting that CrowdStrike explicitly excludes a bunch of applications. They basically say: don't use it if it needs to be reliable. I don't know if this is standard for software, but it was quite surprising to me.

From CrowdStrike's terms and conditions [1]: […] THERE IS NO WARRANTY THAT THE OFFERINGS OR CROWDSTRIKE TOOLS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF CUSTOMER’S PARTICULAR PURPOSES OR NEEDS. THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer’s responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations. CROWDSTRIKE DOES NOT WARRANT ANY THIRD PARTY PRODUCTS OR SERVICES.

[1] section 8.6 of https://www.crowdstrike.com/terms-conditions/

By @bustling-noose - 4 months
> The outage highlighted a different kind of digital divide. On one side, gmail, Facebook, and Twitter kept running, letting us post photos of blue screens located on the other side: the Windows machines responsible for actually doing things in the world like making appointments, opening accounts, and dispatching police.

At this point, using Windows for these tasks seems like clinging to legacy software, either because training people to use an iPad or a web browser seems too complicated or because no one wants to pay to move their age-old systems to a more modern web-based system. Native apps work great, but I think the world is moving to the cloud, and that means web-based everything should be the norm. Yes, AWS and Azure outages can still happen, but those can be fixed by spinning up a VM in a different cloud.

This is also why software jobs aren’t going anywhere for a while. Many systems need to be moved to more modern and robust clouds. It might take decades for this transformation to happen across the globe.

By @otterley - 4 months
It blows my mind how many people actually believed the claim -- clearly in the obvious-joke category -- that SWA is running their mission critical flight systems on Windows 3.1. (Yes, Southwest runs a lot of old tech in their stack, but that claim is patently hyperbolic.)

People need to stop believing everything they read on the Internet and have a little bit of skepticism.

By @nostromo - 4 months
It's insane to me that CrowdStrike's stock is still up 66% year-over-year.

With all of the angry customers, lots of incoming lawsuits, and the fact that their "protection" is provably more costly than no protection at all now - I can't imagine why investors aren't dumping it like mad.

By @azinman2 - 4 months
What this also tells me is there are a lot of computers connected to the internet that probably shouldn’t be.
By @jujube3 - 4 months
Sounds like we saved many tons of CO2.
By @bedobi - 4 months
> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

this is pretty damning both ways

on the one hand, it's insane, unfathomable and inconceivable that anyone can run anything critical on windows 3.1 (!!!)

on the other hand, it's equally insane, unfathomable and inconceivable that those who do are actually better off - 30 years of "progress" is actually just bs? what are we as an industry "even doing here"???? is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

By @ssivark - 4 months
> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

OMFG, does this mean we need to be prepared for a (juicy) “IT failure” that brings down Southwest at some point?

By @knappe - 4 months
For everyone flabbergasted by Southwest running ~Windows 3.1~ old software, I have bad news about the telecom industry. I worked at Ericsson at an R&D branch, and one of the projects in the works was to move one of the main pieces of routing equipment that handled millions of telephony operations a day away from an ancient version of Windows.

A lot of code lives on much longer than you think. The general attitude we took was that most of the code we were writing would be running for at least 30 years. And that was the attitude at an R&D branch, arguably a side of that industry where we were working on the new tech.

Edit: Win 3.1 or something else, the point still stands. There is a lot of old software running out there that will continue to run our core services. Legacy software doesn't just mean v1 versus v2, it can mean v1 versus v41.

By @xyst - 4 months
> Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

The “ingenious” strategy saved them from a week's worth of downtime this year. But that same “ingenious” strategy was the primary reason for their meltdown in 2022 [1][2].

[1] https://www.npr.org/2022/12/30/1146377342/5-things-to-know-a...

[2] https://www.nytimes.com/2022/12/28/travel/southwest-airlines...