September 25th, 2024

How to avoid a BSOD on your 2B dollar spacecraft

Engineers testing a $2 billion spacecraft hit a Blue Screen of Death-style incident caused by incorrect memory addresses. Recovering required 12 hours of troubleshooting and a careful reboot of the affected systems.

The article describes a critical incident during testing of a $2 billion spacecraft, which the author likens to a Blue Screen of Death (BSOD). During Closed Loop Tests (CLTs), the spacecraft hit a seemingly unexplainable error, caused by the use of incorrect memory addresses that were meant for a different spacecraft. This crashed the Enhanced Remote Interface Units (ERIUs), which are essential for communication between the onboard computer and various sensors. The engineers faced two main problems: they could not command subsystems to power down, and they had no active telemetry, which raised concerns about the vehicle's state. Resolving the issue meant rebooting the ERIU, which in turn required rebooting the entire onboard computer, a risky process that could have sent the spacecraft into safemode. After careful planning and execution, the team successfully powered down the vehicle after 12 hours of troubleshooting. The author reflects on the experience, noting the importance of teamwork and the lack of recognition for the team's efforts, and emphasizes that effective problem-solving often goes unnoticed.

- The spacecraft testing involved complex Closed Loop Tests (CLTs) to assess system responses.

- Incorrect memory addresses led to a crash of the Enhanced Remote Interface Units (ERIUs).

- Engineers faced significant challenges in rebooting the onboard computer without triggering safemode.

- Successful resolution required collaboration and careful execution of commands.

- The incident highlighted the importance of thorough documentation and teamwork in aerospace engineering.
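
One way to picture the failure mode in the summary above (addresses meant for a different spacecraft) is a guard that refuses to hand out addresses from a table built for another vehicle. The following is only a minimal sketch under stated assumptions: the names `AddressTable`, `select_address`, and `AddressTableError`, as well as the vehicle IDs and addresses in the usage note, are hypothetical and do not come from the article or from any real flight software.

```python
from dataclasses import dataclass


class AddressTableError(Exception):
    """Raised when an address table must not be used for this vehicle."""


@dataclass(frozen=True)
class AddressTable:
    # Hypothetical structure: which spacecraft the table was generated for,
    # plus a mapping from symbolic names to memory addresses.
    vehicle_id: str
    addresses: dict


def select_address(table: AddressTable, expected_vehicle_id: str, name: str) -> int:
    """Return the address for `name`, rejecting tables built for another vehicle."""
    if table.vehicle_id != expected_vehicle_id:
        # Fail before any address is used, rather than after a write goes wrong.
        raise AddressTableError(
            f"table is for {table.vehicle_id!r}, not {expected_vehicle_id!r}"
        )
    if name not in table.addresses:
        raise AddressTableError(f"no address recorded for {name!r}")
    return table.addresses[name]
```

With a table created as `AddressTable("SC-A", {"ERIU_MODE": 0x4000})`, calling `select_address(table, "SC-A", "ERIU_MODE")` returns the address, while passing `"SC-B"` as the expected vehicle raises `AddressTableError` instead of silently returning an address for the wrong spacecraft.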

AI: What people are saying
The comments on the article reveal several key points regarding the spacecraft's operating system and the incident described.
  • Many commenters clarify that the spacecraft does not run Windows, but rather a custom operating system designed for its specific needs.
  • There is concern about the lack of a thorough root cause analysis following the incident, raising questions about safety protocols.
  • Some commenters speculate on the potential causes of the incident, including possible inexperienced programming or inadequate testing.
  • Several users express disbelief at the idea of using Windows in a spacecraft context, emphasizing the importance of reliability in space missions.
  • There is a general call for better communication and transparency regarding the incident and its implications for future missions.
16 comments
By @linebeck - 7 months
Author here: I should clarify the satellite is not running Windows. Instead, it’s running its own custom OS written in C called Flight Software (FSW) specifically designed for the satellite onboard computer.

Re-reading the post, I see how the title, my analogies, and poor attempts at humor would give the incorrect description of what’s happening with the satellite when it enters safemode. I’ll amend the post soon.

Thanks for the feedback, I’ll be better next time.

By @GlenTheMachine - 7 months
There are a bunch of comments here asking why one would run Windows on a spacecraft.

I am a spacecraft engineer. I don’t see anything in the linked article indicating that they are actually running Windows - the BSOD claim is tongue-in-cheek, or at least that’s how I read it. I also don’t know of anyone anywhere that runs Windows on a spacecraft, with the exception of laptops used by astronauts. Typically one runs vxWorks, or maybe QNX. Some experimental (high risk, low cost) systems run Linux. Older spacecraft don't run any OS at all, everything is running on bare metal, and that may be true for a handful of current spacecraft as well.

Windows is used in some places by ground controllers, but these days they tend to be running Linux a lot more often.

By @pif - 7 months
By @jesprenj - 7 months
Was the spacecraft from the event described in the article an actual spacecraft in space or a simulation of a space mission on the ground?
By @PoignardAzur - 7 months
> I think what surprised me the most was how nonchalant the response was. We had documented all of our actions, so other people had read what happened and knew something had gone on. I wasn’t expecting any fanfare but we weren’t even debriefed on what happened.

That's... Concerning. No root cause analysis? Not even an internal one?

By @rdist - 7 months
And here I thought we were going to rehash Crowdstrike ;-)
By @jwrallie - 7 months
I would bet the schedule didn't allow much time for doing subsystem-level tests with the on-board computer, so everyone went into the big test praying for the best.

That or inexperienced programmers were involved, assuming they were not scared of modifying memory addresses directly.

As for the safe-mode, if it happened maybe you could say you were randomly injecting errors in the memory during runtime and spacecraft entered safe mode as expected, would not be far off from the truth, just do not mention it was unintended :)

By @LorenPechtel - 7 months
Why is it using memory-mapped stuff in the first place rather than some sort of messaging system that would allow more defensive programming?
By @joelkevinjones - 7 months
As much as I hate writing "getter" functions for referencing global variables, I would when I knew I didn't have the right address yet. Write them first to error out loudly, then when you have the actual addresses replace the error out code.
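
The getter approach described in the comment above could look roughly like this. It is a minimal sketch in Python rather than the flight software's C, and the names `get_address`, `confirm_address`, and `AddressNotConfirmed` are hypothetical, introduced only for illustration.

```python
class AddressNotConfirmed(Exception):
    """Raised when code asks for an address that has not been verified yet."""


# Hypothetical registry of addresses that have been checked for this vehicle.
_confirmed_addresses = {}


def confirm_address(name: str, address: int) -> None:
    """Record an address once it has been verified."""
    _confirmed_addresses[name] = address


def get_address(name: str) -> int:
    """Getter that errors out loudly until the real address is known."""
    try:
        return _confirmed_addresses[name]
    except KeyError:
        # Deliberately no default or placeholder value: a guessed address
        # is worse than a loud failure during testing.
        raise AddressNotConfirmed(
            f"address for {name!r} has not been confirmed"
        ) from None
```

Until `confirm_address("ERIU_STATUS", 0x2000)` has been called (the name and value here are made up), `get_address("ERIU_STATUS")` raises `AddressNotConfirmed`. The design choice is that a missing address stops the test immediately instead of letting a placeholder reach the hardware.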
By @egberts1 - 7 months
You can always run Minix3 which basically keeps on running after a kernel OOPS.
By @bronlund - 7 months
Clickbait. Unlike British missile submarines, they are not using Windows.
By @farceSpherule - 7 months
Or you can avoid contracting with Boeing.
By @dangoodmanUT - 7 months
Step 1: Use Linux
By @sharpshadow - 7 months
One must have balls of steel to run Windows on a spaceship.