July 22nd, 2024

Systemd Talks Up Automatic Boot Assessment in Light of the CrowdStrike Outage

In response to the CrowdStrike-Microsoft outage, systemd's lead developer, Lennart Poettering, is promoting systemd's Automatic Boot Assessment for Linux systems. Although systemd has long supported the feature, major distributions have not adopted it. Poettering stresses the importance of implementing such features for system security and recovery.

In response to the recent CrowdStrike-Microsoft outage affecting Windows systems, systemd's lead developer, Lennart Poettering, highlighted the potential of systemd's Automatic Boot Assessment feature to prevent similar incidents on Linux systems. This functionality enables automatic reversion to a previous OS or kernel version in case of boot failures, providing easier recovery. Despite systemd's long-standing support for this feature, major Linux distributions have yet to adopt it. Poettering emphasized the importance of implementing boot counting and automatic fallback mechanisms as standard practices in modern systems to enhance security and robustness. He criticized commercial distros for not integrating this feature and highlighted the need for improved boot stack security. Those interested in learning more about systemd's Automatic Boot Assessment feature can find additional information on systemd.io.
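The boot counting idea mentioned above can be illustrated with a small sketch. It assumes the file-naming convention described on systemd.io, where a boot entry carries a `+LEFT-DONE` suffix (tries left, tries done) that the boot loader updates on each attempt and that is dropped once a boot is marked good. The function names and the selection policy here are hypothetical simplifications for illustration, not systemd's actual code.

```python
import re

# Matches "<stem>+<left>[-<done>]<ext>", e.g. "kernel+3.conf" or "kernel+2-1.conf".
_COUNTER = re.compile(
    r"^(?P<stem>.+?)\+(?P<left>\d+)(?:-(?P<done>\d+))?(?P<ext>\.[^.]+)$"
)


def parse_entry(filename):
    """Return (stem, tries_left, tries_done, ext), or None if there is no counter."""
    m = _COUNTER.match(filename)
    if m is None:
        return None
    return m["stem"], int(m["left"]), int(m["done"] or 0), m["ext"]


def record_attempt(filename):
    """Rename an entry as a boot loader would before trying it (hypothetical helper)."""
    parsed = parse_entry(filename)
    if parsed is None:
        return filename  # no counter: entry is already considered good
    stem, left, done, ext = parsed
    if left == 0:
        return filename  # no tries left: entry is already considered bad
    return f"{stem}+{left - 1}-{done + 1}{ext}"


def bless(filename):
    """Drop the counter after a successful boot, marking the entry good."""
    parsed = parse_entry(filename)
    if parsed is None:
        return filename
    stem, _left, _done, ext = parsed
    return f"{stem}{ext}"


def is_bad(filename):
    """An entry whose tries-left counter reached zero is treated as bad."""
    parsed = parse_entry(filename)
    return parsed is not None and parsed[1] == 0


def pick_entry(entries):
    """Simplified policy: first non-bad entry in the given (preferred-first) order."""
    for entry in entries:
        if not is_bad(entry):
            return entry
    return entries[0]  # everything is bad: fall back to the preferred entry


if __name__ == "__main__":
    entry = "4.14.11-300.fc27.x86_64+3.conf"
    for _ in range(3):  # three failed boots in a row
        entry = record_attempt(entry)
        print(entry)
    print(pick_entry([entry, "4.14.10-300.fc27.x86_64.conf"]))
```

In this sketch, an entry that exhausts its tries is skipped in favour of an older entry that was previously marked good, which is the automatic fallback the summary describes.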

6 comments
By @foresto - 9 months
Ever since systemd landed in the distros I use (Debian & family), that project has been by far the single most common cause of breakage on the systems for which I am responsible.

However good or bad the intentions and ideas here might be, the project has demonstrated many times over that it is not capable of reliably filling the roles to which it aspires. I'm not interested in extending its reach even further.

In short, no thanks.

By @rlpb - 9 months
Ubuntu uses grub recordfail and has done so for years. Userspace records whether the boot was successful, and the bootloader can use this to change its behaviour (e.g. boot a previous kernel). I think by default it stops at the boot menu for user intervention, but on a headless system you can configure it to automatically boot the previous kernel/initrd instead. I think Debian has the same, but I'm not sure.

Ubuntu Core (e.g. for the upcoming immutable desktop) also supports this kind of thing fully automatically.

So it's not just systemd and the alternatives are widely deployed already.

But anyway as others point out it won't mitigate risk on a system that injects bad code from outside of the boot process.

By @greatgib - 9 months
Lennart is omitting to say that systemd is probably responsible for far worse crashes than CrowdStrike.

Handing the keys to the boot kingdom to systemd is probably the worst thing you could do, given that it does random things to your configuration whenever it feels like it...

By @westurner - 9 months
> The only problem? Major Linux distributions aren't yet onboard with using the Automatic Boot Assessment feature.

systemd.io/AUTOMATIC_BOOT_ASSESSMENT/ : https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT/

From https://news.ycombinator.com/item?id=29995566 :

> Which distro has the best out-of-the-box output for:
>
>   systemd-analyze security
>
> Is there a tool like `audit2allow` for systemd units?

And also automatic variance in boot sequences with timeouts.

Where does it explain that a systemd service unit is always failing at boot?

By @salawat - 9 months
Lennart is talking out of his ass. The implication with CrowdStrike is that the Falcon module would have been running well for a while before the content update, and would only have fallen over once the agent looked for the new content update on the system and tried to load it.

Any assessment systemd could have done would see a failure to boot, and would either try to roll back to a kernel with an old agent module version, which would probably do the same thing, or go back to a kernel without the CrowdStrike module at all, if one is available.

Computers aren't magic. They don't know things. They can't bisect their own configuration or intuit what subcomponent caused what behavior. That's your job.