June 21st, 2024

CRIU, a project to implement checkpoint/restore functionality for Linux

CRIU is a Linux tool for freezing and saving container/application states, enabling live migration and snapshots. Integrated into software like Docker, it offers a CLI, an RPC interface, and a C API for checkpointing. Various resources and events showcase its capabilities and development progress.

CRIU, or Checkpoint/Restore In Userspace, is Linux software that can freeze a running container or application and save its state to disk for later restoration. This functionality allows for live migration, snapshots, remote debugging, and more. Initially a Virtuozzo project, CRIU has evolved with community support and is now integrated into various software like OpenVZ, LXC/LXD, Docker, and Podman. Users can access CRIU through different installation methods and use its CLI, RPC, and C API for checkpoint/restore operations. The project also offers usage scenarios, troubleshooting guides, and information on what can and cannot be checkpointed. For those interested in development, there are resources available such as mailing lists, image file format descriptions, and plugins. CRIU's capabilities are showcased at events like CloudNativeSecurityCon and Open Source Summit, with topics such as container checkpointing in Kubernetes and GPU-accelerated containers. Additionally, the project provides insights into upcoming features, ongoing development tasks, and external articles related to checkpointing technologies.
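As a rough illustration of the CLI route, the sketch below assembles `criu dump` and `criu restore` invocations from Python. It assumes the `criu` binary is installed and run with sufficient privileges, and it only uses a few common options: `-t` (root PID of the tree to dump), `-D` (images directory), `--shell-job` (for processes started from a terminal) and `--leave-running`. The helper names are hypothetical, and real workloads often need more flags.

```python
import subprocess
from typing import List


def dump_cmd(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at pid (hypothetical helper)."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"]
    if leave_running:
        # Keep the original process alive after the checkpoint is written.
        cmd.append("--leave-running")
    return cmd


def restore_cmd(images_dir: str) -> List[str]:
    """Build a `criu restore` command that recreates the tree from images_dir."""
    return ["criu", "restore", "-D", images_dir, "--shell-job"]


def run(cmd: List[str]) -> None:
    """Run a CRIU command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Usage would look like `run(dump_cmd(1234, "/tmp/ckpt"))` followed later by `run(restore_cmd("/tmp/ckpt"))`; the images directory has to exist before the dump. Keeping command construction separate from execution makes it easy to log or inspect exactly what is passed to `criu`.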

FreeBSD Bhyve Companion Tools

The author details transitioning from VirtualBox to FreeBSD Bhyve, praising Bhyve's benefits in a FreeBSD setting. Tools like VNC connection and pause/resume scripts optimize Bhyve operations, simplifying VM management.

I found an 8 years old bug in Xorg

An 8-year-old Xorg bug related to epoll misuse was found by a picom developer. The bug caused windows to disappear during server lock, traced to CloseDownClient events. Despite limited impact, the developer seeks alternative window tree updates, emphasizing testing and debugging tools.

SquirrelFS: Using the Rust compiler to check file-system crash consistency

The paper introduces SquirrelFS, a crash-safe file system that uses Rust's typestate pattern to enforce the order of operations at compile time. Synchronous Soft Updates ensure crash safety by maintaining the order of metadata updates. SquirrelFS offers correctness guarantees without separate proofs, quickly verifying crash consistency during compilation. Comparative evaluations show SquirrelFS performs similarly to or better than NOVA and WineFS.

Writing an IR from Scratch and survive to write a post

Eduardo Blázquez developed an Intermediate Representation (IR) for the Kunai Static Analyzer during his PhD, aiming to enhance Dalvik bytecode analysis. The project, shared on GitHub and published in SoftwareX, transitioned to Shuriken. Blázquez drew inspiration from Triton and LLVM, exploring various IR structures like ASTs and CFGs. MjolnIR, Kunai's IR, utilized a Medium Level IL design with control-flow graphs representing methods. Blázquez's approach involved studying compiler design resources.

Homegrown Rendering with Rust

Embark Studios develops a creative platform for user-generated content, emphasizing gameplay over graphics. They leverage Rust for 3D rendering, introducing the experimental "kajiya" renderer for learning purposes. The team aims to simplify rendering for user-generated content, utilizing Vulkan API and Rust's versatility for GPU programming. They seek to enhance Rust's ecosystem for GPU programming.

20 comments

By @monus - 10 months

I built crik[1] to orchestrate CRIU operations inside a container running in Kubernetes, so that you can migrate containers when a spot node receives a shutdown signal. I presented it at KubeCon Paris 2024 [2], with a deep dive for those interested in the technical details.

[1]: https://github.com/qawolf/crik

[2]: The Party Must Go On - Resume Pods After Spot Instance Shutdown, https://kccnceu2024.sched.com/event/1YeP3
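crik's actual implementation isn't reproduced here. As a loose sketch of the general idea, assume the workload runs alongside a small supervisor that can invoke `criu`: when a pod is terminated, Kubernetes sends SIGTERM and waits for the grace period before SIGKILL, so the supervisor can trap SIGTERM and dump the workload in that window. The class name, the injectable `runner`, and the choice of `--tcp-established` (CRIU's option for checkpointing open TCP connections) are assumptions for illustration.

```python
import signal
import subprocess
from typing import Callable, List


def run_checked(cmd: List[str]) -> None:
    """Default runner: execute the command and fail loudly on a non-zero exit."""
    subprocess.run(cmd, check=True)


class CheckpointOnSigterm:
    """Hypothetical supervisor that dumps a workload with CRIU when SIGTERM arrives."""

    def __init__(self, target_pid: int, images_dir: str,
                 runner: Callable[[List[str]], None] = run_checked) -> None:
        self.target_pid = target_pid
        self.images_dir = images_dir
        self.runner = runner  # injectable so the logic can be exercised without criu
        self.done = False

    def install(self) -> None:
        # Route SIGTERM (sent before SIGKILL on pod termination) to our handler.
        signal.signal(signal.SIGTERM, self.handle)

    def handle(self, signum, frame) -> None:
        if self.done:
            return  # a second SIGTERM should not trigger a second dump
        self.runner(["criu", "dump", "-t", str(self.target_pid),
                     "-D", self.images_dir, "--tcp-established"])
        self.done = True
```

The dump has to finish within the pod's `terminationGracePeriodSeconds`, and the images directory must live somewhere that survives the node (for example, shared storage) for a restore elsewhere to be possible.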

By @londons_explore - 10 months

I pulled apart the innards of CRIU because I needed to be able to checkpoint and restore a process within a few microseconds.

The project ended up being a dead end, because it turned out that running my program in a QEMU whole-system VM and then fork()ing QEMU was faster.
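The fork() trick works because fork() behaves like a cheap, copy-on-write snapshot: the child sees memory exactly as it was at the moment of the fork, whatever the parent does afterwards. The minimal Python sketch below demonstrates only that plain fork() property on a POSIX system; it does not involve QEMU or CRIU, and the function name is hypothetical.

```python
import os


def snapshot_value_after_mutation() -> int:
    """Fork a 'snapshot', mutate state in the parent, then read what the snapshot saw."""
    state = {"counter": 1}
    result_r, result_w = os.pipe()  # child -> parent: the value the snapshot sees
    go_r, go_w = os.pipe()          # parent -> child: "I have mutated my copy now"
    pid = os.fork()
    if pid == 0:
        # Child: holds a copy-on-write copy of memory as of the fork.
        os.close(result_r)
        os.close(go_w)
        os.read(go_r, 1)  # block until the parent has changed its own copy
        os.write(result_w, str(state["counter"]).encode())
        os._exit(0)
    os.close(result_w)
    os.close(go_r)
    state["counter"] = 999  # only affects the parent's address space
    os.write(go_w, b"x")
    os.close(go_w)
    data = os.read(result_r, 64)
    os.close(result_r)
    os.waitpid(pid, 0)
    return int(data.decode())


print(snapshot_value_after_mutation())  # prints 1
```

Forking a whole VM process applies the same idea at a larger scale: each child is a frozen copy of the guest's state, and the kernel only copies pages when one side writes to them.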

By @ahlCVA - 10 months

I once used CRIU to implement the hacky equivalent of save-lisp-and-die to speed up the startup process of a low-powered embedded system where the main application was misguidedly implemented in Erlang and loading all the code took minutes each time the device started. It worked better than it should have (though in the end it wasn't shipped because nobody (except the customer) cared enough about the startup behavior and eventually the product got canned (for different reasons)).

By @zeotroph - 10 months

I discovered CRIU through the video below (1h), "Container Migration and CRIU Details with Adrian Reber (Red Hat)". It has a live demo and explains how much of it really is "user space". The demo uses Podman, Red Hat's Docker-compatible container engine.

Since everyone treats containers as cattle, CRIU doesn't seem to get much attention, which might be why a video rather than a blog post was my first introduction.

https://www.youtube.com/watch?v=-7DgNxyuz_o

By @eikenberry - 10 months

I'm keeping an eye on this project as a way to give containers used with immutable distro installs (e.g. Silverblue) a kind of user-space hibernation feature, so I could hibernate different container workspaces at will. I would find this very useful for development projects where I often have a lot of state that I lose whenever I need to reboot or whatever. Last time I looked there were still too many limitations on what it could checkpoint, but maybe one day.

By @jasonvorhe - 10 months

Interesting. A while back I built a very primitive prototype for a hosting company to figure out whether we could offer something close to live migration of a Linux account from host x to host y without much downtime. The product didn't support containers, and isolation was based only on Linux user accounts, so we couldn't just use Docker.

Just a few months ago I was talking to a startup founder at KubeCon who built a product based on CRIU. Unfortunately I forgot the company's name. (And I can't find that git repo with the prototype anywhere, even in my backups. Sad.)

By @vfclists - 10 months

Someone uses it to start Emacs very quickly: https://gitlab.com/blak3mill3r/emacs-ludicrous-speed

By @jlokier - 10 months

CRIU is used by LXD to save the state of an LXD container, very similar to suspending or snapshotting a virtual machine.

Unfortunately, I was disappointed to find that `lxc stop --stateful` couldn't save any of my LXD containers. There was always some error or other. That is how I learned about CRIU: the failures came from limitations of CRIU when used with the sorts of workloads that run in LXD.

  # lxc stop --stateful test
  (00.121636) Error (criu/namespaces.c:423): Can't dump nested uts namespace for 2685261
  (00.121645) Error (criu/namespaces.c:682): Can't make utsns id
  (00.150794) Error (criu/util.c:631): exited, status=1
  (00.190680) Error (criu/util.c:631): exited, status=1
  (00.191997) Error (criu/cr-dump.c:1768): Dumping FAILED.
  Error: snapshot dump failed

LXD is generally used with "distro-like" containers, like running a small Debian or Ubuntu distro, rather than single-application containers as are used with Docker.

It turns out CRIU can't save the state of those types of containers, so in practice `lxc stop --stateful` never worked for me.

I'd have to switch to VMs if I wanted their state saved across host reboots, but VMs lack the host-guest filesystem-sharing behaviour I needed.

In practice this meant I had to live with never rebooting the host. Thankfully Linux just keeps on working for years without a reboot :-)

By @whartung - 10 months

How do things like this handle sockets? Is there some kind of first class event that the app can detect, or does it just "close" them all and assume the app can cleanly reconnect to reestablish them (once they detect that the socket has rudely closed on them)?

By @albertzeyer - 10 months

We considered using something like this to cache some Python program state and speed up startup, as startup was quite slow for some of our scripts (due to slow NFS, but also to importing lots of libraries like PyTorch or TensorFlow). We wanted to store the program state right after importing the modules and loading some static data, before executing the actual script or doing anything dynamic. That way the script itself could still be updated while the same preloaded state is reused.

Back then, CRIU turned out not to be an option for us. For example, one of the problems was that it could not be used as non-root (https://github.com/checkpoint-restore/criu/pull/1930). I see that this PR has since been merged, so maybe this works now? I'm not sure whether there are other issues.

We also considered DMTCP (https://github.com/dmtcp/dmtcp/) as an alternative to CRIU, but it had other issues (I don't remember which).

The solution I ended up with was a fork server. A server process starts up front, only preloads the modules (and maybe a few other things), and then waits. When I want to execute a script, I fork from the server and use the forked process right away. I used logic similar to reptyr (https://github.com/nelhage/reptyr) to redirect the PTY. This worked quite well.

https://github.com/albertz/python-preloaded
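The core of the fork-server idea can be sketched in a few lines of Python. This is not the code from python-preloaded: it assumes a POSIX system and Python 3.9+ (for `os.waitstatus_to_exitcode`), uses `json` as a stand-in for slow imports such as PyTorch, and leaves out both the reptyr-style PTY redirection and the protocol a real server would use to receive jobs. `preload` and `run_in_fork` are hypothetical names.

```python
import importlib
import os
import sys
from types import ModuleType
from typing import Callable, Dict, Iterable


def preload(module_names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import the slow modules once, in the long-lived server process."""
    return {name: importlib.import_module(name) for name in module_names}


def run_in_fork(job: Callable[[], None]) -> int:
    """Fork the preloaded server process and run job() in the child.

    The child inherits every module the parent already imported (memory is
    shared copy-on-write), so it skips the import cost. Returns the child's
    exit code: 0 on success, 1 if job() raised.
    """
    # Flush before forking so buffered output isn't duplicated in the child.
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            job()
        except BaseException:
            code = 1
        # os._exit() skips interpreter cleanup, so flush the child's output first.
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)
    # Parent (the server): wait for the child, then go back to waiting for work.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

A server would call `preload([...])` once at startup and then `run_in_fork(main)` for each script invocation. One caveat of this design: libraries that start threads or initialize GPU/CUDA state at import time may not survive fork() cleanly, so what gets preloaded needs some care.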

By @arjvik - 10 months

For my OS class's final project last quarter, I built a way to live-migrate a process (running on a custom OS we built from scratch) from one Raspberry Pi to another, essentially using checkpoint/restore!

Getting the code cleaned up enough to post it has been on my to-do list for quite some time, and this has inspired me to do it soon!

By @PhilipRoman - 10 months

I would love a low-tech version of this that simply suspends the process and pushes all its mapped pages to swap (no persistence across reboots, of course). I think it could be used for scheduling large memory-bound jobs whose resource usage is not known in advance.
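Something close to this may be possible with cgroup v2, assuming the job runs in its own cgroup and the host has swap enabled: writing `1` to `cgroup.freeze` stops every task in the group, and writing a byte amount to `memory.reclaim` (Linux 5.19+) asks the kernel to reclaim that much memory from the group, pushing pages out to swap or dropping clean page cache. The sketch below only writes those control files; the cgroup path and the helper names are hypothetical, and on a real system it needs appropriate privileges.

```python
from pathlib import Path


def _write(cgroup: Path, name: str, value: str) -> None:
    """Write a value to one cgroup v2 control file."""
    (cgroup / name).write_text(value)


def park(cgroup: Path, reclaim_bytes: int) -> None:
    """Freeze all tasks in the cgroup, then ask the kernel to reclaim their memory."""
    _write(cgroup, "cgroup.freeze", "1")
    # The kernel may report EAGAIN if it cannot reclaim the full amount.
    _write(cgroup, "memory.reclaim", str(reclaim_bytes))


def unpark(cgroup: Path) -> None:
    """Thaw the tasks; swapped-out pages are faulted back in on demand."""
    _write(cgroup, "cgroup.freeze", "0")
```

For example, `park(Path("/sys/fs/cgroup/jobs/big-job"), 8 << 30)` would freeze a hypothetical job and request up to 8 GiB of reclaim, and `unpark(...)` would resume it later.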

By @EatFlamingDeath - 10 months

Can this be used for something like the Steam Deck? It would be nice when you are running a game and need to stop, but want to resume the gameplay later.

By @Retr0id - 10 months

I've interacted with some of these features as a means of code injection into running processes. (checkpoint, patch the checkpoint data, restore)

It's useful because, by design, it's difficult for the process to even notice it's been stopped. And while it's stopped, you can apply arbitrary patches completely atomically.

By @paulddraper - 10 months

This is what supports Docker's checkpoint create/restore.

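And Docker is a very convenient way to do this, e.g. to work around the PID limitation (CRIU restores processes with their original PIDs, and a container's own PID namespace keeps those PIDs free).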

(Though I really wish it got more attention https://github.com/docker/cli/issues/4245 )
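For reference, the Docker flow mentioned here uses `docker checkpoint create` and `docker start --checkpoint`, which require the daemon's experimental features and CRIU installed on the host. The small Python sketch below only builds those commands; the container and checkpoint names are placeholders and the helper names are hypothetical.

```python
from typing import List, Optional


def checkpoint_create_cmd(container: str, checkpoint: str,
                          leave_running: bool = False,
                          checkpoint_dir: Optional[str] = None) -> List[str]:
    """Build `docker checkpoint create [OPTIONS] CONTAINER CHECKPOINT`."""
    cmd = ["docker", "checkpoint", "create"]
    if leave_running:
        # Keep the container running after the checkpoint is taken.
        cmd.append("--leave-running")
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    return cmd + [container, checkpoint]


def docker_restore_cmd(container: str, checkpoint: str,
                       checkpoint_dir: Optional[str] = None) -> List[str]:
    """Build `docker start --checkpoint CHECKPOINT CONTAINER` to restore from a checkpoint."""
    cmd = ["docker", "start", "--checkpoint", checkpoint]
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    return cmd + [container]
```

Each list can be passed to `subprocess.run(cmd, check=True)`.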

By @overspeed - 10 months

Great project!

For long-running containerised simulations, this saves a lot of time on failures (as long as you have a safe place to write the snapshots to) by not restarting from 0 every time.

By @usixk - 10 months

Seriously cool project. I used it at a previous workplace to checkpoint HTTP servers for absolutely dirt-nasty start speeds.

By @mgaunard - 10 months

How does it compare to dumping a core, or to recording what the process is doing for reverse debugging?
