July 10th, 2024

Towards Idempotent Rebuilds?

The blog post explores idempotent rebuilds of Debian and Ubuntu packages. It introduces debdistrebuild, a project that aims to improve reproducibility by analyzing the differences between original and rebuilt packages. It highlights challenges such as varying build paths and dependencies, and stresses why idempotent rebuilds matter for trust in binary distributions.

The blog post discusses idempotent rebuilds in the context of Debian and Ubuntu packages. The author introduces a new project, debdistrebuild, which rebuilds packages from several distributions in order to analyze the differences and improve reproducibility. The project successfully rebuilds a portion of the packages from Debian bullseye, bookworm, and other distributions, and highlights challenges such as varying build paths and differing dependency versions. The post argues that reaching 100% idempotent rebuilds is important for trust in binary distributions. It also touches on the complexity of circular dependencies and the need to use consistent build dependencies for reproducible builds: the author suggests that rebuilding packages with the same build dependencies as the original build could yield a higher number of reproducibly built packages. Overall, the post covers the technical details of package rebuilding, the reproducibility issues that arise, and the pursuit of idempotent rebuilds as a way to ensure the integrity of software distributions.

11 comments
By @no-dr-onboard - 6 months
I maintain the reproducible builds effort for my company and, please, let me tell you that this is the main pitfall of the whole effort.

There is always going to be a degree of un-reproducibility just due to the nature of math. If you don't have the same system, same compiler version (down to the minor or patch level), same dependency versions, same build flags, filesystem ordering, OS handling etc. . .you're going to get differences.
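
Two of the sources listed above, filesystem ordering and file metadata such as timestamps, can be pinned down at packaging time. A minimal Python sketch, assuming the files fit in memory and using an uncompressed tar as a stand-in for a real package format (gzip, for one, writes its own timestamp into its header); deterministic_tar is a hypothetical helper, not part of any Debian tool:

```python
import io
import tarfile


def deterministic_tar(files):
    """Build an uncompressed tar archive whose bytes depend only on `files`.

    `files` maps archive paths to byte contents. Entries are written in
    sorted order, and metadata that normally varies between machines
    (mtime, owner, permissions) is pinned to fixed values.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # don't rely on directory or dict ordering
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp instead of "now"
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

With the same Python version, building from the same contents in any insertion order yields byte-identical archives. Compiler, dependency, and flag differences still have to be controlled separately.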

The RB project has readily disclosed that there is a degree of "significantly reproducible" sussing out that each end user is going to have to do. The Debian maintainers probably chose not to display the degree of reproducibility because showing low reproducibility scores undermines the effort to evangelize the movement.

I think that's understandable, but it's also a bit of a two-edged sword. If we don't disclose scores, we allow for the misrepresentation that "this is safe because it has the word reproducible in it". If we disclose scores, we get articles like this saying "wow, that's a really low score, wtf", and short-lived paranoia gives way to ambivalence about the whole thing.

It's difficult to capture the nuance of this in pithy tidbits, hence a blog post on HN with me explaining it in the comments :).

By @compiler-guy - 6 months
For whatever it is worth, Google internal builds using the internal version of Bazel are deterministic and reproducible. And Google spends a lot of time and effort keeping them that way. You do have to ensure that nothing ever sorts based on pointer value, for example.
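
Python has a close cousin of the pointer-ordering trap: a set of strings iterates in hash order, and string hashes are randomized per process unless PYTHONHASHSEED is fixed. A small sketch with hypothetical helper names, showing the hazard and the usual fix of sorting on content before emitting anything:

```python
import os
import subprocess
import sys

SNIPPET = "print(','.join({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order_under_seed(seed):
    """Run a fresh interpreter and report the order it iterates a set of strings."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def render_exports(symbols):
    """Deterministic rendering: order by content, never by hash or address."""
    return "\n".join(sorted(symbols)) + "\n"


if __name__ == "__main__":
    # These two lines will often differ, because the hash seed differs.
    print(set_order_under_seed(1))
    print(set_order_under_seed(2))
    # This output is identical on every run.
    print(render_exports({"gamma", "alpha", "beta"}), end="")
```

The sorted rendering is the property a build step needs: its output depends only on the input names, not on how the process happened to hash or allocate them.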

Clang works fine as a compiler for this--there is nothing in it that normally produces different results due to timing or whatever. When something does leak in, we fix it upstream. You do have to ensure that no one uses __DATE__ or similar macros, or that you redefine them to a known value on the command line.
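
A common convention for pinning such values is the SOURCE_DATE_EPOCH environment variable from the Reproducible Builds project. Below is a sketch, with hypothetical helper names, that derives fixed __DATE__ and __TIME__ definitions from it to pass on the compiler command line, as described above:

```python
import os
import time

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_epoch(fallback=None):
    """Return SOURCE_DATE_EPOCH if set, else `fallback`, else the current time."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)
    return int(time.time()) if fallback is None else fallback


def date_macro_flags(epoch):
    """Compiler flags pinning __DATE__ and __TIME__ to the given UTC timestamp.

    __DATE__ uses the "Mmm dd yyyy" layout with a space-padded day, and
    __TIME__ uses "hh:mm:ss". Month names come from a fixed table so the
    result does not depend on the locale.
    """
    t = time.gmtime(epoch)
    date = f"{MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} {t.tm_year}"
    clock = f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
    return [f'-D__DATE__="{date}"', f'-D__TIME__="{clock}"']
```

For example, with SOURCE_DATE_EPOCH=1720569600, date_macro_flags(build_epoch()) yields -D__DATE__="Jul 10 2024" and -D__TIME__="00:00:00". Expect the compiler to warn about redefining a builtin macro.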

By @alganet - 6 months
> (...) rebuilding packages using a bootstrappable build process, both seem orthogonal to the idempotent rebuild problem

You know what would be awesome? If someone could start from, let's say, live-bootstrap[1] and build towards matching the checksums for some distro kernel+toolchain.

It sounds like the same kind of problem, it all comes down to knowing what build conditions affect the resulting binaries, so I think you nailed the problem description on this and yes, it all feels very orthogonal from that perspective!

Thanks for writing this blog entry!

[1]: http://github.com/fosslinux/live-bootstrap

By @vzaliva - 6 months
Coming from maths, I am confused by the use of the term "idempotent" here. Unless we are talking about bootstrapping a compiler, I do not see how it is applicable. Am I missing something?

By @simpaticoder - 6 months
Idempotent, deterministic builds are an argument in favor of synthetic virtual machines. Synthetic VMs like the JVM or CLR should, at least insofar as they don't contain native code, execute in a manner largely isolated from the vagaries of minor hardware/OS differences. I'm not an expert, but native VMs do not and cannot isolate processes from hardware details (e.g. Xen, VirtualBox), or from OS details (e.g. Docker, containerd).

By @nixosbestos - 6 months
Men will do anything except go to therapy ^H^H^H^H^H^H^H^H^H^H learn Nix.
By @spyspy - 6 months
I spent some time trying to prove that two Go binaries were the same in the name of reproducible builds, but couldn’t figure out whether it was possible, even though I had built both myself and knew they were effectively identical. Go binaries have some sort of randomness (timestamp? map entry? no idea) that I couldn’t pin down. Sometimes the hashes of the binaries were the same and sometimes they weren’t. Short of cataloguing and hashing every file that went into the build, I couldn’t figure it out, so I gave up.
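
The cataloguing approach is less work than it sounds. A rough Python sketch, assuming the build inputs live in directory trees you can walk (a source checkout, a module cache); manifest and diff_manifests are hypothetical names. It does not capture the toolchain version or environment variables, both of which can also change the output:

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map each file under `root` (as a POSIX relative path) to its SHA-256."""
    root = Path(root)
    result = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        result[path.relative_to(root).as_posix()] = digest
    return result


def diff_manifests(left, right):
    """Return the sorted paths that are missing on one side or hash differently."""
    paths = set(left) | set(right)
    return sorted(p for p in paths if left.get(p) != right.get(p))
```

For Go specifically, running go version -m on a binary prints the module versions and build settings embedded in it, which is a quicker first thing to compare between two builds.
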
By @outsomnia - 6 months
I was expecting to see the word "Yocto"; did I miss it?

It goes quite far along that path, building all the build tools and the toolchain at the same versions before building the packages.

By @rsc - 6 months
Idempotent is a somewhat confusing word choice here. "Verifiable builds" seems a more accurate description of what they want. (See also https://go.dev/blog/rebuild.)

By @ethegwo - 6 months
As far as I know, the LLVM optimizer (and maybe GCC's too) is not idempotent. Does that mean we have to pay a performance cost for idempotent builds?

By @xiande04 - 6 months
Nix?