August 13th, 2024

The new PostgreSQL 17 make dist

PostgreSQL 17 improves tarball creation by using `git archive`, enhancing reproducibility and traceability while addressing previous complexities and security concerns, though challenges in complete reproducibility remain.

PostgreSQL 17 introduces significant changes to the process of creating source code tarballs, which are essential for software distribution. Previously, the tarball creation involved a complex system that included prebuilt files, which complicated maintenance and raised concerns about reproducibility and security. The new approach utilizes the `git archive` command, allowing for a reproducible and verifiable tarball generation directly from a specific Git commit. This change enhances the integrity of the software supply chain, as users can now trace the tarball back to the exact Git repository, ensuring consistency across builds. The transition to this method addresses historical issues related to build output management and the need for a clean source environment. While the new system is a step forward, challenges remain, particularly regarding reproducible builds in all scenarios and the traceability of code origins in the Git repository. Future improvements may include implementing signed commits to further enhance security and integrity.

- PostgreSQL 17 changes tarball creation to use `git archive` for reproducibility.

- The new method improves software supply chain integrity and traceability.

- Previous tarball creation methods were complex and raised security concerns.

- Challenges remain in achieving complete reproducibility and tracking code origins.

- Future enhancements may include the use of signed commits for better security.
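
As a rough illustration of the verification idea in the summary above, the sketch below regenerates a tar file from a given commit with `git archive` and compares its SHA-256 digest against a published tar file. It is a minimal sketch, not PostgreSQL's actual release tooling: the function names, the prefix, and the file paths are all hypothetical, and it assumes `git` is installed and that the published file has already been decompressed to a plain `.tar`, since compressed output can vary with the compressor used.

```python
import hashlib
import os
import subprocess
import tempfile


def git_archive_command(commit, prefix, output):
    """Build the argv for exporting one commit as an uncompressed tar file."""
    return [
        "git", "archive",
        "--format=tar",
        f"--prefix={prefix}/",  # top-level directory inside the tarball
        "-o", output,
        commit,
    ]


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_matches_commit(repo_dir, commit, prefix, published_tar):
    """Regenerate the tar from `commit` and compare it to `published_tar`.

    Hypothetical helper: assumes `repo_dir` is a Git checkout containing
    `commit` and that `published_tar` is an uncompressed tar file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        rebuilt = os.path.join(tmp, "rebuilt.tar")
        subprocess.run(
            git_archive_command(commit, prefix, rebuilt),
            cwd=repo_dir,
            check=True,
        )
        return sha256_of(rebuilt) == sha256_of(published_tar)
```

A call might look like `tarball_matches_commit("postgres", "<tag-or-commit>", "postgresql-17.0", "postgresql-17.0.tar")`, where every argument is a placeholder. A byte-for-byte match is only expected when the prefix and the archive settings are the same as those used for the published file.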

9 comments
By @hlandau - 3 months
Personally I've always considered it bad hygiene to commit generated outputs, but this article notes that this takes on a new significance in the light of supply chain security concerns. Good changes from PostgreSQL here.

Generated output, vendored source trees, etc. aren't, or can't be, meaningfully audited as part of a code review process, so they're basically merged without real audit or verification.

My personal preference is never to include generated output in a repository or tarball, including e.g. autoconf/automake scripts. This is directly contrary to the advice of the autotools documentation, which wants people to ship these unauditably gargantuan and obtuse generated scripts as part of tarballs... an approach which created an ideal space for things like the XZ backdoor.

By @steeleduncan - 3 months
Nix and Guix have their issues, but it is hard to read something like this and not wonder why you wouldn't migrate to them when facing issues like this.

There is a learning curve for either Nix or Guix that puts many off. However, it's not that steep, certainly it is many orders of magnitude easier than maintaining PostgreSQL, and once you are over that you no longer need to do things like keeping a dedicated clean machine just to pack a tarball. Write the derivation and anyone, anywhere, on any machine can generate the exact same tarball with a one-liner.

The barrier caused by the initial steps of learning Nix/Guix is a shame because once you are over it, it is difficult to see why software is built any other way (the same may apply to Bazel, but I have no experience with that).

By @carderne - 3 months
Does this make the downstream packagers’ jobs harder — they must now presumably have Perl, Bison, Flex and DocBook installed on the packaging machines?
By @cryptonector - 3 months
Historically in autoconf codebases (which PostgreSQL is) `make dist` is done: a) after `./configure` (so you have a Makefile, naturally), which is b) after `autoreconf -fi` (so you have a `./configure`). This allows the dist archive to contain the outputs of `autoreconf -fi` so that users need not have autoconf installed and they can just run `./configure`.

Switching to `git archive` is fine, and you can add files to that, but https://github.com/postgres/postgres/blob/master/GNUmakefile... doesn't. So, I guess users now _have to_ run `autoreconf -fi`? No, because those are now committed in the source tree (https://github.com/postgres/postgres/blob/master/configure).

By @gpvos - 3 months
> Currently, the Git version used to produce the release tarballs (on the above-mentioned “clean” box) is too old to create reproducible .tar.gz tarballs, but it will create reproducible .tar.bz2 tarballs.

What is different about gzip and bzip2 that causes this?

By @klysm - 3 months
The death of autoconf could be one of the biggest wins for the software world
By @alexvitkov - 3 months
Flex and Bison are simple enough tools (probably, you never know with GNU stuff) that you should be able to just vendor & compile them as part of the build process if you actually care about reproducible builds.
By @iamcreasy - 3 months
> Some packagers have policies that everything needs to be built from source, so they’d just delete and rebuild the prebuilt files anyway.

Which packagers are they referring to?

By @mgaunard - 3 months
tl;dr they're moving to using "git archive" to ensure what's in the tarball is what's under git.