July 3rd, 2024

Reproducibility in Disguise

Tools like Bazel support reproducibility in software development by addressing lifecycle challenges. Vendoring dependencies for reproducibility is complex because of transitive dependencies, yet the article's proposed solution is to vendor all dependencies to keep control over versions.

Reproducibility has gained importance in software development, with tools like Bazel being widely adopted for their ability to address lifecycle challenges. While large companies vendor dependencies for reproducibility, this practice is not feasible for most developers due to the complexity of managing transitive dependencies. Bazel has introduced non-vendored dependencies and a versioning solver system to help users integrate with language package managers. However, the use of language package management tools can introduce issues like diamond dependencies, undermining the promised reproducibility. An example within the Python ecosystem illustrates how shared libraries and semantic versioning can lead to conflicts between packages relying on different versions of the same library. The solution proposed is to vendor all dependencies to ensure control over versions and avoid diamond dependency problems. Despite the promise of reproducibility, the reliance on external package management systems can introduce complexities that compromise the intended benefits of tools like Bazel.
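
The diamond dependency problem described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own example: the package names (`app`, `lib_a`, `lib_b`, `shared`) and the major-version numbers are hypothetical, and real resolvers work with full version ranges rather than single major versions.

```python
# Hypothetical package metadata: name -> {dependency: required major version}.
# All names and versions here are made up for illustration.
REQUIREMENTS = {
    "app": {"lib_a": 1, "lib_b": 1},
    "lib_a": {"shared": 1},
    "lib_b": {"shared": 2},
    "shared": {},
}


def find_diamond_conflicts(root, requirements):
    """Walk the dependency graph from root and report every package
    that is required at more than one major version."""
    wanted = {}  # package -> {major version: set of packages requiring it}
    stack = [root]
    seen = set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, major in requirements.get(pkg, {}).items():
            wanted.setdefault(dep, {}).setdefault(major, set()).add(pkg)
            stack.append(dep)
    # Keep only dependencies that are requested at two or more major versions.
    return {
        dep: {major: sorted(users) for major, users in versions.items()}
        for dep, versions in wanted.items()
        if len(versions) > 1
    }


conflicts = find_diamond_conflicts("app", REQUIREMENTS)
print(conflicts)
```

Here `app` never asks for `shared` directly, yet it inherits two incompatible requirements for it through `lib_a` and `lib_b`. Vendoring a single copy of `shared` forces that conflict to be settled once, in the repository, rather than at install time.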

Related

Chasing a Bug in a SAT Solver

Adolfo Ochagavía and Prefix.dev swiftly resolved a bug in the SAT-based dependency solver, resolvo, with community input. The incident emphasizes open-source collaboration and potential debugging tool enhancements for software quality.

Avoiding Emacs Bankruptcy

Avoid "Emacs bankruptcy" by choosing efficient packages, deleting unnecessary configurations, and focusing on Emacs's core benefits. Prioritize power-to-weight ratio to prevent slowdowns and maintenance issues. Regularly reassess for a streamlined setup.

Is Guix full-source bootstrap a lie?

The article discusses Guix's transparent and secure full-source bootstrap process, enabling users to verify over 22,000 nodes like Python PyTorch with 1150 dependencies. It emphasizes verifying each step to prevent backdoors or fraud.

Software Engineering Practices (2022)

Gergely Orosz sparked a Twitter discussion on software engineering practices. Simon Willison elaborated on key practices in a blog post, emphasizing documentation, test data creation, database migrations, templates, code formatting, environment setup automation, and preview environments. Willison highlights the productivity and quality benefits of investing in these practices and recommends tools like Docker, Gitpod, and Codespaces for implementation.

Zig-style generics are not well-suited for most languages

Zig-style generics, inspired by C++, are critiqued for limited universality. Zig's simplicity contrasts with Rust and Go's constraints. Metaprogramming is praised for accessibility, but error messages and compiler support pose challenges, and type inference is limited compared to Swift and Rust.

4 comments
By @nrclark - 7 months
One other hidden Bazel trap that I've seen is for companies to migrate a large codebase to Bazel, but then to rely on OS-provided tools and libraries. Commonly, this gets paired with a glib answer like "it's fine, we build from inside of a Docker container". But I've never seen that Docker image linked into Bazel's dependency resolver, or the compose scripts used to launch the container.

This has the following effects:

    1. There are unexpressed package/tool dependencies.
    2. Across a large organization, Bazel's reproducibility guarantees go out the window.
    3. Developers can't just clone the repo and start using Bazel. Instead, they have to pull down some pinned Docker image, or build it themselves and lose reproducibility.
    4. This effectively poisons the cache whenever the Docker image is updated or rebuilt. If using a shared remote cache, it can be a major issue.

If an organization isn't big enough to vendor every single tool dependency, shared library, etc. (which basically requires building out an OS distribution in Bazel), what's the right way to approach this problem?
By @skybrian - 7 months
I'm guessing this problem might be language-specific. Does Bazel do better at importing external dependencies for Go?
By @spankalee - 7 months
edit: The previous title was "Reproducibility in Disguise: Bazel, Dependencies, and the Versioning Lie"

Eh... the real lie is the single-version policy in the first place. It can be broken, at least at Google, and must be often, otherwise there'd be massive gridlock.

If you want to bring in a new third-party library that has a dependency that already exists in the monorepo, you'd better _pray_ that the version in the monorepo is already compatible. If not, you have to update that library and potentially every target that depends on it, maybe transitively. Trying to do that can sometimes take years - no joke.

So the single-version policy flexes, and exceptions are supposed to be temporary, but they're not always.

In the end there are two competing goals, each sometimes good and sometimes bad:

- Libraries should be somewhat discouraged from making too many or too flippant breaking changes. By having to update clients, they feel the pain and take migration more into account.

- Libraries should be able to make real improvements, even if they are breaking changes, and having clients shouldn't burden them with unbounded costs. They should be able to distribute reasonable costs of updating by letting clients update on their own time with versioning.

Vendored third-party packages only highlight this tension, because the external world has largely settled on versioning as the approach, which clearly conflicts with a single-version policy, but the tension really existed anyway.

By @fragmede - 7 months
The gross hack is you import libfoo1 and libfoo2 when breaking changes like that are involved