Reproducibility in Practice

What is reproducibility in software engineering? It works on my machine

Reproducibility is the confidence that an application will behave similarly in development, test, and production. Reproducibility can make it easier to track down bugs and fix them.

Reproducibility is a confidence interval, not an absolute concept. It forces the engineer to understand the tradeoffs between flexibility and correctness and determine under which conditions reproducibility is desired or necessary.

Here's a few different examples of reproducibility in practice.

Vendoring

Vendoring is the process of saving the state of all software dependencies. Usually this takes the form of downloading all the dependencies and committing them to source control. For instance, node_modules and Go's vendor folder.

Why is vendoring useful?

Resolved compile-time or runtime dependencies may be different in different environments. A package exists locally, or a maintainer has updated the package remotely on the package repository. By committing a certain set of dependencies that have been tested, other developers can be more confident that the project will build and run on their machine.

Reproducible Vendoring

Committing dependencies to the repository is a good step, but how were those dependencies resolved?

When it comes time to update, how do we know what transitive dependencies will also need to be updated?
How do we trust that a developer did not sneak in malicious code when committing a large number of vendored dependencies?

By having a manifest of checksums and versions for each user-specified dependency, along with a program that can "solve" the transitive dependencies based on that file, we can solve both of these problems. The same program can be ran in CI to shift the trust model to the solver binary instead of the code review. The solver binary will also be able to update dependencies transitively.

Examples Go's go.mod, npm's package-lock.json.

Declarative Configuration

Declarative configuration, in contrast to imperative configuration, describes a desired state of software, rather than the explicit commands to create that state.

Like vendoring, the declarative model is more reproducible because the imperative model does not account for the current state of the environment. Has the application already been deployed? Does a folder exist or need to be created? While the imperative model can imperatively check all of these conditions initially at runtime, the state may change over time and produce undesirable results. Once you start watching the state continuously, you've arrived at the declarative model.

Like reproducible vendoring, the declarative model is about shifting the reproducible burden to a application - in this case, a reconciler or a controller that manages the state.

Most importantly, declarative configuration allows for infrastructure as code. It allows you to codify the state of the infrastructure, which means that it can be reproduced easier.

Containers

Container can be used to provide reproducibility in terms of the root filesystem, environment variables, PID, and user.

Containers can provide reproducibility in two aspects

Runtime reproducibility
Build reproducibility

By ensuring the rootfs, environment variables, and user in running deployments, we can reduce the possibility of an ill-provisioned node, bad behaved sibling processes, and unexpected filesystem state.

In contrast to the previous strategies for reproducibility, containers are about creating a specification bundle that behaves the same on any linux kernel. Namespaces make sure that the processes view of the world looks the same regardless of the the state of the world.

You can think of build reproducibility in a similar matter, except for the state of the world when the artifact is built rather than when it is ran. Note well: the Dockerfile doesn't provide great reproducibility, in the sense that it still doesn't solve the issue of vendored dependencies or the availability and reproducibility of networked dependencies, but it is a step in the right direction.

Byte-for-byte reproducible builds on the same "environment"

Build a binary, take the checksum, and build it again on a similar machine. Chances are, you won't get the same checksum as you did before? Why?

Compilers like GCC can capture the build path and use that in compilers, nondeterministic random ids can be injected as well.

Build systems like Bazel, Pants, and Buck are all aiming to be reproducible build systems.

There is an effort to make debian packages reproducible as well.