Docker Is a Compiler

Jun 1, 2021

Docker is a compiler that deals with higher-level abstractions. That makes it an extremely powerful tool.

First, a refresher on compilers. Classical compiler design splits a static compiler into three phases: a front end, an optimizer, and a back end. The front end (scanner, parser, and analyzer) parses the source code, checks it for errors, and builds an AST. The optimizer applies transformations that make the code run faster. The back end (code generator) handles instruction selection, register allocation, and instruction scheduling.
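To make the three phases concrete, here is a minimal sketch of a toy compiler for arithmetic expressions. It borrows Python's own `ast` module as the front end, folds constants as its only optimization, and targets a made-up stack machine. The function names and the `PUSH`/`LOAD`/`ADD`/`MUL` instruction set are invented for illustration and are not taken from any real compiler.

```python
import ast

# Opcodes of a hypothetical stack machine (an assumption for this sketch).
OPCODES = {ast.Add: "ADD", ast.Mult: "MUL"}


def front_end(source: str) -> ast.expr:
    """Scan, parse, and error-check; raises SyntaxError on bad input."""
    return ast.parse(source, mode="eval").body


def optimize(node: ast.expr) -> ast.expr:
    """Constant folding: replace `const op const` with its computed value."""
    if isinstance(node, ast.BinOp):
        left, right = optimize(node.left), optimize(node.right)
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(left.value + right.value)
            if isinstance(node.op, ast.Mult):
                return ast.Constant(left.value * right.value)
        return ast.BinOp(left, node.op, right)
    return node


def back_end(node: ast.expr) -> list[str]:
    """Code generation: emit instructions in postfix order."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        return back_end(node.left) + back_end(node.right) + [OPCODES[type(node.op)]]
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def compile_expr(source: str) -> list[str]:
    return back_end(optimize(front_end(source)))


print(compile_expr("x * (2 + 3)"))  # ['LOAD x', 'PUSH 5', 'MUL']
```

The optimizer never sees text and the back end never sees syntax. Each phase only talks to the one next to it through the AST, which is what lets you swap a front end or a back end independently.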

Docker's front end starts with the Dockerfile. However, this is quickly changing. Docker introduced BuildKit a few years ago, which provides an API for plugging in your own syntax (I wrote the first proof-of-concept alternative to the Dockerfile, the mockerfile, in Jan 2019).

Docker then scans and parses the high-level instructions into an AST, analyzes it, and lowers it into LLB, a low-level build language that is clearly influenced by LLVM, at least in name. Docker can then optimize the resulting build graph with automatic garbage collection, concurrent dependency resolution, and efficient instruction caching. Finally, Docker outputs either a container image, which can be seen as a statically linked binary, or generic artifacts (a binary, a set of files, etc.).
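The pipeline above can be sketched as a toy model. This is not how BuildKit is implemented: the `Op` class and the `parse`, `lower`, and `build` functions are hypothetical names, the "graph" is a simple linear chain, and cache keys are derived from instruction text alone. Real LLB is a graph with multiple inputs per node, and real cache keys also account for things like file contents. The sketch only shows the shape of the idea: parse instructions, lower them into content-addressed operations, and skip any operation whose digest has been seen before.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One node in the toy build chain (hypothetical, not BuildKit's LLB)."""
    instruction: str
    args: str
    parent: str  # digest of the previous op, "" for the first one

    @property
    def digest(self) -> str:
        # The digest covers the parent, so changing a step invalidates
        # every step that comes after it.
        payload = f"{self.parent}|{self.instruction}|{self.args}"
        return hashlib.sha256(payload.encode()).hexdigest()


def parse(dockerfile: str) -> list[tuple[str, str]]:
    """Front end: turn text into (INSTRUCTION, args) pairs.
    Ignores comments and blank lines; no line continuations or stages."""
    steps = []
    for line in dockerfile.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, args = line.partition(" ")
        steps.append((instruction.upper(), args.strip()))
    return steps


def lower(steps: list[tuple[str, str]]) -> list[Op]:
    """Lower the parsed steps into a chain of content-addressed ops."""
    ops, parent = [], ""
    for instruction, args in steps:
        op = Op(instruction, args, parent)
        ops.append(op)
        parent = op.digest
    return ops


def build(ops: list[Op], cache: set[str]) -> list[str]:
    """Back end: 'run' every op that is not cached; return what was run."""
    executed = []
    for op in ops:
        if op.digest not in cache:
            cache.add(op.digest)
            executed.append(f"{op.instruction} {op.args}")
    return executed


cache: set[str] = set()
first = build(lower(parse("FROM alpine\nRUN apk add curl\nCOPY . /app")), cache)
second = build(lower(parse("FROM alpine\nRUN apk add curl\nCOPY . /src")), cache)
print(first)   # ['FROM alpine', 'RUN apk add curl', 'COPY . /app']
print(second)  # ['COPY . /src']
```

On the second build only the changed `COPY` step runs, because the first two ops hash to digests already in the cache. That is the same reason reordering a Dockerfile so that rarely changing steps come first tends to speed up rebuilds.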

But what are the practical implications of this? Once we start thinking about Docker as a compiler, we unlock completely new workflows. We can start optimizing at a higher level of abstraction: files and layers, in addition to variables and functions.