Self-hosted Compilers and Bootstrapped AI

May 11, 2023

The Go compiler was initially written in C, but since Go 1.5 it has been written entirely in Go. The Rust compiler is written in Rust (its first version was written in OCaml). Both compilers can compile themselves. Linux is compiled and developed on Linux. PyPy is a Python interpreter written in RPython, a restricted subset of Python.

Compilers and operating systems don’t start out as self-hosted. There’s a chicken-and-egg problem: you need a system to run the software, but you need software to write the system. You can’t write the first compiler for a language in that language. But you can write the second compiler in the language and use the first compiler to compile it. From then on, each new version of the compiler can be built by the previous one.
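
The staging described above can be sketched as a toy in Python. This is only an illustration under loud assumptions: the "language" is Python itself, the "first compiler" (stage0_compile) is a hypothetical stand-in for one written in some other language, and both compilers simply delegate to Python's built-in compile and exec. The last step mirrors a check that real bootstraps often perform, comparing the output of two successive self-built stages.

```python
# Source of a toy "compiler", written in the language it compiles (here, Python).
# It turns source text into a namespace holding the compiled definitions.
COMPILER_SOURCE = '''
def compile_source(src):
    namespace = {}
    exec(compile(src, "<toy>", "exec"), namespace)
    return namespace
'''


def stage0_compile(src):
    """Hypothetical stand-in for the first compiler, written in another language."""
    namespace = {}
    exec(compile(src, "<stage0>", "exec"), namespace)
    return namespace


# Stage 1: the first compiler builds the self-hosted compiler.
stage1 = stage0_compile(COMPILER_SOURCE)["compile_source"]
# Stage 2: the self-hosted compiler builds itself.
stage2 = stage1(COMPILER_SOURCE)["compile_source"]
# Stage 3: the self-built compiler builds itself again.
stage3 = stage2(COMPILER_SOURCE)["compile_source"]

# Sanity check: stages 2 and 3 should be identical and behave the same.
program = "def answer():\n    return 6 * 7\n"
assert stage2.__code__.co_code == stage3.__code__.co_code
assert stage2(program)["answer"]() == stage3(program)["answer"]() == 42
```

Once stage 2 exists, stage0_compile is no longer needed: every later version of the compiler can be built by its predecessor.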

There are more examples at every part of the stack: Git development happens in Git, webpack bundles its own distributions, and RubyGems and Rake are written in Ruby.

In AI, we’re starting to see the emergence of a related idea.

  • First-generation models are bootstrapped on public data (e.g., LAION), and then second-generation models use previous models (e.g., ChatGPT) to generate fine-tuning or other training data.
  • Agents that dynamically generate code and execute it, which can be seen as a sort of self-modifying code. There’s no reason an agent couldn’t improve or modify its own code, either. The REPL (Read-Eval-Print Loop) could be extended into something like a Read Eval Print Loop Improve Transform (REPLIT).
  • Models that are used to provide explainability for other models.
  • Reinforcement learning, which can generally be seen as bootstrapping a system: running the system and receiving feedback from its runtime environment can be enough to improve it.
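
To make the generate, execute, and improve loop from the list above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: propose_candidates is a hypothetical stand-in for a code-generating model (it just yields hard-coded snippets), and the "runtime feedback" is the fraction of a few test cases that the generated solve function passes. It is not how any particular agent framework works.

```python
from typing import Iterator, Optional, Tuple


def propose_candidates() -> Iterator[str]:
    """Hypothetical stand-in for a model that generates candidate code."""
    yield "def solve(xs):\n    return xs\n"                # wrong: does not sort
    yield "def solve(xs):\n    return sorted(xs)[::-1]\n"  # wrong: descending
    yield "def solve(xs):\n    return sorted(xs)\n"        # correct


def evaluate(source: str) -> float:
    """Execute generated code and return the fraction of test cases it passes."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([2, 2, 1], [1, 2, 2])]
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsafe for untrusted code; sandbox in practice
        solve = namespace["solve"]
        passed = sum(solve(list(inp)) == out for inp, out in cases)
    except Exception:
        return 0.0  # code that crashes earns no reward
    return passed / len(cases)


def improve_loop() -> Tuple[Optional[str], float]:
    """Keep the best-scoring candidate; stop once every test passes."""
    best_source, best_score = None, -1.0
    for source in propose_candidates():
        score = evaluate(source)  # feedback from actually running the code
        if score > best_score:
            best_source, best_score = source, score
        if best_score == 1.0:
            break
    return best_source, best_score


best_source, best_score = improve_loop()
print(best_score)
print(best_source)
```

In a real agent the proposer would see the previous score and source before generating the next candidate, which is where the "improve" step comes from. The fixed list here only keeps the sketch self-contained.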