U-Shaped Utility of Monorepos

Dec 11, 2021

When you're organizing your code, you essentially have two choices: track changes in many smaller repositories, or track changes in a single large repository. This is the age-old debate of monorepos vs. multi-repos (or polyrepos).

Monorepos have U-shaped utility. They are great for extremely small or extremely large organizations, and terrible for everything in between. Why's that?

Scaling with a monorepo. Google, Microsoft, Facebook, Uber, Airbnb, and Twitter all use monorepos. Why?

  • Shared tooling – projects can share build and CI/CD pipelines and other tooling. Creating a new project is as simple as creating a folder.
  • Dependency management – when a shared library is updated, tests can be run against all of its consumers, and every consumer must be updated along with it. This won't guarantee you stay out of the Nine Circles of Dependency Hell, but it goes a long way.
  • Code transparency – anyone can read, search, and reference code across the entire organization.
  • Atomic commits – two dependent projects can be changed together in a single commit.
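To make the dependency-management point concrete: finding every consumer that must be retested after a shared library changes is a reverse-dependency walk over the project graph. The sketch below assumes a hypothetical in-memory graph that maps each project to the projects it depends on. Real build systems derive this graph from their build files, and the project names here are purely illustrative.

```python
from collections import deque


def affected_consumers(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return every project that depends, directly or transitively, on `changed`.

    `deps` maps each project name to the set of projects it depends on.
    The graph and project names are hypothetical examples.
    """
    # Invert the graph: for each project, record which projects depend on it.
    consumers: dict[str, set[str]] = {}
    for project, requirements in deps.items():
        for req in requirements:
            consumers.setdefault(req, set()).add(project)

    # Breadth-first walk over the reverse edges, starting at the changed project.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, set()):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    # Hypothetical monorepo layout: two services share a logging library.
    graph = {
        "libs/logging": set(),
        "libs/auth": {"libs/logging"},
        "services/api": {"libs/auth"},
        "services/billing": {"libs/logging"},
        "tools/docs": set(),
    }
    print(sorted(affected_consumers(graph, "libs/logging")))
```

Running this prints `['libs/auth', 'services/api', 'services/billing']`: a change to the logging library means retesting both services (one of them only transitively, through `libs/auth`), while `tools/docs` is left alone.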

Starting with a monorepo. This is a less popular strategy, but I believe that starting with a monorepo is often the right choice.

  • Context-switching between projects is simpler.
  • Service boundaries might be poorly defined or constantly changing early on. Splitting code into separate repositories prematurely can be an enormous hit to developer productivity.
  • Guards against toolchain sprawl to some degree. If everything is in one repository, it's much more natural to want to reuse existing code and keep the number of concepts small.

The trough of despair. If you're a medium-sized organization, monorepos can be tough. You'll need to build bespoke tooling to handle most of the following.

  • Authorization. How do you control who has merge rights to different parts of the code? Before GitHub introduced CODEOWNERS files, teams had to build this tooling themselves (e.g., Kubernetes built its own OWNERS-file tooling).
  • Build tooling. Multi-language build scripts can quickly become slow. Migrating to a full build system like Bazel or Pants can be an even slower undertaking. Either way, every change raises the same questions: what needs to be recompiled, and which tests need to be rerun?
  • Merge queues. Merge requests can pile up and become difficult to merge, requiring a special ordering or constant maintenance to keep them up to date. Merge queues batch these changes, test them together, and merge them at the same time. There aren't many off-the-shelf tools that do this.
  • Secret management, environment management, and a host of other issues that require a dedicated developer experience and platform team.
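One common way a merge queue can work is batching with bisection: test the queued changes together on top of the main branch, merge the whole batch if it passes, and otherwise split it in half to isolate the breaking change. This is a minimal sketch of that one strategy, not how any particular tool does it. `run_ci` is a hypothetical callback standing in for a real CI run on the main branch with the given changes applied, and the PR names are illustrative.

```python
from collections.abc import Callable, Sequence


def process_batch(
    base: Sequence[str],
    batch: Sequence[str],
    run_ci: Callable[[Sequence[str]], bool],
) -> tuple[list[str], list[str]]:
    """Split `batch` into (merged, rejected) changes.

    `base` holds changes already accepted in this round. `run_ci` is a
    hypothetical callback that returns True if main plus the given changes
    builds and passes its tests.
    """
    if not batch:
        return [], []
    if run_ci([*base, *batch]):
        # The whole batch passes together, so it can be merged at once.
        return list(batch), []
    if len(batch) == 1:
        # A single change that still fails is rejected.
        return [], list(batch)
    # Bisect: test the left half first, then test the right half on top of
    # whatever the left half managed to merge.
    mid = len(batch) // 2
    merged_left, rejected_left = process_batch(base, batch[:mid], run_ci)
    merged_right, rejected_right = process_batch(
        [*base, *merged_left], batch[mid:], run_ci
    )
    return merged_left + merged_right, rejected_left + rejected_right


if __name__ == "__main__":
    # Hypothetical queue; pretend PR-3 breaks the build.
    pending = ["PR-1", "PR-2", "PR-3", "PR-4"]

    def fake_ci(changes: Sequence[str]) -> bool:
        return "PR-3" not in changes

    merged, rejected = process_batch([], pending, fake_ci)
    print("merged:", merged)
    print("rejected:", rejected)
```

Here PR-1, PR-2, and PR-4 are merged and PR-3 is rejected after five CI runs. Testing the right half on top of the merged left half keeps each accepted change verified against the state it will actually land on. The trade-off is that a single bad change costs extra CI runs for everyone batched with it.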

If I were starting a project today, I'd certainly start with a monorepo. As services evolve and the application becomes more complex, I'd split it into the minimum number of repositories, keeping highly interdependent services in the same repository.