How Git Merges

Nov 22, 2021

One of the most critical features of modern-day version control relies on a poorly understood algorithm – diff3. What is it, how does it work, and why it could be improved?

There are two critical operations that need to know the differences between files:

  • Reviewing new changes
  • Merging multiple changes

First, version control systems like git don't actually store the patched differences between files. Instead, git stores completely new files for every revision and computes the differences at runtime. That's why you can switch between really old revisions and new revisions quickly – otherwise, every patch would need to be sequentially applied.

So how does git know the differences between files? Comparing two files git uses a set of algorithms that mostly follow Myers' An O(ND) Difference Algorithm and Its Variations. It's closely related to a common interview question, the longest common subsequence problem.

But one of the most challenging aspects of version control is the 3-way merge. Let's say you, and a coworker independently make different revisions to a common document and then try to reconcile the differences. It's possible that some changes can be merged together automatically, but there might exist conflicts.

One merge tool that git uses is called diff3 or sometimes kdiff3, which, as you can guess, produces a merged output of 3 different files that share a common ancestor. diff3 was originally written in 1976 in Version 7 of Unix. The algorithm is a set of heuristics that tries to capture most edge cases of merging text together, but it's not straightforward or easy to understand. Doug McIlroy (or an engineer named Paul Jensen) wrote the first version of diff3 for Unix. You can read McIlroy's paper, An Algorithm for Differential File Comparison.

You can enable printing merge conflicts in diff3 format with the command. This has the benefit of showing you the diff output that includes the common ancestor, which usually helps when resolving conflicts.

$ git config --global merge.conflictstyle diff3

You can browse the source code of diff3 here.