Engineers often find themselves resolving merge conflicts manually.
Why isn't diff3
good enough to do conflict resolution automatically? Without going into the theory, I'll go over some of the issues with diff3
. If you're interesting in diving deeper, you can read A Formal Investigation of Diff3.
Problems with diff3
- Not idempotent. If you run the algorithm over and over again, you can continue to propagate changes. Intuition might say that an algorithm that merges changes together should converge.
- Not semantic. It takes no knowledge of structure. That means it doesn't understand programming languages or abstract syntax trees. What if the merge strategy could understand if the output was valid code?
- Can't work for CRDTs, operational transforms, or other structured data. This follows from the lack of semantics, but it means that tools like Notion and Google Docs can't use
diff3
to merge changes. - Fails for changes that are very similar to each other, but textually far from the parent. We would expect two different branches that are very similar to each other to merge easily. Unfortunately, if they are far from the parent branch, it's difficult for
diff3
to work. - Not stable. Formally, stability means that there exists a constant such that for small enough changes there is a guaranteed small merge.
What's been tried
Semantic merge strategies for specific languages. Here's a tool called SemanticMerge that works for #C, Java, and C.
Patch based algebras like darcs, which was an alternative version control system to git. You can read about the theory behind how darcs stored patches and resolved conflicts here.
Machine Learning for Merge Conflicts
What if we could train an algorithm on resolving common merge conflicts? We have millions of public merge conflict resolutions on GitHub as a data set. With a little magic, we could probably recreate the original diff'd conflict as well.
It seems like this is the best way to capture semantic differences across different languages – resolution that you would normally only get by parsing a language-specific AST or understanding syntax. Tricky patterns from dependency management conflicts could be easily learned and fixed.