TensorFlow vs. PyTorch

Sep 12, 2022

If you look at some of the most popular machine learning models of the last few years (YOLOv5, Stable Diffusion), you'll find they were written in PyTorch, not TensorFlow.

I remember when TensorFlow was released in 2015. Kubernetes was released around the same time (part of Google's reasoning for open-sourcing both was to avoid repeating the mistakes it made with Hadoop/MapReduce – see Diseconomies of Scale at Google). It was a time when many of the influential deep learning models (Inception, ResNet, and other CNNs and DNNs) were built with TensorFlow, and the industry rallied around the framework. Facebook released PyTorch a year later.

Since then, PyTorch seems to be growing faster than TensorFlow.  

Why did PyTorch seem to win?

  • A more collaborative project – TensorFlow accepts the occasional outside contribution, but development is led internally by Google, and external contributors were often blocked by failing internal tests they couldn't debug.
  • An imperative rather than declarative API. Declarative APIs can be easier to optimize (the full computation graph is known up front) and arguably purer, but imperative APIs are simpler to write and debug – see the sketch after this list.
  • There's so much more to shipping a model than model design. Arguably, the "hard" part is all the other things: training at scale, debugging, and the deployment pipeline.
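To make the imperative/declarative distinction concrete, here's a minimal sketch of the same matrix multiply in both styles. The shapes and values are arbitrary, chosen purely for illustration; the TensorFlow half uses the 1.x graph-and-session style that was the norm when these frameworks were competing for mindshare.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Declarative (TensorFlow 1.x): first describe the graph...
x = tf.placeholder(tf.float32, shape=(None, 4))
w = tf.Variable(tf.random.normal((4, 1)))
y = tf.matmul(x, w)

# ...then execute it inside a session. `y` above is a symbolic node,
# so you can't print its value or step through it with a debugger.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```

```python
import torch

# Imperative (PyTorch): each line runs immediately, so ordinary
# Python tools (print, pdb, exceptions) work on intermediate values.
x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
w = torch.randn(4, 1)
y = x @ w
print(y)
```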

Why might TensorFlow still win?

  • Facebook does not design its own chips. Google has TPUs, which can be optimized for TensorFlow (and vice versa). Facebook has joined companies like Microsoft and AMD behind ONNX (Open Neural Network Exchange), an open model format meant to give every framework and chip a common target – a sketch of an ONNX export follows this list.
  • TFLite is still leaps and bounds ahead for mobile deployment of models (see the TFLite sketch below). Google's organizational knowledge from building and operating Android seems to help.
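As a rough sketch of what the ONNX route looks like in practice, here's a PyTorch model exported to the ONNX format and then run under ONNX Runtime. The toy model, file name, and input/output names are placeholders, not anything from a real deployment:

```python
import torch
import torch.nn as nn

# A toy model standing in for whatever you'd actually deploy.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Export works by tracing the model with a dummy input of the right shape.
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Any ONNX-compatible runtime (and, in principle, any chip with an
# ONNX backend) can now execute the model without PyTorch installed.
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
```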
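For comparison, the TFLite conversion side is similarly short. This is a minimal sketch assuming a toy Keras model; the real work of on-device inference happens in the TFLite interpreter on the phone, which isn't shown here:

```python
import tensorflow as tf

# A toy Keras model standing in for a real one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Convert to a flat buffer suitable for the on-device TFLite interpreter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```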