The AI Partnership Race

Mar 30, 2023

A look at how Internet-age companies are partnering with AI startups for (1) research, (2) distribution, and (3) hardware.

Why? Because large companies can’t innovate the way startups can.

  • Slow product cycles from layers of bureaucracy
  • Extreme reputation risk (when things inevitably go wrong)
  • Disruption to existing business models (e.g., Search)
  • Limited distribution to a small but important new audience (e.g., LLM builders)
  • Weak incentives to make risky bets (startup equity rewards risk in a way salaries don’t)

Research

While all of these companies have research organizations (Google researchers wrote the Transformer paper), there’s inherent risk in productionizing these models. They can output incorrect, biased, or otherwise damaging information. They might be trained on the wrong data or raise privacy concerns (a death sentence for companies already under intense public scrutiny). Partnering lets a large company back frontier research while keeping that risk at arm’s length. The startups gain some proprietary distribution for their models but, most importantly, access to the hardware and cloud resources they need to train foundational models.

Distribution

Distribution is all you need (right?). These companies defined distribution, so why would they need partners? While the field is still nascent, it’s essential to get the technology into the hands of the right people: (1) developers, (2) startups building the next big thing, and (3) data scientists and machine learning practitioners. Startups that are hyper-focused on building a dense community of these personas can beat the hyper-scaled companies that have to do everything. The larger partners can then access the foundational models and state-of-the-art research (e.g., GitHub Copilot built on OpenAI Codex) to build differentiated products.

Hardware

Right now, access to GPUs is scarce. Startups fight for quota on the cloud providers (and some are building their own data centers). At the lowest level, many of these models are tightly coupled to the underlying hardware, and I believe we’ll only see more application-specific chips (read: transformer-optimized architectures). The cloud hyper-scalers have been building custom silicon for the last few years, and Google and Apple have done the same for mobile. Even Meta has built out this capability through its VR headset partnerships. For what it’s worth, Meta has also worked to abstract the underlying GPU (read: commoditize it) away in PyTorch.
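
As a rough sketch of what that abstraction looks like from a developer’s seat, the same few lines of standard PyTorch run unchanged whether a GPU is present or not:

    # PyTorch abstracts the accelerator: the model code doesn't care which chip it runs on.
    import torch

    # Pick the best available backend without changing any model code.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(128, 10).to(device)
    x = torch.randn(4, 128, device=device)
    print(model(x).shape)  # torch.Size([4, 10]) on GPU or CPU alike

The more of the stack that lives behind that device string, the less any single GPU vendor matters.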

I left NVIDIA out of this, since they don’t seem to be restricting themselves to a single cloud provider and are even moving up the stack via open source.