Commoditization of Large Language Models: Part 3

Feb 25, 2023

Meta just open-sourced the weights of LLaMA, a family of foundational large language models ranging up to 65 billion parameters.

I wrote part one of "Commoditization of Large Language Models" (July 2022) when EleutherAI challenged GPT-3 by open-sourcing GPT-J. I noted that GPT-3 was likely trained mostly on public datasets. Meta's LLaMA goes further: it is trained exclusively on publicly available datasets, without resorting to any proprietary data (read the paper). In that post, I estimated the cost of training GPT-3 at about $12 million. Meta's paper says the 65B-parameter model took 1,022,362 GPU hours (A100-80GB). On-demand prices for these GPUs run about $4/hour at Oracle, $1.50/hour at Lambda Labs, $2.68/hour at Vultr, and $2.21/hour at CoreWeave, to name a few. So, with a sizable discount for committed spend, you could probably do it for $2 million or $3 million: roughly a 4x decrease in training cost in less than a year.
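For the curious, the back-of-the-envelope math looks like this (the committed-spend discount is an illustrative assumption, not a quoted rate):

```python
# Back-of-the-envelope LLaMA-65B training cost: GPU hours from Meta's paper
# times the on-demand A100-80GB prices quoted above.
GPU_HOURS = 1_022_362  # A100-80GB hours for the 65B model, per the paper

ON_DEMAND_RATES = {  # $/GPU-hour, as quoted above
    "Oracle": 4.00,
    "Lambda Labs": 1.50,
    "Vultr": 2.68,
    "CoreWeave": 2.21,
}

COMMITTED_DISCOUNT = 0.30  # assumed discount for committed spend (illustrative)

for provider, rate in ON_DEMAND_RATES.items():
    on_demand = GPU_HOURS * rate
    committed = on_demand * (1 - COMMITTED_DISCOUNT)
    print(f"{provider:12} on-demand ${on_demand / 1e6:.1f}M, "
          f"committed ~${committed / 1e6:.1f}M")
```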

I wrote part two of "Commoditization of Large Language Models" (August 2022) when Stability AI open-sourced Stable Diffusion's model and weights. Since then, the company has raised $100 million, and the model is used ubiquitously (although most people still use v1.5 instead of v2.1). As new techniques are developed, they are almost instantly implemented in Automatic1111's web UI or in a startup's product offering.

So what's next?

I predict that the foundational model layer will continue to be commoditized. There's significant legal and reputational risk in open-sourcing some models (Meta states that LLaMA cannot be used commercially). Yet some companies will trade that risk for distribution.

It might be one of the well-funded startups building foundational models (Anthropic, OpenAI, Cohere, AI21Labs, StabilityAI).

It might be one of the hardware or cloud providers (NVIDIA, HuggingFace, AWS).

It might be a company that can weaken a competitor's (Google) moat (Apple, Meta, Microsoft).

The fight for foundational model distribution will be tough: these models have little to no lock-in, and customers can easily switch between them. Switching the OpenAI API library to GooseAI's GPT-NeoX endpoints, for example, takes a one-line change, as sketched below. Prompts might not translate precisely from one model to the next, but they are relatively interchangeable.
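Here's roughly what that switch looks like with the openai Python library (pre-1.0, as of this writing); the endpoint is from GooseAI's docs, and the engine name is illustrative:

```python
import openai

openai.api_key = "YOUR_GOOSEAI_API_KEY"      # a GooseAI key instead of an OpenAI key
openai.api_base = "https://api.goose.ai/v1"  # the one-line change

# Everything else is the same OpenAI-style call; only the engine name
# changes to one GooseAI serves (check their docs for current names).
completion = openai.Completion.create(
    engine="gpt-neo-20b",
    prompt="Once upon a time",
    max_tokens=50,
)
print(completion.choices[0].text)
```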

Even in vector stores, customers often keep embeddings from a different (smaller) model than the one that ultimately answers their LLM queries. For example, they might use a HuggingFace embedding model to retrieve documents via cosine-similarity search, then make a natural-language query against OpenAI with those documents as context.
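A minimal sketch of that pattern, assuming a small open-source sentence-transformers model for retrieval and OpenAI for the final answer (model names and documents are illustrative):

```python
import numpy as np
import openai  # reads OPENAI_API_KEY from the environment
from sentence_transformers import SentenceTransformer

# Cheap open-source embeddings for storage and retrieval.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "LLaMA was trained exclusively on publicly available datasets.",
    "Stable Diffusion v1.5 remains more widely used than v2.1.",
    "GooseAI exposes an OpenAI-compatible API for GPT-NeoX.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "What data was LLaMA trained on?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity; the vectors are normalized, so a dot product suffices.
best_doc = docs[int(np.argmax(doc_vecs @ query_vec))]

# Hand the retrieved document to a different provider for the actual answer.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:",
    max_tokens=100,
)
print(response.choices[0].text.strip())
```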

Where does the value end up? Beyond my guesses in "Generative AI Value Chain," we might have to wait for part four to find out.