The infrastructure layer in AI might have the shortest half-life. Why?
- Hardware is moving faster than ever. Supply issues. Competition among the biggest players. The profitable NVIDIA monopoly. Hardware rarely moves this fast. Developments that used to take years now take months. The immediate interface to hardware is changing quickly as well – optimizations in WebGPU, CUDA, Metal, Triton, PyTorch, Mojo, TPUs, and more. Half-life is a function of the layer below and the layer above. Optimizations to $model.cpp are quickly obsoleted by new models, new techniques, and new hardware.
- The axis of competition is optimization at the infrastructure layer. Being faster is hard to turn into a long-term competitive advantage. Someone else can always undercut you by changing the requirements. Are you the fastest scale-to-zero infrastructure? Another startup will come along and offer scale-to-zero Llama models that are much faster (e.g., they might preload all the weights across their fleet). Or maybe someone will offer an edge runtime with tiny models that have lower latency than serving potentially big models on generic hardware. Optimization is good, but optimization is fragile.
- Research is moving faster than ever. New context-length tricks (sliding windows, special tokens, and other techniques) change the way we want to train our models and run inference on them.
- Long feedback cycles. Training a model takes time. Startups are trying to short-circuit the process with money. Some will use the advantage to front-run the competition and anticipate the cheap, but many will fail.
- Ambiguity at other layers. What will generative AI be used for in the application stack? What will the model architectures look like? When everything else is uncertain, the safest bet is to build tools. There is the Myth of the AI Infrastructure Phase. But what if the tool builders outpace the tool users?
- Infrastructure is commoditizing fast. Many companies have an incentive to open-source their infrastructure components. Hardware companies that want you to use their hardware (NVIDIA). Data companies that want you to use their models (Meta). Startups fighting for bottom-up distribution. Product companies trying to gain goodwill and recruiting mindshare.