LLM Ops, Part 1

Feb 12, 2023

You're integrating an LLM API into your application. You have a great idea for how to augment your product with an API from OpenAI, Anthropic, or another foundation model provider. And unlike the last generation of machine learning models, you don't need to label your data or build sophisticated pipelines to train a bespoke model. Just call an API.

The completion API is simple – send a natural language prompt describing what you want, and the API returns a natural language result.
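A minimal sketch of that call, assuming the pre-1.0 openai Python package; the model name, prompt, and parameters are illustrative:

```python
import openai

openai.api_key = "sk-..."  # your API key

ticket_text = "Customer reports the mobile app crashes on login since the last update."

# One natural-language prompt in, one natural-language completion out.
response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model choice
    prompt="Summarize this support ticket in one sentence:\n\n" + ticket_text,
    max_tokens=100,
    temperature=0.2,
)
print(response.choices[0].text.strip())
```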

It looks simple at first glance, but using it in production takes real ops.

LLM Ops is everything else you need to do to get good results – today, it's prompt engineering (not for long if I'm right); tomorrow, it's monitoring, QA, workflows, and extensions. In the future, LLMs will need to be extended in two fundamental ways:

  • Storage: Proprietary/real-time data – The in-context learning window (i.e., "the prompt") is only so large. It will grow over time, but it may never fit a full code repository, a folder of documents, every page of a website, or a database. Today, the state of the art is to pre-process that data, store it in a vector database, and pull only the most relevant documents into the prompt (a minimal retrieval sketch follows this list).
  • Compute: Actions/tools – LLMs are bad at math. They can't calculate an md5 sum. They can try to emulate running code, but the result is often slow and incorrect. They are, however, surprisingly good at using tools, functions, and APIs when given the chance. A hybrid approach, where the LLM interprets the request but delegates the work to tools, is likely to be extremely important (a toy delegation sketch also follows this list).
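
Here is a minimal sketch of the retrieval step, using OpenAI embeddings and an in-memory list standing in for a real vector database; the documents, models, and cosine-similarity scoring are illustrative assumptions:

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    # text-embedding-ada-002 was the standard embedding model at the time
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Pre-process: embed each document once and keep the vectors
# (a real system would store these in a vector database).
documents = ["refund policy ...", "shipping times ...", "warranty terms ..."]
doc_vectors = [embed(d) for d in documents]

def top_k(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query and keep the best k.
    q = embed(query)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in doc_vectors]
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

# Only the most relevant documents are added to the prompt.
question = "How long do refunds take?"
context = "\n\n".join(top_k(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=150)
print(completion.choices[0].text.strip())
```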

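And a toy sketch of the delegation idea: the model is asked only to decide which tool to call and with what arguments, while ordinary code does the actual computation. The single md5 tool, the JSON reply convention, and the prompt are all assumptions for illustration:

```python
import hashlib
import json
import openai

# The tools the model may delegate to; plain Python does the exact computation.
TOOLS = {
    "md5": lambda text: hashlib.md5(text.encode()).hexdigest(),
}

prompt = (
    "You can call one tool: md5(text). "
    'Reply with JSON like {"tool": "md5", "text": "..."} and nothing else.\n\n'
    "Task: compute the md5 sum of the string 'hello world'."
)

# The model plans the tool call instead of trying to compute the hash itself.
plan = openai.Completion.create(model="text-davinci-003", prompt=prompt,
                                max_tokens=60, temperature=0)
call = json.loads(plan.choices[0].text.strip())

# The computation is delegated to code, which is fast and exact.
result = TOOLS[call["tool"]](call["text"])
print(result)  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```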
This is a simplification of the whole process – as the workflow matures, there will be many more things to build. It also raises open questions: will LLM Ops be served by a few companies? Will it become an embedded role, just like DevOps or MLOps? Will the foundation model APIs themselves be extended to provide these workflows?