Choosing the Right Model

Mar 15, 2023

Even though Stable Diffusion 2.0 has been available since November 2022, most developers are still using version 1.5. The newer version delivers more realistic images and beats 1.5 on many benchmarks. So why haven't developers switched?

Bigger doesn't always mean better. Beyond raw reasoning ability, there are other considerations that users and customers weigh when choosing a model.

Network effects. Developers were dismayed to discover that their prompt engineering didn't translate to the newer version. Downstream projects had already been built around the 1.5 architecture (in only a few months). There are real network effects when users build around and on top of your models.

Fine-tuning / RLHF. With the ChatGPT API priced at 1/10th of GPT-3.5, why haven't developers instantly switched every application over? ChatGPT has a reinforcement learning from human feedback (RLHF) layer that tailors it to chat applications. When the reward model is vastly different from the application's objective, even a strong model can be unusable.
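A minimal sketch of the difference, assuming the openai Python client as of early 2023 (the prompt and token limits are illustrative): the same task goes through a plain completion endpoint for GPT-3.5 and a chat-shaped endpoint for ChatGPT, whose RLHF tuning shapes the tone and format of the output.

```python
import openai

prompt = "Summarize the following release notes in one sentence: ..."

# GPT-3.5 (text-davinci-003): a plain text completion, no chat-style RLHF layer.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=64,
)
print(completion["choices"][0]["text"])

# ChatGPT API (gpt-3.5-turbo): the same task, but framed as chat messages,
# and the RLHF tuning shapes the tone and format of the response.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=64,
)
print(chat["choices"][0]["message"]["content"])
```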

Cost. Not all tasks require the largest model. Smaller models are capable of answering simple questions, and several calls to a smaller model can still be cheaper than a single call to a large model that runs on expensive hardware.
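As a back-of-the-envelope sketch (the per-token prices and token counts below are hypothetical, not actual list prices):

```python
# Hypothetical per-1K-token prices, for illustration only.
SMALL_MODEL_PRICE = 0.002  # $ per 1K tokens
LARGE_MODEL_PRICE = 0.02   # $ per 1K tokens

def request_cost(price_per_1k: float, tokens: int) -> float:
    """Dollar cost of a single request at a given per-1K-token price."""
    return price_per_1k * tokens / 1000

# Three passes through the small model (e.g., draft, critique, revise)
# can still cost less than one pass through the large model.
small_total = 3 * request_cost(SMALL_MODEL_PRICE, 1500)
large_total = request_cost(LARGE_MODEL_PRICE, 1500)
print(f"small x3: ${small_total:.4f}  large x1: ${large_total:.4f}")
```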

Latency. Inference on larger models is slower. For latency-sensitive use cases, inference latency measured in hundreds of milliseconds is unacceptable.
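A rough sketch of how you might measure that budget, assuming a hypothetical generate() wrapper around whichever model is under evaluation:

```python
import statistics
import time

def generate(prompt: str) -> str:
    """Placeholder for a call to the model being evaluated."""
    ...

def latency_profile(prompt: str, runs: int = 50) -> tuple[float, float]:
    """Return (median, p95) request latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return statistics.median(samples), samples[int(0.95 * (len(samples) - 1))]
```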

Size. Does the model need to run on-device? Stable Diffusion has worked on macOS nearly since its initial release, and LLMs are just starting to become accessible on commodity hardware (see LLaMA). Models that run in more places have more network effects (we've seen this with programming languages).
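For instance, a minimal sketch of running Stable Diffusion 1.5 on Apple Silicon via Hugging Face diffusers and PyTorch's MPS backend (the model ID and prompt are just examples):

```python
from diffusers import StableDiffusionPipeline

# Load the 1.5 weights and move the pipeline to Apple's Metal (MPS) backend.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # lower memory use on machines with limited RAM

# Generate an image entirely on-device; no GPU server or API call required.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```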

Training Data. Stable Diffusion 2.0 did not include celebrities or NSFW content in its training set. While this is universally a good thing, you can extend the idea to other training data sets: certain models will be more useful if they've been trained (or fine-tuned) on data relevant to the task.
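For example, a sketch of preparing task-specific fine-tuning data in the prompt/completion JSONL format OpenAI's fine-tuning API used in early 2023 (the examples and filename are made up):

```python
import json

# Toy domain-specific examples; the point is that the data matches
# the task you actually care about.
examples = [
    {"prompt": "Customer: My invoice total looks wrong.\nAgent:",
     "completion": " I can help with that. Could you share the invoice number?"},
    {"prompt": "Customer: How do I update my billing address?\nAgent:",
     "completion": " You can change it under Settings > Billing."},
]

with open("billing_support.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Then, from the shell:
#   openai api fine_tunes.create -t billing_support.jsonl -m davinci
```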