A List of 1 Billion+ Parameter LLMs

Apr 10, 2023

There are already over 50 different 1B+ parameter LLMs accessible via open-source checkpoints or proprietary APIs. That’s not counting private models, or models described only in academic papers with no public API or weights. There are even more if you count fine-tuned models like Alpaca or InstructGPT. Here’s a list of the ones I know about (this is an evolving document).
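
Most of the open-source entries below publish checkpoints on the Hugging Face Hub, so trying one locally takes only a few lines of transformers code. Here’s a minimal sketch, assuming transformers and torch are installed and using GPT-Neo 1.3B as the example; the model IDs in the comments are the Hub names as I understand them, and any other open checkpoint from the list can be swapped in:

```python
# Minimal sketch: load an open 1B+ checkpoint and generate a short completion.
# Assumes `pip install transformers torch`; model IDs are Hub names as assumed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neo-1.3B"  # e.g. swap in "facebook/opt-1.3b" or another open checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```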

GPT-J (6B) (EleutherAI)

GPT-Neo (1.3B, 2.7B) (EleutherAI)

GPT-NeoX (20B) (EleutherAI)

Pythia (1B, 1.4B, 2.8B, 6.9B, 12B) (EleutherAI)

Polyglot (1.3B, 3.8B, 5.8B) (EleutherAI)

J1 (7.5B, 17B, 178B) (AI21)

LLaMA (7B, 13B, 33B, 65B) (Meta)

OPT (1.3B, 2.7B, 13B, 30B, 66B, 175B) (Meta)

Fairseq (1.3B, 2.7B, 6.7B, 13B) (Meta)

Cerebras-GPT (1.3B, 2.7B, 6.7B, 13B) (Cerebras)

GLM-130B (Tsinghua)

YaLM (100B) (Yandex)

UL2 (20B) (Google)

PanGu-α (200B) (Huawei)

Cohere (Medium, XLarge)

Claude (instant-v1.0, v1.2) (Anthropic)

CodeGen (2B, 6B, 16B) (Salesforce)

NeMo (1.3B, 5B, 20B) (NVIDIA)

RWKV (14B)

BLOOM (1B, 3B, 7B, 176B) (BigScience)

GPT-4 (OpenAI)

GPT-3.5 (OpenAI)

GPT-3 (ada, babbage, curie, davinci) (OpenAI)

Codex (cushman, davinci) (OpenAI)

T5 (11B) (Google)

CPM-Bee (10B) (OpenBMB)

Fine-tuned models

Alpaca (7B) (Stanford)

Convo (6B)

J1-Grande-Instruct (17B) (AI21)

InstructGPT (175B) (OpenAI)

BLOOMZ (176B) (BigScience)

Flan-UL2 (20B) (Google)

Flan-T5 (11B) (Google)

T0 (11B) (BigScience)

Galactica (120B) (Meta)