ChatML and the ChatGPT API

Mar 2, 2023

OpenAI released a ChatGPT API today that's 1/10th the price of the leading model, text-davinci-003.

More interesting, though, is the release of ChatML, a markup language for building chat experiences on top of LLMs. You can read the initial documentation here. It isn't exposed in the API today, but Greg Brockman hints that it will be surfaced in the future.

The actual syntax isn't as important as the principles (it's still early in development). But, for the curious, the syntax is below:

[
 {"token": "<|im_start|>"},
 "system\nYou are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-01",
 {"token": "<|im_end|>"}, "\n", {"token": "<|im_start|>"},
 "user\nHow are you",
 {"token": "<|im_end|>"}, "\n", {"token": "<|im_start|>"},
 "assistant\nI am doing well!",
 {"token": "<|im_end|>"}, "\n", {"token": "<|im_start|>"},
 "user\nHow are you now?",
 {"token": "<|im_end|>"}, "\n"
]
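
For the curious, here is how that token list could be produced from ordinary role/content pairs. This is a minimal sketch, not OpenAI's implementation: the helper name `to_chatml_tokens` and the message dictionary shape are assumptions made for illustration. Only the `<|im_start|>` and `<|im_end|>` tokens and the role-then-newline layout come from the syntax above.

```python
# Special tokens as they appear in the syntax above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml_tokens(messages):
    """Convert [{"role": ..., "content": ...}, ...] into the mixed
    token/string list form shown above (hypothetical helper)."""
    out = []
    for msg in messages:
        out.append({"token": IM_START})
        # Role and content share one string, separated by a newline.
        out.append(f"{msg['role']}\n{msg['content']}")
        out.append({"token": IM_END})
        # Each turn is followed by a literal newline string.
        out.append("\n")
    return out


conversation = [
    {"role": "user", "content": "How are you"},
    {"role": "assistant", "content": "I am doing well!"},
]
tokens = to_chatml_tokens(conversation)
```

Each message expands to four entries: a start token, the role and content, an end token, and a newline, the same pattern as the sample above.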

Why is it so important?

Structured vs. unstructured input. This is another data point for my prediction that prompt engineering will converge toward structured input and output. Structure is what lets non-NLP systems (the majority of software out there) consume what a model produces. From the description in the repo:

Traditionally, GPT models consumed unstructured text. ChatGPT models instead expect a structured format, called Chat Markup Language (ChatML for short).

Models need orchestration. I'm not sure what ChatML is doing on the backend. Maybe it's just compiling to underlying embeddings, but I bet there's more orchestration. An LLM is one component of a rich user experience, but the infrastructure immediately around the model can unlock new performance and capabilities (see RLHF).

OpenAI is moving up the stack. Vanilla LLMs don't have real lock-in: it's just text in and text out. While GPT-3.5 is well ahead of the pack, real competitors will follow. There are already providers (other LLMs or LLM observability companies) that can swap out or proxy the calls in the OpenAI Python library by changing a single line of code. ChatML and similar experiences create lock-in and can be differentiated on more than pure performance.

This layer is hard to master but extremely valuable.
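
To make the structured-input point concrete, here is a rough sketch of how ordinary software could consume a ChatML conversation without any NLP. It assumes a flat-string form in which the special tokens are written inline around each turn. That form, the `parse_chatml` name, and the output shape are assumptions for illustration, not a documented API.

```python
import re

# Assumed flat-string form: "<|im_start|>role\ncontent<|im_end|>" per turn.
# The role is everything up to the first newline; the content may span lines.
_TURN = re.compile(r"<\|im_start\|>([^\n]*)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text):
    """Split a ChatML-style string into role/content dictionaries
    (hypothetical helper)."""
    return [
        {"role": role, "content": content}
        for role, content in _TURN.findall(text)
    ]


raw = (
    "<|im_start|>system\nAnswer concisely.\nCurrent date: 2023-03-01<|im_end|>\n"
    "<|im_start|>user\nHow are you<|im_end|>\n"
)
messages = parse_chatml(raw)
```

Because turn boundaries are marked by explicit tokens instead of being inferred from free text, a few lines of string handling are enough to recover who said what.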