Consumption Pricing Units in AI

A look at some of the current consumption-based pricing units in AI (and some alternative ones).

Consumption-based pricing units should fit the following general criteria:

Relevant to the core functionality
Easy to measure and understand
Scalable and adaptable across customer segments
Incentivizes efficient use

Per Prompt Token / Completion Token — OpenAI charges different amounts per prompt token (user input) and per completion token (model output). Why? Self-attention is quadratic in time and space with respect to input length, so longer inputs take more compute and memory. Models generate output one token at a time, so compute time and memory grow with the number of output tokens. Used by many foundation model providers (e.g., OpenAI).
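
To make the split concrete, here is a minimal sketch of billing prompt and completion tokens at separate rates. The prices are hypothetical placeholders, not any provider's actual rates, and the function name token_cost is made up for this example.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens (assumptions for illustration only).
PROMPT_PRICE_PER_1K = Decimal("0.0015")
COMPLETION_PRICE_PER_1K = Decimal("0.002")


def token_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the charge for one request, billing each token type separately."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prompt_cost = Decimal(prompt_tokens) / 1000 * PROMPT_PRICE_PER_1K
    completion_cost = Decimal(completion_tokens) / 1000 * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # A request with 1,000 prompt tokens and 500 completion tokens.
    print(token_cost(1000, 500))
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift across many requests.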

Per word (character, sentence, document, byte) — A token roughly corresponds to 3/4 of a word in OpenAI’s tokenizers for English text (this differs per model), so word count is a good approximation of query cost. Used in products where the word is the unit of value (e.g., Jasper).
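
As a rough sketch of that approximation, the snippet below estimates a token count from a whitespace word count using the "one token is about 3/4 of a word" rule, which is the same as 4 tokens per 3 words. The function name estimate_tokens is an assumption for this example, and a real tokenizer will give different counts, especially for code or non-English text.

```python
def estimate_tokens(text: str) -> int:
    """Estimate tokens from words, assuming 1 token is about 3/4 of a word."""
    words = len(text.split())
    # 4 tokens per 3 words, rounded up using integer arithmetic.
    return (4 * words + 2) // 3


if __name__ == "__main__":
    print(estimate_tokens("the quick brown fox"))  # 4 words
```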

Per API call — A flat fee per API call, regardless of the contents. The fee could be set from the average cost of serving a call, with some protections in place to discourage adverse selection (e.g., customers who send only the most expensive requests). Works when the API is targeted enough to correspond to the unit of value (e.g., Twilio).
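
One way to picture this is a flat fee derived from the average observed cost plus a margin, with a payload size cap as the protection against adverse selection. This is only a sketch under assumptions: the margin, the cap, and the names flat_fee and charge_for_call are all hypothetical.

```python
from statistics import mean


def flat_fee(sample_costs: list[float], margin: float = 0.2) -> float:
    """Set the flat per-call fee as the average observed cost plus a margin."""
    return mean(sample_costs) * (1 + margin)


def charge_for_call(payload_bytes: int, fee: float, max_payload_bytes: int = 10_000) -> float:
    """Charge the flat fee, rejecting calls that exceed the size cap.

    The cap is a hypothetical guard against customers who send only
    requests that cost far more than the average to serve.
    """
    if payload_bytes > max_payload_bytes:
        raise ValueError("payload exceeds the size covered by the flat fee")
    return fee


if __name__ == "__main__":
    fee = flat_fee([0.01, 0.02, 0.03])
    print(charge_for_call(2_000, fee))
```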

Per dedicated hardware bundle (vCPU, Memory, Storage) — Charge based on a compute, storage, and network egress/ingress bundle. Many foundation model hosting providers offer this (e.g., Hugging Face).

Per minute (second, session) — For more continuous interaction (e.g., chat), you might price per minute of interaction with the model. A weaker form of per-compute-bundle pricing.

Per input/output metric (entities, content, complexity) — Use some metric to measure the input or output — number of entities returned, types of content processed (images, text, tables, other media), or “complexity.”

Per success — Evaluate the outputs against success criteria (e.g., code that compiles or passes a test). Only charge for successful responses. This is analogous to paying per conversion in advertising.
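
A minimal sketch of per-success billing, assuming the model returns Python source and "success" means the source parses. The names compiles and bill_for_successes are hypothetical, and the price is in arbitrary integer units such as cents.

```python
from typing import Callable, Iterable


def compiles(source: str) -> bool:
    """Success criterion: the output is syntactically valid Python."""
    try:
        # Python's built-in compile() parses the source without running it.
        compile(source, "<model-output>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def bill_for_successes(
    outputs: Iterable[str],
    price_per_success: int,
    is_success: Callable[[str], bool] = compiles,
) -> int:
    """Charge only for outputs that meet the success criterion."""
    successes = sum(1 for output in outputs if is_success(output))
    return successes * price_per_success


if __name__ == "__main__":
    outputs = ["x = 1", "def broken(:"]
    print(bill_for_successes(outputs, price_per_success=5))
```

Passing is_success as a parameter keeps the billing logic separate from the success criterion, so a stricter check (such as running a test suite) could be swapped in.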

Per action — A product whose model chooses among multiple actions (e.g., tools to use or APIs to call) could charge per action taken.