Input tokens

Tokens consumed by the prompt, system message and retrieved context fed into the model.

Major providers generally bill input tokens at a lower rate than output tokens. Enterprise workloads are typically 70–90% input tokens because of long context, retrieved RAG documents and system prompts.
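Because the two token types are priced differently, the share of tokens that are input and the share of cost that is input can diverge. The sketch below works through that arithmetic for a single request. The `Usage` class, the `request_cost` helper and both prices are hypothetical placeholders chosen for illustration, not any provider's actual API or rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical container)."""
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) given prices per million tokens."""
    input_cost = usage.input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = usage.output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost, output_cost


# Assumed example: 80% of tokens are input, and output is priced
# at 4x the input rate (both prices are made up).
usage = Usage(input_tokens=8_000, output_tokens=2_000)
input_cost, output_cost = request_cost(
    usage, input_price_per_mtok=1.00, output_price_per_mtok=4.00
)

total_tokens = usage.input_tokens + usage.output_tokens
token_share = usage.input_tokens / total_tokens
cost_share = input_cost / (input_cost + output_cost)

print(f"Input share of tokens: {token_share:.0%}")
print(f"Input share of cost:   {cost_share:.0%}")
```

With these assumed numbers, input makes up 80% of the tokens but only 50% of the cost, which is why both token counts and both rates are needed when estimating spend.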

Related terms

Output tokens

Tokens generated by the model in response, typically priced at 3–5x the input-token rate.

Context window

The maximum number of tokens a model can process in a single request, generally counting input and output tokens together.

RAG (Retrieval-Augmented Generation)

An architecture that retrieves relevant documents and injects them into the model's context.