Input tokens
Tokens sent to the model as part of a request, including the user prompt, the system message and any retrieved context.
Major providers bill input tokens at a lower per-token rate than output tokens. In enterprise workloads, input tokens typically make up 70–90% of all tokens, driven by long contexts, RAG retrieval and system prompts.
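Because the two token types are priced differently, a workload's share of input tokens is not the same as its share of cost. The sketch below computes both for a single request. The prices in `EXAMPLE_PRICING` are hypothetical (output set at 4x input, inside the 3–5x range given for output tokens), and `Pricing`, `request_cost` and `input_cost_share` are illustrative names, not any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD. The values used below are hypothetical."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Total USD cost of one request, with each token type billed at its own rate."""
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return input_cost + output_cost


def input_cost_share(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Fraction of the request's cost that comes from input tokens."""
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    return input_cost / request_cost(input_tokens, output_tokens, pricing)


# Hypothetical prices: output tokens cost 4x as much as input tokens.
EXAMPLE_PRICING = Pricing(input_per_million=1.00, output_per_million=4.00)

# An 80/20 input/output split, within the 70-90% range described above.
cost = request_cost(8_000, 2_000, EXAMPLE_PRICING)
share = input_cost_share(8_000, 2_000, EXAMPLE_PRICING)
print(f"cost=${cost:.4f}, input share of cost={share:.0%}")
```

With these assumed prices, input makes up 80% of the tokens but only half of the cost. The larger the output multiplier, the smaller input's share of spend for the same token split.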
Related terms
Output tokens
Tokens generated by the model in its response, typically priced 3–5x higher per token than input tokens.
Context window
The maximum number of tokens a model can process in a single request.
RAG (Retrieval-Augmented Generation)
An architecture that retrieves relevant documents and injects them into the model's context.
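To illustrate how RAG retrieval and system prompts inflate the input-token count, here is a toy sketch that retrieves documents, injects them into a prompt and checks the result against a context window. Everything in it is hypothetical: `retrieve` ranks by word overlap rather than embeddings, `estimate_tokens` assumes one token per whitespace-separated word (a real tokenizer, such as OpenAI's tiktoken, will give different counts), and `fits_context_window` assumes the prompt and the requested output share a single window, which holds for many but not all models.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word.
    Hypothetical stand-in for the model's real tokenizer."""
    return len(text.split())


def retrieve(query: str, documents: list[str], k: int) -> list[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,  # stable sort: ties keep their original order
    )
    return ranked[:k]


def build_prompt(system: str, query: str, retrieved: list[str]) -> str:
    """Inject the retrieved documents between the system message and the question."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {query}"


def fits_context_window(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """Assumes the prompt and the generated output share one context window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window


documents = [
    "Output tokens cost more than input tokens",
    "The context window limits tokens per request",
    "Bananas are yellow",
]
query = "what limits tokens per request"
retrieved = retrieve(query, documents, k=2)
prompt = build_prompt("Answer using only the context.", query, retrieved)

print(retrieved[0])
print(estimate_tokens(prompt), "input tokens (estimated)")
print(fits_context_window(prompt, max_output_tokens=500, context_window=8_192))
```

Every retrieved document is resent as input on each request, so adding more or longer documents raises input-token counts directly. This is one reason RAG-heavy workloads skew toward input tokens.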