Context window

The maximum number of tokens a model can process in a single request, counting both the input and the generated output.

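Context windows on current frontier models typically range from 128K tokens to 1M or more. Gemini 2.5 Pro, for example, accepts up to 1M tokens, and Gemini 1.5 Pro offered 2M. Larger windows reduce the need for chunking. On some long-document workloads they can cut total token usage by 30–50%, because overlapping chunks and instructions repeated in every chunk no longer have to be sent.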
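Where the extra tokens from chunking come from can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a measured cost model. It assumes three things: every chunk request repeats the same instruction tokens, each request reserves room for the model's output, and consecutive chunks share a fixed overlap. The function name chunked_token_cost and all the numbers are hypothetical.

```python
import math

def chunked_token_cost(doc_tokens: int, window: int, instr_tokens: int,
                       output_tokens: int, overlap: int) -> int:
    """Total input tokens sent to process a document within a context window.

    Hypothetical model: each request carries the same instructions, reserves
    room for output, and re-sends `overlap` tokens from the previous chunk.
    """
    # Room left for document text in each request.
    per_chunk = window - instr_tokens - output_tokens
    if per_chunk <= overlap:
        raise ValueError("window too small for instructions, output and overlap")
    if doc_tokens <= per_chunk:
        # The whole document fits in one request, so no chunking is needed.
        return instr_tokens + doc_tokens
    # After the first chunk, each chunk covers (per_chunk - overlap) new tokens.
    step = per_chunk - overlap
    n_chunks = 1 + math.ceil((doc_tokens - per_chunk) / step)
    # Every chunk after the first re-sends the overlap region.
    doc_sent = doc_tokens + (n_chunks - 1) * overlap
    return n_chunks * instr_tokens + doc_sent

# Hypothetical workload: 300K-token document, 2K instructions,
# 4K reserved for output, 2K overlap between chunks.
print(chunked_token_cost(300_000, 128_000, 2_000, 4_000, 2_000))    # 310000
print(chunked_token_cost(300_000, 1_000_000, 2_000, 4_000, 2_000))  # 302000
```

With these assumed numbers the document needs three requests in a 128K window but only one in a 1M window. The gap grows as the repeated instructions get longer, the overlap gets larger, or the window gets smaller relative to the document.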

Related terms

Input tokens

Tokens consumed by the prompt, system message and retrieved context fed into the model.
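Input tokens are simply the sum over every part of the request. The Python sketch below shows that accounting. It relies on an assumption: it uses the rough rule of thumb of about 4 characters per token for English text in place of a real tokenizer. Actual counts come from the model's own tokenizer, and chat APIs may add a few formatting tokens per message. The helper names approx_tokens and input_tokens are hypothetical.

```python
import math

def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    # A real count requires the specific model's tokenizer.
    return math.ceil(len(text) / 4)

def input_tokens(system: str, prompt: str, retrieved: list[str]) -> int:
    """Approximate input tokens: system message + prompt + retrieved context."""
    return (approx_tokens(system)
            + approx_tokens(prompt)
            + sum(approx_tokens(doc) for doc in retrieved))

print(input_tokens("You are a helpful assistant.",
                   "Summarize the report.",
                   ["a" * 400, "b" * 800]))  # 313
```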

RAG (Retrieval-Augmented Generation)

An architecture that retrieves relevant documents and injects them into the model's context.
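The Python sketch below shows the retrieve-then-inject flow and why it interacts with the context window. Retrieved passages become input tokens, so they have to fit within a budget. It makes two substitutions, both labelled assumptions. Documents are ranked by word overlap with the query, standing in for the embedding-similarity search typically used in practice. Token costs use the same rough 4-characters-per-token heuristic rather than a real tokenizer. All function names are hypothetical.

```python
import math
import re

def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return math.ceil(len(text) / 4)

def terms(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for embedding similarity.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int) -> list[str]:
    """Return the k docs sharing the most words with the query."""
    q = terms(query)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(docs, key=lambda d: len(q & terms(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str], k: int, budget: int) -> str:
    """Inject top-ranked docs into the prompt without exceeding a token budget."""
    context: list[str] = []
    used = approx_tokens(query)
    for doc in retrieve(query, docs, k):
        cost = approx_tokens(doc)
        if used + cost > budget:
            break  # Stop before the injected context would overflow the budget.
        context.append(doc)
        used += cost
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

docs = ["Paris is the capital of France.",
        "Bananas are yellow.",
        "France borders Spain."]
print(build_prompt("What is the capital of France?", docs, k=2, budget=1000))
```

A tighter budget keeps only the highest-ranked passages. This is the same trade-off a fixed context window imposes on real RAG systems.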