Reference
The AI economics glossary
Concise, working definitions of the concepts that drive enterprise AI cost — for finance, infrastructure and AI leaders planning multi-year rollouts.
Token
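The unit of text processed by an LLM, and the unit in which usage is metered. One token is roughly 4 characters or 0.75 words in English; the exact ratio varies by tokenizer and language.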
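For budgeting, the rule of thumb above can be turned into a quick estimator. This is a minimal sketch, not a tokenizer: the function names are illustrative, and real counts depend on the model's tokenizer, so treat the result as a planning figure only.

```python
import math

# Rules of thumb taken from the definition above (English text only).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate the token count of a document from its word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)
```

Under these assumptions, a 1,500-word report comes to about 2,000 tokens.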
Input tokens
Tokens consumed by the prompt, system message and retrieved context fed into the model.
Output tokens
Tokens generated by the model in response, typically priced 3–5x higher per token than input tokens.
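Because input and output tokens are priced differently, the cost of a request is a weighted sum of the two. The sketch below assumes per-million-token pricing; the prices in the usage note are illustrative placeholders, not any vendor's actual rates.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request, given separate per-million-token prices."""
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000
```

With hypothetical prices of $3 per million input tokens and $15 per million output tokens (a 5x ratio), a request with 2,000 input tokens and 500 output tokens costs about $0.0135. The output tokens are a fifth of the total volume but more than half of the bill.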
Context window
The maximum number of tokens a model can process in a single request, usually counting the prompt and the generated output together.
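A common planning check is whether a request fits the window once room is reserved for the response. The sketch below assumes the window is shared between input and output tokens, which holds for many models but not all; the 128,000-token window in the usage note is a hypothetical figure.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def retrieval_budget(context_window: int, fixed_prompt_tokens: int, max_output_tokens: int) -> int:
    """Tokens left for retrieved context after the fixed prompt and output reserve."""
    return max(0, context_window - fixed_prompt_tokens - max_output_tokens)
```

For a hypothetical 128,000-token window, a 2,000-token system prompt and a 4,000-token output reserve leave 122,000 tokens for retrieved documents.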
RAG (Retrieval-Augmented Generation)
An architecture that retrieves relevant documents and injects them into the model's context.
Embedding
A vector representation of text used for semantic search and RAG retrieval.
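The retrieval step in RAG usually reduces to ranking document embeddings by similarity to the query embedding. Below is a minimal sketch using cosine similarity in plain Python. The three-dimensional vectors and document names are hand-made toy values, not the output of a real embedding model, which would produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical values for illustration only).
DOCS = {
    "gpu_pricing": [0.9, 0.1, 0.0],
    "travel_policy": [0.0, 0.2, 0.9],
    "cloud_invoices": [0.7, 0.6, 0.1],
}
QUERY = [0.8, 0.3, 0.0]
```

Calling `top_k(QUERY, DOCS)` ranks `gpu_pricing` first and `cloud_invoices` second; the text of those documents would then be injected into the model's context, which is where RAG adds input tokens to every request.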
TPS (Tokens per Second)
The number of tokens a model or deployment generates per second. This throughput metric is critical for sizing inference infrastructure.
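A first-pass sizing calculation multiplies peak request rate by average output length to get the required TPS, then divides by what one model replica can sustain. This is a rough sketch that counts only output tokens and ignores prompt processing and batching effects; all figures in the usage note are hypothetical.

```python
import math


def required_tps(peak_requests_per_second: float, avg_output_tokens: float) -> float:
    """Output tokens per second the deployment must sustain at peak."""
    return peak_requests_per_second * avg_output_tokens


def replicas_needed(required: float, per_replica_tps: float, headroom: float = 0.0) -> int:
    """Replicas required, with an optional fractional safety margin (0.25 = 25%)."""
    return math.ceil(required * (1 + headroom) / per_replica_tps)
```

At 5 requests per second and 400 output tokens per request, the deployment needs 2,000 TPS. If one replica sustains 750 TPS, that is 3 replicas, or 4 with 25% headroom.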
AI Agent
A system where an LLM autonomously plans and executes multi-step tasks using tools.
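Agents tend to cost more than single calls because each step typically sends the accumulated history back to the model. The sketch below illustrates that growth under a simplifying assumption: every step resends the full history with no prompt caching or summarization. The function and parameter names are illustrative.

```python
def agent_input_tokens(base_prompt_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens across a run where each step resends the full history."""
    total = 0
    history = base_prompt_tokens
    for _ in range(steps):
        total += history                    # this step's prompt is everything so far
        history += tokens_added_per_step    # model output and tool result are appended
    return total
```

With a 1,000-token starting prompt and 500 tokens added per step, a 4-step run consumes 1,000 + 1,500 + 2,000 + 2,500 = 7,000 input tokens, against 1,000 for a single call.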
Frontier model
The highest-capability tier of LLMs, for example GPT-5, Claude Opus 4 and Gemini 2.5 Pro.
Inference cost
The compute cost of running a trained model to generate outputs, as distinct from the one-time cost of training it.
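For self-hosted deployments, inference cost can be approximated from hardware price and throughput. This is a minimal sketch that assumes a fixed hourly accelerator price and a steady tokens-per-second rate; the utilization parameter and the example figures are hypothetical.

```python
def cost_per_million_tokens(
    hourly_hardware_cost: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Compute cost per million tokens for hardware billed by the hour.

    utilization is the fraction of each hour spent serving traffic (0 < u <= 1).
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_hardware_cost / tokens_per_hour * 1_000_000
```

A hypothetical $2.50-per-hour accelerator sustaining 1,000 TPS works out to about $0.69 per million tokens at full utilization, and twice that at 50% utilization, which is why idle capacity matters as much as the hourly rate.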