The AI economics glossary

Concise, working definitions of the concepts that drive enterprise AI cost — for finance, infrastructure and AI leaders planning multi-year rollouts.

Token

The unit of text processed by an LLM — roughly 4 characters or 0.75 words in English.
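
For quick budgeting, the rule of thumb above can be turned into a rough estimator. This is a minimal sketch: the function names are illustrative, the ratios apply to English text only, and an exact count requires the tokenizer of the specific model.

```python
import math

# Rule-of-thumb ratios from the definition above (English text only).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from the character length of a string."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


# A 750-word English document is on the order of 1,000 tokens.
print(estimate_tokens_from_words(750))
```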

Input tokens

Tokens consumed by the prompt, system message and retrieved context fed into the model.

Output tokens

Tokens generated by the model in response — typically 3–5x more expensive than input tokens.
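
Because input and output tokens are priced separately, the cost of a request is the sum of two terms. The sketch below assumes per-million-token pricing; the prices used are placeholders chosen to show a 5x output premium, not any vendor's actual rates.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request in the same currency as the prices."""
    input_cost = input_tokens * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (input_cost + output_cost) / 1_000_000


# Placeholder prices: 3.00 per million input tokens, 15.00 per million output tokens.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"{cost:.4f}")
```

With these placeholder prices, the 500 output tokens cost more than the 2,000 input tokens, which is why response length often matters as much as prompt length.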

Context window

The maximum number of tokens a model can process in a single request. For most models the limit covers input and output tokens combined.
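
A simple pre-flight check can catch requests that would exceed the window. This sketch assumes the model counts input and output tokens against one shared limit, which is common but not universal; the function name and the figures are illustrative.

```python
def fits_in_context(
    input_tokens: int,
    max_output_tokens: int,
    context_window: int,
) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Illustrative figures: a 128,000-token window with 4,000 tokens reserved for output.
print(fits_in_context(120_000, 4_000, 128_000))
print(fits_in_context(126_000, 4_000, 128_000))
```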

RAG (Retrieval-Augmented Generation)

An architecture that retrieves relevant documents and injects them into the model's context.

Embedding

A vector representation of text used for semantic search and RAG retrieval.
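
The retrieval step of RAG can be sketched as a similarity ranking over embeddings. The vectors below are hand-written three-dimensional toys, and the document names are hypothetical; a real system would obtain much longer vectors from an embedding model and usually store them in a vector index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 1) -> list[str]:
    """Names of the k documents most similar to the query, best first."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical documents, made-up values).
docs = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "travel_policy": [0.1, 0.9, 0.0],
    "security_policy": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

print(top_k(query, docs, k=2))
```

The documents returned by the ranking are then injected into the prompt, where they count as input tokens.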

TPS (Tokens per Second)

A throughput metric for LLM inference: the number of tokens a deployment generates per second. Critical for sizing infrastructure.
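
A back-of-the-envelope sizing calculation shows how TPS feeds into capacity planning. The sketch assumes throughput is dominated by output tokens and that each replica should run below its peak; the 70% headroom factor and all traffic figures are illustrative assumptions.

```python
import math


def required_tps(requests_per_second: float, avg_output_tokens: float) -> float:
    """Aggregate output tokens per second the deployment must sustain."""
    return requests_per_second * avg_output_tokens


def replicas_needed(required: float, tps_per_replica: float, headroom: float = 0.7) -> int:
    """Replicas needed if each one is loaded to `headroom` of its peak TPS."""
    usable_tps = tps_per_replica * headroom
    return math.ceil(required / usable_tps)


# Illustrative load: 20 requests/s, 400 output tokens each, 2,500 TPS per replica.
demand = required_tps(20, 400)
print(demand, replicas_needed(demand, 2_500))
```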

AI Agent

A system where an LLM autonomously plans and executes multi-step tasks using tools.

Frontier model

The highest-capability tier of LLMs, such as GPT-5, Claude Opus 4 and Gemini 2.5 Pro.

Inference cost

The compute cost of running a trained model to generate outputs.
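
For self-hosted models, inference cost can be approximated from hardware cost and throughput. This sketch assumes the hardware runs at full utilization for the whole hour, so real costs will be higher; the hourly rate and throughput are placeholder figures.

```python
def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Compute cost per one million tokens at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Placeholder figures: 4.00 per hour of compute sustaining 2,500 tokens/s.
print(f"{cost_per_million_tokens(4.00, 2_500):.2f}")
```

Halving utilization doubles the effective cost per token, which is why idle capacity is a major driver of self-hosted inference cost.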