TPS (Tokens per Second)

Throughput metric for LLM inference — critical for sizing infrastructure.

Peak TPS load on enterprise AI rollouts typically runs 3–8x the average. Capacity planning should target the 95th-percentile peak, not the mean.

Related terms

Output tokens

Tokens generated by the model in response — typically 3–5x more expensive than input tokens.