← Glossary

TPS (Tokens per Second)

Throughput metric for LLM inference — critical for sizing infrastructure.

Peak TPS load on enterprise AI rollouts typically runs 3–8x the average. Capacity planning should target the 95th-percentile peak, not the mean.

Related terms