Inference cost

The compute cost of running a trained model to generate outputs.

Inference cost is the dominant component of enterprise AI operating expense — typically 70–90% of the total AI bill. It scales linearly with tokens consumed and is the primary lever for cost control.

Related terms

Token

The unit of text processed by an LLM — roughly 4 characters or 0.75 words in English.

TPS (Tokens per Second)

Throughput metric for LLM inference — critical for sizing infrastructure.