RAG (Retrieval-Augmented Generation)
An architecture that retrieves relevant documents and injects them into the model's context.
RAG adds a persistent indexing baseline (embedding + vector storage) plus per-query retrieval overhead. Total RAG cost is the sum of indexing refresh, embedding generation, vector storage and LLM inference.