RAG (Retrieval-Augmented Generation)

An architecture that retrieves documents relevant to a query at request time and injects them into the model's context before the model generates its answer.
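
As an illustration of the retrieve-then-inject flow, here is a minimal sketch in Python. It assumes a toy bag-of-words similarity in place of a real embedding model and a plain list in place of a vector store; the names `embed`, `retrieve`, and `build_prompt` are hypothetical and not taken from any particular library.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Inject the retrieved documents into the text sent to the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini-corpus, for illustration only.
docs = [
    "Vector storage holds document embeddings",
    "Bananas contain potassium",
    "Embeddings map text to vectors",
]
query = "where are document embeddings stored"
prompt = build_prompt(query, docs, k=1)
print(prompt)
```

In a real deployment the `embed` step would call an embedding model and the documents would be embedded once at indexing time rather than on every query; the sketch recomputes them only to stay self-contained.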

RAG adds a persistent indexing baseline (embedding generation plus vector storage) and a per-query retrieval overhead on top of inference. Total RAG cost is the sum of index refresh, embedding generation, vector storage, and LLM inference.
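
The cost sum can be sketched as a small Python structure. The class name `RagCost`, its field names, and the figures below are hypothetical placeholders chosen for illustration, not measured or quoted prices.

```python
from dataclasses import dataclass


@dataclass
class RagCost:
    """Cost components of a RAG deployment over one billing period."""
    index_refresh: float         # re-indexing changed or new documents
    embedding_generation: float  # producing embeddings
    vector_storage: float        # keeping vectors in the store
    llm_inference: float         # generating answers with the model

    def total(self) -> float:
        """Total cost is the plain sum of the four components."""
        return (
            self.index_refresh
            + self.embedding_generation
            + self.vector_storage
            + self.llm_inference
        )


# Placeholder figures, in arbitrary currency units.
monthly = RagCost(
    index_refresh=10.0,
    embedding_generation=5.0,
    vector_storage=20.0,
    llm_inference=100.0,
)
print(monthly.total())
```

Keeping the components separate makes it easier to see which part of the bill is the persistent baseline (refresh, embeddings, storage) and which part scales with query volume (inference).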

Related terms