/ rag / architecture
A visual walkthrough of how Retrieval-Augmented Generation pipelines are structured.
how to read this diagram
- Documents are split into chunks and converted to embeddings (numerical vectors).
- These embeddings are stored in a vector database for fast similarity search.
- When a query comes in, the retriever searches the vector database for the most relevant chunks.
- The retrieved chunks are passed to the LLM along with the original query.
- The LLM generates a final answer grounded in those documents.
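The steps above can be sketched end to end in a few lines. This is a toy illustration, not a production pipeline: the `embed` function here is a hypothetical bag-of-words counter standing in for a real embedding model, and a sorted Python list stands in for the vector database.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words term counts. Real pipelines use a
    # neural embedding model that maps text to dense numerical vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    # Similarity measure used by the retriever to rank chunks.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1-2: split documents into chunks and index their embeddings
# (a real system would store these in a vector database).
chunks = [
    "Embeddings map text to numerical vectors.",
    "A vector database supports fast similarity search.",
    "The LLM generates an answer grounded in retrieved chunks.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=2):
    # Step 3: rank indexed chunks by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query):
    # Step 4: pass the retrieved chunks to the LLM alongside the query;
    # the LLM then generates an answer grounded in this context (step 5).
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does a vector database do?"))
```

Swapping the toy pieces for real ones (an embedding model, a vector store, an LLM call) changes the implementations but not the shape of the pipeline.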