/ rag / architecture

A visual walkthrough of how Retrieval-Augmented Generation pipelines are structured.

how to read this diagram

  • Documents are split into chunks and converted to embeddings (numerical vectors).
  • These embeddings are stored in a vector database for fast similarity search.
  • When a query comes in, the retriever searches the vector DB for the most relevant chunks.
  • The retrieved chunks are passed to the LLM along with the original query.
  • The LLM generates a final answer grounded in those documents.
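The steps above can be sketched end-to-end in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words `embed()` stands in for a learned embedding model, the in-memory list stands in for a vector database, and `answer()` only assembles the grounded prompt that would be sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real pipeline would
    # use a learned embedding model producing dense vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1-2: split documents into chunks, embed them, and store the
# (chunk, embedding) pairs -- our stand-in for a vector database.
chunks = [
    "The vector database stores embeddings for similarity search.",
    "Retrieval-Augmented Generation grounds LLM answers in documents.",
    "Chunking splits long documents into smaller passages.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 3: the retriever embeds the query and returns the top-k
# most similar chunks.
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Steps 4-5: pass the retrieved chunks plus the original query to the
# LLM. The model call itself is omitted; we just build the prompt.
def answer(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(retrieve("how are embeddings stored?")[0])
```

Swapping in a real embedding model and vector store changes only `embed()` and `index`; the retrieve-then-prompt shape stays the same.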