/ rag / architecture

A visual walkthrough of how Retrieval-Augmented Generation pipelines are structured.

how to read this diagram

  • Documents are split into chunks and converted to embeddings (numerical vectors).
  • These embeddings are stored in a vector database for fast similarity search.
  • When a query comes in, the retriever searches the vector DB for the most relevant chunks.
  • The retrieved chunks are passed to the LLM along with the original query.
  • The LLM generates a final answer grounded in those documents.
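The steps above can be sketched end-to-end in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words `embed()` stands in for a learned embedding model, the in-memory list stands in for a vector database, and `answer()` only assembles the grounded prompt that would be sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real pipeline would
    # use a learned embedding model producing dense vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1-2: split documents into chunks, embed them, and store the
# (chunk, embedding) pairs -- our stand-in for a vector database.
chunks = [
    "The vector database stores embeddings for similarity search.",
    "Retrieval-Augmented Generation grounds LLM answers in documents.",
    "Chunking splits long documents into smaller passages.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Step 3: the retriever embeds the query and returns the top-k
# most similar chunks.
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Steps 4-5: pass the retrieved chunks plus the original query to the
# LLM. The model call itself is omitted; we just build the prompt.
def answer(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(retrieve("how are embeddings stored?")[0])
```

Swapping in a real embedding model and vector store changes only `embed()` and `index`; the retrieve-then-prompt shape stays the same.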