A simple Retrieval-Augmented Generation (RAG) app built with LangChain, FastAPI, and plain HTML/JS.
Upload a PDF or text document, then ask questions — the app retrieves relevant chunks and uses an LLM to answer.
```
┌──────────┐       ┌──────────────┐       ┌────────────┐
│ Browser  │──────▶│   FastAPI    │──────▶│ LangChain  │
│  (HTML)  │◀──────│   Backend    │◀──────│  + FAISS   │
└──────────┘       └──────────────┘       └────────────┘
      │                                         │
  Upload doc                               OpenAI API
    /query                          (embeddings + chat)
```
- Upload — The document is split into small overlapping chunks.
- Embed — Each chunk is converted to a vector using OpenAI Embeddings.
- Store — Vectors are stored in an in-memory FAISS index.
- Query — The user's question is embedded, the top-k most similar chunks are retrieved, and those chunks are passed as context to the LLM, which generates the answer.
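The four steps above can be sketched end-to-end in plain Python. This toy version swaps the OpenAI embedding model for a bag-of-words count vector and FAISS for a plain list, purely to show the data flow; the names `chunk`, `embed`, and `retrieve` are illustrative and do not come from `main.py`.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 60, overlap: int = 20) -> list[str]:
    """Split text into overlapping character chunks (the splitter's job)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: word-count vector (stands in for OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, store: list[tuple[Counter, str]], k: int = 2) -> list[str]:
    """Embed the question, rank stored chunks, return top-k (FAISS's job)."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

doc = "FAISS stores vectors in memory. FastAPI serves the endpoints. LangChain wires retrieval to the LLM."
store = [(embed(c), c) for c in chunk(doc)]      # Embed + Store
top = retrieve("Where are vectors stored?", store, k=1)  # Query
```

In the real app, the retrieved chunks are then stuffed into the LLM prompt instead of being returned directly.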
```
RAG/
├── main.py            # FastAPI backend
├── static/
│   └── index.html     # Frontend
├── requirements.txt
├── .env.example
└── README.md
```
```bash
cd RAG
python3 -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and paste your real API key
uvicorn main:app --reload
```

Open http://127.0.0.1:8000 in your browser.
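The exact contents of `.env.example` are not shown here, but for an OpenAI-backed LangChain app the file typically holds a single variable along these lines:

```ini
# .env — keep this file out of version control
OPENAI_API_KEY=your-key-here
```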
- Click **Upload** and select a `.pdf` or `.txt` file.
- Type a question in the text box and click **Ask**.
- The answer and source chunks will appear below.
| Concept | Where in code |
|---|---|
| Document loading | `load_document()` — uses LangChain's `PyPDFLoader` / `TextLoader` |
| Text chunking | `build_vector_store()` — `RecursiveCharacterTextSplitter` |
| Embeddings | `OpenAIEmbeddings()` converts text → vectors |
| Vector store | `FAISS.from_documents()` — similarity search index |
| Retrieval chain | `RetrievalQA.from_chain_type()` — retrieves context + generates answer |
| API endpoint | FastAPI `@app.post("/query")` |
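The retrieval chain's last step is simple in principle: paste the retrieved chunks and the question into one prompt, as RetrievalQA's default "stuff" strategy does. A minimal sketch, with illustrative template wording (not the exact prompt LangChain ships):

```python
def build_prompt(chunks: list[str], question: str) -> str:
    """Stuff all retrieved chunks into a single context block,
    then append the user's question — the "stuff" chain strategy."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    ["FAISS is an in-memory index.", "Uvicorn serves the app."],
    "Where is the index kept?",
)
```

The assembled string is what gets sent to the chat model; everything before "Question:" is grounding material, so answer quality depends directly on retrieval quality.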
- This uses in-memory FAISS — data is lost on restart.
- Uses `gpt-3.5-turbo` by default; change the model in `get_qa_chain()`.
- For production, add authentication, persistent storage, and rate limiting.