RAG PDF Chatbot

Overview

This project implements a chatbot that lets users ask questions about the content of their PDF documents through a conversational interface.

Key Features

PDF Upload & Processing
Upload PDFs through a simple Streamlit interface. The uploaded PDF is processed to extract text, which is then split into semantically meaningful chunks.
Semantic Text Chunking
Uses LangChain's RecursiveCharacterTextSplitter to divide text into manageable pieces, preferring natural boundaries such as paragraph and line breaks so that related text stays together.
Embedding Generation & Storage
Uses Google Generative AI to generate embeddings for each text chunk. These embeddings are stored in ChromaDB for efficient retrieval.
Conversational Query Interface
A chat-based UI built with Streamlit enables users to ask questions about the PDF content. The application retrieves relevant information and generates context-aware responses.
Retrieval Augmented Generation (RAG)
The system retrieves the text chunks most relevant to the user's question and passes them to the generative model as context for the answer. HyDE (Hypothetical Document Embeddings) refines the retrieval step: the model first drafts a hypothetical answer, and that draft guides the search for matching chunks, which can surface better context than the raw question alone.
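
The flow described above (chunk, embed, retrieve with HyDE) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: `split_text` is a simplified stand-in for LangChain's RecursiveCharacterTextSplitter (no overlap support), `embed` is a toy bag-of-words vector standing in for Google Generative AI embeddings, a plain list stands in for ChromaDB, and `draft_answer` is a hypothetical callable standing in for the generative model.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator present, recursing into oversized parts."""
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            break
    else:
        # No separator left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for part in parts:
        if not part.strip():
            continue
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts into one chunk
            continue
        if current:
            chunks.append(current)
        if len(part) > chunk_size:
            chunks.extend(split_text(part, chunk_size, separators))
            current = ""
        else:
            current = part
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding: word counts (assumption; a real app would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_retrieve(question, chunks, draft_answer, top_k=2):
    """HyDE: search with the embedding of a drafted answer instead of the raw question."""
    hypothetical = draft_answer(question)
    query_vec = embed(hypothetical)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


document = (
    "Invoices are due within 30 days of issue.\n\n"
    "Late payments incur a 2 percent monthly fee.\n\n"
    "Support is available by email on weekdays."
)
chunks = split_text(document, chunk_size=60)

# Hypothetical stand-in for the generative model's drafted answer.
fake_llm = lambda q: "A late payment incurs a monthly fee."
context = hyde_retrieve("What happens if I pay late?", chunks, fake_llm, top_k=1)
```

In a real deployment the retrieved `context` would be placed in the prompt sent to the generative model together with the user's question; the drafted answer itself is only used for the search and is never shown to the user.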

Dependencies

Streamlit – For the web-based chat interface.
PDFPlumber – To extract text from PDF documents.
ChromaDB – For embedding storage and fast retrieval.
Google Generative AI (google-generativeai) – To generate embeddings and conversational responses.
Python-Dotenv – To load environment variables.
LangChain – For text splitting (RecursiveCharacterTextSplitter).

Local Setup Guide

1. Install Python 3.9+

Download Python from python.org, then verify the installation:

python --version

2. Clone the Repository

git clone <repository-url>
cd <repository-folder-name>

3. Set Up Virtual Environment

python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux

4. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

5. Configure API Key

In the project root, create a new file named .env.
Get an API key from Google AI Studio.

Add the key to .env:

GEMINI_API_KEY="YOUR_API_KEY"  # Paste your key here
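
As a sketch of how the application might read this key at startup, the helper below loads .env with python-dotenv when that package is available and fails early with a clear message if the key is missing or still set to the placeholder. The function name `get_api_key` is hypothetical and not taken from the project's code.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keep the sketch usable without the package
    load_dotenv = None


def get_api_key(var_name="GEMINI_API_KEY"):
    """Return the API key from the environment, loading .env first if possible."""
    if load_dotenv is not None:
        load_dotenv()  # does not override variables that are already set
    key = os.getenv(var_name, "").strip()
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            f"{var_name} is not set. Add it to the .env file in the project root."
        )
    return key
```

Calling `get_api_key()` once near the top of app.py, before any request to Google Generative AI, would surface a missing key immediately rather than at the first question.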

6. Run the Application

streamlit run app.py

The application starts at http://localhost:8501, Streamlit's default address.
