📄 DocSearch RAG (PDF Chatbot)

A Retrieval-Augmented Generation (RAG) engine that lets users chat with PDF documents in natural language. Built with the "Modern Data Stack": Astra DB (Vector Search), LangChain, and OpenAI.

🏗️ Architecture

Ingestion: PDF documents are loaded and split into 400-token chunks.
Vector Store: Chunks are embedded with OpenAI (text-embedding-3-small) and stored in DataStax Astra DB (Cassandra) for low-latency retrieval.
Retrieval: Hybrid search locates the top 3 most relevant context chunks.
Generation: GPT-3.5 Turbo synthesizes the answer based only on the retrieved context to minimize hallucinations.
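
The four stages above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in app.py: whitespace-separated words stand in for tokens, a bag-of-words counter (the hypothetical `toy_embed`) stands in for text-embedding-3-small, an in-memory list stands in for Astra DB, and plain cosine similarity is swapped in for hybrid search. All function names are made up for this example.

```python
import math
from collections import Counter

CHUNK_SIZE = 400  # tokens per chunk, as stated in the Architecture section
TOP_K = 3         # number of context chunks passed to the LLM


def split_into_chunks(text, chunk_size=CHUNK_SIZE):
    """Split text into chunks of at most chunk_size tokens.

    Assumption: whitespace-separated words approximate model tokens.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(word.lower().strip(".,?!") for word in text.split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=TOP_K):
    """Return the k chunks most similar to the question (vector-only, not hybrid)."""
    query_vec = toy_embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Build a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In a real deployment the string returned by `build_prompt` would be sent to the chat model; restricting the prompt to the retrieved chunks is what keeps the answer grounded in the PDF.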

🛠️ Tech Stack

Frontend: Streamlit
Orchestration: LangChain v0.1
Database: Astra DB Serverless (Cassandra Vector)
LLM: OpenAI GPT-3.5 Turbo

🚀 How to Run

Clone the repository
Install dependencies: pip install -r requirements.txt
Add your Astra DB Secure Connect Bundle to the repository root as bundle.zip.
Run the app: streamlit run app.py
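
Before launching, a small pre-flight check can catch a missing or corrupt bundle early. The sketch below is hypothetical and not part of this repository: the `preflight` helper is invented for illustration, and it assumes the app reads the OpenAI key from the `OPENAI_API_KEY` environment variable (the default for the OpenAI client), which this README does not state.

```python
import os
import zipfile
from pathlib import Path


def preflight(root=".", env=None):
    """Return a list of problems found; an empty list means ready to run.

    Assumptions: the bundle is named bundle.zip in `root`, and the OpenAI
    key is supplied through the OPENAI_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    problems = []

    bundle = Path(root) / "bundle.zip"
    if not bundle.is_file():
        problems.append(f"missing {bundle}")
    elif not zipfile.is_zipfile(bundle):
        problems.append(f"{bundle} is not a valid zip archive")

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    return problems
```

Calling `preflight()` from the repository root and printing the returned list shows what still needs fixing before `streamlit run app.py`.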
