Welcome to the Agentic Retrieval RAG System, an end-to-end, production-ready document Q&A application powered by PageIndex. This project demonstrates a paradigm shift in Retrieval-Augmented Generation (RAG).
Traditional similarity-based RAG pipelines rely heavily on text chunking, embedding generation, and vector databases, which often lead to context loss and the "lost-in-the-middle" problem. This system uses vectorless, reasoning-based RAG, allowing the AI to read and comprehend entire documents and give transparent, human-expert-level responses.
- Vectorless Agentic Retrieval: Process long documents without manual chunking, vectors, or external embedding databases.
- Lightning-Fast Execution: Environment and dependency management powered by uv, an extremely fast Python package installer and resolver.
- Real-Time Streaming: Tokens stream directly to the chat UI for an interactive, ChatGPT-like experience.
- Production-Ready Architecture: Cleanly separated modular design (app.py for the UI, src/ for logic and API wrappers).
- Persistent File Logging: Beautiful console and rolling file logging via loguru.
Project structure:

page-index-chat-rag/
├── src/
│ ├── __init__.py
│ ├── logger.py # Loguru configuration for console and persistent file logging
│ └── rag_api.py # Backend logic & PageIndex API integration
├── logs/ # Auto-generated application logs
├── .env.example # Example environment variables
├── .gitignore # Comprehensive Python project .gitignore
├── app.py # Streamlit frontend application
├── pyproject.toml # uv project configuration
├── uv.lock # Lockfile for exact dependency reproducibility
└── README.md # Project documentation
Prerequisites:

- Python 3.11 or higher
- uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh or pip install uv)
- A PageIndex API key (get yours at Dash PageIndex)
Clone the repository and install dependencies:

git clone https://github.com/your-username/page-index-chat-rag.git
cd page-index-chat-rag
# Sync dependencies into the project virtual environment (.venv)
uv sync

Copy the example environment variable file:

cp .env.example .env

Edit .env and insert your API key:

PAGEINDEX_API_KEY=your_pageindex_api_key_here

Activate the virtual environment and launch Streamlit:

source .venv/bin/activate
streamlit run app.py

The interface will automatically open in your browser at http://localhost:8501.
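If the app starts but PageIndex requests fail, a common cause is a missing key or the untouched placeholder from .env.example. A minimal fail-fast check is sketched here. It assumes the key reaches os.environ (for example via a .env loader such as python-dotenv; this README does not say how app.py reads .env), and the names get_pageindex_api_key and MissingAPIKeyError are hypothetical, not part of src/rag_api.py.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Hypothetical error raised when the PageIndex API key is not configured."""


# Placeholder value copied from .env.example; treat it as "not configured".
PLACEHOLDER_KEY = "your_pageindex_api_key_here"


def get_pageindex_api_key(env_var: str = "PAGEINDEX_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Assumption: the key has already been loaded into os.environ, either by
    exporting it in the shell or by a .env loader such as python-dotenv.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise MissingAPIKeyError(
            f"{env_var} is not set. Copy .env.example to .env and add your key."
        )
    return key
```

Calling a check like this once at startup surfaces a readable error immediately, instead of a failed API call after the user has already uploaded a document.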
How it works:

- Upload: Users upload a PDF through the Streamlit UI.
- Process: The backend (src/rag_api.py) submits the file to PageIndex and polls the API until document processing is complete.
- Chat: Users ask complex questions that may require reasoning across the full document.
- Agentic Retrieval: Instead of similarity-based top-K chunk retrieval, the underlying PageIndex model reasons over the document itself, locates the relevant pages, and streams back context-aware answers.
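The Process step waits for PageIndex to finish parsing the uploaded file. A generic polling loop of that kind is sketched here under stated assumptions: get_status is a hypothetical callable standing in for whatever status request src/rag_api.py actually makes, and the status strings "completed" and "failed", the 2-second interval, and the 300-second timeout are illustrative defaults, not documented PageIndex values.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    ready_status: str = "completed",  # hypothetical status string
    failed_statuses: tuple[str, ...] = ("failed",),  # hypothetical status strings
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports ready, reports failure, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == ready_status:
            return status
        if status in failed_statuses:
            raise RuntimeError(f"Document processing failed with status {status!r}")
        # Check the deadline before sleeping so a stuck job cannot block forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Document not ready after {timeout_s} s (last status {status!r})"
            )
        sleep(interval_s)
```

Injecting sleep keeps the loop testable without real waiting, and using time.monotonic for the deadline makes the timeout immune to wall-clock adjustments.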
Made With ❤️ By @muhammadadeelai.