RAGPickerz is a Retrieval-Augmented Generation (RAG) system designed for document analysis and question answering. It was developed for the Bajaj Finserv Hackathon.
- Document Loading: Supports PDF and DOCX file formats
- Vector Store: Uses FAISS for efficient document embeddings and similarity search
- Semantic Chunking: Splits documents into meaningful chunks using semantic analysis
- Question Answering: Processes questions and retrieves relevant answers from documents
- Batch Processing: Handles multiple questions concurrently
- FastAPI Integration: RESTful API for easy integration
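The batch-processing behavior can be sketched with `asyncio`; here `answer_question` is a hypothetical stand-in for the real retrieval-and-LLM call (which lives in `rag_pipeline/query_pipeline.py`), used only to show how multiple questions run concurrently while answers keep their input order:

```python
import asyncio

# Hypothetical stand-in for the real retrieval + LLM call.
async def answer_question(question: str) -> str:
    await asyncio.sleep(0)  # placeholder for I/O-bound retrieval/LLM work
    return f"Answer to: {question}"

async def answer_batch(questions: list[str]) -> list[str]:
    # asyncio.gather preserves input order, so answers line up with questions
    return await asyncio.gather(*(answer_question(q) for q in questions))

answers = asyncio.run(answer_batch([
    "What is the policy coverage?",
    "What are the exclusions?",
]))
```

Because `asyncio.gather` returns results in submission order, the `answers` list maps one-to-one onto the `questions` list, matching the API's response shape.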
- Clone the repository:

  ```bash
  git clone https://github.com/iamanimeshdev/RAGPickerz.git
  cd RAGPickerz
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

Start the FastAPI server:

```bash
uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000.
Method: POST
Content-Type: multipart/form-data

Parameters:

- questions: List of questions (JSON array)
- file: Document file (PDF or DOCX)

Example Request:

```bash
curl -X POST "http://localhost:8000/api/v1/hackrx/run" \
  -F "questions=[\"What is the policy coverage?\", \"What are the exclusions?\"]" \
  -F "file=@document.pdf"
```

Response:
```json
{
  "answers": [
    "The policy covers medical expenses up to 5 lakhs.",
    "Pre-existing conditions are excluded from coverage."
  ]
}
```

To run the RAG pipeline directly from the command line:

```bash
python rag_pipeline/main.py
```

The system uses the following configuration:
- Vector Database Path: `./faiss_index`
- Chunk Size: 350 characters
- Chunk Overlap: 50 characters
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
- LLM Model: `gemma-3n-e2b-it`
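The chunk size and overlap settings above can be illustrated with a minimal, character-based splitter. This is a simplified stand-in for the project's semantic chunker, not its actual implementation; only the defaults (350-character chunks, 50-character overlap) come from the configuration:

```python
def chunk_text(text: str, size: int = 350, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    A simplified, character-based sketch; the defaults mirror the configured
    chunk size and overlap, but the real chunker splits on semantic boundaries.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each new chunk starts 300 characters after the last
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1000)  # a 1000-character document yields 4 chunks
```

The overlap means the last 50 characters of each chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still retrievable from at least one chunk.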
```
RAGPickerz/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application entry point
│   └── routers/
│       └── hackrx.py        # API router for HackRx endpoints
├── faiss_index/             # Directory for FAISS vector store
├── rag_pipeline/
│   ├── __init__.py
│   ├── config.py            # Configuration and model setup
│   ├── document_loader.py   # Document loading utilities
│   ├── embedder.py          # Vector store building and management
│   ├── query_pipeline.py    # Query processing pipeline
│   ├── retriever.py         # Document retrieval from vector store
│   └── main.py              # Main pipeline execution
├── requirements.txt         # Python dependencies
├── readme.md                # This file
└── .gitignore               # Git ignore rules
```
- PDF (.pdf)
- Word Documents (.docx)
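Format support typically comes down to dispatching on the file extension. The following sketch is illustrative only: the loader names and dispatch table are hypothetical, since the actual routing lives in `rag_pipeline/document_loader.py`:

```python
from pathlib import Path

# Hypothetical dispatch table; the real loaders live in
# rag_pipeline/document_loader.py.
SUPPORTED_LOADERS = {".pdf": "load_pdf", ".docx": "load_docx"}

def loader_for(filename: str) -> str:
    """Return the loader name for a file, or raise for unsupported types."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}")
```

Lower-casing the suffix lets uploads like `Policy.PDF` resolve correctly.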
- Document Loading: Optimized for large documents
- Vector Indexing: FAISS provides fast similarity search
- Concurrent Processing: Handles multiple questions simultaneously
- Caching: Vector store is persisted to disk for reuse
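The build-or-load caching pattern can be sketched as below. This uses `pickle` and a throwaway dict purely for illustration; the project itself persists a FAISS index under `faiss_index/` using the vector-store library's own save/load helpers:

```python
import pickle
import tempfile
from pathlib import Path

def load_or_build_index(path: Path, build):
    """Reuse a persisted index if present; otherwise build and save it."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # cache hit: skip the expensive build
    index = build()
    with path.open("wb") as f:
        pickle.dump(index, f)  # persist for the next run
    return index

# Usage with a throwaway "index" (a dict) in a temp directory.
tmp = Path(tempfile.mkdtemp()) / "index.pkl"
first = load_or_build_index(tmp, lambda: {"vectors": [1, 2, 3]})
second = load_or_build_index(tmp, lambda: {"vectors": []})  # loaded from disk
```

Because the second call finds the file on disk, its `build` callback never runs, which is exactly why re-querying the same document avoids re-embedding it.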
The system includes comprehensive error handling:
- Unsupported file types
- Empty documents
- Rate limiting for API requests
- Server errors with descriptive messages
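The kinds of checks listed above can be sketched as a single validation pass that collects human-readable errors before the pipeline runs. The function name and messages here are illustrative, not the API's actual wording:

```python
def validate_request(filename: str, text: str, questions: list[str]) -> list[str]:
    """Collect validation errors for an upload before running the pipeline.

    A sketch of the kind of checks the API performs; names and messages
    are illustrative.
    """
    errors = []
    if not filename.lower().endswith((".pdf", ".docx")):
        errors.append(f"Unsupported file type: {filename}")
    if not text.strip():
        errors.append("Document is empty")
    if not questions:
        errors.append("At least one question is required")
    return errors

errs = validate_request("scan.jpg", "", [])  # every check fails
```

Collecting all errors at once, rather than failing on the first, lets the API return one descriptive response instead of forcing repeated round-trips.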
- Python 3.8+
- pip
- A Google AI Platform account (for LLM access)
Create a .env file with the following:

```bash
# Google AI Platform credentials
GOOGLE_API_KEY=your_api_key_here
```
Run the test suite:

```bash
python -m pytest tests/
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions, please open an issue on the GitHub repository.