RAGPickerz is a Retrieval-Augmented Generation (RAG) system designed for document analysis and question answering. It was developed for the Bajaj Finserv Hackathon.
- Document Loading: Supports PDF and DOCX file formats
- Vector Store: Uses FAISS for efficient document embeddings and similarity search
- Semantic Chunking: Splits documents into meaningful chunks using semantic analysis
- Question Answering: Processes questions and retrieves relevant answers from documents
- Batch Processing: Handles multiple questions concurrently
- FastAPI Integration: RESTful API for easy integration
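The batch-processing behavior can be sketched with `asyncio`; here `answer_question` is a hypothetical stand-in for the real retrieval-and-LLM call (which lives in `rag_pipeline/query_pipeline.py`), used only to show how multiple questions run concurrently while answers keep their input order:

```python
import asyncio

# Hypothetical stand-in for the real retrieval + LLM call.
async def answer_question(question: str) -> str:
    await asyncio.sleep(0)  # placeholder for I/O-bound retrieval/LLM work
    return f"Answer to: {question}"

async def answer_batch(questions: list[str]) -> list[str]:
    # asyncio.gather preserves input order, so answers line up with questions
    return await asyncio.gather(*(answer_question(q) for q in questions))

answers = asyncio.run(answer_batch([
    "What is the policy coverage?",
    "What are the exclusions?",
]))
```

Because `asyncio.gather` returns results in submission order, the `answers` list maps one-to-one onto the `questions` list, matching the API's response shape.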
- Clone the repository:

  ```bash
  git clone https://github.com/iamanimeshdev/RAGPickerz.git
  cd RAGPickerz
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

Start the FastAPI server:

```bash
uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000.
Method: POST
Content-Type: multipart/form-data

Parameters:

- questions: List of questions (JSON array)
- file: Document file (PDF or DOCX)

Example Request:

```bash
curl -X POST "http://localhost:8000/api/v1/hackrx/run" \
  -F "questions=[\"What is the policy coverage?\", \"What are the exclusions?\"]" \
  -F "file=@document.pdf"
```

Response:
```json
{
  "answers": [
    "The policy covers medical expenses up to 5 lakhs.",
    "Pre-existing conditions are excluded from coverage."
  ]
}
```

To run the RAG pipeline directly from the command line:

```bash
python rag_pipeline/main.py
```

The system uses the following configuration:
- Vector Database Path: `./faiss_index`
- Chunk Size: 350 characters
- Chunk Overlap: 50 characters
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
- LLM Model: `gemma-3n-e2b-it`
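The chunk size and overlap settings above can be illustrated with a minimal, character-based splitter. This is a simplified stand-in for the project's semantic chunker, not its actual implementation; only the defaults (350-character chunks, 50-character overlap) come from the configuration:

```python
def chunk_text(text: str, size: int = 350, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters.

    A simplified, character-based sketch; the defaults mirror the configured
    chunk size and overlap, but the real chunker splits on semantic boundaries.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # each new chunk starts 300 characters after the last
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1000)  # a 1000-character document yields 4 chunks
```

The overlap means the last 50 characters of each chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still retrievable from at least one chunk.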
```
RAGPickerz/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application entry point
│   └── routers/
│       └── hackrx.py        # API router for HackRx endpoints
├── faiss_index/             # Directory for FAISS vector store
├── rag_pipeline/
│   ├── __init__.py
│   ├── config.py            # Configuration and model setup
│   ├── document_loader.py   # Document loading utilities
│   ├── embedder.py          # Vector store building and management
│   ├── query_pipeline.py    # Query processing pipeline
│   ├── retriever.py         # Document retrieval from vector store
│   └── main.py              # Main pipeline execution
├── requirements.txt         # Python dependencies
├── readme.md                # This file
└── .gitignore               # Git ignore rules
```
- PDF (.pdf)
- Word Documents (.docx)
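Format support typically comes down to dispatching on the file extension. The following sketch is illustrative only: the loader names and dispatch table are hypothetical, since the actual routing lives in `rag_pipeline/document_loader.py`:

```python
from pathlib import Path

# Hypothetical dispatch table; the real loaders live in
# rag_pipeline/document_loader.py.
SUPPORTED_LOADERS = {".pdf": "load_pdf", ".docx": "load_docx"}

def loader_for(filename: str) -> str:
    """Return the loader name for a file, or raise for unsupported types."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}")
```

Lower-casing the suffix lets uploads like `Policy.PDF` resolve correctly.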
- Document Loading: Optimized for large documents
- Vector Indexing: FAISS provides fast similarity search
- Concurrent Processing: Handles multiple questions simultaneously
- Caching: Vector store is persisted to disk for reuse
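The build-or-load caching pattern can be sketched as below. This uses `pickle` and a throwaway dict purely for illustration; the project itself persists a FAISS index under `faiss_index/` using the vector-store library's own save/load helpers:

```python
import pickle
import tempfile
from pathlib import Path

def load_or_build_index(path: Path, build):
    """Reuse a persisted index if present; otherwise build and save it."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # cache hit: skip the expensive build
    index = build()
    with path.open("wb") as f:
        pickle.dump(index, f)  # persist for the next run
    return index

# Usage with a throwaway "index" (a dict) in a temp directory.
tmp = Path(tempfile.mkdtemp()) / "index.pkl"
first = load_or_build_index(tmp, lambda: {"vectors": [1, 2, 3]})
second = load_or_build_index(tmp, lambda: {"vectors": []})  # loaded from disk
```

Because the second call finds the file on disk, its `build` callback never runs, which is exactly why re-querying the same document avoids re-embedding it.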
The system includes comprehensive error handling:
- Unsupported file types
- Empty documents
- Rate limiting for API requests
- Server errors with descriptive messages
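The kinds of checks listed above can be sketched as a single validation pass that collects human-readable errors before the pipeline runs. The function name and messages here are illustrative, not the API's actual wording:

```python
def validate_request(filename: str, text: str, questions: list[str]) -> list[str]:
    """Collect validation errors for an upload before running the pipeline.

    A sketch of the kind of checks the API performs; names and messages
    are illustrative.
    """
    errors = []
    if not filename.lower().endswith((".pdf", ".docx")):
        errors.append(f"Unsupported file type: {filename}")
    if not text.strip():
        errors.append("Document is empty")
    if not questions:
        errors.append("At least one question is required")
    return errors

errs = validate_request("scan.jpg", "", [])  # every check fails
```

Collecting all errors at once, rather than failing on the first, lets the API return one descriptive response instead of forcing repeated round-trips.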
- Python 3.8+
- pip
- A Google AI Platform account (for LLM access)
Create a .env file with the following:

```bash
# Google AI Platform credentials
GOOGLE_API_KEY=your_api_key_here
```
Run the test suite:

```bash
python -m pytest tests/
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions, please open an issue on the GitHub repository.