A comprehensive Retrieval-Augmented Generation (RAG) chatbot system for Singapore Institute of Technology (SIT), featuring hybrid search capabilities with FAISS vector similarity and BM25 keyword matching, powered by OpenAI GPT and ElevenLabs voice integration.
- Intelligent RAG System: Hybrid search combining FAISS vector similarity and BM25 keyword matching
- Voice Integration: Speech-to-Text and Text-to-Speech using ElevenLabs APIs
- Interactive Frontend: React-based chat interface with animated otter mascot
- Dual Input Modes: Support for both text and voice interactions
- Real-time Processing: Fast query processing with optimized vector database
- SIT-Specific Knowledge: Trained on SIT course information and institutional data
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Frontend β β Backend β β RAG System β
β (React/JS) βββββΊβ (Node.js) βββββΊβ (Python) β
β Port 3000 β β Port 3000 β β Port 8000 β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β β β
β β β
βΌ βΌ βΌ
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Voice APIs β β File Upload β β Vector DB β
β (ElevenLabs) β β (Multer) β β (LanceDB) β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
Before running the system, ensure you have the following installed:
- Node.js (v16 or higher)
- Python (v3.8 or higher)
- FFmpeg (for audio processing)
- OpenAI API Key: For GPT-based response generation
- ElevenLabs API Key: For speech-to-text and text-to-speech functionality
git clone https://github.com/Finance-LLMs/SIT-Chatbot-RAG.git
cd SIT-Chatbot-RAG- Download the database from Google Drive: LanceDB Database
- Extract the zip file
- Copy the
datafolder from the extracted files - Place it in the root directory
Your structure should look like:
SIT-Chatbot-RAG/
βββ data/
β βββ vector-index-lancedb/
β βββ bm25_index.lance/
β βββ faiss_index.lance/
βββ SIT-chatbot-main/
β βββ backend/
β βββ src/
β βββ ...
βββ SITCHATBOTLLM/
β βββ server.py
β βββ requirements.txt
β βββ ...
βββ ...
cd SITCHATBOTLLM
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env # Create if doesn't exist
# Edit .env and add your OpenAI API keycd SIT-chatbot-main
# Install Node.js dependencies
npm install
# Set up environment variables
cd backend
cp .env.example .env # Create if doesn't exist
# Edit .env and add your API keysCreate .env files with the following structure:
OPENAI_API_KEY=your_openai_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_hereOPENAI_API_KEY=your_openai_api_key_hereThe system requires three components to run simultaneously:
cd SITCHATBOTLLM
python server.py- Runs on:
http://localhost:8000 - Provides: Chat completions endpoint with RAG functionality
cd SIT-chatbot-main/backend
node server.js- Runs on:
http://localhost:3000 - Provides: Frontend serving, API proxy, voice processing
cd SIT-chatbot-main
npm run build # Build the frontendOpen your browser and navigate to: http://localhost:3000
- Type your question in the chat input
- Click "Send" or press Enter
- Receive AI-powered responses based on SIT knowledge base
- Click the microphone icon to start recording
- Speak your question clearly
- The system will transcribe, process, and respond with both text and voice
- "What courses does SIT offer in cloud computing?"
- "Tell me about the admission requirements"
- "What are the fees for the DevOps program?"
- "How long is the Cloud Computing course?"
- Storage: LanceDB with 4,076+ indexed documents
- Search: Hybrid FAISS + BM25 for optimal retrieval
- Content: SIT course catalogs, program information, policies
- STT Model: ElevenLabs Scribe v1 with English language optimization
- TTS Model: ElevenLabs Turbo v2.5 for natural speech synthesis
- Audio Optimization: 16kHz, mono, PCM format using FFmpeg
POST /api/chat- Chat completions (proxy to RAG system)POST /api/speech-to-text- Convert audio to textPOST /api/text-to-speech- Convert text to audio
SIT-Chatbot-RAG/
βββ README.md # This file
βββ SIT-chatbot-main/ # Frontend & API Server
β βββ backend/
β β βββ server.js # Node.js Express server
β β βββ .env # API keys
β β βββ uploads/ # Temporary audio files
β βββ src/
β β βββ app.js # React frontend logic
β β βββ index.html # Main HTML file
β β βββ styles.css # Styling
β βββ dist/ # Built frontend files
β βββ package.json # Node.js dependencies
β βββ webpack.config.js # Build configuration
βββ SITCHATBOTLLM/ # RAG Backend
β βββ server.py # FastAPI RAG server
β βββ bm25_chunk_search.py # Search implementation
β βββ requirements.txt # Python dependencies
β βββ .env # OpenAI API key
βββ data/ # Additional data files
βββ vector-index-lancedb/ # LanceDB storage
This project is licensed under the MIT License.