# Self-Hosted Budget AI API

A beautiful, self-hosted AI assistant powered by Qwen2-0.5B-Instruct, with a modern React frontend and a secure Python FastAPI backend.
A quick example request (assuming the backend is running locally on port 8000):

```bash
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secure-api-key-here" \
  -d '{"prompt": "Test with valid key"}'
```
## Features

- Modern UI: Beautiful Tailwind CSS React frontend with animations and responsive design
- Self-Hosted AI: Uses Qwen2-0.5B-Instruct model for local AI inference
- Secure: API key authentication and IP whitelisting
- Production Ready: Includes deployment scripts with Fabric
- Real-time Chat: Interactive chat interface with loading animations
- Copy to Clipboard: Easy copying of AI responses
## Prerequisites

- Python 3.8+
- Node.js 16+
- Git
- CUDA-compatible GPU (optional, for faster inference)
## Installation

### Clone the Repository

```bash
git clone https://github.com/yourusername/self-hosted-budget-ai-api.git
cd self-hosted-budget-ai-api
```

### Backend Setup

```bash
cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# The model will be automatically downloaded on first run.
# This may take several minutes depending on your internet connection.
```

### Frontend Setup

```bash
cd ../frontend

# Install dependencies
npm install

# Build for production (optional)
npm run build
```

## Configuration

The backend uses environment variables and configuration files:
- Environment Variables (`.env` file):

  ```
  DEV_MODE=true
  API_KEYS_FILE=config/api_keys.txt
  WHITELIST_FILE=config/whitelist.txt
  MODEL_CACHE_DIR=models
  MAX_NEW_TOKENS=512
  TEMPERATURE=0.7
  HOST=0.0.0.0
  PORT=8000
  ```

- API Keys (`config/api_keys.txt`), one key per line:

  ```
  demo-key-12345
  your-secure-api-key-here
  ```

- IP Whitelist (`config/whitelist.txt`), one address or CIDR range per line:

  ```
  127.0.0.1
  ::1
  192.168.1.0/24
  10.0.0.0/8
  ```
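Since the whitelist mixes single addresses and CIDR ranges, matching needs to be network-aware. A sketch of how this can be done with the standard library (`ip_allowed` is a hypothetical helper name, not necessarily the backend's actual function):

```python
import ipaddress

def ip_allowed(client_ip: str, whitelist: list[str]) -> bool:
    """Return True if client_ip matches any whitelist entry (single IP or CIDR range)."""
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        entry = entry.strip()
        if not entry:
            continue
        # ip_network accepts both bare addresses ("127.0.0.1" -> /32) and CIDR
        # ranges ("10.0.0.0/8"); mixed IPv4/IPv6 containment checks simply
        # yield False rather than raising.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False

whitelist = ["127.0.0.1", "::1", "192.168.1.0/24", "10.0.0.0/8"]
print(ip_allowed("192.168.1.42", whitelist))  # True
print(ip_allowed("8.8.8.8", whitelist))       # False
```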
The frontend automatically connects to http://localhost:8000 in development mode.
## Running the Application

- Start the Backend:

  ```bash
  cd backend
  source venv/bin/activate
  python -m app.main
  ```

- Start the Frontend (in a new terminal):

  ```bash
  cd frontend
  npm run dev
  ```

- Access the Application:
  - Frontend: http://localhost:5173
  - Backend API: http://localhost:8000
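With both servers up, the API can also be exercised from Python. A stdlib-only client sketch; the endpoint, header name, and demo key come from this README's config and API sections, while `build_request` and `ask` are illustrative names:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/generate"

def build_request(prompt: str, api_key: str = "demo-key-12345") -> urllib.request.Request:
    """Build the authenticated POST request for /api/generate."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

def ask(prompt: str, api_key: str = "demo-key-12345") -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        return json.load(resp)["response"]
```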
## Deployment

Use the provided Fabric deployment scripts:

```bash
cd backend
fab setup --host=your-server.com --user=deploy
fab deploy --host=your-server.com --user=deploy
```

## API Reference

### POST /api/generate

Headers:

```
Content-Type: application/json
X-API-Key: your-api-key
```

Request Body:

```json
{
  "prompt": "Your question or prompt here"
}
```

Response:

```json
{
  "response": "AI generated response"
}
```

## Model Information

This application uses Qwen2-0.5B-Instruct, a compact yet powerful language model:
- Size: ~500MB
- Context Length: 32K tokens
- Languages: Multilingual support
- Performance: Optimized for efficiency and speed
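Qwen2-Instruct models expect prompts in the ChatML format, which is normally produced for you by the tokenizer's `apply_chat_template` method. A hand-rolled sketch of the single-turn layout, for illustration only:

```python
def build_chatml_prompt(user_prompt: str,
                        system_prompt: str = "You are a helpful assistant.") -> str:
    """Format a single-turn conversation in the ChatML style used by Qwen2-Instruct."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_chatml_prompt("What is FastAPI?"))
```

In production code, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` so the template always matches the model's own configuration.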
The model will be automatically downloaded on first run to the `models/` directory. This includes:
- Model weights
- Tokenizer files
- Configuration files
- Download Size: ~500MB
- Disk Space Required: ~1GB (including cache)
## Security Features

- API Key Authentication: All requests require valid API keys
- IP Whitelisting: Restrict access to specific IP addresses/ranges
- Development Mode: Automatic localhost access in dev mode
- CORS Protection: Configured for secure cross-origin requests
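File-backed key validation of this kind can be sketched as follows; the function names are illustrative, not the backend's actual ones, and the constant-time comparison is a standard hardening choice:

```python
import secrets

def load_api_keys(path: str) -> set[str]:
    """Read one API key per line, skipping blanks and comment lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")}

def key_is_valid(candidate: str, keys: set[str]) -> bool:
    """Compare in constant time to avoid leaking key contents via timing."""
    return any(secrets.compare_digest(candidate, k) for k in keys)
```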
## Production Deployment

### Fabric Deployment

- Initial Server Setup:

  ```bash
  fab setup --host=your-server.com --user=deploy
  ```

- Deploy Application:

  ```bash
  fab deploy --host=your-server.com --user=deploy
  ```

- Available Commands:

  ```bash
  fab status         # Check application status
  fab logs           # View application logs
  fab rollback       # Rollback to previous version
  fab backup_config  # Backup configuration files
  ```

### Manual Deployment

- Server Requirements:
  - Ubuntu 20.04+ or similar
  - Python 3.8+
  - Node.js 16+
  - Nginx
  - PM2 (for process management)

- Setup Steps:
  1. Clone repository to `/var/www/self-hosted-budget-ai-api`
  2. Install dependencies
  3. Configure Nginx reverse proxy
  4. Start services with PM2
## Frontend Features

- Modern Design: Glassmorphism UI with gradient backgrounds
- Responsive: Works on desktop, tablet, and mobile
- Animations: Smooth transitions with Framer Motion
- Dark Theme: Beautiful dark theme with purple/cyan accents
- Real-time Chat: Interactive chat interface
- Loading States: Animated loading indicators
- Error Handling: User-friendly error messages
- Copy Functionality: One-click copying of AI responses
## Project Structure

```
self-hosted-budget-ai-api/
├── backend/
│   ├── app/
│   │   ├── main.py            # FastAPI application
│   │   ├── models.py          # AI model handling
│   │   ├── auth.py            # Authentication
│   │   └── config.py          # Configuration
│   ├── config/                # Configuration files
│   ├── deploy/                # Deployment scripts
│   └── requirements.txt       # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── App.jsx            # Main React component
│   │   └── main.jsx           # React entry point
│   ├── package.json           # Node.js dependencies
│   └── tailwind.config.js     # Tailwind configuration
└── nginx/                     # Nginx configuration
```
## Development

- Backend: Add new endpoints in `app/main.py`
- Frontend: Modify `src/App.jsx` for UI changes
- Styling: Use Tailwind CSS classes
- Deployment: Update Fabric scripts as needed
## Troubleshooting

- Model Download Fails:
  - Check internet connection
  - Ensure sufficient disk space
  - Try clearing the `models/` directory

- CUDA Out of Memory:
  - Reduce `MAX_NEW_TOKENS` in `.env`
  - Use CPU inference by setting `CUDA_VISIBLE_DEVICES=""`

- Frontend Build Fails:
  - Clear `node_modules` and reinstall: `rm -rf node_modules && npm install`
  - Check Node.js version compatibility

- API Authentication Errors:
  - Verify API key in `config/api_keys.txt`
  - Check IP whitelist in `config/whitelist.txt`
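The `CUDA_VISIBLE_DEVICES=""` tip works because an empty value hides all GPUs from the process. A hedged sketch of how a backend might pick its inference device (`pick_device` is a hypothetical helper, not this project's actual code):

```python
import os

def pick_device() -> str:
    """Choose 'cuda' only when a GPU is visible and usable; otherwise 'cpu'."""
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"  # GPUs explicitly hidden, as in the troubleshooting tip
    try:
        import torch  # optional dependency in this sketch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```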
## Performance Tips

- GPU Acceleration: Ensure CUDA is properly installed
- Model Caching: Keep the `models/` directory for faster startup
- Memory Management: Monitor system resources during inference
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## Support

For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the API documentation
## Acknowledgments

- Qwen Team: For the excellent Qwen2-0.5B-Instruct model
- Hugging Face: For the transformers library
- FastAPI: For the amazing web framework
- React & Tailwind: For the beautiful frontend stack