Retrieval-based Voice Conversion (RVC): a training system with a web interface for training custom voice models.
This project is based on PolTrain by Politrees.
It is a side project of RVC Starter.
# Install dependencies
pip install gradio torch torchaudio
# Launch WebUI
python webui/launch.py
Then open your browser to http://127.0.0.1:7860
Choose from 11 optimization algorithms:
- AdamW - Default, stable
- Adam - Classic adaptive
- AdaBelief - Fast convergence
- AdaBeliefV2 - With AMSGrad
- Adafactor - Memory efficient
- AMSGrad - Prevents oscillations
- SGD - With Nesterov momentum
- RAdam - Rectified Adam
- Lion - Sign-based, memory efficient
- AdamP - Better generalization
- Sophia - Second-order clipping
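To make "sign-based, memory efficient" concrete, here is a minimal plain-Python sketch of the published Lion update rule. It is not the implementation shipped in `rvc/train/utils/optimizers/`; the function names `sign` and `lion_step` and the use of plain lists instead of tensors are assumptions made for illustration.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-5,
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Apply one Lion update and return (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend momentum and gradient, then keep only the sign of the result.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Every parameter moves by exactly lr (plus decoupled weight decay).
        new_params.append(p - lr * (update + weight_decay * p))
        # Momentum is the only optimizer state stored per parameter.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum
```

Because only the sign of the update is used, each step has a fixed magnitude, which is one reason Lion is usually run with a smaller learning rate than AdamW. Storing a single momentum value per parameter, instead of the two moment estimates Adam keeps, is where the memory saving comes from.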
The web interface has 6 tabs:
| Tab | Description |
|---|---|
| 🏠 Home | Dashboard, system info, quick start guide |
| 📊 Data Preprocessing | Audio processing, feature extraction |
| ⚙️ Training Config | Model architecture, hyperparameters |
| 🚀 Model Training | Real-time training, monitoring |
| 🎵 Voice Conversion | Inference, pitch adjustment |
| 📁 Model Management | Export, delete, organize models |
VCTrain/
├── webui/                          # Gradio WebUI
│   ├── app.py                      # Main application
│   ├── launch.py                   # Launcher script
│   ├── requirements.txt            # Dependencies
│   ├── README.md                   # WebUI documentation
│   └── tabs/                       # Tab modules
│       ├── home_tab.py
│       ├── data_preprocessing_tab.py
│       ├── training_config_tab.py
│       ├── model_training_tab.py
│       ├── inference_tab.py
│       └── model_management_tab.py
│
├── rvc/
│   ├── train/
│   │   ├── train.py                # Training script
│   │   ├── utils/
│   │   │   └── optimizers/         # 11 optimizer implementations
│   │   ├── preprocess/             # Data preprocessing
│   │   └── ...
│   ├── lib/                        # Core libraries
│   └── configs/                    # Configuration files
│
├── experiments/                    # Training outputs (created automatically)
└── logs/                           # Training logs
python rvc/train/train.py \
--experiment_dir "experiments" \
--model_name "my_voice" \
--optimizer "AdamW" \
--total_epoch 300 \
--batch_size 8 \
--sample_rate 48000 \
--gpus "0"
# Default
python webui/launch.py
# Custom port
python webui/launch.py --port 7861
# Public share link
python webui/launch.py --share
# With authentication
python webui/launch.py --auth username:password
| Optimizer | Learning Rate | Best For |
|---|---|---|
| AdamW | 1e-4 | Default choice |
| AdaBelief | 1e-4 | Fast convergence |
| Adafactor | Auto | Low VRAM |
| Lion | 1e-5 | Memory efficient |
| Sophia | 5e-5 | Stable training |
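If you script your experiments, the recommendations in the table above can be kept as a small lookup. This is only a sketch: `RECOMMENDED_LR` and `recommended_lr` are hypothetical names, and the values are copied from the table rather than read from the project's configuration.

```python
from typing import Optional

# Starting learning rates copied from the table above.
# "Auto" (Adafactor) is stored as None: no fixed value is recommended.
RECOMMENDED_LR = {
    "AdamW": 1e-4,
    "AdaBelief": 1e-4,
    "Adafactor": None,
    "Lion": 1e-5,
    "Sophia": 5e-5,
}


def recommended_lr(optimizer: str) -> Optional[float]:
    """Return the suggested starting learning rate, or None for "Auto"."""
    if optimizer not in RECOMMENDED_LR:
        raise KeyError(f"no recommendation listed for optimizer {optimizer!r}")
    return RECOMMENDED_LR[optimizer]
```

Optimizers that the table does not cover raise a `KeyError` instead of silently falling back to a default, so a typo in the optimizer name is caught early.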

| Optimizer | Speed | Quality | VRAM Efficiency |
|---|---|---|---|
| AdamW | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| AdaBelief | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Adafactor | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Lion | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Sophia | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
1. Prepare Data → Collect clean audio files (WAV, 32 kHz or higher)
2. Preprocess → Use the Data Preprocessing tab
3. Configure → Set parameters in the Training Config tab
4. Train → Start training in the Model Training tab
5. Convert → Use the trained model in the Voice Conversion tab
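The Train step can also be scripted instead of started from the WebUI. The sketch below assembles the `rvc/train/train.py` invocation using only the flags shown in the command-line example in this README. The helper name `build_train_command` is hypothetical, and the defaults mirror that example rather than the script's own defaults, which may differ.

```python
import shlex
import sys
from typing import List


def build_train_command(
    model_name: str,
    optimizer: str = "AdamW",
    total_epoch: int = 300,
    batch_size: int = 8,
    sample_rate: int = 48000,
    gpus: str = "0",
    experiment_dir: str = "experiments",
    script: str = "rvc/train/train.py",
) -> List[str]:
    """Return the training command as an argument list for subprocess.run."""
    return [
        sys.executable, script,
        "--experiment_dir", experiment_dir,
        "--model_name", model_name,
        "--optimizer", optimizer,
        "--total_epoch", str(total_epoch),
        "--batch_size", str(batch_size),
        "--sample_rate", str(sample_rate),
        "--gpus", gpus,
    ]


if __name__ == "__main__":
    cmd = build_train_command("my_voice", total_epoch=100)
    # Print the command for inspection; pass cmd to subprocess.run(cmd, check=True)
    # from the repository root to actually start training.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems when a model name contains spaces.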
- Use clean audio without background noise
- Minimum 10 minutes of speech recommended
- Consistent volume levels
- Remove silence and breaths
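Two of these recommendations can be checked automatically before preprocessing: the sample rate (32 kHz or higher) and the total amount of audio (at least 10 minutes). The following is a rough sketch using only the standard library. `check_dataset` is a hypothetical helper, it reads only PCM WAV files that Python's `wave` module supports, and it measures total audio length, which overestimates speech time if the files contain silence.

```python
import wave
from pathlib import Path
from typing import List, Tuple

MIN_SAMPLE_RATE = 32_000      # Hz, from the "32kHz+" recommendation
MIN_TOTAL_SECONDS = 10 * 60   # from "minimum 10 minutes of speech"


def check_dataset(folder: str) -> Tuple[float, List[str]]:
    """Return (total duration in seconds, warnings) for the *.wav files in folder."""
    total = 0.0
    warnings: List[str] = []
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            total += wav.getnframes() / rate
        if rate < MIN_SAMPLE_RATE:
            warnings.append(
                f"{path.name}: sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz"
            )
    if total < MIN_TOTAL_SECONDS:
        warnings.append(
            f"only {total / 60:.1f} min of audio; "
            f"at least {MIN_TOTAL_SECONDS // 60} min is recommended"
        )
    return total, warnings
```

Noise, volume consistency, and breaths still need to be judged by ear or with a dedicated audio tool; this check covers only the two numeric criteria.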
- Start with 100 epochs for testing
- Use 300+ epochs for production
- Monitor loss values (should decrease)
- Target mel similarity: 70%+
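"Should decrease" is easier to judge on a smoothed curve than on raw per-step values, which are noisy. One simple way to check the trend, assuming you have collected the loss values in a list (for example from your own logging), is to compare the average of the most recent window with the window before it. `loss_is_decreasing` is a hypothetical helper, not part of the project.

```python
from statistics import mean
from typing import Sequence


def loss_is_decreasing(losses: Sequence[float], window: int = 10) -> bool:
    """Return True if the mean of the last `window` losses is lower than
    the mean of the `window` losses immediately before them."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    recent = mean(losses[-window:])
    previous = mean(losses[-2 * window:-window])
    return recent < previous
```

A result of `False` over several consecutive windows suggests that training has plateaued or diverged and that the learning rate or data is worth revisiting.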
- Use Adafactor for low VRAM
- Reduce batch size
- Enable gradient checkpointing
- Use FP16 mixed precision
- Python: 3.8+
- PyTorch: 2.0+
- GPU: CUDA 11.7+ (optional, CPU supported)
- RAM: 8GB+ recommended
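Before the first run, it can help to confirm the Python version and see which of the packages this README lists as dependencies are installed. The sketch below uses only the standard library. `environment_report` is a hypothetical helper, and it reports installed versions without comparing them to the minimums.

```python
import sys
from importlib import metadata

# Package names taken from the dependency list in this README.
PACKAGES = ["torch", "torchaudio", "gradio", "librosa", "tensorboard"]


def environment_report() -> dict:
    """Return the Python version, whether it is 3.8+, and installed package versions."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info >= (3, 8),
    }
    for name in PACKAGES:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # None marks a package that is not installed in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```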
torch>=2.0.0
torchaudio>=2.0.0
gradio>=4.0.0
librosa>=0.10.0
tensorboard>=2.13.0
- WebUI README - Detailed WebUI documentation
- Optimizers README - Optimizer guide
- PolTrain - Base project
- RVC - Voice conversion technology
- Gradio - Web UI framework
- PyTorch - Deep learning framework
Same license as the original PolTrain project.
Happy Training! 🎤