
BF667-IDLE/VCTrain

🎤 RVC Training WebUI (VCTrain)

Retrieval-based Voice Conversion (RVC): a training system with a web interface for building custom voice models.

This project is based on PolTrain by Politrees
Side project of RVC Starter


🚀 Quick Start

Launch the WebUI

# Install dependencies
pip install gradio torch torchaudio

# Launch WebUI
python webui/launch.py

Then open your browser to http://127.0.0.1:7860


📋 Features

🎯 11+ Optimizers

Choose from a wide range of optimization algorithms:

  • AdamW - Default, stable
  • Adam - Classic adaptive
  • AdaBelief - Fast convergence
  • AdaBeliefV2 - With AMSGrad
  • Adafactor - Memory efficient
  • AMSGrad - Prevents oscillations
  • SGD - With Nesterov momentum
  • RAdam - Rectified Adam
  • Lion - Sign-based, memory efficient
  • AdamP - Better generalization
  • Sophia - Second-order clipping

🌐 Gradio WebUI

Complete web interface with 6 tabs:

| Tab | Description |
|---|---|
| 🏠 Home | Dashboard, system info, quick start guide |
| 📊 Data Preprocessing | Audio processing, feature extraction |
| ⚙️ Training Config | Model architecture, hyperparameters |
| 🚀 Model Training | Real-time training, monitoring |
| 🎵 Voice Conversion | Inference, pitch adjustment |
| 📁 Model Management | Export, delete, organize models |

📁 Project Structure

VCTrain/
├── webui/                      # Gradio WebUI
│   ├── app.py                  # Main application
│   ├── launch.py               # Launcher script
│   ├── requirements.txt        # Dependencies
│   ├── README.md               # WebUI documentation
│   └── tabs/                   # Tab modules
│       ├── home_tab.py
│       ├── data_preprocessing_tab.py
│       ├── training_config_tab.py
│       ├── model_training_tab.py
│       ├── inference_tab.py
│       └── model_management_tab.py
│
├── rvc/
│   ├── train/
│   │   ├── train.py            # Training script
│   │   ├── utils/
│   │   │   └── optimizers/     # 11 optimizer implementations
│   │   ├── preprocess/         # Data preprocessing
│   │   └── ...
│   ├── lib/                    # Core libraries
│   └── configs/                # Configuration files
│
├── experiments/                # Training outputs (created automatically)
└── logs/                       # Training logs

🔧 Usage

Command Line Training

python rvc/train/train.py \
  --experiment_dir "experiments" \
  --model_name "my_voice" \
  --optimizer "AdamW" \
  --total_epoch 300 \
  --batch_size 8 \
  --sample_rate 48000 \
  --gpus "0"
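
If you launch several runs from a script, the command above can be assembled in Python. This is a minimal sketch: `build_train_command` is a hypothetical helper, not part of VCTrain, and it only uses the flags shown in the example above. It returns the argument list without starting anything.

```python
import sys
from typing import List


def build_train_command(
    model_name: str,
    optimizer: str = "AdamW",
    total_epoch: int = 300,
    batch_size: int = 8,
    sample_rate: int = 48000,
    gpus: str = "0",
    experiment_dir: str = "experiments",
) -> List[str]:
    """Return the argv list for rvc/train/train.py (hypothetical helper)."""
    return [
        sys.executable, "rvc/train/train.py",
        "--experiment_dir", experiment_dir,
        "--model_name", model_name,
        "--optimizer", optimizer,
        "--total_epoch", str(total_epoch),
        "--batch_size", str(batch_size),
        "--sample_rate", str(sample_rate),
        "--gpus", gpus,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_train_command("my_voice")))
```

To start the run, pass the list to `subprocess.run(cmd, check=True)` from the repository root.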

WebUI Launch Options

# Default
python webui/launch.py

# Custom port
python webui/launch.py --port 7861

# Public share link
python webui/launch.py --share

# With authentication
python webui/launch.py --auth username:password
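
When the WebUI is started from another script, it can help to wait until the chosen port accepts connections before opening the browser. The sketch below uses only the standard library. `wait_for_port` is a hypothetical helper, and the timeout and polling interval are assumptions rather than VCTrain settings.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_port(port=7860, timeout=5.0)
    print("WebUI reachable" if ready else "WebUI not reachable yet")
```

Pass `port=7861` (or whatever you gave to `--port`) when not using the default.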

⚙️ Optimizer Guide

Recommended Settings

| Optimizer | Learning Rate | Best For |
|---|---|---|
| AdamW | 1e-4 | Default choice |
| AdaBelief | 1e-4 | Fast convergence |
| Adafactor | Auto | Low VRAM |
| Lion | 1e-5 | Memory efficient |
| Sophia | 5e-5 | Stable training |
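
The recommended settings above can be kept as a small lookup so scripts pick a consistent starting learning rate. This is only a sketch: `RECOMMENDED_LR` and `recommended_lr` are hypothetical names, and "Auto" for Adafactor is represented as `None` on the assumption that the optimizer then chooses its own rate.

```python
from typing import Dict, Optional

# Starting learning rates copied from the Recommended Settings table.
# None stands for "Auto" (assumption: the optimizer picks its own rate).
RECOMMENDED_LR: Dict[str, Optional[float]] = {
    "AdamW": 1e-4,
    "AdaBelief": 1e-4,
    "Adafactor": None,
    "Lion": 1e-5,
    "Sophia": 5e-5,
}


def recommended_lr(optimizer: str) -> Optional[float]:
    """Return the table's learning rate, or raise if the optimizer is not listed."""
    try:
        return RECOMMENDED_LR[optimizer]
    except KeyError:
        raise ValueError(f"no recommended learning rate listed for {optimizer!r}")


if __name__ == "__main__":
    for name in RECOMMENDED_LR:
        print(name, recommended_lr(name))
```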

Performance Comparison

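| Optimizer | Speed | Quality | VRAM Efficiency |
|---|---|---|---|
| AdamW | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| AdaBelief | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Adafactor | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Lion | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Sophia | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |

More stars are better in every column; in the last column, more stars mean lower VRAM use.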

📊 Training Workflow

  1. Prepare Data → Collect clean audio files (WAV, 32kHz+)
  2. Preprocess → Use Data Preprocessing tab
  3. Configure → Set parameters in Training Config tab
  4. Train → Start training in Model Training tab
  5. Convert → Use trained model in Voice Conversion tab

💡 Tips

Dataset Quality

  • Use clean audio without background noise
  • Minimum 10 minutes of speech recommended
  • Consistent volume levels
  • Remove silence and breaths
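
Two of the dataset guidelines are easy to check automatically: the 32 kHz minimum sample rate from the workflow and the 10-minute minimum of audio. The sketch below uses the standard-library `wave` module, which reads PCM WAV files only. `check_dataset` is a hypothetical helper and is not part of the preprocessing tab.

```python
import wave
from pathlib import Path
from typing import Iterable, List, Tuple

MIN_SAMPLE_RATE = 32_000      # Hz, from "WAV, 32kHz+"
MIN_TOTAL_SECONDS = 10 * 60   # from "Minimum 10 minutes of speech"


def check_dataset(paths: Iterable[Path]) -> Tuple[float, List[str]]:
    """Return (total duration in seconds, list of human-readable problems)."""
    total = 0.0
    problems: List[str] = []
    for path in paths:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            seconds = wav.getnframes() / rate
        total += seconds
        if rate < MIN_SAMPLE_RATE:
            problems.append(f"{path.name}: {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if total < MIN_TOTAL_SECONDS:
        problems.append(
            f"total duration {total:.0f} s is below {MIN_TOTAL_SECONDS} s"
        )
    return total, problems


if __name__ == "__main__":
    # "dataset" is a placeholder folder name; point it at your own recordings.
    files = sorted(Path("dataset").glob("*.wav"))
    total, problems = check_dataset(files)
    print(f"{len(files)} files, {total / 60:.1f} minutes")
    for line in problems:
        print("warning:", line)
```

Noise, volume consistency, and leftover silence still need listening or a dedicated audio tool; this check only covers format and length.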

Training

  • Start with 100 epochs for testing
  • Use 300+ epochs for production
  • Monitor loss values (should decrease)
  • Target mel similarity: 70%+
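
"Loss should decrease" can be turned into a simple check by comparing the average of the most recent values with the average of the window before it. This sketch assumes you already have the per-epoch loss values as a list. `loss_is_decreasing` and the window size of 10 are illustrative choices, not VCTrain features.

```python
from statistics import mean
from typing import Sequence


def loss_is_decreasing(losses: Sequence[float], window: int = 10) -> bool:
    """Compare the mean of the last `window` losses with the window before it."""
    if window < 1 or len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    recent = mean(losses[-window:])
    previous = mean(losses[-2 * window:-window])
    return recent < previous


if __name__ == "__main__":
    # Synthetic, steadily falling values used only to demonstrate the call.
    example = [10.0 - 0.1 * step for step in range(40)]
    print(loss_is_decreasing(example))
```

Averaging over a window avoids reacting to the normal epoch-to-epoch noise in the loss.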

VRAM Optimization

  • Use Adafactor for low VRAM
  • Reduce batch size
  • Enable gradient checkpointing
  • Use FP16 mixed precision

🛠️ Requirements

  • Python: 3.8+
  • PyTorch: 2.0+
  • GPU: CUDA 11.7+ (optional, CPU supported)
  • RAM: 8GB+ recommended

Core Dependencies

torch>=2.0.0
torchaudio>=2.0.0
gradio>=4.0.0
librosa>=0.10.0
tensorboard>=2.13.0
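
To confirm that an environment meets these minimums, the installed versions can be read with the standard-library `importlib.metadata`. This is a rough sketch: `check_dependencies` is a hypothetical helper, and the version comparison only looks at the leading numeric components, so pre-release tags are ignored.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the Core Dependencies list.
MINIMUM_VERSIONS: Dict[str, str] = {
    "torch": "2.0.0",
    "torchaudio": "2.0.0",
    "gradio": "4.0.0",
    "librosa": "0.10.0",
    "tensorboard": "2.13.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+", 1)[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison with zero padding, so '2.0' satisfies '2.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums: Dict[str, str] = MINIMUM_VERSIONS) -> Dict[str, str]:
    """Return a per-package status string: 'missing', '<v> (ok)' or '<v> (needs >= m)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = at_least(installed, minimum)
        report[name] = f"{installed} (ok)" if ok else f"{installed} (needs >= {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```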

📚 Documentation


🙏 Acknowledgments

  • PolTrain - Base project
  • RVC - Voice conversion technology
  • Gradio - Web UI framework
  • PyTorch - Deep learning framework

📝 License

Same license as the original PolTrain project.


Happy Training! 🎤
