Fine-tuning a YOLOv9 model.
Originally a homework assignment to give students an understanding of the fine-tuning process. (The outline of the code was originally taken from https://learnopencv.com/fine-tuning-yolov9/.)
Now it is a set of code that does the same on the UTA GPU cluster and on the UTA HPC cluster without GPUs (the latter using technologies like Slurm).
conda env create --name yolo_fine_tune -f environment.yml
mkdir datasets
cd datasets
ln -s ../SkyFusion-YOLOv9
# the symlink is needed; otherwise the training code can't find the data
# Get the data onto the cluster via sFTP.
# The data is here: https://www.kaggle.com/datasets/pranavdurai/skyfusion-aerial-imagery-object-detection-dataset?resource=download
Make sure you have access to cn-1e1901.shost.uta.edu.
conda activate yolo_fine_tune
python yolov9_fine_tune.py
For the UTA GPU cluster:
Log into the VPN (Ivanti).
ssh -Y <netID>@cn-1e1901.shost.uta.edu
python yolov9_fine_tune.py --got_data=True
# training images are read from SkyFusion-YOLOv9/train (see DATA_YML_PATH)
# the code will download these when GOT_DATA is False
# weights are saved/loaded from runs/detect/train3/weights/best.pt (see WT_PATH)
# one set of weights from 3 epochs of training is checked in; it can be unzipped (see below)
# test images are read from data/test (see TE_IMG_BASE_PATH)
# results are written to results (see RESULT_IMG_BASE_PATH)
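For orientation, these constants sit near the top of yolov9_fine_tune.py. A minimal sketch of what they plausibly look like (the names come from the comments above; the data.yaml filename and the exact values are assumptions):

# configuration constants in yolov9_fine_tune.py (values are assumptions; check the script)
GOT_DATA = True                                  # data already on disk; skip the download
DO_TRAIN = False                                 # inference only
DATA_YML_PATH = "SkyFusion-YOLOv9/data.yaml"     # dataset config; train images under SkyFusion-YOLOv9/train
WT_PATH = "runs/detect/train3/weights/best.pt"   # weights are saved/loaded here
TE_IMG_BASE_PATH = "data/test"                   # test images
RESULT_IMG_BASE_PATH = "results"                 # results are written here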
Compare the results to what's in results/naive_model. There should be no ships or airplanes detected.
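The naive-model test boils down to something like the following sketch of the ultralytics API (the yolov9c.pt checkpoint name, the per-image loop, and the result-file naming are assumptions):

# sketch: run the pretrained ("naive") model on the test images and save JSON results
import json
import os
from ultralytics import YOLO

model = YOLO("yolov9c.pt")                     # pretrained COCO weights, no fine-tuning
os.makedirs("results", exist_ok=True)          # RESULT_IMG_BASE_PATH
for fname in os.listdir("data/test"):          # TE_IMG_BASE_PATH
    result = model(os.path.join("data/test", fname))[0]
    out = os.path.join("results", "result_" + os.path.splitext(fname)[0] + ".json")
    with open(out, "w") as f:
        json.dump(result.tojson(), f)          # stores a JSON-encoded string, matching the diffs below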
# in yolov9_fine_tune.py:
#   set GOT_DATA to False
#   set DO_TRAIN to True
python yolov9_fine_tune.py
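The training step is, roughly, the following sketch (the hyperparameters are assumptions; the 3 epochs matches the checked-in weights):

# sketch: the fine-tuning step
from ultralytics import YOLO

model = YOLO("yolov9c.pt")                  # start from pretrained weights
model.train(
    data="SkyFusion-YOLOv9/data.yaml",      # DATA_YML_PATH (assumed filename)
    epochs=3,                               # the checked-in weights used 3 epochs
    imgsz=640,                              # assumed image size
)
# the best weights land in runs/detect/train*/weights/best.pt (WT_PATH)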
If the GPU runs out of memory and there is more than one GPU available, try this:
CUDA_VISIBLE_DEVICES=1 python yolov9_fine_tune.py --gpu_number=1
# GPUs are numbered starting at 0
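The same GPU pinning can be done from inside Python. A sketch (whether the script's --gpu_number flag maps to exactly this is an assumption):

# sketch: select a specific GPU from Python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"    # must be set before CUDA is initialized

from ultralytics import YOLO
model = YOLO("yolov9c.pt")
# after the masking above, the one visible GPU is device 0
model.train(data="SkyFusion-YOLOv9/data.yaml", epochs=3, imgsz=640, device=0)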
Time per training epoch:
These numbers come from the timing statements in the code, not from "time <cmd>", which would also include the time to download data, etc.
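The timing pattern amounts to something like this sketch (the script's actual statements may differ; the checkpoint and dataset paths are the assumed ones from above):

# sketch: time a single training epoch
import time
from ultralytics import YOLO

model = YOLO("yolov9c.pt")
t0 = time.perf_counter()
model.train(data="SkyFusion-YOLOv9/data.yaml", epochs=1, imgsz=640)
print(f"one epoch took {time.perf_counter() - t0:.1f} s")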
On a UTA GPU node, 4x NVIDIA A30 (24 GB of RAM each):
Tensor Cores: 224, Ampere
CUDA Cores: 3584
Peak FP32 TFLOPs: 10.3 (5.2 is the FP64 figure)
More here: https://www.pny.com/nvidia-a30
Often you will get an out-of-memory error.
On the UTA HPC head node: the job will be killed.
On a UTA HPC CHECK node (no GPU): takes > 20 minutes :(
On the AIS GPU server, GeForce RTX 4070 Super (12 GB of RAM): 1 m, 33 s*
* Was getting: "WARNING: CUDA OutOfMemoryError in TaskAlignedAssigner, using CPU"
Tensor Cores: 224, Lovelace
CUDA Cores: 7168
RT Cores: 56
Peak FP32 TFLOPs: 35.4
More here: https://www.techpowerup.com/gpu-specs/geforce-rtx-4070-super.c4186, https://postperspective.com/review-nvidia-rtx-4070-super-founders-edition/
GeForce RTX 4070 (12 GB of RAM):
Tensor Cores: 184, Lovelace
CUDA Cores: 5888
RT Cores: 46
Peak FP32 TFLOPs: 29.15
More here: https://www.techpowerup.com/gpu-specs/geforce-rtx-4070.c3924
On Suhas' UTARI Dell laptop (NVIDIA RTX A2000 Laptop GPU):
More here: https://www.notebookcheck.net/NVIDIA-RTX-A2000-Laptop-GPU-GPU-Benchmarks-and-Specs.532536.0.html
On SJSU Colab, T4 (16 GB of RAM, GDDR6): 1 m, 50 s
Tensor Cores: 320, Turing
CUDA Cores: 2560
Peak FP32 TFLOPs: 8.1
More here: https://www.pny.com/nvidia-tesla-t4
On Suhas' HP laptop, GeForce RTX 4060 (8 GB of RAM, GDDR6): 18 minutes
Tensor Cores: 96, Lovelace
CUDA Cores: 3072
Peak FP32 TFLOPs: 15.11
More here: https://www.techpowerup.com/gpu-specs/geforce-rtx-4060.c3946
# unzip weights
cd runs/detect/train3/weights/
unzip best.pt.zip
Then go back to the "Test naive model" step.
Results will be written to results.
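Loading the fine-tuned weights follows the same pattern as the naive test. A sketch (the test-image name is an assumption):

# sketch: run the fine-tuned model instead of the pretrained one
from ultralytics import YOLO

model = YOLO("runs/detect/train3/weights/best.pt")  # WT_PATH, just unzipped
result = model("data/test/ship.jpg")[0]             # assumed test-image name
print(result.tojson())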
Compare (e.g., via diff) yolo_fine_tune/results and yolo_fine_tune/results/train3/
E.g.,
cd results
diff result_ship.json train3/result_ship.json
There will be small numerical-precision differences starting around the 3rd digit.
E.g., .717... v .718...
new (your run):
< "[\n {\n \"name\": \"ship\",\n \"class\": 1,\n \"confidence\": 0.71787
old (checked-in train3):
> "[\n {\n \"name\": \"ship\",\n \"class\": 1,\n \"confidence\": 0.71837
conda deactivate
conda env export --no-builds > environment.yml
Check in your code
For the UTA HPC cluster (Slurm):
- Use the Ivanti VPN if you are off campus.
- ssh {netID}@hpcr8o2rnp.uta.edu
- sbatch FineTune_check.slurm
- You'll see something like this: Submitted batch job 14580
- When it's done, you'll see two new files, slurm-14580.err and slurm-14580.out. Those are the output of stderr and stdout. Examine those and other files for the status of your job, results, etc.
- If you need an interactive session (shell prompt) on a compute node, use the command srun --partition=NAME --pty /bin/bash, where partition NAME is an available partition.
- Type exit when done. Without doing this, common commands like git won't be there.
- Partition names and availability can be found with the sinfo command. As of Jan 21, 2025, there are NORMAL, LONG, SHORT, LOW, and CHECK partitions. The NORMAL and LONG partitions have the best hardware: NORMAL will run a job for 8 days, and LONG for 16 days. CHECK will allocate the next available compute node but is limited to 30 minutes of runtime.
- NB: the partition names are lowercase; they are written in uppercase here to show that they are not the usual English words.
Nodes cannot download data, etc. from the internet. You have to transfer data onto the cluster via sFTP.
See https://go.uta.edu/hpcinfo and "HPC Users Group" on MS Teams for more tips.
sinfo shows nodes and their status
squeue -p {partition} to see job status (can add "| grep {username}")
More here: https://it.engineering.oregonstate.edu/hpc/slurm-howto
tail (or tail -f) the stdout file to watch progress