Classification of skin lesions (among 7 classes) using the HAM10000 dataset from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T and a PyTorch ResNet model. The success rate on the specific test set (unseen data) that comes with the download is 81.13%.
According to the specifications of the download file, the 7 types of lesions to be detected are:
akiec: Actinic keratoses and intraepithelial carcinoma / Bowen’s disease
bkl: benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses)
bcc: basal cell carcinoma
df: dermatofibroma
mel: melanoma
nv: melanocytic nevi
vasc: vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage)
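As a small illustration, the seven class codes listed above can be kept in one Python mapping. The names LESION_CLASSES, CLASS_NAMES and CLASS_TO_INDEX are hypothetical and are not taken from the project's programs; the alphabetical ordering reflects how torchvision's ImageFolder numbers class subfolders.

```python
# Class codes and short names copied from the list above.
LESION_CLASSES = {
    "akiec": "Actinic keratoses and intraepithelial carcinoma / Bowen's disease",
    "bkl": "Benign keratosis-like lesions",
    "bcc": "Basal cell carcinoma",
    "df": "Dermatofibroma",
    "mel": "Melanoma",
    "nv": "Melanocytic nevi",
    "vasc": "Vascular lesions",
}

# ImageFolder assigns indices to class subfolders in sorted (alphabetical) order.
CLASS_NAMES = sorted(LESION_CLASSES)
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASS_NAMES)}


def describe(code: str) -> str:
    """Return a readable 'code: name' string for one class code."""
    return f"{code}: {LESION_CLASSES[code]}"


if __name__ == "__main__":
    for code in CLASS_NAMES:
        print(CLASS_TO_INDEX[code], describe(code))
```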
INSTALLATION:
Any missing package can be installed with pip when a program reports that it is absent from the environment.
If not yet installed, these packages are:
pip install numpy
pip install pandas
pip install keras
pip install tensorflow
pip install opencv-python
pip install scikit-learn
pip install torch
pip install torchvision
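A quick way to see which of the packages above are missing is to look up each import name without actually importing it. This is only a sketch: the REQUIRED mapping and the missing_packages helper are hypothetical, and note that the pip name and the import name differ for opencv-python (cv2) and scikit-learn (sklearn).

```python
import importlib.util

# pip package name -> module name used in an "import" statement
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "keras": "keras",
    "tensorflow": "tensorflow",
    "opencv-python": "cv2",
    "scikit-learn": "sklearn",
    "torch": "torch",
    "torchvision": "torchvision",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    # Print one ready-to-run install command per missing package.
    for name in missing_packages():
        print(f"pip install {name}")
```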
Download all the files that accompany this project in a single folder.
Download the dataset from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T into the directory where the project is located. This produces a file called dataverse_files.zip which, once decompressed as dataverse_files, contains, among others, the files HAM10000_images_part1.zip and HAM10000_images_part2.zip. Unzip both and merge their contents into a single HAM10000_images folder (a simple copy and paste is enough) in the same dataverse_files directory.
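The copy-and-paste merge can also be scripted. The sketch below assumes the two archives were unzipped into folders named after the zip files (HAM10000_images_part1 and HAM10000_images_part2) and that the images are .jpg files; the helper merge_image_parts is hypothetical and not part of the project.

```python
import shutil
from pathlib import Path

# Assumed folder names after unzipping; adjust if your unzip tool names them differently.
PARTS = ("HAM10000_images_part1", "HAM10000_images_part2")


def merge_image_parts(base_dir, parts=PARTS, target="HAM10000_images"):
    """Copy every .jpg from the part folders into base_dir/target; return the count."""
    base = Path(base_dir)
    destination = base / target
    destination.mkdir(parents=True, exist_ok=True)
    copied = 0
    for part in parts:
        part_dir = base / part
        if not part_dir.is_dir():
            continue  # skip parts that were not unzipped
        for image in sorted(part_dir.glob("*.jpg")):
            shutil.copy2(image, destination / image.name)
            copied += 1
    return copied


if __name__ == "__main__":
    print(merge_image_parts("dataverse_files"), "images copied")
```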
In that same dataverse_files folder, the file ISIC2018_Task3_Test_Images.zip must be decompressed, which produces two nested directories, both named ISIC2018_Task3_Test_Images, with 1115 images to test.
Next, the folder structure required by the PyTorch ResNet programs is created: a folder Dir_SkinCancer_Resnet_Pytorch containing a folder called train and another called valid, each with a subfolder for each of the 7 classes. It is created by executing:
python Create_DirSkinCancer_Resnet_Pytorch.py
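For readers who want to see what such a layout looks like in code, here is a minimal sketch. It is not the content of Create_DirSkinCancer_Resnet_Pytorch.py; the create_structure helper is hypothetical and simply builds one subfolder per class under each split.

```python
from pathlib import Path

CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def create_structure(root="Dir_SkinCancer_Resnet_Pytorch",
                     splits=("train", "valid"),
                     classes=CLASSES):
    """Create root/<split>/<class> for every split and class; return the folders."""
    created = []
    for split in splits:
        for class_name in classes:
            folder = Path(root) / split / class_name
            folder.mkdir(parents=True, exist_ok=True)  # safe to run twice
            created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_structure():
        print(folder)
```

The same helper would cover a test layout by calling it with a different root and splits=("test",).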
This structure is then filled from the images contained in dataverse_files\HAM10000_images and following the order indicated in the file dataverse_files\HAM10000_metadata, by executing:
python Fill_DirSkinCancer_Resnet_Pytorch.py
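The filling step can be pictured as follows. This sketch is not the project's Fill_DirSkinCancer_Resnet_Pytorch.py: it assumes the metadata is a CSV with the columns image_id and dx (as in the published HAM10000 metadata), that images are named <image_id>.jpg, and it uses a hypothetical rule that sends every tenth row to valid, since the README does not state how the project splits train and valid.

```python
import csv
import shutil
from pathlib import Path


def fill_structure(metadata_csv, images_dir, root, valid_every=10):
    """Copy each image into root/<split>/<dx>/ and return the count per split."""
    counts = {"train": 0, "valid": 0}
    with open(metadata_csv, newline="") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            # Assumed split rule: every valid_every-th row goes to validation.
            split = "valid" if index % valid_every == 0 else "train"
            source = Path(images_dir) / f"{row['image_id']}.jpg"
            if not source.is_file():
                continue  # metadata row without a matching image
            class_dir = Path(root) / split / row["dx"]
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, class_dir / source.name)
            counts[split] += 1
    return counts
```

A fixed, index-based split keeps the result reproducible between runs; a random or stratified split would be an equally reasonable choice.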
Next, the equivalent structure is created for the specific test set: a folder Dir_Test_SkinCancer_Resnet_Pytorch containing a folder called test with a subfolder for each of the 7 classes. It is created by executing:
python Create_Test_DirSkinCancer_Resnet_Pytorch.py
This structure is then filled from the images contained in dataverse_files\ISIC2018_Task3_Test_Images\ISIC2018_Task3_Test_Images and following the order indicated in the file dataverse_files\ISIC2018_Task3_Test_GroundTruth.csv, by executing:
python Fill_Test_DirSkinCancer_Resnet_Pytorch.py
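Sorting test images into class subfolders requires turning each ground-truth row into a class code. The sketch below assumes the ground-truth CSV is one-hot encoded, with an image column followed by one column per class (MEL, NV, BCC, AKIEC, BKL, DF, VASC); that layout is an assumption about the file, and the helpers label_from_row and read_ground_truth are hypothetical.

```python
import csv
import io


def label_from_row(row, image_column="image"):
    """Return the lowercase class code whose one-hot value is highest."""
    scores = {
        column.lower(): float(value)
        for column, value in row.items()
        if column != image_column
    }
    return max(scores, key=scores.get)


def read_ground_truth(handle, image_column="image"):
    """Map each image id to its class code from an open CSV file handle."""
    return {
        row[image_column]: label_from_row(row, image_column)
        for row in csv.DictReader(handle)
    }


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout.
    sample = (
        "image,MEL,NV,BCC,AKIEC,BKL,DF,VASC\n"
        "img_1,0.0,1.0,0.0,0.0,0.0,0.0,0.0\n"
    )
    print(read_ground_truth(io.StringIO(sample)))
```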
To avoid ResNet errors, check whether any subfolder of the valid folder has no images. If so, unzip the attached valid.zip and copy the resulting valid folder over the one in Dir_SkinCancer_Resnet_Pytorch, overwriting the old one (be careful: there may be two nested valid folders, and only the innermost one should be used). This ensures that every valid subfolder has at least one image.
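Whether this workaround is needed can be checked with a few lines of Python. The helper empty_class_folders below is hypothetical; it assumes images use the extensions .jpg, .jpeg or .png and lists every class subfolder that contains none.

```python
from pathlib import Path


def empty_class_folders(split_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return the names of class subfolders that contain no image files."""
    empty = []
    for folder in sorted(Path(split_dir).iterdir()):
        if not folder.is_dir():
            continue
        has_image = any(
            item.is_file() and item.suffix.lower() in extensions
            for item in folder.iterdir()
        )
        if not has_image:
            empty.append(folder.name)
    return empty


if __name__ == "__main__":
    valid_dir = Path("Dir_SkinCancer_Resnet_Pytorch") / "valid"
    if valid_dir.is_dir():
        print("Empty class folders:", empty_class_folders(valid_dir))
```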
TRAINING:
The model is trained and obtained by executing:
python Train224x224_SkinCancer_Resnet_Pytorch.py
The execution log is attached as the file LOG_TrainSkinCancer_10epoch.txt. The result is the model checkpoint_SkinCancer_epoch.pth (it is not attached because its size exceeds the file size limit for uploads to GitHub).
To extend training carried out on a personal computer, a second program has been implemented that continues training from a model obtained in a previous step:
python CONTrain224x224_SkinCancer_Resnet_Pytorch.py
It continues the training based on the checkpoint_SkinCancer_epoch.pth model obtained in a previous step.
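Resuming training generally means saving the model and optimizer state together with the epoch number, then loading all three back. The sketch below shows that pattern with torch.save and torch.load; the dictionary keys and the helper names are hypothetical, and the actual layout of checkpoint_SkinCancer_epoch.pth may differ.

```python
import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Store model weights, optimizer state and the last finished epoch."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore both states in place and return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1


if __name__ == "__main__":
    # A tiny linear layer stands in for the ResNet to keep the demo fast.
    model = nn.Linear(4, 7)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    save_checkpoint("demo_checkpoint.pth", model, optimizer, epoch=9)
    print("resume at epoch", load_checkpoint("demo_checkpoint.pth", model, optimizer))
```

Saving the optimizer state as well as the weights matters when the optimizer keeps momentum or adaptive statistics; otherwise the continued run restarts those from zero.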
The training log with 40 epochs is recorded in the document LOG40epoch.txt
Next, the resulting model, checkpoint_SkinCancer_epoch.pth, is tested.
EVALUATION:
python Guess_Test_224x224SkinCancer_Resnet_Pytorch.py
This program performs the test.
The log of its execution is attached as LOG_Test_SkinCancer.txt
The screen shows the hits and misses, giving a success rate of 81.13%.
To obtain predictions for images whose skin lesion classification is not known, place them in a folder called Test within the project and execute:
python Recognize_SkinCancer_Resnet_Pytorch.py
The prediction for each image is printed to the console and also written to the output file
ModelsResults.txt
The success rate may change if the images tested differ in quality from those used in the training process. The program comes prepared to test 8 images (in the attached Test folder, which must be unzipped) downloaded from https://www.skincancer.org/es/skin-cancer-information/skin-cancer-pictures/; judging the accuracy of these predictions would require an expert. By modifying the path in line 90 of the program, you can test the set of images you want.
CONCLUSIONS:
The model obtained is only suitable for detecting benign skin lesions (nv: melanocytic nevi), with a hit rate of 94.93%: 862 correct images and 46 false negatives (862 of the 908 nv test images), plus 144 false positives. This is the class with the largest number of images for training and testing.
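The nv figures can be checked with a short calculation. The hit rate quoted above corresponds to recall (correct images divided by all true nv images); the sketch below also derives precision and F1 from the same three counts, which are taken from the text. The helper class_metrics is hypothetical.

```python
def class_metrics(tp, fn, fp):
    """Compute recall, precision and F1 for one class from its counts."""
    recall = tp / (tp + fn)        # share of true nv images that were found
    precision = tp / (tp + fp)     # share of nv predictions that were right
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


nv = class_metrics(tp=862, fn=46, fp=144)
print(f"recall    {nv['recall']:.2%}")     # 94.93%
print(f"precision {nv['precision']:.2%}")  # 85.69%
print(f"f1        {nv['f1']:.2%}")         # 90.07%
```

The 144 false positives are images of other lesion types predicted as nv, which is why precision is noticeably lower than the hit rate.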
For the other lesions, the hit rate is much lower and unacceptable.
Therefore, this project requires further improvement.
March 24, 2026
I have tested the generally successful and fast feature extraction procedure using a CNN followed by an SVM, adapting the project found at https://www.kaggle.com/code/saadmohamed99/plant-disease-classification. I used the following programs:
Train_SkinLesions_FeaturesExtracted_SVM.py for training
and
Test_SkinLesions_FeaturesExtracted_SVM.py for testing with data not used in the training procedure (unseen data).
The results do not improve, as can be seen in the attached log file: LOG_SkinLesions_FeaturesExtracted_SVM.doc
References:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T
https://www.kaggle.com/code/hadeerismail/skin-cancer-prediction-cnn-acc-98 (I have tried to adapt this project in other cases, but it overfits: results are poor when the model is tested with unseen data.)
https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000
https://www.skincancer.org/es/skin-cancer-information/skin-cancer-pictures/
https://github.com/ablanco1950/SkinLessionClassification_Yolo26
The results are similar to those obtained at: https://www.kaggle.com/code/ajayrajparashar/skin-cancer-detection-using-cnn-and-vit-ham10000
https://www.kaggle.com/code/saadmohamed99/plant-disease-classification