This repository contains my Mid-Term Module Assessment for the Machine Learning (CSO7013) module at St Mary's University Twickenham London.
| Field | Details |
|---|---|
| Student | Nevin Tom |
| Student ID | 2517238 |
| Module | CSO7013 Machine Learning |
| Assessment | Mid-Term Module Assessment |
| University | St Mary's University Twickenham London |
This project implements a Convolutional Neural Network (CNN) to detect malaria-infected blood cells from microscope images. The model classifies thin blood smear images as either:
- Parasitized: infected with malaria parasites
- Uninfected: healthy cells

Malaria causes over 600,000 deaths annually (WHO, 2023). Diagnosis requires trained microscopists to manually examine blood samples, a significant bottleneck in resource-limited areas. This CNN achieves 96.42% accuracy and can assist in automated screening.
| Metric | Value |
|---|---|
| Accuracy | 96.42% |
| Precision | 0.9560 |
| Recall | 0.9749 |
| F1 Score | 0.9654 |
| ROC-AUC | 0.9936 |
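
As a quick consistency check, the F1 score in the table can be reproduced from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the values above; the function name `f1_from_precision_recall` is illustrative and not part of this repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the results table above.
precision = 0.9560
recall = 0.9749

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.4f}")  # agrees with the reported 0.9654 up to rounding
```
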
| Model | Accuracy | Improvement |
|---|---|---|
| Logistic Regression (Baseline) | 61.96% | - |
| CNN | 96.42% | +34.46 percentage points |
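
The improvement figure is a difference between two accuracies, so it is measured in percentage points. The short calculation below, using only the two accuracies from the table, also expresses the same gain as a relative reduction in error rate; the variable names are illustrative.

```python
baseline_accuracy = 61.96  # logistic regression baseline, in percent
cnn_accuracy = 96.42       # CNN, in percent

# Absolute gain in percentage points.
absolute_gain = cnn_accuracy - baseline_accuracy

# Error rates and the fraction of baseline errors removed by the CNN.
baseline_error = 100.0 - baseline_accuracy
cnn_error = 100.0 - cnn_accuracy
relative_error_reduction = (baseline_error - cnn_error) / baseline_error

print(f"Absolute gain: {absolute_gain:.2f} percentage points")
print(f"Error rate: {baseline_error:.2f}% -> {cnn_error:.2f}%")
print(f"Relative error reduction: {relative_error_reduction:.1%}")
```
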

    git clone https://github.com/Nevvyboi/MalariaCellDetection.git
    cd MalariaCellDetection

    # Create virtual environment
    python -m venv venv

    # Activate it
    venv\Scripts\activate        # Windows
    source venv/bin/activate     # Mac/Linux

    # Install dependencies
    pip install -r requirements.txt

- Download from: Kaggle - Malaria Cell Images
- Extract and place in data/cellImages/:

    data/
    └── cellImages/
        ├── Parasitized/    ← 13,779 infected images
        └── Uninfected/     ← 13,779 healthy images

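
Before training, it can help to confirm that the extracted dataset matches the layout above. The following is a minimal sketch, not code from this repository: the helper name `count_images` and the assumption that the cell images are `.png` files are illustrative.

```python
from pathlib import Path


def count_images(data_dir: str, pattern: str = "*.png") -> dict[str, int]:
    """Count image files in each class folder directly under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return {
        class_dir.name: sum(1 for _ in class_dir.glob(pattern))
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


if __name__ == "__main__":
    # For the full dataset, each class folder should hold 13,779 images.
    try:
        for name, count in count_images("data/cellImages").items():
            print(f"{name}: {count} images")
    except FileNotFoundError as err:
        print(err)
```

If the two counts differ from 13,779, the archive may have been extracted only partially or into a different folder.
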

    # Quick test (3 epochs, ~2 minutes)
    python run.py --quick

    # Full training (35 epochs)
    python run.py

    # Predict on random image
    python run.py --predict-random

    MalariaCellDetection/
    │
    ├── run.py                  # Main entry point
    ├── README.md               # This file
    ├── requirements.txt        # Dependencies
    ├── LICENSE                 # MIT License
    │
    ├── src/                    # Source code
    │   ├── config.py           # Hyperparameters & settings
    │   ├── dataset.py          # Data loading & preprocessing
    │   ├── model.py            # CNN architecture
    │   ├── train.py            # Training loop
    │   ├── evaluate.py         # Evaluation & visualization
    │   └── baseline.py         # Baseline model
    │
    ├── data/                   # Dataset (download separately)
    │   └── cellImages/
    │
    ├── models/                 # Saved model weights
    │   └── bestModel.pth
    │
    └── outputs/                # Generated visualizations
        ├── trainingHistory.png
        ├── confusionMatrix.png
        └── rocCurve.png


    Input: RGB Image (128×128×3)
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Conv Block 1 (32 filters)  │
    │  Conv → BatchNorm → ReLU    │
    │  MaxPool(2×2)               │
    └─────────────────────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Conv Block 2 (64 filters)  │
    │  Conv → BatchNorm → ReLU    │
    │  MaxPool → Dropout(0.25)    │
    └─────────────────────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Conv Block 3 (128 filters) │
    │  Conv → BatchNorm → ReLU    │
    │  MaxPool(2×2)               │
    └─────────────────────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Conv Block 4 (256 filters) │
    │  Conv → BatchNorm → ReLU    │
    │  MaxPool → Dropout(0.25)    │
    └─────────────────────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Fully Connected (512)      │
    │  ReLU → Dropout(0.5)        │
    └─────────────────────────────┘
                   │
                   ▼
    ┌─────────────────────────────┐
    │  Output (2 classes)         │
    │  Parasitized / Uninfected   │
    └─────────────────────────────┘

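
To make the diagram concrete, the sketch below traces how the spatial size of a 128×128 input changes through the four convolutional blocks. It rests on assumptions that the diagram does not state: each convolution is taken to preserve height and width (for example a 3×3 kernel with padding 1), and each 2×2 max pool is taken to halve them. The names `ConvBlock` and `trace_shapes` are illustrative and do not come from `src/model.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvBlock:
    """One Conv -> BatchNorm -> ReLU -> MaxPool(2x2) stage from the diagram."""
    out_channels: int
    dropout: float = 0.0  # dropout does not change tensor shapes


# Filter counts and dropout rates as listed in the architecture diagram.
BLOCKS = [
    ConvBlock(32),
    ConvBlock(64, dropout=0.25),
    ConvBlock(128),
    ConvBlock(256, dropout=0.25),
]


def trace_shapes(height: int = 128, width: int = 128, channels: int = 3):
    """Return the (channels, height, width) shape after each block.

    Assumes size-preserving convolutions and 2x2 max pooling with stride 2.
    """
    shapes = [(channels, height, width)]
    for block in BLOCKS:
        height, width = height // 2, width // 2  # effect of MaxPool(2x2)
        shapes.append((block.out_channels, height, width))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)

# Number of features entering the 512-unit fully connected layer,
# if the final feature map is simply flattened (also an assumption).
final_channels, final_h, final_w = shapes[-1]
flattened_features = final_channels * final_h * final_w
print("Flattened features:", flattened_features)
```

Under these assumptions the fully connected layer would receive 256 × 8 × 8 = 16,384 features. A model that applies global average pooling before the fully connected layer would instead receive 256.
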
| Parameter | Value |
|---|---|
| Input Size | 128×128 |
| Batch Size | 64 |
| Epochs | 35 |
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Loss Function | CrossEntropyLoss |
| LR Scheduler | ReduceLROnPlateau |
| Early Stopping | 7 epochs |
| Random Seed | 42 |
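
The early-stopping row can be read as a patience of 7 epochs: training stops once the monitored metric has failed to improve for 7 consecutive epochs. The class below is a minimal sketch of that idea, assuming validation loss is the monitored metric; the name `EarlyStopping` and its interface are illustrative rather than taken from `src/train.py`.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 7, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy run: the loss improves for three epochs, then plateaus.
stopper = EarlyStopping(patience=7)
val_losses = [0.60, 0.45, 0.40] + [0.41] * 10
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print("Stopped at epoch:", stopped_at)
```

In a PyTorch training loop, a `ReduceLROnPlateau` scheduler is typically stepped with the same validation metric each epoch, so the learning rate is lowered during a plateau before early stopping ends the run.
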
| Command | Description |
|---|---|
| `python run.py` | Full training (35 epochs) |
| `python run.py --quick` | Quick test (3 epochs, 2000 images) |
| `python run.py --epochs 50` | Custom epochs |
| `python run.py --evaluate-only` | Evaluate saved model |
| `python run.py --predict "image.png"` | Predict single image |
| `python run.py --predict-random` | Predict random test image |
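
For readers curious how such a command-line interface is commonly wired up, here is a minimal `argparse` sketch that accepts the flags in the table plus `--batch-size` from the troubleshooting notes. It is a hypothetical reconstruction, not the actual contents of `run.py`; the function name `build_parser`, the defaults (taken from the hyperparameter table), and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented run.py options (illustrative)."""
    parser = argparse.ArgumentParser(description="Malaria cell detection")
    parser.add_argument("--quick", action="store_true",
                        help="quick test run on a small subset")
    parser.add_argument("--epochs", type=int, default=35,
                        help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="mini-batch size")
    parser.add_argument("--evaluate-only", action="store_true",
                        help="evaluate the saved model without training")
    parser.add_argument("--predict", metavar="IMAGE",
                        help="path of a single image to classify")
    parser.add_argument("--predict-random", action="store_true",
                        help="classify a random test image")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--epochs", "50", "--batch-size", "32"])
print(args.epochs, args.batch_size, args.quick)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size` and `--predict-random` as `args.predict_random`, replacing hyphens with underscores.
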
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.2.0
tqdm>=4.65.0
Pillow>=9.5.0
NIH Malaria Cell Images Dataset
| Property | Value |
|---|---|
| Source | National Institutes of Health (NIH) |
| Platform | Kaggle |
| License | Public Domain (US Government Work) |
| Total Images | 27,558 |
| Parasitized | 13,779 (50%) |
| Uninfected | 13,779 (50%) |
"Dataset not found" error
Ensure the folder structure is correct:

    data/
    └── cellImages/
        ├── Parasitized/
        └── Uninfected/

"3 classes detected" error
Delete any extra folders inside cellImages/. Only Parasitized and Uninfected should exist.
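
A quick way to spot stray folders is to list every subdirectory of `data/cellImages/` that is not one of the two expected classes. This is a small sketch rather than code from the repository; the helper name `find_unexpected_dirs` is hypothetical.

```python
from pathlib import Path

EXPECTED_CLASSES = {"Parasitized", "Uninfected"}


def find_unexpected_dirs(data_dir: str) -> list[str]:
    """Return names of subfolders that are not one of the expected classes.

    Returns an empty list if data_dir itself does not exist.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name not in EXPECTED_CLASSES
    )


extras = find_unexpected_dirs("data/cellImages")
if extras:
    print("Remove these folders:", ", ".join(extras))
else:
    print("No unexpected folders found.")
```
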
CUDA out of memory
Reduce the batch size:

    python run.py --batch-size 32

Training too slow
Use quick mode for testing:

    python run.py --quick

- Dataset: National Library of Medicine. (2018). Malaria Cell Images Dataset. Kaggle. https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria
- Paper: Rajaraman, S., et al. (2018). Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, 6, e4568.
- Statistics: World Health Organization. (2023). World Malaria Report 2023.

This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ for CSO7013 Machine Learning
St Mary's University Twickenham London | 2025


