Nevvyboi/MalariaCellDetection

🔬 Malaria Cell Detection Using Convolutional Neural Networks

St Mary's University Twickenham London
CSO7013 Machine Learning: Mid-Term Module Assessment


📚 About This Project

This repository contains my Mid-Term Module Assessment for the Machine Learning (CSO7013) module at St Mary's University Twickenham London.

| Field | Details |
| --- | --- |
| Student | Nevin Tom |
| Student ID | 2517238 |
| Module | CSO7013 Machine Learning |
| Assessment | Mid-Term Module Assessment |
| University | St Mary's University Twickenham London |

🎯 Project Overview

This project implements a Convolutional Neural Network (CNN) to detect malaria-infected blood cells in microscope images. The model classifies each cell image from a thin blood smear as either:

  • 🦠 Parasitized: infected with malaria parasites
  • ✅ Uninfected: healthy cells

Why This Matters

Malaria causes over 600,000 deaths annually (WHO, 2023). Diagnosis requires trained microscopists to examine blood samples manually, which is a significant bottleneck in resource-limited areas. This CNN reaches 96.42% accuracy on the NIH dataset and could assist in automated screening.


📊 Results

| Metric | Value |
| --- | --- |
| Accuracy | 96.42% |
| Precision | 0.9560 |
| Recall | 0.9749 |
| F1 Score | 0.9654 |
| ROC-AUC | 0.9936 |
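As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 score can be recomputed from the precision and recall values above. The helper name `f1_from` is illustrative and is not part of this repository.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results above.
f1 = f1_from(0.9560, 0.9749)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9654"
```

The recomputed value agrees with the reported F1 score to four decimal places.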

Baseline Comparison

| Model | Accuracy | Improvement (percentage points) |
| --- | --- | --- |
| Logistic Regression (Baseline) | 61.96% | - |
| CNN | 96.42% | +34.46 |
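The improvement figure is an absolute difference in accuracy. A short sketch, using only the two accuracies reported above, shows that difference alongside the relative reduction in error rate, which is another common way to read the same comparison. The variable names are illustrative.

```python
baseline_acc = 61.96  # logistic regression accuracy (%), as reported above
cnn_acc = 96.42       # CNN accuracy (%), as reported above

# Absolute gain in percentage points.
gain_pp = cnn_acc - baseline_acc

# Relative error reduction: the share of the baseline's errors the CNN removes.
baseline_err = 100.0 - baseline_acc
cnn_err = 100.0 - cnn_acc
error_reduction = (baseline_err - cnn_err) / baseline_err

print(f"gain: {gain_pp:.2f} percentage points")
print(f"relative error reduction: {error_reduction:.1%}")
```

Because the dataset is balanced, a classifier that always predicts one class would score 50%, so the baseline sits roughly 12 points above chance.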

Training & Evaluation Visualizations

Figures (saved under outputs/): training history (trainingHistory.png), confusion matrix (confusionMatrix.png), ROC curve (rocCurve.png).

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/Nevvyboi/MalariaCellDetection.git
cd MalariaCellDetection

2. Set Up Environment

# Create virtual environment
python -m venv venv

# Activate it (run only the line for your OS)
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

# Install dependencies
pip install -r requirements.txt

3. Download Dataset

  1. Download from: Kaggle - Malaria Cell Images (https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria)
  2. Extract and place in data/cellImages/:

data/
└── cellImages/
    ├── Parasitized/     ← 13,779 infected images
    └── Uninfected/      ← 13,779 healthy images

4. Run the Model

# Quick test (3 epochs, ~2 minutes)
python run.py --quick

# Full training (35 epochs)
python run.py

# Predict on random image
python run.py --predict-random

📁 Project Structure

MalariaCellDetection/
│
├── run.py                  # 🚀 Main entry point
├── README.md               # 📖 This file
├── requirements.txt        # 📦 Dependencies
├── LICENSE                 # ⚖️ MIT License
│
├── src/                    # 📁 Source code
│   ├── config.py           # ⚙️ Hyperparameters & settings
│   ├── dataset.py          # 📊 Data loading & preprocessing
│   ├── model.py            # 🧠 CNN architecture
│   ├── train.py            # 🏋️ Training loop
│   ├── evaluate.py         # 📈 Evaluation & visualization
│   └── baseline.py         # 📉 Baseline model
│
├── data/                   # 📂 Dataset (download separately)
│   └── cellImages/
│
├── models/                 # 💾 Saved model weights
│   └── bestModel.pth
│
└── outputs/                # 📊 Generated visualizations
    ├── trainingHistory.png
    ├── confusionMatrix.png
    └── rocCurve.png

🧠 Model Architecture

Input: RGB Image (128×128×3)
         │
         ▼
┌─────────────────────────────┐
│  Conv Block 1 (32 filters)  │
│  Conv → BatchNorm → ReLU    │
│  MaxPool(2×2)               │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│  Conv Block 2 (64 filters)  │
│  Conv → BatchNorm → ReLU    │
│  MaxPool → Dropout(0.25)    │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│  Conv Block 3 (128 filters) │
│  Conv → BatchNorm → ReLU    │
│  MaxPool(2×2)               │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│  Conv Block 4 (256 filters) │
│  Conv → BatchNorm → ReLU    │
│  MaxPool → Dropout(0.25)    │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│  Fully Connected (512)      │
│  ReLU → Dropout(0.5)        │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│  Output (2 classes)         │
│  Parasitized / Uninfected   │
└─────────────────────────────┘
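To see how the four conv blocks feed the fully connected layer, the feature-map size can be traced block by block. This is a sketch under two assumptions that are not confirmed by the diagram: each convolution preserves height and width (for example a 3×3 kernel with padding 1), and each 2×2 max pool halves them. The function name `trace_shapes` is illustrative.

```python
def trace_shapes(input_size: int = 128, filters=(32, 64, 128, 256)):
    """Return (channels, height, width) after each conv block.

    Assumes the convolution keeps height and width unchanged and the
    2x2 max pool halves them; neither is read from model.py.
    """
    shapes = []
    size = input_size
    for channels in filters:
        size //= 2  # effect of a 2x2 max pool with stride 2
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
for i, (c, h, w) in enumerate(shapes, start=1):
    print(f"after block {i}: {c} x {h} x {w}")

c, h, w = shapes[-1]
flat_features = c * h * w  # inputs to the 512-unit fully connected layer
print(f"flattened features: {flat_features}")
```

Under these assumptions the last block outputs 256 × 8 × 8, so the fully connected layer would receive 16,384 features. If the actual model uses different padding or a global pooling step, the number differs.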

Hyperparameters

| Parameter | Value |
| --- | --- |
| Input Size | 128×128 |
| Batch Size | 64 |
| Epochs | 35 |
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Loss Function | CrossEntropyLoss |
| LR Scheduler | ReduceLROnPlateau |
| Early Stopping | 7 epochs |
| Random Seed | 42 |
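The early stopping setting can be read as a patience counter: training ends once the monitored metric has gone 7 consecutive epochs without improving. The sketch below illustrates that logic in plain Python. It assumes validation loss is the monitored metric, and the class name `EarlyStopping` and its `min_delta` parameter are hypothetical rather than taken from train.py.

```python
class EarlyStopping:
    """Stop training when the loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 7, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run: the loss improves for three epochs, then plateaus.
stopper = EarlyStopping(patience=7)
losses = [0.60, 0.45, 0.40] + [0.41] * 10
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}")  # prints "stopping at epoch 10"
        break
```

In the toy run the best loss occurs at epoch 3, so the seventh non-improving epoch is epoch 10.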

💻 Command Line Options

| Command | Description |
| --- | --- |
| python run.py | Full training (35 epochs) |
| python run.py --quick | Quick test (3 epochs, 2000 images) |
| python run.py --epochs 50 | Custom number of epochs |
| python run.py --batch-size 32 | Custom batch size (default 64) |
| python run.py --evaluate-only | Evaluate saved model |
| python run.py --predict "image.png" | Predict single image |
| python run.py --predict-random | Predict random test image |
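For readers who want to see how flags like these are typically wired up, here is a minimal `argparse` sketch. It is not the parser in run.py: the defaults are taken from the hyperparameters listed in this README, and making `--predict` and `--predict-random` mutually exclusive is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="run.py", description="Malaria cell detection")
    parser.add_argument("--quick", action="store_true", help="3 epochs on a 2000-image subset")
    parser.add_argument("--epochs", type=int, default=35, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=64, help="mini-batch size")
    parser.add_argument("--evaluate-only", action="store_true", help="evaluate the saved model")
    # The two prediction modes are alternatives, so group them as mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--predict", metavar="IMAGE", help="classify a single image file")
    group.add_argument("--predict-random", action="store_true", help="classify a random test image")
    return parser


args = build_parser().parse_args(["--epochs", "50", "--batch-size", "32"])
print(args.epochs, args.batch_size, args.quick)  # prints "50 32 False"
```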

📋 Requirements

torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.2.0
tqdm>=4.65.0
Pillow>=9.5.0

📖 Dataset

NIH Malaria Cell Images Dataset

| Property | Value |
| --- | --- |
| Source | National Institutes of Health (NIH) |
| Platform | Kaggle |
| License | Public Domain (US Government Work) |
| Total Images | 27,558 |
| Parasitized | 13,779 (50%) |
| Uninfected | 13,779 (50%) |

🔧 Troubleshooting

❌ "Dataset not found" error

Ensure the folder structure is correct:

data/
└── cellImages/
    β”œβ”€β”€ Parasitized/
    └── Uninfected/
❌ "3 classes detected" error

Delete any extra folders inside cellImages/. Only Parasitized and Uninfected should exist.
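If the data loader infers class names from subfolder names (as torchvision's `ImageFolder` does; whether dataset.py uses it is an assumption), any stray directory is counted as an extra class, for example a nested copy left behind when extracting the archive. A small helper like the following, with the hypothetical name `unexpected_folders`, can list the offending directories before you delete them.

```python
from pathlib import Path

EXPECTED = {"Parasitized", "Uninfected"}


def unexpected_folders(root: str = "data/cellImages") -> list[str]:
    """List subfolders of `root` that are not one of the two class folders."""
    return sorted(
        p.name for p in Path(root).iterdir() if p.is_dir() and p.name not in EXPECTED
    )
```

An empty list from `unexpected_folders()` means only the two class folders are present.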

❌ CUDA out of memory
python run.py --batch-size 32
❌ Training too slow

Use quick mode for testing:

python run.py --quick

📚 References

  1. Dataset: National Library of Medicine. (2018). Malaria Cell Images Dataset. Kaggle. https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria

  2. Paper: Rajaraman, S., et al. (2018). Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, 6, e4568.

  3. Statistics: World Health Organization. (2023). World Malaria Report 2023.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ for CSO7013 Machine Learning

St Mary's University Twickenham London | 2025
