A comprehensive capstone project comparing classical color transfer techniques against custom GAN architecture for automatic image colorization
Maintained by Mohammed Azeezulla (zeeza18) · Introduction to Image Processing
- Overview
- Key Features
- Architecture
- Technologies Used
- Project Structure
- Quick Start
- Running Experiments
- FastAPI Demo
- Results
- Performance Metrics
- Academic Context
- Contributing
- License
HISTCOLORIFUL is a comprehensive image colorization research project that answers a fundamental question: How far can classical computer vision methods go before deep learning becomes necessary?
Built on a carefully curated 256×256 LAB subset of the COCO dataset with 10 held-out test images, this project systematically compares:
- Classical Methods: Histogram Matching, K-means Color Transfer, Local Gaussian Transfer
- Deep Learning: Custom Pix2Pix GAN with U-Net Generator + PatchGAN Discriminator
- How do interpretable classical algorithms perform vs. trainable GANs?
- What hyperparameters optimize GAN colorization quality?
- Can we quantify the speed vs. quality tradeoff?
- What are the failure modes of each approach?
| Method | PSNR (dB) | SSIM | Speed | Reference Required |
|---|---|---|---|---|
| K-means Transfer | 22.29 | 0.8908 | Slow | Yes |
| GAN (λ=50) | 20.56 | 0.8397 | Very Fast | No |
| Histogram Matching | 21.85 | 0.8742 | Fast | Yes |
| Local Gaussian | 21.43 | 0.8621 | Fast | Yes |
- ✅ Histogram Matching - CDF alignment per LAB channel
- ✅ K-means Color Transfer - 8-cluster palette mapping
- ✅ Local Gaussian Transfer - Windowed statistics matching
- ✅ Vectorized NumPy operations
- ✅ Per-image metric logging
- ✅ Pix2Pix GAN Architecture - U-Net generator with skip connections
- ✅ PatchGAN Discriminator - 70x70 receptive field
- ✅ Mixed Precision Training - FP16 for faster convergence
- ✅ Comprehensive Ablation Studies - 6 hyperparameter configurations
- ✅ Checkpoint Management - Best model serialization
- ✅ PSNR & SSIM Metrics - Quantitative quality assessment
- ✅ Statistical Testing - Paired t-tests, confidence intervals
- ✅ Speed Benchmarking - CPU vs GPU inference timing
- ✅ Visual Comparisons - Side-by-side method grids
- ✅ Failure Case Analysis - Worst-performing image documentation
- ✅ FastAPI REST API - Async colorization endpoints
- ✅ Interactive Web UI - HTML/CSS/JS upload interface
- ✅ Multi-method Support - Classical + GAN inference
- ✅ Health Monitoring - System status endpoints
- ✅ Auto-generated Docs - OpenAPI/Swagger UI
- Convert the grayscale L channel to LAB to operate in a perceptually uniform space.
- When classical methods are selected, pull color statistics from a user-provided reference image.
- Apply the selected algorithm:
- Histogram Matching aligns per-channel LAB histograms.
- K-means Transfer builds an 8-color palette from the reference image.
- Local Gaussian Transfer matches local mean/variance windows.
- Merge the predicted AB channels with the input L channel and convert back to RGB.
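The histogram-matching step in this pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual implementation; `match_channel` is a hypothetical helper that operates on one LAB channel at a time:

```python
import numpy as np

def match_channel(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Remap `src` so its histogram (CDF) matches that of `ref` (one LAB channel)."""
    s_vals, s_counts = np.unique(src.ravel(), return_counts=True)
    r_vals, r_counts = np.unique(ref.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size   # empirical CDF of the source channel
    r_cdf = np.cumsum(r_counts) / ref.size   # empirical CDF of the reference channel
    # For each source value, pick the reference value at the same CDF quantile
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    return mapped[np.searchsorted(s_vals, src.ravel())].reshape(src.shape)
```

Applying this to the A and B channels (with the reference image's A and B channels) and merging with the input L channel yields the histogram-matched colorization.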
Generator: U-Net with Skip Connections
- Input: 1x256x256 grayscale (L channel) image.
- Encoder: Conv2D blocks with filters [64, 128, 256, 512, 512], BatchNorm on all but the first block, LeakyReLU (0.2), and stride-2 downsampling.
- Bottleneck: Conv2D(512) + BatchNorm + ReLU.
- Decoder: ConvTranspose2D blocks with filters [512, 512, 256, 128, 64], BatchNorm throughout, Dropout(0.5) on the first two blocks, and skip connections from the encoder.
- Output: ConvTranspose2D -> 2x256x256 AB channels with Tanh activation.
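The generator spec above can be sketched in PyTorch as follows. This is a hedged sketch matching the listed filter counts, not the repository's exact code; `UNetGenerator`, `down`, and `up` are hypothetical names:

```python
import torch
import torch.nn as nn

def down(cin, cout, norm=True):
    """Stride-2 Conv2d block: halves spatial resolution."""
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1, bias=not norm)]
    if norm:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

def up(cin, cout, dropout=False):
    """Stride-2 ConvTranspose2d block: doubles spatial resolution."""
    layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1, bias=False),
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    """L channel (1x256x256) -> AB channels (2x256x256), Tanh output."""
    def __init__(self):
        super().__init__()
        # Encoder: filters [64, 128, 256, 512, 512]; no BatchNorm on the first block
        self.e1 = down(1, 64, norm=False)
        self.e2 = down(64, 128)
        self.e3 = down(128, 256)
        self.e4 = down(256, 512)
        self.e5 = down(512, 512)
        # Bottleneck: Conv2D(512) + BatchNorm + ReLU
        self.bott = nn.Sequential(
            nn.Conv2d(512, 512, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        # Decoder: filters [512, 512, 256, 128, 64]; Dropout on the first two blocks.
        # Input channels double where encoder skips are concatenated.
        self.d1 = up(512, 512, dropout=True)
        self.d2 = up(1024, 512, dropout=True)
        self.d3 = up(1024, 256)
        self.d4 = up(512, 128)
        self.d5 = up(256, 64)
        self.out = nn.Sequential(
            nn.ConvTranspose2d(128, 2, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, x):
        s1 = self.e1(x)
        s2 = self.e2(s1)
        s3 = self.e3(s2)
        s4 = self.e4(s3)
        s5 = self.e5(s4)
        d = self.d1(self.bott(s5))
        d = self.d2(torch.cat([d, s5], dim=1))
        d = self.d3(torch.cat([d, s4], dim=1))
        d = self.d4(torch.cat([d, s3], dim=1))
        d = self.d5(torch.cat([d, s2], dim=1))
        return self.out(torch.cat([d, s1], dim=1))
```

The skip connections concatenate each encoder feature map onto the matching decoder stage, which is why the later `up` blocks take twice the channel count.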
Discriminator: PatchGAN (70x70)
- Input: L channel concatenated with real or generated AB channels.
- Sequential Conv2D layers with filters [64, 128, 256, 512, 1]; BatchNorm from the second layer onward and LeakyReLU(0.2) activations.
- Produces a grid of real/fake logits, providing localized supervision per 70x70 patch.
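A matching PatchGAN sketch (again hypothetical naming, following the filter list above; three stride-2 layers followed by two stride-1 layers gives the 70x70 receptive field):

```python
import torch
import torch.nn as nn

class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN: emits a grid of real/fake logits, one per patch."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride, 1, bias=not norm)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(3, 64, 2, norm=False),   # L + AB = 3 input channels
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1),     # per-patch real/fake logits
        )

    def forward(self, l, ab):
        # Conditional discriminator: sees the grayscale input alongside the colors
        return self.net(torch.cat([l, ab], dim=1))
```

For a 256×256 input this produces a 30×30 logit grid, so the loss supervises local realism rather than a single whole-image decision.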
Loss Function
L_GAN = E[log D(x, y)] + E[log(1 - D(x, G(x)))]
L_L1 = E[||y - G(x)||_1]
L_Total = L_GAN + lambda_L1 * L_L1, where lambda_L1 = 50 (optimal from ablation)
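In training code these objectives are typically realized with `BCEWithLogitsLoss` on the PatchGAN logits plus an L1 term; a minimal sketch (function names are illustrative, not the project's API):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term on PatchGAN logit grids
l1 = nn.L1Loss()              # pixel-wise reconstruction term
lambda_l1 = 50.0              # optimal weight from the ablation study

def generator_loss(disc_fake_logits, fake_ab, real_ab):
    # The generator wants the discriminator to label its patches "real" (1)
    adv = bce(disc_fake_logits, torch.ones_like(disc_fake_logits))
    return adv + lambda_l1 * l1(fake_ab, real_ab)

def discriminator_loss(disc_real_logits, disc_fake_logits):
    real = bce(disc_real_logits, torch.ones_like(disc_real_logits))
    fake = bce(disc_fake_logits, torch.zeros_like(disc_fake_logits))
    return 0.5 * (real + fake)  # halve, as in the original Pix2Pix recipe
```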
| Technology | Version | Purpose |
|---|---|---|
| FastAPI | 0.104+ | Async REST API, auto-documentation |
| Uvicorn | 0.24+ | ASGI server, WebSocket support |
| Pydantic | 2.4+ | Data validation, settings management |
| Technology | Version | Purpose |
|---|---|---|
| pandas | 2.0+ | Tabular data, metric aggregation |
| Matplotlib | 3.7+ | Loss curves, result plots |
| seaborn | 0.12+ | Statistical visualizations |
| Pillow | 10.0+ | Image format handling |
| Technology | Purpose |
|---|---|
| Jupyter | Interactive experimentation |
| Git LFS | Large file storage (models, datasets) |
| tqdm | Progress bars, training monitoring |
| pytest | Unit testing, API validation |
| Component | Specification | Usage |
|---|---|---|
| GPU | RTX 5090 (32GB VRAM) | GAN training, inference |
| CUDA | 11.8+ | Parallel tensor operations |
| cuDNN | 8.6+ | Optimized deep learning primitives |
| CPU | Threadripper/Xeon | Classical method processing |
| RAM | 64GB DDR5 | Large dataset loading |
| Storage | NVMe SSD | Fast data I/O |
```
HISTCOLORIFUL/
│
├── HISTCOLORIFUL.ipynb             # Main experiment notebook (Sections 1-7)
├── project_summary.json            # High-level findings & configurations
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── .gitignore                      # Git exclusions
│
├── app/                            # FastAPI production service
│   ├── main.py                     # API endpoints & inference logic
│   ├── models.py                   # PyTorch model definitions
│   ├── utils.py                    # Image preprocessing utilities
│   ├── config.py                   # Environment configuration
│   └── static/                     # Frontend assets
│       ├── index.html              # Upload interface
│       ├── style.css               # UI styling
│       └── script.js               # Client-side logic
│
├── data/                           # Dataset (not committed, except test)
│   ├── train/                      # 5,000 training pairs (256×256 LAB)
│   ├── val/                        # 500 validation pairs
│   └── test/                       # 10 held-out test images
│
├── classical_results/              # Classical method outputs
│   ├── histogram_matching/         # 10 colorized + metrics
│   ├── kmeans_transfer/            # 10 colorized + metrics
│   ├── local_gaussian/             # 10 colorized + metrics
│   └── comparison_summary.csv      # Cross-method PSNR/SSIM
│
├── gan_results/                    # GAN outputs
│   ├── test_colorized/             # 10 generated images
│   ├── metrics.csv                 # Per-image PSNR, SSIM, time
│   └── training_history.pkl        # Loss curves, LR schedule
│
├── final_comparisons/              # Side-by-side visualizations
│   ├── comparison_image_01.png     # 5-panel comparison
│   ├── comparison_image_02.png
│   └── ...
│
├── model/                          # Saved checkpoints
│   ├── ablation_lambda50.pt        # Best GAN (λ=50, epoch 180)
│   ├── generator_only.pt           # Deployment-ready generator
│   └── training_config.json        # Hyperparameters
│
├── results_csv/                    # Quantitative analysis
│   ├── final_project_summary.csv   # Overall rankings
│   ├── ablation_study_summary.csv  # Hyperparameter sweep
│   ├── speed_comparison.csv        # Inference benchmarks
│   └── statistical_tests.csv       # T-tests, confidence intervals
│
└── results_png/                    # Visualizations
    ├── training_losses.png         # Loss curves
    ├── ablation_lambda_sweep.png   # PSNR vs lambda_L1
    ├── failure_cases.png           # Worst-performing images
    ├── final_results_dashboard.png # Composite summary
    └── timing_boxplots.png         # Speed distributions
```
```bash
git clone https://github.com/zeeza18/HistColoriful-Colorizing-Grayscale-Images-Using-Conditional-GANs.git
cd HistColoriful-Colorizing-Grayscale-Images-Using-Conditional-GANs
```

```bash
# Linux/macOS
python -m venv .venv
source .venv/bin/activate

# Windows PowerShell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

```bash
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```

Expected Output:

```
CUDA Available: True
```

```bash
jupyter notebook HISTCOLORIFUL.ipynb
```

Notebook Sections:
| Section | Description | Runtime |
|---|---|---|
| 1. Data Preparation | Load COCO subset, split train/val/test | ~5 min |
| 2. Classical Baselines | Run histogram, K-means, Gaussian methods | ~10 min |
| 3. GAN Training | Train Pix2Pix for 180 epochs | ~4 hours (GPU) |
| 4. Evaluation | Compute PSNR, SSIM, generate visualizations | ~15 min |
| 5. Ablation Studies | Sweep lambda_L1, learning rate, epochs | ~24 hours (GPU) |
| 6. Statistical Analysis | T-tests, confidence intervals, rankings | ~5 min |
| 7. Final Synthesis | Generate comparison grids, dashboards | ~10 min |
```python
import cv2

from app.models import load_generator
from app.utils import colorize_image

# Load best checkpoint
generator = load_generator('model/ablation_lambda50.pt')

# Colorize test image
grayscale = cv2.imread('data/test/image_01_gray.png', 0)  # read as grayscale
colorized = colorize_image(generator, grayscale)
cv2.imwrite('output.png', colorized)
```

Start the API server:

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Console Output:

```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Open browser to: http://localhost:8000
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serve web UI |
| `/api/info` | GET | List available methods |
| `/api/colorize` | POST | Upload & colorize image |
| `/health` | GET | System status check |
| `/docs` | GET | Interactive API documentation |
```bash
# Colorize with GAN (no reference needed)
curl -X POST "http://localhost:8000/api/colorize" \
  -F "grayscale=@test_gray.png" \
  -F "method=gan" \
  -o colorized_gan.png

# Colorize with K-means (requires reference)
curl -X POST "http://localhost:8000/api/colorize" \
  -F "grayscale=@test_gray.png" \
  -F "reference=@reference_color.png" \
  -F "method=kmeans" \
  -o colorized_kmeans.png
```

```python
import requests

url = "http://localhost:8000/api/colorize"
files = {
    'grayscale': open('input_gray.png', 'rb'),
    'reference': open('reference.png', 'rb')  # Optional; not used by the GAN
}
data = {'method': 'kmeans'}  # or 'gan', 'histogram', 'gaussian'

response = requests.post(url, files=files, data=data)
with open('output.png', 'wb') as f:
    f.write(response.content)
print("Colorization complete!")
```

| Method | PSNR ↑ | SSIM ↑ | Inference Time ↓ | Reference Required |
|---|---|---|---|---|
| K-means Transfer | 22.29 dB | 0.8908 | 145 ms | Yes |
| Histogram Matching | 21.85 dB | 0.8742 | 89 ms | Yes |
| Local Gaussian | 21.43 dB | 0.8621 | 178 ms | Yes |
| GAN (λ=50) | 20.56 dB | 0.8397 | 312 ms | No |
Benchmarked on RTX 5090, averaged over 10 test images
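PSNR figures like those above can be reproduced with a few lines of NumPy (SSIM is usually taken from `skimage.metrics.structural_similarity`). A minimal sketch for 8-bit images, with `psnr` as a hypothetical helper:

```python
import numpy as np

def psnr(reference: np.ndarray, result: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```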
lambda_L1 Hyperparameter Sweep:
| lambda_L1 | PSNR | SSIM | Training Stability | Notes |
|---|---|---|---|---|
| 10 | 18.92 dB | 0.7845 | ❌ Unstable | L1 weight too low; mode collapse |
| 25 | 19.74 dB | 0.8156 | ✅ Stable | Underfitting |
| 50 | 20.56 dB | 0.8397 | ✅ Stable | Optimal balance |
| 100 | 20.21 dB | 0.8302 | ✅ Stable | Over-regularized |
| 150 | 19.68 dB | 0.8189 | ✅ Stable | Blurry outputs |
| 200 | 18.34 dB | 0.7901 | ❌ Unstable | Severe overfitting |
Example colorization: Grayscale -> Histogram -> K-means -> Gaussian -> GAN -> Ground Truth
Classical Methods Struggle With:
- Scenes with unusual color palettes (no good reference)
- Abstract textures without clear semantic boundaries
- Low-light images with poor L-channel separation

GAN Struggles With:
- Fine architectural details (mode averaging)
- Skin tones (dataset bias toward outdoor scenes)
- Text/signage (semantic understanding required)
Test Configuration: RTX 5090, 256×256 images, averaged over 100 runs
| Method | CPU Time | GPU Time | Speedup | Memory |
|---|---|---|---|---|
| Histogram Matching | 89 ms | N/A (CPU-only) | 1.00× | 45 MB |
| K-means Transfer | 145 ms | N/A (CPU-only) | 1.00× | 78 MB |
| Local Gaussian | 178 ms | N/A (CPU-only) | 1.00× | 92 MB |
| GAN | 1,247 ms | 312 ms | 4.0× | 1.2 GB |
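A timing harness for numbers like these can be sketched as follows (`benchmark` is a hypothetical helper, not the project's code; for GPU inference the commented synchronization step matters, since CUDA kernel launches are asynchronous):

```python
import time

def benchmark(fn, runs: int = 100, warmup: int = 5) -> float:
    """Average wall-clock time per call, in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and, for GPU work, compile/load CUDA kernels
    start = time.perf_counter()
    for _ in range(runs):
        fn()
        # For GPU inference, also call torch.cuda.synchronize() here so the
        # timer measures completed kernels rather than just kernel launches.
    return (time.perf_counter() - start) / runs * 1000.0
```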
Paired t-tests (p < 0.05):
| Comparison | PSNR Δ | Significant? | Effect Size (Cohen's d) |
|---|---|---|---|
| K-means vs GAN | +1.73 dB | ✅ Yes | 0.82 (large) |
| Histogram vs GAN | +1.29 dB | ✅ Yes | 0.67 (medium) |
| K-means vs Histogram | +0.44 dB | ❌ No | 0.21 (small) |
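The paired t statistic and Cohen's d reduce to a few lines of NumPy over the per-image differences (`scipy.stats.ttest_rel` gives the same t along with a p-value). A sketch, with `paired_t_and_d` as a hypothetical helper:

```python
import math
import numpy as np

def paired_t_and_d(scores_a, scores_b):
    """Paired t statistic and Cohen's d for per-image metric differences."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = d.size
    sd = d.std(ddof=1)                    # sample std of the paired differences
    t = d.mean() / (sd / math.sqrt(n))    # paired t statistic, df = n - 1
    cohens_d = d.mean() / sd              # standardized effect size
    return t, cohens_d
```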
| Component | Parameters | Disk Size | Quantization Support |
|---|---|---|---|
| Full GAN (G+D) | 54.3M | 208 MB | ✅ FP16, INT8 |
| Generator Only | 27.1M | 104 MB | ✅ FP16, INT8 |
| Discriminator | 27.2M | 104 MB | N/A (training only) |
- Institution: DePaul University
- Course: Introduction to Image Processing (CSC 381/481)
- Quarter: Winter 2025
- Instructor: Kenny Davila
- ✅ Implement classical color transfer algorithms
- ✅ Design and train conditional GANs
- ✅ Conduct rigorous ablation studies
- ✅ Apply statistical hypothesis testing
- ✅ Deploy ML models via REST APIs
- ✅ Document research methodology
- Pix2Pix: Isola et al. (2017) - "Image-to-Image Translation with Conditional Adversarial Networks"
- U-Net: Ronneberger et al. (2015) - "U-Net: Convolutional Networks for Biomedical Image Segmentation"
- PatchGAN: Li & Wand (2016) - "Combining Markov Random Fields and Convolutional Neural Networks"
- Color Transfer: Reinhard et al. (2001) - "Color Transfer between Images"
```bibtex
@misc{histcoloriful2025,
  title={HISTCOLORIFUL: A Comparative Study of Classical and Deep Learning Image Colorization},
  author={Mohammed Azeezulla},
  year={2025},
  institution={DePaul University},
  course={Introduction to Image Processing (CSC 381/481)},
  howpublished={\url{https://github.com/zeeza18/HISTCOLORIFUL}}
}
```

This is an academic project, but suggestions are welcome!
- Check existing issues
- Open new issue with detailed description
- Include error logs, environment details
- New classical method? Add to `app/main.py`
- Architecture improvement? Modify `app/models.py`
- Better evaluation metric? Update notebook Section 4
```bash
# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black app/ --line-length 88

# Type checking
mypy app/
```

This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Mohammed Azeezulla
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- PyTorch Team - Excellent deep learning framework
- COCO Dataset - High-quality training data
- FastAPI Team - Modern web framework
- Research Community - Pix2Pix, U-Net, PatchGAN papers
Developer: Mohammed Azeezulla
Email: mdazeezulla2001@gmail.com
GitHub: @zeeza18
LinkedIn: Not provided
