A comprehensive capstone project comparing classical color transfer techniques against custom GAN architecture for automatic image colorization
Maintained by Mohammed Azeezulla (zeeza18) · Introduction to Image Processing
- Overview
- Key Features
- Architecture
- Technologies Used
- Project Structure
- Quick Start
- Running Experiments
- FastAPI Demo
- Results
- Performance Metrics
- Academic Context
- Contributing
- License
HISTCOLORIFUL is a comprehensive image colorization research project that answers a fundamental question: How far can classical computer vision methods go before deep learning becomes necessary?
Built on a carefully curated 256×256 LAB subset of the COCO dataset with 10 held-out test images, this project systematically compares:
- Classical Methods: Histogram Matching, K-means Color Transfer, Local Gaussian Transfer
- Deep Learning: Custom Pix2Pix GAN with U-Net Generator + PatchGAN Discriminator
- How do interpretable classical algorithms perform vs. trainable GANs?
- What hyperparameters optimize GAN colorization quality?
- Can we quantify the speed vs. quality tradeoff?
- What are the failure modes of each approach?
| Method | PSNR (dB) | SSIM | Speed | Reference Required |
|---|---|---|---|---|
| K-means Transfer | 22.29 | 0.8908 | Slow | Yes |
| GAN (λ=50) | 20.56 | 0.8397 | Very Fast | No |
| Histogram Matching | 21.85 | 0.8742 | Fast | Yes |
| Local Gaussian | 21.43 | 0.8621 | Fast | Yes |
- ✅ Histogram Matching - CDF alignment per LAB channel
- ✅ K-means Color Transfer - 8-cluster palette mapping
- ✅ Local Gaussian Transfer - Windowed statistics matching
- ✅ Vectorized NumPy operations
- ✅ Per-image metric logging
- ✅ Pix2Pix GAN Architecture - U-Net generator with skip connections
- ✅ PatchGAN Discriminator - 70x70 receptive field
- ✅ Mixed Precision Training - FP16 for faster convergence
- ✅ Comprehensive Ablation Studies - 6 hyperparameter configurations
- ✅ Checkpoint Management - Best model serialization
- ✅ PSNR & SSIM Metrics - Quantitative quality assessment
- ✅ Statistical Testing - Paired t-tests, confidence intervals
- ✅ Speed Benchmarking - CPU vs GPU inference timing
- ✅ Visual Comparisons - Side-by-side method grids
- ✅ Failure Case Analysis - Worst-performing image documentation
- ✅ FastAPI REST API - Async colorization endpoints
- ✅ Interactive Web UI - HTML/CSS/JS upload interface
- ✅ Multi-method Support - Classical + GAN inference
- ✅ Health Monitoring - System status endpoints
- ✅ Auto-generated Docs - OpenAPI/Swagger UI
- Convert the grayscale L channel to LAB to operate in a perceptually uniform space.
- When classical methods are selected, pull color statistics from a user-provided reference image.
- Apply the selected algorithm:
- Histogram Matching aligns per-channel LAB histograms.
- K-means Transfer builds an 8-color palette from the reference image.
- Local Gaussian Transfer matches local mean/variance windows.
- Merge the predicted AB channels with the input L channel and convert back to RGB.
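The histogram-matching step in this pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual implementation; `match_channel` is a hypothetical helper that operates on one LAB channel at a time:

```python
import numpy as np

def match_channel(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Remap `src` so its histogram (CDF) matches that of `ref` (one LAB channel)."""
    s_vals, s_counts = np.unique(src.ravel(), return_counts=True)
    r_vals, r_counts = np.unique(ref.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size   # empirical CDF of the source channel
    r_cdf = np.cumsum(r_counts) / ref.size   # empirical CDF of the reference channel
    # For each source value, pick the reference value at the same CDF quantile
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    return mapped[np.searchsorted(s_vals, src.ravel())].reshape(src.shape)
```

Applying this to the A and B channels (with the reference image's A and B channels) and merging with the input L channel yields the histogram-matched colorization.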
Generator: U-Net with Skip Connections
- Input: 1x256x256 grayscale (L channel) image.
- Encoder: Conv2D blocks with filters [64, 128, 256, 512, 512], BatchNorm on all but the first block, LeakyReLU (0.2), and stride-2 downsampling.
- Bottleneck: Conv2D(512) + BatchNorm + ReLU.
- Decoder: ConvTranspose2D blocks with filters [512, 512, 256, 128, 64], BatchNorm throughout, Dropout(0.5) on the first two blocks, and skip connections from the encoder.
- Output: ConvTranspose2D -> 2x256x256 AB channels with Tanh activation.
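The generator spec above can be sketched in PyTorch as follows. This is a hedged sketch matching the listed filter counts, not the repository's exact code; `UNetGenerator`, `down`, and `up` are hypothetical names:

```python
import torch
import torch.nn as nn

def down(cin, cout, norm=True):
    """Stride-2 Conv2d block: halves spatial resolution."""
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1, bias=not norm)]
    if norm:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

def up(cin, cout, dropout=False):
    """Stride-2 ConvTranspose2d block: doubles spatial resolution."""
    layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1, bias=False),
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    """L channel (1x256x256) -> AB channels (2x256x256), Tanh output."""
    def __init__(self):
        super().__init__()
        # Encoder: filters [64, 128, 256, 512, 512]; no BatchNorm on the first block
        self.e1 = down(1, 64, norm=False)
        self.e2 = down(64, 128)
        self.e3 = down(128, 256)
        self.e4 = down(256, 512)
        self.e5 = down(512, 512)
        # Bottleneck: Conv2D(512) + BatchNorm + ReLU
        self.bott = nn.Sequential(
            nn.Conv2d(512, 512, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        # Decoder: filters [512, 512, 256, 128, 64]; Dropout on the first two blocks.
        # Input channels double where encoder skips are concatenated.
        self.d1 = up(512, 512, dropout=True)
        self.d2 = up(1024, 512, dropout=True)
        self.d3 = up(1024, 256)
        self.d4 = up(512, 128)
        self.d5 = up(256, 64)
        self.out = nn.Sequential(
            nn.ConvTranspose2d(128, 2, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, x):
        s1 = self.e1(x)
        s2 = self.e2(s1)
        s3 = self.e3(s2)
        s4 = self.e4(s3)
        s5 = self.e5(s4)
        d = self.d1(self.bott(s5))
        d = self.d2(torch.cat([d, s5], dim=1))
        d = self.d3(torch.cat([d, s4], dim=1))
        d = self.d4(torch.cat([d, s3], dim=1))
        d = self.d5(torch.cat([d, s2], dim=1))
        return self.out(torch.cat([d, s1], dim=1))
```

The skip connections concatenate each encoder feature map onto the matching decoder stage, which is why the later `up` blocks take twice the channel count.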
Discriminator: PatchGAN (70x70)
- Input: L channel concatenated with real or generated AB channels.
- Sequential Conv2D layers with filters [64, 128, 256, 512, 1]; BatchNorm from the second layer onward and LeakyReLU(0.2) activations.
- Produces a grid of real/fake logits, providing localized supervision per 70x70 patch.
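A matching PatchGAN sketch (again hypothetical naming, following the filter list above; three stride-2 layers followed by two stride-1 layers gives the 70x70 receptive field):

```python
import torch
import torch.nn as nn

class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN: emits a grid of real/fake logits, one per patch."""
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride, 1, bias=not norm)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(3, 64, 2, norm=False),   # L + AB = 3 input channels
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, 4, 1, 1),     # per-patch real/fake logits
        )

    def forward(self, l, ab):
        # Conditional discriminator: sees the grayscale input alongside the colors
        return self.net(torch.cat([l, ab], dim=1))
```

For a 256×256 input this produces a 30×30 logit grid, so the loss supervises local realism rather than a single whole-image decision.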
Loss Function
L_GAN = E[log D(x, y)] + E[log(1 - D(x, G(x)))]
L_L1 = E[||y - G(x)||_1]
L_Total = L_GAN + lambda_L1 * L_L1, where lambda_L1 = 50 (optimal from ablation)
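In training code these objectives are typically realized with `BCEWithLogitsLoss` on the PatchGAN logits plus an L1 term; a minimal sketch (function names are illustrative, not the project's API):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term on PatchGAN logit grids
l1 = nn.L1Loss()              # pixel-wise reconstruction term
lambda_l1 = 50.0              # optimal weight from the ablation study

def generator_loss(disc_fake_logits, fake_ab, real_ab):
    # The generator wants the discriminator to label its patches "real" (1)
    adv = bce(disc_fake_logits, torch.ones_like(disc_fake_logits))
    return adv + lambda_l1 * l1(fake_ab, real_ab)

def discriminator_loss(disc_real_logits, disc_fake_logits):
    real = bce(disc_real_logits, torch.ones_like(disc_real_logits))
    fake = bce(disc_fake_logits, torch.zeros_like(disc_fake_logits))
    return 0.5 * (real + fake)  # halve, as in the original Pix2Pix recipe
```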
| Technology | Version | Purpose |
|---|---|---|
| FastAPI | 0.104+ | Async REST API, auto-documentation |
| Uvicorn | 0.24+ | ASGI server, WebSocket support |
| Pydantic | 2.4+ | Data validation, settings management |
| Technology | Version | Purpose |
|---|---|---|
| pandas | 2.0+ | Tabular data, metric aggregation |
| Matplotlib | 3.7+ | Loss curves, result plots |
| seaborn | 0.12+ | Statistical visualizations |
| Pillow | 10.0+ | Image format handling |
| Technology | Purpose |
|---|---|
| Jupyter | Interactive experimentation |
| Git LFS | Large file storage (models, datasets) |
| tqdm | Progress bars, training monitoring |
| pytest | Unit testing, API validation |
| Component | Specification | Usage |
|---|---|---|
| GPU | RTX 5090 (32GB VRAM) | GAN training, inference |
| CUDA | 11.8+ | Parallel tensor operations |
| cuDNN | 8.6+ | Optimized deep learning primitives |
| CPU | Threadripper/Xeon | Classical method processing |
| RAM | 64GB DDR5 | Large dataset loading |
| Storage | NVMe SSD | Fast data I/O |
```
HISTCOLORIFUL/
│
├── HISTCOLORIFUL.ipynb             # Main experiment notebook (Sections 1-7)
├── project_summary.json            # High-level findings & configurations
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── .gitignore                      # Git exclusions
│
├── app/                            # FastAPI production service
│   ├── main.py                     # API endpoints & inference logic
│   ├── models.py                   # PyTorch model definitions
│   ├── utils.py                    # Image preprocessing utilities
│   ├── config.py                   # Environment configuration
│   └── static/                     # Frontend assets
│       ├── index.html              # Upload interface
│       ├── style.css               # UI styling
│       └── script.js               # Client-side logic
│
├── data/                           # Dataset (not committed, except test)
│   ├── train/                      # 5,000 training pairs (256×256 LAB)
│   ├── val/                        # 500 validation pairs
│   └── test/                       # 10 held-out test images
│
├── classical_results/              # Classical method outputs
│   ├── histogram_matching/         # 10 colorized + metrics
│   ├── kmeans_transfer/            # 10 colorized + metrics
│   ├── local_gaussian/             # 10 colorized + metrics
│   └── comparison_summary.csv      # Cross-method PSNR/SSIM
│
├── gan_results/                    # GAN outputs
│   ├── test_colorized/             # 10 generated images
│   ├── metrics.csv                 # Per-image PSNR, SSIM, time
│   └── training_history.pkl        # Loss curves, LR schedule
│
├── final_comparisons/              # Side-by-side visualizations
│   ├── comparison_image_01.png     # 5-panel comparison
│   ├── comparison_image_02.png
│   └── ...
│
├── model/                          # Saved checkpoints
│   ├── ablation_lambda50.pt        # Best GAN (λ=50, epoch 180)
│   ├── generator_only.pt           # Deployment-ready generator
│   └── training_config.json        # Hyperparameters
│
├── results_csv/                    # Quantitative analysis
│   ├── final_project_summary.csv   # Overall rankings
│   ├── ablation_study_summary.csv  # Hyperparameter sweep
│   ├── speed_comparison.csv        # Inference benchmarks
│   └── statistical_tests.csv       # T-tests, confidence intervals
│
└── results_png/                    # Visualizations
    ├── training_losses.png         # Loss curves
    ├── ablation_lambda_sweep.png   # PSNR vs lambda_L1
    ├── failure_cases.png           # Worst-performing images
    ├── final_results_dashboard.png # Composite summary
    └── timing_boxplots.png         # Speed distributions
```
```bash
git clone https://github.com/zeeza18/HistColoriful-Colorizing-Grayscale-Images-Using-Conditional-GANs.git
cd HistColoriful-Colorizing-Grayscale-Images-Using-Conditional-GANs
```

```bash
# Linux/macOS
python -m venv .venv
source .venv/bin/activate

# Windows PowerShell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

```bash
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
```

Expected Output:

```
CUDA Available: True
```

```bash
jupyter notebook HISTCOLORIFUL.ipynb
```

Notebook Sections:
| Section | Description | Runtime |
|---|---|---|
| 1. Data Preparation | Load COCO subset, split train/val/test | ~5 min |
| 2. Classical Baselines | Run histogram, K-means, Gaussian methods | ~10 min |
| 3. GAN Training | Train Pix2Pix for 180 epochs | ~4 hours (GPU) |
| 4. Evaluation | Compute PSNR, SSIM, generate visualizations | ~15 min |
| 5. Ablation Studies | Sweep lambda_L1, learning rate, epochs | ~24 hours (GPU) |
| 6. Statistical Analysis | T-tests, confidence intervals, rankings | ~5 min |
| 7. Final Synthesis | Generate comparison grids, dashboards | ~10 min |
```python
import cv2

from app.models import load_generator
from app.utils import colorize_image

# Load best checkpoint
generator = load_generator('model/ablation_lambda50.pt')

# Colorize test image
grayscale = cv2.imread('data/test/image_01_gray.png', 0)  # read as grayscale
colorized = colorize_image(generator, grayscale)
cv2.imwrite('output.png', colorized)
```

Start the API server:

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Console Output:

```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Open browser to: http://localhost:8000
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serve web UI |
| `/api/info` | GET | List available methods |
| `/api/colorize` | POST | Upload & colorize image |
| `/health` | GET | System status check |
| `/docs` | GET | Interactive API documentation |
```bash
# Colorize with GAN (no reference needed)
curl -X POST "http://localhost:8000/api/colorize" \
  -F "grayscale=@test_gray.png" \
  -F "method=gan" \
  -o colorized_gan.png

# Colorize with K-means (requires reference)
curl -X POST "http://localhost:8000/api/colorize" \
  -F "grayscale=@test_gray.png" \
  -F "reference=@reference_color.png" \
  -F "method=kmeans" \
  -o colorized_kmeans.png
```

```python
import requests

url = "http://localhost:8000/api/colorize"
files = {
    'grayscale': open('input_gray.png', 'rb'),
    'reference': open('reference.png', 'rb')  # Optional; not used by the GAN
}
data = {'method': 'kmeans'}  # or 'gan', 'histogram', 'gaussian'

response = requests.post(url, files=files, data=data)
with open('output.png', 'wb') as f:
    f.write(response.content)
print("Colorization complete!")
```

| Method | PSNR ↑ | SSIM ↑ | Inference Time ↓ | Reference Required |
|---|---|---|---|---|
| K-means Transfer | 22.29 dB | 0.8908 | 145 ms | Yes |
| Histogram Matching | 21.85 dB | 0.8742 | 89 ms | Yes |
| Local Gaussian | 21.43 dB | 0.8621 | 178 ms | Yes |
| GAN (λ=50) | 20.56 dB | 0.8397 | 312 ms | No |
Benchmarked on RTX 5090, averaged over 10 test images
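PSNR figures like those above can be reproduced with a few lines of NumPy (SSIM is usually taken from `skimage.metrics.structural_similarity`). A minimal sketch for 8-bit images, with `psnr` as a hypothetical helper:

```python
import numpy as np

def psnr(reference: np.ndarray, result: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```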
lambda_L1 Hyperparameter Sweep:
| lambda_L1 | PSNR | SSIM | Training Stability | Notes |
|---|---|---|---|---|
| 10 | 18.92 dB | 0.7845 | ❌ Unstable | L1 weight too low; mode collapse |
| 25 | 19.74 dB | 0.8156 | ✅ Stable | Underfitting |
| 50 | 20.56 dB | 0.8397 | ✅ Stable | Optimal balance |
| 100 | 20.21 dB | 0.8302 | ✅ Stable | Over-regularized |
| 150 | 19.68 dB | 0.8189 | ✅ Stable | Blurry outputs |
| 200 | 18.34 dB | 0.7901 | ❌ Unstable | Severe overfitting |
Example colorization: Grayscale -> Histogram -> K-means -> Gaussian -> GAN -> Ground Truth
Classical Methods Struggle With:
- Scenes with unusual color palettes (no good reference)
- Abstract textures without clear semantic boundaries
- Low-light images with poor L-channel separation

GAN Struggles With:
- Fine architectural details (mode averaging)
- Skin tones (dataset bias toward outdoor scenes)
- Text/signage (semantic understanding required)
Test Configuration: RTX 5090, 256×256 images, averaged over 100 runs
| Method | CPU Time | GPU Time | Speedup | Memory |
|---|---|---|---|---|
| Histogram Matching | 89 ms | N/A (CPU-only) | 1.00× | 45 MB |
| K-means Transfer | 145 ms | N/A (CPU-only) | 1.00× | 78 MB |
| Local Gaussian | 178 ms | N/A (CPU-only) | 1.00× | 92 MB |
| GAN | 1,247 ms | 312 ms | 4.0× | 1.2 GB |
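A timing harness for numbers like these can be sketched as follows (`benchmark` is a hypothetical helper, not the project's code; for GPU inference the commented synchronization step matters, since CUDA kernel launches are asynchronous):

```python
import time

def benchmark(fn, runs: int = 100, warmup: int = 5) -> float:
    """Average wall-clock time per call, in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and, for GPU work, compile/load CUDA kernels
    start = time.perf_counter()
    for _ in range(runs):
        fn()
        # For GPU inference, also call torch.cuda.synchronize() here so the
        # timer measures completed kernels rather than just kernel launches.
    return (time.perf_counter() - start) / runs * 1000.0
```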
Paired t-tests (p < 0.05):
| Comparison | PSNR Δ | Significant? | Effect Size (Cohen's d) |
|---|---|---|---|
| K-means vs GAN | +1.73 dB | ✅ Yes | 0.82 (large) |
| Histogram vs GAN | +1.29 dB | ✅ Yes | 0.67 (medium) |
| K-means vs Histogram | +0.44 dB | ❌ No | 0.21 (small) |
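The paired t statistic and Cohen's d reduce to a few lines of NumPy over the per-image differences (`scipy.stats.ttest_rel` gives the same t along with a p-value). A sketch, with `paired_t_and_d` as a hypothetical helper:

```python
import math
import numpy as np

def paired_t_and_d(scores_a, scores_b):
    """Paired t statistic and Cohen's d for per-image metric differences."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = d.size
    sd = d.std(ddof=1)                    # sample std of the paired differences
    t = d.mean() / (sd / math.sqrt(n))    # paired t statistic, df = n - 1
    cohens_d = d.mean() / sd              # standardized effect size
    return t, cohens_d
```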
| Component | Parameters | Disk Size | Quantization Support |
|---|---|---|---|
| Full GAN (G+D) | 54.3M | 208 MB | ✅ FP16, INT8 |
| Generator Only | 27.1M | 104 MB | ✅ FP16, INT8 |
| Discriminator | 27.2M | 104 MB | N/A (training only) |
- Institution: DePaul University
- Course: Introduction to Image Processing (CSC 381/481)
- Quarter: Winter 2025
- Instructor: Kenny Davila
- ✅ Implement classical color transfer algorithms
- ✅ Design and train conditional GANs
- ✅ Conduct rigorous ablation studies
- ✅ Apply statistical hypothesis testing
- ✅ Deploy ML models via REST APIs
- ✅ Document research methodology
- Pix2Pix: Isola et al. (2017) - "Image-to-Image Translation with Conditional Adversarial Networks"
- U-Net: Ronneberger et al. (2015) - "U-Net: Convolutional Networks for Biomedical Image Segmentation"
- PatchGAN: Li & Wand (2016) - "Combining Markov Random Fields and Convolutional Neural Networks"
- Color Transfer: Reinhard et al. (2001) - "Color Transfer between Images"
```bibtex
@misc{histcoloriful2025,
  title={HISTCOLORIFUL: A Comparative Study of Classical and Deep Learning Image Colorization},
  author={Mohammed Azeezulla},
  year={2025},
  institution={DePaul University},
  course={Introduction to Image Processing (CSC 381/481)},
  howpublished={\url{https://github.com/zeeza18/HISTCOLORIFUL}}
}
```

This is an academic project, but suggestions are welcome!
- Check existing issues
- Open new issue with detailed description
- Include error logs, environment details
- New classical method? Add to `app/main.py`
- Architecture improvement? Modify `app/models.py`
- Better evaluation metric? Update notebook Section 4
```bash
# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black app/ --line-length 88

# Type checking
mypy app/
```

This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Mohammed Azeezulla
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- PyTorch Team - Excellent deep learning framework
- COCO Dataset - High-quality training data
- FastAPI Team - Modern web framework
- Research Community - Pix2Pix, U-Net, PatchGAN papers
Developer: Mohammed Azeezulla
Email: mdazeezulla2001@gmail.com
GitHub: @zeeza18
LinkedIn: Not provided
