MPKx

LGN-inspired vision with stride-based M/P/K pathways, K-gating, and binocular processing

Benchmarks • Architecture • AutoResearch Results • Quick Start • License

874K params full model • 15.5K Pi model • 33 FPS on RPi5 • 0.89MB tiny • No pretraining

MPKx implements the parallel visual pathways of the mammalian Lateral Geniculate Nucleus (LGN). Unlike standard CNNs with uniform stride, MPKx differentiates M, P, and K pathways by spatial sampling density — mirroring how biological M cells tile space more sparsely than P cells.

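Key insight: same 5×5 kernel, different Fibonacci strides (2:3:5). The ratios between adjacent strides (3/2 = 1.5, 5/3 ≈ 1.67) approximate the golden ratio (≈ 1.618), so the pathways' output resolutions differ by roughly that factor and each pathway gets a naturally different field of view.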

	ResNet-18	MPKx
Parameters	11.2M	0.87M
CIFAR-100	~45%	46.5%
STL-10	~70%	74.9%
Caltech-101	~60%	59.5%
Pretraining	Requires ImageNet	None (from scratch)

13x fewer parameters, competitive accuracy, no pretraining, minimal augmentation.

Benchmarks

Image Classification (Single-Pass, No Feedback)

Dataset	Classes	Resolution	Val Acc	Params	Notes
STL-10	10	224×224	74.9%	0.86M	Only 5K train samples, from scratch
CIFAR-100	100	224×224	46.5%	0.87M	224px upsampled, no extra augmentation
Caltech-101	101	224×224	59.5%	0.87M	30 train images/class, full 101 classes
TinyImageNet	200	64×64	40.6%	0.21M	ResNet-18 gets ~41.5% with 52x more params
Kvasir-v2	8	224×224	89%	0.21M	Medical endoscopy
ImageNet-100	100	224×224	60.8%	0.54M
Fashion-MNIST	10	224×224	—	—	Also supported via train_mpk_smallvision.py

Video Classification (Temporal Variant)

Dataset	Classes	Resolution	Accuracy	Params	Notes
UCF-101	101	112×112	77%	0.58M	8-frame temporal M-pathway, from scratch

Edge Deployment

Device	Model	Size	FPS	Accuracy
Raspberry Pi 5 (no heatsink)	MPKx-Pi	76KB	33	82%
MacBook M3	MPKx	0.89MB	200+	89%

Finding: Augmentation Hurts at Small Scales

Dataset	No Augmentation	With Augmentation	Change (pp)
CIFAR-100 (32×32)	52.8%	46.0%	-6.8
TinyImageNet (64×64)	40.6%	24.1%	-16.5
ImageNet-100 (224×224)	60.8%	~62%	+1 to +2

This is consistent with NetAug (Cai et al., 2022): regularization hurts tiny models because they underfit rather than overfit. At small resolutions, the Fibonacci-stride pathways may already provide enough multi-scale coverage that augmentation only adds noise.

AutoResearch Results

The experiments/autoresearch/ directory contains a three-dataset benchmark pipeline designed to test MPKx across different vision domains with consistent hyperparameters:

experiments/autoresearch/
  model.py                          # MPKx architecture (standalone)
  train_cifar100.py                 # CIFAR-100 harness (100 classes)
  train_mpk_smallvision.py          # STL-10 / Caltech-101 / Fashion-MNIST harness
  run_mpk_three_datasets_rtx6000.sh # Sequential pipeline runner
  training_curves/                  # Per-epoch validation curves

Full Results (2026-04-28, RTX 6000)

Shared hyperparameters: batch_size=80, channels=56, lr=0.004, cosine schedule, weight_decay=0.01, horizontal flip only, 100 epochs each.

Dataset	Classes	Steps	Val Acc	Val Top-5	Params	Peak VRAM	Runtime
CIFAR-100	100	62,500	46.49%	76.95%	874K	3,195 MB	~1h27m
STL-10	10	6,200	74.91%	98.42%	864K	3,195 MB	~14m
Caltech-101	101	3,200	59.46%	78.12%	874K	3,195 MB	~9m

All three datasets fit in the same 3.2 GB VRAM footprint. The model trains from scratch with no pretrained weights — each dataset starts with random initialization.
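The step counts and learning-rate schedule above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not the repository's training loop: it assumes the last partial batch of each epoch is dropped, that the cosine schedule has no warmup and decays to zero, and the helper names `steps_for_run` and `cosine_lr` are hypothetical. Under those assumptions it reproduces the CIFAR-100 and STL-10 step counts in the table.

```python
import math

def steps_for_run(num_train: int, batch_size: int = 80, epochs: int = 100) -> int:
    """Optimizer steps for a run, assuming the last partial batch is dropped."""
    return (num_train // batch_size) * epochs

def cosine_lr(step: int, total: int, base_lr: float = 0.004, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given step (no warmup: an assumption)."""
    progress = min(max(step / total, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

cifar_steps = steps_for_run(50_000)  # CIFAR-100 has 50,000 training images
stl_steps = steps_for_run(5_000)     # STL-10 has 5,000 labeled training images
print(cifar_steps, stl_steps)

# Learning rate at the start, midpoint, and end of the CIFAR-100 run
for step in (0, cifar_steps // 2, cifar_steps):
    print(step, round(cosine_lr(step, cifar_steps), 6))
```

The learning rate starts at the shared base value of 0.004, passes through half of it at the midpoint, and reaches the assumed floor of zero at the final step.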

Key observations:

STL-10 (74.9%) is the strongest result, achieved with only 5K training samples across 10 classes. Training loss converges to near zero (~0.02) from about epoch 60 onward, while validation accuracy plateaus around 74-75%.
CIFAR-100 (46.5%) improves steadily across all 100 epochs; the best single-epoch val_acc is 47.1% at epoch 96. Loss drops from 4.25 to 0.76, with no sign of overfitting despite 0% dropout.
Caltech-101 (59.5%) is the hardest setting: 101 classes with only 30 training images each, yet top-5 accuracy reaches 78%. Validation accuracy plateaus around epoch 80, suggesting that longer training would not help without more data.
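For readers unfamiliar with the Val Top-5 column, the metric counts a prediction as correct when the true label is among the five highest-scoring classes. The sketch below is a plain-Python illustration on a made-up toy batch; the function name `topk_accuracy` and the data are hypothetical, and the repository's own evaluation code may compute it differently.

```python
def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(logits, labels):
        # Class indices sorted from highest to lowest score
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Toy batch (illustrative values only): 3 samples, 6 classes
logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0, 0.5],  # true class 1 has the highest score
    [1.5, 0.2, 0.9, 0.1, 0.0, -0.3],  # true class 2 has the second-highest score
    [0.9, 0.8, 0.7, 0.6, 0.5, -2.0],  # true class 5 has the lowest score
]
labels = [1, 2, 5]

top1 = topk_accuracy(logits, labels, k=1)
top5 = topk_accuracy(logits, labels, k=5)
print(f"top-1: {top1:.3f}, top-5: {top5:.3f}")
```

The gap between top-1 and top-5 grows when the model ranks the right class near the top without placing it first, which is the pattern the Caltech-101 numbers suggest.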

Detailed per-epoch curves: docs/results/2026-04-28_three-dataset-benchmark.md

Quick Start

import torch
from model import MPKx

# Create model (adjust num_classes for your dataset)
model = MPKx(num_classes=10, ch=56, use_stereo=True)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
# Output: ~864K for 10 classes

# Inference
x = torch.randn(1, 3, 224, 224)
out = model(x)  # [1, num_classes]

Training

# Clone and install
git clone https://github.com/DJLougen/MPKnet.git
cd MPKnet
pip install -r requirements.txt

# Run the three-dataset benchmark pipeline (requires RTX 6000 or similar)
bash experiments/autoresearch/run_mpk_three_datasets_rtx6000.sh

# Or train individual datasets:

# CIFAR-100
python experiments/autoresearch/train_cifar100.py

# STL-10 / Caltech-101 / Fashion-MNIST
python experiments/autoresearch/train_mpk_smallvision.py --dataset stl10 --epochs 100
python experiments/autoresearch/train_mpk_smallvision.py --dataset caltech101 --epochs 100

Architecture

MPKx models the Lateral Geniculate Nucleus (LGN) — the relay station between retina and visual cortex.

The Three Pathways

Pathway	Biological Role	Implementation	What It Captures
M (Magnocellular)	~10% of LGN, motion, global gist	Stride 5 (coarse)	Shape, motion, layout
P (Parvocellular)	~80% of LGN, fine detail, color	Stride 2 (fine)	Texture, edges, color
K (Koniocellular)	~10% of LGN, projects to M and P	Stride 3 (intermediate)	Context-dependent gating
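To make the effect of the three strides concrete, the sketch below applies the standard convolution output-size formula to a 224×224 input. It assumes a single 5×5 convolution per pathway with padding 2 and dilation 1; the padding value and the helper name `conv_out_size` are assumptions for illustration, not taken from the repository.

```python
def conv_out_size(size: int, kernel: int = 5, stride: int = 1, padding: int = 2) -> int:
    """Standard convolution output-size formula for dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Strides from the table above; padding=2 is an assumption
STRIDES = {"P": 2, "K": 3, "M": 5}
sizes = {name: conv_out_size(224, stride=s) for name, s in STRIDES.items()}

ratio_pk = sizes["P"] / sizes["K"]  # fine vs. intermediate resolution
ratio_km = sizes["K"] / sizes["M"]  # intermediate vs. coarse resolution
golden = (1 + 5 ** 0.5) / 2

print(sizes)
print(f"P/K = {ratio_pk:.3f}, K/M = {ratio_km:.3f}, golden ratio = {golden:.3f}")
```

Under these assumptions the P, K, and M feature maps come out at 112, 75, and 45 pixels per side, so the two resolution ratios bracket the golden ratio rather than matching it exactly.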

Processing Pipeline

input → stereo disparity → retinal preprocessing (center-surround)
  → M (stride 5) ──┐
  → P (stride 2) ──┼── K-gated → V1 fusion → classifier
  → K (stride 3) ──┘     ↑
         └── generates cross-stream attention ──┘

Core Principles

Same kernel, different stride: All pathways use 5×5 kernels. Fibonacci strides (2:3:5) differentiate them; the ratios between adjacent strides (3/2 = 1.5, 5/3 ≈ 1.67) approximate the golden ratio (≈ 1.618).
Parallel processing: M/P/K run independently until fusion. No cross-talk within pathways.
Late fusion only: No pooling within pathways. Global adaptive average pool only at the end.
K modulates, doesn't process: K generates bidirectional cross-stream gain gates for M and P (tanh gating: 1 + α·tanh(gate), α=0.5). The gain stays within (0.5, 1.5), which allows both suppression and facilitation.
Binocular processing: Left/right eye views are processed independently through the M/P/K pathways and fused only at the V1 stage.
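The K-gating rule above is simple enough to check numerically. The following sketch applies the stated formula, 1 + α·tanh(gate) with α = 0.5, to scalar values in plain Python. How the gate values are derived from K-pathway features is not specified here, so the per-element scalar gates and the names `k_gain` and `apply_gate` are hypothetical.

```python
import math

ALPHA = 0.5  # gate strength quoted in the text

def k_gain(gate: float, alpha: float = ALPHA) -> float:
    """Multiplicative gain 1 + alpha * tanh(gate), bounded by 1 - alpha and 1 + alpha."""
    return 1.0 + alpha * math.tanh(gate)

def apply_gate(features, gates, alpha: float = ALPHA):
    """Scale each feature by the gain derived from its gate value (illustrative only)."""
    return [f * k_gain(g, alpha) for f, g in zip(features, gates)]

m_features = [1.0, 1.0, 1.0]
k_gates = [-10.0, 0.0, 10.0]  # strongly negative, neutral, strongly positive
gated = apply_gate(m_features, k_gates)
print([round(v, 4) for v in gated])
```

A gate of zero leaves the feature untouched, large negative gates approach a gain of 0.5 (suppression), and large positive gates approach 1.5 (facilitation), so the gate can never silence or blow up a pathway.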

What's Novel

First Fibonacci strides in CNNs — Derived from biological spatial frequency tuning, not empirical search
First complete M/P/K implementation — Prior work (Magno-Parvo CNN, EVNets, SlowFast) models M/P only
Biologically-grounded cross-stream gating — K→M/P gating mirrors koniocellular projections in LGN
Late binocular fusion — Eyes segregated through LGN blocks, matching known anatomy

Why This Works

The human brain runs on roughly 20 watts, vision included. One hypothesis: efficiency comes from the wiring diagram, not raw neuron count.

MPKx borrows this principle — the connectivity pattern is inspired by biology:

"It's what you multiply and where you multiply."

Ablation Study

Variant	Params	Effect
Full MPKx (M+P+K)	874K	Baseline
No K-gating	~840K	Expected drop in disambiguation
No stereo	~870K	Expected drop on fine detail
Stride variations (non-Fibonacci)	—	Under investigation

Benchmark ablations forthcoming across all three autoresearch datasets.

Interpretable Failures (Kvasir-v2 Medical)

Method: Evaluated MPKx-Pi on Kvasir-v2 validation set (1600 samples), tracking all misclassifications with confidence scores.

Key finding: 63% of errors (183/292) cluster in just two bidirectional pairs.

True Class	→ Predicted	Count	Mean Conf
esophagitis	→ normal-z-line	58	68%
dyed-lifted-polyps	→ dyed-resection-margins	51	69%
dyed-resection-margins	→ dyed-lifted-polyps	40	60%
normal-z-line	→ esophagitis	34	61%
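The 63% figure follows directly from the four rows above once each direction of a confusion is folded into an unordered pair. A minimal sketch of that bookkeeping, using only the counts from the table and the total of 292 errors; the variable names are illustrative.

```python
from collections import Counter

# (true class, predicted class) -> count, copied from the table above
confusions = {
    ("esophagitis", "normal-z-line"): 58,
    ("dyed-lifted-polyps", "dyed-resection-margins"): 51,
    ("dyed-resection-margins", "dyed-lifted-polyps"): 40,
    ("normal-z-line", "esophagitis"): 34,
}
TOTAL_ERRORS = 292

# Fold A->B and B->A into one unordered pair
pairs = Counter()
for (true_cls, pred_cls), count in confusions.items():
    pairs[frozenset((true_cls, pred_cls))] += count

clustered = sum(pairs.values())
share = clustered / TOTAL_ERRORS

for pair, count in pairs.most_common():
    print(sorted(pair), count)
print(f"{clustered}/{TOTAL_ERRORS} = {share:.0%}")
```

Grouping by unordered pair makes it visible that the two clusters are almost equal in size (92 and 91 errors), and that errors within each pair flow in both directions.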

Failure Type	Count	% of Failures
Confident failures (≥80% conf)	44	15%
Ambiguous failures (<50% conf)	22	8%
Close calls (<15% margin)	69	24%

Why it matters: Failures cluster in predictable, explainable pairs. You know which cases need human review and why the model failed.
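One way to turn the failure types above into a review rule is to tag each misclassified sample by its confidence and margin. The sketch below uses the thresholds from the table; reading "margin" as the gap between the top-1 and top-2 probabilities is an assumption, as are the function name `failure_tags` and the example probabilities. Tags may overlap, since a low-confidence error can also be a close call.

```python
def failure_tags(probs, true_label, confident=0.80, ambiguous=0.50, close_margin=0.15):
    """Tag one prediction; returns an empty set when the prediction is correct."""
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    top1, top2 = ranked[0], ranked[1]
    if top1 == true_label:
        return set()
    tags = set()
    if probs[top1] >= confident:
        tags.add("confident")    # wrong and sure of it
    if probs[top1] < ambiguous:
        tags.add("ambiguous")    # wrong with no dominant class
    if probs[top1] - probs[top2] < close_margin:
        tags.add("close_call")   # runner-up nearly tied (margin definition assumed)
    return tags or {"other"}

# Illustrative softmax outputs for a 3-class problem
examples = [
    ([0.85, 0.10, 0.05], 1),
    ([0.45, 0.40, 0.15], 1),
    ([0.60, 0.30, 0.10], 2),
    ([0.70, 0.20, 0.10], 0),
]
results = [failure_tags(p, y) for p, y in examples]
for (p, y), tags in zip(examples, results):
    print(p, "true:", y, "->", sorted(tags))
```

In a deployment, ambiguous and close-call predictions could be routed to human review, whereas confident failures are the ones that remain hard to detect from the model's own confidence.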

Roadmap

Biological extensions:

Surround suppression — V1-like center-surround for better edge discrimination
Temporal M pathway — 3D convolutions in M pathway for video (UCF-101: 77%)
RGC layer — Midget/Parasol/Bistratified cells feeding M/P/K pathways
Retinotectal pathway — Superior colliculus for saccades
V1 orientation columns — Edge detection specialization
V1→LGN feedback — Corticogeniculate modulation as learned gain (experimental variant exists)

Applications:

Detection head — YOLO-style head using M/P as multi-scale FPN
Medical uncertainty — MC Dropout for epistemic uncertainty quantification
VLM encoder — Lightweight vision encoder for vision-language models
Webcam eye tracking — Real-time gaze estimation from eye crops
Thermal glider fire detection — 3D-printed gliders for wildfire monitoring

Citation

@misc{MPKNet,
  author = {Lougen, D.J.},
  title = {MPKx: An LGN-Inspired Architecture for Efficient Visual Processing},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/DJLougen/MPKnet}
}

Patent pending: US 63/950,391

License & Commercial Use

PolyForm Small Business License with Humanitarian Exception.

Use Case	Cost
Academic research	Free
Personal projects	Free
Startups (<$100K revenue)	Free
Non-profits & NGOs	Free
Educational institutions	Free
Low-income region deployment	Free
Commercial (>$100K revenue)	Contact me

The free use cases above should never be paywalled.

For commercial licensing: d.lougen@mail.utoronto.ca

Acknowledgements

Thanks to Paul Dassonville (UO) for introducing me to these cells, and Jay Pratt (U of T) for ongoing collaboration on koniocellular research.

Daniel J. Lougen · University of Toronto
