Systematic benchmarking and layer-by-layer analysis of computer vision models for depth estimation, object detection, segmentation, and pose estimation.
| Model | Inference (s) | FPS | Memory (MB) | Parameters | Layers |
|---|---|---|---|---|---|
| DepthPro | 14.811 ± 1.201 | 0.07 | 3643 | 952M | 57 |
| Depth Anything V2 Small | 0.058 ± 0.047 | 17.13 | 105 | 24.8M | 33 |
| Depth Anything V2 Base | 0.111 ± 0.055 | 9.02 | 381 | 97.5M | 33 |
| Depth Anything V2 Large | 0.262 ± 0.009 | 3.82 | 1290 | 335M | 33 |
| YOLO11n-Detect | 0.178 ± 0.283 | 5.63 | 19 | 2.62M | 89 |
| YOLO11n-Segment | 0.071 ± 0.062 | 14.16 | 21 | 2.87M | 102 |
| YOLO11n-Pose | 0.089 ± 0.102 | 11.21 | 20 | 2.87M | 98 |
| MobileSAM | 7.238 ± 2.302 | 0.14 | 67 | 10.1M | 138 |
Hardware: NVIDIA GeForce RTX 3060 Laptop GPU, CUDA 12.6, PyTorch 2.9.0+cu126
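The mean ± std latencies, FPS, and coefficient-of-variation figures above can be reproduced with a synchronized timing loop. A minimal sketch (function name and arguments are illustrative, not the benchmark script's actual API; on CUDA devices, `torch.cuda.synchronize()` is required so that queued asynchronous kernels are actually timed):

```python
import time
import numpy as np
import torch

def benchmark(model, x, warmup=5, runs=30):
    """Time forward passes; report mean/std latency, FPS, and CV."""
    use_cuda = x.is_cuda
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(x)                      # warm up kernels / caches
        times = []
        for _ in range(runs):
            if use_cuda:
                torch.cuda.synchronize()  # don't time previously queued work
            t0 = time.perf_counter()
            model(x)
            if use_cuda:
                torch.cuda.synchronize()  # wait for the GPU to finish
            times.append(time.perf_counter() - t0)
    t = np.asarray(times)
    return {
        "mean_s": float(t.mean()),
        "std_s": float(t.std()),
        "fps": float(1.0 / t.mean()),
        "cv_pct": float(100.0 * t.std() / t.mean()),
    }
```

FPS here is simply the reciprocal of the mean latency, which matches the table (e.g. 1 / 0.058 s ≈ 17 FPS).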
- Depth Anything V2 Small is 255× faster than DepthPro (0.058 s vs 14.8 s)
- YOLO11n-Segment runs 2.5× faster than the Detect variant (14.16 FPS vs 5.63 FPS)
- Zero activation sparsity across all 289 dissected layers: every channel stays active (no dead filters), indicating well-utilized capacity
- Depth Anything V2 Base has a sign-encoding bug: raw outputs come out negated, so a sign flip is required to recover positive depth values
- YOLO11n-Detect shows a 159% coefficient of variation in latency (0.283 s std on a 0.178 s mean), likely due to GPU scheduling jitter
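The Depth Anything V2 Base sign issue noted above is a one-step post-processing fix. A hedged sketch of the normalization (the `flip_sign` flag is illustrative, not the repository's exact implementation):

```python
import numpy as np

def normalize_depth(raw, flip_sign=False):
    """Map a raw depth-head output to a positive, 0-1 normalized depth map.

    flip_sign=True handles checkpoints whose outputs come out negated,
    as observed here for Depth Anything V2 Base.
    """
    depth = -raw if flip_sign else raw
    depth = depth - depth.min()               # shift so the smallest value is 0
    rng = depth.max()
    return depth / rng if rng > 0 else depth  # guard against a flat map

# Example: a simulated negated output becomes a proper 0-1 depth image
raw = -np.array([[0.0, 1.0], [2.0, 3.0]])
d = normalize_depth(raw, flip_sign=True)
```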
Depth Estimation
- DepthPro (Apple) - 57 Conv/ConvTranspose layers
- Depth Anything V2 Small/Base/Large - 33 Conv2d layers each
Object Detection & Segmentation
- YOLO11n-Detect - 89 Conv2d layers
- YOLO11n-Segment - 102 Conv2d/ConvTranspose2d layers
- YOLO11n-Pose - 98 Conv2d layers
- MobileSAM - 138 layers
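Layer counts and per-layer statistics like those above can be gathered generically by walking `model.named_modules()` and attaching forward hooks to the convolutional layers. A minimal sketch of the idea (not the repository's actual dissection tooling):

```python
import torch
import torch.nn as nn

def dissect(model, x):
    """Run one forward pass, recording shape and zero-sparsity per Conv layer."""
    records, hooks = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            act = output.detach()
            records.append({
                "layer": name,
                "shape": tuple(act.shape),
                "sparsity": (act == 0).float().mean().item(),  # fraction of exact zeros
            })
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()                # detach hooks so the model is left clean
    return records
```

`len(records)` gives the Conv/ConvTranspose layer count per model, and the recorded sparsity is what the "zero sparsity" finding above refers to.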
vision-bench/                    # Benchmark framework
    unified_benchmark.py         # Main benchmark script
    results/                     # Performance metrics & reports
    viz/                         # Layer visualizations (PNG + NPY)
docs/                            # Detailed analysis reports
    MODEL_ANALYSIS.md            # Cross-model comparison
    DEPTHPRO_DETAILED.md         # DepthPro findings
    DEPTH_ANYTHING_V2_DETAILED.md # Depth Anything analysis
    YOLO11_FAMILY_DETAILED.md    # YOLO11 variants
apps/                            # Interactive applications
server/                          # FastAPI model serving
compare/                         # Comparison utilities
dissect/                         # Model dissection tools
explore/                         # Experimentation scripts
# Install dependencies
pip install -e .
# Run comprehensive benchmark
python vision-bench/unified_benchmark.py
# Dissect specific model
python dissect/dissect_depthpro.py
python dissect/dissect_depthanything.py
# Launch interactive app
streamlit run apps/depthpro_app.py
A professional Next.js web interface for exploring benchmark results:
cd explorer
npm install
npm run dev
Features:
- Benchmarks: Performance metrics comparison
- Layer Visualizations: Detailed layer analysis with statistics
- Graph View: Interactive computational graph
- Live Monitor: Real-time benchmark progress
Deployment: Ready for Vercel with automatic GitHub data source
- See explorer/README.md for setup
- See explorer/DATA_SOURCE.md for technical details
All layer dissection visualizations saved as:
- PNG: 4×4 grid of 16 channels (visual inspection)
- NPY: 8 channels (numerical analysis)
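The PNG/NPY convention above (a 4×4 grid of the first 16 channels for visual inspection, plus 8 raw channels for numerical work) can be produced roughly as follows; the function name and file naming are illustrative:

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt

def save_layer_outputs(act, stem):
    """act: (C, H, W) activation array with C >= 16; stem: path without extension."""
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):           # first 16 channels, 4x4 grid
        ax.imshow(act[i], cmap="viridis")
        ax.axis("off")
    fig.savefig(f"{stem}.png", dpi=100)
    plt.close(fig)                               # free the figure's memory
    np.save(f"{stem}.npy", act[:8])              # first 8 channels, full precision
```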
See docs/ for comprehensive analysis:
- MODEL_ANALYSIS.md - Cross-model benchmark comparison
- DEPTHPRO_DETAILED.md - 57-layer dissection, extreme activation analysis
- DEPTH_ANYTHING_V2_DETAILED.md - 3-variant comparison, sign bug documentation
- YOLO11_FAMILY_DETAILED.md - 89/102/98 layer analysis per variant
- Python 3.11+
- PyTorch 2.9+ with CUDA 12.6
- transformers (DepthPro)
- ultralytics (YOLO11)
- onnx / onnxruntime
- opencv-python
- numpy, matplotlib
Repository: Vision-Dissect
Owner: infiniV
Branch: main
Benchmark Date: November 11, 2025




