@vijk777 vijk777 commented Jan 22, 2026

Summary

  • Added a benchmark script for profiling training-loop performance
  • Found that mode="reduce-overhead" in torch.compile gives a 4.9x speedup
  • Applied the optimization to the main training loop (see the sketch below)

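For context, a minimal sketch of what applying this to the training loop looks like; the model, optimizer, and train_step below are placeholders, since the actual training code is not shown in this conversation:

    import torch
    import torch.nn as nn

    # Placeholder model and optimizer; stand-ins for the real training-loop components.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # mode="reduce-overhead" wraps the compiled model in CUDA graphs, so each step replays
    # a recorded kernel sequence instead of launching kernels one by one from Python.
    compiled_model = torch.compile(model, mode="reduce-overhead")

    def train_step(x, y):
        optimizer.zero_grad(set_to_none=True)
        loss = loss_fn(compiled_model(x), y)
        loss.backward()
        optimizer.step()
        return loss.item()

One caveat: CUDA graph capture assumes static input shapes, so a changing batch size forces recapture and recompilation.
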
Benchmark Results

Mode                        ms/batch    Speedup
none (no compile)             132.95       1.0x
default                        98.29       1.4x
default+compiled_backward      92.09       1.4x
reduce-overhead                28.41       4.7x
reduce-overhead+fused          27.25       4.9x
max-autotune                   26.21       5.1x

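The benchmark script itself is not pasted in this conversation; the numbers above would typically be collected with a harness along these lines, reusing the train_step sketched earlier (batch size and dimensions are illustrative):

    import time
    import torch

    def ms_per_batch(step_fn, make_batch, warmup=10, iters=50):
        # Warmup absorbs torch.compile's one-time compilation / CUDA-graph capture cost.
        for _ in range(warmup):
            step_fn(*make_batch())
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            step_fn(*make_batch())
        torch.cuda.synchronize()  # ensure all queued GPU work is done before stopping the clock
        return (time.perf_counter() - start) / iters * 1000.0

    x = torch.randn(256, 1024, device="cuda")
    y = torch.randn(256, 1024, device="cuda")
    print(f"{ms_per_batch(train_step, lambda: (x, y)):.2f} ms/batch")

Warmup matters here: without it, the first few iterations include compilation time and would distort the per-batch average, especially for max-autotune.
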
Key Findings

  1. reduce-overhead mode is the big win: it uses CUDA graphs to minimize kernel launch overhead
  2. Fused Adam provides a marginal additional gain (~2%); see the sketch after this list
  3. Compiled backward (autograd) helps ~6% with the default mode, but gives no benefit on top of reduce-overhead
  4. AMP (mixed precision) hits compilation bugs when combined with reduce-overhead mode
  5. max-autotune is slightly faster still, but has a long compilation time

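Findings 2 and 3 correspond to two independent switches. A sketch of how they would be enabled, assuming a recent PyTorch release (the compiled-autograd toggle has moved around between versions, so treat its exact location as an assumption):

    import torch

    # Finding 2: fused Adam runs the parameter update as fused CUDA kernels (~2% gain here).
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)

    # Finding 3: compiled autograd traces the backward pass through Dynamo as well.
    # It helped ~6% under the default compile mode, but added nothing on top of reduce-overhead,
    # plausibly because CUDA graphs already remove the launch overhead it would otherwise save.
    torch._dynamo.config.compiled_autograd = True
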
Expected Impact

Original epoch 3 timing: 195.40s
Expected new timing: ~40s (195.40s / 4.9 ≈ 40s)

vijk777 and others added 3 commits January 22, 2026 07:12
benchmark script for profiling training loop performance.
found reduce-overhead compile mode gives 4.9x speedup.

results:
- none: 132.95ms/batch
- default: 98.29ms/batch
- reduce-overhead+fused+bwd: 27.11ms/batch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
benchmarking showed reduce-overhead mode gives significant speedup
by using CUDA graphs to minimize kernel launch overhead.

before: ~98ms/batch (default compile)
after:  ~27ms/batch (reduce-overhead)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vijk777 vijk777 merged commit 5d2e202 into main Jan 22, 2026
1 of 2 checks passed
@vijk777 vijk777 deleted the vj/perf branch January 22, 2026 15:26