
Add challenge 83: Fused Residual Add and RMS Norm (Medium)#234

Open
claude[bot] wants to merge 1 commit into main from
add-challenge-83-fused-residual-add-rms-norm

Conversation


claude[bot] (Contributor) commented Apr 4, 2026

Summary

  • Adds challenge 83: Fused Residual Add and RMS Norm (Medium difficulty)
  • Models the residual_add_rms_norm pattern used before every sublayer in LLaMA, Mistral, and other modern LLMs
  • Solver must compute out[i,j] = (x[i,j] + residual[i,j]) / rms_i * weight[j] in a single fused GPU kernel, where z_i = x_i + residual_i and rms_i = sqrt(mean(z_i^2) + eps) for row i
  • Key GPU concepts: per-row parallel reduction (shared memory + warp tree), kernel fusion to avoid materializing the intermediate z = x + residual in global memory, and broadcasting the learned weight vector
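For reference, the computation above can be sketched in plain NumPy. This is only a minimal host-side model of the fused op, not the GPU kernel itself; the function name `residual_add_rms_norm` and the `eps` default are assumptions based on the description, and on the GPU the intermediate `z` would live in registers/shared memory rather than being materialized:

```python
import numpy as np

def residual_add_rms_norm(x, residual, weight, eps=1e-6):
    """Reference semantics for the fused op: z = x + residual, then
    RMS-normalize each row of z and scale by the learned weight vector."""
    z = x + residual  # the fused add; a real kernel never writes z to global memory
    # per-row RMS: sqrt(mean(z_i^2) + eps), computed on-GPU via a parallel reduction
    rms = np.sqrt(np.mean(z * z, axis=-1, keepdims=True) + eps)
    # weight (shape [cols]) broadcasts across rows
    return z / rms * weight
```

A single-pass kernel would assign one thread block per row, accumulate the sum of squares with a shared-memory/warp tree reduction, then have each thread write its normalized, weighted elements.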

Test plan

  • All 6 starter files present (.cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo)
  • pre-commit run --all-files passes (black, isort, flake8, clang-format)
  • Validated with run_challenge.py --action submit on NVIDIA TESLA T4 — all functional + performance tests pass
  • Checklist items verified: <p> start, <h2> sections, <pre> for 1D examples, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
