
Add challenge 83: Fused Residual Add and RMS Norm (Medium)#234

Open
claude[bot] wants to merge 1 commit into main from
add-challenge-83-fused-residual-add-rms-norm

Conversation


claude[bot] (Contributor) commented Apr 4, 2026

Summary

  • Adds challenge 83: Fused Residual Add and RMS Norm (Medium difficulty)
  • Models the residual_add_rms_norm pattern used before every sublayer in LLaMA, Mistral, and other modern LLMs
  • Solver must compute out[i,j] = (x[i,j] + residual[i,j]) / rms_i * weight[j] in a single fused GPU kernel, where z_i = x_i + residual_i and rms_i = sqrt(mean(z_i^2) + eps) for row i
  • Key GPU concepts: per-row parallel reduction (shared memory + warp tree), kernel fusion to avoid materializing the intermediate z = x + residual in global memory, and broadcasting the learned weight vector
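For reference, the computation above can be sketched in plain NumPy. This is only a minimal host-side model of the fused op, not the GPU kernel itself; the function name `residual_add_rms_norm` and the `eps` default are assumptions based on the description, and on the GPU the intermediate `z` would live in registers/shared memory rather than being materialized:

```python
import numpy as np

def residual_add_rms_norm(x, residual, weight, eps=1e-6):
    """Reference semantics for the fused op: z = x + residual, then
    RMS-normalize each row of z and scale by the learned weight vector."""
    z = x + residual  # the fused add; a real kernel never writes z to global memory
    # per-row RMS: sqrt(mean(z_i^2) + eps), computed on-GPU via a parallel reduction
    rms = np.sqrt(np.mean(z * z, axis=-1, keepdims=True) + eps)
    # weight (shape [cols]) broadcasts across rows
    return z / rms * weight
```

A single-pass kernel would assign one thread block per row, accumulate the sum of squares with a shared-memory/warp tree reduction, then have each thread write its normalized, weighted elements.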

Test plan

  • All 6 starter files present (.cu, .pytorch.py, .triton.py, .jax.py, .cute.py, .mojo)
  • pre-commit run --all-files passes (black, isort, flake8, clang-format)
  • Validated with run_challenge.py --action submit on NVIDIA TESLA T4 — all functional + performance tests pass
  • Checklist items verified: <p> start, <h2> sections, <pre> for 1D examples, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
