feat(ws1): NativeLMHeadOp pure-PyTorch ground-truth reference + numerical contract tests by maxiaosong1124 · Pull Request #170 · RL-Align/RL-Kernel

maxiaosong1124 · 2026-06-22T04:06:20Z

Summary

Adds the pure-PyTorch ground-truth reference op for the language-model head (the
output layer of the WS1 batch-invariant forward chain), built on top of the numerical
contract defined in #108. Ships the op, its registry wiring, docs, and a 17-case test
suite that pins down both alignment axes (Axis-A bitwise batch invariance, Axis-B
per-dtype accuracy), including a GPU-only smoke test at the real Qwen3-8B projection dims.

Refs #108

Terminology

This PR uses the WS1 alignment vocabulary from #108:

Axis-A: batch invariance (reproducibility). A row's logits must not depend on how
many rows share the batch (batch size, slicing, padding). Asserted bitwise
(torch.equal). This keeps train-time (large batch) and sample-time (small
batch / dynamic padding) numerics identical so the policy ratio doesn't drift.
Axis-B: accuracy of the low-precision (bf16 / fp16) forward path. Unlike the
lossless embedding gather, lm_head is a reduction over hidden, so low-precision
accumulation drifts from the fp32 reference and is checked against a tolerance
window (not bitwise).
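
As a rough illustration of the Axis-A check, the sketch below compares rows computed inside a full batch with the same rows computed on their own. `check_batch_invariant` and `rowwise_lm_head` are hypothetical names made up for this example, not functions from the PR; the row-at-a-time projection is used only because it is batch-invariant by construction.

```python
import torch


def check_batch_invariant(fn, hidden: torch.Tensor, keep: int) -> bool:
    """Return True if the first `keep` rows are bitwise identical whether
    they are computed inside the full batch or on their own."""
    full = fn(hidden)          # compute on the full batch, then slice
    part = fn(hidden[:keep])   # compute on the slice only
    return torch.equal(full[:keep], part)


def rowwise_lm_head(weight: torch.Tensor):
    """Hypothetical reference: project one row at a time, so a row's
    reduction cannot depend on how many other rows share the batch."""
    def fn(hidden: torch.Tensor) -> torch.Tensor:
        return torch.stack([torch.mv(weight, row) for row in hidden])
    return fn


gen = torch.Generator().manual_seed(0)
hidden = torch.randn(8, 64, generator=gen)    # [rows, hidden]
weight = torch.randn(32, 64, generator=gen)   # HF layout [out, in]
invariant = check_batch_invariant(rowwise_lm_head(weight), hidden, keep=3)
```

The same helper could be pointed at a batched matmul to see whether a given backend and thread configuration happens to be batch-invariant; that result is environment-dependent, so it is not asserted here.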

Motivation / Context

#108 establishes the ground-truth
harness and numerical contract for the WS1 batch-invariant forward chain. The final stage
of the Qwen3-8B stack projects hidden states back to vocabulary logits:

logits = hidden @ weight.t() # weight is HF [out, in]

This PR provides the deterministic fp32 reference path that downstream kernels (Triton /
CUDA / ROCm) will be validated against. For Qwen3-8B the weight is the output projection
[vocab=151936, hidden=4096] in the HF nn.Linear [out, in] convention (transposed
internally), is independent from the embedding table (tie_word_embeddings=false),
and has no bias.

Changes

rl_engine/kernels/ops/pytorch/linear/lm_head.py: NativeLMHeadOp
- forward(): project in the input dtype, output the input dtype (Axis-B path)
- forward_fp32(): upcast to fp32, accumulate in fp32, forced fp32 output
  (ground-truth / backward golden source)
- Formula: out = hidden @ weight.t() (+ bias)
- Weight is HF [out, in] and transposed internally; this is the one difference from the
  bare matmul op, so the two are not interchangeable
- Pure function: inputs are never mutated in place; forward() output dtype follows hidden
rl_engine/kernels/registry.py: register PYTORCH_NATIVE_LM_HEAD in OpBackend
and add lm_head dispatch to the cuda / rocm / cpu priority maps
tests/test_lm_head.py: 17 tests (details below)
docs/operators/lm_head.md + nav / index wiring
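
The dual-path contract listed above can be sketched roughly as follows. This is a minimal illustration written as free functions, not the PR's `NativeLMHeadOp` code; the function names and signatures are assumptions, and the precision-context handling that the real `forward_fp32()` performs is left out.

```python
import torch


def lm_head_sketch(hidden, weight, bias=None, *, compute_dtype, output_dtype):
    """Cast to compute_dtype, project with the HF [out, in] weight
    transposed, optionally add bias, then cast to output_dtype."""
    h = hidden.to(compute_dtype)
    w = weight.to(compute_dtype)
    out = h @ w.t()                        # [..., hidden] @ [hidden, vocab]
    if bias is not None:
        out = out + bias.to(compute_dtype)  # out-of-place, inputs untouched
    return out.to(output_dtype)


def forward_sketch(hidden, weight, bias=None):
    """Dtype-behavior path: compute and return in the input dtype."""
    return lm_head_sketch(hidden, weight, bias,
                          compute_dtype=hidden.dtype,
                          output_dtype=hidden.dtype)


def forward_fp32_sketch(hidden, weight, bias=None):
    """Reference path: upcast, compute in fp32, return fp32."""
    return lm_head_sketch(hidden, weight, bias,
                          compute_dtype=torch.float32,
                          output_dtype=torch.float32)


gen = torch.Generator().manual_seed(0)
hidden = torch.randn(2, 3, 16, generator=gen).to(torch.bfloat16)
weight = torch.randn(10, 16, generator=gen).to(torch.bfloat16)
logits = forward_sketch(hidden, weight)           # bf16, [2, 3, 10]
logits_ref = forward_fp32_sketch(hidden, weight)  # fp32, [2, 3, 10]
```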

How this satisfies the #108 contract

#108 requirement	How it's met here
Deterministic reference path	`forward_fp32()` accumulates in fp32; tests use fixed-seed `torch.Generator` so outputs are reproducible
Per-dtype tolerance policy (bitwise vs tight-tolerance)	Axis-A asserted bitwise (`torch.equal`); Axis-B reduction drift checked against a per-dtype tolerance measured relative to the output peak (bf16 ~0.37%, fp16 ~0.05%)
Batch-config sweep / validation helper	Batch-invariance checks compute on the full batch, then assert sliced/padded rows are bitwise identical to their full-batch counterparts, across fp32 / bf16 / fp16
Realistic shapes covered	GPU-only smoke test at the real Qwen3-8B dims (`vocab=151936, hidden=4096`); CPU tests keep the real `hidden=4096` reduction length (only vocab is shrunk) so the drift is representative; skips when CUDA / GPU memory is unavailable
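
To make the relative-to-peak tolerance concrete, here is a small sketch of one way such a metric could be computed. The helper name `rel_to_peak_error` is hypothetical, and taking the peak from the fp32 reference output is an assumption about what "output peak" means; the PR's test code may define it differently.

```python
import torch


def rel_to_peak_error(out: torch.Tensor, ref: torch.Tensor) -> float:
    """Max absolute error divided by the largest |value| in the reference."""
    ref32 = ref.to(torch.float32)
    max_abs_err = (out.to(torch.float32) - ref32).abs().max()
    peak = ref32.abs().max()
    return (max_abs_err / peak).item()


gen = torch.Generator().manual_seed(0)
hidden = torch.randn(4, 4096, generator=gen)   # real reduction length
weight = torch.randn(64, 4096, generator=gen)  # shrunken vocab, HF [out, in]

ref = hidden @ weight.t()                                        # fp32 path
low = hidden.to(torch.bfloat16) @ weight.to(torch.bfloat16).t()  # bf16 path
bf16_drift = rel_to_peak_error(low, ref)
print(f"bf16 drift relative to peak: {bf16_drift:.4%}")
```

Normalizing by the peak rather than elementwise avoids blowing up the relative error on logits that happen to be close to zero.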

Reduction-specific note (Axis-A reduction order)

A single torch.matmul is not bitwise batch-invariant by default: multi-threaded CPU
GEMM partitions the hidden (K) reduction across threads in a way that depends on
M = batch*seq, so "compute full then slice" ≠ "compute slice" once hidden is large.
The tests pin a single thread to fix the reduction order (a local stand-in for the planned
testing/determinism.py::deterministic_context). On GPU, cuBLAS likewise chooses its K split
based on M, so a batch-invariant GEMM is a downstream kernel concern; the GPU smoke test
validates the full-vocab shape and fp32 correctness, not Axis-A bitwise equality.
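
The reason the split matters is that floating-point addition is not associative, so summing the same products in a different grouping can change the result. The pure-Python sketch below uses deliberately exaggerated magnitudes to make the effect visible; `sequential_sum` and `split_k_sum` are illustrative names, and real GEMM differences are usually far smaller.

```python
def sequential_sum(values):
    """Accumulate left to right, like a single-threaded reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_k_sum(values, parts):
    """Sum `parts` contiguous chunks separately, then add the partial sums,
    mimicking a reduction that is split across workers."""
    chunk = (len(values) + parts - 1) // parts
    partials = [sequential_sum(values[i:i + chunk])
                for i in range(0, len(values), chunk)]
    return sequential_sum(partials)


# Same four terms, two different groupings of the additions.
products = [1e16, 1.0, -1e16, 1.0]
one_worker = sequential_sum(products)        # ((1e16 + 1) - 1e16) + 1
two_workers = split_k_sum(products, parts=2)  # (1e16 + 1) + (-1e16 + 1)
print(one_worker, two_workers)  # 1.0 0.0
```

Pinning a single thread keeps the grouping fixed regardless of batch size, which is what makes the bitwise comparison meaningful.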

Test Environment


OS	Ubuntu (kernel 5.15.0-124-generic)
Python	3.12.3
PyTorch	2.8.0+cu128
CUDA / cuDNN	12.8 / 9.10.02

Test Results

python -m pytest tests/test_lm_head.py -v
17 passed

The 17 tests cover:

- fp32 correctness vs naive matmul, asserted bitwise (torch.equal); fp32 forward path bitwise-equal to the ground truth
- bf16 / fp16 dtype-path accuracy: max abs error relative to output peak (bf16 ~0.37%, fp16 ~0.05%), with error stats printed
- output shape (hidden.shape[:-1] + (vocab,))
- bias semantics (None default == no bias; provided [vocab] bias added)
- Axis-A batch invariance: slice + padding variants across fp32 / bf16 / fp16, asserted bitwise under a pinned single-thread reduction
- purity (none of hidden, weight, or bias is mutated in place)
- gradient flow to hidden / weight (fp32 autograd = backward golden source), verified against the closed-form grads
- registry dispatch resolves lm_head → NativeLMHeadOp
- GPU-only real-shape smoke test (Qwen3-8B vocab=151936, hidden=4096)
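
For the gradient item, the closed-form gradients of `out = hidden @ weight.t()` are `grad_hidden = grad_out @ weight` and `grad_weight = grad_out^T @ hidden` (with leading dimensions flattened). A minimal sketch of such a check, with made-up small shapes rather than the PR's actual test, might look like this:

```python
import torch

gen = torch.Generator().manual_seed(0)
hidden = torch.randn(2, 3, 16, generator=gen, requires_grad=True)
weight = torch.randn(10, 16, generator=gen, requires_grad=True)  # [out, in]
grad_out = torch.randn(2, 3, 10, generator=gen)                  # upstream grad

# Autograd path: project, then backpropagate the upstream gradient.
logits = hidden @ weight.t()
logits.backward(grad_out)

# Closed-form path, computed without autograd.
expected_hidden_grad = grad_out @ weight.detach()
expected_weight_grad = (
    grad_out.reshape(-1, 10).t() @ hidden.detach().reshape(-1, 16)
)

hidden_ok = torch.allclose(hidden.grad, expected_hidden_grad, atol=1e-4)
weight_ok = torch.allclose(weight.grad, expected_weight_grad, atol=1e-4)
```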

Checklist

✅ Pure-PyTorch reference, no custom extension required
✅ Covered at the real Qwen3-8B projection dims (vocab=151936, hidden=4096)
✅ Axis-A bitwise batch invariance enforced (fixed single-thread reduction order)
✅ Axis-B per-dtype tolerance calibrated (relative-to-peak, stats in PR)
✅ Registered in OpBackend + cuda/rocm/cpu priority maps
✅ All 17 tests pass locally

Summary by CodeRabbit

New Features
- Added an LM Head operator to compute vocabulary logits from hidden states, with optional vocabulary bias.
- Introduced a dedicated fp32 reference execution path for more reliable high-precision comparisons.
Documentation
- Added LM Head operator documentation and updated the Operators navigation/index to include it.
Bug Fixes
- Improved kernel dispatch so lm_head routes to the PyTorch-native implementation instead of using the generic fallback.
Tests
- Added extensive LM Head test coverage for correctness, dtype/accuracy behavior, shapes/bias handling, padding/batch invariance, gradients, and dispatch routing.

WS1 ground-truth language-model-head op for issue RL-Align#108 (Qwen3-8B output projection, vocab=151936 x hidden=4096, tie_word_embeddings=false, no bias):
- NativeLMHeadOp: out = hidden @ weight.t() (+ bias), a reduction over hidden exposing the forward / forward_fp32 dual-path contract (fp32 ground truth + dtype-behavior path); weight is HF [out, in] and transposed internally (the one difference from the bare matmul op); pure function, no in-place mutation.
- register PYTORCH_NATIVE_LM_HEAD in OpBackend and the cuda/rocm/cpu priority maps.
- tests/test_lm_head.py: fp32 correctness vs naive matmul (bitwise), bf16/fp16 dtype-path accuracy (relative-to-peak tolerance, bf16 ~0.37% / fp16 ~0.05% of output peak), bias semantics, Axis-A batch invariance (slice + padding, all dtypes) under a pinned single-thread reduction so the CPU GEMM K-split is M-independent, purity, closed-form gradient flow to hidden/weight, registry dispatch, and a GPU-only real-shape smoke test (vocab=151936, hidden=4096).
- docs/operators/lm_head.md + nav/index wiring.

coderabbitai · 2026-06-22T04:06:34Z

Walkthrough

Adds a new lm_head operator (NativeLMHeadOp) implementing hidden-to-vocab projection via transposed weight matmul with optional bias. The class exposes a dtype-preserving forward path and an fp32 reference forward_fp32 path that disables autocast and CUDA TF32. It is registered in the kernel registry for cuda, rocm, and cpu platforms, covered by a full pytest suite, and documented in docs/operators/lm_head.md.

Changes

LM Head Operator

Layer / File(s)	Summary
NativeLMHeadOp implementation and registry wiring `rl_engine/kernels/ops/pytorch/linear/__init__.py`, `rl_engine/kernels/ops/pytorch/linear/lm_head.py`, `rl_engine/kernels/registry.py`	Adds the `linear` package init, `NativeLMHeadOp` with `forward`/`forward_fp32`/`__call__` methods and the `_lm_head` core helper (`hidden @ weight.t()` + optional bias with dtype casting and strict fp32 context control), the `PYTORCH_NATIVE_LM_HEAD` enum member in `OpBackend`, and `lm_head` entries in the per-platform dispatch priority table.
Operator documentation and navigation registration `docs/operators/lm_head.md`, `docs/operators/README.md`, `docs/.nav.yml`	Adds the full `lm_head` operator doc specifying the dual-path API contract (dtype-preserving `forward` vs fp32 reference `forward_fp32` with autocast/TF32 handling), tensor shapes/dtypes, dispatch behavior, accuracy expectations (Axis-B drift tolerances and Axis-A batch invariance), performance notes, and known limitations (fallback-only, fixed-reduction-order constraint); registers the page in nav YAML and operator README index.
Test suite `tests/test_lm_head.py`	Covers fp32 bitwise correctness against naive matmul with TF32 disabled, autocast/TF32 behavior validation, dtype-path accuracy bounds for bf16/fp16 with max-absolute-error scaling, output shape and bias semantics, batch invariance via slicing and padding with single-thread CPU pinning, input purity assertions, gradient correctness via closed-form expected gradients, registry dispatch resolution, and a memory-gated CUDA smoke test at Qwen3-8B dimensions.

Sequence Diagram

sequenceDiagram
  participant Caller
  participant KernelRegistry
  participant NativeLMHeadOp
  participant _lm_head

  Caller->>KernelRegistry: get_op("lm_head")
  KernelRegistry-->>Caller: NativeLMHeadOp instance
  Caller->>NativeLMHeadOp: forward(hidden, weight, bias)
  NativeLMHeadOp->>_lm_head: compute_dtype=hidden.dtype, output_dtype=hidden.dtype
  _lm_head->>_lm_head: cast hidden/weight to compute_dtype
  _lm_head->>_lm_head: out = hidden @ weight.t()
  _lm_head->>_lm_head: add bias if present
  _lm_head->>_lm_head: cast to output_dtype
  _lm_head-->>NativeLMHeadOp: logits
  NativeLMHeadOp-->>Caller: logits [batch, seq, vocab]
  Caller->>NativeLMHeadOp: forward_fp32(hidden, weight, bias)
  NativeLMHeadOp->>_lm_head: compute_dtype=float32, output_dtype=float32, strict_fp32=True
  _lm_head-->>NativeLMHeadOp: logits float32
  NativeLMHeadOp-->>Caller: logits [batch, seq, vocab] fp32

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

inaniloquentee
Flink-ddd
bitborne

Poem

🐇 Hip-hop, a new head is here to stay,
Hidden states projected, vocab on display!
forward_fp32 the golden path we trust,
Transposed weights and bias — compute we must.
The registry now knows just where to go,
Tests ensure the logits flow just so! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding a pure-PyTorch LM-head operator implementation with numerical contract tests, which directly aligns with the primary deliverable.
Docstring Coverage	✅ Passed	Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

docs/.nav.yml (1)

13-17: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Maintain alphabetical ordering of operator entries in navigation.

The new operators/lm_head.md entry should be inserted in alphabetical order between operators/grpo-loss.md and operators/ratio-kl.md, not appended at the end. This keeps navigation consistent and easier to scan.

📖 Proposed ordering fix

   - Operators:
     - operators/README.md
     - operators/fused-logp.md
     - operators/grpo-loss.md
+    - operators/lm_head.md
     - operators/ratio-kl.md
-    - operators/sampling.md
+    - operators/sampling.md

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/.nav.yml` around lines 13 - 17, The operators list in the navigation
file has an alphabetically misplaced entry. Move the operators/lm_head.md entry
to its correct alphabetical position between operators/grpo-loss.md and
operators/ratio-kl.md, as it should come after "grpo-loss" and before "ratio-kl"
when entries are sorted alphabetically. Remove it from its current position at
the end of the operators list and insert it in the proper alphabetical order to
maintain consistency and readability of the navigation structure.

docs/operators/README.md (1)

21-26: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Maintain alphabetical ordering of operator index entries.

The new [LM Head] entry should be inserted in alphabetical order between [GRPO Loss] and [Policy Ratio + KL Penalty], not after [Sampling]. This keeps the index consistent and easier to navigate.

📖 Proposed ordering fix

 - [Fused LogP](fused-logp.md)
 - [GRPO Loss](grpo-loss.md)
+- [LM Head](lm_head.md)
 - [Policy Ratio + KL Penalty](ratio-kl.md)
 - [Sampling](sampling.md)
-- [LM Head](lm_head.md)
 - [Operator Doc Template](../contributing/operator-doc-template.md)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/operators/README.md` around lines 21 - 26, Reorder the operator index
entries in the README.md file to maintain alphabetical ordering. Move the [LM
Head] link from its current position after [Sampling] to its correct
alphabetical position between [GRPO Loss] and [Policy Ratio + KL Penalty].
Ensure all entries in the list are ordered alphabetically by their display names
to keep the index consistent and easy to navigate.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_lm_head.py`:
- Around line 196-200: The _enough_gpu_memory function can fail test collection
when torch.cuda.mem_get_info() raises a RuntimeError in partially configured
CUDA environments. Wrap the torch.cuda.mem_get_info() call in a try-except block
that catches RuntimeError and returns False when the error is caught, allowing
tests to be skipped gracefully instead of failing during collection.
- Around line 86-87: The test functions test_native_lm_head_dtype_path_accuracy,
and the two other similar test functions at lines 130 and 141 unconditionally
parametrize with torch.float16, which can cause failures on CPU hardware where
half-precision matmul is not guaranteed to be supported. Extract the dtype
tuples used in the parametrize decorators into module-level constants, then
replace the direct parametrization with pytest.param calls that include
conditional runtime checks to skip torch.float16 on CPU backends. Apply this
pattern consistently across all three affected test functions to prevent
backend-dependent test failures.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between cd0ca43 and 5ae1c4b.

📒 Files selected for processing (7)

docs/.nav.yml
docs/operators/README.md
docs/operators/lm_head.md
rl_engine/kernels/ops/pytorch/linear/__init__.py
rl_engine/kernels/ops/pytorch/linear/lm_head.py
rl_engine/kernels/registry.py
tests/test_lm_head.py

- Gate CPU float16 matmul parametrizations behind a runtime support probe so unsupported backends skip rather than fail collection.
- Harden _enough_gpu_memory against RuntimeError from mem_get_info in partially-configured CUDA environments.
- Add docstrings across the op and test suite to meet coverage.
- Sort lm_head entries alphabetically in operator nav/README.
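
A rough sketch of what the two hardening steps in this commit could look like is given below. The function names, the `required_bytes` parameter, and the `CPU_DTYPES` list are assumptions made for illustration; the PR's actual `_enough_gpu_memory` signature and probe are not shown on this page.

```python
import torch


def enough_gpu_memory(required_bytes: int) -> bool:
    """Return True only if CUDA is usable and has enough free memory.
    Any RuntimeError from the memory query is treated as 'not enough'."""
    if not torch.cuda.is_available():
        return False
    try:
        free_bytes, _total_bytes = torch.cuda.mem_get_info()
    except RuntimeError:
        # Partially configured CUDA environments can raise here.
        return False
    return free_bytes >= required_bytes


def cpu_fp16_matmul_supported() -> bool:
    """Probe whether this CPU backend can run a float16 matmul at all."""
    try:
        a = torch.ones(2, 2, dtype=torch.float16)
        _ = a @ a.t()
    except RuntimeError:
        return False
    return True


# Hypothetical module-level dtype list built from the runtime probe.
CPU_DTYPES = [torch.float32, torch.bfloat16]
if cpu_fp16_matmul_supported():
    CPU_DTYPES.append(torch.float16)
```

In a pytest suite, predicates like these would typically feed `pytest.mark.skipif` or the `marks` argument of `pytest.param`, so an unsupported configuration is reported as skipped instead of failing at collection time.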

Keep consistent with other PyTorch native ops.

inaniloquentee · 2026-06-22T08:26:08Z

+        output_dtype: torch.dtype,
+    ) -> torch.Tensor:
+        """Core matmul: cast to ``compute_dtype``, project, optionally add bias, cast out."""
+        h = hidden.to(compute_dtype)


One precision nit: forward_fp32() casts the inputs to fp32, but the matmul can still run under autocast or CUDA TF32 settings, so it may not be a true fp32 golden reference. Since downstream kernels will compare against this path, could we explicitly disable autocast/TF32 around the matmul, or document that required precision context?

Thanks, @inaniloquentee.

forward_fp32() now disables autocast and CUDA TF32 around the matmul, saving and restoring the previous TF32 setting so global state does not leak. The regular forward() path is unchanged and still follows the ambient precision context.

I also added regression coverage for this:

CPU autocast case: forward_fp32 remains equal to the fp32 reference and restores the TF32 flag.

CUDA TF32 case: forward_fp32 is checked against a higher-precision reference when CUDA is available.

Docs were updated to note the precision-context behavior.

Wrap the forward_fp32 matmul to disable autocast and CUDA TF32 (saving and restoring the global allow_tf32 flag) so the fp32 golden path is not silently downcast by the caller's ambient precision context. The dtype-behavior forward path is left to follow ambient precision intentionally. Add tests: forward_fp32 stays true fp32 under ambient autocast and restores the TF32 flag (CPU); numerically beats a TF32 matmul (GPU). Pin TF32 off in the fp32-vs-naive bitwise test. Sync docs accordingly.
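
One way the precision guard described in this commit might be written is sketched below. The names `strict_fp32` and `matmul_fp32` are hypothetical, and the sketch assumes the legacy `torch.backends.cuda.matmul.allow_tf32` flag used by PyTorch releases such as the 2.8 build listed in the test environment; the PR's real implementation may differ.

```python
import contextlib

import torch


@contextlib.contextmanager
def strict_fp32(device_type: str):
    """Disable autocast and CUDA TF32 inside the block, then restore the
    previous TF32 setting so global state does not leak."""
    prev_tf32 = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    try:
        with torch.autocast(device_type=device_type, enabled=False):
            yield
    finally:
        torch.backends.cuda.matmul.allow_tf32 = prev_tf32


def matmul_fp32(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """fp32 projection that ignores the caller's ambient precision context."""
    with strict_fp32(hidden.device.type):
        return hidden.to(torch.float32) @ weight.to(torch.float32).t()


gen = torch.Generator().manual_seed(0)
hidden = torch.randn(4, 16, generator=gen)
weight = torch.randn(8, 16, generator=gen)   # HF layout [out, in]

tf32_before = torch.backends.cuda.matmul.allow_tf32
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    guarded = matmul_fp32(hidden, weight)    # stays fp32 under autocast
tf32_after = torch.backends.cuda.matmul.allow_tf32
```

The `try`/`finally` is what guarantees the flag is restored even if the matmul raises.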

coderabbitai

🧹 Nitpick comments (1)

docs/operators/lm_head.md (1)

108-115: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Break the test coverage list into separate sentences for readability.

The test coverage section is accurate and comprehensive, but lines 108–115 condense nine distinct test concerns into a single long sentence that impairs readability. Consider splitting into 2–3 sentences by major category (e.g., "Covers: [precision & dtype behavior]...." then "Also covers: [batch invariance & purity]..." then "GPU-only smoke test...").

📝 Suggested restructuring

- Covers: fp32 correctness vs naive matmul (bitwise, with ambient TF32 pinned off),
- `forward_fp32` precision-context safety (true fp32 under ambient autocast + restores the
- global TF32 flag on CPU; numerically beats a TF32 matmul on GPU), bf16/fp16 dtype-path
- accuracy (relative-to-peak tolerance, with `bias`), output shape, bias semantics, Axis-A
- batch invariance (slice + padding, single-thread reduction, all dtypes), input purity,
- gradient flow to `hidden`/`weight` (closed-form check), registry dispatch, and a GPU-only
- smoke test at the real Qwen3-8B dims (`vocab=151936, hidden=4096`) that skips when CUDA or
- GPU memory is unavailable.
+ Covers: fp32 correctness vs naive matmul (bitwise, with ambient TF32 pinned off) and
+ `forward_fp32` precision-context safety (true fp32 under ambient autocast, restores the
+ global TF32 flag on CPU, numerically beats a TF32 matmul on GPU).
+
+ Also covers: bf16/fp16 dtype-path accuracy (relative-to-peak tolerance, with `bias`),
+ output shape, bias semantics, Axis-A batch invariance (slice + padding, single-thread
+ reduction, all dtypes), input purity, and gradient flow to `hidden`/`weight` (closed-form).
+
+ Registry dispatch and a GPU-only smoke test at the real Qwen3-8B dims (`vocab=151936,
+ hidden=4096`) round out coverage; the smoke test skips when CUDA or GPU memory is unavailable.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/operators/lm_head.md` around lines 108 - 115, The test coverage
description in the lm_head.md documentation is a single long sentence that lists
nine distinct test concerns, making it difficult to read and parse. Break this
single sentence into 2-3 shorter sentences organized by major category: group
precision and dtype-related tests together (fp32 correctness, forward_fp32
safety, bf16/fp16 accuracy, output shape and bias semantics), then create a
second sentence for batch invariance and purity tests (Axis-A batch invariance,
input purity, gradient flow), and finally add a third sentence for the GPU-only
smoke test at Qwen3-8B dimensions. This restructuring will improve readability
while maintaining all the technical details.


ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between a3fb370 and 517b12d.

📒 Files selected for processing (3)

docs/operators/lm_head.md
rl_engine/kernels/ops/pytorch/linear/lm_head.py
tests/test_lm_head.py

🚧 Files skipped from review as they are similar to previous changes (2)

rl_engine/kernels/ops/pytorch/linear/lm_head.py
tests/test_lm_head.py

maxiaosong1124 requested review from Flink-ddd, KJLdefeated and inaniloquentee as code owners June 22, 2026 04:06

maxiaosong1124 added 2 commits June 22, 2026 14:11

update lm_head.py

a3fb370

Keep consistent with other PyTorch native ops.

inaniloquentee requested changes Jun 22, 2026

Flink-ddd added the needs-gpu-ci label Jun 22, 2026

maxiaosong1124 wants to merge 4 commits into RL-Align:main from maxiaosong1124:feat/ws1-lm-head-pytorch-op
