@bythew3i (Collaborator) commented Dec 5, 2025

... also move weight padding to weight-load time instead of doing it inside the kernel.
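
As a rough illustration of padding at load time rather than inside the kernel, here is a minimal pure-Python sketch. The names `round_up`, `pad_rows`, `kernel`, and the alignment value `ALIGN` are hypothetical and are not taken from this PR; the real change operates on device arrays.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def pad_rows(weight, multiple):
    """Zero-pad a 2-D weight (list of rows) so its row count is aligned."""
    cols = len(weight[0])
    extra = round_up(len(weight), multiple) - len(weight)
    return [list(row) for row in weight] + [[0.0] * cols for _ in range(extra)]


def kernel(x, padded_weight, multiple):
    """Stand-in for the kernel: it expects an already aligned weight."""
    assert len(padded_weight) % multiple == 0, "weight must be padded at load"
    # Zero-extend the input so it matches the padded row count.
    x = list(x) + [0.0] * (len(padded_weight) - len(x))
    cols = len(padded_weight[0])
    return [
        sum(x[i] * padded_weight[i][j] for i in range(len(x)))
        for j in range(cols)
    ]


ALIGN = 4  # hypothetical alignment requirement

# Padding happens once, when the weight is loaded.
loaded = pad_rows([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], ALIGN)

# Every later call reuses the padded weight; zero rows do not change the result.
out = kernel([1.0, 1.0, 1.0], loaded, ALIGN)
```

Padding once at load keeps the per-call path free of padding work, and the kernel can simply assert the alignment it needs.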

Root Cause:

  1. Top-k over padded gating scores could return out-of-range top-k indices.
  2. A sync_barrier was missing, so the a2a (all-to-all) scatter could start before all devices had finished metadata propagation.
  3. A large bt (block num tokens) caused out-of-bounds (OOB) reads and writes.
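
The first root cause can be reproduced in a few lines of plain Python. This is a sketch under assumptions: the helper names and the zero fill value are hypothetical, and masking the padded slots with negative infinity is one common remedy, not necessarily the fix applied in this PR.

```python
import math


def topk_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def pad_scores(scores, padded_len, fill):
    """Pad one row of gating scores to padded_len with a fill value."""
    return list(scores) + [fill] * (padded_len - len(scores))


num_experts = 3
scores = [-2.0, -0.5, -1.0]  # every real gating score is negative

# Zero padding: the padded slot (index 3) outranks every real expert.
zero_padded = pad_scores(scores, 4, 0.0)
bad = topk_indices(zero_padded, 2)

# Masking the padded slot with -inf keeps top-k inside the valid range.
masked = pad_scores(scores, 4, -math.inf)
good = topk_indices(masked, 2)

print(bad, good)
```

With zero padding, index 3 is selected even though only experts 0 to 2 exist; with the mask, every selected index is below `num_experts`.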

TODO:

  • Tune the block sizes in MoE for GPT-OSS


@kyuyeunk added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Dec 6, 2025
Signed-off-by: Jevin Jiang <jevin0change@gmail.com>