[Kernel][FusedMoE] Fix MoE crash and hang issues #1252

bythew3i · 2025-12-05T03:20:41Z

... also moves weight padding to weight-load time instead of doing it inside the kernel.

Root causes:

Top-k on padded gating scores could return an out-of-range top-k index.
A sync_barrier was missing, so the a2a scatter could start before all devices had finished metadata propagation.
A large bt (block num tokens) caused out-of-bounds (OOB) reads and writes.
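
To illustrate the first root cause, here is a minimal, framework-free sketch of how top-k over a padded score vector can select a padding slot. It is not the kernel code from this PR: the helper names (`topk_indices`, `pad_scores`), the expert counts, and the choice of masking with negative infinity are all assumptions made for illustration. The PR itself only states that weight padding was moved to weight load.

```python
import math

def topk_indices(scores, k):
    """Return the indices of the k largest scores (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]

def pad_scores(scores, padded_len, fill):
    """Append `fill` until the list has `padded_len` entries (hypothetical helper)."""
    return list(scores) + [fill] * (padded_len - len(scores))

# Hypothetical sizes: 6 real experts, padded to a width of 8.
num_experts = 6
padded_experts = 8
k = 2

# Gating logits for one token; all real values happen to be negative.
logits = [-1.5, -0.2, -3.0, -0.7, -2.1, -0.9]

# Zero padding: the padded slots outrank every real expert.
zero_padded = pad_scores(logits, padded_experts, 0.0)
bad = topk_indices(zero_padded, k)

# Padding with -inf: padded slots can never be selected.
masked = pad_scores(logits, padded_experts, -math.inf)
good = topk_indices(masked, k)

print("zero padding :", bad)   # indices >= num_experts are out of range
print("-inf padding :", good)  # all indices are valid expert ids
```

With zero padding, both selected indices point past the last real expert, which is the kind of out-of-range index described above. Any approach that keeps padded slots out of the top-k input (masking, or not padding the scores at all) avoids it.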

TODO:

Tune the block sizes in MoE for GPT-OSS

tpu_inference/runner/kv_cache.py

tpu_inference/layers/vllm/quantization/unquantized.py

tpu_inference/kernels/fused_moe/v1/kernel.py

Signed-off-by: Jevin Jiang <jevin0change@gmail.com>

bythew3i requested review from hfan, kyuyeunk, mrjunwan-lang, py4, sixiang-google, vanbasten23, wenxindongwork and yaochengji as code owners December 5, 2025 03:20

kyuyeunk reviewed Dec 5, 2025

tpu_inference/runner/kv_cache.py (resolved)

tpu_inference/layers/vllm/quantization/unquantized.py (resolved)

bythew3i force-pushed the jevin-moe branch from 5269e57 to 6d72170 on December 5, 2025 19:04

kyuyeunk approved these changes Dec 6, 2025

kyuyeunk added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Dec 6, 2025

kyuyeunk reviewed Dec 6, 2025

tpu_inference/kernels/fused_moe/v1/kernel.py (outdated, resolved)

Fix MoE crash and hang issues

7f59769

Signed-off-by: Jevin Jiang <jevin0change@gmail.com>

bythew3i force-pushed the jevin-moe branch from 6d72170 to 7f59769 on December 6, 2025 06:27

Fix bit_width

24400bd

Signed-off-by: Jevin Jiang <jevin0change@gmail.com>
