[ATOM-vLLM][Qwen3.5][Qwen3Next][GDN] add packed decode fast path for GDN by zejunchen-zejun · Pull Request #508 · ROCm/ATOM

zejunchen-zejun · 2026-04-07T14:07:27Z

Adds a packed recurrent attention Triton kernel as a decode fast path.
Qwen3-Next-80B-A3B-Instruct-FP8, TP4, 1K/8K, concurrency 4: TTPS +1.6%
Qwen3-Next-80B-A3B-Instruct-FP8, TP4, 1K/8K, concurrency 32: TTPS +1.13%

Copilot

Pull request overview

This PR adds a packed recurrent decode “fast path” for the GDN attention backend in the vLLM OOT plugin, and enables it for the pure non-speculative decode case (targeting Qwen3.5/Qwen3Next on vLLM 0.19.0).

Changes:

Introduces a non-spec decode helper (_forward_core_decode_non_spec) that uses fused_recurrent_gated_delta_rule_packed_decode.
Gates the packed-decode path behind vllm_envs.VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE and runtime metadata conditions (no spec masks, decode-only).
Updates KV cache access to use compilation_config[layer_name].kv_cache directly (removing virtual_engine indexing).
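
The gating described in the list above can be sketched as a small predicate. This is a minimal illustration only, not the code from the PR: the `DecodeMetadata` class and its field names are hypothetical stand-ins for the backend's attention metadata, and the flag is read straight from the process environment here, whereas the PR reads it through `vllm_envs`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class DecodeMetadata:
    """Hypothetical stand-in for the GDN attention metadata (field names assumed)."""
    num_prefills: int = 0
    num_decodes: int = 0
    num_spec_decodes: int = 0
    spec_token_mask: Optional[Sequence[bool]] = None


def packed_decode_flag(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: the flag is treated as a plain "0"/"1" environment variable.
    return env.get("VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE", "0") == "1"


def use_packed_decode(meta: DecodeMetadata,
                      env: Mapping[str, str] = os.environ) -> bool:
    """Return True only for a pure, non-speculative decode batch with the flag on."""
    return (
        packed_decode_flag(env)
        and meta.spec_token_mask is None   # no speculative-decoding masks
        and meta.num_spec_decodes == 0     # no speculative requests
        and meta.num_prefills == 0         # decode-only batch
        and meta.num_decodes > 0
    )
```

Any batch that fails one of these checks would fall back to the existing general path, so the fast path stays strictly opt-in.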


atom/plugin/vllm/attention_backend/attention_gdn.py


zejunchen-zejun · 2026-04-14T11:54:13Z

No performance benefit

Copilot AI review requested due to automatic review settings April 7, 2026 14:07

zejunchen-zejun marked this pull request as draft April 7, 2026 14:07

Copilot started reviewing on behalf of zejunchen-zejun April 7, 2026 14:09

Copilot AI reviewed Apr 7, 2026

atom/plugin/vllm/attention_backend/attention_gdn.py (review comment resolved)

zejunchen-zejun changed the title ~~[plugin][OOT Qwen3.5][GDN][vLLM 0.19.0] add GDN packed decode fast path, enable the pure non-spec decode GDN fast path into the OOT plugin backend for Qwen3.5/Qwen3Next~~ [plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path Apr 7, 2026

zejunchen-zejun added 2 commits April 13, 2026 14:23

[plugin][OOT Qwen3.5][GDN][vLLM 0.19.0] add GDN packed decode fast path, enable the pure non-spec decode GDN fast path into the OOT plugin backend for Qwen3.5/Qwen3Next

1dfb784

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

remove cat op and use contiguous buffer for mixed_qkv

9583056

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

zejunchen-zejun force-pushed the zejun/vLLM_19_qwen3.5_fast_path_0407 branch from 799dc44 to 9583056 April 13, 2026 06:24

zejunchen-zejun changed the title ~~[plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path~~ [ATOM-vLLM][Qwen3.5][GDN] add GDN packed decode fast path Apr 13, 2026

zejunchen-zejun changed the title ~~[ATOM-vLLM][Qwen3.5][GDN] add GDN packed decode fast path~~ [ATOM-vLLM][Qwen3.5][Qwen3Next][GDN] add packed decode fast path for GDN Apr 13, 2026

zejunchen-zejun closed this Apr 14, 2026

zejunchen-zejun wants to merge 2 commits into main from zejun/vLLM_19_qwen3.5_fast_path_0407
