Add linear_attn entries to Qwen3.5 base_model_tp_plan #47009

Open
ZAID646 wants to merge 1 commit into huggingface:main from ZAID646:fix/qwen3_5-tp-plan

@ZAID646 ZAID646 commented Jul 1, 2026

CI

Fixes #46846

Qwen3.5 is a hybrid model: ~75% of decoder layers use linear_attention (Gated DeltaNet) with their own projection matrices (in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj). These were missing from base_model_tp_plan, causing:

  • OOM at TP>1 (the linear_attn weights are never sharded)
  • RuntimeError in model.generate() (Conv1d channel mismatch after in_proj_qkv)

Fix: add all linear_attn.* projections with the "colwise_gather_output" pattern, which shards each weight matrix (fixing the OOM) and all-gathers the activations before the depthwise Conv1d (fixing the shape mismatch).
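
To illustrate why gathering the output restores the expected channel count, here is a toy, pure-Python simulation of column-wise sharding followed by an all-gather. It is only a sketch: the sizes, the helper names (`matmul`, `shard_columns`, `all_gather_columns`), and the `[d_in][d_out]` weight layout are assumptions made for readability and are not taken from the Transformers code.

```python
# Toy simulation of column-parallel sharding plus an output all-gather.
# Pure Python, no torch; all names and sizes here are illustrative assumptions.

def matmul(x, w):
    """x: [n][d_in], w: [d_in][d_out] -> [n][d_out]."""
    return [
        [sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]

def shard_columns(w, world_size):
    """Split w along its output (column) dimension into equal shards, one per rank."""
    d_out = len(w[0])
    assert d_out % world_size == 0, "output dim must divide evenly across ranks"
    step = d_out // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]

def all_gather_columns(parts):
    """Concatenate per-rank outputs along the channel dimension."""
    return [sum((p[i] for p in parts), []) for i in range(len(parts[0]))]

x = [[1.0, 2.0], [3.0, 4.0]]              # 2 tokens, d_in = 2
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]                # d_in = 2, d_out = 4

full = matmul(x, w)                       # unsharded reference output
shards = shard_columns(w, world_size=2)   # each rank stores half of the columns
local_outputs = [matmul(x, s) for s in shards]

# Without a gather, each rank only sees 2 of the 4 channels, which is the kind
# of mismatch a layer expecting all 4 channels would reject.
assert len(local_outputs[0][0]) == 2

gathered = all_gather_columns(local_outputs)
assert gathered == full                   # gathering restores the full activation
```

Each rank holds only a slice of the weight, which is what reduces per-rank memory, while the gathered activation has the same shape as in the unsharded case.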

Updated both modular_qwen3_5.py (source) and configuration_qwen3_5.py (auto-generated).
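
For reviewers, the added entries might look roughly like the sketch below. The projection names and the "colwise_gather_output" pattern come from the description above; the `layers.*.` key prefix, the variable names, and the `plan_for` helper are assumptions for illustration, and the `fnmatch`-based lookup is not how Transformers resolves the plan.

```python
from fnmatch import fnmatchcase

# Projection names as listed in this PR description.
LINEAR_ATTN_PROJECTIONS = ["in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a", "out_proj"]

# Hypothetical key layout: the "layers.*." prefix is an assumption, not copied from the diff.
linear_attn_tp_plan = {
    f"layers.*.linear_attn.{name}": "colwise_gather_output"
    for name in LINEAR_ATTN_PROJECTIONS
}

def plan_for(module_name, plan):
    """Return the first pattern in the plan whose wildcard key matches module_name (illustrative only)."""
    for key, style in plan.items():
        if fnmatchcase(module_name, key):
            return style
    return None

# A linear-attention projection is covered; an unrelated module is not.
assert plan_for("layers.3.linear_attn.out_proj", linear_attn_tp_plan) == "colwise_gather_output"
assert plan_for("layers.3.self_attn.q_proj", linear_attn_tp_plan) is None
```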

github-actions Bot commented Jul 1, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_5

github-actions Bot commented Jul 1, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28538882882:2
Result: success | Jobs: 2 | Tests: 10 | Failures: 0 | Duration: 45s

@Rocketknight1

Member

cc @Cyrilvallez for TP!

Development

Successfully merging this pull request may close these issues.

[Qwen3.5] Missing linear_attn entries in base_model_tp_plan causes OOM and shape error at TP>1

2 participants