
Add MLU support to is_flash_linear_attention_available #46995

Open
atri2549 wants to merge 3 commits into huggingface:main from atri2549:fla-mlu-support


atri2549 commented Jul 1, 2026


CI

What does this PR do?

This PR adds MLU device support to is_flash_linear_attention_available(), so flash linear attention (fla) can be used on MLU devices as well as on CUDA.

Why is this needed?

MLU devices support flash linear attention, but the current availability check only allows CUDA. This prevents models that rely on fla (e.g., Qwen3.5, Qwen3-Next, OLMo-Hybrid) from using the feature on MLU hardware.

Implementation details

  • Add is_torch_mlu_available() check to is_flash_linear_attention_available() using the same cuda or mlu pattern already used in is_flash_attn_2_available().
  • No behavioral change for CUDA or any other accelerator.
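
The change described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual transformers code: `fla_available_sketch` and its three boolean parameters are hypothetical stand-ins for the real package and device probes, and the real function may perform further checks that are not shown here.

```python
def fla_available_sketch(
    fla_installed: bool, cuda_available: bool, mlu_available: bool
) -> bool:
    """Hypothetical stand-in for is_flash_linear_attention_available().

    The three flags are assumptions that replace the real package and
    device probes, so the sketch runs without torch or fla installed.
    """
    if not fla_installed:
        return False
    # As described in this PR: previously only CUDA passed the device check;
    # with the change, either CUDA or MLU passes.
    return cuda_available or mlu_available


# An MLU-only machine with fla installed passes the check.
print(fla_available_sketch(fla_installed=True, cuda_available=False, mlu_available=True))
```

Because the two device checks are combined with `or`, the result on a CUDA machine is the same with or without the MLU term, which matches the "no behavioral change for CUDA" claim.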

Impact

Enables flash linear attention on MLU devices. No impact on other backends.


github-actions Bot commented Jul 1, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28510401907:2
Result: failure | Jobs: 13 | Tests: 63,153 | Failures: 0 | Duration: 17h 49m

@Rocketknight1

Member

cc @ArthurZucker for FA I think!
