@Gasoonjia
Contributor

We skip the Triton SDPA kernel when deploying the Gemma3 model to the ExecuTorch CUDA backend, to prevent a performance regression. We will revert this config once our Triton/CUDA SDPA kernel consistently beats the decomposed SDPA kernel.

Also removes the unnecessary conv1d_to_conv2d decomposition; it is already handled inside ExecuTorch (ET).
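
For readers unfamiliar with the term, "decomposed SDPA" above refers to computing scaled dot-product attention as separate matmul, scale, softmax, and matmul steps rather than as one fused kernel. Below is a minimal pure-Python sketch of that decomposition. It is illustrative only: the function names are hypothetical, it handles a single head with no masking or batching, and it is not the ExecuTorch or Triton implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row max to avoid overflow in exp
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def decomposed_sdpa(q, k, v):
    """Compute softmax(Q K^T / sqrt(d)) V as four separate steps."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]                          # transpose K
    scores = matmul(q, k_t)                                       # step 1: Q K^T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # step 2: scale
    weights = softmax_rows(scaled)                                # step 3: softmax
    return matmul(weights, v)                                     # step 4: weights V


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = decomposed_sdpa(q, k, v)
```

A fused kernel performs the same math in one pass without materializing the intermediate `scores` and `weights` matrices, which is why it can be faster, though as the description notes, it does not always win in practice.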

@larryliu0820 merged commit eeafd42 into huggingface:main on Dec 8, 2025
63 of 83 checks passed
