cuda with fused int4 #16154

mergennachin · 2025-12-09T15:57:56Z

GPU Device Caching for Encoder Output in CUDA Backend
Fused INT4 weight-only quantized matmul pass for CUDA backend

Add CUDA GPU caching functionality for encoder outputs to improve performance in ASR applications by avoiding redundant computation.

Key changes:
- Add GPU caching mechanism in cuda_backend.cpp with RAII management
- Add clear_stored_tensor option for cache control
- Add encoder output caching support in ASR runner
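The caching idea can be sketched roughly as follows. This is a minimal illustration in Python, not the code in cuda_backend.cpp: the class `EncoderOutputCache`, its methods, and the `encoder_calls` counter are hypothetical names introduced here, and a context manager stands in for the RAII-style cleanup the description mentions. It assumes the encoder output is computed once per utterance and reused across decoder steps.

```python
class EncoderOutputCache:
    """Hypothetical cache that keeps one encoder output until it is cleared."""

    def __init__(self, encoder):
        self._encoder = encoder      # callable that produces the encoder output
        self._stored = None          # stands in for a tensor kept on the GPU
        self.encoder_calls = 0       # counts real encoder runs, for illustration

    def get(self, audio_features):
        # Run the encoder only when nothing is stored; otherwise reuse the result.
        if self._stored is None:
            self._stored = self._encoder(audio_features)
            self.encoder_calls += 1
        return self._stored

    def clear(self):
        # Loosely analogous to a clear-stored-tensor option: drop the cached value.
        self._stored = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release the stored value when the scope ends (RAII-like behaviour).
        self.clear()
        return False


def toy_encoder(features):
    # Placeholder for an expensive encoder forward pass.
    return [2 * f for f in features]


with EncoderOutputCache(toy_encoder) as cache:
    first = cache.get([1, 2, 3])
    second = cache.get([1, 2, 3])   # served from the cache, encoder not re-run
    calls_before_clear = cache.encoder_calls
    cache.clear()
    third = cache.get([4, 5])       # cache was cleared, so the encoder runs again
    calls_after_clear = cache.encoder_calls
```

In this sketch the second `get` call returns the stored object without invoking the encoder, and clearing the cache forces a fresh run on the next request.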

pytorch-bot · 2025-12-09T15:58:01Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16154

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 8f11434 with merge base 6cca6e6:

NEW FAILURES - The following jobs have failed:

- Lint / lintrunner / linux-job (gh)
  >>> Lint for backends/cuda/tests/test_fuse_int4_quant_matmul.py:
- pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
  RuntimeError: Command docker exec -t b51c344de91f58543d4822f0857159bdbb1ea9517f503c8151c50eb2b7b4b678 /exec failed with exit code 139
- pull / unittest-editable / macos / macos-job (gh)
  export/tests/test_target_recipes.py::TestTargetRecipes::test_linear_model
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-large-v3-turbo, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t 73cbbeee2e03cd56301da6b0bbcef9175a06ed309c6379ed60d9aa1910abbfef /exec failed with exit code 1
- Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, quantized-int4-tile-packed) / linux-job (gh)
  RuntimeError: Command docker exec -t a1fcafc075113f59213c1f42407d11ab019d3c3c010872216d9d90ea60fab2a4 /exec failed with exit code 1

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

- pull / android / run-emulator (gh) (#16137)
  Timeout waiting for emulator to boot.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-12-09T15:58:38Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Add fusion pass that combines multiple int4pack_mm operations sharing the same input tensor into a single fused operation, reducing kernel launch overhead for LLM attention (Q/K/V) and MLP (Gate/Up) projections.

Key changes:
- Add FuseInt4WeightOnlyQuantMatmulPass in backends/cuda/passes/
- Add CSEPass before fusion to merge duplicate preprocessing chains
- Fix AotiBackend.preprocess to properly handle PassResult from passes that return new graph_modules (using _update_exported_program_graph_module)
- Add comprehensive tests for the fusion pass
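The arithmetic behind fusing matmuls that share an input can be illustrated with a small sketch. It is not the pass itself: plain float matrices stand in for int4-packed weights, quantization is ignored, and the helper names `matmul` and `fused_matmul` are hypothetical. The point it shows is that one matmul against the column-wise concatenation of the weights, followed by a split, gives the same outputs as separate matmuls.

```python
def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def fused_matmul(x, weights):
    """Run one matmul against the column-wise concatenation of `weights`,
    then split the result back into one output per original weight."""
    widths = [len(w[0]) for w in weights]
    # Concatenate along the output dimension: row i of the fused weight is
    # row i of every individual weight laid side by side.
    fused_w = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    fused_out = matmul(x, fused_w)
    outputs, start = [], 0
    for width in widths:
        outputs.append([row[start:start + width] for row in fused_out])
        start += width
    return outputs


# Toy Q/K/V projections that all consume the same input x.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0], [1.0]]
w_v = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]

q, k, v = fused_matmul(x, [w_q, w_k, w_v])
```

Here three projections are computed with a single call to `matmul`, which mirrors the stated goal of replacing several kernel launches with one.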

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Dec 9, 2025

mergennachin force-pushed the cuda_with_fused_int4 branch 3 times, most recently from 59621ef to d4782c6, on December 9, 2025 18:49

mergennachin force-pushed the cuda_with_fused_int4 branch from d4782c6 to 8f11434 on December 10, 2025 14:55

2 participants