Commit d4782c6
Fused INT4 weight-only quantized matmul pass for CUDA backend
Add fusion pass that combines multiple int4pack_mm operations sharing the
same input tensor into a single fused operation, reducing kernel launch
overhead for LLM attention (Q/K/V) and MLP (Gate/Up) projections.
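
The idea behind the fusion can be sketched in plain Python: when several projections multiply the same input, their weights can be concatenated along the output dimension, multiplied once, and the result split back into per-projection outputs. This is a toy illustration under loud assumptions: ordinary float lists stand in for INT4-packed weights, and `matmul` and `fused_matmul` are hypothetical helpers, not the pass or kernel added by this commit.

```python
# Toy illustration only: plain float lists stand in for INT4-packed weights.

def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def fused_matmul(x, weights):
    """Run one matmul over column-concatenated weights, then split the result."""
    widths = [len(w[0]) for w in weights]
    # Concatenate along the output dimension: row i of the fused weight is
    # row i of every individual weight, joined end to end.
    fused_w = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    out = matmul(x, fused_w)
    # Slice each output row back into per-projection chunks.
    results, start = [], 0
    for width in widths:
        results.append([row[start:start + width] for row in out])
        start += width
    return results

x = [[1.0, 2.0], [3.0, 4.0]]               # shared input activation
w_q = [[1.0, 0.0], [0.0, 1.0]]             # hypothetical Q projection
w_k = [[2.0], [3.0]]                       # hypothetical K projection
w_v = [[0.5, 1.5, 2.5], [1.0, 1.0, 1.0]]   # hypothetical V projection

separate = [matmul(x, w) for w in (w_q, w_k, w_v)]   # three "launches"
fused = fused_matmul(x, [w_q, w_k, w_v])             # one "launch"
```

Here `fused` and `separate` hold the same three outputs, but the fused path performs a single matmul, which is where the reduced kernel launch overhead would come from.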
Key changes:
- Add FuseInt4WeightOnlyQuantMatmulPass in backends/cuda/passes/
- Add CSEPass before fusion to merge duplicate preprocessing chains
- Fix AotiBackend.preprocess to properly handle PassResult from passes
that return new graph_modules (using _update_exported_program_graph_module)
- Add comprehensive tests for the fusion pass
1 parent 0e13ae6 commit d4782c6
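
Why a CSE step might need to run before fusion can be sketched on a toy graph: if each matmul reaches the shared input through its own copy of an identical preprocessing node, the matmuls appear to have different inputs until the duplicates are merged. Everything below is a hypothetical stand-in (the tuple-based graph, the `to_bf16` preprocessing op, and the `cse` and `group_by_activation` helpers); it does not use the real torch.fx or ExecuTorch APIs.

```python
from collections import defaultdict

# Hypothetical toy graph: each node is (name, op, input names).
# Weight names (w_q, ...) are left as bare strings for brevity.
graph = [
    ("x", "placeholder", ()),
    ("cast_q", "to_bf16", ("x",)),   # three identical preprocessing nodes
    ("cast_k", "to_bf16", ("x",)),
    ("cast_v", "to_bf16", ("x",)),
    ("q", "int4pack_mm", ("cast_q", "w_q")),
    ("k", "int4pack_mm", ("cast_k", "w_k")),
    ("v", "int4pack_mm", ("cast_v", "w_v")),
]

def cse(nodes):
    """Merge nodes that apply the same op to the same (already merged) inputs."""
    seen, rename, out = {}, {}, []
    for name, op, inputs in nodes:
        inputs = tuple(rename.get(i, i) for i in inputs)
        key = (op, inputs)
        if op != "placeholder" and key in seen:
            rename[name] = seen[key]   # duplicate: redirect users to the first copy
        else:
            seen[key] = name
            out.append((name, op, inputs))
    return out

def group_by_activation(nodes, op="int4pack_mm"):
    """Collect matmuls that share their first input; only groups of 2+ are fusable."""
    groups = defaultdict(list)
    for name, node_op, inputs in nodes:
        if node_op == op:
            groups[inputs[0]].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

before = group_by_activation(graph)        # no fusable group found
after = group_by_activation(cse(graph))    # q, k, v now share "cast_q"
```

In this sketch the grouping finds nothing on the raw graph, and finds one three-way group once the duplicate chains are merged.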
File tree
6 files changed, +1573 -3 lines changed
- backends
- aoti
- cuda
- passes
- tests