Commit 1037f70
Gemma 4 26B-A4B MoE: full architecture support + KV compression + NEON optimization
Gemma 4 hybrid MoE architecture (128 experts, 8 active) with dual-FFN,
hybrid attention (sliding+full), QK-norm, learned RoPE, and GeGLU activation.
Architecture fixes (10 bugs):
- Load dense FFN weights alongside MoE experts (root cause of garbage output)
- Parallel dual-FFN: Dense MLP + MoE from same input, outputs summed
- layer_output_scale: simple multiply (was incorrectly residual-contribution)
- Attention scale = 1.0 for QK-normed models
- MoE expert activation: GeGLU (was SwiGLU)
- MoE router: separate unweighted RMS norm + 1/sqrt(dim) scaling
- V normalization (unweighted RMS norm per head)
- Disable attention softcap for Gemma 4
- RoPE full dimension (remove STEP35 halving)
- IQ3_XXS dequantization with 256-entry grid codebook
Performance (-53% per-token latency):
- IQ3_XXS/IQ4_NL NEON fused dot for MoE experts
- Q8_0 two-accumulator NEON with prefetch
- GeGLU NEON (fast tanh via Schraudolph exp)
- GGUF embedding: skip 2.8GB FP32 alloc, use Q6_K fused dot
- Skip Q4 weight conversion for Gemma 4 MoE (Q8_0 fused dot faster)
KV compression for QK-normed models:
- Auto FP32 keys + Q4 values (QK-norm keys too sparse for 4-bit)
- All KV types produce correct output: "Paris", "서울"
- Hybrid cache layout: max(sliding, full) head_dim allocation
- 3.5x V memory reduction with perfect quality preservation
Score: 99.7% (34/34 tests, 0 warnings, 7.53x compression, 5.78x SIMD)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent 05f067e commit 1037f70
File tree
10 files changed
+954
-112
lines changed- docs
- include/turboquant
- src/engine
10 files changed
+954
-112
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
173 | 173 | | |
174 | 174 | | |
175 | 175 | | |
176 | | - | |
| 176 | + | |
177 | 177 | | |
178 | | - | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
179 | 197 | | |
180 | 198 | | |
181 | 199 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
20 | | - | |
| 19 | + | |
| 20 | + | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
136 | 137 | | |
137 | 138 | | |
138 | 139 | | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
139 | 151 | | |
140 | 152 | | |
141 | 153 | | |
| |||
167 | 179 | | |
168 | 180 | | |
169 | 181 | | |
170 | | - | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
171 | 187 | | |
172 | | - | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
173 | 204 | | |
174 | 205 | | |
175 | 206 | | |
| |||
179 | 210 | | |
180 | 211 | | |
181 | 212 | | |
182 | | - | |
| 213 | + | |
183 | 214 | | |
184 | 215 | | |
185 | 216 | | |
186 | 217 | | |
187 | 218 | | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
188 | 228 | | |
189 | 229 | | |
190 | 230 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
233 | 233 | | |
234 | 234 | | |
235 | 235 | | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
236 | 240 | | |
237 | 241 | | |
238 | 242 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
224 | 224 | | |
225 | 225 | | |
226 | 226 | | |
| 227 | + | |
227 | 228 | | |
228 | 229 | | |
229 | 230 | | |
| |||
263 | 264 | | |
264 | 265 | | |
265 | 266 | | |
| 267 | + | |
266 | 268 | | |
267 | 269 | | |
268 | 270 | | |
| |||
0 commit comments