fix: Gemma 4 + TurboQuant KV no longer crashes on second prompt when --cache-reuse enabled #10
Open
sujitvasanth wants to merge 1 commit into
Overview
The previous cache bug #9 masked a knock-on problem in the RoPE shift implementation. This fix is necessary for TurboQuant to function properly with cache reuse on Gemma 4.
TurboQuant (turbo2/3/4) uses kernel-level WHT rotation, which is position-invariant: the WHT preserves inner products, so no RoPE correction is needed after a KV position shift.
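To make the invariance concrete, here is a small standalone demo (illustration only, not project code) showing that the unnormalized Walsh-Hadamard transform scales every inner product by the same factor n, so attention scores are preserved regardless of where entries sit in the sequence:

```cpp
#include <cstdio>
#include <vector>

// In-place fast Walsh-Hadamard transform; size must be a power of two.
static void fwht(std::vector<double> & a) {
    for (size_t len = 1; len < a.size(); len <<= 1) {
        for (size_t i = 0; i < a.size(); i += len << 1) {
            for (size_t j = i; j < i + len; ++j) {
                const double u = a[j], v = a[j + len];
                a[j]       = u + v;
                a[j + len] = u - v;
            }
        }
    }
}

static double dot(const std::vector<double> & x, const std::vector<double> & y) {
    double s = 0.0;
    for (size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

int main() {
    std::vector<double> q = { 0.3, -1.2,  0.7, 2.0, -0.5,  1.1, 0.0, -0.9 };
    std::vector<double> k = { 1.0,  0.4, -0.6, 0.2,  0.8, -1.5, 0.3,  0.7 };
    const double before = dot(q, k);
    fwht(q);
    fwht(k);
    // H * H^T = n * I for the +/-1 Hadamard matrix, so <Hq, Hk> = n * <q, k>:
    // scores are preserved up to a fixed scale, independent of position.
    printf("dot(q, k) = %f, dot(Hq, Hk) / n = %f\n", before, dot(q, k) / (double) q.size());
}
```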
build_graph_shift() assumed standard quantized tensors with upstream rotation, but TurboQuant sets attn_rot_k=0 and handles rotation at the kernel level. Building the shift graph with turbo-padded tensors triggers a null-buffer assert and a segfault on the second prompt.
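For background on why a shift graph exists at all: RoPE rotations compose additively, so a cached K entry rotated for position p can be corrected to position p + delta with one extra rotation by delta -- that re-rotation is what build_graph_shift() normally emits. A minimal standalone sketch (not project code):

```cpp
#include <cmath>
#include <cstdio>

// Rotate one (x, y) channel pair by pos * theta -- the per-pair RoPE operation.
static void rope_pair(double & x, double & y, double pos, double theta) {
    const double c = std::cos(pos * theta);
    const double s = std::sin(pos * theta);
    const double nx = x * c - y * s;
    const double ny = x * s + y * c;
    x = nx; y = ny;
}

int main() {
    const double theta = 0.1;
    double ax = 1.0, ay = 2.0;  // K pair rotated fresh at position 7
    double bx = 1.0, by = 2.0;  // K pair rotated at position 12, then shifted by -5
    rope_pair(ax, ay,  7.0, theta);
    rope_pair(bx, by, 12.0, theta);
    rope_pair(bx, by, -5.0, theta);  // the correction a shift graph applies
    // Both pairs match: rot(12*theta) followed by rot(-5*theta) == rot(7*theta).
    printf("fresh: (%f, %f)  corrected: (%f, %f)\n", ax, ay, bx, by);
}
```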
Fix: skip the per-layer build_graph_shift() work and bypass get_has_shift() entirely for turbo KV types. Position tracking via seq_add() still works correctly -- only the broken RoPE re-rotation kernel is skipped.
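For reviewers, a hedged sketch of the shape of the guard; the enum values and helper name below are placeholders made up for illustration, not this repo's actual identifiers:

```cpp
#include <cstdio>

// Placeholder enum -- the fork's real KV type identifiers will differ.
enum kv_type { KV_Q8_0, KV_TURBO2, KV_TURBO3, KV_TURBO4 };

// turbo2/3/4 handle rotation in-kernel via WHT, so their cached entries
// are position-invariant and never need a RoPE shift correction.
static bool is_turbo_kv_type(kv_type t) {
    return t == KV_TURBO2 || t == KV_TURBO3 || t == KV_TURBO4;
}

// Sketch of the guard: report "no shift pending" for turbo KV types so the
// shift graph is never built over turbo-padded tensors. seq_add() position
// bookkeeping is unaffected -- it runs independently of this check.
static bool get_has_shift(kv_type type_k, bool has_pending_shift) {
    if (is_turbo_kv_type(type_k)) {
        return false;
    }
    return has_pending_shift;
}

int main() {
    printf("q8_0   -> has_shift = %d\n", get_has_shift(KV_Q8_0,   true)); // 1: shift graph runs
    printf("turbo3 -> has_shift = %d\n", get_has_shift(KV_TURBO3, true)); // 0: shift graph skipped
}
```

Reporting "no shift pending" up front, rather than tolerating the shift graph, keeps the fix minimal: WHT-rotated entries never need correction, so there is nothing for the graph to do.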
Additional information
Combined with the previous PR that recognises caching in Gemma 4, this gives near-instantaneous chat turns in the llama-server web GUI, where previously there was a reprocessing lag of 7+ seconds and a crash on any prompt that triggered a sliding-window shift.
I have tested up to around 6k of the available 250k context and it now works flawlessly.
Requirements