Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant
(forked from TheTom/llama-cpp-turboquant)
feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone
#8 opened May 11, 2026 by sujitvasanth (labels: examples, server)
llama: prefix MTP assistant tensors with 'mtp.' on load allowing use of -ot 'mtp..*=CUDA0' flag
#7 opened May 11, 2026 by sujitvasanth
Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…
#6 opened May 8, 2026 by Ooooze (labels: ggml, Nvidia GPU)
Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86
#5 (draft) opened May 8, 2026 by jameseiten (label: documentation)