Update dependency ggml-org/llama.cpp to v9272 - autoclosed #153
Closed
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
b9066 → b9272
Release Notes
ggml-org/llama.cpp (ggml-org/llama.cpp)
vb9272
app : add batched-bench, fit-params, quantize & perplexity (#23459)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
vb9271
mtp: use inp_out_ids for skipping logit computation (#23433)
When doing a follow-up decode for the draft model, we always computed the logits even though they are not required.
vb9270
vocab : add Carbon-3B (HybridDNATokenizer) support (#23410)
Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the
HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}.
The base BPE is Qwen3-4B-Base's; what differs is that text inside ... regions is chunked into fixed 6-mers (right-padded
with 'A' on the trailing partial), and any base outside ACGT maps
to .
src/llama-vocab.{h,cpp}: new pre-type, dispatched from
llm_tokenizer_bpe_session::tokenize.
src/llama-vocab-carbon.h: pure helpers (tokenize_carbon,
emit_dna_kmers) factored out for unit testing — no llama_vocab
dependency, vocab access goes through a std::function.
conversion/base.py: detect HybridDNATokenizer by class name in
get_vocab_base_pre (chktxt collides with Qwen3 base since it
has no ), and pass trust_remote_code=True in get_vocab_base
so the custom tokenizer class can load.
tests/test-tokenizer-carbon.cpp: 12 cases covering single 6-mer,
multi 6-mer, lowercase, invalid base -> , partial k-mer
right-pad, mixed text+DNA, empty , unterminated ,
two regions, vocab miss.
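As a rough illustration of the chunking rule described above, the Python sketch below splits a DNA string into fixed 6-mers and right-pads the trailing partial chunk with 'A'. It is not the code from the PR: the function name, the uppercase normalisation, the placeholder token name, and the choice to replace a whole chunk when it contains a base outside ACGT are all assumptions made for illustration.

```python
from typing import List

K = 6  # k-mer width stated in the commit message
VALID_BASES = set("ACGT")
# Hypothetical placeholder: the real token name is not shown in the text above.
INVALID_PLACEHOLDER = "<INVALID>"


def emit_dna_kmers_sketch(dna: str) -> List[str]:
    """Chunk a DNA string into fixed 6-mers, right-padding the last one with 'A'."""
    dna = dna.upper()  # assumption: lowercase input is normalised to uppercase
    kmers = []
    for start in range(0, len(dna), K):
        # Pad a trailing partial chunk on the right with 'A' up to K characters.
        chunk = dna[start:start + K].ljust(K, "A")
        # Assumption: a chunk containing any base outside ACGT is replaced whole.
        if all(base in VALID_BASES for base in chunk):
            kmers.append(chunk)
        else:
            kmers.append(INVALID_PLACEHOLDER)
    return kmers
```

For example, under these assumptions an 8-base input yields one full 6-mer followed by one padded 6-mer, and an empty input yields no k-mers.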
vocab : align Carbon-3B changes with llama.cpp conventions
Fold tokenize_carbon + emit_dna_kmers inline into
llm_tokenizer_bpe_session (drop src/llama-vocab-carbon.h),
matching how every other tokenizer keeps its helpers inside
llama-vocab.cpp.
Replace the standalone unit test with the conventional
test-tokenizer-0 row backed by models/ggml-vocab-carbon.gguf
(vocab-only conversion) + .inp/.out fixtures covering single
6-mer, multi 6-mer, lowercase, invalid base -> , partial
right-pad, mixed text+DNA, empty , unterminated ,
two regions.
Register "carbon" in convert_hf_to_gguf_update.py's model list
(pointing at HuggingFaceBio/Carbon-3B) and teach both
AutoTokenizer call sites in the updater to pass
trust_remote_code=True for it, matching how t5 is special-cased.
vocab : move Carbon dispatch to _set_vocab_carbon + LlamaModel branch
Refactor the conversion-side changes to follow the per-tokenizer-family
convention used by _set_vocab_qwen, _set_vocab_interns1, _set_vocab_glm,
etc. instead of conditionalising the shared get_vocab_base /
get_vocab_base_pre paths.
conversion/base.py: add _set_vocab_carbon — self-contained, loads
with trust_remote_code=True so HybridDNATokenizer's merged Qwen3 + DNA
vocab is visible, writes tokenizer.ggml.pre = "carbon" directly.
conversion/llama.py: branch in LlamaModel.set_vocab on
tokenizer_config.json["tokenizer_class"] == "HybridDNATokenizer" and
dispatch to _set_vocab_carbon. Same precedent as conversion/bert.py
(tokenizer_class branch between BertTokenizer / RobertaTokenizer) and
conversion/phi.py.
conversion/base.py: revert the conditional in get_vocab_base and the
class-name short-circuit in the auto-generated get_vocab_base_pre.
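The class-name dispatch described above might look roughly like the following Python sketch. Only the tokenizer_config.json key, the "HybridDNATokenizer" value, and the _set_vocab_carbon name come from the commit message; the helper names and the "default" return value are hypothetical and do not mirror the actual conversion/llama.py code.

```python
import json
from pathlib import Path


def choose_vocab_setter(tokenizer_config: dict) -> str:
    """Pick a vocab setter name from a parsed tokenizer_config.json."""
    if tokenizer_config.get("tokenizer_class") == "HybridDNATokenizer":
        return "_set_vocab_carbon"
    # Hypothetical stand-in for whatever path the converter already takes.
    return "default"


def choose_vocab_setter_for_dir(model_dir: str) -> str:
    """Load tokenizer_config.json from a model directory, if present, and dispatch."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        # No config file: fall back to the existing behaviour.
        return "default"
    with open(config_path, encoding="utf-8") as f:
        return choose_vocab_setter(json.load(f))
```

Keeping the decision in a function that takes the parsed config makes the branch easy to check without a model directory on disk.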
tests : expand ggml-vocab-carbon.gguf fixtures with model-card examples
Add 6 cases from the Carbon-3B model card on top of the existing edge
coverage: the unterminated basic-completion prompt, the closed 33-bp
example, the metadata-conditioned prompt (with <vertebrate_mammalian>
and <protein_coding_region> which BPE-decompose since they are not in
the vocab), the documented anti-pattern of raw DNA without tags,
and the two likelihood-scoring examples. Brings the suite to 19 cases.
Refactor per upstream review:
Previously the tokenizer was modelled as LLAMA_VOCAB_TYPE_BPE plus a
new LLAMA_VOCAB_PRE_TYPE_CARBON, which (a) put a CARBON-specific
branch inside llm_tokenizer_bpe_session::tokenize (only existing
pre-types differ in regex, not dispatch logic), and (b) conflated
"hybrid DNA tokenization" with "Qwen3 BPE pre-tokenizer".
This change moves it to its own vocab type, peer to PLAMO2, with the
GGUF model name matching the HF tokenizer class (HybridDNATokenizer):
owns std::unique_ptr<llm_tokenizer_bpe> for non- text and
routes raw text through a DNA-aware splitter; wired into
init_tokenizer, tokenize, type_name, byte_to_token, and the
BPE-style token_to_piece case (DNA k-mers + //
are pure ASCII, so byte-level BPE decoding handles them).
LLAMA_VOCAB_TYPE_HYBRIDDNA gets its own branch in the vocab-type
config block alongside SPM/WPM/UGM/RWKV, where pre_type is set
to QWEN2 and the matching add_space_prefix / escape_whitespaces /
clean_spaces flags are applied — mirroring qwen2's BPE path so
byte-level BPE merging stays bit-identical to the Python
reference for non-DNA text.
tokenizer.ggml.model = "hybriddna" (no separate pre).
"HybridDNATokenizer" same as bert.py / phi.py do.
regenerated metadata.
trust_remote_code special-case (no longer needed since dispatch
is now class-name driven, not chkhsh).
Verified end-to-end against HuggingFaceBio/Carbon-{500M,3B,8B}:
tokenization is bit-identical to the Python HybridDNATokenizer for
all 19 test fixtures plus the model-card metadata-conditioned
prompt; greedy completion produces the same DNA continuation as
the Python reference; spec-dec with 500M as draft for 8B still
works.
vocab : relax llm_tokenizer_bpe assert to allow HYBRIDDNA
vocab : drop llm_tokenizer_bpe vocab-type assert
vocab : write tokenizer.ggml.pre for HYBRIDDNA, share BPE dispatch
vocab : assert BPE or HYBRIDDNA in llm_tokenizer_bpe
vocab : annotate #endif with PRETOKENIZERDEBUG
vocab : drop local hybriddna fixture (moves to ggml-org/vocabs)
deduplicate
simplify
simplify
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
vb9267
ggml : Check the right iface method before using the fallback 2d get (#23306)
Probably no backends implement only one of 2d get/set, but this
might be annoying for some future backend developer trying to add
2d get/set.
vb9266
llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)
When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4),
the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs,
self_kq_mask) are created as graph input nodes but never consumed by any compute node,
so the backend scheduler never allocates a buffer for them. Calling
mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits
GGML_ASSERT(buffer) at ggml-backend.cpp:194.
The same scenario applies symmetrically: if a model had zero SWA layers, the SWA
tensors would be unallocated.
Fix: guard both the base and SWA set_input calls with null/buffer checks, matching
the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674)
which has the comment: 'base tensors may not be allocated if there are no non-SWA
attention layers'.
Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for
unallocated tensors, preventing a null-dereference on the reuse path.
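The guard pattern in this fix can be illustrated with a small Python model. The real code is C++ inside llama.cpp; the Tensor class and set_input_guarded function below are hypothetical stand-ins, with a buffer of None modelling a graph input that the scheduler never allocated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tensor:
    """Hypothetical stand-in for a graph input tensor."""
    name: str
    buffer: Optional[bytearray] = None  # None: no buffer was ever allocated


def set_input_guarded(tensor: Optional[Tensor], data: bytes) -> bool:
    """Write input data only if the tensor exists and has a buffer.

    Returns True when the data was written and False when the write was
    skipped, instead of asserting on the missing buffer.
    """
    if tensor is None or tensor.buffer is None:
        return False
    tensor.buffer[: len(data)] = data
    return True
```

In this model, a SWA-only configuration corresponds to calling set_input_guarded on a base tensor whose buffer is None: the call returns False rather than failing.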
vb9265
hexagon: ssm-conv fix for large prompts (#23307)
hexagon: remove gathers and better handling of vtcm in ssm-conv
hexagon: relax ssm-conv gating requirements
hexagon: add new prefill ssm-conv backend test
hexagon: remove trailing white space
hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV changes
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
vb9264
app : show version (#23426)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
vb9263
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)
vb9260
opencl: refactor backend initialization (#23318)
opencl: refactor initialization
opencl: refactor GPU identification
opencl: rename for consistency
opencl: cache global mem size in dev_ctx
opencl: adjust log level
opencl: load argsort and flash_attn kernels in supports_op
argsort kernel must be built for supports_op for querying the max
workgroups
flash_attn kernel has many variants, only load them when needed
vb9259
common/speculative : fix nullptr crash in get_devices_str (#23386)
ggml_backend_dev_by_name always appends a nullptr sentinel to the devices
vector. Skipping nullptr entries prevents assertion failure in
ggml_backend_dev_name.
Assisted-by: llama.cpp:local pi
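The idea of skipping sentinel entries when building the device string can be sketched in Python, with None standing in for the nullptr sentinel. The function name and the comma-separated output format are assumptions for illustration, not the actual get_devices_str implementation.

```python
from typing import List, Optional


def devices_to_str_sketch(device_names: List[Optional[str]]) -> str:
    """Join device names, skipping None entries such as a trailing sentinel."""
    # Filtering first means no name lookup is ever attempted on a None entry.
    return ", ".join(name for name in device_names if name is not None)
```

With a list such as two device names followed by None, only the two names appear in the result.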
vb9258
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345)
image-text reordering fix removed
refactor bool add_padding + pad_rounding enum into a single pad_style enum
vb9257
vulkan: optimize operations in the IM2COL shader (#22685)
vulkan: optimize operations in the IM2COL shader
Add comments and improve the code formatting
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.