Update dependency ggml-org/llama.cpp to v9272 - autoclosed #153

Closed
renovate[bot] wants to merge 1 commit into main from renovate/ggml-org-llama.cpp-9272.x

@renovate renovate Bot commented May 21, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package | Update | Change
ggml-org/llama.cpp | major | b9066 → b9272

Release Notes

ggml-org/llama.cpp (ggml-org/llama.cpp)

vb9272

Compare Source

Details

app : add batched-bench, fit-params, quantize & perplexity (#​23459)

  • app : add batched-bench, fit-params, quantize & perplexity

Signed-off-by: Adrien Gallouët angt@huggingface.co

  • Add missing main.cpp

Signed-off-by: Adrien Gallouët angt@huggingface.co

  • Add EOL

Signed-off-by: Adrien Gallouët angt@huggingface.co


Signed-off-by: Adrien Gallouët angt@huggingface.co

vb9271

Compare Source

Details

mtp: use inp_out_ids for skipping logit computation (#​23433)

When doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required.

vb9270

Compare Source

Details

vocab : add Carbon-3B (HybridDNATokenizer) support (#​23410)

  • vocab : add Carbon-3B (HybridDNATokenizer) support

Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the
HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}.
The base BPE is Qwen3-4B-Base's; what differs is that text inside ... regions is chunked into fixed 6-mers (right-padded
with 'A' on the trailing partial), and any base outside ACGT maps
to .

  • src/llama-vocab.{h,cpp}: new pre-type, dispatched from
    llm_tokenizer_bpe_session::tokenize.

  • src/llama-vocab-carbon.h: pure helpers (tokenize_carbon,
    emit_dna_kmers) factored out for unit testing — no llama_vocab
    dependency, vocab access goes through a std::function.

  • conversion/base.py: detect HybridDNATokenizer by class name in
    get_vocab_base_pre (chktxt collides with Qwen3 base since it
    has no ), and pass trust_remote_code=True in get_vocab_base
    so the custom tokenizer class can load.

  • tests/test-tokenizer-carbon.cpp: 12 cases covering single 6-mer,
    multi 6-mer, lowercase, invalid base -> , partial k-mer
    right-pad, mixed text+DNA, empty , unterminated ,
    two regions, vocab miss.
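
A minimal Python sketch of the k-mer chunking described above, for illustration only. It is not the llama.cpp implementation. The region tags and the token that invalid bases map to are elided in the text above and are not reproduced here; `chunk_dna` and `UNKNOWN_KMER` are hypothetical names. Uppercasing lowercase input, and replacing a whole chunk when it contains a non-ACGT base, are assumptions.

```python
K = 6                            # fixed k-mer width from the commit message
PAD = "A"                        # the trailing partial k-mer is right-padded with 'A'
VALID = set("ACGT")
UNKNOWN_KMER = "<unknown-kmer>"  # hypothetical placeholder, not the real token


def chunk_dna(seq: str) -> list:
    """Split a DNA string into fixed-width k-mers."""
    seq = seq.upper()  # assumption: lowercase input is normalised to uppercase
    kmers = []
    for start in range(0, len(seq), K):
        kmer = seq[start:start + K].ljust(K, PAD)
        # assumption: a chunk with any non-ACGT base becomes one placeholder
        kmers.append(kmer if set(kmer) <= VALID else UNKNOWN_KMER)
    return kmers
```

For example, `chunk_dna("ACGTACGT")` yields one full 6-mer followed by the two leftover bases padded out with four `A` characters.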

  • vocab : align Carbon-3B changes with llama.cpp conventions

  • Fold tokenize_carbon + emit_dna_kmers inline into
    llm_tokenizer_bpe_session (drop src/llama-vocab-carbon.h),
    matching how every other tokenizer keeps its helpers inside
    llama-vocab.cpp.

  • Replace the standalone unit test with the conventional
    test-tokenizer-0 row backed by models/ggml-vocab-carbon.gguf
    (vocab-only conversion) + .inp/.out fixtures covering single
    6-mer, multi 6-mer, lowercase, invalid base -> , partial
    right-pad, mixed text+DNA, empty , unterminated ,
    two regions.

  • Register "carbon" in convert_hf_to_gguf_update.py's model list
    (pointing at HuggingFaceBio/Carbon-3B) and teach both
    AutoTokenizer call sites in the updater to pass
    trust_remote_code=True for it, matching how t5 is special-cased.

  • vocab : move Carbon dispatch to _set_vocab_carbon + LlamaModel branch

Refactor the conversion-side changes to follow the per-tokenizer-family
convention used by _set_vocab_qwen, _set_vocab_interns1, _set_vocab_glm,
etc. instead of conditionalising the shared get_vocab_base /
get_vocab_base_pre paths.

  • conversion/base.py: add _set_vocab_carbon — self-contained, loads
    with trust_remote_code=True so HybridDNATokenizer's merged Qwen3 + DNA
    vocab is visible, writes tokenizer.ggml.pre = "carbon" directly.

  • conversion/llama.py: branch in LlamaModel.set_vocab on
    tokenizer_config.json["tokenizer_class"] == "HybridDNATokenizer" and
    dispatch to _set_vocab_carbon. Same precedent as conversion/bert.py
    (tokenizer_class branch between BertTokenizer / RobertaTokenizer) and
    conversion/phi.py.

  • conversion/base.py: revert the conditional in get_vocab_base and the
    class-name short-circuit in the auto-generated get_vocab_base_pre.
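
The class-name dispatch described above can be sketched roughly as follows. This is a standalone illustration, not the code in conversion/llama.py: `read_tokenizer_class` and `pick_vocab_setter` are hypothetical helpers, and the `"default"` return value is only a label for whatever path the converter would otherwise take.

```python
import json
from pathlib import Path
from typing import Optional


def read_tokenizer_class(model_dir: Path) -> Optional[str]:
    """Return tokenizer_class from tokenizer_config.json, or None if absent."""
    config_path = model_dir / "tokenizer_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("tokenizer_class")


def pick_vocab_setter(model_dir: Path) -> str:
    """Name the vocab setter to use; the return values are illustrative labels."""
    if read_tokenizer_class(model_dir) == "HybridDNATokenizer":
        return "_set_vocab_carbon"
    return "default"  # hypothetical label for the unchanged path
```

Keying on the tokenizer class name keeps the shared get_vocab_base / get_vocab_base_pre paths free of model-specific conditionals, which is the stated motivation for this refactor.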

  • tests : expand ggml-vocab-carbon.gguf fixtures with model-card examples

Add 6 cases from the Carbon-3B model card on top of the existing edge
coverage: the unterminated basic-completion prompt, the closed 33-bp
example, the metadata-conditioned prompt (with <vertebrate_mammalian>
and <protein_coding_region> which BPE-decompose since they are not in
the vocab), the documented anti-pattern of raw DNA without tags,
and the two likelihood-scoring examples. Brings the suite to 19 cases.

  • vocab : promote HybridDNATokenizer to its own LLAMA_VOCAB_TYPE

Refactor per upstream review:

This should be its own tokenizer model, ie. carbonhybriddna instead
of gpt2 and not carbon pre-tokenizer. That way you can keep the
correct pre-tokenizer, in case that ever changes.

Previously the tokenizer was modelled as LLAMA_VOCAB_TYPE_BPE plus a
new LLAMA_VOCAB_PRE_TYPE_CARBON, which (a) put a CARBON-specific
branch inside llm_tokenizer_bpe_session::tokenize (only existing
pre-types differ in regex, not dispatch logic), and (b) conflated
"hybrid DNA tokenization" with "Qwen3 BPE pre-tokenizer".

This change moves it to its own vocab type, peer to PLAMO2, with the
GGUF model name matching the HF tokenizer class (HybridDNATokenizer):

  • include/llama.h: new LLAMA_VOCAB_TYPE_HYBRIDDNA = 7.
  • src/llama-vocab.cpp: new llm_tokenizer_hybriddna + session that
    owns std::unique_ptr<llm_tokenizer_bpe> for non- text and
    routes raw text through a DNA-aware splitter; wired into
    init_tokenizer, tokenize, type_name, byte_to_token, and the
    BPE-style token_to_piece case (DNA k-mers + //
    are pure ASCII, so byte-level BPE decoding handles them).
    LLAMA_VOCAB_TYPE_HYBRIDDNA gets its own branch in the vocab-type
    config block alongside SPM/WPM/UGM/RWKV, where pre_type is set
    to QWEN2 and the matching add_space_prefix / escape_whitespaces /
    clean_spaces flags are applied — mirroring qwen2's BPE path so
    byte-level BPE merging stays bit-identical to the Python
    reference for non-DNA text.
  • src/llama-vocab.h: drop the short-lived LLAMA_VOCAB_PRE_TYPE_CARBON.
  • conversion/base.py: _set_vocab_hybriddna writes
    tokenizer.ggml.model = "hybriddna" (no separate pre).
  • conversion/llama.py: dispatch on tokenizer_class ==
    "HybridDNATokenizer" same as bert.py / phi.py do.
  • models/ggml-vocab-hybriddna.gguf{,.inp,.out}: renamed fixture +
    regenerated metadata.
  • convert_hf_to_gguf_update.py: drop the stale chkhsh entry and
    trust_remote_code special-case (no longer needed since dispatch
    is now class-name driven, not chkhsh).

Verified end-to-end against HuggingFaceBio/Carbon-{500M,3B,8B}:
tokenization is bit-identical to the Python HybridDNATokenizer for
all 19 test fixtures plus the model-card metadata-conditioned
prompt; greedy completion produces the same DNA continuation as
the Python reference; spec-dec with 500M as draft for 8B still
works.

  • vocab : relax llm_tokenizer_bpe assert to allow HYBRIDDNA

  • vocab : drop llm_tokenizer_bpe vocab-type assert

  • vocab : write tokenizer.ggml.pre for HYBRIDDNA, share BPE dispatch

  • vocab : assert BPE or HYBRIDDNA in llm_tokenizer_bpe

  • vocab : annotate #endif with PRETOKENIZERDEBUG

  • vocab : drop local hybriddna fixture (moves to ggml-org/vocabs)

  • deduplicate

  • simplify

  • simplify


Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com

vb9267

Compare Source

Details

ggml : Check the right iface method before using the fallback 2d get (#​23306)

Probably no backends implement only one of 2d get/set, but this
might be annoying for some future backend developer trying to add
2d get/set.

vb9266

Compare Source

Details

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#​23131)

When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4),
the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs,
self_kq_mask) are created as graph input nodes but never consumed by any compute node,
so the backend scheduler never allocates a buffer for them. Calling
mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits
GGML_ASSERT(buffer) at ggml-backend.cpp:194.

The same scenario applies symmetrically: if a model had zero SWA layers, the SWA
tensors would be unallocated.

Fix: guard both the base and SWA set_input calls with null/buffer checks, matching
the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674)
which has the comment: 'base tensors may not be allocated if there are no non-SWA
attention layers'.

Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for
unallocated tensors, preventing a null-dereference on the reuse path.
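
The guard pattern can be modelled with a toy Python sketch. This is not the C++ code in llama-graph; `Tensor`, `set_input`, and the name `self_k_idxs_swa` are hypothetical stand-ins, and a Python list plays the role of a backend buffer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tensor:
    """Toy graph input; buffer stays None until a scheduler allocates it."""
    name: str
    buffer: Optional[list] = None


def set_input(tensor: Optional[Tensor], values: list) -> bool:
    """Write values only if the tensor exists and has a buffer."""
    if tensor is None or tensor.buffer is None:
        return False  # unallocated: no compute node consumes it, so skip
    tensor.buffer[:] = values
    return True


# SWA-only model: the base input was never allocated, the SWA input was.
base_k_idxs = Tensor("self_k_idxs")
swa_k_idxs = Tensor("self_k_idxs_swa", buffer=[0, 0, 0])  # hypothetical name

wrote_base = set_input(base_k_idxs, [0, 1, 2])  # skipped instead of asserting
wrote_swa = set_input(swa_k_idxs, [0, 1, 2])    # written as usual
```

Applying the same check to both the base and SWA inputs covers the symmetric case of a model with zero SWA layers.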

vb9265

Compare Source

Details

hexagon: ssm-conv fix for large prompts (#​23307)

  • hexagon: remove gathers and better handling of vtcm in ssm-conv

  • hexagon: relax ssm-conv gating requirements

  • hexagon: add new prefill ssm-conv backend test

  • hexagon: remove trailing white space

  • hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV changes


Co-authored-by: Max Krasnyansky maxk@qti.qualcomm.com

vb9264

Compare Source

Details

app : show version (#​23426)

Signed-off-by: Adrien Gallouët angt@huggingface.co

vb9263

Compare Source

Details

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#​23329)

  • HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference.
  • Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch

vb9260

Compare Source

Details

opencl: refactor backend initialization (#23318)

  • opencl: refactor initialization

  • opencl: refactor GPU identification

  • opencl: rename for consistency

  • opencl: cache global mem size in dev_ctx

  • opencl: adjust log level

  • opencl: load argsort and flash_attn kernels in supports_op

  • argsort kernel must be built for supports_op for querying the max
    workgroups

  • flash_attn kernel has many variants, only load them when needed

vb9259

Compare Source

Details

common/speculative : fix nullptr crash in get_devices_str (#​23386)

ggml_backend_dev_by_name always appends a nullptr sentinel to the devices
vector. Skipping nullptr entries prevents assertion failure in
ggml_backend_dev_name.

Assisted-by: llama.cpp:local pi

vb9258

Compare Source

Details

mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#​23345)

  • mtmd : deepseek-ocr fixes, improvements and refactoring
  • image processing changes to achieve full parity with Pillow (reference impl)
  • SAM mask casting only when flash-attn is on
  • SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
  • llama-chat changes to fix server/WebUI issue (new media_markers_first())
  • adapted test-chat-template and added test cases for deepseek-ocr
  • changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
  • ty.toml ignore unresolved-import for tools/mtmd/tests/**
  • image-text reordering fix removed

  • refactor bool add_padding + pad_rounding enum into a single pad_style enum
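
For context on the CER score mentioned above: character error rate is commonly defined as the Levenshtein edit distance between the OCR output and the ground truth, divided by the ground-truth length. A minimal sketch under that standard definition follows. It is not the regression test's actual code, `edit_distance` and `cer` are hypothetical names, and chrF is not covered.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A score of 0.0 means the output matches the ground truth exactly; values above 1.0 are possible when the output is much longer than the reference.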

vb9257

Compare Source

Details

vulkan: optimize operations in the IM2COL shader (#​22685)

  • vulkan: optimize operations in the IM2COL shader

  • Add comments and improve the code formatting

Note

PR body was truncated to here.


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate Bot changed the title from "Update dependency ggml-org/llama.cpp to v9272" to "Update dependency ggml-org/llama.cpp to v9272 - autoclosed" on May 22, 2026
@renovate renovate Bot closed this May 22, 2026
@renovate renovate Bot deleted the renovate/ggml-org-llama.cpp-9272.x branch May 22, 2026 01:41