Update dependency ggml-org/llama.cpp to v9272 - autoclosed by renovate[bot] · Pull Request #153 · henrywang/lux

Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the
HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}.
The base BPE is Qwen3-4B-Base's; what differs is that text inside ... regions is chunked into fixed 6-mers (right-padded
with 'A' on the trailing partial), and any base outside ACGT maps
to .
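
The chunking rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from llama.cpp or from HybridDNATokenizer: the function name `chunk_kmers` and its parameters are hypothetical. Handling of bases outside ACGT and of lowercase input is deliberately left out, because the target token is not shown in this description.

```python
def chunk_kmers(seq: str, k: int = 6, pad: str = "A") -> list[str]:
    """Split a DNA string into fixed-size k-mers.

    Hypothetical helper for illustration only. The trailing partial
    chunk, if any, is right-padded with `pad` up to length k.
    """
    # Slice the sequence into consecutive windows of at most k characters.
    chunks = [seq[i:i + k] for i in range(0, len(seq), k)]
    # Only the last chunk can be short; pad it on the right.
    if chunks and len(chunks[-1]) < k:
        chunks[-1] = chunks[-1].ljust(k, pad)
    return chunks


if __name__ == "__main__":
    print(chunk_kmers("ACGTACGTAC"))
```

With the default `k=6`, a 10-base input yields one full 6-mer and one padded 6-mer. An empty input yields an empty list.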

- src/llama-vocab.{h,cpp}: new pre-type, dispatched from
  llm_tokenizer_bpe_session::tokenize.
- src/llama-vocab-carbon.h: pure helpers (tokenize_carbon,
  emit_dna_kmers) factored out for unit testing; no llama_vocab
  dependency, vocab access goes through a std::function.
- conversion/base.py: detect HybridDNATokenizer by class name in
  get_vocab_base_pre (chktxt collides with Qwen3 base since it
  has no ), and pass trust_remote_code=True in get_vocab_base
  so the custom tokenizer class can load.
- tests/test-tokenizer-carbon.cpp: 12 cases covering single 6-mer,
  multi 6-mer, lowercase, invalid base -> , partial k-mer
  right-pad, mixed text+DNA, empty , unterminated ,
  two regions, vocab miss.

vocab : align Carbon-3B changes with llama.cpp conventions

- Fold tokenize_carbon + emit_dna_kmers inline into
  llm_tokenizer_bpe_session (drop src/llama-vocab-carbon.h),
  matching how every other tokenizer keeps its helpers inside
  llama-vocab.cpp.
- Replace the standalone unit test with the conventional
  test-tokenizer-0 row backed by models/ggml-vocab-carbon.gguf
  (vocab-only conversion) + .inp/.out fixtures covering single
  6-mer, multi 6-mer, lowercase, invalid base -> , partial
  right-pad, mixed text+DNA, empty , unterminated ,
  two regions.
- Register "carbon" in convert_hf_to_gguf_update.py's model list
  (pointing at HuggingFaceBio/Carbon-3B) and teach both
  AutoTokenizer call sites in the updater to pass
  trust_remote_code=True for it, matching how t5 is special-cased.

vocab : move Carbon dispatch to _set_vocab_carbon + LlamaModel branch

Refactor the conversion-side changes to follow the per-tokenizer-family
convention used by _set_vocab_qwen, _set_vocab_interns1, _set_vocab_glm,
etc. instead of conditionalising the shared get_vocab_base /
get_vocab_base_pre paths.

- conversion/base.py: add _set_vocab_carbon. It is self-contained, loads
  with trust_remote_code=True so HybridDNATokenizer's merged Qwen3 + DNA
  vocab is visible, and writes tokenizer.ggml.pre = "carbon" directly.
- conversion/llama.py: branch in LlamaModel.set_vocab on
  tokenizer_config.json["tokenizer_class"] == "HybridDNATokenizer" and
  dispatch to _set_vocab_carbon. Same precedent as conversion/bert.py
  (tokenizer_class branch between BertTokenizer / RobertaTokenizer) and
  conversion/phi.py.
- conversion/base.py: revert the conditional in get_vocab_base and the
  class-name short-circuit in the auto-generated get_vocab_base_pre.

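
The class-name dispatch described above amounts to reading one key from tokenizer_config.json. The sketch below shows that check in isolation. It is not the actual LlamaModel.set_vocab code, and the helper name `uses_hybrid_dna_tokenizer` is hypothetical.

```python
import json
from pathlib import Path


def uses_hybrid_dna_tokenizer(model_dir: str) -> bool:
    """Return True if tokenizer_config.json names HybridDNATokenizer.

    Hypothetical helper: a converter could use a check like this to
    pick a dedicated vocab-setup path instead of the shared BPE one.
    """
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    if not cfg_path.is_file():
        # No tokenizer config: fall back to the default vocab path.
        return False
    with open(cfg_path, encoding="utf-8") as f:
        cfg = json.load(f)
    return cfg.get("tokenizer_class") == "HybridDNATokenizer"
```

Keying on the class name rather than on a pre-tokenizer hash sidesteps the collision with the Qwen3 base check text that the earlier commit mentions.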
tests : expand ggml-vocab-carbon.gguf fixtures with model-card examples

Add 6 cases from the Carbon-3B model card on top of the existing edge
coverage: the unterminated basic-completion prompt, the closed 33-bp
example, the metadata-conditioned prompt (with <vertebrate_mammalian>
and <protein_coding_region> which BPE-decompose since they are not in
the vocab), the documented anti-pattern of raw DNA without tags,
and the two likelihood-scoring examples. Brings the suite to 19 cases.

vocab : promote HybridDNATokenizer to its own LLAMA_VOCAB_TYPE

Refactor per upstream review:

This should be its own tokenizer model, ie. carbonhybriddna instead
of gpt2 and not carbon pre-tokenizer. That way you can keep the
correct pre-tokenizer, in case that ever changes.

Previously the tokenizer was modelled as LLAMA_VOCAB_TYPE_BPE plus a
new LLAMA_VOCAB_PRE_TYPE_CARBON, which (a) put a CARBON-specific
branch inside llm_tokenizer_bpe_session::tokenize (existing
pre-types differ only in regex, not in dispatch logic), and (b) conflated
"hybrid DNA tokenization" with "Qwen3 BPE pre-tokenizer".

This change moves it to its own vocab type, peer to PLAMO2, with the
GGUF model name matching the HF tokenizer class (HybridDNATokenizer):

- include/llama.h: new LLAMA_VOCAB_TYPE_HYBRIDDNA = 7.
- src/llama-vocab.cpp: new llm_tokenizer_hybriddna + session that
  owns std::unique_ptr<llm_tokenizer_bpe> for non- text and
  routes raw text through a DNA-aware splitter; wired into
  init_tokenizer, tokenize, type_name, byte_to_token, and the
  BPE-style token_to_piece case (DNA k-mers + //
  are pure ASCII, so byte-level BPE decoding handles them).
  LLAMA_VOCAB_TYPE_HYBRIDDNA gets its own branch in the vocab-type
  config block alongside SPM/WPM/UGM/RWKV, where pre_type is set
  to QWEN2 and the matching add_space_prefix / escape_whitespaces /
  clean_spaces flags are applied, mirroring qwen2's BPE path so
  byte-level BPE merging stays bit-identical to the Python
  reference for non-DNA text.
- src/llama-vocab.h: drop the short-lived LLAMA_VOCAB_PRE_TYPE_CARBON.
- conversion/base.py: _set_vocab_hybriddna writes
  tokenizer.ggml.model = "hybriddna" (no separate pre).
- conversion/llama.py: dispatch on tokenizer_class ==
  "HybridDNATokenizer" same as bert.py / phi.py do.
- models/ggml-vocab-hybriddna.gguf{,.inp,.out}: renamed fixture +
  regenerated metadata.
- convert_hf_to_gguf_update.py: drop the stale chkhsh entry and
  trust_remote_code special-case (no longer needed since dispatch
  is now class-name driven, not chkhsh).
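
The routing idea behind the new tokenizer type (ordinary text goes to BPE, tagged regions go to the k-mer path) can be sketched as a plain region splitter. This is an illustration under stated assumptions, not the llm_tokenizer_hybriddna implementation: the function name `split_regions` is hypothetical, the open and close tag strings are passed in as parameters because they are not shown in this description, and treating an unterminated region as DNA up to the end of the input is an assumption.

```python
def split_regions(text: str, open_tag: str, close_tag: str) -> list[tuple[bool, str]]:
    """Split text into (is_dna, segment) pairs.

    Hypothetical helper. Segments outside the tags are marked False
    (they would go to the regular BPE path); segments between the
    tags are marked True (they would go to the k-mer path).
    """
    segments = []
    pos = 0
    while pos < len(text):
        start = text.find(open_tag, pos)
        if start == -1:
            # No further region: the remainder is ordinary text.
            segments.append((False, text[pos:]))
            break
        if start > pos:
            segments.append((False, text[pos:start]))
        body_start = start + len(open_tag)
        end = text.find(close_tag, body_start)
        if end == -1:
            # Assumption: an unterminated region runs to end of input.
            segments.append((True, text[body_start:]))
            break
        segments.append((True, text[body_start:end]))
        pos = end + len(close_tag)
    return segments
```

An empty region produces a `(True, "")` segment, so a caller can still emit the surrounding tag tokens without producing any k-mers.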

Verified end-to-end against HuggingFaceBio/Carbon-{500M,3B,8B}:
tokenization is bit-identical to the Python HybridDNATokenizer for
all 19 test fixtures plus the model-card metadata-conditioned
prompt; greedy completion produces the same DNA continuation as
the Python reference; spec-dec with 500M as draft for 8B still
works.

vocab : relax llm_tokenizer_bpe assert to allow HYBRIDDNA
vocab : drop llm_tokenizer_bpe vocab-type assert
vocab : write tokenizer.ggml.pre for HYBRIDDNA, share BPE dispatch
vocab : assert BPE or HYBRIDDNA in llm_tokenizer_bpe
vocab : annotate #endif with PRETOKENIZERDEBUG
vocab : drop local hybriddna fixture (moves to ggml-org/vocabs)
deduplicate
simplify
simplify

Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com

`vb9267`

ggml : Check the right iface method before using the fallback 2d get (#23306)

Probably no backends implement only one of 2d get/set, but this
might be annoying for some future backend developer trying to add
2d get/set.

`vb9266`

llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa for SWA-only models (#23131)

When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4),
the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs,
self_kq_mask) are created as graph input nodes but never consumed by any compute node,
so the backend scheduler never allocates a buffer for them. Calling
mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits
GGML_ASSERT(buffer) at ggml-backend.cpp:194.

The same scenario applies symmetrically: if a model had zero SWA layers, the SWA
tensors would be unallocated.

Fix: guard both the base and SWA set_input calls with null/buffer checks, matching
the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674)
which has the comment: 'base tensors may not be allocated if there are no non-SWA
attention layers'.

Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for
unallocated tensors, preventing a null-dereference on the reuse path.

`vb9265`

hexagon: ssm-conv fix for large prompts (#23307)

hexagon: remove gathers and better handling of vtcm in ssm-conv
hexagon: relax ssm-conv gating requirements
hexagon: add new prefill ssm-conv backend test
hexagon: remove trailing white space
hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV changes

Co-authored-by: Max Krasnyansky maxk@qti.qualcomm.com

`vb9264`

app : show version (#23426)

Signed-off-by: Adrien Gallouët angt@huggingface.co

`vb9263`

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference.
Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch.

`vb9260`

opencl: refactor backend initialization (#23318)

- opencl: refactor initialization
- opencl: refactor GPU identification
- opencl: rename for consistency
- opencl: cache global mem size in dev_ctx
- opencl: adjust log level
- opencl: load argsort and flash_attn kernels in supports_op
  - argsort kernel must be built for supports_op for querying the max
    workgroups
  - flash_attn kernel has many variants, only load them when needed

`vb9259`

common/speculative : fix nullptr crash in get_devices_str (#23386)

ggml_backend_dev_by_name always appends a nullptr sentinel to the devices
vector. Skipping nullptr entries prevents assertion failure in
ggml_backend_dev_name.

Assisted-by: llama.cpp:local pi

`vb9258`

mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345)

mtmd : deepseek-ocr fixes, improvements and refactoring

- image processing changes to achieve full parity with Pillow (reference impl)
- SAM mask casting only when flash-attn is on
- SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
- llama-chat changes to fix server/WebUI issue (new media_markers_first())
- adapted test-chat-template and added test cases for deepseek-ocr
- changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
- ty.toml ignore unresolved-import for tools/mtmd/tests/**

image-text reordering fix removed
refactor bool add_padding + pad_rounding enum into a single pad_style enum
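
For readers unfamiliar with the CER score mentioned for the DeepSeek-OCR regression test, the sketch below shows the standard definition: character-level edit distance divided by the reference length. It is not taken from the llama.cpp test suite. The function names are hypothetical, and chrF is omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free on a match
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        # Convention chosen for this sketch: empty reference is only
        # matched perfectly by an empty hypothesis.
        return 0.0 if not hypothesis else float("inf")
    return levenshtein(reference, hypothesis) / len(reference)
```

A score of 0.0 means the OCR output matches the ground truth exactly. Values above 1.0 are possible when the output is much longer than the reference.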

`vb9257`

vulkan: optimize operations in the IM2COL shader (#22685)

vulkan: optimize operations in the IM2COL shader
Add comments and improve the code formatting

macOS/iOS:

Linux:

Ubuntu x64 (CPU)
Ubuntu arm64 (CPU)
[Ubuntu s390x (CPU)](https://redirect.github.com/ggml-org/llama.cpp/releases/download/b9257/llama-b9257-bin-ubuntu-s3

✂ Note

PR body was truncated to here.

Configuration

📅 Schedule: (UTC)

Branch creation
- At any time (no schedule defined)
Automerge
- At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

Update dependency ggml-org/llama.cpp to v9272

e39ea95

renovate Bot changed the title ~~Update dependency ggml-org/llama.cpp to v9272~~ Update dependency ggml-org/llama.cpp to v9272 - autoclosed May 22, 2026

renovate Bot closed this May 22, 2026

renovate Bot deleted the renovate/ggml-org-llama.cpp-9272.x branch May 22, 2026 01:41

renovate[bot] wants to merge 1 commit into main from renovate/ggml-org-llama.cpp-9272.x

renovate Bot commented May 21, 2026