
[codex] fix Gemma 4 thinking generation prompt #1

Draft

romgenie wants to merge 1 commit into master from codex/gemma4-thinking-template-latest

Conversation

@romgenie

What changed

Fixes the generation prompt in both shipped Gemma 4 templates so that thinking mode opens the thought channel for generation instead of emitting a closed, empty thought block. Non-thinking mode now leaves the model turn open without injecting `<|channel>thought\n<channel|>`.

Why

The previous guard was inverted: it emitted a closed, empty thought channel when `enable_thinking` was false and never opened the thought channel when it was true. In practice this prevented Gemma 4 from producing visible reasoning when thinking was enabled and leaked the wrong control tokens into non-thinking prompts.
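
For reference, here is a minimal sketch of the corrected guard. The `add_generation_prompt` and `enable_thinking` variable names and the `<start_of_turn>model` turn marker are assumptions for illustration; only the thought-channel tokens come from the description above, and the shipped templates may differ.

```jinja
{#- Sketch of the corrected generation-prompt guard (hypothetical names). -#}
{%- if add_generation_prompt -%}
    {{- '<start_of_turn>model\n' -}}
    {%- if enable_thinking -%}
        {#- thinking mode: open the thought channel and leave it open for generation -#}
        {{- '<|channel>thought\n' -}}
    {%- endif -%}
    {#- non-thinking mode: leave the model turn open, no thought tokens injected -#}
{%- endif -%}
```

In the old guard the emission sat in the opposite branch and was immediately closed with `<channel|>`, which accounts for both symptoms described above.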

Validation

  • Built `test-chat` in an Ubuntu 24.04 Podman container and ran `./build-test/bin/test-chat --template google-gemma-4`.
  • Ran the full `./build-test/bin/test-chat` suite.
  • Built the Vulkan server image from `.devops/vulkan.Dockerfile` with `UBUNTU_VERSION=24.04`, tagged `localhost/llama.cpp:server-vulkan-latest-gemma4-test`.
  • Smoke-tested the image with `llama-server --version`; it loaded the Vulkan backend and reported `version: 8981 (d77599234)`.

