feat: add Gemma4 VL support by anxiangsir · Pull Request #141 · EvolvingLMMs-Lab/LLaVA-OneVision-2

anxiangsir · 2026-05-17T15:37:28Z

Summary

Add Gemma4-VL model family registration plus language, vision, adapter, and multimodal data pipeline support.
Add HF↔mcore Gemma4 checkpoint conversion tools and 26B-A4B quick-start scripts.
Add Gemma4 regression and consistency smoke tests covering attention masks, RoPE, vision state dicts, model skeleton, and fixture loading.

Validation

pytest tests/_shared/test_gemma4_per_layer_window_size.py tests/_shared/test_gemma4_rope_inv_freq.py tests/_shared/test_gemma4_vision_state_dict.py tests/_shared/test_gemma4_vl_attention_mask.py tests/_shared/test_gemma4_vl_skeleton.py tests/consistency_gemma4/test_model_consistency.py::test_collection_smoke -v → 25 passed.
HF→mcore fresh conversion completed for /ov2/pretrain_models/google/gemma-4-26B-A4B-it into tmp_test_gemma4_mcore_ckpt_conversion_fresh.
mcore→HF roundtrip completed into tmp_test_gemma4_roundtrip_hf_fresh; original vs roundtrip HF had 1013/1013 keys, 0 missing/extra/shape mismatches, and full tensor equality with max_diff=0.0.
The full consistency suite previously passed against tmp_test_gemma4_mcore_ckpt_tp1_pp1_ep1; a rerun against the fresh checkpoint was blocked by GPU memory pressure (OOM) on the current A800, not by checkpoint key or layout errors.
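
The roundtrip check reported above (matching key sets, matching shapes, and a maximum absolute difference of zero) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the script used in this PR: `compare_state_dicts` and the `Tensor` stand-in are hypothetical names, and with real checkpoints the values would be framework tensors loaded from the original and roundtrip HF directories.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tensor:
    """Hypothetical stand-in for a framework tensor: a shape plus flat values."""
    shape: Tuple[int, ...]
    values: List[float]


def compare_state_dicts(original: Dict[str, Tensor], roundtrip: Dict[str, Tensor]) -> dict:
    """Report missing/extra keys, shape mismatches, and the max absolute difference."""
    shared = sorted(set(original) & set(roundtrip))
    missing = sorted(set(original) - set(roundtrip))  # lost during the roundtrip
    extra = sorted(set(roundtrip) - set(original))    # introduced by the roundtrip
    shape_mismatch = []
    max_diff = 0.0
    for name in shared:
        a, b = original[name], roundtrip[name]
        if a.shape != b.shape:
            shape_mismatch.append(name)
            continue  # values are not comparable when shapes differ
        for x, y in zip(a.values, b.values):
            max_diff = max(max_diff, abs(x - y))
    return {
        "matched_keys": len(shared),
        "missing": missing,
        "extra": extra,
        "shape_mismatch": shape_mismatch,
        "max_diff": max_diff,
    }


if __name__ == "__main__":
    # Toy data only; names and values are illustrative.
    original = {"w": Tensor((2, 2), [1.0, 2.0, 3.0, 4.0]), "b": Tensor((2,), [0.5, -0.5])}
    roundtrip = {"w": Tensor((2, 2), [1.0, 2.0, 3.0, 4.0]), "b": Tensor((2,), [0.5, -0.5])}
    print(compare_state_dicts(original, roundtrip))
```

A lossless roundtrip corresponds to empty `missing`, `extra`, and `shape_mismatch` lists together with `max_diff == 0.0`.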

Notes

This PR intentionally excludes unrelated dirty worktree files such as offline packing changes, generated checkpoints, logs, and temporary scripts.
The new shared config/argument fields default to no-op values and are only activated by the Gemma4 model config.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Copilot

Pull request overview

Adds end-to-end Gemma4-VL model support to LLaVA-OneVision-2, including the Gemma4 hybrid-attention LLM, vision tower, adapter, multimodal data plumbing, HF↔mcore checkpoint converters, quick-start training scripts, and accompanying smoke/consistency tests.

Changes:

New gemma4_vl model family (LLM with hybrid sliding/full attention, K=V tying, per-layer scalars; Gemma4 ViT + adapter) and registration through provider/arguments/data plugins/chat template.
New tools/convert_checkpoint/custom/gemma4_vl/ converters (LLM, ViT body, vision patch, adapter, and merger) plus two-stage quick-start shell scripts.
New regression tests in tests/_shared/ and a tests/consistency_gemma4/ collection-smoke fixture.

Reviewed changes

Copilot reviewed 51 out of 53 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
aiak_training_llm/models/gemma4_vl/*	Gemma4 LLM/ViT/adapter/provider definitions
aiak_training_llm/data/chat_templete.py	Registers `gemma4` chat template (turn delimiter strings look malformed)
aiak_training_llm/data/mm_plugin.py	Adds `Gemma4VLPlugin`; introduces a redundant `torch` import
aiak_training_llm/train/arguments.py	Adds Gemma4 architecture-specific CLI args; list/dict defaults lack `type=`
aiak_training_llm/train/pretrain/pretrain_gemma4_vl.py	Gemma4 pretraining entry; contains dead `is_video` / `attn_mask_type` assignments
tools/convert_checkpoint/custom/gemma4_vl/{llm,vit,vision_patch,adapter,merge_megatron}.py	HF↔mcore conversion + merger; carry stale Baidu copyright headers
examples/gemma4_vl/quick_start_26b_a4b/stage{1,2}_*.sh	Quick-start scripts with shebang placed mid-file rather than line 1
tests/consistency_gemma4/conftest.py	Consistency fixture; passes `--chat-template qwen2-vl` for a Gemma4 model
tests/_shared/test_gemma4*.py	New smoke tests for window/RoPE/attention mask/vision state dict/skeleton


+        slots=["<|turn>user\n{{content}}<turn|>\n<|turn>model\n"]
+    ),
+    format_assistant=StringFormatter(slots=["{{content}}<turn|>\n"]),
+    format_system=StringFormatter(slots=["<|turn>system\n{{content}}<turn|>\n"]),
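
One way to sanity-check turn delimiters like the ones in this hunk is to render a slot and compare every angle-bracket marker against the tokenizer's special tokens. The sketch below is only an illustration under assumptions: `render_slot` and `unknown_delimiters` are hypothetical helpers, and the token set in the demo is a placeholder. In practice the set would be read from the tokenizer at `TOKENIZER_PATH`, and this sketch makes no claim about which delimiters Gemma4 actually defines.

```python
import re
from typing import Iterable, List


def render_slot(slot: str, content: str) -> str:
    """Substitute the {{content}} placeholder the way a string formatter slot would."""
    return slot.replace("{{content}}", content)


def unknown_delimiters(slot: str, special_tokens: Iterable[str]) -> List[str]:
    """Return angle-bracket markers in `slot` that are absent from `special_tokens`."""
    known = set(special_tokens)
    markers = re.findall(r"<[^<>\s]+>", slot)  # e.g. "<|turn>" or "<turn|>"
    return [marker for marker in markers if marker not in known]


if __name__ == "__main__":
    user_slot = "<|turn>user\n{{content}}<turn|>\n<|turn>model\n"
    print(repr(render_slot(user_slot, "Hello")))
    # Placeholder vocabulary: replace with the tokenizer's real special tokens.
    placeholder_tokens = {"<|turn>", "<turn|>"}
    print(unknown_delimiters(user_slot, placeholder_tokens))
```

A non-empty result means the template contains a delimiter that the tokenizer would split into ordinary sub-tokens instead of treating as one special token.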


+TOKENIZER_PATH=${TOKENIZER_PATH:-"/ov2/pretrain_models/google/gemma-4-26B-A4B-it"}
+CHECKPOINT_PATH=${CHECKPOINT_PATH:-"/workspace/LLaVA-OneVision-2/stage_0_gemma4_26b_a4b_release"}
+
+#! /bin/bash


+TOKENIZER_PATH=${TOKENIZER_PATH:-"/ov2/pretrain_models/google/gemma-4-26B-A4B-it"}
+CHECKPOINT_PATH=${CHECKPOINT_PATH:-"/workspace/LLaVA-OneVision-2/stage_0_gemma4_26b_a4b_release"}
+
+#! /bin/bash
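
A shebang is only honored when `#!` are the first two characters of the file; anywhere else the line is an ordinary comment. The following sketch shows how a lint step might flag scripts like the two hunks above. The helper names are hypothetical and not part of this PR.

```python
from typing import Optional


def shebang_line_number(script_text: str) -> Optional[int]:
    """Return the 1-based line number of the first line starting with '#!', or None."""
    for number, line in enumerate(script_text.splitlines(), start=1):
        if line.startswith("#!"):
            return number
    return None


def shebang_is_effective(script_text: str) -> bool:
    """A shebang only takes effect when '#!' opens the file."""
    return script_text.startswith("#!")


if __name__ == "__main__":
    # Mirrors the layout in the reviewed hunks: variables first, shebang later.
    misplaced = 'TOKENIZER_PATH=${TOKENIZER_PATH:-"/some/path"}\n\n#! /bin/bash\n'
    print(shebang_line_number(misplaced), shebang_is_effective(misplaced))
```

Moving the `#! /bin/bash` line to line 1 of each stage script would make `shebang_is_effective` return `True`.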


+        "--chat-template",
+        "qwen2-vl",


+        attn_mask = build_gemma4_mm_attention_mask(
+            tokens, mm_token_type_ids, sliding_window=sliding_window
+        )
+        attn_mask_type = AttnMaskType.causal


+        video_grid_thw = tensor_parallel.broadcast_data(["video_grid_thw"], data, torch.int32)["video_grid_thw"]
+
+    packed_seq_params = None
+    is_video = video_token_id in tokens


    import torch



+    group.add_argument('--layer-pattern', default=[],
+                       help='Gemma4 per-layer attention pattern. Usually set by --model-name.')
+    group.add_argument('--per-layer-kv-channels', default={},
+                       help='Gemma4 per-layer-type head dim overrides. Usually set by --model-name.')
+    group.add_argument('--per-layer-num-query-groups', default={},
+                       help='Gemma4 per-layer-type KV head overrides. Usually set by --model-name.')
+    group.add_argument('--attention-k-eq-v', action='store_true', default=False,
+                       help='Enable Gemma4 K=V tying. Usually set by --model-name.')
+    group.add_argument('--kv-tied-layers', default=[],
+                       help='Gemma4 K=V tied layer indices. Usually set by --model-name.')
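
Without `type=`, argparse passes any value given on the command line through as a plain string, so `--layer-pattern` would yield a `str` rather than a list whenever it is set explicitly. A possible fix is sketched below. It assumes the values are supplied as JSON, which is an assumption and not something the PR specifies; `json_list`, `json_dict`, and `parse_gemma4_args` are hypothetical names, and the values used in the usage note are illustrative only.

```python
import argparse
import json
from typing import List, Optional


def json_list(text: str) -> list:
    """argparse `type=` callable: parse a JSON list such as '["a", "b"]'."""
    value = json.loads(text)  # JSONDecodeError is a ValueError, which argparse reports
    if not isinstance(value, list):
        raise argparse.ArgumentTypeError(f"expected a JSON list, got {text!r}")
    return value


def json_dict(text: str) -> dict:
    """argparse `type=` callable: parse a JSON object such as '{"a": 1}'."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError(f"expected a JSON object, got {text!r}")
    return value


def parse_gemma4_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group("gemma4")
    # default=None avoids sharing one mutable default object across parses.
    group.add_argument("--layer-pattern", type=json_list, default=None)
    group.add_argument("--per-layer-kv-channels", type=json_dict, default=None)
    group.add_argument("--per-layer-num-query-groups", type=json_dict, default=None)
    group.add_argument("--attention-k-eq-v", action="store_true", default=False)
    group.add_argument("--kv-tied-layers", type=json_list, default=None)
    args = parser.parse_args(argv)
    # Normalize unset options to fresh empty containers (the no-op defaults).
    for name in ("layer_pattern", "kv_tied_layers"):
        if getattr(args, name) is None:
            setattr(args, name, [])
    for name in ("per_layer_kv_channels", "per_layer_num_query_groups"):
        if getattr(args, name) is None:
            setattr(args, name, {})
    return args
```

For example, `parse_gemma4_args(["--layer-pattern", '["sliding", "full"]'])` returns a namespace whose `layer_pattern` is a real Python list, while an empty argument list keeps the no-op defaults.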


+
+            # K=V: value shares the K tensor. No clone — downstream kernels (TE/flash-attn)
+            # treat K and V as read-only. Saves one tensor allocation per layer.
+            value = key


+################################################################################
+#
+# Copyright (c) 2024 Baidu.com, Inc. All Rights Reserved
+#
+################################################################################


anxiangsir and others added 6 commits May 17, 2026 23:34

feat: register Gemma4 VL model family

f59615c


feat: add Gemma4 VL language model core

24a3d1d


feat: add Gemma4 VL vision stack

a02cb15


feat: add Gemma4 VL data pipeline

2e0bae3


feat: add Gemma4 checkpoint conversion tools

f91220a


test: add Gemma4 VL regression coverage

5d6380d


Copilot AI review requested due to automatic review settings May 17, 2026 15:37

Copilot started reviewing on behalf of anxiangsir May 17, 2026 15:37 View session

Copilot AI reviewed May 17, 2026


feat: add Gemma4 VL support #141

anxiangsir wants to merge 6 commits into llava_onevision2 from feat/gemma4-vl
