
Support ByteDance-Seed/BAGEL-7B-MoT quantization in w4a16 format#1633

Open
lvliang-intel wants to merge 9 commits into main from lvl/support_bagel_mot

Conversation

@lvliang-intel
Contributor

@lvliang-intel lvliang-intel commented Mar 27, 2026

Description

This PR adds proper BAGEL model quantization support to the standard AutoRound LLM quantization flow and fixes the exported quantization metadata required by downstream vLLM-Omni loading.

Main changes:

(1) Route BAGEL through the LLM compressor.
(2) Load BAGEL with a dedicated custom loader because transformers does not natively recognize the bagel architecture.
(3) Gracefully handle AutoConfig.from_pretrained failures for unsupported model types such as bagel.
(4) Export the correct block_name_to_quantize metadata so downstream runtimes only quantize BAGEL LLM blocks instead of non-LLM modules like connector or vision components.
(5) Add a BAGEL-specific ignore policy to preserve image-generation-sensitive modules in FP16:
a. all moe_gen modules
b. shared attention projections (q_proj, k_proj, v_proj, o_proj)
(6) Fix save_pretrained() in the BAGEL custom loader to use state_dict() instead of named_parameters(), ensuring registered buffers (e.g., rotary embedding caches) are included in the saved model.safetensors for correct reload and inference.
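
The distinction in change (6) can be illustrated with a minimal PyTorch sketch (not the actual BAGEL loader code; the buffer name is hypothetical): registered buffers show up in state_dict() but not in named_parameters(), so a save path built on named_parameters() silently drops them.

```python
import torch

# Minimal illustration: buffers such as rotary-embedding caches appear in
# state_dict() but NOT in named_parameters(), so saving weights collected
# from named_parameters() would silently drop them.
module = torch.nn.Linear(4, 4)
module.register_buffer("rope_cache", torch.zeros(8))  # hypothetical buffer name

param_keys = {name for name, _ in module.named_parameters()}
state_keys = set(module.state_dict().keys())

assert "rope_cache" in state_keys      # buffer is captured by state_dict()
assert "rope_cache" not in param_keys  # but missed by named_parameters()
```

Saving from state_dict() therefore ensures buffers land in model.safetensors alongside the parameters.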

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

#1608

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.


Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Copilot AI review requested due to automatic review settings March 27, 2026 12:44
Contributor

Copilot AI left a comment


Pull request overview

Adds BAGEL-7B-MoT (ByteDance-Seed/BAGEL-7B-MoT) support to AutoRound’s quantization flow, including custom model loading and metadata/ignore-layer handling needed for downstream runtimes (e.g., vLLM-Omni).

Changes:

  • Introduces a custom BAGEL loader and routes BAGEL through the LLM compressor flow.
  • Adds BAGEL-specific block selection/ignore-layer policies and extends “extra files” copying for BAGEL sub-configs.
  • Improves robustness by handling AutoConfig.from_pretrained(...) failures for unsupported model types.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Summary per file:

  • auto_round/utils/model.py: Routes BAGEL loading, adjusts MLLM detection, and adds extra model files + quant-block hinting.
  • auto_round/utils/bagel_loader.py: New BAGEL custom loader/wrapper and save logic for vLLM-Omni compatibility.
  • auto_round/special_model_handler.py: Registers BAGEL multimodal blocks + BAGEL ignore-layer policy.
  • auto_round/modeling/unfused_moe/__init__.py: Makes config pre-check resilient to unsupported/unknown model types.
  • auto_round/compressors/base.py: Makes config loading resilient and adds support for model-provided quant-block hints.

Comment on lines +251 to +266
def load_bagel_model(model_path, torch_dtype="auto", device_map=None):
    """Load a BAGEL model for quantization.

    Args:
        model_path: Path to the BAGEL model directory.
        torch_dtype: Data type for model weights.
        device_map: Device map for model placement.

    Returns:
        Tuple of (model, tokenizer).
    """
    # Load configs
    config_path = os.path.join(model_path, "config.json")
    with open(config_path, "r", encoding="utf-8") as f:
        bagel_config_dict = json.load(f)


Copilot AI Mar 27, 2026


load_bagel_model() assumes model_path is a local directory and immediately opens os.path.join(model_path, "config.json"). However callers (e.g., mllm_load_model) may pass a HF repo id. Add a guard at the start to resolve repo ids to a local snapshot (e.g., if not os.path.isdir(model_path): model_path = download_or_get_path(model_path, ...)) so BAGEL loading works for both local and remote models.
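
A minimal sketch of the suggested guard (the helper name is hypothetical; it stands in for whatever resolution utility AutoRound already provides, e.g. download_or_get_path):

```python
import os

def resolve_model_path(model_path: str) -> str:
    """Return a local directory for model_path.

    If it is already a directory, use it as-is; otherwise treat it as a
    Hugging Face Hub repo id and download a snapshot.
    """
    if os.path.isdir(model_path):
        return model_path
    # Imported lazily so local-only use does not require the hub client.
    from huggingface_hub import snapshot_download
    return snapshot_download(model_path)
```

Calling this at the top of load_bagel_model() would make the subsequent os.path.join(model_path, "config.json") safe for both local paths and repo ids.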

Contributor Author


@copilot apply changes based on this feedback

Comment on lines +334 to +350
# BAGEL requires a custom loader (Qwen2 + MoT extensions, not in transformers)
_config_path = (
    os.path.join(pretrained_model_name_or_path, "config.json")
    if os.path.isdir(pretrained_model_name_or_path)
    else None
)
if _config_path and os.path.exists(_config_path):
    with open(_config_path) as _f:
        _mt = json.load(_f).get("model_type")
    if _mt == "bagel":
        from auto_round.utils.bagel_loader import load_bagel_model

        model, tokenizer = load_bagel_model(
            pretrained_model_name_or_path,
            torch_dtype=torch_dtype,
        )
        model = _to_model_dtype(model, model_dtype)

Copilot AI Mar 27, 2026


This PR adds new BAGEL-specific branching in llm_load_model / mllm_load_model and new multimodal-detection behavior, but there are no corresponding unit tests under test/ to cover (a) BAGEL being treated as LLM-only by is_mllm_model for both local and remote paths, and (b) BAGEL routing to the custom loader. The test suite already covers similar branching for other model types (e.g., GLM image), so adding focused tests here would prevent regressions.

Contributor Author


@copilot apply changes based on this feedback

Comment on lines +334 to +350
# BAGEL requires a custom loader (Qwen2 + MoT extensions, not in transformers)
_config_path = (
    os.path.join(pretrained_model_name_or_path, "config.json")
    if os.path.isdir(pretrained_model_name_or_path)
    else None
)
if _config_path and os.path.exists(_config_path):
    with open(_config_path) as _f:
        _mt = json.load(_f).get("model_type")
    if _mt == "bagel":
        from auto_round.utils.bagel_loader import load_bagel_model

        model, tokenizer = load_bagel_model(
            pretrained_model_name_or_path,
            torch_dtype=torch_dtype,
        )
        model = _to_model_dtype(model, model_dtype)

Copilot AI Mar 27, 2026


BAGEL routing here only triggers when pretrained_model_name_or_path is a local directory (checks os.path.isdir + reads local config.json). If the user passes a HF repo id (the common AutoRound flow), this branch is skipped and AutoModelForCausalLM.from_pretrained() will still be attempted, which is expected to fail for model_type=bagel. Consider detecting BAGEL for remote repos too (e.g., hf_hub_download config.json or download_or_get_path + read config.json) and then call load_bagel_model with the resolved local snapshot path.
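
One way the remote case could be handled, as a self-contained sketch (the function name is hypothetical; error handling is deliberately coarse):

```python
import json
import os

def is_bagel_checkpoint(name_or_path: str) -> bool:
    """Best-effort check for model_type == "bagel", covering both a local
    directory and a Hugging Face Hub repo id."""
    if os.path.isdir(name_or_path):
        config_path = os.path.join(name_or_path, "config.json")
        if not os.path.exists(config_path):
            return False
    else:
        try:
            # Fetch only config.json rather than the full snapshot.
            from huggingface_hub import hf_hub_download
            config_path = hf_hub_download(name_or_path, "config.json")
        except Exception:
            return False  # not a resolvable repo; fall through to default loading
    with open(config_path, encoding="utf-8") as f:
        return json.load(f).get("model_type") == "bagel"
```

If this returns True for a repo id, the snapshot would still need to be resolved to a local path before load_bagel_model is called.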

Contributor Author


@copilot apply changes based on this feedback

Comment on lines 790 to 805
model_path = model_or_path if isinstance(model_or_path, str) else model_or_path.name_or_path

# Check model_type exclusion: some models have multimodal components
# but should be quantized as LLM (e.g., BAGEL).
_model_type = None
if isinstance(model_or_path, torch.nn.Module) and hasattr(model_or_path, "config"):
    _model_type = getattr(model_or_path.config, "model_type", None)
elif isinstance(model_path, str) and os.path.isdir(model_path):
    _cfg_path = os.path.join(model_path, "config.json")
    if os.path.exists(_cfg_path):
        with open(_cfg_path) as _f:
            _model_type = json.load(_f).get("model_type")
if _model_type in _LLM_ONLY_MODEL_TYPES:
    return False
# For dummy model, model_path could be "".
if model_path and not os.path.isdir(model_path):

Copilot AI Mar 27, 2026


is_mllm_model() checks _LLM_ONLY_MODEL_TYPES (e.g., bagel) only before download_or_get_path() runs. For a remote HF repo id, _model_type stays None at that point, the model is downloaded, and the function then proceeds to detect multimodal artifacts (e.g., preprocessor_config.json) and will incorrectly return True for BAGEL. Move the model_type check to after the potential download (or re-check once model_path is resolved) so BAGEL is consistently treated as LLM-only for both local and remote inputs.
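
The suggested ordering can be sketched as follows (standalone; the constant is assumed to mirror the PR's _LLM_ONLY_MODEL_TYPES, and the function names are hypothetical):

```python
import json
import os

_LLM_ONLY_MODEL_TYPES = {"bagel"}  # assumption: mirrors the PR's constant

def model_type_from_dir(model_dir: str):
    """Read model_type from a RESOLVED local snapshot directory."""
    cfg = os.path.join(model_dir, "config.json")
    if os.path.exists(cfg):
        with open(cfg, encoding="utf-8") as f:
            return json.load(f).get("model_type")
    return None

def is_llm_only(resolved_model_dir: str) -> bool:
    # Running this check AFTER download_or_get_path() means a remote repo id
    # gets the same LLM-only treatment as a local directory, before any
    # multimodal-artifact detection (e.g., preprocessor_config.json) runs.
    return model_type_from_dir(resolved_model_dir) in _LLM_ONLY_MODEL_TYPES
```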

Contributor Author


@copilot apply changes based on this feedback

@lvliang-intel
Contributor Author

Quantize Script
quantize_bagel.py

Run inference with vLLM-Omni (with a patch for the BAGEL MoT model)

run_bagel.py

CUDA_VISIBLE_DEVICES=0 python run_bagel.py --model /mnt/disk4/lvl/BAGEL-7B-MoT/ --prompt "A cute cat sitting on a windowsill" --output original_bagel_model_output.png

Image

CUDA_VISIBLE_DEVICES=0 python run_bagel.py --model /mnt/disk4/lvl/BAGEL-7B-MoT-W4A16/ --prompt "A cute cat sitting on a windowsill" --output quantized_bagel_model_output.png

Image

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
lvliang-intel and others added 3 commits March 27, 2026 21:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wenhuach21
Contributor

How about upstreaming the model once it’s supported (assuming the license allows it)? There’s no need to wait for the PR to be merged.
