Nvidia ModelOpt workaround for issue 28072 #30164
base: main
Conversation
Until this issue is officially resolved, this workaround makes ModelOpt quantized models work without issues. Tested using https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-NVFP4/tree/refs%2Fpr%2F2

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Code Review
This pull request introduces a workaround to fix an issue with loading ModelOpt quantized models, specifically addressing how `exclude_modules` entries with wildcards are handled during weight remapping. The change correctly transforms `module_path*` patterns into `module_path` and `module_path.*` to ensure proper matching. The logic is sound and effectively resolves the issue. I have one suggestion to improve the code's readability and maintainability.
```python
if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
    new_exclude_modules.append(exclude[:-1])
    new_exclude_modules.append(exclude[:-1] + ".*")
else:
    new_exclude_modules.append(exclude)
```
The logic to identify and transform the wildcard patterns is correct. However, using direct string indexing (`exclude[-1]`, `exclude[-2]`, `exclude[:-1]`) can be less readable and potentially more error-prone for future modifications compared to using built-in string methods. Refactoring this to use `endswith()` and `removesuffix()` would make the code's intent clearer and improve its maintainability.
```diff
-if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
-    new_exclude_modules.append(exclude[:-1])
-    new_exclude_modules.append(exclude[:-1] + ".*")
-else:
-    new_exclude_modules.append(exclude)
+if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
+    base = exclude.removesuffix("*")
+    new_exclude_modules.append(base)
+    new_exclude_modules.append(f"{base}.*")
+else:
+    new_exclude_modules.append(exclude)
```
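As a quick sanity check (not part of the PR), a sketch comparing the two variants side by side; it assumes Python 3.9+ for `removesuffix()` and uses made-up patterns:

```python
def expand_indexing(exclude: str) -> list[str]:
    # Original workaround: index-based check for a trailing bare "*".
    if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
        return [exclude[:-1], exclude[:-1] + ".*"]
    return [exclude]


def expand_string_methods(exclude: str) -> list[str]:
    # Suggested refactor: endswith()/removesuffix() express the same intent.
    if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
        base = exclude.removesuffix("*")
        return [base, f"{base}.*"]
    return [exclude]


for pattern in ["visual*", "visual.*", "lm_head", "*"]:
    assert expand_indexing(pattern) == expand_string_methods(pattern)
```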
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
new_exclude_modules = []
for exclude in self.exclude_modules:
    if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
        new_exclude_modules.append(exclude[:-1])
        new_exclude_modules.append(exclude[:-1] + ".*")
```
Workaround expands exclusions beyond intended prefix
The new loop replaces each `module_path*` entry with plain `module_path`/`module_path.*` before mapping. In `is_layer_excluded` the legacy substring branch checks `exclude_module in prefix`, so these plain entries now match any module name containing the substring rather than only those starting with the ModelOpt wildcard. With configs that blacklist branches such as `vision_model*`, prefixes like `encoder.vision_model_adapter` (which would not have matched the original wildcard) will now be treated as excluded, leaving unrelated layers unquantized and reducing performance. This over-exclusion did not occur before the trailing `*` was stripped.
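To make the concern concrete, a simplified sketch of the two matching behaviors (the real `is_layer_excluded` in vLLM has additional branches; the exclude pattern and prefix below are hypothetical):

```python
import fnmatch

prefix = "encoder.vision_model_adapter"

# Original ModelOpt-style wildcard: the prefix does not start with
# "vision_model", so the pattern does NOT match.
print(fnmatch.fnmatch(prefix, "vision_model*"))  # False

# After the workaround, the plain "vision_model" entry is caught by the
# legacy substring branch (`exclude_module in prefix`), so it now matches
# and the unrelated adapter layer would be treated as excluded.
print("vision_model" in prefix)  # True
```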
@shengliangxu (if you haven't done so already) could you test it to make sure it works for
Yes, I have tested multiple models including this one. This model does not have an online ModelOpt quantized checkpoint, so I did not put it into the PR description.
Purpose
This is a workaround for issue #28072
Until this issue is officially resolved, this workaround makes loading of ModelOpt quantized models work without issues.
Test Plan
Tested using nvidia/Qwen2.5-VL-7B-Instruct-NVFP4
Test Result
`vllm serve` succeeded.