Nvidia ModelOpt workaround for issue 28072 #30164
base: main
Conversation
Until this issue is officially resolved, this workaround makes ModelOpt quantized models work without issues. Tested using https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-NVFP4/tree/refs%2Fpr%2F2

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Code Review
This pull request introduces a workaround to fix an issue with loading ModelOpt quantized models, specifically addressing how `exclude_modules` entries with wildcards are handled during weight remapping. The change correctly transforms `module_path*` patterns into `module_path` and `module_path.*` to ensure proper matching. The logic is sound and effectively resolves the issue. I have one suggestion to improve the code's readability and maintainability.
```python
if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
    new_exclude_modules.append(exclude[:-1])
    new_exclude_modules.append(exclude[:-1] + ".*")
else:
    new_exclude_modules.append(exclude)
```
The logic to identify and transform the wildcard patterns is correct. However, using direct string indexing (`exclude[-1]`, `exclude[-2]`, `exclude[:-1]`) can be less readable and potentially more error-prone for future modifications compared to using built-in string methods. Refactoring this to use `endswith()` and `removesuffix()` would make the code's intent clearer and improve its maintainability.
```diff
-if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
-    new_exclude_modules.append(exclude[:-1])
-    new_exclude_modules.append(exclude[:-1] + ".*")
-else:
-    new_exclude_modules.append(exclude)
+if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
+    base = exclude.removesuffix("*")
+    new_exclude_modules.append(base)
+    new_exclude_modules.append(f"{base}.*")
+else:
+    new_exclude_modules.append(exclude)
```
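As a quick sanity check (not part of the PR), a sketch comparing the two variants side by side; it assumes Python 3.9+ for `removesuffix()` and uses made-up patterns:

```python
def expand_indexing(exclude: str) -> list[str]:
    # Original workaround: index-based check for a trailing bare "*".
    if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
        return [exclude[:-1], exclude[:-1] + ".*"]
    return [exclude]


def expand_string_methods(exclude: str) -> list[str]:
    # Suggested refactor: endswith()/removesuffix() express the same intent.
    if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
        base = exclude.removesuffix("*")
        return [base, f"{base}.*"]
    return [exclude]


for pattern in ["visual*", "visual.*", "lm_head", "*"]:
    assert expand_indexing(pattern) == expand_string_methods(pattern)
```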
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
new_exclude_modules = []
for exclude in self.exclude_modules:
    if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
        new_exclude_modules.append(exclude[:-1])
        new_exclude_modules.append(exclude[:-1] + ".*")
```
Workaround expands exclusions beyond intended prefix
The new loop replaces each `module_path*` entry with plain `module_path`/`module_path.*` before mapping. In `is_layer_excluded` the legacy substring branch checks `exclude_module in prefix`, so these plain entries now match any module name containing the substring rather than only those starting with the ModelOpt wildcard. With configs that blacklist branches such as `vision_model*`, prefixes like `encoder.vision_model_adapter` (which would not have matched the original wildcard) will now be treated as excluded, leaving unrelated layers unquantized and reducing performance. This over-exclusion did not occur before the trailing `*` was stripped.
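To make the concern concrete, a simplified sketch of the two matching behaviors (the real `is_layer_excluded` in vLLM has additional branches; the exclude pattern and prefix below are hypothetical):

```python
import fnmatch

prefix = "encoder.vision_model_adapter"

# Original ModelOpt-style wildcard: the prefix does not start with
# "vision_model", so the pattern does NOT match.
print(fnmatch.fnmatch(prefix, "vision_model*"))  # False

# After the workaround, the plain "vision_model" entry is caught by the
# legacy substring branch (`exclude_module in prefix`), so it now matches
# and the unrelated adapter layer would be treated as excluded.
print("vision_model" in prefix)  # True
```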
@shengliangxu (if you haven't done so already) could you test it to make sure it works for
Yes, I have tested multiple models including this one. This model does not have an online ModelOpt quantized checkpoint, so I did not put it into the PR description.
Purpose
This is a workaround for issue #28072
Until this issue is officially resolved, this workaround makes loading of ModelOpt quantized models work without issues.
Test Plan
Tested using nvidia/Qwen2.5-VL-7B-Instruct-NVFP4
Test Result
`vllm serve` succeeded.