Nvidia ModelOpt workaround for issue 28072 #30164
base: main
```diff
@@ -187,7 +187,24 @@ def get_quant_method(
     def apply_vllm_mapper(self, hf_to_vllm_mapper: "WeightsMapper"):
         if len(self.exclude_modules) > 0:
-            self.exclude_modules = hf_to_vllm_mapper.apply_list(self.exclude_modules)
+            # This is a workaround for the weights remapping issue:
+            # https://github.com/vllm-project/vllm/issues/28072
+            # Right now, the Nvidia ModelOpt library uses just one wildcard pattern:
+            #     module_path*
+            # It gets applied if the whole tree of modules rooted at module_path
+            # is not quantized. Here we replace such a pattern with 2 patterns
+            # that are collectively equivalent to the original one:
+            #     module_path
+            #     module_path.*
+            new_exclude_modules = []
+            for exclude in self.exclude_modules:
+                if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
+                    new_exclude_modules.append(exclude[:-1])
+                    new_exclude_modules.append(exclude[:-1] + ".*")
+                else:
+                    new_exclude_modules.append(exclude)
+            self.exclude_modules = hf_to_vllm_mapper.apply_list(new_exclude_modules)

     @staticmethod
     def get_config_filenames() -> list[str]:
```

Contributor comment on lines +201 to +205:

> The logic to identify and transform the wildcard patterns is correct. However, using direct string indexing (
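The review comment above is cut off before its suggested change, but the concern about direct string indexing can be addressed with `str.endswith` and `str.removesuffix`. The helper below is a hypothetical standalone sketch of the same transform (not the reviewer's actual suggestion, which is not shown above), with an `fnmatch`-based check that names under the excluded subtree still match one of the split patterns:

```python
import fnmatch


def split_wildcard_patterns(exclude_modules: list[str]) -> list[str]:
    """Hypothetical helper: replace each ModelOpt-style ``module_path*``
    entry (no dot before the ``*``) with the pair ``module_path`` and
    ``module_path.*``, leaving every other entry untouched."""
    result: list[str] = []
    for exclude in exclude_modules:
        # endswith/removesuffix instead of exclude[-1]/exclude[-2] indexing
        if len(exclude) >= 2 and exclude.endswith("*") and not exclude.endswith(".*"):
            base = exclude.removesuffix("*")
            result.append(base)
            result.append(base + ".*")
        else:
            result.append(exclude)
    return result


patterns = split_wildcard_patterns(["vision_model*", "lm_head", "mlp.*"])
# The root and its subtree each still match one of the split patterns.
assert any(fnmatch.fnmatch("vision_model", p) for p in patterns)
assert any(fnmatch.fnmatch("vision_model.encoder.q_proj", p) for p in patterns)
```

The `len(exclude) >= 2` guard mirrors the PR's condition so a bare `"*"` entry is passed through unchanged rather than split into an empty base.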
Reviewer comment:

> The new loop replaces each `module_path*` entry with plain `module_path` / `module_path.*` before mapping. In `is_layer_excluded`, the legacy substring branch checks `exclude_module in prefix`, so these plain entries now match any module name containing the substring rather than only those starting with the ModelOpt wildcard. With configs that blacklist branches such as `vision_model*`, prefixes like `encoder.vision_model_adapter` (which would not have matched the original wildcard) will now be treated as excluded, leaving unrelated layers unquantized and reducing performance. This over-exclusion did not occur before the trailing `*` was stripped.
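The over-exclusion the reviewer describes can be reproduced with a small sketch. The real `is_layer_excluded` is not shown in this diff; the function below merely models the legacy substring branch (`exclude_module in prefix`) described in the comment, with the module names taken from the reviewer's example:

```python
import fnmatch


def legacy_substring_excluded(prefix: str, exclude_modules: list[str]) -> bool:
    # Models the legacy branch described in the review: a plain
    # (non-wildcard) entry matches if it appears anywhere in the name.
    return any(exclude in prefix for exclude in exclude_modules)


# Under the original wildcard, this sibling-named module is NOT excluded:
assert not fnmatch.fnmatch("encoder.vision_model_adapter", "vision_model*")

# After the trailing "*" is stripped down to "vision_model", the substring
# branch (over-)matches the same module and excludes it from quantization:
assert legacy_substring_excluded("encoder.vision_model_adapter", ["vision_model"])
```

This illustrates why the plain `module_path` entry produced by the split behaves differently from the original wildcard once it reaches a substring-based check rather than a glob-based one.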