fix: layers_to_transform now correctly matches layer index on MoE models #3028

Mr-Neutr0n wants to merge 1 commit into huggingface:main
Conversation
…t index

When using layers_to_transform on MoE models, the regex was incorrectly matching the LAST digit in the module path (the expert index) instead of the FIRST digit (the layer index).

For example, with key 'model.layers.1.mlp.experts.0.up_proj':
- Old behavior: matched 0 (expert index)
- Fixed behavior: matches 1 (layer index)

The fix uses non-greedy .*? instead of greedy .* in the default regex pattern, so it finds the first occurrence of the digit pattern.

Added test cases for MoE scenarios to prevent regression.

Fixes huggingface#3016
Friendly bump! Let me know if there's anything I should update or improve to help move this forward.
Thanks for this PR. Please check my comment here.

Edit: Add MichalMraz as co-author here.
("model.layers.1.mlp.experts.0.up_proj", ["up_proj"], [1], ["layers"], True),
("model.layers.1.mlp.experts.0.up_proj", ["up_proj"], [0], ["layers"], False),  # expert 0, but layer 1
("model.layers.1.mlp.experts.1.up_proj", ["up_proj"], [1], ["layers"], True),   # expert 1, layer 1
("model.layers.2.mlp.experts.1.up_proj", ["up_proj"], [1], ["layers"], False),  # layer 2, not 1
These tests don't really test the mentioned bug. E.g. if the key is "model.layers.1.mlp.experts.0.up_proj", with the layers_pattern being ["layers"], there is just a single match, namely "layers.1". "experts.0" would never be matched in this situation, even with the current code from main. In fact, that code path is never reached, as we land here:
peft/src/peft/tuners/tuners_utils.py, lines 1726 to 1731 in 8f73327
At this stage, we still match the last number, but since there is only one match, it doesn't matter. With the change from this PR, however, I would always expect the first number to match, to be consistent with the other code path. Therefore, I would suggest changing that regex too, and updating the examples to really test for this.
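The reviewer's point can be made concrete with a small sketch. The pattern below is illustrative, not the exact regex from tuners_utils.py: when an explicit layers_pattern such as "layers" anchors the regex to that literal word, the key has exactly one candidate match, so greedy and non-greedy quantifiers give the same result — which is why these test cases don't exercise the bug.

```python
import re

key = "model.layers.1.mlp.experts.0.up_proj"

# Anchored to the literal word "layers", so "experts.0" can never match;
# there is only one ".layers.<digits>." in the key.
greedy = re.match(r".*\.layers\.(\d+)\.", key)
lazy = re.match(r".*?\.layers\.(\d+)\.", key)

print(greedy.group(1))  # "1"
print(lazy.group(1))    # "1" -- same result, greediness is irrelevant here
```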
ping @Mr-Neutr0n
pong
Will address the comments. I was out for an emergency the past few days. Thanks for considering!
Summary

Fix layers_to_transform incorrectly matching MoE expert indices instead of layer indices.

Problem

When using layers_to_transform on MoE (Mixture of Experts) models, modules under paths like model.layers.<L>.mlp.experts.<E>.* were incorrectly filtered based on the expert index <E> instead of the layer index <L>.

Example: for key model.layers.1.mlp.experts.0.up_proj, the old code matched 0 (the expert index) instead of 1 (the layer index). This caused layers_to_transform=[1] to incorrectly exclude model.layers.1.mlp.experts.0.up_proj (because 0 != 1) and incorrectly include model.layers.2.mlp.experts.1.up_proj (because 1 == 1).

Root Cause

The regex in check_target_module_exists() used greedy .*, which matched as much as possible, leaving only the LAST digit in the path (the expert index) for the capture group.

Solution

Changed to non-greedy .*?, which matches as little as possible, capturing the FIRST digit in the path (the layer index).
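The greedy vs. non-greedy difference can be reproduced in isolation. The pattern below is a simplified stand-in for the default digit-matching pattern, not the exact PEFT regex:

```python
import re

key = "model.layers.1.mlp.experts.0.up_proj"

# Greedy ".*" consumes as much of the key as possible before backtracking,
# so the capture group lands on the LAST ".<digits>." (the expert index).
old = re.match(r".*\.(\d+)\.", key)
print(old.group(1))  # "0"

# Non-greedy ".*?" consumes as little as possible,
# so the capture group lands on the FIRST ".<digits>." (the layer index).
new = re.match(r".*?\.(\d+)\.", key)
print(new.group(1))  # "1"
```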
Test Cases Added

Added 7 test cases covering MoE scenarios, both with a custom layers_pattern and without one (default behavior).

Fixes #3016