Fix moe layers to transform #3017
Conversation

Thanks for reporting this issue and providing a fix. I haven't reviewed the PR in detail, but it looks like, later, another PR was provided by @Mr-Neutr0n in #3028 that fixes the same issue. As your PR was first, I would treat it preferentially, but I think the solution and the tests in that PR look tighter. Could you please check if you agree with that solution? One way forward would be to go with the other solution but to treat both of you as co-authors. LMK what you think.

Thank you @BenjaminBossan, I checked the solution in PR #3028 and I agree it is tighter and covers more cases. I think treating both of us as co-authors is fair and I'm happy to go with that.

Great, thanks @MichalMraz. If @Mr-Neutr0n agrees with this, I would review his PR instead and, when merging, add Michal as co-author.

I'm more than happy to do so!

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Hey @BenjaminBossan, please review this PR. Thanks!

@Mr-Neutr0n I think we agreed to proceed with #3028 instead of this PR. I reviewed that one and am awaiting your updates there.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Fixes #3016
What
`layers_to_transform` filtering could treat `.experts.<idx>` as the layer index for MoE module names, leading to incorrect selection of expert modules.
Expected behavior
A module like `model.layers.1.mlp.experts.0.up_proj` should be targeted iff `1 in layers_to_transform` (regardless of expert index).
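The intended behavior can be sketched with a small standalone helper (hypothetical, not the actual PEFT implementation): take the first numeric component after the `layers` segment of the module name as the layer index, so that a deeper `.experts.<idx>` segment is never mistaken for it.

```python
import re

def layer_matches(module_name, layers_to_transform, layers_prefix="layers"):
    """Hypothetical sketch of the expected filtering.

    Returns True iff the layer index directly following the
    ``layers_prefix`` segment is in ``layers_to_transform``; any later
    numeric segments (e.g. an MoE expert index) are ignored.
    """
    # Capture only the first index after ".layers.", not ".experts.<idx>".
    m = re.match(rf".*\.{re.escape(layers_prefix)}\.(\d+)\.", module_name)
    if m is None:
        return False
    return int(m.group(1)) in layers_to_transform

# Layer 1 is targeted regardless of which expert the module belongs to:
print(layer_matches("model.layers.1.mlp.experts.0.up_proj", [1]))  # True
# Layer 2 is not targeted, even though the expert index is 1:
print(layer_matches("model.layers.2.mlp.experts.1.up_proj", [1]))  # False
```

The key point is that only the index immediately after the layers prefix is consulted; the expert index inside the MoE block never influences the match.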
Changes
`layers_to_transform`