Add support for LoRA with Transformer Engine #3048
balvisio wants to merge 5 commits into huggingface:main
Conversation
Thanks for this PR to add support for TE in PEFT. Before proceeding further, do you have a practical example of how to use it with a transformers model? I assume one way would be for the user to employ accelerate to apply TE. Also, I saw that you intend to replace the

Thank you for taking a look at this. The primary goal of this PR is to support models with TE layers already inside them. In fact, using te.Linear for the adapters is not strictly necessary. However, by default, these TE layers will use the torch default dtype.
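For readers following along: the adapter a LoRA layer adds on top of a frozen linear layer (te.Linear or nn.Linear) is just a low-rank update. Below is a minimal numpy sketch, not PEFT code; the names lora_A, lora_B, and scaling follow PEFT's convention, and the shapes are illustrative.

```python
import numpy as np

# Sketch of the LoRA update, independent of the layer backend (TE or torch).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
lora_A = (rng.standard_normal((r, d_in)) * 0.01).astype(np.float32)
lora_B = np.zeros((d_out, r), dtype=np.float32)  # zero-init: delta starts at 0
scaling = alpha / r

x = rng.standard_normal(d_in).astype(np.float32)
y = W @ x + scaling * (lora_B @ (lora_A @ x))

# With lora_B zero-initialized, the adapted output equals the base output.
assert np.allclose(y, W @ x)
```

This is why attaching an adapter leaves the model's behavior unchanged at initialization: only training moves lora_B away from zero.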
I see, could you please provide a small example of how this looks in practice? We should eventually add a unit test with a real model architecture anyway, so this example would also help for that.
I think for consistency, we should stick with float32 by default. We could think of an option to use 4bit and 8bit floats but I'd have to think about how the API for that should look.
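A quick numeric illustration (numpy as a stand-in, not PEFT code) of why float32 is the safer default for adapter weights: small updates of the size LoRA typically produces can round away entirely in half precision.

```python
import numpy as np

# Illustration only: a small weight update that survives in float32 is lost in
# float16, since the fp16 spacing near 1.0 is about 9.8e-4.
delta = 1e-4
w32 = np.float32(1.0) + np.float32(delta)
w16 = np.float16(1.0) + np.float16(delta)

assert w32 != np.float32(1.0)  # float32 keeps the update
assert w16 == np.float16(1.0)  # float16 rounds it away
```

Any 4-bit or 8-bit float option would sharpen this trade-off further, which is why the API for it deserves separate thought.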
I think it would be great to include such an example here. Maybe simplified if possible (using
Yeah, we can use a bit heavier models in
Yes, so keep the base layers as they are and use normal
Is there any advantage then?
Docs would be nice, but I was thinking of the
@BenjaminBossan: I changed the tests to use a TE-based model and added an example.
BenjaminBossan
left a comment
Thanks for updating the PR. I could successfully run the unit tests and overall the changes look good. However, I still have a few comments, please check. Most notably, we should avoid running anything with trust_remote_code=True by default.
BenjaminBossan
left a comment
Thanks for the updates, not much is missing at this point.
Besides the comments I made, could you please also add an entry to the docs?
https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/quantization.md
It doesn't have to be long, but having it helps users discover the feature.
@BenjaminBossan Thanks for looking at this. I have addressed your comments.
This PR adds support so that TransformerEngine layers (https://github.com/NVIDIA/TransformerEngine) are recognized as valid layers to which LoRA adapters can be added.
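Conceptually, the change amounts to extending the set of module types that PEFT treats as valid LoRA targets. The sketch below shows that dispatch pattern with stand-in classes; these are not PEFT's real internals, just an illustration of the recognition step.

```python
# Stand-in classes (hypothetical): in the real code these would be
# torch.nn.Linear and transformer_engine.pytorch.Linear.
class TorchLinear: ...
class TELinear: ...
class LayerNorm: ...  # not a LoRA target

# The PR effectively adds the TE linear type to the supported set.
SUPPORTED_LINEAR_TYPES = (TorchLinear, TELinear)

def is_lora_target(module) -> bool:
    """Return True if a LoRA adapter can be attached to this module."""
    return isinstance(module, SUPPORTED_LINEAR_TYPES)

assert is_lora_target(TELinear())
assert is_lora_target(TorchLinear())
assert not is_lora_target(LayerNorm())
```

With that recognition in place, a model whose linear layers are already te.Linear instances can be wrapped with LoRA the same way a plain torch model can.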