Add native FSDP2 module + migration by 3outeille · Pull Request #46707 · huggingface/transformers

3outeille · 2026-06-17T04:12:24Z

Applying FSDP and test_fsdp_mixin.py will be added in PR #46990.

Move FSDP2 wrapping and plan verification to distributed/fsdp.py, keep integrations/fsdp.py as a backward-compatible re-export, and update core call sites to import from transformers.distributed.fsdp.

HuggingFaceDocBuilderDev · 2026-06-17T04:25:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Much better, still some obvious cleanups to do!

ArthurZucker · 2026-06-23T22:21:58Z

+
+
+def expand_fsdp_plan(model, fsdp_plan: dict[str, str]) -> list[tuple[str, nn.Module, str]]:
+    """Expand plan keys into ``(module_name, module, sharding_strategy)`` shard targets."""


why? {module_name: (module, sharding_strategy)} looks more manageable

This has been reworked in a much clearer way now. It now returns reshard_targets and no_reshard_targets as well.
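
The return shape discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in distributed/fsdp.py: `split_shard_targets`, the plain `module_lookup` dict, the `"reshard"` / `"no_reshard"` strategy labels, and the stand-in body of `replace_layer_number_by_wildcard` are all assumptions made for the example.

```python
import re


def replace_layer_number_by_wildcard(module_name: str) -> str:
    # Stand-in for the helper named in the diff (its real body is not shown
    # on this page): "model.layers.3" -> "model.layers.*".
    return re.sub(r"\.\d+(?=\.|$)", ".*", module_name)


def split_shard_targets(module_lookup: dict, fsdp_plan: dict) -> tuple[dict, dict]:
    """Return (reshard_targets, no_reshard_targets), each {module_name: (module, strategy)}."""
    reshard_targets, no_reshard_targets = {}, {}
    for module_name, module in module_lookup.items():
        # Exact keys (e.g. "lm_head") are tried first, then the wildcard form.
        strategy = fsdp_plan.get(module_name)
        if strategy is None:
            strategy = fsdp_plan.get(replace_layer_number_by_wildcard(module_name))
        if strategy is None:
            continue  # this module is not a shard target
        # Hypothetical labels: anything that is not "reshard" goes to the second dict.
        bucket = reshard_targets if strategy == "reshard" else no_reshard_targets
        bucket[module_name] = (module, strategy)
    return reshard_targets, no_reshard_targets


# Plain objects stand in for nn.Module instances.
modules = {
    name: object()
    for name in ["model.embed_tokens", "model.layers.0", "model.layers.1", "lm_head"]
}
plan = {"model.layers.*": "reshard", "lm_head": "no_reshard"}
reshard_targets, no_reshard_targets = split_shard_targets(modules, plan)
```

Keying both results by module name keeps lookups direct, which is the manageability point raised in the review; the two-dict split means the caller does not have to filter by strategy again.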

ArthurZucker · 2026-06-23T22:24:19Z

+    tie_word_embeddings = getattr(model.config, "tie_word_embeddings", False)
+
+    adapted_fsdp_plan = _resolve_tied_embed_lm_head_plan(fsdp_plan, tie_word_embeddings=tie_word_embeddings)
+    shard_targets = expand_fsdp_plan(model, adapted_fsdp_plan)


You could just return the already-split reshard / no-reshard targets.

ArthurZucker · 2026-06-23T22:26:28Z

+    if tie_word_embeddings and hasattr(model, "tie_weights"):
+        model.tie_weights()


Again, I would advise against this unless it is absolutely required. Do we need all the machinery behind this? Probably not, no?

Do you mean the guarding around model.tie_weights(), or the fact that we call model.tie_weights() at all? It is needed, but I will confirm once I have added the fsdp_mixin test in the next PR.

This is my only friction point. Do you need to tie the weights, or just pass the hooks, or what?
In general it's bad to call it twice: we would want to apply FSDP before its call in the main call site, or not call it at all!

I need to tie the weights indeed.

But I see that it is called later in _finalize_model_loading(). My initial thought was to call it in the FSDP code, since a lot of the embedding-tying logic is handled there. Maybe a comment saying that the embedding tying will be done in _finalize_model_loading() would be more appropriate?
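
One way to read the suggestion above is that the FSDP step leaves tying alone and a single later call site ties the weights once. The toy below illustrates only that call ordering: `ToyModel`, `toy_apply_fsdp`, and `toy_finalize_model_loading` are hypothetical stand-ins, not the transformers implementation, and plain lists stand in for parameters.

```python
class ToyConfig:
    tie_word_embeddings = True


class ToyLayer:
    def __init__(self, weight):
        self.weight = weight


class ToyModel:
    def __init__(self):
        self.config = ToyConfig()
        self.embed_tokens = ToyLayer([0.1, 0.2])
        self.lm_head = ToyLayer([0.0, 0.0])
        self.tie_calls = 0

    def tie_weights(self):
        # Tying means both layers reference the same weight object.
        self.tie_calls += 1
        self.lm_head.weight = self.embed_tokens.weight


def toy_apply_fsdp(model):
    # Placeholder for the sharding step. It deliberately does not call
    # tie_weights(); tying is left to the finalization step below.
    model.sharded = True
    return model


def toy_finalize_model_loading(model):
    # Same guard as in the diff, but applied at a single call site.
    if getattr(model.config, "tie_word_embeddings", False) and hasattr(model, "tie_weights"):
        model.tie_weights()
    return model


model = toy_finalize_model_loading(toy_apply_fsdp(ToyModel()))
```

In this sketch `model.tie_calls` ends at 1 and `lm_head.weight` is the same object as `embed_tokens.weight`. Whether the real FSDP path can skip its own call is exactly what the thread leaves to the follow-up test.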

ArthurZucker

Mostly check the tie weight please!

ArthurZucker · 2026-06-29T05:46:25Z

+    for plan_key, sharding_strategy in fsdp_plan.items():
+        if plan_key in module_lookup:
+            # model.norm, lm_head etc.
+            targets = [(plan_key, module_lookup[plan_key])]
+        else:
+            # model.layers.*
+            targets = [
+                (module_name, module)
+                for module_name, module in module_lookup.items()
+                if replace_layer_number_by_wildcard(module_name) == plan_key
+            ]


Since you have more keys than plans, you would be better off iterating over the keys rather than the plans, no?
We also have a single regex computation in TP / core model loading that makes it faster (though this is probably not super slow!).

You join all the plans on "|" with numbered group captures!

Fine by me. Performance-wise it's negligible and both read well, so I can go with your version, but I don't think there is a need to join all the plans on "|".
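
For reference, the "join all the plans on |" idea could look roughly like the sketch below. It is an illustration of the general technique only, not the regex used in TP / core model loading: `plan_key_to_regex`, `match_modules_to_plan`, and the `"reshard"` / `"no_reshard"` labels are hypothetical, and "*" is assumed to stand for a numeric layer index.

```python
import re


def plan_key_to_regex(plan_key: str) -> str:
    # Escape the key, then let the "*" wildcard match a layer index.
    return re.escape(plan_key).replace(r"\*", r"\d+")


def match_modules_to_plan(module_names: list[str], fsdp_plan: dict[str, str]) -> dict[str, str]:
    """Map each matching module name to the strategy of the plan key it matched."""
    plan_keys = list(fsdp_plan)
    # One alternation with one capture group per plan key, compiled once.
    pattern = re.compile("|".join(f"({plan_key_to_regex(key)})" for key in plan_keys))
    matched = {}
    for name in module_names:
        m = pattern.fullmatch(name)
        if m is not None:
            # lastindex is the 1-based number of the group that matched.
            matched[name] = fsdp_plan[plan_keys[m.lastindex - 1]]
    return matched


names = [
    "model.embed_tokens",
    "model.layers.0",
    "model.layers.0.self_attn",
    "model.layers.1",
    "model.norm",
    "lm_head",
]
plan = {"model.layers.*": "reshard", "model.norm": "no_reshard", "lm_head": "no_reshard"}
targets = match_modules_to_plan(names, plan)
```

The loop runs over module names once and the group number identifies which plan key matched, so no per-plan scan of the module list is needed. As noted in the reply, the gain over the dict-lookup version is likely negligible at this scale.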

ArthurZucker · 2026-06-29T05:47:06Z

+    fsdp_policy_kwargs = _get_fsdp_policy_kwargs(distributed_config)
+    tie_word_embeddings = getattr(model.config, "tie_word_embeddings", False)
+
+    adapted_fsdp_plan = _resolve_tied_embed_lm_head_plan(fsdp_plan, model)


much better

ArthurZucker · 2026-06-29T05:48:43Z

+    if tie_word_embeddings and hasattr(model, "tie_weights"):
+        model.tie_weights()


This is my only friction point. Do you need to tie the weights, or just pass the hooks, or what?
In general it's bad to call it twice: we would want to apply FSDP before its call in the main call site, or not call it at all!

github-actions · 2026-07-03T05:41:14Z

CI recap

Dashboard: View test results in Grafana
Latest run: 28639742471:1
Result: success | Jobs: 14 | Tests: 47,277 | Failures: 0 | Duration: 18h 48m

3outeille added 2 commits June 17, 2026 03:51

add distributed config

799ac94

Add native FSDP2 module and migrate FSDP imports (Phase A PR-2).

22d4b52

Move FSDP2 wrapping and plan verification to distributed/fsdp.py, keep integrations/fsdp.py as a backward-compatible re-export, and update core call sites to import from transformers.distributed.fsdp.

3outeille changed the base branch from main to split/a-pr-1-distributed-config June 17, 2026 04:12

3outeille added 3 commits June 17, 2026 04:19

linting

4bfd1a6

unecessary

9487bdd

copyright edit

588884e

revert

8cc48a0

3outeille mentioned this pull request Jun 17, 2026

DIstributed branch base #46269

Open

5 tasks

3outeille and others added 6 commits June 23, 2026 13:22

Merge branch 'main' into split/a-pr-1-distributed-config

5bbc796

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

6fd7813

fix

79457b3

fix

f3e8021

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

acacae8

Merge branch 'main' into split/a-pr-1-distributed-config

54c1f4e

3outeille requested a review from ArthurZucker June 23, 2026 07:42

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

f219c74

ArthurZucker reviewed Jun 23, 2026

3outeille and others added 12 commits June 24, 2026 04:25

remove redundant test file

ea8243f

Merge branch 'main' into split/a-pr-1-distributed-config

db31b04

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

4d840dc

Update src/transformers/distributed/fsdp.py

c384fcd

naming Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

avoid looping, just look at dict

9625816

expand_fsdp returns reshard_targets, no_reshard_targets right away

59bcec5

better _resolve_tied_embed_lm_head_plan

ebf3585

cleaning

e969325

ruff

2376965

Merge branch 'main' into split/a-pr-1-distributed-config

d830114

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

a44f81f

more robust detection of embed and lm_head

0f62c45

Merge branch 'split/a-pr-2-fsdp-module' of https://github.com/huggingface/transformers into split/a-pr-2-fsdp-module

020f7d3

3outeille requested a review from ArthurZucker June 24, 2026 10:20

3outeille and others added 7 commits June 24, 2026 13:48

cleaning

da302ad

ruff

dfc665c

typo

446fd6e

cleaner

5aeaff7

cleaner

819ff14

Merge branch 'main' into split/a-pr-1-distributed-config

413d775

Merge branch 'split/a-pr-1-distributed-config' into split/a-pr-2-fsdp-module

7bc3722

Base automatically changed from split/a-pr-1-distributed-config to main June 24, 2026 14:47

3outeille added 7 commits June 24, 2026 23:49

Merge branch 'main' into split/a-pr-2-fsdp-module

f8f27ff

Merge branch 'main' into split/a-pr-2-fsdp-module

ce2f001

Merge branch 'main' into split/a-pr-2-fsdp-module

ec87fff

Merge branch 'main' into split/a-pr-2-fsdp-module

6e156fd

Merge branch 'main' into split/a-pr-2-fsdp-module

606df0a

Merge branch 'main' into split/a-pr-2-fsdp-module

c4aa4b7

Merge branch 'main' into split/a-pr-2-fsdp-module

c5ad67b

ArthurZucker approved these changes Jun 29, 2026

3outeille and others added 7 commits June 29, 2026 16:08

Merge branch 'main' into split/a-pr-2-fsdp-module

04c124e

Merge branch 'main' into split/a-pr-2-fsdp-module

8b57aa4

Merge branch 'main' into split/a-pr-2-fsdp-module

dd1000b

expand_fsdp_plan iterate over modules

5c95559

Merge branch 'split/a-pr-2-fsdp-module' of https://github.com/huggingface/transformers into split/a-pr-2-fsdp-module

00a11b6

comment about tie embedding

e4613e6

add comment tied embedding

d558f99


