Skip to content

Refactor model loading and add FSDP2 support#247

Open
molepi40 wants to merge 1 commit into
modelscope:v1from
molepi40:load_v1
Open

Refactor model loading and add FSDP2 support#247
molepi40 wants to merge 1 commit into
modelscope:v1from
molepi40:load_v1

Conversation

@molepi40
Copy link
Copy Markdown
Collaborator

  1. Consolidate duplicated init_transformer / init_text_encoder / init_vae from 4 QwenImage pipeline subclasses into Pipeline base class, accepting model_cls as parameter.
  2. Add FSDP2 (fully_shard) support in base model loading: transformer blocks and text encoder layers are sharded, weights are broadcast from rank0 via set_model_state_dict.
  3. Add is_*_group_initialized() helpers in parallel_state.py to replace dist.is_initialized() checks, enabling finer-grained distributed state detection.
  4. Refactor load_model_weights to only read files on rank0 and broadcast via torch.distributed, reducing redundant I/O on multi-node setups.
  5. Fix VAE all_reduce to use GroupCoordinator.all_reduce() instead of dist.all_reduce(tensor, group=group).
  6. Add DiffusionModel.dtype property; clean up pipeline reference in Engine.close().

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors distributed state management and model initialization, moving shared logic for loading transformers, text encoders, and VAEs into the base Pipeline class with added support for FSDP sharding and rank-0 weight broadcasting. Review feedback identified an incorrect attribute path for sharding text encoder layers and recommended using local VAE implementations across QwenImage pipelines to ensure a necessary distributed all_reduce fix is applied.

Comment on lines +94 to +95
for layer in model.model.language_model.layers:
fully_shard(layer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The attribute path model.model.language_model.layers appears to be incorrect for Qwen2_5_VLForConditionalGeneration (which is the model class used in the QwenImage pipelines). In the standard transformers implementation for Qwen2.5-VL, the layers are located at model.model.layers. Using the current path will result in an AttributeError when FSDP is enabled.

Suggested change
for layer in model.model.language_model.layers:
fully_shard(layer)
if use_fsdp:
for layer in model.model.layers:
fully_shard(layer)
fully_shard(model)

import torch
from accelerate import init_empty_weights
from diffusers.image_processor import VaeImageProcessor
from diffusers.models import AutoencoderKLQwenImage
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

AutoencoderKLQwenImage should be imported from the local models directory (diffsynth_engine.models.qwen_image) instead of diffusers.models. The local implementation contains a critical fix for all_reduce (switching from dist.all_reduce to GroupCoordinator.all_reduce) which is necessary for correct behavior in distributed environments. Importing from diffusers will bypass this fix.

Suggested change
from diffusers.models import AutoencoderKLQwenImage
from diffsynth_engine.models.qwen_image.autoencoder_kl_qwenimage import AutoencoderKLQwenImage

import torch
from accelerate import init_empty_weights
from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
from diffusers.models import AutoencoderKLQwenImage
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

AutoencoderKLQwenImage should be imported from diffsynth_engine.models.qwen_image.autoencoder_kl_qwenimage to ensure the local all_reduce fix is applied, consistent with the layered pipeline implementation.

Suggested change
from diffusers.models import AutoencoderKLQwenImage
from diffsynth_engine.models.qwen_image.autoencoder_kl_qwenimage import AutoencoderKLQwenImage

import torch
from accelerate import init_empty_weights
from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
from diffusers.models import AutoencoderKLQwenImage
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

AutoencoderKLQwenImage should be imported from the local models directory to ensure the distributed all_reduce fix is utilized.

Suggested change
from diffusers.models import AutoencoderKLQwenImage
from diffsynth_engine.models.qwen_image.autoencoder_kl_qwenimage import AutoencoderKLQwenImage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant