[Feat] Support LTX2 sequence parallel, tensor parallel, and VAE decode tiling #810
Conversation
Summary of Changes

Hello @gushiqiao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces significant architectural enhancements to the LightX2V framework, primarily focusing on improving scalability and performance through the implementation of Tensor Parallelism and Sequence Parallelism. It also streamlines the VAE decoding process with tiling capabilities and refines the image conditioning interface for better usability. These changes aim to make the system more robust and efficient for handling large-scale video generation tasks.
Code Review
This pull request introduces significant new features, including sequence parallelism, tensor parallelism, and VAE tiling for LTX2. The implementation of parallelism is extensive and touches many parts of the codebase. Overall, the changes are well-structured, particularly the use of wrapper classes for tensor parallelism. However, I've identified a critical bug in the weight loading logic for tensor parallelism that could lead to incorrect model loading. Additionally, there are several instances of hardcoded local paths and debug print statements that should be removed before merging. I also noted a potential regression in the image conditioning API, which has become less flexible. Please review my comments for detailed feedback and suggestions.
```python
else:
    # Non-TP weights or bias (bias is handled above if it's a TP weight's bias)
    # Skip bias keys that are already processed above
    if not (key.endswith(".bias") and key.replace(".bias", ".weight") in processed_weight_dict):
```
There's a critical bug in this condition. The check `key.replace(".bias", ".weight") in processed_weight_dict` will likely always be false for TP weights, because `processed_weight_dict` contains ranked keys (e.g., `...__tp_rank_0`) for TP weights, not the original weight key. This can lead to biases of column-split TP weights being processed twice, causing incorrect model loading and behavior in tensor parallel mode.

A more robust, order-independent check is needed to determine whether the bias belongs to a column-split TP weight that has already been handled. I suggest checking the weight's properties directly.
```diff
-if not (key.endswith(".bias") and key.replace(".bias", ".weight") in processed_weight_dict):
+if not (key.endswith(".bias") and self._is_tp_weight(key.replace(".bias", ".weight")) and self._get_split_type(key.replace(".bias", ".weight")) == "col"):
```
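To see why the membership check misses these biases, here is a minimal, self-contained sketch. The dictionary contents and the `__tp_rank_` suffix convention are illustrative stand-ins based on the naming mentioned above, not the framework's actual data:

```python
# Hypothetical repro: processed_weight_dict stores ranked keys for TP
# weights, so the un-ranked weight key is never a member.
processed_weight_dict = {
    "blocks.0.qkv.weight__tp_rank_0": "shard-0",
    "blocks.0.qkv.weight__tp_rank_1": "shard-1",
}

key = "blocks.0.qkv.bias"
weight_key = key.replace(".bias", ".weight")

# Original condition: never true for TP weights, so the bias is NOT skipped
# and would be processed a second time.
naive_handled = weight_key in processed_weight_dict

# One order-independent alternative: match on the ranked-key prefix instead.
robust_handled = any(
    k.startswith(weight_key + "__tp_rank_") for k in processed_weight_dict
)

print(naive_handled, robust_handled)  # False True
```

The prefix match is only one option; the suggestion in the review (querying the weight's split metadata via helpers like `_is_tp_weight`) avoids string matching entirely and is likely cleaner if those helpers exist.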
```python
    model_path="/data/nvme0/gushiqiao/models/official_models/LTX-2/",
    model_cls="ltx2",
    task="i2av",
)
```

```diff
 pipe.enable_quantize(
     dit_quantized=True,
-    dit_quantized_ckpt="Lightricks/LTX-2/ltx-2-19b-distilled-fp8.safetensors",
+    dit_quantized_ckpt="/data/nvme0/gushiqiao/models/official_models/LTX-2/ltx-2-19b-distilled-fp8.safetensors",
```
Hardcoded local paths are being used for `model_path` and `dit_quantized_ckpt`. This is not portable and will cause issues for other developers. It's recommended to use relative paths, environment variables, or a mechanism to load these from a local configuration file that is not committed to the repository.
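One possible shape for the environment-variable approach, sketched below. The variable names (`LTX2_MODEL_PATH`, `LTX2_FP8_CKPT`) are suggestions, not an existing LightX2V convention:

```python
import os

# Fall back to the public Hugging Face repo id when no local override is set;
# developers with local checkouts export LTX2_MODEL_PATH instead of editing code.
model_path = os.environ.get("LTX2_MODEL_PATH", "Lightricks/LTX-2")

# The quantized checkpoint defaults to living next to the model weights.
dit_quantized_ckpt = os.environ.get(
    "LTX2_FP8_CKPT",
    os.path.join(model_path, "ltx-2-19b-distilled-fp8.safetensors"),
)

print(model_path, dit_quantized_ckpt)
```

This keeps the committed example runnable for everyone while still letting a developer point at `/data/nvme0/...` via their shell profile.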
```diff
 pipe = LightX2VPipeline(model_path="/data/nvme0/gushiqiao/models/official_models/LTX-2", model_cls="ltx2", task="t2av")

 pipe.enable_quantize(
     dit_quantized=True,
-    dit_quantized_ckpt="Lightricks/LTX-2/ltx-2-19b-distilled-fp8.safetensors",
+    dit_quantized_ckpt="/data/nvme0/gushiqiao/models/official_models/LTX-2/ltx-2-19b-distilled-fp8.safetensors",
```
(Same hardcoded local path issue as the previous snippet.)
```python
print(config)
validate_config_paths(config)
self.runner = self._init_runner(config)
print(self.runner.config)
```
(Debug print statements that should be removed before merging.)
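If some of this diagnostic output is worth keeping, routing it through the standard `logging` module lets it be silenced outside development. A minimal sketch; the logger name is illustrative:

```python
import logging

# Module-level logger; verbosity is controlled by the application,
# not by editing the source.
logger = logging.getLogger("lightx2v.pipeline")

def log_config(config):
    # Emitted only when the application enables DEBUG-level logging,
    # replacing the bare print(config) calls.
    logger.debug("resolved config: %s", config)

logging.basicConfig(level=logging.INFO)  # debug output hidden by default
log_config({"model_cls": "ltx2"})
```

Running with `level=logging.DEBUG` (or configuring the `lightx2v.pipeline` logger directly) restores the verbose output when needed.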
`scripts/wan/run_wan_i2v.sh` (Outdated)
```bash
lightx2v_path=/data/nvme0/gushiqiao/models/code/LightX2V
model_path=/data/nvme0/gushiqiao/models/official_models/Wan2.1-I2V-14B-720P
```
(Hardcoded local paths in this script as well, per the comment above.)