⚡️ Speed up function zero_module by 75%
#166
Open
📄 75% (0.75x) speedup for `zero_module` in `invokeai/backend/flux/controlnet/zero_module.py`
⏱️ Runtime: 2.22 milliseconds → 1.26 milliseconds (best of 203 runs)
📝 Explanation and details
The optimization achieves a 75% speedup by replacing `torch.nn.init.zeros_(p)` with `p.zero_()` and wrapping the operation in `torch.no_grad()`.

Key optimizations:

- `p.zero_()` is a direct tensor operation that zeros the parameter in-place, while `torch.nn.init.zeros_(p)` goes through PyTorch's initialization framework with additional function-call overhead.
- `torch.no_grad()` prevents PyTorch from tracking the zeroing operations for autograd, reducing memory and compute overhead.
- `list(module.parameters())` materializes the parameters once, avoiding repeated generator calls within the loop.

Performance impact by test case:
Hot path benefits: Based on the function reference, `zero_module` is called during ControlNet initialization to create zero-initialized linear layers for `controlnet_blocks` and `controlnet_single_blocks`. Since ControlNet models can have dozens of these blocks (matching the depth of the base FLUX model), this optimization meaningfully reduces model initialization time, a critical performance factor for ML inference pipelines where models may be loaded and reloaded frequently. The optimization is most effective for modules with many parameters, making it well suited to the neural network layers typically used in ControlNet architectures.
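The description above implies a before/after along the following lines. This is a hedged reconstruction of the two variants (the exact code from the PR is not shown in this excerpt), plus a quick demonstration of the zero-initialization effect:

```python
import torch
import torch.nn as nn


def zero_module_original(module: nn.Module) -> nn.Module:
    """Baseline sketch: zero each parameter through the init framework."""
    for p in module.parameters():
        torch.nn.init.zeros_(p)
    return module


def zero_module(module: nn.Module) -> nn.Module:
    """Optimized sketch: in-place zero_() under no_grad(),
    with the parameter list materialized once."""
    with torch.no_grad():
        for p in list(module.parameters()):
            p.zero_()
    return module


# ControlNet-style usage: a zero-initialized output projection
# (the layer size here is made up for illustration).
block = zero_module(nn.Linear(16, 16))
x = torch.randn(2, 16)
# With weight and bias zeroed, the layer maps every input to zero,
# so the control branch contributes nothing at the start of training.
all_zero = torch.all(block(x) == 0)
```

The `no_grad()` context is what makes the in-place `zero_()` safe on leaf tensors that require gradients; without it, PyTorch would raise an error for the in-place modification.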
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
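The generated regression tests themselves are not reproduced in this excerpt. A minimal test of the behavior being preserved might look like the sketch below, where the `zero_module` body is an assumed reconstruction of the optimized version:

```python
import torch
import torch.nn as nn


def zero_module(module: nn.Module) -> nn.Module:
    # Assumed optimized implementation, as described in the report.
    with torch.no_grad():
        for p in list(module.parameters()):
            p.zero_()
    return module


def test_zero_module_zeros_all_parameters():
    module = nn.Sequential(nn.Linear(3, 5), nn.Linear(5, 2))
    out = zero_module(module)
    assert out is module  # the same module instance is returned
    for p in module.parameters():
        assert torch.all(p == 0)


def test_zero_module_handles_parameterless_module():
    module = nn.ReLU()  # no parameters: should be a harmless no-op
    assert zero_module(module) is module


test_zero_module_zeros_all_parameters()
test_zero_module_handles_parameterless_module()
```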
To edit these changes, run `git checkout codeflash/optimize-zero_module-mhx2mq0d` and push.