⚡️ Speed up method SD3DenoiseInvocation._prepare_cfg_scale by 17%
#154
📄 17% (0.17x) speedup for `SD3DenoiseInvocation._prepare_cfg_scale` in `invokeai/app/invocations/sd3_denoise.py`

⏱️ Runtime: 41.8 microseconds → 35.6 microseconds (best of 174 runs)

📝 Explanation and details
The optimization achieves a 17% speedup by eliminating redundant attribute lookups and restructuring the control flow for better efficiency.
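For reference, the change described below can be sketched as a before/after pair. These method bodies are a hypothetical reconstruction from this description, not the actual InvokeAI source: `CfgScaleHolder` is an illustrative stand-in for `SD3DenoiseInvocation`, and the `ValueError` fallback for unsupported types is an assumption.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-in for SD3DenoiseInvocation; only cfg_scale matters here.
@dataclass
class CfgScaleHolder:
    cfg_scale: Union[float, list[float]] = 3.5

    def prepare_cfg_scale_original(self, num_timesteps: int) -> list[float]:
        # "Before" shape: self.cfg_scale is re-read in each check and assignment,
        # and both branches assign to cfg_scale before a single final return.
        if isinstance(self.cfg_scale, float):
            cfg_scale = [self.cfg_scale] * num_timesteps
        elif isinstance(self.cfg_scale, list):
            cfg_scale = self.cfg_scale
        else:
            # Assumed fallback for unsupported types (not confirmed by the PR text).
            raise ValueError(f"Invalid CFG scale type: {type(self.cfg_scale)}")
        return cfg_scale

    def prepare_cfg_scale_optimized(self, num_timesteps: int) -> list[float]:
        # "After" shape: a single attribute lookup into a local, then early returns.
        cfg_scale = self.cfg_scale
        if isinstance(cfg_scale, float):
            return [cfg_scale] * num_timesteps
        if isinstance(cfg_scale, list):
            return cfg_scale
        # Same assumed fallback as above.
        raise ValueError(f"Invalid CFG scale type: {type(cfg_scale)}")
```

Both variants return the same values for the same input; the optimized one simply reads `self.cfg_scale` once and avoids the intermediate assignment.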
Key optimizations applied:

- Single attribute lookup: The original code accessed `self.cfg_scale` multiple times (up to 3 times in the worst case). The optimized version stores it once in a local variable, `cfg_scale = self.cfg_scale`, eliminating repeated attribute-access overhead.
- Early returns: Instead of using `elif` and a final `return cfg_scale` statement, the optimized code returns early (`return [cfg_scale] * num_timesteps` and `return cfg_scale`), shortening the execution path.
- Removed variable assignment: The original code unnecessarily assigned to the `cfg_scale` variable in both branches before returning. The optimized version returns directly, eliminating the intermediate assignments.

Why this leads to speedup:

- Accessing `self.cfg_scale` involves dictionary lookups, which are more expensive than local variable access.
- Early returns remove the need for the final `return cfg_scale` statement and reduce the number of code paths.

Performance impact by test cases:
The optimization shows consistent improvements across all scenarios.
This function appears to be part of SD3 (Stable Diffusion 3) denoising pipeline where CFG (Classifier-Free Guidance) scaling is applied at each timestep. Given that denoising typically involves hundreds of timesteps, even small per-call optimizations can compound to meaningful performance gains in image generation workflows.
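Whether the per-call saving matters depends on how often the method actually runs in a real pipeline. A rough way to check the local-variable claim on your own interpreter is a `timeit` comparison like the hypothetical one below. `Holder`, the step count of 30, and the call count are illustrative choices, results will vary by Python version and machine, and this is not the harness that produced the runtime figures reported above.

```python
import timeit


class Holder:
    """Hypothetical object with a plain instance attribute (not InvokeAI code)."""

    def __init__(self, cfg_scale: float) -> None:
        self.cfg_scale = cfg_scale

    def repeated_lookup(self, n: int) -> list[float]:
        # Reads self.cfg_scale twice: once for the type check, once to build the list.
        if isinstance(self.cfg_scale, float):
            return [self.cfg_scale] * n
        return []

    def single_lookup(self, n: int) -> list[float]:
        # Reads self.cfg_scale once and reuses the local variable afterwards.
        cfg_scale = self.cfg_scale
        if isinstance(cfg_scale, float):
            return [cfg_scale] * n
        return []


def compare(number: int = 200_000, n: int = 30) -> dict[str, float]:
    """Return the total seconds spent on `number` calls of each variant."""
    h = Holder(3.5)
    return {
        "repeated_lookup": timeit.timeit(lambda: h.repeated_lookup(n), number=number),
        "single_lookup": timeit.timeit(lambda: h.single_lookup(n), number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Both lambdas add the same call overhead, so the difference between the two totals mostly reflects the extra attribute lookup; with a larger `n`, list construction dominates and the gap shrinks.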
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-SD3DenoiseInvocation._prepare_cfg_scale-mhwt4z5q` and push.