⚡️ Speed up method EmbedMaxDct.diffuse_dct_matrix by 18%
#163
📄 18% (0.18x) speedup for EmbedMaxDct.diffuse_dct_matrix in invokeai/backend/image_util/imwatermark/vendor.py
⏱️ Runtime: 359 microseconds → 304 microseconds (best of 229 runs)
📝 Explanation and details
The optimized code achieves an 18% speedup through three optimizations that reduce computational overhead in the diffuse_dct_matrix method.

What specific optimizations were applied:
Eliminated redundant array copying: Replaced block.flatten() with block.ravel(), which returns a view instead of creating a copy when possible, reducing memory allocation overhead.
Direct NumPy absolute value: Replaced Python's built-in abs() with np.abs(). Both operate elementwise on an ndarray, since built-in abs() dispatches to the array's __abs__ method, so any gain here comes from call overhead rather than from vectorization.
Named intermediate results: Stored flat_block[1:] and np.abs(flat1) in local variables so that each value is computed once.

Why these optimizations lead to speedup:
The line profiler shows the original bottleneck was np.argmax(abs(block.flatten()[1:])), taking 74.1% of execution time. The optimized version distributes this work across multiple lines and reduces the combined time for the equivalent operations from 568,762 ns to 453,663 ns, a ~20% improvement on the critical path.

Performance characteristics based on test results:
The optimization shows consistent 10-32% speedups across all test cases, with particularly strong gains for:
Impact on workloads:
Since this appears to be part of an invisible watermarking system for images, this function likely processes many DCT blocks during watermark embedding. The 18% per-call saving would add up when processing high-resolution images with hundreds or thousands of blocks, which could make watermark embedding measurably faster.
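As a rough illustration of the rewrite described above, the sketch below compares the original expression with the ravel()/np.abs() form and checks that both pick the same position. Only the expression np.argmax(abs(block.flatten()[1:])) and the names flat_block and flat1 come from this PR; the function names, the 4x4 block size, and the random input are assumptions made for the example, and the real method does more than this lookup.

```python
import numpy as np


def argmax_original(block: np.ndarray) -> int:
    """Original form: position of the largest-magnitude entry after index 0."""
    # flatten() always allocates a copy of the block before slicing.
    return int(np.argmax(abs(block.flatten()[1:])))


def argmax_optimized(block: np.ndarray) -> int:
    """Rewritten form using ravel(), np.abs() and named intermediates."""
    flat_block = block.ravel()   # a view when the block is contiguous
    flat1 = flat_block[1:]       # skip the first coefficient
    abs_flat1 = np.abs(flat1)    # elementwise magnitude
    return int(np.argmax(abs_flat1))


# Hypothetical input: a random 4x4 block standing in for DCT coefficients.
rng = np.random.default_rng(0)
block = rng.standard_normal((4, 4))

same_position = argmax_original(block) == argmax_optimized(block)
ravel_is_view = np.shares_memory(block, block.ravel())
flatten_is_copy = not np.shares_memory(block, block.flatten())
```

Both functions return an index into the sliced array, so any offset back into the full block is left to the caller. For a non-contiguous block, ravel() falls back to copying, in which case the allocation saving disappears.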
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, git checkout codeflash/optimize-EmbedMaxDct.diffuse_dct_matrix-mhwz8se4 and push.