⚡️ Speed up method EmbedMaxDct.infer_dct_matrix by 21%
#164
📄 **21% (0.21x) speedup** for `EmbedMaxDct.infer_dct_matrix` in `invokeai/backend/image_util/imwatermark/vendor.py`

⏱️ **Runtime:** 649 microseconds → 538 microseconds (best of 158 runs)

📝 **Explanation and details**
The optimized code achieves a 20% speedup through several key performance improvements in the `infer_dct_matrix` method.

**Primary optimization:** The original code used `np.argmax(abs(block.flatten()[1:]))`, which chained several expensive operations into one expression. The optimized version breaks this into separate steps built on NumPy's vectorized operations:

- `block.ravel()` instead of `block.flatten()` — returns a view rather than copying the data when possible
- `np.abs(v)` on the sliced array — uses NumPy's vectorized absolute value instead of Python's built-in `abs()`
- a single `np.argmax()` call on the pre-computed absolute values

**Secondary optimizations:**

- `divmod(pos, self._block)` instead of separate floor-division and modulo operations when computing the array indices
- replacing `abs(val)` with a direct negation (`val = -val` when `val < 0`), avoiding Python function-call overhead

**Performance impact:** The line profiler shows the critical hotspot (finding the maximum absolute value) dropped from 69.9% to 35.2% of total execution time. While the optimization adds more lines of code, each individual operation is significantly faster thanks to NumPy's vectorized implementations.
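The two approaches above can be sketched side by side. This is a minimal illustration, not the vendored `infer_dct_matrix` itself: the `BLOCK` constant stands in for the instance attribute `self._block`, and the return shape is assumed for demonstration.

```python
import numpy as np

BLOCK = 4  # hypothetical block size; the real value lives on the EmbedMaxDct instance


def max_abs_original(block):
    # Original hotspot: flatten() copies the array, abs() is the Python
    # builtin, and everything happens in one chained expression.
    pos = np.argmax(abs(block.flatten()[1:])) + 1
    i, j = pos // BLOCK, pos % BLOCK
    return i, j, abs(block[i][j])


def max_abs_optimized(block):
    # Optimized hotspot: ravel() returns a view when possible, np.abs()
    # is vectorized, and divmod computes both indices in one call.
    v = block.ravel()[1:]
    pos = int(np.argmax(np.abs(v))) + 1
    i, j = divmod(pos, BLOCK)
    val = block[i, j]
    if val < 0:  # direct negation instead of a call to abs()
        val = -val
    return i, j, val


# Both versions locate the same strongest AC coefficient.
rng = np.random.default_rng(0)
blk = rng.standard_normal((BLOCK, BLOCK))
assert max_abs_original(blk) == max_abs_optimized(blk)
```

Note that the `[1:]` slice in both versions skips the first flattened element (the DC coefficient) before searching for the largest-magnitude entry.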
**Test case benefits:** The optimization shows consistent 10–32% improvements across all test scenarios, with particularly strong gains in cases where avoiding Python's built-in `abs()` function pays off. This optimization is especially valuable for image-processing workflows where DCT analysis is performed repeatedly across many blocks.
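To see why per-block savings compound, here is a hedged sketch of the access pattern a watermark embedder follows: the helper runs once per 4×4 tile, so a 128×128 channel already invokes it over a thousand times. The `max_abs_optimized` helper and `BLOCK` size are illustrative assumptions, not the library's actual API.

```python
import numpy as np

BLOCK = 4  # hypothetical block size


def max_abs_optimized(block):
    # Optimized per-block helper (see the sketch above): view + vectorized
    # abs + single argmax + divmod index computation.
    v = block.ravel()[1:]
    pos = int(np.argmax(np.abs(v))) + 1
    i, j = divmod(pos, BLOCK)
    val = block[i, j]
    return i, j, -val if val < 0 else val


# Sweep every BLOCK x BLOCK tile of a channel, as a DCT-based embedder would.
# The helper runs (128/4)^2 = 1024 times here, so microseconds add up fast.
rng = np.random.default_rng(1)
channel = rng.standard_normal((128, 128))
results = [
    max_abs_optimized(channel[r:r + BLOCK, c:c + BLOCK])
    for r in range(0, 128, BLOCK)
    for c in range(0, 128, BLOCK)
]
assert len(results) == (128 // BLOCK) ** 2  # one result per tile
```

Each returned magnitude is non-negative by construction, which is the invariant the direct-negation trick preserves without ever calling `abs()`.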
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-EmbedMaxDct.infer_dct_matrix-mhwzsh12` and push.