⚡️ Speed up method EmbedMaxDct.infer_dct_svd by 109%
#162
📄 109% (1.09x) speedup for `EmbedMaxDct.infer_dct_svd` in `invokeai/backend/image_util/imwatermark/vendor.py`

⏱️ Runtime: 17.7 milliseconds → 8.43 milliseconds (best of 46 runs)

📝 Explanation and details
The optimized code achieves a 109% speedup by making a single but highly effective change to the SVD computation in the `infer_dct_svd` method (a before/after sketch follows below).

**Key optimization:** The original code computes the full SVD decomposition with `u, s, v = np.linalg.svd(cv2.dct(block))`, but only uses the singular values `s`. The optimized version uses `s = np.linalg.svd(dct_block, compute_uv=False)` to compute only the singular values, skipping the expensive computation of the U and V matrices.

**Why this leads to a speedup:** SVD is computationally expensive, with the full decomposition requiring O(n³) operations. With `compute_uv=False`, NumPy skips computing the orthogonal matrices U and V, significantly reducing both computation time and memory allocation. The line profiler shows the time spent in the SVD call dropped from 25.1 ms (90.6% of total time) to 11.5 ms (79% of total time).

**Performance impact:** The optimization is particularly effective for larger blocks, as evidenced by the test results.
The speedup scales with block size because the computational savings of skipping U and V matrix computation become more pronounced as the matrix dimensions increase. This makes the optimization especially valuable for image processing workloads that process larger DCT blocks or perform batch processing of multiple blocks.
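For concreteness, here is a minimal sketch of the change, assuming the method decodes one watermark bit from a block's leading singular value; the actual body in `vendor.py` differs in its surrounding state and parameters, and the `scale` default here is illustrative:

```python
import cv2
import numpy as np


def infer_dct_svd(block: np.ndarray, scale: int = 36) -> int:
    """Decode one watermark bit from a DCT block's leading singular value.

    Hypothetical simplification of the vendored method: `block` is an
    even-sized float32 spatial block and `scale` stands in for the
    embedder's quantization step.
    """
    dct_block = cv2.dct(block)
    # Before: u, s, v = np.linalg.svd(dct_block) -- U and V were discarded.
    # After: request the singular values only, skipping U and V entirely.
    s = np.linalg.svd(dct_block, compute_uv=False)
    # The bit is read off the quantized leading singular value.
    return int((s[0] % scale) > scale * 0.5)


# Example: decode a bit from a random 8x8 block (even size, as cv2.dct requires).
bit = infer_dct_svd(np.float32(np.random.rand(8, 8)))
```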
The optimization preserves all original behavior and return values while eliminating unnecessary computation, making it a pure performance win with no functional trade-offs.
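A quick micro-benchmark (not part of the PR, shown here only to illustrate the claim) makes the scaling with matrix size visible:

```python
import timeit

import numpy as np

# Compare full SVD against singular-values-only across block sizes.
for n in (8, 32, 128, 512):
    a = np.random.default_rng(0).standard_normal((n, n))
    full = timeit.timeit(lambda: np.linalg.svd(a), number=20)
    vals = timeit.timeit(lambda: np.linalg.svd(a, compute_uv=False), number=20)
    print(f"{n:4d}x{n:<4d}  full={full:.4f}s  values_only={vals:.4f}s  "
          f"speedup={full / vals:.2f}x")
```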
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-EmbedMaxDct.infer_dct_svd-mhwyrdsb` and push.