⚡️ Speed up method ObjectDetectionEvalProcessor._compute_targets by 9%
#13
+45
−15
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 9% (0.09x) speedup for
ObjectDetectionEvalProcessor._compute_targetsinunstructured/metrics/object_detection.py⏱️ Runtime :
56.0 milliseconds→51.3 milliseconds(best of91runs)📝 Explanation and details
The optimization replaces the original PyTorch-based IoU computation with a Numba-compiled NumPy implementation, achieving a ~9% speedup in overall runtime.
Key optimization: The
_box_ioumethod now uses a new_box_iou_numbafunction decorated with@njit(fastmath=True, cache=True). This function performs the same intersection-over-union calculation but leverages Numba's just-in-time compilation to generate optimized machine code for the nested loops computing pairwise IoU between bounding boxes.Why this is faster:
Performance characteristics: The test results show consistent 40-100% speedups across various scenarios, with particularly strong gains in:
Trade-offs: While the line profiler shows the
_box_ioufunction itself takes longer (2.4s vs 9ms), this is misleading - it includes Numba's compilation overhead on first run. The overall function runtime improves because the compiled code is more efficient for the actual computation workload, and Numba's caching ensures subsequent calls avoid recompilation costs.The optimization is most beneficial for workloads with repeated IoU computations on moderately-sized bounding box sets, which is typical in object detection evaluation pipelines.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-ObjectDetectionEvalProcessor._compute_targets-mjceso4jand push.