⚡️ Speed up method ObjectDetectionEvalProcessor.get_metrics by 9%
#9
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 9% (0.09x) speedup for
ObjectDetectionEvalProcessor.get_metricsinunstructured/metrics/object_detection.py⏱️ Runtime :
34.2 milliseconds→31.5 milliseconds(best of89runs)📝 Explanation and details
The optimized code achieves an 8% speedup by introducing Numba JIT compilation for the most computationally intensive operation: bounding box bounds clipping.
Key Optimization:
Numba JIT-accelerated bbox clipping: Replaced the PyTorch-based
_change_bbox_bounds_for_image_sizewith a Numba-compiled function_change_bbox_bounds_for_image_size_numba. The JIT compilation with@nb.njit(cache=True, nogil=True, fastmath=True)provides significant acceleration for the tight loop that clips each bounding box coordinate.Efficient tensor handling: The optimization handles both CPU and CUDA tensors intelligently - for CPU tensors, it operates directly on the underlying numpy array view (avoiding copies), while for CUDA tensors, it performs a minimal CPU copy, applies the fast Numba function, and copies back.
Class-level attribute optimization: Moved threshold constants to class attributes (
iou_thresholds,score_threshold,recall_thresholds) to avoid repeated attribute lookups during method calls.Why this works:
_change_bbox_bounds_for_image_sizetakes significant time (5.1% in original vs the Numba version being much faster), making it an ideal target for JIT optimization.Impact on workloads:
This optimization particularly benefits object detection pipelines processing many documents with dense object predictions, as the bbox clipping operation scales linearly with the number of predicted boxes. The 8% overall speedup comes with no changes to API or behavior, making it a safe performance enhancement for existing object detection evaluation workflows.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-ObjectDetectionEvalProcessor.get_metrics-mjce4c1eand push.