⚡️ Speed up method ObjectDetectionEvalProcessor._compute_page_detection_matching by 5%
#14
+91
−3
📄 5% (0.05x) speedup for `ObjectDetectionEvalProcessor._compute_page_detection_matching` in `unstructured/metrics/object_detection.py`
⏱️ Runtime: 1.21 seconds → 1.15 seconds (best of 8 runs)
📝 Explanation and details
The optimized code achieves a 5% speedup through two key optimizations:
1. Numba-accelerated IoU computation: The most significant optimization is replacing the PyTorch `_box_iou` implementation with a Numba JIT-compiled version (`_box_iou_numba`). When running on CPU (which is common for object detection evaluation), this Numba implementation accounts for most of the gain.
2. Numba-accelerated bounding box clipping: The `_change_bbox_bounds_for_image_size` function now uses a Numba-compiled helper (`_change_bbox_bounds_for_image_size_numba`) that replaces `clip` operations with faster native code.
Performance characteristics from tests:
The optimizations are most effective for CPU-based evaluation workloads where object detection metrics are computed post-training. Since evaluation typically processes many images with moderate numbers of detections, the cumulative effect of these micro-optimizations provides meaningful performance gains. The code maintains a fallback to the original PyTorch implementation for GPU tensors, ensuring compatibility across different execution environments.
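The two CPU-side kernels described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the function names `box_iou_loops` and `clip_boxes_to_image` are hypothetical, boxes are assumed to be NumPy arrays in `(x1, y1, x2, y2)` format, and Numba is treated as an optional dependency so the example still runs (uncompiled) when it is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the loops below to native code
except ImportError:
    def njit(func):
        # Assumption for this sketch: without Numba, run the plain Python version.
        return func


@njit
def box_iou_loops(boxes1, boxes2):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    n, m = boxes1.shape[0], boxes2.shape[0]
    iou = np.zeros((n, m), dtype=np.float64)
    for i in range(n):
        area1 = (boxes1[i, 2] - boxes1[i, 0]) * (boxes1[i, 3] - boxes1[i, 1])
        for j in range(m):
            area2 = (boxes2[j, 2] - boxes2[j, 0]) * (boxes2[j, 3] - boxes2[j, 1])
            # Width and height of the overlap rectangle (non-positive if disjoint).
            iw = min(boxes1[i, 2], boxes2[j, 2]) - max(boxes1[i, 0], boxes2[j, 0])
            ih = min(boxes1[i, 3], boxes2[j, 3]) - max(boxes1[i, 1], boxes2[j, 1])
            if iw > 0.0 and ih > 0.0:
                inter = iw * ih
                iou[i, j] = inter / (area1 + area2 - inter)
    return iou


@njit
def clip_boxes_to_image(boxes, width, height):
    """Return a copy of the boxes clamped to [0, width] x [0, height]."""
    out = boxes.copy()
    for i in range(out.shape[0]):
        out[i, 0] = min(max(out[i, 0], 0.0), width)
        out[i, 1] = min(max(out[i, 1], 0.0), height)
        out[i, 2] = min(max(out[i, 2], 0.0), width)
        out[i, 3] = min(max(out[i, 3], 0.0), height)
    return out


preds = np.array([[0.0, 0.0, 2.0, 2.0]])
targets = np.array([[1.0, 1.0, 3.0, 3.0], [0.0, 0.0, 2.0, 2.0], [5.0, 5.0, 6.0, 6.0]])
iou_matrix = box_iou_loops(preds, targets)

clipped = clip_boxes_to_image(np.array([[-1.0, -2.0, 5.0, 12.0]]), 4.0, 10.0)
```

Explicit loops like these are the style Numba compiles well, since they avoid allocating the intermediate broadcast arrays that a vectorized implementation creates. A GPU fallback, as the description mentions, would check the tensor's device and route non-CPU inputs to the original PyTorch path.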
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-ObjectDetectionEvalProcessor._compute_page_detection_matching-mjceyvsc` and push.