fix(annotators): clip crops, fix heatmap wrap, release capture by Borda · Pull Request #2389 · roboflow/supervision

Borda · 2026-07-02T17:32:52Z

This pull request introduces several improvements and bug fixes across the annotation, dataset export, and detection utilities in the codebase. The main themes are: preventing file overwrite collisions during dataset export, improving annotation robustness and internal API usage, and fixing subtle bugs in detection and metric calculations. The changes also include expanded test coverage for new and edge-case behaviors.

Dataset export: collision prevention and consistency

Added check_no_basename_collisions utility to prevent silent overwrites when exporting datasets with images or annotations sharing the same basename. This check is now enforced in Pascal VOC, YOLO, and image export workflows (src/supervision/dataset/utils.py, src/supervision/dataset/core.py, src/supervision/dataset/formats/yolo.py). [1] [2] [3] [4]
Ensured deterministic order when loading Pascal VOC annotations by sorting image paths and class names (src/supervision/dataset/formats/pascal_voc.py). [1] [2]

Annotation and image overlay improvements

Refactored internal image overlay logic to use a non-deprecated _overlay_image function, preventing deprecation warnings in library-internal calls and updating all internal usages (src/supervision/annotators/core.py, src/supervision/utils/image.py). [1] [2] [3] [4]
Improved annotation robustness: boxes partially or fully outside the image are now clipped or skipped, preventing errors, and heatmap overlays are now more robust to uint8 wraparound (src/supervision/annotators/core.py, tests/annotators/test_core.py). [1] [2] [3] [4]

Detection and metrics bug fixes

Fixed a bug where exporting detections from TensorFlow could mutate the caller’s data by ensuring a copy is made before in-place scaling (src/supervision/detection/core.py).
Improved mask resizing for Ultralytics masks to threshold after interpolation, preventing mask dilation (src/supervision/detection/utils/internal.py).
Fixed metric calculation to correctly penalize false positives on background images (src/supervision/metrics/detection.py).

Other improvements

Improved reproducibility and determinism in train/test splitting by using a dedicated random generator (src/supervision/dataset/utils.py).
Expanded support for new LMM/VLM models and improved handling of empty detection lists (src/supervision/detection/core.py, src/supervision/detection/vlm.py). [1] [2]
Improved resource management in video frame extraction by ensuring video handles are always released (src/supervision/utils/video.py). [1] [2]
Updated documentation to simplify installation instructions (docs/how_to/benchmark_a_model.md).

- `IconAnnotator`/`CropAnnotator` called the deprecated public `overlay_image`, emitting a warning users could not avoid and breaking both at its 0.31.0 removal; extract a private `_overlay_image` and delegate the public wrapper to it - `CropAnnotator` crashed (`cv2.error`) on detections partially outside the frame; clip boxes to scene bounds and skip degenerate boxes, matching `BlurAnnotator`/`PixelateAnnotator` - `HeatMapAnnotator` cast per-pixel frame counts to uint8, so heat silently vanished after 256 accumulated frames; derive the mask from the float accumulator directly - `get_video_frames_generator` leaked the `VideoCapture` when a consumer broke out early; release it in a `finally` - add regression tests for all four --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- `train_test_split` seeded the global `random` module and shuffled the caller's list in place, so `DetectionDataset.split()` reordered its own `image_paths` and polluted process-wide randomness; use a local `random.Random` and shuffle a copy - Pascal VOC class ids were assigned in `set`-iteration and filesystem-glob order, so the same dataset produced different `class_id` values across runs; sort class names and the loaded file list - dataset exports keyed output files on basename, silently overwriting when two entries shared a name across directories (common after `merge()`); detect basename collisions and raise - add regression tests for split determinism, VOC id stability, and export collisions --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- legacy `MeanAveragePrecision.from_tensors` appended per-image stats only when ground truth was present, so predictions on background images were never counted as false positives and mAP was inflated - append an FP-only stats tuple for prediction-present, target-empty images, mirroring the pattern in `f1_score.py` - add regression test asserting background false positives lower mAP --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- benchmark guide installed `inference` from a personal dev branch that can be deleted or go stale; use the released package and add the `metrics` extra needed by the guide's evaluation code --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- `from_tensorflow` scaled boxes in place on the array returned by `.numpy()`, which can share memory with the source tensor — corrupting caller data and double-scaling on a repeat call; copy before scaling - `from_lmm` raised a bare `KeyError` for `MOONDREAM` and `QWEN_3_VL`, which the enum and docstring advertise; map both to their `VLM` members - `from_deepseek_vl_2` returned a `(0,)`-shaped `xyxy` on empty output, so a zero-detection response crashed the `Detections` constructor; return `(0, 4)` like the other parsers - `extract_ultralytics_masks` binarized bilinear-resized masks with `> 0`, dilating every mask at object boundaries; threshold at 0.5 to match Ultralytics - add connector coverage: fake-result shims and round-trip tests (N>1, N=1, empty) for the nine previously untested `from_*` connectors and the `detection/tools/transformers.py` processors; one empty-`segments_info` panoptic case is xfail-marked pending a separate fix --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

- add a guard test asserting every `supervision.__all__` name is importable - cover previously untested public functions: mask/polygon converters, `pad_boxes`, `box_non_max_merge`, draw primitives, and `VideoSink`/`ImageSink` round-trips - add `DetectionDataset.split()` tests for determinism, disjoint coverage, ratio boundaries, and non-mutation of the source --- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

Copilot

Pull request overview

This PR improves robustness and determinism across Supervision’s core workflows: dataset export/splitting, image/video utilities used by annotators, and detection/metrics correctness. It also adds broad regression coverage for these edge cases.

Changes:

Prevent silent overwrites during dataset export by detecting basename collisions; improve determinism in splitting and Pascal VOC loading.
Refactor internal image overlay usage to avoid deprecated API warnings; harden annotators (clip/skip out-of-bounds crops, prevent heatmap uint8 wraparound).
Fix correctness issues in adapters/utilities (TensorFlow box scaling mutability, Ultralytics mask resize thresholding) and ensure mAP penalizes background false positives; add/expand tests.

Holistic assessment (per CONTRIBUTING guidance):

Code quality: 4/5
Testing: 5/5
Documentation: 4/5

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`src/supervision/annotators/core.py`	Switch internal overlay to non-deprecated implementation; clip/skip invalid crop regions; fix heatmap masking to avoid wrap-related artifacts.
`src/supervision/utils/image.py`	Add `_overlay_image` as the non-deprecated internal overlay implementation backing the public wrapper.
`src/supervision/utils/video.py`	Ensure `VideoCapture.release()` is always called via `try/finally` in the frame generator.
`src/supervision/metrics/detection.py`	Record predictions on GT-empty images as false positives so AP/mAP is penalized accordingly.
`src/supervision/detection/core.py`	Avoid mutating TensorFlow result tensors by copying prior to in-place scaling; expand legacy LMM→VLM mapping.
`src/supervision/detection/utils/internal.py`	Threshold resized Ultralytics proto masks after interpolation to avoid dilation.
`src/supervision/detection/vlm.py`	Ensure DeepSeek-VL2 empty parses yield a correctly shaped empty `(0,4)` `xyxy` array.
`src/supervision/dataset/utils.py`	Add basename collision guard for flat exports; isolate RNG usage in `train_test_split` and avoid mutating caller input.
`src/supervision/dataset/core.py`	Enforce basename collision checks for Pascal VOC annotation export paths.
`src/supervision/dataset/formats/yolo.py`	Enforce basename collision checks for YOLO label export naming.
`src/supervision/dataset/formats/pascal_voc.py`	Make VOC loading deterministic by sorting image paths and class name iteration.
`docs/how_to/benchmark_a_model.md`	Simplify installation instructions (including metrics extra).
`tests/annotators/test_core.py`	Add regressions for heatmap wraparound, deprecation-warning suppression, and out-of-bounds crop handling.
`tests/utils/test_image.py`	Assert public `overlay_image` matches `_overlay_image` output while allowing deprecation warnings.
`tests/utils/test_video.py`	Verify early generator close still releases the capture.
`tests/utils/test_sinks.py`	Add round-trip filesystem tests for `ImageSink` and `VideoSink`.
`tests/metrics/test_detection.py`	Add regression ensuring background-image false positives reduce mAP vs FP-free baseline.
`tests/dataset/test_utils.py`	Add regressions for RNG isolation and basename-collision detection utility.
`tests/dataset/test_core.py`	Add export collision regression; add deterministic and non-mutating split tests; add classification dataset folder round-trip tests.
`tests/dataset/formats/test_pascal_voc.py`	Add deterministic class ordering and repeatable class-id assignment regressions for VOC loads.
`tests/draw/test_utils.py`	Expand coverage for additional drawing and grayscale utilities.
`tests/detection/test_vlm.py`	Align empty-parse behavior across VLM parsers to return empty `Detections`.
`tests/detection/test_from_adapters.py`	Add extensive adapter routing and empty-input coverage; regression for TensorFlow mutation; LMM mapping tests.
`tests/detection/tools/test_transformers.py`	Add processing-unit tests for transformers detection/segmentation conversions and dispatch.
`tests/detection/tools/__init__.py`	Package init for transformer tool tests.
`tests/detection/utils/test_internal.py`	Add regression for Ultralytics mask resize thresholding behavior.
`tests/detection/utils/test_iou_and_nms.py`	Add tests for `box_non_max_merge` behavior.
`tests/detection/utils/test_converters.py`	Add polygon/mask conversion tests and round-trips.
`tests/detection/utils/test_boxes.py`	Add tests for new `pad_boxes` utility.
`tests/helpers.py`	Add additional fake result/tensor helpers to support expanded adapter/tool test coverage.
`tests/test_public_api.py`	Add guard that every symbol in `supervision.__all__` is importable.

+                        predicted_objs[:, 5],
+                        predicted_objs[:, 4],
+                        np.zeros((0,)),
+                    )
+                )


Borda and others added 6 commits July 2, 2026 19:16

Borda requested a review from Copilot July 2, 2026 17:32

Borda requested a review from SkalskiP as a code owner July 2, 2026 17:32

Copilot started reviewing on behalf of Borda July 2, 2026 17:33 View session

Copilot AI reviewed Jul 2, 2026

View reviewed changes

Comment thread src/supervision/metrics/detection.py

Comment on lines +1435 to +1439

predicted_objs[:, 5],

predicted_objs[:, 4],

np.zeros((0,)),

)

)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(annotators): clip crops, fix heatmap wrap, release capture#2389

fix(annotators): clip crops, fix heatmap wrap, release capture#2389
Borda wants to merge 6 commits into
developfrom
fix/claude2

Borda commented Jul 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Borda commented Jul 2, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants