Fix: Add dynamic resolution scaling to crop_video.py to support various video resolutions #3

Open
cvjekim wants to merge 1 commit into etri:main from cvjekim:main

cvjekim commented Mar 13, 2026

Description

This PR fixes an issue in utils/crop_video.py where the face crop region is wrong whenever the downloaded video's resolution does not exactly match the base resolution (e.g., 1280x720) specified in the ASD info file.

Why is this necessary?

The original implementation assumes that the input video has exactly the dimensions given by the # ImageSize field in the annotation text files. However, videos downloaded from YouTube (via yt-dlp or similar tools) often come in 1080p or other resolutions. When a 1080p video is cropped with unscaled 720p coordinates, the crop lands on an unrelated background region instead of the speaker's mouth.

Changes Made

  • Added dynamic scaling logic that compares the actual resolution of the input video frame with the # ImageSize value from the ASD info file.
  • Computed the scale factors (sw, sh) and multiplied the original X, Y, W, H coordinates by them, so the bounding box is adjusted correctly regardless of the video's resolution.
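
The scaling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `scale_box`, the argument layout, and the rounding to whole pixels are assumptions, as is the pairing of `sw` with the horizontal values (X, W) and `sh` with the vertical values (Y, H).

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def scale_box(box: Box,
              annotated_size: Tuple[int, int],
              frame_size: Tuple[int, int]) -> Box:
    """Rescale a bounding box from the annotated resolution to the frame's.

    annotated_size: (width, height) taken from the "# ImageSize" entry.
    frame_size:     (width, height) of the actual decoded video frame.
    All names here are hypothetical and chosen for illustration only.
    """
    x, y, w, h = box
    ann_w, ann_h = annotated_size
    frame_w, frame_h = frame_size
    if ann_w <= 0 or ann_h <= 0:
        raise ValueError("annotated size must be positive")

    # Scale factors: actual frame size relative to the annotated size.
    sw = frame_w / ann_w
    sh = frame_h / ann_h

    # Assumption: horizontal values use sw, vertical values use sh,
    # and results are rounded to whole pixels.
    return (round(x * sw), round(y * sh), round(w * sw), round(h * sh))


# A box annotated at 1280x720, applied to a 1920x1080 frame (factor 1.5).
print(scale_box((400, 200, 100, 80), (1280, 720), (1920, 1080)))
```

When the frame already matches the annotated size, both factors are 1.0 and the box is returned unchanged, so the same code path handles videos that match the base resolution.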

How to Test

  1. Download a 1080p or 4K video from the dataset list.
  2. Run the modified crop_video.py.
  3. Verify that the output .mp4 files in the utts/ directory contain the cropped lip/face regions.
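
Before inspecting the clips by eye, a quick automated sanity check can catch boxes that were left unscaled or scaled incorrectly. The sketch below is an assumption about how such a check might look and is not part of the PR; `box_fits_frame` is a hypothetical helper that only tests whether a box lies entirely inside the frame.

```python
from typing import Tuple


def box_fits_frame(box: Tuple[int, int, int, int],
                   frame_size: Tuple[int, int]) -> bool:
    """Return True if the (x, y, w, h) box lies fully inside the frame.

    frame_size is (width, height) of the actual video frame.
    Hypothetical helper for illustration only.
    """
    x, y, w, h = box
    frame_w, frame_h = frame_size
    return (w > 0 and h > 0
            and x >= 0 and y >= 0
            and x + w <= frame_w
            and y + h <= frame_h)


# A box near the bottom-right of a 1080p frame is out of bounds in a 720p frame.
print(box_fits_frame((1700, 900, 150, 120), (1920, 1080)))
print(box_fits_frame((1700, 900, 150, 120), (1280, 720)))
```

A box that fits the frame is not proof that it covers the speaker's mouth, so this complements the visual check in step 3 rather than replacing it.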
