
kernel_crop_rgb_yuv444 drops the last 1–3 output columns when dstWidth is not a multiple of 4 #763

@qwe7989199

Description


Repro (Version 9.16)

NVEncC64.exe -i w1998h1080.mp4 --output-csp yuv444 -c h264 --profile high444 -o out.mp4

Source is any 1998×1080 (or other non-mod-4 width) 4:4:4 RGB video. Decode out.mp4 to rgb48 and inspect columns x=1996 and x=1997: in every row of every frame they hold a fixed colour (e.g. (0, 34796, 0)) that does not depend on the source. This is stale surface memory.
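The manual inspection above can be automated. The following is a minimal host-side sketch, assuming the decoded output has been dumped as packed rgb48 frames with no row padding (for example with ffmpeg's rawvideo output and pix_fmt rgb48le, which is an assumption and not part of the original repro) and loaded into one contiguous buffer. The helper name isColumnConstant is hypothetical. It flags a column whose RGB triple never changes across rows and frames, which is the stale-memory signature described above.

```cuda
// Host-side check (plain C++; builds as part of a .cu file with nvcc).
// Assumption: frames were decoded to packed rgb48 (3 x uint16_t per pixel,
// no row padding) and stored back to back in a single buffer.
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if column x holds the same RGB triple
// in every row of every frame, i.e. it never reflects the source content.
bool isColumnConstant(const uint16_t* rgb48, int width, int height,
                      int frames, int x) {
    const size_t rowStride = static_cast<size_t>(width) * 3;  // uint16_t values per row
    const size_t col = static_cast<size_t>(x) * 3;            // offset of column x in a row
    const uint16_t* ref = rgb48 + col;                         // frame 0, row 0
    for (int f = 0; f < frames; ++f) {
        for (int y = 0; y < height; ++y) {
            const size_t row = static_cast<size_t>(f) * height + y;
            const uint16_t* p = rgb48 + row * rowStride + col;
            if (p[0] != ref[0] || p[1] != ref[1] || p[2] != ref[2]) {
                return false;  // the column varies, so something wrote it
            }
        }
    }
    return true;
}
```

On the 1998×1080 reproducer one would expect this to return true for x=1996 and x=1997; interior columns should return false unless the source itself is flat there.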

Root cause

In NVEncCore/NVEncFilterCrop.cu, kernel_crop_rgb_yuv444 processes 4 pixels per thread, and its vectorized path is guarded at line 1328 by:

if (x + PIX_PER_THREAD - 1 < dstWidth && y < dstHeight) { /* vectorized path */ }

It is launched with divCeil(width, block.x * 4). For width=1998, the thread covering the last tile starts at x=1996; 1996 + 3 < 1998 is false and there is no scalar fallback, so columns 1996 and 1997 are never written.
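One way to close the gap without changing the launch geometry is to keep the vectorized path for full tiles and add a scalar loop for the partial tail tile. The sketch below is a simplified stand-in, not the actual NVEnc kernel: it crops a single 8-bit plane, omits the RGB to YUV conversion, and all names (cropTile, kernelCropTailSafe, launchCropTailSafe, cropLeft, cropTop) are hypothetical. The per-thread work is a __host__ __device__ function so it can also be exercised on the CPU.

```cuda
// Simplified stand-in for a 4-pixel-per-thread crop kernel (8-bit, one plane).
// The real kernel_crop_rgb_yuv444 also converts RGB to YUV; omitted here.
// All names are hypothetical.
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

constexpr int PIX_PER_THREAD = 4;

inline int divCeil(int a, int b) { return (a + b - 1) / b; }

// Work done by one thread: the PIX_PER_THREAD output pixels starting at (x, y).
__host__ __device__ inline void cropTile(const uint8_t* src, int srcPitch,
                                         uint8_t* dst, int dstPitch,
                                         int x, int y, int dstWidth, int dstHeight,
                                         int cropLeft, int cropTop) {
    if (x >= dstWidth || y >= dstHeight) return;
    const uint8_t* s = src + static_cast<size_t>(y + cropTop) * srcPitch + (x + cropLeft);
    uint8_t* d = dst + static_cast<size_t>(y) * dstPitch + x;
    if (x + PIX_PER_THREAD - 1 < dstWidth) {
        // Full tile: same condition as the original guard. The real kernel
        // uses a vectorized load/store here; a plain loop keeps the sketch
        // independent of alignment.
        for (int i = 0; i < PIX_PER_THREAD; ++i) d[i] = s[i];
    } else {
        // Tail tile: only dstWidth - x (1..PIX_PER_THREAD-1) columns remain.
        // This scalar fallback is the piece reported as missing.
        for (int i = 0; i < dstWidth - x; ++i) d[i] = s[i];
    }
}

__global__ void kernelCropTailSafe(const uint8_t* src, int srcPitch,
                                   uint8_t* dst, int dstPitch,
                                   int dstWidth, int dstHeight,
                                   int cropLeft, int cropTop) {
    const int x = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x) * PIX_PER_THREAD;
    const int y = static_cast<int>(blockIdx.y * blockDim.y + threadIdx.y);
    cropTile(src, srcPitch, dst, dstPitch, x, y, dstWidth, dstHeight, cropLeft, cropTop);
}

// Same grid shape as in the issue: divCeil(width, block.x * 4) tiles per row.
cudaError_t launchCropTailSafe(const uint8_t* src, int srcPitch,
                               uint8_t* dst, int dstPitch,
                               int dstWidth, int dstHeight,
                               int cropLeft, int cropTop, cudaStream_t stream) {
    const dim3 block(32, 8);
    const dim3 grid(divCeil(dstWidth, static_cast<int>(block.x) * PIX_PER_THREAD),
                    divCeil(dstHeight, static_cast<int>(block.y)));
    kernelCropTailSafe<<<grid, block, 0, stream>>>(src, srcPitch, dst, dstPitch,
                                                   dstWidth, dstHeight, cropLeft, cropTop);
    return cudaGetLastError();
}
```

Because the extra work only happens in the single tail tile of each row, the common path keeps its vectorized access, and the grid size computed from divCeil stays the same.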

Observed instances of the same pattern

| guard | kernel | effect |
|---|---|---|
| NVEncFilterCrop.cu:1328 | kernel_crop_rgb_yuv444 | tail drop (stale surface) |
| NVEncFilterCrop.cu:1463 | kernel_crop_yuv444_rgb | symmetric tail drop |
| NVEncFilterCrop.cu:2232 | kernel_crop_rgb_rgb | guard is only x < dstWidth with a 4-pixel write → writes 1–3 pixels past dstWidth into pitch padding |
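Both failure shapes in the table reduce to arithmetic on width modulo the pixels per thread. The sketch below derives that arithmetic from the guards quoted above, assuming each thread's tile starts at a multiple of N as the divCeil launch implies. It is templated on N because the other kernels mentioned below use 2 or 4 pixels per thread; the function names are hypothetical.

```cuda
// Host/device arithmetic for the two guard bugs, templated on pixels per
// thread N. Function names are hypothetical; tiles start at x = k * N.

// Strict guard (x + N - 1 < width) with no scalar tail: number of trailing
// columns in each row that are never written.
template <int N>
__host__ __device__ constexpr int unwrittenTailColumns(int width) {
    return width % N;
}

// Loose guard (x < width) with an unconditional N-pixel write: number of
// pixels written past width (into pitch padding) in each row.
template <int N>
__host__ __device__ constexpr int overrunColumns(int width) {
    return (N - width % N) % N;
}

// Corrected per-tile write count: how many of the N pixels starting at x
// a thread may write so that neither bug occurs.
template <int N>
__host__ __device__ constexpr int tileColumns(int x, int width) {
    return x >= width ? 0 : (width - x < N ? width - x : N);
}

// The 1998-wide case from this issue, checked at compile time.
static_assert(unwrittenTailColumns<4>(1998) == 2, "columns 1996 and 1997 are dropped");
static_assert(overrunColumns<4>(1998) == 2, "columns 1998 and 1999 are overwritten");
```

Writing tileColumns<N>(x, width) pixels per thread, with a vector store only when it equals N, covers both the tail drop and the padding overrun.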

The other packed-write crop kernels in this file (rgb3/rgb4 → yv12/nv12, rgb → yv12/nv12, yv12 → rgb, and nv12 → rgb, all of which process 2 or 4 pixels per thread) appear to have the same class of bounds-check issue, although I have only directly reproduced the three above.

Verified locally on the attached 1998×1080 reproducer.

Thanks for your great work.

w1998h1080.mp4
