Source is any 1998×1080 (or other non-mod-4 width) 4:4:4 RGB video. Decode out.mp4 to rgb48 and inspect columns x=1996 and x=1997: in every row and every frame they hold one fixed colour (e.g. (0, 34796, 0)) that is independent of the source; it is stale surface memory.
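The inspection step can be automated with a small check like the one below. This is only a sketch: `columnIsConstant` is a hypothetical helper, and it assumes one decoded frame is already in memory as tightly packed rgb48 (three `uint16_t` per pixel, no row padding). A column that is constant in every row is a symptom, not proof, so it is worth comparing against the same column of the source as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if column x holds the same RGB triple in
// every row of one decoded frame. Assumes tightly packed rgb48 data
// (3 x uint16_t per pixel, no row padding).
bool columnIsConstant(const std::vector<uint16_t>& frame,
                      std::size_t width, std::size_t height, std::size_t x) {
    if (width == 0 || height == 0 || x >= width ||
        frame.size() < width * height * 3) {
        return false;  // malformed input: nothing meaningful to report
    }
    const uint16_t* first = &frame[x * 3];  // pixel (x, 0)
    for (std::size_t y = 1; y < height; ++y) {
        const uint16_t* px = &frame[(y * width + x) * 3];
        if (px[0] != first[0] || px[1] != first[1] || px[2] != first[2]) {
            return false;  // column varies with the row, as real content would
        }
    }
    return true;
}
```

For the reproducer one would call this with `width = 1998`, `height = 1080` and `x = 1996` or `x = 1997`, and compare with a column such as `x = 1000` that carries real picture content.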
Root cause
In NVEncCore/NVEncFilterCrop.cu, kernel_crop_rgb_yuv444 processes 4 pixels per thread, guarded at line 1328 by:
if (x + PIX_PER_THREAD - 1 < dstWidth && y < dstHeight) { /* vectorized path */ }
and launched with divCeil(width, block.x * 4). For width=1998 the last in-range thread sees x=1996; the guard 1996 + 3 < 1998 is false and there is no scalar fallback, so columns 1996–1997 are never written.
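The arithmetic can be checked on the host with a model of the x indexing alone. The sketch below is not the kernel: `unwrittenColumns` is a hypothetical function, `divCeil` is re-implemented here as a plain rounding-up division, and the default `blockX = 32` is an assumed stand-in for block.x (the result does not depend on it).

```cpp
#include <vector>

constexpr int PIX_PER_THREAD = 4;

// Rounding-up integer division, modelling the divCeil used for the launch.
constexpr int divCeil(int a, int b) { return (a + b - 1) / b; }

// Host-side model of the x indexing only: returns the columns that no thread
// writes when every thread handles PIX_PER_THREAD pixels behind the
// all-or-nothing guard. blockX stands in for block.x (an assumed value).
std::vector<int> unwrittenColumns(int dstWidth, int blockX = 32) {
    std::vector<bool> written(dstWidth, false);
    const int gridX = divCeil(dstWidth, blockX * PIX_PER_THREAD);
    for (int t = 0; t < gridX * blockX; ++t) {
        const int x = t * PIX_PER_THREAD;         // first pixel of this thread
        if (x + PIX_PER_THREAD - 1 < dstWidth) {  // the guard under discussion
            for (int i = 0; i < PIX_PER_THREAD; ++i) written[x + i] = true;
        }
    }
    std::vector<int> missing;
    for (int x = 0; x < dstWidth; ++x) {
        if (!written[x]) missing.push_back(x);
    }
    return missing;
}
```

`unwrittenColumns(1998)` yields columns 1996 and 1997, while a mod-4 width such as 2000 yields an empty list, which matches the observed behaviour.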
Same pattern, observed

| guard (file:line) | kernel | effect |
| --- | --- | --- |
| NVEncFilterCrop.cu:1328 | kernel_crop_rgb_yuv444 | tail drop (stale surface) |
| NVEncFilterCrop.cu:1463 | kernel_crop_yuv444_rgb | symmetric tail drop |
| NVEncFilterCrop.cu:2232 | kernel_crop_rgb_rgb | guard is only x < dstWidth with a 4-pixel write → writes 1–3 pixels past dstWidth into pitch padding |

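For the kernel_crop_rgb_rgb case, the size of the overrun follows directly from the width. The sketch below is a hypothetical model, assuming each thread starts at a multiple of the per-thread pixel count and always stores the full group once x < dstWidth holds.

```cpp
// Hypothetical model: number of pixels written past dstWidth by the last
// in-range thread when the guard is only `x < dstWidth` but the store covers
// pixPerThread pixels. Assumes thread start positions are multiples of
// pixPerThread, beginning at 0.
constexpr int pixelsPastWidth(int dstWidth, int pixPerThread = 4) {
    const int rem = dstWidth % pixPerThread;
    return rem == 0 ? 0 : pixPerThread - rem;
}

static_assert(pixelsPastWidth(1998) == 2, "x=1996 writes columns 1996..1999");
static_assert(pixelsPastWidth(1997) == 3, "worst case for 4 pixels per thread");
static_assert(pixelsPastWidth(2000) == 0, "mod-4 widths do not overrun");
```

With 4 pixels per thread the overrun is always between 1 and 3 pixels for non-mod-4 widths, so it lands in pitch padding only as long as the padding is at least that wide.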
The other packed-write crop kernels in this file (the rgb3/rgb4 → yv12/nv12, rgb → yv12/nv12, yv12 → rgb and nv12 → rgb groups, all of which process 2 or 4 pixels per thread) look like they have the same class of bounds-check issue, though I have only directly reproduced the three above.
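One possible shape for a fix that would apply to any pixels-per-thread count is to clamp the per-thread pixel count at the tail instead of skipping or overrunning it. The code below is a host-side sketch of that idea, not a patch: `processTile` and `processRow` are hypothetical names, the per-pixel work is reduced to a copy, and a real kernel would presumably keep its vectorized store for full tiles and use the scalar loop only for the partial one.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-thread body, modelled on the host: handle up to N pixels
// starting at x, clamping the count for a partial tile at the right edge.
template <int N>
void processTile(const uint16_t* src, uint16_t* dst, int x, int dstWidth) {
    if (x >= dstWidth) return;                    // thread entirely out of range
    const int count = std::min(N, dstWidth - x);  // N for full tiles, fewer at the tail
    for (int i = 0; i < count; ++i) dst[x + i] = src[x + i];
}

// Drives processTile across one row the way a grid of threads would, with a
// few extra "padding" elements after the row to expose any overrun.
template <int N>
std::vector<uint16_t> processRow(const std::vector<uint16_t>& src, int padding) {
    const int w = static_cast<int>(src.size());
    std::vector<uint16_t> dst(w + padding, 0xFFFF);  // 0xFFFF marks "never written"
    for (int x = 0; x < w + N; x += N) {             // includes one out-of-range tile
        processTile<N>(src.data(), dst.data(), x, w);
    }
    return dst;
}
```

Run with `N = 4` on a 1998-element row, this writes columns 1996 and 1997 and leaves the padding untouched, which covers both the tail-drop and the overrun variants.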
Verified locally on the attached 1998×1080 reproducer.
Reproduced on Version 9.16.
Thanks for your great work.
w1998h1080.mp4