Skip to content

Video Encode/Decode#226

Open
Daedie-git wants to merge 42 commits into
NVIDIA-RTX:mainfrom
Daedie-git:video-queues
Open

Video Encode/Decode#226
Daedie-git wants to merge 42 commits into
NVIDIA-RTX:mainfrom
Daedie-git:video-queues

Conversation

@Daedie-git
Copy link
Copy Markdown

The open encode/decode enticed me to try and get VK hardware decoding working in my software (until now, I only had it on the D3D12 side). I expect this will need further changes, just let me know.
I wasn't sure what the right way to handle the D3D12 CommandBuffer was, I settled on the variant for now.

Summary

Adds hardware video queue support and a minimal native-backed video encode/decode extension API.

This branch makes video-capable queues visible through NRI, allows D3D12/Vulkan video command buffers to be created and submitted, adds
basic multi-planar video format support, and exposes NRIVideo entry points for issuing native encode/decode commands.

Changes

Public API

  • Adds QueueType::VIDEO_DECODE
  • Adds QueueType::VIDEO_ENCODE
  • Adds multi-planar 4:2:0 formats:
    • G8_B8R8_2PLANE_420_UNORM
    • G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16
    • G16_B16R16_2PLANE_420_UNORM
  • Adds explicit image plane bits for multi-planar images:
    • PLANE_0
    • PLANE_1
    • PLANE_2
  • Adds Include/Extensions/NRIVideo.h
    • VideoInterface
    • CmdDecodeVideo
    • CmdEncodeVideo
    • native D3D12/Vulkan encode/decode descriptor structs

Adapter and Queue Discovery

  • Reports D3D12 video decode/encode queue availability when ID3D12VideoDevice is available
  • Adds Vulkan queue family selection support for video queues
  • Uses VkQueueFamilyVideoPropertiesKHR to ensure Vulkan video queues also expose compatible codec operations
  • Reuses queue family create infos when multiple NRI queue types map to the same Vulkan queue family
  • Tracks queue storage for the new video queue types in D3D12 and Vulkan devices

D3D12 Backend

  • Maps NRI video queues to:
    • D3D12_COMMAND_LIST_TYPE_VIDEO_DECODE
    • D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE
  • Refactors CommandBufferD3D12 to hold graphics, video decode, or video encode command lists
  • Supports begin/end/reset/close for D3D12 video command lists
  • Submits video command lists through the existing queue submission path
  • Adds CmdDecodeVideo via ID3D12VideoDecodeCommandList::DecodeFrame
  • Adds CmdEncodeVideo via ID3D12VideoEncodeCommandList2::EncodeFrame
  • Updates native command buffer documentation to return ID3D12CommandList*, since video command lists are not graphics command lists

Vulkan Backend

  • Adds video-related desired device extensions, including:
    • VK_KHR_video_queue
    • VK_KHR_video_decode_queue
    • H.264/H.265/AV1 decode extensions
    • VK_KHR_video_encode_queue
    • video maintenance extensions
    • sampler YCbCr conversion
  • Adds Vulkan format mappings for the new multi-planar formats
  • Adds Vulkan plane aspect mapping for explicit plane views
  • Allows texture views to select explicit multi-planar aspects
  • Resolves vkCmdDecodeVideoKHR and vkCmdEncodeVideoKHR
  • Adds CmdDecodeVideo / CmdEncodeVideo forwarding to Vulkan command buffers

Validation / Interface Wiring

  • Wires VideoInterface through nriGetInterface
  • Adds DeviceBase::FillFunctionTable(VideoInterface&)
  • Adds backend FillFunctionTable(VideoInterface&) implementations
  • Adds validation-layer storage and forwarding for the video interface
  • Reports unsupported when the backend/device does not expose video decode or encode queues

API Shape

The new video API is intentionally small and native-backed.

NRI handles queue discovery, command buffer ownership, function table exposure, and command dispatch. Codec/session setup remains
backend-native (initially):

  • D3D12 callers pass native decoder/encoder objects and D3D12 argument structs
  • Vulkan callers pass native VkVideoDecodeInfoKHR / VkVideoEncodeInfoKHR payloads

Validation

Built successfully with:

  • Vulkan/validation-focused configuration
  • D3D12-focused configuration
  • small encode -> decode unit test in my own repo

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Wow! Magnificent! Thanks a ton for bringing this work to life! I will try to review later today.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Question 1

Should VideoDecodeD3D12Desc, VideoEncodeD3D12Desc, VideoDecodeVKDesc and VideoEncodeVKDesc be moved into corresponding wrappers and the underlying objects created via NRI? (not a pro in video APIs yet)

Question 2

Should we add more formats? We should invent better names :) DXGI has short names, but they are unclear (seems to be a good start):

    G8_B8R8_2PLANE_420_UNORM,
    G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16,
    G16_B16R16_2PLANE_420_UNORM,

Question 3

No D3D11 support? Probably not needed, but, IM, nice to have :)

(...to be continued)

@Daedie-git
Copy link
Copy Markdown
Author

Q1
I'll get back to you on this.

Q2:
How about this

NV12_UNORM
P010_UNORM
P016_UNORM

Q3
D3D11 video semantics work very differently. So regardless if it is added or not, I would do that in a separate PR.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Q2

+1, clarifying comments may be added if needed. They are pretty standard.

Q3

No objections here (assuming the interface is implementable in D3D11)

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Do we need other formats, like DXGI_FORMAT_Y410 (10:10:10)?

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Please, add this link somewhere in the beginning of NRIVideo.h: https://learn.microsoft.com/en-us/windows/win32/medfound/recommended-8-bit-yuv-formats-for-video-rendering

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented Apr 28, 2026

I'm ready to merge it after you are done with polishing. Just let me know. Further improvements can be made in main. Can we create a minimalistic sample in https://github.com/NVIDIA-RTX/NRISamples for this?

I haven't deeply analyzed the implementation, but the API additions are nice and clear.

Also I think that these formats:

    // Depth-stencil (as a shader resource view)
    R24_UNORM_X8,       // .x - depth   // + . . . . . . . . . . . . . . .
    X24_G8_UINT,        // .y - stencil // + . . . . . . . . . . . . . . .
    R32_SFLOAT_X8_X24,  // .x - depth   // + . . . . . . . . . . . . . . .
    X32_G8_UINT_X24     // .y - stencil // + . . . . . . . . . . . . . . .

can be removed and readonlyPlanes can be used instead explicitly. readonlyPlanes are needed for RTV and actually for SRV (you already use it for video). It's a bit of a breaking change, but, IMO, a step towards better API. What do you think? Just asking for your opinion, I will implement it myself.

@Daedie-git
Copy link
Copy Markdown
Author

I'm ready to merge it after you are done with polishing. Just let me know. Further improvements can be made in main. Can we create a minimalistic sample in https://github.com/NVIDIA-RTX/NRISamples for this?

Definitely. I'll look into it as soon as I can.

I haven't deeply analyzed the implementation, but the API additions are nice and clear.

Also I think that these formats:

    // Depth-stencil (as a shader resource view)
    R24_UNORM_X8,       // .x - depth   // + . . . . . . . . . . . . . . .
    X24_G8_UINT,        // .y - stencil // + . . . . . . . . . . . . . . .
    R32_SFLOAT_X8_X24,  // .x - depth   // + . . . . . . . . . . . . . . .
    X32_G8_UINT_X24     // .y - stencil // + . . . . . . . . . . . . . . .

can be removed and readonlyPlanes can be used instead explicitly. readonlyPlanes are needed for RTV and actually for SRV (you already use it for video). It's a bit of a breaking change, but, IMO, a step towards better API. What do you think? Just asking for your opinion, I will implement it myself.

I like that.

@Daedie-git
Copy link
Copy Markdown
Author

Do we need other formats, like DXGI_FORMAT_Y410 (10:10:10)?

The current set targets the common 4:2:0 format family needed by the decode/encode path: NV12, P010, P016. Y410 is packed 4:4:4 and would add support for it in a follow-up if the need arises.

@vertver
Copy link
Copy Markdown
Contributor

vertver commented Apr 29, 2026

I don't like that you're using native API structures in NRI public API. Wouldn't be logical to implemented it as a translation layer, not in the NRIWrapper* way?

dzhdanNV added a commit that referenced this pull request Apr 30, 2026
- "TextureViewDesc::readonlyPlanes" renamed to "planes". Logic switched to "direct" from "inverse"
- removed special depth-stencil format for shader resource views ("R24_UNORM_X8", "X24_G8_UINT", "R32_SFLOAT_X8_X24" and "X32_G8_UINT_X24"). Use "planes" instead
- explained valid usage

OTHER:
- VK: untangled "planes" (aspect mask) logic to be friendly for subpass inputs
- prerequisite for video support (PR #226)
@dzhdanNV
Copy link
Copy Markdown
Collaborator

I like that.

@Daedie-git I have implemented it. Please, update your code to the latest since it's a prerequisite for the video support. Please, note that I have removed PR #222 since the code is completely different for these lines. Feel free to adjust, re-add or discuss. Thanks in advance! We are moving forward!

I don't like that you're using native API structures in NRI public API. Wouldn't be logical to implemented it as a translation layer, not in the NRIWrapper* way?

Just echoing already said, I absolutely agree with this. GAPI specific structs must be moved to corresponding NRIWrapperX extension and NRIVideo must get an abstraction. This is what we should focus on.

@Daedie-git
Copy link
Copy Markdown
Author

I like that.

@Daedie-git I have implemented it. Please, update your code to the latest since it's a prerequisite for the video support. Please, note that I have removed PR #222 since the code is completely different for these lines. Feel free to adjust, re-add or discuss. Thanks in advance! We are moving forward!

I don't like that you're using native API structures in NRI public API. Wouldn't be logical to implemented it as a translation layer, not in the NRIWrapper* way?

Just echoing already said, I absolutely agree with this. GAPI specific structs must be moved to corresponding NRIWrapperX extension and NRIVideo must get an abstraction. This is what we should focus on.

Indeed. I have addressed this locally already. I'll rebase it on your change and push it soon.

@Daedie-git
Copy link
Copy Markdown
Author

I wrapped up my changes.

@Daedie-git
Copy link
Copy Markdown
Author

I found some issues, so I will follow up with another commit.

@Daedie-git
Copy link
Copy Markdown
Author

Additional question: For the sake of testing this, I've been building up quite the test suite in my own repo. Do you want to have it in some form?
Apart from the samples you already asked for.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 6, 2026

Just compared prev with the latest - it's getting better!

FEEDBACK:

One more thing (just spotted):

    if (isVideoDecode || isVideoEncode) {
        if (m_Desc.videoCodec == VideoCodec::NONE)
            return Result::INVALID_ARGUMENT;
            
        ...
    }            

This check (and all similar) needs to be moved to Validation. NRI implementations never check for correctness every argument, except a rare/special case when an unhandled crash may happen. In this particular case, this is a task for Validation.

@Daedie-git
Copy link
Copy Markdown
Author

Daedie-git commented May 6, 2026

I think I have addressed all your feedback so far
Also made some additional changes, to improve consistency with the rest of the repo (like destroy functions taking ptr instead of ref)

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 7, 2026

Thanks. Apologies, I was busy with other stuff. I will review tomorrow.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 7, 2026

Quick stuff:

FEEDBACK:

  • add new entities to .natvis, these:
NriForwardStruct(VideoSession);
NriForwardStruct(VideoSessionParameters);
NriForwardStruct(VideoPicture);
  • move new entities to new files with corresponding names
  • ensure that formatting and style matches other files
    • there are weird comments separating NRI API stuff from other stuff (please, add too ^_^). Example from PipelineVK.h:
      Result Create(const GraphicsPipelineDesc& graphicsPipelineDesc);
      Result Create(const ComputePipelineDesc& computePipelineDesc);
      Result Create(const RayTracingPipelineDesc& rayTracingPipelineDesc);
      Result Create(const PipelineVKDesc& pipelineVKDesc);
    
      //================================================================================================================
      // DebugNameBase
      //================================================================================================================
    
      void SetDebugName(const char* name) NRI_DEBUG_NAME_OVERRIDE;
    
      //================================================================================================================
      // NRI
      //================================================================================================================
    
      Result WriteShaderGroupIdentifiers(uint32_t baseShaderGroupIndex, uint32_t shaderGroupNum, void* dst) const;
  • update 2021 NVIDIA Corporation to 2026 :)
  • move videoFeatures after shaderFeatures
  • looks like some limits are missing in videoFeatures. Probably it's better to add something like GetVideoCapabilities to nicely wrap vkGetPhysicalDeviceVideoCapabilitiesKHR and be sure that a video session is supported. videoFeatures is nice and clear in the current form. Don't forget about alignment requirements, which may be picky on some weird platforms

@Daedie-git
Copy link
Copy Markdown
Author

Hi, I committed some changes to bring the supported feature set of both backends in line. There were some things that were accepted by one back-end that weren't by the other.

Next I will address your latest feedback.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 8, 2026

Thanks for the changes.

FEEDBACK:

I asked to change year only in the newly added header. Not everywhere :) The year of publication must be there. I personally prefer to not use "range" to avoid the noise in diffs. Please, revert all related changes in the already existing files.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 8, 2026

FEEDBACK:

  • WrapperVK: add forward declarations for constructions like this and remove the comment:
const void* vkDecodeInfo;           // VkVideoDecodeInfoKHR*
  • WrapperD3D12: same as above
  • WrapperD3D12: replace // optional with NriOptional
  • NRIVideo.h has many reserved. They are not used everywhere. Looks like corresponding structs are never mem-copied. Would it be possible to remove?
  • reformat constructions like:
    NriOptional const NriPtr(VideoH265VideoParameterSetDesc) videoParameterSets; // if provided, must include "videoParameterSetNum" entries
    uint32_t videoParameterSetNum;

to:

    const NriPtr(VideoH265VideoParameterSetDesc) videoParameterSets;
    uint32_t videoParameterSetNum;
  • this block is OK for now since it's very tricky:
    NriOptional const uint16_t* miColumnStarts; // if provided, must include "columnNum + 1" entries
    NriOptional const uint16_t* miRowStarts; // if provided, must include "rowNum + 1" entries
    NriOptional const uint16_t* widthInSuperblocksMinus1; // if provided, must include "columnNum" entries
    NriOptional const uint16_t* heightInSuperblocksMinus1; // if provided, must include "rowNum" entries
  • ... const NriPtr(VideoH265ShortTermRefPicSetDesc) shortTermRefPicSets ... is an outlier. Move numShortTermRefPicSets right after the pointer
  • remove 1-entry enum VideoEncodeRateControlMode and all references of it, including error reporting :) IMO, it does nothing, since only CQP is supported. It may be clarified in VideoEncodeRateControlDesc

remove 1-entry enum VideoEncodeRateControlMode

(OPTIONAL) Gemini suggests this:

Rate Control (VBR/CBR): In VideoEncodeRateControlMode, you only have CQP (Constant Quantization Parameter).
In production, CBR (Constant Bitrate) and VBR (Variable) are much more common for streaming.
You likely need a VideoEncodeRateControlDesc that includes targetBitrate and maxBitrate.

(OPTIONAL) Gemini suggests this:

Missing "GetVideoEncodePictureStates"? You have a brilliant helper "GetVideoDecodePictureStates".
To make the encoder just as "comfortable" to use, you should add a "GetVideoEncodePictureStates".
The encoder's reconstructed picture is technically a "write-only" target for the encoder but needs to
be "read-only" for the DPB in subsequent frames. If the user wants to peek at it on the Graphics side,
the "afterDecode / graphicsBefore" logic you have for decoders is exactly what’s needed for encoders too.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

dzhdanNV commented May 8, 2026

AI review (AI vs AI? ^_^)

I think it makes sense to post, since it looks legit. I have inlined my comments:

Functional Improvements

  • Rate Control: Add CBR and VBR modes; include targetBitrate and maxBitrate in VideoEncodeRateControlDesc.
  • Symmetry: Implement GetVideoEncodePictureStates to mirror the existing decode helper for cross-queue transitions.
  • Latency: Ensure the interface explicitly supports sub-frame/slice-level submission for XR and cloud gaming.
  • HW Offload: Add optional flags for hardware-accelerated scaling and colorspace conversion within the video pass.

Technical Refinements

  • AV1 Safety: Change const uint8_t* savedOrderHints to a fixed-size uint8_t[8] to prevent null-pointer or size errors.
    [ME: how to detect defaults then? feel free to ignore]
  • Struct Bloat: Consolidate overlapping H.264 and H.265 picture descriptors to reduce redundancy between encode/decode paths.
    [ME: probably not the best idea, feel free to ignore]
  • Documentation: Define ownership for dstSlot and referenceSlot (application-managed virtual DPB vs. driver-managed hardware indices).
    [ME: bullshit?]

Synchronization Logic

[ME: just to keep in mind, all good here]

  • Timeline Fences: Standardize on the incrementing nri::Fence for all cross-queue handoffs (Video -> Graphics -> Video).
  • Sharing Modes: Ensure reconstructedPicture textures default to SharingMode::CONCURRENT on Vulkan to avoid unnecessary ownership transfers.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Thank you. I will review the latest today.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

Much better! But a bit more iterations are needed:

FEEDBACK:

  • please, move implementations of:
NriForwardStruct(VideoSession);
NriForwardStruct(VideoSessionParameters);
NriForwardStruct(VideoPicture);

into corresponding hpp files. Currently ImplX.cpp contains all the implementations and it doesn't look great (it affects all backends)

  • please, ask AI to be sure to not modify the original code which doesn't overlaps with video support. For example, this block is video unrelated but it was modified without a need:
    if (SUCCEEDED(hr)) {
        m_Desc.shaderStage.compute.dispatchMaxDim[0] = options22.Max1DDispatchSize;
        m_Desc.shaderStage.task.dispatchMaxDim[0] = options22.Max1DDispatchMeshSize;
    }

original (I like it better, it matches 21 other of such blocks):

    if (FAILED(hr))
        NRI_REPORT_WARNING(this, "ID3D12Device::CheckFeatureSupport(options22) failed, result = 0x%08X!", hr);
    m_Desc.shaderStage.compute.dispatchMaxDim[0] = options22.Max1DDispatchSize;
    m_Desc.shaderStage.task.dispatchMaxDim[0] = options22.Max1DDispatchMeshSize;

I suspect there maybe more such blocks, I think AI is capable to compare with the original and fix all the cases. I see some video-unrelated changes, which probably makes sense. Ask LLM to log such cases and explain the reasoning. It's an optional request

@dzhdanNV
Copy link
Copy Markdown
Collaborator

I'm really happy that you have been working on this. And I will accept the code. Please, don't give up!

FEEDBACK:

This I found myself, but asked AI to type for me to save a bit of my time ;)

This is messy (with all dependencies):

    using CommandList = std::variant<ComPtr<ID3D12GraphicsCommandListBest>, ComPtr<ID3D12VideoDecodeCommandList>, ComPtr<ID3D12VideoEncodeCommandList>>;

    ID3D12CommandList* GetCommandList() const;

SUGGESTION:

Rewrite the code with all dependencies as:

// Change member from std::variant to base ComPtr
private:
    ComPtr<ID3D12CommandList> m_CommandList;

// Replace existing getters with type-specific static_casts
private:
    inline ID3D12GraphicsCommandListBest* GetGraphicsCommandList() const {
        NRI_CHECK(
            m_CommandListType == D3D12_COMMAND_LIST_TYPE_DIRECT || 
            m_CommandListType == D3D12_COMMAND_LIST_TYPE_COMPUTE || 
            m_CommandListType == D3D12_COMMAND_LIST_TYPE_COPY
        );
        return static_cast<ID3D12GraphicsCommandListBest*>(m_CommandList.Get());
    }

    inline ID3D12VideoDecodeCommandList* GetVideoDecodeCommandList() const {
        NRI_CHECK(m_CommandListType == D3D12_COMMAND_LIST_TYPE_VIDEO_DECODE);
        return static_cast<ID3D12VideoDecodeCommandList*>(m_CommandList.Get());
    }

    inline ID3D12VideoEncodeCommandList* GetVideoEncodeCommandList() const {
        NRI_CHECK(m_CommandListType == D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE);
        return static_cast<ID3D12VideoEncodeCommandList*>(m_CommandList.Get());
    }

and fix usages like:

void CommandBufferD3D12::ResourceBarrier(uint32_t barrierNum, const D3D12_RESOURCE_BARRIER* barriers) const {
   if (m_CommandListType == D3D12_COMMAND_LIST_TYPE_VIDEO_DECODE)
       GetVideoDecodeCommandList()->ResourceBarrier(barrierNum, barriers);
   else if (m_CommandListType == D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE)
       GetVideoEncodeCommandList()->ResourceBarrier(barrierNum, barriers);
   else
       GetGraphicsCommandList()->ResourceBarrier(barrierNum, barriers);
}

and remove CommandBufferD3D12::GetCommandList() which is just m_CommandList now.

@dzhdanNV
Copy link
Copy Markdown
Collaborator

FEEDBACK:

  • add NriOptional to h265ReferenceDescs in VideoEncodeDesc
  • (AI suggested) add support for encode Directives: add a bitmask to VideoEncodeDesc for common per-frame overrides like FORCE_IDR_FRAME or END_OF_STREAM. Something like:
NriBits(VideoEncodeBits, uint8_t,
    NONE            = 0,
    // Forces the current frame to be an IDR (H.26x) or Key Frame (AV1)
    // Useful for scene cuts or starting a new stream segment
    FORCE_KEY_FRAME = NriBit(0), 
    
    // Signals that this is the last frame of the sequence
    // The hardware will flush its internal pipeline and write any pending data
    END_OF_STREAM   = NriBit(1)
);

add Nri(VideoEncodeBits) flags; to VideoEncodeDesc

@Daedie-git
Copy link
Copy Markdown
Author

I will follow up with more changes soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants