From 1fa5dafb36ceb51ef3082f968b7656135e802116 Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Tue, 17 Mar 2026 13:24:52 -0400
Subject: [PATCH 1/2] Update coco docs for GA release
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
confidential-containers/attestation.rst | 73 +++
.../confidential-containers-deploy.rst | 588 ++++++++++++++++++
confidential-containers/index.rst | 43 +-
confidential-containers/licensing.rst | 27 +
confidential-containers/overview.rst | 265 ++++----
.../supported-platforms.rst | 131 ++++
confidential-containers/versions1.json | 2 +-
.../confidential-containers-deploy.rst | 31 +
gpu-operator/getting-started.rst | 2 +-
gpu-operator/index.rst | 1 +
repo.toml | 3 +-
11 files changed, 1007 insertions(+), 159 deletions(-)
create mode 100644 confidential-containers/attestation.rst
create mode 100644 confidential-containers/confidential-containers-deploy.rst
create mode 100644 confidential-containers/licensing.rst
create mode 100644 confidential-containers/supported-platforms.rst
create mode 100644 gpu-operator/confidential-containers-deploy.rst
diff --git a/confidential-containers/attestation.rst b/confidential-containers/attestation.rst
new file mode 100644
index 000000000..eae0615f6
--- /dev/null
+++ b/confidential-containers/attestation.rst
@@ -0,0 +1,73 @@
+.. license-header
+ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ SPDX-License-Identifier: Apache-2.0
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+
+.. _attestation:
+
+***********
+Attestation
+***********
+
+The NVIDIA Reference Architecture for Confidential Containers includes built-in remote attestation support for the CPU and GPU. Attestation allows a workload owner to cryptographically verify the guest Trusted Computing Base (TCB) before secrets are released to the workload.
+
+When a workload requires a secret, for example, to decrypt a container image or model, guest components identify the active CPU and GPU enclaves, collect hardware evidence, and send it to a remote verifier/broker such as Trustee. The verifier evaluates the evidence and conditionally releases the secret.
+
+Any feature that relies on the release of a secret therefore depends on attestation.
+These features include:
+
+* Pulling encrypted images
+* Authenticated container registry support
+* Sealed secrets
+* Direct workload requests for secrets
+
+To use these features, a remote verifier/broker, like Trustee, must be provisioned in a trusted environment.
+Then you can direct your workloads to use the verifier/broker to authenticate and release secrets based on your configured policies.
+
+
+Configure Remote Verifier/Broker (Trustee)
+==========================================
+
+For an overview of attestation with Trustee, refer to the `Trustee documentation `_.
+
+Follow the `upstream Trustee documentation `_ to provision a Trustee instance in a trusted environment.
+By default, this configures Trustee to use the remote NVIDIA verifier, the NVIDIA Remote Attestation Service (NRAS), to evaluate the evidence.
+
+.. note::
+
+ If attestation does not succeed after provisioning Trustee, enable debug logging by setting the ``RUST_LOG=debug`` environment variable in the Trustee environment.
+ The Trustee log can then be used to diagnose the attestation process.
+
+Next Steps
+==========
+
+
+* Configure policies to use attestation features.
+
+ `Kata Agent `_ (deployed with ``kata-deploy``) runs inside the guest virtual machine to manage the container lifecycle.
+ It enforces a strict, immutable security policy based on Rego (regorus) that prevents the untrusted host from executing unauthorized commands, such as a malicious kubectl exec.
+ Attestation-dependent features require that these policies permit the relevant operations.
+
+ Refer to the `Kata Containers Agent Policy documentation `_ for more on using policies. You can use the `genpolicy tool `_ (installed with ``kata-deploy``) to autogenerate policies, or write your own manually.
+
+ Refer to the Confidential Containers' `Init-Data `_ documentation for more information on using the genpolicy tool to autogenerate policies.
+
+* Configure workloads to use attestation features.
+ You can configure workloads to use attestation and specify configuration for encrypted images and authenticated container registries.
+
+ Refer to the `Confidential Containers Features `_ documentation for more information on using attestation features.
+
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
new file mode 100644
index 000000000..01fbb014e
--- /dev/null
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -0,0 +1,588 @@
+
+.. license-header
+ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ SPDX-License-Identifier: Apache-2.0
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+
+.. _confidential-containers-deploy:
+
+******************************
+Deploy Confidential Containers
+******************************
+
+This page describes deploying Kata Containers and the NVIDIA GPU Operator.
+These are the key pieces of the NVIDIA Confidential Containers Reference Architecture used to manage GPU resources on your cluster and deploy workloads into confidential containers.
+
+Before you begin, refer to the :doc:`Confidential Containers Reference Architecture ` for details on the reference architecture and the :doc:`Supported Platforms ` page for the supported platforms.
+
+Overview
+========
+
+The high-level workflow for configuring Confidential Containers is as follows:
+
+#. Configure the :ref:`Prerequisites `.
+
+#. :ref:`Label Nodes ` that you want to use with Confidential Containers.
+
+#. Install the :ref:`latest Kata Containers Helm chart `.
+ This installs the Kata Containers runtime binaries, UVM images and kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes.
+
+#. Install the :ref:`NVIDIA GPU Operator configured for Confidential Containers `.
+ This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads.
+ The GPU Operator uses the node labels to determine what software components to deploy to a node.
+
+After installation, you can change the :ref:`confidential computing mode ` and :ref:`run a sample GPU workload ` in a confidential container.
+You can also configure Attestation with Trustee and the NVIDIA Remote Attestation Service (NRAS).
+Refer to the :doc:`Attestation ` page for more information on configuring attestation.
+
+This guide will configure your cluster to deploy Confidential Containers workloads.
+Once configured, you can schedule workloads that request GPU resources and use the ``kata-qemu-nvidia-gpu-tdx`` or ``kata-qemu-nvidia-gpu-snp`` runtime classes for secure deployment.
+
+.. _coco-prerequisites:
+
+Prerequisites
+=============
+
+* Use a supported platform for Confidential Containers.
+ For more information on machine setup, refer to :doc:`Supported Platforms `.
+
+* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS).
+ With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER).
+ Enabling these features is typically performed by configuring the host BIOS.
+* Configure hosts to support IOMMU.
+
+ * If the output from running ``ls /sys/kernel/iommu_groups`` includes 0, 1, and so on, then your host is configured for IOMMU.
+   * If the host is not configured or if you are unsure, add the ``amd_iommu=on`` (for AMD CPUs) or ``intel_iommu=on`` (for Intel CPUs) Linux kernel command-line argument. For most Linux distributions, add the argument to the ``/etc/default/grub`` file, for instance::
+
+ ...
+ GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on modprobe.blacklist=nouveau"
+ ...
+
+ Run ``sudo update-grub`` after making the change to configure the bootloader. Reboot the host after configuring the bootloader.
+
+* A Kubernetes cluster with cluster administrator privileges.
+
+* It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a value higher than the two-minute default.
+  Setting this value to 20 minutes matches the default image pull timeout values in Kata Containers.
+
+ Refer to the :ref:`Configure Image Pull Timeouts ` section on this page for more details on adjusting the image pull timeout values.
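+
+  The recommendation above can be expressed as a minimal ``KubeletConfiguration`` fragment. The ``runtimeRequestTimeout`` field is part of the upstream kubelet configuration API; ``20m`` is the suggested value, not a requirement:
+
+  .. code-block:: yaml
+
+     apiVersion: kubelet.config.k8s.io/v1beta1
+     kind: KubeletConfiguration
+     runtimeRequestTimeout: 20m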
+
+
+* Enable the ``KubeletPodResourcesGet`` Kubelet feature gate on your cluster.
+  The NVIDIA GPU runtime classes use VFIO cold-plug and require this feature gate to be enabled so that the Kata runtime can query the kubelet's Pod Resources API to discover allocated GPU devices during sandbox creation.
+ Refer to the `Kata runtime (VFIO cold-plug) `_ section in the upstream NVIDIA GPU passthrough guide for more information.
+
+ This feature gate is enabled by default on Kubernetes v1.34 and later, but must be explicitly enabled in older versions.
+
+ Enable the ``KubeletPodResourcesGet`` feature gate by adding it to the ``/var/lib/kubelet/config.yaml`` file.
+
+ .. code-block:: yaml
+
+ featureGates:
+ KubeletPodResourcesGet: true
+
+
+ Restart the Kubelet service to apply the changes.
+
+ .. code-block:: console
+
+ $ sudo systemctl restart kubelet
+
+
+.. _installation-and-configuration:
+
+Installation and Configuration
+===============================
+
+.. _coco-label-nodes:
+
+Label Nodes
+-----------
+
+#. Label the nodes that you want to use with Confidential Containers:
+
+ .. code-block:: console
+
+      $ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
+
+The GPU Operator uses this label to determine what software components to deploy to a node.
+The ``nvidia.com/gpu.workload.config=vm-passthrough`` label specifies that the node should receive the software components to run Confidential Containers.
+You can use this label on the nodes intended for Confidential Containers workloads, and run traditional GPU container workloads on the other nodes in your cluster.
+
+.. tip::
+
+ Skip this section if you plan to use all nodes in your cluster to run Confidential Containers and instead set ``sandboxWorkloads.defaultWorkload=vm-passthrough`` when installing the GPU Operator.
+
+To check whether the node label has been added, run the following command:
+
+.. code-block:: console
+
+ $ kubectl describe node | grep nvidia.com/gpu.workload.config
+
+Example output:
+
+.. code-block:: output
+
+ nvidia.com/gpu.workload.config: vm-passthrough
+
+.. _coco-install-kata-chart:
+
+Install the Kata Containers Helm Chart
+--------------------------------------
+
+Install Kata Containers using the ``kata-deploy`` Helm chart.
+The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.
+
+The minimum required version is 3.29.0.
+
+#. Get the latest version of the ``kata-deploy`` Helm chart:
+
+ .. code-block:: console
+
+ $ export VERSION="3.29.0"
+ $ export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
+
+
+#. Install the kata-deploy Helm chart:
+
+ .. code-block:: console
+
+ $ helm install kata-deploy "${CHART}" \
+ --namespace kata-system --create-namespace \
+ --set nfd.enabled=false \
+ --wait --timeout 10m \
+ --version "${VERSION}"
+
+ *Example Output*
+
+ .. code-block:: output
+
+ LAST DEPLOYED: Wed Apr 1 17:03:00 2026
+ NAMESPACE: kata-system
+ STATUS: deployed
+ REVISION: 1
+ DESCRIPTION: Install complete
+ TEST SUITE: None
+
+ .. note::
+
+      Node Feature Discovery (NFD) is deployed by both kata-deploy and the GPU Operator. Pass ``--set nfd.enabled=false`` in the kata-deploy command above so that NFD is instead deployed and managed by the GPU Operator in the next step.
+
+
+#. Optional: Verify that the kata-deploy pod is running:
+
+ .. code-block:: console
+
+ $ kubectl get pods -n kata-system | grep kata-deploy
+
+ *Example Output*
+
+ .. code-block:: output
+
+ NAME READY STATUS RESTARTS AGE
+ kata-deploy-b2lzs 1/1 Running 0 6m37s
+
+#. Optional: View the pod in the kata-system namespace and ensure it is running:
+
+ .. code-block:: console
+
+ $ kubectl get pod,svc -n kata-system
+
+ *Example Output*:
+
+ .. code-block:: output
+
+ NAME READY STATUS RESTARTS AGE
+ pod/kata-deploy-4f658 1/1 Running 0 21s
+
+ Wait a few minutes for kata-deploy to create the base runtime classes.
+
+#. Verify that the ``kata-qemu-nvidia-gpu``, ``kata-qemu-nvidia-gpu-snp``, and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are available:
+
+ .. code-block:: console
+
+ $ kubectl get runtimeclass | grep kata-qemu-nvidia-gpu
+
+ *Example Output*
+
+ .. code-block:: output
+
+ NAME HANDLER AGE
+ kata-qemu-nvidia-gpu kata-qemu-nvidia-gpu 40s
+ kata-qemu-nvidia-gpu-snp kata-qemu-nvidia-gpu-snp 40s
+ kata-qemu-nvidia-gpu-tdx kata-qemu-nvidia-gpu-tdx 40s
+
+ Several runtimes are installed by the ``kata-deploy`` chart.
+   The ``kata-qemu-nvidia-gpu`` runtime class is used with Kata Containers in non-confidential scenarios.
+ The ``kata-qemu-nvidia-gpu-snp`` and ``kata-qemu-nvidia-gpu-tdx`` runtime classes are used to deploy Confidential Containers workloads.
+
+.. _coco-install-gpu-operator:
+
+Install the NVIDIA GPU Operator
+--------------------------------
+
+Install the NVIDIA GPU Operator and configure it to deploy Confidential Container components.
+
+#. Add and update the NVIDIA Helm repository:
+
+ .. code-block:: console
+
+ $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
+ && helm repo update
+
+#. Install the GPU Operator with the following configuration:
+
+ .. code-block:: console
+
+ $ helm install --wait --generate-name \
+ -n gpu-operator --create-namespace \
+ nvidia/gpu-operator \
+ --set sandboxWorkloads.enabled=true \
+ --set sandboxWorkloads.mode=kata \
+ --set nfd.enabled=true \
+ --set nfd.nodefeaturerules=true \
+ --version=v26.3.0
+
+ .. tip::
+
+ Add ``--set sandboxWorkloads.defaultWorkload=vm-passthrough`` if every worker node should deploy Confidential Containers by default.
+
+ Refer to the :ref:`Confidential Containers Configuration Settings ` section on this page for more details on the Confidential Containers configuration options you can specify when installing the GPU Operator.
+
+ Refer to the :ref:`Common chart customization options ` in :doc:`Installing the NVIDIA GPU Operator ` for more details on the additional general configuration options you can specify when installing the GPU Operator.
+
+ *Example Output*
+
+ .. code-block:: output
+
+ NAME: gpu-operator
+ LAST DEPLOYED: Tue Mar 10 17:58:12 2026
+ NAMESPACE: gpu-operator
+ STATUS: deployed
+ REVISION: 1
+ TEST SUITE: None
+
+#. Verify that all GPU Operator pods, especially the Confidential Computing Manager, Sandbox Device Plugin, and VFIO Manager operands, are running:
+
+ .. code-block:: console
+
+ $ kubectl get pods -n gpu-operator
+
+ *Example Output*:
+
+ .. code-block:: output
+
+ NAME READY STATUS RESTARTS AGE
+ gpu-operator-1766001809-node-feature-discovery-gc-75776475sxzkp 1/1 Running 0 86s
+ gpu-operator-1766001809-node-feature-discovery-master-6869lxq2g 1/1 Running 0 86s
+ gpu-operator-1766001809-node-feature-discovery-worker-mh4cv 1/1 Running 0 86s
+ gpu-operator-f48fd66b-vtfrl 1/1 Running 0 86s
+ nvidia-cc-manager-7z74t 1/1 Running 0 61s
+ nvidia-kata-sandbox-device-plugin-daemonset-d5rvg 1/1 Running 0 30s
+ nvidia-sandbox-validator-6xnzc 1/1 Running 1 30s
+ nvidia-vfio-manager-h229x 1/1 Running 0 62s
+
+
+#. Optional: If you have host access to the worker node, you can perform the following validation step:
+
+ a. Confirm that the host uses the vfio-pci device driver for GPUs::
+
+ $ lspci -nnk -d 10de:
+
+ *Example Output*:
+
+ .. code-block:: output
+
+ 65:00.0 3D controller [0302]: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx] (rev xx)
+ Subsystem: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx]
+ Kernel driver in use: vfio-pci
+ Kernel modules: nvidiafb, nouveau
+
+
+.. _coco-configuration-settings:
+
+Optional: Confidential Containers Configuration Settings
+--------------------------------------------------------
+
+The following are the available GPU Operator configuration settings to enable Confidential Containers:
+
+.. list-table::
+ :widths: 20 50 30
+ :header-rows: 1
+
+ * - Parameter
+ - Description
+ - Default
+
+ * - ``sandboxWorkloads.enabled``
+ - Enables sandbox workload management in the GPU Operator for virtual
+ machine-style workloads and related operands.
+ - ``false``
+
+ * - ``sandboxWorkloads.defaultWorkload``
+ - Specifies the default type of workload for the cluster, one of ``container``, ``vm-passthrough``, or ``vm-vgpu``.
+
+ Setting ``vm-passthrough`` or ``vm-vgpu`` can be helpful if you plan to run all or mostly virtual machines in your cluster.
+ - ``container``
+
+ * - ``sandboxWorkloads.mode``
+ - Specifies the sandbox mode to use when deploying sandbox workloads.
+ Accepted values are ``kubevirt`` (default) and ``kata``.
+ - ``kubevirt``
+
+ * - ``sandboxDevicePlugin.env``
+ - Optional list of environment variables passed to the NVIDIA Sandbox
+ Device Plugin pod. Each list item is an ``EnvVar`` object with required
+ ``name`` and optional ``value`` fields.
+ - ``[]`` (empty list)
+
+.. _coco-configuration-heterogeneous-clusters:
+
+Optional: Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types
+----------------------------------------------------------------------------------------------
+
+By default, the NVIDIA GPU Operator creates a single resource type for GPUs, ``nvidia.com/pgpu``.
+In homogeneous clusters, where all GPUs are the same model, a single resource type is sufficient.
+
+In heterogeneous clusters, where nodes have different GPU types, you may want to request a specific GPU type for your workload.
+To do this, set an empty ``P_GPU_ALIAS`` environment variable on the sandbox device plugin by adding
+the following options to your GPU Operator installation:
+``--set sandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and
+``--set sandboxDevicePlugin.env[0].value=""``.
+
+When this variable is set to ``""``, the sandbox device plugin advertises GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type.
+Reference the exposed resource types in your pod specs by setting the corresponding resource limits.
+
+Similarly, NVSwitches are exposed as resources of type ``nvidia.com/nvswitch`` by default.
+To configure advertising behavior similar to ``P_GPU_ALIAS``, include ``--set sandboxDevicePlugin.env[0].name=NVSWITCH_ALIAS`` and
+``--set sandboxDevicePlugin.env[0].value=""`` when installing the GPU Operator (use the next free index, for example ``env[1]``, if you also set ``P_GPU_ALIAS``).
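+
+When model-specific resource types are enabled, reference them in your pod resource limits. The following sketch assumes a node that advertises the example ``nvidia.com/GH100_H100L_94GB`` resource type; the exact name depends on the GPU models in your cluster:
+
+.. code-block:: yaml
+
+   resources:
+     limits:
+       nvidia.com/GH100_H100L_94GB: "1"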
+
+.. _coco-run-sample-workload:
+
+Run a Sample Workload
+=====================
+
+A pod manifest for a confidential container GPU workload must specify the ``kata-qemu-nvidia-gpu-snp`` (SEV-SNP) or ``kata-qemu-nvidia-gpu-tdx`` (TDX) runtime class.
+
+1. Create a file, such as the following ``cuda-vectoradd-kata.yaml`` sample, specifying the ``kata-qemu-nvidia-gpu-snp`` runtime class:
+
+ .. code-block:: yaml
+ :emphasize-lines: 7,14
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: cuda-vectoradd-kata
+ namespace: default
+ spec:
+ runtimeClassName: kata-qemu-nvidia-gpu-snp
+ restartPolicy: Never
+ containers:
+ - name: cuda-vectoradd
+ image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04"
+ resources:
+ limits:
+ nvidia.com/pgpu: "1"
+ memory: 16Gi
+
+ The following are Confidential Containers configurations in the sample manifest:
+
+   * Set the runtime class name to ``kata-qemu-nvidia-gpu-snp`` for SEV-SNP or ``kata-qemu-nvidia-gpu-tdx`` for TDX, depending on the TEE type of the nodes where the workload should run.
+
+   * In the sample above, ``nvidia.com/pgpu`` is the default resource type for GPUs.
+ If you are deploying on a heterogeneous cluster, you may want to update the default behavior by specifying the ``P_GPU_ALIAS`` environment variable for the sandbox device plugin.
+ Refer to the :ref:`Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific Resource Types ` section on this page for more details.
+
+   * If your machines support multi-GPU passthrough, use a pod manifest that requests 8 GPU and 4 NVSwitch resources:
+
+ .. code-block:: yaml
+
+ limits:
+ nvidia.com/pgpu: "8"
+ nvidia.com/nvswitch: "4"
+
+ .. note::
+ If you are using NVIDIA Hopper GPUs for multi-GPU passthrough, also refer to :ref:`Managing the Confidential Computing Mode ` for details on how to set the ``ppcie`` mode.
+
+
+2. Create the pod:
+
+ .. code-block:: console
+
+ $ kubectl apply -f cuda-vectoradd-kata.yaml
+
+3. View the logs from the pod after the container starts:
+
+ .. code-block:: console
+
+ $ kubectl logs -n default cuda-vectoradd-kata
+
+ *Example Output*
+
+ .. code-block:: output
+
+ [Vector addition of 50000 elements]
+ Copy input data from the host memory to the CUDA device
+ CUDA kernel launch with 196 blocks of 256 threads
+ Copy output data from the CUDA device to the host memory
+ Test PASSED
+ Done
+
+4. Delete the pod:
+
+ .. code-block:: console
+
+ $ kubectl delete -f cuda-vectoradd-kata.yaml
+
+
+.. _managing-confidential-computing-mode:
+
+Managing the Confidential Computing Mode
+=========================================
+
+You can set the default confidential computing mode of the NVIDIA GPUs with the ``ccManager.defaultMode`` option.
+The default value of ``ccManager.defaultMode`` is ``on``.
+You can set this option when you install the NVIDIA GPU Operator, or afterward by modifying the ``cluster-policy`` instance of the ``ClusterPolicy`` object.
+
+When you change the mode, the manager performs the following actions:
+
+* Evicts the other GPU Operator operands from the node.
+
+ However, the manager does not drain user workloads. You must make sure that no user workloads are running on the node before you change the mode.
+
+* Unbinds the GPU from the VFIO PCI device driver.
+* Changes the mode and resets the GPU.
+* Reschedules the other GPU Operator operands.
+
+The supported modes are:
+
+.. list-table::
+ :widths: 15 85
+ :header-rows: 1
+
+ * - Mode
+ - Description
+ * - ``on``
+ - Enable Confidential Containers.
+ * - ``off``
+ - Disable Confidential Containers.
+ * - ``ppcie``
+     - Enable Confidential Containers with multi-GPU passthrough on HGX GPUs.
+
+       On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE),
+ which claims exclusive use of the NVSwitches for a single Confidential Container
+ virtual machine.
+ If you are using NVIDIA Hopper GPUs for multi-GPU passthrough,
+ set the GPU mode to ``ppcie`` mode.
+
+ The NVIDIA Blackwell architecture uses NVLink
+ encryption which places the switches outside of the Trusted Computing Base (TCB),
+ meaning the ``ppcie`` mode is not required. Use ``on`` mode in this case.
+ * - ``devtools``
+ - Development mode for software development and debugging.
+
+You can set a cluster-wide default mode and you can set the mode on individual nodes.
+The mode that you set on a node has higher precedence than the cluster-wide default mode.
+
+Setting a Cluster-Wide Default Mode
+------------------------------------
+
+To set a cluster-wide mode, specify the ``ccManager.defaultMode`` field as in the following example::
+
+ $ kubectl patch clusterpolicies.nvidia.com/cluster-policy \
+ --type=merge \
+ -p '{"spec": {"ccManager": {"defaultMode": "on"}}}'
+
+Setting a Node-Level Mode
+--------------------------
+
+To set a node-level mode, apply the ``nvidia.com/cc.mode`` label as in the following example::
+
+   $ kubectl label node <node-name> nvidia.com/cc.mode=on --overwrite
+
+The mode that you set on a node has higher precedence than the cluster-wide default mode.
+
+Verifying a Mode Change
+------------------------
+
+To verify that a mode change, whether cluster-wide or node-level, was successful, view the ``nvidia.com/cc.mode`` and ``nvidia.com/cc.mode.state`` node labels::
+
+ $ kubectl get node -o json | \
+ jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/cc.mode")))'
+
+Example output when CC mode is disabled:
+
+.. code-block:: json
+
+ {
+ "nvidia.com/cc.mode": "off",
+ "nvidia.com/cc.mode.state": "on"
+ }
+
+Example output when CC mode is enabled:
+
+.. code-block:: json
+
+ {
+ "nvidia.com/cc.mode": "on",
+ "nvidia.com/cc.mode.state": "on"
+ }
+
+The ``nvidia.com/cc.mode.state`` label is either ``off`` or ``on``: ``off`` means the mode transition is still in progress, and ``on`` means the transition has completed.
+
+
+Configuring Multi-GPU Passthrough Support
+===========================================
+
+To configure multi-GPU passthrough, you can specify the following resource limits in your manifests:
+
+.. code-block:: yaml
+
+ limits:
+ nvidia.com/pgpu: "8"
+ nvidia.com/nvswitch: "4"
+
+On the NVIDIA Hopper architecture, multi-GPU passthrough uses protected PCIe (PPCIE), which claims exclusive use of the NVSwitches for a single confidential virtual machine (CVM).
+When using NVIDIA Hopper nodes for multi-GPU passthrough, transition the GPU Confidential Computing mode of the relevant nodes to ``ppcie`` by adding the ``nvidia.com/cc.mode=ppcie`` label; see :ref:`Managing the Confidential Computing Mode ` for details.
+The NVIDIA Blackwell architecture uses NVLink encryption which places the switches outside of the Trusted Computing Base (TCB) and only requires the GPU Confidential Computing mode to be set to ``on``.
+
+.. _configure-image-pull-timeouts:
+
+Configure Image Pull Timeouts
+=============================
+
+Because the guest-pull mechanism pulls container images securely inside the guest, pulling large images can take a significant amount of time and delay container start.
+This can lead to the kubelet deallocating your pod before it transitions from the ``ContainerCreating`` state to the ``Running`` state.
+
+It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration `_ with a value higher than the two-minute default.
+Setting this value to 20 minutes (``20m``) matches the Kata Containers defaults for the NVIDIA shim's ``create_container_timeout`` and the agent's ``image_pull_timeout``.
+
+The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
+This controls how long the shim allows a container to remain in the creating state.
+
+If you need a timeout of more than 1200 seconds, also adjust the agent's ``image_pull_timeout`` value, which controls the agent-side timeout for the guest image pull.
+To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value through the ``io.katacontainers.config.hypervisor.kernel_params`` pod annotation.
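+
+As an illustration, the following pod annotation sketch raises the agent's guest image pull timeout to 1800 seconds. The annotation key is described above; the value shown is only an example:
+
+.. code-block:: yaml
+
+   metadata:
+     annotations:
+       io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"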
+
+
+Next Steps
+==========
+
+* Refer to the :doc:`Attestation ` page for more information on configuring attestation.
+* To help manage the lifecycle of Kata Containers, it is recommended that you also install the `Kata Lifecycle Manager `_.
+  This is an Argo Workflows-based tool that manages Kata Containers upgrades and other lifecycle operations.
+* Additional NVIDIA Confidential Computing documentation is available at https://docs.nvidia.com/confidential-computing.
+* Licensing information is available on the :doc:`Licensing ` page.
\ No newline at end of file
diff --git a/confidential-containers/index.rst b/confidential-containers/index.rst
index cb806e094..d420f1b69 100644
--- a/confidential-containers/index.rst
+++ b/confidential-containers/index.rst
@@ -17,7 +17,7 @@
.. headings # #, * *, =, -, ^, "
**********************************************************
-NVIDIA Confidential Containers Architecture (Early Access)
+NVIDIA Confidential Containers Architecture
**********************************************************
.. toctree::
@@ -26,10 +26,13 @@ NVIDIA Confidential Containers Architecture (Early Access)
:titlesonly:
Overview
- Deploy Confidential Containers with NVIDIA GPU Operator
+ Supported Platforms
+ Deploy Confidential Containers
+ Attestation
+ Licensing
-This is documentation for NVIDIA's Early Access implementation of Confidential Containers including reference architecture information and supported platforms.
+This is documentation for NVIDIA's implementation of Confidential Containers including reference architecture information and supported platforms.
.. grid:: 3
@@ -39,29 +42,29 @@ This is documentation for NVIDIA's Early Access implementation of Confidential C
:link: overview
:link-type: doc
- Introduction and approach to Confidential Containers.
+ Start here to review the reference architecture, use cases, and software components.
- .. grid-item-card:: :octicon:`project;1.5em;sd-mr-1` Architecture
- :link: coco-architecture
- :link-type: ref
+ .. grid-item-card:: :octicon:`server;1.5em;sd-mr-1` Supported Platforms
+ :link: supported-platforms
+ :link-type: doc
- High-level flow and diagram for Confidential Containers architecture.
+ Learn about the validated hardware, OS, and component versions.
- .. grid-item-card:: :octicon:`briefcase;1.5em;sd-mr-1` Use Cases
- :link: coco-use-cases
- :link-type: ref
+ .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy Confidential Containers
+ :link: confidential-containers-deploy
+ :link-type: doc
- Regulated industries and workloads that benefit from confidential computing.
+ Use this page to deploy with the NVIDIA GPU Operator on Kubernetes.
- .. grid-item-card:: :octicon:`package;1.5em;sd-mr-1` Components
- :link: coco-supported-platforms-components
- :link-type: ref
+ .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation
+ :link: attestation
+ :link-type: doc
- Key software components for confidential containers.
+ Learn about remote attestation, Trustee, and the NVIDIA verifier for GPU workloads.
- .. grid-item-card:: :octicon:`server;1.5em;sd-mr-1` Supported Platforms
- :link: coco-supported-platforms
- :link-type: ref
+ .. grid-item-card:: :octicon:`law;1.5em;sd-mr-1` Licensing
+ :link: licensing
+ :link-type: doc
- Platform and feature support scope for Early Access (EA).
+ Review licensing information for Confidential Containers.
diff --git a/confidential-containers/licensing.rst b/confidential-containers/licensing.rst
new file mode 100644
index 000000000..43d76fff9
--- /dev/null
+++ b/confidential-containers/licensing.rst
@@ -0,0 +1,27 @@
+.. license-header
+ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ SPDX-License-Identifier: Apache-2.0
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+*********
+Licensing
+*********
+
+While the Confidential Containers (CoCo) Reference Architecture includes some components that are open source, the NVIDIA Confidential Computing capability is a licensed feature for production use cases.
+To use these products, you must have a valid NVIDIA Confidential Computing license.
+Refer to the `NVIDIA Product-Specific Terms for NVIDIA Confidential Computing `_ for more information.
+
+
diff --git a/confidential-containers/overview.rst b/confidential-containers/overview.rst
index 05d9d1331..7ad72ea7b 100644
--- a/confidential-containers/overview.rst
+++ b/confidential-containers/overview.rst
@@ -1,5 +1,5 @@
.. license-header
- SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
@@ -17,28 +17,47 @@
.. headings # #, * *, =, -, ^, "
-******************************************************
-NVIDIA Confidential Containers Overview (Early Access)
-******************************************************
+*****************************************************
+NVIDIA Confidential Containers Reference Architecture
+*****************************************************
-.. admonition:: Early Access
+NVIDIA GPUs with Confidential Computing support provide the hardware foundation for running GPU workloads inside a hardware-enforced Trusted Execution Environment (TEE).
+The NVIDIA Confidential Containers Reference Architecture provides a validated deployment model for cluster administrators who want to leverage NVIDIA GPU Confidential Computing capabilities on Kubernetes platforms.
- Confidential Containers are available as Early Access (EA) with curated platform and feature support. EA features are not supported in production and are not functionally complete. API and architectural designs are not final and may change.
+This documentation describes the architecture overview and the key software components, including the NVIDIA GPU Operator and Kata Containers, used to deploy and manage confidential workloads.
+This architecture builds on the principles of Confidential Computing and `Confidential Containers `__, the cloud-native approach to Confidential Computing.
+Before reading this documentation, it is recommended that you are familiar with the basic concepts of Confidential Containers, including attestation.
+Refer to the `Confidential Containers `__ documentation for more information.
.. _confidential-containers-overview:
-Overview
-========
+Background
+==========
+
NVIDIA GPUs power the training and deployment of Frontier Models—world-class Large Language Models (LLMs) that define the state of the art in AI reasoning and capability.
As organizations adopt these models in regulated industries such as financial services, healthcare, and the public sector, protecting model intellectual property and sensitive user data becomes essential. Additionally, the model deployment landscape is evolving to include public clouds, enterprise on-premises, and edge. A zero-trust posture on cloud-native platforms such as Kubernetes is essential to secure assets (model IP and enterprise private data) from untrusted infrastructure with privileged user access.
-Securing data at rest and in transit is standard. Protecting data in-use remains a critical gap. Confidential Computing (CC) addresses this gap by providing isolation, encryption, and integrity verification of proprietary application code and sensitive data during processing. CC uses hardware-based Trusted Execution Environments (TEEs), such as AMD SEV-SNP / Intel TDX technologies, and NVIDIA Confidential Computing capabilities to create trusted enclaves.
+Securing data at rest and in transit is standard practice, but protecting data in use remains a critical gap. Confidential Computing (CC) addresses this gap by using hardware-based Trusted Execution Environments (TEEs), such as AMD SEV-SNP and Intel TDX, with NVIDIA Confidential Computing capabilities to provide isolation, memory encryption, and integrity verification during processing. In addition to isolation, CC provides Remote Attestation, which allows workload owners to cryptographically verify the state of a TEE before providing secrets or sensitive data.
+
+`Confidential Containers `__ (CoCo) is the cloud-native approach to CC on Kubernetes.
+The Confidential Containers project leverages Kata Containers to provide the sandboxing capabilities. `Kata Containers `_ is an open-source project that provides lightweight Utility Virtual Machines (UVMs) that feel and perform like containers while providing strong workload isolation. Along with the Confidential Containers project, Kata enables the orchestration of secure, GPU-accelerated workloads in Kubernetes.
+
+.. _coco-use-cases:
+
+Use Cases
+=========
+
+The target for Confidential Containers is to enable model providers (closed and open source) and enterprises to use the advancements of generative AI, agnostic to the deployment model (cloud, enterprise, or edge). Some of the key use cases that CC and Confidential Containers enable are:
-In addition to TEEs, Confidential Computing provides Remote Attestation features. Attestation enables remote systems or users to interrogate the security state of a TEE before interacting with it and providing any secrets or sensitive data.
+* **Zero-Trust AI & IP Protection:** You can deploy proprietary models (like LLMs) on third-party or private infrastructure. The model weights remain encrypted and are only decrypted inside the hardware-protected enclave, ensuring absolute IP protection from the host.
+* **Data Clean Rooms:** This allows you to process sensitive enterprise data (like financial analytics or healthcare records) securely. Neither the infrastructure provider nor the model builder can see the raw data.
+
+.. image:: graphics/CoCo-Sample-Workflow.png
+ :alt: Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo
+
+*Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo*
-`Confidential Containers `_ (CoCo) is the cloud-native approach of CC on Kubernetes.
-The Confidential Containers architecture leverages Kata Containers to provide the sandboxing capabilities. `Kata Containers `_ is an open-source project that provides lightweight Utility Virtual Machines (UVMs) that feel and perform like containers while providing strong workload isolation. Along with the Confidential Containers project, Kata enables the orchestration of secure, GPU-accelerated workloads in Kubernetes.
.. _coco-architecture:
@@ -48,145 +67,104 @@ Architecture Overview
NVIDIA's approach to the Confidential Containers architecture delivers on the key promise of Confidential Computing: confidentiality, integrity, and verifiability.
Integrating open source and NVIDIA software components with the Confidential Computing capabilities of NVIDIA GPUs, the Reference Architecture for Confidential Containers is designed to be the secure and trusted deployment model for AI workloads.
-.. image:: graphics/CoCo-Reference-Architecture.png
- :alt: High-Level Reference Architecture for Confidential Containers
-
-*High-Level Reference Architecture for Confidential Containers*
-
-The key value proposition for this architecture approach is:
+The key values of this architecture approach are:
1. **Built on OSS standards** - The Reference Architecture for Confidential Containers is built on key OSS components such as Kata, Trustee, QEMU, OVMF, and Node Feature Discovery (NFD), along with hardened NVIDIA components like NVIDIA GPU Operator.
2. **Highest level of isolation** - The Confidential Containers architecture is built on Kata containers, which is the industry standard for providing hardened sandbox isolation, and augmenting it with support for GPU passthrough to Kata containers makes the base of the Trusted Execution Environment (TEE).
3. **Zero-trust execution with attestation** - Ensuring the trust of the model providers/data owners by providing a full-stack verification capability with attestation. The integration of NVIDIA GPU attestation capabilities with Trustee based architecture, to provide composite attestation provides the base for secure, attestation based key-release for encrypted workloads, deployed inside the TEE.
-.. _coco-use-cases:
-
-Use Cases
-=========
-
-The target for Confidential Containers is to enable model providers (Closed and Open source) and Enterprises to leverage the advancements of Gen AI, agnostic to the deployment model (Cloud, Enterprise, or Edge). Some of the key use cases that CC and Confidential Containers enable are:
-
-* **Zero-Trust AI & IP Protection:** You can deploy proprietary models (like LLMs) on third-party or private infrastructure. The model weights remain encrypted and are only decrypted inside the hardware-protected enclave, ensuring absolute IP protection from the host.
-* **Data Clean Rooms:** This allows you to process sensitive enterprise data (like financial analytics or healthcare records) securely. Neither the infrastructure provider nor the model builder can see the raw data.
+.. image:: graphics/CoCo-Reference-Architecture.png
+ :alt: High-Level Reference Architecture for Confidential Containers
-.. image:: graphics/CoCo-Sample-Workflow.png
- :alt: Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo
+*High-Level Reference Architecture for Confidential Containers*
-*Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo*
+The preceding diagram shows the high-level reference architecture for Confidential Containers and
+the key components that are used to deploy and manage Confidential Containers workloads.
+The components are described in more detail in the following section.
.. _coco-supported-platforms-components:
Software Components for Confidential Containers
===============================================
-The following is a brief overview of the software components for Confidential Containers.
+The following is a brief overview of the software components in NVIDIA's Reference Architecture for Confidential Containers.
+Refer to the diagram above for a visual representation of the components.
+
**Kata Containers**
-Acts as the secure isolation layer by running standard Kubernetes Pods inside lightweight, hardware-isolated Utility VMs (UVMs) rather than sharing the untrusted host kernel. Kata containers are integrated with the Kubernetes `Agent Sandbox `_ project to deliver sandboxing capabilities.
+Acts as the secure isolation layer by running standard Kubernetes Pods inside lightweight, hardware-isolated Utility Virtual Machines (UVMs) rather than sharing the untrusted host kernel.
+`Kata Containers `_ is an open source project that delivers sandboxing capabilities and is integrated with the Kubernetes `Agent Sandbox `_ project.
-**NVIDIA GPU Operator**
+**Kata Deploy**
-Automates GPU lifecycle management. For Confidential Containers, it securely provisions GPU support and handles VFIO-based GPU passthrough directly into the Kata confidential VM without breaking the hardware trust boundary.
+Deployment mechanism (often managed with Helm) that installs the Kata runtime binaries, UVM images and kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes.
-The GPU Operator deploys the components needed to run Confidential Containers to simplify managing the software required for confidential computing and deploying confidential container workloads:
+Refer to the `Kata Containers documentation `_ for more information.
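+A minimal installation sketch using the ``kata-deploy`` Helm chart, assuming ``${CHART}`` and ``${VERSION}`` are set to the chart reference and chart version for your environment:
+
+.. code-block:: console
+
+   $ helm install kata-deploy "${CHART}" \
+       --namespace kata-system --create-namespace \
+       --set nfd.enabled=false \
+       --wait --timeout 10m \
+       --version "${VERSION}"
+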
-* NVIDIA Confidential Computing Manager (cc-manager) for Kubernetes - to set the confidential computing (CC) mode on the NVIDIA GPUs.
-* NVIDIA Sandbox Device Plugin - to discover NVIDIA GPUs along with their capabilities, to advertise these to Kubernetes, and to allocate GPUs during pod deployment.
-* NVIDIA VFIO Manager - to bind discovered NVIDIA GPUs to the vfio-pci driver for VFIO passthrough.
-* NVIDIA Kata Manager for Kubernetes - to create host-side CDI specifications for GPU passthrough.
+**NVIDIA GPU Operator**
-**Kata Deploy**
+Automates GPU lifecycle management.
+For Confidential Containers, it securely provisions GPU support and handles VFIO-based GPU passthrough directly into the Kata confidential Virtual Machine (VM) without breaking the hardware trust boundary.
-Deployment mechanism (often managed via Helm) that installs the Kata runtime binaries, UVM images and kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes.
+The GPU Operator deploys the components needed to run Confidential Containers, simplifying both the management of the software required for confidential computing and the deployment of confidential container workloads.
+The GPU Operator uses node labels to manage the deployment of components to the nodes in your cluster.
+These components include:
-**Node Feature Discovery (NFD)**
+* NVIDIA Confidential Computing Manager (cc-manager) for Kubernetes: Sets the confidential computing (CC) mode on the NVIDIA GPUs.
+* NVIDIA Kata Sandbox Device Plugin: Creates host-side Container Device Interface (CDI) specifications for GPU passthrough and discovers NVIDIA GPUs along with their capabilities, advertises these to Kubernetes, and allocates GPUs during pod deployment.
+* NVIDIA VFIO Manager: Binds discovered NVIDIA GPUs and NVSwitches to the vfio-pci driver for VFIO passthrough.
+
+Refer to the :doc:`NVIDIA GPU Operator ` page for more information.
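+As a sketch, these operands are typically enabled through Helm values when installing the GPU Operator. The flag names below are assumptions based on common GPU Operator conventions; confirm the exact values against the deployment guide:
+
+.. code-block:: console
+
+   $ helm install gpu-operator nvidia/gpu-operator \
+       --namespace gpu-operator --create-namespace \
+       --set sandboxWorkloads.enabled=true \
+       --set ccManager.enabled=true
+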
-Bootstraps the node by advertising the node features via labels to make sophisticated scheduling decisions, like installing the Kata/CoCo stack only on the nodes that support the CC prerequisites for CPU and GPU. This feature directs the Operator to install node feature rules that detect CPU security features and the NVIDIA GPU hardware.
+**Node Feature Discovery (NFD)**
-**Trustee**
+Bootstraps the node by advertising node features as labels, enabling sophisticated scheduling decisions such as installing the Kata/CoCo stack only on nodes that meet the CC prerequisites for CPU and GPU. This feature directs the Operator to install node feature rules that detect CPU security features and the NVIDIA GPU hardware.
-Attestation and key brokering framework (which includes the Key Broker Service and Attestation Service). It acts as the cryptographic gatekeeper, verifying hardware/software evidence and only releasing secrets if the environment is proven secure.
+Refer to the `Node Feature Discovery documentation `_ for upstream usage and reference material.
+The project source repository is `kubernetes-sigs/node-feature-discovery `_ on GitHub.
+The GPU Operator deploys and manages this component by default.
-**Snapshotter (e.g., Nydus)**
+**Snapshotter (for example, Nydus)**
Handles the container image "guest pull" functionality. Used as a remote snapshotter, it bypasses image pulls on the host. Instead, it fetches and unpacks encrypted and signed container images directly inside the protected guest memory, keeping proprietary contents hidden and ensuring image integrity.
+
**Kata Agent and Agent Security Policy**
Runs inside the guest VM to manage the container lifecycle while enforcing a strict, immutable agent security policy based on Rego (regorus). This blocks the untrusted host from executing unauthorized commands, such as a malicious ``kubectl exec``.
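+
+As an illustration only, an agent security policy written in Rego typically denies sensitive requests from the host by default. The rule names below follow the Kata agent policy convention but are shown as a sketch; in practice the policy is generated for your workload by the ``genpolicy`` tool rather than written by hand:
+
+.. code-block:: text
+
+   package agent_policy
+
+   # Deny host-initiated exec and log-stream requests by default.
+   default ExecProcessRequest := false
+   default ReadStreamRequest := false
+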
+**Trustee and Attestation Service**
+
+Attestation and key brokering framework (which includes the Key Broker Service and Attestation Service). It acts as the cryptographic gatekeeper, verifying hardware/software evidence and only releasing secrets if the environment is proven secure.
+
+
**Confidential Data Hub (CDH)**
An in-guest component that securely receives sealed secrets from Trustee and transparently manages encrypted persistent storage and image decryption for the workload.
-**NVRC (NVIDIA runcom)**
+**NVIDIA Runtime Container (NVRC)**
A minimal hardened init system that securely bootstraps the guest environment, life cycles the kata-agent, provides health checks on started helper daemons while drastically reducing the attack surface.
-Software Stack and Component Versions
---------------------------------------
-The following is the component stack to support the open Reference Architecture (RA) along with the proposed versions of different SW components.
+GPU Operator Cluster Topology Considerations
+--------------------------------------------
+
+The GPU Operator deploys and manages components for allocating and utilizing the GPU resources on your cluster.
+Depending on how you configure the Operator, different components are deployed on the worker nodes.
+When setting up Confidential Containers support, you can configure all the worker nodes in your cluster for running GPU workloads with Confidential Containers, or you can configure some nodes for Confidential Containers and the others for traditional containers.
+This configuration is done through node labelling and configuration flags set during installation, or by editing the ClusterPolicy object post-installation.
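+For example, a node can be designated for Confidential Containers with a label similar to the following. The label key and value shown here are assumptions; confirm the exact label against the deployment guide for your GPU Operator version:
+
+.. code-block:: console
+
+   $ kubectl label node node-b nvidia.com/gpu.workload.config=vm-passthrough
+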
-.. flat-table::
- :header-rows: 1
- * - Category
- - Component
- - Release/Version
- * - :rspan:`1` **HW Platform**
- - GPU Platform
- - | Hopper 100/200
- | Blackwell B200
- | Blackwell RTX Pro 6000
- * - CPU Platform
- - | AMD Genoa/ Milan
- | Intel ER/ GR
- * - :rspan:`7` **Host SW Components**
- - Host OS
- - 25.10
- * - Host Kernel
- - 6.17+
- * - Guest OS
- - Distroless
- * - Guest kernel
- - 6.18.5
- * - OVMF
- - edk2-stable202511
- * - QEMU
- - 10.1 \+ Patches
- * - Containerd
- - 2.2.2 \+
- * - Kubernetes
- - 1.32 \+
- * - :rspan:`3` **Confidential Containers Core Components**
- - NFD
- - v0.6.0
- * - NVIDIA/gpu-operator
- | - NVIDIA VFIO Manager
- | - NVIDIA Sandbox device plugin
- | - NVIDIA Confidential Computing Manager for Kubernetes
- | - NVIDIA Kata Manager for Kubernetes
- - v25.10.0 and higher
- * - CoCo release (EA)
- | - Kata 3.25 (w/ kata-deploy helm)
- | - Trustee/Guest components 0.17.0
- | - KBS protocol 0.4.0
- - v0.18.0
-
-
-Cluster Topology Considerations
--------------------------------
-
-You can configure all the worker nodes in your cluster for running GPU workloads with confidential containers, or you can configure some nodes for Confidential Containers and the others for traditional containers. Consider the following example where node A is configured to run traditional containers and node B is configured to run confidential containers.
+Consider the following example where node A is configured to run traditional containers and node B is configured to run confidential containers.
.. list-table::
:widths: 50 50
:header-rows: 1
- * - Node A - Traditional Containers receives the following software components
- - Node B - Kata CoCo receives the following software components
+ * - Node A - Traditional Container nodes receive the following software components
+ - Node B - Confidential Container nodes receive the following software components
* - * NVIDIA Driver Manager for Kubernetes
* NVIDIA Container Toolkit
* NVIDIA Device Plugin for Kubernetes
@@ -194,52 +172,69 @@ You can configure all the worker nodes in your cluster for running GPU workloads
* NVIDIA MIG Manager for Kubernetes
* Node Feature Discovery
* NVIDIA GPU Feature Discovery
- - * NVIDIA Kata Manager for Kubernetes
- * NVIDIA Confidential Computing Manager for Kubernetes
+ - * NVIDIA Confidential Computing Manager for Kubernetes
* NVIDIA Sandbox Device Plugin
* NVIDIA VFIO Manager
* Node Feature Discovery
-This configuration can be controlled via node labelling, as described in the `GPU Operator confidential containers deployment guide `_.
+This configuration can be controlled through node labelling, as described in the :doc:`Confidential Containers deployment guide `.
-.. _coco-supported-platforms:
+Supported Features and Deployment Scenarios
+===========================================
-Supported Platforms
-===================
+The following features are supported with Confidential Containers:
-Following is the platform and feature support scope for Early Access (EA) of Confidential Containers open Reference Architecture published by NVIDIA.
+* Support for Confidential Container workloads with the following GPU passthrough modes:
-.. flat-table:: Supported Platforms
- :header-rows: 1
-
- * - Component
- - Feature
- * - GPU Platform
- - Hopper 100/200
- * - TEE
- - AMD SEV-SNP only
- * - Feature Support
- - Confidential Containers w/ Kata; Single GPU Passthrough only
- * - Attestation Support
- - Composite Attestation for CPU \+ GPU; integration with Trustee for local verifier.
+ * Single-GPU passthrough (one physical GPU per pod).
+ * Multi-GPU passthrough on NVSwitch (NVLink) based HGX systems.
-Refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing `_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100, and specifically to `CC deployment guide for SEV-SNP `_ for setup specific to AMD SEV-SNP machines.
+.. note::
-The following topics in the deployment guide apply to a cloud-native environment:
+ For both single- and multi-GPU passthrough, all GPUs on the host must be configured for Confidential Computing and all GPUs must be assigned to one Confidential Container virtual machine.
+ Configuring only some GPUs on a node for Confidential Computing is not supported.
-* Hardware selection and initial hardware configuration, such as BIOS settings.
-* Host operating system selection, initial configuration, and validation.
-When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
+* Composite :doc:`attestation ` using Trustee and the NVIDIA Remote Attestation Service (NRAS).
+* Generating Kata Agent Security Policies using the `genpolicy tool `_.
+* Use of `signed sealed secrets `_.
+* Access to authenticated registries for container image guest-pull.
+* Container image signature verification and encrypted container images.
+* Ephemeral container data and image layer storage.
+* Lifecycle management of Kata Containers through the `Kata Lifecycle Manager `_.
-The remaining configuration topics in the deployment guide do not apply to a cloud-native environment. NVIDIA GPU Operator performs the actions that are described in these topics.
+For more information about these features, refer to the `Confidential Containers documentation `_.
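+As an illustrative sketch, a confidential pod selects a TEE-specific Kata runtime class and requests a passthrough GPU. The runtime class and resource names follow the components described above; the image reference is a hypothetical placeholder:
+
+.. code-block:: yaml
+
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: confidential-gpu-pod
+   spec:
+     runtimeClassName: kata-qemu-nvidia-gpu-snp
+     containers:
+     - name: workload
+       image: registry.example.com/private/model-server:latest
+       resources:
+         limits:
+           nvidia.com/pgpu: "1"
+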
-Limitations and Restrictions for CoCo EA
-----------------------------------------
+Limitations and Restrictions
+============================
-* Only the AMD platform using SEV-SNP is supported for Confidential Containers Early Access.
-* GPUs are available to containers as a single GPU in passthrough mode only. Multi-GPU passthrough and vGPU are not supported.
-* Support is limited to initial installation and configuration only. Upgrade and configuration of existing clusters to configure confidential computing is not supported.
-* Support for confidential computing environments is limited to the implementation described on this page.
* NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only.
-* NFD doesn't label all Confidential Container capable nodes as such automatically. In some cases, users must manually label nodes to deploy the NVIDIA Confidential Computing Manager for Kubernetes operand onto these nodes as described in the deployment guide.
+* Image signature verification for signed multi-arch images is currently not supported.
+* For both single- and multi-GPU passthrough, all GPUs on the host must be configured for Confidential Computing and all GPUs must be assigned to one Confidential Container virtual machine.
+ Configuring only some GPUs on a node for Confidential Computing is not supported.
+
+Next Steps
+==========
+Refer to the following pages to learn more about deploying with Confidential Containers:
+
+.. grid:: 3
+ :gutter: 3
+
+ .. grid-item-card:: :octicon:`server;1.5em;sd-mr-1` Supported Platforms
+ :link: supported-platforms
+ :link-type: doc
+
+ Hardware, OS, and component versions validated for general availability (GA).
+
+ .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy Confidential Containers
+ :link: confidential-containers-deploy
+ :link-type: doc
+
+ Deploy with the NVIDIA GPU Operator on Kubernetes.
+
+ .. grid-item-card:: :octicon:`shield-check;1.5em;sd-mr-1` Attestation
+ :link: attestation
+ :link-type: doc
+
+ Remote attestation, Trustee, and the NVIDIA verifier for GPU workloads.
+
diff --git a/confidential-containers/supported-platforms.rst b/confidential-containers/supported-platforms.rst
new file mode 100644
index 000000000..5f34ef208
--- /dev/null
+++ b/confidential-containers/supported-platforms.rst
@@ -0,0 +1,131 @@
+.. license-header
+ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ SPDX-License-Identifier: Apache-2.0
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+.. headings # #, * *, =, -, ^, "
+
+.. _coco-supported-platforms:
+
+Supported Platforms
+====================
+
+The following platforms are supported by the NVIDIA Confidential Containers Reference Architecture.
+
+Supported Hardware Platform
+---------------------------
+
+NVIDIA GPUs
+^^^^^^^^^^^
+
+.. list-table::
+ :header-rows: 1
+ :widths: 50 50
+
+ * - GPU
+ - Passthrough
+
+ * - NVIDIA HGX H100
+ - Single-GPU
+
+ * - NVIDIA HGX H200
+ - Single-GPU
+
+ * - NVIDIA H100 PCIe
+ - Single-GPU
+
+ * - NVIDIA HGX B200
+ - Single-GPU, Multi-GPU
+
+ * - NVIDIA HGX B300
+ - Single-GPU, Multi-GPU
+
+ * - NVIDIA RTX Pro 6000 BSE
+ - Single-GPU
+
+.. note::
+
+ Multi-GPU passthrough on NVIDIA Hopper HGX systems requires ``ppcie`` mode.
+ Refer to :ref:`Managing the Confidential Computing Mode ` in the deployment guide for details.
+
+.. note::
+
+ For both single- and multi-GPU passthrough, all GPUs on the host must be configured for Confidential Computing and all GPUs must be assigned to one Confidential Container virtual machine.
+ Configuring only some GPUs on a node for Confidential Computing is not supported.
+
+CPU Platforms
+^^^^^^^^^^^^^
+
+.. flat-table::
+ :header-rows: 1
+
+ * - Category
+ - Operating System
+ - Kernel Version
+ * - AMD Genoa / Milan
+ - Ubuntu 25.10
+ - 6.17+
+ * - Intel Emerald Rapids (ER) / Granite Rapids (GR)
+ - Ubuntu 25.10
+ - 6.17+
+
+For additional information about node configuration and supported NVIDIA GPUs, such as the NVIDIA Hopper H100, refer to the `Confidential Computing Deployment Guide `_.
+
+The following topics in the deployment guide apply to a cloud-native environment:
+
+* Hardware selection and initial hardware configuration, such as BIOS settings.
+* Host operating system selection, initial configuration, and validation.
+
+When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
+
+For additional resources on machine setup:
+
+* Refer to the `NVIDIA Trusted Computing Solutions website `_.
+* Refer to the :doc:`Licensing ` page for more information on the licensing requirements for NVIDIA Confidential Computing capabilities.
+
+Supported Software Components
+-----------------------------
+
+.. flat-table::
+ :header-rows: 1
+
+ * - Component
+ - Release/Version
+ * - Guest OS
+ - Distroless
+ * - Guest kernel
+ - 6.18.5
+ * - OVMF
+ - edk2-stable202511
+ * - QEMU
+ - 10.1 \+ Patches
+ * - Containerd
+ - 2.2.2 \+
+ * - Kubernetes
+ - 1.32 \+
+ * - Node Feature Discovery (NFD)
+ - v0.6.0
+ * - NVIDIA GPU Operator
+ - v26.3.0 and higher
+ * - Kata Containers
+ - 3.29 (installed with ``kata-deploy`` Helm chart)
+ * - Key Broker Service (KBS) protocol
+ - 0.4.0
+ * - Kata Lifecycle Manager
+ - 0.1.4
+
+
+
+
+
diff --git a/confidential-containers/versions1.json b/confidential-containers/versions1.json
index 4d9c5bd4a..c7fbb415b 100644
--- a/confidential-containers/versions1.json
+++ b/confidential-containers/versions1.json
@@ -4,4 +4,4 @@
"url": "../1.0.0",
"version": "1.0.0"
}
- ]
\ No newline at end of file
+]
\ No newline at end of file
diff --git a/gpu-operator/confidential-containers-deploy.rst b/gpu-operator/confidential-containers-deploy.rst
new file mode 100644
index 000000000..38e84a692
--- /dev/null
+++ b/gpu-operator/confidential-containers-deploy.rst
@@ -0,0 +1,31 @@
+.. _confidential-containers-deploy:
+
+***********************
+Confidential Containers
+***********************
+
+The NVIDIA GPU Operator supports deploying Confidential Containers using Kata Containers and the NVIDIA Reference Architecture for Confidential Containers.
+
+Documentation for configuring the GPU Operator for Confidential Containers is available in the `NVIDIA Reference Architecture for Confidential Containers documentation `_.
+
+.. grid:: 3
+ :gutter: 3
+
+ .. grid-item-card:: :octicon:`book;1.5em;sd-mr-1` Reference Architecture
+ :link: https://docs.nvidia.com/datacenter/cloud-native/confidential-containers/latest/overview.html
+ :link-type: url
+
+ Overview, reference architecture, and software components for Confidential Containers.
+
+ .. grid-item-card:: :octicon:`rocket;1.5em;sd-mr-1` Deploy
+ :link: https://docs.nvidia.com/datacenter/cloud-native/confidential-containers/latest/confidential-containers-deploy.html
+ :link-type: url
+
+ Deploy Confidential Containers with the NVIDIA GPU Operator on Kubernetes.
+
+ .. grid-item-card:: :octicon:`server;1.5em;sd-mr-1` Supported Platforms
+ :link: https://docs.nvidia.com/datacenter/cloud-native/confidential-containers/latest/supported-platforms.html
+ :link-type: url
+
+ Hardware, host, and component versions validated for general availability (GA).
+
diff --git a/gpu-operator/getting-started.rst b/gpu-operator/getting-started.rst
index f15420d96..62ac510c4 100644
--- a/gpu-operator/getting-started.rst
+++ b/gpu-operator/getting-started.rst
@@ -774,4 +774,4 @@ Installation on Commercially Supported Kubernetes Platforms
- |nvaie-tanzu|_
* - Google Cloud Anthos
- - :external+edge:doc:`anthos-guide`
+ - :external+edge:doc:`anthos-guide`
\ No newline at end of file
diff --git a/gpu-operator/index.rst b/gpu-operator/index.rst
index b3d2546c5..8d2dff342 100644
--- a/gpu-operator/index.rst
+++ b/gpu-operator/index.rst
@@ -56,6 +56,7 @@
:hidden:
KubeVirt
+ Confidential Containers
.. toctree::
:caption: Specialized Networks
diff --git a/repo.toml b/repo.toml
index b96407585..229dd7eac 100644
--- a/repo.toml
+++ b/repo.toml
@@ -197,8 +197,7 @@ redirects = [
{ path="openshift/install-gpu-ocp.html", project="openshift", target="install-gpu-ocp.html" },
{ path="dra-crds.html", target="dra-intro-install.html" },
{ path="dra-gpus.html", target="dra-intro-install.html" },
- { path="confidential-containers-deploy.html", project="confidential-containers", target="overview.html" },
- { path="confidential-containers.html", project="confidential-containers", target="overview.html" },
+ { path="confidential-containers.html", target="confidential-containers-deploy.html" },
]
[repo_docs.projects.gpu-operator.builds.linkcheck]
From a07b423cc2f7eb12d3fa9e7a53e0ea484f8446dd Mon Sep 17 00:00:00 2001
From: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Date: Thu, 9 Apr 2026 15:33:10 -0400
Subject: [PATCH 2/2] minor fixes
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
---
.../confidential-containers-deploy.rst | 12 ++++++------
confidential-containers/overview.rst | 6 +++---
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/confidential-containers/confidential-containers-deploy.rst b/confidential-containers/confidential-containers-deploy.rst
index 01fbb014e..9c5d4dd22 100644
--- a/confidential-containers/confidential-containers-deploy.rst
+++ b/confidential-containers/confidential-containers-deploy.rst
@@ -162,10 +162,10 @@ The minimum required version is 3.29.0.
.. code-block:: console
$ helm install kata-deploy "${CHART}" \
- --namespace kata-system --create-namespace \
- --set nfd.enabled=false \
- --wait --timeout 10m \
- --version "${VERSION}"
+ --namespace kata-system --create-namespace \
+ --set nfd.enabled=false \
+ --wait --timeout 10m \
+ --version "${VERSION}"
*Example Output*
@@ -356,7 +356,7 @@ Optional: Configuring the Sandbox Device Plugin to Use GPU or NVSwitch Specific
----------------------------------------------------------------------------------------------
By default, the NVIDIA GPU Operator creates a single resource type for GPUs, ``nvidia.com/pgpu``.
-In homogenious clusters, were all GPUs are the same type, using a single resource type is fine because all available GPUs are the same type as well.
+In homogeneous clusters, where all GPUs are the same type, a single resource type is sufficient.
In heterogeneous clusters, where you have different GPU types on your nodes, you may want to use specific GPU types for your workload.
To do this, specify an empty ``P_GPU_ALIAS`` environment variable in the sandbox device plugin by
@@ -364,7 +364,7 @@ the following in your GPU Operator installation:
``--set sandboxDevicePlugin.env[0].name=P_GPU_ALIAS`` and
``--set sandboxDevicePlugin.env[0].value=""``.
-When this valiable is set to ``""``, the sandbox device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type.
+When this variable is set to ``""``, the sandbox device plugin creates GPU model-specific resource types, for example ``nvidia.com/GH100_H100L_94GB``, instead of the default ``nvidia.com/pgpu`` type.
Use the exposed device resource types in pod specs by specifying respective resource limits.
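+For example, a hypothetical pod requesting one GPU by a model-specific resource type (the resource name is the example from above; the runtime class and image are illustrative and depend on your cluster):
+
+.. code-block:: yaml
+
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: cuda-workload
+   spec:
+     runtimeClassName: kata-qemu-nvidia-gpu
+     containers:
+     - name: cuda
+       image: nvcr.io/nvidia/cuda:12.6.2-base-ubuntu22.04
+       resources:
+         limits:
+           nvidia.com/GH100_H100L_94GB: 1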
Similarly, NVSwitches are exposed as resources of type ``nvidia.com/nvswitch`` by default.
diff --git a/confidential-containers/overview.rst b/confidential-containers/overview.rst
index 7ad72ea7b..819f9d44f 100644
--- a/confidential-containers/overview.rst
+++ b/confidential-containers/overview.rst
@@ -22,11 +22,11 @@ NVIDIA Confidential Containers Reference Architecture
*****************************************************
NVIDIA GPUs with Confidential Computing support provide the hardware foundation for running GPU workloads inside a hardware-enforced Trusted Execution Environment (TEE).
-The NVIDIA Confidential Containers Reference Architecture provides a validated deployment model for cluster administrators interested in to leveraging NVIDIA GPU Confidential Computing capabilities on Kubernetes platforms.
+The NVIDIA Confidential Containers Reference Architecture provides a validated deployment model for cluster administrators interested in leveraging NVIDIA GPU Confidential Computing capabilities on Kubernetes platforms.
This documentation describes the architecture overview and the key software components, including the NVIDIA GPU Operator and Kata Containers, used to deploy and manage confidential workloads.
-This architecture build on principles of Confidential Computing and `Confidential Containers <https://confidentialcontainers.org/>`__, the cloud-native approach to Confidential Computing.
-Its recommended to be familiar with the basic concepts of Confidential Containers, including attestation, before reading this documentation.
+This architecture builds on principles of Confidential Computing and `Confidential Containers <https://confidentialcontainers.org/>`__, the cloud-native approach to Confidential Computing.
+It is recommended to be familiar with the basic concepts of Confidential Containers, including attestation, before reading this documentation.
Refer to the `Confidential Containers <https://confidentialcontainers.org/>`__ documentation for more information.
.. _confidential-containers-overview: