confidential-containers/attestation.rst (+1 −1)

@@ -40,6 +40,6 @@ Following the upstream Trustee documentation, add the following annotation to th
Now, the guest can be used with attestation. For more information on how to provision Trustee with resources and policies, refer to the `Trustee documentation <https://confidentialcontainers.org/docs/attestation/>`_.
- During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+ During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` kernel parameter.
If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.
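As a sketch of the debug setup described above, the two debug kernel parameters can be passed through the standard Kata pod annotation. The pod name, runtime class, and image below are placeholders for illustration, not values taken from this documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: attestation-debug            # placeholder name
  annotations:
    # Debug parameters from the text above; remove them once attestation succeeds,
    # since an attesting workload does not need nvrc.smi.srs=1.
    io.katacontainers.config.hypervisor.kernel_params: "nvrc.smi.srs=1 RUST_LOG=debug"
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp   # or kata-qemu-nvidia-gpu-tdx
  containers:
    - name: workload
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
```

With the pod running, the Trustee log is the place to inspect the resulting attestation evidence and failures.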
confidential-containers/confidential-containers-deploy.rst (+26 −25)

@@ -21,7 +21,7 @@ The high-level workflow for configuring Confidential Containers is as follows:
This installs the Kata Containers runtime binaries, UVM images and kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes.
#. Install the :ref:`NVIDIA GPU Operator configured for Confidential Containers <coco-install-gpu-operator>`.
- This installs the NVIDIA GPU Operator components that are required to deploy Confidential Containers workloads.
+ This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads.
After installation, you can change the :ref:`confidential computing mode <managing-confidential-computing-mode>` and :ref:`run a sample GPU workload <coco-run-sample-workload>` in a confidential container.
@@ -34,17 +34,7 @@ Prerequisites
=============
* Use a supported platform for Confidential Containers.
- For more information, refer to :doc:`Supported Platforms <supported-platforms>`.
-
- For additional information on node configuration, refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing <https://docs.nvidia.com/confidential-computing>`_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
- Specifically refer to the `CC deployment guide for SEV-SNP <https://docs.nvidia.com/cc-deployment-guide-snp.pdf>`_ for setup specific to AMD SEV-SNP machines.
-
- The following topics in the deployment guide apply to a cloud-native environment:
-
- * Hardware selection and initial hardware configuration, such as BIOS settings.
- * Host operating system selection, initial configuration, and validation.
-
- When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
+ For more information on machine setup, refer to :doc:`Supported Platforms <supported-platforms>`.
* Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS).
With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER).
@@ -63,16 +53,10 @@ Prerequisites
* A Kubernetes cluster with cluster administrator privileges.
* It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration <https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/>`_ with a higher timeout value than the two minute default.
- You could set this value to 20 minutes to match the default values for the NVIDIA shim configurations in Kata Containers ``create_container_timeout`` and the agent's ``image_pull_timeout``.
-
- Using the guest-pull mechanism, pulling large images may take a significant amount of time and may delay container start.
- This can lead to Kubelet de-allocating your pod before it transitions from the container creating to the container running state.
-
- The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
- This controls the time the shim allows for a container to remain in container creating state.
-
- If you need a timeout of more than 1200 seconds, you will also need to adjust the agent's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull.
- To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params="..."`` annotation.
+ You could set this value to 20 minutes to match the other image pull timeout defaults in Kata Containers.
+
+ Refer to the :ref:`Configure Image Pull Timeouts <configure-image-pull-timeouts>` section on this page for more details on adjusting the image pull timeout values.
+
* Enable ``KubeletPodResourcesGet`` on your cluster.
The NVIDIA GPU runtime classes use VFIO cold-plug, which requires the Kata runtime to query Kubelet's Pod Resources API to discover allocated GPU devices during sandbox creation.
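The two kubelet-related prerequisites above (the longer ``runtimeRequestTimeout`` and the ``KubeletPodResourcesGet`` feature gate) can be sketched as a single kubelet configuration fragment. The field names follow the upstream Kubernetes ``KubeletConfiguration`` API; verify them against your cluster's Kubernetes version before applying:

```yaml
# Sketch of a kubelet configuration covering the two prerequisites above.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Raise the runtime request timeout from the 2m default so that slow
# guest image pulls do not cause the kubelet to give up on the sandbox.
runtimeRequestTimeout: 20m
featureGates:
  # Lets the Kata runtime query the Pod Resources API for allocated
  # GPU devices during sandbox creation (VFIO cold-plug).
  KubeletPodResourcesGet: true
```

How this file is delivered to kubelet (static file, systemd drop-in, or your distribution's tooling) depends on how the cluster was provisioned.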
@@ -124,7 +108,7 @@ Install the Kata Containers Helm Chart
--------------------------------------
Install the ``kata-deploy`` Helm chart.
- The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel and initrd that NVIDIA uses for Confidential Containers and native Kata containers.
+ The ``kata-deploy`` chart installs all required components from the Kata Containers project, including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.
The minimum required version is 3.29.0.
@@ -492,12 +476,29 @@ Example output when CC mode is enabled:
The "nvidia.com/cc.mode.state" variable is either "off" or "on", with "off" meaning that mode state transition is still ongoing and "on" meaning mode state transition completed.
- .. _additional-resources:
+ .. _configure-image-pull-timeouts:
+
+ Configure Image Pull Timeouts
+ -----------------------------
+
+ Using the guest-pull mechanism to securely manage images in your deployment scenarios means that pulling large images may take a significant amount of time and may delay container start.
+ This can lead to Kubelet de-allocating your pod before it transitions from the container creating to the container running state.
+
+ It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration <https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/>`_ with a higher timeout value than the two-minute default.
+ You could set this value to 20 minutes (``20m``) to match the default values for the NVIDIA shim configurations in Kata Containers ``create_container_timeout`` and the agent's ``image_pull_timeout``.
+
+ The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
+ This controls the time the shim allows for a container to remain in the container creating state.
+
+ If you need a timeout of more than 1200 seconds, you will also need to adjust the agent's ``image_pull_timeout`` value, which controls the agent-side timeout for guest image pull.
+ To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in the ``io.katacontainers.config.hypervisor.kernel_params="..."`` pod annotation.
+
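As a hedged illustration of the annotation-based override described above, a pod could raise the agent-side pull timeout for a large image. The pod fields and the chosen value are placeholders; the value format is an assumption to verify against the Kata agent documentation, and ``create_container_timeout`` in the shim configuration must be raised to at least the same duration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: large-image-workload          # placeholder name
  annotations:
    # Illustrative only: raises the agent-side guest pull timeout above
    # the 1200-second default. Value assumed to be in seconds.
    io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
spec:
  runtimeClassName: kata-qemu-nvidia-gpu-snp   # placeholder runtime class
  containers:
    - name: workload
      image: registry.example.com/large-image:latest   # placeholder image
```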
Next Steps
==========
- * Refer to the :doc:`Attestation <attestation>` page for more information on configuringattestation.
+ * Refer to the :doc:`Attestation <attestation>` page for more information on configuring attestation.
+ * Enable pod security policy with Agent Policy.
+   Refer to the `Kata Containers Agent Policy documentation <https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-the-kata-agent-policy.md>`_ for more information.
* Additional NVIDIA Confidential Computing documentation is available at https://docs.nvidia.com/confidential-computing.
* Licensing information is available on the :doc:`Licensing <licensing>` page.
confidential-containers/overview.rst (+1 −1)

@@ -185,7 +185,7 @@ The following features are supported with Confidential Containers:
* Single‑GPU passthrough (one physical GPU per pod).
* Multi‑GPU passthrough on NVSwitch (NVLink) based HGX systems.
For example, NVIDIA HGX Hopper (SXM) and NVIDIA HGX Blackwell or NVIDIA HGX B200.
- * Composite attestation using Trustee and the NVIDIA Remote Attestation Service NRAS.
+ * Composite attestation using Trustee and the NVIDIA Remote Attestation Service (NRAS).
* Generating Kata Agent Security Policies using the `genpolicy tool <https://github.com/kata-containers/kata-containers/blob/main/src/tools/genpolicy/README.md>`_.
* Use of signed sealed secrets.
* Access to authenticated registries for container image guest-pull.
- For additional information on node configuration, refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing <https://docs.nvidia.com/confidential-computing>`_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
- Specifically refer to the `CC deployment guide for SEV-SNP <https://docs.nvidia.com/cc-deployment-guide-snp.pdf>`_ for setup specific to AMD SEV-SNP machines.
+ For additional information on node configuration, refer to the `Confidential Computing Deployment Guide <https://docs.nvidia.com/cc-deployment-guide-tdx-snp.pdf>`_ for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
The following topics in the deployment guide apply to a cloud-native environment:
@@ -44,7 +43,10 @@ The following topics in the deployment guide apply to a cloud-native environment
When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
- Also refer to the :doc:`Licensing <licensing>` page for more information on the licensing requirements for NVIDIA Confidential Computing capabilities.
+ For additional resources on machine setup:
+
+ * Refer to the `NVIDIA Trusted Computing Solutions website <https://docs.nvidia.com/nvtrust/index.html>`_.
+ * Refer to the :doc:`Licensing <licensing>` page for more information on the licensing requirements for NVIDIA Confidential Computing capabilities.
Supported Software Components
-----------------------------
@@ -69,13 +71,12 @@ Supported Software Components
* - Node Feature Discovery (NFD)
- v0.6.0
* - NVIDIA GPU Operator
- - v25.10.0 and higher
- * - Kata
+ - v26.3.0 and higher
+ * - Kata Containers
- 3.29 (w/ kata-deploy helm)
- * - KBS protocol
+ * - Key Broker Service (KBS) protocol
- 0.4.0
- * - Attestation Support
-   - Composite Attestation for CPU \+ GPU; integration with Trustee for local verifier.