
Commit c4d12e3: Updates from review

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
1 parent 85ec42e commit c4d12e3

4 files changed (+42, -40 lines)


confidential-containers/attestation.rst

Lines changed: 1 addition & 1 deletion
@@ -40,6 +40,6 @@ Following the upstream Trustee documentation, add the following annotation to th

 Now, the guest can be used with attestation. For more information on how to provision Trustee with resources and policies, refer to the `Trustee documentation <https://confidentialcontainers.org/docs/attestation/>`_.

-During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters.
+During attestation, the GPU will be set to ready. As such, when running a workload that does attestation, it is not necessary to set the ``nvrc.smi.srs=1`` kernel parameter.

 If attestation does not succeed, debugging is best done through the Trustee log. Debug mode can be enabled by setting the ``nvrc.smi.srs=1`` and ``RUST_LOG=debug`` kernel parameters in the Trustee environment.
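As a sketch of how these debug parameters might be passed to the guest, using the ``io.katacontainers.config.hypervisor.kernel_params`` annotation mechanism that appears elsewhere in these docs (the pod name is hypothetical, and whether this annotation is the right channel for your Trustee setup is an assumption):

```yaml
# Hypothetical pod excerpt: set the GPU-ready and debug-logging kernel
# parameters for troubleshooting attestation. Do not set these for normal
# attested workloads, where the GPU is set to ready automatically.
apiVersion: v1
kind: Pod
metadata:
  name: attestation-debug        # example name, not from this commit
  annotations:
    io.katacontainers.config.hypervisor.kernel_params: "nvrc.smi.srs=1 RUST_LOG=debug"
```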

confidential-containers/confidential-containers-deploy.rst

Lines changed: 26 additions & 25 deletions
@@ -21,7 +21,7 @@ The high-level workflow for configuring Confidential Containers is as follows:

    This installs the Kata Containers runtime binaries, UVM images and kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes.

 #. Install the :ref:`NVIDIA GPU Operator configured for Confidential Containers <coco-install-gpu-operator>`.
-   This installs the NVIDIA GPU Operator components that are required to deploy Confidential Containers workloads.
+   This installs the NVIDIA GPU Operator components that are required to deploy GPU passthrough workloads.

 After installation, you can change the :ref:`confidential computing mode <managing-confidential-computing-mode>` and :ref:`run a sample GPU workload <coco-run-sample-workload>` in a confidential container.

@@ -34,17 +34,7 @@ Prerequisites
 =============

 * Use a supported platform for Confidential Containers.
-  For more information, refer to :doc:`Supported Platforms <supported-platforms>`.
-
-  For additional information on node configuration, refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing <https://docs.nvidia.com/confidential-computing>`_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
-  Specifically refer to the `CC deployment guide for SEV-SNP <https://docs.nvidia.com/cc-deployment-guide-snp.pdf>`_ for setup specific to AMD SEV-SNP machines.
-
-  The following topics in the deployment guide apply to a cloud-native environment:
-
-  * Hardware selection and initial hardware configuration, such as BIOS settings.
-  * Host operating system selection, initial configuration, and validation.
-
-  When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.
+  For more information on machine setup, refer to :doc:`Supported Platforms <supported-platforms>`.

 * Ensure hosts are configured to enable hardware virtualization and Access Control Services (ACS).
   With some AMD CPUs and BIOSes, ACS might be grouped under Advanced Error Reporting (AER).
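The virtualization prerequisite above can be sanity-checked from the host before installing anything. This is a sketch only; it verifies CPU extensions and IOMMU state, not the BIOS ACS/AER grouping, which must be checked in firmware:

```shell
# Quick host checks before deploying Confidential Containers (illustrative).
# CPU virtualization extensions: vmx = Intel VT-x, svm = AMD-V.
# A count of 0 means virtualization is disabled in firmware or unsupported.
grep -c -E 'vmx|svm' /proc/cpuinfo || echo "no virtualization extensions visible"

# IOMMU groups must exist for VFIO passthrough; an empty listing usually means
# the IOMMU is disabled in firmware or missing from the kernel command line
# (e.g. intel_iommu=on or amd_iommu=on).
ls /sys/kernel/iommu_groups 2>/dev/null | head
```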
@@ -63,16 +53,10 @@ Prerequisites

 * A Kubernetes cluster with cluster administrator privileges.

 * It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration <https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/>`_ with a higher timeout value than the two minute default.
-  You could set this value to 20 minutes to match the default values for the NVIDIA shim configurations in Kata Containers ``create_container_timeout`` and the agent's ``image_pull_timeout``.
-
-  Using the guest-pull mechanism, pulling large images may take a significant amount of time and may delay container start.
-  This can lead to Kubelet de-allocating your pod before it transitions from the container creating to the container running state.
-
-  The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
-  This controls the time the shim allows for a container to remain in container creating state.
-
-  If you need a timeout of more than 1200 seconds, you will also need to adjust the agent's ``image_pull_timeout`` value which controls the agent-side timeout for guest-image pull.
-  To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in a pod annotation in the ``io.katacontainers.config.hypervisor.kernel_params="..."`` annotation.
+  You could set this value to 20 minutes to match the default values for the other image pull timeout values in Kata Containers.
+
+  Refer to the :ref:`Configure Image Pull Timeouts <configure-image-pull-timeouts>` section on this page for more details on adjusting the image pull timeout values.

 * Enable ``KubeletPodResourcesGet`` on your cluster.
   The NVIDIA GPU runtime classes use VFIO cold-plug, which requires the Kata runtime to query the kubelet's Pod Resources API to discover allocated GPU devices during sandbox creation.
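Enabling the feature gate can be sketched as a kubelet configuration fragment, assuming a file-based kubelet configuration (the fragment is illustrative, not taken from this commit):

```yaml
# Hypothetical kubelet configuration fragment enabling the Pod Resources "Get"
# API that the Kata runtime queries during sandbox creation.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPodResourcesGet: true
```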
@@ -124,7 +108,7 @@ Install the Kata Containers Helm Chart
 --------------------------------------

 Install the ``kata-deploy`` Helm chart.
-The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel and initrd that NVIDIA uses for Confidential Containers and native Kata containers.
+The ``kata-deploy`` chart installs all required components from the Kata Containers project including the Kata Containers runtime binary, runtime configuration, UVM kernel, and images that NVIDIA uses for Confidential Containers and native Kata containers.

 The minimum required version is 3.29.0.
@@ -492,12 +476,29 @@ Example output when CC mode is enabled:

 The "nvidia.com/cc.mode.state" variable is either "off" or "on", with "off" meaning that the mode state transition is still ongoing and "on" meaning it has completed.

-.. _additional-resources:
+.. _configure-image-pull-timeouts:
+
+Configure Image Pull Timeouts
+-----------------------------
+
+Using the guest-pull mechanism to securely manage images in your deployment scenarios means that pulling large images may take a significant amount of time and may delay container start.
+This can lead to the kubelet de-allocating your pod before it transitions from the container creating to the container running state.

+It is recommended that you configure your cluster's ``runtimeRequestTimeout`` in your `kubelet configuration <https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/>`_ with a higher timeout value than the two-minute default.
+You could set this value to 20 minutes (``20m``) to match the defaults used by the NVIDIA shim configurations in Kata Containers (``create_container_timeout``) and the agent's ``image_pull_timeout``.
+
+The NVIDIA shim configurations in Kata Containers use a default ``create_container_timeout`` of 1200 seconds (20 minutes).
+This controls the time the shim allows for a container to remain in the container creating state.
+
+If you need a timeout of more than 1200 seconds, you will also need to adjust the agent's ``image_pull_timeout`` value, which controls the agent-side timeout for guest image pull.
+To do this, add the ``agent.image_pull_timeout`` kernel parameter to your shim configuration, or pass an explicit value in the ``io.katacontainers.config.hypervisor.kernel_params="..."`` pod annotation.
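The timeout settings described above can be sketched as two configuration fragments. The ``20m`` and ``1800`` values and the pod name are illustrative examples, not values from this commit:

```yaml
# Hypothetical kubelet configuration fragment: raise runtimeRequestTimeout to
# match Kata's 20-minute defaults (applied via the kubelet config file, not
# kubectl).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
runtimeRequestTimeout: 20m
---
# Hypothetical pod excerpt: pass a larger agent-side image pull timeout
# (in seconds) through the kernel parameters annotation.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload             # example name
  annotations:
    io.katacontainers.config.hypervisor.kernel_params: "agent.image_pull_timeout=1800"
```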

 Next Steps
 ==========

-* Refer to the :doc:`Attestation <attestation>` page for more information on configuringattestation.
+* Refer to the :doc:`Attestation <attestation>` page for more information on configuring attestation.
+* Enable pod security policy with Agent Policy.
+  Refer to the `Kata Containers Agent Policy documentation <https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-the-kata-agent-policy.md>`_ for more information.
 * Additional NVIDIA Confidential Computing documentation is available at https://docs.nvidia.com/confidential-computing.
 * Licensing information is available on the :doc:`Licensing <licensing>` page.

confidential-containers/overview.rst

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ The following features are supported with Confidential Containers:
 * Single‑GPU passthrough (one physical GPU per pod).
 * Multi‑GPU passthrough on NVSwitch (NVLink) based HGX systems.
   For example, NVIDIA HGX Hopper (SXM) and NVIDIA HGX Blackwell or NVIDIA HGX B200.
-* Composite attestation using Trustee and the NVIDIA Remote Attestation Service NRAS.
+* Composite attestation using Trustee and the NVIDIA Remote Attestation Service (NRAS).
 * Generating Kata Agent Security Policies using the `genpolicy tool <https://github.com/kata-containers/kata-containers/blob/main/src/tools/genpolicy/README.md>`_.
 * Use of signed sealed secrets.
 * Access to authenticated registries for container image guest-pull.

confidential-containers/supported-platforms.rst

Lines changed: 14 additions & 13 deletions
@@ -1,7 +1,7 @@
 Supported Platforms
 ====================

-Following are the platforms supported by Confidential Containers open Reference Architecture published by NVIDIA.
+Following are the platforms supported by the NVIDIA Confidential Containers Reference Architecture.

 Supported Hardware Platform
 ---------------------------
@@ -18,24 +18,23 @@ NVIDIA GPUs
    * - NVIDIA Blackwell B200
    * - NVIDIA Blackwell RTX Pro 6000

-CPU Platform
-------------
+CPU Platforms
+-------------

 .. flat-table::
    :header-rows: 1

    * - Category
      - Operating System
      - Kernel Version
-   * - AMD Genoa/ Milan
+   * - AMD Genoa / Milan
      - Ubuntu 25.10
      - 6.17+
-   * - Intel ER/ GR
+   * - Intel Emerald Rapids (ER) / Granite Rapids (GR)
      - Ubuntu 25.10
      - 6.17+

-For additional information on node configuration, refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing <https://docs.nvidia.com/confidential-computing>`_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.
-Specifically refer to the `CC deployment guide for SEV-SNP <https://docs.nvidia.com/cc-deployment-guide-snp.pdf>`_ for setup specific to AMD SEV-SNP machines.
+For additional information on node configuration, refer to the `Confidential Computing Deployment Guide <https://docs.nvidia.com/cc-deployment-guide-tdx-snp.pdf>`_ for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100.

 The following topics in the deployment guide apply to a cloud-native environment:

@@ -44,7 +43,10 @@ The following topics in the deployment guide apply to a cloud-native environment

 When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration.

-Also refer to the :doc:`Licensing <licensing>` page for more information on the licensing requirements for NVIDIA Confidential Computing capabilities.
+For additional resources on machine setup:
+
+* Refer to the `NVIDIA Trusted Computing Solutions website <https://docs.nvidia.com/nvtrust/index.html>`_.
+* Refer to the :doc:`Licensing <licensing>` page for more information on the licensing requirements for NVIDIA Confidential Computing capabilities.

 Supported Software Components
 -----------------------------
@@ -69,13 +71,12 @@ Supported Software Components
    * - Node Feature Discovery (NFD)
      - v0.6.0
    * - NVIDIA GPU Operator
-     - v25.10.0 and higher
-   * - Kata
+     - v26.3.0 and higher
+   * - Kata Containers
      - 3.29 (w/ kata-deploy helm)
-   * - KBS protocol
+   * - Key Broker Service (KBS) protocol
      - 0.4.0
-   * - Attestation Support
-     - Composite Attestation for CPU \+ GPU; integration with Trustee for local verifier.

 .. _coco-supported-platforms:
