From e8c1d6eae5d03464953b20491047f7b3b6486b16 Mon Sep 17 00:00:00 2001
From: Will Price
Date: Fri, 11 Dec 2020 13:20:45 +0000
Subject: [PATCH 1/4] Add first draft of nvidia instructions

---
 source/running.rst | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/source/running.rst b/source/running.rst
index 1f8b078..615d5d2 100644
--- a/source/running.rst
+++ b/source/running.rst
@@ -119,6 +119,49 @@ Once the script has been edited to your liking, re-run Packer with:
 This will start a VM inside your cloud account, build the image and then shut
 down the VM. From that point on, any newly-started nodes will use the new
 image.
 
+AWS GPU nodes
++++++++++++++
+
+We need to adapt the default packer image as out of the box it does not
+contain any of the nvidia software necessary to interact with the GPU.
+
+The first step is to change the ``compute_image_extra.sh`` script to
+install the nvidia driver and CUDA toolchain:
+
+.. code-block:: shell-session
+
+   [citc@mgmt ~]$ cat >> compute_image_extra.sh << EOF
+       sudo dnf clean all
+       sudo dnf -y install kernel-devel
+       sudo dnf -y module install nvidia-driver:latest-dkms
+       sudo dnf -y install cuda
+       sudo dkms autoinstall
+       EOF
+
+We can't just rebuild our image straight away though since the CUDA
+toolchain is large and exceeds the base image size, consequently we need
+to change the packer configuration to create a larger image. Edit
+``/etc/citc/packer/all.pkr.hcl`` in your favourite editor, and add the
+following to the end of the ``source "amazon-ebs" "aws"`` section
+
+.. code-block::
+
+   launch_block_device_mappings {
+     device_name = "/dev/sda1"
+     volume_size = 40
+   }
+
+We can now re-build the image used to provision compute nodes:
+
+.. code-block:: shell-session

From: Will Price
Date: Fri, 11 Dec 2020 13:51:21 +0000
Subject: [PATCH 2/4] Decrease the image size used for building an AWS GPU AMI

40GB is unnecessarily large and causes the AMI build to take a long time
and increases the provisioning time of the compute nodes.
---
 source/running.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source/running.rst b/source/running.rst
index 615d5d2..1f9ff0f 100644
--- a/source/running.rst
+++ b/source/running.rst
@@ -149,7 +149,7 @@ following to the end of the ``source "amazon-ebs" "aws"`` section
 
    launch_block_device_mappings {
      device_name = "/dev/sda1"
-     volume_size = 40
+     volume_size = 10
    }
 
 We can now re-build the image used to provision compute nodes:

From 6b3541009a230017a2734fd1a130c5af17420c66 Mon Sep 17 00:00:00 2001
From: Will Price
Date: Fri, 11 Dec 2020 15:10:58 +0000
Subject: [PATCH 3/4] Remove installation of CUDA for AWS GPU AMI

It is not necessary to install CUDA to run GPU accelerated things
(e.g. pytorch); we will leave it up to users to install CUDA as a module.
---
 source/running.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/source/running.rst b/source/running.rst
index 1f9ff0f..db67c05 100644
--- a/source/running.rst
+++ b/source/running.rst
@@ -135,7 +135,6 @@ install the nvidia driver and CUDA toolchain:
        sudo dnf clean all
        sudo dnf -y install kernel-devel
        sudo dnf -y module install nvidia-driver:latest-dkms
-       sudo dnf -y install cuda
        sudo dkms autoinstall
        EOF
 

From bded4cbf3fbff662d21710dfe63a997321ce3b74 Mon Sep 17 00:00:00 2001
From: Will Price
Date: Fri, 11 Dec 2020 15:12:03 +0000
Subject: [PATCH 4/4] Remove instructions to modify AMI builder size

We have now updated the CitC config so that by default the builder size
is 20GB; see https://github.com/clusterinthecloud/ansible/pull/93
---
 source/running.rst | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/source/running.rst b/source/running.rst
index db67c05..014f805 100644
--- a/source/running.rst
+++ b/source/running.rst
@@ -138,19 +138,6 @@ install the nvidia driver and CUDA toolchain:
        sudo dkms autoinstall
        EOF
 
-We can't just rebuild our image straight away though since the CUDA
-toolchain is large and exceeds the base image size, consequently we need
-to change the packer configuration to create a larger image. Edit
-``/etc/citc/packer/all.pkr.hcl`` in your favourite editor, and add the
-following to the end of the ``source "amazon-ebs" "aws"`` section
-
-.. code-block::
-
-   launch_block_device_mappings {
-     device_name = "/dev/sda1"
-     volume_size = 10
-   }
-
 We can now re-build the image used to provision compute nodes:
 
 .. code-block:: shell-session
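
Taken together, the series leaves ``compute_image_extra.sh`` installing only
the NVIDIA driver: no CUDA package and no AMI size override. A minimal sketch
of the lines the script ends up appending after all four patches is shown
below; the ``dnf config-manager --add-repo`` line is an assumption standing in
for the repository-setup command whose text is missing from the first patch,
so check the URL against NVIDIA's current instructions before relying on it.

.. code-block:: shell

   # Assumed repository setup: the original line is missing from patch 1/4;
   # NVIDIA's documented CUDA repo for RHEL/CentOS 8 is used as a stand-in.
   sudo dnf config-manager --add-repo \
       https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
   sudo dnf clean all
   # Kernel headers let DKMS build the driver module during image creation.
   sudo dnf -y install kernel-devel
   # Driver only; CUDA is left for users to provide as a module (patch 3/4).
   sudo dnf -y module install nvidia-driver:latest-dkms
   sudo dkms autoinstall

With the default builder volume now 20GB (patch 4/4), the image is rebuilt
with Packer exactly as in the existing instructions, with no
``launch_block_device_mappings`` override required.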