diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-1.png" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-1.png" new file mode 100644 index 0000000..159dcc1 Binary files /dev/null and "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-1.png" differ diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-2.png" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-2.png" new file mode 100644 index 0000000..d994921 Binary files /dev/null and "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-2.png" differ diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-3.png" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-3.png" new file mode 100644 index 0000000..b92cd8c Binary files /dev/null and "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-3.png" differ diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-4.png" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-4.png" new file mode 100644 index 0000000..533438b Binary files /dev/null and "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-4.png" differ diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-5.png" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-5.png" new file mode 100644 index 0000000..c7f86e0 Binary files /dev/null and "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/images/gpu-5.png" differ diff --git "a/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/index.md" "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/index.md" new file mode 100644 index 0000000..2fd656f --- /dev/null +++ "b/content/zh/blog/2021-10-20-TKEStackGPU\351\203\250\347\275\262\345\222\214\344\275\277\347\224\250/index.md" @@ -0,0 +1,251 @@ +--- +layout: blog +title: "TKEStack GPU 的部署和使用" +date: 2021-10-20 +# slug of this blog url +slug: TKEStack GPU deploy and usage +--- + +**Author**: LisTai + +# 安装使用步骤 + +## 安装使用步骤 + +### 限制条件 + +* 用户在安装使用GPU时,要求集群内必须包含GPU机型节点 +* 该组件基于 Kubernetes DevicePlugin 实现,只能运行在支持 DevicePlugin 的 kubernetes版本(Kubernetes 1.10 之上的版本) +* GPU-Manager 将每张 GPU 卡视为一个有100个单位的资源:当前仅支持 0-1 的小数张卡,如 20、35、50;以及正整数张卡,如200、500等;不支持类似150、250的资源请求;显存资源是以 256MiB 为最小的一个单位的分配显存 + +### TKEStack 支持的 GPU 类型 +TKEStack目前支持两种GPU类型: +* vGPU:虚拟GPU类型(Virtual GPU),当选择安装此类型的GPU时,平台会自动安装组件[GPUManager](https://github.com/tkestack/gpu-manager),对应在集群中部署的kubernetes资源对象如下: + +| kubernetes 对象名称 | 类型 | 建议预留资源 | 所属 Namespaces | +| --------------------- | ---------- | ---------------------- | --------------- | +| gpu-manager-daemonset | DaemonSet | 每节点1核 CPU, 1Gi内存 | kube-system | +| gpu-quota-admission | Deployment | 1核 
+
+### Installation steps
+#### Installing and using vGPU
+
+When creating a new standalone cluster, check the GPU option and select vGPU from the drop-down list, as shown below:
+
+![](../images/gpu-1.png)
+
+In the target machine section, check the GPU option; the platform then installs the GPU driver on the nodes automatically, as shown below:
+
+![](../images/gpu-2.png)
+
+Once the new cluster reaches the running state, log in to a cluster node and use `kubectl` to confirm that the `gpu-manager` and `gpu-quota-admission` pods are running in the `kube-system` namespace:
+```
+# kubectl get pods -n kube-system | grep gpu
+gpu-manager-daemonset-2vvbm            1/1     Running   0     2m13s
+gpu-quota-admission-76cfff49b6-vdh42   1/1     Running   0     3m2s
+```
+#### Creating a workload that uses vGPU
+
+TKEStack supports two ways to create a workload that uses GPUs: through the TKEStack console, or from the command line.
+
+1. Creating from the console
+
+In a cluster with GPU-Manager installed, GPU limits can be set when creating a workload, as shown below:
+
+> Note:
+>
+> 1. The number of cards must be a decimal between 0.1 and 1 with at most two decimal places, or any whole number, e.g., 0, 0.3, 0.56, 0.7, 0.9, 1, 6, 34; values such as 1.5, 2.7, or 3.54 are not supported.
+> 2. GPU memory must be a natural number n; the workload is allocated n × 256 MiB of memory.
+
+![](../images/gpu-3.png)
+
+2. Creating from the command line
+
+To create a GPU workload from a YAML file, set the GPU resources on the container:
+
+* GPU compute is requested through the `tencent.com/vcuda-core` resource.
+* GPU memory is requested through the `tencent.com/vcuda-memory` resource.
+
+For example, the following creates an nginx application that uses 0.3 of a card and 5 GiB of GPU memory (5 GiB = 20 × 256 MiB):
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx
+spec:
+  containers:
+  - name: nginx
+    image: nginx
+    imagePullPolicy: IfNotPresent
+    resources:
+      limits:
+        tencent.com/vcuda-core: 30
+        tencent.com/vcuda-memory: 20
+      requests:
+        tencent.com/vcuda-core: 30
+        tencent.com/vcuda-memory: 20
+```
+```
+# kubectl create -f nginx.yaml
+pod/nginx created
+```
+
+> Note:
+>
+> - If the pod enters the CrashLoopBackOff state during creation and the error log looks like:
+> ```
+> failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: open failed: /sys/fs/cgroup/devices/system.slice/containerd.service/kubepods-besteffort-podfd3b355a_665c_4c95_8e7f_61fd2111689f.slice/devices.allow: no such file or directory: unknown
+> ```
+> install the libnvidia-container-tools component manually on the GPU host: add its yum repo source first, then run:
+> ```
+> # yum install libnvidia-container-tools
+> ```
+> - If pod creation fails with the following error log:
+> ```
+> failed to generate spec: lstat /dev/nvidia-uvm: no such file or directory
+> ```
+> create this device file manually on the host where the pod runs:
+> ```
+> # nvidia-modprobe -u -c=0
+> ```
+
+Check the status of the created application:
+```
+# kubectl get pods
+NAME    READY   STATUS    RESTARTS   AGE
+nginx   1/1     Running   0          3s
+```
+View the GPU monitoring data (socat needs to be installed first):
+```
+# yum install socat
+```
+```
+# kubectl port-forward svc/gpu-manager-metric -n kube-system 5678:5678
+```
+```
+# curl http://127.0.0.1:5678/metric
+```
+The result looks like this:
+```
+Handling connection for 5678
+# HELP container_gpu_memory_total gpu memory usage in MiB
+# TYPE container_gpu_memory_total gauge
+container_gpu_memory_total{container_name="nginx",gpu_memory="gpu0",namespace="default",node="10.0.0.127",pod_name="nginx"} 0
+container_gpu_memory_total{container_name="nginx",gpu_memory="total",namespace="default",node="10.0.0.127",pod_name="nginx"} 0
+# HELP container_gpu_utilization gpu utilization
+# TYPE container_gpu_utilization gauge
+container_gpu_utilization{container_name="nginx",gpu="gpu0",namespace="default",node="10.0.0.127",pod_name="nginx"} 0
+container_gpu_utilization{container_name="nginx",gpu="total",namespace="default",node="10.0.0.127",pod_name="nginx"} 0
+# HELP container_request_gpu_memory request of gpu memory in MiB
+# TYPE container_request_gpu_memory gauge
+container_request_gpu_memory{container_name="nginx",namespace="default",node="10.0.0.127",pod_name="nginx",req_of_gpu_memory="total"} 5120
+# HELP container_request_gpu_utilization request of gpu utilization
+# TYPE container_request_gpu_utilization gauge
+container_request_gpu_utilization{container_name="nginx",namespace="default",node="10.0.0.127",pod_name="nginx",req_of_gpu="total"} 0.30000001192092896
+```
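+
+The request metrics match the pod spec: 5120 MiB is 20 × 256 MiB, and 0.3 is the requested fraction of a card. To cross-check these values against the live pod, one option (a sketch; the jsonpath output format varies slightly across `kubectl` versions) is:
+```
+# kubectl get pod nginx -o jsonpath='{.spec.containers[0].resources.limits}'
+```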
+
+#### Installing and using pGPU
+
+When creating a new standalone cluster, check the GPU option and select pGPU from the drop-down list, as shown below:
+
+![](../images/gpu-4.png)
+
+In the target machine section, check the GPU option; the platform then installs the GPU driver on the nodes automatically, as shown below:
+
+![](../images/gpu-2.png)
+
+Once the new cluster reaches the running state, log in to a cluster node and use `kubectl` to confirm that the `nvidia-device-plugin` pod is running in the `kube-system` namespace:
+```
+# kubectl get pods -n kube-system | grep nvidia
+nvidia-device-plugin-daemonset-frdh2   1/1     Running   0     64s
+```
+The node information shows the GPU resources and their usage:
+```
+# kubectl describe nodes
+```
+The output includes:
+```
+Capacity:
+  cpu:                8
+  ephemeral-storage:  154685884Ki
+  hugepages-1Gi:      0
+  hugepages-2Mi:      0
+  memory:             32779608Ki
+  nvidia.com/gpu:     1
+  pods:               256
+Allocatable:
+  cpu:                7800m
+  ephemeral-storage:  142558510459
+  hugepages-1Gi:      0
+  hugepages-2Mi:      0
+  memory:             31653208Ki
+  nvidia.com/gpu:     1
+  pods:               256
+```
+#### Creating a workload that uses pGPU
+
+- To create from the console, follow the vGPU creation steps above.
+
+- Creating from the command line:
+
+The following YAML creates a workload that uses one GPU:
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-operator-test
+spec:
+  restartPolicy: OnFailure
+  containers:
+  - name: cuda-vector-add
+    image: "tkestack/cuda-vector-add:v0.1"
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+```
+# kubectl create -f pod.yaml
+pod/gpu-operator-test created
+```
+Check the pod status and logs:
+```
+# kubectl get pods
+NAME                READY   STATUS      RESTARTS   AGE
+gpu-operator-test   0/1     Completed   0          4m51s
+```
+```
+# kubectl logs gpu-operator-test
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+Done
+```
+Checking the node information again shows that the GPU has been allocated:
+```
+# kubectl describe nodes
+```
+```
+Allocated resources:
+  (Total limits may be over 100 percent, i.e., overcommitted.)
+  Resource           Requests     Limits
+  --------           --------     ------
+  cpu                1137m (14%)  282m (3%)
+  memory             644Mi (2%)   1000Mi (3%)
+  ephemeral-storage  0 (0%)       0 (0%)
+  hugepages-1Gi      0 (0%)       0 (0%)
+  hugepages-2Mi      0 (0%)       0 (0%)
+  nvidia.com/gpu     1            1
+```
+
+#### Using GPUs on added nodes
+To use GPU resources on a newly added node, check the GPU option when adding the node, as shown below:
+
+![](../images/gpu-5.png)
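+
+After the node joins, one quick way to confirm that it advertises GPU resources to the scheduler is a custom-columns query (a sketch; pGPU nodes expose `nvidia.com/gpu`, while vGPU nodes expose `tencent.com/vcuda-core` and `tencent.com/vcuda-memory`, and dots in the resource name must be backslash-escaped):
+```
+# kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
+```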