# GPU Plugins

## 1. Introduction

UK8S implements GPU sharing using the open-source component HAMi, including:

- Video memory partitioning
- Computing power partitioning
- Error isolation

## 2. Deployment

⚠️ Check that the system meets the following requirements before installation:

- **NVIDIA drivers**: Version ≥ 440
- **Kubernetes version**: ≥ 1.16
- **glibc**: Version ≥ 2.17 and < 2.3.1

### 2.1 Label GPU nodes

Label the GPU nodes to be scheduled by HAMi:

```
kubectl label nodes xxx.xxx.xxx.xxx gpu=on
```

### 2.2 Helm installation

⚠️ Helm version requirement: ≥ 3.0.

Check the Helm version after installation:

```
helm version
```

### 2.3 Obtain the chart

Download and extract the chart package:

```bash
wget https://docs.ucloud.cn/uk8s/yaml/gpu-share/hami.tar.gz
tar -xzf hami.tar.gz
rm hami.tar.gz
```

### 2.4 Install HAMi

```bash
helm install hami ./hami -n kube-system
```

Check the installation result:

```
kubectl get po -A
```

When the installation is successful, the output shows the pod status:

```
hami-device-plugin-l4jj4           2/2   Running   0   45s
hami-scheduler-59c7f4b6ff-7g565    2/2   Running   0   3m54s
```

### 2.5 Usage

Create a Pod via a YAML file. In `resources.limits`, besides the traditional `nvidia.com/gpu`, add `nvidia.com/gpumem` and `nvidia.com/gpucores` to specify the video memory size and GPU computing power.

- **nvidia.com/gpu**: Number of vGPUs requested (e.g., 1).
- **nvidia.com/gpumem**: Requested video memory size (e.g., 3000M).
- **nvidia.com/gpumem-percentage**: Video memory request percentage (e.g., 50 for 50% of video memory).
- **nvidia.com/gpucores**: Computing power percentage of each vGPU relative to the actual GPU.
- **nvidia.com/priority**: Priority level (0 for high, 1 for low; default is 1).

**High-priority tasks**: When a high-priority task shares a GPU node only with other high-priority tasks, its resource utilization is not limited by resourceCores. In other words, when only high-priority tasks exist on a GPU node, they can utilize all available resources on the node.

**Low-priority tasks**: If a low-priority task is the only one occupying the GPU, its resource utilization is also not limited by resourceCores. This means a low-priority task can utilize all available node resources when no other task shares the GPU.

## 3. Monitoring

### 3.1 If the monitoring center is not enabled

Enable the monitoring center.

### 3.2 If the monitoring center is already enabled

> ⚠️ If the monitoring center version is ≥ 1.0.5-3 and < 1.0.6, or is > 1.0.6, the following deployment files are installed by default. Skip the deployment in 3.2.1.

#### 3.2.1 Deploy Dcgm-Exporter

```
kubectl get po -A
```

#### 3.2.2 Add monitoring targets in UK8S

In the monitoring targets of the UK8S monitoring center, add the monitoring targets as shown in the diagram.

### 3.3 Grafana monitoring

After adding the monitoring targets in UK8S, log in to Grafana and import the dashboard:

- Download the JSON file.
- Select the '+' icon in the left navigation bar.
- Click Import.
- Paste the downloaded JSON content into the second input box.
- Click Load.

The following is a schematic diagram of Grafana monitoring for HAMi:

![img](https://cdn.udelivrs.com/2025/06/7e6e6763dcab208fe12d71a759c52aa4_1751006643838.png)
### 3.4 Monitoring indicators

In addition to the indicators from the DCGM plugin (described in the GPU monitoring documentation), HAMi also supports the following indicators:

- **Device_memory_desc_of_container**: Real-time device memory usage in the container, used to monitor the memory consumption of devices (e.g., GPU) for each container.
- **Device_utilization_desc_of_container**: Real-time device utilization rate of the container, used to monitor the usage of internal devices (e.g., GPU workload).
- **HostCoreUtilization**: Real-time core utilization rate of the host, used to monitor the CPU core usage of the host, including multiple workloads from containers or virtualization.
- **HostGPUMemoryUsage**: Real-time memory usage of GPU devices on the host, used to monitor memory consumption by containers or tasks using GPUs on the host.
- **vGPU_device_memory_limit_in_bytes**: Memory limit for a container's vGPU (virtual GPU) device, in bytes. This is the maximum GPU memory the container can use.
- **vGPU_device_memory_usage_in_bytes**: Actual memory usage of a container's vGPU device, in bytes, used to monitor vGPU memory consumption of the container.

## 4. Testing

### 4.1 Node GPU Resource Verification

In the test environment, the number of physical GPUs is 1. However, since HAMi's default configuration sets the expansion ratio to 10x, the node should theoretically report 1 * 10 = 10 GPUs.

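The arithmetic above can be sketched as a small shell helper, which may be handy when a node has more than one physical GPU. This is only an illustration: the function name `expected_vgpus` is hypothetical, and the ratio of 10 is the default stated above, passed in as an argument rather than read from the cluster.

```shell
# Hypothetical helper: the nvidia.com/gpu capacity a node is expected to report.
# $1 = number of physical GPUs on the node
# $2 = expansion ratio (optional; defaults to 10, the default described above)
expected_vgpus() {
  local physical="$1"
  local ratio="${2:-10}"
  echo $(( physical * ratio ))
}

# One physical GPU with the default ratio, as in the test environment.
expected_vgpus 1
```

The printed value can then be compared with the `nvidia.com/gpu` entry in the node's `capacity` block.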
#### Verification Command

Execute the following command to get the Node's GPU resource quantity:

```
kubectl get node xxx -oyaml | grep capacity -A 8
```

Sample Output:

```yaml
capacity:
  cpu: "16"
  ephemeral-storage: 102687672Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 32689308Ki
  nvidia.com/gpu: "10"
  pods: "110"
  ucloud.cn/uni: "16"
```

### 4.2 GPU Memory Test

The configuration of gpu-mem-test.yaml is as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: uhub.service.ucloud.cn/library/ubuntu:trusty-20160412
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 1 # Request 1 vGPU
          nvidia.com/gpumem: 3000 # Allocate 3000MiB memory per vGPU (optional, integer)
          nvidia.com/gpucores: 30 # Allocate 30% computing power per vGPU (optional, integer)
```

Create the Pod using this configuration. It should start normally.

Verification steps:

```
kubectl get po
```

Expected output:

```
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          48s
```

Enter the Pod and execute the nvidia-smi command to view GPU information, and verify that the video memory limit is 3000M as requested in the resources.

### 4.3 Multiple Pods Single GPU Utilization Test

#### Test Command

First, create hami-npod-1gpu.yml with the following content. Replace the GPU node IP to specify the target GPU node. This configuration creates n Pods (controlled by modifying the replica count in the YAML). Manually delete them after testing.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hami-npod-1gpu
spec:
  replicas: 3 # Create three identical Pods (adjust as needed)
  selector:
    matchLabels:
      app: pytorch
  template:
    metadata:
      labels:
        app: pytorch
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - xxx.xxx.xxx.xxx # Replace with actual GPU node IP
      containers:
        - name: pytorch-container
          image: uhub.service.ucloud.cn/gpu-share/gpu_pytorch_test:latest
          command: ["/bin/sh", "-c"]
          args: ["cd /app/pytorch_code && python3 2.py"]
          resources:
            limits:
              nvidia.com/gpu: 1
              nvidia.com/gpumem: 3000
              nvidia.com/gpucores: 25
```

#### Test Results

Monitoring shows that the long-term average computing power consumption of the three Pods aligns with the configured limits, though short-term fluctuations may occur.
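
One way to make "aligns with the configured limits" concrete is to average a series of utilization samples for a Pod and compare the mean with its `nvidia.com/gpucores` value (25 in the Deployment above). The sketch below assumes the samples have already been collected by hand, for example read off the Grafana panel, and are passed as percentages. The function names, the sample values, and the 5-point tolerance are illustrative assumptions, not measurements and not part of HAMi.

```shell
# avg_util: print the mean of the numeric arguments, rounded to one decimal.
avg_util() {
  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# within_limit: succeed if the mean of the samples is at most limit + tolerance.
# $1 = configured gpucores limit (%), $2 = tolerance (%), remaining args = samples (%)
within_limit() {
  local limit="$1" tol="$2"
  shift 2
  local mean
  mean="$(avg_util "$@")"
  awk -v m="$mean" -v l="$limit" -v t="$tol" 'BEGIN { exit !(m <= l + t) }'
}

# Made-up samples for illustration only: individual readings exceed 25,
# but the mean over the window is 25.0.
avg_util 22 31 24 27 21
within_limit 25 5 22 31 24 27 21 && echo "within limit"
```

Averaging over a window rather than checking each sample reflects the observation above that short-term fluctuations may occur even when the long-term average respects the limit.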