# GPU Plugins

## 1. Introduction

UK8S implements GPU sharing with the open-source component HAMi, which provides:

- Video memory partitioning
- Computing power partitioning
- Error isolation

## 2. Deployment

⚠️ Check that the system meets the following requirements before installation:

- **NVIDIA drivers**: Version ≥ 440
- **Kubernetes version**: ≥ 1.16
- **glibc**: Version ≥ 2.17 and < 2.3.1
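The minimum versions above can be checked mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available for version ordering. `version_ge` and `check` are hypothetical helper names, and the version strings passed at the bottom are illustrative values, not output from a real node. Only the lower bounds are compared.

```shell
# version_ge A B: succeed when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check NAME ACTUAL MINIMUM: print a pass/fail line for one requirement.
check() {
  if version_ge "$2" "$3"; then
    echo "OK    $1 $2 (>= $3)"
  else
    echo "FAIL  $1 $2 (needs >= $3)"
  fi
}

# Illustrative values; substitute the versions reported on the node by
# `nvidia-smi`, `kubectl version`, and `ldd --version`.
check "NVIDIA driver" "535.104.05" "440"
check "Kubernetes" "1.28.2" "1.16"
check "glibc" "2.28" "2.17"
```

Comparing with `sort -V` rather than as plain strings or decimals matters here: as a version, 1.9 is older than 1.16, which a numeric comparison would get wrong.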

### 2.1 Label GPU nodes to be scheduled by HAMi

```bash
kubectl label nodes xxx.xxx.xxx.xxx gpu=on
```
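When several nodes need the label, the step can be wrapped in a small helper. This is a sketch only: `label_gpu_nodes` is a hypothetical function name and the node names in the usage comment are placeholders. It uses the standard `--overwrite` flag of `kubectl label` and the `-l` label selector of `kubectl get`.

```shell
# label_gpu_nodes NODE...: add gpu=on to each node, then list the labeled nodes.
label_gpu_nodes() {
  for node in "$@"; do
    # --overwrite makes the call safe to repeat on an already-labeled node.
    kubectl label nodes "$node" gpu=on --overwrite || return 1
  done
  # -l filters by label, so only nodes carrying gpu=on are listed.
  kubectl get nodes -l gpu=on
}

# Usage (requires cluster access; the node names are placeholders):
#   label_gpu_nodes 10.0.0.11 10.0.0.12
```

Listing the nodes by label afterwards confirms that every intended GPU node is actually selected.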

### 2.2 Helm installation

⚠️ HAMi requires Helm ≥ 3.0. After installing Helm, check its version:

```bash
helm version
```

### 2.3 Obtain the chart

Download and extract the chart package:

```bash
wget https://docs.ucloud.cn/uk8s/yaml/gpu-share/hami.tar.gz
tar -xzf hami.tar.gz
rm hami.tar.gz

```

### 2.4 Install HAMi

```bash
helm install hami ./hami -n kube-system

```

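Check the installation result. The chart was installed into `kube-system`, so list the HAMi Pods in that namespace:

```bash
kubectl get po -n kube-system | grep hami
```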

If the installation succeeded, the output shows the HAMi Pods in the `Running` state:

```
hami-device-plugin-l4jj4                2/2     Running   0          45s
hami-scheduler-59c7f4b6ff-7g565         2/2     Running   0          3m54s
```
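The readiness check can also be scripted instead of read by eye. The sketch below assumes the default column layout of `kubectl get po` for a single namespace (NAME, READY, STATUS, ...) and Pod names that start with `hami-`, as in the sample output. `hami_pods_ready` is a hypothetical helper name.

```shell
# hami_pods_ready: read `kubectl get po` output on stdin. Succeeds only if at
# least one hami-* Pod is listed and every one is Running with all of its
# containers ready (READY column of the form N/N).
hami_pods_ready() {
  awk '
    $1 ~ /^hami-/ {
      seen++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}

# Usage (requires cluster access):
#   kubectl get po -n kube-system | hami_pods_ready && echo "HAMi is up"
```

Requiring at least one matching Pod avoids reporting success when the chart was installed into a different namespace and nothing matches.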

### 2.5 Usage

Create a Pod from a YAML file. In `resources.limits`, besides the usual `nvidia.com/gpu`, add `nvidia.com/gpumem` and `nvidia.com/gpucores` to specify the video memory size and the GPU computing power. The supported resource names are:

- **nvidia.com/gpu**: Number of vGPUs requested (e.g., 1).
- **nvidia.com/gpumem**: Video memory requested per vGPU, as an integer in MiB (e.g., 3000).
- **nvidia.com/gpumem-percentage**: Video memory requested as a percentage of the GPU's video memory (e.g., 50 for 50%).
- **nvidia.com/gpucores**: Computing power of each vGPU as a percentage of the physical GPU (e.g., 30 for 30%).
- **nvidia.com/priority**: Priority level (0 for high, 1 for low; default is 1).

**High-priority tasks**: If a high-priority task shares a GPU node only with other high-priority tasks, its resource utilization is not limited by `resourceCores` (the `nvidia.com/gpucores` value). In other words, when only high-priority tasks exist on a GPU node, they can use all available resources on the node.

**Low-priority tasks**: If a low-priority task is the only task occupying the GPU, its resource utilization is likewise not limited by `resourceCores`. It can use all available resources as long as no other task shares the GPU.
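To make the resource names concrete, the following sketch renders a Pod manifest that requests memory by percentage instead of by size. `render_vgpu_pod` and the Pod name `vgpu-demo` are hypothetical, the container name is arbitrary, and the image is the Ubuntu test image used in the testing section of this document.

```shell
# render_vgpu_pod NAME MEM_PERCENT CORES_PERCENT: print a Pod manifest to stdout.
render_vgpu_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
    - name: main
      image: uhub.service.ucloud.cn/library/ubuntu:trusty-20160412
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem-percentage: $2
          nvidia.com/gpucores: $3
EOF
}

# Usage (requires cluster access): half of the video memory, 30% of the cores.
#   render_vgpu_pod vgpu-demo 50 30 | kubectl apply -f -
```

Printing the manifest to stdout keeps the helper side-effect free, so the output can be reviewed before it is piped to `kubectl apply -f -`.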

## 3. Monitoring

### 3.1 If the monitoring center is not enabled

Enable the monitoring center.

### 3.2 If the monitoring center is already enabled

> ⚠️ If the monitoring center version is ≥ 1.0.5-3 and < 1.0.6, or is > 1.0.6, the following deployment files are installed by default, so you can skip the deployment in 3.2.1.

#### 3.2.1 Deploy Dcgm-Exporter

```
kubectl get po -A
```

#### 3.2.2 Add monitoring targets in UK8S

On the monitoring targets page of the UK8S monitoring center, add the monitoring targets as shown in the diagram.

### 3.3 Grafana monitoring

After adding the monitoring targets in UK8S, log in to Grafana and import the dashboard:

- Download the JSON file.
- Select the '+' icon in the left navigation bar.
- Click Import.
- Paste the downloaded JSON content into the second input box.
- Click Load.

The following is a schematic diagram of Grafana monitoring for HAMi:

![img](https://cdn.udelivrs.com/2025/06/7e6e6763dcab208fe12d71a759c52aa4_1751006643838.png)

### 3.4 Monitoring metrics

In addition to the metrics from the DCGM plugin (described in the GPU monitoring documentation), HAMi exposes the following metrics:

- **Device_memory_desc_of_container**: Real-time device memory usage of a container, for monitoring how much device (e.g., GPU) memory each container consumes.
- **Device_utilization_desc_of_container**: Real-time device utilization of a container, for monitoring the container's device (e.g., GPU) workload.
- **HostCoreUtilization**: Real-time GPU core utilization on the host, covering the workloads of all containers that use the GPU.
- **HostGPUMemoryUsage**: Real-time GPU memory usage on the host, for monitoring the memory consumed by containers or tasks that use GPUs on the host.
- **vGPU_device_memory_limit_in_bytes**: Memory limit of a container's vGPU (virtual GPU) device, in bytes. This is the maximum GPU memory the container can use.
- **vGPU_device_memory_usage_in_bytes**: Actual memory usage of a container's vGPU device, in bytes, for monitoring the container's vGPU memory consumption.
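The two `vGPU_device_memory_*` metrics are most useful together, since their ratio shows how close a container is to its memory limit. A minimal sketch of that calculation follows. `vgpu_mem_pct` is a hypothetical helper, and the byte values are sample inputs rather than measured data.

```shell
# vgpu_mem_pct USAGE_BYTES LIMIT_BYTES: print usage as a percentage of the limit.
vgpu_mem_pct() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) { print "n/a"; exit }   # guard against a missing limit
    printf "%.1f\n", used / limit * 100
  }'
}

# Sample values: 1500 MiB used out of a 3000 MiB limit.
vgpu_mem_pct $((1500 * 1024 * 1024)) $((3000 * 1024 * 1024))   # prints 50.0
```

A value that stays near 100 suggests the `nvidia.com/gpumem` limit of that container is too tight for its workload.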

## 4. Testing

### 4.1 Node GPU Resource Verification

The test environment has 1 physical GPU. Because HAMi's default configuration sets the expansion ratio to 10, the node should report 1 × 10 = 10 GPUs.

#### Verification Command

Run the following command to view the node's GPU resource count:

```bash
kubectl get node xxx -oyaml | grep capacity -A 8
```

Sample Output:

```yaml
capacity:
  cpu: "16"
  ephemeral-storage: 102687672Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 32689308Ki
  nvidia.com/gpu: "10"
  pods: "110"
  ucloud.cn/uni: "16"

```
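The same check can be done without scanning YAML by reading the single capacity field with a jsonpath query and comparing it to the expected product. This is a sketch under the assumptions stated in this section (1 physical GPU, ratio of 10). The three function names are hypothetical.

```shell
# expected_vgpus PHYSICAL_GPUS RATIO: vGPU count the node should advertise.
expected_vgpus() {
  echo $(( $1 * $2 ))
}

# node_vgpu_capacity NODE: read status.capacity["nvidia.com/gpu"] from the API.
# The dots in the resource name are escaped so jsonpath treats it as one key.
node_vgpu_capacity() {
  kubectl get node "$1" -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
}

# verify_vgpus NODE PHYSICAL_GPUS RATIO: compare advertised and expected counts.
verify_vgpus() {
  want="$(expected_vgpus "$2" "$3")"
  got="$(node_vgpu_capacity "$1")"
  if [ "$got" = "$want" ]; then
    echo "OK: $1 advertises $got vGPUs"
  else
    echo "MISMATCH: $1 advertises '$got', expected $want"
    return 1
  fi
}

# Usage (requires cluster access; replace xxx with the node name):
#   verify_vgpus xxx 1 10
```

An empty value in the mismatch message usually means the node does not expose `nvidia.com/gpu` at all, for example because it lacks the `gpu=on` label.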

### 4.2 GPU Memory Test

The configuration of gpu-mem-test.yaml is as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: uhub.service.ucloud.cn/library/ubuntu:trusty-20160412
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 1       # Request 1 vGPU
          nvidia.com/gpumem: 3000 # Allocate 3000MiB memory per vGPU (optional, integer)
          nvidia.com/gpucores: 30 # Allocate 30% computing power per vGPU (optional, integer)
```

Create the Pod with `kubectl apply -f gpu-mem-test.yaml`. It should start normally. Check its status:

```bash
kubectl get po
```

Expected output:

```
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          48s

```

Enter the Pod and run `nvidia-smi` to view the GPU information, then verify that the reported video memory limit is the 3000 MiB requested in `resources.limits`.
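For a non-interactive version of this check, `nvidia-smi` can be queried for the total memory only. The sketch below assumes that the limit enforced by HAMi appears as the total memory reported inside the container, and that the value is reported exactly as requested. Both function names are hypothetical. The `--query-gpu` and `--format` options are standard `nvidia-smi` flags.

```shell
# pod_gpu_mem_mib POD: total GPU memory (MiB) that nvidia-smi reports in the Pod.
pod_gpu_mem_mib() {
  kubectl exec "$1" -- nvidia-smi \
    --query-gpu=memory.total --format=csv,noheader,nounits |
    head -n 1 | tr -d ' \r'
}

# check_gpu_mem_limit POD EXPECTED_MIB: succeed if the reported total matches.
check_gpu_mem_limit() {
  got="$(pod_gpu_mem_mib "$1")"
  if [ "$got" = "$2" ]; then
    echo "OK: $1 sees ${got} MiB"
  else
    echo "MISMATCH: $1 sees '${got}' MiB, expected $2"
    return 1
  fi
}

# Usage (requires cluster access):
#   check_gpu_mem_limit gpu-pod 3000
# Interactive alternative:
#   kubectl exec -it gpu-pod -- nvidia-smi
```

Only the first GPU line is read, which is sufficient here because the Pod requests a single vGPU.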

### 4.3 Multiple Pods Single GPU Utilization Test

#### Test Command

First, create `hami-npod-1gpu.yml` with the following content. Replace the placeholder under `values` with the IP of the target GPU node.

This configuration creates n Pods, where n is the `replicas` value in the YAML. Delete them manually after testing.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hami-npod-1gpu
spec:
  replicas: 3  # Create three identical Pods (adjust as needed)
  selector:
    matchLabels:
      app: pytorch
  template:
    metadata:
      labels:
        app: pytorch
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - xxx.xxx.xxx.xxx # Replace with actual GPU node IP
      containers:
      - name: pytorch-container
        image: uhub.service.ucloud.cn/gpu-share/gpu_pytorch_test:latest
        command: ["/bin/sh", "-c"]
        args: ["cd /app/pytorch_code && python3 2.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
            nvidia.com/gpumem: 3000
            nvidia.com/gpucores: 25
```
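The test run and the manual cleanup can be scripted as follows. This is a sketch that assumes the manifest is saved as `hami-npod-1gpu.yml` in the current directory. `run_npod_test` and `cleanup_npod_test` are hypothetical helper names, and the Deployment name and `app=pytorch` label come from the manifest above.

```shell
# run_npod_test REPLICAS: apply the manifest, set the Pod count, list the Pods.
run_npod_test() {
  kubectl apply -f hami-npod-1gpu.yml || return 1
  kubectl scale deployment hami-npod-1gpu --replicas="$1" || return 1
  # -o wide adds the NODE column, showing that all replicas share the GPU node.
  kubectl get po -l app=pytorch -o wide
}

# cleanup_npod_test: delete the Deployment and, with it, the test Pods.
cleanup_npod_test() {
  kubectl delete -f hami-npod-1gpu.yml
}

# Usage (requires cluster access):
#   run_npod_test 3
#   cleanup_npod_test
```

Scaling with `kubectl scale` changes n without editing the file, which is convenient when repeating the test with different Pod counts.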

#### Test Results

Monitoring shows that the long-term average computing power consumption of the three Pods aligns with the configured limits, though short-term fluctuations may occur.
