Building k8s with RKE and deploying Ali GPU Sharing to enable GPU sharing

Statement

Everything in this article is based on Docker. The k8s cluster is built with the rke tool provided by Rancher (hereinafter the "rancher version" of k8s; the steps also apply to clusters created through the Rancher UI), and the GPU sharing technology is Ali GPU Sharing. The steps may not apply if you use a different container runtime, and parts may not apply to clusters built with kubeadm. For deploying GPU Sharing on kubeadm-built k8s there is already plenty of information online and in the official documentation, whereas the rancher version of k8s differs from native Kubernetes; the specific differences are called out later in the article.

Preparation

If you already have a rancher version k8s cluster, start directly from "k8s cluster construction".

Install docker and nvidia-docker2

To install Docker, directly execute the official installation script:

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

After the installation is complete, check the version with docker version. The script currently installs version 20.10 in most cases; if the version query succeeds, the installation was successful.

After installation, execute the following commands to start Docker and set it to start on boot:

systemctl start docker
systemctl enable docker

To install nvidia-docker2, refer to the nvidia-docker installation section of the previous article "Ubuntu Realizes K8S Scheduling NVIDIA GPU Notes".
After the installation is complete, we need to change Docker's default runtime so that it supports nvidia scheduling. Edit the /etc/docker/daemon.json configuration (create it if it does not exist):

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "exec-opts": ["native.cgroupdriver=systemd"]
}

Here, the runtimes parameter defines available runtimes: a runtime environment named nvidia is defined, and default-runtime specifies the just-defined nvidia as the default.

The last entry, "exec-opts": ["native.cgroupdriver=systemd"], is needed because Docker and k8s must use the same cgroup driver; if one uses cgroupfs and the other systemd, containers can fail to start, so this option sets Docker's cgroup driver to systemd so the two agree.
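
After editing daemon.json, restart Docker and check that the new default runtime is active; for example (the exact docker info output format can vary by Docker version):

# Restart Docker to apply the new daemon.json
systemctl restart docker

# "Default Runtime: nvidia" should appear in the output
docker info | grep -i runtime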

GPU driver

For GPU driver installation and simple scheduling, please refer to the NVIDIA driver section in the previous article "Ubuntu Realizes K8S Scheduling NVIDIA GPU Notes".
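
Assuming the driver and nvidia-docker2 are in place, a quick sanity check is to run nvidia-smi both on the host and inside a container; the CUDA image tag below is only an example, pick one that matches your driver version:

# On the host
nvidia-smi

# In a container, using the nvidia default runtime configured above
docker run --rm nvidia/cuda:10.1-base nvidia-smi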

Download the rke tool

Before downloading rke, decide whether you need a specific k8s version or a specific Rancher version. In the rancher repository on GitHub, you can look up, for each Rancher version, the supported k8s versions and the required rke tool version. Conversely, if you have already fixed the k8s version, you can work backwards to the Rancher and rke versions.

My requirement here is Rancher 2.3.8; the corresponding rke version is 1.0.9, and the k8s versions supported by rke 1.0.9 are as follows:
[screenshot: k8s versions supported by rke v1.0.9]
Here I use k8s v1.17.6-rancher2-1.

After determining the versions of the various tools, download them. Adjust the links below to match your chosen versions; they are not fixed:

# Download rke
wget https://github.com/rancher/rke/releases/download/v1.0.9/rke_linux-amd64

# Set up rke
mv rke_linux-amd64 rke
chmod 755 rke
mv rke /usr/local/bin/

# Verify
rke --version

k8s cluster construction

rke creates the cluster from a yaml file. I use a single node here. First, create a directory for the rke cluster file:

mkdir rke-config
cd rke-config

Below is the content of the rke cluster file; edit it with vim cluster.yml. If you have an existing cluster, edit your cluster yaml instead and add the entire scheduler node under services (this also applies to clusters created directly with Rancher; just make the change through the rancher-ui):

nodes:
  - address: 192.168.1.102
    user: root
    role:
      - controlplane
      - etcd
      - worker
services: 
    scheduler:
      extra_args:
        address: 0.0.0.0
        kubeconfig: /etc/kubernetes/ssl/kubecfg-kube-scheduler.yaml
        leader-elect: 'true'
        policy-config-file: /etc/kubernetes/ssl/scheduler-policy-config.json
        profiling: 'false'
        v: '2'
kubernetes_version: "v1.17.6-rancher2-1"
cluster_name: "aliyun"

The nodes section configures the cluster hosts. This is a single node, so its role must include all three types: controlplane, etcd, and worker. The scheduler node under services holds the scheduling-related configuration, and here lies a difference between the rancher version of k8s and native k8s: in native k8s, kube-scheduler exists as an executable file, while in rancher it runs as a container. The scheduler container's current configuration can be viewed with docker inspect kube-scheduler, as shown below; copy it here and add policy-config-file: /etc/kubernetes/ssl/scheduler-policy-config.json.
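
For example, the scheduler container's configuration can be dumped like this (before rke up the new flag will of course not be there yet):

# Dump the kube-scheduler container configuration; its command-line flags appear in the output
docker inspect kube-scheduler

# After rke up, the new flag should be present
docker inspect kube-scheduler | grep policy-config-file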

We fetch scheduler-policy-config.json from GitHub and put it under /etc/kubernetes/ssl/. If there are multiple master nodes, run this on every master node:

cd /etc/kubernetes/ssl/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json
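
For reference, this file registers the gpushare extender with kube-scheduler; at the time of writing it looks roughly like the following (treat the copy in the repository as authoritative):

{
    "kind": "Policy",
    "apiVersion": "v1",
    "extenders": [
        {
            "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
            "filterVerb": "filter",
            "bindVerb": "bind",
            "enableHttps": false,
            "nodeCacheCapable": true,
            "managedResources": [
                {
                    "name": "aliyun.com/gpu-mem",
                    "ignoredByScheduler": false
                }
            ],
            "ignorable": false
        }
    ]
}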

Then, from the directory containing the cluster.yml created in the previous step, bring up the cluster:

rke up

If it complains that the Docker version is unsupported, add a flag to ignore the version check:

rke up --ignore-docker-version

Once this finishes, the cluster is up; then do the following configuration:

mkdir ~/.kube
# kube_config_cluster.yml is generated in the current directory when rke up creates the cluster
cp kube_config_cluster.yml ~/.kube
mv ~/.kube/kube_config_cluster.yml ~/.kube/config

Next install kubectl to manage the cluster:

curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
# Set up kubectl
chmod 755 ./kubectl
mv ./kubectl /usr/local/bin/kubectl
# Check the version
kubectl version

Use kubectl to view pods:

# List pods
kubectl get pods

At this point the k8s cluster is built; next comes the GPU sharing configuration.

GPU Sharing deployment

  1. Deploy the GPU sharing scheduling plug-in gpushare-schd-extender:
cd /tmp/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
# Because this is a single-node cluster, pods must be schedulable on the master,
# so delete the following two lines from gpushare-schd-extender.yaml
# nodeSelector:
#    node-role.kubernetes.io/master: ""
# so that k8s can schedule onto the master
kubectl create -f gpushare-schd-extender.yaml
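
A quick way to confirm the extender came up (it is deployed into the kube-system namespace):

kubectl get pods -n kube-system | grep gpushare
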
  2. Deploy the device plugin gpushare-device-plugin

If your cluster is not newly built and you installed nvidia-device-plugin before, you need to delete it first. On the rancher version of k8s, you can find the nvidia-device-plugin pod with kubectl get pods and simply delete it. Then deploy the device plugin gpushare-device-plugin:

cd /tmp/
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f device-plugin-rbac.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
# By default, GPU memory is counted in GiB; to use MiB instead, change --memory-unit=GiB to --memory-unit=MiB in this file
kubectl create -f device-plugin-ds.yaml
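
You can check the device plugin the same way; note that, if I read the DaemonSet spec correctly, it only schedules onto nodes labeled gpushare=true, which is done in the next step:

kubectl get pods -n kube-system -o wide | grep gpushare-device
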
  3. Label the GPU node

In order to schedule GPU programs onto servers that have GPUs, the node needs to be labeled gpushare=true:

# List all nodes
kubectl get nodes
# Label the GPU node
kubectl label node <target_node> gpushare=true
# For example, my host is named master, so the command is:
# kubectl label node master gpushare=true
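
To confirm the label took effect:

# Show the gpushare label as a column next to each node
kubectl get nodes -L gpushare
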
  4. Extend kubectl with the gpushare inspection plugin:
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x kubectl-inspect-gpushare
mv kubectl-inspect-gpushare /usr/local/bin

Then execute kubectl inspect gpushare; if you can see the GPU information, the installation succeeded:

[screenshot: kubectl inspect gpushare output]

As shown, the total GPU memory at this point is 7981 MiB and usage is 0.

Test

Next, to test, we fetch Alibaba Cloud's sample programs:

wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/samples/1.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/samples/2.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/samples/3.yaml

These three files are yamls for sample containers that request GPU scheduling; each can be started directly with kubectl create -f x.yaml. The GPU amounts requested in the stock files are in units of GiB; here I have modified the request values. The resource parameter is named aliyun.com/gpu-mem: the first requests 128, the second 256, and the third 512. Start them one by one and observe the GPU usage; a sketch of the request format is shown below, followed by the usage after each start.
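
For reference, the GPU request in each sample is just a resource limit on the container. Here is a minimal hypothetical pod showing the shape of the modification (the name and image are illustrative, not taken from the sample files):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo        # hypothetical name for illustration
spec:
  containers:
    - name: demo
      image: nvidia/cuda:10.1-base   # example image only
      command: ["sleep", "infinity"]
      resources:
        limits:
          # shared GPU memory requested via the gpushare resource
          aliyun.com/gpu-mem: 128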
Start the first one:
[screenshot: GPU usage after the first sample starts]
Start the second one:
[screenshot: GPU usage after the second sample starts]
Start the third one:
[screenshot: GPU usage after the third sample starts]

With that, GPU sharing on the rancher version of k8s is configured successfully.

Origin: blog.csdn.net/u012751272/article/details/120566202