Taints and Tolerations of Kubernetes Scheduling

Generally, the Pods we deploy are placed on a node by the cluster's automatic scheduling. By default, the scheduler assumes resources are sufficient and tries to spread the load as evenly as possible. Sometimes, however, we need finer-grained control over how Pods are scheduled. For this, Kubernetes provides the concept of affinity, which falls into two main categories: nodeAffinity and podAffinity.

1. Overview

Kubernetes supports restricting Pods to run only on specified nodes, or expressing a preference that they run on certain nodes.

There are several ways to achieve this functionality:

NodeName: the simplest node selection method; the node is specified directly and the scheduler is skipped.
NodeSelector: an early, simple control method that dispatches Pods to nodes with specific labels via key-value pairs.
NodeAffinity: an upgraded version of NodeSelector that supports richer rules and is more flexible to use. (NodeSelector is expected to be phased out.)
PodAffinity: constrains which nodes a Pod can be scheduled to based on the labels of Pods already running on the node, rather than on node labels.

2. Specify the scheduling node

1. NodeName

Pod.spec.nodeName schedules the Pod directly onto the specified node, bypassing the Scheduler's scheduling policy entirely; it is a mandatory match. If the named node does not exist, the Pod stays in the Pending state.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-nodename
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
  nodeName: loc-node36
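
A quick way to verify the result (a sketch; the manifest filename is assumed):

# kubectl apply -f nginx-nodename.yaml
# kubectl get pod nginx-nodename -o wide

The NODE column should show loc-node36; if no node with that name exists, the Pod stays in Pending.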

2. NodeSelector

Nodes are selected through the Kubernetes label-selector mechanism: the scheduler matches node labels and then schedules the Pod onto a matching node. This rule is a hard constraint; if no node carries the selected label, the Pod stays in the Pending state.

Set the label on the node:

# kubectl label nodes loc-node36  zone=guangzhou

Examples are as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    zone: guangzhou
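
After applying the manifest, the Pod should land on a node carrying the zone=guangzhou label (a sketch of the checks):

# kubectl get pod nginx -o wide
# kubectl get nodes --show-labels | grep zone=guangzhou

If no node carries the label, the Pod stays in Pending until a matching node appears.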

3. Affinity

1. Node affinity

pod.spec.affinity.nodeAffinity

  • requiredDuringSchedulingIgnoredDuringExecution # hard strategy

     The hard strategy means the Pod must (or must not) land on the specified nodes; if the condition cannot be met, the Pod stays in the Pending state.
    
  • preferredDuringSchedulingIgnoredDuringExecution # soft strategy

    The soft strategy is a preference: the Pod would rather (or rather not) land on certain nodes, but if none are available it may land elsewhere.
    

Combining the soft and hard strategies gives a more precise node choice. The following configuration means the Pod must not land on node loc-node36; any other node is acceptable, but it should preferably land on a node whose app label has the value loc-node37.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-node-affinity
  labels:
    app: nginx-node-affinity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: node-affinity         # this label must match the template metadata labels below
  template:
    metadata:
      labels:
        app: node-affinity
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
          name: nginx-web
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:  # hard strategy
            nodeSelectorTerms:   # required rules under nodeAffinity use nodeSelectorTerms; multiple nodeSelectorTerms are ORed, satisfying any one is enough
            - matchExpressions:  # multiple matchExpressions are ANDed; all must match for the Pod to be scheduled
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - loc-node36
          preferredDuringSchedulingIgnoredDuringExecution:  # soft strategy
          - weight: 1
            preference:
              matchExpressions:
              - key: app
                operator: In
                values:
                - loc-node37
# nodes carry a set of default labels
# kubectl get nodes --show-labels
NAME           STATUS   ROLES    AGE    VERSION   LABELS
loc-master35   Ready    master   5d7h   v1.18.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-master35,kubernetes.io/os=linux,node-role.kubernetes.io/master=
loc-node36     Ready    <none>   5d5h   v1.18.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-node36,kubernetes.io/os=linux
loc-node37     Ready    <none>   5d4h   v1.18.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-node37,kubernetes.io/os=linux
# kubectl get pods -o wide 
NAME                                   READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
nginx-node-affinity-7fcff98c5c-8whzf   1/1     Running   0          7h58m   172.16.100.209   loc-node37   <none>           <none>
nginx-node-affinity-7fcff98c5c-nnwgr   1/1     Running   0          7h58m   172.16.100.210   loc-node37   <none>           <none>

Key-value operator semantics

  • In: the value of the label is in a given list
  • NotIn: the value of the label is not in a given list
  • Gt: the value of the label is greater than a given value
  • Lt: the value of the label is less than a given value
  • Exists: a label with the key exists
  • DoesNotExist: no label with the key exists
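
For example, Exists and Gt could be combined inside a nodeSelectorTerms entry like this (an illustrative fragment only; the disktype and cpu-count labels are invented):

- matchExpressions:
  - key: disktype          # hypothetical label; Exists only requires that the key be present
    operator: Exists
  - key: cpu-count         # hypothetical numeric label; Gt compares the value as an integer
    operator: Gt
    values:
    - "4"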

2. Pod affinity

The methods above let a Pod choose nodes. Sometimes we also want to schedule Pods based on the relationships between Pods. The podAffinity concept, introduced in Kubernetes 1.4, meets this need.

Similar to nodeAffinity, podAffinity has the two scheduling strategies below; the only difference is that for mutual exclusion we use the podAntiAffinity field.

pod.spec.affinity.podAffinity/podAntiAffinity

  • requiredDuringSchedulingIgnoredDuringExecution # hard strategy
  • preferredDuringSchedulingIgnoredDuringExecution # soft strategy

The following policy says that pod-affinity-pod must be deployed in the same topology domain as Pods whose app label has the value nginx, and should preferably not be deployed in the same topology domain as Pods whose app label has the value nginx2.

How should we understand a topology domain? With kubernetes.io/hostname as the topologyKey, each host is currently its own topology domain. We could also put a new label such as kubernetes.io/zone=gz on several hosts; the Pod could then be deployed on any Node within that topology domain and would not have to share a Node with the matching Pod.

apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-pod
  labels:
    app: nginx-affinity-pod
spec:
  containers:
  - name: pod-affinity-pod
    image: nginx:1.7.9
    imagePullPolicy: IfNotPresent
    ports:
    - name: web
      containerPort: 80
  affinity:
      podAffinity:                 # schedule into the same topology domain
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app           # label key
                operator: In
                values:
                - nginx            # label value
            topologyKey: kubernetes.io/hostname  # the topology domain is defined by the node hostname
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx2
            topologyKey: kubernetes.io/hostname
# kubectl get pods --show-labels -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES   LABELS
nginx              1/1     Running   0          5m30s   172.16.100.235   loc-node37   <none>           <none>            app=nginx
pod-affinity-pod   1/1     Running   0          5m26s   172.16.100.236   loc-node37   <none>           <none>            app=nginx-affinity-pod
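
To make the topology domain wider than a single host, as described above, the nodes could share a zone label and the manifest would use that key as the topologyKey (a sketch; the gz value follows the example in the text):

# give both worker nodes the same zone label
kubectl label nodes loc-node36 loc-node37 kubernetes.io/zone=gz

# in the Pod spec, matching Pods then only need to share the zone, not the node
#   topologyKey: kubernetes.io/zone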

The affinity/anti-affinity scheduling strategies compare as follows:

Scheduling strategy   Matches labels on   Operators                                Topology domain   Scheduling target
nodeAffinity          nodes               In, NotIn, Exists, DoesNotExist, Gt, Lt  not supported     the specified nodes
podAffinity           Pods                In, NotIn, Exists, DoesNotExist          supported         the same topology domain as the matching Pods
podAntiAffinity       Pods                In, NotIn, Exists, DoesNotExist          supported         a different topology domain from the matching Pods

4. Taints

Node affinity is a property of Pods (a preference or a hard requirement) that attracts Pods to a particular class of nodes. A Taint is the opposite: it lets a node repel a particular class of Pods.

Taints and Tolerations work together to keep Pods off inappropriate nodes. One or more Taints can be applied to each node, meaning the node will not accept Pods that cannot tolerate those Taints. Applying a Toleration to a Pod means the Pod can (but is not required to) be scheduled onto nodes with matching Taints.

1. Taint composition

Each taint has the form:

key=value:effect

Each taint has a key and a value as its label, where the value may be empty, and effect describes what the taint does. Currently effect supports the following three options:

  • NoSchedule: Kubernetes will not schedule Pods onto a Node with this taint
  • PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node with this taint
  • NoExecute: Kubernetes will not schedule Pods onto a Node with this taint, and will evict Pods already running on that Node

2. Taint settings

You can use the kubectl taint command to set a taint on a Node. Once a taint is set, the Node has an exclusive relationship with Pods: it can refuse to schedule Pods that do not tolerate the taint, and with NoExecute it can even evict Pods already running on it.

# set a taint
kubectl taint nodes loc-node36 key1=value:NoSchedule

# find the Taints field in the node description
kubectl describe node loc-master35

# remove the taint
kubectl taint nodes loc-node36 key1:NoSchedule-
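
The Taints field on the node shows whether the taint is currently applied (it reads "<none>" when the node has no taints):

kubectl describe node loc-node36 | grep Taints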

5. Tolerations

When a taint is set, the Node will, according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), repel Pods to a greater or lesser extent because of the mutually exclusive relationship between taints and Pods. However, we can set a toleration (Toleration) on a Pod; a Pod with a matching toleration can ignore the taint and be scheduled onto the tainted Node.

pod.spec.tolerations

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
  tolerationSeconds: 3600
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

  • key, value, and effect must match the taint set on the Node
  • when operator is Exists, the value is ignored
  • tolerationSeconds describes how long the Pod may keep running on the Node after it is marked for eviction; it is only valid together with the NoExecute effect
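
As a concrete sketch, a Pod that tolerates the key1=value:NoSchedule taint set on loc-node36 earlier could declare the toleration like this (the Pod name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-toleration
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"

With this toleration the Pod may be scheduled onto loc-node36 again, although it is not forced to land there.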

1. When no key is specified and operator is Exists, all taints are tolerated:

tolerations:
- operator: "Exists"

2. When no effect is specified, all taint effects for the given key are tolerated:

tolerations:
- key: "key"
  operator: "Exists"

3. When there are multiple masters, the following setting can be used to avoid wasting their resources:

kubectl taint nodes master-nodename node-role.kubernetes.io/master:PreferNoSchedule
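
Note that on a kubeadm-installed cluster the master usually also carries the hard node-role.kubernetes.io/master:NoSchedule taint, so to actually allow ordinary workloads there it typically has to be removed first (a sketch):

kubectl taint nodes master-nodename node-role.kubernetes.io/master:NoSchedule-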


