Generally, when we deploy a Pod, the cluster's scheduler automatically selects a node for it. By default, the scheduler assumes resources are sufficient and tries to spread load as evenly as possible. Sometimes, however, we need finer-grained control over Pod scheduling. For this, Kubernetes provides the concept of affinity, which falls into two main categories: nodeAffinity and podAffinity.
1. Overview
Kubernetes supports restricting Pods to run on specified Nodes, or expressing a preference for certain Nodes. There are several ways to achieve this:
- NodeName: the simplest node selection method; the node is specified directly and the scheduler is skipped.
- NodeSelector: an early, simple control method that dispatches Pods to Nodes carrying specific labels via key-value pairs.
- NodeAffinity: an upgraded version of NodeSelector that supports richer rules and is more flexible (NodeSelector is planned to be deprecated eventually).
- PodAffinity: constrains which nodes a Pod can be scheduled onto based on the labels of Pods already running on a node, rather than the node's own labels.
2. Specify the scheduling node
1. NodeName
Pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's policy entirely; the match is mandatory. If the specified node does not exist, the Pod stays in the Pending state.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-nodename
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
  nodeName: loc-node36
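After applying the manifest, a quick check confirms the placement (nginx-nodename is the Pod defined above):
# kubectl get pod nginx-nodename -o wide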
2. NodeSelector
Nodes are selected through the Kubernetes label-selector mechanism: the scheduler matches node labels and then schedules the Pod onto the target node. This rule is a mandatory constraint. If no node carries the required label, the Pod stays in the Pending state.
Set the label on the node:
# kubectl label nodes loc-node36 zone=guangzhou
An example follows:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    zone: guangzhou
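Note that nodeSelector is evaluated only at scheduling time: if the zone label is later removed from the node, already-running Pods stay where they are, but new Pods with this selector will be Pending. The label can be removed with the standard kubectl syntax:
# kubectl label nodes loc-node36 zone-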
3. Affinity
1. Node affinity
pod.spec.affinity.nodeAffinity
- requiredDuringSchedulingIgnoredDuringExecution # hard strategy
  The hard strategy means the Pod must (or, with NotIn, must not) land on the specified nodes; if no node satisfies the rule, the Pod stays in the Pending state.
- preferredDuringSchedulingIgnoredDuringExecution # soft strategy
  The soft strategy expresses a preference: the Pod would rather (not) land on certain nodes, but if none is available, any other node will do.
Combining the soft and hard strategies gives a more precise node choice. The configuration below means the Pod must not land on loc-node36; any other node will do, but it should preferably land on a node whose app label has the value loc-node37.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-node-affinity
  labels:
    app: nginx-node-affinity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: node-affinity # must match the template metadata labels below
  template:
    metadata:
      labels:
        app: node-affinity
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
          name: nginx-web
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: # hard strategy
            nodeSelectorTerms: # under nodeAffinity, required rules use nodeSelectorTerms; multiple nodeSelectorTerms are ORed, one match suffices
            - matchExpressions: # multiple matchExpressions are ANDed; all must match before scheduling
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - loc-node36
          preferredDuringSchedulingIgnoredDuringExecution: # soft strategy
          - weight: 1
            preference:
              matchExpressions:
              - key: app
                operator: In
                values:
                - loc-node37
# nodes come with default labels
# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
loc-master35 Ready master 5d7h v1.18.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-master35,kubernetes.io/os=linux,node-role.kubernetes.io/master=
loc-node36 Ready <none> 5d5h v1.18.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-node36,kubernetes.io/os=linux
loc-node37 Ready <none> 5d4h v1.18.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=loc-node37,kubernetes.io/os=linux
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-node-affinity-7fcff98c5c-8whzf 1/1 Running 0 7h58m 172.16.100.209 loc-node37 <none> <none>
nginx-node-affinity-7fcff98c5c-nnwgr 1/1 Running 0 7h58m 172.16.100.210 loc-node37 <none> <none>
Key-value operators:
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Gt: the label's value is greater than the given value (integer comparison; see the sketch below)
- Lt: the label's value is less than the given value
- Exists: the label exists
- DoesNotExist: the label does not exist
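Gt and Lt apply only to nodeAffinity and compare the label's value as an integer. A minimal sketch (the node label gpu-count here is hypothetical):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-count   # hypothetical integer-valued node label
          operator: Gt
          values:
          - "2"            # only nodes where gpu-count > 2 qualify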
2. Pod affinity
The two methods above let a Pod select nodes. Sometimes we also want to schedule Pods based on their relationship to other Pods; podAffinity, introduced in Kubernetes 1.4, meets this need.
Similar to nodeAffinity, podAffinity has the two scheduling strategies below; the only difference is that for mutual exclusion we use the podAntiAffinity field instead.
pod.spec.affinity.podAffinity/podAntiAffinity
- requiredDuringSchedulingIgnoredDuringExecution # hard strategy
- preferredDuringSchedulingIgnoredDuringExecution # soft strategy
The following policy says that pod-affinity-pod must be deployed in the same topology domain as Pods whose app label has the value nginx, and should preferably not be deployed in the same topology domain as Pods whose app label has the value nginx2.
How should we understand a topology domain? With kubernetes.io/hostname, each host is in effect its own topology domain. We can also put a new label on hosts, such as kubernetes.io/zone=gz; the Pod can then be deployed on any Node within that topology domain and does not need to share a Node with the matching Pod.
apiVersion: v1
kind: Pod
metadata:
  name: pod-affinity-pod
  labels:
    app: nginx-affinity-pod
spec:
  containers:
  - name: pod-affinity-pod
    image: nginx:1.7.9
    imagePullPolicy: IfNotPresent
    ports:
    - name: web
      containerPort: 80
  affinity:
    podAffinity: # stay in the same topology domain
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app # label key
            operator: In
            values:
            - nginx # label value
        topologyKey: kubernetes.io/hostname # the domain boundary is the node's hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx2
          topologyKey: kubernetes.io/hostname
# kubectl get pods --show-labels -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
nginx 1/1 Running 0 5m30s 172.16.100.235 loc-node37 <none> <none> app=nginx
pod-affinity-pod 1/1 Running 0 5m26s 172.16.100.236 loc-node37 <none> <none> app=nginx-affinity-pod
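To illustrate the zone-style topology domain mentioned above, we could label both workers with a shared zone (the value gz is just an example):
# kubectl label nodes loc-node36 loc-node37 kubernetes.io/zone=gz
Changing topologyKey in the manifest above from kubernetes.io/hostname to kubernetes.io/zone would then let pod-affinity-pod land on either node in the zone, instead of only on the exact node running the app=nginx Pod.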
The affinity/anti-affinity scheduling strategies are summarized below:
Scheduling strategy | Match target | Operators | Topology domain support | Scheduling target
---|---|---|---|---
nodeAffinity | node labels | In, NotIn, Exists, DoesNotExist, Gt, Lt | No | Designated hosts
podAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Yes | The Pod lands in the same topology domain as the specified Pods
podAntiAffinity | Pod labels | In, NotIn, Exists, DoesNotExist | Yes | The Pod lands in a different topology domain from the specified Pods
4. Taints
Node affinity is an attribute of a Pod (a preference or a hard requirement) that attracts the Pod to a particular class of nodes. A Taint is the opposite: it lets a node repel a particular class of Pods.
Taints and Tolerations work together to keep Pods off inappropriate nodes. One or more Taints can be applied to a node, meaning Pods that cannot tolerate those Taints will not be accepted by it. Applying a Toleration to a Pod means the Pod can (but is not required to) be scheduled onto nodes carrying the matching Taints.
1. Taint composition
Each taint has the form:
key=value:effect
The key and value form the taint's label (value may be empty), and effect describes what the taint does to Pods. Currently effect supports the following three options:
- NoSchedule: Kubernetes will not schedule a Pod onto a Node with this taint
- PreferNoSchedule: Kubernetes will try to avoid scheduling a Pod onto a Node with this taint
- NoExecute: Kubernetes will not schedule a Pod onto a Node with this taint, and will evict Pods already running on the Node
2. Setting taints
You can set a taint on a Node with the kubectl taint command. Once a taint is set, the Node and Pods are in an exclusive relationship: the Node can refuse to schedule Pods that do not tolerate the taint, and with NoExecute it can even evict Pods already running on it.
# set a taint
kubectl taint nodes loc-node36 key1=value:NoSchedule
# in the node description, look for the Taints field
kubectl describe node loc-master35
# remove the taint
kubectl taint nodes loc-node36 key1:NoSchedule-
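The taints can also be read directly from the node object (standard kubectl jsonpath output):
# kubectl get node loc-node36 -o jsonpath='{.spec.taints}'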
5. Tolerations
A tainted Node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule, NoExecute), so to some extent Pods will not be dispatched to it. However, we can set a Toleration on a Pod: a Pod that tolerates a taint can be scheduled onto a Node where that taint exists.
pod.spec.tolerations
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
  tolerationSeconds: 3600 # only meaningful with the NoExecute effect
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"
- The key, value, and effect must match the taint set on the Node
- When operator is Exists, the value is ignored
- tolerationSeconds describes how long the Pod may keep running on the Node once it is marked for eviction
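Putting the fields together, here is a minimal sketch of a Pod that tolerates the key1=value:NoSchedule taint set on loc-node36 earlier (the Pod name nginx-toleration is illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: nginx-toleration
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
  tolerations:
  - key: "key1"          # matches the taint's key
    operator: "Equal"
    value: "value"       # matches the taint's value
    effect: "NoSchedule" # matches the taint's effect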
1. When no key is specified, all taint keys are tolerated:
tolerations:
- operator: "Exists"
2. When no effect is specified, all effects of the given key are tolerated:
tolerations:
- key: "key"
  operator: "Exists"
3. When there are multiple Masters, you can set the following taint to avoid wasting resources:
kubectl taint nodes master-nodename node-role.kubernetes.io/master:PreferNoSchedule
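Note that a kubeadm-built master typically already carries the hard taint node-role.kubernetes.io/master:NoSchedule, which blocks scheduling regardless of the softer taint, so it should be removed first (kubeadm-style setup assumed):
# kubectl taint nodes master-nodename node-role.kubernetes.io/master:NoSchedule-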