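Introduction
The scheduler's main task is to assign defined Pods to cluster nodes. Once started, it keeps watching the API Server for Pods whose PodSpec.NodeName is empty, and for each such Pod it creates a binding that states which node the Pod should be placed on.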
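Scheduling process
First, nodes that do not satisfy the conditions are filtered out; this step is called predicate. Next, the nodes that passed are ranked by priority; this step is called priority. Finally, the node with the highest priority is selected. If any step along the way fails, an error is returned immediately.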
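The three steps above can be sketched in a few lines of Python. This is only a toy model of the filter-then-rank idea, not the real scheduler code: the `free_cpu` and `cpu` fields and the two rule functions are assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Pod = Dict[str, object]
Node = Dict[str, object]
Predicate = Callable[[Pod, Node], bool]   # True if the node may run the Pod
Priority = Callable[[Pod, Node], float]   # higher score means a better fit


def schedule(pod: Pod, nodes: List[Node],
             predicates: List[Predicate], priorities: List[Priority]) -> str:
    """Return the name of the node chosen for the Pod."""
    # Step 1 (predicate): drop every node that fails any filter.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        # A failed step is reported as an error instead of guessing a node.
        raise RuntimeError("no feasible node for pod %s" % pod["name"])
    # Step 2 (priority): add up the scores from every ranking function.
    # Step 3: take the highest total; ties go to the first node in list order.
    best = max(feasible, key=lambda n: sum(f(pod, n) for f in priorities))
    return str(best["name"])


# Hypothetical rules: a node needs enough free CPU, and more free CPU ranks higher.
def enough_cpu(pod: Pod, node: Node) -> bool:
    return node["free_cpu"] >= pod["cpu"]


def most_free_cpu(pod: Pod, node: Node) -> float:
    return float(node["free_cpu"])


nodes = [
    {"name": "k8s-01", "free_cpu": 1},
    {"name": "k8s-02", "free_cpu": 4},
    {"name": "k8s-03", "free_cpu": 8},
]
print(schedule({"name": "test", "cpu": 2}, nodes, [enough_cpu], [most_free_cpu]))
```

With this sample data k8s-01 is filtered out and k8s-03 wins the ranking. A Pod asking for more CPU than any node has would raise the error, which corresponds to the Pod not being scheduled.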
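Several scheduling strategies
1. Node affinity (pod.spec.affinity.nodeAffinity)
- Soft policy (preferredDuringSchedulingIgnoredDuringExecution): the Pod can still be scheduled when no node satisfies the condition
- Hard policy (requiredDuringSchedulingIgnoredDuringExecution): the condition must be satisfied, otherwise the Pod stays pending
Soft and hard policies can be combined; write the hard policy first, then the soft policy.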
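Scheduling policies
Policy | Labels matched | Operators | Topology domain support | Scheduling target |
nodeAffinity | Node | In, NotIn, Exists, DoesNotExist, Gt, Lt | No | The specified node |
podAffinity | Pod | In, NotIn, Exists, DoesNotExist | Yes | Same topology domain as the specified Pod |
podAntiAffinity | Pod | In, NotIn, Exists, DoesNotExist | Yes | Not in the same topology domain as the specified Pod |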
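To make the operators in the table concrete, here is a rough Python sketch of how a single matchExpressions entry could be evaluated against a set of labels. The function name and the sample `cpu-count` label are made up for illustration; the semantics follow my understanding of Kubernetes label matching (NotIn also matches when the key is absent, Gt/Lt compare integers against exactly one value).

```python
from typing import Dict, List, Optional


def match_expression(labels: Dict[str, str], key: str, operator: str,
                     values: Optional[List[str]] = None) -> bool:
    """Evaluate one {key, operator, values} expression against labels."""
    values = values or []
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also counts as "not in".
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    if operator in ("Gt", "Lt"):
        # Integer comparison against exactly one value.
        if key not in labels or len(values) != 1:
            return False
        try:
            have, want = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return have > want if operator == "Gt" else have < want
    raise ValueError("unknown operator: %s" % operator)


# Sample labels; "cpu-count" is a hypothetical label used only for Gt/Lt.
labels = {"kubernetes.io/hostname": "k8s-03", "cpu-count": "8"}
print(match_expression(labels, "kubernetes.io/hostname", "In", ["k8s-03"]))
print(match_expression(labels, "cpu-count", "Gt", ["4"]))
```

Both calls return True for the sample labels. Gt and Lt are only offered for node affinity, which is why the two Pod rows in the table list four operators instead of six.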
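Examples:
1. Hard policy
The following example shows a Pod that will always run on node k8s-03.
##requiredDuringSchedulingIgnoredDuringExecution##
Check the Node labels first; the node is selected through the label "kubernetes.io/hostname=k8s-03".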
# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-01 Ready <none> 17d v1.14.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-01,kubernetes.io/os=linux
k8s-02 Ready <none> 17d v1.14.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-02,kubernetes.io/os=linux
k8s-03 Ready <none> 17d v1.14.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-03,kubernetes.io/os=linux
k8s-04 Ready <none> 17d v1.14.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-04,kubernetes.io/os=linux
# vim test-deployment.yaml
Note: the key and value must be written correctly, otherwise the Pod stays in the Pending state.
apiVersion: apps/v1   # extensions/v1beta1 Deployments are no longer served as of Kubernetes 1.16
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      name: myapp
      namespace: default
      labels:
        app: test
    spec:
      containers:
      - name: nginx
        image: ntp.weijiayu.club/myapp/nginx:v2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - k8s-03
2. Soft policy
If the matching host k8s-06 exists, the Pod runs on it; if it does not, the Pod can still be scheduled elsewhere.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      name: myapp
      namespace: default
      labels:
        app: test
    spec:
      containers:
      - name: nginx
        image: ntp.weijiayu.club/myapp/nginx:v2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1   # weight; several soft-policy terms can be added to the same Pod
            preference:
              matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - k8s-06
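Why the Pod is still scheduled when k8s-06 does not exist can be illustrated with a small scoring sketch. It assumes a simplified model in which a node's score is the sum of the weights of the preference terms it matches; the real scheduler normalizes this value and combines it with other scoring rules, and only the In operator is handled here.

```python
from typing import Dict, List


def preferred_score(node_labels: Dict[str, str], preferences: List[dict]) -> int:
    """Sum the weights of all preference terms whose expressions all match."""
    score = 0
    for term in preferences:
        expressions = term["preference"]["matchExpressions"]
        # Simplification: every expression is treated as operator In.
        if all(node_labels.get(e["key"]) in e["values"] for e in expressions):
            score += term["weight"]
    return score


# The four nodes of the example cluster, each with its hostname label.
nodes = {name: {"kubernetes.io/hostname": name}
         for name in ("k8s-01", "k8s-02", "k8s-03", "k8s-04")}

# Same preference as in the manifest: weight 1 for hostname k8s-06.
preferences = [{
    "weight": 1,
    "preference": {"matchExpressions": [{
        "key": "kubernetes.io/hostname",
        "operator": "In",
        "values": ["k8s-06"],
    }]},
}]

scores = {name: preferred_score(labels, preferences) for name, labels in nodes.items()}
print(scores)
```

Every node ends up with a score of 0 because none of them is k8s-06. No node is excluded, so the Pod is simply placed according to the remaining scheduling rules. A hard policy with the same expression would leave no candidate node at all.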
3. Affinity based on Pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      name: myapp
      namespace: default
      labels:
        app: test
    spec:
      containers:
      - name: nginx
        image: ntp.weijiayu.club/myapp/nginx:v2
        ports:
        - containerPort: 80
      affinity:
        # A YAML mapping cannot repeat the podAffinity key, so the hard and
        # soft rules are both placed under a single podAffinity entry.
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - test
            topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - test
              topologyKey: kubernetes.io/hostname
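The role of topologyKey is easier to see in a small model. The sketch below is a simplification under my own assumptions: the label selector is reduced to plain key/value equality, and the zone labels in the sample data are invented for illustration. It collects the topology domains that already contain a matching Pod and returns the nodes inside (affinity) or outside (anti-affinity) those domains.

```python
from typing import Dict, List, Tuple


def candidate_nodes(nodes: Dict[str, Dict[str, str]],
                    running_pods: List[Tuple[str, Dict[str, str]]],
                    selector: Dict[str, str],
                    topology_key: str,
                    anti: bool = False) -> List[str]:
    """Nodes allowed by a required Pod (anti-)affinity term."""
    # Collect the topology domains that already host a matching Pod.
    domains = set()
    for node_name, pod_labels in running_pods:
        if all(pod_labels.get(k) == v for k, v in selector.items()):
            value = nodes[node_name].get(topology_key)
            if value is not None:
                domains.add(value)
    # Affinity keeps nodes inside those domains, anti-affinity keeps the rest.
    return sorted(name for name, labels in nodes.items()
                  if (labels.get(topology_key) in domains) != anti)


# Sample cluster; the zone values are hypothetical.
nodes = {
    "k8s-01": {"kubernetes.io/hostname": "k8s-01", "topology.kubernetes.io/zone": "a"},
    "k8s-02": {"kubernetes.io/hostname": "k8s-02", "topology.kubernetes.io/zone": "a"},
    "k8s-03": {"kubernetes.io/hostname": "k8s-03", "topology.kubernetes.io/zone": "b"},
    "k8s-04": {"kubernetes.io/hostname": "k8s-04", "topology.kubernetes.io/zone": "b"},
}
running = [("k8s-03", {"app": "test"})]  # one matching Pod already runs on k8s-03

print(candidate_nodes(nodes, running, {"app": "test"}, "kubernetes.io/hostname"))
print(candidate_nodes(nodes, running, {"app": "test"}, "topology.kubernetes.io/zone"))
```

With kubernetes.io/hostname as the topology key only k8s-03 qualifies, since every node is its own domain. With the zone key both nodes of zone b qualify. Passing anti=True returns the complement, which is the podAntiAffinity case from the table.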
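2. Taints and tolerations (Taint/Toleration)
Taints and tolerations work together to keep Pods away from unsuitable nodes. One or more taints can be applied to each node, which means the node will not accept Pods that do not tolerate those taints. Applying a toleration to a Pod means the Pod may (but is not required to) be scheduled onto a node with a matching taint.
Taints
The kubectl taint command sets a taint on a Node. Once a Node carries a taint, a repelling relationship exists between it and Pods: the Node can refuse to have Pods scheduled onto it and can even evict Pods that are already running on it.
Each taint has a key and a value that act as its label (the value may be empty), and an effect that describes what the taint does. The taint effect currently supports the following three options:
- NoSchedule: k8s will not schedule Pods onto a Node that has this taint
- PreferNoSchedule: k8s will try to avoid scheduling Pods onto a Node that has this taint
- NoExecute: k8s will not schedule Pods onto a Node that has this taint, and will also evict Pods that already run on the Node
Setting, viewing and removing taints: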
# Set a taint
# kubectl taint nodes node1 key1=value1:NoSchedule
# Look for the Taints field in the node description
# kubectl describe node node1
# Remove the taint
# kubectl taint nodes node1 key1:NoSchedule-
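Tolerations
A tainted Node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule or NoExecute), so Pods are to some degree kept off that Node. A toleration set on a Pod means the Pod tolerates the taint and can be scheduled onto a Node that carries it.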
pod.spec.tolerations
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # only valid together with effect NoExecute
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"
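- key, value and effect must be consistent with the taint set on the Node
- When operator is Exists, the value is ignored
- tolerationSeconds describes how long the Pod may keep running on the Node once it is due to be evicted (it applies to the NoExecute effect)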
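When no key is specified, the toleration matches every taint key:
tolerations:
- operator: "Exists"
When no effect is specified, the toleration matches every taint effect:
tolerations:
- key: "key"
  operator: "Exists"
When there are multiple Masters, the following can be set to avoid wasting resources:
kubectl taint nodes Node-name node-role.kubernetes.io/master=:PreferNoSchedule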
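The matching rules described so far (Equal versus Exists, empty key, empty effect) can be summed up in a short Python sketch. The function names are my own, and the logic is a simplified reading of the rules rather than the scheduler's actual code. PreferNoSchedule is treated as a soft hint that never blocks scheduling.

```python
from typing import Dict, List


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """True if this toleration matches this taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    if not key:
        # An empty key is only meaningful with Exists and matches every key.
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True  # the value is ignored
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(taints: List[Dict[str, str]],
                 tolerations: List[Dict[str, str]]) -> bool:
    """True if no NoSchedule/NoExecute taint is left untolerated."""
    for taint in taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft: the scheduler only tries to avoid the node
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# The taint from "kubectl taint nodes node1 key1=value1:NoSchedule".
node1_taints = [{"key": "key1", "value": "value1", "effect": "NoSchedule"}]

print(can_schedule(node1_taints, []))
print(can_schedule(node1_taints, [{"operator": "Exists"}]))
print(can_schedule(node1_taints, [{"key": "key1", "operator": "Exists"}]))
```

The first call returns False because nothing tolerates the taint. The second and third return True: an Exists toleration without a key matches all taints, and one without an effect matches key1 under any effect.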
Example:
apiVersion: v1
kind: Pod
metadata:
  name: pod-test
  namespace: default
  labels:
    app: pod-test
spec:
  containers:
  - name: pod-test
    image: ntp.weijiayu.club/myapp/nginx:v1
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
    # tolerationSeconds is omitted here: it is only accepted with effect NoExecute
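3. Specifying the target node
pod.spec.nodeName schedules the Pod directly onto the specified Node and bypasses the Scheduler's policies; this rule is a forced match.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
  namespace: default
spec:
  replicas: 3
  selector:   # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-02
      containers:
      - name: myweb
        image: ntp.weijiayu.club/myweb/nginx:v2
        ports:
        - containerPort: 80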
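pod.spec.nodeSelector selects nodes through the Kubernetes label-selector mechanism: the scheduler matches the labels and then schedules the Pod onto a matching node. This rule is a hard constraint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  selector:   # required by apps/v1; must match the template labels
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        type: backEndNode1
      containers:
      - name: myweb
        image: ntp.weijiayu.club/myapp/nginx:v2
        ports:
        - containerPort: 80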
Adding a label to a Node (a Pod whose nodeSelector is disk: ssd can then be scheduled onto k8s-02):
# kubectl label node k8s-02 disk=ssd
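The nodeSelector rule itself is simple enough to write down as a one-line check: a node qualifies only if it carries every key/value pair listed in the selector. The sketch below assumes the disk=ssd label from the command above has been applied to k8s-02; the other node's labels are sample data.

```python
from typing import Dict


def matches_node_selector(node_labels: Dict[str, str],
                          node_selector: Dict[str, str]) -> bool:
    """True if the node has every label required by nodeSelector."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


# State after "kubectl label node k8s-02 disk=ssd" (sample data).
nodes = {
    "k8s-01": {"kubernetes.io/hostname": "k8s-01"},
    "k8s-02": {"kubernetes.io/hostname": "k8s-02", "disk": "ssd"},
}
selector = {"disk": "ssd"}

eligible = [name for name, labels in nodes.items()
            if matches_node_selector(labels, selector)]
print(eligible)
```

Only k8s-02 is eligible here. If no node carries the label, the list is empty and the Pod stays Pending, which is why nodeSelector counts as a hard constraint.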