Reference: https://kubernetes.io/zh/docs/concepts/configuration/assign-pod-node/
You can constrain a Pod so that it can only run on particular nodes, or so that it prefers to run on particular nodes. There are several ways to do this, and the recommended approach is to use label selectors. Generally such constraints are unnecessary, because the scheduler automatically places Pods reasonably (for example, spreading Pods across nodes rather than placing them on a node with insufficient free resources). In some cases, however, you may want more control over where a Pod lands, for example to ensure that a Pod ends up on a machine with an SSD attached, or to co-locate Pods from two different services that communicate heavily in the same availability zone.
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the Pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is a single key-value pair.
Let's walk through an example of how to use nodeSelector.
Step Zero: Prerequisites
This example assumes that you have a basic understanding of Kubernetes Pods and that you have set up a Kubernetes cluster.
Step One: Attach a label to the node
Run kubectl get nodes to get the names of your cluster's nodes. Pick the node that you want to add a label to, and then run kubectl label nodes <node-name> <label-key>=<label-value> to add a label to the node you have chosen. For example, if the node name is 'kubernetes-foo-node-1.c.a-robinson.internal' and the desired label is 'disktype=ssd', you can run kubectl label nodes kubernetes-foo-node-1.c.a-robinson.internal disktype=ssd.
You can verify that it worked by re-running kubectl get nodes --show-labels and checking that the node now has the label. You can also use kubectl describe node "nodename" to see the full list of labels on a given node.
Get the node names:
kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
192.168.1.65   Ready    <none>   10d   v1.17.4
192.168.1.66   Ready    <none>   10d   v1.17.4
Add the label:
kubectl label nodes 192.168.1.65 disktype=ssd
node/192.168.1.65 labeled
Verify that the label was added:
# kubectl get nodes --show-labels
NAME           STATUS   ROLES    AGE   VERSION   LABELS
192.168.1.65   Ready    <none>   10d   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.65,kubernetes.io/os=linux
192.168.1.66   Ready    <none>   10d   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.1.66,kubernetes.io/os=linux
We can see that node 192.168.1.65 now has the additional label disktype=ssd, while node 192.168.1.66 does not.
Step Two: Add a nodeSelector field to your pod configuration
Take any pod configuration file that you want to run, and add a nodeSelector section to it. For example, if this is my pod configuration:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
Then add the nodeSelector like this:
# cat nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd
When you then run kubectl apply -f nginx-pod.yaml, the pod will be scheduled onto the node that you attached the label to. You can verify that it worked by running kubectl get pods -o wide and looking at the "NODE" that the pod was assigned to.
kubectl get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          3m10s   172.17.54.8   192.168.1.65   <none>           <none>
Interlude: Built-in node labels
In addition to the labels you attach, nodes come pre-populated with a standard set of labels. These labels are:
kubernetes.io/hostname
failure-domain.beta.kubernetes.io/zone
failure-domain.beta.kubernetes.io/region
beta.kubernetes.io/instance-type
kubernetes.io/os
kubernetes.io/arch
Note: The values of these labels are cloud-provider specific and are not guaranteed to be reliable. For example, the value of kubernetes.io/hostname may be the same as the node name in some environments and a different value in other environments.
Node isolation / restriction
Adding labels to Node objects allows targeting pods at specific nodes or groups of nodes. This can be used to ensure that specific pods only run on nodes with certain isolation, security, or regulatory properties. When using labels for this purpose, it is strongly recommended to choose label keys that the kubelet process on the node cannot modify. This prevents a compromised node from using its kubelet credentials to set those labels on its own Node object, and thereby influencing the scheduler into placing workloads onto the compromised node.
The NodeRestriction admission plugin prevents kubelets from setting or modifying labels with the node-restriction.kubernetes.io/ prefix. To make use of that label prefix for node isolation:
- Check that you are using Kubernetes v1.11+, so that NodeRestriction is available.
- Make sure you are using the Node authorizer and have enabled the NodeRestriction admission plugin.
- Add labels under the node-restriction.kubernetes.io/ prefix to your Node objects, and then use those labels in your node selectors. For example, example.com.node-restriction.kubernetes.io/fips=true or example.com.node-restriction.kubernetes.io/pci-dss=true. A short sketch follows this list.
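As a minimal sketch (the node name 192.168.1.65 re-uses the node from the earlier example, and the pod name fips-only-pod is a hypothetical name for illustration), labeling a node with a node-restriction.kubernetes.io/ prefixed key and selecting it from a pod could look like this:

# Set the restricted label on a node (done by an administrator, not the kubelet)
kubectl label nodes 192.168.1.65 example.com.node-restriction.kubernetes.io/fips=true

# Pod that should only land on FIPS-labeled nodes
apiVersion: v1
kind: Pod
metadata:
  name: fips-only-pod
spec:
  containers:
  - name: app
    image: nginx
  nodeSelector:
    example.com.node-restriction.kubernetes.io/fips: "true"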
Affinity and anti-affinity
nodeSelector provides a very simple way to constrain Pods to nodes that have particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express. The key enhancements are:
- The language is more expressive (not just "exact match of AND-ed key-value pairs")
- You can indicate that a rule is "soft"/"preference" rather than a hard requirement, so if the scheduler cannot satisfy it, the pod will still be scheduled
- You can constrain against labels on other pods running on the node (or other topology domain), rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
Node affinity
Node affinity is conceptually similar to nodeSelector: it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
There are currently two types of node affinity, requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as "hard" and "soft" respectively: the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but with a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee. The "IgnoredDuringExecution" part of the names means that, similar to how nodeSelector works, if the labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod will still continue to run on that node. In the future we plan to offer requiredDuringSchedulingRequiredDuringExecution, which will be just like requiredDuringSchedulingIgnoredDuringExecution except that it will evict pods from nodes that cease to satisfy the pods' node affinity requirements.
Thus, an example of requiredDuringSchedulingIgnoredDuringExecution would be "only run the pod on nodes with Intel CPUs", and an example of preferredDuringSchedulingIgnoredDuringExecution would be "try to run this set of pods in failure zone XYZ, but if that is not possible, then allow some to run elsewhere".
Node affinity is specified through the nodeAffinity field of the affinity field in the PodSpec.
Here is an example of a pod that uses node affinity:
# cat pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
This node affinity rule says the pod can only be placed on a node with a label whose key is kubernetes.io/e2e-az-name and whose value is either e2e-az1 or e2e-az2. In addition, among nodes that meet that criterion, nodes with a label whose key is another-node-label-key and whose value is another-node-label-value should be preferred.
You can see the In operator being used in the example above. The new node affinity syntax supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, Lt. You can use NotIn and DoesNotExist to achieve node anti-affinity behavior, or use node taints to repel pods from specific nodes; a small sketch follows.
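As an illustrative sketch (the pod name avoid-hdd and the label key/value disktype=hdd are assumptions, not part of the original example), a required node affinity term that uses NotIn to keep a pod off certain nodes might look like this:

apiVersion: v1
kind: Pod
metadata:
  name: avoid-hdd
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Only schedule onto nodes whose disktype label is NOT hdd
          - key: disktype
            operator: NotIn
            values:
            - hdd
  containers:
  - name: app
    image: nginx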
If you specify both nodeSelector and nodeAffinity, both must be satisfied for the pod to be scheduled onto a candidate node, as sketched below.
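As a minimal sketch (disktype=ssd re-uses the label from the earlier example; the kubernetes.io/e2e-az-name value e2e-az1 re-uses the value from the node affinity example above), a node would have to carry both labels to be eligible for this pod:

apiVersion: v1
kind: Pod
metadata:
  name: both-constraints
spec:
  nodeSelector:
    disktype: ssd                  # must match
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name   # must also match
            operator: In
            values:
            - e2e-az1
  containers:
  - name: app
    image: nginx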
If you specify multiple nodeSelectorTerms associated with nodeAffinity, then the pod can be scheduled onto a node if any one of the nodeSelectorTerms is satisfied.
If you specify multiple matchExpressions within a single nodeSelectorTerms entry, then the pod can be scheduled onto a node only if all of the matchExpressions are satisfied. See the sketch below.
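As a sketch of this OR/AND semantics (this is a fragment of a PodSpec; the label keys zone, disktype and gpu are assumed for illustration), the two nodeSelectorTerms below are ORed together, while the two matchExpressions inside the first term are ANDed:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: the node must match BOTH expressions (AND)
      - matchExpressions:
        - key: zone
          operator: In
          values:
          - zone-a
        - key: disktype
          operator: In
          values:
          - ssd
      # Term 2: OR a node that simply has a gpu label (either term alone is enough)
      - matchExpressions:
        - key: gpu
          operator: Exists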
If you modify or delete the labels of a node on which a pod is scheduled, the pod will not be removed. In other words, the affinity selection only takes effect at the time the pod is scheduled.
The weight field in preferredDuringSchedulingIgnoredDuringExecution is in the range 1-100. For each node that meets all of the scheduling requirements (resource request, RequiredDuringScheduling affinity expressions, etc.), the scheduler computes a sum by iterating through the elements of this field and adding "weight" to the sum whenever the node matches the corresponding MatchExpressions. This score is then combined with the scores of the other priority functions for the node. The node(s) with the highest total score are the most preferred. A sketch with two weighted preferences follows.
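As an illustrative sketch (this is a fragment of a PodSpec; the label keys and values are assumptions), two preferred terms with different weights could be expressed like this; a node that matches both terms would contribute 80 + 20 = 100 to its affinity score:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Strong preference: adds 80 to a matching node's score
    - weight: 80
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
    # Weak preference: adds 20 to a matching node's score
    - weight: 20
      preference:
        matchExpressions:
        - key: zone
          operator: In
          values:
          - zone-a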
Inter-pod affinity and anti-affinity
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on *based on labels on pods that are already running* on the node, rather than based on labels on the node itself. Rules are of the form "this pod should (or, in the case of anti-affinity, should not) run on node X if node X is already running one or more pods that satisfy rule Y". Y is expressed as a LabelSelector with an optional list of associated namespaces; unlike nodes, pods are namespaced (and therefore the labels on pods are implicitly namespaced), so a label selector over pod labels must specify which namespaces the selector applies to. Conceptually, X is a topology domain such as a node, rack, cloud provider zone, cloud provider region, and so on. You express it using a topologyKey, which is the key of a node label that the system uses to denote such a topology domain. See the label keys listed above in the section "Interlude: Built-in node labels".
Note: Inter-pod affinity and anti-affinity require substantial processing, which can slow down scheduling in large clusters significantly. We do not recommend using them in clusters larger than several hundred nodes.
Note: Pod anti-affinity requires nodes to be consistently labeled, i.e. every node in the cluster must have an appropriate label matching topologyKey. If some or all nodes are missing the specified topologyKey label, it can lead to unintended behavior.
As with node affinity, there are currently two types of pod affinity and anti-affinity, requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution, which denote "hard" and "soft" requirements respectively; see the description in the node affinity section above. An example of requiredDuringSchedulingIgnoredDuringExecution affinity would be "co-locate the pods of service A and service B in the same zone, since they communicate a lot with each other", and an example of preferredDuringSchedulingIgnoredDuringExecution anti-affinity would be "spread the pods of this service across zones" (a hard requirement would not make sense, since you probably have more pods than zones).
Inter-pod affinity is specified through the podAffinity field of the affinity field in the PodSpec. Inter-pod anti-affinity is specified through the podAntiAffinity field of the affinity field in the PodSpec.
An example of a pod that uses pod affinity:
# cat pod-with-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
The affinity on this pod defines one pod affinity rule and one pod anti-affinity rule. In this example, the podAffinity is requiredDuringSchedulingIgnoredDuringExecution while the podAntiAffinity is preferredDuringSchedulingIgnoredDuringExecution. The pod affinity rule says that the pod can be scheduled onto a node only if that node is in the same zone as at least one already-running pod that has a label with key "security" and value "S1". (More precisely, the pod is eligible to run on node N if node N has a label with key failure-domain.beta.kubernetes.io/zone and some value V, such that there is at least one node in the cluster with key failure-domain.beta.kubernetes.io/zone and value V that is running a pod with a label with key "security" and value "S1".) The pod anti-affinity rule says that the pod prefers not to be scheduled onto a node if that node is already running a pod with a label with key "security" and value "S2". (If the topologyKey were failure-domain.beta.kubernetes.io/zone, it would mean that the pod cannot be scheduled onto a node if that node is in the same zone as a pod with a label with key "security" and value "S2".) See the design doc for many more examples of pod affinity and anti-affinity, both the requiredDuringSchedulingIgnoredDuringExecution and the preferredDuringSchedulingIgnoredDuringExecution variants.
The legal operators for pod affinity and anti-affinity are In, NotIn, Exists, DoesNotExist.
In principle, topologyKey can be any legal label key. However, for performance and security reasons, there are some constraints on topologyKey:
- For affinity and for requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, topologyKey is not allowed to be empty.
- For requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity, the admission controller LimitPodHardAntiAffinityTopology was introduced to limit topologyKey to kubernetes.io/hostname. If you want to make it available for custom topologies, you must modify the admission controller or disable it.
- For preferredDuringSchedulingIgnoredDuringExecution pod anti-affinity, an empty topologyKey is interpreted as "all topologies" (here "all topologies" is limited to the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region).
- Except for the cases above, topologyKey can be any legal label key.
In addition to labelSelector and topologyKey, you can optionally specify a list of namespaces that the labelSelector should match against (this is defined at the same level as labelSelector and topologyKey). If it is omitted or empty, it defaults to the namespace of the pod in which the affinity/anti-affinity definition appears. A sketch follows.
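As a minimal sketch (this is a fragment of a PodSpec; the namespace names prod and staging are assumptions for illustration), a podAffinity term that matches pods in specific namespaces could look like this:

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S1
      # Only consider pods carrying the label above in these namespaces
      namespaces:
      - prod
      - staging
      topologyKey: failure-domain.beta.kubernetes.io/zone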
All matchExpressions associated with requiredDuringSchedulingIgnoredDuringExecution affinity and anti-affinity must be satisfied for the pod to be scheduled onto a node.
More practical use cases
Inter-pod affinity and anti-affinity can be even more useful when used together with higher-level collections (e.g. ReplicaSets, StatefulSets, Deployments, etc.). You can easily configure that a set of workloads should be co-located in the same defined topology, e.g. the same node.
In a three-node cluster, a web application has an in-memory cache such as redis. We want the web servers to be co-located with the cache as much as possible.
Here is the yaml snippet of a simple redis deployment with three replicas and the selector label app=store. The deployment has podAntiAffinity configured to ensure the scheduler does not co-locate replicas on a single node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
The following yaml snippet of the webserver deployment has podAntiAffinity and podAffinity configured. This tells the scheduler that all of its replicas are to be co-located with pods that have the selector label app=store. It also ensures that no two web-server replicas are co-located on a single node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine
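If we create the two deployments above in the three-node cluster, each node should end up running one redis pod and one web-server pod. As a hedged sketch of what verification might look like (the pod name suffixes and node names below are placeholders, not real output):

kubectl get pods -o wide
NAME                           READY   STATUS    NODE
redis-cache-xxxxxxxxxx-aaaaa   1/1     Running   node-1
redis-cache-xxxxxxxxxx-bbbbb   1/1     Running   node-2
redis-cache-xxxxxxxxxx-ccccc   1/1     Running   node-3
web-server-yyyyyyyyyy-ddddd    1/1     Running   node-1
web-server-yyyyyyyyyy-eeeee    1/1     Running   node-2
web-server-yyyyyyyyyy-fffff    1/1     Running   node-3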
nodeName
nodeName is the simplest form of node selection constraint, but due to its limitations it is typically not used. nodeName is a field of PodSpec. If it is non-empty, the scheduler ignores the pod and the kubelet running on the named node tries to run the pod. Thus, if nodeName is specified in the PodSpec, it takes precedence over the node selection methods above.
Some of the limitations of using nodeName to select nodes are:
- If the named node does not exist, the pod will not be run, and in some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the pod, the pod will fail, and its reason will indicate why, e.g. OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or stable.
Here is an example of a pod config file using the nodeName field:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: kube-01
The above pod will run on node kube-01.
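As a quick check (assuming the pod above has been applied and a node named kube-01 actually exists in the cluster), you can confirm the placement by looking at the NODE column:

kubectl get pod nginx -o wide
# The NODE column should show kube-01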