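These are notes taken while studying 《Kubernetes权威指南》 (Kubernetes: The Definitive Guide).
Study notes:
These notes cover two kinds of probes that Kubernetes provides as its solution for health and service-availability checks. They let the user specify, in the Pod definition, how and where to check. As a result, K8s's management of a container is not limited to judging its health from the container's init process. Checks can be defined in advance and tailored to the container's type, structure, and function, which gives a more accurate picture of the container's real state and a good deal of flexibility.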
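The two kinds of probes are:
- LivenessProbe: determines whether the container is alive (Running). When the probe reports the container as unhealthy, the kubelet kills the container and then handles it according to the container's restart policy (RestartPolicy). If a container defines no LivenessProbe, the probe is treated as always returning Success.
- ReadinessProbe: determines whether the container is available (Ready). Only Pods in the Ready state can receive requests. Unlike a failed LivenessProbe, a failed ReadinessProbe only removes the Pod from the Service's list of usable Endpoints instead of killing the container, because the container may still recover. Once it is Ready again, the Pod is put back in the Endpoint list.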
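Both kinds of probes can be implemented in any of the following ways:
- ExecAction: runs a command inside the container and judges by its exit code; an exit code of 0 means healthy
- TCPSocketAction: performs a TCP check against the container's IP address and port; if the connection can be established, the container is considered healthy
- HTTPGetAction: sends an HTTP GET request to the container's IP address, port, and path; a response status code of at least 200 and below 400 means the container is healthy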
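To make the difference between the two probe kinds concrete, here is a minimal sketch of one container that declares both. The Pod name `probe-demo`, the readiness path `/`, and the delay values are assumptions chosen for illustration; they do not come from the book.

```yaml
# Hypothetical Pod (the name probe-demo is illustrative) declaring both probes.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    # Liveness: if port 80 stops accepting connections,
    # the kubelet kills and restarts the container.
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
    # Readiness: while GET / fails, the Pod is removed from the
    # Service's Endpoints, but the container keeps running.
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      timeoutSeconds: 1
```

The two probes are evaluated independently, so a container can be alive but temporarily not Ready.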
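Below is an example of each of the three methods:
1. Health check with ExecAction
Write the YAML file
Define spec.containers[].livenessProbe.exec as the health check method
The probe tries to read /tmp/health. The container deletes that file 10 seconds after it starts, while the first probe only runs after the 15-second initialDelaySeconds, so the file is already gone before the first probe. The probe fails and the container is restarted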
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - echo ok >/tmp/health; sleep 10; rm -rf /tmp/health; sleep 600;
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 15
      timeoutSeconds: 1
Create the Pod and wait for it to start running
# kubectl create -f health-1.yaml
pod/liveness-exec created
# kubectl get pod
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   0          22s
Check the Pod's Events; they match what was expected
# kubectl describe pod liveness-exec
......
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned default/liveness-exec to xu.node1
  Normal   Pulled     72s                kubelet, xu.node1  Successfully pulled image "busybox"
  Normal   Created    71s                kubelet, xu.node1  Created container liveness
  Normal   Started    71s                kubelet, xu.node1  Started container liveness
  Warning  Unhealthy  30s (x3 over 50s)  kubelet, xu.node1  Liveness probe failed: cat: can't open '/tmp/health': No such file or directory
  Normal   Killing    30s                kubelet, xu.node1  Container liveness failed liveness probe, will be restarted
  Normal   Pulling    0s (x2 over 74s)   kubelet, xu.node1  Pulling image "busybox"
With the command that deletes the file commented out, the liveness probe no longer fails and the kubelet no longer kills the container
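In the Events above, the probe fails three times about 10 seconds apart before the container is killed. This is consistent with the probe's default periodSeconds of 10 and failureThreshold of 3, which the manifest relies on implicitly. As a sketch, the same probe with those timing fields written out would look like this (only the last three fields are additions, and their values are the Kubernetes defaults):

```yaml
# Same liveness probe as in the manifest above, with the timing fields spelled out.
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/health
  initialDelaySeconds: 15   # wait 15s after the container starts before the first probe
  timeoutSeconds: 1         # each probe must finish within 1s
  periodSeconds: 10         # probe every 10s (default)
  successThreshold: 1       # one success marks the container healthy (default)
  failureThreshold: 3       # restart after 3 consecutive failures (default)
```

Raising failureThreshold or periodSeconds makes the kubelet more tolerant of short glitches, at the cost of reacting more slowly to a real failure.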
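2. Health check with TCPSocketAction
Write a YAML file that deploys an nginx container
Set the health check to a TCP check that tests whether port 80 accepts connections
Since the container's main job is to act as a server listening on port 80, using the reachability of port 80 as the criterion is a natural choice
This shows how flexible the probe checks provided by K8s are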
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-healthcheck
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
Create the Pod and access the port; the default Nginx welcome page is returned
# curl 10.44.0.1:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
......
</body>
</html>
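Log in to the container and shut nginx down directly. The container is then terminated and the exec session ends with exit code 137. Because nginx is the container's main process here, stopping it most likely ends the container on its own, before the TCP probe has a chance to fail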
# kubectl exec -it pod-with-healthcheck -c nginx -- /bin/sh
# nginx -s quit
2019/12/11 11:12:33 [notice] 12#12: signal process started
# command terminated with exit code 137
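3. Health check with HTTPGetAction
Create the Pod configuration file
Use an nginx container and set the liveness probe to periodically send an HTTP request to a fixed path on the server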
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-healthcheck
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Never
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
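Create the Pod and check its Events. The requested path does not exist on the server, so the probe gets a 404 and the container is restarted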
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  <unknown>          default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         <unknown>          default-scheduler  Successfully assigned default/pod-with-healthcheck to xu.node1
  Normal   Pulled            14s (x2 over 70s)  kubelet, xu.node1  Container image "nginx" already present on machine
  Normal   Created           14s (x2 over 70s)  kubelet, xu.node1  Created container nginx
  Normal   Started           14s (x2 over 70s)  kubelet, xu.node1  Started container nginx
  Warning  Unhealthy         14s (x3 over 34s)  kubelet, xu.node1  Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing           14s                kubelet, xu.node1  Container nginx failed liveness probe, will be restarted
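The 404 in the Events above comes from the stock nginx image having nothing at /_status/healthz. One way to make the probe pass, sketched here on the assumption that the image is unmodified and serves its default welcome page at /, is to point the probe at a path that does exist:

```yaml
# Variant of the probe above; only the path differs.
livenessProbe:
  httpGet:
    path: /        # assumption: the default nginx welcome page, which returns 200
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 1
```

In a real service it is usually better to expose a dedicated health endpoint and keep a path such as /_status/healthz, so that the probe reflects the application's own view of its health.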