This article documents the process of deploying Kubernetes with kubeadm, so that future deployments can be done more quickly.
References:
https://blog.csdn.net/subfate/article/details/103774072
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Most of the content follows the first blog post.
1. Environment
Two Ubuntu 16.04 64-bit machines, each with 2 GB RAM and a dual-core CPU.
Environment requirements and setup:
Two hosts, one master and one node. The master's hostname is ubuntu; the node's hostname is node. The hostnames of the two operating systems must be different.
Note that Kubernetes requires at least two CPU cores per machine.
All commands are executed as root. (In theory a regular user would also work; root is used here simply to avoid permission problems.)
2. Install Docker
# apt install docker.io
Create the file /etc/docker/daemon.json with the following command:
cat > /etc/docker/daemon.json <<-EOF
{
  "registry-mirrors": [
    "https://a8qh6yqv.mirror.aliyuncs.com",
    "http://hub-mirror.c.163.com"
  ],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
registry-mirrors lists registry mirror (accelerator) addresses.
native.cgroupdriver=systemd sets the cgroup driver to systemd, the driver Kubernetes uses; the default is cgroupfs. It is changed on the Docker side because changing the Kubernetes driver via kubeadm.conf did not work.
Restart Docker and check the cgroup driver:
# systemctl restart docker
# docker info | grep -i cgroup
Cgroup Driver: systemd
3. Deploy the Kubernetes master
3.1 Disable swap
# swapoff -a
3.2 Add a domestic Kubernetes apt source (the Alibaba Cloud mirror is used here)
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
3.3 Update the package lists
# apt-get update
After adding the source, apt update fails because the repository's signing keys are missing. Add the missing keys one by one with the commands below (assuming the error reports the missing key E084DAB9):
# gpg --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
# gpg --export --armor E084DAB9 | apt-key add -
Install kubeadm, kubectl, kubelet and kubernetes-cni:
# apt-get install -y kubeadm kubectl kubelet kubernetes-cni
Note: installing kubeadm pulls in kubectl, kubelet and kubernetes-cni automatically, so specifying kubeadm alone also works.
3.4 Determine the required image versions and pull them
# kubeadm config images list
W0330 10:07:27.234362 12944 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0330 10:07:27.234404 12944 version.go:102] falling back to the local client version: v1.17.3
W0330 10:07:27.234491 12944 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0330 10:07:27.234496 12944 validation.go:28] Cannot validate kubelet config - no validator is available
k8s.gcr.io/kube-apiserver:v1.17.3
k8s.gcr.io/kube-controller-manager:v1.17.3
k8s.gcr.io/kube-scheduler:v1.17.3
k8s.gcr.io/kube-proxy:v1.17.3
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5
Pull the images from the Alibaba Cloud mirror: strip the "k8s.gcr.io/" prefix from each name and use the versions reported by kubeadm config images list:
# images=(
    kube-apiserver:v1.17.3
    kube-controller-manager:v1.17.3
    kube-scheduler:v1.17.3
    kube-proxy:v1.17.3
    pause:3.1
    etcd:3.4.3-0
    coredns:1.6.5
)
# for imageName in ${images[@]} ; do
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
    docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
done
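The loop above is mechanical enough to generate. A minimal Python sketch of the same prefix-stripping and retagging logic (the mirror address is the one used above; the function name is made up for illustration):

```python
# Sketch: build the docker pull/tag/rmi triples used above from the
# image names printed by `kubeadm config images list`.
MIRROR = "registry.cn-hangzhou.aliyuncs.com/google_containers"

def mirror_commands(image_list):
    """For each k8s.gcr.io/NAME:TAG, emit pull/tag/rmi against the mirror."""
    commands = []
    for image in image_list:
        # Strip the "k8s.gcr.io/" prefix, keeping NAME:TAG.
        name = image.split("k8s.gcr.io/", 1)[-1]
        commands.append(f"docker pull {MIRROR}/{name}")
        commands.append(f"docker tag {MIRROR}/{name} k8s.gcr.io/{name}")
        commands.append(f"docker rmi {MIRROR}/{name}")
    return commands
```

Feeding it the image list from section 3.4 reproduces the pull/tag/rmi sequence of the shell loop.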
3.5 Pull the flannel image
# docker pull quay.io/coreos/flannel:v0.11.0-amd64
3.6 Initialize the master
# kubeadm init \
    --apiserver-advertise-address=192.168.2.190 \
    --image-repository registry.aliyuncs.com/google_containers \
    --kubernetes-version v1.17.3 \
    --pod-network-cidr=10.244.0.0/16
Flag descriptions:
--pod-network-cidr specifies the pod network range, which the network plugin uses later (this article uses flannel, whose default is 10.244.0.0/16).
--image-repository specifies the image registry. The default is k8s.gcr.io; here it is set to the Alibaba Cloud mirror registry.aliyuncs.com/google_containers. (Without this flag the default is used.)
--kubernetes-version pins a specific version when more than one is available. (Without this flag the default is used.)
All other flags are left at their defaults.
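To make the CIDR concrete: every pod IP the cluster assigns must fall inside the --pod-network-cidr range, and the hosts' own addresses must stay outside it. A quick check with Python's standard ipaddress module, using addresses that appear in this article:

```python
import ipaddress

# The flannel default pod network passed to kubeadm init above.
pod_cidr = ipaddress.ip_network("10.244.0.0/16")

# A pod IP assigned later in this article falls inside the range...
print(ipaddress.ip_address("10.244.2.2") in pod_cidr)       # True
# ...while the hosts' LAN addresses do not overlap it.
print(ipaddress.ip_address("192.168.2.190") in pod_cidr)    # False
```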
The invocation using defaults for everything:
# kubeadm init --pod-network-cidr=10.244.0.0/16
The initialization output is as follows:
W0330 09:34:29.486559 10623 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0330 09:34:29.486587 10623 validation.go:28] Cannot validate kubelet config - no validator is available
[init] Using Kubernetes version: v1.17.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [dyan-desktop kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.2.190]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [dyan-desktop localhost] and IPs [192.168.2.190 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [dyan-desktop localhost] and IPs [192.168.2.190 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0330 09:34:33.826116 10623 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0330 09:34:33.827805 10623 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 26.526203 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node dyan-desktop as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node dyan-desktop as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: akhepb.ngrxfnvs04qvjfg6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.2.190:6443 --token akhepb.ngrxfnvs04qvjfg6 \
    --discovery-token-ca-cert-hash sha256:86ee3f4483db21166b18eb733ff812c2305cbdd63037eb5ba6259824f1ba1d9d
As the output instructs, copy admin.conf into the current user's home directory. The admin.conf file will be needed again later (it gets copied to the node):
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
If you later forget the command for joining a node to the cluster, retrieve it with kubeadm token create --print-join-command, for example:
# kubeadm token create --print-join-command
W0330 11:08:57.802478 15879 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0330 11:08:57.802505 15879 validation.go:28] Cannot validate kubelet config - no validator is available
kubeadm join 192.168.2.190:6443 --token 1vtzxw.9ypai6p4y0jkp1oz --discovery-token-ca-cert-hash sha256:86ee3f4483db21166b18eb733ff812c2305cbdd63037eb5ba6259824f1ba1d9d
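The printed join command carries three pieces of information: the API server endpoint, a bootstrap token, and the CA certificate hash. A small sketch that picks them out of such a line (the function is hypothetical, for illustration only):

```python
import re

def parse_join_command(line):
    """Split a `kubeadm join` line into endpoint, token and CA cert hash."""
    endpoint = re.search(r"kubeadm join (\S+)", line).group(1)
    token = re.search(r"--token (\S+)", line).group(1)
    ca_hash = re.search(r"--discovery-token-ca-cert-hash (\S+)", line).group(1)
    return endpoint, token, ca_hash
```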
At this point the pod status is:
# kubectl get pods -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-6955765f44-bt9xj               0/1     Pending   0          38s
coredns-6955765f44-fpp2m               0/1     Pending   0          38s
etcd-dyan-desktop                      1/1     Running   0          51s
kube-apiserver-dyan-desktop            1/1     Running   0          51s
kube-controller-manager-dyan-desktop   1/1     Running   0          51s
kube-proxy-778f2                       1/1     Running   0          38s
kube-scheduler-dyan-desktop            1/1     Running   0          51s
All pods are running except coredns, which is Pending. This is because no network plugin has been deployed yet; this article uses flannel.
3.7 Deploy flannel
Deploy flannel with the following command:
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
This deploys using the kube-flannel.yml file from the flannel repository; see that file for details.
If the URL is unreachable, manually download https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml into the current directory and run kubectl apply -f kube-flannel.yml instead.
# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS             RESTARTS   AGE
kube-system   coredns-6955765f44-bt9xj               0/1     CrashLoopBackOff   39         4h57m
kube-system   coredns-6955765f44-fpp2m               0/1     CrashLoopBackOff   39         4h57m
kube-system   etcd-dyan-desktop                      1/1     Running            0          4h57m
kube-system   kube-apiserver-dyan-desktop            1/1     Running            0          4h57m
kube-system   kube-controller-manager-dyan-desktop   1/1     Running            0          4h57m
kube-system   kube-flannel-ds-amd64-v8frb            1/1     Running            0          177m
kube-system   kube-proxy-778f2                       1/1     Running            0          4h57m
kube-system   kube-scheduler-dyan-desktop            1/1     Running            0          4h57m
View the coredns pod's log:
# kubectl logs coredns-6955765f44-bt9xj -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[FATAL] plugin/loop: Loop (127.0.0.1:51523 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7486309701405418795.2886934462876536096."
The cause is a DNS resolution loop in coredns. Edit the coredns ConfigMap:
kubectl edit cm coredns -n kube-system
The ConfigMap opens in vim by default. Delete the line containing the loop directive (with dd), then save and quit with :wq.
The coredns ConfigMap content is as follows:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-30T01:35:01Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "187"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 9cfc6357-83b6-4233-8fb8-4961594f6a6b
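The vim edit boils down to deleting the single line holding the loop directive from the Corefile. The same text transformation as a Python sketch (assuming the directive sits on its own line, as it does in the ConfigMap):

```python
def drop_loop_plugin(corefile):
    """Return the Corefile text with the standalone `loop` directive removed."""
    kept = [line for line in corefile.splitlines() if line.strip() != "loop"]
    return "\n".join(kept)
```

A non-interactive route would be to dump the ConfigMap with kubectl get cm coredns -n kube-system -o yaml, filter it this way, and apply the result with kubectl apply -f -.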
Delete all the failing coredns pods:
# kubectl delete pod coredns-6955765f44-bt9xj coredns-6955765f44-fpp2m -n kube-system
pod "coredns-6955765f44-bt9xj" deleted
pod "coredns-6955765f44-fpp2m" deleted
After deletion, coredns restarts automatically. Check the pods again:
# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   coredns-6955765f44-jlkft               1/1     Running   0          42s
kube-system   coredns-6955765f44-wkq5r               1/1     Running   0          42s
kube-system   etcd-dyan-desktop                      1/1     Running   0          5h54m
kube-system   kube-apiserver-dyan-desktop            1/1     Running   0          5h54m
kube-system   kube-controller-manager-dyan-desktop   1/1     Running   0          5h54m
kube-system   kube-flannel-ds-amd64-v8frb            1/1     Running   0          3h54m
kube-system   kube-proxy-778f2                       1/1     Running   0          5h54m
kube-system   kube-scheduler-dyan-desktop            1/1     Running   0          5h54m
All pods are now running.
Note: it is also possible to edit the ConfigMap first and then deploy flannel.
At this point, the master node has been deployed successfully.
4. The node
4.1 Prerequisites
Perform the following on the node.
1. Install kubeadm, as described above.
2. Pull the flannel image, as described above (if not pulled in advance, it is downloaded automatically when joining the cluster).
3. Copy the master's /etc/kubernetes/admin.conf file to the node's /etc/kubernetes/ directory. (Use scp on the master; Kubernetes does not create this file on the node by itself.)
$ scp $HOME/.kube/config kevin@192.168.2.69:/home/kevin/kube.config
kevin@192.168.2.69's password:
config
The node can then access the cluster using this config, for example:
# kubectl --kubeconfig /home/kevin/kube.config get pods --all-namespaces
4.2 Join the cluster
On the master, generate the join command and send it to the node machine:
# kubeadm token create --print-join-command > kube-join-command.token
W0330 15:47:11.326973 17834 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0330 15:47:11.327001 17834 validation.go:28] Cannot validate kubelet config - no validator is available
# scp kube-join-command.token kevin@192.168.2.69:/home/kevin/
kevin@192.168.2.69's password:
kube-join-command.token
On the node, where the Kubernetes services are not yet running, execute the saved command to join the cluster:
# `cat kube-join-command.token`
If the node previously joined another cluster, reset it first with kubeadm reset.
5. Verification
On the master, check the nodes; the kevin node has joined the cluster:
# kubectl get node
NAME                            STATUS   ROLES    AGE     VERSION
dyan-desktop                    Ready    master   6h26m   v1.17.3
kevin-lenovo-tianyi-310-15ikb   Ready    <none>   3m14s   v1.17.3
Run a simple pod test with the busybox image. On the master, execute:
# kubectl run -i --tty busybox --image=latelee/busybox --restart=Never -- sh
After a moment, a busybox shell prompt appears.
In another terminal on the master, check the pod status:
# kubectl get pod -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE                            NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          91s   10.244.2.2   kevin-lenovo-tianyi-310-15ikb   <none>           <none>
The pod is in the Running state, scheduled on the node.
Check on the node:
# docker ps | grep busybox
8cb2f2a1a24d   latelee/busybox        "sh"       5 minutes ago   Up 5 minutes   k8s_busybox_busybox_default_aa01c6a5-c12d-4f68-b014-ef338efcb974_0
c9618eaa84d6   k8s.gcr.io/pause:3.1   "/pause"   7 minutes ago   Up 7 minutes   k8s_POD_busybox_default_aa01c6a5-c12d-4f68-b014-ef338efcb974_0
After exiting busybox on the master, the pod still exists but is no longer READY, and the busybox container is no longer running on the node.
Verification passed; the Kubernetes deployment succeeded.
6. Miscellaneous
6.1 Reset Kubernetes
# kubeadm reset
Then clear the directories and remove the network devices:
rm -rf $HOME/.kube/config
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes/
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ip link delete cni0
ip link delete flannel.1
6.2 Removing a node from the cluster
On the master:
1. Drain the node:
# kubectl drain kevin-lenovo-tianyi-310-15ikb
node/kevin-lenovo-tianyi-310-15ikb cordoned
evicting pod "busybox"
pod/busybox evicted
node/kevin-lenovo-tianyi-310-15ikb evicted
Check the nodes again:
# kubectl get node
NAME                            STATUS                     ROLES    AGE     VERSION
dyan-desktop                    Ready                      master   6h52m   v1.17.3
kevin-lenovo-tianyi-310-15ikb   Ready,SchedulingDisabled   <none>   29m     v1.17.3
The kevin node is now unschedulable, but remains Ready (it was already in that state). Think of this as "disabling the node". Use kubectl uncordon to make it schedulable again:
# kubectl uncordon kevin-lenovo-tianyi-310-15ikb
node/kevin-lenovo-tianyi-310-15ikb uncordoned
# kubectl get node
NAME STATUS ROLES AGE VERSION
dyan-desktop Ready master 25h v1.17.3
kevin-lenovo-tianyi-310-15ikb Ready <none> 19h v1.17.3
2. Delete the node:
# kubectl delete node kevin-lenovo-tianyi-310-15ikb
node "kevin-lenovo-tianyi-310-15ikb" deleted
Checking again, the node no longer appears in the list.
At this point flannel and kube-proxy are no longer running on the node:
# ps aux | grep kube
root      3269  1.6  4.3 754668 88712 ?     Ssl  Dec20  18:54 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.1
root    124216  0.0  0.0  14228   964 pts/0 R+   00:49   0:00 grep --color=auto kube
On the node, execute:
# kubeadm reset
Clear the directories and remove the network devices (similar to the master, but not identical):
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm /var/lib/cni/ -rf
rm /etc/kubernetes/ -rf
rm /var/lib/kubelet/ -rf