1. Summary of HAProxy installation problems
1.1 Binding VIP failed to start
- Problem description
The VIP binding is configured in /etc/haproxy/haproxy.cfg, and a binding-failure error appears when the haproxy service is started.
- Solution
Edit the configuration file /etc/sysctl.conf and add the following line:
net.ipv4.ip_nonlocal_bind=1
Apply the change:
sysctl -p
Then restart haproxy:
systemctl restart haproxy
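For reference, a haproxy.cfg frontend that binds a non-local VIP might look like the following sketch; the VIP address 192.168.1.100, the port, and the backend name are placeholders, not values from this article:

```
# Binding an address not yet assigned to this machine requires
# net.ipv4.ip_nonlocal_bind=1 (set above).
frontend k8s-apiserver
    bind 192.168.1.100:6443
    mode tcp
    default_backend k8s-masters
```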
2. Summary of etcd installation problems
2.1 etcd node restart failed
- Problem description
Because the node's data is corrupted, the node cannot be started. The etcdctl and kubectl client tools are unusable (both requests time out), and directly restarting the etcd service on the current node raises an exception.
- Solution
Step 1: Delete the etcd data directory of the node; the ETCD_DATA_DIR variable holds the data storage directory.
Step 2: Set ETCD_INITIAL_CLUSTER_STATE="existing"
Step 3: Restart the node: systemctl restart etcd
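Step 2 can be scripted. The sketch below assumes an etcd.conf-style environment file; the file layout is an assumption, not from this article, so it operates on a sample file to show the edit end to end:

```shell
# Flip the rejoining node from "new" to "existing" in an etcd config file.
# A sample file stands in for /etc/etcd/etcd.conf (path is an assumption).
conf=$(mktemp)
cat > "$conf" <<'EOF'
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF

# Rewrite the variable so the node rejoins the existing cluster
sed -i 's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' "$conf"

grep ETCD_INITIAL_CLUSTER_STATE "$conf"
# ETCD_INITIAL_CLUSTER_STATE="existing"
rm -f "$conf"
```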
3. Summary of Kubernetes installation problems
3.1 namespace cannot be deleted
- Problem description
The namespace is stuck in the Terminating state, so resources can neither be deployed to nor deleted from it.
- Solution
cd /opt
kubectl get namespace <namespace> -o json > namespace.json
Edit namespace.json and delete the values under the spec and status fields, then execute the following commands:
kubectl proxy --port=9988 &
curl -k -H "Content-Type: application/json" -X PUT --data-binary @namespace.json 127.0.0.1:9988/api/v1/namespaces/<namespace>/finalize
3.2 A large number of Pods are in Terminating state
- Problem Description
Due to node failures, a large number of Pods are stuck in the Terminating state:
istio-system jaeger-5994d55ffc-nmhq6 0/1 Terminating 0 13h
istio-system jaeger-5994d55ffc-pjj5m 0/1 Terminating 0 11h
istio-system kiali-64df7bf7cc-29kxl 0/1 Terminating 0 12h
istio-system kiali-64df7bf7cc-2bk77 0/1 Terminating 0 11h
istio-system kiali-64df7bf7cc-4wwhg 0/1 Terminating 0 14h
istio-system kiali-64df7bf7cc-8cfsh 0/1 Terminating 0 13h
istio-system kiali-64df7bf7cc-dks5w 0/1 Terminating 0 15h
istio-system kiali-64df7bf7cc-dkzgc 0/1 Terminating 0 15h
- Solution
kubectl get pods -n <namespace> | grep Terminating | awk '{print $1}' | xargs kubectl delete pod -n <namespace> --force --grace-period=0
If this happens to a large number of Pods, the command can be wrapped in a script and executed periodically.
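The name-extraction part of that pipeline can be checked against captured kubectl output before pointing it at a live cluster; the pod names below are taken from the listing above:

```shell
# Dry run of the Terminating-pod filter on captured `kubectl get pods` output,
# so the grep/awk stage can be verified without touching a cluster.
sample='NAME                      READY   STATUS        RESTARTS   AGE
jaeger-5994d55ffc-nmhq6   0/1     Terminating   0          13h
kiali-64df7bf7cc-29kxl    0/1     Terminating   0          12h
kiali-64df7bf7cc-ok111    1/1     Running       0          12h'

# Prints only the names of Terminating pods; on a real cluster, pipe this into
# `xargs kubectl delete pod -n <namespace> --force --grace-period=0`.
echo "$sample" | grep Terminating | awk '{print $1}'
# jaeger-5994d55ffc-nmhq6
# kiali-64df7bf7cc-29kxl
```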
3.3 Pod logs cannot be viewed
- Problem description
The following error message appears when viewing logs with kubectl logs -f PodName:
Error from server (Forbidden): Forbidden (user=kubernetes, verb=get, resource=nodes, subresource=proxy)
- Solution
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
3.4 Pod container initialization failed
- Problem Description
Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6":
Error response from daemon: Get https://k8s.gcr.io/v2/:
net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
- Solution
Due to network access restrictions, images under the k8s.gcr.io (now registry.k8s.io) domain are normally unreachable. Download the image from a domestic mirror repository, then rename it with docker tag to the name the error message asked for:
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
3.5 Pods are evicted
- Problem description
A large number of Pods are in the Evicted state.
- Solution
Delete the Pods whose state is Evicted. With -A the first column is the namespace, so both fields are needed:
kubectl get pods -A | grep Evicted | awk '{print "-n " $1 " " $2}' | xargs -L1 kubectl delete pod
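As with the Terminating case, the extraction stage can be verified on captured output first; the sample rows below are illustrative, not from this article:

```shell
# Dry run of the Evicted-pod filter on captured `kubectl get pods -A` output;
# with -A the first column is the namespace, so both fields must be kept.
sample='NAMESPACE      NAME                       READY   STATUS    RESTARTS   AGE
istio-system   kiali-64df7bf7cc-29kxl     0/1     Evicted   0          12h
default        web-7c9d8f6b5d-abcde       1/1     Running   0          1h'

# Each emitted line becomes "-n <namespace> <pod>" arguments for
# `xargs -L1 kubectl delete pod` on a real cluster.
echo "$sample" | grep Evicted | awk '{print "-n " $1 " " $2}'
# -n istio-system kiali-64df7bf7cc-29kxl
```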
3.6 Node reports "network not ready"
- Problem Description
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready
- Solution
The Kubernetes CNI network plug-in failed to install. If the CNI is Calico, confirm whether calico-node started successfully.
3.7 View kubelet logs
journalctl -u kubelet --since today | less
3.8 The master node cannot be scheduled
- Problem Description
0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }
The k8s node cannot be scheduled; use the kubectl tool to view the node status:
kubectl get nodes -o wide
The displayed results are as follows:
NAME STATUS ROLES AGE VERSION
k8s-master1 NotReady,SchedulingDisabled <none> 43h v1.24.2
k8s-master2 Ready <none> 4d6h v1.24.2
k8s-node1 NotReady,SchedulingDisabled <none> 44h v1.24.2
- Solution
# Disable scheduling
kubectl cordon <node-name>
# Re-enable scheduling
kubectl uncordon <node-name>
4. Summary of Calico installation problems
4.1 Access Timeout between nodes
- Problem Description
cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post
"https://10.255.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-node/token":
dial tcp 10.255.0.1:443: i/o timeout
- Solution
Check whether the cluster network CNI plug-in is working, for example whether calico-node starts normally. Also check whether --service-cluster-ip-range and --cluster-cidr overlap, which effectively turns the cluster into a stand-alone environment.
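One way to compare the two ranges on a live node is to grep them out of the process arguments. The sketch below shows the extraction on sample command lines; the CIDR values are placeholders, not from this article:

```shell
# Sample command lines standing in for the real processes; on a live node use
# `ps -ef | grep kube-apiserver` (and kube-proxy) instead of these variables.
apiserver_cmd='kube-apiserver --service-cluster-ip-range=10.255.0.0/16 --etcd-servers=https://127.0.0.1:2379'
proxy_cmd='kube-proxy --cluster-cidr=10.244.0.0/16 --proxy-mode=ipvs'

# Split the arguments onto separate lines and pick out the two CIDR flags;
# the printed ranges must not overlap.
echo "$apiserver_cmd" | tr ' ' '\n' | grep -- '--service-cluster-ip-range'
echo "$proxy_cmd" | tr ' ' '\n' | grep -- '--cluster-cidr'
# --service-cluster-ip-range=10.255.0.0/16
# --cluster-cidr=10.244.0.0/16
```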
4.2 Calico-node Pod failed to start
- Problem description
calico-node fails to start, and the following appears in the event information:
Back-off restarting failed container
invalid capacity 0 on image filesystem
Node k8s-node2 status is now: NodeHasNoDiskPressure
Updated Node Allocatable limit across pods
Node k8s-node2 status is now: NodeHasSufficientPID
- Log analysis
kubectl logs -n kube-system calico-node-wzq2p -c install-cni
kubectl describe pod calico-node-wzq2p -n kube-system
journalctl -u kubelet -f
The messages above do not necessarily mean that calico-node failed to start because of insufficient disk space; you need to check the detailed logs. Use kubectl logs -n kube-system calico-node-wzq2p -c install-cni to see the specific error, then analyze from there. The author initially assumed calico-node could not start due to insufficient disk space; after checking the detailed logs, it turned out that the --cluster-cidr parameters configured for kube-proxy and kube-controller-manager did not match the --service-cluster-ip-range set in kube-apiserver.
5. Summary of CoreDNS installation problems
5.1 DNS domain name service IP address adjustment
- Problem description
The default CoreDNS configuration is inconsistent with the current Kubernetes cluster configuration. The kubelet startup parameter clusterDNS must point at the IP address of the CoreDNS domain-name Service. When the clusterDNS parameter set at kubelet startup is inconsistent with the Service IP address set in the CoreDNS deployment, service access will time out.
- Solution
Execute the following commands to specify the Service IP address of the DNS domain-name service:
cd /opt
git clone https://github.com/coredns/deployment
cd /opt/deployment/kubernetes
./deploy.sh -r 10.255.0.0/16 -i 10.255.0.2 > coredns.yaml
kubectl apply -f coredns.yaml
In the command above, -i sets the IP address of the DNS domain-name Service.
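The kubelet side must then point at the same address. In a KubeletConfiguration file this is the clusterDNS field; the file path and the clusterDomain value below are common defaults, not taken from this article:

```yaml
# /var/lib/kubelet/config.yaml (path is an assumption)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 10.255.0.2   # must match the -i address passed to deploy.sh
clusterDomain: cluster.local
```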
6. Summary of Istio installation issues
6.1 Kiali cannot connect to Istiod
- Problem Description
unable to proxy Istiod pods.
Make sure your Kubernetes API server has access to the Istio control plane through 8080 port
- Solution
Port forwarding on the node depends on socat; install it:
yum install socat -y
6.2 Istio Ingress modify network type
- Problem description
By default, the Istio ingress gateway Service is of type LoadBalancer; without a cloud load balancer its external IP stays pending.
- Solution
kubectl patch svc -n istio-ingress istio-ingress -p '{"spec": {"type": "NodePort"}}'
6.3 Istio egress traffic policy
Setting outboundTrafficPolicy.mode to REGISTRY_ONLY blocks outbound traffic to destinations that are not in the mesh's service registry; ALLOW_ANY (the default) lifts the restriction.
helm upgrade --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY istiod istio/istiod -n istio-system