[k8s Troubleshooting Series] Error: failed to get sandbox container task: no running task found: task xxx not found
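Symptoms
One host serves as a cluster node, and the Pods started on it fail intermittently: sometimes a single Pod comes up healthy, sometimes none of them do.
Checking the Pod status shows that they are all NotReady.
The kubelet log contains some odd errors: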
Dec 09 12:33:10 n11.dcos kubelet[1107062]: I1209 12:33:10.817133 1107062 kubelet.go:2110] "SyncLoop (PLEG): event for pod" pod="kube-system/coredns-55d6db8d84-jnljp" event=&{
ID:695d2078-8243-4025-8324-1f54195ea095 Type:ContainerDied Data:5f4e3353f3c396228e993c10d8bba8894a0a68f53987aa2aa534132e4379ca39}
Dec 09 12:33:10 n11.dcos kubelet[1107062]: I1209 12:33:10.817164 1107062 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="5f4e3353f3c396228e993c10d8bba8894a0a68f53987aa2aa534132e4379ca39"
Dec 09 12:33:10 n11.dcos kubelet[1107062]: I1209 12:33:10.879058 1107062 kuberuntime_manager.go:488] "No ready sandbox for pod can be found. Need to start a new one" pod="kube-system/coredns-55d6db8d84-jnljp"
Dec 09 12:33:14 n11.dcos kubelet[1107062]: E1209 12:33:14.684002 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nettest\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nettest pod=net-test-87478686f-4qvk7_default(ba206940-0733-4e8e-a09c-d28387e46abf)\"" pod="default/net-test-87478686f-4qvk7" podUID=ba206940-0733-4e8e-a09c-d28387e46abf
Dec 09 12:33:14 n11.dcos kubelet[1107062]: I1209 12:33:14.844797 1107062 kubelet.go:2110] "SyncLoop (PLEG): event for pod" pod="default/net-test-87478686f-4qvk7" event=&{
ID:ba206940-0733-4e8e-a09c-d28387e46abf Type:ContainerStarted Data:34ef1dc4b2394818a7a843a8655af5444af0766083fe72a53204d9e554ed30c7}
Dec 09 12:33:14 n11.dcos kubelet[1107062]: I1209 12:33:14.845095 1107062 scope.go:110] "RemoveContainer" containerID="797fdf4f559d1079c4e38bc6da7dbc3d547a8f41aa9da86e80d12934326b9507"
Dec 09 12:33:14 n11.dcos kubelet[1107062]: E1209 12:33:14.845294 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nettest\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nettest pod=net-test-87478686f-4qvk7_default(ba206940-0733-4e8e-a09c-d28387e46abf)\"" pod="default/net-test-87478686f-4qvk7" podUID=ba206940-0733-4e8e-a09c-d28387e46abf
Dec 09 12:33:15 n11.dcos kubelet[1107062]: I1209 12:33:15.285417 1107062 kubelet_getters.go:300] "Path does not exist" path="/vdata/kubelet/cuk-tgops-214613666997-m19n11anwy/pods/606fc9e3-624e-41a4-8fd5-62138bedd0d0/volumes"
Dec 09 12:33:15 n11.dcos kubelet[1107062]: I1209 12:33:15.849072 1107062 scope.go:110] "RemoveContainer" containerID="797fdf4f559d1079c4e38bc6da7dbc3d547a8f41aa9da86e80d12934326b9507"
Dec 09 12:33:15 n11.dcos kubelet[1107062]: E1209 12:33:15.849298 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nettest\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nettest pod=net-test-87478686f-4qvk7_default(ba206940-0733-4e8e-a09c-d28387e46abf)\"" pod="default/net-test-87478686f-4qvk7" podUID=ba206940-0733-4e8e-a09c-d28387e46abf
Dec 09 12:33:16 n11.dcos kubelet[1107062]: I1209 12:33:16.856556 1107062 kubelet.go:2110] "SyncLoop (PLEG): event for pod" pod="default/net-test-87478686f-4qvk7" event=&{
ID:ba206940-0733-4e8e-a09c-d28387e46abf Type:ContainerDied Data:34ef1dc4b2394818a7a843a8655af5444af0766083fe72a53204d9e554ed30c7}
Dec 09 12:33:16 n11.dcos kubelet[1107062]: I1209 12:33:16.856585 1107062 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="34ef1dc4b2394818a7a843a8655af5444af0766083fe72a53204d9e554ed30c7"
Dec 09 12:33:16 n11.dcos kubelet[1107062]: I1209 12:33:16.887999 1107062 kuberuntime_manager.go:488] "No ready sandbox for pod can be found. Need to start a new one" pod="default/net-test-87478686f-4qvk7"
Dec 09 12:33:21 n11.dcos kubelet[1107062]: I1209 12:33:21.285448 1107062 kubelet_getters.go:300] "Path does not exist" path="/vdata/kubelet/cuk-tgops-214613666997-m19n11anwy/pods/a53c22da-66e3-4ca0-aa58-70333f97b1f4/volumes"
Dec 09 12:33:36 n11.dcos kubelet[1107062]: E1209 12:33:36.075809 1107062 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"8c9a835ce06133da9af68d2e09fa2c429ebe292d4273b44937300b432c117c79\": not found" podSandbox ID="8c9a835ce06133da9af68d2e09fa2c429ebe292d4273b44937300b432c117c79"
Dec 09 12:33:36 n11.dcos kubelet[1107062]: E1209 12:33:36.076142 1107062 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"7e7d0bfc388bbd1afbaee350c6ee443ebc77ec0fbd47747743cf0b80317fe9a3\": not found" podSandbox ID="7e7d0bfc388bbd1afbaee350c6ee443ebc77ec0fbd47747743cf0b80317fe9a3"
Dec 09 12:33:36 n11.dcos kubelet[1107062]: E1209 12:33:36.076428 1107062 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"ae37e7198949ba4fefe79e0c43cba6d7e2e9e86bfed45315d590654771af550c\": not found" podSandbox ID="ae37e7198949ba4fefe79e0c43cba6d7e2e9e86bfed45315d590654771af550c"
Dec 09 12:33:38 n11.dcos kubelet[1107062]: E1209 12:33:38.225891 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nginx\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nginx pod=nginx-deployment-7c44d8d97c-xmc4d_default(29ca696f-a456-4854-be71-f29850e8c155)\"" pod="default/nginx-deployment-7c44d8d97c-xmc4d" podUID=29ca696f-a456-4854-be71-f29850e8c155
Dec 09 12:33:38 n11.dcos kubelet[1107062]: I1209 12:33:38.955676 1107062 kubelet.go:2110] "SyncLoop (PLEG): event for pod" pod="default/nginx-deployment-7c44d8d97c-xmc4d" event=&{
ID:29ca696f-a456-4854-be71-f29850e8c155 Type:ContainerStarted Data:ab57bd5b5ab16ef80b46e9e6cbec947016c90893c7fb1d90cf3b9ba535c8e588}
Dec 09 12:33:38 n11.dcos kubelet[1107062]: I1209 12:33:38.956127 1107062 scope.go:110] "RemoveContainer" containerID="54a5a0bab215c7b30e34826f78a3027deeb26ea0e7384772f0fc149ca5184ab5"
Dec 09 12:33:38 n11.dcos kubelet[1107062]: E1209 12:33:38.956460 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nginx\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nginx pod=nginx-deployment-7c44d8d97c-xmc4d_default(29ca696f-a456-4854-be71-f29850e8c155)\"" pod="default/nginx-deployment-7c44d8d97c-xmc4d" podUID=29ca696f-a456-4854-be71-f29850e8c155
Dec 09 12:33:39 n11.dcos kubelet[1107062]: I1209 12:33:39.959295 1107062 scope.go:110] "RemoveContainer" containerID="54a5a0bab215c7b30e34826f78a3027deeb26ea0e7384772f0fc149ca5184ab5"
Dec 09 12:33:39 n11.dcos kubelet[1107062]: E1209 12:33:39.959503 1107062 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"nginx\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=nginx pod=nginx-deployment-7c44d8d97c-xmc4d_default(29ca696f-a456-4854-be71-
View the events:
kubectl get event
The main error messages are as follows:
34m Warning Failed pod/buxybox-test-vd9lp Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: read init-p: connection reset by peer: unknown
31m Warning FailedCreatePodSandBox pod/net-test-87478686f-2ts26 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bd6fef3750ad1321a556bebeb4b86ee3202ecb8760f3a05598e7ee1f395dfc9e": plugin type="cucni" name="cucni" failed (add): cni add error; network not ready after 30s
33m Warning Failed pod/net-test-87478686f-st87d Error: failed to start containerd task "nettest": OCI runtime start failed: container process is already dead: unknown
33m Warning Failed pod/nginx-deployment-7c44d8d97c-2h667 Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to create new parent process: namespace path: lstat /proc/2732734/ns/ipc: no such file or directory: unknown
33m Warning Failed pod/nginx-deployment-7c44d8d97c-g47d8 Error: failed to get sandbox container task: no running task found: task 06a7363fcd6b00363e03ee6c581f28e2331c7fac3ef80fb9a455451e65567f7c not found: not found
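With this many mixed failures it can help to see which event reasons dominate before reading individual messages. The following is a minimal sketch, assuming the default `kubectl get event` column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name `summarize_warnings` is hypothetical and not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: count Warning events per reason.
# Reads `kubectl get event` output on stdin and assumes the default
# column layout: LAST-SEEN TYPE REASON OBJECT MESSAGE...
summarize_warnings() {
    awk '$2 == "Warning" { count[$3]++ }
         END { for (r in count) print count[r], r }' |
        sort -k1,1nr -k2,2   # most frequent first, ties sorted by reason name
}

# Usage on a live cluster (not executed here):
#   kubectl get event --no-headers | summarize_warnings
```

A result dominated by FailedCreatePodSandBox points at the sandbox or CNI layer rather than at the application containers, which is consistent with the messages above.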
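Solution
It later turned out that the host was running two containerd instances and two kubelet instances, and they conflicted with each other. After removing the extra containerd and kubelet, the problem disappeared.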
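A quick way to check a host for this condition is to count how many kubelet and containerd processes are running. This is only a sketch: the function names `count_procs` and `find_duplicates` are hypothetical, and it assumes the daemons run under the exact command names `kubelet` and `containerd`.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that exactly equal "$1".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper usable under "set -e".
count_procs() {
    grep -c -x -F -- "$1" || true
}

# Hypothetical helper: read command names (one per line) on stdin and
# report each watched daemon that appears more than once.
# The names kubelet and containerd are assumptions taken from the article.
find_duplicates() {
    procs=$(cat)
    for name in kubelet containerd; do
        n=$(printf '%s\n' "$procs" | count_procs "$name")
        if [ "$n" -gt 1 ]; then
            echo "$name: $n instances"
        fi
    done
}

# Usage on a live host (not executed here):
#   ps -eo comm= | find_duplicates
```

Matching whole lines keeps helper processes such as containerd shims from being counted as a second containerd. No output means at most one instance of each daemon was found.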
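Second verification
On a single host, start the node processes (kubelet and containerd) of two clusters at the same time.
When the first cluster was started, everything worked normally.
After the second cluster was started, however, its Pods could not be started.
Looking back at the first cluster, all of its Pods had gone down as well.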
2m23s Normal Scheduled pod/buxybox-test-jf5c5 Successfully assigned default/buxybox-test-jf5c5 to node0
93s Warning FailedCreatePodSandBox pod/buxybox-test-jf5c5 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "34d367de3caa7eac610c5b4d119406764ffcbe7ce2001de91cc27a362d3b6e94": plugin type="cucni" name="cucni" failed (add): cni add error; network not ready after 30s
67s Warning FailedCreatePodSandBox pod/buxybox-test-jf5c5 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "46dc633fb32aed71d127b78eccc5af35842d81564ba24c713aa29708dff89d61": OCI runtime start failed: container process is already dead: unknown
40s Warning FailedCreatePodSandBox pod/buxybox-test-jf5c5 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "f384ef4587957920555bab2d5ecbe91f02526bbf88a969ae60fc8d7aacdc7b16": cannot start a stopped process: unknown
2m
2m7s Warning FailedCreatePodSandBox pod/net-test-87478686f-l5f5n Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0d989f3e527299314aa0f11c073767ff1fc700929a0d9af7367d25afc8312e09": plugin type="cucni" name="cucni" failed (add): error; Link not found
94s Warning FailedCreatePodSandBox pod/net-test-87478686f-l9qdz Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4fa3d5d807cbb0c2be6bb6e6cb13a8ec15ecef124b19b9363507d5e40c8118a5": plugin type="cucni" name="cucni" failed (add): cni add error; network not ready after 30s
2m14s Warning Failed pod/net-test-87478686f-rr2dv Error: failed to get sandbox container task: no running task found: task 947dd55920aa49d9544b7867703242cb3b294fec317de4cafa247377ed1d34aa not found: not found
7m29s Warning Failed pod/net-test-87478686f-klmzs Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: %!w(<nil>): unknown
We can therefore conclude that the conflict was caused by using one host as a node of two clusters at the same time.