This article explains the problems and failures that commonly occur when deploying and maintaining Docker containers, with a targeted solution for each. I hope it helps you quickly locate and fix similar issues.
Docker is relatively simple to use. When something goes wrong, diagnostic information can be gathered in the following ways:
1. Run the command with docker run and read any error message it returns.
2. Read the container's output with docker logs and filter it for relevant entries.
3. Check the Docker service status with systemctl status docker.
4. View the daemon's log with journalctl -u docker.service.
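The "targeted filtering" in step 2 can be as simple as a keyword match. Below is a minimal Python 3 sketch, assuming the docker CLI is on PATH; docker_log_lines and filter_log_lines are hypothetical helper names, and the keyword list is only an illustrative starting point.

```python
import re
import subprocess
from typing import Iterable, Iterator, List

# Illustrative keywords only; tune them to your application's log format.
ERROR_PATTERN = re.compile(r"error|exception|fail|denied|refused|timeout", re.IGNORECASE)


def docker_log_lines(container: str, since: str = "1h") -> List[str]:
    """Return log lines from `docker logs --since <since> <container>`.

    stdout and stderr are captured separately and concatenated, so the
    original interleaving of the two streams is not preserved.
    """
    result = subprocess.run(
        ["docker", "logs", "--since", since, container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines() + result.stderr.splitlines()


def filter_log_lines(lines: Iterable[str], pattern: re.Pattern = ERROR_PATTERN) -> Iterator[str]:
    """Yield only the lines matching `pattern`, without trailing newlines."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")
```

For example, iterating over filter_log_lines(docker_log_lines("web", since="30m")) yields only the matching lines from the last 30 minutes of a container named web (a placeholder name). A rough shell equivalent is docker logs --since 30m web 2>&1 | grep -iE "error|fail".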
The following collects 47 common Docker problems and failures, grouped into 9 categories:
1. Startup failures
1. docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Reason: The Docker daemon is not running.
Solution: systemctl start docker
2. can't create unix socket /var/run/docker.sock: is a directory
Reason: /var/run/docker.sock exists as a directory, so the daemon cannot create its socket file there.
Solution: rm -rf /var/run/docker.sock
Then restart Docker with systemctl restart docker.
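3. Job for docker.service failed. Failed to start Docker Application
Reason: SELinux is blocking Docker.
Solution: In /etc/sysconfig/selinux, set SELINUX=disabled. This takes effect after a reboot; to stop enforcement immediately, run setenforce 0, which switches SELinux to permissive mode.
Then restart Docker.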
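4. docker: Error response from daemon: /var/lib/docker/overlay/XXXXXXXXXXXXXXXXXXXXXXX: no such file or directory.
Reason: A directory or file that Docker's storage driver expects under /var/lib/docker is missing.
Solution:
systemctl stop docker
rm -rf /var/lib/docker/*
systemctl start docker
Warning: this deletes all local images, containers, and volumes, so back up any data you need first.
Then pull the image again and run the container.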
5. docker: Error response from daemon: Conflict. The container name "XXX" is already in use by container "XXX". You have to remove (or rename) that container to be able to reuse that name.
Reason: A container with the same name already exists.
Solution: Choose a different name, rename the existing container (docker rename), or delete it (docker rm) and create it again.
6. Error: Connection activation failed: No suitable device found for this connection
Reason: A network interface configuration problem.
Solution: Check the interface configuration, then restart the network interface.
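7. Docker cannot start after a system restart
The error reported is: docker0: iptables: No chain/target/match by that name
Reason: Docker's iptables chains are missing, typically because the iptables rules were flushed or reloaded after Docker created them.
Solution: Restart the Docker service with systemctl restart docker so that it recreates the chains.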
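8. Error starting daemon: error initializing graphdriver: driver not supported
This error appears when starting the Docker daemon with the overlay2 storage driver.
Reason: The daemon configuration for overlay2 is missing. On older kernels, such as the 3.10 kernel of CentOS 7, overlay2's kernel version check also fails unless it is overridden.
Solution:
Add the following configuration to /etc/docker/daemon.json, then restart Docker:
{
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"]
}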
9. Failed to start docker.service: Unit docker.service is masked.
Reason: The docker.service unit has been masked, so systemd refuses to start it.
Solution:
systemctl unmask docker.service
systemctl unmask docker.socket
systemctl start docker.service
10. Failed to start docker.service: Unit is not loaded properly: Invalid argument.
Reason: systemd cannot load the docker.service unit, usually because the unit file is invalid.
Solution: Uninstall Docker and delete the docker.service unit file.
Then reinstall Docker.
11. docker-compose prints the following warning when starting containers:
/usr/lib/python2.7/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.22) or chardet (2.2.1) doesn't match a supported version! RequestsDependencyWarning)
Reason: The installed urllib3 and chardet versions are not supported by the installed requests package.
Solution:
pip uninstall urllib3
pip uninstall chardet
pip install requests
12. Docker container restart failure
After the Docker process is killed and Docker is restarted, containers fail to start with:
docker restart XXXXXXX Error response from daemon: Cannot restart container XXXXXXX: container "XXXXXXXXXXXXXXXX": already exists
Reason: The old container did not exit cleanly, so containerd still holds a record of it.
Solution: Remove the stale record with docker-containerd-ctr --address /run/docker/containerd/docker-containerd.sock --namespace moby c rm <container hash_id>
Then start the container again with docker start <container>.
13. Restarting Docker hangs
Running systemctl restart docker gets stuck and never returns.
Reason: Possibly too many running containers, or disk I/O problems.
Solution:
systemctl start docker-cleanup.service
systemctl start docker
2. Permission errors
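14. Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock
Solution:
Check which group owns /var/run/docker.sock (ls -l /var/run/docker.sock); normally it is the docker group.
Add the user to the docker group with usermod -aG docker ${USER}, then log out and back in (or run newgrp docker) so that the new group membership takes effect.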
15. chown socket at step GROUP: No such process
Reason: Docker cannot find the docker group; it may have been deleted accidentally.
Solution: groupadd docker, then restart Docker.
16. Post http:///var/run/docker.sock/v1.XXX/auth: dial unix /var/run/docker.sock: permission denied. Are you trying to connect to a TLS-enabled daemon without TLS?
Reason: A non-root user is managing Docker without permission to access the Docker socket.
Solution:
groupadd docker
usermod -aG docker <username>
Then log in again as that user.
17. docker commit fails with:
Error processing tar file(exit status 1): unexpected EOF
Reason: Possibly a file permission problem.
Solution: Add execute permission to the affected files with chmod +x.
3. Image and registry errors
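18. Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io
Reason: Docker Hub (registry-1.docker.io) cannot be reached; here the DNS lookup of the registry host failed.
Solution:
Check the host's DNS and network settings, or switch Docker to a reachable registry mirror (such as a domestic mirror) or a self-hosted registry.
To use a mirror, add a registry-mirrors entry to /etc/docker/daemon.json and restart Docker.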
19. Error when pushing a local image
The push refers to a repository [XXXX] Get https://xxx/v1/_ping: http: server gave HTTP response to HTTPS client
Reason: The private registry serves plain HTTP, while Docker connects over HTTPS by default.
Solution:
Put the registry address (host:port) inside the quotes of insecure-registries in /etc/docker/daemon.json, then restart Docker:
{ "insecure-registries":[""] }
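Several fixes in this article add keys to /etc/docker/daemon.json: this item, the storage driver in item 8, the registry mirror in item 18, and the log limits in item 25. Rewriting the file from scratch each time silently drops earlier settings, and a malformed file stops dockerd from starting. Below is a minimal Python 3 sketch that merges new top-level keys into the existing file instead; merge_daemon_config is a hypothetical helper name, and it assumes the file holds plain JSON.

```python
import json
import os
import tempfile
from pathlib import Path


def merge_daemon_config(path: str, updates: dict) -> dict:
    """Merge top-level keys from `updates` into a daemon.json-style file.

    Keys already in the file but absent from `updates` are kept. Keys present in
    both are overwritten (not deep-merged). Returns the merged configuration.
    """
    p = Path(path)
    config = {}
    if p.exists():
        text = p.read_text()
        if text.strip():
            # json.loads raises ValueError on a malformed file instead of
            # silently replacing it.
            config = json.loads(text)
    config.update(updates)

    # Write to a temp file in the same directory, then rename it over the
    # original, so an interrupted write never leaves half-written JSON behind.
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=".daemon.json.")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f, indent=2)
            f.write("\n")
        os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp, p)
    except BaseException:
        os.unlink(tmp)
        raise
    return config
```

For example, running merge_daemon_config("/etc/docker/daemon.json", {"storage-driver": "overlay2"}) as root, followed by systemctl restart docker, keeps any mirror or registry settings already in the file. Because the merge is shallow, a nested value such as log-opts is replaced as a whole.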
20. /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go: starting container process caused "exec: \"/bin/bash\": executable file not found in $PATH".
Reason: The image does not contain /bin/bash (many minimal images, such as Alpine-based ones, ship only /bin/sh), the image itself is broken, or the Docker engine version is too old.
Solution: Use a shell that exists in the image (for example /bin/sh), fix or rebuild the image, or upgrade Docker.
21. Running chown -R while building an image is very slow.
Reason: Docker uses a copy-on-write strategy. When chown runs, every file it touches is first copied from the lower image layers into the current layer, then its ownership is changed and the copy is written to the file system. This is slow and also duplicates those files in the image.
Solution: Avoid chown -R and similar commands that modify many files in bulk. Set ownership when the files are added instead, for example with COPY --chown=<user>:<group> in the Dockerfile.
22. docker build fails while the following kernel message is printed:
Message from syslogd kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Reason: The Docker engine version is too new for the host kernel.
Solution: Use a Docker engine version that matches the host kernel, or upgrade the kernel. Containers share the host kernel, so it is the host's kernel version that matters, not anything inside the image.
23. docker: Error response from daemon: containerd: container did not start before the specified time-out.
ERRO[0133] error getting events from daemon: context canceled
Reason: The error occurs when pulling an image after the Docker root directory has been changed and Docker restarted.
Solution: Restart the Docker service, or reboot the server.
4. Resource errors
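24. no space left on device
Reason: The disk holding Docker's data (by default /var/lib/docker) is full.
Solution: Run docker system df to see what is using the space, then remove unused containers, images, and other resources:
docker system prune -a
Note that -a removes every image not used by a container, not only dangling ones, along with stopped containers, unused networks, and build cache. Unused volumes are kept unless --volumes is added.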
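Before pruning, it can help to check how close the filesystem holding Docker's data is to full, for example from a cron job. Below is a minimal Python 3 sketch using only the standard library; check_free_space is a hypothetical helper, and it assumes the default data root /var/lib/docker and an arbitrary 10% threshold.

```python
import shutil
from typing import Tuple


def check_free_space(path: str = "/var/lib/docker", min_free_ratio: float = 0.10) -> Tuple[int, bool]:
    """Return (free_bytes, ok) for the filesystem that contains `path`.

    `ok` is False when free space is below `min_free_ratio` of the total size.
    The 10% default is an arbitrary example threshold, not a Docker limit.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free >= usage.total * min_free_ratio
```

A monitoring script could call free, ok = check_free_space() and raise an alert when ok is False, before Docker starts failing with "no space left on device".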
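25. /var/lib/docker/containers takes up too much space
Reason: Container log files (<container-id>-json.log) have grown too large.
Solution:
truncate -s 0 /var/lib/docker/containers/*/*-json.log
This empties the log files in place. (In bash, cat /dev/null > *-json.log fails with "ambiguous redirect" when the pattern matches more than one file.)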
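Truncation can also be restricted to the logs that are actually oversized. Below is a minimal Python 3 sketch, assuming the default json-file layout (/var/lib/docker/containers/<id>/<id>-json.log) and root privileges; truncate_large_json_logs is a hypothetical helper and its 1 GiB threshold is arbitrary. It truncates rather than deletes because dockerd keeps each log file open, and a deleted file's space is not freed while a process still holds it open.

```python
import os
from pathlib import Path
from typing import List


def truncate_large_json_logs(root: str = "/var/lib/docker/containers",
                             max_bytes: int = 1 << 30) -> List[str]:
    """Truncate each <id>/<id>-json.log under `root` larger than `max_bytes`.

    Truncating in place (like `truncate -s 0`) keeps dockerd's open file
    handle valid. Returns the sorted list of truncated paths.
    """
    truncated = []
    for log in Path(root).glob("*/*-json.log"):
        if log.stat().st_size > max_bytes:
            os.truncate(log, 0)
            truncated.append(str(log))
    return sorted(truncated)
```

Truncating discards those log lines, so copy anything still needed first; the daemon.json log limits are the lasting fix.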
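Alternatively, limit the log size through dockerd settings in /etc/docker/daemon.json, then restart Docker. The limits apply only to containers created after the change:
{"log-driver":"json-file",
"log-opts": {"max-size":"2G", "max-file":"10"}}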
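26. max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Reason: The host's default vm.max_map_count is lower than the application (for example Elasticsearch) requires.
Solution: Set vm.max_map_count = 262144 or higher in /etc/sysctl.conf and apply it with sysctl -p. Containers share the host kernel, so this must be changed on the host, not inside the container.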
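Kernel parameters like this one can be checked before starting the container. sysctl keys map to files under /proc/sys with dots replaced by slashes, so vm.max_map_count is /proc/sys/vm/max_map_count. Below is a minimal Python 3 sketch; read_sysctl and meets_minimum are hypothetical helper names, and the proc_root parameter exists only so the functions can be tested against a fake directory.

```python
from pathlib import Path


def read_sysctl(key: str, proc_root: str = "/proc/sys") -> int:
    """Read the first integer field of a kernel parameter such as 'vm.max_map_count'.

    Works for ordinary keys, whose dots map to directory separators under
    /proc/sys (vm.max_map_count -> /proc/sys/vm/max_map_count).
    """
    text = Path(proc_root, *key.split(".")).read_text()
    return int(text.split()[0])


def meets_minimum(key: str, minimum: int, proc_root: str = "/proc/sys") -> bool:
    """True if the current value of `key` is at least `minimum`."""
    return read_sysctl(key, proc_root) >= minimum
```

On a host still at the default, meets_minimum("vm.max_map_count", 262144) returns False, which is exactly the condition this error reports.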
27. Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown.
This error appears intermittently.
Reason: Possibly a memory shortage related to the page cache.
Solution: Drop the page cache with echo 1 > /proc/sys/vm/drop_caches
28. After many containers are started on one host, new containers fail to start.
Reason: First check whether the disk is full. If disk space is not the cause, the kernel limit on asynchronous I/O requests (fs.aio-max-nr) may have been reached.
Solution:
vim /etc/sysctl.conf
Add the line fs.aio-max-nr = 1048576
sysctl -p
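Editing sysctl.conf by hand works, but repeated edits can leave several entries for the same key, and the last one read wins. Below is a minimal Python 3 sketch of an idempotent edit; set_sysctl_conf is a hypothetical helper that rewrites every uncommented line for the key, or appends one if none exists. It does not apply the change, so sysctl -p is still needed.

```python
import re
from pathlib import Path


def set_sysctl_conf(path: str, key: str, value: str) -> None:
    """Set `key = value` in a sysctl.conf-style file without creating duplicates.

    Every uncommented line that assigns `key` is rewritten; if there is none,
    the setting is appended. The change is not applied until `sysctl -p` runs.
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    assigns_key = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_line = f"{key} = {value}"
    found = False
    for i, line in enumerate(lines):
        if assigns_key.match(line):
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    p.write_text("\n".join(lines) + "\n")
```

For this item, set_sysctl_conf("/etc/sysctl.conf", "fs.aio-max-nr", "1048576") run as root, followed by sysctl -p, has the same effect as the manual edit and can safely be run more than once.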
29. A container fails to start and keeps restarting.
Check the container's error output with docker logs <container-name>.
Check /var/log/messages for out-of-memory (OOM) killer messages.
Reason: Memory is exhausted, so the OOM killer terminates the process.
Solution: Free up memory, then start the container again.
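To confirm OOM rather than guess, check the container's recorded state. docker inspect <container> prints a JSON array whose first element contains State.OOMKilled, State.ExitCode, and RestartCount. Below is a minimal Python 3 sketch that summarizes those fields; summarize_restarts is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def summarize_restarts(inspect_output: str) -> Dict[str, Any]:
    """Extract OOM and restart details from `docker inspect <container>` output."""
    info = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    state = info.get("State", {})
    exit_code = state.get("ExitCode")
    return {
        "oom_killed": bool(state.get("OOMKilled", False)),
        "exit_code": exit_code,
        # 137 = 128 + SIGKILL (9): the process was killed, often but not
        # always by the OOM killer.
        "killed_by_sigkill": exit_code == 137,
        "restart_count": info.get("RestartCount", 0),
    }
```

From the shell, docker inspect -f '{{.State.OOMKilled}}' <container> reads the same flag directly.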
5. Version incompatibility errors
30. overlayfs: Can't delete file moved from base layer to newly created dir even on ext4
Reason: A compatibility issue between overlayfs and XFS, the default file system on CentOS.
Solution: This is fixed in kernel 4.4.6 and later, so upgrade the kernel.
31. docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:297: getting the final child's pid from pipe caused \"read init-p: connection reset by peer\"": unknown.
Reason: The Docker version is not compatible with the operating system (kernel) version.
Solution: Reinstall a Docker version supported by the operating system kernel.
6. Network and port errors
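32. WARNING: IPv4 forwarding is disabled. Networking will not work.
Reason: IPv4 forwarding is disabled in the kernel, so containers cannot reach external networks.
Solution:
Add net.ipv4.ip_forward=1 as the last line of /usr/lib/sysctl.d/00-system.conf (a file under /etc/sysctl.d/ also works and is less likely to be overwritten by package updates).
Restart the network service, then delete the affected container and create it again.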
33. Creating network "xxxxxxx" with the default driver
After docker-compose starts the containers, the host loses network connectivity.
Reason: The gateway or subnet of the network that docker-compose creates conflicts with the host's network.
Solution: In docker-compose.yml, set network_mode: "bridge" for the containers so that they use the default docker0 bridge instead of a newly created network.
34. Unable to find a node that satisfies the following conditions [port xxxx]
Reason: When a container uses port mapping (docker run -p xxxx:xxxx, or ports in a compose file), Docker reserves the port on the host and forwards traffic to the container's port through NAT. If the host port is already taken by another container or a system process, port allocation fails.
Solution: Stop the container or process occupying the port, or change the host side of the port mapping to avoid the conflict.
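Before choosing a host port, you can check whether it can still be bound; ss -ltnp shows which process currently holds a port. Below is a minimal Python 3 sketch; host_port_in_use is a hypothetical helper that only approximates Docker's own check: it tests TCP on a single address, and binding ports below 1024 needs root, so run it with the same privileges Docker has.

```python
import socket


def host_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket cannot be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR avoids false positives from connections in TIME_WAIT;
        # binding still fails if another socket is listening on the port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False
```

If host_port_in_use(8080) returns True, map the container to another host port or free that one first.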
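35. Error response from daemon: service endpoint with name xxx already
Reason: An endpoint with the same name is still registered on the Docker network, typically left over from a container that was not removed cleanly.
Solution: Restart the Docker container. If the error persists, remove the stale endpoint with docker network disconnect -f <network> <endpoint-name> and start the container again.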
36. docker: Error response from daemon: driver failed programming external connectivity on endpoint XXXXX: Bind for 0.0.0.0:80 failed: port is already allocated
Reason: Host port 80 is already allocated to another container or used by another process.
Solution: Bind the container to a different host port, or stop whatever is using port 80.
7. Docker installation errors
37. Installing Docker fails with: Requires: container-selinux >= 2.9
Reason: container-selinux is too old or not installed.
Solution:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum install epel-release
yum makecache
yum install container-selinux
38. An error occurred when installing docker-compose.
“ImportError: 'module' object has no attribute 'check_specifier'”
Reason: setuptools version problem
Solution:
Upgrade setuptools to version 30.1.0 or above
pip install --upgrade setuptools
39. A warning appears when installing docker-compose:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Reason: pip warns that Python 2.7 has reached end of life.
Solution: pip install -i https://pypi.douban.com/simple docker-compose
8. Docker deletion errors
40. Docker reports an error when deleting a container
Error response from daemon: Driver overlay failed to remove root filesystem xxxxx: remove /var/lib/docker/overlay2/xxxxx/merged: device or resource busy
Reason: The container's filesystem (for example, one with a mounted data volume) is still mounted by another process, so it cannot be deleted.
Solution:
Find the processes that still hold the mount:
grep docker /proc/*/mountinfo | grep xxxxx
Kill those processes, then delete the container again.
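The grep pipeline above can also be scripted, which makes it easy to list the matching PIDs before deciding what to kill. Below is a minimal Python 3 sketch; pids_holding_mount is a hypothetical helper, the fragment argument stands for the xxxxx ID from the error message, and reading other processes' mountinfo may require root.

```python
from pathlib import Path
from typing import List


def pids_holding_mount(fragment: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose mountinfo has a line containing both 'docker' and `fragment`.

    Roughly mirrors: grep docker /proc/*/mountinfo | grep <fragment>
    Entries that cannot be read (process exited, permission denied) are skipped.
    """
    pids = []
    for mountinfo in Path(proc_root).glob("[0-9]*/mountinfo"):
        try:
            text = mountinfo.read_text()
        except OSError:
            continue
        if any("docker" in line and fragment in line for line in text.splitlines()):
            pids.append(int(mountinfo.parent.name))
    return sorted(pids)
```

Check each returned PID with ps -fp <pid> before killing it; if it is an important system service, restarting that service may be safer than killing it.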
41. Error when deleting a container in the Dead state
Error response from daemon: Driver aufs failed to remove root filesystem XXXXXXXXXXXXXXXX: aufs: unmount error after retries: /var/lib/docker/aufs/mnt/xxxxxxxx: device or resource busy
Reason: The container is in the Dead state and its filesystem is still busy, so it cannot be deleted yet.
Solution: Run docker rm -fv <container-id>; the container is usually removed after a few minutes.
42. Docker reports an error when deleting an image
Error response from daemon: conflict: unable to remove repository reference "XXXX" (must force) - container XXXX is using its referenced image YYYY
Reason: The image is being used by a container.
Solution: Delete the container that uses the image first (docker rm <container-id>), then delete the image.
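43. Docker reports an error when deleting an image
Error response from daemon: conflict: unable to delete XXXXXXXXXX (must be forced) - image is referenced in multiple repositories
Reason: The same image ID is tagged under multiple repository names, for example after it was tagged for pushing to another registry.
Solution: If the image is no longer needed, force-delete it with docker rmi -f <image-id>; otherwise remove only the unwanted tags with docker rmi <repository>:<tag>.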
44. Docker reports an error when deleting an image
Error response from daemon: conflict: unable to delete XXX (cannot be forced) - image has dependent child images
Reason: Child images are built on top of this image and depend on it.
Solution: As the message says, this deletion cannot be forced. Delete the dependent child images (and any containers using them) first, then delete this image.
9. Other errors
45. docker: Error response from daemon: driver failed programming external connectivity on endpoint XXXXXXX: (iptables failed: iptables --wait -t filter -A DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.2 --dport 8080 -j ACCEPT: iptables: No chain/target/match by that name.
Reason: The firewall removed Docker's DOCKER iptables chain, for example because firewalld was restarted or reloaded after Docker started.
Solution: Turn off the firewall, then restart Docker so that it recreates its iptables chains.
46. docker info prints the following warnings:
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Reason: Bridged traffic is not passed through iptables, so bridge-nf-call-iptables and bridge-nf-call-ip6tables need to be enabled.
Solution:
vi /etc/sysctl.conf
Add the following:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
Then load the br_netfilter module if it is not already loaded (modprobe br_netfilter) and apply the settings with sysctl -p.
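47. A MySQL container crashes on startup
Creating a mysql container with Docker fails, and the logs show:
Database is uninitialized and password option is not specified
Reason: On first start, the official mysql image requires one of MYSQL_ROOT_PASSWORD, MYSQL_ALLOW_EMPTY_PASSWORD, or MYSQL_RANDOM_ROOT_PASSWORD to be set.
Solution: docker run -d -e MYSQL_ROOT_PASSWORD=[password] -p 3306:3306 <mysql-image>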
To avoid strange and intermittent problems, operations staff and developers should use Docker containers in a standardized way and so minimize failures caused by improper use. The following recommendations can serve as a reference:
Docker usage recommendations
1. Use a stable Docker release from the last 1 to 2 years
Avoid very old versions; many bugs have been fixed in newer releases.
2. Avoid very large images (5 GB to 10 GB or more)
Keep images as lightweight as possible by removing unnecessary software and data.
3. Mount host configuration files into containers read-only
When a container needs a host configuration file through -v, append :ro to mount it read-only.
4. Store data on the host's physical disk or on a storage node
Do not keep data only inside the container, or it will be lost when the container is removed.
5. Write application logs to a directory mounted from the host
Do not write logs only inside the container, do not rely solely on docker logs to read them, and avoid having to dig through Docker's volume directory to find them.
6. Don't rely on the latest tag
Follow a consistent tagging convention so that each tag maps to a specific version.
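7. Don't rely on container IP addresses (172.17.0.x by default) or hard-code them in configuration
A container's IP address is likely to change after it restarts. Refer to containers by name on a user-defined network instead, where Docker's built-in DNS resolves them.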
8. Avoid running multiple processes in a single container
Containers are not virtual machines; aim for one process per container.
9. Keep images consistent across environments
Whether in testing, UAT, or production, use the same image without modification, and distinguish environments only through environment variables.
10. Monitor Docker containers so that problems are detected promptly
Prometheus is recommended for container monitoring.
11. Always limit container resources
Especially CPU and memory (for example with the docker run options --cpus and --memory), and where possible disk space and network, so that a single container cannot exhaust the host's hardware resources.