An Illustrated Look Inside the Kubernetes Pod Creation Process
https://www.kubernetes.org.cn/6766.html
Creating a container in Kubernetes is a complicated process. It involves close collaboration among several internal components, as well as integration with the external CRI runtime. This article walks through the details of container creation and how the components cooperate, so that when problems come up later you have a better idea of where to look.
1. Foundation construction
1.1 Container management thread model
1.2 Event-driven eventual consistency of state
1.3 Component collaboration process
2. Kubelet container creation process
2.1 Obtaining Pod for admission check
Kubelet has two main event sources: static Pods and the apiserver. Only ordinary Pods are considered here. Such a Pod is added directly to the PodManager for management and then goes through an admission check.
The admission check mainly involves two key controllers: eviction management and the predicate (pre-selection) check. Eviction management looks at the node's current resource pressure and checks whether the Pod tolerates it. The predicate check uses the currently active containers and the current node information to verify that the node satisfies the Pod's basic operating requirements, such as affinity. If the Pod has a particularly high priority or is a static Pod, kubelet tries to preempt resources for it, choosing what to preempt according to QoS level, so that its operating requirements can be met.
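The two checks described above can be pictured as a chain of handlers in which the first rejection wins. The Go sketch below only illustrates that idea: the types `Pod`, `Node`, `AdmitResult`, and `AdmitHandler`, and both handler functions, are hypothetical simplifications and are not kubelet's real types or logic.

```go
package main

import "fmt"

// Pod and Node are hypothetical, heavily simplified stand-ins for the real objects.
type Pod struct {
	Name                    string
	CPURequestMilli         int64
	ToleratesMemoryPressure bool
}

type Node struct {
	FreeCPUMilli   int64
	MemoryPressure bool
}

// AdmitResult reports whether a Pod may run on the node and, if not, why.
type AdmitResult struct {
	Admit  bool
	Reason string
}

// AdmitHandler is one admission check.
type AdmitHandler func(pod Pod, node Node) AdmitResult

// evictionAdmit rejects Pods that do not tolerate the node's current resource pressure.
func evictionAdmit(pod Pod, node Node) AdmitResult {
	if node.MemoryPressure && !pod.ToleratesMemoryPressure {
		return AdmitResult{Admit: false, Reason: "node is under memory pressure"}
	}
	return AdmitResult{Admit: true}
}

// predicateAdmit rejects Pods whose CPU request does not fit on the node.
func predicateAdmit(pod Pod, node Node) AdmitResult {
	if pod.CPURequestMilli > node.FreeCPUMilli {
		return AdmitResult{Admit: false, Reason: "insufficient cpu"}
	}
	return AdmitResult{Admit: true}
}

// canAdmitPod runs the handlers in order and stops at the first rejection.
func canAdmitPod(pod Pod, node Node, handlers ...AdmitHandler) AdmitResult {
	for _, h := range handlers {
		if r := h(pod, node); !r.Admit {
			return r
		}
	}
	return AdmitResult{Admit: true}
}

func main() {
	node := Node{FreeCPUMilli: 500, MemoryPressure: true}
	pod := Pod{Name: "web", CPURequestMilli: 250}
	fmt.Printf("%+v\n", canAdmitPod(pod, node, evictionAdmit, predicateAdmit))
}
```

Preemption for high-priority or static Pods is left out of this sketch on purpose.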
2.2 Create event pipeline and container management main thread
When kubelet receives a newly created Pod, it first creates an event pipeline (channel) for it and starts a main container-management thread to consume the events in that pipeline. Based on the last synchronization time, the thread waits until the local podCache holds a newer state for the Pod. For a newly created Pod, that state comes mainly from the update-time operation in PLEG, which broadcasts a default empty status as the latest state.
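The "one pipeline plus one main thread per Pod" model can be sketched in Go with one channel and one goroutine per Pod. This is a minimal illustration, not kubelet's code: `UpdateEvent`, `PodWorkers`, and their methods are hypothetical names, and the sketch simply blocks on a small buffered channel where a real implementation would need a policy for updates that arrive while the worker is busy.

```go
package main

import (
	"fmt"
	"sync"
)

// UpdateEvent is a hypothetical, simplified Pod update message.
type UpdateEvent struct {
	PodUID string
	Type   string // for example "create" or "update"
}

// PodWorkers keeps one event channel and one goroutine per Pod.
type PodWorkers struct {
	mu      sync.Mutex
	updates map[string]chan UpdateEvent
	wg      sync.WaitGroup
	syncPod func(UpdateEvent) // callback that performs the actual sync
}

func NewPodWorkers(syncPod func(UpdateEvent)) *PodWorkers {
	return &PodWorkers{updates: make(map[string]chan UpdateEvent), syncPod: syncPod}
}

// UpdatePod creates the Pod's channel and worker goroutine on first use,
// then hands the event to that worker.
func (w *PodWorkers) UpdatePod(ev UpdateEvent) {
	w.mu.Lock()
	ch, ok := w.updates[ev.PodUID]
	if !ok {
		ch = make(chan UpdateEvent, 1)
		w.updates[ev.PodUID] = ch
		w.wg.Add(1)
		go func() {
			defer w.wg.Done()
			for e := range ch { // events of one Pod are handled strictly in order
				w.syncPod(e)
			}
		}()
	}
	w.mu.Unlock()
	ch <- ev
}

// Stop closes every channel and waits until the workers have drained them.
func (w *PodWorkers) Stop() {
	w.mu.Lock()
	for _, ch := range w.updates {
		close(ch)
	}
	w.mu.Unlock()
	w.wg.Wait()
}

func main() {
	handled := make(chan string, 4)
	w := NewPodWorkers(func(e UpdateEvent) { handled <- e.PodUID + ":" + e.Type })
	w.UpdatePod(UpdateEvent{PodUID: "uid-1", Type: "create"})
	w.UpdatePod(UpdateEvent{PodUID: "uid-1", Type: "update"})
	w.Stop()
	close(handled)
	for h := range handled {
		fmt.Println(h)
	}
}
```

Because each Pod has its own goroutine, a slow sync for one Pod does not delay the others, while events for the same Pod keep their order.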
2.3 Synchronize the latest status
Once the latest status has been read from the local podCache and the Pod information has been obtained from the event source, kubelet combines them with the container states tracked for the Pod in the statusManager and probeManager to produce the latest Pod status it currently perceives.
2.4 Admission control check
The earlier admission check enforces hard limits on the resources the Pod needs in order to run. The admission check here is a soft one: it verifies parts of the software operating environment, such as the container runtime and its version. If this check fails, the status of the corresponding containers is set to Blocked.
2.5 Update container status
After the admission check passes, the statusManager is called to synchronize the latest Pod status, which may in turn be synchronized to the apiserver.
2.6 Cgroup configuration
After the update is completed, a PodContainerManager is started. Its main job is to update the Cgroup configuration of the Pod according to its QoS level.
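To make the link between QoS level and Cgroup placement concrete, here is a minimal Go sketch. It follows the documented QoS rules in simplified form (Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when nothing is set, Burstable otherwise). The types, the function names, and the `/kubepods/<qos>/pod<UID>` layout are assumptions made for illustration; the actual path depends on the Cgroup driver and kubelet configuration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Resources uses 0 to mean "not set". All types here are hypothetical simplifications.
type Resources struct{ CPUMilli, MemoryBytes int64 }

type Container struct {
	Requests Resources
	Limits   Resources
}

type QOSClass string

const (
	Guaranteed QOSClass = "Guaranteed"
	Burstable  QOSClass = "Burstable"
	BestEffort QOSClass = "BestEffort"
)

// qosClass classifies a Pod from the requests and limits of its containers.
func qosClass(containers []Container) QOSClass {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		if c.Requests != (Resources{}) || c.Limits != (Resources{}) {
			anySet = true
		}
		// Guaranteed needs both limits set and requests equal to limits.
		if c.Limits.CPUMilli == 0 || c.Limits.MemoryBytes == 0 || c.Requests != c.Limits {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return BestEffort
	case guaranteed:
		return Guaranteed
	default:
		return Burstable
	}
}

// podCgroupPath returns an assumed Cgroup path for the Pod. Guaranteed Pods sit
// directly under the root, the other classes under a per-class parent.
func podCgroupPath(qos QOSClass, podUID string) string {
	if qos == Guaranteed {
		return path.Join("/kubepods", "pod"+podUID)
	}
	return path.Join("/kubepods", strings.ToLower(string(qos)), "pod"+podUID)
}

func main() {
	pod := []Container{{Requests: Resources{CPUMilli: 100}}}
	qos := qosClass(pod)
	fmt.Println(qos, podCgroupPath(qos, "1234"))
}
```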
2.7 Pod basic operating environment preparation
Next, kubelet prepares the basic environment for creating the Pod: it creates the Pod data directories, obtains the image pull secrets, waits for volume mounting to complete, and so on. Creating the Pod data directories mainly means creating the Pod, plugin, and volume directories. The credential information is generated from the image pull secrets configured on the Pod. At this point, kubelet's own share of the container creation work is basically complete.
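The directory preparation step can be sketched as follows. The layout used here (`<root>/pods/<UID>` with `volumes` and `plugins` subdirectories) and the function names are assumptions chosen for illustration; the example writes into a temporary directory rather than kubelet's real root directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// podDirs lists the per-Pod directories under rootDir (assumed layout).
func podDirs(rootDir, podUID string) []string {
	podDir := filepath.Join(rootDir, "pods", podUID)
	return []string{
		podDir,
		filepath.Join(podDir, "volumes"),
		filepath.Join(podDir, "plugins"),
	}
}

// makePodDataDirs creates every directory; MkdirAll makes the call idempotent.
func makePodDataDirs(rootDir, podUID string) error {
	for _, dir := range podDirs(rootDir, podUID) {
		if err := os.MkdirAll(dir, 0o750); err != nil {
			return fmt.Errorf("create %s: %w", dir, err)
		}
	}
	return nil
}

func main() {
	root, err := os.MkdirTemp("", "kubelet-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	if err := makePodDataDirs(root, "uid-1234"); err != nil {
		panic(err)
	}
	for _, d := range podDirs(root, "uid-1234") {
		fmt.Println(d)
	}
}
```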
3. ContainerRuntime
3.1 Calculate Pod container changes
Computing the container changes mainly covers: whether the Pod's sandbox has changed, the ephemeral (short life cycle) containers, whether the init containers have completed, and whether the business containers have completed. The result is a set of container lists: the containers that need to be killed and the containers that need to be started. Note that if the init containers have not finished, the business containers are not added to the list of containers to start. In other words, startup happens in two stages.
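The two-stage decision can be sketched in Go as a pure function from the observed Pod state to a set of actions. Everything here is a hypothetical simplification: the `State`, `Pod`, and `Actions` types are invented for the example, ephemeral containers are ignored, and regular containers are assumed to be restarted whenever they are not running.

```go
package main

import "fmt"

type State int

const (
	NotCreated State = iota
	Running
	Succeeded
	Failed
	Unknown
)

type Container struct {
	Name  string
	State State
}

type Pod struct {
	SandboxReady   bool
	InitContainers []Container
	Containers     []Container
}

// Actions is the result of the computation: what to kill and what to start.
type Actions struct {
	CreateSandbox bool
	Kill          []string
	Start         []string
}

func computeActions(p Pod) Actions {
	var a Actions
	if !p.SandboxReady {
		// A new sandbox means everything still alive must go, then startup begins again.
		a.CreateSandbox = true
		all := append(append([]Container{}, p.InitContainers...), p.Containers...)
		for _, c := range all {
			if c.State == Running || c.State == Unknown {
				a.Kill = append(a.Kill, c.Name)
			}
		}
		if len(p.InitContainers) > 0 {
			a.Start = append(a.Start, p.InitContainers[0].Name)
			return a
		}
		for _, c := range p.Containers {
			a.Start = append(a.Start, c.Name)
		}
		return a
	}
	// Stage 1: init containers run one at a time; business containers must wait.
	for _, c := range p.InitContainers {
		if c.State == Succeeded {
			continue
		}
		if c.State == Unknown {
			a.Kill = append(a.Kill, c.Name)
		}
		if c.State != Running {
			a.Start = append(a.Start, c.Name)
		}
		return a
	}
	// Stage 2: all init containers are done, so business containers may start.
	for _, c := range p.Containers {
		switch c.State {
		case Unknown:
			a.Kill = append(a.Kill, c.Name)
			a.Start = append(a.Start, c.Name)
		case NotCreated, Failed, Succeeded:
			a.Start = append(a.Start, c.Name)
		}
	}
	return a
}

func main() {
	pod := Pod{
		SandboxReady:   true,
		InitContainers: []Container{{Name: "init-1", State: Succeeded}, {Name: "init-2"}},
		Containers:     []Container{{Name: "app"}},
	}
	fmt.Printf("%+v\n", computeActions(pod))
}
```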
3.2 Terminate containers after an init failure
If an earlier init container failure is detected, kubelet checks all containers of the current Pod and the containers associated with the sandbox. Any that are still running are killed, and kubelet waits for the kill operations to complete.
3.3 Cleanup of containers in Unknown state
When some of a Pod's containers are already running but their status is still Unknown, they are handled uniformly here: all of them are killed to clean up before the next restart. Only one of this branch and the one in 3.2 is taken, but the core goal is the same: clean up containers that have failed or whose status cannot be obtained.
3.4 Create a container sandbox
Before the Pod's containers are started, a sandbox container is created for the Pod. All containers of the Pod share the sandbox's namespaces, and thereby the resources within them. Creating the sandbox is fairly complicated and is covered in section 4.
3.5 Start Pod related containers
Pod containers currently fall into three categories: ephemeral (short life cycle) containers, init containers, and business containers, and they are started in that order. If creating a container fails, the next attempt is delayed by a backoff mechanism. The following subsections describe how containerRuntime starts a container.
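The backoff idea mentioned above can be sketched as a per-container delay that doubles after each failure up to a cap. This is an illustrative sketch, not kubelet's implementation: the `Backoff` type and the `initialDelay` and `maxDelay` values are assumptions picked for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values only; treat them as assumptions, not as kubelet defaults.
const (
	initialDelay = 10 * time.Second
	maxDelay     = 5 * time.Minute
)

// Backoff tracks an exponentially growing delay per key (for example per container).
type Backoff struct {
	initial, max time.Duration
	delays       map[string]time.Duration
}

func NewBackoff(initial, max time.Duration) *Backoff {
	return &Backoff{initial: initial, max: max, delays: make(map[string]time.Duration)}
}

// Next records one more failure for key and returns the delay before the next attempt.
func (b *Backoff) Next(key string) time.Duration {
	d, seen := b.delays[key]
	switch {
	case !seen:
		d = b.initial
	case d*2 > b.max:
		d = b.max
	default:
		d *= 2
	}
	b.delays[key] = d
	return d
}

// Reset forgets the failures of key, for example after a successful start.
func (b *Backoff) Reset(key string) {
	delete(b.delays, key)
}

func main() {
	b := NewBackoff(initialDelay, maxDelay)
	for i := 0; i < 7; i++ {
		fmt.Println(b.Next("pod-uid/app"))
	}
}
```

Keying the delay by container means one crash-looping container does not slow down the start of its siblings.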
3.5.1 Check if the container image is pulled
Pulling an image starts by assembling the full image name for the container. The image information, together with the pull credentials obtained earlier, is then handed to the CRI runtime, which pulls the underlying container image. Several backoff mechanisms apply here as well, so that frequent pull failures do not hurt kubelet performance.
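Two small decisions sit in front of the actual pull: completing the image name and deciding whether a pull is needed at all. The Go sketch below shows one plausible shape for both. The function names are hypothetical, and the tag handling is deliberately simplified (it only appends `:latest` when neither a tag nor a digest is present); the pull policy values are the standard `Always`, `IfNotPresent`, and `Never`.

```go
package main

import (
	"fmt"
	"strings"
)

type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
	PullNever        PullPolicy = "Never"
)

// applyDefaultTag appends ":latest" when the image has neither a tag nor a digest.
// A colon before the last slash belongs to a registry port, not to a tag.
func applyDefaultTag(image string) string {
	if strings.Contains(image, "@") {
		return image // pinned by digest
	}
	lastSlash := strings.LastIndex(image, "/")
	if strings.Contains(image[lastSlash+1:], ":") {
		return image // already tagged
	}
	return image + ":latest"
}

// shouldPull decides whether the runtime has to be asked to pull the image.
func shouldPull(policy PullPolicy, presentOnNode bool) bool {
	switch policy {
	case PullNever:
		return false
	case PullAlways:
		return true
	default: // IfNotPresent
		return !presentOnNode
	}
}

func main() {
	image := applyDefaultTag("registry:5000/team/nginx")
	fmt.Println(image, shouldPull(PullIfNotPresent, false))
}
```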
3.5.2 Create Container Configuration
Creating the container configuration means building the configuration data the container needs to run, mainly: the Pod host name and domain name, mounted volumes, configMaps, secrets, environment variables, device information, directories to mount, port mappings, the command to execute (generated according to the environment), the log directory, and so on.
3.5.3 Call runtimeService to complete the creation of the container
Call runtimeService with the container configuration. It calls the CRI, which in turn calls the container creation interface to complete the creation of the container.
3.5.4 Calling runtimeService to start the container
Using the container ID returned when the container was created, start the container and create the corresponding log directory for it.
3.5.5 Execute the callback hook of the container
If the container is configured with a PostStart hook, the hook is executed here. If the hook is of the Exec type, the Exec interface of the CRI is called to run it inside the container.
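Steps 3.5.3 to 3.5.5 form a short sequence: create, start, then run the PostStart hook. The Go sketch below walks through it against a fake runtime. The `RuntimeService` interface here is a hypothetical, trimmed-down stand-in loosely modeled on CRI method names, not the real CRI API, and stopping the container when the hook fails is this sketch's assumed error handling.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerConfig is a hypothetical, minimal container configuration.
type ContainerConfig struct {
	Name          string
	Image         string
	PostStartExec []string // command of an Exec-type PostStart hook, if any
}

// RuntimeService is a trimmed-down, hypothetical view of a CRI-style runtime.
type RuntimeService interface {
	CreateContainer(sandboxID string, cfg ContainerConfig) (string, error)
	StartContainer(containerID string) error
	ExecSync(containerID string, cmd []string) error
	StopContainer(containerID string) error
}

// startContainer creates and starts a container, then runs its PostStart hook.
func startContainer(rt RuntimeService, sandboxID string, cfg ContainerConfig) (string, error) {
	id, err := rt.CreateContainer(sandboxID, cfg)
	if err != nil {
		return "", fmt.Errorf("create %s: %w", cfg.Name, err)
	}
	if err := rt.StartContainer(id); err != nil {
		return id, fmt.Errorf("start %s: %w", cfg.Name, err)
	}
	if len(cfg.PostStartExec) > 0 {
		if err := rt.ExecSync(id, cfg.PostStartExec); err != nil {
			_ = rt.StopContainer(id) // assumed handling: a failed hook stops the container
			return id, fmt.Errorf("post-start hook of %s: %w", cfg.Name, err)
		}
	}
	return id, nil
}

// fakeRuntime records the calls it receives so the order can be inspected.
type fakeRuntime struct {
	calls    []string
	failExec bool
}

func (f *fakeRuntime) CreateContainer(sandboxID string, cfg ContainerConfig) (string, error) {
	f.calls = append(f.calls, "create")
	return sandboxID + "/" + cfg.Name, nil
}

func (f *fakeRuntime) StartContainer(id string) error {
	f.calls = append(f.calls, "start")
	return nil
}

func (f *fakeRuntime) ExecSync(id string, cmd []string) error {
	f.calls = append(f.calls, "exec")
	if f.failExec {
		return errors.New("hook exited with a non-zero status")
	}
	return nil
}

func (f *fakeRuntime) StopContainer(id string) error {
	f.calls = append(f.calls, "stop")
	return nil
}

func main() {
	rt := &fakeRuntime{}
	cfg := ContainerConfig{Name: "app", Image: "nginx:latest", PostStartExec: []string{"/bin/true"}}
	id, err := startContainer(rt, "sandbox-1", cfg)
	fmt.Println(id, err, rt.calls)
}
```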
4. Run the sandbox container
4.1 Pulling the sandbox image
First, the sandbox image is pulled.
4.2 Create a sandbox container
4.2.1 Apply SecurityContext
Before the container is created, its security settings are configured according to the information in the SecurityContext, mainly the privilege level, read-only directories, and the user and group to run as.
4.2.2 Other basic information
In addition to applying the SecurityContext, information such as port mappings, OOMScoreAdj, and the Cgroup driver is also set.
4.2.3 Create Container
Create the container from the configuration information above.
4.3 Create checkpoint
A checkpoint serializes the configuration information of the current sandbox and stores it as a snapshot of its current state.
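As a rough picture of what "serialize and store a snapshot" can look like, the Go sketch below encodes a sandbox record as JSON and protects it with a checksum, so that a damaged file is detected when it is read back. The `Checkpoint` fields, the JSON encoding, and the FNV checksum are assumptions made for this example, not the format used by any particular runtime.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"hash/fnv"
)

// Checkpoint is a hypothetical snapshot of a sandbox's configuration.
type Checkpoint struct {
	Version     string `json:"version"`
	Name        string `json:"name"`
	Namespace   string `json:"namespace"`
	HostNetwork bool   `json:"hostNetwork"`
	Checksum    uint32 `json:"checksum"`
}

// checksum hashes the checkpoint with its Checksum field zeroed.
func checksum(cp Checkpoint) (uint32, error) {
	cp.Checksum = 0
	raw, err := json.Marshal(cp)
	if err != nil {
		return 0, err
	}
	h := fnv.New32a()
	h.Write(raw)
	return h.Sum32(), nil
}

// save fills in the checksum and returns the serialized checkpoint.
func save(cp Checkpoint) ([]byte, error) {
	sum, err := checksum(cp)
	if err != nil {
		return nil, err
	}
	cp.Checksum = sum
	return json.Marshal(cp)
}

// load parses a serialized checkpoint and rejects it if the checksum does not match.
func load(data []byte) (Checkpoint, error) {
	var cp Checkpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return Checkpoint{}, fmt.Errorf("decode checkpoint: %w", err)
	}
	sum, err := checksum(cp)
	if err != nil {
		return Checkpoint{}, err
	}
	if sum != cp.Checksum {
		return Checkpoint{}, errors.New("checkpoint is corrupted: checksum mismatch")
	}
	return cp, nil
}

func main() {
	data, err := save(Checkpoint{Version: "v1", Name: "zeta", Namespace: "default"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
	cp, err := load(data)
	fmt.Println(cp.Name, err)
}
```

In practice the bytes returned by `save` would be written to a file named after the sandbox ID, so the state can be recovered after a restart.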
4.4 Start the sandbox container
To start the sandbox container, StartContainer is called directly with the ID returned when the container was created. The container's DNS configuration file is also rewritten at this point.
4.5 Container network settings
Container networking is configured mainly by calling the CNI plugin.
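Putting section 4 together, the sandbox comes up through an ordered list of steps, and a failure in any step stops the sequence. The Go sketch below models only that ordering. Every step body is a placeholder, and the names `step`, `runSandbox`, `sandboxSteps`, and `errCNIDown` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one named stage of bringing up a sandbox.
type step struct {
	name string
	run  func() error
}

// errCNIDown is a sample error used to simulate a network setup failure.
var errCNIDown = errors.New("cni plugin not ready")

// runSandbox executes the steps in order and stops at the first failure.
// It returns the names of the steps that completed.
func runSandbox(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("sandbox step %q failed: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// sandboxSteps lists the stages in the order described in this section.
// All bodies are placeholders; cniErr lets the caller simulate a CNI failure.
func sandboxSteps(cniErr error) []step {
	ok := func() error { return nil }
	return []step{
		{"pull sandbox image", ok},
		{"create sandbox container", ok},
		{"create checkpoint", ok},
		{"start sandbox container", ok},
		{"rewrite DNS configuration", ok},
		{"set up network via CNI", func() error { return cniErr }},
	}
}

func main() {
	done, err := runSandbox(sandboxSteps(errCNIDown))
	fmt.Println(done)
	fmt.Println(err)
}
```

A real implementation would also undo the completed steps on failure, for example by stopping the sandbox container again; that cleanup is omitted here.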
5. Pod container startup summary
This article is a basic version, and more details will be layered on top of it in the future. If you found it interesting, please help share it. Thank you, everyone.
k8s source code reading e-book address: https://www.yuque.com/baxiaoshi/tyado3