[Repost] The Kubernetes Pod creation process, illustrated

Illustrated: the Kubernetes Pod creation process revealed

https://www.kubernetes.org.cn/6766.html

 

Creating a container in Kubernetes is undoubtedly a complicated process: it involves close collaboration among several internal components as well as integration with the external CRI runtime. This article explores the details of the container creation process and how the components cooperate, so that when problems come up later, it is easier to know where to start looking.

1. Foundation construction

1.1 Container management thread model

The thread model in kubelet is a master/worker model. A single master monitors the various event sources and creates a goroutine (worker) for each Pod to handle that Pod's business logic. The master and the workers communicate through a state pipeline (channel).

1.2 Event-driven eventual consistency of state

After a Pod is created from YAML, Kubernetes keeps adjusting it based on incoming events and the current Pod state until the state eventually converges on the target state.

1.3 Component collaboration process

The kubelet struct declaration runs to more than 300 lines of code, which shows how complex it is. If we follow only the core flow of container creation, however, it can be summarized in three parts: kubelet, containerRuntime, and the CRI container runtime.

2. Kubelet container creation process

2.1 Obtaining the Pod and running the admission check

kubelet has two main event sources: static Pods and the apiserver. Only ordinary Pods are considered here. Such a Pod is added directly to the PodManager for management and then goes through an admission check.

The admission check mainly involves two key controllers: eviction management and the pre-selection (predicate) check. Eviction management checks whether the Pod can tolerate the node's current resource pressure. The pre-selection check uses the currently active containers and the current node information to verify that the node meets the Pod's basic operating requirements, for example the affinity check. If the Pod has a particularly high priority or is a static Pod, kubelet tries to preempt resources for it, preempting in order of QoS level until its operating environment is satisfied.
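
The two-controller admission check can be pictured as a chain of handlers in which the first rejection wins. The sketch below is a simplified assumption, not kubelet's real interface: `pod`, `admitResult`, `admitHandler`, `evictionHandler`, `predicateHandler`, and `canAdmit` are hypothetical names, and memory is the only resource considered.

```go
package main

import "fmt"

// pod carries only the fields this sketch needs (hypothetical type).
type pod struct {
	Name          string
	MemoryRequest int64 // bytes
	Critical      bool  // stands in for "high priority or static Pod"
}

type admitResult struct {
	Admit  bool
	Reason string
}

// admitHandler is one step of the admission chain.
type admitHandler interface {
	Admit(p pod) admitResult
}

// evictionHandler rejects non-critical Pods while the node is under pressure.
type evictionHandler struct{ memoryPressure bool }

func (h evictionHandler) Admit(p pod) admitResult {
	if h.memoryPressure && !p.Critical {
		return admitResult{false, "node is under memory pressure"}
	}
	return admitResult{Admit: true}
}

// predicateHandler rejects Pods that do not fit into the free memory.
type predicateHandler struct{ freeMemory int64 }

func (h predicateHandler) Admit(p pod) admitResult {
	if p.MemoryRequest > h.freeMemory {
		return admitResult{false, "insufficient memory"}
	}
	return admitResult{Admit: true}
}

// canAdmit runs the handlers in order and stops at the first rejection.
func canAdmit(handlers []admitHandler, p pod) admitResult {
	for _, h := range handlers {
		if r := h.Admit(p); !r.Admit {
			return r
		}
	}
	return admitResult{Admit: true}
}

func main() {
	handlers := []admitHandler{
		evictionHandler{memoryPressure: false},
		predicateHandler{freeMemory: 512},
	}
	fmt.Println(canAdmit(handlers, pod{Name: "small", MemoryRequest: 128}))
	fmt.Println(canAdmit(handlers, pod{Name: "large", MemoryRequest: 1024}))
}
```

Preemption is left out of this sketch. In the flow described above it would run when the predicate step rejects a critical Pod.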

2.2 Create event pipeline and container management main thread

When kubelet receives a newly created Pod, it first creates an event pipeline for it and starts a container management main thread to consume the events in that pipeline. The thread waits, based on the last synchronization time, for the latest event known to the kubelet (obtained from the local podCache). For a newly created Pod, this mainly happens through the time update performed in PLEG, which broadcasts the default empty state as the latest state.

2.3 Synchronize the latest status

Once the latest status information has been obtained from the local podCache and the Pod information from the event source, the two are combined with the current status of the Pod's containers in the statusManager and probeManager to produce the latest Pod status currently perceived.

2.4 Admission control check

The earlier admission check covers the hard limits on the resources a Pod needs to run. The admission check here is a soft check of the software operating environment, such as the container runtime and its version. If this check fails, the corresponding container status is set to Blocked.

2.5 Update container status

After the admission check passes, the statusManager is called to synchronize the latest Pod status, which may also be synchronized to the apiserver.

2.6 Cgroup configuration

After the update is completed, a PodContainerManager is started. Its main function is to update the Cgroup configuration of the corresponding Pod according to its QoS level.
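
To make the link between QoS level and Cgroup placement concrete, here is a minimal sketch of how a QoS class could be derived from container requests and limits, and how it might map to a Cgroup path. The types and function names (`resources`, `qosClass`, `podCgroupPath`) are hypothetical. The sketch assumes that requests have already been defaulted from limits, that zero means "unset", and that a cgroupfs-style `/kubepods` layout is in use.

```go
package main

import "fmt"

// resources holds one container's requests and limits; 0 means unset.
type resources struct {
	CPURequest, CPULimit       int64 // millicores
	MemoryRequest, MemoryLimit int64 // bytes
}

// qosClass classifies a Pod from its containers:
//   - BestEffort: no container sets any request or limit
//   - Guaranteed: every container sets CPU and memory limits equal to requests
//   - Burstable:  everything else
func qosClass(containers []resources) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		if c.CPURequest != 0 || c.CPULimit != 0 || c.MemoryRequest != 0 || c.MemoryLimit != 0 {
			anySet = true
		}
		if c.CPULimit == 0 || c.MemoryLimit == 0 ||
			c.CPURequest != c.CPULimit || c.MemoryRequest != c.MemoryLimit {
			guaranteed = false
		}
	}
	if !anySet {
		return "BestEffort"
	}
	if guaranteed {
		return "Guaranteed"
	}
	return "Burstable"
}

// podCgroupPath maps a QoS class to an assumed cgroupfs-style parent path.
func podCgroupPath(qos, podUID string) string {
	switch qos {
	case "Guaranteed":
		return "/kubepods/pod" + podUID
	case "Burstable":
		return "/kubepods/burstable/pod" + podUID
	default:
		return "/kubepods/besteffort/pod" + podUID
	}
}

func main() {
	containers := []resources{{CPURequest: 500, CPULimit: 1000, MemoryRequest: 1 << 20, MemoryLimit: 1 << 20}}
	qos := qosClass(containers)
	fmt.Println(qos, podCgroupPath(qos, "1234"))
}
```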

2.7 Pod basic operating environment preparation

Next, kubelet prepares the basic environment for creating the Pod: it creates the Pod data directories, obtains the image pull secrets, and waits for volume mounting to complete. The data directories created are mainly the Pod, plugin, and volume directories. The secret information is generated from the image pull secrets configured on the Pod. At this point, kubelet's own share of the work of creating the container is basically complete.

3. ContainerRuntime

Earlier we mentioned that operations on a Pod are ultimately carried out by synchronizing events and state. ContainerRuntime does not distinguish whether an event is a create or an update operation. It only compares the current Pod information with the target state and constructs the operations needed to reach that target state.

3.1 Calculate Pod container changes

Computing the container changes mainly covers: whether the Pod's sandbox has changed, the short life cycle containers, whether the initialization containers have completed, and whether the business containers have completed. From this we get several container lists: the containers that need to be killed and the containers that need to be started. Note that if the initialization containers have not completed, the business containers are not added to the list of containers to start. In other words, startup happens in two stages.

3.2 Terminating after an initialization failure

If an earlier initialization container failure is detected, all containers of the current Pod and the containers associated with the sandbox are checked. Any that are still running are killed, and the process waits for the kill operations to complete.

3.3 Unknown state container compensation

When some of a Pod's containers are already running but their status is still Unknown, they are handled uniformly here: all of them are killed so that the next restart begins from a clean state. Only one of this branch and 3.2 is taken, but the core goal is the same: clean up containers that have failed or whose status cannot be obtained.

3.4 Create a container sandbox

Before the Pod's containers are started, a sandbox container is created for it. All containers of the Pod share the namespaces of that sandbox, and so share the resources within them. Creating the sandbox is more involved and is covered in section 4.

3.5 Start Pod related containers

Pod containers are currently divided into three categories: short life cycle containers, initialization containers, and business containers. They are started in that order. If creating a container fails, its creation is delayed through the backoff mechanism. The following subsections walk through how containerRuntime starts a container.

3.5.1 Check whether the container image has been pulled

To pull an image, kubelet first assembles the full container image reference, then hands the previously obtained pull secret information and the image information to the CRI runtime, which pulls the underlying container image. Various backoff mechanisms apply here as well, so that frequent pull failures do not affect kubelet performance.

3.5.2 Create Container Configuration

Creating the container configuration mainly means building the configuration data the container needs to run, including: the Pod host name and domain name, mounted volumes, configMaps, secrets, environment variables, mounted device information, directories to be mounted, port mapping information, the command to execute (generated from the environment), the log directory, and other information.

3.5.3 Call runtimeService to complete the creation of the container

runtimeService is called with the container configuration. It calls CRI, which in turn calls the container creation interface to complete the creation of the container.

3.5.4 Calling runtimeService to start the container

Use the container ID returned by the previously created container to start the corresponding container and create the corresponding log directory for the container

3.5.5 Execute the callback hook of the container

If the container is configured with a PostStart hook, the hook is executed here. If the hook type is Exec, the Exec interface of CRI is called to run the command inside the container.

4. Run the sandbox container

4.1 Pulling the sandbox image

First it will pull the sandbox image

4.2 Create a sandbox container

4.2.1 Apply SecurityContext

Before the container is created, its security settings are configured from the information in the SecurityContext, mainly the privilege level, read-only directories, and the user and group to run as.

4.2.2 Other basic information

Besides applying the SecurityContext, other basic settings are also mapped into the configuration, such as port mappings, OOMScoreAdj, and the Cgroup driver.

4.2.3 Create Container

The container is created from the configuration information above.

4.3 Create checkpoint

A checkpoint mainly serializes the configuration information of the current sandbox and stores it as a snapshot of its current state.

4.4 Start the sandbox container

To start the sandbox container, StartContainer is called directly with the ID returned when the container was created. The container's DNS configuration file is also overwritten at this point.

4.5 Container network settings

Container network configuration is mainly done by calling the CNI plug-in.

5. Pod container startup summary

Kubelet is the core housekeeper of container management. It is responsible for admission control, status management, probe management, volume management, QoS management, and the unified coordination of CSI integration. It also prepares the basic data for the Runtime and reports the Pod's latest state back.

The Runtime layer takes the data assembled by kubelet and reorganizes it according to the target configuration of the CRI runtime and the resource configuration managed by kubelet. Based on the state of the Pod's containers, it decides which containers to start, stop, or create, builds the container's basic configuration environment, and finally calls CRI to create the container. The CRI runtime then further combines the data passed to it, applies it to the host and to the resource limits of the corresponding namespaces, organizes its own container service data, and calls the container service to complete the final creation of the container.

This article is a basic version, and more details will be layered on top of it in the future. If you found it interesting, feel free to share it. Thank you, everyone.

k8s source code reading e-book address:  https://www.yuque.com/baxiaoshi/tyado3

Origin www.cnblogs.com/jinanxiaolaohu/p/12503273.html