Flink on K8s enterprise production practice

Background
To address systemic problems in the company's model and feature iteration and to improve the efficiency of algorithm development, the department launched the feature platform project. The feature platform aims to solve problems such as scattered data storage, inconsistent metric definitions, complex feature extraction, and long data pipelines, building a scientific bridge between big data and algorithms and providing solid sample and feature data support. The platform performs fast data ETL from Hive, HBase, relational databases, and the big data warehouse's ODS (Operational Data Store), DWD, and DWS layers, extracts the data into the feature platform for management, and provides a unified data export through which data scientists, data engineers, and machine learning engineers can run data applications for algorithm models such as testing, training, and inference.

This article shares the feature platform's practice of deploying Flink on K8s. It first gives a brief introduction to the basic concepts of K8s and to Flink's task execution graphs, then compares the existing Flink on K8s deployment modes, and finally walks through the actual deployment.

Why deploy Flink on K8s?

The main advantages are:

The container environment is easy to deploy, clean up, and rebuild: unlike a virtual machine, the application is distributed and deployed as an image and has little dependence on the underlying system environment; all required packages can be built into the image and reused.
Better isolation and security: applications run in Pods, Pods are independent of each other, and resources are safer once isolated.
Efficient resource utilization: a K8s cluster can co-locate many kinds of workloads, such as machine learning and online services.
Cloud native is the trend: K8s has a rich ecosystem, and big data computing is moving toward cloud native.
2. Introduction
2.1 Introduction to K8s
Kubernetes provides a framework for running distributed systems resiliently, taking care of scaling, failover, deployment patterns, and more. In essence, the Kubernetes project provides users with a universal container orchestration tool.

K8s is known as the operating system of the cloud era (a container image is analogous to a software installation package).
It aims to provide "a platform for automating deployment, scaling, and operations of application containers across clusters of hosts":
scheduling, resource management, service discovery, health checks, autoscaling, rolling upgrades, and more.

Basic components

Pod: the atomic scheduling unit of K8s, a combination of one or more containers that share the same network and storage.

Deployment: a higher-level abstraction over a group of identical Pods that automatically restarts and recreates them to ensure high availability.

Service: defines the access entry point of a service and binds a backend set of Pod replicas through a label selector. If a service inside K8s needs to be reached from outside the cluster, it can be exposed through a Service of type LoadBalancer or NodePort. If it does not need external exposure, the Service can be kept as ClusterIP, or made headless (ClusterIP: None).

ConfigMap: key-value data. The typical usage is to mount a ConfigMap into a Pod, where it serves as a configuration file for the processes in the Pod.

StatefulSet: stateful application deployment.

Job and CronJob: one-off and scheduled (offline) workloads.
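As a quick illustration of these objects, they can be created and inspected with kubectl; the resource names below are hypothetical:

# Create a ConfigMap from a local configuration file
kubectl create configmap flink-config --from-file=flink-conf.yaml
# Expose a Deployment to the outside world through a NodePort Service
kubectl expose deployment flink-jobmanager --port=8081 --type=NodePort
# List the objects introduced above
kubectl get pods,deployments,services,configmaps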

2.2 Introduction to Flink
Apache Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. Flink runs in all common cluster environments and performs computation at in-memory speed and at any scale.

2.2.1 Flink architecture
Flink's architecture resembles that of most big data frameworks: it adopts the mainstream master-slave design, with one JobManager and multiple TaskManagers, and the JobManager can be deployed in HA mode.

Flink code goes through several graph conversions from submission to actual execution: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution graph.


In the first layer, the StreamGraph is built starting from the Source node: each transformation generates a StreamNode, and pairs of StreamNodes are connected by StreamEdges, forming a DAG of StreamNodes and StreamEdges.
In the second layer, the JobGraph is built, again starting from the Source node, by traversing the graph to find operators that can be chained together. Operators that can be chained are merged; those that cannot each generate a separate JobVertex. Upstream and downstream JobVertices are linked by JobEdges, finally forming a DAG at the JobVertex level.
After the JobVertex DAG is submitted as a task, it is sorted starting from the Source node: an ExecutionJobVertex is generated for each JobVertex, an IntermediateResult is built from each JobVertex's IntermediateDataSet, and the IntermediateResults establish the upstream and downstream dependencies, forming a DAG at the ExecutionJobVertex level, i.e. the ExecutionGraph.
Finally, the ExecutionGraph is mapped to the physical execution layer.
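The StreamGraph-level plan can be inspected directly from user code. A minimal PyFlink sketch (any trivial pipeline works; the data here is arbitrary):

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Each transformation below becomes a StreamNode in the StreamGraph
env.from_collection([1, 2, 3]).map(lambda x: x * 2).print()
# Returns the streaming plan (the DAG's nodes and edges) as JSON
print(env.get_execution_plan())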
3. Flink on K8s deployment modes
3.1 Flink deployment modes [1]
Session mode

Multiple jobs share the same JobManager: a single Flink cluster instance is created in advance and shared by all submitted jobs. Tasks are submitted by the client, which does some preparatory work and generates the JobGraph on the Flink client side. The drawback of this mode is that a JobManager failure caused by one job can bring down all jobs.
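For reference, a native K8s session cluster is started with kubernetes-session.sh and jobs are then submitted to it; a minimal sketch, with the cluster id as a placeholder and the example jar shipped with Flink:

# Start a session cluster on K8s
./bin/kubernetes-session.sh -Dkubernetes.cluster-id=my-session
# Submit a job to the running session cluster
./bin/flink run -t kubernetes-session -Dkubernetes.cluster-id=my-session \
  ./examples/streaming/TopSpeedWindowing.jar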

Per-Job mode

A dedicated JobManager is started for each submitted job; it executes only that job and then exits. The JobGraph is still generated on the Flink client, so Per-Job mode can be seen as the client-side counterpart of Application mode: the JobGraph is built on the client, but each job gets its own dedicated cluster.

This mode makes full use of resource management frameworks such as YARN and Mesos to achieve stronger resource isolation, so Flink applications do not affect each other: one job, one cluster instance.
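Note that Per-Job mode is not supported by Flink's native K8s integration; it is typically used with YARN. A minimal sketch using the example jar shipped with Flink:

# Submit a job in per-job mode on YARN; a dedicated cluster is started for it
./bin/flink run -t yarn-per-job ./examples/batch/WordCount.jar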

Application mode

The submitted Flink program is treated as an application internal to the cluster: the client no longer needs to do the heavy preparation work (executing the main function, generating the JobGraph, downloading dependencies and distributing them to the nodes, etc.); the main function is submitted to the JobManager and executed there.

One application, one cluster instance.

3.2 Shortcomings of Standalone deployment
In Standalone mode, the Flink cluster is deployed onto K8s with kubectl and YAML manifests as ordinary Deployments and Services, which has several drawbacks:

Users need a basic understanding of K8s to keep Flink on K8s running smoothly.
Flink is unaware that it is running on K8s.
Resources are allocated statically: how many TaskManagers are needed must be decided in advance, and if the job's parallelism is adjusted, the TaskManager resources must be adjusted accordingly, otherwise the job cannot run normally.
Resources cannot be requested and released on demand: maintaining a large Session cluster may waste resources, while a small one may make jobs run slowly or fail to run at all.
3.3 Advantages of Native deployment
Resource application: Flink's client has a built-in K8s client, which it uses to create the JobManager. After the job is submitted, whenever resources are needed the JobManager applies to Flink's own ResourceManager, which talks directly to the K8s API server, asking the K8s cluster for exactly as many TaskManagers as needed and of the required size. When the task finishes, it likewise tells the K8s cluster to release the unused resources. In other words, Flink is aware of the K8s cluster in a truly native way and knows when to apply for resources and when to release them.
"Native" is relative to Flink: with Flink's own commands alone, tasks can be submitted and run on K8s without introducing any external tools.
3.4 Final deployment choice
Comparing the Standalone and Native analyses above: Standalone must be deployed with kubectl plus YAML manifests, Flink cannot sense the K8s cluster, and resources are acquired passively; Native deployment needs only the Flink client (kubernetes-session.sh or flink run), with Flink actively applying to K8s for resources, which makes it the better deployment method. In addition, because the tasks here are mainly offline batch jobs and each application can contain multiple jobs, Application mode fits the business needs well.

4. Actual deployment
Only the K8s Native deployment mode is demonstrated here; Standalone deployment requires manually creating the ConfigMap, Service, JobManager Deployment, TaskManager Deployment, and so on, which is cumbersome.

4.1 K8s cluster
K8s >= 1.9 or Minikube

KubeConfig (with permission to view, create, and delete Pods and Services)

Kubernetes DNS enabled

A service account with RBAC permissions to create and delete Pods
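A typical setup looks like the following; the account name, namespace, and role are assumptions, so adjust them to your cluster:

# Create a service account and grant it permission to manage pods (names are placeholders)
kubectl create serviceaccount flink
kubectl create clusterrolebinding flink-role-binding \
  --clusterrole=edit --serviceaccount=default:flink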

4.2 PyFlink image
FROM flink:1.12.1-scala_2.11-java8

# Install python3, pip3, and required debug tools
RUN apt-get update -y && \
    apt-get install -y python3.7 python3-pip python3.7-dev \
    && rm -rf /var/lib/apt/lists/*
RUN rm -rf /usr/bin/python
RUN ln -s /usr/bin/python3 /usr/bin/python

# Install PyFlink
RUN pip3 install apache-flink==1.12.1

# If third-party Python dependencies are referenced, install them when building the image
#COPY /path/to/requirements.txt /opt/requirements.txt
#RUN pip3 install -r /opt/requirements.txt

# If third-party Java dependencies are referenced, add them to ${FLINK_HOME}/usrlib when building the image
RUN mkdir -p $FLINK_HOME/usrlib
COPY /path/of/external/jar/dependencies $FLINK_HOME/usrlib/

# Add the Python job code
COPY /path/of/python/codes /opt/python_codes

Use docker build to produce the PyFlink image required for deployment.

Flink image -> PyFlink image -> PyFlink App image
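With the Dockerfile above, the app image can be built and pushed with the standard Docker commands; the tag matches the one used in the submission command below, and the registry address is a placeholder:

# Build the PyFlink app image and push it to an image registry
docker build -t demo-pyflink-app:1.12.1 .
docker tag demo-pyflink-app:1.12.1 <your-registry>/demo-pyflink-app:1.12.1
docker push <your-registry>/demo-pyflink-app:1.12.1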

4.3 Flink Application native deployment
How to run: Flink Application mode on K8s native is very simple; the job can be run and submitted in Application mode with just one command.

./bin/flink run-application -p 2 -t kubernetes-application \
  -Dkubernetes.cluster-id=app-cluster \
  -Dtaskmanager.memory.process.size=4096m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dkubernetes.container.image=demo-pyflink-app:1.12.1 \
  -pyfs /opt/python_codes \
  -pym new_word_count
Startup flow:

First the Service, Master, and ConfigMap resources are created. The Flink Master Deployment already contains the user jar, so the cluster entrypoint extracts the user's main method from it and runs it to generate the JobGraph, which is then submitted to the Dispatcher. The Dispatcher spawns a JobMaster for the job, which applies to the ResourceManager for resources; from there the logic is the same as in Session mode.
The biggest difference from Session mode is that submission is a single step: since no two-phase submission is needed, and if the Web UI does not need to be reached from outside after the task starts, no external Service is needed at all and the task can run directly. Flink's Web UI can still be reached through a local port-forward or through a K8s ApiServer proxy, so the External Service is no longer needed, meaning no LoadBalancer or NodePort has to be occupied.
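For example, with the cluster id used above, Flink's native integration creates a rest Service named <cluster-id>-rest, which can be forwarded locally:

# Access the Flink Web UI at http://localhost:8081 without any external Service
kubectl port-forward service/app-cluster-rest 8081:8081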
4.4 Production process
The Flink application writing process is as follows:

This product mainly uses Flink SQL: the execution pattern is uniform (register sources and sinks, then execute SQL), so a single codebase can serve all jobs. The user either writes SQL in an editor box, or picks the databases, tables, and fields to read in the UI and the backend assembles them into a SQL statement. All tasks then run in a unified way, forming an offline computing platform in which Flink applications are submitted and executed with dynamically passed parameters.

The backend stores the source and sink types and connection information in a database and exposes them to the frontend.

On the frontend, the user selects a data source such as MySQL or Hive, then chooses the database table to read; the table schema is displayed and the user picks the fields to read. The user likewise selects the data sink to write to, such as Elasticsearch or MySQL. With these dynamic parameters collected, the backend creates a Job through the K8s Java client to submit the Flink application.
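A sketch of this submission step, using the K8s Python client as a stand-in for the Java client the platform actually uses (names, image, and namespace are assumptions):

from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

# The Flink client command the Job's pod will run; parameters are filled in dynamically
flink_cmd = [
    "/opt/flink/bin/flink", "run-application", "-t", "kubernetes-application",
    "-Dkubernetes.cluster-id=app-cluster",
    "-Dkubernetes.container.image=demo-pyflink-app:1.12.1",
    "-pyfs", "/opt/python_codes", "-pym", "new_word_count",
]

# A one-shot K8s Job whose pod runs the Flink client to submit the application
job = client.V1Job(
    api_version="batch/v1", kind="Job",
    metadata=client.V1ObjectMeta(name="submit-flink-app"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="flink-client",
                    image="demo-pyflink-app:1.12.1",
                    command=flink_cmd)]))))

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)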

When the Flink application starts, it receives the database, table, and field information and passes it to the Flink program, which assembles it into Flink SQL and executes it. The execution details are not covered here.
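The pattern "register source, register sink, execute assembled SQL" can be sketched in PyFlink as follows; the built-in datagen and print connectors stand in for the real MySQL/Hive/Elasticsearch connectors, and the table and field names are hypothetical:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().build())

# Dynamic parameters as the frontend might pass them
source, sink = "src_orders", "sink_orders"
fields = ["order_id", "amount"]

# Register the source table ('datagen' stands in for the real connector)
t_env.execute_sql(f"""
    CREATE TABLE {source} (order_id BIGINT, amount DOUBLE)
    WITH ('connector' = 'datagen', 'number-of-rows' = '10')
""")
# Register the sink table ('print' stands in for the real connector)
t_env.execute_sql(f"""
    CREATE TABLE {sink} (order_id BIGINT, amount DOUBLE)
    WITH ('connector' = 'print')
""")
# Assemble the SQL statement from the selected fields and execute it
t_env.execute_sql(
    f"INSERT INTO {sink} SELECT {', '.join(fields)} FROM {source}").wait()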

5. Summary
This article shared practical experience deploying Flink on K8s: it briefly introduced the basic concepts of K8s and Flink's execution graphs, compared the different Flink deployment modes, and used a concrete demo to walk through how the components cooperate when deploying PyFlink on K8s, helping readers understand the underlying execution process while getting started.
