Reading the Flink Source Code: Code Architecture

flink-core

Basic internal data definitions, shared by the cluster side and the user side.

Functions

The base interface for all user-defined functions.
This interface is empty in order to allow extending interfaces to be SAM (single abstract method) interfaces that can be implemented via Java 8 lambdas.

org.apache.flink.api.common.functions.MapFunction
org.apache.flink.api.common.functions.JoinFunction
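
To illustrate why an empty base interface keeps its sub-interfaces usable as lambdas, here is a minimal sketch. The interfaces below are hypothetical stand-ins written for this example, not the real Flink classes (the real ones also declare `throws Exception`, which is omitted here for brevity).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Flink's empty base Function interface (hypothetical name).
// It declares no methods, so each sub-interface has exactly one abstract method.
interface UserFunction {}

// Stand-in for MapFunction: one input, one output.
interface MapFunction<T, O> extends UserFunction {
    O map(T value);
}

// Stand-in for JoinFunction: two inputs, one output.
interface JoinFunction<IN1, IN2, OUT> extends UserFunction {
    OUT join(IN1 first, IN2 second);
}

class FunctionSketch {
    // Applies the mapper to every element, the way a map operator would.
    static <T, O> List<O> mapAll(List<T> input, MapFunction<T, O> mapper) {
        List<O> out = new ArrayList<>();
        for (T value : input) {
            out.add(mapper.map(value));
        }
        return out;
    }

    public static void main(String[] args) {
        // Both interfaces are SAM types, so lambdas can implement them.
        MapFunction<String, Integer> length = s -> s.length();
        JoinFunction<String, Integer, String> label = (name, n) -> name + ":" + n;

        System.out.println(mapAll(Arrays.asList("flink", "core"), length));
        System.out.println(label.join("flink", 5));
    }
}
```

If the base interface declared a method of its own, the sub-interfaces would have two abstract methods and could no longer be written as lambdas.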

Operators

An operator is a source, sink, or it applies an operation to one or more inputs, producing a result.

DualInputOperator, for example, corresponds to a function with two inputs and one output.

Accumulators (Accumulator)

Take the typical IntCounter as an example: it is an accumulator that sums up int values.
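
A minimal sketch of the accumulator idea follows. `SimpleAccumulator` and `IntCounterSketch` are hypothetical, simplified stand-ins; the method names mirror the ones Flink's Accumulator uses (add, getLocalValue, resetLocal, merge), but the real interface has additional requirements such as being serializable and cloneable.

```java
// Hypothetical, simplified accumulator contract (not the real Flink interface).
interface SimpleAccumulator<V, R> {
    void add(V value);                          // record one value locally
    R getLocalValue();                          // current local result
    void resetLocal();                          // clear the local state
    void merge(SimpleAccumulator<V, R> other);  // fold in another partial result
}

// A counter in the spirit of IntCounter: each parallel task keeps a local sum,
// and the partial sums are merged into one global value at the end.
class IntCounterSketch implements SimpleAccumulator<Integer, Integer> {
    private int localValue = 0;

    @Override
    public void add(Integer value) {
        localValue += value;
    }

    @Override
    public Integer getLocalValue() {
        return localValue;
    }

    @Override
    public void resetLocal() {
        localValue = 0;
    }

    @Override
    public void merge(SimpleAccumulator<Integer, Integer> other) {
        localValue += other.getLocalValue();
    }

    public static void main(String[] args) {
        IntCounterSketch task1 = new IntCounterSketch();
        IntCounterSketch task2 = new IntCounterSketch();
        task1.add(3);
        task1.add(4);
        task2.add(10);
        task1.merge(task2); // combine the two partial counts
        System.out.println(task1.getLocalValue());
    }
}
```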

Aggregators (Aggregator)

Take LongSumAggregator as an example: it is an aggregator that sums up long values.

Event-time definition classes

Watermark: internally it holds a single field, private final long timestamp;
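
As a rough sketch of what such a class looks like and how the timestamp is typically used, consider the following. `WatermarkSketch` and the `isLate` helper are hypothetical names for this example; the lateness rule assumes the usual meaning of a watermark, namely that no further elements with a timestamp less than or equal to it are expected.

```java
// Hypothetical stand-in for a watermark: an immutable wrapper around a timestamp.
final class WatermarkSketch {
    private final long timestamp;

    WatermarkSketch(long timestamp) {
        this.timestamp = timestamp;
    }

    long getTimestamp() {
        return timestamp;
    }

    // Assumption for illustration: an element counts as late when its event
    // time is not after the current watermark.
    static boolean isLate(long eventTimestamp, WatermarkSketch current) {
        return eventTimestamp <= current.getTimestamp();
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(1_000L);
        System.out.println(isLate(900L, wm));   // element behind the watermark
        System.out.println(isLate(1_500L, wm)); // element ahead of the watermark
    }
}
```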

sink

source

Tuple

Tuples have a fixed length and contain a set of fields, which may all
be of different types. Because Tuples are strongly typed, each distinct tuple length is
represented by its own class. Tuples exist with up to 25 fields and are described in the classes
Tuple1 to Tuple25.
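
The "one class per tuple length" idea can be sketched as below. `Tuple2Sketch` is a hypothetical, cut-down stand-in for a two-field tuple; it follows the f0/f1 field naming of Flink's tuples but leaves out everything else the real classes provide.

```java
// Hypothetical two-field tuple: the arity is fixed by the class itself,
// while each field carries its own static type.
class Tuple2Sketch<T0, T1> {
    public T0 f0;
    public T1 f1;

    Tuple2Sketch(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    static <T0, T1> Tuple2Sketch<T0, T1> of(T0 f0, T1 f1) {
        return new Tuple2Sketch<>(f0, f1);
    }

    // Number of fields; constant for this class.
    int getArity() {
        return 2;
    }

    // Positional access, for code that handles tuples generically.
    @SuppressWarnings("unchecked")
    <T> T getField(int pos) {
        switch (pos) {
            case 0: return (T) f0;
            case 1: return (T) f1;
            default: throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }

    public static void main(String[] args) {
        Tuple2Sketch<String, Integer> wordCount = Tuple2Sketch.of("flink", 3);
        Integer count = wordCount.getField(1);
        System.out.println(wordCount.f0 + " -> " + count);
    }
}
```

A three-field tuple would be a separate class with a third type parameter, which is why the number of supported lengths is bounded.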

flink-runtime

The runtime package for the cluster side.

JobGraph

The JobGraph represents a Flink dataflow program, at the low level that the JobManager accepts.
All programs from higher level APIs are transformed into JobGraphs.

The JobGraph is a graph of vertices and intermediate results that are connected together to
form a DAG. Note that iterations (feedback edges) are currently not encoded inside the JobGraph
but inside certain special vertices that establish the feedback channel amongst themselves.

The JobGraph defines the job-wide configuration settings, while each vertex and intermediate result
define the characteristics of the concrete operation and intermediate data.
Internally it maintains a map of vertices:

	private final Map<JobVertexID, JobVertex> taskVertices = new LinkedHashMap<JobVertexID, JobVertex>();

Each vertex keeps track of its own results and inputs:

//JobVertex.java
	/** List of produced data sets, one per writer */
	private final ArrayList<IntermediateDataSet> results = new ArrayList<IntermediateDataSet>();

	/** List of edges with incoming data. One per Reader. */
	private final ArrayList<JobEdge> inputs = new ArrayList<JobEdge>();
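
The structure above (an insertion-ordered map of vertices, each vertex knowing its inputs) can be sketched in plain Java as follows. All class and method names here are hypothetical, and the intermediate data sets and edges are collapsed into direct vertex references to keep the example short. The topological sort is added only to show that the graph is a DAG that can be walked from the sources.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vertex: upstream links stand in for JobEdge inputs,
// downstream links stand in for the consumers of its produced results.
class VertexSketch {
    final String id;
    final List<VertexSketch> inputs = new ArrayList<>();
    final List<VertexSketch> consumers = new ArrayList<>();

    VertexSketch(String id) {
        this.id = id;
    }

    void connectInput(VertexSketch upstream) {
        inputs.add(upstream);
        upstream.consumers.add(this);
    }
}

class JobGraphSketch {
    // Insertion-ordered, like the LinkedHashMap in the snippet above.
    private final Map<String, VertexSketch> taskVertices = new LinkedHashMap<>();

    void addVertex(VertexSketch vertex) {
        taskVertices.put(vertex.id, vertex);
    }

    // Kahn's algorithm: emit a vertex once all of its inputs have been emitted.
    List<String> topologicalOrder() {
        Map<String, Integer> remainingInputs = new LinkedHashMap<>();
        Deque<VertexSketch> ready = new ArrayDeque<>();
        for (VertexSketch v : taskVertices.values()) {
            remainingInputs.put(v.id, v.inputs.size());
            if (v.inputs.isEmpty()) {
                ready.add(v); // sources have no inputs
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            VertexSketch v = ready.poll();
            order.add(v.id);
            for (VertexSketch c : v.consumers) {
                if (remainingInputs.merge(c.id, -1, Integer::sum) == 0) {
                    ready.add(c);
                }
            }
        }
        if (order.size() != taskVertices.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }
}
```

For a source, map, and sink vertex wired in that order, `topologicalOrder()` returns the source first regardless of the order in which the vertices were added.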

ResourceManager

There are generally three kinds of ResourceManager (RM).

StandaloneResourceManager

public class StandaloneResourceManager extends ResourceManager<ResourceID> {

YarnResourceManager

public class YarnResourceManager extends ResourceManager<YarnWorkerNode> implements AMRMClientAsync.CallbackHandler, NMClientAsync.CallbackHandler {

MesosResourceManager

public class MesosResourceManager extends ResourceManager<RegisteredMesosWorkerNode> {

On startup, the ResourceManager starts jobLeaderIdService and leaderElectionService.

Dispatcher

The Dispatcher is a FencedRpcEndpoint.

JobManagerRunner

Starts the JobManager; there is one per job.

JobMaster (in effect the JobManager of newer versions)

JobMaster implementation. The job master is responsible for the execution of a single JobGraph.
It is also a FencedRpcEndpoint.

It maintains the following fields:

	private final Scheduler scheduler;
	private final SlotPool slotPool;
	private final JobGraph jobGraph;

Scheduler

Slot

A slot is the basic unit of resource management in Flink.

SlotManager

IO.NetWork

IO.Disk

flink-java

The APIs provided for Java development.

DataSet

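Flink provides the DataSet API for processing batch data. Flink first converts the incoming data into a DataSet that is distributed in parallel across the nodes of the cluster. It then applies transformations (map, filter, and so on) to the DataSet, and finally writes the resulting data set to an external system through a DataSink operation.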

public <R> MapOperator<T, R> map(MapFunction<T, R> mapper) 
public <R> MapPartitionOperator<T, R> mapPartition(MapPartitionFunction<T, R> mapPartition)
public <R> FlatMapOperator<T, R> flatMap(FlatMapFunction<T, R> flatMapper)
public FilterOperator<T> filter(FilterFunction<T> filter)
public <OUT extends Tuple> ProjectOperator<?, OUT> project(int... fieldIndexes) 
public AggregateOperator<T> aggregate(Aggregations agg, int field)
public AggregateOperator<T> sum(int field)
public <K> DistinctOperator<T> distinct(KeySelector<T, K> keyExtractor)
public UnionOperator<T> union(DataSet<T> other)

Note the APIs in DataSet.

operators

flink-streaming-java

The APIs provided for stream processing in Java.

DataStream

Used for processing streaming data.

    public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper)
    public <T2> JoinedStreams<T, T2> join(DataStream<T2> otherStream)
    public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper)

StreamGraph

Maintains the structure of the streaming program (its sources, sinks, and nodes):

	private Set<Integer> sources;
	private Set<Integer> sinks;
	private Map<Integer, StreamNode> streamNodes;

The StreamGraph is converted into a JobGraph for execution:

	public JobGraph getJobGraph(@Nullable JobID jobID) {

Each StreamNode, in turn, maintains a list of its incoming edges and a list of its outgoing edges:

	// StreamNode.java
	private List<StreamEdge> inEdges = new ArrayList<StreamEdge>();
	private List<StreamEdge> outEdges = new ArrayList<StreamEdge>();
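
How these pieces fit together can be sketched as follows: nodes are looked up by integer id, and a single edge object is registered on both of its endpoints. The classes are hypothetical stand-ins carrying a "Sketch" suffix; the real StreamGraph, StreamNode, and StreamEdge hold far more information (operators, partitioners, parallelism, and so on).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical edge: just the ids of the two nodes it connects.
class StreamEdgeSketch {
    final int sourceId;
    final int targetId;

    StreamEdgeSketch(int sourceId, int targetId) {
        this.sourceId = sourceId;
        this.targetId = targetId;
    }
}

// Hypothetical node: keeps its own incoming and outgoing edges.
class StreamNodeSketch {
    final int id;
    final List<StreamEdgeSketch> inEdges = new ArrayList<>();
    final List<StreamEdgeSketch> outEdges = new ArrayList<>();

    StreamNodeSketch(int id) {
        this.id = id;
    }
}

class StreamGraphSketch {
    private final Set<Integer> sources = new HashSet<>();
    private final Set<Integer> sinks = new HashSet<>();
    private final Map<Integer, StreamNodeSketch> streamNodes = new HashMap<>();

    StreamNodeSketch addNode(int id) {
        return streamNodes.computeIfAbsent(id, StreamNodeSketch::new);
    }

    void addSource(int id) {
        addNode(id);
        sources.add(id);
    }

    void addSink(int id) {
        addNode(id);
        sinks.add(id);
    }

    // The same edge object is stored on both endpoints.
    void addEdge(int upstreamId, int downstreamId) {
        StreamEdgeSketch edge = new StreamEdgeSketch(upstreamId, downstreamId);
        addNode(upstreamId).outEdges.add(edge);
        addNode(downstreamId).inEdges.add(edge);
    }

    StreamNodeSketch getNode(int id) {
        return streamNodes.get(id);
    }

    Set<Integer> getSources() {
        return sources;
    }

    Set<Integer> getSinks() {
        return sinks;
    }
}
```

Storing edges on the nodes themselves lets a graph translator walk downstream from each source id without a separate edge table.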

flink-client

Handles the process of submitting jobs to the cluster.

flink-libraries

Contains CEP and other libraries.

flink-yarn

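Per-Job-Cluster mode
Each submission corresponds to one Job. Every submitted job requests resources from YARN separately, according to its own needs, and holds them until it finishes. Whether one job fails does not affect the normal submission and execution of the next. Each job has its own dedicated Dispatcher and ResourceManager, which accept resource requests on demand. This mode suits large, long-running jobs.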

flink-table

Functionality related to the Table API.

flink-connectors
