flink-core
Basic internal data definitions, shared by the cluster side and the user (client) side.
Functions
The base interface for all user-defined functions.
This interface is empty in order to allow extending interfaces to be SAM (single abstract method) interfaces that can be implemented via Java 8 lambdas.
org.apache.flink.api.common.functions.MapFunction
org.apache.flink.api.common.functions.JoinFunction
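As a rough sketch of why the empty base interface matters, the self-contained example below declares simplified stand-ins for Function, MapFunction and JoinFunction and implements them with lambdas. These are not the real Flink classes; only the single-method shapes are meant to mirror them. The class name SamFunctionSketch and the helpers lengthOf and joinPair are assumptions made for illustration.

```java
import java.io.Serializable;

public class SamFunctionSketch {

    /** Stand-in for the empty base interface: it declares no methods of its own. */
    public interface Function extends Serializable {}

    /** Stand-in for MapFunction: one abstract method, so a lambda can implement it. */
    public interface MapFunction<T, O> extends Function {
        O map(T value) throws Exception;
    }

    /** Stand-in for JoinFunction: combines one element from each of two inputs. */
    public interface JoinFunction<IN1, IN2, OUT> extends Function {
        OUT join(IN1 first, IN2 second) throws Exception;
    }

    /** Hypothetical helper: applies a lambda-based MapFunction to a string. */
    public static int lengthOf(String s) {
        MapFunction<String, Integer> mapper = value -> value.length();
        try {
            return mapper.map(s);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical helper: applies a lambda-based JoinFunction to a pair. */
    public static String joinPair(String name, Integer count) {
        JoinFunction<String, Integer, String> joiner = (first, second) -> first + "=" + second;
        try {
            return joiner.join(name, count);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("flink"));     // prints 5
        System.out.println(joinPair("flink", 5));  // prints flink=5
    }
}
```

Because the base interface adds no abstract method, each extending interface keeps exactly one, which is what lets the lambdas above compile.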
Operators
An operator is a source, sink, or it applies an operation to one or more inputs, producing a result.
DualInputOperator: corresponds to a function with two inputs and one output.
Accumulator
Take the typical IntCounter as an example.
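The idea behind IntCounter can be sketched as follows. This is a simplified stand-in, not the Flink source: the nested Accumulator interface keeps only four methods, and the helper countAcrossTwoTasks is a hypothetical simulation of two parallel tasks whose partial counts are merged afterwards.

```java
public class IntCounterSketch {

    /** Simplified stand-in for an accumulator: V is the added type, R the result type. */
    public interface Accumulator<V, R> {
        void add(V value);
        R getLocalValue();
        void resetLocal();
        void merge(Accumulator<V, R> other);
    }

    /** Simplified stand-in for IntCounter: sums the int values added by one task. */
    public static class IntCounter implements Accumulator<Integer, Integer> {
        private int localValue = 0;

        @Override
        public void add(Integer value) {
            localValue += value;
        }

        @Override
        public Integer getLocalValue() {
            return localValue;
        }

        @Override
        public void resetLocal() {
            localValue = 0;
        }

        @Override
        public void merge(Accumulator<Integer, Integer> other) {
            // Merging two counters simply adds the other task's partial sum.
            localValue += other.getLocalValue();
        }
    }

    /** Hypothetical helper: two tasks count locally, then their results are merged. */
    public static int countAcrossTwoTasks(int[] taskA, int[] taskB) {
        IntCounter a = new IntCounter();
        IntCounter b = new IntCounter();
        for (int v : taskA) {
            a.add(v);
        }
        for (int v : taskB) {
            b.add(v);
        }
        a.merge(b);
        return a.getLocalValue();
    }

    public static void main(String[] args) {
        System.out.println(countAcrossTwoTasks(new int[] {1, 2}, new int[] {3})); // prints 6
    }
}
```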
Aggregator
Take LongSumAggregator as an example.
Event-time definition classes
Watermark: internally holds a private final long timestamp;
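A minimal sketch of a watermark as an immutable timestamp holder is shown below. The nested Watermark class is a stand-in, not the Flink class, and isLate is a hypothetical helper. It assumes the usual reading of a watermark: once a watermark with timestamp t has passed, no further elements with a timestamp at or before t are expected.

```java
public class WatermarkSketch {

    /** Simplified stand-in for a watermark: an immutable event-time timestamp. */
    public static final class Watermark {
        private final long timestamp;

        public Watermark(long timestamp) {
            this.timestamp = timestamp;
        }

        public long getTimestamp() {
            return timestamp;
        }
    }

    /**
     * Hypothetical helper: an element that arrives after the watermark is treated
     * as late when its timestamp is at or before the watermark's timestamp.
     */
    public static boolean isLate(long elementTimestamp, Watermark watermark) {
        return elementTimestamp <= watermark.getTimestamp();
    }

    public static void main(String[] args) {
        Watermark wm = new Watermark(10L);
        System.out.println(isLate(5L, wm));   // prints true
        System.out.println(isLate(11L, wm));  // prints false
    }
}
```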
sink
source
Tuple
Tuples have a fixed length and contain a set of fields, which may all
be of different types. Because Tuples are strongly typed, each distinct tuple length is
represented by its own class. Tuples exist with up to 25 fields and are described in the classes
flink-runtime
The runtime package for the cluster side.
JobGraph
The JobGraph represents a Flink dataflow program, at the low level that the JobManager accepts.
All programs from higher level APIs are transformed into JobGraphs.

The JobGraph is a graph of vertices and intermediate results that are connected together to
form a DAG. Note that iterations (feedback edges) are currently not encoded inside the JobGraph
but inside certain special vertices that establish the feedback channel amongst themselves.
The JobGraph defines the job-wide configuration settings, while each vertex and intermediate result
define the characteristics of the concrete operation and intermediate data.
Internally, it maintains a map of vertices:
private final Map<JobVertexID, JobVertex> taskVertices = new LinkedHashMap<JobVertexID, JobVertex>();
Each vertex maintains its own results and inputs:
//JobVertex.java
/** List of produced data sets, one per writer */
private final ArrayList<IntermediateDataSet> results = new ArrayList<IntermediateDataSet>();
/** List of edges with incoming data. One per Reader. */
private final ArrayList<JobEdge> inputs = new ArrayList<JobEdge>();
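How the vertex map, the per-vertex results and the per-vertex inputs fit together can be sketched as below. All classes here are simplified stand-ins that reuse the Flink names only for readability: String replaces JobVertexID, and connectFrom, getVertex, vertexIdsInInsertionOrder and buildLinearExample are hypothetical helpers rather than Flink API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JobGraphSketch {

    /** Stand-in for IntermediateDataSet: the data produced by one vertex. */
    public static class IntermediateDataSet {
        public final JobVertex producer;

        public IntermediateDataSet(JobVertex producer) {
            this.producer = producer;
        }
    }

    /** Stand-in for JobEdge: connects an intermediate result to its consumer. */
    public static class JobEdge {
        public final IntermediateDataSet source;
        public final JobVertex target;

        public JobEdge(IntermediateDataSet source, JobVertex target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Stand-in for JobVertex: keeps its produced results and incoming edges. */
    public static class JobVertex {
        public final String id; // assumption: a String instead of JobVertexID
        public final List<IntermediateDataSet> results = new ArrayList<>();
        public final List<JobEdge> inputs = new ArrayList<>();

        public JobVertex(String id) {
            this.id = id;
        }

        /** Hypothetical helper: adds a result to upstream and an edge into this vertex. */
        public void connectFrom(JobVertex upstream) {
            IntermediateDataSet dataSet = new IntermediateDataSet(upstream);
            upstream.results.add(dataSet);
            inputs.add(new JobEdge(dataSet, this));
        }
    }

    // LinkedHashMap keeps the vertices in the order they were added.
    private final Map<String, JobVertex> taskVertices = new LinkedHashMap<>();

    public void addVertex(JobVertex vertex) {
        taskVertices.put(vertex.id, vertex);
    }

    public JobVertex getVertex(String id) {
        return taskVertices.get(id);
    }

    public List<String> vertexIdsInInsertionOrder() {
        return new ArrayList<>(taskVertices.keySet());
    }

    /** Builds the linear DAG source -> map -> sink. */
    public static JobGraphSketch buildLinearExample() {
        JobGraphSketch graph = new JobGraphSketch();
        JobVertex source = new JobVertex("source");
        JobVertex map = new JobVertex("map");
        JobVertex sink = new JobVertex("sink");
        map.connectFrom(source);
        sink.connectFrom(map);
        graph.addVertex(source);
        graph.addVertex(map);
        graph.addVertex(sink);
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(buildLinearExample().vertexIdsInInsertionOrder());
    }
}
```

In this sketch the edge never points at a vertex directly on its source side; it points at an intermediate result, which in turn knows its producer. That mirrors the "vertices and intermediate results" wording of the Javadoc above.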
ResourceManager
There are generally three kinds of ResourceManager (RM).
- StandaloneResourceManager
public class StandaloneResourceManager extends ResourceManager<ResourceID> {
- YarnResourceManager
public class YarnResourceManager extends ResourceManager<YarnWorkerNode> implements AMRMClientAsync.CallbackHandler, NMClientAsync.CallbackHandler {
- MesosResourceManager
public class MesosResourceManager extends ResourceManager<RegisteredMesosWorkerNode> {
The ResourceManager starts the jobLeaderIdService and the leaderElectionService.
Dispatcher
It is a FencedRpcEndpoint.
JobManagerRunner
Starts the JobManager, one per job.
JobMaster (effectively the JobManager in newer versions)
JobMaster implementation. The job master is responsible for the execution of a single JobGraph
It is a FencedRpcEndpoint.
It maintains the following fields:
private final Scheduler scheduler;
private final SlotPool slotPool;
private final JobGraph jobGraph;
Scheduler
Slot
A slot is the basic unit of resource management in Flink.
SlotManager
IO.NetWork
IO.Disk
flink-java
Provided for Java development.
DataSet
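Flink provides the DataSet API for processing batch data. Flink first converts the incoming data into a DataSet, distributed in parallel across the nodes of the cluster. It then applies various transformations to the DataSet (map, filter, and so on), and finally writes the resulting data set to an external system through a DataSink operation.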
public <R> MapOperator<T, R> map(MapFunction<T, R> mapper)
public <R> MapPartitionOperator<T, R> mapPartition(MapPartitionFunction<T, R> mapPartition)
public <R> FlatMapOperator<T, R> flatMap(FlatMapFunction<T, R> flatMapper)
public FilterOperator<T> filter(FilterFunction<T> filter)
public <OUT extends Tuple> ProjectOperator<?, OUT> project(int... fieldIndexes)
public AggregateOperator<T> aggregate(Aggregations agg, int field)
public AggregateOperator<T> sum(int field)
public <K> DistinctOperator<T> distinct(KeySelector<T, K> keyExtractor)
public UnionOperator<T> union(DataSet<T> other)
Note: the API in DataSet
operators
flink-streaming-java
Provided for Java stream processing.
DataStream
Used to process streaming data.
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {
public <T2> JoinedStreams<T, T2> join(DataStream<T2> otherStream)
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper)
StreamGraph
Maintains the state of the stream processing job:
private Set<Integer> sources;
private Set<Integer> sinks;
private Map<Integer, StreamNode> streamNodes;
The StreamGraph is converted into a JobGraph for execution.
public JobGraph getJobGraph(@Nullable JobID jobID) {
Each StreamNode maintains its own lists of incoming and outgoing edges:
//StreamNode.java
private List<StreamEdge> inEdges = new ArrayList<StreamEdge>();
private List<StreamEdge> outEdges = new ArrayList<StreamEdge>();
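The fields above can be tied together in a small sketch: a graph that tracks source ids, sink ids and a node map, where every node records its incoming and outgoing edges. The classes are simplified stand-ins, and the method signatures (addNode, addSource, addSink, addEdge, buildLinearExample) are assumptions chosen for brevity; they do not match the real StreamGraph methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StreamGraphSketch {

    /** Stand-in for StreamEdge: a directed link between two node ids. */
    public static class StreamEdge {
        public final int sourceId;
        public final int targetId;

        public StreamEdge(int sourceId, int targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    /** Stand-in for StreamNode: keeps its incoming and outgoing edges. */
    public static class StreamNode {
        public final int id;
        public final List<StreamEdge> inEdges = new ArrayList<>();
        public final List<StreamEdge> outEdges = new ArrayList<>();

        public StreamNode(int id) {
            this.id = id;
        }
    }

    private final Set<Integer> sources = new HashSet<>();
    private final Set<Integer> sinks = new HashSet<>();
    private final Map<Integer, StreamNode> streamNodes = new HashMap<>();

    public void addNode(int id) {
        streamNodes.putIfAbsent(id, new StreamNode(id));
    }

    public void addSource(int id) {
        addNode(id);
        sources.add(id);
    }

    public void addSink(int id) {
        addNode(id);
        sinks.add(id);
    }

    /** Registers one edge on both endpoints; both nodes must already exist. */
    public void addEdge(int upstreamId, int downstreamId) {
        StreamEdge edge = new StreamEdge(upstreamId, downstreamId);
        streamNodes.get(upstreamId).outEdges.add(edge);
        streamNodes.get(downstreamId).inEdges.add(edge);
    }

    public Set<Integer> getSources() {
        return sources;
    }

    public Set<Integer> getSinks() {
        return sinks;
    }

    public StreamNode getStreamNode(int id) {
        return streamNodes.get(id);
    }

    /** Builds the linear topology 1 (source) -> 2 (map) -> 3 (sink). */
    public static StreamGraphSketch buildLinearExample() {
        StreamGraphSketch graph = new StreamGraphSketch();
        graph.addSource(1);
        graph.addNode(2);
        graph.addSink(3);
        graph.addEdge(1, 2);
        graph.addEdge(2, 3);
        return graph;
    }

    public static void main(String[] args) {
        StreamGraphSketch graph = buildLinearExample();
        System.out.println(graph.getStreamNode(2).inEdges.size());  // prints 1
        System.out.println(graph.getStreamNode(2).outEdges.size()); // prints 1
    }
}
```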
flink-client
The process of submitting jobs to the cluster.
flink-libraries
Contains CEP and other libraries.
flink-yarn
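Per-Job-Cluster mode
Each submission corresponds to one job. Every submitted job requests resources from YARN on its own, according to its needs, and holds them until the job finishes. Whether one job fails does not affect the submission and execution of the next. Each job has its own Dispatcher and ResourceManager, which accept resource requests on demand. This mode suits large, long-running jobs.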
flink-table
Functionality related to the Table API.
flink-connectors