Apache Flink Advanced (III): Checkpoint Principles and Applications

This article is based on the Apache Flink advanced live course series and was presented by Tang Yun (Chagan), a senior R&D engineer at Alibaba. It covers how Checkpoint is applied in practice in Flink and has four parts: the relationship between Checkpoint and state, what state is, how to use state in Flink, and how the Checkpoint mechanism executes.



The relationship between Checkpoint and state


A Checkpoint is a global operation: it is triggered at the sources and is finished once all downstream nodes have completed it. The figure below gives an intuitive feel for Checkpoint. The red box shows that Checkpoint was triggered 569K times in total, and every one of them completed without a failure.

[Figure: Checkpoint statistics, with the number of triggered Checkpoints highlighted in a red box]


State is the main data that a Checkpoint persists as a backup. In the detailed statistics shown in the figure below, the state size is 9 KB.

[Figure: detailed Checkpoint statistics showing a state size of 9 KB]

What is state


Next, let's look at what state is. Consider the classic word count program below. It listens on local port 9000 and counts the frequency of the words in the incoming network data. If we start netcat locally and type hello world in the terminal, what does the program output?

[Figure: word count code]


The answer is obvious: (hello, 1) and (world, 1).

So the question is, if we type hello world in the terminal again, what does the program output?

The answer is just as clear: (hello, 2) and (world, 2). Flink knows that it has already processed hello world once because of state. Here, keyed state stores the counts computed so far, which is how Flink knows that hello and world have each appeared once before.

Look at the word count code again. Calling keyBy creates a KeyedStream that is partitioned by key, which is the prerequisite for using keyed state. After that, the sum method is implemented by the built-in StreamGroupedReduce.

[Figure: keyBy and sum in the word count code]
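
The behaviour described above can be imitated in a few lines of plain Java. This is only a sketch and not Flink code: the class KeyedWordCountSketch and its processLine method are made up for illustration, and an ordinary HashMap stands in for keyed state. It shows why the second hello world produces counts of 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of keyed state in word count; hypothetical names, not Flink API.
final class KeyedWordCountSketch {
    // One running count per key; this map stands in for Flink's keyed state.
    private final Map<String, Integer> countPerWord = new HashMap<>();

    // Processes one input line and returns the (word, count) records it emits.
    List<String> processLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input line
            }
            // Read the previous count for this key, add one, and store it back.
            int updated = countPerWord.merge(word, 1, Integer::sum);
            emitted.add("(" + word + ", " + updated + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        KeyedWordCountSketch job = new KeyedWordCountSketch();
        System.out.println(job.processLine("hello world")); // prints [(hello, 1), (world, 1)]
        System.out.println(job.processLine("hello world")); // prints [(hello, 2), (world, 2)]
    }
}
```

Without the map, every line would start from zero and the second input would print counts of 1 again.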


What is keyed state


Keyed state has two characteristics:

  • It can only be used in functions and operations on a KeyedStream, for example keyed UDFs and window state
  • Keyed state is already partitioned: each key belongs to exactly one keyed state

To understand what "already partitioned" means, we need to look at the semantics of keyBy. In the figure below there are three parallel instances on the left and three on the right. Words arriving on the left are distributed to the right by keyBy. For example, for the input hello world, the word hello is hashed and therefore always goes to the same parallel task, the one at the bottom right.

[Figure: keyBy distributing words from three upstream parallel instances to three downstream parallel instances]
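
The routing described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Flink's actual code: the class and method names are hypothetical, the maximum parallelism of 128 is an arbitrary choice, and a plain hashCode is used where a real system would typically scramble the hash further. The key is first mapped to a key group, and the key group is then mapped to one parallel instance.

```java
// Plain-Java sketch of routing a key to a parallel instance; hypothetical names, not Flink API.
final class KeyRoutingSketch {
    // Maps a key to a key group in [0, maxParallelism).
    // Assumption: plain hashCode only; no additional hash scrambling is applied here.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel instance that owns it.
    static int instanceFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Full route: key -> key group -> parallel instance.
    static int route(Object key, int maxParallelism, int parallelism) {
        return instanceFor(keyGroupFor(key, maxParallelism), maxParallelism, parallelism);
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // hypothetical value
        int parallelism = 3;      // three downstream instances, as in the figure
        for (String word : new String[] {"hello", "world", "hello"}) {
            System.out.println(word + " -> instance " + route(word, maxParallelism, parallelism));
        }
    }
}
```

Because the route depends only on the key, both occurrences of hello land on the same instance, which is what lets that instance keep the count for hello locally.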


 
What is operator state

  • Also known as non-keyed state; each operator state is bound to exactly one operator instance
  • A common operator state is source state, for example a record of the source's current offset

Here is a word count program that uses operator state:

[Figure: word count code using fromElements]


Here fromElements calls the FromElementsFunction class, which uses an operator state of type list state. State can be classified by type as shown below:

[Figure: classification of state types]


Besides this classification, state can also be classified by whether Flink manages it directly:

  • Managed state: state that is managed by Flink. All the state in the examples above is managed state
  • Raw state: Flink only provides a stream into which the data can be stored. To Flink, raw state is just a sequence of bytes

In production, only managed state is recommended, and it is the focus of this article.


How to use state in Flink


The figure below uses the StreamGroupedReduce class, which backs sum in the earlier word count example, to show how to use keyed state in code:

[Figure: keyed state usage in StreamGroupedReduce]
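
Since the code in the figure is not reproduced here, the general pattern of a keyed reduce can be sketched in plain Java. This is an assumption-laden illustration and not the source of StreamGroupedReduce: the class KeyedReduceSketch is hypothetical, and a HashMap plays the role of a per-key value state. For each element, the current value for the key is read, combined with the input, written back, and emitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java sketch of a keyed reduce backed by one value per key; hypothetical, not Flink API.
final class KeyedReduceSketch<K, V> {
    private final Map<K, V> valuePerKey = new HashMap<>(); // stands in for a per-key value state
    private final BinaryOperator<V> reducer;               // the user's reduce function

    KeyedReduceSketch(BinaryOperator<V> reducer) {
        this.reducer = reducer;
    }

    // Reads the current value for the key, combines it with the input, stores and returns the result.
    V processElement(K key, V input) {
        V current = valuePerKey.get(key);
        V reduced = (current == null) ? input : reducer.apply(current, input);
        valuePerKey.put(key, reduced);
        return reduced;
    }

    public static void main(String[] args) {
        KeyedReduceSketch<String, Integer> sum = new KeyedReduceSketch<>(Integer::sum);
        System.out.println(sum.processElement("hello", 1)); // prints 1
        System.out.println(sum.processElement("hello", 1)); // prints 2
    }
}
```

The first element for a key has nothing to combine with, so it is stored and emitted unchanged.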


The figure below uses the FromElementsFunction class from the word count example to show how to use operator state in code:

[Figure: operator state usage in FromElementsFunction]
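
The idea of a source that records its position as operator state can be sketched in plain Java. This is a minimal illustration, not the code of FromElementsFunction: the class ReplayableSourceSketch and its method names are hypothetical, and the list state is modelled as an ordinary List with a single entry. A snapshot stores how many elements have been emitted, and a restore resumes from that position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of a source that checkpoints its position as list state; hypothetical, not Flink API.
final class ReplayableSourceSketch {
    private final List<String> elements;
    private int emittedCount; // the position that is stored as operator state

    ReplayableSourceSketch(List<String> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Emits the next element, or returns null when the source is exhausted.
    String emitNext() {
        return emittedCount < elements.size() ? elements.get(emittedCount++) : null;
    }

    // Snapshot: the operator state is a list with a single entry, the emitted count.
    List<Integer> snapshotState() {
        return Collections.singletonList(emittedCount);
    }

    // Restore: skip the elements that had already been emitted when the snapshot was taken.
    void restoreState(List<Integer> restored) {
        emittedCount = restored.isEmpty() ? 0 : restored.get(0);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "flink");
        ReplayableSourceSketch source = new ReplayableSourceSketch(words);
        source.emitNext();                            // emits hello
        List<Integer> state = source.snapshotState(); // checkpoint taken after one element
        source.emitNext();                            // emits world, then the job fails

        ReplayableSourceSketch recovered = new ReplayableSourceSketch(words);
        recovered.restoreState(state);
        System.out.println(recovered.emitNext());     // prints world
    }
}
```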


Checkpoint execution mechanism


Before introducing the Checkpoint execution mechanism, we need to look at how state is stored, because state is the main thing that a Checkpoint persists as a backup.

State backend classification

The figure below shows the three state backends currently built into Flink. MemoryStateBackend and FsStateBackend both keep state on the Java heap at runtime. Only when a Checkpoint is executed does FsStateBackend persist the data to remote storage as files. RocksDBStateBackend uses RocksDB (an LSM database that mixes memory and disk) to store state.

[Figure: the three built-in state backends]


For HeapKeyedStateBackend, there are two implementations:

  • Supports asynchronous Checkpoint (default): storage format CopyOnWriteStateMap
  • Supports synchronous Checkpoint only: storage format NestedStateMap

In particular, when HeapKeyedStateBackend is used with MemoryStateBackend, the data serialized during a Checkpoint is limited to a maximum of 5 MB by default.

For RocksDBKeyedStateBackend, each state is stored in its own column family, and the keyGroup, key, and namespace are serialized together and stored as the key in the DB.

[Figure: layout of the serialized key in RocksDB]
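
The composite key described above can be sketched in plain Java. The byte layout used here is an assumption chosen to keep the example short (a 2-byte key group prefix followed by length-prefixed UTF-8 strings); the real layout depends on Flink's serializers, and the class CompositeKeySketch is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Plain-Java sketch of a composite key: key group, then key, then namespace.
// Assumed layout for illustration only; not Flink's actual serialization format.
final class CompositeKeySketch {
    static byte[] compositeKey(int keyGroup, String key, String namespace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((keyGroup >>> 8) & 0xFF); // key group, high byte
        out.write(keyGroup & 0xFF);         // key group, low byte
        writeString(out, key);
        writeString(out, namespace);
        return out.toByteArray();
    }

    // Writes one length byte followed by the UTF-8 bytes (so at most 255 bytes per string).
    private static void writeString(ByteArrayOutputStream out, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 255) {
            throw new IllegalArgumentException("string too long for this sketch");
        }
        out.write(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] dbKey = compositeKey(42, "hello", "window-1"); // hypothetical values
        System.out.println(dbKey.length + " bytes");
    }
}
```

With the key group as the leading bytes, entries of the same key group sit next to each other in a store that sorts keys byte-wise.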

Checkpoint execution mechanism in detail

This section walks through the Checkpoint execution flow step by step. In the figures, the Checkpoint Coordinator on the left is the initiator of the whole Checkpoint. In the middle is a Flink job made up of two sources and one sink. On the far right is the persistent storage, which in most user scenarios is HDFS.

a. First, the Checkpoint Coordinator triggers the Checkpoint on all source nodes.

[Figure: step a, the Checkpoint Coordinator triggering the Checkpoint on the sources]


b. Second, the source nodes broadcast a barrier downstream. This barrier is the core of the Chandy-Lamport distributed snapshot algorithm. A downstream task performs the corresponding Checkpoint only after it has received the barrier from all of its inputs.

[Figure: step b, the sources broadcasting barriers downstream]


c. Third, once a task has finished backing up its state, it notifies the Checkpoint Coordinator of the address of the backup data (the state handle).

[Figure: step c, a task reporting its state handle to the Checkpoint Coordinator]


d. Fourth, after the downstream sink node has collected the barriers from both upstream inputs, it takes its local snapshot. The figure specifically shows the flow of a RocksDB incremental Checkpoint: RocksDB first flushes the full data to disk (the red triangle), and the Flink framework then selects the files that have not yet been uploaded and persists them as a backup (the purple triangles).

[Figure: step d, the sink taking a local snapshot with a RocksDB incremental Checkpoint]

e. Likewise, after the sink node completes its own Checkpoint, it returns its state handle to the Coordinator.

[Figure: step e, the sink reporting its state handle to the Checkpoint Coordinator]


f. Finally, once the Checkpoint Coordinator has collected the state handles of all tasks, it considers this Checkpoint globally complete and backs up a Checkpoint meta file to the persistent storage.

[Figure: step f, the Checkpoint Coordinator writing the Checkpoint meta file]
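
The bookkeeping in steps c, e, and f can be sketched in plain Java. This is only an illustration of the idea, not Flink's coordinator: the class PendingCheckpointSketch, the task names, and the handle strings are all hypothetical. The Checkpoint counts as complete only when every task has acknowledged with its state handle.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java sketch of a coordinator collecting acknowledgements; hypothetical, not Flink API.
final class PendingCheckpointSketch {
    private final long checkpointId;
    private final Set<String> notYetAcknowledged;
    private final Map<String, String> stateHandles = new TreeMap<>(); // task -> state handle

    PendingCheckpointSketch(long checkpointId, Set<String> tasks) {
        this.checkpointId = checkpointId;
        this.notYetAcknowledged = new HashSet<>(tasks);
    }

    // Records a task's state handle; returns true once every task has acknowledged.
    boolean acknowledge(String task, String stateHandle) {
        if (notYetAcknowledged.remove(task)) {
            stateHandles.put(task, stateHandle);
        }
        return isComplete();
    }

    boolean isComplete() {
        return notYetAcknowledged.isEmpty();
    }

    // Content of the meta file written on completion: the ID plus every state handle.
    String metadata() {
        if (!isComplete()) {
            throw new IllegalStateException("checkpoint " + checkpointId + " is not complete");
        }
        return "checkpoint " + checkpointId + " " + stateHandles;
    }

    public static void main(String[] args) {
        Set<String> tasks = new HashSet<>(Arrays.asList("source-1", "source-2", "sink"));
        PendingCheckpointSketch pending = new PendingCheckpointSketch(1L, tasks);
        pending.acknowledge("source-1", "handle-a"); // hypothetical handles
        pending.acknowledge("source-2", "handle-b");
        if (pending.acknowledge("sink", "handle-c")) {
            System.out.println(pending.metadata());
        }
    }
}
```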

EXACTLY_ONCE semantics of Checkpoint

To implement EXACTLY ONCE semantics, Flink uses an input buffer to cache the data received during the alignment phase and processes it only after alignment has completed. With AT LEAST ONCE semantics, the received data is not cached and is processed right away, so that after a restore some data may be processed more than once. The figure below is the Checkpoint alignment diagram from the official documentation:

[Figure: Checkpoint barrier alignment, from the official Flink documentation]
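
The difference between the two modes can be made concrete with a small plain-Java simulation. It is a sketch under simplifying assumptions, not Flink's implementation: the class BarrierAlignmentSketch is hypothetical, the task has two input channels, and its state is just the sum of the records it has processed. In exactly-once mode, records that arrive on a channel after that channel's barrier are buffered until the barrier has arrived on every channel.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Plain-Java simulation of barrier alignment on a multi-input task; hypothetical, not Flink API.
final class BarrierAlignmentSketch {
    private final boolean exactlyOnce;
    private final boolean[] barrierSeen;                        // one flag per input channel
    private final Queue<Integer> buffered = new ArrayDeque<>(); // records held back during alignment
    private int sum;                                            // the task's state
    private int snapshot = -1;                                  // state captured by the Checkpoint

    BarrierAlignmentSketch(int channels, boolean exactlyOnce) {
        this.barrierSeen = new boolean[channels];
        this.exactlyOnce = exactlyOnce;
    }

    void onRecord(int channel, int value) {
        if (exactlyOnce && barrierSeen[channel]) {
            buffered.add(value); // arrived behind the barrier: hold it back
        } else {
            sum += value;        // process immediately
        }
    }

    void onBarrier(int channel) {
        barrierSeen[channel] = true;
        for (boolean seen : barrierSeen) {
            if (!seen) {
                return; // still waiting for the barrier on another channel
            }
        }
        snapshot = sum;                  // all barriers received: take the snapshot
        Arrays.fill(barrierSeen, false);
        while (!buffered.isEmpty()) {
            sum += buffered.poll();      // now process what was held back
        }
    }

    // Replays one fixed arrival order and returns the state captured by the Checkpoint.
    static int snapshotOf(boolean exactlyOnce) {
        BarrierAlignmentSketch task = new BarrierAlignmentSketch(2, exactlyOnce);
        task.onRecord(0, 1);
        task.onBarrier(0);
        task.onRecord(0, 10); // belongs to the next Checkpoint
        task.onRecord(1, 2);
        task.onBarrier(1);
        return task.snapshot;
    }

    public static void main(String[] args) {
        System.out.println(snapshotOf(true));  // prints 3
        System.out.println(snapshotOf(false)); // prints 13
    }
}
```

In the second run the snapshot already contains the record 10 that arrived behind the barrier on channel 0. If the job is restored from that snapshot and the source replays everything after the barrier, that record is counted twice.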


Note that Flink's Checkpoint mechanism only guarantees EXACTLY ONCE for the computation inside Flink. End-to-end EXACTLY ONCE also requires support from the source and the sink.

Differences between Checkpoint and Savepoint

Both can be used to recover a job. The main differences are as follows:

[Figure: comparison of Checkpoint and Savepoint]


Origin blog.csdn.net/huzechen/article/details/102548868