Three stream computing frameworks: Storm, Spark and Flink

We know that big data computing models can be divided into batch computing, stream computing, interactive computing, graph computing, and so on. Among these, stream computing and batch computing are the two major modes of big data computation, each suited to different scenarios.

The current mainstream stream computing frameworks are Storm, Spark Streaming, and Flink. Their basic principles are as follows:

Apache Storm

In Storm, you design a real-time computation structure called a topology. The topology is then submitted to a cluster, where the master node distributes code to the worker nodes, and the worker nodes execute it. A topology contains two kinds of roles: spouts and bolts. A spout emits the source data stream, passing the data as tuples; a bolt is responsible for transforming the data stream.
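The spout/bolt pipeline above can be sketched in plain Python. This is a conceptual model only, not the real Storm API; the class names `SentenceSpout`, `SplitBolt`, and `CountBolt` are invented for illustration, and mirror Storm's roles of a tuple-emitting source and stream-transforming bolts.

```python
class SentenceSpout:
    """Emits the source data stream as tuples, like a Storm spout."""
    def __init__(self, sentences):
        self.sentences = sentences

    def emit(self):
        for sentence in self.sentences:
            yield (sentence,)  # Storm passes data between components as tuples


class SplitBolt:
    """A bolt that transforms the stream: splits sentence tuples into word tuples."""
    def process(self, tup):
        for word in tup[0].split():
            yield (word,)


class CountBolt:
    """A terminal bolt: keeps a running count per word."""
    def __init__(self):
        self.counts = {}

    def process(self, tup):
        word = tup[0]
        self.counts[word] = self.counts.get(word, 0) + 1


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt, as a topology would."""
    spout, split, count = SentenceSpout(sentences), SplitBolt(), CountBolt()
    for tup in spout.emit():
        for word_tup in split.process(tup):
            count.process(word_tup)
    return count.counts
```

In real Storm, the wiring is declared with `TopologyBuilder` and executed by the cluster one tuple at a time; the sketch keeps only that dataflow shape.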

Apache Spark

Spark Streaming is an extension of the core Spark API. Unlike Storm, it does not process the data stream one record at a time; instead, before processing, it slices the stream into segments at fixed time intervals. Spark's abstraction for the continuous data stream is called a DStream (Discretized Stream). A DStream is a sequence of small-batch RDDs (Resilient Distributed Datasets); these distributed datasets can be transformed by arbitrary functions and by sliding-window computations, and processed in parallel.
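The micro-batching idea can be sketched in plain Python. This is a conceptual illustration, not the Spark Streaming API; the functions `slice_into_batches` and `transform_each_batch` are invented names that mimic how a DStream slices a timestamped stream into fixed intervals and applies a transformation to each small batch.

```python
def slice_into_batches(events, interval):
    """Group (timestamp, value) events into batches of `interval` seconds,
    the way a DStream discretizes a continuous stream."""
    batches = {}
    for ts, value in events:
        batch_key = int(ts // interval)  # which time interval this event falls in
        batches.setdefault(batch_key, []).append(value)
    return [batches[k] for k in sorted(batches)]


def transform_each_batch(batches, fn):
    """Like a DStream transformation: apply fn to every micro-batch (RDD)."""
    return [fn(batch) for batch in batches]


# Events arriving over ~3 seconds, sliced into 1-second micro-batches.
events = [(0.2, 3), (0.9, 5), (1.1, 2), (2.5, 4)]
batches = slice_into_batches(events, interval=1.0)  # [[3, 5], [2], [4]]
sums = transform_each_batch(batches, sum)           # [8, 2, 4]
```

In real Spark Streaming the batch interval is set on the `StreamingContext`, and each batch is a distributed RDD rather than an in-memory list.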

Apache Flink

Flink is a computing framework for both stream data and batch data. It treats batch data as a special case of the data stream, achieves low latency (on the order of milliseconds), and guarantees that messages are neither lost nor delivered more than once.
Flink creatively unifies stream processing and batch processing: all input is viewed as a data stream, and batch processing is treated as a special case of stream processing whose input stream is bounded. A Flink program is built from two basic building blocks, Stream and Transformation. A Stream is an intermediate result of the data; a Transformation is an operation that takes one or more Streams as input, performs computation on them, and outputs one or more result Streams.
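The Stream/Transformation model can be sketched in plain Python. This is a conceptual illustration, not the Flink API; the `Stream` class and the `map_transformation`/`filter_transformation` functions are invented names that show how transformations consume and produce streams, and how a "batch" is simply a bounded stream fed through the same operations.

```python
class Stream:
    """An (intermediate) data result: here simply a bounded sequence of records."""
    def __init__(self, records):
        self.records = list(records)


def map_transformation(stream, fn):
    """A Transformation: consumes one Stream, outputs one result Stream."""
    return Stream(fn(r) for r in stream.records)


def filter_transformation(stream, pred):
    """Another Transformation: keeps only records matching the predicate."""
    return Stream(r for r in stream.records if pred(r))


# A "batch job" is just stream processing over a bounded input stream.
bounded = Stream(range(10))
doubled = map_transformation(bounded, lambda x: x * 2)
large = filter_transformation(doubled, lambda x: x > 5)
```

In real Flink, transformations such as `map` and `filter` are declared on a `DataStream` obtained from a `StreamExecutionEnvironment`, and execute lazily across the cluster; the sketch keeps only the composition of streams via transformations.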

The three frameworks compare as follows:

(figure: comparison table of the three frameworks)
Reference:

Streaming Big Data: Storm, Spark and Samza


Source: blog.51cto.com/13945147/2437363