Overview of the use of Flume

1. Overview of Flume

Flume is a highly reliable, distributed system for massive log collection, aggregation, and transmission, provided by Cloudera.

2. Flume composition

An Agent mainly consists of three parts: Source, Channel, and Sink.
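The three components are wired together in an agent configuration file. Below is a minimal sketch, assuming an agent named a1 that reads lines from a netcat source on port 44444, buffers them in a Memory Channel, and prints them through a Logger Sink:

    # Name the components of agent a1
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: listen for plain-text lines on localhost:44444
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Channel: buffer events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Sink: write events to the log
    a1.sinks.k1.type = logger

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1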

①Source

The Source component collects data and can handle log data of various types and formats, including avro, exec, spooling directory, netcat, etc.
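For example, an exec source can tail a log file. A sketch, assuming the agent a1 above and a placeholder log path /var/log/app.log:

    # Exec source: run a command and turn each output line into an event
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app.log
    a1.sources.r1.channels = c1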

②Channel

The Channel component buffers the collected data, which can be stored in memory or in files on disk.

Flume comes with two Channels: Memory Channel and File Channel.

Memory Channel is an in-memory queue. It offers high performance, but data can be lost if the agent process dies.

File Channel writes all events to disk, so no data is lost when the program exits or the machine goes down; it offers high fault tolerance.
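A sketch of a File Channel configuration, assuming placeholder checkpoint and data directories under /opt/flume:

    # File Channel: persist events to disk for durability
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/flume/checkpoint
    a1.channels.c1.dataDirs = /opt/flume/data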

③Sink

The Sink component sends data to the destination; destinations include HDFS, Logger, Avro, etc.
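A sketch of an HDFS sink, assuming a placeholder NameNode address namenode:8020; the rolling values are illustrative:

    # HDFS sink: write events into time-partitioned directories
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.rollInterval = 3600
    a1.sinks.k1.hdfs.rollSize = 134217728
    a1.sinks.k1.hdfs.rollCount = 0
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = c1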

3. Flume internal principle

①Channel Selector

The difference between the two Channel Selectors: the Replicating Channel Selector sends every event from the Source to all of its Channels, while the Multiplexing Channel Selector chooses which Channels to send each event to, based on an event header.
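A sketch of a Multiplexing Channel Selector that routes events by an event header; the header name state and its values are placeholders:

    # Multiplexing selector: route events by the value of the "state" header
    a1.sources.r1.channels = c1 c2
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.CZ = c1
    a1.sources.r1.selector.mapping.US = c2
    a1.sources.r1.selector.default = c1
    # Replicating is the default; selector.type = replicating sends every event to both c1 and c2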

②Channel

Flume comes with two Channels: Memory Channel and File Channel.
Memory Channel is based on an in-memory cache and is suitable when occasional data loss is acceptable.
File Channel is Flume's persistent Channel; no data is lost if the system goes down.

4. Flume's transaction mechanism

Flume uses two independent transactions to handle event delivery from Source to Channel (the put transaction) and from Channel to Sink (the take transaction).

Only when all of the events in a batch have been committed to the Channel does the Source consider the read complete. The Channel -> Sink stage works the same way: if an event cannot be delivered for some reason, the transaction is rolled back and all events are returned to the Channel to wait for re-delivery.
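Because each batch is committed in a single transaction, the batch size used by a source or sink must not exceed the channel's transactionCapacity. A sketch with illustrative values:

    # Each put/take transaction holds at most 100 events
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # The HDFS sink takes events in batches of 100 (must be <= transactionCapacity)
    a1.sinks.k1.hdfs.batchSize = 100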

5. Notes

1. In a Flume topology, start the Flume agent that receives the data first.

2. The output local directory must already exist and will not be created automatically.

3. Install the netcat tool with yum; use it to send test data with: nc hostname port (see the example after this list).

4. To check whether Flume loses data, use the third-party monitoring framework Ganglia.
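A sketch of starting an agent and sending test data to the netcat source shown earlier; the configuration path and agent name are placeholders:

    # Start the agent named a1 with the configuration file conf/example.conf
    bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console

    # In another terminal, send a test line to the netcat source
    nc localhost 44444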
