Integrating Kafka with Flume


A very common architecture for streaming data processing in big data is to collect data with Flume, queue it in Kafka, and process it with Spark Streaming. This post records the process of integrating Kafka with Flume for future reference.

This assumes Kafka and Flume are already installed; installation is not covered here, and guides are easy to find online.

The key step is creating a Kafka configuration file for Flume, kafka-conf.properties:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent.sources = r1
agent.channels = c1
agent.sinks = s1

# Source configuration
# The source type determines what to monitor, e.g. a network port or a directory
agent.sources.r1.type = spooldir
# Directory to watch for new files
agent.sources.r1.spoolDir = /opt/shortcut/flume/logs/data
agent.sources.r1.fileHeader = true

# Sink configuration
# Write the events to Kafka
agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka topic to write to
agent.sinks.s1.topic = test
# Kafka broker list
agent.sinks.s1.brokerList = cm02.spark.com:9092
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 2

# Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100

# Bind the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.s1.channel = c1
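Before starting Flume, the monitored directory and the Kafka topic should already exist. A minimal preparation sketch, run on a broker host, assuming the same ZooKeeper quorum as in the consumer command below (the partition and replication counts are only examples):

mkdir -p /opt/shortcut/flume/logs/data
bin/kafka-topics.sh --create --zookeeper cm01.spark.com:2181,cm02.spark.com:2181,cm03.spark.com:2181 --topic test --partitions 1 --replication-factor 1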

Start Flume (the -n flag must match the agent name "agent" used in the properties file):

bin/flume-ng agent --conf conf -f ./conf/kafka-conf.properties -n agent -Dflume.root.logger=INFO,console

[Screenshots: Kafka整合Flume_01.png, Kafka整合Flume_02.png]
Start a Kafka console consumer:

bin/kafka-console-consumer.sh --zookeeper cm01.spark.com:2181,cm02.spark.com:2181,cm03.spark.com:2181 --topic test --from-beginning
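Note: this command targets older Kafka releases where the console consumer connects through ZooKeeper. On newer releases the ZooKeeper-based option has been removed, and an equivalent call connects to the broker directly, assuming the same broker as in the sink configuration above:

bin/kafka-console-consumer.sh --bootstrap-server cm02.spark.com:9092 --topic test --from-beginning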

Upload a text file to the directory Flume is monitoring:
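For example, a minimal test (the file name and contents are arbitrary; the target path is the spoolDir configured above):

echo "hello kafka" > /tmp/flume-test.log
cp /tmp/flume-test.log /opt/shortcut/flume/logs/data/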

Check the output in the Kafka consumer; the contents of the uploaded file should be printed:
[Screenshots: Kafka整合Flume_03.png, Kafka整合Flume_04.png]
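Once the spooling-directory source has processed the file, Flume renames it with a .COMPLETED suffix (the default fileSuffix), which is an easy way to confirm the pipeline picked it up:

ls /opt/shortcut/flume/logs/data/
# expected: flume-test.log.COMPLETED  (name taken from the example above)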
