Kafka | Flume: Sinking Logs to Kafka & HDFS

This post records how log data produced by server-side AC (access controller) devices is collected with Flume, then written by two Flume sinks to Kafka and HDFS in parallel. The Kafka copy lands in a dedicated topic, from which a downstream Spark Streaming job using the Direct approach later pulls the records off the message queue for processing.

Flume tails the log file and fans the events out to both Kafka and HDFS. The agent is named ac_online_user and is configured as follows:

ac_online_user.sources = ac_source
ac_online_user.channels = ac_channel_kafka ac_channel_hdfs
ac_online_user.sinks = ac_sink_kafka ac_sink_hdfs

ac_online_user.sources.ac_source.type = TAILDIR
ac_online_user.sources.ac_source.channels = ac_channel_kafka ac_channel_hdfs
ac_online_user.sources.ac_source.positionFile = /var/log/flume/position/accessaconlineuser.log
ac_online_user.sources.ac_source.recursiveDirectorySearch = true
ac_online_user.sources.ac_source.fileHeader = true
ac_online_user.sources.ac_source.fileHeaderKey = fileName
ac_online_user.sources.ac_source.filegroups = group_ac_online_user
ac_online_user.sources.ac_source.filegroups.group_ac_online_user = /var/log/accessaconlineuser.log
ac_online_user.sources.ac_source.deserializer.maxLineLength = 20480000
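The positionFile configured above is where TAILDIR persists its read offsets, so tailing resumes where it left off after an agent restart. It is a JSON array of records with inode, pos, and file fields; a minimal sketch of inspecting one (the sample record below is made up, not taken from a real agent):

```python
import json

# Made-up sample of a TAILDIR position file; the real one lives at
# /var/log/flume/position/accessaconlineuser.log on the agent host.
sample = '[{"inode": 2496275, "pos": 8192, "file": "/var/log/accessaconlineuser.log"}]'

for rec in json.loads(sample):
    # pos is the byte offset already consumed from that file
    print(f'{rec["file"]}: {rec["pos"]} bytes read (inode {rec["inode"]})')
```

Deleting this file forces the source to re-read the tracked files from the beginning, which is occasionally useful when replaying data.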

# File channel: the checkpoint/data-directory settings below are
# file-channel properties and would be ignored by a memory channel.
ac_online_user.channels.ac_channel_kafka.type = file
ac_online_user.channels.ac_channel_kafka.capacity = 30000
ac_online_user.channels.ac_channel_kafka.transactionCapacity = 10000
ac_online_user.channels.ac_channel_kafka.useDualCheckpoints = true
ac_online_user.channels.ac_channel_kafka.checkpointDir = /data4/flume/agent/kafka/ac_online_user/checkpoint
ac_online_user.channels.ac_channel_kafka.dataDirs = /data4/flume/agent/kafka/ac_online_user/datadir/
ac_online_user.channels.ac_channel_kafka.backupCheckpointDir = /data4/flume/agent/kafka/ac_online_user/backup/
ac_online_user.channels.ac_channel_kafka.checkpointInterval = 600000
ac_online_user.channels.ac_channel_kafka.keep-alive = 600

ac_online_user.sinks.ac_sink_kafka.type = org.apache.flume.sink.kafka.KafkaSink
ac_online_user.sinks.ac_sink_kafka.kafka.bootstrap.servers = 10.10.10.1:9092,10.10.10.2:9092,10.10.10.3:9092
ac_online_user.sinks.ac_sink_kafka.kafka.topic = ac_online_user
# Flume 1.7+ property names (kafka.batchSize / requiredAcks are older names
# and are not recognized by this sink)
ac_online_user.sinks.ac_sink_kafka.flumeBatchSize = 20
ac_online_user.sinks.ac_sink_kafka.kafka.producer.acks = 1

# File channel, same reasoning as the Kafka-side channel above.
ac_online_user.channels.ac_channel_hdfs.type = file
ac_online_user.channels.ac_channel_hdfs.capacity = 30000
ac_online_user.channels.ac_channel_hdfs.transactionCapacity = 10000
ac_online_user.channels.ac_channel_hdfs.useDualCheckpoints = true
ac_online_user.channels.ac_channel_hdfs.checkpointDir = /data4/flume/agent/hdfs/ac_online_user/checkpoint
ac_online_user.channels.ac_channel_hdfs.dataDirs = /data4/flume/agent/hdfs/ac_online_user/datadir/
ac_online_user.channels.ac_channel_hdfs.backupCheckpointDir = /data4/flume/agent/hdfs/ac_online_user/backup/
ac_online_user.channels.ac_channel_hdfs.checkpointInterval = 600000
ac_online_user.channels.ac_channel_hdfs.keep-alive = 600

ac_online_user.sinks.ac_sink_hdfs.type = hdfs
ac_online_user.sinks.ac_sink_hdfs.hdfs.path = hdfs://hadoop-master:9000/datalog/ac_online_user/%Y/%m/%d/%H/%M
ac_online_user.sinks.ac_sink_hdfs.hdfs.filePrefix = ac.online.10.254.32.203-
ac_online_user.sinks.ac_sink_hdfs.hdfs.fileType = DataStream
ac_online_user.sinks.ac_sink_hdfs.hdfs.useLocalTimeStamp = true
ac_online_user.sinks.ac_sink_hdfs.hdfs.callTimeout = 1000000
ac_online_user.sinks.ac_sink_hdfs.hdfs.batchSize = 1000
ac_online_user.sinks.ac_sink_hdfs.hdfs.closeTries = 0
# Round the timestamp down to the minute to match the %M path component
# (roundUnit defaults to second, which would make round a no-op here)
ac_online_user.sinks.ac_sink_hdfs.hdfs.round = true
ac_online_user.sinks.ac_sink_hdfs.hdfs.roundValue = 1
ac_online_user.sinks.ac_sink_hdfs.hdfs.roundUnit = minute
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollCount = 0
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollSize = 134217728
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollInterval = 10
ac_online_user.sinks.ac_sink_hdfs.hdfs.idleTimeout = 300
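The %Y/%m/%d/%H/%M escapes in hdfs.path are filled in from the event timestamp, which here is the agent's local clock (useLocalTimeStamp = true). The resulting directory layout for a given instant, and the roll thresholds in more familiar units, can be sanity-checked with a short sketch (the timestamp is just an example):

```python
from datetime import datetime

# Hypothetical event timestamp
ts = datetime(2020, 1, 3, 14, 7)

# Mirrors hdfs.path = hdfs://hadoop-master:9000/datalog/ac_online_user/%Y/%m/%d/%H/%M
path = "hdfs://hadoop-master:9000/datalog/ac_online_user/" + ts.strftime("%Y/%m/%d/%H/%M")
print(path)  # hdfs://hadoop-master:9000/datalog/ac_online_user/2020/01/03/14/07

# Roll thresholds from the sink config, in readable units:
roll_size_mb = 134217728 / (1024 * 1024)  # rollSize -> 128.0 MB
print(roll_size_mb)
# rollCount = 0 disables count-based rolling, so the file rolls on whichever
# of rollInterval (10 s) or rollSize (128 MB) triggers first; with a 10-second
# interval, time-based rolling will normally win and produce small files.
```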

ac_online_user.sources.ac_source.channels = ac_channel_kafka ac_channel_hdfs
ac_online_user.sinks.ac_sink_hdfs.channel = ac_channel_hdfs
ac_online_user.sinks.ac_sink_kafka.channel = ac_channel_kafka
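As noted at the top, the Kafka copy is meant to be consumed with Spark Streaming in Direct mode. A minimal sketch, assuming Spark 2.x with the spark-streaming-kafka-0-8 integration on the classpath (the application name and the 10-second batch interval are assumptions; the brokers and topic come from the sink config above):

```python
# Sketch only: run with spark-submit and the matching
# spark-streaming-kafka package for your Spark version.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="ac_online_user_direct")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches (assumed interval)

# Direct approach: no receiver; the stream queries offsets itself.
stream = KafkaUtils.createDirectStream(
    ssc,
    ["ac_online_user"],
    {"metadata.broker.list": "10.10.10.1:9092,10.10.10.2:9092,10.10.10.3:9092"},
)

# Each record is a (key, value) pair; the value is the raw log line.
stream.map(lambda kv: kv[1]).count().pprint()

ssc.start()
ssc.awaitTermination()
```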

Reposted from blog.csdn.net/Sampson_Hugo/article/details/103820662