Flume listens for new files in a directory and sinks them to HDFS

1. Collecting a directory into HDFS
Collection requirement: under a certain directory on a server, new files are generated continuously, and these files must be collected into HDFS.
According to the requirement, first define the three major elements:
Collection source, i.e. the source: a spooldir source that monitors the file directory
Sinking target, i.e. the sink: an hdfs sink writing to the HDFS file system
Transmission channel between source and sink, i.e. the channel: either a file channel or a memory channel can be used (this example uses a memory channel; a file channel variant is sketched after the parameter explanation below)

Configuration file spooldir-hdfs.conf:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
## Note: files with the same name must not be dropped into the monitored directory twice
## Monitor the directory for new files through spooldir
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/tuzq/software/flumedata
a1.sources.r1.fileHeader = true

# Describe the sink
## Sink to HDFS; each sink type takes its own set of parameters
a1.sinks.k1.type = hdfs
## A sink can connect to only one channel, while a source can be configured with several
a1.sinks.k1.channel = c1
## Where to write on HDFS. The path is not hard-coded: the escape sequences make the output directory change with the event time
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
## Prefix of the generated files
a1.sinks.k1.hdfs.filePrefix = events-
## Whether to round down the timestamp used in the path; true means the output directory rolls over
a1.sinks.k1.hdfs.round = true
## Roll the directory every 1 unit of roundUnit
a1.sinks.k1.hdfs.roundValue = 1
## The unit for roundValue is minutes
a1.sinks.k1.hdfs.roundUnit = minute
## Roll to a new file once 3 seconds have passed
a1.sinks.k1.hdfs.rollInterval = 3
## Roll to a new file once the current file exceeds 20 bytes
a1.sinks.k1.hdfs.rollSize = 20
## Roll to a new file after 5 events have been written
a1.sinks.k1.hdfs.rollCount = 5
## Number of events to batch before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 1
## Use the local timestamp instead of a timestamp header on the event
a1.sinks.k1.hdfs.useLocalTimeStamp = true
## Type of the generated files; the default is SequenceFile, while DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
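
With the configuration in place, the agent can be launched and exercised end to end. A minimal sketch, assuming the file above is saved as spooldir-hdfs.conf in Flume's conf directory and the commands run from the Flume installation root (the test file name is hypothetical):

# Start agent a1; the --name value must match the agent name used in the configuration file
bin/flume-ng agent --conf conf --conf-file conf/spooldir-hdfs.conf --name a1 -Dflume.root.logger=INFO,console

# In another terminal, drop a test file into the monitored directory
# (the spooldir source renames each collected file to *.COMPLETED)
cp /tmp/test.log /home/tuzq/software/flumedata/

# Verify the result; the subdirectory names come from the %y-%m-%d/%H%M escapes in hdfs.path
hdfs dfs -ls -R /flume/events/

The -Dflume.root.logger=INFO,console option is optional; it only prints the agent's log to the console, which makes it easy to watch files being rolled.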
Channel parameter explanation:
capacity: the maximum number of events the channel can hold
transactionCapacity: the maximum number of events the channel takes from a source or gives to a sink in a single transaction
keep-alive: the timeout, in seconds, for adding an event to or removing an event from the channel
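
As mentioned above, a file channel can replace the memory channel when durability matters more than throughput: a memory channel loses buffered events if the agent dies, while a file channel persists them to disk. A minimal sketch of the substitution (the checkpoint and data directories are hypothetical paths):

## Durable alternative to the memory channel
a1.channels.c1.type = file
## Directory where the channel keeps its checkpoint
a1.channels.c1.checkpointDir = /home/tuzq/software/flume_channel/checkpoint
## Comma-separated list of directories where event data is stored
a1.channels.c1.dataDirs = /home/tuzq/software/flume_channel/data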
Read the full article: http://click.aliyun.com/m/23237/
