It feels slightly more powerful than Flume. I looked into it recently and noted a few caveats here.
That said, the best tutorial is the official documentation: https://www.elastic.co/guide/index.html
About installation
Logstash is written in JRuby. Installing it is just unpacking the package (provided a JDK is already installed).
Logstash basically consists of three parts: input, output, and the filters the user adds, so the standard configuration file format is as follows:
input {...}
filter {...}
output {...}
input is similar to Flume's source, output to Flume's sink, and filter to Flume's interceptor.
In each section you can also specify multiple plugins. For example, to read from two log source files you can write:
input {
    file { path => "/var/log/messages" type => "syslog" }
    file { path => "/var/log/apache/access.log" type => "apache" }
}
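The type field set on each input can later be used to route events with a conditional. A minimal sketch (the topic name and broker address here are hypothetical, not from the setup above):

output {
    if [type] == "apache" {
        # send Apache access logs to their own topic (names are assumptions)
        kafka { topic_id => "apache_logs" bootstrap_servers => "localhost:9092" }
    } else {
        # everything else just goes to the console for inspection
        stdout { codec => rubydebug }
    }
}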
Similarly, if multiple processing rules are added to the filter, they are applied one by one in the order listed, but some plugins are not thread-safe.
For example, if two identical plugins are specified in the filter, their execution order cannot be guaranteed, so the official recommendation is to avoid reusing the same plugin within the filter.
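As an illustration, a filter section with two different rules applied in order (the grok pattern below is a hypothetical sketch for Apache-style logs; adjust it to your own log format):

filter {
    # first rule: parse the raw line into fields (pattern is an assumption)
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
    # second rule: tag the event; rules run one after the other in this order
    mutate { add_tag => [ "parsed" ] }
}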
The following is a very simple example of reading from a local folder and writing to kafka:
logstash.conf
input {
    file {
        path => "/var/service_logs/*.log"
        discover_interval => 5
        start_position => "beginning"
    }
}
output {
    kafka {
        topic_id => "servicelogs"
        codec => plain { format => "%{message}" }
        bootstrap_servers => "192.168.75.71:9092,192.168.75.72:9092,192.168.75.73:9092"
    }
}
bootstrap_servers gives the addresses of the Kafka brokers. (Note that these are broker addresses, not ZooKeeper addresses.)
To start Logstash, just pass the conf file you wrote to the startup script. If no configuration file is specified, the default is to read from standard input and write to standard output.
bin/logstash -f logstash.conf
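To quickly verify the installation without writing a conf file, you can also pass a pipeline directly on the command line with -e (reads from stdin, echoes events to stdout; type a line and it is printed back as an event):

bin/logstash -e 'input { stdin {} } output { stdout {} }'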
simple test
1. As preparation, start the ZooKeeper and Kafka clusters:
zookeeper/bin/zkServer.sh start
kafka/bin/kafka-server-start.sh /app/kafka/config/server.properties
2. Create topic: servicelogs
bin/kafka-topics.sh --create --zookeeper amie01:2181 --replication-factor 3 --partitions 3 --topic servicelogs
View topic details:
kafka-topics.sh --describe --zookeeper amie01:2181 --topic servicelogs
Topic:servicelogs PartitionCount:3 ReplicationFactor:3 Configs:
Topic: servicelogs Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: servicelogs Partition: 1 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
Topic: servicelogs Partition: 2 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
3. Start logstash:
bin/logstash -f logstash.conf
You can write a simple script to simulate the data:
while true; do echo `date` >> /var/service_logs/a.log; sleep 0.05; done
Then, to check whether Logstash has actually written to Kafka's servicelogs topic, you can test with Kafka's console consumer.
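For example, with the console consumer that ships with Kafka (older versions take --zookeeper as below; newer versions take --bootstrap-server with a broker address instead):

bin/kafka-console-consumer.sh --zookeeper amie01:2181 --topic servicelogs --from-beginning

If the pipeline works, the date lines appended to a.log should scroll by in the consumer.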
The test passes and everything works~~