Collecting and filtering nginx logs with Logstash


      In production, nginx usually runs with a custom log format. To make searching and statistics in Kibana easier, Logstash has to structure the messages before storing them, which means parsing each message into fields.

  This article uses the grok filter with match regular expressions, customized to fit our own log_format.

1. nginx log format

  The log_format configuration is as follows:

log_format  main  '$remote_addr - $remote_user [$time_local] $http_host $request_method "$uri" "$query_string" '
                  '$status $body_bytes_sent "$http_referer" $upstream_status $upstream_addr $request_time $upstream_response_time '
                  '"$http_user_agent" "$http_x_forwarded_for"' ;

The corresponding log is as follows:

192.172.2.1 - - [06/Jun/2016:00:00:01 +0800] test.changh.com GET "/api/index" "?cms=0&rnd=1692442321" 200 4 "http://www.test.com/?cp=sfwefsc" 200 192.168.0.122:80 0.004 0.004 "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "-"

 

2. Write regular expressions

      Logstash ships with a set of predefined patterns for us to use. You can browse them in the Grok Debugger, or find them inside the Logstash installation under the vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.1.1/patterns/ directory.

      The base definitions live in the grok-patterns file, and we can use those patterns directly. Of course, not all of them fit the nginx fields; where they don't, we define custom patterns and load them by pointing patterns_dir at their directory.

While writing the expressions, the Grok Debugger or Grok Constructor tools help you debug much faster. If you don't know which built-in pattern fits, the Grok Debugger's Discover feature can suggest matches automatically. (Both are external sites; depending on your network you may need a proxy to reach them.)
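For example, to see how a built-in pattern is defined, you can grep the bundled pattern files directly (the gem version in the path will differ between Logstash releases):

# find where COMMONAPACHELOG is defined in the bundled patterns
cd /usr/local/logstash
grep -rn "COMMONAPACHELOG" vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.1.1/patterns/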

  1) nginx standard log format

    Logstash's bundled grok patterns already include Apache's standard log formats:

COMMONAPACHELOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

The nginx standard log format differs from COMBINEDAPACHELOG only by the extra $http_x_forwarded_for variable at the end, so the grok pattern for the nginx standard log can be defined as:

MAINNGINXLOG %{COMBINEDAPACHELOG} %{QS:x_forwarded_for}
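If your nginx uses this standard format, a minimal grok filter referencing it could look like the sketch below, assuming MAINNGINXLOG has been saved to a file under patterns_dir as described in section 3:

filter {
  grok {
    # directory containing the file that defines MAINNGINXLOG
    patterns_dir => "/usr/local/logstash/patterns"
    match => { "message" => "%{MAINNGINXLOG}" }
  }
}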

2) Custom format

    The grok expression matching our custom log_format is as follows:

%{IPV4:remote_addr} - (%{USERNAME:user}|-) \[%{HTTPDATE:log_timestamp}\] (%{HOSTNAME1:http_host}|-) (%{WORD:request_method}|-) \"(%{URIPATH1:uri}|-|)\" \"(%{URIPARM1:param}|-)\" %{STATUS:http_status} (?:%{BASE10NUM:body_bytes_sent}|-) \"(?:%{GREEDYDATA:http_referrer}|-)\" (%{STATUS:upstream_status}|-) (?:%{HOSTPORT1:upstream_addr}|-) (%{BASE16FLOAT:request_time}|-) (%{STATUS:upstream_response_time}|-) \"(%{GREEDYDATA:user_agent}|-)\" \"(%{FORWORD:x_forword_for}|-)\"

Here are the custom patterns I defined:

URIPARM1 [A-Za-z0-9$.+!*'|(){},~@#%&/=:;^\\_<>`?\-\[\]]*

URIPATH1 (?:/[\\A-Za-z0-9$.+!*'(){},~:;=@#% \[\]_<>^\-&?]*)+

HOSTNAME1 \b(?:[0-9A-Za-z_\-][0-9A-Za-z-_\-]{0,62})(?:\.(?:[0-9A-Za-z_\-][0-9A-Za-z-:\-_]{0,62}))*(\.?|\b)

STATUS ([0-9.]{0,3}[, ]{0,2})+

HOSTPORT1 (%{IPV4}:%{POSINT}[, ]{0,2})+

FORWORD (?:%{IPV4}[,]?[ ]?)+|%{WORD}

      The message field in Logstash holds each log line as it is read in. IPORHOST, USERNAME, HTTPDATE, etc. are pattern names defined in patterns/grok-patterns, chosen to line up with the fields of the log.

  The grok pattern syntax is %{SYNTAX:semantic}: before the ":" comes a pattern defined in grok-patterns, and after it a field name you choose yourself. The form (?:%{SYNTAX:semantic}|-) is an alternation that also accepts a literal "-" placeholder.

  Literal double quotes "" and square brackets [] in the log must be escaped with \.
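For example, this fragment from the expression above matches the bracketed timestamp [06/Jun/2016:00:00:01 +0800] in the sample log:

\[%{HTTPDATE:log_timestamp}\]

HTTPDATE is the built-in pattern, log_timestamp is our chosen field name, and the square brackets are escaped because they appear literally in the log.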

  Detailed explanation of custom regex:

 URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*

 URIPARM1 [A-Za-z0-9$.+!*'|(){},~@#%&/=:;^\\_<>`?\-\[\]]*

URIPARAM is the expression from grok-patterns; it starts with a "?", but nginx's $query_string already has the leading "?" stripped, so URIPARM1 drops it. The special symbols ^ \ _ < > ` that appear in our logs are added to the character class.

 URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+

 URIPATH1 (?:/[\\A-Za-z0-9$.+!*'(){},~:;=@#% \[\]_<>^\-&?]*)+

URIPATH from grok-patterns cannot match URIs that contain spaces, so URIPATH1 adds a space to the character class, along with the special symbols \ [ ] < > ^.

 HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)

 HOSTNAME1 \b(?:[0-9A-Za-z_\-][0-9A-Za-z-_\-]{0,62})(?:\.(?:[0-9A-Za-z_\-][0-9A-Za-z-:\-_]{0,62}))*(\.?|\b)

HOSTNAME1 extends the built-in HOSTNAME so that http_host values containing extra characters such as "_" and ":" also match.

 HOSTPORT %{IPORHOST}:%{POSINT}

 HOSTPORT1 (%{IPV4}:%{POSINT}[, ]{0,2})+

While matching the upstream_addr field we found it can contain several addresses (for example when a request is retried on another upstream), so HOSTPORT1 matches a comma-separated list of IP:port pairs.

 STATUS ([0-9.]{0,3}[, ]{0,2})+

Likewise, when multiple upstream_addr entries appear there are multiple status values, so STATUS matches a list of them.

 FORWORD (?:%{IPV4}[,]?[ ]?)+|%{WORD}

FORWORD matches the x_forword_for field even when it lists multiple IP addresses (or holds a single bare word such as "unknown").
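As an illustration, a request retried across two upstreams might produce upstream fields like this made-up fragment, which is exactly what the list forms of STATUS and HOSTPORT1 are built to absorb:

502, 200 192.168.0.121:80, 192.168.0.122:80 0.003, 0.004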

  With all the nginx fields defined, you can test the expression in the Grok Debugger or Grok Constructor. To use your own patterns in the Grok Debugger, tick "Add custom patterns" and paste them in.

  Matching the sample log above yields the following result:

{
  "remote_addr": [ "1.1.1.1" ],
  "user": [ "-" ],
  "log_timestamp": [ "06/Jun/2016:00:00:01 +0800" ],
  "http_host": [ "www.test.com" ],
  "request_method": [ "GET" ],
  "uri": [ "/api/index" ],
  "param": [ "?cms=0&rnd=1692442321" ],
  "http_status": [ "200" ],
  "body_bytes_sent": [ "4" ],
  "http_referrer": [ "http://www.test.com/?cp=sfwefsc" ],
  "port": [ null ],
  "upstream_status": [ "200" ],
  "upstream_addr": [ "192.168.0.122:80" ],
  "upstream_response_time": [ "0.004" ],
  "request_time": [ "0.004" ],
  "user_agent": [ "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" ],
  "client_ip": [ "2.2.2.2" ],
  "x_forword_for": [ null ]
}
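If you prefer testing locally rather than in the web debugger, a minimal throwaway pipeline can print the parsed fields to the console (a sketch; NGINXACCESS is defined in section 3 below, and patterns_dir must point at the directory holding the custom patterns):

input { stdin {} }

filter {
  grok {
    patterns_dir => "/usr/local/logstash/patterns"
    match => { "message" => "%{NGINXACCESS}" }
  }
}

output { stdout { codec => rubydebug } }

Start it with bin/logstash -f test.conf and paste a log line on stdin to see the structured result.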

 

However, in our environment nginx's log_format is defined as follows:

log_format  access  '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" $http_x_forwarded_for '
                    '$upstream_addr $upstream_response_time $request_time ';

 

So when debugging in the Grok Debugger I have to write the match expression like this, and add the custom patterns at the same time:

      %{IPV4:remote_addr} - (%{USERNAME:user}|-) \[%{HTTPDATE:log_timestamp}\] \"%{WORD:request_method} %{URIPATH1:uri}\" %{BASE10NUM:http_status} (?:%{BASE10NUM:body_bytes_sent}|-) \"(?:%{GREEDYDATA:http_referrer}|-)\" \"(%{GREEDYDATA:user_agent}|-)\"

 

3. Logstash configuration file

  Create a directory for the custom patterns:

# mkdir -p /usr/local/logstash/patterns
# vi /usr/local/logstash/patterns/nginx

Then write the custom patterns defined above into that file:

 

URIPARM1 [A-Za-z0-9$.+!*'|(){},~@#%&/=:;^\\_<>`?\-\[\]]*

URIPATH1 (?:/[\\A-Za-z0-9$.+!*'(){},~:;=@#% \[\]_<>^\-&?]*)+

HOSTNAME1 \b(?:[0-9A-Za-z_\-][0-9A-Za-z-_\-]{0,62})(?:\.(?:[0-9A-Za-z_\-][0-9A-Za-z-:\-_]{0,62}))*(\.?|\b)

STATUS ([0-9.]{0,3}[, ]{0,2})+

HOSTPORT1 (%{IPV4}:%{POSINT}[, ]{0,2})+

FORWORD (?:%{IPV4}[,]?[ ]?)+|%{WORD}

URIPARM [A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*

URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\- ]*)+

URI1 (%{URIPROTO}://)?(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

NGINXACCESS %{IPORHOST:remote_addr} - (%{USERNAME:user}|-) \[%{HTTPDATE:log_timestamp}\] \"%{WORD:request_method} %{URIPATH1:uri}\" %{BASE10NUM:http_status} (?:%{BASE10NUM:body_bytes_sent}|-) \"(?:%{GREEDYDATA:http_referrer}|-)\" \"(%{GREEDYDATA:user_agent}|-)\" (%{FORWORD:x_forword_for}|-) (?:%{HOSTPORT1:upstream_addr}|-) (%{BASE16FLOAT:upstream_response_time}|-) (%{STATUS:request_time}|-)

 

Contents of the logstash.conf configuration file:

input {
  beats {
    port => 5044
    type => "nginx-log"
  }
}

filter {
  if [type] == "nginx-log" {
    grok {
      patterns_dir => "/usr/local/logstash/patterns"
      match => { "message" => "%{NGINXACCESS}" }
    }
    date {
      # the grok pattern above stores the request time in log_timestamp
      match => [ "log_timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    geoip {
      # the client address is captured as remote_addr
      source => "remote_addr"
    }
  }
}

output {
  elasticsearch {
    hosts => ["10.129.11.87:9200","10.129.11.88:9200"]
    index => "logstash-custom-nginx%{+YYYY.MM.dd}"
    document_type => "%{type}"
    flush_size => 20000
    idle_flush_time => 10
    sniffing => true
    template_overwrite => true
  }
}

 

 4. Start Logstash, then check whether the logs are being written into Elasticsearch.
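For example (paths assume the /usr/local/logstash install used above; the config-test flag is --config.test_and_exit on Logstash 5.x and later, -t on older releases):

# check the configuration syntax first
/usr/local/logstash/bin/logstash -f logstash.conf --config.test_and_exit

# then run the pipeline
/usr/local/logstash/bin/logstash -f logstash.conf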

 

==========================

If you read the ES logs with Grafana to get better-looking monitoring dashboards, you can configure nginx like this instead:

log_format main   '{"@timestamp":"$time_iso8601",'
                        '"@source":"$server_addr",'
                        '"hostname":"$hostname",'
                        '"ip":"$http_x_forwarded_for",'
                        '"client":"$remote_addr",'
                        '"request_method":"$request_method",'
                        '"scheme":"$scheme",'
                        '"domain":"$server_name",'
                        '"referer":"$http_referer",'
                        '"request":"$request_uri",'
                        '"args":"$args",'
                        '"size":$body_bytes_sent,'
                        '"status": $status,'
                        '"responsetime":$request_time,'
                        '"upstreamtime":"$upstream_response_time",'
                        '"upstreamaddr":"$upstream_addr",'
                        '"http_user_agent":"$http_user_agent",'
                        '"https":"$https"'
                        '}';

Fields like these map directly onto the corresponding Grafana dashboard template.
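One caveat: values such as $http_user_agent can themselves contain double quotes, which would break the generated JSON. On nginx 1.11.8 and later you can ask nginx to escape them; an abbreviated sketch with just a few of the fields above:

# nginx >= 1.11.8: escape=json escapes quotes and control characters in values
log_format main escape=json '{"@timestamp":"$time_iso8601",'
                            '"client":"$remote_addr",'
                            '"request":"$request_uri",'
                            '"http_user_agent":"$http_user_agent"'
                            '}';

The Logstash pipeline that reads these JSON logs then looks like this: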

input {
    file {
        # use a wildcard to match the access logs of all domains, named per your convention
        path => [ "/usr/local/nginx/logs/*_access.log" ]
        ignore_older => 0
        codec => json
    }
}

filter {
    mutate {
      convert => [ "status","integer" ]
      convert => [ "size","integer" ]
      convert => [ "upstreamtime","float" ]
      remove_field => "message"
    }
    geoip {
        source => "ip"
    }


}
output {
    elasticsearch {
        hosts => "127.0.0.1:9200"
        index => "logstash-nginx-access-%{+YYYY.MM.dd}"
    }
#    stdout {codec => rubydebug}
}

https://grafana.com/dashboards/2292 (Grafana's nginx-access dashboard template)

 

Since I collect the logs with Filebeat, my configuration looks like this instead:

input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][doc_type] == "nginx_access_log" {
    mutate {
      convert => [ "status","integer" ]
      convert => [ "size","integer" ]
      convert => [ "upstreamtime","float" ]
    }
    geoip {
      source => "ip"
    }
  }
}

output {
  if [fields][doc_type] == "nginx_access_log" {
    elasticsearch {
      hosts => ["10.129.11.87:9200","10.129.11.88:9200"]
      index => "logstash-nginx-access%{+YYYY.MM.dd}"
      document_type => "%{type}"
      flush_size => 20000
      idle_flush_time => 10
      sniffing => true
      template_overwrite => true
    }
    stdout { codec => rubydebug }
  }
  if [fields][doc_type] == "nginx_error_log" {
    elasticsearch {
      hosts => ["10.129.11.87:9200","10.129.11.88:9200"]
      index => "logstash-nginx-error%{+YYYY.MM.dd}"
      document_type => "%{type}"
      flush_size => 20000
      idle_flush_time => 10
      sniffing => true
      template_overwrite => true
    }
  }
}
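The [fields][doc_type] conditionals above rely on Filebeat tagging each log path. A matching filebeat.yml sketch (Filebeat 5.x-style prospectors; the paths and the Logstash host are assumptions to adapt to your environment):

filebeat.prospectors:
- input_type: log
  paths:
    - /usr/local/nginx/logs/*_access.log
  fields:
    doc_type: nginx_access_log    # shows up as [fields][doc_type] in Logstash
- input_type: log
  paths:
    - /usr/local/nginx/logs/*_error.log
  fields:
    doc_type: nginx_error_log

output.logstash:
  hosts: ["your-logstash-host:5044"]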

 

 
