This is my writeup of collecting nginx logs with ELK and building statistics on per-endpoint request volume and slow-responding endpoints.
nginx log format:
The nginx access log here is written as JSON; the log_format fragments are:
'{"@timestamp":"$time_iso8601",'
'"body_size":$body_bytes_sent,'
'"token":"$http_token",'
'"cookie_token":"$cookie_token",'
'"parameters":"$query_string",'
'"request_time":$request_time,'
'"request_length":$request_length,'
'"server":"$upstream_addr",'
'"method":"$request_method",'
'"url":"$uri",'
'"upstream_header_time":"$upstream_header_time",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_connect_time":"$upstream_connect_time",'
'"network":"$http_network",'
'"status":"$status"}'
A few of the more unusual timing fields deserve an explanation.
request_time
Measured by nginx from receiving the first byte of the client request to sending the last byte of the response, so it can be treated roughly as the total time the request took for the user (strictly speaking, the time to establish the HTTP connection should be added on top).
upstream_connect_time
The time nginx spent establishing the connection to the upstream (the service nginx proxies to).
upstream_header_time
From establishing the upstream connection to receiving the first byte of the upstream response; roughly, upstream_header_time = upstream_connect_time + service processing time.
upstream_response_time
From establishing the upstream connection to receiving the last byte of the upstream response; roughly, upstream_response_time = upstream_connect_time + service processing time + time for the upstream to transfer the result back to nginx.
Note that each of these three upstream times can hold several comma-separated values, because nginx may retry the request on another upstream and each attempt contributes its own value. For example, "123,23" means the first upstream attempt failed and nginx retried against the second.
logstash configuration
input {
  file {
    path  => "/var/log/nginx/access.log"
    codec => "json"
    # "nginxip" is a placeholder: tag every event with the IP of the nginx node it was collected from
    add_field => {"nginx" => "nginxip"}
  }
}
filter {
  ruby {
    # convert the nginx times from seconds to milliseconds; for the upstream times,
    # keep only the first attempt (the values are comma-separated when nginx retries)
    code => "event['request_time'] = event['request_time'].to_f * 1000;
             event['upstream_header_time'] = event['upstream_header_time'].split(',').first.to_f * 1000;
             event['upstream_response_time'] = event['upstream_response_time'].split(',').first.to_f * 1000;
             event['upstream_connect_time'] = event['upstream_connect_time'].split(',').first.to_f * 1000;
            "
  }
  # when the token header is empty, fall back to the token carried in the cookie
  if [token] == "" or [token] == "-" {
    mutate {
      replace => {
        "token" => "%{cookie_token}"
      }
      remove_field => ["cookie_token"]
    }
  } else {
    mutate {
      remove_field => ["cookie_token"]
    }
  }
}
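One caveat: the event['field'] style used in the ruby filter only works on Logstash 2.x. From Logstash 5 onward the ruby filter has to go through the event API, so the same conversion would be a sketch along these lines:

  ruby {
    # same conversion as above, expressed with event.get/event.set for Logstash 5+
    code => "event.set('request_time', event.get('request_time').to_f * 1000)
             event.set('upstream_header_time', event.get('upstream_header_time').split(',').first.to_f * 1000)
             event.set('upstream_response_time', event.get('upstream_response_time').split(',').first.to_f * 1000)
             event.set('upstream_connect_time', event.get('upstream_connect_time').split(',').first.to_f * 1000)
            "
  }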
output {
  elasticsearch {
    hosts => ["<es hosts, comma-separated>"]
    index => "logstash-nginx-%{+YYYY.MM}"
  }
}
Elasticsearch configuration
nginx log index template:
{
  "template": "logstash-nginx-*",
  "order": 1,
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "logstash-nginx": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "date": {
          "type": "date",
          "index": "not_analyzed",
          "doc_values": true,
          "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "body_size": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "request_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "server": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "method": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "url": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "status": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "token": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "nginx": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "parameters": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "request_length": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_header_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_response_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_connect_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "network": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
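To load the template into the cluster (the host and file name below are placeholders; the string/not_analyzed mappings above target Elasticsearch 2.x), it can be registered through the _template API, for example:

curl -XPUT 'http://<es-host>:9200/_template/logstash-nginx' -d @logstash-nginx-template.json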
grafana
I did not go with Kibana because I am more familiar with Grafana, and Grafana supports many data sources, which makes it easy to switch the backing data source later on; in other words, it is more extensible.
Grafana queries against Elasticsearch are written directly in Lucene query syntax.
For example, to find slow endpoints (600 here is in milliseconds, since the logstash filter converted the times): upstream_response_time:[600 TO 1000000000] AND status:200
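For the per-endpoint request-volume statistic, the Grafana panel is essentially a Terms group-by on the url field; the equivalent raw Elasticsearch query, as a sketch (the query string, result size, and bucket count are chosen just for illustration), would be:

{
  "size": 0,
  "query": {
    "query_string": { "query": "status:200" }
  },
  "aggs": {
    "requests_per_url": {
      "terms": { "field": "url", "size": 20 }
    }
  }
}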