This is my writeup of collecting nginx logs with ELK and building statistics on per-endpoint request volume and slow-responding endpoints.
nginx log format:
The nginx access log here is written as JSON; the log_format fragments are:
'{"@timestamp":"$time_iso8601",'
'"body_size":$body_bytes_sent,'
'"token":"$http_token",'
'"cookie_token":"$cookie_token",'
'"parameters":"$query_string",'
'"request_time":$request_time,'
'"request_length":$request_length,'
'"server":"$upstream_addr",'
'"method":"$request_method",'
'"url":"$uri",'
'"upstream_header_time":"$upstream_header_time",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_connect_time":"$upstream_connect_time",'
'"network":"$http_network",'
'"status":"$status"}'
A few of the more unusual timing fields deserve an explanation.
request_time
Measured by nginx from receiving the first byte of the client request to sending the last byte of the response, so it can be treated roughly as the total time the request took for the user (strictly speaking, the time to establish the HTTP connection should be added on top).
upstream_connect_time
The time nginx spent establishing the connection to the upstream (the service nginx proxies to).
upstream_header_time
From establishing the upstream connection to receiving the first byte of the upstream response; roughly, upstream_header_time = upstream_connect_time + service processing time.
upstream_response_time
From establishing the upstream connection to receiving the last byte of the upstream response; roughly, upstream_response_time = upstream_connect_time + service processing time + time for the upstream to transfer the result back to nginx.
Note that each of these three upstream times can hold several comma-separated values, because nginx may retry the request on another upstream and each attempt contributes its own value. For example, "123,23" means the first upstream attempt failed and nginx retried against the second.
logstash configuration
input {
  file {
    path  => "/var/log/nginx/access.log"
    codec => "json"
    # "nginxip" is a placeholder: tag every event with the IP of the nginx node it was collected from
    add_field => {"nginx" => "nginxip"}
  }
}
filter {
  ruby {
    # convert the nginx times from seconds to milliseconds; for the upstream times,
    # keep only the first attempt (the values are comma-separated when nginx retries)
    code => "event['request_time'] = event['request_time'].to_f * 1000;
             event['upstream_header_time'] = event['upstream_header_time'].split(',').first.to_f * 1000;
             event['upstream_response_time'] = event['upstream_response_time'].split(',').first.to_f * 1000;
             event['upstream_connect_time'] = event['upstream_connect_time'].split(',').first.to_f * 1000;
            "
  }
  # when the token header is empty, fall back to the token carried in the cookie
  if [token] == "" or [token] == "-" {
    mutate {
      replace => {
        "token" => "%{cookie_token}"
      }
      remove_field => ["cookie_token"]
    }
  } else {
    mutate {
      remove_field => ["cookie_token"]
    }
  }
}
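One caveat: the event['field'] style used in the ruby filter only works on Logstash 2.x. From Logstash 5 onward the ruby filter has to go through the event API, so the same conversion would be a sketch along these lines:

  ruby {
    # same conversion as above, expressed with event.get/event.set for Logstash 5+
    code => "event.set('request_time', event.get('request_time').to_f * 1000)
             event.set('upstream_header_time', event.get('upstream_header_time').split(',').first.to_f * 1000)
             event.set('upstream_response_time', event.get('upstream_response_time').split(',').first.to_f * 1000)
             event.set('upstream_connect_time', event.get('upstream_connect_time').split(',').first.to_f * 1000)
            "
  }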
output {
  elasticsearch {
    hosts => ["<es hosts, comma-separated>"]
    index => "logstash-nginx-%{+YYYY.MM}"
  }
}
Elasticsearch configuration
nginx log index template:
{
  "template": "logstash-nginx-*",
  "order": 1,
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "logstash-nginx": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "date": {
          "type": "date",
          "index": "not_analyzed",
          "doc_values": true,
          "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
        },
        "body_size": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "request_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "server": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "method": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "url": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "status": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "token": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "nginx": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "parameters": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "request_length": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_header_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_response_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "upstream_connect_time": {
          "type": "integer",
          "index": "not_analyzed",
          "doc_values": true
        },
        "network": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}
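To load the template into the cluster (the host and file name below are placeholders; the string/not_analyzed mappings above target Elasticsearch 2.x), it can be registered through the _template API, for example:

curl -XPUT 'http://<es-host>:9200/_template/logstash-nginx' -d @logstash-nginx-template.json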
grafana
I did not go with Kibana because I am more familiar with Grafana, and Grafana supports many data sources, which makes it easy to switch the backing data source later on; in other words, it is more extensible.
Grafana queries against Elasticsearch are written directly in Lucene query syntax.
For example, to find slow endpoints (600 here is in milliseconds, since the logstash filter converted the times): upstream_response_time:[600 TO 1000000000] AND status:200
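For the per-endpoint request-volume statistic, the Grafana panel is essentially a Terms group-by on the url field; the equivalent raw Elasticsearch query, as a sketch (the query string, result size, and bucket count are chosen just for illustration), would be:

{
  "size": 0,
  "query": {
    "query_string": { "query": "status:200" }
  },
  "aggs": {
    "requests_per_url": {
      "terms": { "field": "url", "size": 20 }
    }
  }
}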