Using ELK for Log Collection and Unified Processing in a Microservice Architecture

Abstract: The practice of each microservice component involves tooling. This article introduces some of the tools used in day-to-day microservice development; they help us build a more robust microservice system and troubleshoot problems and performance bottlenecks within it.


We will focus on ELK, the log collection solution most often used under a microservice architecture (ELK is short for Elasticsearch, Logstash, and Kibana). To be precise, we will use ELKB, that is, ELK + Filebeat, where Filebeat is a lightweight shipper for forwarding and centralizing log data.

Why you need a distributed logging system

In earlier projects, locating bugs or performance problems of a business service in production through its logs meant that operations staff had to run commands against the log files of each service instance one by one, which made troubleshooting very inefficient.

Under a microservice architecture, multiple instances of a service are deployed on different physical machines, so the logs of each microservice are scattered across those machines. Once the cluster is large enough, the traditional way of viewing logs described above becomes impractical. We therefore need to manage the logs of the distributed system centrally; open source components such as syslog can be used to collect and aggregate the logs from all servers.

However, after centralizing the log files, we still face the problem of analyzing and searching them: which services raised alarms or threw exceptions, and how often, all require detailed statistics. In the past, when an online failure occurred, developers and operations staff would download the service logs and then search and count them with Linux commands such as grep, awk, and wc. This approach is inefficient and labor-intensive, and it inevitably breaks down once the requirements grow to querying, sorting, and aggregating logs across a huge number of machines.

ELKB distributed logging system

ELKB is a complete distributed log collection system that solves the problems described above: logs that are hard to collect, search, and analyze. ELKB stands for Elasticsearch, Logstash, Kibana, and Filebeat. This set of components, all provided by Elastic, can be loosely compared to an MVC model: Logstash plays the controller layer, Elasticsearch the model (data) layer, and Kibana the view layer. Logstash and Elasticsearch are implemented in Java, while Kibana is built on Node.js.

The functions of these components and their roles in the log collection system are introduced in sequence below.

Installation and use of Elasticsearch

Elasticsearch is a real-time full-text search and analytics engine that collects, analyzes, and stores data. It exposes open REST and Java APIs, providing efficient search on top of a scalable distributed system, and it is built on the Apache Lucene search engine library.

Elasticsearch can be used to search all kinds of documents. It provides scalable, near real-time search, supports multi-tenancy, can scale to hundreds of nodes, and handles petabytes of structured or unstructured data.

Elasticsearch is distributed, which means an index can be divided into shards, and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator, delegating operations to the correct shard; rebalancing and routing are done automatically. Related data is typically stored in the same index, which consists of one or more primary shards and zero or more replica shards. Once an index is created, the number of primary shards cannot be changed.

Elasticsearch is a real-time distributed search and analytics engine that can be used for full-text search, structured search, analytics, or any combination of the three. It is document-oriented, meaning it stores entire objects or documents, and it not only stores documents but also indexes the content of each document so that it can be searched. In Elasticsearch you index, search, sort, and filter documents rather than rows and columns of data.
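As a quick illustration of this document-oriented model, here is a minimal sketch against the REST API; the index name app-logs, the type log, and the field names are chosen purely for illustration:

# create an index with explicit (illustrative) shard settings
$ curl -X PUT 'http://localhost:9200/app-logs?pretty' -d '
{ "settings": { "number_of_shards": 1, "number_of_replicas": 1 } }'

# index a document with id 1
$ curl -X PUT 'http://localhost:9200/app-logs/log/1?pretty' -d '
{ "service": "user-api", "level": "error", "message": "db connection timeout" }'

# full-text search on the message field
$ curl 'http://localhost:9200/app-logs/_search?q=message:timeout&pretty'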

For convenience, we install Elasticsearch directly using docker:

$ docker run   -d --name elasticsearch  docker.elastic.co/elasticsearch/elasticsearch:5.4.0

Note that some simple configuration is required after Elasticsearch starts. xpack.security.enabled is turned on by default; for convenience we disable login authentication. Log into the container and run the following commands:

# enter the running container
$ docker exec -it elasticsearch bash

# edit the configuration file
$ vim config/elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
http.cors.enabled: true

http.cors.allow-origin: "*"
xpack.security.enabled: false

# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

After modifying the configuration file, exit and stop the container. To keep the configuration for later use, we create a new image from this container: first obtain the container ID, then commit a new image based on it.

$ docker commit -a "add config" -m "dev" a404c6c174a2  es:latest
sha256:5cb8c995ca819765323e76cccea8f55b423a6fa2eecd9c1048b2787818c1a994

This gives us a new image, es:latest, which we now run:

$ docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" es:latest

We check if the installation was successful by accessing the built-in endpoint provided by Elasticsearch.

[root@VM_1_14_centos ~]# curl 'http://localhost:9200/_nodes/http?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "docker-cluster",
  "nodes" : {
    "8iH5v9C-Q9GA3aSupm4caw" : {
      "name" : "8iH5v9C",
      "transport_address" : "10.0.1.14:9300",
      "host" : "10.0.1.14",
      "ip" : "10.0.1.14",
      "version" : "5.4.0",
      "build_hash" : "780f8c4",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.enabled" : "true"
      },
      "http" : {
        "bound_address" : [
          "[::]:9200"
        ],
        "publish_address" : "10.0.1.14:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}

As you can see, Elasticsearch has been installed successfully. It will serve as the storage backend for our log data and provide efficient search.

We also install elasticsearch-head, a visualization tool for Elasticsearch. The installation is very simple:

$ docker run -p 9100:9100 mobz/elasticsearch-head:5

elasticsearch-head is a client-side plugin for monitoring the status of Elasticsearch. It visualizes the cluster's data and lets you perform create, read, update, delete, and query operations.
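The same status information can also be pulled from Elasticsearch's REST API directly, which is handy when no browser is available:

# overall cluster health: status, node count, shard allocation
$ curl 'http://localhost:9200/_cluster/health?pretty'

# list all indices with document counts and sizes
$ curl 'http://localhost:9200/_cat/indices?v'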

After installation, the elasticsearch-head interface is available at http://localhost:9100.

Installation and use of Logstash

Logstash is a data processing engine whose main purpose here is to parse and process logs. It works as follows:

A data source first sends data to Logstash; here we use Filebeat to ship the log data. Logstash's main stages are input (data ingestion), filter (data processing), and output (data emission).

Logstash filters and formats the data (converting it to JSON), then sends it to Elasticsearch, which stores it and builds a search index. Kibana provides the front-end view, where the data can be searched and the results visualized in charts.

Below we start to install and use logstash. First download and decompress logstash:

# download logstash
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-5.4.3.tar.gz
# extract logstash
$ tar -zxvf logstash-5.4.3.tar.gz

The download may be slow; you can switch to a mirror closer to you if necessary. After extracting the archive, we configure Logstash, mainly the input, output, and filter stages mentioned above.

[root@VM_1_14_centos elk]# cat logstash-5.4.3/client.conf
input {
    beats {
        port => 5044
        codec => "json"
    }
}
output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "logstash-app-error-%{+YYYY.MM.dd}"
    }
    stdout {codec => rubydebug}
}

The input stage supports file, syslog, and beats sources, among others; here we configure the beats input so that Logstash receives the data shipped by Filebeat.

Filters are used to process events that match specific rules. Common filters include grok (parse unstructured text into a structured format), geoip (add geographic information), drop (discard events), and mutate (rename, replace, or modify fields). Here is an example of filter usage:

filter {
  # which field holds the client IP
  geoip {
    source => "clientIp"
  }
}
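As another sketch, the grok filter mentioned above can turn a plain text line such as "2020-10-30 14:12:26 ERROR order-service payment failed" into structured fields; the field names level, service, and msg below are illustrative:

filter {
  grok {
    # parse "<timestamp> <level> <service> <message>" style lines into separate fields
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:service} %{GREEDYDATA:msg}" }
  }
}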

The output stage supports destinations such as Elasticsearch, file, graphite, and statsd. By default the filtered data is sent to Elasticsearch; if you need a different destination you must specify it explicitly, and multiple outputs can be configured at the same time.

An event can go through multiple outputs during processing, but once all outputs are executed, the event completes its life cycle.
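For example, a minimal sketch that sends the same events both to Elasticsearch and to a local file (the file path below is only illustrative) could look like this:

output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "logstash-app-error-%{+YYYY.MM.dd}"
    }
    # also keep a local copy of every processed event
    file {
        path => "/tmp/logstash-%{+YYYY-MM-dd}.log"
    }
}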

In our configuration, we output log information to Elasticsearch. After the configuration file is done, we start logstash:

$ bin/logstash  -f client.conf
Sending Logstash's logs to /elk/logstash-5.4.3/logs which is now configured via log4j2.properties
[2020-10-30T14:12:26,056][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2020-10-30T14:12:26,062][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.RequestAuthCache).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-10-30T14:12:26,209][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<URI::HTTP:0x1abac0 URL:http://127.0.0.1:9200/>}[2020-10-30T14:12:26,225][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2020-10-30T14:12:26,288][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2020-10-30T14:12:26,304][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>[#<URI::Generic:0x2fec3fe6 URL://127.0.0.1:9200>]}
[2020-10-30T14:12:26,312][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500}
[2020-10-30T14:12:27,226][INFO ][logstash.inputs.beats    ] Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}
[2020-10-30T14:12:27,319][INFO ][logstash.pipeline        ] Pipeline main started
[2020-10-30T14:12:27,422][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

According to the log output from the console, we know that logstash has started normally.
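If you want to verify the Logstash installation on its own before wiring up Filebeat, a minimal pipeline can also be defined directly on the command line; each line typed on stdin is echoed back as a structured event on stdout:

$ bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'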

Installation and use of Kibana

Kibana is a web-based graphical interface for searching, analyzing, and visualizing the log data and metrics stored in Elasticsearch. It retrieves data through Elasticsearch's REST interface and visualizes it, allowing users not only to create custom dashboard views of their data but also to query and filter it in ad hoc ways.

The installation of Kibana is relatively simple, we can install it based on docker:

docker run --name kibana -e ELASTICSEARCH_URL=http://127.0.0.1:9200 -p 5601:5601 -d kibana:5.6.9

In the startup command we set the ELASTICSEARCH_URL environment variable to http://127.0.0.1:9200. Note that this address is resolved from inside the Kibana container, so on Docker's default bridge network Kibana cannot reach Elasticsearch through 127.0.0.1; use the host machine's IP address instead, or share the host's network.
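One way to do the latter, assuming a Linux host where host networking is acceptable, is a sketch like the following; with --network host, 127.0.0.1 inside the container refers to the host itself and the -p port mapping is no longer needed:

# run Kibana on the host network so 127.0.0.1:9200 reaches Elasticsearch on the host
docker run --name kibana --network host -e ELASTICSEARCH_URL=http://127.0.0.1:9200 -d kibana:5.6.9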

Installation and use of Filebeat

Filebeat is a lightweight delivery tool for forwarding and centralizing log data. Filebeat monitors specified log files or locations, collects log events, and forwards them to Logstash, Kafka, Redis, etc., or directly to Elasticsearch for indexing.

Let's start installing and configuring Filebeat:

# download filebeat
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.4.3-linux-x86_64.tar.gz
$ tar -zxvf filebeat-5.4.3-linux-x86_64.tar.gz
$ mv filebeat-5.4.3-linux-x86_64 filebeat
# enter the directory
$ cd filebeat

# configure filebeat
$ vi client.yml
filebeat.prospectors:

- input_type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]

In the Filebeat configuration, input_type supports sources such as log, syslog, stdin, redis, udp, docker, tcp, and netflow. The configuration above reads from log files and is restricted to files under the /var/log/ directory. The output section points Filebeat at Logstash, so that Logstash performs additional processing on the data Filebeat collects.
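A prospector can also attach custom fields and merge multi-line stack traces before shipping. Below is a hedged sketch in the Filebeat 5.x configuration format; the service name and the date pattern are illustrative:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/*.log
  # attach a custom field so the service can be identified downstream
  fields:
    service: user-api
  # lines that do not start with a date are appended to the previous event (e.g. stack traces)
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["localhost:5044"]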

Once configured, we start Filebeat:

$ ./filebeat  -e  -c client.yml
2020/10/30 06:46:31.764391 beat.go:285: INFO Home path: [/elk/filebeat] Config path: [/elk/filebeat] Data path: [/elk/filebeat/data] Logs path: [/elk/filebeat/logs]
2020/10/30 06:46:31.764426 beat.go:186: INFO Setup Beat: filebeat; Version: 5.4.3
2020/10/30 06:46:31.764522 logstash.go:90: INFO Max Retries set to: 3
2020/10/30 06:46:31.764588 outputs.go:108: INFO Activated logstash as output plugin.
2020/10/30 06:46:31.764586 metrics.go:23: INFO Metrics logging every 30s
2020/10/30 06:46:31.764664 publish.go:295: INFO Publisher name: VM_1_14_centos
2020/10/30 06:46:31.765299 async.go:63: INFO Flush Interval set to: 1s
2020/10/30 06:46:31.765315 async.go:64: INFO Max Bulk Size set to: 2048
2020/10/30 06:46:31.765563 beat.go:221: INFO filebeat start running.
2020/10/30 06:46:31.765592 registrar.go:85: INFO Registry file set to: /elk/filebeat/data/registry
2020/10/30 06:46:31.765630 registrar.go:106: INFO Loading registrar data from /elk/filebeat/data/registry
2020/10/30 06:46:31.766100 registrar.go:123: INFO States Loaded from registrar: 6
2020/10/30 06:46:31.766136 crawler.go:38: INFO Loading Prospectors: 1
2020/10/30 06:46:31.766209 registrar.go:236: INFO Starting Registrar
2020/10/30 06:46:31.766256 sync.go:41: INFO Start sending events to output
2020/10/30 06:46:31.766291 prospector_log.go:65: INFO Prospector with previous states loaded: 0
2020/10/30 06:46:31.766390 prospector.go:124: INFO Starting prospector of type: log; id: 2536729917787673381
2020/10/30 06:46:31.766422 crawler.go:58: INFO Loading and starting Prospectors completed. Enabled prospectors: 1
2020/10/30 06:46:31.766430 spooler.go:63: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s

2020/10/30 06:47:01.764888 metrics.go:34: INFO No non-zero metrics in the last 30s
2020/10/30 06:47:31.764929 metrics.go:34: INFO No non-zero metrics in the last 30s
2020/10/30 06:48:01.765134 metrics.go:34: INFO No non-zero metrics in the last 30s

When Filebeat starts, it launches one or more inputs that look in the locations specified for log data. For each log file it finds, Filebeat starts a harvester; each harvester reads a single log file for new content and sends the new data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.

Practice of using ELKB

After installing the ELKB components, we start integrating them. First, let's take a look at the process of ELKB collecting logs.

Filebeat watches the application's log files and sends new data to Logstash, which filters and formats it, for example into JSON. Logstash then sends the processed log data to Elasticsearch, which stores it and builds the search index. Finally, Kibana provides the visualization layer.
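Since the Filebeat configuration above watches /var/log/*.log, a quick way to check the whole pipeline end to end is to append a recognizable line to a log file in that directory and then query Elasticsearch for it; the index pattern in the query must match the index setting in client.conf:

# write a test line that Filebeat should pick up
$ echo "2020-10-30 15:00:00 ERROR elkb-test hello-elkb" >> /var/log/test.log

# after a few seconds, search for it in Elasticsearch
$ curl 'http://localhost:9200/logstash-app-error-*/_search?q=hello-elkb&pretty'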

After running all the components, let's first look at the index changes in elasticsearch-head:

A new index such as filebeat-2020.10.12 appears, indicating that the ELKB distributed log collection pipeline has been set up successfully. Visiting http://localhost:9100, we can take a look at the indexed data:

As the two screenshots above show, new log data from the mysqld.log file in the /var/log/ directory has been collected. The raw volume is large; in a production environment we would filter it according to the actual business and normalize the log format accordingly.

elasticsearch-head is only a simple Elasticsearch client; for more complete statistics and search requirements, Kibana is needed. Kibana extends Elasticsearch's analysis capabilities: it can analyze data more intelligently, perform mathematical transformations, and slice the data as required.

Visiting http://localhost:5601 shows the log information in the figure above: Filebeat tails the MySQL logs and the entries are displayed in Kibana. Kibana handles massive amounts of data well and can create bar, line, scatter, histogram, and pie charts as well as maps, which are not shown here.

Summary

This article mainly introduced the distributed log collection system ELKB. Logs are primarily used to record discrete events and contain detailed information about a specific point or stage of program execution. ELKB solves the problem that, under a microservice architecture, service instances are numerous and scattered, making logs hard to collect and analyze. Due to space limitations, this article only covers the installation and basic use of ELKB. Go microservices generally use logging frameworks such as logrus or zap to write logs in a fixed format to a specified location; readers can build a small microservice of their own to practice with.

This article is shared from the HUAWEI CLOUD community post "[Huawei Cloud Expert Original] Using ELK for Log Collection and Unified Processing in Microservice Architecture", original author: aoho.

 
