Implementing Log Structured Conversion Based on Visual Configuration

Overview: The overall architecture of the data bus DBus consists of six modules: the log crawling module, the incremental conversion module, the full extraction program, the log operator processing module, the heartbeat monitoring module, and the Web management module. The six modules each perform their own function and work together, which forms the working principle of DBus: it acquires incremental data logs in real time by reading RDBMS incremental logs (full pulls are also supported); and it obtains data in real time from crawlers such as Logstash, Flume, and Filebeat, structuring the data for output through visual configuration. This article describes the part of DBus that implements log structured conversion based on visual configuration.

1. Principles of Log Structuring

1.1 Log source crawling

The log part of DBus can connect to a variety of data sources, for example Logstash, Flume, and Filebeat. These components are popular log crawlers in the industry; on the one hand they are user-friendly and close to de facto industry standards, which makes it easy to integrate with users' existing technical solutions, and on the other hand they avoid unnecessarily reinventing the wheel. We call the log data they fetch raw data logs; the crawler components write them to Kafka, where they wait for subsequent processing by DBus.

1.2 Visual configuration of rules to structure the logs

Users can configure the log source and destination (sink) themselves. The same source log data can be output to multiple sinks. For each "log source - sink" line, users can configure filtering rules according to their needs. After being processed by the rule operators, the logs become structured, that is, they satisfy schema constraints similar to those of a database table.
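As a rough illustration only (the field names below are hypothetical and do not reflect DBus's actual configuration schema; the operator names are the ones used later in this article's example), such a "log source - sink" line with its rule group might be described like this:

```python
# Hypothetical sketch of one "log source -> sink" line and its rule group.
# Field names are illustrative only, not DBus's real configuration schema.
rule_line = {
    "source_topic": "heartbeat_log_logstash",  # raw-data topic written by the crawler
    "sink_topic": "structured_sink_info",      # hypothetical output topic
    "table": "sink_info_table",                # logical table with a fixed schema
    "rule_group": [                            # operators applied in order
        {"operator": "toIndex"},
        {"operator": "filter", "column": 7, "contains": "Sink to influxdb OK!"},
        {"operator": "select", "columns": [1, 3]},
        {"operator": "regexExtract", "column": 1, "pattern": "..."},
        {"operator": "saveAs", "names": ["http_code", "error_count"], "types": ["int", "int"]},
    ],
}
```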

1.3 Rule operators

DBus provides a rich and easy-to-use set of operators for customized data processing. Users can split the processing into multiple steps; the result of each step can be viewed and verified immediately, and different operators can be reused and combined repeatedly until the conversion produces exactly the data they need.

1.4 Execution engine

The configured rule operator groups are applied by the execution engine, which turns the target log data into structured data and outputs it to Kafka for downstream data consumers. The system flow chart is as follows:

In the design of DBus log processing, the same raw log can be extracted into one or more tables, and each table is structured, satisfying the same schema constraints.

  • Each table is a collection of rule operator groups; a table can have one or more rule operator groups;
  • Each rule operator group is composed of a set of rule operators, and each operator is independent.

For any given raw data log, which table(s) should it belong to?

Suppose the user has defined several logical tables (T1, T2, ...) to extract different kinds of logs. Each log is then matched against the rule operator groups as follows (a code sketch of this matching logic is given after the list):

  • Take all the rule operator groups of a given table, say T1;
  • If the log satisfies a rule operator group, it enters that group and is converted by the execution engine into structured data for that table;
  • If the log does not satisfy that rule operator group, extraction is attempted with the next rule operator group;
  • If none of T1's rule operator groups are satisfied, processing moves on to the next table T2, and so on;
  • If a log matches no table's filtering rules, it goes into the _unknown_table_ table.
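A minimal Python sketch of this matching flow, assuming each rule operator group exposes a matches() check and an apply() transformation (these names are hypothetical, not DBus's real API):

```python
# Hypothetical sketch of routing one raw data log to a logical table.
# The table / rule-group interfaces are illustrative, not DBus's actual classes.
UNKNOWN_TABLE = "_unknown_table_"

def route_log(raw_log, tables):
    """tables: list of (table_name, rule_groups); each rule group offers
    matches(raw_log) -> bool and apply(raw_log) -> a structured row."""
    for table_name, rule_groups in tables:      # try T1, then T2, and so on
        for group in rule_groups:               # try each rule operator group in turn
            if group.matches(raw_log):          # filtering conditions satisfied?
                return table_name, group.apply(raw_log)
    return UNKNOWN_TABLE, raw_log               # unmatched logs are counted here
```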

For example, the same application log may belong to more than one rule group or table; as long as it satisfies the filtering conditions we define in a rule group or table, the application log can be extracted by that rule group. In other words, the same application log may belong to different rule groups or tables.

A rule operator is the basic unit of data filtering, processing, and conversion. The commonly used rule operators are shown above.

Operators are independent of each other and can be combined arbitrarily; by applying operators iteratively, many complex, high-level features can be realized, and in the end any data processing goal can be reached. Users can also develop custom operators: developing an operator is very easy, and as long as the basic interface principles are followed, any operator can be implemented.
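As a purely conceptual sketch (DBus itself is a Java project and its real operator interface differs; the class and method names here are hypothetical), an operator is just a self-contained transformation that can be chained with others:

```python
# Conceptual sketch of an operator interface; not DBus's real (Java) API.
class Operator:
    def apply(self, rows):
        """rows: list of rows, each row a list of column values."""
        raise NotImplementedError

class TrimOperator(Operator):
    """Hypothetical custom operator: strip whitespace from every column."""
    def apply(self, rows):
        return [[col.strip() for col in row] for row in rows]

def run_rule_group(rows, operators):
    # Operators are independent of each other, so they can be combined freely.
    for op in operators:
        rows = op.apply(rows)
    return rows
```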

2. A DBus Log Processing Example

Take a DBus cluster environment as an example. Two machines in the DBus cluster (i.e., master and slave) have a heartbeat program deployed for monitoring, statistics, and alerting. The heartbeat program produces application logs, and these application logs contain various kinds of event information. If we want to process these logs and classify them into a structured form for storage in a database, we can use the DBus log program to do so.

DBus can connect to multiple data sources (Logstash, Flume, Filebeat, etc.). Here we take Logstash as an example to illustrate how DBus accesses log data and performs monitoring and alerting.

Since the monitoring and alerting logs are on the two machines dbus-n2 and dbus-n3, we deploy a Logstash program on each of them. The heartbeat data is generated by the heartbeat plugin that ships with Logstash; its role is to make it easy for DBus to compute statistics on the data and its output, and to raise alerts about the log extraction source end (here, Logstash). For Flume and Filebeat, which have no heartbeat plugin, an additional program is needed to generate heartbeat data periodically. The data that the Logstash programs write to Kafka therefore contains both ordinary log data and heartbeat data.
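For sources without a heartbeat plugin, a minimal periodic heartbeat generator could look like the sketch below (written with the kafka-python client; the topic name, message fields, and broker address are assumptions for illustration, not DBus's actual heartbeat format):

```python
# Hypothetical periodic heartbeat generator for sources (e.g. Flume, Filebeat)
# that have no heartbeat plugin. The message layout is illustrative only.
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    heartbeat = {"type": "heartbeat", "host": "dbus-n2", "ts": int(time.time() * 1000)}
    producer.send("heartbeat_log_flume", value=heartbeat)  # hypothetical topic name
    producer.flush()
    time.sleep(10)  # one heartbeat every 10 seconds
```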

Deployment is not limited to the two Logstash machines here; DBus places no limit on the number of Logstash instances. For example, if application logs are spread across dozens or hundreds of machines, you only need to deploy a Logstash program on each machine and have them all write data to the same Kafka topic, and DBus can then process, monitor, alert on, and compute statistics over the data from all of these hosts.

2.1 Starting Logstash

After starting the Logstash programs, we can read data from the topic heartbeat_log_logstash. Sample data looks as follows:

1) Heartbeat data

2) Normal log data
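To peek at this topic outside the DBus Web UI, a simple consumer such as the sketch below can be used (kafka-python; the broker address is an assumption):

```python
# Minimal sketch: read raw records from the heartbeat_log_logstash topic.
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "heartbeat_log_logstash",
    bootstrap_servers="localhost:9092",  # assumed broker address
    auto_offset_reset="earliest",
)

for message in consumer:
    # Both heartbeat records and normal log records arrive on this topic.
    print(message.value.decode("utf-8"))
```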

2.2 Configuring rules

Next, we only need to configure the appropriate rules in the DBus Web UI and the data can be processed.

First, create a logical table sink_info_table, which is used to extract sink event log information. Then configure the rule groups for this table (one or more; data matched by different rule groups need not satisfy the same filtering rules, but all extracted data must conform to the same schema attributes). With heartbeat_log_logstash as the original data topic, we can configure and operate on the data visually and in real time (what you see is what you get, with on-the-spot verification).

1) Reading the raw log data

As can be seen, Logstash has already extracted the basic log4j information, such as path, @timestamp, level, and so on. The details of the log, however, are in the log field. Since the output log data is not all the same, the columns of the log data also differ.

2) Extracting the columns of interest

Suppose we are interested in the timestamp, log, and other fields of the raw information; we can add a toIndex operator to extract these fields:

Note that we deliberately use array subscripts here, for the following reasons: not every column comes with its own column name (for example, the raw data extracted by Flume, or the column data produced after a sub-split operation); and subscript mode allows column ranges to be specified (similar to Python slicing, e.g. 1:3 denotes columns 1 and 2). All subsequent operations therefore access columns by array subscript.

After this rule executes, we can see the extracted fields.
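A rough Python illustration of this subscript-based column access (not DBus's toIndex implementation; a row is modeled simply as a list of column values):

```python
# Subscript-based column access; 1:3 denotes columns 1 and 2, mirroring
# Python slice semantics. Not DBus's actual toIndex implementation.
row = ["2019-07-31 10:00:00", "INFO", "heartbeat", "ok", "extra"]

col_1 = row[1]          # single column by index -> "INFO"
cols_1_to_2 = row[1:3]  # slice 1:3 -> columns 1 and 2 -> ["INFO", "heartbeat"]
print(col_1, cols_1_to_2)
```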

3) Filtering the required data

In this example, we are only interested in data containing "Sink to influxdb OK!". So we add a filter operator to extract the rows whose column 7 content contains "Sink to influxdb OK!":

After execution, only the log rows that meet the condition remain.

4) Selecting particular columns

We add a select operator: we are interested in the contents of columns 1 and 3, so we extract these two columns.

After the select operator executes, the data contains only columns 1 and 3.
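Taken together, steps 3 and 4 amount to row filtering followed by column projection. A rough sketch, again with rows modeled as lists of columns and the sample data made up for illustration:

```python
# Illustrative filter + select over rows modeled as lists of columns.
def filter_rows(rows, column, needle):
    # keep only rows whose given column contains the needle string
    return [r for r in rows if needle in r[column]]

def select_columns(rows, columns):
    # keep only the listed column indexes, in order
    return [[r[c] for c in columns] for r in rows]

rows = [
    ["t1", "a", "x", "b", "y", "z", "w", "Sink to influxdb OK! http_code=200 ..."],
    ["t2", "c", "x", "d", "y", "z", "w", "some unrelated log line"],
]
rows = filter_rows(rows, 7, "Sink to influxdb OK!")  # step 3: keep matching rows
rows = select_columns(rows, [1, 3])                  # step 4: project columns 1 and 3
print(rows)  # [['a', 'b']]
```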

5) Processing data with a regular expression

We want to extract, from the first column, the values that match a specific regular expression, so we use a regexExtract operator to process the data. The regular expression is along the lines of: http_code=(\d*).*type=(.*),ds=(.*),schema=(.*),table=(.*)\s*errorCount=(\d*). Users can write their own custom regular expressions.

After execution, we obtain the values captured by the regular expression.
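A small Python sketch of applying such a regexExtract step to the first column (the pattern shown is a best-effort reading of the regular expression above and may differ in detail from the original; the sample log line is made up):

```python
import re

# Best-effort reading of the regex from the text above; groups capture
# http_code, type, ds, schema, table and errorCount. The sample line is made up.
PATTERN = re.compile(
    r"http_code=(\d*).*type=(.*),ds=(.*),schema=(.*),table=(.*)\s*errorCount=(\d*)"
)

line = "Sink to influxdb OK! http_code=200 type=checkpoint,ds=testdb,schema=test,table=t1 errorCount=0"
match = PATTERN.search(line)
if match:
    http_code, type_, ds, schema, table, error_count = match.groups()
    print(http_code, type_, ds, schema, table, error_count)
```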

6) Selecting the output columns

Finally, for the columns we want to output, we use a saveAs operator to specify the column names and types, making it easy to persist the data into a relational database.

After the saveAs operator executes, this is the final sample of the processed output data.
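Conceptually, saveAs attaches names and types to the positional columns. A toy illustration (the helper, column names, and types below are illustrative, not the exact configuration used in DBus):

```python
# Toy illustration of a saveAs-style step: attach names/types to positional columns.
def save_as(rows, names, types):
    casters = {"string": str, "int": int, "long": int}
    return [
        {name: casters[typ](value) for name, typ, value in zip(names, types, row)}
        for row in rows
    ]

rows = [["200", "0"]]  # e.g. extracted http_code and errorCount values
records = save_as(rows, names=["http_code", "error_count"], types=["int", "int"])
print(records)  # [{'http_code': 200, 'error_count': 0}]
```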

2.3 Viewing the structured output

After the configured rule groups are saved, the log data passes through the DBus operator execution engine and the corresponding structured data is generated. In the current project, the data output by DBus is in UMS format; if you do not want to use UMS, a customized format can be implemented with a little development work.

Note: UMS is a standard JSON data exchange format defined and used by DBus; it contains both the data and its schema. For more about UMS, refer to the introduction on the DBus open source project home page: https://github.com/bridata/dbus
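As a rough, schematic illustration only (the exact UMS field layout is defined by the DBus project and may differ from this sketch), a schema-plus-data JSON message has roughly this shape:

```python
import json

# Rough, schematic shape of a schema-plus-data message; the real UMS field
# layout is defined by DBus and may differ from this illustration.
message = {
    "schema": {
        "namespace": "heartbeat_log_logstash.sink_info_table",  # illustrative name
        "fields": [
            {"name": "http_code", "type": "int"},
            {"name": "error_count", "type": "int"},
        ],
    },
    "payload": [
        {"tuple": [200, 0]},  # one structured row per tuple
    ],
}
print(json.dumps(message, indent=2))
```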

The following is a sample of the structured UMS data output from a test case:

2.4 Log monitoring

To make it easy to keep track of data extraction, rule matching, monitoring, and alerting, we provide a visual, real-time monitoring interface for log data extraction, as shown in the figure, so that you can know at any time:

  • The number of records arriving in real time
  • The number of error records (the error count is the number of records for which an error occurred during operator execution; it helps reveal whether the operators match the data, so the operators can be adjusted. DBus also provides a log read-back function so that such data is not lost)
  • Data latency
  • Whether the log extraction end is operating normally

The monitoring includes information from each host in the cluster, and the data is monitored, counted, and alerted on separately by host IP (or domain name).

The monitoring also includes a table called _unknown_table_, which shows the number of records that were not matched by any rules. For example, if the log data crawled by Logstash contains five different kinds of events and we capture only three of them, all the remaining unmatched data is counted under _unknown_table_.

DBus can also connect to other data sources such as Flume, Filebeat, and UMS; with only a small amount of configuration, results similar to the Logstash processing above can be achieved for the same data source. For more on how DBus processes logs, refer to:

  • https://bridata.github.io/DBus/install-logstash-source.html

  • https://bridata.github.io/DBus/install-flume-source.html

  • https://bridata.github.io/DBus/install-filebeat-source.html

After the application logs are processed by DBus, the raw log data is converted into structured data and output to Kafka for downstream data consumers to use, for example to be landed into a database by Wormhole. For how to use DBus together with Wormhole, please refer to: How to Design a Real-Time Data Platform (technology article).

Author: Zhongzhen Lin

Source: www.cnblogs.com/yixinjishu/p/11274579.html