Logstash 7.4 Application Scenarios: Collecting, Parsing, and Transforming Kafka Messages, Beats Data, and MySQL Data into Elasticsearch

Elasticsearch is a distributed, scalable, real-time search and data analytics engine. How to write data from massive data sources into Elasticsearch efficiently and reliably is an unavoidable problem.

Logstash Concepts and Principles

Logstash is an open-source, server-side data processing pipeline that can simultaneously ingest data from multiple sources, transform it, and ship it to Elasticsearch indexes, where the data can then be tokenized, searched, and analyzed, regardless of its format or complexity. Logstash provides a rich library of filters: for example, Grok can derive structure from unstructured data, geographic coordinates can be decoded from IP addresses, sensitive fields can be anonymized or excluded, and the overall processing flow stays simple.
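
Every Logstash pipeline follows the same three-stage structure. A minimal sketch using only core plugins, with no external data source, run with `bin/logstash -f pipeline.conf`:

```
# Minimal pipeline skeleton: input -> filter -> output.
input {
  stdin {}                                   # read events from standard input
}
filter {
  mutate {
    add_field => { "pipeline" => "demo" }    # transform events in flight
  }
}
output {
  stdout { codec => rubydebug }              # print the processed events
}
```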

Logstash Application Scenarios

1. Logstash runs directly on the client as the data collector and parses, transforms, and stores the data itself (Logstash is relatively heavyweight and consumes a fair amount of resources).

2. Beats collect the data on the client side; Logstash then further collects, analyzes, and transforms the Beats output.

3. Logstash subscribes to Kafka messages and parses and transforms the data.

Solutions:

1. Data source (e.g. MySQL data) → Logstash → output (to Elasticsearch, a file, Kafka, Redis, ...)

2. Data source → Beats (e.g. Filebeat) → Logstash → output

3. Data source → Beats → Kafka (or Redis) → Logstash → output

4. Kafka (or Redis) → Logstash → output

Logstash: Kafka Message Subscription, Parsing, and Elasticsearch Storage
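
A minimal sketch of such a pipeline, assuming JSON-encoded messages on a hypothetical app-logs topic with a local broker and cluster; adjust the addresses, topic, group, and index names to your environment:

```
# kafka-to-es.conf -- subscribe to Kafka, parse, store in Elasticsearch
input {
  kafka {
    bootstrap_servers => "localhost:9092"     # assumed broker address
    topics            => ["app-logs"]         # hypothetical topic name
    group_id          => "logstash-es"        # hypothetical consumer group
    codec             => "json"               # assumes JSON-encoded messages
  }
}
filter {
  date {
    match => ["timestamp", "ISO8601"]         # assumes events carry a "timestamp" field
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "kafka-logs-%{+YYYY.MM.dd}"      # daily indexes
  }
}
```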

Logstash: Filebeat Data Collection, Cleaning, and Elasticsearch Storage
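
A minimal sketch: Filebeat ships log lines to Logstash on port 5044 (the conventional Beats port), grok cleans them up, and Elasticsearch stores them. The access-log format is an assumption:

```
# beats-to-es.conf -- collect Filebeat data, clean it, store in Elasticsearch
input {
  beats {
    port => 5044                              # Filebeat's output.logstash points here
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # assumes Apache/Nginx access logs
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
```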

Logstash: MySQL Data Collection, Parsing, and Elasticsearch Storage
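
A minimal sketch using the jdbc input plugin. The connection string, credentials, table, and tracking column are placeholders, and the MySQL JDBC driver jar must be downloaded separately:

```
# mysql-to-es.conf -- poll MySQL incrementally, store rows in Elasticsearch
input {
  jdbc {
    jdbc_driver_library    => "/path/to/mysql-connector-java.jar"   # placeholder path
    jdbc_driver_class      => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"    # placeholder database
    jdbc_user              => "root"
    jdbc_password          => "secret"
    schedule               => "* * * * *"                           # poll every minute (cron syntax)
    statement              => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    use_column_value       => true
    tracking_column        => "updated_at"                          # incremental cursor
    tracking_column_type   => "timestamp"
  }
}
output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "mysql-orders"
    document_id => "%{id}"        # primary key as _id makes re-runs idempotent
  }
}
```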

Logstash Filter Plugin Library

| Plugin | Description | Github repository |
| --- | --- | --- |
| aggregate | Aggregates information from several events originating with a single task | logstash-filter-aggregate |
| alter | Performs general alterations to fields that the mutate filter does not handle | logstash-filter-alter |
| bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes | logstash-filter-bytes |
| cidr | Checks IP addresses against a list of network blocks | logstash-filter-cidr |
| cipher | Applies or removes a cipher to an event | logstash-filter-cipher |
| clone | Duplicates events | logstash-filter-clone |
| csv | Parses comma-separated value data into individual fields | logstash-filter-csv |
| date | Parses dates from fields to use as the Logstash timestamp for an event | logstash-filter-date |
| de_dot | Computationally expensive filter that removes dots from a field name | logstash-filter-de_dot |
| dissect | Extracts unstructured event data into fields using delimiters | logstash-filter-dissect |
| dns | Performs a standard or reverse DNS lookup | logstash-filter-dns |
| drop | Drops all events | logstash-filter-drop |
| elapsed | Calculates the elapsed time between a pair of events | logstash-filter-elapsed |
| elasticsearch | Copies fields from previous log events in Elasticsearch to current events | logstash-filter-elasticsearch |
| environment | Stores environment variables as metadata sub-fields | logstash-filter-environment |
| extractnumbers | Extracts numbers from a string | logstash-filter-extractnumbers |
| fingerprint | Fingerprints fields by replacing values with a consistent hash | logstash-filter-fingerprint |
| geoip | Adds geographical information about an IP address | logstash-filter-geoip |
| grok | Parses unstructured event data into fields | logstash-filter-grok |
| http | Provides integration with external web services/REST APIs | logstash-filter-http |
| i18n | Removes special characters from a field | logstash-filter-i18n |
| java_uuid | Generates a UUID and adds it to each processed event | core plugin |
| jdbc_static | Enriches events with data pre-loaded from a remote database | logstash-filter-jdbc_static |
| jdbc_streaming | Enrich events with your database data | logstash-filter-jdbc_streaming |
| json | Parses JSON events | logstash-filter-json |
| json_encode | Serializes a field to JSON | logstash-filter-json_encode |
| kv | Parses key-value pairs | logstash-filter-kv |
| memcached | Provides integration with external data in Memcached | logstash-filter-memcached |
| metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric | logstash-filter-metricize |
| metrics | Aggregates metrics | logstash-filter-metrics |
| mutate | Performs mutations on fields | logstash-filter-mutate |
| prune | Prunes event data based on a list of fields to blacklist or whitelist | logstash-filter-prune |
| range | Checks that specified fields stay within given size or length limits | logstash-filter-range |
| ruby | Executes arbitrary Ruby code | logstash-filter-ruby |
| sleep | Sleeps for a specified time span | logstash-filter-sleep |
| split | Splits multi-line messages into distinct events | logstash-filter-split |
| syslog_pri | Parses the PRI (priority) field of a syslog message | logstash-filter-syslog_pri |
| threats_classifier | Enriches security logs with information about the attacker’s intent | logstash-filter-threats_classifier |
| throttle | Throttles the number of events | logstash-filter-throttle |
| tld | Replaces the contents of the default message field with whatever you specify in the configuration | logstash-filter-tld |
| translate | Replaces field contents based on a hash or YAML file | logstash-filter-translate |
| truncate | Truncates fields longer than a given length | logstash-filter-truncate |
| urldecode | Decodes URL-encoded fields | logstash-filter-urldecode |
| useragent | Parses user agent strings into fields | logstash-filter-useragent |
| uuid | Adds a UUID to events | logstash-filter-uuid |
| xml | Parses XML into fields | logstash-filter-xml |

Grok can parse and structure arbitrary text using regular expressions; it is currently the best way in Logstash to turn unstructured log data into a structured, queryable form. Beyond parsing, Logstash can rename, remove, replace, and modify event fields, and can also drop events entirely (debug events, for example). Many more advanced capabilities are available, as sketched below.
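
A sketch of these filters working together; the log line layout and the field names are assumptions for illustration:

```
filter {
  grok {
    # derive structure from an unstructured line, e.g.
    # "2019-11-03T10:15:00Z INFO 203.0.113.7 user logged in"
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:loglevel} %{IP:client_ip} %{GREEDYDATA:msg}" }
  }
  geoip {
    source => "client_ip"                        # decode geographic data from the IP
  }
  mutate {
    rename       => { "msg" => "log_message" }   # rename a field
    remove_field => ["ts"]                       # remove a field we no longer need
  }
  if [loglevel] == "DEBUG" {
    drop {}                                      # discard debug events entirely
  }
}
```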

By contrast, Flume focuses on data transport: its users need to understand the full routing of the data, and it is comparatively more reliable, since its channel exists for persistence and data is deleted only after delivery to the next destination has been confirmed.

Logstash focuses on data preprocessing: log fields are preprocessed before being parsed downstream.

