Illustrated | Prometheus+Grafana+GreatSQL Performance Monitoring System Construction Guide (Part 2)

1. Introduction to the environment

The environment used in this article is shown below; the database is GreatSQL 8.0.32-24.

$ cat /etc/system-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
$ uname -a
Linux gip 3.10.0-1160.el7.x86_64 #1 SMP Tue Aug 18 14:50:17 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ldd --version
ldd (GNU libc) 2.17

In the previous article [Illustrated | Prometheus+Grafana+GreatSQL Performance Monitoring System Construction Guide (Part 1)], we showed how to build the monitoring system. This article covers the alerting features of Grafana and of the Alertmanager module.

2. Grafana email alert

This section uses email alerts, with a QQ Mail account as the example.

1. Enable email service

After logging in to your QQ Mail account, click Settings -> Account and enable the POP3/IMAP/SMTP/Exchange/CardDAV/CalDAV service.

Mine is already enabled; if yours is not, enable it. Then click Generate authorization code (生成授权码), save the authorization code somewhere safe, and edit Grafana's email configuration:

$ vim /usr/local/prometheus/grafana-10.1.1/conf/defaults.ini 

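In vim, type /smtp to jump to the [smtp] section that holds the mail settings, and edit it. The keys that typically need changing are enabled (set to true), host (for QQ Mail, smtp.qq.com:465), user and from_address (your QQ Mail address), and password (the authorization code, not the account login password).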

Restart the Grafana service:

$ systemctl restart grafana-server.service

Next, log in to the Grafana web UI at http://172.17.137.104:3000/ and add an email contact point for alerts.

Fill in Name, Addresses, and the other fields, then click Test.

You can see that the email has been received and the test was successful.

2. Add alarm rules

Alert rules can be added in Grafana. For example, I created a GreatSQL connection panel that monitors the mysql_up value. If it is 0, GreatSQL cannot be reached.

In the panel editor there is an Alert tab; click Create alert rule from this panel.

This opens the alert rule settings page.

The first part sets the alert rule name.

[Figure: the first part]

The second part shows the query whose data the panel displays. Below it, Expressions is where the alert condition is set.

[Figure: the second part]

The first expression, last(), takes the latest value of the series. Other reducers are available, such as max() for the maximum value, but last() is the usual choice.

The second item selects the query to read from. Since there is only one query, choose A.

The third item sets the trigger condition and the threshold value. IS ABOVE means above the threshold; the other options are IS BELOW (below), IS OUTSIDE RANGE (out of range), IS WITHIN RANGE (within range), and HAS NO VALUE (no value). Because mysql_up drops to 0 when GreatSQL is unreachable, choose IS BELOW with a threshold of 1. Put together, the expression reads: fire when mysql_up is less than 1.
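To make the Reduce and Threshold steps concrete, here is a minimal Go sketch of how they combine for a "less than 1" condition: the series from query A is reduced to one number with last(), and that number is compared with the threshold. The function names and sample values are illustrative assumptions, not Grafana's internal code.

```go
package main

import "fmt"

// lastValue mimics the last() reducer: it keeps only the most recent sample.
// ok is false for an empty series (the "no data" case).
func lastValue(series []float64) (v float64, ok bool) {
	if len(series) == 0 {
		return 0, false
	}
	return series[len(series)-1], true
}

// isBelow mimics the IS BELOW threshold: true when v is strictly less than threshold.
func isBelow(v, threshold float64) bool {
	return v < threshold
}

// conditionMet chains the two expressions: reduce with last(), then compare.
func conditionMet(series []float64, threshold float64) bool {
	v, ok := lastValue(series)
	if !ok {
		// No data is configured separately in Grafana; this sketch simply does not fire.
		return false
	}
	return isBelow(v, threshold)
}

func main() {
	healthy := []float64{1, 1, 1} // mysql_up samples while GreatSQL is reachable
	down := []float64{1, 1, 0}    // the latest sample is 0 after GreatSQL stops
	fmt.Println(conditionMet(healthy, 1)) // false
	fmt.Println(conditionMet(down, 1))    // true
}
```

Only the latest sample matters here, which is why a single failed scrape is enough to satisfy the condition; the pending period is what protects against firing on a brief blip.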

The third part sets the Folder in which the rule is stored and the Evaluation group it belongs to. Rules in the same group are evaluated sequentially, at the same interval.

[Figure: the third part]

The Pending period is how long the condition must stay true before the alert fires; until then the alert remains in the Pending state.

The fourth part adds annotations: Summary, a short summary of what happened and why; Description, what the alert rule does; and Runbook URL, a web page that holds the runbook for this alert.

The fifth part configures notifications: you can add custom labels to change how notifications are routed. If no matching policy is set, all alert instances are handled by the default policy.
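The routing idea can be sketched as label matching: each notification policy carries label matchers, an alert goes to the first policy whose matchers all match its labels, and anything unmatched falls back to the default policy. This is a simplified Go model under stated assumptions: the `team` label and the contact point names are hypothetical, and only exact-equality matchers are modeled (Grafana also supports regex and negative matchers, nested policies, and continue-matching).

```go
package main

import "fmt"

// Policy is a simplified notification policy: equality matchers plus a contact point.
type Policy struct {
	Matchers     map[string]string
	ContactPoint string
}

// matches reports whether every matcher label has exactly the given value.
func (p Policy) matches(labels map[string]string) bool {
	for k, v := range p.Matchers {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// route returns the contact point of the first matching policy, or the default one.
func route(policies []Policy, defaultCP string, labels map[string]string) string {
	for _, p := range policies {
		if p.matches(labels) {
			return p.ContactPoint
		}
	}
	return defaultCP
}

func main() {
	policies := []Policy{
		{Matchers: map[string]string{"team": "dba"}, ContactPoint: "DingTalk"},
	}
	fmt.Println(route(policies, "email", map[string]string{"team": "dba"})) // DingTalk
	fmt.Println(route(policies, "email", map[string]string{}))             // email
}
```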

After saving the rule with the button in the upper-right corner, the new alert rule is listed on the page.

3. Test email alert

Now simulate a GreatSQL outage to see if the alarm rule will be triggered and an email alarm will be sent.

$ systemctl stop greatsql

Because the evaluation interval is set to 1 minute, it can take up to 1 minute before the GreatSQL connection status is checked again.

The rule detects that GreatSQL cannot be reached and enters the Pending state.

Once the pending period has elapsed, the state changes to Firing, which means the email has been sent.

You can see that the alarm email has been received in the QQ mailbox.

Then start GreatSQL again:

$ systemctl start greatsql

Once GreatSQL is back up and the alert clears, you receive a resolved email reporting "1 resolved instances".

3. Grafana’s DingTalk Alert

In an earlier article, Prometheus+Grafana+DingTalk to deploy a stand-alone MySQL monitoring and alarm system, I covered DingTalk alerts, but that setup used Alertmanager, Prometheus's alerting module, rather than Grafana. This section shows how to configure DingTalk alerts in Grafana.

Note, however, that starting September 1, 2023, DingTalk no longer supports creating custom robots in either non-internal or internal groups. Instead, you need to log in to the DingTalk developer console, apply for developer permissions, and create an internal application robot. The details are not covered here; see DingTalk's documentation if you need them.

Following the method in Prometheus+Grafana+DingTalk to deploy a stand-alone MySQL monitoring and alarm system, first create a DingTalk robot, then add a Contact point in Grafana.

Next, enter a Name, select DingDing in the Integration drop-down box, and paste the DingTalk robot's Webhook address into URL:

Message Type has two options, Link and ActionCard (card mode); Title and Message set the message title and body. Click Test to check that the alert can be sent; the DingTalk robot should post a test alert. If everything works, click the blue Save contact point button at the bottom.

To send alerts to DingTalk, open Notification policies and choose Edit on the default policy.

Change Default contact point to the DingTalk contact point, then click Update default policy.

Next, to test it, stop GreatSQL to simulate an outage and check whether an alert message is sent.

$ systemctl stop greatsql

The alert message was received successfully.

4. Alertmanager email alerts

Do you remember the Alertmanager we installed in the previous article? It also provides alerting. Alertmanager is Prometheus's alerting module: it receives the alerts sent by Prometheus, supports many notification channels, and can deduplicate and group alerts to reduce noise. We also added some alert rules in the previous article; if you have forgotten them, re-read it. Alertmanager can send DingTalk alerts too, as described in Prometheus+Grafana+DingTalk to deploy a stand-alone MySQL monitoring and alarm system. Here we introduce Alertmanager's email alerts.

[Figure: the process by which Prometheus triggers an alert]

1. Configure AlertManager

Alertmanager's default configuration file is alertmanager.yml, located at /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/alertmanager.yml. Configure it to send alert notifications by email, again using QQ Mail as the example:

global:
  resolve_timeout: 5m
  smtp_from: 'your_email@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'your_email@qq.com'
  smtp_auth_password: 'your_QQ_mail_authorization_code'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'your_email@qq.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

YAML is indentation-sensitive, so check the indentation carefully.

A detailed introduction to the key parameters:

global

Global settings; here these are mainly the SMTP parameters used to send email.

  • resolve_timeout: How long Alertmanager waits before declaring an alert resolved when the alert carries no end time and has not been updated; default 5m. Alerts from Prometheus always include an end time, so this mainly affects other alert sources.

  • smtp_auth_password: Use the QQ Mail authorization code here, not the QQ account login password.

  • smtp_require_tls: Whether to require TLS; turn it on or off depending on your mail server. If you get the error email.loginAuth failed: 530 Must issue a STARTTLS command first, set it to true. If TLS is enabled and you get the error starttls failed: x509: certificate signed by unknown authority, set insecure_skip_verify: true in the tls_config section of email_configs to skip certificate verification.

route

Defines how alerts are grouped and routed to receivers.

  • group_by: Labels used to aggregate alerts. Alerts with the same values for these labels (here, the same alert name, alertname) are collected into one group and sent as a single notification. To disable aggregation entirely, set group_by: ['...'].
  • group_wait: When a new alert group is created, Alertmanager waits group_wait before sending the first notification, so that more alerts for the same group can arrive and be merged into that one notification.
  • group_interval: After the first notification for a group has been sent, how long to wait before sending a notification about new alerts added to that group.
  • repeat_interval: How long to wait before re-sending a notification for alerts that are still firing after a notification was sent successfully.
  • receiver: The receiver that handles alerts on this route; it must match a name defined under receivers below. Receivers can use email, WeChat, Slack, webhook, and other notification methods.
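The effect of group_by: ['alertname'] can be sketched as bucketing alerts by the values of the listed labels, with each bucket becoming one notification. This Go sketch is a simplification of Alertmanager's dispatcher: it ignores route matching, the special '...' value, and all timing, and the alert names and instances are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey builds a bucket key from the values of the group_by labels.
func groupKey(labels map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+labels[name])
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts so that each bucket can be sent as one notification.
func groupAlerts(alerts []map[string]string, groupBy []string) map[string][]map[string]string {
	groups := make(map[string][]map[string]string)
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	alerts := []map[string]string{
		{"alertname": "GreatSQL down", "instance": "db1:9104"},
		{"alertname": "GreatSQL down", "instance": "db2:9104"},
		{"alertname": "High CPU", "instance": "db1:9100"},
	}
	groups := groupAlerts(alerts, []string{"alertname"})
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s -> %d alert(s)\n", k, len(groups[k]))
	}
}
```

Grouping only by alertname means both "GreatSQL down" alerts share one email even though they come from different instances; adding instance to group_by would split them into separate notifications.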

receivers

Configures the alert receivers and how they are notified.

  • to: The email address that receives alerts
  • send_resolved: Whether to send a notification when the alert is resolved

inhibit_rules

Inhibition rules: while an alert matching one set of labels (source) is firing, alerts matching another set (target) are muted, provided both have the same values for the labels listed in equal.
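Applied to the inhibit_rules in the configuration above, this means a firing critical alert mutes a warning alert when both share the same alertname, dev, and instance values. The Go sketch below models only that one rule; the alert data is hypothetical, and the real Alertmanager also handles general matchers and alert fingerprints.

```go
package main

import "fmt"

// inhibited reports whether target is muted by any firing source alert under the
// rule: source severity=critical, target severity=warning, labels in equal must match.
func inhibited(target map[string]string, firing []map[string]string, equal []string) bool {
	if target["severity"] != "warning" {
		return false
	}
	for _, src := range firing {
		if src["severity"] != "critical" {
			continue
		}
		same := true
		for _, name := range equal {
			// In this sketch a label missing on both sides reads as "" and counts as equal.
			if src[name] != target[name] {
				same = false
				break
			}
		}
		if same {
			return true
		}
	}
	return false
}

func main() {
	equal := []string{"alertname", "dev", "instance"}
	critical := map[string]string{"alertname": "GreatSQL down", "instance": "db1:9104", "severity": "critical"}
	warning := map[string]string{"alertname": "GreatSQL down", "instance": "db1:9104", "severity": "warning"}
	otherHost := map[string]string{"alertname": "GreatSQL down", "instance": "db2:9104", "severity": "warning"}
	firing := []map[string]string{critical}
	fmt.Println(inhibited(warning, firing, equal))   // true
	fmt.Println(inhibited(otherHost, firing, equal)) // false
}
```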

After editing the configuration, restart the service: systemctl restart alertmanager.service

If it fails to start, check the logs with journalctl -u alertmanager.service -f, paying particular attention to indentation errors.

Next, configure the alert rules. These are loaded by Prometheus, which forwards firing alerts to Alertmanager. We covered this in the previous article and created a rules folder to hold them, so simply follow the steps there.

2. Test email alert

Next, open the Prometheus Rules page at http://172.17.137.104:9090/rules to check that the alert rules have been loaded.

Prometheus alerts have three states: Inactive, Pending, and Firing.

  • Inactive: The rule is being evaluated, but its condition is not met, so no alert is active.
  • Pending: The condition is met, but has not yet held for the duration set in the rule's for clause. If it stays true for that long, the alert moves to Firing.
  • Firing: The condition has held for the whole for duration, so Prometheus sends the alert to Alertmanager, which notifies the configured receivers after applying grouping, inhibition, and silences. When the condition clears, the alert returns to Inactive and the cycle repeats.

Next, stop GreatSQL so that mysql_up = 0 triggers the alert rule, and check whether an alert email is sent.

$ systemctl stop greatsql

After the service stops, the alert on the page changes from green Inactive to yellow Pending, waits there for the rule's for duration, and then turns red Firing, at which point Prometheus sends the alert to Alertmanager. Alertmanager then emails the recipients according to its configuration.

[Figure: yellow, Pending]

[Figure: red, Firing]

The alert email then arrives:

The default email template puts the previously configured Labels and Annotations into the subject and body. The email is re-sent every 5m until the service recovers and the alert clears, at which point a resolved notification is also sent.

While the service remains down, an alert email is sent every 5m; this is controlled by route -> repeat_interval: 5m in alertmanager.yml.

3. Change the AlertManager email content

This step is optional; follow it if you want the email content to look more polished and be easier to read.

The default email already contains all the core information, but its layout can be improved, and AlertManager supports custom email templates.

Create a new template file named email.tmp:

$ vim /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/email.tmp

Add the following content:

{{ define "email.from" }}your_email@qq.com{{ end }}
{{ define "email.to" }}your_email@qq.com{{ end }}
{{ define "email.to.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
<h2>@Alert notification</h2>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
<h2>@Alert resolved</h2>
Alert source: prometheus_alert <br>
Affected host: {{ .Labels.instance }}<br>
Summary: {{ .Annotations.summary }}<br>
Details: {{ .Annotations.description }}<br>
Triggered at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }}<br>
Resolved at: {{ .EndsAt.Local.Format "2006-01-02 15:04:05" }}<br>
{{ end }}{{ end -}}
{{- end }}

The template file above defines three templates, email.from, email.to, and email.to.html, which can be referenced directly in alertmanager.yml.

Here email.to.html is the body of the email to be sent. Both HTML and text formats are supported; HTML is used here for a cleaner display. The {{ range ... }} blocks are loops that iterate over the matching alerts. The fields shown are the same as in the default email, with only the core values extracted. Next, add the templates setting to alertmanager.yml as follows:

$ vim /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_from: 'your_email@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'your_email@qq.com'
  smtp_auth_password: 'your_QQ_mail_authorization_code'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/usr/local/prometheus/alertmanager-0.26.0.linux-amd64/email.tmp'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Then we simulate a GreatSQL outage to trigger the alarm rule and see if an alarm email is sent.

No problem. I have successfully received the alert email.

That's the end of the Prometheus+Grafana+GreatSQL performance monitoring system construction guide. Go give it a try~


Enjoy GreatSQL :)

About GreatSQL

GreatSQL is a domestic, independent, open-source database suitable for financial-grade applications. Its core features include high performance, high reliability, high ease of use, and high security. It can be used as an optional replacement for MySQL or Percona Server in online production environments, and it is completely free and compatible with MySQL and Percona Server.

Related links: GreatSQL Community Gitee GitHub Bilibili

GreatSQL Community:

Community reward suggestions and feedback: https://greatsql.cn/thread-54-1-1.html

Community blog prize-winning submission details: https://greatsql.cn/thread-100-1-1.html

(If you have any questions about the article or have unique insights, you can go to the official community website to ask or share them~)

Technical exchange group:

WeChat & QQ group:

QQ group: 533341697

WeChat group: Add GreatSQL Community Assistant (WeChat ID: wanlidbc) as a friend and wait for the community assistant to add you to the group.

Origin my.oschina.net/GreatSQL/blog/10120061