1. Introduction to the environment
The database used in this article is GreatSQL 8.0.32-24, and the environment is as follows:
$ cat /etc/system-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
$ uname -a
Linux gip 3.10.0-1160.el7.x86_64 #1 SMP Tue Aug 18 14:50:17 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ldd --version
ldd (GNU libc) 2.17
In the previous article, [Combined Picture and Text | Prometheus+Grafana+GreatSQL Performance Monitoring System Construction Guide (Part 1)], we introduced how to build the monitoring system. This article introduces how to use the alerting features of the Grafana platform and of the AlertManager module.
2. Grafana email alert
Here we take email alerts as an example, using a QQ mailbox.
1. Enable email service
After logging in to your QQ mailbox, click Settings -> Account -> Enable POP3/IMAP/SMTP/Exchange/CardDAV/CalDAV service.
I have already enabled it; if yours is not enabled yet, enable it now. Then click Generate authorization code (生成授权码), remember to save the authorization code, and then modify the Grafana email configuration.
$ vim /usr/local/prometheus/grafana-10.1.1/conf/defaults.ini
Search for /smtp in vim to jump to the mail settings section, and modify it as shown in the example below.
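For reference, the relevant keys in the [smtp] section look roughly like this (a minimal sketch using Grafana's standard smtp options; the addresses and the authorization code are placeholders you must replace):

[smtp]
enabled = true
host = smtp.qq.com:465
user = your_email@qq.com
# use the QQ mailbox authorization code here, not the login password
password = your_authorization_code
from_address = your_email@qq.com
from_name = Grafana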
Restart Grafana service
$ systemctl restart grafana-server.service
Next, log in to the Grafana web page at http://172.17.137.104:3000/ and add an email contact point.
Fill in Name, Addresses, and the other fields, then click Test.
You can see that the email has been received and the test was successful.
2. Add alarm rules
Alert rules can be added in Grafana. For example, I created a GreatSQL connection panel that monitors the mysql_up metric; if its value is 0, GreatSQL cannot be connected to.
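Before wiring up the alert, you can sanity-check the metric in the Prometheus expression browser (http://172.17.137.104:9090/graph in this article's environment), assuming the exporter set up in Part 1 is running:

mysql_up
# 1 = the exporter can connect to GreatSQL; 0 = it cannot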
Entering the panel editor, you can see an Alert tab; click Create alert rule from this panel.
This opens the alert rule settings panel.
First, the first part is setting the alert rule name (设置警报规则名称).
(first part)
The second part is the query whose data we display, plus Expressions, where the alert condition is set.
(the second part)
The first item, last(), takes the latest data point; there are many other options, such as max() for the maximum value. Generally we choose last().
The second item indicates which query the value comes from; since we only have one, choose A.
The third item sets the threshold value to monitor against, and the condition type is the option before it. IS ABOVE in the picture means above the given value; the other options are IS BELOW (below the value), IS OUTSIDE RANGE (outside a range), IS WITHIN RANGE (within a range), and HAS NO VALUE (no value at all). Here we choose IS BELOW with a threshold of 1, so the combined expression triggers when the mysql_up value is less than 1.
The third part is choosing the Folder where the rule will be stored and the Evaluation group to evaluate it in.
Rules in the same group are evaluated sequentially, over the same time interval.
(the third part)
The Pending period sets how long the condition must keep holding after it is first met before the alert actually fires.
The fourth section is used to add annotations: Summary, a short summary of what happened and why; Description, a description of what the alert rule does; and Runbook URL, a link to a webpage holding the alert's runbook.
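For example, the annotations might be filled in like this (the values and the URL are purely illustrative):

Summary:      GreatSQL instance is down
Description:  mysql_up is 0, so Grafana cannot connect to GreatSQL
Runbook URL:  https://example.com/runbooks/greatsql-down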
The fifth part configures notifications: custom labels can be added here to change how notifications are routed. If no matching policy is set, all alert instances are handled by the default policy.
After clicking Save rule in the upper right corner, you can see the alert rule you just created on the page.
3. Test email alert
Now simulate a GreatSQL outage to see if the alarm rule will be triggered and an email alarm will be sent.
$ systemctl stop greatsql
Because we set the evaluation interval to 1 minute, it takes up to 1 minute before the GreatSQL connection status is checked again.
As you can see in the picture below, GreatSQL has been detected as unreachable and the alert has entered the Pending state.
After the set pending period, the status shows Firing, indicating that the email has been sent.
You can see that the alarm email has been received in the QQ mailbox.
Then we start GreatSQL again
$ systemctl start greatsql
When the problem is resolved, you will also receive a resolved email ("1 resolved instances").
3. Grafana’s DingTalk Alert
I already covered DingTalk alerts in the earlier article Prometheus+Grafana+DingTalk: Deploying a Stand-alone MySQL Monitoring and Alarm System, but the alerting module used there was Prometheus's Alertmanager, not Grafana, so here is an introduction to configuring DingTalk alerts with Grafana.
Note, however, that since September 1, 2023, DingTalk no longer supports creating custom robots for either internal or external groups. You need to log in to the DingTalk developer backend, apply for developer permissions, and create an internal application robot. The specific steps are not covered here; if you need them, DingTalk provides a detailed introduction.
Following the method in Prometheus+Grafana+DingTalk: Deploying a Stand-alone MySQL Monitoring and Alarm System, first create a DingTalk robot, and then add a contact point (Contact points) in Grafana.
Next, fill in Name, find DingDing in the Integration drop-down box, and fill in the DingTalk robot's Webhook URL.
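A custom robot's Webhook URL follows DingTalk's standard format; the access_token below is a placeholder:

https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxx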
Message Type has two options, card mode and link mode; there are also the Title and Message fields. Then click Test to check whether the alert message can be sent; the DingTalk robot will post the test alert. If everything looks fine, click the blue Save contact point button below.
If you want DingTalk to be used for alerts, go to Notification policies, select Edit, change the Default contact point to the DingTalk contact point, and click Update default policy after modifying.
Next, for testing, we stop GreatSQL to simulate an outage and see whether an alert message is sent.
$ systemctl stop greatsql
No problem, the alarm message was successfully received
4. Alertmanager email alerts
Do you still remember the Alertmanager we installed in the previous article? It also provides alerting. Prometheus includes an alerting module, which is the AlertManager: it receives the alert information sent by Prometheus, supports a variety of notification channels, and makes it easy to deduplicate, denoise, and group alerts. In the previous article we also added some rules; if you have forgotten them, re-read that article. Alertmanager can also send DingTalk alerts, which is covered in Prometheus+Grafana+DingTalk: Deploying a Stand-alone MySQL Monitoring and Alarm System. Here we introduce Alertmanager's email alerts.
(The process of Prometheus triggering an alarm)
1. Configure AlertManager
The default configuration file of AlertManager is alertmanager.yml, and its path is /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/alertmanager.yml.
Next, let's configure email notification of alerts, again taking QQ mailbox as an example. The configuration is as follows:
global:
  resolve_timeout: 5m
  smtp_from: 'your_email@qq.com'                  # the sending QQ mailbox
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'your_email@qq.com'
  smtp_auth_password: 'your_authorization_code'   # the QQ mailbox authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'your_email@qq.com'                   # the receiving mailbox
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Note that YAML is strict about indentation, so please check it carefully.
A detailed introduction to the key parameters:
global
Global configuration, mainly setting the alert notification methods, such as email, webhook, etc.
- resolve_timeout: timeout after which an alert is considered resolved; default 5m
- smtp_auth_password: fill in the QQ mailbox authorization code, not the QQ account login password
- smtp_require_tls: whether to use TLS; turn it on or off depending on your environment. If the error email.loginAuth failed: 530 Must issue a STARTTLS command first is reported, it needs to be set to true. Note that if TLS is enabled and the error starttls failed: x509: certificate signed by unknown authority is reported, you need to add tls_config: insecure_skip_verify: true under the email_configs entry to skip certificate verification (see the sketch after this list).

route
Sets the distribution policy for alerts
- group_by: used for grouping and aggregation. Alert notifications are grouped by label; alerts with the same labels or the same alert name (alertname) are aggregated into one group and sent as a single notification. To disable aggregation entirely, set group_by: ['...']
- group_wait: when a new alert group is created, wait group_wait before sending the initial notification. This ensures that more alerts with the same labels can be aggregated and merged into a single notification before sending.
- group_interval: after the first notification for a group has been sent, wait group_interval before sending notifications for new alerts of that group received in a new evaluation cycle. A group can be loosely thought of as a channel.
- repeat_interval: after a notification has been sent successfully, how long to wait before sending it again if the problem has not recovered.
- receiver: the receiver of alert messages, corresponding to the receivers configuration below; common notification methods include email, wechat, slack, webhook, etc.

receivers
Configures the alert receivers
- to: the mailbox that receives alert emails
- send_resolved: send a notification after the fault recovers

inhibit_rules
Suppression rules: alerts matching one set of matchers (target) are muted while alerts matching another set (source) are firing
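For reference, a minimal sketch of where that skip-verify option sits, following the Alertmanager email_config schema (the address is a placeholder):

receivers:
  - name: 'email'
    email_configs:
      - to: 'your_email@qq.com'
        send_resolved: true
        tls_config:
          # skip certificate verification; only use this to work around
          # the x509 unknown-authority error in test environments
          insecure_skip_verify: true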
After the configuration is complete, restart the service:
$ systemctl restart alertmanager.service
If startup fails, you can troubleshoot it with journalctl -u alertmanager.service -f, and pay particular attention to indentation issues!
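Before restarting, you can also validate the file with amtool, which ships in the Alertmanager tarball (path assumed to match this article's layout):

$ cd /usr/local/prometheus/alertmanager-0.26.0.linux-amd64
$ ./amtool check-config alertmanager.yml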
Next, you can configure the alert rules that Prometheus evaluates and forwards to AlertManager. We also covered this in the previous article, where we created a rules folder to store the rule files, so just follow the method described there.
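As a refresher, here is a minimal sketch of such a rule file; the path, group name, alert name, and threshold are illustrative assumptions, and the actual rules from the previous article may differ:

# e.g. /usr/local/prometheus/rules/greatsql.yml, loaded via rule_files in prometheus.yml
groups:
  - name: greatsql-alerts
    rules:
      - alert: GreatSQLDown
        expr: mysql_up == 0          # the exporter cannot connect to GreatSQL
        for: 1m                      # stay Pending for 1m before Firing
        labels:
          severity: critical
        annotations:
          summary: "GreatSQL instance {{ $labels.instance }} is down"
          description: "mysql_up has been 0 for more than 1 minute."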
2. Test email alert
Next, log in to the Prometheus Rules page at http://172.17.137.104:9090/rules to check whether the alert rules have been added.
Here is a quick description of the Prometheus alert states. An alert has three states: Inactive, Pending, and Firing.
- Inactive: the inactive state; the rule is being evaluated, but no alert has been triggered yet.
- Pending: the alert condition has been met, but the configured wait period (the rule's for duration) has not yet elapsed; once it passes, the alert moves to the Firing state.
- Firing: the alert is sent to AlertManager, which delivers it to all configured receivers. Once the condition clears, the state returns to Inactive, and the cycle repeats.
Next, we stop GreatSQL so that mysql_up = 0 triggers the alert rule, and see whether an alert email is sent.
$ systemctl stop greatsql
After stopping the service, the alert on the page changes from the green Inactive state to the yellow Pending state, and after the wait period it changes to the red Firing state, at which point the alert is sent to AlertManager. AlertManager then sends email alerts to the recipients according to the configured rules.
(Yellow: Pending)
(Red: Firing)
Then we received the warning email
As you can see from the picture above, the default email template's Title and Body include the previously configured Labels and Annotations information. The email is re-sent automatically every 5m until the service returns to normal and the alert is lifted, at which point a resolved notification is sent.
The 5m re-send interval (while the service has not recovered) is determined by route -> repeat_interval: 5m in alertmanager.yml.
3. Change the AlertManager email content
This step is optional. Although the default emails already include all the core information, the format can be made more elegant and intuitive: AlertManager supports custom email template configurations.
We need to create a new template file, called email.tmpl:
$ vim /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/email.tmpl
Write the following
{{ define "email.from" }}your_email@qq.com{{ end }}
{{ define "email.to" }}your_email@qq.com{{ end }}
{{ define "email.to.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}{{ range .Alerts.Firing }}
<h2>@Alert Notification</h2>
Alerting program: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Description: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }} <br>
{{ end }}{{ end -}}
{{- if gt (len .Alerts.Resolved) 0 -}}{{ range .Alerts.Resolved }}
<h2>@Alert Resolved</h2>
Alerting program: prometheus_alert <br>
Affected host: {{ .Labels.instance }}<br>
Summary: {{ .Annotations.summary }}<br>
Description: {{ .Annotations.description }}<br>
Triggered at: {{ .StartsAt.Local.Format "2006-01-02 15:04:05" }}<br>
Resolved at: {{ .EndsAt.Local.Format "2006-01-02 15:04:05" }}<br>
{{ end }}{{ end -}}
{{- end }}
The template file above defines three templates, email.from, email.to, and email.to.html, which can be referenced directly in alertmanager.yml. Here email.to.html is the body of the email to be sent; both HTML and Text formats are supported, and HTML is used here so the information displays nicely. {{ range .Alerts.Firing }} is loop syntax, used to iterate over the matching alerts. The fields below it are the same information as in the default email shown above, except that only the core values are extracted for display. Then, add the templates configuration to the alertmanager.yml file as follows:
$ vim /usr/local/prometheus/alertmanager-0.26.0.linux-amd64/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 'your_email@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'your_email@qq.com'
  smtp_auth_password: 'your_authorization_code'   # the QQ mailbox authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/usr/local/prometheus/alertmanager-0.26.0.linux-amd64/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: '{{ template "email.to" . }}'
        html: '{{ template "email.to.html" . }}'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
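After editing the configuration, AlertManager has to pick up the change. A restart works; alternatively, Alertmanager supports a hot reload via its lifecycle endpoint (assuming the default listen port 9093):

$ systemctl restart alertmanager.service
# or, without restarting:
$ curl -X POST http://localhost:9093/-/reload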
Then we simulate a GreatSQL outage to trigger the alarm rule and see if an alarm email is sent.
No problem. I have successfully received the alert email.
Okay, that's the end of the Prometheus+Grafana+GreatSQL performance monitoring system construction guide. Now go build your own~
Enjoy GreatSQL :)
About GreatSQL
GreatSQL is an independently developed open source database from China, suitable for financial-grade applications. It offers core features such as high performance, high reliability, high usability, and high security, can be used as an optional replacement for MySQL or Percona Server in online production environments, and is completely free and compatible with MySQL and Percona Server.
Related links: GreatSQL Community Gitee GitHub Bilibili
GreatSQL Community:
Community reward suggestions and feedback: https://greatsql.cn/thread-54-1-1.html
Community blog prize-winning submission details: https://greatsql.cn/thread-100-1-1.html
(If you have any questions about the article or have unique insights, you can go to the official community website to ask or share them~)
Technical exchange group:
WeChat & QQ group:
QQ group: 533341697
WeChat group: Add GreatSQL Community Assistant (WeChat ID: wanlidbc) as a friend and wait for the community assistant to add you to the group.