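For work reasons our company uses the kube-prometheus bundle rather than a binary installation, which saves a lot of effort. We still hit quite a few pitfalls during testing, so I am recording them here.
1 Writing Prometheus rules
Most of these rules can be found online; here I only record how to write them the kube-prometheus way.
Edit prometheus-rules.yaml (the file is in the /manifests directory of the cloned kube-prometheus repository) and append the content below at the very bottom. So far it detects the running state of Pods and the state of nodes; other rules follow the same pattern.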
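    # Detect pod status (indent to match the existing rules list in the file)
    - alert: pod-status
      annotations:
        message: pod-{{ $labels.pod }} failure
      expr: |
        kube_pod_container_status_running != 1
      for: 1m
      labels:
        severity: warning
    # Detect node status
    - alert: node-status
      annotations:
        # kube-state-metrics usually exposes the node name as the `node` label;
        # switch to {{ $labels.node }} if `hostname` renders empty.
        message: node-{{ $labels.hostname }} failure
      expr: |
        kube_node_status_condition{status="unknown",condition="Ready"} == 1
      for: 1m
      labels:
        severity: warning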
Save and exit.
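Before adding an expression to prometheus-rules.yaml, it can help to run it against the Prometheus HTTP API and look at the series it returns. The following is a minimal sketch and not part of the original setup: it assumes Prometheus is reachable at the address in PROM_URL (for example through a port-forward to the prometheus-k8s service), and the helper names build_query_url, matching_series, and run_query are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. after
# `kubectl -n monitoring port-forward svc/prometheus-k8s 9090`.
PROM_URL = "http://localhost:9090"


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def matching_series(response):
    """Return the label sets of all series in an instant-query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    return [item["metric"] for item in response["data"]["result"]]


def run_query(expr, base_url=PROM_URL):
    """Run the expression and return the label sets it currently matches."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return matching_series(json.load(resp))
```

Calling run_query("kube_pod_container_status_running != 1") lists the containers that would currently satisfy the pod-status expression; an empty list means nothing would fire.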
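Next, edit alertmanager-secret.yaml. This file configures notifications by email or DingTalk. I alert through DingTalk; email is configured as well but not used, since hardly anyone reads email these days. Replace the original content with the following: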
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 1m                          # resolve timeout
      smtp_smarthost: 'smtp.9icaishi.net:25'       # SMTP server (smarthost)
      smtp_from: '[email protected]'                # sender address
      smtp_auth_username: '[email protected]'       # mailbox user name
      smtp_auth_password: 'Zabbix9icaishi2015'     # authorization password
      smtp_require_tls: false                      # TLS disabled (it is enabled by default)
    receivers:
    - name: 'webhook'
      webhook_configs:
      # DingTalk alert endpoint. It is deployed separately later, because DingTalk
      # cannot parse Alertmanager's default payload and it has to be converted.
      - url: 'http://webhook-dingtalk/dingtalk/send/'
        send_resolved: true
    route:
      group_interval: 1m     # wait before sending notifications about new alerts in a group
      group_wait: 10s        # initial wait before the first notification for a group
      receiver: webhook
      repeat_interval: 1m    # interval for re-sending a notification
type: Opaque
Save and exit.
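With this configuration Alertmanager POSTs a JSON document to the webhook URL for every notification. The sketch below builds such a document by hand, so the receiver can be exercised without waiting for a real alert. The field names follow Alertmanager's webhook format; the label values, timestamps, and the helper name sample_payload are made-up examples.

```python
import json


def sample_payload(status="firing"):
    """Build a hand-written payload shaped like an Alertmanager webhook POST."""
    alert = {
        "status": status,
        "labels": {"alertname": "pod-status", "severity": "warning", "pod": "demo-pod"},
        "annotations": {"message": "pod-demo-pod failure"},
        "startsAt": "2020-01-01T08:00:00Z",
        # A zero timestamp stands in for "not ended yet" while the alert is firing.
        "endsAt": "2020-01-01T08:05:00Z" if status == "resolved" else "0001-01-01T00:00:00Z",
    }
    return {
        "version": "4",
        "status": status,
        "receiver": "webhook",
        "groupLabels": {"alertname": "pod-status"},
        "commonLabels": alert["labels"],
        "commonAnnotations": alert["annotations"],
        "alerts": [alert],
    }


# Serialize a "resolved" notification, as sent when send_resolved is true.
body = json.dumps(sample_payload("resolved"))
print(body)
```

The printed body can be POSTed to /dingtalk/send/ with any HTTP client to check how the receiver renders firing and resolved alerts.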
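Now deploy a pod that handles the DingTalk alerts.
Thanks to the author of http://www.mamicode.com/info-detail-2845201.html. Building on that work, I customized the alert script so the message matches our company's alert content. The before and after screenshots:
Original alert (screenshot):
After the change (screenshot):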
The customized script, app.py, is as follows:
#!/usr/bin/env python
import io
import json
import logging
import sys

import arrow
import requests
from flask import Flask, request

# Force UTF-8 on stdout/stderr so non-ASCII log output does not raise encoding errors.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

app = Flask(__name__)

# Log to the console so messages show up in the container logs.
console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
formatter = logging.Formatter(fmt)
console.setFormatter(formatter)
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)

EXCLUDE_LIST = ['prometheus', 'endpoint']

@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'


@app.route('/dingtalk/send/', methods=['POST'])
def hander_session():
    # The DingTalk robot URL is passed as the first command-line argument.
    profile_url = sys.argv[1]
    post_data = request.get_data()
    post_data = json.loads(post_data.decode("utf-8"))['alerts']
    # Only the first alert in the notification is forwarded.
    post_data = post_data[0]
    time_fmt = 'YYYY-MM-DD HH:mm:ss ZZ'
    starts_at = arrow.get(post_data['startsAt']).to('Asia/Shanghai').format(time_fmt)
    messa_list = ['### Alert name: Prometheus-alert']
    if post_data['status'].upper() == "FIRING":
        messa_list.append('**Alert status: firing**')
        messa_list.append('**Alert time: %s**' % starts_at)
    else:
        ends_at = arrow.get(post_data['endsAt']).to('Asia/Shanghai').format(time_fmt)
        messa_list.append('**Alert status: resolved**')
        messa_list.append('**Alert time: %s**' % starts_at)
        messa_list.append('**Resolved time: %s**' % ends_at)
    messa_list.append('**Alert severity: %s**' % post_data['labels']['severity'])
    messa_list.append('**Alert type: %s**' % post_data['labels']['alertname'])
    messa_list.append('**Alert details: %s**' % post_data['annotations']['message'])
    # Real newlines here; json.dumps in alert_data escapes them.
    messa = ' \n\n > '.join(messa_list)
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status


def alert_data(data, title, profile_url):
    headers = {'Content-Type': 'application/json'}
    # Build the body with json.dumps so quotes in the alert text cannot break the JSON.
    payload = {"msgtype": "markdown", "markdown": {"title": title, "text": data}}
    send_data = json.dumps(payload, ensure_ascii=False).encode('utf-8')
    reps = requests.post(url=profile_url, data=send_data, headers=headers)
    return reps.text


if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
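The two parts of the script that are easiest to get wrong are the time conversion and the JSON body sent to DingTalk. Below is a minimal, standard-library-only sketch of both. It swaps arrow for datetime with a fixed UTC+8 offset, so the offset prints as +0800 rather than in arrow's format, and it builds the body with json.dumps so it stays valid JSON when the alert text contains quotes. The function names to_shanghai and dingtalk_body are hypothetical, and the sketch assumes timestamps are in UTC with a trailing Z.

```python
import json
from datetime import datetime, timedelta, timezone

# Fixed offset used as a stand-in for the Asia/Shanghai zone (an assumption of this sketch).
SHANGHAI = timezone(timedelta(hours=8))


def to_shanghai(timestamp):
    """Convert a UTC timestamp such as 2020-01-01T08:00:00.5Z to UTC+8 text."""
    # Drop the trailing Z and any fractional seconds before parsing.
    seconds = timestamp.rstrip("Z").split(".")[0]
    utc = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(SHANGHAI).strftime("%Y-%m-%d %H:%M:%S %z")


def dingtalk_body(title, lines):
    """Serialize a DingTalk markdown message as UTF-8 encoded JSON."""
    text = " \n\n > ".join(lines)
    payload = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = dingtalk_body("pod-status", [
    "### Alert name: Prometheus-alert",
    "**Alert time: %s**" % to_shanghai("2020-01-01T08:00:00Z"),
    '**Alert details: pod "demo" failed**',
])
print(body.decode("utf-8"))
```

Because the body goes through json.dumps, the double quotes around demo arrive at DingTalk intact instead of terminating the JSON string early.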
Finally, rebuild the image.
The Dockerfile is as follows:
# Build stage: install Python 3.6 and the script's dependencies
FROM centos:7 as build
MAINTAINER billy98 [email protected]
RUN mkdir /root/.pip
ADD pip.conf /root/.pip/pip.conf
RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && \
    yum install -y python36 python36-pip && \
    pip3.6 install flask requests werkzeug arrow
ADD app.py /usr/local/alert-dingtalk.py
# Runtime stage: copy only the script and the installed packages
FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build /usr/local/lib64/python3.6/site-packages /usr/local/lib64/python3.6/site-packages
COPY --from=build /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=/usr/local/lib/python3.6/site-packages:/usr/local/lib64/python3.6/site-packages
EXPOSE 8080
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]
Finally, put the image into a Kubernetes manifest of your own (dingding.yaml or any other file name) and deploy it.
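The post does not show dingding.yaml itself. As a rough sketch only, the snippet below generates a Deployment and a Service as JSON, which kubectl apply -f accepts as well as YAML. The Service name webhook-dingtalk and the monitoring namespace are inferred from the Alertmanager webhook URL; the image name, the access token placeholder, and the app label are assumptions to be replaced with real values.

```python
import json

NAME = "webhook-dingtalk"   # must match the host in the Alertmanager webhook URL
NAMESPACE = "monitoring"    # same namespace as Alertmanager, so the short name resolves
IMAGE = "registry.example.com/webhook-dingtalk:latest"  # hypothetical image name
# Placeholder robot URL; the real access token comes from the DingTalk group robot.
DINGTALK_URL = "https://oapi.dingtalk.com/robot/send?access_token=REPLACE_ME"


def build_manifests():
    """Return a List object holding the Deployment and the Service."""
    labels = {"app": NAME}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "namespace": NAMESPACE},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": NAME,
                    "image": IMAGE,
                    # Appended to the image ENTRYPOINT; app.py reads it as sys.argv[1].
                    "args": [DINGTALK_URL],
                    "ports": [{"containerPort": 8080}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": NAME, "namespace": NAMESPACE},
        "spec": {
            "selector": labels,
            # Port 80 because the webhook URL has no explicit port; Flask listens on 8080.
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


print(json.dumps(build_manifests(), indent=2))
```

Redirecting the output to a file and passing that file to kubectl apply -f would create both objects; the same structure can be written by hand as YAML if that is preferred.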
For more on k8s and automated operations, visit www.wangshuying.cn, which covers many more operations topics.