k8s Prometheus alert rules and Alertmanager alerting configuration

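For work, our company uses the kube-prometheus bundle rather than installing from binaries, which saves a lot of effort. I still hit quite a few pitfalls while testing, so I am recording them here.
1 Writing Prometheus rules
Most of these rules can be found online; what I record here is only how to add them the kube-prometheus way.
Edit prometheus-rules.yaml (found in the /manifests directory of the cloned kube-prometheus repository) and append the following at the very bottom. So far it checks Pod running state and node state; other rules follow the same pattern.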
  # Check pod status
  - alert: pod-status
    annotations:
      message: pod-{{ $labels.pod }} failure
    expr: |
      kube_pod_container_status_running != 1
    for: 1m
    labels:
      severity: warning
  # Check node status
  - alert: node-status
    annotations:
      # Assumes a hostname label is present; kube-state-metrics exposes the node name as "node" by default.
      message: node-{{ $labels.hostname }} failure
    expr: |
      kube_node_status_condition{status="unknown",condition="Ready"} == 1
    for: 1m
    labels:
      severity: warning
Save and exit.
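Before waiting for an alert to fire, it can help to run the rule expression against Prometheus directly and see which series match. The following is a minimal sketch using the standard Prometheus HTTP API endpoint /api/v1/query. The base URL http://localhost:9090 is an assumption (for example after port-forwarding the Prometheus service), and the helper names build_query_url and run_query are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, expr):
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def run_query(base_url, expr, timeout=5):
    """Run the expression and return the list of matching series."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        body = json.load(resp)
    return body["data"]["result"]


# Example call (needs a reachable Prometheus, so it is left commented out):
# for series in run_query("http://localhost:9090", "kube_pod_container_status_running != 1"):
#     print(series["metric"].get("pod"), series["value"])
```

Any series returned for the pod-status expression is a container that would trigger the alert once it has matched for the full 1m.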
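Next, edit alertmanager-secret.yaml, which configures notification by email or DingTalk. I alert through DingTalk; email is configured as well but unused, since hardly anyone reads email these days. Replace the original contents with the following: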
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 1m # resolve timeout
      smtp_smarthost: 'smtp.9icaishi.net:25' # SMTP server
      smtp_from: '[email protected]' # sender address
      smtp_auth_username: '[email protected]' # SMTP username
      smtp_auth_password: 'Zabbix9icaishi2015' # authorization password
      smtp_require_tls: false # TLS disabled (it is enabled by default)
    receivers:
    - name: 'webhook'
      webhook_configs:
      # DingTalk alert endpoint. It is deployed separately later, because DingTalk
      # cannot parse Alertmanager's default payload and it has to be converted first.
      - url: 'http://webhook-dingtalk/dingtalk/send/'
        send_resolved: true
    route:
      group_interval: 1m # wait before notifying about new alerts added to a group
      group_wait: 10s # initial wait before the first notification for a group
      receiver: webhook
      repeat_interval: 1m # how often a still-firing alert is re-sent
type: Opaque
Save and exit.
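The three route timings are easy to mistype, and the route's receiver must match a name under receivers. As a rough sanity check, the sketch below converts Prometheus-style durations to seconds and verifies the receiver name. The dictionary mirrors the values above by hand; parse_duration and check_route are hypothetical helpers, not part of Alertmanager.

```python
import re

# Seconds per unit for Prometheus-style durations such as "10s", "1m", "1h30m".
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration string to seconds, raising ValueError if malformed."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("bad duration: %r" % text)
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    return total


def check_route(route, receiver_names):
    """Return the route timings in seconds after checking the receiver exists."""
    if route["receiver"] not in receiver_names:
        raise ValueError("unknown receiver: %s" % route["receiver"])
    keys = ("group_wait", "group_interval", "repeat_interval")
    return {key: parse_duration(route[key]) for key in keys}


# Values copied by hand from the route section above.
ROUTE = {"group_wait": "10s", "group_interval": "1m",
         "repeat_interval": "1m", "receiver": "webhook"}
print(check_route(ROUTE, ["webhook"]))
```

With repeat_interval at 1m, a still-firing alert is re-sent roughly every minute, which suits testing but is likely too chatty for production.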
Now deploy a pod for the DingTalk alerts.
Thanks to the author of http://www.mamicode.com/info-detail-2845201.html. Building on that post, I customized the alert script to match our company's alert content. The screenshots compared the two:
Original alert message: [image not preserved]
Customized alert message: [image not preserved]
The customized script, app.py, is as follows:
        
#!/usr/bin/env python
import io
import json
import logging
import sys

import arrow
import requests
from flask import Flask, request

# Force UTF-8 on stdout/stderr so non-ASCII log lines do not raise encoding errors.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

app = Flask(__name__)

console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
formatter = logging.Formatter(fmt)
console.setFormatter(formatter)
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)

EXCLUDE_LIST = ['prometheus', 'endpoint']


@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'


def local_time(timestamp):
    # Render an Alertmanager timestamp in Shanghai time.
    return arrow.get(timestamp).to('Asia/Shanghai').format('YYYY-MM-DD HH:mm:ss ZZ')


@app.route('/dingtalk/send/', methods=['POST'])
def hander_session():
    # The DingTalk robot URL is passed as the first command-line argument.
    profile_url = sys.argv[1]
    post_data = json.loads(request.get_data().decode("utf-8"))['alerts']
    # Only the first alert of each notification is forwarded.
    post_data = post_data[0]
    firing = post_data['status'].upper() == "FIRING"
    messa_list = []
    messa_list.append('### Alert name: Prometheus-alert')
    messa_list.append('**Alert status: %s**' % ('firing' if firing else 'resolved'))
    messa_list.append('**Alert time: %s**' % local_time(post_data['startsAt']))
    if not firing:
        messa_list.append('**Recovery time: %s**' % local_time(post_data['endsAt']))
    messa_list.append('**Severity: %s**' % post_data['labels']['severity'])
    messa_list.append('**Alert type: %s**' % post_data['labels']['alertname'])
    messa_list.append('**Details: %s**' % post_data['annotations']['message'])
    messa = ' \n\n > '.join(messa_list)
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status


def alert_data(data, title, profile_url):
    headers = {'Content-Type': 'application/json'}
    # json.dumps escapes quotes and newlines; the hand-built JSON string broke on them.
    send_data = json.dumps({"msgtype": "markdown",
                            "markdown": {"title": title, "text": data}})
    reps = requests.post(url=profile_url, data=send_data.encode('utf-8'), headers=headers)
    return reps.text


if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
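The script forwards only the first entry of the alerts array, so when Alertmanager groups several alerts into one notification the rest are dropped. Below is a minimal sketch of building one message per alert, written with the standard library only so it can be tried without Flask or arrow. The function names, the fixed UTC+8 offset standing in for Asia/Shanghai, the English field labels, and the sample payload values are all assumptions for illustration; it also assumes timestamps end in Z (UTC).

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, standing in for the 'Asia/Shanghai' zone.
UTC8 = timezone(timedelta(hours=8))


def to_local(timestamp):
    """Convert a timestamp such as '2020-10-20T00:30:00.123Z' to UTC+8 text."""
    # Drop fractional seconds and the trailing 'Z' so strptime can parse the rest.
    trimmed = timestamp.split(".")[0].rstrip("Z")
    parsed = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.astimezone(UTC8).strftime("%Y-%m-%d %H:%M:%S")


def build_markdown(alert):
    """Build the DingTalk markdown text for a single alert."""
    firing = alert["status"].lower() == "firing"
    labels = alert.get("labels", {})
    lines = [
        "### Alert name: Prometheus-alert",
        "**Alert status: %s**" % ("firing" if firing else "resolved"),
        "**Alert time: %s**" % to_local(alert["startsAt"]),
    ]
    if not firing:
        lines.append("**Recovery time: %s**" % to_local(alert["endsAt"]))
    lines.append("**Severity: %s**" % labels.get("severity", "unknown"))
    lines.append("**Alert type: %s**" % labels.get("alertname", "unknown"))
    lines.append("**Details: %s**" % alert.get("annotations", {}).get("message", ""))
    return " \n\n > ".join(lines)


def build_messages(payload):
    """Return one markdown message per alert in the webhook payload."""
    return [build_markdown(alert) for alert in payload.get("alerts", [])]


# Sample payload trimmed to the fields used above (values are made up).
SAMPLE = {"alerts": [
    {"status": "firing",
     "labels": {"alertname": "pod-status", "severity": "warning"},
     "annotations": {"message": "pod-demo failure"},
     "startsAt": "2020-10-20T00:30:00.123456789Z",
     "endsAt": "0001-01-01T00:00:00Z"},
    {"status": "resolved",
     "labels": {"alertname": "node-status", "severity": "warning"},
     "annotations": {"message": "node-demo failure"},
     "startsAt": "2020-10-20T00:00:00Z",
     "endsAt": "2020-10-20T01:00:00Z"},
]}

for message in build_messages(SAMPLE):
    print(message)
```

Each returned string could then be posted to DingTalk in turn, or the strings could be joined into a single message to stay under the robot's rate limit.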
Finally, build a new image. The Dockerfile is as follows:
FROM centos:7 as build
MAINTAINER billy98 [email protected]
RUN mkdir /root/.pip
ADD pip.conf /root/.pip/pip.conf

# Install Python 3.6 from EPEL plus the libraries app.py imports.
RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && yum install -y python36 python36-pip && pip3.6 install flask requests werkzeug arrow
ADD app.py /usr/local/alert-dingtalk.py

FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build /usr/local/lib64/python3.6/site-packages /usr/local/lib64/python3.6/site-packages
COPY --from=build /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=/usr/local/lib/python3.6/site-packages:/usr/local/lib64/python3.6/site-packages
EXPOSE 8080
# app.py reads the DingTalk robot URL from sys.argv[1], so pass it as the container argument.
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]
Finally, wrap the image in a Kubernetes manifest (dingding.yaml, or any file name you like) and deploy it.
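Once the pod is running, a hand-made alert can be posted to the webhook to confirm that a message reaches DingTalk without waiting for a real failure. This is a rough sketch: the URL is the one set in alertmanager-secret.yaml and resolves only from inside the cluster, while the helper names and the payload values are made up for illustration.

```python
import json
import urllib.request

# URL taken from the webhook_configs entry in alertmanager-secret.yaml.
WEBHOOK_URL = "http://webhook-dingtalk/dingtalk/send/"


def build_test_request(url=WEBHOOK_URL):
    """Build a POST request carrying one fake firing alert."""
    payload = {"alerts": [{
        "status": "firing",
        "labels": {"alertname": "pod-status", "severity": "warning"},
        "annotations": {"message": "pod-test failure (manual test)"},
        "startsAt": "2020-10-20T00:30:00Z",
        "endsAt": "0001-01-01T00:00:00Z",
    }]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(url=WEBHOOK_URL, timeout=5):
    """Send the fake alert and return the response text relayed from DingTalk."""
    with urllib.request.urlopen(build_test_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Run from a pod inside the cluster, or against a port-forwarded address:
# print(send_test_alert())
```

The payload carries only the fields that app.py reads, so it is smaller than what Alertmanager actually sends.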
For more on k8s and operations automation, see www.wangshuying.cn, which covers many more operations topics.


Reposted from blog.51cto.com/461884/2542434