Alibaba's open-source event alerting component kube-eventer: five minutes of hands-on practice

  • Background: why event-based alerting

  • Alibaba's kube-eventer and the notification sinks it supports

  • Hands-on: connecting a k8s 1.16.3 cluster to the DingTalk robot

Background

Monitoring is an important part of guaranteeing system stability, and in the open-source Kubernetes ecosystem, resource monitoring tools and components are flourishing.

  • cAdvisor: built into the kubelet; monitors container resources such as CPU and memory;
  • kube-state-metrics: generates state metrics for resource objects by listening to the API Server, focusing on metadata such as Deployment and Pod replica status;
  • metrics-server: a cluster-wide aggregator of resource usage data and the replacement for Heapster; the K8S HPA component gets its data from metrics-server;

There are also node-exporter and various official and unofficial exporters, with Prometheus used for scraping, storage, alerting, and visualization. But these are not enough.
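As a quick illustration, the resource data aggregated by metrics-server is what backs `kubectl top` (this is a sketch; it assumes metrics-server is installed and the cluster is reachable):

```shell
# Node-level CPU and memory usage, served by metrics-server
kubectl top nodes

# Pod-level usage within a namespace
kubectl top pods -n kube-system
```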

Insufficient real-time accuracy
Most resource monitoring collects data offline, in push or pull mode, sampling once every fixed interval. If a glitch or anomaly occurs between two samples and the system has recovered by the next collection, most collection systems will swallow the exception. And for spike scenarios, the collection stage automatically smooths the data, reducing accuracy.

Insufficient scene coverage
Some scenarios cannot be expressed by resource metrics at all, such as a Pod starting and stopping: this is hard to measure with resource utilization, because when usage drops to zero we cannot distinguish the real cause of that state.

Given these two issues, how does Kubernetes address them?

Event Monitor

In Kubernetes, events are divided into two types. A Warning event indicates that the state transition producing it was unexpected; a Normal event indicates that the desired state was reached, i.e. the current state matches the expected one. Take creating a Pod as an example: the Pod first enters the Pending state and waits for its image to be pulled; when the pull and health checks complete, the Pod transitions to Running, and Normal events are generated along this lifecycle. If, while running, the Pod is killed due to OOM or another cause and enters the Failed state, that state is unexpected, and Kubernetes generates a Warning event. For such scenarios, monitoring events lets us promptly catch resource problems that are otherwise easy to overlook.

A standard Kubernetes event has the following important attributes, and alerting on them makes it easier to diagnose problems:

Namespace: the namespace of the object that generated the event.
Kind: the type of object the event is bound to, e.g. Node, Pod, Namespace, Component, and so on.
Timestamp: the time the event was generated.
Reason: the reason the event occurred.
Message: a detailed description of the event.
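These attributes map directly onto fields of the Event object. As a sketch (assuming a reachable cluster), the same columns can be pulled out with kubectl custom columns:

```shell
# Print namespace, bound object kind, timestamp, reason, and message for every event
kubectl get events --all-namespaces -o custom-columns=\
NAMESPACE:.metadata.namespace,\
KIND:.involvedObject.kind,\
TIME:.lastTimestamp,\
REASON:.reason,\
MESSAGE:.message
```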

[root@master work]# kubectl get event --all-namespaces 
NAMESPACE     LAST SEEN   TYPE      REASON          OBJECT                              MESSAGE
default       14m         Normal    Created         pod/busybox2                        Created container busybox
default       14m         Normal    Started         pod/busybox2                        Started container busybox
default       19m         Normal    Pulling         pod/litemall-all-584bfdcd99-q6wd2   Pulling image "litemall-all:2019-12-18-13-13-26"
default       24m         Warning   Failed          pod/litemall-all-584bfdcd99-q6wd2   Error: ErrImagePull
default       14m         Normal    BackOff         pod/litemall-all-584bfdcd99-q6wd2   Back-off pulling image "litemall-all:2019-12-18-13-13-26"
default       4m47s       Warning   Failed          pod/litemall-all-584bfdcd99-q6wd2   Error: ImagePullBackOff

In k8s 1.9.2, `kubectl get event` had a NAME column; in recent versions those columns are gone and the output shows more detailed information instead, which is quite convenient. I am not sure whether this counts as a k8s bug.
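Since alerting mostly cares about Warning events, they can be filtered server-side with a field selector, or streamed continuously:

```shell
# Only Warning events, across all namespaces
kubectl get events --all-namespaces --field-selector type=Warning

# Stream new events as they arrive
kubectl get events --all-namespaces --watch
```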

Alibaba's kube-eventer

For the Kubernetes event monitoring scenario, the community provided a simple event export capability in Heapster, which was later abandoned along with Heapster itself. To make up for this gap in event monitoring, Alibaba Cloud Container Service released and open-sourced kube-eventer, a tool for exporting Kubernetes events. It supports shipping events to the DingTalk robot, the Alibaba Cloud SLS log service, the open-source Kafka message queue, the InfluxDB time-series database, and more.

Project address

Alibaba Cloud Container Service kube-eventer

It supports the following notification sinks:

Sink            Description
dingtalk        DingTalk robot
sls             Alibaba Cloud SLS log service
elasticsearch   Elasticsearch
honeycomb       Honeycomb
influxdb        InfluxDB time-series database
kafka           Kafka message queue
mysql           MySQL database
wechat          WeChat

Hands-on practice

Next, let's practice connecting kube-eventer to DingTalk alerts.

1. First, you need a DingTalk group in which you have management permissions; open the group's intelligent assistant.
2. Add a robot.
3. Add a Custom robot (custom service access via Webhook).
4. Fill in the robot's name, security settings, and so on.
There are three kinds of security settings:

(1) Custom keywords: up to 10 keywords can be set, and a message is delivered only if it contains at least one of them. With the custom keyword cluster1 added, every message this robot sends must contain cluster1 to be delivered.
(2) Signing: use timestamp + "\n" + secret as the string to sign, compute an HmacSHA256 signature over it, Base64-encode the result, and finally urlEncode the signature parameter to obtain the final sign (using the UTF-8 character set).
(3) IP address (range): once set, only requests originating from within the configured IP range are processed. Both single IPs and IP ranges are supported; IPv6 whitelisting is not yet supported.

Above I used the custom-keyword method, with the keyword set to cluster1; it must match what we set when creating the YAML file below.

For more detailed examples, refer to the DingTalk API integration documentation.
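If the signing option from (2) is used instead of keywords, the sign can be computed with standard tools. This is a sketch using openssl; the secret value here is a placeholder, not a real robot secret:

```shell
# Millisecond timestamp and the robot's signing secret (placeholder value)
timestamp=$(( $(date +%s) * 1000 ))
secret="SEC0000000000000000"

# The string to sign is timestamp + "\n" + secret; the HMAC key is the secret itself
sign=$(printf '%s\n%s' "$timestamp" "$secret" \
  | openssl dgst -sha256 -hmac "$secret" -binary \
  | base64)

# $sign must still be URL-encoded before appending it to the webhook URL
# as ...&timestamp=<timestamp>&sign=<urlencoded sign>
echo "$sign"
```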

5. Copy the webhook URL.
6. Save the following YAML to kube-event.yaml. In the startup arguments, change --sink to the webhook address you just copied, set label to the custom keyword cluster1, and set level to Warning so that only Warning-level events trigger alerts.


apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:	
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: kube-eventer
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-63e7f98-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default"
            # e.g., dingtalk sink demo
            - --sink=dingtalk:https://oapi.dingtalk.com/robot/send?access_token=81kanbcl18sjambp9cb31o0k1jalh189asnxmafbf70933cb42978abd19d8fff7&label=cluster1&level=Warning
          env:
          # If TZ is assigned, set the TZ value as the time zone
          - name: TZ
            value: Asia/Shanghai
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: kube-eventer
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-eventer
  namespace: kube-system

7. Run kubectl apply -f kube-event.yaml.

When a Pod in the cluster restarts because of OOM, a failed image pull, a failed health check, or a similar error, the cluster administrator often never knows, because Kubernetes self-heals: when a Pod dies, a new one is started in its place. With event alerting, the administrator can discover and fix service problems promptly.
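To verify the deployment and trigger a test alert, one option (a sketch; the pod name and image here are arbitrary placeholders) is to check the eventer's logs and then create a pod with an image that cannot be pulled, which produces the Failed/BackOff Warning events shown earlier:

```shell
# Confirm kube-eventer is running and inspect its logs
kubectl -n kube-system get pods -l app=kube-eventer
kubectl -n kube-system logs deploy/kube-eventer

# Create a pod with a non-existent image to generate Warning events
kubectl run eventer-test --image=registry.example.com/no-such-image:latest
kubectl get events --field-selector type=Warning
```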

WeChat practice

Alibaba Cloud SLS practice

Elasticsearch practice

Honeycomb practice

InfluxDB practice

MySQL practice

Kafka practice

References

DingTalk robot integration guide
Kubernetes event offline tool kube-eventer officially open-sourced



Origin blog.csdn.net/ll837448792/article/details/103782038