Prometheus is a metrics monitoring system. Like Kubernetes, it is a CNCF (Cloud Native Computing Foundation) project, and it has become the core monitoring system of the popular Kubernetes ecosystem.
Prometheus collects all of its metrics by actively pulling (scraping) them from exporters over HTTP.
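To make the pull model concrete: a scrape is an HTTP GET against the exporter, and the response body is plain text in the Prometheus exposition format. The minimal sketch below, assuming the `prometheus_client` package used later in this article, renders a registry the same way an exporter would answer a scrape; the metric name `demo_up` is hypothetical.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry so only the demo metric is exported
registry = CollectorRegistry()
# Hypothetical metric, used only for this illustration
up = Gauge("demo_up", "Whether the demo target is up", registry=registry)
up.set(1)

# generate_latest() renders the registry in the text exposition format;
# this is the kind of body Prometheus receives when it scrapes an exporter.
body = generate_latest(registry).decode("utf-8")
print(body)
```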
Prometheus provides four metric types: Counter, Gauge, Summary and Histogram.
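A Counter can only increase, and it is reset to 0 when the program restarts. It is commonly used for metrics that only ever go up, such as the number of tasks, total processing time, or the number of errors.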
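A short sketch of Counter behaviour with `prometheus_client` (the metric name `demo_errors` is hypothetical): increments are accepted, a negative increment is rejected, and the client library exposes the sample with a `_total` suffix.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
# Hypothetical metric name, for illustration only
errors = Counter("demo_errors", "Number of errors seen", registry=registry)

errors.inc()      # +1
errors.inc(2)     # +2
# A Counter cannot go down: a negative increment raises ValueError
try:
    errors.inc(-1)
except ValueError:
    print("counters only increase")

# prometheus_client appends the _total suffix to counter samples
print(registry.get_sample_value("demo_errors_total"))
```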
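A Gauge is similar to a Counter; the only difference is that a Gauge's value can also decrease. It is commonly used for metrics such as temperature or utilization.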
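The same kind of sketch for a Gauge, again assuming `prometheus_client` and a hypothetical metric name: a Gauge can be set to an absolute value and moved in either direction.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
# Hypothetical metric name, for illustration only
heap = Gauge("demo_heap_used_percent", "JVM heap usage in percent", registry=registry)

heap.set(75)   # set an absolute value
heap.inc(5)    # now 80
heap.dec(20)   # now 60: unlike a Counter, a Gauge may go down
print(registry.get_sample_value("demo_heap_used_percent"))
```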
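Summary and Histogram are more complex; I currently have no use case for them, so I have not looked into them yet. This article mainly uses Gauge to collect Elasticsearch performance metrics.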
Note: this article assumes Prometheus is deployed with prometheus-operator.
views.py
from django.views.generic import View
from django.http import HttpResponse
import prometheus_client
from prometheus_client import Gauge
from prometheus_client.core import CollectorRegistry
from .util import esApi

# Dedicated registry so that only the Elasticsearch metrics are exposed
REGISTRY = CollectorRegistry(auto_describe=False)
# A Gauge's value can go up and down; labels are the ES node ID and the metric name
esStatus = Gauge("elasticsearch", "elasticsearch status is:", ["node", "class"],
                 registry=REGISTRY)


class ApiResponse(View):
    def get(self, request):
        es_obj = esApi.esApi()
        es_result = es_obj.es_status()
        # es_result has the shape {node_id: {metric_name: value, ...}, ...}
        for node, values in es_result.items():
            for key, value in values.items():
                esStatus.labels(node, key).set(value)
        # Serve the registry in the Prometheus text exposition format
        return HttpResponse(prometheus_client.generate_latest(REGISTRY),
                            content_type=prometheus_client.CONTENT_TYPE_LATEST)
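The nested loop in `ApiResponse.get` flattens `{node: {metric: value}}` into one labelled time series per (node, metric) pair. The sketch below runs that loop against made-up values instead of a live cluster, so the resulting series can be inspected without Django or Elasticsearch; the node names and numbers are hypothetical.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry(auto_describe=False)
es_gauge = Gauge("elasticsearch", "elasticsearch status is:", ["node", "class"],
                 registry=registry)

# Hypothetical data in the shape returned by esApi.es_status()
fake_result = {
    "node-a": {"es_heap_used_percent": 42, "es_index_total": 1000},
    "node-b": {"es_heap_used_percent": 67, "es_index_total": 2500},
}

for node, values in fake_result.items():
    for key, value in values.items():
        # Each (node, metric name) pair becomes its own labelled series
        es_gauge.labels(node, key).set(value)

print(registry.get_sample_value(
    "elasticsearch", {"node": "node-b", "class": "es_heap_used_percent"}))
```

One design consideration: a single metric name with a `class` label mixes quantities with different units (percent, bytes, counts). Prometheus naming guidance generally favours a separate metric name per quantity, which keeps aggregations such as `sum()` meaningful.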
esApi.py
from elasticsearch import Elasticsearch


class esApi():
    def __init__(self):
        self.es = Elasticsearch([{'host': '10.30.30.25', 'port': 9200}], timeout=3600)
        self.result = dict()
        self.esStatus = dict()

    def es_status(self):
        # nodes.stats() returns {'nodes': {node_id: stats, ...}, ...}
        for key, value in self.es.nodes.stats()['nodes'].items():
            # Build a fresh dict per node; reusing one shared dict would make
            # every node report the values of the last node in the loop
            node_value = dict()
            # GC kicks in at about 75% heap. A node that constantly sits at about 75% or higher
            # is under memory pressure, a warning that slow GCs will follow soon; heap usage
            # that stays at 85% is even more serious.
            node_value['es_heap_used_percent'] = value['jvm']['mem']['heap_used_percent']
            node_value['es_heap_used_in_bytes'] = value['jvm']['mem']['heap_used_in_bytes']
            node_value['es_young_collection_count'] = value['jvm']['gc']['collectors']['young']['collection_count']
            node_value['es_young_collection_millis'] = value['jvm']['gc']['collectors']['young']['collection_time_in_millis']
            node_value['es_old_collection_count'] = value['jvm']['gc']['collectors']['old']['collection_count']
            node_value['es_old_collection_millis'] = value['jvm']['gc']['collectors']['old']['collection_time_in_millis']
            node_value['es_index_total'] = value['indices']['indexing']['index_total']
            node_value['index_time_in_millis'] = value['indices']['indexing']['index_time_in_millis']
            self.result[key] = node_value
        return self.result

    def cluster_status(self):
        esStatus = self.es.cluster.health()
        # 'status' is a string (green/yellow/red); map it to a number before setting it on a Gauge
        self.esStatus['es_status'] = esStatus['status']
        self.esStatus['es_unassigned_shards'] = esStatus['unassigned_shards']
        return self.esStatus
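The parsing in `esApi` can be exercised offline by feeding it dictionaries shaped like the Elasticsearch responses. The sketch below is a trimmed, standalone version (only a few of the fields are kept) plus a numeric conversion for `cluster_status()`, since a Gauge only accepts numbers. The `STATUS_VALUE` mapping and the function names are hypothetical, not part of the original code, and the sample responses are made up.

```python
# Hypothetical mapping; the original code does not define one
STATUS_VALUE = {"green": 0, "yellow": 1, "red": 2}


def node_metrics(stats):
    """Flatten a nodes.stats()-shaped dict into {node_id: {metric: value}}."""
    result = {}
    for node_id, node in stats["nodes"].items():
        jvm = node["jvm"]
        indexing = node["indices"]["indexing"]
        # A fresh dict per node, so nodes cannot overwrite each other
        result[node_id] = {
            "es_heap_used_percent": jvm["mem"]["heap_used_percent"],
            "es_old_collection_count": jvm["gc"]["collectors"]["old"]["collection_count"],
            "es_index_total": indexing["index_total"],
        }
    return result


def cluster_metrics(health):
    """Turn a cluster.health()-shaped dict into numeric values a Gauge accepts."""
    return {
        "es_status": STATUS_VALUE[health["status"]],
        "es_unassigned_shards": health["unassigned_shards"],
    }


def _node(heap, old_gc, indexed):
    # Made-up node entry containing only the fields read above
    return {"jvm": {"mem": {"heap_used_percent": heap},
                    "gc": {"collectors": {"old": {"collection_count": old_gc}}}},
            "indices": {"indexing": {"index_total": indexed}}}


sample_stats = {"nodes": {"n1": _node(40, 3, 100), "n2": _node(80, 9, 200)}}
print(node_metrics(sample_stats))
print(cluster_metrics({"status": "yellow", "unassigned_shards": 5}))
```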
When using prometheus-operator, you need to write a ServiceMonitor to configure the scrape target:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: autoexport
  name: autoexport
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: autoexport
  endpoints:
  - port: autometrics          # must match a port *name* in the Service below
    scheme: http
    interval: 30s
    path: '/api/autoMetric/Apimetric'
k8s-deployment
apiVersion: apps/v1            # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: autoexport
  namespace: monitoring
spec:
  replicas: 1
  selector:                    # required by apps/v1; must match the pod template labels
    matchLabels:
      app: autoexport
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: autoexport
    spec:
      containers:
      - name: autoexport
        image: autoexport:v0.0.1
        ports:
        - containerPort: 8000
        livenessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 600
          periodSeconds: 10
          timeoutSeconds: 1
        readinessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 1
        resources:
          limits:
            memory: "1Gi"
          requests:
            memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: autoexport
  namespace: monitoring
  labels:
    app: autoexport
spec:
  ports:
  - name: autometrics
    port: 8000
    targetPort: 8000
  selector:
    app: autoexport
After deployment, check the result in the Prometheus web UI.
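Besides the web UI, the exporter's response can be sanity-checked by parsing it with the parser that ships with `prometheus_client`: if the body parses cleanly, Prometheus should be able to ingest it as well. This is a sketch; the sample body is hypothetical, and the in-cluster URL in the comment is only an assumption derived from the Service name, namespace and ServiceMonitor path above.

```python
from prometheus_client.parser import text_string_to_metric_families


def summarize(text):
    """Map (sample name, sorted labels) -> value for every sample in a scrape body."""
    out = {}
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            key = (sample.name, tuple(sorted(sample.labels.items())))
            out[key] = sample.value
    return out


# Hypothetical scrape body in the format the view returns
body = (
    "# HELP elasticsearch elasticsearch status is:\n"
    "# TYPE elasticsearch gauge\n"
    'elasticsearch{node="n1",class="es_heap_used_percent"} 42.0\n'
)
print(summarize(body))

# Against the running Service, the body could be fetched with something like
# (hypothetical URL):
#   from urllib.request import urlopen
#   body = urlopen("http://autoexport.monitoring:8000/api/autoMetric/Apimetric").read().decode()
```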