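Monitoring Processes with Prometheus

Contents

Deploy Prometheus

Disable the firewall

Disable SELinux

Unpack the package

Create the user

Create the Prometheus data storage directory

Edit the configuration file

Start the service

Deploy process-exporter

Disable the firewall

Disable SELinux

Unpack the package

Prepare the configuration file

Start

Verify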


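Prometheus is widely used for monitoring. It is usually deployed inside Kubernetes to monitor containerized environments. This post takes a different route: we deploy the binary directly on a machine and monitor the processes running on that machine.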

Monitoring is usually done at the host level. So how do we monitor individual processes? process-exporter, a Prometheus exporter, can do this for us. Project page: ncabatoff/process-exporter on GitHub (a Prometheus exporter that mines /proc to report on selected processes).

Deploy Prometheus

Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Disable SELinux

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
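
Before editing /etc/selinux/config in place, it can help to preview what a sed expression does on a scratch copy. The sketch below assumes a sample file with two typical lines (the sample content and variable names are illustrative, not taken from a real host). It uses the pattern `^SELINUX=.*`, which replaces the whole value and leaves the `SELINUXTYPE=` line untouched.

```shell
# Scratch file standing in for /etc/selinux/config (sample content is an assumption).
sample_cfg="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$sample_cfg"

# No -i here: print the result instead of modifying the file.
selinux_preview="$(sed 's/^SELINUX=.*/SELINUX=disabled/' "$sample_cfg")"
echo "$selinux_preview"

rm -f "$sample_cfg"
```

After `setenforce 0`, running `getenforce` should report Permissive until the next reboot picks up the config file.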

Unpack the package

wget https://github.com/prometheus/prometheus/releases/download/v2.26.0/prometheus-2.26.0.linux-amd64.tar.gz

tar -zxvf prometheus-2.26.0.linux-amd64.tar.gz -C /usr/local
mv /usr/local/prometheus-2.26.0.linux-amd64/ /usr/local/prometheus
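
If you prefer to script the download and extraction, something like the following could work. It is only a sketch: the function names `prometheus_release_url` and `install_prometheus` are made up for this post, it assumes `wget` and GNU `tar` are available, and it uses tar's `--strip-components=1` to unpack straight into /usr/local/prometheus instead of renaming the directory afterwards.

```shell
# Build the release URL for a given version (URL layout taken from the link above).
prometheus_release_url() {
  echo "https://github.com/prometheus/prometheus/releases/download/v$1/prometheus-$1.linux-amd64.tar.gz"
}

# Download and unpack into /usr/local/prometheus (needs root and network access).
install_prometheus() {
  version="$1"
  tarball="/tmp/prometheus-${version}.linux-amd64.tar.gz"
  wget -O "$tarball" "$(prometheus_release_url "$version")" &&
    mkdir -p /usr/local/prometheus &&
    tar -zxf "$tarball" -C /usr/local/prometheus --strip-components=1
}

# Usage (run as root):
#   install_prometheus 2.26.0
prometheus_release_url 2.26.0
```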

Create the user

groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus

Create the Prometheus data storage directory

mkdir -p /var/lib/prometheus
chown -R prometheus /var/lib/prometheus

chown -R prometheus:prometheus /usr/local/prometheus/

Edit the configuration file

# Add a job under scrape_configs that points at the target to monitor
# process-exporter runs on the monitored host and listens on port 9256
[root@prometheus prometheus]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 10s
    static_configs:
    - targets: ['192.168.207.165:9256']
      labels:
        instance: node

Start the service

/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
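
The command above runs Prometheus in the foreground and does not reference the prometheus user or the /var/lib/prometheus directory created earlier (by default the data goes into data/ under the working directory). One way to tie these together is a systemd unit. The unit below is a sketch, not part of the original setup. It is written to a local file named prometheus.service so you can review it before installing it.

```shell
# Write a candidate unit file to the current directory for review.
cat > prometheus.service << 'EOF'
[Unit]
Description=Prometheus
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# To install it (as root):
#   cp prometheus.service /etc/systemd/system/
#   systemctl daemon-reload
#   systemctl enable --now prometheus
```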

Deploy process-exporter

Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Disable SELinux

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0

Unpack the package

wget https://github.com/ncabatoff/process-exporter/releases/download/v0.8.2/process-exporter-0.8.2.linux-amd64.tar.gz

tar zxf process-exporter-0.8.2.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/process-exporter-0.8.2.linux-amd64/

Prepare the configuration file

# Monitor all processes
# The GitHub README documents many other ways to write this
cat > config.yml << 'EOF'
process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'
EOF

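# Monitor specific processes only
# This overwrites the config.yml written above, so use one variant or the other
# To monitor more services, keep adding entries in the same format
cat > config.yml << 'EOF'
process_names:
  - name: "{{.Matches}}"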
    cmdline:
    - 'sshd'

  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'
EOF
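
The process-exporter README also describes a `comm` selector, which matches on the process name taken from /proc/<pid>/stat. Combined with the `{{.Comm}}` name template, it should yield one group per listed process name. Here is a sketch, written to a separate file so it does not overwrite config.yml (`config-comm.yml` is a name chosen here for illustration):

```shell
# Alternative config: select processes by name and group them by that name.
cat > config-comm.yml << 'EOF'
process_names:
  - name: "{{.Comm}}"
    comm:
    - sshd
    - mysqld
EOF
```

To try it, pass the file to the exporter with `-config.path config-comm.yml`.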

Start

# This run uses the monitor-all-processes config shown above
./process-exporter -config.path config.yml
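
Before going through Prometheus, you can check the exporter directly: it serves metrics in the Prometheus text format on port 9256 under /metrics. The helper below is a hypothetical convenience function (not part of process-exporter) that extracts the distinct groupname values from that output. It is demonstrated on a two-line sample rather than a live endpoint.

```shell
# Print the distinct groupname label values found in metrics text on stdin.
list_groupnames() {
  grep '^namedprocess_namegroup_num_procs{' |
    sed 's/.*groupname="\([^"]*\)".*/\1/' |
    sort -u
}

# Demonstration on sample text (the sample values are made up).
sample_groups="$(printf '%s\n' \
  'namedprocess_namegroup_num_procs{groupname="sshd"} 2' \
  'namedprocess_namegroup_num_procs{groupname="mysqld"} 1' |
  list_groupnames)"
echo "$sample_groups"

# Against a running exporter:
#   curl -s http://localhost:9256/metrics | list_groupnames
```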

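Verify

Open the Prometheus web UI and check whether process data is coming in, or query the HTTP API with curl. The example below uses curl to fetch data for the sshd process.

# Replace <prometheus_host> with your own IP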
curl -G 'http://<prometheus_host>:9090/api/v1/query'   --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd", instance="node", job="node"}'

# Example: the query returns data
[root@bogon ~]# curl -G 'http://192.168.207.131:9090/api/v1/query'   --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd", instance="node", job="node"}'

{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"namedprocess_namegroup_context_switches_total","ctxswitchtype":"nonvoluntary","groupname":"sshd","instance":"node","job":"node"},"value":[1718781659.381,"2"]}]}}
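
In a script, you may want only the status and the sample value from that response. The functions below are a rough sketch that runs sed on the raw JSON. They assume a single-result vector like the one shown above, and the function names are invented for this example. A JSON-aware tool would be more robust.

```shell
# Extract the top-level "status" field from an API response on stdin.
query_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Extract the sample value from a single-result vector response on stdin.
query_value() {
  sed -n 's/.*"value":\[[0-9.]*,"\([^"]*\)"].*/\1/p'
}

# Response copied from the output above.
response='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"namedprocess_namegroup_context_switches_total","ctxswitchtype":"nonvoluntary","groupname":"sshd","instance":"node","job":"node"},"value":[1718781659.381,"2"]}]}}'

echo "$response" | query_status   # prints: success
echo "$response" | query_value    # prints: 2
```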