Monitoring a CentOS 7 host with Prometheus

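Monitoring server hardware and software requires a good monitoring tool, and Prometheus is one.
It can monitor a large number of databases and application servers (through an ...exporter, which is essentially an agent), and it can also monitor the node itself, including CPU, memory, disk, and network usage.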

// --------------------------------------------------------------------------------
References
https://www.digitalocean.com/community/tutorials/how-to-use-prometheus-to-monitor-your-ubuntu-14-04-server

// --------------------------------------------------------------------------------
Downloads
https://github.com/prometheus/prometheus/releases/download/0.15.1/prometheus-0.15.1.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/0.11.0/node_exporter-0.11.0.linux-amd64.tar.gz

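Install node_exporter: download the tarball into /opt/linuxsir, then unpack it into its own directory (make sure the version number matches the file you downloaded):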
cd /opt/linuxsir
mkdir node_exporter
cd node_exporter
tar -xzvf ../node_exporter-0.11.0.linux-amd64.tar.gz

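Install Prometheus: download the tarball into /opt/linuxsir, then unpack it into its own directory (make sure the version number matches the file you downloaded):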
cd /opt/linuxsir
mkdir prometheus
cd prometheus
tar -xzvf ../prometheus-0.15.1.linux-amd64.tar.gz
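The download and unpack steps above can be scripted. This is a minimal sketch, assuming `curl` is installed, the host can reach GitHub, and the tarballs unpack their files directly into the current directory, as the commands above imply. The function `install_tarball` and the `BASE_DIR` variable are hypothetical names.

```shell
#!/bin/sh
# Base directory used throughout the article; override via the environment.
BASE_DIR="${BASE_DIR:-/opt/linuxsir}"

# install_tarball NAME URL
# Downloads URL into $BASE_DIR unless the tarball is already there,
# then unpacks it into $BASE_DIR/NAME (same layout as the manual steps).
install_tarball() {
  local name="$1" url="$2"
  local tarball="$BASE_DIR/${url##*/}"   # file name = last path component of URL
  mkdir -p "$BASE_DIR/$name" || return 1
  if [ ! -f "$tarball" ]; then
    # -f: fail on HTTP errors, -L: follow GitHub's redirect to the download host;
    # remove a partial download so the next run retries it.
    curl -fL -o "$tarball" "$url" || { rm -f "$tarball"; return 1; }
  fi
  tar -xzf "$tarball" -C "$BASE_DIR/$name"
}
```

For example, `install_tarball node_exporter https://github.com/prometheus/node_exporter/releases/download/0.11.0/node_exporter-0.11.0.linux-amd64.tar.gz` and the same call with `prometheus` and the Prometheus URL reproduce the two manual installs. Skipping the download when the tarball already exists makes the script safe to re-run.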

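Configuration file
Create the configuration file prometheus.yml in /opt/linuxsir/prometheus
with the following content (192.168.31.119:9100 is the address where node_exporter listens):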
scrape_configs:
  - job_name: "node"
    scrape_interval: "5s"
    target_groups:
      - targets: ['192.168.31.119:9100']
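The `target_groups` key matches the 0.15.x release used here. Later Prometheus releases renamed it to `static_configs`, so on a current release a roughly equivalent file could be generated as below. This is a sketch: `write_prometheus_config` is a hypothetical helper, and the target address is the one from the example above. Recent releases also ship `promtool`, and `promtool check config prometheus.yml` validates the file before the server is started.

```shell
#!/bin/sh
# write_prometheus_config FILE TARGET
# Writes a minimal scrape config that uses static_configs, the key that
# later Prometheus releases use in place of target_groups.
write_prometheus_config() {
  local file="$1" target="$2"
  cat > "$file" <<EOF
scrape_configs:
  - job_name: "node"
    scrape_interval: "5s"
    static_configs:
      - targets: ['${target}']
EOF
}
```

Usage: `write_prometheus_config /opt/linuxsir/prometheus/prometheus.yml 192.168.31.119:9100`.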

For details, see:
https://www.digitalocean.com/community/tutorials/how-to-use-prometheus-to-monitor-your-ubuntu-14-04-server

// --------------------------------------------------------------------------------
Start node_exporter
cd /opt/linuxsir
cd node_exporter
./node_exporter &
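To stop node_exporter, first find its PID:
netstat -ntlp|grep 9100
The last column of the output shows the PID and program name (PID/node_exporter). Then kill that process:
kill -9 <PID>
On CentOS 7, netstat is part of the net-tools package; if it is not installed, ss -ntlp|grep 9100 shows the same information.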
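Instead of looking up the PID with netstat each time, the start and stop steps can record the PID in a file. A minimal sketch, assuming the hypothetical helpers `start_bg`/`stop_bg` and a hypothetical `PID_DIR`. `nohup` keeps the process running after the SSH session that started it closes, and `stop_bg` sends SIGTERM first, falling back to SIGKILL (`kill -9`) only if the process is still alive after a timeout.

```shell
#!/bin/sh
# Hypothetical directory for PID and log files; override via the environment.
PID_DIR="${PID_DIR:-/var/run/linuxsir}"

# start_bg NAME COMMAND [ARGS...]
# Starts COMMAND in the background, ignoring SIGHUP (so it survives logout),
# logs its output to $PID_DIR/NAME.log and records its PID in $PID_DIR/NAME.pid.
start_bg() {
  local name="$1"
  shift
  mkdir -p "$PID_DIR" || return 1
  nohup "$@" >"$PID_DIR/$name.log" 2>&1 &
  echo "$!" >"$PID_DIR/$name.pid"
}

# stop_bg NAME [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT seconds (default 10) for the process
# to exit, and only then falls back to SIGKILL (kill -9).
stop_bg() {
  local name="$1" timeout="${2:-10}" pidfile pid i
  pidfile="$PID_DIR/$name.pid"
  [ -f "$pidfile" ] || { echo "no PID file for $name" >&2; return 1; }
  pid=$(cat "$pidfile")
  kill "$pid" 2>/dev/null
  i=0
  while [ "$i" -lt "$timeout" ] && kill -0 "$pid" 2>/dev/null; do
    sleep 1
    i=$((i + 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
  fi
  rm -f "$pidfile"
}
```

For example, from /opt/linuxsir/node_exporter: `start_bg node_exporter ./node_exporter`, and later `stop_bg node_exporter`.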

Start Prometheus
cd /opt/linuxsir
cd prometheus
./prometheus --config.file=prometheus.yml &
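To stop Prometheus, first find its PID:
netstat -ntlp|grep 9090
The last column of the output shows the PID and program name. Then kill that process:
kill <PID>
A plain kill sends SIGTERM, which lets Prometheus shut down cleanly and write out the data it still holds in memory; use kill -9 only if the process does not exit.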
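CentOS 7 uses systemd, so instead of starting the binaries by hand with `&`, each one can run as a service that is restarted on failure and started at boot; `systemctl stop` also sends SIGTERM by default. A sketch, assuming the binaries stay in the directories used above and run as root; `write_unit` and `UNIT_DIR` are hypothetical names. After writing the units, `systemctl daemon-reload`, `systemctl enable node_exporter prometheus` and `systemctl start node_exporter prometheus` load, enable and start them.

```shell
#!/bin/sh
# Standard location for locally defined units; override via the environment.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

# write_unit NAME WORKDIR COMMAND [ARGS...]
# Writes $UNIT_DIR/NAME.service, which runs COMMAND from WORKDIR so that
# relative paths such as prometheus.yml resolve as in the manual steps.
write_unit() {
  local name="$1" workdir="$2"
  shift 2
  cat > "$UNIT_DIR/$name.service" <<EOF
[Unit]
Description=$name (Prometheus monitoring)
After=network.target

[Service]
WorkingDirectory=$workdir
ExecStart=$*
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

For example: `write_unit node_exporter /opt/linuxsir/node_exporter /opt/linuxsir/node_exporter/node_exporter` and `write_unit prometheus /opt/linuxsir/prometheus /opt/linuxsir/prometheus/prometheus --config.file=prometheus.yml`.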

Open the Prometheus web UI (replace 192.168.31.119 with your host's IP address):
http://192.168.31.119:9090/

The node overview console, showing CPU, memory, disk, and network usage:
http://192.168.31.119:9090/consoles/node.html
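A quick way to confirm that both processes are reachable is to fetch their HTTP endpoints: node_exporter serves its raw metrics at `/metrics` on port 9100, and Prometheus serves its own metrics at the same path on port 9090. A small sketch, assuming `curl` is installed; `check_endpoint` is a hypothetical helper.

```shell
#!/bin/sh
# check_endpoint URL
# Exits 0 and prints "OK" if URL answers with a 2xx status within 5 seconds;
# otherwise prints "FAIL" to stderr and returns 1.
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$1"; then
    echo "OK   $1"
  else
    echo "FAIL $1" >&2
    return 1
  fi
}
```

For example, `check_endpoint http://192.168.31.119:9100/metrics` and `check_endpoint http://192.168.31.119:9090/metrics`. To see the raw samples that the queries below are built on, `curl -s http://192.168.31.119:9100/metrics | grep '^node_cpu'` prints the CPU counters.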

// --------------------------------------------------------------------------------
Example queries
Enter a query in the expression field of the Prometheus web UI at http://192.168.31.119:9090/ to display the data Prometheus has collected.

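cpu (user time and system time, each as a percentage of the total capacity of all CPUs)
sum(rate(node_cpu{job='node',mode='user'}[5m])) * 100 / count(count by (cpu)(node_cpu{job='node'}))
sum(rate(node_cpu{job='node',mode='system'}[5m])) * 100 / count(count by (cpu)(node_cpu{job='node'}))
Add the two results to get the total busy CPU percentage.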
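Both modes can also be summed in a single expression with a regex label matcher, and the expression can be evaluated outside the browser through the Prometheus HTTP query API. A sketch, assuming a Prometheus release that serves the `/api/v1/query` endpoint (current releases do; check the documentation of older versions) and a node_exporter older than 0.16, since later node_exporter releases renamed `node_cpu` to `node_cpu_seconds_total`. `PROM_URL`, `CPU_BUSY_QUERY` and `prom_query` are hypothetical names.

```shell
#!/bin/sh
PROM_URL="${PROM_URL:-http://192.168.31.119:9090}"

# Busy CPU (user + system) as a percentage of all CPUs on the node:
# the regex matcher sums both modes, the denominator counts distinct CPUs.
CPU_BUSY_QUERY="sum(rate(node_cpu{job='node',mode=~'user|system'}[5m])) * 100 / count(count by (cpu)(node_cpu{job='node'}))"

# prom_query EXPR
# Evaluates EXPR at the current time via the HTTP API and prints the JSON reply.
# -G sends the URL-encoded query as a GET query string.
prom_query() {
  curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

Usage: `prom_query "$CPU_BUSY_QUERY"`. The reply is JSON; the value sits under `data.result`.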

mem (total and free memory in bytes, and the fraction that is free)
node_memory_MemTotal{job='node'}
node_memory_MemFree{job='node'}
node_memory_MemFree{job='node'}/node_memory_MemTotal{job='node'}
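node_exporter reads these memory values from /proc/meminfo, so the ratio can be cross-checked directly on the CentOS host (the ratio is the same whether the values are in kB or bytes). A sketch; `mem_free_ratio` is a hypothetical helper, and its optional file argument exists only to make it testable. MemFree excludes page cache and buffers, so on a busy host it usually looks much lower than the memory that is actually reclaimable; adding `node_memory_Buffers` and `node_memory_Cached` to the numerator is a common adjustment.

```shell
#!/bin/sh
# mem_free_ratio [MEMINFO_FILE]
# Prints MemFree / MemTotal from /proc/meminfo (or the given file) with four
# decimals: the same ratio as node_memory_MemFree / node_memory_MemTotal.
mem_free_ratio() {
  awk '$1 == "MemTotal:" { total = $2 }
       $1 == "MemFree:"  { free = $2 }
       END { if (total > 0) printf "%.4f\n", free / total; else exit 1 }' \
    "${1:-/proc/meminfo}"
}
```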

disk (bytes read and written per second on sda)
rate(node_disk_sectors_read{job='node', device='sda'}[5m]) * 512
rate(node_disk_sectors_written{job='node', device='sda'}[5m]) * 512
Add the two results to get the total disk throughput.
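The `* 512` factor converts sectors to bytes: /proc/diskstats, which node_exporter's disk statistics are read from, counts sectors in 512-byte units regardless of the disk's physical sector size. The same throughput can be measured by hand by sampling the counters twice. A sketch; `disk_sectors` and `disk_bytes_per_sec` are hypothetical helpers, and the device name must match the host (virtual machines often use `vda` rather than `sda`).

```shell
#!/bin/sh
# disk_sectors DEVICE [DISKSTATS_FILE]
# Prints "<sectors read> <sectors written>" for DEVICE. In /proc/diskstats,
# field 3 is the device name, field 6 sectors read, field 10 sectors written.
disk_sectors() {
  awk -v dev="$1" '$3 == dev { print $6, $10; found = 1 }
                   END { if (!found) exit 1 }' "${2:-/proc/diskstats}"
}

# disk_bytes_per_sec DEVICE [SECONDS]
# Samples the counters twice, SECONDS apart (default 5), and prints the
# read and write throughput in bytes per second.
disk_bytes_per_sec() {
  local dev="$1" secs="${2:-5}" before after
  before=$(disk_sectors "$dev") || return 1
  sleep "$secs"
  after=$(disk_sectors "$dev") || return 1
  echo "$before $after" | awk -v s="$secs" \
    '{ printf "read %d B/s, write %d B/s\n", ($3 - $1) * 512 / s, ($4 - $2) * 512 / s }'
}
```

Usage: `disk_bytes_per_sec sda 5`.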

network (bytes received and sent per second, per interface, excluding the loopback interface lo)
rate(node_network_receive_bytes{job='node', device!='lo'}[5m])
rate(node_network_transmit_bytes{job='node', device!='lo'}[5m])
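These counters come from /proc/net/dev, where each interface line holds eight receive fields followed by eight transmit fields, starting with the byte counts. A sketch that prints the same cumulative per-interface totals directly; `net_bytes` is a hypothetical helper, and its optional file argument exists only to make it testable.

```shell
#!/bin/sh
# net_bytes [NETDEV_FILE]
# Prints "<interface> <rx bytes> <tx bytes>" for every interface except lo.
# The first two lines of /proc/net/dev are headers; replacing the colon after
# the interface name with a space makes field 2 rx bytes and field 10 tx bytes.
net_bytes() {
  awk 'NR > 2 { sub(":", " "); if ($1 != "lo") print $1, $2, $10 }' \
    "${1:-/proc/net/dev}"
}
```

Running it twice a few seconds apart and dividing the differences by the interval gives the same bytes-per-second figures as the two rate() queries.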
// --------------------------------------------------------------------------------
Other references

Understanding machine CPU usage
https://www.robustperception.io/understanding-machine-cpu-usage/

Querying from Go and exporting to CSV
https://github.com/ryotarai/prometheus-query

Querying from Python and exporting to CSV
https://www.robustperception.io/prometheus-query-results-as-csv/
