Deploying a Server Monitoring System with Monit

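Monit Installation and Configuration
1. Introduction
Monit is a tool for Unix-like platforms that monitors processes, files, directories, and devices. It can repair programs that have stopped or are misbehaving, which makes it well suited to handling software failures that arise from a variety of causes.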

https://mmonit.com/monit/dist/

http://www.open-open.com/lib/view/open1433139672385.html

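2. Installation
The installation and configuration steps below are assumed to be run as root.
Installation is simple: download the monit source tarball (the latest version at the time of writing is monit-5.23.0, file monit-5.23.0.tar.gz), put it in a suitable directory, unpack it, then run configure (the defaults are fine), make, and make install. The commands are: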

yum install pam-devel

wget https://mmonit.com/monit/dist/monit-5.23.0.tar.gz
tar -xzf monit-5.23.0.tar.gz
cd monit-5.23.0
./configure
make
make install

cp monitrc /etc/monitrc

The installation completes quickly.
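The commands above hard-code the version in three places (URL, tarball name, source directory), which is where copy-paste mistakes tend to creep in. A minimal sketch that derives all three from one version string follows; the function names and the MONIT_DIST_URL variable are illustrative and not part of Monit.

```shell
#!/bin/sh
# Sketch: derive the tarball name, download URL and source directory
# from a single Monit version string. Helper names are hypothetical.
MONIT_DIST_URL="https://mmonit.com/monit/dist"

monit_tarball() {   # e.g. monit-5.23.0.tar.gz
    printf 'monit-%s.tar.gz\n' "$1"
}

monit_url() {       # full download URL for that version
    printf '%s/%s\n' "$MONIT_DIST_URL" "$(monit_tarball "$1")"
}

monit_srcdir() {    # directory created by unpacking the tarball
    printf 'monit-%s\n' "$1"
}

# Usage as root (not executed here):
#   v=5.23.0
#   wget "$(monit_url "$v")"
#   tar -xzf "$(monit_tarball "$v")"
#   cd "$(monit_srcdir "$v")" && ./configure && make && make install
```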










set daemon  60
set logfile /var/log/monit.log
set idfile /var/monit/id
set mail-format {from:[email protected]}
set mailserver localhost
set alert [email protected]
set statefile /var/monit/state
set httpd port 2812 and
    use address 117.27.152.126  # bind the HTTP server to this address
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'
    allow @monit           # allow users of group 'monit' to connect (rw)
    allow @users readonly  # allow users of group 'users' to connect readonly
include /etc/monit.d/*
check system 127.0.0.1
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
check process tomcat with pidfile  /home/yojee/tomcat/logs/yojee.pid
    stop program  = "/home/yojee/tomcat/bin/shutdown.sh"
    start program = "/home/yojee/tomcat/bin/startup.sh"
    if cpu usage > 60% for 5 cycles then alert
    if failed url http://127.0.0.1:30011/ timeout 30 seconds for 5 cycles then restart





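3. Configuration

After installation, copy the monit configuration file monitrc from the monit source directory to /etc:
cp monitrc /etc
Note that the permissions on /etc/monitrc must not be wider than 0700, so you may also need to change them:
chmod 600 /etc/monitrc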
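The permission rule can be checked before starting monit. The sketch below tests that a file grants nothing to group or others; perms_ok is a hypothetical helper, and it assumes GNU stat (the -c option), which is what CentOS-style systems ship.

```shell
#!/bin/sh
# Sketch: succeed only if FILE has no group/other permission bits,
# i.e. its octal mode ends in "00" (600, 700, 400, ...).
# Assumes GNU stat; perms_ok is an illustrative name.
perms_ok() {
    mode=$(stat -c '%a' "$1") || return 2   # e.g. "600" or "644"
    case "$mode" in
        *00) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage (not executed here):
#   perms_ok /etc/monitrc || chmod 600 /etc/monitrc
```

Once the permissions are right, `monit -t` can be used to run a syntax check on the control file before starting the daemon.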
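Then open /etc/monitrc and configure it. Monit ships most configuration examples in this file, so in most cases you only need to remove the leading # (comment marker) and adjust the values. We mainly use monit to monitor a Tomcat server, so the configuration is as follows: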

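set daemon  120                       # run monit as a daemon and poll every 2 minutes;
                                      # 2 minutes is the default interval, and the sample
                                      # configurations found online also use 2 minutes,
                                      # so it seems a reasonable value
set logfile /var/log/monit.log        # log file location; to write to the system log use
                                      #   set logfile syslog
set httpd port 3000 and               # monit has a built-in HTTP server for viewing the
                                      # status of monitored services; open this port in the
                                      # firewall [1], otherwise only localhost can reach it
    use address 192.168.1.184         # address of this HTTP server;
                                      # if set to localhost, only local access is possible
    allow localhost                   # allow local access
    allow 192.168.1.0/255.255.255.0   # allow access from the internal network
    allow admin:monit11               # require user name admin and password monit11
                                      # to access this address
set mailserver localhost              # mail server; once set, monit sends its alerts
                                      # by mail. localhost is used here, which requires
                                      # that sendmail is installed and running locally
set alert [email protected]           # recipient address; to send to several addresses,
                                      # add one such line per address

# Monitor tomcat
check process tomcat with pidfile /var/run/catalina.pid     # explained separately [2]
    start program = "/etc/init.d/tomcat start"              # start command
    stop program  = "/etc/init.d/tomcat stop"               # stop command
    if 9 restarts within 10 cycles then timeout             # after 9 restarts within 10 cycles,
                                                            # time out and stop monitoring this
                                                            # service; the reason is explained in [3]
    if cpu usage > 90% for 5 cycles then alert              # alert if the service's CPU usage
                                                            # exceeds 90% for 5 cycles in a row
    # Restart the service if opening the URL fails for 5 consecutive cycles
    # (120-second timeout; a timeout also counts as a failure)
    if failed url http://127.0.0.1:4000/ timeout 120 seconds for 5 cycles then restart
    if failed url http://127.0.0.1:5000/ timeout 120 seconds for 5 cycles then restart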
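Rules of the form "for N cycles" are expressed in polling cycles, not seconds, so how quickly monit reacts depends on the set daemon interval. As a rough estimate (an assumption about timing, not a figure from the Monit documentation), a condition that must hold for N consecutive cycles takes on the order of N times the interval to trigger. The helper name below is illustrative.

```shell
#!/bin/sh
# Sketch: convert "for N cycles" into an approximate number of seconds,
# given the "set daemon" polling interval.
cycles_to_seconds() {   # cycles_to_seconds INTERVAL_SECONDS CYCLES
    echo $(( $1 * $2 ))
}

# With "set daemon 120" and "for 5 cycles":
cycles_to_seconds 120 5   # prints 600
```

With the configuration above, a hung Tomcat is therefore restarted after roughly ten minutes; lowering either the interval or the cycle count shortens that at the cost of more false positives.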

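[1] Use the following commands (the --dport value must match the port given in set httpd port, which is 3000 in the configuration above):

/sbin/iptables -A INPUT -i eth0 -p tcp --dport 3000 -j ACCEPT

/sbin/service iptables save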

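[2] The pid file /var/run/catalina.pid is used to check the tomcat service (the service name can be anything). By default the Tomcat process does not write a pid file, so one has to be set explicitly: go to the bin directory under the Tomcat installation, open catalina.sh, and near the top (but not on the first line, which is the shebang) add:

CATALINA_PID=/var/run/catalina.pid

This sets the pid file. Restart Tomcat afterwards, and the pid file can then be referenced in the monit configuration.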
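To see why the pid file matters, it helps to picture the kind of check a pid file makes possible: read the number and ask the kernel whether that process exists. The sketch below is an illustration of the idea, not Monit's actual implementation; pid_alive is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: succeed if PIDFILE names a running process.
# kill -0 sends no signal; it only tests whether the process exists.
# Note: it also fails for processes owned by another user when not run as root.
pid_alive() {   # pid_alive PIDFILE
    [ -r "$1" ] || return 1                 # missing or unreadable pid file
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;            # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Usage (not executed here):
#   pid_alive /var/run/catalina.pid && echo "tomcat is running"
```

If Tomcat crashes without removing its pid file, the stale number no longer matches a live process, which is exactly the situation a process check is meant to catch.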

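[3] Monitoring stops after the timeout so that the service is not restarted forever. If several consecutive restarts have failed, further restarts are very unlikely to succeed. Restarting Tomcat also consumes a lot of system resources, so restarting it endlessly would end up preventing other services from working properly.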
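The "N restarts within M cycles" rule is a sliding-window count: look at the last M cycles and give up once at least N of them involved a restart. The sketch below models that logic on a list of per-cycle flags (1 = restarted in that cycle, 0 = not); it illustrates the rule and is not taken from Monit's source. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: too_many_restarts LIMIT WINDOW EVENT...
# EVENTs are listed oldest first; succeeds when at least LIMIT of the
# most recent WINDOW events are 1.
too_many_restarts() {
    limit=$1; window=$2; shift 2
    # Discard events that fall outside the window
    while [ "$#" -gt "$window" ]; do shift; done
    count=0
    for e in "$@"; do
        if [ "$e" = 1 ]; then count=$((count + 1)); fi
    done
    [ "$count" -ge "$limit" ]
}

# Usage: 9 restarts in the last 10 cycles reaches the limit
#   too_many_restarts 9 10 0 1 1 1 1 1 1 1 1 1 && echo "timeout"
```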


To monitor other services, add more check blocks. For example, to monitor the MySQL service:

check process mysql with pidfile /var/run/mysqld/mysqld.pid
   start program = "/etc/init.d/mysqld start"
   stop program = "/etc/init.d/mysqld stop"
   if failed host 127.0.0.1 port 3306 then restart
   if 5 restarts within 5 cycles then timeout

To monitor the ssh service:

check process sshd with pidfile /var/run/sshd.pid
   start program  "/etc/init.d/sshd start"
   stop program  "/etc/init.d/sshd stop"
   if failed port 22 protocol SSH then restart
   if 5 restarts within 5 cycles then timeout


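If many services are monitored, put the check block for each service in its own file and pull the files in with the include statement, which keeps the configuration file readable. For example: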

include /etc/monit/includes/mysqld
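When several services follow the same pattern (pid file plus an init script), the per-service include files can be generated rather than written by hand. This is a minimal sketch under that assumption; gen_check is a hypothetical helper, and the restart limit is simply copied from the MySQL example above.

```shell
#!/bin/sh
# Sketch: print a process check block for a service that is managed by
# an init script and writes a pid file.
gen_check() {   # gen_check NAME PIDFILE INITSCRIPT
    cat <<EOF
check process $1 with pidfile $2
    start program = "$3 start"
    stop program  = "$3 stop"
    if 5 restarts within 5 cycles then timeout
EOF
}

# Usage as root (not executed here):
#   gen_check mysqld /var/run/mysqld/mysqld.pid /etc/init.d/mysqld \
#       > /etc/monit/includes/mysqld
#   monit -t && monit reload    # syntax-check, then reload the daemon
```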

Once the configuration above is done, make monit start with the system by appending the following to the end of /etc/inittab:

# Run monit in standard run-levels

mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
Then run

telinit q
to start monit.

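4. Things to Watch Out For
Because monit runs as a daemon and is started from inittab, init restarts the monit process whenever it stops. Monit in turn watches the other services, which means that a service monitored by monit cannot be stopped in the usual way: as soon as it stops, monit starts it again. To stop a monitored service, use a command of the form monit stop name. For example, to stop tomcat:
monit stop tomcat

To stop all services monitored by monit, use monit stop all.
To start a service, use a command of the form monit start name; to start all of them, use monit start all.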
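For maintenance work this usually means wrapping the task between a monit stop and a monit start. The sketch below does that; with_service_stopped is a hypothetical helper, and the MONIT variable exists only so the wrapper can be pointed at a different binary (or at echo for a dry run). The stop request may complete asynchronously, so a real script might also wait until the service is actually down.

```shell
#!/bin/sh
# Sketch: run a maintenance command while a monitored service is
# stopped through monit, then start the service again.
MONIT=${MONIT:-monit}

with_service_stopped() {   # with_service_stopped NAME COMMAND [ARGS...]
    name=$1; shift
    "$MONIT" stop "$name" || return 1
    "$@"; rc=$?                       # run the maintenance command
    "$MONIT" start "$name" || return 1
    return "$rc"                      # report the command's own status
}

# Usage (not executed here):
#   with_service_stopped tomcat rm -rf /home/yojee/tomcat/work/Catalina
```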

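I looked into monit today. The write-up above is so thorough that I have simply reused it as is, and below I add a note on SMS alerting.

Our company has an SMS gateway, so alerts are sent through it directly, as follows:

Monitoring some performance metrics of the local machine: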

check system 127.0.0.1
    if loadavg (5min) > 4 for 4 times within 5 cycles then exec "/etc/monit/script/sendsms sysload 5min >4"
    if memory usage > 90% then exec "/etc/monit/script/sendsms 127.0.0.1 memory usage>90%"
    if cpu usage (user)  > 70% for 4 times within 5 cycles then exec "/etc/monit/script/sendsms cpu(user) >70%"
    if cpu usage (system) > 30% for 4 times within 5 cycles then exec "/etc/monit/script/sendsms cpu(system) >30% "
    if cpu usage (wait)  > 20% for 4 times within 5 cycles then exec "/etc/monit/script/sendsms system busy! cpu(wait) >20%"
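The sendsms script itself is not shown, so the following is only a guess at its shape. Monit runs the exec command directly rather than through a shell, so the text after the script path arrives as separate arguments (and a ">" in it is just text, not a redirect); the sketch joins them into one message. SMS_GATEWAY_URL, SMS_TO, SMS_DRY_RUN and the "to"/"text" form fields are all hypothetical, standing in for whatever the company gateway expects. MONIT_HOST is one of the environment variables Monit sets for exec'd programs.

```shell
#!/bin/sh
# Sketch of a sendsms-style script; gateway details are assumptions.
build_sms_text() {   # join all arguments into a single message line
    host=${MONIT_HOST:-$(hostname)}
    printf '[monit %s] %s\n' "$host" "$*"
}

send_sms() {
    text=$(build_sms_text "$@")
    if [ -n "${SMS_DRY_RUN:-}" ]; then
        printf '%s\n' "$text"          # dry run: print instead of sending
        return 0
    fi
    # Hypothetical HTTP gateway taking form fields "to" and "text"
    curl -fsS --max-time 10 \
        --data-urlencode "to=${SMS_TO:?set SMS_TO}" \
        --data-urlencode "text=$text" \
        "${SMS_GATEWAY_URL:?set SMS_GATEWAY_URL}" >/dev/null
}

# In a real script the last line would be:  send_sms "$@"
```

Keeping the message building separate from the delivery makes it possible to test the script with SMS_DRY_RUN=1 before wiring it to a real gateway.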

Monitoring selected ports on a remote machine:

check host Unicom_mobi with address 211.90.246.51
      if failed icmp type echo count 10 with timeout 20 seconds then exec "/etc/monit/script/sendsms Unicom_mobi  211.90.246.51 ping failed!"
      if failed port 22 type tcp with timeout 10 seconds for 2 times within 3 cycles then exec "/etc/monit/script/sendsms unicom 211.90.246.51:2222 connect failed!"
      if failed port 9528 type tcp with timeout 10 seconds for 2 times within 3 cycles then exec "/etc/monit/script/sendsms unicom 211.90.246.51:9528 connect failed!"
      if failed port 9529 type tcp with timeout 10 seconds for 2 times within 3 cycles then exec "/etc/monit/script/sendsms unicom 211.90.246.51:9529 connect failed!"
      if failed port 9530 type tcp with timeout 10 seconds for 2 times within 3 cycles then exec "/etc/monit/script/sendsms unicom 211.90.246.51:9530 connect failed!"

A benefit of monit is that, when a monitored check fails, it can restart the service or run a custom script, for example:

check file passwd path /etc/passwd
#    if failed md5 checksum
#    then exec "/usr/bin/killall -q monit"

check filesystem root with path /dev/mapper/VolGroup00-LogVol00
    if space usage > 80% for 5 times within 15 cycles then exec "/etc/monit/script/clear_core.sh"
    else if succeeded for 1 times within 2 cycles then exec "/etc/monit/script/sendsms '/dev/sda1 usage > 90% clear core file succeed!'>/dev/null 2"
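The clear_core.sh script is not included in the post. A minimal sketch of what such a cleanup script might do is shown below, assuming core dumps are named core.<pid> and live under a known directory; both helper names and that naming convention are assumptions.

```shell
#!/bin/sh
# Sketch of a clear_core.sh-style cleanup; names and paths are assumptions.

usage_pct() {    # usage_pct PATH -> used space of PATH's filesystem, in percent
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

clear_core() {   # clear_core DIR -> delete core.* files under DIR, listing them
    find "$1" -xdev -type f -name 'core.*' -print -delete
}

# Usage (not executed here; /var/crash is a hypothetical dump directory):
#   if [ "$(usage_pct /)" -gt 80 ]; then clear_core /var/crash; fi
```

The -xdev option keeps find on the filesystem that is actually full, and -print leaves a record of what was removed in monit's log output.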






Reposted from blog.csdn.net/chen3888015/article/details/7994314