Shell scripts: checking server liveness, port 80, and HTTP 502 status codes

Checking whether a server is down

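Requirement: ping the server to be monitored. If packet loss is 100%, the machine is considered to have a problem and an alert email is sent. (You first need a mailbox account with SMTP enabled; the alert emails are sent from that account.)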

Create the Python script that sends the email (the shell scripts below call it as /data/mail.py):

#!/usr/bin/python
#coding:utf-8

import smtplib
from email.mime.text import MIMEText
import sys

# Sender address
mail_user = '[email protected]'
# SMTP authorization password for the sender address
mail_pass = 'xxxxxxxx'

def send_mail(to_list,subject,content):
    me = "Mail alert"+"<"+mail_user+">"
    msg = MIMEText(content, 'plain', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = me
    msg['To'] = to_list

    try:
        # SMTP server address provided by NetEase 163 Mail
        s = smtplib.SMTP("smtp.163.com", 25)
        s.login(mail_user,mail_pass)
        s.sendmail(me,to_list,msg.as_string())
        s.quit()
        return True
    except Exception as e:
        # "except ... as" and print() are valid on Python 2.6+ and Python 3
        print(str(e))
        return False

if __name__ == "__main__":
    # Arguments: recipient address, subject, body
    send_mail(sys.argv[1], sys.argv[2], sys.argv[3])

Assume the monitored machine's IP is 192.168.234.125. Ping that address:

[root@linux ~]# ping -c5 192.168.234.125
PING 192.168.234.125 (192.168.234.125) 56(84) bytes of data.
64 bytes from 192.168.234.125: icmp_seq=1 ttl=64 time=0.304 ms
64 bytes from 192.168.234.125: icmp_seq=2 ttl=64 time=0.982 ms
64 bytes from 192.168.234.125: icmp_seq=3 ttl=64 time=0.837 ms
64 bytes from 192.168.234.125: icmp_seq=4 ttl=64 time=0.863 ms
64 bytes from 192.168.234.125: icmp_seq=5 ttl=64 time=0.382 ms

--- 192.168.234.125 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 0.304/0.673/0.982/0.276 ms

# Only the packet loss figure on the second-to-last line matters

Script approach: after ruling out an empty or non-numeric packet loss value, send an alert email when packet loss is 100%.

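#!/bin/bash
ip=192.168.234.125
# Recipient address (placeholder)
m='[email protected]'
n=`ping -c5 $ip |grep "packet loss"|awk -F '%' '{print $1}'|awk '{print $NF}'`
if [ -z "$n" ]
then
    echo "Script error"
    /usr/bin/python /data/mail.py "$m" "Host liveness check script: $0 error" "Could not obtain a packet loss value"
    exit 1
else
    n1=`echo $n |sed 's/[0-9]//g'`
    # The quotes matter: with an unquoted empty value, [ -n ] is always true
    if [ -n "$n1" ]
    then
        echo "Script error"
        /usr/bin/python /data/mail.py "$m" "Host liveness check script: $0 error" "Packet loss value contains non-numeric characters"
        exit 1
    fi
fi

if [ "$n" -eq 100 ]
then
    /usr/bin/python /data/mail.py "$m" "Mail alert" "$ip packet loss: $n%"
fi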
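
The extraction and validation steps of the script above can be isolated into one function, which makes them easy to try against saved ping output without touching the network. This is only a sketch: the function name packet_loss is made up for illustration, and it assumes the summary line has the same "N% packet loss" shape as the sample output shown earlier.

```shell
# Hypothetical helper (name chosen for illustration): read ping output on
# stdin and print the packet-loss percentage. Returns non-zero when no
# value is found or when the value contains anything other than digits.
packet_loss() {
    loss=$(grep 'packet loss' | awk -F '%' '{print $1}' | awk '{print $NF}')
    case $loss in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    printf '%s\n' "$loss"
}

# Possible use in the monitoring script (not executed here):
#   loss=$(ping -c5 192.168.234.125 | packet_loss) || echo "could not parse ping output"
```

Using a case pattern for the digit check avoids the quoting pitfalls of [ -n ... ] and needs no extra sed call.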

Checking whether the web service (port 80) is healthy

Determine the state of the web service by checking whether its port is being listened on, using nginx on port 80 as the example.

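Script approach: check whether port 80 is being listened on. If it is not, restart nginx and send a notification email. Run the check every 30 seconds.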

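#!/bin/bash
# Recipient address (placeholder); a plain quoted string, not a command substitution
m='[email protected]'
while :
do
    n=`netstat -lntp |grep -c ":80 "`
    if [ "$n" -eq 0 ]
    then
        /usr/bin/systemctl restart nginx
        /usr/bin/python /data/mail.py "$m" "Mail notification" "Port 80 was not listening; nginx has been restarted"
    fi
    sleep 30
done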

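# Because the script loops forever, it has to be run in the background. Alternatively, drop the loop and let a crontab job set the check interval. The port can also be checked with nmap, by testing whether the STATE column for that port reads closed.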
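
As a rough illustration of the crontab alternative, the check can be written as a single pass with no loop. The function name port_listening and the script path /data/check_80.sh are assumptions made for this sketch; the function reads `netstat -lnt` style output on stdin and looks only at the local address column, so it does not depend on a trailing space in the match.

```shell
# Hypothetical helper: succeed if any line on stdin has a local address
# (4th column of `netstat -lnt` output) that ends in ":<port>".
port_listening() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# One-shot check, meant to be scheduled by cron (not executed here):
#   if ! netstat -lnt | port_listening 80; then
#       /usr/bin/systemctl restart nginx
#   fi
#
# cron cannot schedule below one minute, so a 30-second interval needs two
# entries (the script path is an assumption):
#   * * * * * /bin/bash /data/check_80.sh
#   * * * * * sleep 30; /bin/bash /data/check_80.sh
```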

Checking for HTTP 502 status codes

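502 is among the most common error status codes seen with nginx. It is usually caused by PHP programs exhausting the php-fpm service's resources. The temporary fix is to restart php-fpm, and then analyze the logs afterwards to find a lasting solution.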

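Log sample: (the status code of every request is surrounded by spaces, and the script matches on the leading and trailing spaces to be more precise)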

192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET / HTTP/1.1" 200 53570 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-includes/css/dist/block-library/style.min.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-includes/css/dist/block-library/theme.min.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-content/themes/twentyseventeen/style.css?ver=5.2.3 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
192.168.234.1 - - [22/Oct/2019:20:34:07 +0800] "GET /wp-content/themes/twentyseventeen/assets/js/jquery.scrollTo.js?ver=2.1.2 HTTP/1.1" 301 169 "http://www.blog.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" "-"
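
For the after-the-fact log analysis mentioned earlier, one possible starting point is to list which request paths return 502 most often. This sketch assumes the log layout of the sample lines, where splitting on whitespace puts the request path in field 7 and the status code in field 9; the function name top_502_paths is made up for illustration, and a request line containing extra spaces would shift the fields.

```shell
# Hypothetical helper: read access-log lines on stdin and print
# "count path" for requests whose status field is 502, most frequent first.
top_502_paths() {
    awk '$9 == 502 { print $7 }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Possible use (not executed here):
#   top_502_paths < /data/logs/access.log | head
```

Comparing the status field itself also avoids counting a 200 response whose body size happens to be 502 bytes.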

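Script approach: read the HTTP status codes from the access log. Assuming roughly 100 requests every 30 seconds, run the check every 30 seconds, take the last 100 request records from the log, and match the 502 keyword. When more than 50 lines match (that is, 502 makes up more than 50% of the requests), restart php-fpm, then check whether the restart succeeded and send an alert email if it did not.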

#!/bin/bash
log=/data/logs/access.log
while :
do
    # Count the lines containing " 502 " among the last 100 requests
    n=`tail -n 100 $log|grep -c ' 502 '`
    if [ -z "$n" ]
    then
        exit 1
    fi

    if [ "$n" -gt 50 ]
    then
        /etc/init.d/php-fpm restart >/dev/null 2>/tmp/php-fpm.err
        # If no php-fpm process exists after the restart, the restart failed
        php_n=`pgrep -l php-fpm|wc -l`
        if [ "$php_n" -eq 0 ]
        then
            /usr/bin/python /data/mail.py '[email protected]' "php-fpm restart failed" "`head /tmp/php-fpm.err`"
            exit 1
        fi
    fi
    sleep 30
done
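
The 50% threshold in the script above relies on exactly 100 lines being available. A variant that computes the percentage from however many lines it receives might look like the sketch below. The function name pct_502 is hypothetical, and it assumes the status code is whitespace field 9, as in the sample log lines.

```shell
# Hypothetical helper: read access-log lines on stdin and print the
# percentage (integer, rounded down) of lines whose status field is 502.
# Prints 0 for empty input.
pct_502() {
    awk '$9 == 502 { bad++ } END { if (NR == 0) print 0; else print int(bad * 100 / NR) }'
}

# Possible use in the monitoring loop (not executed here):
#   if [ "$(tail -n 100 /data/logs/access.log | pct_502)" -gt 50 ]; then
#       /etc/init.d/php-fpm restart
#   fi
```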

Reposted from blog.csdn.net/Powerful_Fy/article/details/103186891