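Remote Compressed Streaming Backup and Restore for MySQL with XtraBackup
Background:
All production backups are currently full backups. To save time and storage and to use resources more efficiently, the existing backup scheme is being changed to cumulative incremental backup.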
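I. How innobackupex backs up and restores
The backup proceeds as follows:
1.1 After starting, innobackupex forks a child process that runs xtrabackup, then waits for xtrabackup to finish copying the ibd data files.
1.2 When backing up InnoDB data, xtrabackup uses two kinds of threads: a redo copy thread, which copies the redo log, and ibd copy threads, which copy the ibd files. There is only one redo copy thread; it starts before the ibd copy threads and ends after they finish. Once xtrabackup begins, it first starts the redo copy thread, which copies the redo log sequentially from the latest checkpoint, and then starts the ibd copy threads. While xtrabackup copies the ibd files, innobackupex stays in a wait state (waiting for a file to be created).
1.3 When xtrabackup has finished copying the ibd files, it notifies innobackupex (by creating a file) and then waits itself (the redo thread keeps copying).
1.4 On receiving the notification, innobackupex executes FLUSH TABLES WITH READ LOCK (FTWRL) to obtain a consistent position, then starts backing up the non-InnoDB files (frm, MYD, MYI, CSV, opt, par and so on). The database is globally read-only while these files are copied, so take particular care when backing up a production master: with many non-InnoDB tables (mainly MyISAM), the whole instance stays read-only for longer, and that impact must be assessed beforehand.
1.5 When innobackupex has copied all non-InnoDB table files, it notifies xtrabackup (by deleting the file) and then waits itself (for another file to be created).
1.6 On being told that the non-InnoDB backup is complete, xtrabackup stops the redo copy thread and notifies innobackupex that the redo log copy is complete (by creating a file).
1.7 On being told that the redo backup is complete, innobackupex releases the lock by executing UNLOCK TABLES.
1.8 Finally, innobackupex and xtrabackup each finish their cleanup, such as releasing resources and writing the backup metadata. innobackupex exits after the xtrabackup child process has ended.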
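The file-based handshake in steps 1.1 to 1.8 can be sketched as a toy bash simulation. This is only an illustration of the ordering, not the real tools: the marker file names `ibd_done` and `redo_done` and the function name `simulate_handshake` are made up for this sketch, and the "work" is replaced by log lines.

```shell
# Toy model of the innobackupex / xtrabackup handshake (bash).
# Marker file names are hypothetical; no database is involved.
simulate_handshake() {
    local dir events child
    dir=$(mktemp -d)
    events="$dir/events.log"

    # Child process: plays the role of xtrabackup.
    (
        echo "xtrabackup: ibd copied" >> "$events"
        touch "$dir/ibd_done"                               # 1.3 notify by creating a file
        while [ -e "$dir/ibd_done" ]; do sleep 0.1; done    # wait until the parent deletes it
        echo "xtrabackup: redo copy stopped" >> "$events"   # 1.6
        touch "$dir/redo_done"                              # 1.6 notify by creating a file
    ) &
    child=$!

    # Parent process: plays the role of innobackupex.
    while [ ! -e "$dir/ibd_done" ]; do sleep 0.1; done      # 1.2 wait for the file
    echo "innobackupex: FLUSH TABLES WITH READ LOCK" >> "$events"   # 1.4
    echo "innobackupex: non-InnoDB files copied" >> "$events"
    rm "$dir/ibd_done"                                      # 1.5 notify by deleting the file
    while [ ! -e "$dir/redo_done" ]; do sleep 0.1; done     # wait for the second file
    echo "innobackupex: UNLOCK TABLES" >> "$events"         # 1.7
    wait "$child"                                           # 1.8 wait for the child to end
    cat "$events"
    rm -rf "$dir"
}
```

Calling `simulate_handshake` prints the five events in the order the steps describe; the point to notice is that the read lock is held from the end of the ibd copy until the redo copy has stopped.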
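Incremental backup:
PXB (Percona XtraBackup) supports incremental backup, but only for InnoDB. Every InnoDB page carries an LSN. The LSN increases globally; when a page is modified it records the current LSN, so the larger a page's LSN, the more recently the page was updated. Each backup records the LSN it reached (in the xtrabackup_checkpoints file). An incremental backup copies only the pages whose LSN is greater than that of the previous backup and skips the rest, so what is backed up for each ibd file is an incremental delta file.
MyISAM has no incremental mechanism; every incremental backup copies the MyISAM files in full.
Apart from how the ibd files are copied, an incremental backup follows the same procedure as a full backup.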
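The LSN comparison can be illustrated with a small bash sketch. It assumes the usual `key = value` layout of xtrabackup_checkpoints (with a `to_lsn` line) and a made-up input format of `page_number lsn` pairs; the function names `read_to_lsn` and `pages_to_copy` are hypothetical, and real page selection happens inside xtrabackup, not in shell.

```shell
# Read the LSN the previous backup reached from an xtrabackup_checkpoints file.
read_to_lsn() {
    awk -F ' = ' '$1 == "to_lsn" {print $2}' "$1"
}

# Given "page_number lsn" pairs on stdin, print the page numbers an
# incremental backup would copy: those whose LSN is greater than base_lsn.
pages_to_copy() {
    local base_lsn=$1
    awk -v base="$base_lsn" '$2 + 0 > base + 0 {print $1}'
}
```

For example, with a base LSN of 1291135, a page stamped 1291136 is copied while a page stamped 1291135 or lower is skipped.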
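Restore process:
The log of a backup-set restore looks very much like a mysqld startup log. Restoring a backup set is in effect a crash recovery, as if mysqld had crashed.
The goal of the restore is to bring the data in the backup set to a consistent point, meaning the state of every storage engine's data at one moment in the source database. If, for example, the MyISAM data corresponds to 15:20 but the InnoDB data to 15:00, the data is inconsistent. The consistent point of a PXB backup set is the moment FTWRL was taken during the backup, so the restored data corresponds to the source database's state at FTWRL.
After FTWRL the database is read-only and the non-InnoDB data is copied while the global read lock is held, so that data already corresponds to the FTWRL moment. The InnoDB ibd files are copied before FTWRL, and different ibd files were last updated at different times, so they cannot be used directly. The redo log, however, is copied continuously from the start of the backup, and its final position is taken while FTWRL is held, so once the redo log has been applied, the ibd data also corresponds to the FTWRL moment.
The restore therefore touches only InnoDB files; non-InnoDB data is left as it is. Once the restore is complete, the data files can be copied to the target directory and the instance started with mysqld.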
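As a rough illustration of the order in which the redo log is applied, the sketch below prints (but does not run) the prepare commands for one full backup and, optionally, one cumulative incremental based on it. The function name `prepare_plan` and the example paths are hypothetical; the innobackupex options are the ones used by the restore script in this article.

```shell
# Print the prepare commands in the order they must be applied (dry run only).
# $1: full backup directory; $2 (optional): incremental backup directory.
prepare_plan() {
    local full_dir=$1 inc_dir=${2:-}
    # Replay committed changes only, so an incremental can still be merged.
    echo "innobackupex --apply-log --redo-only $full_dir"
    if [ -n "$inc_dir" ]; then
        # Merge the incremental delta into the full backup.
        echo "innobackupex --apply-log --redo-only $full_dir --incremental-dir=$inc_dir"
    fi
    # Final pass: roll back uncommitted transactions.
    echo "innobackupex --apply-log $full_dir"
}
```

Running `prepare_plan /bak/full /bak/inc/2018-08-02` prints three commands; without the second argument it prints the two needed for a full-only restore.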
II. Test environment and verification
Test environment:
IP | OS | MySQL version | XtraBackup version
192.168.201.129 | Ubuntu 16.04 | 5.7 | 2.4
192.168.201.177 | CentOS 7.2 | 5.7 | 2.4
Verification: all tests are run through scripts. The backup and restore stages are described below.
Backup stage:
Restore stage:
III. Test scripts
1 db_backup.sh
#!/bin/bash
# version 1.5
# date 2018-08-02
# author jimmyxing
# REMOTE COMPRESSED STREAM BACKUP SCRIPT
# Usage: db_backup.sh DatabaseName Password BackupServerIP
# Positional parameters
DatabaseName=$1
PASSWD=$2
IPLIST=$3
# Date formats
DateFormat=`date +%F`
Time="date +%T"
DateNum=`date +%d`
# Data and log paths
LogPath=/var/log/backup/$DatabaseName
FullDataPath=/bak/mysql/$DatabaseName/fullbackup
IncreDataPath=/bak/mysql/$DatabaseName/increment
FULLLSN=/bak/mysql/full_chkpoint/
INCLSN=/bak/mysql/inc_chkpoint
ArchivePath=/bak/mysql/$DatabaseName/archive
TMPDIR=/tmp
# innobackupex parameters
USER=root
STREAM=xbstream
XTRSTR1=xtrabackup_checkpoints
XTRSTR2=xtrabackup_info
# Validate the arguments
[ $# -ne 3 ] && { echo -e "\033[41;37m [ERROR] \033[0m usage: $0 DatabaseName Password BackupServerIP (DatabaseName like erpdatabase)";exit 1; }
# Make sure the mysql user and the script log directory exist
id mysql >/dev/null 2>&1 || { useradd mysql; }
[ -d ${LogPath} ] || { mkdir -p ${LogPath};chown -R mysql: ${LogPath}; }
# Remove script logs older than 30 days
clear_log()
{
    echo "$($Time) Clear expired script output logs older than 30 days"
    echo "........................................................"
    echo -e "........................................................\n"
    # -delete avoids testing an unquoted file list and copes with spaces in names
    find "$LogPath" -mtime +30 -type f -delete
}
# Check required packages (Debian/Ubuntu only: relies on dpkg and apt-get)
soft_check()
{
    # Check for Percona XtraBackup 2.4
    if ! dpkg -l | grep percona-xtrabackup-24 >/dev/null;then
        wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb
        dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb
        apt-get update
        apt-get install percona-xtrabackup-24 -y
        Reva1=$?
        if [ ${Reva1} -eq 0 ];then
            echo "percona-xtrabackup-24 install success"
        else
            echo -e "\033[41;37m [ERROR] \033[0m install failure"
            # Propagate the failure so the caller's status check can see it
            return 1
        fi
    fi
    # Check for curl
    if ! dpkg -L curl >/dev/null 2>&1;then
        apt-get update
        apt-get install curl -y
    fi
}
# Check the backup environment
dir_check()
{
    # Local directories that hold the backup checkpoint (LSN) information
    [ -d ${FULLLSN} ] || { mkdir -p ${FULLLSN}; chown -R mysql: ${FULLLSN}; }
    [ -d ${INCLSN} ] || { mkdir -p ${INCLSN}; chown -R mysql: ${INCLSN}; }
    # Remote directories: mkdir -p is a no-op when they already exist,
    # and a single ssh call lets one exit status cover all three
    ssh ${USER}@${IPLIST} -C "mkdir -p ${FullDataPath} ${IncreDataPath} ${ArchivePath}"
    Reva2=$?
    return ${Reva2}
}
# Send a status message
send_mes()
{
END_TIME=`date +%T`
TOKEN=$(curl -X POST -H "Content-Type: application/json" -d '{"username":"user","password":"password"}' https://opsmind..com/api/1.0/account/token \
|awk -F ":" '/token/{print $2}'|sed 's/"//g'|sed 's/}//g')
curl -X POST \
https://opsmind..com/api/1.0/database/backup/log/list \
-H "Authorization: JWT $TOKEN" \
-H 'Cache-Control: no-cache' \
-H 'Content-Type: application/json' \
-d '{"server":"'$DatabaseName'",
"time_start":"'$DateFormat' '$START_TIME'",
"time_finish":"'$DateFormat' '$END_TIME'",
"status":"'$1'",
"log":"'$2'",
"size":"'$3'",
"types":"'$4'",
"file_path":"'$5'",
"backup_server":"[email protected]"
}'
}
# Full backup
full_backup(){
    echo "$($Time) #################Starting Full Database Backup#####################"
    START_TIME=`date +%T`
    mysql -u${USER} -p${PASSWD} -e "flush logs;"
    # Remove the previous full backup
    ssh ${USER}@${IPLIST} -C "cd ${FullDataPath// /} && ls | xargs -n1 rm -r"
    /usr/bin/innobackupex --user=${USER} --password=${PASSWD} --ftwrl-wait-threshold=60 --ftwrl-wait-timeout=120 \
    --stream=${STREAM} --compress --extra-lsndir=${FULLLSN} ${TMPDIR} |ssh ${USER}@${IPLIST} "xbstream -x -C ${FullDataPath}"
    # $? alone reports only the ssh/xbstream side, so check both ends of the pipe
    PipeRc=("${PIPESTATUS[@]}")
    if [ ${PipeRc[0]} -eq 0 ] && [ ${PipeRc[1]} -eq 0 ];then
        echo "$($Time) #################Ended Full Database Backup###################### "
        # Archive the full backup
        ssh ${USER}@${IPLIST} -C "cd ${FullDataPath} && { tar -zcvf ${ArchivePath}/fullbackup_${DateFormat}.tar.gz *; }"
        [ $? -eq 0 ] || { send_mes "success" "archived failed" "null" "full" "$FullDataPath"; }
        Size=$(ssh $USER@$IPLIST -C "du -sb $FullDataPath" |awk -F " " '{print $1}')
        send_mes "success" "success" "$Size" "full" "$FullDataPath"
    else
        send_mes "failed" "failed" "null" "full" "$FullDataPath"
        echo -e "\033[41;37m [ERROR] \033[0m Backup Failure! Please View The Log ${LogPath}/backup_${DateFormat}.log"
        exit 1
    fi
}
# Incremental backup
incre_backup()
{
    # Make sure the full-backup metadata exists
    # (xtrabackup 2.4: both files must be checked)
    if ! [ -s ${FULLLSN}/${XTRSTR1} -a -s ${FULLLSN}/${XTRSTR2} ];then
        echo -e "\033[41;37m [ERROR] \033[0m The file ${XTRSTR1} or ${XTRSTR2} does not exist "
        exit 1
    fi
    echo "$($Time) Starting Incremental Database Backup"
    START_TIME=`date +%T`
    mysql -u$USER -p$PASSWD -e "flush logs;"
    ssh ${USER}@${IPLIST} -C "mkdir -p ${IncreDataPath}/${DateFormat}"
    /usr/bin/innobackupex --user=${USER} --password=${PASSWD} --ftwrl-wait-threshold=60 --ftwrl-wait-timeout=120 \
    --stream=${STREAM} --compress --extra-lsndir=${INCLSN}/${DateFormat} ${TMPDIR} \
    --incremental --incremental-basedir=${FULLLSN} |ssh ${USER}@${IPLIST} "xbstream -x -C ${IncreDataPath}/${DateFormat}"
    # $? alone reports only the ssh/xbstream side, so check both ends of the pipe
    PipeRc=("${PIPESTATUS[@]}")
    if [ ${PipeRc[0]} -eq 0 ] && [ ${PipeRc[1]} -eq 0 ];then
        echo "$($Time) Ended Database Incremental Backup"
        Size=$(ssh $USER@$IPLIST -C "du -sb $IncreDataPath" |awk -F " " '{print $1}')
        send_mes "success" "success" "$Size" "increment" "$IncreDataPath"
    else
        echo -e "\033[41;37m [ERROR] \033[0m Incremental Backup Failure,Please View The Log ${LogPath}/backup_${DateFormat}.log "
        # Argument order of send_mes: status, log, size, types, file_path
        send_mes "failed" "failed" "null" "increment" "$IncreDataPath"
        exit 1
    fi
}
# Run the backup
exec_backup()
{
    clear_log
    soft_check
    [ $? -eq 0 ] || { send_mes "failed" "xtrabackup software install failure" "null" "increment" "$IncreDataPath"; exit 1; }
    dir_check
    [ $? -eq 0 ] || { send_mes "failed" "related directory create failure" "null" "increment" "$IncreDataPath"; echo -e "\033[41;37m[WARNING] \033[0m Please Check The Env $IPLIST "; exit 1; }
    # Full backup on the 1st of the month or whenever either full-backup
    # metadata file is missing; otherwise incremental against that full backup.
    # DateNum is compared as a string: "08" and "09" are invalid octal under -eq.
    if [[ ${DateNum} == 01 || ! -s ${FULLLSN}/${XTRSTR1} || ! -s ${FULLLSN}/${XTRSTR2} ]];then
        full_backup
    else
        incre_backup
    fi
}
exec_backup #>${LogPath}/backup_${DateFormat}.log 2>&1
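A note on the `innobackupex ... | ssh ... xbstream` pipelines in this script: in bash, `$?` after a pipeline is the exit status of the last command only, so a failed backup on the left side can go unnoticed if the remote side exits cleanly. Below is a minimal sketch of checking both sides with the `PIPESTATUS` array; the function name `run_pipe_checked` and passing the two sides as command strings are choices made for this illustration only.

```shell
# Run "producer | consumer" and report whether both sides succeeded.
# Both sides are passed as command strings for the sake of the sketch.
run_pipe_checked() {
    local producer=$1 consumer=$2
    bash -c "$producer" | bash -c "$consumer"
    # PIPESTATUS must be copied immediately; the next command overwrites it.
    local rc=("${PIPESTATUS[@]}")
    if [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]; then
        echo "ok"
    else
        echo "failed: producer=${rc[0]} consumer=${rc[1]}"
        return 1
    fi
}
```

With `run_pipe_checked "false" "cat >/dev/null"` the pipeline itself would return 0, but the function reports the producer's failure. `set -o pipefail` is an alternative, though it changes the status of every pipeline in the script, including `cmd | grep` tests.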
2 restore.sh
#!/bin/bash
# version 1.0
# date 2018-07-25
# author jimmyxing
# restore database
export LANG=en_US.UTF-8
# Positional parameters: database name, backup date (YYYY-MM-DD), recovery time (HH:MM:SS)
DatabaseName=$1
INCDIR=$2
RecoveryTime=$3
#ServiceIP=$4
USER=root
# Date formats
DateFormat=`date +%F`
Time="date +%T"
DateNum=`date +%d`
# Log and data directories
BinlogPath=/bak/mysql/$DatabaseName/binlog
LogPath=/var/log/restore/$DatabaseName
FullDataPath=/bak/mysql/$DatabaseName/fullbackup
IncreDataPath=/bak/mysql/$DatabaseName/increment
ArchiveDataPath=/bak/mysql/$DatabaseName/archive
#MysqlPath=/appdata/mysql/
# Sync the binlog files (ServiceIP must be set before this is called)
sync_binlog()
{
ssh ${USER}@${ServiceIP} -C "/bin/bash /appdata/scripts/binlog_tran.sh"
}
# Validate the arguments
[ $# -eq 3 ] || { echo -e "\033[41;37m [ERROR] \033[0m please input database date time \
like test 2018-07-24 18:00:00 "; exit 1; }
# Make sure the log directory exists
[ -d ${LogPath} ]|| { mkdir -p ${LogPath};chown -R mysql: ${LogPath}; }
#ssh ${USER}@${ServiceIP} -C "[ -d ${MysqlPath} ] || { mkdir ${MysqlPath};chown -R mysql. ${MysqlPath}; }"
# List the restorable backup dates
restore_range()
{
    qpress -df ${FullDataPath}/xtrabackup_info.qp /tmp/
    backup_firsttime=$(awk -F " " '/start_time/{print $3,$4}' /tmp/xtrabackup_info)
    backup_inc_list=$(ls ${IncreDataPath}|xargs -n1)
    # Most recently modified binlog file
    last_binlog=$(ls -rt ${BinlogPath} |tail -n 1)
    backup_latesttime=$(stat ${BinlogPath}/${last_binlog} | awk -F " " '/Modify/{print $2,$3}'|sed 's/\..*//')
    echo "The Full Backup Start time: ${backup_firsttime:-NULL}"
    echo -e "The List of Incremental Backup DateList: ${backup_inc_list:-NULL}"
    echo "The Latest Binlog time: ${backup_latesttime:-NULL}"
}
# Full restore procedure
full_restore()
{
    # If the directory already holds a decompressed backup, start again from the archive
    if [ -f ${FullDataPath}/xtrabackup_info ];then
        cd ${FullDataPath} && rm -rf *
        cd ${ArchiveDataPath}
        tar -zxvf fullbackup_${INCDIR}.tar.gz -C ${FullDataPath}
    fi
    innobackupex --decompress ${FullDataPath} && { find ${FullDataPath} -name '*.qp'|xargs rm; }
    innobackupex --apply-log ${FullDataPath}
    Reval=$?
    if [ $Reval -eq 0 ];then
        #scp -r ${FullDataPath}/* ${USER}@${ServiceIP}:${MysqlPath}
        #ssh ${USER}@${ServiceIP} -C "chown -R mysql. ${MysqlPath};systemctl start mysqld"
        echo "Full backup restored successfully"
    else
        echo -e "\033[41;37m [ERROR] \033[0m Database Restore failure,Please view logs ${LogPath}/recover_${DateFormat}.log "
        exit 1
    fi
}
# Incremental restore procedure
inc_restore()
{
    if [[ -d ${IncreDataPath} && `ls -l ${IncreDataPath}/${INCDIR}|wc -l` -gt 10 ]];then
        # If the directory already holds a decompressed backup, start again from the archive
        if [ -f ${FullDataPath}/xtrabackup_info ];then
            # The base full backup is the archive from the same month (YYYY-MM)
            if ls ${ArchiveDataPath} |grep -q "${INCDIR:0:7}.*\.tar\.gz";then
                cd ${FullDataPath} && rm -rf *
                # Absolute path: the working directory is now FullDataPath, not the archive
                tar -zxvf ${ArchiveDataPath}/fullbackup_${INCDIR:0:7}*.tar.gz -C ${FullDataPath}
            else
                echo -e "\033[41;37m [ERROR] \033[0m The full backup needed for the restore does not exist"
                exit 1
            fi
        fi
        innobackupex --decompress ${FullDataPath} && { find ${FullDataPath} -name '*.qp' -type f |xargs rm; }
        innobackupex --decompress ${IncreDataPath}/${INCDIR} && { find ${IncreDataPath}/${INCDIR} -name '*.qp' -type f |xargs rm; }
        innobackupex --apply-log --redo-only ${FullDataPath}
        innobackupex --apply-log --redo-only ${FullDataPath} --incremental-dir=${IncreDataPath}/${INCDIR}
        innobackupex --apply-log ${FullDataPath}
        Reval=$?
        if [ $Reval -eq 0 ];then
            #scp -r ${FullDataPath}/* ${USER}@${ServiceIP}:${MysqlPath}
            #ssh ${USER}@${ServiceIP} -C "chown -R mysql. ${MysqlPath}; systemctl start mysqld"
            echo "Incremental backup restored successfully"
        else
            echo -e "\033[41;37m [ERROR] \033[0m Database Restore failure Please view logs ${LogPath}/recover_${DateFormat}.log"
            exit 1
        fi
    else
        echo -e "\033[41;37m [ERROR] \033[0m The Directory ${IncreDataPath}/${INCDIR} is Empty or Does Not Exist"
        exit 1
    fi
}
# Restore the database
restore_database()
{
    if [[ -d ${FullDataPath} ]];then
        # INCDIR must be a 10-character date such as 2018-07-05
        if [[ $# -eq 1 && ${#INCDIR} -eq 10 ]];then
            if [ -d ${IncreDataPath}/${INCDIR} ];then
                inc_restore
            elif ls ${ArchiveDataPath} |grep "${INCDIR}" ;then
                full_restore
            else
                echo -e "\033[41;37m [ERROR] \033[0m The backup data to restore does not exist"
                exit 1
            fi
        else
            echo -e "\033[41;37m [ERROR] \033[0m Please Input Date Like 2018-07-05"
            exit 1
        fi
    else
        echo -e "\033[41;37m [ERROR] \033[0m The backup directory ${FullDataPath} does not exist"
        exit 1
    fi
}
# Binlog-based recovery
binlog_restore()
{
    if [[ -d ${BinlogPath} && `ls -l ${BinlogPath}|wc -l` -gt 1 ]];then
        restore_starttime=$(grep end_time ${FullDataPath}/xtrabackup_info |cut -d" " -f3-4 )
        # Compare both timestamps as YYYYMMDDHHMMSS integers
        if [ `echo ${restore_starttime} |tr -cd '0-9'` -ge `echo "${INCDIR} ${RecoveryTime}" |tr -cd '0-9'` ];then
            echo "The requested restore time ${INCDIR} ${RecoveryTime} is not later than the backup end time ${restore_starttime}"
            exit 1
        else
            combie_time="${INCDIR} ${RecoveryTime}"
            mysqlbinlog -vv --start-datetime="${restore_starttime}" --stop-datetime="${combie_time}" ${BinlogPath}/mysql-bin* >recovery_${DatabaseName}_${INCDIR}_${RecoveryTime}.sql
        fi
    else
        echo -e "\033[41;37m [ERROR] \033[0m The binlog does not exist"
    fi
}
# Run the restore
restore_range
restore_database ${INCDIR} >${LogPath}/recover_${DateFormat}.log 2>&1
binlog_restore
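The point-in-time check in binlog_restore comes down to comparing two timestamps and, if the target is later than the end of the backup, replaying the binlog between them. A minimal sketch of that logic on its own, with hypothetical helper names (`to_stamp`, `binlog_window`) and no call to mysqlbinlog:

```shell
# Turn "YYYY-MM-DD HH:MM:SS" into the integer YYYYMMDDHHMMSS.
to_stamp() {
    echo "$1" | tr -cd '0-9'
}

# Print the datetime window that would be passed to mysqlbinlog,
# or fail if the target is not later than the backup end time.
binlog_window() {
    local backup_end=$1 target=$2
    if [ "$(to_stamp "$target")" -gt "$(to_stamp "$backup_end")" ]; then
        echo "--start-datetime=\"$backup_end\" --stop-datetime=\"$target\""
    else
        echo "target $target is not later than backup end $backup_end" >&2
        return 1
    fi
}
```

For instance, a backup that ended at 2018-07-24 03:00:00 and a target of 2018-07-24 18:00:00 yield a valid window, while swapping the two values is rejected.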
3 binlog_tran.sh
#!/bin/bash
# date 2018-07-25
# version 1.0
# author jimmyxing
# Sync binlog files with rsync
# Positional parameters
USER=root
DatabaseName=$1
RemoteIP=$2
# Date format
DateFormat=`date +"%F_%T"`
# Binlog paths (local source, remote destination)
Locbinlogdir=/var/lib/mysql
Rembinlogdir=/bak/mysql/$DatabaseName/binlog
# rsync output log directory
RsyLog=/var/log/rsync
# Validate the arguments
[ $# -ne 2 ] && { echo -e "\033[41;37m[ERROR]\033[0m please input databasename,remoteip parameters"; exit 1; }
# Make sure the directories exist
[ -d ${Locbinlogdir} ] || { echo -e "\033[41;37m[ERROR]\033[0m please confirm the mysql binlog path"; exit 1; }
ssh ${USER}@${RemoteIP} -C "[ -d ${Rembinlogdir} ] || { mkdir -p ${Rembinlogdir}; }"
[ -d ${RsyLog} ] || { mkdir -p ${RsyLog}; }
# Make sure rsync is installed
if ! dpkg -L rsync >/dev/null 2>&1;then
    apt-get update
    apt-get install rsync -y
    if [ $? -eq 0 ];then
        echo "install rsync success"
    else
        echo "install rsync failure"
    fi
fi
# Start the sync
rsync -avz --delete ${Locbinlogdir}/mysql-bin.[0-9]* ${USER}@${RemoteIP}:${Rembinlogdir} > ${RsyLog}/rsync_${DateFormat}.log
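IV. Backing up MyISAM tables: the FLUSH TABLES WITH READ LOCK problem and how to handle it
1 Convert the MyISAM tables to InnoDB
2 Adjust the innobackupex backup options
/usr/bin/innobackupex --user=${USER} --password=${PASSWD} --ftwrl-wait-threshold=60 --ftwrl-wait-timeout=120 --kill-long-queries-timeout=60 --kill-long-query-type=all --stream=${STREAM} --compress --extra-lsndir=${INCLSN}/${DateFormat} ${TMPDIR} \
--incremental --incremental-basedir=${FULLLSN} |ssh ${USER}@${IPLIST} "xbstream -x -C ${IncreDataPath}/${DateFormat}"
xtrabackup options that control FLUSH TABLES WITH READ LOCK
--kill-long-queries-timeout=N: after FLUSH TABLES WITH READ LOCK has been issued, if the FLUSH stays blocked for N seconds, the threads blocking it are killed. The default, 0, means no blocking statement is killed and the FLUSH waits until those statements finish.
--kill-long-query-type=all|select: the kind of statements that may be killed; the default is all.
--ftwrl-wait-timeout=M: before FLUSH TABLES WITH READ LOCK is issued, if long-running statements are detected, wait up to M seconds; if long-running statements still exist after M seconds, the backup exits with an error. The default, 0, means FLUSH TABLES WITH READ LOCK is issued immediately.
--ftwrl-wait-threshold=X: how long-running statements are detected before FLUSH TABLES WITH READ LOCK is issued. A statement that has already been running for more than X seconds counts as long-running. The default is 60 seconds. This option has no effect unless --ftwrl-wait-timeout=M is also set.
Usage summary:
--kill-long-queries-timeout is the blunt option and --ftwrl-wait-timeout the gentle one; pick one of the two according to business needs. If both are set, the logic should run as follows:
1 Before FLUSH TABLES WITH READ LOCK is issued, if a statement has been running longer than --ftwrl-wait-threshold=X seconds, wait up to M seconds; if a long-running statement still exists after M seconds, the backup exits with an error.
2 If no long-running statement remains after M seconds, FLUSH TABLES WITH READ LOCK is issued; if the FLUSH is then blocked for N seconds, the threads blocking it are killed.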
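The two-phase logic above can be written down as a toy decision function. This is a simplified model of the description in this section, not xtrabackup's actual code: it considers a single query, and the function name `ftwrl_outcome` and its arguments are invented for the illustration.

```shell
# Toy model of the option interplay (all values in seconds):
#   $1 how long the longest query has already been running
#   $2 how much longer that query needs to finish
#   $3 X = --ftwrl-wait-threshold   $4 M = --ftwrl-wait-timeout
#   $5 N = --kill-long-queries-timeout
ftwrl_outcome() {
    local age=$1 remaining=$2 threshold=$3 wait_timeout=$4 kill_timeout=$5
    # Phase 1: before FTWRL, wait up to M seconds for a long query to finish.
    if [ "$wait_timeout" -gt 0 ] && [ "$age" -gt "$threshold" ]; then
        if [ "$remaining" -gt "$wait_timeout" ]; then
            echo "abort: long query still running after ${wait_timeout}s"
            return
        fi
        remaining=0   # the long query finished during the wait
    fi
    # Phase 2: FTWRL has been issued; a blocker is killed after N seconds if N > 0.
    if [ "$remaining" -gt 0 ] && [ "$kill_timeout" -gt 0 ] && [ "$remaining" -gt "$kill_timeout" ]; then
        echo "lock acquired after killing blocker at ${kill_timeout}s"
    else
        echo "lock acquired after ${remaining}s"
    fi
}
```

With the values from the command above (X=60, M=120, N=60), a query that has run for 300 seconds and needs 200 more makes the backup abort, while a 10-second-old query that needs 90 more seconds is not treated as long-running, blocks the lock, and is killed after 60 seconds.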