Enterprise MySQL High-Availability Architecture with MHA

Introduction:

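MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a MySQL high-availability environment. When the master fails, MHA can complete the failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as it can, which is what makes it usable as a real high-availability solution.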

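The software has two components: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of each cluster at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The whole failover is transparent to the application.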
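The periodic probing described above can be pictured with a small shell sketch. This is not MHA's own code: the function name master_is_dead and its arguments are hypothetical, and the probe is any command that exits 0 while the master answers.

```shell
#!/bin/sh
# Hypothetical sketch: declare the master dead only after RETRIES failed
# probes in a row, waiting INTERVAL seconds between probes.
# usage: master_is_dead RETRIES INTERVAL PROBE [ARGS...]
# PROBE is an assumption of this sketch: any command that exits 0 while the
# master answers (for example a mysqladmin ping against the master).
# Returns 0 when the master is considered dead, 1 when it answered.
master_is_dead() (
    retries=$1
    interval=$2
    shift 2
    failures=0
    while [ "$failures" -lt "$retries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 1    # the master answered, so it is alive
        fi
        failures=$((failures + 1))
        # wait before the next probe, but not after the last one
        if [ "$failures" -lt "$retries" ]; then
            sleep "$interval"
        fi
    done
    return 0            # every probe failed
)
```

For example, `master_is_dead 3 1 mysqladmin -h 172.25.39.5 ping` would report a dead master only after three consecutive failed pings, one second apart.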

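During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. Even if only one slave has received the latest binary log events, MHA can apply them to all the other slaves, so every node ends up consistent.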

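MHA mainly supports the one-master, multiple-slave topology. Building an MHA cluster requires at least three database servers in one replication cluster, one master and two slaves: one acts as the master, one as the standby master, and one as a plain slave. Because three servers are a cost concern, Taobao built a modified version on top of MHA, and Taobao's TMHA now supports one master with one slave. For a quick setup, see the article "MHA quick setup".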

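For our own use, one master and one slave also works, with limits: if the master host itself goes down, MHA can neither switch over nor recover the missing binlog events. If only the mysqld process on the master crashes, the switch still succeeds and the binlog events are recovered.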

Official project page: https://code.google.com/p/mysql-master-ha/

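How it works:

(1) Save the binary log events (binlog events) from the crashed master;

(2) Identify the slave that has the most recent updates;

(3) Apply the differential relay log events to the other slaves;

(4) Apply the binary log events saved from the master;

(5) Promote one slave to be the new master;

(6) Point the other slaves at the new master and resume replication;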
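Step (2) can be illustrated with a minimal shell sketch. It is not how MHA is implemented; it only shows the idea of comparing how far each slave has read the master's binlog. The function name pick_latest_slave and the three-column input format are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch of step (2): pick the slave that has received the most
# binlog data from the master.
# Input on stdin, one line per slave (format assumed for this sketch):
#   <host> <Master_Log_File> <Read_Master_Log_Pos>
# Assumes binlog file names end in a fixed-width number (binlog.000003), so
# that sorting the names as text matches their sequence order.
pick_latest_slave() {
    # sort by file name first, then numerically by position; the last line
    # is the most advanced slave
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}
```

The two values per slave would come from the Master_Log_File and Read_Master_Log_Pos fields of `show slave status` on each slave.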

I. Environment:
[server5]: master 172.25.39.5
[server6]: slave 172.25.39.6
[server8]: slave 172.25.39.8
GTID-based master-slave replication is already set up across the three databases.


II. Deploying MHA:
1. On [server5], edit /etc/masterha/app1.cnf and add the MHA configuration:

[server default]
# Working directory of the manager
manager_workdir=/etc/masterha/
# Log file of the manager
manager_log=/etc/masterha/app1.log
# Directory where the master stores its binlogs, so that MHA can find them; here it is the MySQL data directory
master_binlog_dir=/var/lib/mysql
# Switch script used during automatic failover
#master_ip_failover_script= /usr/local/bin/master_ip_failover
# Switch script used during a manual switch
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
# Password of the MySQL root user, i.e. the password given to the monitoring user when it was created
password=Xa85215295##
# Monitoring user
user=root
# Interval in seconds between pings to the master (default 3); failover starts automatically after three attempts go unanswered
ping_interval=1
# Directory on the remote MySQL hosts where binlogs are saved during a switch
remote_workdir=/tmp
# Password of the replication user
repl_password=Xa85215295##
# Replication user name
repl_user=repl
# Script that sends an alert after a switch
#report_script=/usr/local/send_report
#secondary_check_script= /usr/local/bin/masterha_secondary_check -s server03 -s server02
# Script that shuts down the failed host to prevent split brain (not used here)
shutdown_script=""
# SSH login user
ssh_user=root

[server5]
hostname=172.25.39.5
port=3306

[server6]
hostname=172.25.39.6
port=3306
# Candidate master: on failover this slave is promoted even if it is not the most up-to-date slave in the cluster
candidate_master=1
# By default MHA will not pick a slave that is more than 100 MB of relay logs behind the master, because recovering it takes a long time.
# With check_repl_delay=0 MHA ignores replication delay when choosing the new master. This is useful together with
# candidate_master=1, because it guarantees that this candidate becomes the new master.
check_repl_delay=0

[server8]
hostname=172.25.39.8
port=3306
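As a quick sanity check of a file laid out like the one above, the following shell sketch lists the hosts it defines. The function name list_mha_hosts is hypothetical, and the sketch assumes each section header and each hostname= line sits on its own line with no leading spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "<section> <hostname>" for every host section
# of an MHA-style configuration file, skipping [server default].
# usage: list_mha_hosts /etc/masterha/app1.cnf
list_mha_hosts() {
    awk '
        /^\[.*\]/ {
            # remember the current section name without the brackets
            section = $0
            sub(/^\[/, "", section)
            sub(/\].*$/, "", section)
            next
        }
        section != "" && section != "server default" && /^hostname=/ {
            sub(/^hostname=/, "")
            print section, $0
        }
    ' "$1"
}
```

Run against the configuration in this article, it should print one line each for server5, server6, and server8, which is an easy way to confirm that no host section was mistyped.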

2. Set up passwordless SSH:
(1) Generate a key pair on [server5]

[root@server5 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1b:a5:82:cc:70:8b:53:90:85:d2:0a:31:ce:0c:fa:8e root@server5
The key's randomart image is:
+--[ RSA 2048]----+
|+o.+.            |
|O.+.             |
|o*. o     .      |
|.. B o   o       |
|  + = . S        |
| o .   . o       |
|E .     .        |
|                 |
|                 |
+-----------------+

(2) Distribute the key to every node

[root@server5 ~]# ssh-copy-id 172.25.39.5
The authenticity of host '172.25.39.5 (172.25.39.5)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes
[root@server5 ~]# scp -r .ssh/ 172.25.39.6:
The authenticity of host '172.25.39.6 (172.25.39.6)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes
[root@server5 ~]# scp -r .ssh/ 172.25.39.8:
The authenticity of host '172.25.39.8 (172.25.39.8)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes

(3) Test passwordless login:
From [server5]; run the same test on the other nodes

[root@server5 ~]# ssh 172.25.39.5
Last login: Sat Aug 11 09:40:18 2018 from 172.25.39.250
[root@server5 ~]# logout
Connection to 172.25.39.5 closed.
[root@server5 ~]# ssh 172.25.39.6
Last login: Sat Aug 11 09:40:27 2018 from 172.25.39.250
[root@server6 ~]# logout
Connection to 172.25.39.6 closed.
[root@server5 ~]# ssh 172.25.39.8
Last login: Sat Aug 11 10:11:54 2018 from 172.25.39.250
[root@server8 ~]# logout
Connection to 172.25.39.8 closed.
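The manual logins above can also be scripted. The sketch below is a convenience wrapper, not part of MHA; the function name check_ssh_all is hypothetical. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password, so a host that still needs a password shows up as FAILED.

```shell
#!/bin/sh
# Hypothetical helper: verify passwordless SSH from the local host to each
# host given as an argument.
# usage: check_ssh_all 172.25.39.5 172.25.39.6 172.25.39.8
# Returns 0 only if every host accepts a key-based login.
check_ssh_all() (
    rc=0
    for host in "$@"; do
        # BatchMode=yes: never prompt; ConnectTimeout=5: give up after 5 s
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            rc=1
        fi
    done
    return "$rc"
)
```

This only tests connections from the host it runs on, so it would have to be run on each node in turn to cover every pair.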

3. Check the SSH connectivity between [server5], [server6], and [server8]:

[root@server5 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Aug 11 11:17:37 2018 - [debug]  Connecting via SSH from root@172.25.39.8(172.25.39.8:22) to root@172.25.39.5(172.25.39.5:22)..
Sat Aug 11 11:17:37 2018 - [debug]   ok.
Sat Aug 11 11:17:37 2018 - [debug]  Connecting via SSH from root@172.25.39.8(172.25.39.8:22) to root@172.25.39.6(172.25.39.6:22)..
Sat Aug 11 11:17:37 2018 - [debug]   ok.
Sat Aug 11 11:17:38 2018 - [info] All SSH connection tests passed successfully.

4. On [server5], grant privileges to root:

mysql> grant all on *.* to root@'%' identified by 'Xa85215295##';
Query OK, 0 rows affected, 1 warning (0.04 sec)

5. Check the state of the whole cluster with the masterha_check_repl script:

[root@server5 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
Sat Aug 11 11:33:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Aug 11 11:33:56 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Aug 11 11:33:56 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
MySQL Replication Health is OK.
## If the check reports NOT OK, run the following on [server6]:
mysql> set GLOBAL read_only=1;
Query OK, 0 rows affected (0.00 sec)

III. Testing master switchover:
1. Manual switchover:
(1) Switch the master from [server5] to [server6]

[root@server5 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf  --master_state=alive --new_master_host=172.25.39.6 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Sat Aug 11 12:09:31 2018 - [info] * Phase 5: New master cleanup phase..
Sat Aug 11 12:09:31 2018 - [info] 
Sat Aug 11 12:09:31 2018 - [info]  172.25.39.6: Resetting slave info succeeded.
Sat Aug 11 12:09:31 2018 - [info] Switching master to 172.25.39.6(172.25.39.6:3306) completed successfully.
## "successfully" in the output means the switch succeeded

(2) Check the result on [server5] and [server8]
[server5]

[root@server5 ~]# mysql -pXa85215295##          ## log in to the database
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.39.6     ## the switch succeeded
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 1724
               Relay_Log_File: server5-relay-bin.000003
                Relay_Log_Pos: 841
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes        ## replication is running

[server8]

[root@server8 ~]# mysql -pXa85215295##
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.39.6
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000001
          Read_Master_Log_Pos: 450
               Relay_Log_File: server8-relay-bin.000003
                Relay_Log_Pos: 657
        Relay_Master_Log_File: binlog.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

If the two values are not both Yes:

On [server6], add the grant:
mysql> reset master;
Query OK, 0 rows affected (0.22 sec)

mysql> grant all  on *.* to root@'172.25.39.%' identified by 'Xa85215295##';
Query OK, 0 rows affected, 1 warning (0.05 sec)

[server8]

mysql> stop slave;
Query OK, 0 rows affected (0.04 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.35 sec)

mysql> reset master;
Query OK, 0 rows affected (0.27 sec)

mysql> start slave;
Query OK, 0 rows affected (0.38 sec)

Then check the slave status again.
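The "both Yes" check can be automated by parsing the status text. The sketch below is only an illustration; the function name slave_threads_ok is hypothetical, and it expects the vertical (\G) output format on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read `show slave status\G` text on stdin and succeed
# only if Slave_IO_Running and Slave_SQL_Running are both "Yes".
slave_threads_ok() {
    awk -F': ' '
        # match the exact field names; Slave_SQL_Running_State is not matched
        $1 ~ /^ *Slave_IO_Running$/  { io = $2;  sub(/[ \t\r]+$/, "", io) }
        $1 ~ /^ *Slave_SQL_Running$/ { sql = $2; sub(/[ \t\r]+$/, "", sql) }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A typical call would pipe the client output into it, for example `mysql -p -e 'show slave status\G' | slave_threads_ok && echo healthy`.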

2. Automatic failover:
(1) On the manager host, start the manager process in the background

[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
[1] 2956
[root@server5 ~]# nohup: ignoring input and appending output to `nohup.out'

(2) Kill the MySQL processes on [server6], the current master

[root@server6 ~]# ps ax             ## list the processes
 2549 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket
 2846 pts/0    Sl     0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plu
 2894 pts/0    R+     0:00 ps ax
[root@server6 ~]# kill -9 2549
[root@server6 ~]# kill -9 2846

(3) Start the database on [server6] again and log in

[root@server6 ~]# /etc/init.d/mysqld start
Starting mysqld:                                           [  OK  ]
[root@server6 ~]# mysql -pXa85215295##

(4) Reconnect [server6] as a slave of [server5], the new master

mysql> change master to master_host='172.25.39.5', master_user='repl', master_password='Xa85215295##', master_auto_position=1;
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.27 sec)

mysql> reset master;
Query OK, 0 rows affected (0.12 sec)

mysql> start slave;
Query OK, 0 rows affected (0.07 sec)

(5) On [server5], the new master, check the master status

mysql> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                |
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| binlog.000001 |     1645 |              |                  | d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1,
e6d178a2-9d09-11e8-83df-525400b9273c:1-5 |
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

(6) Check the replication status on [server6] and [server8]

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.39.5         ## the master has changed
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000001
          Read_Master_Log_Pos: 1645
               Relay_Log_File: server6-relay-bin.000002
                Relay_Log_Pos: 651
        Relay_Master_Log_File: binlog.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes         ## replication is running
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1645
              Relay_Log_Space: 860
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 5
                  Master_UUID: d7c9bd1d-9d07-11e8-b333-525400f9cbe2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1
            Executed_Gtid_Set: d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

(7) Insert data on [server5], then check it on [server6] and [server8]

mysql> use westos
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> insert into userlist values('user3','333');
Query OK, 1 row affected (0.08 sec)

mysql> insert into userlist values('user4','444');
Query OK, 1 row affected (0.21 sec)

mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
| user3    | 333      |
| user4    | 444      |
+----------+----------+
4 rows in set (0.00 sec)

IV. Adding a VIP and testing client connections:

1. On [server5], edit the configuration:
(1) In app1.cnf, uncomment the master_ip_failover_script and master_ip_online_change_script lines

[root@server5 masterha]# pwd
/etc/masterha
[root@server5 masterha]# vim app1.cnf 

(2) Edit the master_ip_failover script
(3) Edit the master_ip_online_change script
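The two scripts themselves are not reproduced in the text. As a rough picture of the VIP handling such scripts perform, here is a shell sketch; it is not the content of master_ip_failover or master_ip_online_change. The function name move_vip and the RUN hook are hypothetical, while the VIP 172.25.39.100/24 and the interface eth0 are the ones used in this section.

```shell
#!/bin/sh
# Hypothetical sketch of moving a VIP from the old master to the new one.
# VIP and DEV match the values used in this article; adjust as needed.
VIP=172.25.39.100/24
DEV=eth0

# usage: move_vip OLD_MASTER NEW_MASTER
# Set RUN=echo to print the commands instead of executing them (dry run).
move_vip() (
    old_master=$1
    new_master=$2
    run=${RUN:-}
    # Drop the VIP on the old master; ignore errors, because after a crash
    # the old host may be unreachable.
    $run ssh "root@$old_master" "ip addr del $VIP dev $DEV" || true
    # Bring the VIP up on the new master.
    $run ssh "root@$new_master" "ip addr add $VIP dev $DEV"
)
```

A dry run such as `RUN=echo move_vip 172.25.39.5 172.25.39.6` prints the two ssh commands without touching either host, which makes it easy to review them first.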
2. Move the scripts into place and make them executable:

[root@server5 MHA]# mv master_ip_* /usr/local/bin/
[root@server5 MHA]# cd /usr/local/bin/
[root@server5 bin]# ls
master_ip_failover  master_ip_online_change
[root@server5 bin]# chmod +x *
[root@server5 bin]# ll
total 8
-rwxr-xr-x 1 root root 2172 Aug 11 15:46 master_ip_failover
-rwxr-xr-x 1 root root 3847 Aug 11 15:48 master_ip_online_change

3. [server5] is the master at this point:

[root@server5 ~]# ip addr add 172.25.39.100/24 dev eth0    # add the VIP
[root@server5 ~]# mv master_ip_* /usr/local/bin
[root@server5 ~]# cd /usr/local/bin
[root@server5 bin]# chmod +x *
[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf  --ignore_last_failover  &     # start the manager in the background

4. Connect through the VIP and test:

[root@foundation39 Desktop]# ssh [email protected]
mysql> use westos
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
| user3    | 333      |
| user4    | 444      |
+----------+----------+
mysql> insert into userlist values ('user5','555');   # insert a row

5. Check the inserted row on [server6] and [server8]:

mysql> use westos
mysql>  select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
| user3    | 333      |
| user4    | 444      |
| user5    | 555      |
+----------+----------+
5 rows in set (0.00 sec)

6. Kill the database on [server5]; MHA fails over to another master automatically:

[root@server5 ~]# ps ax
 2333 pts/1    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket
 2626 pts/1    Sl     0:14 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plu
 3053 ?        S      0:00 pickup -l -t fifo -u
[root@server5 masterha]# kill -9 2333
[root@server5 masterha]# kill -9 2626

7. The master role moves from [server5] to server6:

[root@server6 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:b9:27:3c brd ff:ff:ff:ff:ff:ff
    inet 172.25.39.6/24 brd 172.25.39.255 scope global eth0
    inet 172.25.39.100/24 scope global secondary eth0
    inet6 fe80::5054:ff:feb9:273c/64 scope link 
       valid_lft forever preferred_lft forever
The VIP that was on server5 has moved to server6, which shows that the master has been switched to server6.
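Spotting the VIP by eye in the ip addr output can be replaced with a small check. The helper below is an illustrative sketch; the name has_vip is hypothetical, and it reads the output of ip addr on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `ip addr` output on stdin lists the
# given IPv4 address (passed without the prefix length).
# usage: ip addr show dev eth0 | has_vip 172.25.39.100
has_vip() {
    # -F treats the pattern as a fixed string, so the dots are literal;
    # the trailing slash keeps 172.25.39.100 from matching 172.25.39.1000
    grep -qF "inet $1/"
}
```

Running it on every node after a failover shows which host currently holds the VIP.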

8. On [server5], reconnect server5 to the new master as a slave:

[root@server5 ~]# /etc/init.d/mysqld start
[root@server5 ~]# mysql -p
mysql> change master to master_host='172.25.39.6', master_user='repl', master_password='Xa85215295##', master_auto_position=1;
mysql> start slave;
mysql> use westos;
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
| user3    | 333      |
| user4    | 444      |
| user5    | 555      |
+----------+----------+
5 rows in set (0.00 sec)

Reposted from blog.csdn.net/China_zgd/article/details/81584461