PostgreSQL primary/standby switchover test

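There are two common ways to switch the primary and standby roles. The first uses a trigger file; it was the only option in PostgreSQL 9.0, because pg_ctl promote was added in 9.1. The second uses the pg_ctl promote command.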

Before switching, check which role each server currently has. Ways to check the role are described here:

https://blog.csdn.net/m15217321304/article/details/86628843
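
One quick check, used throughout the transcripts in this post, is the "Database cluster state" line printed by pg_controldata. The sketch below maps that line to a role. It assumes that "in production" means primary and "in archive recovery" means standby, as seen in this test; the helper name cluster_role is made up for illustration, and any other state (for example a server that is shut down) is reported as unknown.

```shell
# cluster_role: hypothetical helper, not part of PostgreSQL.
# Reads pg_controldata output on stdin and prints primary, standby, or unknown.
cluster_role() {
    # Keep only the text after the "Database cluster state:" label.
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}
```

Typical use, assuming PGDATA is set for the postgres user: pg_controldata | cluster_role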

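I. Method 1: trigger file

The main steps of the trigger-file method are:

1) Set the trigger_file parameter in the standby's recovery.conf.

2) Shut down the primary. A clean shutdown with -m fast is recommended.

3) On the standby, create the file named by trigger_file. If promotion succeeds, recovery.conf is renamed to recovery.done.

4) On the old primary, create a recovery.conf file and configure it the same way as on a standby.

5) Start the old primary (now the standby).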

1. Test environment

Primary: IP 192.168.40.130, hostname postgres, port 5442

Standby: IP 192.168.40.131, hostname postgreshot, port 5442


2. Check the current replication status on the primary

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 43754
usesysid         | 16384
usename          | replica
application_name | walreceiver
client_addr      | 192.168.40.131
client_hostname  | 
client_port      | 36568
backend_start    | 2019-01-24 16:01:03.500056-05
backend_xmin     | 582
state            | streaming
sent_lsn         | 0/3033760
write_lsn        | 0/3033760
flush_lsn        | 0/3033760
replay_lsn       | 0/3033760
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
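
In the output above, sent_lsn and replay_lsn are equal, so the standby has replayed everything the primary sent. An LSN written as X/Y is two hexadecimal numbers, the high and low 32 bits of a 64-bit WAL position, so two LSNs can be compared by converting them to byte offsets. The following is a minimal sketch of that arithmetic; the helper names lsn_to_bytes and lsn_diff are illustrative only.

```shell
# lsn_to_bytes: hypothetical helper, converts an LSN such as 0/3033760
# into an absolute byte position (high part * 2^32 + low part).
lsn_to_bytes() {
    hi=${1%%/*}   # hex digits before the slash
    lo=${1##*/}   # hex digits after the slash
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# lsn_diff: bytes by which the first LSN is ahead of the second.
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

lsn_diff 0/3033760 0/3033760   # prints 0: the standby is fully caught up
```

A non-zero difference between sent_lsn and replay_lsn would suggest waiting before shutting down the primary.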

3. Configure recovery.conf on the standby

[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica'         # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$

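Note: the primary and standby authenticate through a .pgpass file. It is better not to put the password directly in recovery.conf. Changes to the parameters above take effect only after the standby is restarted.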
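
For reference, each .pgpass line has the form hostname:port:database:username:password, the database field replication matches streaming replication connections, and libpq ignores the file unless its permissions are 0600. The sketch below appends such an entry; the function name add_pgpass_entry and the password shown in the usage line are placeholders, not values from this test. It does not escape ":" or "\" in the password, which .pgpass requires.

```shell
# add_pgpass_entry: hypothetical helper, not part of PostgreSQL.
# $1 = pgpass file, $2 = primary host, $3 = port, $4 = user, $5 = password
add_pgpass_entry() {
    # "replication" in the database field matches replication connections.
    printf '%s:%s:replication:%s:%s\n' "$2" "$3" "$4" "$5" >> "$1"
    # libpq ignores the file if group or others can access it.
    chmod 0600 "$1"
}
```

On the standby in this test it might be called as: add_pgpass_entry ~/.pgpass 192.168.40.130 5442 replica 'change-me'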

4. Stop the primary

[postgres@postgres ~]$ hostname
postgres
[postgres@postgres ~]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgres ~]$

5. Promote the standby to primary

Cluster state before promotion

[postgres@postgreshot pg11]$ pg_controldata | grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgreshot pg11]$

Create the trigger file

[postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger
[postgres@postgreshot pg11]$ ls -ltr
total 140
drwx------ 4 postgres postgres  4096 Jan 24 08:03 pg_multixact
-rwx------ 1 postgres postgres  1636 Jan 24 08:03 pg_ident.conf
-rwx------ 1 postgres postgres   224 Jan 24 08:03 backup_label.old
-rwx------ 1 postgres postgres    88 Jan 24 08:03 postgresql.auto.conf
-rwx------ 1 postgres postgres     3 Jan 24 08:03 PG_VERSION
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_commit_ts
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_twophase
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_tblspc
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_serial
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_replslot
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_dynshmem
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_xact
drwx------ 2 postgres postgres  4096 Jan 24 08:03 pg_snapshots
drwx------ 2 postgres postgres  4096 Jan 24 08:15 pg_subtrans
-rwx------ 1 postgres postgres 24406 Jan 24 10:31 postgresql.conf
drwx------ 5 postgres postgres  4096 Jan 24 10:36 base
-rwx------ 1 postgres postgres  4705 Jan 24 11:40 pg_hba.conf
-rwx------ 1 postgres postgres  5923 Jan 24 20:47 recovery.done <===== renamed from recovery.conf

Note: the trigger file path must match the trigger_file setting in recovery.conf. After promotion, recovery.conf is renamed to recovery.done.
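
To avoid a mismatch between the two paths, the trigger path can be read from recovery.conf instead of being typed by hand. This is a rough sketch under the assumption that the value is written in single quotes on one line, as in the file shown earlier; get_recovery_param is a made-up helper name, and unquoted values such as standby_mode = on are not handled.

```shell
# get_recovery_param: hypothetical helper, not part of PostgreSQL.
# $1 = path to recovery.conf, $2 = parameter name
# Prints the single-quoted value of the parameter (last occurrence wins).
get_recovery_param() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}
```

The trigger file could then be created with: touch "$(get_recovery_param recovery.conf trigger_file)"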

Cluster state after promotion

[postgres@postgreshot pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in production
[postgres@postgreshot pg11]$

6. Copy recovery.done from the new primary (hostname postgreshot) to the old primary

[postgres@postgreshot pg11]$ scp recovery.done postgres:/home/postgres/pg11/
recovery.done 100% 5923 5.8KB/s 00:00
[postgres@postgreshot pg11]$

7. On the old primary, rename recovery.done to recovery.conf and point primary_conninfo at the new primary

[postgres@postgres pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica'         # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$
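
The rename and the host change can be done in one step. The sketch below rewrites only the first host= value on the primary_conninfo line, so the example text in the trailing comment is left alone; retarget_primary is a hypothetical helper name, and the approach assumes the host value contains no "/" characters.

```shell
# retarget_primary: hypothetical helper, not part of PostgreSQL.
# $1 = existing recovery.done or recovery.conf, $2 = host of the new primary
# Writes the adjusted file to stdout; the input file is not modified.
retarget_primary() {
    sed "/^[[:space:]]*primary_conninfo/ s/host=[^ ']*/host=$2/" "$1"
}
```

For this test, on the old primary: retarget_primary recovery.done 192.168.40.131 > recovery.conf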

8. Start the old primary as the new standby

[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ hostname
postgres
[postgres@postgres pg11]$ pg_ctl start
waiting for server to start....2019-01-24 21:00:59.533 EST [77641] LOG:  listening on IPv4 address "0.0.0.0", port 5442
2019-01-24 21:00:59.533 EST [77641] LOG:  listening on IPv6 address "::", port 5442
2019-01-24 21:00:59.541 EST [77641] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-01-24 21:00:59.555 EST [77642] LOG:  database system was shut down at 2019-01-24 20:51:19 EST
2019-01-24 21:00:59.555 EST [77642] LOG:  entering standby mode
2019-01-24 21:00:59.581 EST [77642] LOG:  consistent recovery state reached at 0/30337D0
2019-01-24 21:00:59.581 EST [77642] LOG:  invalid record length at 0/30337D0: wanted 24, got 0
2019-01-24 21:00:59.582 EST [77641] LOG:  database system is ready to accept read only connections
2019-01-24 21:00:59.659 EST [77646] LOG:  fetching timeline history file for timeline 2 from primary server
 done
server started
[postgres@postgres pg11]$ 2019-01-24 21:00:59.690 EST [77646] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2019-01-24 21:00:59.693 EST [77646] LOG:  replication terminated by primary server
2019-01-24 21:00:59.693 EST [77646] DETAIL:  End of WAL reached on timeline 1 at 0/30337D0.
2019-01-24 21:00:59.704 EST [77642] LOG:  new target timeline is 2
2019-01-24 21:00:59.706 EST [77646] LOG:  restarted WAL streaming at 0/3000000 on timeline 2
2019-01-24 21:00:59.973 EST [77642] LOG:  redo starts at 0/30337D0

[postgres@postgres pg11]$ ps -ef |grep postgres
root      64511  63718  0 19:05 pts/4    00:00:00 su - postgres
postgres  64512  64511  0 19:05 pts/4    00:00:00 -bash
postgres  77641      1  0 21:00 pts/4    00:00:00 /opt/pgsql11/bin/postgres
postgres  77642  77641  0 21:00 ?        00:00:00 postgres: startup   recovering 000000020000000000000003
postgres  77643  77641  0 21:00 ?        00:00:00 postgres: checkpointer   
postgres  77644  77641  0 21:00 ?        00:00:00 postgres: background writer   
postgres  77645  77641  0 21:00 ?        00:00:00 postgres: stats collector   
postgres  77646  77641  0 21:00 ?        00:00:00 postgres: walreceiver   streaming 0/3033918
postgres  77683  64512  0 21:01 pts/4    00:00:00 ps -ef
postgres  77684  64512  0 21:01 pts/4    00:00:00 grep postgres
[postgres@postgres pg11]$
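
The startup process above is recovering 000000020000000000000003, which agrees with the log line "new target timeline is 2": a WAL segment file name is 24 hexadecimal digits, and the first 8 are the timeline ID. A small sketch for reading the timeline out of such a name follows; wal_timeline is an illustrative helper name.

```shell
# wal_timeline: hypothetical helper, not part of PostgreSQL.
# $1 = WAL segment file name (24 hex digits); prints the timeline ID in decimal.
wal_timeline() {
    # The first 8 hex digits of the name are the timeline ID.
    printf '%d\n' "0x$(printf '%s' "$1" | cut -c1-8)"
}

wal_timeline 000000020000000000000003   # prints 2
```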

9. Check the cluster state

[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state:               in archive recovery
[postgres@postgres pg11]$

The old primary is now in archive recovery, so the switchover succeeded.

II. Method 2: pg_ctl promote

The command syntax is:

pg_ctl promote [-D datadir]

Once promote is issued, the running standby leaves recovery mode and becomes a read-write primary.

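Switchover steps:

1) Shut down the primary, preferably with -m fast.

2) On the standby, run pg_ctl promote to promote it to primary. If recovery.conf has been renamed to recovery.done, the standby has become the primary.

3) Create a recovery.conf file on the old primary.

4) Start the old primary.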

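Demonstration

The test above already turned the original primary (postgres) into a standby and promoted the original standby (postgreshot) to primary, so the roles are reversed at this point.

Current primary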

[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ pg_controldata | grep 'cluster'
Database cluster state: in production
[postgres@postgreshot pg11]$

Current standby

[postgres@postgres pg11]$ hostname
postgres
[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgres pg11]$

1. Shut down the primary

[postgres@postgreshot pg11]$ hostname
postgreshot
[postgres@postgreshot pg11]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgreshot pg11]$

2. Run the promote command on the standby to make it the primary

[postgres@postgres pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica' # e.g. 'host=localhost port=5432'
[postgres@postgres pg11]$

[postgres@postgres pg11]$ pg_ctl promote -D /home/postgres/pg11
waiting for server to promote.... done
server promoted
[postgres@postgres pg11]$
[postgres@postgres pg11]$

[postgres@postgres pg11]$ ls -ltr recovery*
-rwx------ 1 postgres postgres 5923 Jan 24 20:59 recovery.done
[postgres@postgres pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in production
[postgres@postgres pg11]$

The standby has now been promoted to primary.

3. On the old primary, rename recovery.done to recovery.conf

[postgres@postgreshot pg11]$ mv recovery.done recovery.conf
[postgres@postgreshot pg11]$ cat recovery.conf |grep -iv '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica' # e.g. 'host=localhost port=5432'
[postgres@postgreshot pg11]$

4. Start the old primary

[postgres@postgreshot pg11]$ pg_ctl start

done
server started
[postgres@postgreshot pg11]$

[postgres@postgreshot pg11]$ pg_controldata |grep 'cluster'
Database cluster state: in archive recovery
[postgres@postgreshot pg11]$
