pgpool + streaming replication mode: slave goes down and comes back up

os: CentOS 7.4
postgresql: 9.6.8
pgpool: 3.7.3

We use streaming replication mode, which is a fairly common setup.
The streaming replication mode can be used with PostgreSQL servers operating streaming replication. In this mode, PostgreSQL is responsible for synchronizing databases. This mode is widely used and is the most recommended way to use Pgpool-II. Load balancing is possible in this mode. The sample configuration file is $prefix/etc/pgpool.conf.sample-stream.

IP plan

pgpool    192.168.56.100

pgsql1    192.168.56.101
pgsql2    192.168.56.102
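In pgpool.conf below, backend_hostname refers to the backends by the names pgsql1 and pgsql2, not by IP address. Those names therefore have to resolve on the pgpool host. The ssh call in pgpool_remote_start later also receives 'pgsql2' as its target, so the names must resolve on the database hosts as well. The following is a minimal sketch that assumes name resolution goes through /etc/hosts. add_host_entry and the HOSTS_FILE override are hypothetical helpers and are not part of pgpool:

```shell
#!/bin/sh
# Hypothetical helper: append "ip    hostname" to a hosts file unless that
# hostname already appears on an uncommented line.
# HOSTS_FILE defaults to /etc/hosts (writing it needs root).
add_host_entry() {
    ip=$1
    name=$2
    file=${HOSTS_FILE:-/etc/hosts}
    # Exit status 0 if the name appears as a field after the IP column.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s    %s\n' "$ip" "$name" >> "$file"
}
# Usage (as root on pgpool, pgsql1 and pgsql2):
#   add_host_entry 192.168.56.100 pgpool
#   add_host_entry 192.168.56.101 pgsql1
#   add_host_entry 192.168.56.102 pgsql2
```

Calling the function again for a name that is already present leaves the file unchanged, so the same three lines can safely be run on every host.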

pgpool.conf

Streaming replication mode

$ cp pgpool.conf.sample-stream pgpool.conf
$ vi pgpool.conf
#pgpool settings
listen_addresses = '*'
port = 9999
socket_dir = '/tmp'

#pcp settings
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'

#master node
backend_hostname0 = 'pgsql1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/9.6/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
#stream slave node
backend_hostname1 = 'pgsql2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.6/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

#password authentication
enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60

#logging
log_destination = 'syslog,stderr'
log_line_prefix = '%t: pid %p: '
log_connections = on
log_hostname = on
log_statement = off
log_per_node_statement = off
log_standby_delay = 'if_over_threshold'

#native replication mode (off: PostgreSQL streaming replication synchronizes the data)
replication_mode = off
#load balancing
load_balance_mode = on
#master/slave mode, stream sub-mode
master_slave_mode = on
master_slave_sub_mode = 'stream'

#stream check
sr_check_period = 30
sr_check_user = 'replicator'
sr_check_password = 'passw0rd'
sr_check_database = 'postgres'
delay_threshold = 10000000    #in bytes of WAL (about 10 MB)
#health check
health_check_period = 10
health_check_timeout = 20
health_check_user = 'replicator'
health_check_password = 'passw0rd'
health_check_database = 'postgres'
health_check_max_retries = 99999
health_check_retry_delay = 1
connect_timeout = 10000

#online recovery
recovery_user = 'postgres'
recovery_password = 'postgrespostgres'
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0
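After editing, it helps to confirm the values pgpool will actually read. This catches a parameter that was accidentally left commented out or set twice. The sketch below assumes one `key = value` assignment per line, as in the file above. conf_get is a hypothetical helper. It prints every uncommented assignment of a key, so a duplicate shows up as two output lines:

```shell
#!/bin/sh
# Hypothetical helper: print the value(s) assigned to KEY in a pgpool.conf-style
# file. Commented-out lines are skipped; surrounding whitespace, single quotes
# and trailing "# ..." comments are stripped (so values containing "#" are cut).
conf_get() {
    key=$1
    file=${2:-pgpool.conf}
    awk -v k="$key" -v q="'" '
        /^[ \t]*#/ { next }
        {
            line = $0
            sub(/[ \t]*#.*$/, "", line)
            # Match "key =" at the start of the line; backend_port0 is not "backend_port".
            if (match(line, "^[ \t]*" k "[ \t]*=")) {
                val = substr(line, RLENGTH + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                gsub(q, "", val)
                print val
            }
        }' "$file"
}
# Usage: conf_get master_slave_sub_mode pgpool.conf
```

For the file above, `conf_get master_slave_sub_mode pgpool.conf` should print `stream`, and `conf_get sr_check_period pgpool.conf` should print `30`.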

Using pgpool

Create a user and a database on pgsql1 (master)

postgres=# create user peiyb with password 'peiybpeiyb';
CREATE ROLE
postgres=# create database peiybdb owner = peiyb;
CREATE DATABASE

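Connecting through pgpool

Because enable_pool_hba is on, first add the user's md5 hash to pool_passwd with pg_md5 (-m writes the entry to pool_passwd, -p prompts for the password, -u sets the user name):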

$ pg_md5 -h
$ pg_md5 -m -p -u peiyb
$ cat pool_passwd
peiyb:md5bd0875843854575a4b7328813ea498cb
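The pool_passwd entry uses the same md5 format as PostgreSQL: the string "md5" followed by the MD5 hex digest of the password concatenated with the user name. You can recompute the line by hand to cross-check what pg_md5 wrote. The sketch below assumes md5sum from GNU coreutils is available. pool_passwd_entry is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print a pool_passwd line "user:md5<hex>", where
# <hex> is md5(password || user), the format PostgreSQL uses for md5 passwords.
pool_passwd_entry() {
    user=$1
    password=$2
    # printf (not echo) so no trailing newline ends up in the hashed input.
    hash=$(printf '%s%s' "$password" "$user" | md5sum | cut -d' ' -f1)
    printf '%s:md5%s\n' "$user" "$hash"
}
# Usage: pool_passwd_entry peiyb 'peiybpeiyb'
```

If this format assumption is right, running it with the password entered at the pg_md5 prompt should reproduce the line shown by `cat pool_passwd`.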

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
Password for user peiyb:
psql (9.6.9)
Type "help" for help.

peiybdb=>
peiybdb=> show pool_version;
     pool_version     
----------------------
 3.7.3 (amefuriboshi)
(1 row)

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | false             | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)

Shutting down the slave on pgsql2

Stop the slave on pgsql2, wait one minute, then start it again.
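On pgsql2 this test is a stop, wait, start cycle. The sketch below assumes it runs as the postgres user on pgsql2 and that pg_ctl is /usr/pgsql-9.6/bin/pg_ctl, the path used in the pgpool_remote_start script later in this post. restart_standby and the PGCTL/WAIT_SECS overrides are illustrative names. The fast shutdown mode is my choice, not something the original test specifies:

```shell
#!/bin/sh
# Hypothetical helper: stop the local standby, wait, then start it again.
# PGCTL, PGDATA and WAIT_SECS may be overridden from the environment.
restart_standby() {
    pgctl=${PGCTL:-/usr/pgsql-9.6/bin/pg_ctl}
    datadir=${PGDATA:-/var/lib/pgsql/9.6/data}
    wait_secs=${WAIT_SECS:-60}
    # -m fast: disconnect clients instead of waiting for them; -w: wait for completion.
    "$pgctl" -D "$datadir" -m fast -w stop || return 1
    sleep "$wait_secs"
    # -w: return only once the server is accepting connections again.
    "$pgctl" -D "$datadir" -w start
}
# Usage (as postgres on pgsql2): restart_standby
```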

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 8          | true              | 0
 1       | pgsql2   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

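At this point pgsql2 is still listed when connecting through pgpool, but its status in show pool_nodes remains down.
After setting log_connections and log_disconnections to on in postgresql.conf on both pgsql1 and pgsql2, the logs showed that pgpool never connected to pgsql2 to check it again after the slave was stopped and restarted.
To bring the standby node back to normal status, pcp_recovery_node has to be run.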
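If several backends can be down, you can chain the check and the recovery. This sketch assumes the following:

- psql reaches pgpool at 192.168.56.100:9999 as in this setup.
- `psql -At` prints show pool_nodes as "|"-separated fields, with status in the 4th column (matching the tables above).
- Passwords are non-interactive: ~/.pgpass for psql, and ~/.pcppass so that pcp_recovery_node -w never prompts.

down_nodes, recover_down_nodes and the PSQL/RECOVER overrides are hypothetical names. The recovery will still fail with the pgpool_remote_start error shown below until that script exists.

```shell
#!/bin/sh
# Hypothetical helpers for this setup (pgpool at 192.168.56.100).

# Read `psql -At -c 'show pool_nodes'` output on stdin (fields separated by "|")
# and print the node_id of every backend whose status (4th field) is "down".
down_nodes() {
    awk -F'|' '$4 == "down" { print $1 }'
}

# Ask pgpool for the node list, then run pcp_recovery_node for each down node.
# PSQL and RECOVER can be overridden, e.g. for a dry run.
# Note: if psql itself fails, no node is reported and nothing is recovered.
recover_down_nodes() {
    psql_cmd=${PSQL:-psql}
    recover=${RECOVER:-pcp_recovery_node}
    nodes=$("$psql_cmd" -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb -At \
        -c 'show pool_nodes' | down_nodes)
    for n in $nodes; do
        echo "recovering node $n"
        "$recover" -h 192.168.56.100 -p 9898 -U pgpool -w -n "$n" || return 1
    done
}
# Usage: recover_down_nodes
```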

$ which pcp_recovery_node
/usr/pgpool/pgpool3.7.3/bin/pcp_recovery_node
$ pcp_recovery_node --help
pcp_recovery_node - recover a node
Usage:
pcp_recovery_node [OPTION...] [node-id]
Options:
  -U, --username=NAME    username for PCP authentication
  -h, --host=HOSTNAME    pgpool-II host
  -p, --port=PORT        PCP port number
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should happen automatically)
  -n, --node-id=NODEID   ID of a backend node
  -d, --debug            enable debug message (optional)
  -v, --verbose          output verbose messages
  -?, --help             print this help

$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=117
ERROR:  executing remote start failed with error: "ERROR:  pgpool_remote_start failed"
DEBUG: send: tos="X", len=4

The log on pgsql1 shows:
sh: /var/lib/pgsql/9.6/data/pgpool_remote_start: No such file or directory
< 2018-05-15 14:00:29.546 CST > ERROR: pgpool_remote_start failed
< 2018-05-15 14:00:29.546 CST > STATEMENT: SELECT pgpool_remote_start('pgsql2', '/var/lib/pgsql/9.6/data');

So the shell script pgpool_remote_start is missing from /var/lib/pgsql/9.6/data/ on pgsql1.
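The log shows that pgpool_remote_start() looks for the script in the data directory and runs it with sh. Before rerunning the recovery, you can check on each database host that the file exists there and is executable. This is a minimal sketch. check_remote_start is a hypothetical helper, and its default path is this setup's data directory:

```shell
#!/bin/sh
# Hypothetical helper: verify that <datadir>/pgpool_remote_start exists and is
# executable by the current user (run it as postgres on pgsql1 and pgsql2).
check_remote_start() {
    script=${1:-/var/lib/pgsql/9.6/data}/pgpool_remote_start
    if [ ! -e "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 1
    fi
    echo "ok: $script"
}
# Usage: check_remote_start /var/lib/pgsql/9.6/data
```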
For reference, here is the sample version of this file from the pgpool-II source tree on the pgpool node:

# more /tmp/pgpool-II-3.7.3/src/sample/pgpool_remote_start
#! /bin/sh

if [ $# -ne 2 ]
then
    echo "pgpool_remote_start remote_host remote_datadir"
    exit 1
fi

DEST=$1
DESTDIR=$2
PGCTL=/usr/local/pgsql/bin/pg_ctl

ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null < /dev/null &

The pgpool_remote_start script

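Create the pgpool_remote_start script on both pgsql1 and pgsql2. It is executed on the primary, so it must exist on whichever node is currently primary. The only change from the sample is PGCTL, which points to /usr/pgsql-9.6/bin/pg_ctl: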

$ cd /var/lib/pgsql/9.6/data
$ vi pgpool_remote_start
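#! /bin/sh
# Called via pgpool's pgpool_remote_start() on the primary:
#   $1 = host to start, $2 = data directory on that host

if [ $# -ne 2 ]
then
    echo "pgpool_remote_start remote_host remote_datadir"
    exit 1
fi

DEST=$1
DESTDIR=$2
# pg_ctl of the PostgreSQL 9.6 installation on the database hosts
PGCTL=/usr/pgsql-9.6/bin/pg_ctl

# Start PostgreSQL on the remote host over ssh, in the background, discarding all output
ssh -T "$DEST" "$PGCTL" -w -D "$DESTDIR" start 2>/dev/null 1>/dev/null < /dev/null &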

$ chmod 700 pgpool_remote_start
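The script starts the standby through ssh. It runs in the background with all output sent to /dev/null, so an ssh failure would go unnoticed in its output. One example is a password prompt that nobody can answer. The sketch below checks key-based login beforehand and is meant to run as postgres on pgsql1 and pgsql2. check_ssh and the SSH override are hypothetical. BatchMode and ConnectTimeout are standard OpenSSH client options:

```shell
#!/bin/sh
# Hypothetical helper: confirm the current user can ssh to each host without a
# password prompt, as pgpool_remote_start requires. SSH can be overridden.
check_ssh() {
    ssh_cmd=${SSH:-ssh}
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 -T "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host" >&2
            rc=1
        fi
    done
    return $rc
}
# Usage (as postgres on pgsql1 and pgsql2): check_ssh pgsql1 pgsql2
```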

Run pcp_recovery_node again

[postgres@pgpool etc]$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20
pcp_recovery_node -- Command Successful
DEBUG: send: tos="X", len=4

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

Both nodes are now in up status.