PostgreSQL high availability with etcd + Patroni: Patroni

os: centos 7.4
postgresql: 9.6.9
etcd: 3.2.18
patroni: 1.4.4

Patroni + etcd is a high-availability setup that an engineer from AsiaInfo presented at a PostgreSQL open-source conference.
It is still built on PostgreSQL streaming replication.

IP plan
192.168.56.101 node1 master
192.168.56.102 node2 slave
192.168.56.103 node3 slave

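Install etcd

Install etcd as described in the previous blog post.

Install PostgreSQL and set up streaming replication

On node1, pay attention to the following parameters; on node2 and node3 the values need to be adjusted accordingly.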

synchronous_commit = on
synchronous_standby_names = 'pg96_102,pg96_103'
max_replication_slots = 10
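
To make "adjusted accordingly" concrete, here is a small Python sketch. It assumes that each node lists every other member as a synchronous standby candidate, and that members are named pg96_101 to pg96_103 as in the rest of this post. The helper name standby_names_for is made up for illustration.

```python
# Member names as used in this post (application_name of each standby).
MEMBERS = ["pg96_101", "pg96_102", "pg96_103"]


def standby_names_for(member, members=MEMBERS):
    """Return a synchronous_standby_names value for `member`.

    Assumption: the value is simply every other member, comma-separated,
    in the order they appear in `members`.
    """
    if member not in members:
        raise ValueError("unknown member: %s" % member)
    return ",".join(m for m in members if m != member)


if __name__ == "__main__":
    for m in MEMBERS:
        print("%s: synchronous_standby_names = '%s'" % (m, standby_names_for(m)))
```

For node1 this reproduces the value shown above; the values for node2 and node3 are only my assumption of what the adjusted settings look like.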

Create the replication slots. This is essential, because Patroni relies on them.

select * from pg_create_physical_replication_slot('pg96_101');
select * from pg_create_physical_replication_slot('pg96_102');
select * from pg_create_physical_replication_slot('pg96_103');

Check the replication status

select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)


select client_addr,
       pg_xlog_location_diff(sent_location, write_location) as write_delay,
       pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
       pg_xlog_location_diff(sent_location, replay_location) as replay_delay 
 from pg_stat_replication;

  client_addr   | write_delay | flush_delay | replay_delay 
----------------+-------------+-------------+--------------
 192.168.56.102 |           0 |           0 |            0
 192.168.56.103 |           0 |           0 |            0
(2 rows)

Download and install Patroni

# cd /tmp
# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
# python get-pip.py
# pip install patroni

Some of Patroni's dependencies

urllib3>=1.19.1,!=1.21
boto
psycopg2>=2.5.4
PyYAML
requests
six>=1.7
kazoo>=1.3.1
python-etcd>=0.4.3,<0.5
python-consul>=0.7.0
click>=4.1
prettytable>=0.7
tzlocal
python-dateutil
psutil
cdiff
kubernetes>=2.0.0,<=6.0.0,!=4.0.*,!=5.0.*

Patroni configuration

# which patroni
/usr/bin/patroni

# patroni --help
/usr/lib64/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Usage: /usr/bin/patroni config.yml
    Patroni may also read the configuration from the PATRONI_CONFIGURATION environment variable

The output contains a warning (not an error): please use "pip install psycopg2-binary" instead

# pip install psycopg2-binary

Patroni configuration file

# mkdir -p /usr/patroni/conf
# cd /usr/patroni/conf/

# vi postgresql.yml

scope: pg96
namespace: /pg96/
name: pg96_101

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

etcd:
  host: 192.168.56.101:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
#    master_start_timeout: 300
#    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
#      use_slots: true
      parameters:
         listen_addresses: "*"
         port: 5432
#        wal_level: hot_standby
#        hot_standby: "on"
#        wal_keep_segments: 8
#        max_wal_senders: 10
#        max_replication_slots: 10
#        wal_log_hints: "on"
#        archive_mode: "on"
#        archive_timeout: 1800s
#        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
#      recovery_conf:
#        restore_command: cp ../wal_archive/%f %p

postgresql:
  listen: 192.168.56.101:5432
  connect_address: 192.168.56.101:5432
  data_dir: /var/lib/pgsql/9.6/data
  bin_dir: /usr/pgsql-9.6/bin
  authentication:
    replication:
      username: replicator
      password: 1qaz2wsx
    superuser:
      username: postgres
      password: 1qaz2wsx
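
The file above is the one for node1. Judging by the member names and conn_url values that appear later in etcd, node2 and node3 use the same file with a different name and a different PostgreSQL address. A minimal Python sketch that renders just those node-specific lines (assuming nothing else differs between nodes; node_fragment is a hypothetical helper):

```python
# Assumption: only `name` and the postgresql listen/connect addresses differ
# between the three nodes; everything else in postgresql.yml is shared.
TEMPLATE = """\
name: pg96_{octet}

postgresql:
  listen: 192.168.56.{octet}:5432
  connect_address: 192.168.56.{octet}:5432
"""


def node_fragment(octet):
    """Render the node-specific lines for the node at 192.168.56.<octet>."""
    if octet not in (101, 102, 103):
        raise ValueError("unexpected node: %r" % (octet,))
    return TEMPLATE.format(octet=octet)


if __name__ == "__main__":
    for octet in (101, 102, 103):
        print("# node with IP 192.168.56.%d" % octet)
        print(node_fragment(octet))
```

The output is meant to be merged by hand into each node's copy of the file, not used as a complete configuration.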

Start Patroni

Start Patroni on node1, node2, and node3, in that order

$ patroni /usr/patroni/conf/postgresql.yml

node1 log:

2018-07-11 18:17:22,402 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 18:17:22,430 INFO: no action.  i am the leader with the lock
2018-07-11 18:17:32,403 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 18:17:32,432 INFO: no action.  i am the leader with the lock

node2 log:

2018-07-11 18:17:22,421 INFO: Lock owner: pg96_101; I am pg96_102
2018-07-11 18:17:22,421 INFO: does not have lock
2018-07-11 18:17:22,435 INFO: no action.  i am a secondary and i am following a leader
2018-07-11 18:17:32,426 INFO: Lock owner: pg96_101; I am pg96_102
2018-07-11 18:17:32,426 INFO: does not have lock
2018-07-11 18:17:32,436 INFO: no action.  i am a secondary and i am following a leader

node3 log:

2018-07-11 18:17:22,409 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 18:17:22,410 INFO: does not have lock
2018-07-11 18:17:22,423 INFO: no action.  i am a secondary and i am following a leader
2018-07-11 18:17:32,415 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 18:17:32,415 INFO: does not have lock
2018-07-11 18:17:32,425 INFO: no action.  i am a secondary and i am following a leader

Check the cluster status

Check the Patroni cluster status

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_101 | 192.168.56.101 | Leader | running |       0.0 |
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 |        | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

$ patronictl -c /usr/patroni/conf/postgresql.yml show-config pg96
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    listen_addresses: '*'
    port: 5432
  use_pg_rewind: true
retry_timeout: 10
ttl: 30
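
A note on maximum_lag_on_failover: the value is in bytes, so 1048576 means 1 MiB of WAL. As I understand it, a replica that is further behind the leader than this is not considered healthy enough to be promoted. The following Python sketch is a simplified model of that check, not Patroni's actual code; the function and argument names are made up for illustration.

```python
MAXIMUM_LAG_ON_FAILOVER = 1048576  # bytes, taken from the config shown above


def is_lagging(leader_position, replica_position,
               max_lag=MAXIMUM_LAG_ON_FAILOVER):
    """Return True if the replica is more than `max_lag` bytes behind.

    Both positions are WAL positions expressed as plain byte counts,
    like the xlog_location values Patroni stores in etcd.
    """
    return (leader_position - replica_position) > max_lag


if __name__ == "__main__":
    # A replica at the same position as the leader is not lagging.
    print(is_lagging(50378640, 50378640))
    # A replica whose position is unknown (treated as 0 here) is lagging.
    print(is_lagging(50378640, 0))
```

The second call shows why a member whose position cannot be determined ends up being treated as too far behind.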

Check the information stored in etcd

$ etcdctl ls /pg96/pg96/
/pg96/pg96/members
/pg96/pg96/initialize
/pg96/pg96/leader
/pg96/pg96/config
/pg96/pg96/optime

$ etcdctl get /pg96/pg96/members/pg96_101
{"conn_url":"postgres://192.168.56.101:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"master","xlog_location":50378640}
$ etcdctl get /pg96/pg96/members/pg96_102
{"conn_url":"postgres://192.168.56.102:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}
$ etcdctl get /pg96/pg96/members/pg96_103
{"conn_url":"postgres://192.168.56.103:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","timeline":1,"state":"running","role":"replica","xlog_location":50378640}

$ etcdctl get /pg96/pg96/initialize
6576484813966394513

$ etcdctl get /pg96/pg96/leader
pg96_101

$ etcdctl get /pg96/pg96/config
{"ttl":30,"maximum_lag_on_failover":1048576,"retry_timeout":10,"postgresql":{"use_pg_rewind":true,"parameters":{"listen_addresses":"*","port":5432}},"loop_wait":10}

$ etcdctl get /pg96/pg96/optime/leader
50378640
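
The xlog_location and optime values in etcd are WAL positions written as plain integers, while PostgreSQL prints the same thing as an LSN such as 0/300C058. A short Python sketch for converting between the two forms (the helper names int_to_lsn and lsn_to_int are my own):

```python
def int_to_lsn(position):
    """Format a WAL byte position as PostgreSQL's 'hi/lo' hexadecimal LSN."""
    return "%X/%X" % (position >> 32, position & 0xFFFFFFFF)


def lsn_to_int(lsn):
    """Parse an LSN such as '0/300C058' into a WAL byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


if __name__ == "__main__":
    print(int_to_lsn(50378640))     # the optime/leader value from etcd
    print(lsn_to_int("0/300C058"))  # an LSN as printed by PostgreSQL
```

Subtracting two converted values gives a byte difference, which is the same kind of number pg_xlog_location_diff returns.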

Verify failover

Stop the master on node1

# systemctl stop postgresql-9.6.service

Patroni on node1 logs the following right away

2018-07-11 21:43:52,402 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:43:52,441 INFO: no action.  i am the leader with the lock
2018-07-11 21:44:02,405 WARNING: Postgresql is not running.
2018-07-11 21:44:02,406 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:02,444 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:02,455 INFO: starting as readonly because i had the session lock
2018-07-11 21:44:02,456 INFO: closed patroni connection to the postgresql cluster
2018-07-11 21:44:02,491 INFO: postmaster pid=11705
192.168.56.101:5432 - no response
< 2018-07-11 21:44:02.525 CST > LOG:  redirecting log output to logging collector process
< 2018-07-11 21:44:02.525 CST > HINT:  Future log output will appear in directory "pg_log".
192.168.56.101:5432 - accepting connections
192.168.56.101:5432 - accepting connections
2018-07-11 21:44:03,555 INFO: Lock owner: pg96_101; I am pg96_101
2018-07-11 21:44:03,555 INFO: establishing a new patroni connection to the postgres cluster
2018-07-11 21:44:03,597 INFO: promoted self to leader because i had the session lock
server promoting
2018-07-11 21:44:03,603 INFO: cleared rewind state after becoming the leader

The log shows that Patroni brought the master back up right away.

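Power loss on node1

A node losing power is an extreme case, and it gets simulated in every kind of HA architecture.
Patroni on one of the remaining nodes logs the following shortly afterwards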

2018-07-11 21:49:44,632 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:49:44,632 INFO: does not have lock
2018-07-11 21:49:44,642 INFO: no action.  i am a secondary and i am following a leader
2018-07-11 21:49:55,140 INFO: Selected new etcd server http://192.168.56.101:2379
2018-07-11 21:49:57,643 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2072ca2b50>, u'Connection to 192.168.56.101 timed out. (connect timeout=2.5)')': /v2/keys/pg96/pg96/?recursive=true
2018-07-11 21:50:00,148 ERROR: Request to server http://192.168.56.101:2379 failed: MaxRetryError(u"HTTPConnectionPool(host=u'192.168.56.101', port=2379): Max retries exceeded with url: /v2/keys/pg96/pg96/?recursive=true (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2072ca2c10>, u'Connection to 192.168.56.101 timed out. (connect timeout=2.5)'))",)
2018-07-11 21:50:00,149 INFO: Reconnection allowed, looking for another server.
2018-07-11 21:50:00,149 INFO: Selected new etcd server http://192.168.56.102:2379
2018-07-11 21:50:00,172 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:50:00,172 INFO: does not have lock
2018-07-11 21:50:00,191 INFO: no action.  i am a secondary and i am following a leader
2018-07-11 21:50:05,137 INFO: Selected new etcd server http://192.168.56.103:2379
2018-07-11 21:50:05,141 INFO: Lock owner: pg96_101; I am pg96_103
2018-07-11 21:50:05,141 INFO: does not have lock
2018-07-11 21:50:05,146 INFO: no action.  i am a secondary and i am following a leader
2018-07-11 21:50:15,060 INFO: Got response from pg96_102 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 2, "xlog": {"received_location": 50379696, "replayed_timestamp": "2018-07-11 18:03:34.386 CST", "paused": false, "replayed_location": 50379696}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 21:50:15,066 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 2, "xlog": {"received_location": 50379696, "replayed_timestamp": "2018-07-11 18:03:34.386 CST", "paused": false, "replayed_location": 50379696}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 21:50:15,113 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2018-07-11 21:50:15,119 INFO: promoted self to leader by acquiring session lock
server promoting
2018-07-11 21:50:15,190 INFO: cleared rewind state after becoming the leader
2018-07-11 21:50:16,257 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 21:50:16,318 INFO: no action.  i am the leader with the lock
2018-07-11 21:50:26,254 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 21:50:26,279 INFO: no action.  i am the leader with the lock

The log contains "server promoting", which means the slave on this node was promoted to be the new master.

Check the Patroni cluster status again

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

Just as expected. Now check replication on node3.

select client_addr,
       pg_xlog_location_diff(sent_location, write_location) as write_delay,
       pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
       pg_xlog_location_diff(sent_location, replay_location) as replay_delay 
 from pg_stat_replication;

  client_addr   | write_delay | flush_delay | replay_delay 
----------------+-------------+-------------+--------------
 192.168.56.102 |           0 |           0 |            0
(1 row)

Good: node2 is now streaming from node3.

After starting node1 again, check what is running

# ps -ef|grep -i etcd
etcd       996     1  2 21:57 ?        00:00:00 /usr/bin/etcd --name=node1 --data-dir=/var/lib/etcd/node1.etcd --listen-peer-urls=http://192.168.56.101:2380,http://127.0.0.1:2380 --listen-client-urls=http://192.168.56.101:2379,http://127.0.0.1:2379 --initial-advertise-peer-urls=http://192.168.56.101:2380 --advertise-client-urls=http://192.168.56.101:2379 --initial-cluster=node1=http://192.168.56.101:2380,node2=http://192.168.56.102:2380,node3=http://192.168.56.103:2380 --initial-cluster-token=etcd-cluster --initial-cluster-state=new
root      1486  1332  0 21:57 pts/0    00:00:00 grep --color=auto -i etcd

Patroni did not come up; it needs to be set up as a service so that it starts at boot. For now, start PostgreSQL and Patroni manually.

$ mv recovery.done recovery.conf
$ cat recovery.conf 
primary_slot_name = 'pg96_101'
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'user=replicator password=1qaz2wsx host=192.168.56.103 port=5432 sslmode=prefer sslcompression=1 application_name=pg96_101'
# systemctl start postgresql-9.6.service 
$ patroni /usr/patroni/conf/postgresql.yml

Checking the Patroni cluster status shows that, surprisingly, PostgreSQL on node1 has not joined.

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

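The log reports: replication slot "pg96_101" does not exist.
Strange: I created the pg96_101 slot earlier, yet the log on node3 says it does not exist. (That slot was created on node1, the original master. Physical replication slots are not copied to standbys, so it does not automatically exist on node3.)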

postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]-------+----------
slot_name           | pg96_102
plugin              | 
slot_type           | physical
datoid              | 
database            | 
active              | t
active_pid          | 19347
xmin                | 
catalog_xmin        | 
restart_lsn         | 0/300C058
confirmed_flush_lsn |

It really is missing. So let's try creating it again.

select * from pg_create_physical_replication_slot('pg96_101');
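
Calling pg_create_physical_replication_slot for a slot that already exists raises an error, so when re-creating slots by hand after a failover it can help to guard the call. A small Python sketch that builds such a guarded statement (ensure_slot_sql is a hypothetical helper, and running the statement on the new master is left to psql or any client):

```python
import re


def ensure_slot_sql(slot_name):
    """Build SQL that creates a physical slot only if it is missing.

    Slot names may contain only lower-case letters, digits and underscores,
    which also makes it safe to place the name inside the SQL text.
    """
    if not re.fullmatch(r"[a-z0-9_]+", slot_name):
        raise ValueError("invalid slot name: %r" % (slot_name,))
    return (
        "SELECT pg_create_physical_replication_slot('{0}') "
        "WHERE NOT EXISTS "
        "(SELECT 1 FROM pg_replication_slots WHERE slot_name = '{0}');"
    ).format(slot_name)


if __name__ == "__main__":
    for name in ("pg96_101", "pg96_102"):
        print(ensure_slot_sql(name))
```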

The log on node3 shows:

$ tail -n 1000 postgresql-2018-07-11.csv
2018-07-11 22:14:27.247 CST,"replicator","",19982,"192.168.56.101:53204",5b4610c3.4e0e,3,"idle",2018-07-11 22:14:27 CST,5/0,0,ERROR,42704,"replication slot ""pg96_101"" does not exist",,,,,,,,,"pg96_101"
2018-07-11 22:14:27.248 CST,"replicator","",19982,"192.168.56.101:53204",5b4610c3.4e0e,4,"idle",2018-07-11 22:14:27 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.005 user=replicator database= host=192.168.56.101 port=53204",,,,,,,,,"pg96_101"
2018-07-11 22:14:28.367 CST,"postgres","postgres",19929,"[local]",5b461068.4dd9,6,"SELECT",2018-07-11 22:12:56 CST,4/0,0,LOG,00000,"duration: 29.076 ms",,,,,,,,,"psql"
2018-07-11 22:14:30.777 CST,"postgres","postgres",19929,"[local]",5b461068.4dd9,7,"SELECT",2018-07-11 22:12:56 CST,4/0,0,LOG,00000,"duration: 1.060 ms",,,,,,,,,"psql"
2018-07-11 22:14:32.249 CST,,,19984,"192.168.56.101:53206",5b4610c8.4e10,1,"",2018-07-11 22:14:32 CST,,0,LOG,00000,"connection received: host=192.168.56.101 port=53206",,,,,,,,,""
2018-07-11 22:14:32.252 CST,"replicator","",19984,"192.168.56.101:53206",5b4610c8.4e10,2,"authentication",2018-07-11 22:14:32 CST,5/121,0,LOG,00000,"replication connection authorized: user=replicator",,,,,,,,,""

2018-07-11 22:14:32.315 CST,"replicator","",19984,"192.168.56.101:53206",5b4610c8.4e10,3,"streaming 0/300C138",2018-07-11 22:14:32 CST,5/0,0,LOG,00000,"standby ""pg96_101"" is now a synchronous standby with priority 1",,,,,,,,,"pg96_101"
2018-07-11 22:14:36.255 CST,"postgres","postgres",16066,"192.168.56.103:54424",5b45d88a.3ec2,1453,"SELECT",2018-07-11 18:14:34 CST,2/0,0,LOG,00000,"duration: 0.448 ms",,,,,,,,,"Patroni"
2018-07-11 22:14:46.258 CST,"postgres","postgres",16066,"192.168.56.103:54424",5b45d88a.3ec2,1454,"SELECT",2018-07-11 18:14:34 CST,2/0,0,LOG,00000,"duration: 0.568 ms",,,,,,,,,"Patroni"

After a short wait, node1 has joined as a slave

postgres=# select client_addr,
       pg_xlog_location_diff(sent_location, write_location) as write_delay,
       pg_xlog_location_diff(sent_location, flush_location) as flush_delay,
       pg_xlog_location_diff(sent_location, replay_location) as replay_delay 
 from pg_stat_replication;
  client_addr   | write_delay | flush_delay | replay_delay 
----------------+-------------+-------------+--------------
 192.168.56.102 |           0 |           0 |            0
 192.168.56.101 |           0 |           0 |            0
(2 rows)

However, patronictl still does not show node1.

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

After waiting a while longer, it shows up

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_101 | 192.168.56.101 |        | running |       0.0 |
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

Manual switchover

State before the switchover

$ patronictl -c /usr/patroni/conf/postgresql.yml list pg96
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_101 | 192.168.56.101 |        | running |       0.0 |
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+

Run the manual switchover

$ patronictl -c /usr/patroni/conf/postgresql.yml switchover
Master [pg96_103]: pg96_103
Candidate ['pg96_101', 'pg96_102'] []: pg96_101
When should the switchover take place (e.g. 2015-10-01T14:30)  [now]: now
Current cluster topology
+---------+----------+----------------+--------+---------+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | Lag in MB |
+---------+----------+----------------+--------+---------+-----------+
|   pg96  | pg96_101 | 192.168.56.101 |        | running |       0.0 |
|   pg96  | pg96_102 | 192.168.56.102 |        | running |       0.0 |
|   pg96  | pg96_103 | 192.168.56.103 | Leader | running |       0.0 |
+---------+----------+----------------+--------+---------+-----------+
Are you sure you want to switchover cluster pg96, demoting current master pg96_103? [y/N]: y
Switchover failed, details: 503, Switchover failed

node1 log:

2018-07-11 23:26:34,635 INFO: received switchover request with leader=pg96_103 candidate=pg96_101 scheduled_at=None
2018-07-11 23:26:34,645 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 22:03:55.130 CST", "timeline": 3, "xlog": {"received_location": 50385856, "replayed_timestamp": "2018-07-11 22:34:29.725 CST", "paused": false, "replayed_location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "state": "running", "role": "replica", "server_version": 90609}
2018-07-11 23:26:39,126 INFO: Lock owner: pg96_103; I am pg96_101
2018-07-11 23:26:39,126 INFO: does not have lock
2018-07-11 23:26:39,142 INFO: no action.  i am a secondary and i am following a leader

node3 log:

2018-07-11 23:27:06,254 INFO: Lock owner: pg96_103; I am pg96_103
2018-07-11 23:27:06,274 INFO: Got response from pg96_101 http://127.0.0.1:8008/patroni: {"database_system_identifier": "6576484813966394513", "postmaster_start_time": "2018-07-11 17:38:41.768 CST", "timeline": 3, "xlog": {"location": 50385856}, "patroni": {"scope": "pg96", "version": "1.4.4"}, "replication": [{"sync_state": "potential", "sync_priority": 2, "client_addr": "192.168.56.102", "state": "streaming", "application_name": "pg96_102", "usename": "replicator"}, {"sync_state": "sync", "sync_priority": 1, "client_addr": "192.168.56.101", "state": "streaming", "application_name": "pg96_101", "usename": "replicator"}], "state": "running", "role": "master", "server_version": 90609}
2018-07-11 23:27:06,364 INFO: Member pg96_101 exceeds maximum replication lag
2018-07-11 23:27:06,365 WARNING: manual failover: no healthy members found, failover is not possible
2018-07-11 23:27:06,365 INFO: Cleaning up failover key
2018-07-11 23:27:06,389 INFO: no action.  i am the leader with the lock

The node3 log contains "WARNING: manual failover: no healthy members found, failover is not possible".
I am noting this down for now and will add more once I have worked out the cause.
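
One possible explanation, which I have not verified: in the node3 log, the response labelled pg96_101 comes from http://127.0.0.1:8008/patroni and reports role "master", which looks like node3 answering for itself. Every member registered the same loopback api_url in etcd, because restapi.connect_address is 127.0.0.1:8008 in the configuration. The Python 3 sketch below flags such members, given the JSON values stored under /pg96/pg96/members (loopback_members is a hypothetical helper; the sample values are shortened copies of the etcd output shown earlier).

```python
import ipaddress
import json
from urllib.parse import urlparse


def loopback_members(members):
    """Return the names of members whose api_url points at a loopback host.

    `members` maps a member name to the raw JSON string stored in etcd.
    """
    flagged = []
    for name, raw in sorted(members.items()):
        host = urlparse(json.loads(raw)["api_url"]).hostname or ""
        if host == "localhost":
            flagged.append(name)
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(name)
        except ValueError:
            pass  # a DNS name other than localhost; not checked here
    return flagged


if __name__ == "__main__":
    sample = {
        "pg96_101": '{"api_url": "http://127.0.0.1:8008/patroni"}',
        "pg96_102": '{"api_url": "http://127.0.0.1:8008/patroni"}',
    }
    print(loopback_members(sample))
```

If this is indeed the cause, setting restapi.connect_address to each node's own 192.168.56.x address would be the thing to try next.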

Connection

using jdbc:

jdbc:postgresql://node1,node2,node3/postgres?targetServerType=master

libpq starting from postgresql 10:

postgresql://node1:port,node2:port,node3:port/?target_session_attrs=read-write
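
Both forms list every node and let the driver pick the one that accepts writes. A small Python sketch that builds the two strings from a host list (the helper names, the default port 5432 and the database name postgres are assumptions for illustration):

```python
def _host_list(hosts, port):
    """Join hosts as host:port pairs, comma-separated."""
    return ",".join("%s:%d" % (h, port) for h in hosts)


def libpq_uri(hosts, port=5432, dbname="postgres"):
    """Multi-host libpq URI (PostgreSQL 10 or later) that targets the writable node."""
    return "postgresql://%s/%s?target_session_attrs=read-write" % (
        _host_list(hosts, port), dbname)


def jdbc_url(hosts, port=5432, dbname="postgres"):
    """Multi-host JDBC URL that targets the master."""
    return "jdbc:postgresql://%s/%s?targetServerType=master" % (
        _host_list(hosts, port), dbname)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(libpq_uri(nodes))
    print(jdbc_url(nodes))
```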

Summary:
In my opinion etcd + Patroni is a pretty good solution, and I will keep digging into Patroni.

References:
https://github.com/zalando/patroni
https://patroni.readthedocs.io/en/latest/
https://pypi.org/project/patroni/

https://github.com/zalando/patroni/blob/master/docs/replication_modes.rst
https://postgresconf.org/system/events/document/000/000/228/Patroni_tutorial_4x3-2.pdf