Redis Lesson 22: Cluster Failover

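I. Failure Detection

Nodes detect failures by exchanging ping/pong messages; no Sentinel is required. Besides slot information (see the earlier lessons), ping/pong messages also carry master/replica status and node failure reports.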

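1. Subjective offline

Definition: a single node considers another node unavailable. This is only that one node's "biased" view.

(Figure: subjective-offline flow)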
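A minimal sketch of the subjective-offline check, assuming each node records when it last received a pong from a peer. The names `PeerState` and `mark_pfail` and the 15000 ms value are illustrative assumptions, not Redis source code; in Redis the threshold is the `cluster-node-timeout` setting and the resulting flag is called PFAIL.

```python
from dataclasses import dataclass

NODE_TIMEOUT_MS = 15_000  # assumed value of cluster-node-timeout


@dataclass
class PeerState:
    """What one node remembers about a peer (hypothetical structure)."""
    node_id: str
    last_pong_ms: int    # time the last pong arrived from this peer
    pfail: bool = False  # subjective-offline flag, held only by the observer


def mark_pfail(peer: PeerState, now_ms: int, timeout_ms: int = NODE_TIMEOUT_MS) -> bool:
    """Flag the peer as subjectively offline if no pong arrived within the timeout."""
    peer.pfail = (now_ms - peer.last_pong_ms) > timeout_ms
    return peer.pfail


peer = PeerState(node_id="a89a427b", last_pong_ms=0)
print(mark_pfail(peer, now_ms=10_000))  # still within the timeout
print(mark_pfail(peer, now_ms=16_000))  # timeout exceeded
```

The flag lives only in the observing node's local view, which is why a second, cluster-wide step is needed before any failover starts.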

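2. Objective offline

Definition: more than half of the slot-holding masters have marked a node as subjectively offline.

(Figure: objective-offline logic flow)

(Figure: attempting the objective-offline transition)

Once a node is marked objectively offline, a FAIL message is broadcast. It:

notifies every node in the cluster to mark the failed node as objectively offline;
notifies the failed node's replicas to start the failover process.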
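The majority rule in the definition above can be sketched as follows. This is a simplified model under stated assumptions: `is_objectively_offline` and its arguments are hypothetical names, and details such as the expiry of old failure reports are left out.

```python
from __future__ import annotations


def is_objectively_offline(reporters: set[str], slot_masters: set[str]) -> bool:
    """True when more than half of the slot-holding masters report the node as offline.

    reporters    -- nodes that currently flag the target as subjectively offline
    slot_masters -- all masters that own at least one slot (the voting population)
    """
    # Reports from replicas or from masters without slots do not count.
    valid_reports = reporters & slot_masters
    return len(valid_reports) > len(slot_masters) // 2


masters = {"7000", "7001", "7002"}
print(is_objectively_offline({"7000"}, masters))          # one report out of three masters
print(is_objectively_offline({"7000", "7001"}, masters))  # two reports out of three masters
print(is_objectively_offline({"7000", "7003"}, masters))  # 7003 is a replica, so its report is ignored
```

With three masters, two reports are needed, so the two surviving masters in the drill later in this lesson are enough to reach the majority.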

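II. Failure Recovery

1. Eligibility check

Each replica checks how long it has been disconnected from the failed master.
If that time exceeds (cluster-node-timeout * cluster-slave-validity-factor), the replica is disqualified from the failover.
The default value of cluster-slave-validity-factor is 10.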
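The eligibility rule is a single comparison, sketched below. The function name `is_eligible` and the 15000 ms timeout are assumptions for illustration; the default factor of 10 comes from the text above. Treating a factor of 0 as "always eligible" follows the behaviour documented in redis.conf.

```python
def is_eligible(disconnect_ms: int,
                node_timeout_ms: int = 15_000,  # assumed cluster-node-timeout
                validity_factor: int = 10) -> bool:
    """Return True if the replica may still take part in the failover."""
    if validity_factor == 0:
        # A factor of 0 disables the check: the replica always considers itself valid.
        return True
    return disconnect_ms <= node_timeout_ms * validity_factor


print(is_eligible(60_000))                      # disconnected for 1 minute
print(is_eligible(200_000))                     # beyond 15 s * 10 = 150 s
print(is_eligible(200_000, validity_factor=0))  # check disabled
```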

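2. Preparing the election time

(Figure: preparing the election time)

The replica with the largest replication offset gets the shortest delay before it may start an election, which gives better data consistency. A replica with a larger offset holds more recent data and is the better candidate for the new master, so it is given a shorter wait, reaches its election time first, and is therefore more likely to collect the most votes.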
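A sketch of how the delay can depend on the offset, following the formula given in the Redis Cluster specification: a fixed 500 ms, plus a random 0 to 500 ms, plus 1000 ms per rank, where rank counts how many sibling replicas have a larger offset. The function names and the second replica's offset are illustrative assumptions.

```python
import random
from typing import Iterable


def replica_rank(my_offset: int, sibling_offsets: Iterable[int]) -> int:
    """Rank 0 means no sibling replica has a larger replication offset."""
    return sum(1 for offset in sibling_offsets if offset > my_offset)


def election_delay_ms(rank: int, rng: random.Random) -> int:
    """Fixed 500 ms + random 0..500 ms + 1000 ms for every rank position."""
    return 500 + rng.randint(0, 500) + rank * 1000


rng = random.Random()
best = replica_rank(249926, [249000])    # most up-to-date replica
behind = replica_rank(249000, [249926])  # hypothetical lagging sibling
print(best, election_delay_ms(best, rng))
print(behind, election_delay_ms(behind, rng))
```

The 925 ms delay reported for rank #0 in the log later in this lesson falls inside the 500 to 1000 ms window this formula gives for rank 0.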

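3. Election voting

(Figure: election voting)

4. Replacing the master

1) The current replica stops replicating and becomes a master (slaveof no one).
2) It runs clusterDelSlot to revoke the slots the failed master was responsible for, then runs clusterAddSlot to assign those slots to itself.
3) It broadcasts its own pong message to the cluster, indicating that it has replaced the failed master.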
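The three replacement steps can be modelled on a plain slot table. This is a toy simulation, not Redis code: `promote_replica`, `slot_owner`, and `roles` are hypothetical names, and the pong broadcast of step 3 is represented only by the returned list of slots the new master would announce.

```python
from __future__ import annotations


def promote_replica(slot_owner: dict[int, str],
                    roles: dict[str, str],
                    failed_master: str,
                    replica: str) -> list[int]:
    """Apply the replacement steps and return the slots taken over."""
    # Step 1: stop replicating and become a master.
    roles[replica] = "master"
    # Step 2: remove the slots from the failed master and assign them to the replica.
    moved = [slot for slot, owner in slot_owner.items() if owner == failed_master]
    for slot in moved:
        slot_owner[slot] = replica
    # Step 3 (not simulated): broadcast a pong carrying the new slot ownership.
    return moved


slot_owner = {slot: "7002" for slot in range(10923, 16384)}
roles = {"7002": "master", "7005": "replica"}
moved = promote_replica(slot_owner, roles, failed_master="7002", replica="7005")
print(roles["7005"], len(moved))  # master 5461
```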

III. Failover Hands-on Drill

(Figure: illustration of the failover drill)

1) Kill a master node

# Query the cluster node information
$ redis-cli --cluster info localhost:7000
localhost:7000 (1ac9fbbf...) -> 2 keys | 5461 slots | 1 slaves.
127.0.0.1:7001 (a3c0d3b4...) -> 2 keys | 5462 slots | 1 slaves.
127.0.0.1:7002 (a89a427b...) -> 1 keys | 5461 slots | 1 slaves. # the master that will be killed

# Look up the process ID of that node
$ redis-cli -p 7002 info Server | grep process_id
process_id:4386
# After the kill, a client program polling the cluster in a loop reports errors, then recovers on its own after a while
$ kill 4386

$ redis-cli --cluster info localhost:7000
Could not connect to Redis at 127.0.0.1:7002: Connection refused
localhost:7000 (1ac9fbbf...) -> 2 keys | 5461 slots | 1 slaves.
127.0.0.1:7005 (09792d31...) -> 1 keys | 5461 slots | 0 slaves. # the new master
127.0.0.1:7001 (a3c0d3b4...) -> 2 keys | 5462 slots | 1 slaves.

$ redis-cli -p 7000 cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7005
      3) "09792d31e728ad714a5a90bc7639f277d817fb4e"
2) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
      3) "a3c0d3b42da023dc402faf439d4f93a1cb44d402"
   4) 1) "127.0.0.1"
      2) (integer) 7004
      3) "5a4f085dee8400093f45ce2cfa42cbd206167f73"
3) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
      3) "1ac9fbbfe11362e151204132e3d110b18139a1d9"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "2d19dda2a8a790d5636a664fe3ed54aa3dd7677c"

2) Log of the newly promoted master: `redis-cluster-7005.log` (formerly the replica of the killed master)

4394:S 31 May 2019 12:09:53.401 # Connection with master lost.
4394:S 31 May 2019 12:09:53.404 * Caching the disconnected master state.
4394:S 31 May 2019 12:09:53.971 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:53.972 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:53.973 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:09:54.987 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:54.988 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:54.989 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:09:56.000 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:56.001 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:56.002 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:09:57.010 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:57.011 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:57.012 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:09:58.025 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:58.026 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:58.027 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:09:59.038 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:09:59.039 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:09:59.040 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:00.051 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:00.051 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:00.053 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:01.063 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:01.064 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:01.065 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:02.076 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:02.077 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:02.078 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:03.089 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:03.090 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:03.091 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:04.099 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:04.100 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:04.101 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:05.111 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:05.111 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:05.112 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:06.121 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:06.121 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:06.122 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:07.135 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:07.136 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:07.137 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:08.149 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:08.149 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:08.150 # Error condition on socket for SYNC: Connection refused
4394:S 31 May 2019 12:10:09.157 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:09.158 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:09.159 # Error condition on socket for SYNC: Connection refused
# FAIL message received from 7001: node 7002 has been marked objectively offline
4394:S 31 May 2019 12:10:09.532 * FAIL message received from a3c0d3b42da023dc402faf439d4f93a1cb44d402 about a89a427b5fe8b2b0ef07ac8c6252dc3c8efa1f77
4394:S 31 May 2019 12:10:09.565 # Start of election delayed for 925 milliseconds (rank #0, offset 249926).
4394:S 31 May 2019 12:10:10.173 * Connecting to MASTER 127.0.0.1:7002
4394:S 31 May 2019 12:10:10.173 * MASTER <-> REPLICA sync started
4394:S 31 May 2019 12:10:10.174 # Error condition on socket for SYNC: Connection refused
# A new election starts
4394:S 31 May 2019 12:10:10.578 # Starting a failover election for epoch 13.
# Election won: this node is the new master
4394:S 31 May 2019 12:10:10.591 # Failover election won: I'm the new master.
4394:S 31 May 2019 12:10:10.591 # configEpoch set to 13 after successful failover
4394:M 31 May 2019 12:10:10.592 # Setting secondary replication ID to 27803313625ab7581c806b2a8343d1aff567354b, valid up to offset: 249927. New replication ID is 7083e19600c686aece101102f81bede77a55e6dc
4394:M 31 May 2019 12:10:10.593 * Discarding previously cached master state.

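Failover time = subjective-offline time + objective-offline time + election time

In this drill that adds up to a little under 20 seconds. If you cannot tolerate that, lower cluster-node-timeout. Be aware, though, that this parameter also governs how often messages are exchanged between nodes, so a smaller value can increase bandwidth consumption. In practice its value is chosen by weighing these factors against your actual workload.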
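To make the failover time concrete, the sketch below measures the phases from the timestamps in redis-cluster-7005.log. The replica's log does not separate subjective from objective offline detection, so the two are reported together as one detection phase; the event names are labels chosen here for illustration.

```python
from datetime import datetime

# Timestamp layout used in the Redis log (assumes an English locale for %b).
FMT = "%d %b %Y %H:%M:%S.%f"

# Timestamps copied from redis-cluster-7005.log in this drill.
EVENTS = {
    "connection_lost": "31 May 2019 12:09:53.401",
    "fail_received": "31 May 2019 12:10:09.532",
    "election_started": "31 May 2019 12:10:10.578",
    "election_won": "31 May 2019 12:10:10.591",
}


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two named log events."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return (t1 - t0).total_seconds()


detection = seconds_between("connection_lost", "fail_received")
election_wait = seconds_between("fail_received", "election_started")
voting = seconds_between("election_started", "election_won")
total = seconds_between("connection_lost", "election_won")

print(f"detection (subjective + objective offline): {detection:.3f} s")
print(f"wait before election: {election_wait:.3f} s")
print(f"voting: {voting:.3f} s")
print(f"total: {total:.3f} s")
```

Detection takes about 16.1 s of the roughly 17.2 s total, which is why the node timeout is the setting that matters most for failover speed.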

3) Restart the killed master

$ redis-server ../etc/cluster/redis-7002.conf

# The killed node 7002 has become a replica of 7005
$ redis-cli -p 7000 cluster slots
1) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7005
      3) "09792d31e728ad714a5a90bc7639f277d817fb4e"
   4) 1) "127.0.0.1"
      2) (integer) 7002
      3) "a89a427b5fe8b2b0ef07ac8c6252dc3c8efa1f77"
2) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
      3) "a3c0d3b42da023dc402faf439d4f93a1cb44d402"
   4) 1) "127.0.0.1"
      2) (integer) 7004
      3) "5a4f085dee8400093f45ce2cfa42cbd206167f73"
3) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
      3) "1ac9fbbfe11362e151204132e3d110b18139a1d9"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "2d19dda2a8a790d5636a664fe3ed54aa3dd7677c"

redis-cluster-7002.log

$ tail -30 redis-cluster-7002.log
28746:C 31 May 2019 20:53:03.405 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
28746:C 31 May 2019 20:53:03.407 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=28746, just started
28746:C 31 May 2019 20:53:03.407 # Configuration loaded
28747:M 31 May 2019 20:53:03.410 * Increased maximum number of open files to 10032 (it was originally set to 256).
28747:M 31 May 2019 20:53:03.412 * Node configuration loaded, I'm a89a427b5fe8b2b0ef07ac8c6252dc3c8efa1f77
28747:M 31 May 2019 20:53:03.413 * Running mode=cluster, port=7002.
28747:M 31 May 2019 20:53:03.414 # Server initialized
28747:M 31 May 2019 20:53:03.415 * DB loaded from disk: 0.001 seconds
28747:M 31 May 2019 20:53:03.416 * Ready to accept connections

# Reconfigure itself as a replica of the node with the given ID (7005)
28747:M 31 May 2019 20:53:03.419 # Configuration change detected. Reconfiguring myself as a replica of 09792d31e728ad714a5a90bc7639f277d817fb4e
28747:S 31 May 2019 20:53:03.419 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
28747:S 31 May 2019 20:53:03.420 # Cluster state changed: ok
# Connect to master 7005
28747:S 31 May 2019 20:53:04.430 * Connecting to MASTER 127.0.0.1:7005
# Start master-replica data synchronization
28747:S 31 May 2019 20:53:04.431 * MASTER <-> REPLICA sync started
28747:S 31 May 2019 20:53:04.431 * Non blocking connect for SYNC fired the event.
28747:S 31 May 2019 20:53:04.432 * Master replied to PING, replication can continue...
28747:S 31 May 2019 20:53:04.433 * Trying a partial resynchronization (request 8931dcb4de60e18b8f9835b25f828cebf564c1cf:1).
28747:S 31 May 2019 20:53:04.441 * Full resync from master: 7318c71d3e107b0896c561f9f1c5294d43619178:249926
28747:S 31 May 2019 20:53:04.441 * Discarding previously cached master state.
28747:S 31 May 2019 20:53:04.513 * MASTER <-> REPLICA sync: receiving 192 bytes from master
28747:S 31 May 2019 20:53:04.515 * MASTER <-> REPLICA sync: Flushing old data
28747:S 31 May 2019 20:53:04.516 * MASTER <-> REPLICA sync: Loading DB in memory
28747:S 31 May 2019 20:53:04.516 * MASTER <-> REPLICA sync: Finished with success

redis-cluster-7005.log

4394:M 31 May 2019 13:10:11.573 * Replication backlog freed after 3600 seconds without connected replicas.
4394:M 31 May 2019 20:53:03.500 * Clear FAIL state for node a89a427b5fe8b2b0ef07ac8c6252dc3c8efa1f77: master without slots is reachable again.
4394:M 31 May 2019 20:53:04.434 * Replica 127.0.0.1:7002 asks for synchronization
4394:M 31 May 2019 20:53:04.434 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '8931dcb4de60e18b8f9835b25f828cebf564c1cf', my replication IDs are 'd1547b3a6d4eb61969a5cd19f55f907e2f18b10c' and '0000000000000000000000000000000000000000')
4394:M 31 May 2019 20:53:04.436 * Starting BGSAVE for SYNC with target: disk
4394:M 31 May 2019 20:53:04.440 * Background saving started by pid 28748
28748:C 31 May 2019 20:53:04.448 * DB saved on disk
4394:M 31 May 2019 20:53:04.511 * Background saving terminated with success
4394:M 31 May 2019 20:53:04.513 * Synchronization with replica 127.0.0.1:7002 succeeded
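The two logs above show the restarted node asking for a partial resynchronization and being given a full one instead. The sketch below is a simplified model of the master-side decision as I understand it, not the Redis implementation: the function and parameter names are hypothetical, and the replication IDs are copied from the log lines above. The earlier "Replication backlog freed" line indicates the master had no backlog left to serve from either.

```python
from typing import Optional


def can_partial_resync(requested_id: str,
                       requested_offset: int,
                       replid: str,
                       replid2: str,
                       second_replid_offset: int,
                       backlog_start: Optional[int],
                       backlog_len: int) -> bool:
    """Return True if the master could serve the request from its backlog."""
    # The requested ID must match the current ID, or the previous ID
    # up to the offset at which the master switched IDs.
    id_ok = requested_id == replid or (
        requested_id == replid2 and requested_offset <= second_replid_offset
    )
    if not id_ok:
        return False
    # Without a backlog there is nothing to replay.
    if backlog_start is None:
        return False
    # The requested offset must still be covered by the backlog.
    return backlog_start <= requested_offset <= backlog_start + backlog_len


accepted = can_partial_resync(
    requested_id="8931dcb4de60e18b8f9835b25f828cebf564c1cf",
    requested_offset=1,
    replid="d1547b3a6d4eb61969a5cd19f55f907e2f18b10c",
    replid2="0000000000000000000000000000000000000000",
    second_replid_offset=-1,  # assumed: no valid secondary ID
    backlog_start=None,       # backlog was freed after 3600 s without replicas
    backlog_len=0,
)
print(accepted)  # False, so the master falls back to a full resync
```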

Reposted from: https://www.jianshu.com/p/8e4bdeb4ee83