Redis Cluster (Scaling)

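Redis Cluster supports scaling out and scaling in without interrupting service to clients: nodes can be added to expand the cluster, and nodes can be taken offline to shrink it. In both cases the underlying mechanism is the same: slots, and the data they hold, move from one node to another.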
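To make "slots moving between nodes" concrete: Redis Cluster assigns every key to one of 16384 slots by taking CRC16 of the key modulo 16384, hashing only the text inside {...} when a non-empty hash tag is present. The following is a minimal Python sketch of that mapping. The function names crc16 and key_slot are illustrative and do not come from any Redis client library.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster slot (0..16383) that a key maps to."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16(data) % 16384


if __name__ == "__main__":
    for name in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(name, key_slot(name))
```

Keys that share a hash tag land in the same slot, so they always migrate together. On a live cluster, the result for a given key can be cross-checked with the cluster keyslot command.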



Scaling Out the Cluster


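The earlier article, Redis Cluster (Setup), built a cluster of 6 nodes in which each of the 3 masters maintains its own slots and data. To have something to migrate in the tests that follow, first load some test data.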

$ for i in $(seq 1 70000); do redis-cli -p 6879 -c set key:migrate:test:${i} ${i}; done


$ redis-trib.rb info 127.0.0.1:6879

127.0.0.1:6879 (90cb860b...) -> 20304 keys | 5461 slots | 1 slaves.

127.0.0.1:6881 (fa2acce2...) -> 20201 keys | 5461 slots | 1 slaves.

127.0.0.1:6880 (3f121a67...) -> 20174 keys | 5462 slots | 1 slaves.

[OK] 60679 keys in 3 masters.

3.70 keys per slot on average.
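The totals printed by redis-trib.rb info can be recomputed from the per-master lines, which is a handy sanity check before and after a migration. The sketch below parses the three lines shown above; the regular expression is an assumption based only on this sample output, and summarize_info is a hypothetical helper name.

```python
import re

# Matches lines such as:
# 127.0.0.1:6879 (90cb860b...) -> 20304 keys | 5461 slots | 1 slaves.
INFO_LINE = re.compile(
    r"\((?P<id>\w+)\.\.\.\) -> (?P<keys>\d+) keys \| "
    r"(?P<slots>\d+) slots \| (?P<replicas>\d+) slaves"
)


def summarize_info(output: str) -> dict:
    """Sum keys and slots over all master lines in the info output."""
    masters = [m.groupdict() for m in INFO_LINE.finditer(output)]
    total_keys = sum(int(m["keys"]) for m in masters)
    total_slots = sum(int(m["slots"]) for m in masters)
    average = round(total_keys / total_slots, 2) if total_slots else 0.0
    return {
        "masters": len(masters),
        "keys": total_keys,
        "slots": total_slots,
        "keys_per_slot": average,
    }


SAMPLE = """\
127.0.0.1:6879 (90cb860b...) -> 20304 keys | 5461 slots | 1 slaves.
127.0.0.1:6881 (fa2acce2...) -> 20201 keys | 5461 slots | 1 slaves.
127.0.0.1:6880 (3f121a67...) -> 20174 keys | 5462 slots | 1 slaves.
"""

if __name__ == "__main__":
    print(summarize_info(SAMPLE))
```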


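To scale out by adding one node, part of the slots and their data must be migrated to the new node using the relevant commands. The process has 3 stages.


1. Prepare the new nodes

Prepare two new nodes on ports 6885 and 6886, both running in cluster mode. Node 6885 will become the new master, and 6886 will later become its replica.


2. Join the cluster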

127.0.0.1:6879> cluster meet 127.0.0.1 6885

127.0.0.1:6879> cluster meet 127.0.0.1 6886


127.0.0.1:6879> cluster nodes

e090ec47b8e66e415d69e9452c9c8e7deccd3624 127.0.0.1:6886 master - 0 1532846452277 0 connected

79c8cf3c11ede4962a2d690c2a2545b86c2f56ed 127.0.0.1:6885 master - 0 1532846453289 0 connected

...

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460


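A newly joined node starts out as a master. Because it owns no slots, it cannot serve reads or writes. There are usually two options for what to do with it next: migrate slots and data to it to expand capacity, or make it a replica of an existing master so that it can take part in failover.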


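In production, it is recommended to add new nodes with the redis-trib.rb add-node command, which checks whether the new node already contains data or already belongs to another cluster. A bare cluster meet performs no such check.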

$ redis-trib.rb add-node 127.0.0.1:6885 127.0.0.1:6879

>>> Adding node 127.0.0.1:6885 to cluster 127.0.0.1:6879

...

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

>>> Send CLUSTER MEET to node 127.0.0.1:6885 to make it join the cluster.

[OK] New node added correctly.


$ redis-trib.rb add-node 127.0.0.1:6886 127.0.0.1:6879

>>> Adding node 127.0.0.1:6886 to cluster 127.0.0.1:6879

...

M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885

   slots: (0 slots) master

   0 additional replica(s)

...

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

>>> Send CLUSTER MEET to node 127.0.0.1:6886 to make it join the cluster.

[OK] New node added correctly.


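3. Migrate slots and data

After the new nodes have joined, slots and their data must be migrated to the new master. This migration is the core of scaling out, and it is carried out in 3 steps.


(1) Slot migration plan

With 16384 slots in total, adding 6885 as a fourth master reduces the number of slots each original master is responsible for from about 5461 to 4096.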
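The arithmetic behind the plan can be sketched as follows. This is an illustrative calculation only, not the algorithm redis-trib.rb uses: it assumes the new master should end up with total // (masters + 1) slots and takes the surplus from the largest owners first. The name plan_expansion is hypothetical.

```python
from typing import Dict


def plan_expansion(slot_counts: Dict[str, int]) -> Dict[str, int]:
    """Return how many slots each existing master gives to one new master."""
    total = sum(slot_counts.values())
    target = total // (len(slot_counts) + 1)  # slots the new master should own
    remaining = target
    moves: Dict[str, int] = {}
    # Visit the largest owners first so they absorb any rounding remainder.
    for node, count in sorted(slot_counts.items(), key=lambda kv: -kv[1]):
        give = min(max(count - target, 0), remaining)
        moves[node] = give
        remaining -= give
    return moves


if __name__ == "__main__":
    current = {"6879": 5461, "6881": 5461, "6880": 5462}
    print(plan_expansion(current))
```

For the three masters in this article, each gives up roughly 1365 slots so that all four masters end at about 4096.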


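(2) Migrate the data

Data is migrated one slot at a time. For each slot, the flow is as follows:

1) Send cluster setslot {slot} importing {sourceNodeId} to the target node so that it prepares to import the slot's data.


2) Send cluster setslot {slot} migrating {targetNodeId} to the source node so that it prepares to migrate the slot's data out.


3) On the source node, run cluster getkeysinslot {slot} {count} in a loop to fetch up to count keys that belong to the slot.


4) On the source node, run migrate {targetIp} {targetPort} "" 0 {timeout} keys {keys...} to move the fetched keys to the target node in one pipelined batch.


5) Repeat steps 3) and 4) until every key in the slot has been migrated to the target node.


6) Send cluster setslot {slot} node {targetNodeId} to every master in the cluster to announce that the slot is now assigned to the target node.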
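The six steps above can be laid out as an ordered command list. The sketch below is a dry run: it connects to nothing and only builds the (node, command) pairs in the order they would be sent, so the ordering can be reviewed or logged. The function name, its parameters, and the idea of passing key batches in up front are assumptions made for illustration; in a real migration the keys come back from cluster getkeysinslot at run time.

```python
from typing import Iterable, List, Sequence, Tuple

Command = Tuple[str, str]  # (address of the node receiving it, command text)


def slot_migration_commands(
    slot: int,
    source: str,
    target: str,
    source_id: str,
    target_id: str,
    key_batches: Iterable[Sequence[str]],
    masters: Sequence[str],
    timeout_ms: int = 5000,
) -> List[Command]:
    """Build the command sequence for migrating one slot (dry run only)."""
    target_host, target_port = target.rsplit(":", 1)
    commands: List[Command] = [
        # Steps 1) and 2): open the slot on the target first, then the source.
        (target, f"cluster setslot {slot} importing {source_id}"),
        (source, f"cluster setslot {slot} migrating {target_id}"),
    ]
    # Steps 3) to 5): fetch a batch of keys, then migrate that batch.
    for batch in key_batches:
        keys = list(batch)
        if not keys:
            continue
        commands.append((source, f"cluster getkeysinslot {slot} {len(keys)}"))
        commands.append((
            source,
            f'migrate {target_host} {target_port} "" 0 {timeout_ms} keys '
            + " ".join(keys),
        ))
    # Step 6): announce the new owner to every master.
    for master in masters:
        commands.append((master, f"cluster setslot {slot} node {target_id}"))
    return commands


if __name__ == "__main__":
    plan = slot_migration_commands(
        slot=4096,
        source="127.0.0.1:6879",
        target="127.0.0.1:6885",
        source_id="90cb860b7f4ff516304c577bc1e514dc95ecd09b",
        target_id="99ea0df1d9683affb1271a5092fc8b15b378adba",
        key_batches=[["key:migrate:test:13752", "key:migrate:test:16020"]],
        masters=["127.0.0.1:6879", "127.0.0.1:6880",
                 "127.0.0.1:6881", "127.0.0.1:6885"],
    )
    for node, command in plan:
        print(f"{node}> {command}")
```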


Following this flow, the commands below manually migrate slot 4096 from source node 6879 to target node 6885.


1) The target node prepares to import slot 4096.

127.0.0.1:6885> cluster setslot 4096 importing 90cb860b7f4ff516304c577bc1e514dc95ecd09b


Confirm that slot 4096 is in the importing state.

127.0.0.1:6885> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 0 connected [4096-<-90cb860b7f4ff516304c577bc1e514dc95ecd09b]

...

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532849864072 1 connected 0-5460


2) The source node prepares to migrate slot 4096 out.

127.0.0.1:6879> cluster setslot 4096 migrating 99ea0df1d9683affb1271a5092fc8b15b378adba


Confirm that slot 4096 is in the migrating state.

127.0.0.1:6879> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532850180009 0 connected

...

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460 [4096->-99ea0df1d9683affb1271a5092fc8b15b378adba]

127.0.0.1:6879> 
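The bracketed markers in the cluster nodes output above ([4096-<-...] on the importing side, [4096->-...] on the migrating side) can also be checked programmatically. The sketch below parses one line in the format shown in this article; the field positions are taken from that sample output, and parse_node_line is a hypothetical helper name.

```python
def parse_node_line(line: str) -> dict:
    """Parse one `cluster nodes` line into slot count and open-slot markers."""
    fields = line.split()
    node = {
        "id": fields[0],
        "addr": fields[1],
        "flags": fields[2].split(","),
        "slots": 0,          # number of slots this node owns
        "migrating": {},     # slot -> node ID it is being migrated to
        "importing": {},     # slot -> node ID it is being imported from
    }
    # Fields 0..7 are fixed; everything after them describes slots.
    for token in fields[8:]:
        if token.startswith("["):
            body = token[1:-1]
            if "->-" in body:
                slot, peer = body.split("->-")
                node["migrating"][int(slot)] = peer
            elif "-<-" in body:
                slot, peer = body.split("-<-")
                node["importing"][int(slot)] = peer
        elif "-" in token:
            first, last = token.split("-")
            node["slots"] += int(last) - int(first) + 1
        else:
            node["slots"] += 1
    return node


SOURCE_LINE = (
    "90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master "
    "- 0 0 1 connected 0-5460 "
    "[4096->-99ea0df1d9683affb1271a5092fc8b15b378adba]"
)

if __name__ == "__main__":
    print(parse_node_line(SOURCE_LINE))
```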


3) Fetch keys in slot 4096 in batches; here, 4 keys from that slot are fetched.

127.0.0.1:6879> cluster getkeysinslot 4096 4

1) "key:migrate:test:13752"

2) "key:migrate:test:16020"

3) "key:migrate:test:20791"

4) "key:migrate:test:5512"


Confirm that these 4 keys exist on the source node and not on the target node.

127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512

1) "13752"

2) "16020"

3) "20791"

4) "5512"


$ redis-cli -p 6885 -c

127.0.0.1:6885> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512

-> Redirected to slot [4096] located at 127.0.0.1:6879

1) "13752"

2) "16020"

3) "20791"

4) "5512"


Migrate these 4 keys in one batch.

127.0.0.1:6879> migrate 127.0.0.1 6885 "" 0 5000 keys key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512


Query the 4 keys again: they are no longer on the source node, which now replies with an ASK redirection to 6885.

127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512

(error) ASK 4096 127.0.0.1:6885
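The ASK reply above is how a source node in the migrating state tells a client that the requested keys have already moved. A client is expected to send ASKING to the indicated node and then retry the command there, without updating its slot map; a MOVED reply, by contrast, means the slot has permanently changed owner. The sketch below only parses the error text and lists what to send next. The names Redirect, parse_redirect, and commands_after_redirect are illustrative and do not belong to any client library.

```python
from typing import List, NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str   # "ASK" or "MOVED"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'ASK <slot> <host>:<port>' or 'MOVED <slot> <host>:<port>'."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("ASK", "MOVED"):
        return None
    host, port = parts[2].rsplit(":", 1)
    return Redirect(parts[0], int(parts[1]), host, int(port))


def commands_after_redirect(redirect: Redirect, command: str) -> List[str]:
    """Commands to send to the redirect target, in order."""
    if redirect.kind == "ASK":
        # ASKING applies to the next command only.
        return ["ASKING", command]
    return [command]


if __name__ == "__main__":
    r = parse_redirect("ASK 4096 127.0.0.1:6885")
    print(r, commands_after_redirect(r, "get key:migrate:test:13752"))
```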


Notify all masters that slot 4096 is now assigned to target node 6885.

127.0.0.1:6879> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba

127.0.0.1:6880> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba

127.0.0.1:6881> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba

127.0.0.1:6885> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba


Confirm that source node 6879 is no longer responsible for slot 4096 and that target node 6885 now owns it.

127.0.0.1:6879> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532851434584 9 connected 4096

...

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-4095 4097-5460


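A real migration involves a large number of slots, each holding many keys, so redis-trib.rb provides the reshard command, which automates the slot migration process. The remaining slots are migrated with redis-trib.rb.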

$ redis-trib.rb reshard 127.0.0.1:6879

...

M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885

   slots:4096 (1 slots) master

   0 additional replica(s)

M: 558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886

   slots: (0 slots) master

   0 additional replica(s)

...

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

How many slots do you want to move (from 1 to 16384)? 4096

What is the receiving node ID? 99ea0df1d9683affb1271a5092fc8b15b378adba

Please enter all the source node IDs.

  Type 'all' to use all the nodes as source nodes for the hash slots.

  Type 'done' once you entered all the source nodes IDs.

Source node #1:90cb860b7f4ff516304c577bc1e514dc95ecd09b

Source node #2:3f121a67fab0d74f0d31b69326259e687902e1b3

Source node #3:fa2acce219d088e2b33756dac2e85ca92936a8dd

Source node #4:done


Ready to move 4096 slots.

  Source nodes:

    M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879

   slots:0-4095,4097-5460 (5460 slots) master

   1 additional replica(s)

    M: 3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880

   slots:5461-10922 (5462 slots) master

   1 additional replica(s)

    M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881

   slots:10923-16383 (5461 slots) master

   1 additional replica(s)

  Destination node:

    M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885

   slots:4096 (1 slots) master

   0 additional replica(s)

  Resharding plan:

    Moving slot 5461 from 3f121a67fab0d74f0d31b69326259e687902e1b3

    ...

    Moving slot 11090 from fa2acce219d088e2b33756dac2e85ca92936a8dd

    ...

    Moving slot 1364 from 90cb860b7f4ff516304c577bc1e514dc95ecd09b

Do you want to proceed with the proposed reshard plan (yes/no)? yes     

Moving slot 5461 from 127.0.0.1:6880 to 127.0.0.1:6885: ..

...

Moving slot 12177 from 127.0.0.1:6881 to 127.0.0.1:6885: ....

...

Moving slot 1364 from 127.0.0.1:6879 to 127.0.0.1:6885: .....


Check the new mapping between nodes and slots.

127.0.0.1:6879> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852546583 9 connected 0-1364 4096 5461-6826 10923-12287

558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 master - 0 1532852550630 8 connected

fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532852548097 3 connected 12288-16383

3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532852549619 2 connected 6827-10922

...

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 1365-4095 4097-5460


After the migration, use the redis-trib.rb rebalance command to check how evenly slots are balanced across nodes.

$ redis-trib.rb rebalance 127.0.0.1:6879

...

[OK] All 16384 slots covered.

*** No rebalancing needed! All nodes are within the 2.0% threshold.
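The idea behind the 2.0% threshold message can be approximated with a short calculation. This is a simplified sketch, not redis-trib.rb's actual code: it assumes every slot-owning master should hold an equal share, skips masters that own no slots (such as 6886 at this point), and measures each node's deviation from that share as a percentage. The name within_threshold is hypothetical.

```python
from typing import Dict


def within_threshold(slot_counts: Dict[str, int],
                     threshold_pct: float = 2.0) -> bool:
    """True if every slot-owning master is within threshold_pct of the mean."""
    owners = {node: count for node, count in slot_counts.items() if count > 0}
    if not owners:
        return True
    expected = sum(owners.values()) / len(owners)
    return all(
        abs(count - expected) / expected * 100 <= threshold_pct
        for count in owners.values()
    )


if __name__ == "__main__":
    # Slot counts derived from the cluster nodes output after the reshard.
    after_reshard = {"6885": 4097, "6881": 4096, "6880": 4096,
                     "6879": 4095, "6886": 0}
    print(within_threshold(after_reshard))
```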


(3) Add a replica

Make node 6886 a replica of 6885 to keep the whole cluster highly available.

127.0.0.1:6886> cluster replicate 99ea0df1d9683affb1271a5092fc8b15b378adba


Node 6886 now shows up as a replica of 6885. The scale-out is complete.

127.0.0.1:6886> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852938340 9 connected 0-1364 4096 5461-6826 10923-12287

558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 myself,slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 0 8 connected

...



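Scaling In the Cluster

Scaling in means safely taking some nodes out of an existing cluster. First determine whether the node to be removed owns any slots. If it does, migrate them to other nodes so that the slot-to-node mapping stays complete after the node goes offline. Once the node no longer owns slots, or if it is a replica, tell the other nodes in the cluster to forget it; after every node has forgotten it, the node can be shut down normally. The process has 2 stages.


(1) Migrate slots off the node

The node being removed must hand its slots over to other nodes; the mechanism is the same as slot migration during scale-out. Suppose nodes 6881 and 6884 are to be taken offline: 6881 is a master that owns slots 12288-16383, and 6884 is its replica. Before 6881 goes offline, its slots must be migrated to the 3 nodes 6879, 6880, and 6885. Because each run of the reshard command accepts only one target node, reshard has to be run 3 times, moving 1365, 1365, and 1366 slots respectively.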
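The 1365 / 1365 / 1366 split is simply 4096 slots divided across 3 targets, with the remainder going to the last one. A small sketch of that arithmetic follows. It assumes the slots are handed out as contiguous ranges in target order; split_slots is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

# (target node, slot count, first slot, last slot)
Share = Tuple[str, int, int, int]


def split_slots(first: int, last: int, targets: Sequence[str]) -> List[Share]:
    """Divide the slot range first..last (inclusive) across the targets."""
    total = last - first + 1
    base, extra = divmod(total, len(targets))
    plan: List[Share] = []
    start = first
    for index, target in enumerate(targets):
        # The last `extra` targets each take one additional slot.
        count = base + (1 if index >= len(targets) - extra else 0)
        if count == 0:
            continue
        plan.append((target, count, start, start + count - 1))
        start += count
    return plan


if __name__ == "__main__":
    for share in split_slots(12288, 16383, ["6879", "6880", "6885"]):
        print(share)
```

Each resulting count is the value to enter at the "How many slots do you want to move" prompt of one reshard run.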


$ redis-trib.rb reshard 127.0.0.1:6879

...

How many slots do you want to move (from 1 to 16384)? 1365

What is the receiving node ID? 90cb860b7f4ff516304c577bc1e514dc95ecd09b

Please enter all the source node IDs.

  Type 'all' to use all the nodes as source nodes for the hash slots.

  Type 'done' once you entered all the source nodes IDs.

Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd

Source node #2:done


Ready to move 1365 slots.

  Source nodes:

    M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881

   slots:12288-16383 (4096 slots) master

   1 additional replica(s)

  Destination node:

    M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879

   slots:1365-4095,4097-5460 (4095 slots) master

   1 additional replica(s)

  Resharding plan:

    Moving slot 12288 from fa2acce219d088e2b33756dac2e85ca92936a8dd

    Moving slot 12289 from fa2acce219d088e2b33756dac2e85ca92936a8dd

Do you want to proceed with the proposed reshard plan (yes/no)? yes

...

Moving slot 13651 from 127.0.0.1:6881 to 127.0.0.1:6879: ..

Moving slot 13652 from 127.0.0.1:6881 to 127.0.0.1:6879: ...


After the migration finishes, node 6879 has taken over 1365 slots, 12288-13652, from node 6881.


Next, migrate 1365 slots to node 6880 and 1366 slots to node 6885.

$ redis-trib.rb reshard 127.0.0.1:6879

...

How many slots do you want to move (from 1 to 16384)? 1365

What is the receiving node ID? 3f121a67fab0d74f0d31b69326259e687902e1b3

Please enter all the source node IDs.

  Type 'all' to use all the nodes as source nodes for the hash slots.

  Type 'done' once you entered all the source nodes IDs.

Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd

Source node #2:done

...

Do you want to proceed with the proposed reshard plan (yes/no)? yes

...


$ redis-trib.rb reshard 127.0.0.1:6879

...

How many slots do you want to move (from 1 to 16384)? 1366

What is the receiving node ID? 99ea0df1d9683affb1271a5092fc8b15b378adba

Please enter all the source node IDs.

  Type 'all' to use all the nodes as source nodes for the hash slots.

  Type 'done' once you entered all the source nodes IDs.

Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd

Source node #2:done

...

Do you want to proceed with the proposed reshard plan (yes/no)? yes

...


At this point all of node 6881's slots have been migrated out. The cluster state is as follows:

127.0.0.1:6885> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383

...

fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532862908561 3 connected

904db05d81825413702f7eac960cd2f656b217f7 127.0.0.1:6884 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532862911596 12 connected

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532862913617 10 connected 1365-4095 4097-5460 12288-13652

3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532862907549 11 connected 6827-10922 13653-15017


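(2) Forget the nodes

When a master being taken offline has a replica, that replica must be pointed at another master. When both a master and its replica are being taken offline, remove the replica first and then the master to avoid an unnecessary switchover. For nodes 6881 and 6884, the commands are as follows: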


$ redis-trib.rb del-node 127.0.0.1:6879 904db05d81825413702f7eac960cd2f656b217f7

>>> Removing node 904db05d81825413702f7eac960cd2f656b217f7 from cluster 127.0.0.1:6879

>>> Sending CLUSTER FORGET messages to the cluster...

>>> SHUTDOWN the node.


$ redis-trib.rb del-node 127.0.0.1:6879 fa2acce219d088e2b33756dac2e85ca92936a8dd

...
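The ordering rule used above (replicas before masters, and no master removed while it still owns slots) can be expressed as a small pre-flight check. The sketch below works on plain dictionaries rather than a live cluster; the record layout and the name removal_order are assumptions made for illustration.

```python
from typing import Dict, List


def removal_order(nodes: List[Dict]) -> List[str]:
    """Return node IDs in a safe del-node order: replicas first, then masters.

    Each record is expected to have "id", "role" ("master" or "slave"),
    and "slots" (number of slots the node still owns).
    """
    for node in nodes:
        if node["role"] == "master" and node["slots"] > 0:
            raise ValueError(
                f"master {node['id']} still owns {node['slots']} slots; "
                "migrate them before removing the node"
            )
    replicas = [n["id"] for n in nodes if n["role"] == "slave"]
    masters = [n["id"] for n in nodes if n["role"] == "master"]
    return replicas + masters


if __name__ == "__main__":
    to_remove = [
        {"id": "fa2acce219d088e2b33756dac2e85ca92936a8dd",
         "role": "master", "slots": 0},   # 6881
        {"id": "904db05d81825413702f7eac960cd2f656b217f7",
         "role": "slave", "slots": 0},    # 6884
    ]
    for node_id in removal_order(to_remove):
        print(f"redis-trib.rb del-node 127.0.0.1:6879 {node_id}")
```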


Final state of the cluster after the nodes have been taken offline:

127.0.0.1:6885> cluster nodes

99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383

558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532863551984 12 connected

90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532863552997 10 connected 1365-4095 4097-5460 12288-13652

c88a8bbe719e337e9015aa84aab40db06878b728 127.0.0.1:6882 slave 90cb860b7f4ff516304c577bc1e514dc95ecd09b 0 1532863555528 10 connected

3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532863556034 11 connected 6827-10922 13653-15017

200da7d61d40c384a3e55b74434bf229333a5fe8 127.0.0.1:6883 slave 3f121a67fab0d74f0d31b69326259e687902e1b3 0 1532863555023 11 connected


Reposted from blog.51cto.com/coveringindex/2151871