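When Redis runs as a single instance, a failure of that one machine stops the Redis service completely and disrupts every service that depends on it. This post uses Redis Sentinel to manage master-slave failover.
The official documentation describes it as follows: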
Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists without human intervention to certain kind of failures.
Redis Sentinel also provides other collateral tasks such as monitoring, notifications and acts as a configuration provider for clients.
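Environment:
Since I don't have many machines available, the test environment reuses the two Linux machines from the earlier master-slave setup for Redis itself.
A production Sentinel deployment needs at least three Sentinel instances; this test runs only one, on a separate host.
IP addresses: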
192.168.1.230 (Redis Sentinel, monitoring)
192.168.1.228 (Redis master)
192.168.1.229 (Redis slave)
Start the master and the slave, then check the replication info on the master:
cd /usr/local/redis/bin
[root@master bin]# ./redis-cli -h 192.168.1.228 info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.229,port=6380,state=online,offset=2003,lag=0
master_repl_offset:2003
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:2002
Check the replication info on the slave:
cd /usr/local/redis/bin
[root@slave1 bin]# ./redis-cli -h 192.168.1.229 -p 6380 info Replication
# Replication
role:slave
master_host:192.168.1.228
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:2171
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
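For scripted health checks, the `info Replication` output shown above can be turned into a dictionary. The following is a minimal sketch that assumes the `key:value` line format seen in the output; `parse_info` and `SAMPLE` are illustrative names, not part of Redis.

```python
def parse_info(text):
    """Parse the `key:value` lines of a Redis INFO section into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


# Sample copied from the slave output above (abridged).
SAMPLE = """\
# Replication
role:slave
master_host:192.168.1.228
master_port:6379
master_link_status:up
"""

info = parse_info(SAMPLE)
print(info["role"], info["master_host"], info["master_link_status"])
```

All values are kept as strings, so numeric fields such as `master_port` need an explicit `int(...)` before comparison.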
Configuring the Redis Sentinel monitoring service
1. Add a Redis Sentinel configuration file on 192.168.1.230 and edit it.
[root@slave2 bin]# cp /usr/local/redis/redis-3.0.7/sentinel.conf /usr/local/redis/bin/
[root@slave2 bin]# cd /usr/local/redis/bin
[root@slave2 bin]# vi sentinel.conf
The Redis source distribution contains a file called sentinel.conf that is a self-documented example configuration file you can use to configure Sentinel, however a typical minimal configuration file looks like the following:
sentinel monitor mymaster 192.168.1.228 6379 1
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
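The last argument of `sentinel monitor` is the quorum: the number of Sentinels that must agree the master is unreachable before it is flagged as objectively down. Actually starting a failover additionally needs the vote of a majority of the known Sentinels. The sketch below only illustrates that arithmetic; `majority` and `can_fail_over` are hypothetical helper names, not Sentinel APIs.

```python
def majority(total_sentinels):
    """Smallest number of Sentinels that forms a strict majority."""
    return total_sentinels // 2 + 1


def can_fail_over(total_sentinels, reachable_sentinels, quorum):
    """True if enough Sentinels are reachable both to flag the master as
    objectively down (quorum) and to authorize a failover (majority)."""
    return (reachable_sentinels >= quorum
            and reachable_sentinels >= majority(total_sentinels))


# This article's setup: one Sentinel, quorum 1.
print(can_fail_over(total_sentinels=1, reachable_sentinels=1, quorum=1))
# A three-Sentinel deployment with quorum 2 that has lost one Sentinel.
print(can_fail_over(total_sentinels=3, reachable_sentinels=2, quorum=2))
# The same deployment after losing two Sentinels.
print(can_fail_over(total_sentinels=3, reachable_sentinels=1, quorum=2))
```

With a single Sentinel and quorum 1, failover works, but the Sentinel host itself becomes a single point of failure, which is why three or more Sentinels are recommended outside of a test.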
2. Start Redis Sentinel
With the configuration file in place, start Redis Sentinel to monitor the Redis instances and watch its log output:
[root@slave2 bin]# ./redis-sentinel sentinel.conf --sentinel
8816:X 17 Jan 09:49:42.620 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.0.7 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 8816
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
8816:X 17 Jan 09:49:42.621 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8816:X 17 Jan 09:49:42.621 # Sentinel runid is d4ae05dd664e50f2347e1919265b0e5f753f1a14
8816:X 17 Jan 09:49:42.621 # +monitor master mymaster 192.168.1.228 6379 quorum 1
8816:X 17 Jan 09:49:42.626 * +slave slave 192.168.1.229:6380 192.168.1.229 6380 @ mymaster 192.168.1.228 6379
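The last log line (+slave) shows clearly that the slave Redis instance has joined the monitored group.
Run the following command to view the master-slave information as seen by Sentinel: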
[root@slave2 bin]# ./redis-cli -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=192.168.1.228:6379,slaves=1,sentinels=1
This means everything is working: the Redis Sentinel setup has been configured successfully.
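A monitoring script can apply the same check to the `master0` line of `info Sentinel`. This is a minimal sketch assuming the comma-separated `key=value` layout shown above; `parse_master_line` is an illustrative name, and the health rule (status `ok` with at least one slave) is my own assumption about what "normal" means here.

```python
def parse_master_line(line):
    """Split a `masterN:name=...,status=...` line from `info Sentinel`
    into (label, fields)."""
    label, _, rest = line.strip().partition(":")
    fields = dict(item.split("=", 1) for item in rest.split(","))
    return label, fields


line = "master0:name=mymaster,status=ok,address=192.168.1.228:6379,slaves=1,sentinels=1"
label, fields = parse_master_line(line)

# The address itself contains a colon, so split on the last one.
host, _, port = fields["address"].rpartition(":")

# Assumed health rule: master is "ok" and has at least one slave attached.
healthy = fields["status"] == "ok" and int(fields["slaves"]) >= 1
print(label, fields["name"], host, port, healthy)
```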
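3. Failure demonstration
Stop the Redis service on the master (228) with the following command: stopRedis. Then check the log on the Sentinel machine; it shows that the master role has moved.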
8816:X 17 Jan 09:54:32.136 # +sdown master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.136 # +odown master mymaster 192.168.1.228 6379 #quorum 1/1
8816:X 17 Jan 09:54:32.136 # +new-epoch 1
8816:X 17 Jan 09:54:32.136 # +try-failover master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.147 # +vote-for-leader d4ae05dd664e50f2347e1919265b0e5f753f1a14 1
8816:X 17 Jan 09:54:32.147 # +elected-leader master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.147 # +failover-state-select-slave master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.225 # +selected-slave slave 192.168.1.229:6380 192.168.1.229 6380 @ mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.225 * +failover-state-send-slaveof-noone slave 192.168.1.229:6380 192.168.1.229 6380 @ mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:32.318 * +failover-state-wait-promotion slave 192.168.1.229:6380 192.168.1.229 6380 @ mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:33.196 # +promoted-slave slave 192.168.1.229:6380 192.168.1.229 6380 @ mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:33.196 # +failover-state-reconf-slaves master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:33.271 # +failover-end master mymaster 192.168.1.228 6379
8816:X 17 Jan 09:54:33.271 # +switch-master mymaster 192.168.1.228 6379 192.168.1.229 6380
8816:X 17 Jan 09:54:33.272 * +slave slave 192.168.1.228:6379 192.168.1.228 6379 @ mymaster 192.168.1.229 6380
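This log shows clearly that Redis Sentinel detected that the master Redis (228) had stopped and then automatically promoted the slave Redis (229) to master.
Run the following command again to view the master-slave information: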
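The failover result can also be extracted from the log programmatically. The sketch below looks for the `+switch-master` event, whose arguments in the log above are the master name followed by the old and the new address; `find_switch` and `SWITCH_RE` are illustrative names, and the timestamp prefix is ignored rather than parsed.

```python
import re

# +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_RE = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)\s*$")


def find_switch(log_lines):
    """Return (name, old_addr, new_addr) for the last +switch-master event,
    or None if the log contains none."""
    result = None
    for line in log_lines:
        match = SWITCH_RE.search(line)
        if match:
            name, old_ip, old_port, new_ip, new_port = match.groups()
            result = (name, (old_ip, int(old_port)), (new_ip, int(new_port)))
    return result


# Two lines copied from the Sentinel log above.
LOG = [
    "8816:X 17 Jan 09:54:33.271 # +failover-end master mymaster 192.168.1.228 6379",
    "8816:X 17 Jan 09:54:33.271 # +switch-master mymaster 192.168.1.228 6379 192.168.1.229 6380",
]
print(find_switch(LOG))
```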
[root@slave2 redis]# cd /usr/local/redis/bin
[root@slave2 bin]# ./redis-cli -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=192.168.1.229:6380,slaves=1,sentinels=1
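4. Restart the original master
Once one Redis instance has failed (and you have probably received some failure notifications), what happens when the stopped instance is brought back into service?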
8816:X 17 Jan 09:57:47.442 * +convert-to-slave slave 192.168.1.228:6379 192.168.1.228 6379 @ mymaster 192.168.1.229 6380
Redis Sentinel adds the former master back into the group, but it is no longer the master; it becomes a slave.
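5. Restore the original master. This is a manual step that real-world automatic failover does not require. On the current master (229), run the following command to make 228 the master again.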
cd /usr/local/redis/bin
./redis-cli -p 6380 slaveof 192.168.1.228 6379
After a while:
[root@slave2 bin]# ./redis-cli -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=192.168.1.228:6379,slaves=1,sentinels=1
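Because the master address can change after a failover, clients are expected to ask Sentinel for it instead of hard-coding it. The sketch below sends `SENTINEL get-master-addr-by-name` over a plain socket using the Redis wire protocol. The helper names are hypothetical, error handling is omitted, and it assumes the short reply arrives in a single `recv`; a real client library would be the safer choice in practice.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_addr_reply(reply):
    """Parse a two-element array reply into (host, port).

    Returns None for anything else, e.g. the null reply sent when the
    master name is unknown."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # Layout: *2, $<len>, host, $<len>, port
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host, sentinel_port, master_name, timeout=2.0):
    """Ask one Sentinel for the current master address."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        # Assumption: the whole (short) reply fits in one recv call.
        reply = conn.recv(4096)
    return parse_addr_reply(reply)
```

In this article's environment the call would be `get_master_addr("192.168.1.230", 26379, "mymaster")`, which should return the same address that `info Sentinel` reports in the `master0` line.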