Redis Sentinel mechanism and usage (1)

From: https://my.oschina.net/dyyweb/blog/513680
Abstract: Redis Sentinel mechanism and usage (1)


Overview

Redis-Sentinel is the high-availability (HA) solution officially recommended by Redis. When Redis is used in a master-slave setup, neither Redis itself nor most of its clients will perform an automatic master-slave switchover if the master goes down. Redis-sentinel is an independent process that can monitor multiple master-slave clusters and automatically perform the switchover when a master fails.

Its main functions are as follows:

  • Monitoring: sentinel constantly checks that your redis masters and slaves are working as expected;
  • Notification: if a redis node is found to be misbehaving, sentinel can notify another process (such as its client);
  • Automatic failover: when a master node becomes unavailable, one of the master's slaves (if there is more than one) is elected as the new master, and the other slaves are reconfigured to replicate from the new master's address.

Sentinel supports clusters

Obviously, monitoring a redis cluster with only a single sentinel process is unreliable: when that sentinel process goes down (sentinel itself is then a single point of failure), the whole system no longer works as expected. It is therefore necessary to run sentinel itself as a cluster, which has several advantages:

  • Even if some sentinel processes go down, master-slave switchover of the redis cluster can still be performed;
  • With only one sentinel process, a crash of that process or a network partition makes master-slave switchover of the redis cluster impossible (the single-point problem);
  • With multiple sentinels, a redis client can connect to any of them to obtain information about the redis cluster.

Sentinel version

The current stable version of Sentinel is called Sentinel 2 (to distinguish it from the earlier Sentinel 1). It is released with the redis 2.8 installation package. After building Redis 2.8, you can find the redis-sentinel startup program in redis2.8/src/.

Strong recommendation:
If you are using redis 2.6 (whose sentinel is Sentinel 1), you should move to the Sentinel 2 shipped with redis 2.8: Sentinel 1 has many bugs and has been officially deprecated, so redis 2.8 with Sentinel 2 is strongly recommended.

Run Sentinel

There are two ways to run sentinel:

  • The first:

    redis-sentinel /path/to/sentinel.conf

  • The second:

    redis-server /path/to/sentinel.conf --sentinel

Either way, a sentinel configuration file sentinel.conf must be specified; sentinel cannot start without one. Sentinel listens on port 26379 by default, so make sure that port is not occupied by another process before starting.
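As a quick sanity check (a minimal sketch, assuming sentinel was started on the default port 26379), you can query the running sentinel with redis-cli:

    redis-cli -p 26379 SENTINEL masters

This lists every master the sentinel is monitoring, along with its current state.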

Configuration of Sentinel

The Redis source package contains a sentinel.conf file that serves as the sentinel configuration file; the file itself explains each configuration item. A typical configuration looks like this:

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

The configuration above declares two masters, named mymaster and resque. The configuration file only needs to describe the masters, not the slaves, because slaves are discovered automatically (the master holds information about its slaves). Note also that sentinel rewrites the configuration file dynamically while it runs: for example, after a master-standby switchover, the master recorded in the file is replaced with the promoted slave. This way, if sentinel is restarted later, it can restore the state of the redis cluster it was previously monitoring from this file.

Next, we will explain the above configuration items line by line:

sentinel monitor mymaster 127.0.0.1 6379 2

This line says that the master monitored by sentinel is named mymaster and lives at 127.0.0.1:6379. What does the 2 at the end of the line mean? We know that networks are unreliable: a single sentinel may wrongly conclude that a master has died simply because of network congestion. Once sentinel runs as a cluster, this problem has a simple solution: the sentinels communicate with each other to confirm whether a master is really dead. The 2 means that only when two sentinels in the cluster believe the master is dead is the master truly considered unavailable. (The sentinels in a sentinel cluster communicate with each other via the gossip protocol.)
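For example, a client or an operator can ask any sentinel for the current address of this master (a sketch; the sentinel port is assumed to be the default 26379, and the reply shown is illustrative):

    redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
    1) "127.0.0.1"
    2) "6379"

After a failover, this same command returns the address of the newly promoted master, which is how sentinel-aware clients find where to connect.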

Except for the first line, we find that the rest of the configuration shares a uniform format:

sentinel <option_name> <master_name> <option_value>

Next, we explain these configuration items one by one, keyed by the option_name in the format above:

  • down-after-milliseconds
    Sentinel sends PING heartbeats to the master to confirm that it is alive. If the master does not reply with a valid PONG within a "certain time window", or replies with an error, the sentinel subjectively (unilaterally) considers the master unavailable (subjectively down, abbreviated SDOWN). down-after-milliseconds specifies this "certain time window", in milliseconds.

    Note, however, that sentinel does not immediately perform the failover at this point. The sentinel first consults the other sentinels in the cluster: only if at least a certain number of them also subjectively consider the master dead is the master objectively considered down (objectively down, abbreviated ODOWN, as opposed to the subjective SDOWN above). The number of sentinels that must agree is the quorum configured in the sentinel monitor line discussed earlier.

  • parallel-syncs
    When a failover occurs, this option specifies how many slaves may synchronize with the new master at the same time. The smaller the number, the longer the failover takes to complete; a larger number means more slaves are simultaneously unavailable because of the replication. You can set this value to 1 to ensure that only one slave at a time is in a state where it cannot serve command requests.

Other configuration items are explained in detail in sentinel.conf.
All configuration can be modified dynamically at runtime with the SENTINEL SET command.
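For instance, a hedged sketch of changing one of the options above at runtime (the new value is purely illustrative):

    redis-cli -p 26379 SENTINEL SET mymaster down-after-milliseconds 30000

The change takes effect immediately for the named master.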

Sentinel's "Arbitration Council"

As mentioned earlier, when a master is monitored by a sentinel cluster, a parameter must be specified for it: the number of sentinels required to judge the master unavailable and to trigger a failover. In this article, we will temporarily call this parameter the number of votes.

However, when the failover master-standby switchover is actually triggered, the failover is not executed immediately: it can proceed only after a majority of the sentinels in the cluster have authorized it.
ODOWN is what triggers the failover. Once it is triggered, the sentinel attempting the failover must obtain the authorization of a "majority" of the sentinels (and if the number of votes is larger than the majority, of at least that many sentinels).
The difference seems subtle, but it is easy to understand with an example. Suppose there are 5 sentinels in the cluster and the number of votes is set to 2. When 2 sentinels consider a master unavailable, a failover is triggered; however, the sentinel performing the failover must first obtain authorization from at least 3 sentinels before the failover can be carried out.
If the number of votes is set to 5, reaching the ODOWN state requires all 5 sentinels to subjectively consider the master unavailable, and performing the failover requires the authorization of all 5.

Configuration version number

Why must a sentinel obtain the approval of a majority of sentinels before it can actually execute a failover?

When a sentinel is authorized, it obtains a unique, latest configuration version number for the down master, and when the failover completes, this version number is attached to the new configuration. Because a majority of sentinels already know that this version number has been taken by the sentinel about to execute the failover, no other sentinel can use it again. This means that every failover is accompanied by a unique version number. We will see below why this matters.

Moreover, sentinel clusters follow a rule: if sentinel A has recommended sentinel B to execute the failover, B will wait for a period of time and then attempt the failover on the same master itself. This waiting time is configured via the failover-timeout option. It follows from this rule that the sentinels in a cluster never failover the same master concurrently: if the first sentinel to attempt the failover fails, another one retries after the timeout, and so on.

Redis sentinel guarantees liveness: if a majority of sentinels can communicate with each other, eventually one of them will be authorized to failover.
Redis sentinel also guarantees safety: every sentinel that attempts to failover the same master obtains a unique version number.

Configuration propagation

Once a sentinel has successfully failed over a master, it broadcasts the master's latest configuration to the other sentinels, and they update their own configuration for that master accordingly.

For a failover to be considered successful, the sentinel must be able to send the SLAVEOF NO ONE command to the slave elected as the new master, and afterwards see the new master's configuration through the INFO command.

Once a slave has been elected master and SLAVEOF NO ONE has been sent, the failover is considered successful even if the other slaves have not yet reconfigured themselves for the new master; all sentinels then publish the new configuration.
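Conceptually, the promotion step amounts to what one could do by hand with redis-cli; this is a sketch only (sentinel performs it internally, and the host/port of the chosen slave are illustrative):

    redis-cli -h 192.168.1.50 -p 9000 SLAVEOF NO ONE
    redis-cli -h 192.168.1.50 -p 9000 INFO replication

After the promotion, the INFO replication output reports role:master, which is what sentinel looks for to confirm the new configuration.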

The way new configurations are propagated through the cluster is the reason a failover must be authorized with a version number.

Each sentinel continuously broadcasts the master's configuration and its version via pub/sub. The pub/sub channel used for configuration propagation is __sentinel__:hello.
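You can watch this propagation yourself by subscribing to the channel on any monitored instance (a sketch; the exact payload format is version-dependent):

    redis-cli -p 6379 SUBSCRIBE __sentinel__:hello

Each message carries the announcing sentinel's address and run ID, together with the master name, address, and configuration version it currently holds.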

Because every configuration carries a version number, the configuration with the highest version number is taken as authoritative.

For example: suppose a master named mymaster initially lives at 192.168.1.50:6379. At the beginning, every sentinel in the cluster knows this address, so the mymaster configuration carries version number 1. Some time later mymaster dies, and one sentinel is authorized to failover it with version number 2. If the failover succeeds and the address changes to, say, 192.168.1.50:9000, the new configuration carries version number 2. The sentinel that performed the failover broadcasts this new configuration to the others; since version 2 is higher, marking the configuration as newer, everyone adopts the configuration with version number 2.

This means that sentinel clusters guarantee a second liveness property: sentinels that can communicate with each other will eventually all converge on the configuration with the highest version number.

More details on SDOWN and ODOWN

Sentinel has two different views of unavailability: one is called subjectively down (SDOWN), the other objectively down (ODOWN). SDOWN is the state of a master as detected by a single sentinel on its own; ODOWN requires a certain number of sentinels to agree that a master is down. Sentinels use the command SENTINEL is-master-down-by-addr to obtain each other's verdict on a master.

From a single sentinel's point of view, the SDOWN condition is reached if no valid reply to its PING heartbeat is received within a certain period. This period is configured via the down-after-milliseconds parameter described above.

When sentinel sends a PING, any one of the following replies is considered valid:

  • a reply of +PONG;
  • a reply with a -LOADING error;
  • a reply with a -MASTERDOWN error.

Any other reply (or no reply at all) is considered invalid.

Moving from SDOWN to ODOWN does not require any strict consensus algorithm, only gossip: if a sentinel receives, within a given time range, reports from enough other sentinels that a master is down, its SDOWN state becomes ODOWN. If the master later becomes available again, this state is cleared accordingly.
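You can inspect the flags a master currently carries from any sentinel (a sketch; the flags field in the output contains s_down and/or o_down when those states are reached):

    redis-cli -p 26379 SENTINEL master mymaster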

As explained before, an actual failover also requires the authorization process, but every failover starts from the ODOWN state.

The ODOWN state applies only to masters. For redis nodes that are not masters, no negotiation between sentinels is needed, so slaves and sentinels themselves never enter the ODOWN state.

Automatic discovery mechanism between Sentinels and Slaves

Although the sentinels of a cluster connect to each other to check each other's availability and exchange messages, you do not need to configure any other sentinel node in any given sentinel: sentinel uses the master's publish/subscribe mechanism to automatically discover the other sentinels monitoring the same master.

It does this by sending messages to a channel named __sentinel__:hello.

Similarly, you do not need to configure the addresses of a master's slaves in sentinel: sentinel obtains the slave addresses by interrogating the master.

Each sentinel announces its presence by publishing a message once per second to the pub/sub channel __sentinel__:hello of every master and slave.
Each sentinel also subscribes to the __sentinel__:hello channel of every master and slave to discover previously unknown sentinels. When a new sentinel is detected, it is added to the list of sentinels that the discovering sentinel maintains for that master.
The messages each sentinel publishes also contain the latest master configuration it currently holds. If a sentinel finds that its own configuration version is lower than the version it receives, it updates its master configuration with the newer one.
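You can list what has been auto-discovered for a given master from any sentinel (a sketch, assuming the default port):

    redis-cli -p 26379 SENTINEL sentinels mymaster
    redis-cli -p 26379 SENTINEL slaves mymaster

The first command shows the other known sentinels; the second shows the slaves discovered by interrogating the master.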

Before adding a new sentinel for a master, a sentinel always checks whether it already knows a sentinel with the same run ID or the same address as the new one. If so, the old entry is removed first and the new sentinel is added.

Consistency under network partitions

The configuration consistency model of a redis sentinel cluster is eventual consistency: every sentinel in the cluster will eventually adopt the highest-version configuration. In a real deployment, however, three different kinds of roles interact with sentinel:

  • Redis instances.
  • Sentinel instances.
  • Clients.

To examine the behavior of the entire system we must consider all three roles.

Here's a simple example with three hosts, each running a redis and a sentinel:

             +-------------+
             | Sentinel 1  | <--- Client A
             | Redis 1 (M) |
             +-------------+
                    |
                    |
+-------------+     |                     +------------+
| Sentinel 2  |-----+-- / partition / ----| Sentinel 3 | <--- Client B
| Redis 2 (S) |                           | Redis 3 (M)|
+-------------+                           +------------+

In this system, redis3 was initially the master, with redis1 and redis2 as slaves. Then the network of the host running redis3 becomes unavailable; sentinel1 and sentinel2 start the failover and elect redis1 as the new master.

The sentinel cluster's properties guarantee that sentinel1 and sentinel2 now hold the latest configuration for the master. But sentinel3 still holds the old configuration, because it is isolated from the rest of the cluster.

When the network is restored, sentinel3 will update its configuration, as we know. But what happens if a client was connected to a master that was partitioned away?

Client B can still write data to redis3 during the partition, but when the network is restored, redis3 becomes a slave of redis1, so all data the client wrote to redis3 during the partition is lost.

Whether you can accept this scenario depends on your use case:

  • If you use redis as a cache, you may be able to tolerate losing this part of the data.
  • If you use redis as a storage system, you may not be able to tolerate losing it.

Because redis uses asynchronous replication, there is no way to avoid data loss entirely in such a scenario. However, you can configure redis3 and redis1 as follows to limit the window during which data can be lost:

min-slaves-to-write 1
min-slaves-max-lag 10

With the above configuration, a redis acting as master refuses client write requests if it cannot write data to at least one slave (min-slaves-to-write specifies the number of slaves). Since replication is asynchronous, "unable to write to a slave" means the slave is disconnected, or has not acknowledged the replicated data within the specified number of seconds (min-slaves-max-lag specifies this time).
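Both options can also be applied at runtime on the current master without a restart (a sketch; the values mirror the configuration above):

    redis-cli -p 6379 CONFIG SET min-slaves-to-write 1
    redis-cli -p 6379 CONFIG SET min-slaves-max-lag 10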

Sentinel state persistence

Sentinel's state is persisted to the sentinel configuration file. Every time a new configuration is received or created, it is written to disk together with the configuration's version stamp. This means the sentinel process can be safely stopped and restarted.
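As a hedged illustration (the exact directives and values vary by version, and the addresses and run ID below are invented for the example), a sentinel.conf that has been running for a while may contain generated lines such as:

    sentinel monitor mymaster 192.168.1.50 9000 2
    sentinel config-epoch mymaster 2
    sentinel known-slave mymaster 192.168.1.51 6379
    sentinel known-sentinel mymaster 192.168.1.52 26379 a1b2c3d4e5f6
    sentinel current-epoch 2

The epochs recorded here are the configuration version numbers discussed earlier.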
