How does Zookeeper solve the split-brain problem?

What is split brain?

Split brain literally means that one "brain" has been split into two or more "brains". If a person had several brains acting independently of each other, the body would flail about and lose control.

Split brain typically occurs in clustered systems such as ElasticSearch and Zookeeper clusters. What these clusters have in common is a single "brain": the Master node in an ElasticSearch cluster, or the Leader node in a Zookeeper cluster.

This article focuses on the split-brain problem in Zookeeper and on how Zookeeper avoids it.

Split-brain scenarios in a Zookeeper cluster

To improve a cluster's availability, we usually deploy it across multiple server rooms. For example, suppose a cluster consists of six zkServers deployed in two rooms, three in each:
Under normal circumstances this cluster has exactly one Leader. If the network link between the two rooms then fails, the zkServers inside each room can still communicate with each other. If we ignore the majority mechanism, each room will elect its own Leader.
This effectively splits the original cluster into two clusters with two "brains": that is split brain.
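To make the scenario concrete, here is a small simulation of each room electing a Leader on its own with no majority check. It is a sketch under simplifying assumptions: the class and method names are hypothetical, and "the highest server id in the room wins" stands in for the real election, which compares more than the server id.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical simulation: each partition picks its highest server id as Leader
// with NO majority check. This is not Zookeeper's election code.
class NaivePartitionElection {

    // Returns the Leader chosen inside one partition, or -1 if the partition is empty.
    static long electWithoutQuorum(List<Long> partition) {
        return partition.isEmpty() ? -1 : Collections.max(partition);
    }

    // Counts how many partitions end up with their own Leader.
    static int countLeaders(List<List<Long>> partitions) {
        int leaders = 0;
        for (List<Long> p : partitions) {
            if (electWithoutQuorum(p) != -1) {
                leaders++;
            }
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Six zkServers (ids 1..6): room 1 holds 1..3, room 2 holds 4..6.
        List<Long> room1 = List.of(1L, 2L, 3L);
        List<Long> room2 = List.of(4L, 5L, 6L);
        System.out.println("Leader in room 1: " + electWithoutQuorum(room1));
        System.out.println("Leader in room 2: " + electWithoutQuorum(room2));
        System.out.println("Leaders in cluster: " + countLeaders(List.of(room1, room2)));
    }
}
```

Without any quorum rule, both rooms report a Leader, so the cluster ends up with two.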

In this situation, what should have been one cluster providing a unified service has become two clusters serving clients at the same time. If the network between the rooms later comes back, a problem arises: both clusters have been accepting requests, so how should their data be merged, and how should conflicting data be resolved?
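The merge problem can be illustrated with two independent key/value maps standing in for the data held by each "brain". This is a hypothetical sketch, not Zookeeper code: the class, the `conflicts` helper and the path `/config/db` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: two "brains" accept writes independently while
// the inter-room link is down; on reconnect we look for conflicting keys.
class DivergentWrites {

    // Returns the keys that both sides hold with different values,
    // mapped to {valueFromA, valueFromB}.
    static Map<String, String[]> conflicts(Map<String, String> a, Map<String, String> b) {
        Map<String, String[]> result = new HashMap<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                result.put(e.getKey(), new String[] {e.getValue(), other});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> room1 = new HashMap<>();
        Map<String, String> room2 = new HashMap<>();
        // Both Leaders start from the same state...
        room1.put("/config/db", "v1");
        room2.put("/config/db", "v1");
        // ...then each accepts a different client write during the partition.
        room1.put("/config/db", "v2-from-room1");
        room2.put("/config/db", "v2-from-room2");
        for (Map.Entry<String, String[]> c : conflicts(room1, room2).entrySet()) {
            System.out.println(c.getKey() + ": " + c.getValue()[0] + " vs " + c.getValue()[1]);
        }
    }
}
```

Running `main` prints `/config/db: v2-from-room1 vs v2-from-room2`. Neither value is obviously the right one to keep, which is exactly the merge problem.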

The scenario just described ignores the majority mechanism. In practice a Zookeeper cluster does not suffer from split brain, and the reason is precisely that majority mechanism.

The majority (more-than-half) mechanism

During leader election, a zkServer can become Leader only if it wins more than half of the votes.

The source code that implements the majority mechanism is actually very simple:

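import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QuorumMaj implements QuorumVerifier {
    private static final Logger LOG = LoggerFactory.getLogger(QuorumMaj.class);
    
    int half;
    
    // n is the number of zkServers in the cluster (strictly, the number of
    // voting participants; observer nodes are not counted)
    public QuorumMaj(int n){
        this.half = n/2;
    }
 
    // Checks whether the majority condition is met
    public boolean containsQuorum(Set<Long> set){
        // half was assigned in the constructor
        // set.size() is the number of votes a given zkServer received
        return (set.size() > half);
    }
    
    // (other QuorumVerifier methods are omitted from this excerpt)
}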

Reading the comments in the code above carefully, the core logic comes down to these two lines:

this.half = n/2;
return (set.size() > half);

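A simple example: if the cluster has 5 zkServers, then half = 5/2 = 2. This means that during leader election at least three zkServers must vote for the same zkServer to satisfy the majority mechanism and elect a Leader.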
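The arithmetic can be checked with a standalone restatement of the two core lines. `MajorityRule` is a hypothetical name for this sketch; it copies only the `half = n / 2` and `set.size() > half` logic from QuorumMaj and adds a helper that reports the smallest passing vote count.

```java
import java.util.Set;

// Hypothetical standalone restatement of QuorumMaj's core check.
class MajorityRule {
    private final int half;

    MajorityRule(int n) {
        this.half = n / 2; // integer division: 5 / 2 == 2, 6 / 2 == 3
    }

    // Same comparison as QuorumMaj.containsQuorum: strictly more than half.
    boolean containsQuorum(Set<Long> votes) {
        return votes.size() > half;
    }

    // Smallest number of votes that passes the check: half + 1.
    int minimumVotes() {
        return half + 1;
    }

    public static void main(String[] args) {
        MajorityRule five = new MajorityRule(5);
        System.out.println(five.containsQuorum(Set.of(1L, 2L)));     // false: 2 > 2 fails
        System.out.println(five.containsQuorum(Set.of(1L, 2L, 3L))); // true: 3 > 2
        for (int n = 3; n <= 7; n++) {
            System.out.println("n=" + n + " half=" + (n / 2)
                    + " minimum votes=" + new MajorityRule(n).minimumVotes());
        }
    }
}
```

For n = 3 to 7 the loop prints minimum vote counts of 2, 3, 3, 4 and 4, so a six-server cluster needs as many votes as a seven-server one.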

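Let's think about a question: why must the election include a majority check at all? Because with it, a Leader can be elected without waiting for every zkServer to vote for the same zkServer. That is faster, which is why the algorithm is called the fast leader election algorithm.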
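The point about not waiting for every vote can be sketched as a tally loop that stops as soon as one candidate has more than n / 2 distinct voters. This is a hypothetical illustration, not the actual fast leader election source: votes are modeled as `{voterId, candidateId}` pairs arriving in a fixed order, and the class and method names are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical tally loop: stop as soon as some candidate has more than
// n / 2 distinct voters. Not the FastLeaderElection source.
class EarlyQuorumTally {

    // Returns {leaderId, votesConsumed}, or null if no candidate reached a majority.
    static long[] tally(int n, List<long[]> votes) {
        int half = n / 2;
        Map<Long, Set<Long>> votersByCandidate = new HashMap<>();
        int consumed = 0;
        for (long[] vote : votes) {
            consumed++;
            // vote[0] = voter id, vote[1] = candidate id; a Set ignores duplicate voters
            Set<Long> voters = votersByCandidate.computeIfAbsent(vote[1], k -> new HashSet<>());
            voters.add(vote[0]);
            if (voters.size() > half) {
                return new long[] {vote[1], consumed};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Five servers; the tally stops after the third vote for server 5,
        // before the votes from servers 4 and 5 are examined.
        List<long[]> votes = List.of(
                new long[] {1, 5}, new long[] {2, 5}, new long[] {3, 5},
                new long[] {4, 4}, new long[] {5, 5});
        long[] result = tally(5, votes);
        System.out.println("Leader " + result[0] + " after " + result[1] + " votes");
    }
}
```

With five servers, `main` prints `Leader 5 after 3 votes`; the last two votes are never counted.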

Here is another question: why does the majority check use "greater than" rather than "greater than or equal to"?

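This is directly related to split brain. Let's return to the split-brain scenario described above (six zkServers, three in each room):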
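When the network between the rooms fails, the three servers in room 1 hold a leader election. But the majority condition is now set.size() > 3 (half = 6/2 = 3), so at least four zkServers are needed to elect a Leader. Room 1 therefore cannot elect a Leader, and neither can room 2. In this case, once the network between the rooms is cut, the whole cluster has no Leader.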

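If instead the condition were set.size() >= 3, room 1 and room 2 would each elect a Leader, and split brain would occur. This is why the majority check uses "greater than" rather than "greater than or equal to": to prevent split brain.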
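The difference between the two comparisons for the six-server, three-per-room split can be checked directly. This is a minimal sketch, assuming every server in a room votes for the same candidate; the non-strict rule is hypothetical and is not what QuorumMaj does.

```java
// Compares the strict rule (votes > n / 2, as in QuorumMaj) with a
// hypothetical non-strict rule (votes >= n / 2) for a 6-server cluster
// split 3 + 3, assuming all servers in a room vote for the same candidate.
class StrictVsNonStrict {

    static boolean strict(int votes, int n) {
        return votes > n / 2;
    }

    static boolean nonStrict(int votes, int n) {
        return votes >= n / 2;
    }

    public static void main(String[] args) {
        int n = 6;
        int room1 = 3;
        int room2 = 3;
        int strictLeaders = (strict(room1, n) ? 1 : 0) + (strict(room2, n) ? 1 : 0);
        int nonStrictLeaders = (nonStrict(room1, n) ? 1 : 0) + (nonStrict(room2, n) ? 1 : 0);
        System.out.println("votes > n/2  -> leaders: " + strictLeaders);    // 0
        System.out.println("votes >= n/2 -> leaders: " + nonStrictLeaders); // 2 (split brain)
    }
}
```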

Now suppose we have only five machines, again deployed in two rooms, say three in room 1 and two in room 2:
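The majority condition is now set.size() > 2, so at least three servers are needed to elect a Leader. If the network between the rooms fails, room 1 is unaffected: it still holds a majority, so its Leader remains Leader (or it elects a new one if the old Leader was in room 2). Room 2 cannot elect a Leader, so the whole cluster still has only one Leader.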
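The same check generalizes to any room layout: count the rooms whose servers alone exceed half of the whole cluster. This is a minimal sketch, assuming all servers in a room vote together; `roomsWithLeader` is a hypothetical helper, not a Zookeeper API.

```java
// Hypothetical helper: given the number of servers in each room after the
// inter-room link fails, count the rooms that still satisfy votes > n / 2,
// assuming all servers in a room vote for the same candidate.
class PartitionLeaders {

    static int roomsWithLeader(int... roomSizes) {
        int n = 0;
        for (int size : roomSizes) {
            n += size; // n is the size of the whole cluster
        }
        int half = n / 2;
        int count = 0;
        for (int size : roomSizes) {
            if (size > half) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("5 servers, 3 + 2: " + roomsWithLeader(3, 2)); // 1
        System.out.println("6 servers, 3 + 3: " + roomsWithLeader(3, 3)); // 0
    }
}
```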

In summary, with the majority mechanism a Zookeeper cluster has either no Leader or exactly one Leader, which avoids the split-brain problem.
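The summary can also be argued arithmetically: if two disjoint groups in a cluster of n servers each had more than half of n members, together they would hold more than n servers, which is impossible. The brute-force sketch below checks the same claim for every two-room split up to nine servers and shows that it fails once the comparison becomes `>=`. The class and method names are hypothetical.

```java
// Hypothetical brute-force check: for every cluster size n from 2 to maxN and
// every split into two non-empty rooms (a + b = n), count how many rooms could
// elect a Leader on their own, assuming all servers in a room vote together.
class AtMostOneLeader {

    // Number of rooms that pass the check; strict = true mirrors set.size() > half.
    static int leaders(int a, int b, boolean strict) {
        int half = (a + b) / 2;
        int count = 0;
        for (int room : new int[] {a, b}) {
            boolean passes = strict ? room > half : room >= half;
            if (passes) {
                count++;
            }
        }
        return count;
    }

    // True if no split of any cluster size up to maxN yields two Leaders.
    static boolean neverTwoLeaders(int maxN, boolean strict) {
        for (int n = 2; n <= maxN; n++) {
            for (int a = 1; a < n; a++) {
                if (leaders(a, n - a, strict) > 1) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("votes > half:  " + neverTwoLeaders(9, true));  // true
        System.out.println("votes >= half: " + neverTwoLeaders(9, false)); // false
    }
}
```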



Source: blog.csdn.net/shichen2010/article/details/104550025