Zab协议(5)-选举算法选举阶段源码解析(4)

2021SC@SDUSC

回顾

上一环节讲了一下选举过程中最核心代码的LOOKING状态,这篇文章分析一下选举过程中的网络通信。

源码分析

                case OBSERVING:
                    try {
    
    
                        LOG.info("OBSERVING");
                        setObserver(makeObserver(logFactory));
                        observer.observeLeader();
                    } catch (Exception e) {
    
    
                        LOG.warn("Unexpected exception",e );                        
                    } finally {
    
    
                        observer.shutdown();
                        setObserver(null);
                        setPeerState(ServerState.LOOKING);
                    }
                    break;
                case FOLLOWING:
                    try {
    
    
                        LOG.info("FOLLOWING");
                        setFollower(makeFollower(logFactory));
                        follower.followLeader();
                    } catch (Exception e) {
    
    
                        LOG.warn("Unexpected exception",e);
                    } finally {
    
    
                        follower.shutdown();
                        setFollower(null);
                        setPeerState(ServerState.LOOKING);
                    }
                    break;
                case LEADING:
                    LOG.info("LEADING");
                    try {
    
    
                        setLeader(makeLeader(logFactory));
                        leader.lead();
                        setLeader(null);
                    } catch (Exception e) {
    
    
                        LOG.warn("Unexpected exception",e);
                    } finally {
    
    
                        if (leader != null) {
    
    
                            leader.shutdown("Forcing shutdown");
                            setLeader(null);
                        }
                        setPeerState(ServerState.LOOKING);
                    }
                    break;
                }
            }
        } finally {
    
    
            LOG.warn("QuorumPeer main thread exited");
            try {
    
    
                MBeanRegistry.getInstance().unregisterAll();
            } catch (Exception e) {
    
    
                LOG.warn("Failed to unregister with JMX", e);
            }
            jmxQuorumBean = null;
            jmxLocalPeerBean = null;
        }
    }

这一部分是在LOOING后,完成leader选举后,各服务器会根据自己角色创建相应的服务器实例,并开始进入各自角色的主流程。
每台服务器启动的时候,都会启动一个QuorumCnxManager负责网络通信。

    /*
     * Mapping from Peer to Thread number
     */
    final ConcurrentHashMap<Long, SendWorker> senderWorkerMap;
    final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap;
    final ConcurrentHashMap<Long, ByteBuffer> lastMessageSent;

    /*
     * Reception queue
     */
    public final ArrayBlockingQueue<Message> recvQueue;

一部分是消息队列,用于保存接收到的,待发送的消息,以及消息的发送器。除了接收队列外,这里提到的所有队列都按SID分组形成队列集合。

为了可以相互投票,同一集群的所有服务器需要建立起网络连接。QuorumCnxManager启动时,会创建一个ServerSocket来监听Leader选举的通信端口(默认端口是3888)。开启端口监听后,就能接收其他服务器的“创建连接”请求。

 public class Listener extends ZooKeeperThread {
    
    

        volatile ServerSocket ss = null;
            public void run() {
    
    
            int numRetries = 0;
            InetSocketAddress addr;
            while((!shutdown) && (numRetries < 3)){
    
    
                try {
    
    
                    ss = new ServerSocket();
                    ss.setReuseAddress(true);
                    ...
                    ss.bind(addr);
                    while (!shutdown) {
    
    
                        Socket client = ss.accept();
                        setSockOpts(client);
                        ...
                        /*接收到其他服务器TCP连接请求时交由receiveConnection处理*/
                        if (quorumSaslAuthEnabled) {
    
    
                            receiveConnectionAsync(client);
                        } else {
    
    
                            receiveConnection(client);
                        }

                        numRetries = 0;
                    }
                } 

为了防止两台服务器重复连接,zookeeper定义规则:只能sid大的去连接sid小的。如果sid小的连接了sid大的,会中断这个连接。
相关代码如下:

    private void handleConnection(Socket sock, DataInputStream din)
            throws IOException {
    
    
        Long sid = null;
        try {
    
    
            // Read server id
            sid = din.readLong();
            if (sid < 0) {
    
     // this is not a server id but a protocol version (see ZOOKEEPER-1633)
                sid = din.readLong();

                // next comes the #bytes in the remainder of the message
                // note that 0 bytes is fine (old servers)
                int num_remaining_bytes = din.readInt();
                if (num_remaining_bytes < 0 || num_remaining_bytes > maxBuffer) {
    
    
                    LOG.error("Unreasonable buffer length: {}", num_remaining_bytes);
                    closeSocket(sock);
                    return;
                }
                byte[] b = new byte[num_remaining_bytes];

                // remove the remainder of the message from din
                int num_read = din.read(b);
                if (num_read != num_remaining_bytes) {
    
    
                    LOG.error("Read only " + num_read + " bytes out of " + num_remaining_bytes + " sent by server " + sid);
                }
            }
            if (sid == QuorumPeer.OBSERVER_ID) {
    
    
                /*
                 * Choose identifier at random. We need a value to identify
                 * the connection.
                 */
                sid = observerCounter.getAndDecrement();
                LOG.info("Setting arbitrary identifier to observer: " + sid);
            }
        } catch (IOException e) {
    
    
            closeSocket(sock);
            LOG.warn("Exception reading or writing challenge: " + e.toString());
            return;
        }

        // do authenticating learner
        LOG.debug("Authenticating learner server.id: {}", sid);
        authServer.authenticate(sock, din);

        //If wins the challenge, then close the new connection.
        if (sid < this.mySid) {
    
    
            /*
             * This replica might still believe that the connection to sid is
             * up, so we have to shut down the workers before trying to open a
             * new connection.
             */
            SendWorker sw = senderWorkerMap.get(sid);
            if (sw != null) {
    
    
                sw.finish();
            }

            /*
             * Now we start a new connection
             */
            LOG.debug("Create new connection to server: " + sid);
            closeSocket(sock);
            connectOne(sid);

            // Otherwise start worker threads to receive data.
        } else {
    
    
            SendWorker sw = new SendWorker(sock, sid);
            RecvWorker rw = new RecvWorker(sock, din, sid, sw);
            sw.setRecv(rw);

            SendWorker vsw = senderWorkerMap.get(sid);
            
            if(vsw != null)
                vsw.finish();
            
            senderWorkerMap.put(sid, sw);
            queueSendMap.putIfAbsent(sid, new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY));
            
            sw.start();
            rw.start();
            
            return;
        }
    }

猜你喜欢

转载自blog.csdn.net/azzin/article/details/121197975