HBase backup master

  As in other distributed systems, a spare/secondary master is commonly provided to guarantee high availability.

  Likewise, in HBase you can run one or more backup masters alongside the active master. When the active master fails, the backups compete to create the master znode in ZooKeeper; the winner becomes the new active master, and all the others stay on standby in the backup state.
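As a concrete illustration, here is roughly what the election state looks like in ZooKeeper, assuming the default zookeeper.znode.parent of /hbase (paths may differ in your deployment). You can inspect it with the bundled ZooKeeper shell:

$ hbase zkcli
get /hbase/master            (ephemeral znode written by the active master)
ls /hbase/backup-masters     (one ephemeral child per standby master)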

Flow chart (image from the original post, not reproduced here)

  Add backup master(s)

1. Add the backup hosts, one per line, to the file backup-masters under the conf directory (these hosts should NOT run the active master simultaneously); for example:
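A minimal conf/backup-masters sketch (the hostnames below are hypothetical placeholders, use your own):

backup-node-1.example.com
backup-node-2.example.com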

Also, if you want to run some backups on the same node as the master, you can do this:

local-master-backup.sh start offset1 offset2 ...

 The offsets specify the increments added to the base ports: 60000 for the master port (hbase.master.port) and 60010 for the master info port (hbase.master.info.port).
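For example, a sketch of starting two extra backup masters on the local node (the offsets 1 and 2 are arbitrary choices):

local-master-backup.sh start 1 2

Offset 1 yields master port 60001 and info port 60011; offset 2 yields 60002 and 60012.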

 

2. Run 'start-hbase.sh' to start the backups. The backups then stay in standby mode, looping to check whether the current master has failed. The relevant startup path in HMaster is shown below:

/**
   * Try becoming active master.
   * @param startupStatus monitored task to update with startup progress
   * @return True if we could successfully become the active master.
   * @throws InterruptedException
   */
  private boolean becomeActiveMaster(MonitoredTask startupStatus)
  throws InterruptedException {
    // TODO: This is wrong!!!! Should have new servername if we restart ourselves,
    // if we come back to life.
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
        this);
    this.zooKeeper.registerListener(activeMasterManager);
    stallIfBackupMaster(this.conf, this.activeMasterManager);	// only takes effect for a backup master, which stalls here until an active master has registered

    // The ClusterStatusTracker is setup before the other
    // ZKBasedSystemTrackers because it's needed by the activeMasterManager
    // to check if the cluster should be shutdown.
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,
        this.clusterStatusTracker);
  }
 
/**
   * Block until becoming the active master.
   *
   * Method blocks until there is not another active master and our attempt
   * to become the new active master is successful.
   *
   * This also makes sure that we are watching the master znode so will be
   * notified if another master dies.
   * @param startupStatus monitored task to update with our progress
   * @return True if no issue becoming active master else false if another
   * master was running or if some other problem (zookeeper, stop flag has been
   * set on this Master)
   */
  boolean blockUntilBecomingActiveMaster(MonitoredTask startupStatus,
    ClusterStatusTracker clusterStatusTracker) {
    while (true) {
      startupStatus.setStatus("Trying to register in ZK as active master");
      // Try to become the active master, watch if there is another master.
      // Write out our ServerName as versioned bytes.
      try {
        String backupZNode = ZKUtil.joinZNode(
          this.watcher.backupMasterAddressesZNode, this.sn.toString());
        if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
          this.watcher.masterAddressZNode, this.sn.getVersionedBytes())) { // if the master znode is created in this very call, we become the active master
          // If we were a backup master before, delete our ZNode from the backup
          // master directory since we are the active now
          LOG.info("Deleting ZNode for " + backupZNode +
            " from backup master directory");
          ZKUtil.deleteNodeFailSilent(this.watcher, backupZNode);	// the backup znode may not exist, so fail silently

          // We are the master, return
          startupStatus.setStatus("Successfully registered as active master.");
          this.clusterHasActiveMaster.set(true);
          LOG.info("Master=" + this.sn);
          return true;
        }

        // There is another active master running elsewhere or this is a restart
        // and the master ephemeral node has not expired yet.
        this.clusterHasActiveMaster.set(true);

        /*
         * Add a ZNode for ourselves in the backup master directory since we are
         * not the active master.
         *
         * If we become the active master later, ActiveMasterManager will delete
         * this node explicitly.  If we crash before then, ZooKeeper will delete
         * this node for us since it is ephemeral.
         */
        LOG.info("Adding ZNode for " + backupZNode +
          " in backup master directory");
        ZKUtil.createEphemeralNodeAndWatch(this.watcher, backupZNode,
          this.sn.getVersionedBytes());

        String msg;
        byte [] bytes =
          ZKUtil.getDataAndWatch(this.watcher, this.watcher.masterAddressZNode);
        if (bytes == null) {
          msg = ("A master was detected, but went down before its address " +
            "could be read.  Attempting to become the next active master");
        } else {
          ServerName currentMaster = ServerName.parseVersionedServerName(bytes);
          if (ServerName.isSameHostnameAndPort(currentMaster, this.sn)) {
            msg = ("Current master has this master's address, " +
              currentMaster + "; master was restarted? Deleting node.");
            // Hurry along the expiration of the znode.
            ZKUtil.deleteNode(this.watcher, this.watcher.masterAddressZNode);
          } else {
            msg = "Another master is the active master, " + currentMaster +
              "; waiting to become the next active master";
          }
        }
        LOG.info(msg);
        startupStatus.setStatus(msg);
      } catch (KeeperException ke) {
        master.abort("Received an unexpected KeeperException, aborting", ke);
        return false;
      }
      synchronized (this.clusterHasActiveMaster) {	// synchronizes with the ZK event handlers in this class
        while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
          try {	// another active master already exists; wait here to become the new active one
            this.clusterHasActiveMaster.wait();	//notified by this.handleMasterNodeChange()
          } catch (InterruptedException e) {
            // We expect to be interrupted when a master dies, will fall out if so
            LOG.debug("Interrupted waiting for master to die", e);
          }
        }
        if (!clusterStatusTracker.isClusterUp()) {
          this.master.stop("Cluster went down before this master became active");
        }
        if (this.master.isStopped()) {
          return false;
        }
        // Try to become active master again now that there is no active master
      }
    }
  }

  Simulate the failure of the master

1. Kill the active master process, for example:
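One way to do this on the master node, assuming the default pid-file location of /tmp (adjust to your HBASE_PID_DIR setting):

kill -9 $(cat /tmp/hbase-$USER-master.pid)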

2. After a while, one of the backup masters takes over from the failed master, while the others keep their original standby state.

One backup then enters the active master state:

      // We are either the active master or we were asked to shutdown
      if (!this.stopped) {
        finishInitialization(startupStatus, false);
        loop();	// park this thread daemon-like so the master process keeps running in the background
      }

 


Reposted from leibnitz.iteye.com/blog/2110396