1. QoS
Every HBase request carries a priority level (priorityLevel). The RPC layer maintains a corresponding handler thread pool per level and dispatches each request to the pool matching its priority. The sizes of the two pools are configured by hbase.regionserver.handler.count and hbase.regionserver.metahandler.count respectively.
On the regionserver, a request with priority <= 10 is treated as an ordinary request and goes to the IPC Server handler queue; a request with priority > 10 is treated as a priority request and goes to the PRI IPC Server handlers. A request is placed in the priority queue when it has either of the following two characteristics:
- The called method is annotated with @QosPriority and the annotation's priority value is greater than 10. For example, in HRegionServer the following methods have elevated priority: openRegion, closeRegion, flushRegion, splitRegion, compactRegion, getProtocolSignature, getRegionInfo, unlockRow, etc.
- The request operates on a metadata region, i.e. the .META. or -ROOT- table.
The priority values are computed by org.apache.hadoop.hbase.regionserver.HRegionServer.QosFunction.
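As a rough illustration of that decision logic, here is a minimal sketch of a QoS-style priority function. It is not the actual QosFunction source: the annotation declaration, the constant names (NORMAL_QOS, HIGH_QOS), and the isMetaOrRoot flag are assumptions made for the example; only the two dispatch rules above come from the text.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for HBase's @QosPriority annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface QosPriority {
    int priority() default 0;
}

class QosFunctionSketch {
    static final int NORMAL_QOS = 0;   // <= 10: IPC Server handler queue
    static final int HIGH_QOS = 100;   // > 10: PRI IPC Server handlers

    /** Returns the priority to assign to an RPC call on the given method. */
    int getPriority(Method method, boolean isMetaOrRoot) {
        // Rule 1: the method is annotated and the annotation's priority > 10.
        QosPriority ann = method.getAnnotation(QosPriority.class);
        if (ann != null && ann.priority() > 10) {
            return ann.priority();
        }
        // Rule 2: the request targets a .META. or -ROOT- region.
        if (isMetaOrRoot) {
            return HIGH_QOS;
        }
        return NORMAL_QOS;
    }
}
```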
2. ZooKeeperWatcher
ZooKeeperWatcher is HBase's sole implementation of the ZooKeeper Watcher interface. Through it HBase manages the state of all of its znodes: creation, deletion, updates, event callbacks, and so on. HMaster, HRegionServer, and the client each hold exactly one instance of it for connecting to the ZooKeeper ensemble.
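The core pattern is a single Watcher that fans events out to registered listeners. The sketch below assumes only the stock ZooKeeper client API (org.apache.zookeeper.Watcher); the listener registry mirrors the idea, not the real ZooKeeperWatcher code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Simplified sketch: one shared Watcher per process that dispatches every
// ZooKeeper event (node created/deleted/changed, session events) to listeners.
class SharedWatcherSketch implements Watcher {
    private final List<Watcher> listeners = new CopyOnWriteArrayList<>();

    void registerListener(Watcher listener) {
        listeners.add(listener);
    }

    @Override
    public void process(WatchedEvent event) {
        // Fan the event out; each listener inspects event.getType() and
        // event.getPath() to decide whether it cares about this event.
        for (Watcher l : listeners) {
            l.process(event);
        }
    }
}
```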
3. XxxTracker
HBase contains many Tracker classes, each serving a different purpose.
- ClusterStatusTracker corresponds to /hbase/shutdown; the HMaster uses it to record cluster status information, e.g. the time the cluster came online.
- DrainingServerTracker corresponds to /hbase/draining; the HMaster uses it to record the list of regionservers that must not be assigned any new regions.
- MetaNodeTracker is used by CatalogTracker.
- CatalogTracker monitors the availability of the .META. and -ROOT- tables and manages RootRegionTracker and MetaNodeTracker. It records the state of the regions hosting .META. and -ROOT-. In ZooKeeper, /hbase/root-region-server records the location of the -ROOT- table, -ROOT- records where .META. lives, and the regions of all user tables are recorded in .META.
- RegionServerTracker corresponds to /hbase/rs and maintains the list of live regionservers.
- RootRegionTracker corresponds to /hbase/root-region-server and tracks the location of the -ROOT- region.
- ZooKeeperNodeTracker is an abstract class: the base Tracker corresponding to a single ZooKeeper node, whose shared pattern is sketched below.
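The concrete trackers all follow the same shape: watch one znode path, cache its data, and let callers block until the data is available. A hypothetical sketch of that pattern (not the actual ZooKeeperNodeTracker source; the method names are simplified):

```java
// Hypothetical sketch of the single-node tracker pattern.
abstract class NodeTrackerSketch {
    protected final String node;   // e.g. "/hbase/root-region-server"
    private byte[] data;           // cached znode content, null if the node is absent

    NodeTrackerSketch(String node) {
        this.node = node;
    }

    /** Wired to ZooKeeper watch events: refresh the cache when our node changes. */
    synchronized void nodeDataChanged(String path, byte[] newData) {
        if (node.equals(path)) {
            this.data = newData;
            notifyAll();   // wake up callers blocked below
        }
    }

    /** Blocks until the node's data is available or the timeout expires. */
    synchronized byte[] blockUntilAvailable(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (data == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) break;
            wait(remaining);
        }
        return data;
    }
}
```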
4. Inconsistent HBase table state
An inconsistent table state usually means that the metadata in HBase's .META. table disagrees with the data actually stored on HDFS. There are many possible causes; most often a regionserver dies in the middle of a region split, or a failed operation forces HBase to roll back. Running hbase hbck checks whether the cluster state is complete and shows which data is inconsistent; hbase hbck -repair then repairs the inconsistent data.
5. .META. cannot be split
The guard sits at the top of checkSplit():

```java
public byte[] checkSplit() {
    // Can't split META
    if (getRegionInfo().isMetaRegion()) {
        if (shouldForceSplit()) {
            LOG.warn("Cannot split meta regions in HBase 0.20 and above");
        }
        return null;
    }
    if (!splitPolicy.shouldSplit()) {
        return null;
    }
    byte[] ret = splitPolicy.getSplitPoint();
    if (ret != null) {
        try {
            checkRow(ret, "calculated split");
        } catch (IOException e) {
            LOG.error("Ignoring invalid split", e);
            return null;
        }
    }
    return ret;
}
```
6. DrainingServer
A regionserver on the draining list is no longer assigned new regions; even if you explicitly move a region onto such a node, the region is automatically reassigned to a random other node instead. For details see the JIRA "Support to drain RS nodes through ZK".

```java
/**
 * @param state
 * @param serverToExclude Server to exclude (we know its bad). Pass null if
 *   all servers are thought to be assignable.
 * @param forceNewPlan If true, then if an existing plan exists, a new plan
 *   will be generated.
 * @return Plan for passed <code>state</code> (If none currently, it creates one or
 *   if no servers to assign, it returns null).
 */
RegionPlan getRegionPlan(final RegionState state,
        final ServerName serverToExclude, final boolean forceNewPlan) {
    // Pickup existing plan or make a new one
    final String encodedName = state.getRegion().getEncodedName();
    final List<ServerName> servers = this.serverManager.getOnlineServersList();
    // The draining server list
    final List<ServerName> drainingServers =
        this.serverManager.getDrainingServersList();

    if (serverToExclude != null) servers.remove(serverToExclude);

    // Loop through the draining server list and remove them from the server
    // list.
    if (!drainingServers.isEmpty()) {
        for (final ServerName server : drainingServers) {
            // Remove each draining server from the online server list.
            LOG.debug("Removing draining server: " + server +
                " from eligible server pool.");
            servers.remove(server);
        }
    }

    // Remove the deadNotExpired servers from the server list.
    removeDeadNotExpiredServers(servers);

    if (servers.isEmpty()) return null;

    RegionPlan randomPlan = null;
    boolean newPlan = false;
    RegionPlan existingPlan = null;

    synchronized (this.regionPlans) {
        existingPlan = this.regionPlans.get(encodedName);

        if (existingPlan != null && existingPlan.getDestination() != null) {
            LOG.debug("Found an existing plan for " +
                state.getRegion().getRegionNameAsString() +
                " destination server is " + existingPlan.getDestination().toString());
        }

        if (forceNewPlan
            || existingPlan == null
            || existingPlan.getDestination() == null
            || drainingServers.contains(existingPlan.getDestination())) {
            // If the existing plan would move the region onto a draining
            // server, randomly pick another destination server instead.
            newPlan = true;
            randomPlan = new RegionPlan(state.getRegion(), null,
                balancer.randomAssignment(servers));
            this.regionPlans.put(encodedName, randomPlan);
        }
    }

    if (newPlan) {
        LOG.debug("No previous transition plan was found (or we are ignoring " +
            "an existing plan) for " + state.getRegion().getRegionNameAsString() +
            " so generated a random one; " + randomPlan + "; " +
            serverManager.countOfRegionServers() +
            " (online=" + serverManager.getOnlineServers().size() +
            ", available=" + servers.size() + ") available servers");
        return randomPlan;
    }
    LOG.debug("Using pre-existing plan for region " +
        state.getRegion().getRegionNameAsString() + "; plan=" + existingPlan);
    return existingPlan;
}
```
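For illustration, marking a regionserver as draining amounts to creating a child znode under /hbase/draining. The sketch below uses the plain ZooKeeper client; the znode name format ("host,port,startcode", matching ServerName.toString()), the hostnames, and the connection string are assumptions made for the example.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class DrainServerExample {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble HBase uses (host is hypothetical).
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
        // Assumed name format: "host,port,startcode", as in ServerName.toString().
        String server = "rs1.example.com,60020,1371184800000";
        zk.create("/hbase/draining/" + server, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.close();
    }
}
```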
7. How openRegion works
openRegion boils down to initializing the HRegion. Below is the code that performs the actual region initialization.
```java
private long initializeRegionInternals(final CancelableProgressable reporter,
        MonitoredTask status) throws IOException, UnsupportedEncodingException {
    if (coprocessorHost != null) {
        status.setStatus("Running coprocessor pre-open hook");
        coprocessorHost.preOpen();
    }

    // Write HRI to a file in case we need to recover .META.
    status.setStatus("Writing region info on filesystem");
    checkRegioninfoOnFilesystem();

    // Remove temporary data left over from old regions
    status.setStatus("Cleaning up temporary data from old regions");
    cleanupTmpDir();

    // Load in all the HStores.
    // Get minimum of the maxSeqId across all the store.
    //
    // Context: During replay we want to ensure that we do not lose any data. So, we
    // have to be conservative in how we replay logs. For each store, we calculate
    // the maxSeqId up to which the store was flushed. But, since different stores
    // could have a different maxSeqId, we choose the
    // minimum across all the stores.
    // This could potentially result in duplication of data for stores that are ahead
    // of others. ColumnTrackers in the ScanQueryMatchers do the de-duplication, so we
    // do not have to worry.
    // TODO: If there is a store that was never flushed in a long time, we could replay
    // a lot of data. Currently, this is not a problem because we flush all the stores at
    // the same time. If we move to per-cf flushing, we might want to revisit this and send
    // in a vector of maxSeqIds instead of sending in a single number, which has to be the
    // min across all the max.
    long minSeqId = -1;
    long maxSeqId = -1;
    // initialized to -1 so that we pick up MemstoreTS from column families
    long maxMemstoreTS = -1;

    if (this.htableDescriptor != null &&
            !htableDescriptor.getFamilies().isEmpty()) {
        // initialize the thread pool for opening stores in parallel.
        ThreadPoolExecutor storeOpenerThreadPool =
            getStoreOpenAndCloseThreadPool(
                "StoreOpenerThread-" + this.regionInfo.getRegionNameAsString());
        CompletionService<Store> completionService =
            new ExecutorCompletionService<Store>(storeOpenerThreadPool);

        // initialize each store in parallel
        for (final HColumnDescriptor family : htableDescriptor.getFamilies()) {
            status.setStatus("Instantiating store for column family " + family);
            completionService.submit(new Callable<Store>() {
                public Store call() throws IOException {
                    return instantiateHStore(tableDir, family);
                }
            });
        }
        try {
            for (int i = 0; i < htableDescriptor.getFamilies().size(); i++) {
                Future<Store> future = completionService.take();
                Store store = future.get();
                this.stores.put(store.getColumnFamilyName().getBytes(), store);
                long storeSeqId = store.getMaxSequenceId();
                if (minSeqId == -1 || storeSeqId < minSeqId) {
                    minSeqId = storeSeqId;
                }
                if (maxSeqId == -1 || storeSeqId > maxSeqId) {
                    maxSeqId = storeSeqId;
                }
                long maxStoreMemstoreTS = store.getMaxMemstoreTS();
                if (maxStoreMemstoreTS > maxMemstoreTS) {
                    maxMemstoreTS = maxStoreMemstoreTS;
                }
            }
        } catch (InterruptedException e) {
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        } finally {
            storeOpenerThreadPool.shutdownNow();
        }
    }
    mvcc.initialize(maxMemstoreTS + 1);

    // Recover any edits if available.
    maxSeqId = Math.max(maxSeqId, replayRecoveredEditsIfAny(
        this.regiondir, minSeqId, reporter, status));

    status.setStatus("Cleaning up detritus from prior splits");
    // Get rid of any splits or merges that were lost in-progress. Clean out
    // these directories here on open. We may be opening a region that was
    // being split but we crashed in the middle of it all.
    SplitTransaction.cleanupAnySplitDetritus(this);
    FSUtils.deleteDirectory(this.fs, new Path(regiondir, MERGEDIR));

    this.writestate.setReadOnly(this.htableDescriptor.isReadOnly());
    this.writestate.flushRequested = false;
    this.writestate.compacting = 0;

    // Initialize split policy
    this.splitPolicy = RegionSplitPolicy.create(this, conf);

    this.lastFlushTime = EnvironmentEdgeManager.currentTimeMillis();

    // Use maximum of log sequenceid or that which was found in stores
    // (particularly if no recovered edits, seqid will be -1).
    long nextSeqid = maxSeqId + 1;
    LOG.info("Onlined " + this.toString() + "; next sequenceid=" + nextSeqid);

    // A region can be reopened if failed a split; reset flags
    this.closing.set(false);
    this.closed.set(false);

    if (coprocessorHost != null) {
        status.setStatus("Running coprocessor post-open hooks");
        coprocessorHost.postOpen();
    }
    status.markComplete("Region opened successfully");
    return nextSeqid;
}
```