1. Lease thread initialization:
HRegionServer's run method calls preRegistrationInitialization once. That method calls initializeThreads, which creates the Leases object:
    this.leases = new Leases(
        (int) conf.getLong(HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY,
            HConstants.DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD),
        this.threadWakeFrequency);
The default lease period (the expiration time) is 60 s:
    public static String HBASE_REGIONSERVER_LEASE_PERIOD_KEY = "hbase.regionserver.lease.period";
    public static long DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD = 60000;
The default interval at which the lease thread wakes up to check is 10 s:
    /** Parameter name for how often threads should wake up */
    public static final String THREAD_WAKE_FREQUENCY = "hbase.server.thread.wakefrequency";
    /** Default value for thread wake frequency */
    public static final int DEFAULT_THREAD_WAKE_FREQUENCY = 10 * 1000;
Finally, HRegionServer's startServiceThreads starts the lease thread:
    this.leases.setName(n + ".leaseChecker");
    this.leases.start();
2. Lease creation
A lease is created in openScanner and in addRowLock.
In openScanner, createLease is called for each new scanner:
    protected long addScanner(RegionScanner s) throws LeaseStillHeldException {
      long scannerId = -1L;
      while (true) {
        scannerId = rand.nextLong();
        if (scannerId == -1) continue;
        String scannerName = String.valueOf(scannerId);
        RegionScanner existing = scanners.putIfAbsent(scannerName, s);
        if (existing == null) {
          this.leases.createLease(scannerName, new ScannerListener(scannerName));
          break;
        }
      }
      return scannerId;
    }
The lease is then added to the DelayQueue, with the scanner ID as its lease name:
    public void addLease(final Lease lease) throws LeaseStillHeldException {
      if (this.stopRequested) {
        return;
      }
      lease.setExpirationTime(System.currentTimeMillis() + this.leasePeriod);
      synchronized (leaseQueue) {
        if (leases.containsKey(lease.getLeaseName())) {
          throw new LeaseStillHeldException(lease.getLeaseName());
        }
        leases.put(lease.getLeaseName(), lease);
        leaseQueue.add(lease);
      }
    }
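The ID allocation in addScanner can be tried in isolation. The following is only a minimal sketch: ScannerIdSketch and its members are hypothetical names, the scanner is a plain Object, and the lease creation step is left out. It shows how putIfAbsent makes the loop retry until it finds an ID that is not already in use.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the scanner registry; not the HBase class.
class ScannerIdSketch {
    private final ConcurrentMap<String, Object> scanners = new ConcurrentHashMap<>();
    private final Random rand;

    ScannerIdSketch(Random rand) {
        this.rand = rand;
    }

    // Draws random IDs until one is free, then registers the scanner under it.
    long addScanner(Object scanner) {
        while (true) {
            long scannerId = rand.nextLong();
            // The original code skips -1, the value scannerId is initialized to.
            if (scannerId == -1L) continue;
            String scannerName = String.valueOf(scannerId);
            // putIfAbsent returns null only if no scanner held this name before.
            if (scanners.putIfAbsent(scannerName, scanner) == null) {
                return scannerId;
            }
        }
    }

    int size() {
        return scanners.size();
    }
}
```

Because the check and the insert happen in one atomic putIfAbsent call, two concurrent callers cannot both claim the same ID.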
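3. Lease expiration
The lease thread checks leaseQueue in a loop, waiting at most 10 s (the default) on each poll. leaseQueue is a java.util.concurrent.DelayQueue, a BlockingQueue backed by a PriorityQueue. Its elements are ordered by expiration time, and an element can only be taken once that time has passed.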
    public void run() {
      while (!stopRequested || (stopRequested && leaseQueue.size() > 0)) {
        Lease lease = null;
        try {
          lease = leaseQueue.poll(leaseCheckFrequency, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          continue;
        } catch (ConcurrentModificationException e) {
          continue;
        } catch (Throwable e) {
          LOG.fatal("Unexpected exception killed leases thread", e);
          break;
        }
        if (lease == null) {
          continue;
        }
        // A lease expired. Run the expired code before removing from queue
        // since its presence in queue is used to see if lease exists still.
        if (lease.getListener() == null) {
          LOG.error("lease listener is null for lease " + lease.getLeaseName());
        } else {
          lease.getListener().leaseExpired();
        }
        synchronized (leaseQueue) {
          leases.remove(lease.getLeaseName());
        }
      }
      close();
    }
poll returns a lease once it has expired, and the thread then calls the expiry method of that lease's listener:
    public void leaseExpired() {
      RegionScanner s = scanners.remove(this.scannerName);
      if (s != null) {
        LOG.info("Scanner " + this.scannerName + " lease expired on region "
            + s.getRegionInfo().getRegionNameAsString());
        try {
          HRegion region = getRegion(s.getRegionInfo().getRegionName());
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().preScannerClose(s);
          }
          s.close();
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().postScannerClose(s);
          }
        } catch (IOException e) {
          LOG.error("Closing scanner for "
              + s.getRegionInfo().getRegionNameAsString(), e);
        }
      } else {
        LOG.info("Scanner " + this.scannerName + " lease expired");
      }
    }
The expiry method removes the scanner from the in-memory scanners map and closes it.
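The expiry path can be reproduced on a small scale with nothing but java.util.concurrent. The sketch below is not the HBase implementation: MiniLeases, MiniLease, LeaseListener and checkOnce are hypothetical names, the lease period is passed per lease, and one loop iteration is exposed as a method instead of running in its own thread. It illustrates that DelayQueue.poll with a timeout hands out a lease only after its expiration time, earliest first, and that the listener runs before the lease is forgotten.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical stand-in for HBase's Leases class.
class MiniLeases {
    interface LeaseListener {
        void leaseExpired();
    }

    static final class MiniLease implements Delayed {
        final String name;
        final LeaseListener listener;
        final long expirationTimeMs;

        MiniLease(String name, LeaseListener listener, long periodMs) {
            this.name = name;
            this.listener = listener;
            this.expirationTimeMs = System.currentTimeMillis() + periodMs;
        }

        // DelayQueue releases an element only once this delay is <= 0.
        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expirationTimeMs - System.currentTimeMillis(),
                    TimeUnit.MILLISECONDS);
        }

        // The lease that expires earliest sorts first.
        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                    other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final Map<String, MiniLease> leases = new ConcurrentHashMap<>();
    private final DelayQueue<MiniLease> leaseQueue = new DelayQueue<>();

    void createLease(String name, long periodMs, LeaseListener listener) {
        MiniLease lease = new MiniLease(name, listener, periodMs);
        leases.put(name, lease);
        leaseQueue.add(lease);
    }

    boolean holds(String name) {
        return leases.containsKey(name);
    }

    // One pass of the checker loop: wait up to checkMs for an expired lease,
    // run its listener, then drop it. Returns the expired name, or null.
    String checkOnce(long checkMs) {
        try {
            MiniLease lease = leaseQueue.poll(checkMs, TimeUnit.MILLISECONDS);
            if (lease == null) return null;
            lease.listener.leaseExpired();
            leases.remove(lease.name);
            return lease.name;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A caller would register a listener that removes its own entry from a scanners map, for example leases.createLease("a", 200, () -> scanners.remove("a")), and then call checkOnce in a loop. Note that poll returns as soon as the head lease expires; the timeout only limits how long the thread blocks when nothing is due.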
4. Common errors
    2013-11-06 16:16:38,684 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
    org.apache.hadoop.hbase.regionserver.LeaseException: lease '-2408052186420749395' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2783)
        at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
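This common error occurs because the lease has expired while the client, which may not have closed its scanner, calls next again with the old scanner ID. Every call to next removes the lease and then adds it back, as follows: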
1. In next, the lease is removed first, which shows whether the lease still exists:
    lease = this.leases.removeLease(scannerName);
removeLease looks like this:
    Lease removeLease(final String leaseName) throws LeaseException {
      Lease lease = null;
      synchronized (leaseQueue) {
        lease = leases.remove(leaseName);
        if (lease == null) {
          throw new LeaseException("lease '" + leaseName + "' does not exist");
        }
        leaseQueue.remove(lease);
      }
      return lease;
    }
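If the lease exists, it is removed normally; once the lease has expired, removeLease throws LeaseException.
2. In the normal case, after the lease has been removed, a lease is added again at the end so that later next calls can keep working. The lease name is still the original scanner ID: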
    if (this.scanners.containsKey(scannerName)) {
      if (lease != null) this.leases.addLease(lease);
    }
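The remove-then-re-add sequence, and the way it turns an expired lease into the error above, can be sketched as follows. This is a simplified, single-threaded illustration and not the HBase code: NextLeaseSketch, expire and holds are hypothetical names, LeaseException is an unchecked stand-in, the lease table is a plain map, and the check that the scanner is still registered before re-adding is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lease handling inside next().
class NextLeaseSketch {
    // Stand-in for HBase's LeaseException; unchecked to keep the sketch short.
    static class LeaseException extends RuntimeException {
        LeaseException(String message) {
            super(message);
        }
    }

    // Lease name -> expiration time in milliseconds.
    private final Map<String, Long> leases = new HashMap<>();
    private final long leasePeriodMs;

    NextLeaseSketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    // Like addLease in the text, this sets a fresh expiration time.
    void addLease(String name) {
        leases.put(name, System.currentTimeMillis() + leasePeriodMs);
    }

    void removeLease(String name) {
        if (leases.remove(name) == null) {
            throw new LeaseException("lease '" + name + "' does not exist");
        }
    }

    // Simulates the lease thread having expired and dropped the lease.
    void expire(String name) {
        leases.remove(name);
    }

    boolean holds(String name) {
        return leases.containsKey(name);
    }

    String next(String scannerName) {
        // Fails right here if the lease has already expired.
        removeLease(scannerName);
        try {
            return "rows from scanner " + scannerName;
        } finally {
            // Re-add under the same name so the next call finds a lease again.
            addLease(scannerName);
        }
    }
}
```

Calling next on a live lease succeeds and leaves a renewed lease behind; calling it after expire throws with the same "does not exist" message that appears in the log.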
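To deal with this error:
1. Check that hbase.rpc.timeout (default 60000 ms) is greater than or equal to hbase.regionserver.lease.period (default 60000 ms); it should be.
2. Check whether any scanner has been left unclosed.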
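For the second check, one common way to make sure a scanner is always closed is try-with-resources. The sketch below uses a hypothetical DemoScanner instead of the HBase client classes, so it shows only the closing pattern, not the real client API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a client-side scanner; not the HBase client API.
class DemoScanner implements AutoCloseable {
    private final Iterator<String> rows;
    private boolean closed = false;

    DemoScanner(List<String> rows) {
        this.rows = rows.iterator();
    }

    // Returns the next row, or null when the scan is exhausted.
    String next() {
        return rows.hasNext() ? rows.next() : null;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ScanClient {
    // Reads every row; the scanner is closed even if the loop throws.
    static List<String> readAll(DemoScanner scanner) {
        List<String> out = new ArrayList<>();
        try (DemoScanner s = scanner) {
            for (String row = s.next(); row != null; row = s.next()) {
                out.add(row);
            }
        }
        return out;
    }
}
```

With this pattern the close call runs on every exit path, so a scanner is not left open after an exception or an early return.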