[3] HBase Flush
Each HStore has one MemStore.
Every Put/Delete request is first written to the MemStore. When the MemStore fills up, it is flushed to a new HFile. The minimum flush unit is the HRegion, not a single MemStore.
@1 => MemStore level
A flush is triggered when any single MemStore in a Region reaches the limit (hbase.hregion.memstore.flush.size, default 128MB).
@2 => Region level
A flush is triggered when the total size of all MemStores in a Region reaches the limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size = 2 * 128MB = 256MB).
@3 => RegionServer level
A flush is triggered when the total size of all MemStores on a RegionServer reaches the limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, default 40% of the heap).
MemStores are flushed from largest to smallest, until the total drops below hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize (default 38%).
@4 => HLog count limit
When the number of HLog (WAL) files reaches hbase.regionserver.max.logs, MemStores are flushed so the oldest logs can be removed.
@5 => Periodic flush: HBase flushes MemStores periodically; the default period is 1 hour.
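The trigger levels above can be sketched as a simple decision function. This is a hypothetical model for illustration, not HBase's actual code; constant names mirror the HBase properties, and all sizes are in MB:

```python
# Sketch of the flush-trigger levels @1-@3 (illustrative only, not HBase code).
FLUSH_SIZE = 128           # hbase.hregion.memstore.flush.size (MB)
BLOCK_MULTIPLIER = 2       # hbase.hregion.memstore.block.multiplier
GLOBAL_UPPER = 0.40        # hbase.regionserver.global.memstore.upperLimit
GLOBAL_LOWER = 0.38        # hbase.regionserver.global.memstore.lowerLimit

def flush_triggers(region_memstores, all_regions, heap_mb):
    """Return which trigger levels fire.

    region_memstores: MemStore sizes (MB) of one Region.
    all_regions: list of such lists for every Region on the RegionServer.
    """
    triggers = []
    # @1 MemStore level: any single MemStore reaches flush.size
    if any(m >= FLUSH_SIZE for m in region_memstores):
        triggers.append("memstore")
    # @2 Region level: the Region's MemStores together reach multiplier * flush.size
    if sum(region_memstores) >= BLOCK_MULTIPLIER * FLUSH_SIZE:
        triggers.append("region")
    # @3 RegionServer level: all MemStores together reach upperLimit * heap
    if sum(sum(r) for r in all_regions) >= GLOBAL_UPPER * heap_mb:
        triggers.append("regionserver")
    return triggers
```

When the RegionServer-level trigger fires, flushing proceeds from the largest MemStore downward until the global total falls below GLOBAL_LOWER * heap_mb.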
Flush Steps:
#1 prepare phase
Iterate over all MemStores in the current Region; for each, take a snapshot of the current data set (kvset), then create a new empty kvset. All subsequent writes go into the new kvset.
While a flush is in progress, reads traverse both the kvset and the snapshot; if the key is not found in either, the HFiles are searched.
#2 flush phase
Iterate over all MemStores and write each snapshot taken in the prepare phase to a temporary file; the temporary files are collected under the .tmp directory.
#3 commit phase
Iterate over all MemStores and move each temporary file generated in the flush phase to the proper CF directory, create a StoreFile and a Reader for the HFile, add the StoreFile to the HStore's StoreFile list, and finally clear the snapshot taken in the prepare phase.
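The three phases above can be sketched for a single MemStore. This is a toy model (dicts stand in for the kvset, the snapshot, and the on-disk HFiles), not HBase's implementation:

```python
# Toy sketch of the prepare/flush/commit phases for one MemStore.
# Dicts stand in for kvset, snapshot, and HFiles; not HBase code.

class MemStoreSketch:
    def __init__(self):
        self.kvset = {}       # active write set
        self.snapshot = {}    # frozen set while a flush is in progress
        self.hfiles = []      # list of dicts, each standing for one HFile

    def put(self, key, value):
        self.kvset[key] = value

    def get(self, key):
        # Read path during a flush: kvset first, then snapshot, then HFiles.
        if key in self.kvset:
            return self.kvset[key]
        if key in self.snapshot:
            return self.snapshot[key]
        for hfile in self.hfiles:
            if key in hfile:
                return hfile[key]
        return None

    def flush(self):
        # #1 prepare: freeze the kvset as a snapshot, start a new empty kvset
        self.snapshot, self.kvset = self.kvset, {}
        # #2 flush: write the snapshot to a "temporary file" (here, a copy)
        tmp_file = dict(self.snapshot)
        # #3 commit: move the temp file into the store's file list,
        #    then clear the snapshot
        self.hfiles.insert(0, tmp_file)
        self.snapshot = {}
```

Note how writes arriving after `prepare` land in the fresh kvset and shadow older values in the snapshot and HFiles, which is why the read path checks kvset first.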
Impact:
maxHeap = 71
hbase.regionserver.global.memstore.upperLimit = 0.35
hbase.regionserver.global.memstore.lowerLimit = 0.30
Based on the configuration above, the total MemStore memory available at the RegionServer level is 71 * 0.35 = 24.85G.
Assume each MemStore is 128M and each Region has two MemStores. If the RegionServer runs 100 Regions:
128M * 100 * 2 = 25.6G > 24.85G, so the global limit is exceeded and flushes are forced.
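The arithmetic above can be checked directly (treating 1G = 1000M, as the note does):

```python
# Check the RegionServer-level MemStore budget from the Impact example.
heap_gb = 71
upper = 0.35                                 # hbase.regionserver.global.memstore.upperLimit
global_limit_gb = heap_gb * upper            # 24.85G of total MemStore budget

needed_mb = 128 * 100 * 2                    # 100 Regions * 2 MemStores * 128M each
needed_gb = needed_mb / 1000                 # 25.6G (1G = 1000M, as in the note)

assert needed_gb > global_limit_gb           # the global limit would force flushes
```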
==================================================================>>>>>>
#examples:
#1 create pre-split table
hbase> create 'flush_test','info',{SPLITS=>['e','l','o','u']}
#2 check hfile
[root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
#3 put some data (it stays in the MemStore, so no HFile appears yet)
hbase> put 'flush_test','a','info:x','a'
hbase> put 'flush_test','b','info:x','b'
hbase> put 'flush_test','c','info:x','c'
hbase> put 'flush_test','l','info:x','l'
hbase> put 'flush_test','m','info:x','m'
hbase> put 'flush_test','n','info:x','n'
#4 check hfile again
[root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
#5 flush region
#
# hbase> flush 'TABLENAME'
# hbase> flush 'REGIONNAME'
# hbase> flush 'ENCODED_REGIONNAME'
#
#5.1 => find regionname/encoded_regionname
hbase> scan 'hbase:meta',{STARTROW=>'flush_test',COLUMNS=>['info:regioninfo']}
flush_test,,1503906853247.ad661b6ee63cf119fcc4070bf3123543.  column=info:regioninfo, timestamp=1503906858620, value={ENCODED => ad661b6ee63cf119fcc4070bf3123543, NAME => 'flush_test,,1503906853247.ad661b6ee63cf119fcc4070bf3123543.', STARTKEY => '', ENDKEY => 'e'}
flush_test,e,1503906853247.a23543cf1a44fb0bdd56e238fb58c91a.  column=info:regioninfo, timestamp=1503906858619, value={ENCODED => a23543cf1a44fb0bdd56e238fb58c91a, NAME => 'flush_test,e,1503906853247.a23543cf1a44fb0bdd56e238fb58c91a.', STARTKEY => 'e', ENDKEY => 'l'}
flush_test,l,1503906853247.71e7694a44576144422519aead2cad5d.  column=info:regioninfo, timestamp=1503906858620, value={ENCODED => 71e7694a44576144422519aead2cad5d, NAME => 'flush_test,l,1503906853247.71e7694a44576144422519aead2cad5d.', STARTKEY => 'l', ENDKEY => 'o'}
flush_test,o,1503906853247.65ca864805bb64c2a20fd833759b3ff4.  column=info:regioninfo, timestamp=1503906858494, value={ENCODED => 65ca864805bb64c2a20fd833759b3ff4, NAME => 'flush_test,o,1503906853247.65ca864805bb64c2a20fd833759b3ff4.', STARTKEY => 'o', ENDKEY => 'u'}
flush_test,u,1503906853247.d33efbefa0e7d2dd8663c755a32db6fd.  column=info:regioninfo, timestamp=1503906858621, value={ENCODED => d33efbefa0e7d2dd8663c755a32db6fd, NAME => 'flush_test,u,1503906853247.d33efbefa0e7d2dd8663c755a32db6fd.', STARTKEY => 'u', ENDKEY => ''}
#5.2 => flush [a~e]
hbase> flush 'ad661b6ee63cf119fcc4070bf3123543'
#5.3 => check hfile again: only the flushed region now has an HFile
[root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
/hbase/data/default/flush_test/ad661b6ee63cf119fcc4070bf3123543/info/f0b9af581abf42a3b7d8bcb36196d0f1
#5.4 => flush [l~o]
hbase> flush '71e7694a44576144422519aead2cad5d'
[root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
/hbase/data/default/flush_test/71e7694a44576144422519aead2cad5d/info/32981c82f88f40b693c959e58bdd368b
/hbase/data/default/flush_test/ad661b6ee63cf119fcc4070bf3123543/info/f0b9af581abf42a3b7d8bcb36196d0f1