[HBase Flush]

[3] HBase Flush

    Each HStore has exactly one MemStore.
    Every Put/Delete request is first written into the MemStore. When a MemStore fills up, it is flushed into a new HFile. The minimum flush unit is the HRegion, not a single MemStore.


    @1    =>    MemStore level
        When any single MemStore in a region reaches the size limit (hbase.hregion.memstore.flush.size, default 128MB), a MemStore flush is triggered.

    @2    =>    Region level
        When the total size of all MemStores in a region reaches the limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size = 2 * 128MB = 256MB), the region's MemStores are flushed.

    @3    =>    RegionServer level
        When the total size of all MemStores on a RegionServer reaches the limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, default 40%), flushes are forced.
        MemStores are flushed from largest to smallest until the total drops below hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize (default 38%).

    @4     => HLog count limit
        When the number of HLog (WAL) files reaches hbase.regionserver.max.logs, MemStores are flushed so that the oldest logs can be removed.

    @5    =>    HBase also flushes MemStores periodically; the default period is 1 hour (hbase.regionserver.optionalcacheflushinterval).
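The size-based triggers above (@1~@3) can be sketched as a toy model. This is illustrative Python only, not HBase's actual (Java) logic; the constants mirror the defaults quoted above, and the function names are made up for this sketch:

```python
# Toy model of HBase's size-based flush triggers (illustrative only;
# the real logic lives in HBase's MemStoreFlusher, written in Java).

FLUSH_SIZE = 128        # hbase.hregion.memstore.flush.size (MB)
BLOCK_MULTIPLIER = 2    # hbase.hregion.memstore.block.multiplier
HEAP_MB = 71 * 1024     # RegionServer heap size
UPPER = 0.40            # hbase.regionserver.global.memstore.upperLimit
LOWER = 0.38            # hbase.regionserver.global.memstore.lowerLimit

def memstore_level(region_memstores_mb):
    # @1: any single MemStore over flush.size triggers a flush
    return any(m >= FLUSH_SIZE for m in region_memstores_mb)

def region_level(region_memstores_mb):
    # @2: total of all MemStores in one region over multiplier * flush.size
    return sum(region_memstores_mb) >= BLOCK_MULTIPLIER * FLUSH_SIZE

def regionserver_level(all_memstores_mb):
    # @3: global usage over upperLimit * heap -> flush the largest
    # MemStores first, until usage drops below lowerLimit * heap
    flushed = []
    total = sum(all_memstores_mb)
    if total < UPPER * HEAP_MB:
        return flushed
    for size in sorted(all_memstores_mb, reverse=True):
        if total < LOWER * HEAP_MB:
            break
        flushed.append(size)
        total -= size
    return flushed
```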


    Flush Steps:
        #1 prepare phase
            Iterate over every MemStore in the region, take a snapshot of the MemStore's current data set (kvset), then create a new empty kvset; all subsequent writes go into the new kvset.
            While the flush is in progress, reads check the kvset and the snapshot in turn; if the key is found in neither, the HFiles are searched.

        #2 flush phase
            Iterate over all MemStores and write the snapshot taken in the prepare phase out to a temporary file; the temporary files are collected under the .tmp directory.

        #3 commit phase
            Iterate over all MemStores, move the temporary files produced in the flush phase into the corresponding column family (CF) directory, create a StoreFile and Reader for each HFile, add the StoreFile to the HStore's storefile list, and finally clear the snapshot taken in the prepare phase.
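The three phases can be sketched as a toy model of a single MemStore. This is illustrative Python, not HBase's real implementation (which is Java, in HStore/DefaultMemStore); here a plain dict stands in for the kvset, the snapshot, and each HFile:

```python
# Toy model of the prepare/flush/commit phases of a MemStore flush.
# Illustrative only -- dicts stand in for kvset, snapshot, and HFiles.

class MemStore:
    def __init__(self):
        self.kvset = {}      # active write set
        self.snapshot = {}   # frozen set currently being flushed

    def put(self, key, value):
        self.kvset[key] = value

    def prepare(self):
        # freeze the current kvset as a snapshot and start a fresh one;
        # subsequent writes go to the new kvset
        self.snapshot = self.kvset
        self.kvset = {}

    def flush_to_tmp(self):
        # write the snapshot out as a "temporary file" (here: a dict copy)
        return dict(self.snapshot)

    def commit(self, storefiles, tmp_file):
        # move the temp file into the store's file list, then drop snapshot
        storefiles.append(tmp_file)
        self.snapshot = {}

    def get(self, key, storefiles):
        # read path during a flush: kvset, then snapshot, then HFiles
        if key in self.kvset:
            return self.kvset[key]
        if key in self.snapshot:
            return self.snapshot[key]
        for hf in reversed(storefiles):   # newest file first
            if key in hf:
                return hf[key]
        return None
```

A write arriving mid-flush lands in the new kvset, while the key written before prepare() remains readable from the snapshot and, after commit(), from the store file list.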

    Impact:
        maxHeap = 71
        hbase.regionserver.global.memstore.upperLimit = 0.35
        hbase.regionserver.global.memstore.lowerLimit = 0.30

        Based on the settings above, the total MemStore memory available at the RegionServer level is about 24.9G = 71 * 0.35

        Assume each MemStore is 128M. Under the settings above, if each region has two MemStores and the RegionServer hosts 100 regions:
            128M * 100 * 2 = 25.6G > 24.9G
        so the RegionServer-level flush threshold would be exceeded.
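The capacity math can be checked directly; this is a small Python check using the settings listed above, with sizes kept in MB to avoid GB rounding ambiguity:

```python
# Check the capacity math from the Impact settings above (sizes in MB).
heap_mb = 71 * 1024
upper_limit = 0.35                         # hbase.regionserver.global.memstore.upperLimit
global_limit_mb = heap_mb * upper_limit    # ~25446 MB (~24.85 GB)

regions = 100
memstores_per_region = 2
memstore_mb = 128
total_mb = regions * memstores_per_region * memstore_mb   # 25600 MB

# 25600 MB of MemStore demand exceeds the global limit,
# so the RegionServer-level flush (@3) would kick in.
print(total_mb > global_limit_mb)   # True
```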

==================================================================>>>>>>
    #examples:
        #1 create pre-split table
        hbase> create 'flush_test','info',{SPLITS=>['e','l','o','u']}

        #2 check hfile
        [root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info

        #3 put some data; leave it in the MemStore (no flush yet)

        hbase> put 'flush_test','a','info:x','a'
        hbase> put 'flush_test','b','info:x','b'
        hbase> put 'flush_test','c','info:x','c'

        hbase> put 'flush_test','l','info:x','l'
        hbase> put 'flush_test','m','info:x','m'
        hbase> put 'flush_test','n','info:x','n'


        #4 check hfile again
        [root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info

        #5 flush region
        #
        #      hbase> flush 'TABLENAME'
        #      hbase> flush 'REGIONNAME'
        #      hbase> flush 'ENCODED_REGIONNAME'
        #


        #5.1 => find regionname/encoded_regionname
        hbase> scan 'hbase:meta',{STARTROW=>'flush_test',COLUMNS=>['info:regioninfo']}

            flush_test,,1503906853247.ad661b6ee63cf119fcc4070bf column=info:regioninfo, timestamp=1503906858620, value={ENCODED => ad661b6ee63cf119fcc4070bf3123543, NAME => 'flush_test,,1503906853247.ad661b6ee63cf113123543.9fcc4070bf3123543.', STARTKEY => '', ENDKEY => 'e'}
             flush_test,e,1503906853247.a23543cf1a44fb0bdd56e238 column=info:regioninfo, timestamp=1503906858619, value={ENCODED => a23543cf1a44fb0bdd56e238fb58c91a, NAME => 'flush_test,e,1503906853247.a23543cf1a44fbfb58c91a.0bdd56e238fb58c91a.', STARTKEY => 'e', ENDKEY => 'l'}
             flush_test,l,1503906853247.71e7694a44576144422519ae column=info:regioninfo, timestamp=1503906858620, value={ENCODED => 71e7694a44576144422519aead2cad5d, NAME => 'flush_test,l,1503906853247.71e7694a445761ad2cad5d.44422519aead2cad5d.', STARTKEY => 'l', ENDKEY => 'o'}
             flush_test,o,1503906853247.65ca864805bb64c2a20fd833 column=info:regioninfo, timestamp=1503906858494, value={ENCODED => 65ca864805bb64c2a20fd833759b3ff4, NAME => 'flush_test,o,1503906853247.65ca864805bb64759b3ff4.c2a20fd833759b3ff4.', STARTKEY => 'o', ENDKEY => 'u'}
             flush_test,u,1503906853247.d33efbefa0e7d2dd8663c755 column=info:regioninfo, timestamp=1503906858621, value={ENCODED => d33efbefa0e7d2dd8663c755a32db6fd, NAME => 'flush_test,u,1503906853247.d33efbefa0e7d2a32db6fd.


         #5.2 =>    flush [a~e]
         hbase> flush 'ad661b6ee63cf119fcc4070bf3123543'

         #5.3 =>
         [root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
        /hbase/data/default/flush_test/ad661b6ee63cf119fcc4070bf3123543/info/f0b9af581abf42a3b7d8bcb36196d0f1
        
        #5.4 => flush [l~o]
        hbase> flush '71e7694a44576144422519aead2cad5d'

        [root@rphf1hsn001 ~]# hadoop fs -ls /hbase/data/default/flush_test/*/info
        /hbase/data/default/flush_test/71e7694a44576144422519aead2cad5d/info/32981c82f88f40b693c959e58bdd368b
        /hbase/data/default/flush_test/ad661b6ee63cf119fcc4070bf3123543/info/f0b9af581abf42a3b7d8bcb36196d0f1
        

Reposted from my.oschina.net/u/204498/blog/1526249