5. HBase advanced topics: table design schema

My study notes and summary are below.

Part 1: Table attributes

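Each attribute is listed with its default, usage/principle, use case, and note, where one was recorded.

Bloom filter
  default: disabled (NONE)
  usage/principle: costs some memory to improve lookup time
  use case: TBD (tables that do huge range scans?)
  note: the value can be ROW, ROWCOL, or NONE

Column families
  note: the family name must be a printable string, since it is used as the directory name under the region-name directory

Maximum file size
  default: 10G in 0.94.2
  note: in fact this is the maximum store size, i.e. the property 'hbase.hregion.max.filesize' set in hbase-site.xml

Read-only
  default: false
  usage/principle: works like firmware to keep data safe, i.e. a 'dead' table that never changes

Memstore flush size
  default: 128m in 0.94.2
  usage/principle: same effect as the property 'hbase.hregion.memstore.flush.size' in hbase-site.xml
  note:
    1. This value determines how often store files are generated.
    2. As a consequence of 1, it affects the HLog replay time when a region server goes down: less frequent flushes leave more unflushed edits to replay.

Deferred log flush
  default: false
  usage/principle: if true, edits are held in memory and synced to the HLog periodically, at the interval set by 'hbase.regionserver.optionallogflushinterval'; if false, each edit is synced as it is written
  note: setting this to true may cause data loss, because the cached edits exist only in memory until they are synced to the file system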
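
The note on memstore flush size can be made concrete with some rough arithmetic. The sketch below is a toy calculation, not an HBase API: the class and method names are hypothetical, and it assumes one region with a single column family, a steady write rate, and no other flush triggers or compactions.

```java
// Toy arithmetic only; not an HBase API. All names here are hypothetical.
final class FlushEstimate {

    /** Seconds for the memstore to reach the flush size at a steady write rate. */
    static long secondsBetweenFlushes(long flushSizeBytes, long writeBytesPerSecond) {
        if (flushSizeBytes <= 0 || writeBytesPerSecond <= 0) {
            throw new IllegalArgumentException("sizes and rates must be positive");
        }
        return flushSizeBytes / writeBytesPerSecond;
    }

    /** Approximate number of store files written per hour, ignoring compactions. */
    static long flushesPerHour(long flushSizeBytes, long writeBytesPerSecond) {
        if (flushSizeBytes <= 0 || writeBytesPerSecond <= 0) {
            throw new IllegalArgumentException("sizes and rates must be positive");
        }
        long bytesPerHour = writeBytesPerSecond * 3600L;
        return bytesPerHour / flushSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long rate = mb; // assumed steady 1 MB/s of writes
        System.out.println(secondsBetweenFlushes(128 * mb, rate));
        System.out.println(flushesPerHour(128 * mb, rate));
        System.out.println(flushesPerHour(64 * mb, rate));
    }
}
```

Under the assumed 1 MB/s, a 128m flush size means a flush roughly every 128 seconds, while halving it to 64m doubles the number of store files per hour. The larger setting in turn leaves up to twice as many unflushed edits to replay from the HLog after a region server failure.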

         
         

Part 2: Column family attributes

Each attribute is listed with its default, usage/principle, use case, and note, where one was recorded.

In-memory
  default: false
  usage/principle: caches some blocks of a small family in memory to speed up queries
  use case: small tables, analogous to a secondary index table
  note: there is no guarantee of when, or how many, blocks are cached

Bloom filter
  see Part 1

Replication scope
  default: 0 (disabled)
  usage/principle: syncs local cluster data to remote clusters
  use case: TBD
  note: for load balancing, by distributing requests across clusters?

Maximum versions
  default: 3
  usage/principle: controls how many versions (changes) of a cell are kept in storage
  note: use 1 in general. If you also want to check the previous version, 2 is a good choice. This attribute interacts with 'Time-to-live'.

Compression
  default: none
  usage/principle: compresses this family with the specified algorithm (SNAPPY, LZO, GZ, ...)
  note: be completely clear about your requirements first, then pick the algorithm that matches them

Block size
  default: 64k
  usage/principle: a store file is split into blocks of this size, so smaller blocks give faster random reads; use larger blocks for sequential reads
  use case: TBD

Block cache
  default: true
  usage/principle: when rows are read from HBase, this determines whether the blocks read are written back to the cache to speed up later access
  use case: use 'true' if clients repeatedly access the same rows; use 'false' for whole-table scans or for systems with fewer reads than writes

Time-to-live
  default: max int (in seconds)
  usage/principle: how long a cell value is kept in storage
  use case: for a 'recycled' (i.e. rolling) system, use an appropriate value to bound the data size
  note: this attribute interacts with 'Maximum versions'; together the two control which versions of the data are retained
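
The interaction between 'Maximum versions' and 'Time-to-live' can be illustrated with a small model. This is a minimal sketch, not HBase code: the class and method names are hypothetical, and it assumes that a version survives only if it is both within the TTL and among the newest versions allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of version retention; not HBase code. All names are hypothetical.
final class VersionRetention {

    /**
     * Returns the timestamps (in ms) that survive, newest first.
     * Versions older than the TTL are dropped first, then only the
     * newest maxVersions of the remaining ones are kept.
     */
    static List<Long> surviving(List<Long> timestampsMs, int maxVersions,
                                long ttlSeconds, long nowMs) {
        if (maxVersions < 0) {
            throw new IllegalArgumentException("maxVersions must not be negative");
        }
        List<Long> live = new ArrayList<>();
        for (long ts : timestampsMs) {
            if (nowMs - ts <= ttlSeconds * 1000L) {
                live.add(ts); // still within the time-to-live
            }
        }
        live.sort(Collections.reverseOrder()); // newest version first
        return new ArrayList<>(live.subList(0, Math.min(maxVersions, live.size())));
    }
}
```

For example, given four versions of a cell, a short TTL can remove an old version even when 'Maximum versions' would allow it, and a small 'Maximum versions' can remove versions that are still well within the TTL. Whichever limit is stricter for a given version decides.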
         

Ref:

HBase: The Definitive Guide (book)

Reposted from leibnitz.iteye.com/blog/1958647