[go]map

map implementation details

// makehmap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
    h := new(hmap)
    h.hash0 = fastrand()
    return h
}

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    .....

    // initialize Hmap
    if h == nil {
        h = (*hmap)(newobject(t.hmap))
    }
    h.hash0 = fastrand()

    // find size parameter which will hold the requested # of elements
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B
    
    ......
    return h
}

//初始化: 最重要的肯定就是确定一开始要有多少个桶,初始的大小还是很重要的,还有一些别的初始化哈希种子等等,问题不大。
// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin) (map的大小.  len()函数就取的这个值)
    flags     uint8  // (map状态标识)
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items) 
                     // 可以最多容纳 6.5 * 2 ^ B 个元素,6.5为装载因子即:map长度=6.5*2^B
                     // B可以理解为map扩容了多少次

    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details 
                     // (溢出buckets的数量)
    hash0     uint32 // hash seed
                     // hash 种子

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
                              // 指向最大2^B个Buckets数组的指针. count==0时为nil.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
                              //指向扩容之前的buckets数组,并且容量是现在一半.不增长就为nil
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
                              // 搬迁进度,小于nevacuate的已经搬迁
    extra *mapextra           // optional fields
}

buckets this parameter, which is stored a pointer pointing to the array buckets, when the bucket (bucket is 0) is nil.
We can understand, hmap pointing to an empty bucket array,

And when the bucket array capacity is needed, it will open up to double the memory space will be gradual and the original copy of the array, that is, when used to copy the old array to the new array.

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

// Go map 的 buckets结构
type bmap struct {
    // 每个元素hash值的高8位,如果tophash[0] < minTopHash,表示这个桶的搬迁状态
    tophash [bucketCnt]uint8
   // 第二个是8个key、8个value,但是我们不能直接看到;为了优化对齐,go采用了key放在一起,value放在一起的存储方式,
   // 第三个是溢出时,下一个溢出桶的地址
}

bucket (bucket), each bucket and place up to eight key value, the last field points to a next overflow BMAP,
attention key, value, overflow field definitions are not displayed, but calculates the offset acquired by maptype.

bucket mechanism

第一部分: tophash 存储的是哈希函数算出的哈希值的高八位。
    是用来加快索引的。
    因为把高八位存储起来,这样不用完整比较key就能过滤掉不符合的key,
    1. 加快查询速度当一个哈希值的高8位和存储的高8位相符合,
    2. 再去比较完整的key值,进而取出value。

第二部分,存储的是key 和value,就是我们传入的key和value,
    注意,它的底层排列方式是,
        key全部放在一起,value全部放在一起。
        
        当key大于128字节时,bucket的key字段存储的会是指针,指向key的实际内容;
        value也是一样。

        这样排列好处是在key和value的长度不同的时候,可以消除padding带来的空间浪费。
        
        并且每个bucket最多存放8个键值对。

第三部分,存储的是当bucket溢出时,指向的下一个bucket的指针

  • bucket design details
在golang map中出现冲突时,
    不是每一个key都申请一个结构通过链表串起来,而是以bmap为最小粒度挂载,
    一个bmap可以放8个key和value。
    
    这样减少对象数量,减轻管理内存的负担,利于gc。

如果插入时,bmap中key超过8,
    那么就会申请一个新的bmap(overflow bucket)挂在这个bmap的后面形成链表,
    优先用预分配的overflow bucket,
    如果预分配的用完了,那么就malloc一个挂上去。
    
    注意golang的map不会shrink,内存只会越用越多,overflow bucket中的key全删了也不会释放

reference

get

计算出key的hash
用最后的“B”位来确定在哪个桶(“B”就是前面说的那个,B为4,就有16个桶,0101用十进制表示为5,所以在5号桶)
根据key的前8位快速确定是在哪个格子(额外说明一下,在bmap中存放了每个key对应的tophash,是key的前8位)
最终还是需要比对key完整的hash是否匹配,如果匹配则获取对应value
如果都没有找到,就去下一个overflow找

2 from the map data structure to store data.
The first data structure is an array, the internal memory is engaged in eight-bit value is used to select the tub, this array is used to distinguish the presence of each key of which barrels
second data structure is an array of bytes used to store the key-value pairs. the first byte array stores a bucket all key, once again all of the values stored in the bucket. (save memory)

To summarize: B bits determine barrel after the passage of the front eight determine grid, loop through all attached to the barrel are all looking for last.
So why have this tophash it? Because tophash can quickly determine whether the correct key, you can understand it as a measure of the cache, if not before eight pairs, and the back is no need to compare.

Guess you like

Origin www.cnblogs.com/iiiiiher/p/12190108.html