HashMap implementation principle

Foreword

The backbone of HashMap is an array. Suppose we have three key-value pairs: dnf: 1, cf: 2, and lol: 3. Each time we put one, a hash function determines where in the array the pair should be placed, i.e. index = hash(key).

hash(dnf) = 1, so we put the key-value pair at array index 1.

hash(cf) = 3, so cf: 2 goes to array index 3.

hash(lol) = 1, but array index 1 already holds a value. We put lol: 3 at the head of the linked list in that slot and link the original dnf: 1 after it, because HashMap (in JDK 1.7) uses head insertion.

To get the pair whose key is dnf, we compute hash(dnf) = 1 and go to array index 1. The first entry there is lol, which does not equal dnf, so we compare with the next entry; it is equal, so it is returned.
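
The walkthrough above can be sketched as a toy bucket array. This is only an illustration, not the JDK code: the class ToyBuckets, its Node type, and the hard-coded bucket indices (dnf and lol in slot 1, cf in slot 3) are assumptions taken from the example, and duplicate keys are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the example: bucket indices are passed in by hand
// instead of being computed by a real hash function.
class ToyBuckets {
    static final class Node {
        final String key;
        final int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[4];

    // Head insertion: the new node becomes the first node of its bucket
    // and points at whatever was there before.
    void put(int index, String key, int value) {
        table[index] = new Node(key, value, table[index]);
    }

    // Walk the chain in the bucket until a node with an equal key is found.
    Integer get(int index, String key) {
        for (Node n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    // Keys of one bucket, in chain order.
    List<String> chain(int index) {
        List<String> keys = new ArrayList<>();
        for (Node n = table[index]; n != null; n = n.next) {
            keys.add(n.key);
        }
        return keys;
    }

    static ToyBuckets demo() {
        ToyBuckets m = new ToyBuckets();
        m.put(1, "dnf", 1);
        m.put(3, "cf", 2);
        m.put(1, "lol", 3);
        return m;
    }

    public static void main(String[] args) {
        ToyBuckets m = demo();
        System.out.println(m.chain(1));      // [lol, dnf]
        System.out.println(m.get(1, "dnf")); // 1
    }
}
```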

Source code

Based on JDK 1.7.0_80

Focus point                           Conclusion
Does HashMap allow null?              Both key and value may be null
Does HashMap allow duplicate data?    Duplicate keys are not allowed (putting an existing key replaces its value); duplicate values are allowed
Is HashMap ordered?                   Unordered
Is HashMap thread safe?               Not thread safe
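
The first two conclusions can be checked directly against java.util.HashMap. The snippet below is a small sketch; the class name HashMapBasics and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasics {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 0);     // a null key is allowed
        map.put("dnf", null); // a null value is allowed
        map.put("cf", 2);
        map.put("cf", 20);    // same key again: the old value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());            // 3
        System.out.println(map.get("cf"));         // 20
        System.out.println(map.containsKey(null)); // true
    }
}
```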

Several important properties

// The default initial capacity is 16; the capacity must be a power of 2
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// The maximum capacity is 2 to the power of 30
static final int MAXIMUM_CAPACITY = 1 << 30;

// The default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final Entry<?,?>[] EMPTY_TABLE = {};

// The backbone of HashMap is an Entry array, resized when necessary; its length must be a power of 2
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// The number of key-value pairs stored
transient int size;

// The threshold at which to resize, equal to capacity * load factor
int threshold;

// The load factor
final float loadFactor;

// Related to thread safety; not discussed here
transient int modCount;

transient int hashSeed = 0;

A note on threshold and loadFactor: threshold = capacity * load factor. For example, if the capacity of the HashMap is 16 and the load factor is 0.75, the threshold is 16 * 0.75 = 12. Once the HashMap holds 12 elements, the next insertion that lands in an already occupied bucket triggers a resize (see addEntry further down).

  1. The smaller the load factor, the sooner the map resizes and the more space is wasted, but lookups are faster
  2. The larger the load factor, the later the map resizes and the better the space is used, but lookups are slower (the linked lists grow longer)
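
As a quick worked example of the formula, the sketch below computes the threshold for a capacity of 16 under a few load factors. The helper name threshold is an assumption for illustration; it ignores the MAXIMUM_CAPACITY cap that the real code applies.

```java
class ThresholdDemo {
    // threshold = capacity * load factor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.5f));  // 8: resizes early, shorter chains
        System.out.println(threshold(16, 0.75f)); // 12: the default trade-off
        System.out.println(threshold(16, 1.0f));  // 16: resizes late, longer chains
    }
}
```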

Static inner class that stores data

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // reference to the next Entry, forming a singly linked list
    int hash; // hash computed from the key's hashCode, cached in the Entry to avoid recomputation

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}

Constructor (the other constructors delegate to this one)

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

put method

public V put(K key, V value) {
    // the HashMap's array has not been allocated yet
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
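    // compute the hash value
    int hash = hash(key);
    // find which slot of the table the entry belongs in
    int i = indexFor(hash, table.length);
    // walk the linked list at table[i] looking for the same key; if found, replace the old value with the new one and return the old value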
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // the key already exists: set the new value and return the old one
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // put the element at table[i]; the new element always becomes the first entry at table[i], and the existing entries follow it
    addEntry(hash, key, value, i);
    return null;
}

When the table is empty, HashMap has not yet created the array. The requested capacity may be the default initial value of 16 or a custom value. In either case the array length is rounded up to the smallest power of 2 that is greater than or equal to the requested capacity.

private void inflateTable(int toSize) {
    // round up to the smallest power of 2 that is greater than or equal to toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
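
The body of roundUpToPowerOf2 is not shown above. A minimal sketch of how such rounding can be done with Integer.highestOneBit follows; treat it as an assumption about the approach rather than a quotation of the JDK source.

```java
class PowerOfTwo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of 2 that is >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (number <= 1) {
            return 1;
        }
        // (number - 1) << 1 lies in [result, 2 * result), so its highest
        // set bit is exactly the power of 2 we are looking for.
        return Integer.highestOneBit((number - 1) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(10)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```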

If the key is null, the value is put in the chain at table[0]

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

Compute the hash value

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
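
To see what the shift-and-XOR steps buy, the sketch below copies the two mixing lines into a standalone method. It assumes hashSeed is 0 and skips the String special case; the names spread and bucket are made up for illustration. Two hash codes that differ only in their high bits would both land in bucket 0 of a 16-slot table without mixing, but end up in different buckets after it.

```java
class HashSpreadDemo {
    // The two mixing lines from hash(), with hashSeed assumed to be 0.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int bucket(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, only the low 4 bits matter, and they are all 0.
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));
        // With mixing, the high bits influence the low bits.
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16));
    }
}
```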

Find where it should be placed in the array

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
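
A short check of what the mask does, under the assumption stated in the commented-out assert that length is a power of 2: the result matches Math.floorMod(h, length), including for negative hashes, so the index is always within the array. The class and helper names here are illustrative.

```java
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if the mask agrees with floorMod for every h in a sample range.
    static boolean matchesFloorMod(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(-1, 16));    // 15
        System.out.println(matchesFloorMod(16)); // true
        System.out.println(matchesFloorMod(15)); // false: 15 is not a power of 2
    }
}
```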

Add an element

void addEntry(int hash, K key, V value, int bucketIndex) {
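    // resize when size has reached the threshold and the target bucket is already occupied (a hash collision is about to happen)
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // double the capacity
        resize(2 * table.length);
        // recompute the hash value and the bucket index for the new table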
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

The newly added element becomes the first entry in its bucket, and the entries already there follow it

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

get method

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

Look up the value for the null key by walking the chain at table[0]

private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

When the key is not null

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

resize

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // capacity has already reached the maximum
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

Recalculate each element's position in the new array and move it there

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
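
One consequence of the loop above is that entries which stay together in a bucket come out in reverse order, because each one is inserted at the head of the new chain. The sketch below reproduces only the inner while loop on a simplified Node type, with every node assumed to map to the same new bucket; the names are illustrative.

```java
class TransferOrderDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Same shape as the inner loop of transfer, for a single target bucket.
    static Node moveWithHeadInsertion(Node oldHead) {
        Node newHead = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // link e in front of the new chain
            newHead = e;
            e = next;
        }
        return newHead;
    }

    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    static String demo() {
        Node chain = new Node("a", new Node("b", new Node("c", null)));
        return keys(moveWithHeadInsertion(chain));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // c -> b -> a
    }
}
```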

Knowledge point

Why is the capacity of HashMap always 2^n

HashMap uses the following method to determine the subscript of the key-value pair in the array

static int indexFor(int h, int length) {
    return h & (length-1);
}

When length is a power of 2, h & (length - 1) is equivalent to h % length for non-negative h. Suppose the array length is 15 or 16 and the hash codes are 8 and 9:

h & (length - 1)    h       length - 1    index
8 & (15 - 1)        1000    1110          1000
9 & (15 - 1)        1001    1110          1000
8 & (16 - 1)        1000    1111          1000
9 & (16 - 1)        1001    1111          1001

It can be seen that when the array length is 15, the elements with hash codes 8 and 9 land in the same position in the array and form a linked list, which reduces query efficiency. When the hash code is ANDed with 15 - 1 (1110), the last bit is always 0, so the positions 0001, 0011, 0101, 0111, 1001, 1011, and 1101 can never hold an element. This causes (1) a large waste of space and (2) a higher probability of collision, which slows down queries. When the array length is 2^n, all n low bits of 2^n - 1 are 1 (for example, 8 - 1 = 7, or 111), so the & operation keeps the low n bits of the hash unchanged, every slot can be used, and the probability of collision is reduced.
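
The wasted slots are easy to count. The following sketch (class and method names are assumptions for illustration) applies h & (length - 1) to a range of hash values and collects the distinct indices that are ever produced.

```java
import java.util.TreeSet;

class SlotUsageDemo {
    // Distinct indices produced by h & (length - 1) for h in [0, 1000).
    static TreeSet<Integer> usedSlots(int length) {
        TreeSet<Integer> used = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            used.add(h & (length - 1));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(usedSlots(15));        // [0, 2, 4, 6, 8, 10, 12, 14]
        System.out.println(usedSlots(16).size()); // 16
    }
}
```

With length 15 only 8 of the 15 slots are ever used; with length 16 all 16 are.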

Why must objects used as HashMap keys override equals and hashCode
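
One way to read the answer off the put and getEntry code shown earlier: a lookup first uses the key's hash to pick a bucket and then requires e.hash == hash and key.equals(k). Two distinct objects that represent the same logical key therefore only find each other if both methods are overridden consistently. The sketch below uses a hypothetical key class, GameKey, to illustrate a lookup with a fresh but equal key object.

```java
import java.util.HashMap;
import java.util.Map;

class KeyContractDemo {
    // Hypothetical key type that overrides both methods consistently.
    static final class GameKey {
        final String name;

        GameKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof GameKey && name.equals(((GameKey) o).name);
        }

        @Override
        public int hashCode() {
            // Equal keys must produce equal hash codes.
            return name.hashCode();
        }
    }

    static Integer lookup() {
        Map<GameKey, Integer> map = new HashMap<>();
        map.put(new GameKey("dnf"), 1);
        // A different object with the same name still finds the entry.
        return map.get(new GameKey("dnf"));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // 1
    }
}
```

Without the hashCode override, the two GameKey objects would fall back to Object's identity-based hash codes, and the lookup would typically miss.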

