[Java SE source code analysis] A personal understanding of HashMap

What is HashMap?

Map is the standard Java key-value structure for storing data. HashMap is the Map
implementation whose underlying storage is addressed by a hash of the key.

HashMap: a hash-table implementation of the Map interface.

  1. This implementation provides all of the optional map operations and permits null values and a null key.
    (Apart from allowing null and being unsynchronized, HashMap is roughly equivalent to Hashtable.)
  2. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
  3. Assuming the hash function disperses the elements properly among the buckets, this implementation provides stable, constant-time performance for the basic operations (get and put).
  4. Iteration over a HashMap requires time proportional to its "capacity" (the number of buckets) plus its size (the number of key-value mappings). Therefore, if iteration performance is important, do not set the initial capacity too high (or the load factor too low).

What is the underlying data structure of HashMap?

Map.Entry is the auxiliary data structure of Map; the underlying structure of a Map is an array of entries.
Hash collisions are resolved by chaining: colliding elements are stored in a linked list.
When enough colliding elements accumulate at one position, the list is converted into a red-black tree.

// always a power of two
transient Node<K,V>[] table;

The underlying data structure of HashMap is an array whose elements store linked lists.
When the number of elements at one position reaches 8 (and the table size is at least 64), the list is converted into a red-black tree (TREEIFY_THRESHOLD = 8):
if (binCount >= TREEIFY_THRESHOLD - 1) treeifyBin(tab, hash)
When the number of elements in a red-black tree drops to 6 or fewer, it is converted back into a linked list (UNTREEIFY_THRESHOLD = 6):
if (lc <= UNTREEIFY_THRESHOLD) tab[index] = loHead.untreeify(map)

Array: easy addressing, costly insertion and deletion.
Linked list: costly addressing, easy insertion and deletion.
Red-black tree: a self-balancing binary search tree with very efficient lookup, reducing search cost from the list's O(n) to O(log n).

  1. A red-black tree is a more complex structure than a linked list, so for bins with only a few nodes the array + linked list + red-black tree combination is not necessarily faster overall than a plain array + linked list.
  2. Frequent HashMap expansion keeps splitting and rebuilding the underlying red-black trees, which is very time-consuming.
    Therefore only converting a list into a red-black tree once it has grown long gives a significant efficiency gain.

Why must the table capacity be a power of two?

To make element addressing cheap: the modulo operation is replaced by a bitwise AND.
hash & (n - 1) is the element's storage position.

// index is the element's position in the table array
// n = table.length
// hash is the hash of the key
index = (n - 1) & hash;

// the 1 bits may all be concentrated in the upper 16 bits,
// so values that differ widely would still collide because their lower 16 bits are identical
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// Shifting right by 16 bits, exactly half of 32 bits, XORs the value's own upper
// half with its lower half. This mixes the high and low bits of the original hash
// code, increasing the randomness of the low bits. The mixed low bits also carry
// some features of the high bits, so the high-bit information is indirectly preserved.

How is the table capacity kept a power of two?

The smallest power of two greater than or equal to the input argument is obtained through bitwise OR and unsigned right-shift operations.

/**
* Returns the smallest power of two greater than or equal to the given argument:
* the leading bit is 1 and the rest are 0.
* The maximum capacity is 1 << 30.
*
* First fill with 1s, then add 1 --> 1111 + 1 = 1 0000
*/
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
  • Why subtract 1 from cap first?
    This handles the case where cap is already a power of two. If cap were already a power of two and the subtraction were skipped, the unsigned right shifts that follow would return a capacity twice as large as cap.
  • After the subtraction, the argument has some highest 1 bit, and:
    first shift: the highest bit and the bit below it become 1
    second shift: the top two bits and the next two bits become 1
    third shift: the top four bits and the next four bits become 1
    ...
    so every bit from the highest 1 bit downward becomes 1
  • The resulting value is stored only temporarily in threshold until resize() uses it for initialization:
// initial capacity was placed in threshold
else if (oldThr > 0) 
    newCap = oldThr;
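To see the rounding behavior concretely, here is a minimal sketch using a standalone copy of the `tableSizeFor` logic shown above (the wrapper class name is mine, not the JDK's):

```java
// Standalone copy of HashMap's tableSizeFor logic, for illustration only.
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (cap - 1 prevents doubling)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Note how `tableSizeFor(16)` returns 16, not 32: that is exactly what the `cap - 1` step guarantees.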

What is the structure of Entry?

Entry is a single key-value pair providing simple operations on the key and value.

interface Entry<K,V> {
    // returns the key stored by this entry
    K getKey();

    // returns the value stored by this entry
    V getValue();

    // replaces the value stored by this entry
    // and returns the previous value
    V setValue(V value);

    // determines whether two entries are equal;
    // generally both the keys and the values must be equal
    boolean equals(Object o);

    // returns the hashCode of this entry,
    // generally a unique identifier of the instance
    int hashCode();

    // returns a comparator that compares entries by key
    public static <K extends Comparable<? super K>, V> Comparator<Map.Entry<K,V>> comparingByKey() {
        return (Comparator<Map.Entry<K, V>> & Serializable)
            (c1, c2) -> c1.getKey().compareTo(c2.getKey());
    }

    // returns a comparator that compares entries by value
    public static <K, V extends Comparable<? super V>> Comparator<Map.Entry<K,V>> comparingByValue() {
        return (Comparator<Map.Entry<K, V>> & Serializable)
            (c1, c2) -> c1.getValue().compareTo(c2.getValue());
    }

    // builds an entry comparator from a given comparator on keys
    public static <K, V> Comparator<Map.Entry<K, V>> comparingByKey(Comparator<? super K> cmp) {
        Objects.requireNonNull(cmp);
        return (Comparator<Map.Entry<K, V>> & Serializable)
            (c1, c2) -> cmp.compare(c1.getKey(), c2.getKey());
    }

    // builds an entry comparator from a given comparator on values
    public static <K, V> Comparator<Map.Entry<K, V>> comparingByValue(Comparator<? super V> cmp) {
        Objects.requireNonNull(cmp);
        return (Comparator<Map.Entry<K, V>> & Serializable)
            (c1, c2) -> cmp.compare(c1.getValue(), c2.getValue());
    }
}
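The static comparator factories above are easy to try out; a short usage sketch (the sample data is made up):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sorting a list of entries with the Map.Entry comparator factories.
public class EntryComparatorDemo {
    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>("b", 2));
        entries.add(new AbstractMap.SimpleEntry<>("a", 3));
        entries.add(new AbstractMap.SimpleEntry<>("c", 1));

        entries.sort(Map.Entry.comparingByKey());
        System.out.println(entries); // [a=3, b=2, c=1]

        entries.sort(Map.Entry.comparingByValue());
        System.out.println(entries); // [c=1, b=2, a=3]
    }
}
```

The `& Serializable` cast in the source makes the returned lambda serializable, which is why the factories look more complicated than a plain lambda.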

Node: the concrete implementation of Entry in HashMap

Node is a linked-list node structure whose main purpose is to mitigate hash collisions:
when the position computed from the hash already holds an element, the new element is attached as that element's next node.

static class Node<K,V> implements Map.Entry<K,V> {
    // hash and key generally cannot be modified after assignment
    final int hash;
    final K key;
    V value;
    // holds the next node
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // the hashCode of a Node is the XOR of the key's hashCode and the value's hashCode
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // both key and value must be equal
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Approaches to hash collisions

  1. Open addressing
    On a collision, probe for the next empty slot; as long as the hash table is large enough, an empty slot can always be found and the record stored.
  2. Chaining (separate chaining)
    Each cell of the hash table acts as the head of a linked list, and all elements whose hashes map to that index form the list.
    That is, a colliding key is placed at the tail of the list whose head occupies the cell.
  3. Rehashing
    When one hash function produces a collision, compute another address with a second hash function, repeating until no collision occurs.
  4. Public overflow area
    Split the hash table into a basic table and an overflow table; colliding elements are placed in the overflow table.
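For contrast with HashMap's chaining, here is a minimal open-addressing (linear probing) sketch. All names are hypothetical, and it deliberately omits resizing and deletion:

```java
// Minimal linear-probing hash table: on a collision, probe the next slot.
// Illustrative only; no resizing, no deletion, table must not fill up.
public class LinearProbingDemo {
    private final Object[] keys;
    private final Object[] vals;

    LinearProbingDemo(int capacity) {
        keys = new Object[capacity];
        vals = new Object[capacity];
    }

    void put(Object key, Object val) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        // Probe forward until a free slot or the same key is found.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        vals[i] = val;
    }

    Object get(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        // Follow the same probe sequence used by put.
        while (keys[i] != null) {
            if (keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingDemo t = new LinearProbingDemo(8);
        t.put("a", 1);
        t.put("b", 2);
        System.out.println(t.get("a")); // 1
    }
}
```

Chaining (what HashMap uses) degrades gracefully as the table fills, while open addressing keeps everything in one array but is sensitive to clustering.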

HashMap initialization and expansion: resize()

  1. A HashMap is only initialized when it is first used; after data is added it checks (against threshold) whether expansion is needed.
  2. If no arguments are given, the default capacity is 16 and the default threshold is 12 (16 * 0.75, set at this point).
  3. On expansion, both the capacity and the threshold are doubled.
/**
* Initializes the table or doubles its capacity
*/
final Node<K, V>[] resize() {
    // get the existing table
    Node<K, V>[] oldTab = table;
    // get the capacity of the existing table
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // get the threshold of the existing table: capacity * load factor.
    // If an initial capacity was set, threshold holds it
    // (the smallest power of two >= the constructor argument); otherwise it is 0
    int oldThr = threshold;
    // capacity and threshold of the new table
    int newCap, newThr = 0;
    // if the old table is not empty
    if (oldCap > 0) {
        // if the old capacity already reached the maximum, do not expand
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        // if the old capacity is at least 16 and doubling it stays below the maximum
        } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            // the new threshold is twice the old one
            newThr = oldThr << 1; 
    // the old table is empty but an initial capacity was set, so threshold is non-zero
    } else if (oldThr > 0) 
        // use the stored initial threshold as the new capacity
        newCap = oldThr;
    // the old table is empty and no arguments were set, so threshold is 0
    else {
        // use the default initial capacity, 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        // the new threshold is 16 * 0.75 = 12
        newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // at this point the new capacity newCap is settled;
    // newThr is still 0 only when the old table was empty and an initial capacity was set
    if (newThr == 0) {
        // compute the new threshold from the new capacity
        float ft = (float) newCap * loadFactor;
        // if both the new capacity and the computed threshold are below the maximum, use it
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                (int) ft : Integer.MAX_VALUE);
    }
    // both newCap and newThr are now settled
    threshold = newThr;

    @SuppressWarnings({"rawtypes", "unchecked"})
    Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
    table = newTab;
    // the old table is not empty, start moving elements
    if (oldTab != null) {
        // iterate over the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K, V> e;
            // take each non-empty element of the old table
            if ((e = oldTab[j]) != null) {
                // clear the old slot
                oldTab[j] = null;
                // if this position holds a single Node with no successor
                if (e.next == null)
                    // store it directly into the new table at its new index
                    newTab[e.hash & (newCap - 1)] = e;
                
                // otherwise, what structure links the Nodes at this position?
                // tree structure: red-black tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
                // chain structure: linked list
                else {
                    // head points to the list head, tail builds up the list
                    Node<K, V> loHead = null, loTail = null;
                    Node<K, V> hiHead = null, hiTail = null;
                    Node<K, V> next;
                    do {
                        // first save the next node at this position
                        next = e.next;
                        // e.hash & oldCap == 0 means this Node does not need to move
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        } else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // the low list does not move; it keeps its old index in the new table
                    // and is attached directly
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // the high list moves forward by the old capacity
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
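The lo/hi split above works because doubling the capacity adds exactly one bit to the index mask: `hash & (newCap - 1)` equals the old index when `hash & oldCap` is 0, and old index + oldCap otherwise. A small sketch (the hash values are arbitrary examples):

```java
// Demonstrates the resize split rule: with newCap = oldCap * 2, the new index
// is either the old index (extra bit 0) or old index + oldCap (extra bit 1).
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21, 37, 53}) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            int expected = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(hash + ": " + oldIndex + " -> " + newIndex
                    + " (expected " + expected + ")");
        }
    }
}
```

This is why resize never needs to recompute a full hash modulo: one bit test decides which of the two possible slots each node goes to.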

How HashMap computes an element's hash

The hashCode is XORed with its own upper 16 bits,
making the resulting hashes more dispersed and the element distribution more uniform.
In general the table capacity is small relative to 2^16,
so the storage index hash & (n - 1) depends almost entirely on the low bits; since only those few bits carry information, hash collisions are more likely.
Letting the high bits participate in the computation mitigates collisions to some extent.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The designers also folded the high bits into the hash computation (XOR with the upper 16 bits, so that the subsequent & actually combines high and low bits), which increases randomness and reduces the likelihood of collisions.
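The effect is easy to verify: two hashCodes that share their low 16 bits but differ wildly in the high bits collide on a small table without spreading, and separate once the high bits are mixed in. The sample values here are arbitrary:

```java
// Shows why HashMap XORs in the high 16 bits: with a small table,
// index = (n - 1) & hash only looks at the low bits, so hashCodes that
// differ only in their high bits would otherwise always collide.
public class HashSpreadDemo {
    static int spread(int h) {
        return h ^ (h >>> 16); // same mixing step as HashMap's hash() for non-null keys
    }

    public static void main(String[] args) {
        int n = 16; // table size
        int h1 = 0x00010004;
        int h2 = 0x0fff0004; // same low 16 bits, very different high bits
        System.out.println((n - 1) & h1);         // 4
        System.out.println((n - 1) & h2);         // 4 -> collision without spreading
        System.out.println((n - 1) & spread(h1)); // 5
        System.out.println((n - 1) & spread(h2)); // 11 -> no collision
    }
}
```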

How HashMap adds / updates elements

  1. First check whether the table is initialized; if not, initialize it with resize().
  2. If the storage position obtained by hash & (n - 1) is empty, store the element there.
  3. On a hash collision, if no element with an equal key is found, add the element at the tail of the list / as a leaf of the tree.
  4. If an element with an equal key is found, update its value and return the old value.
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, i;
    // if the table has not been created yet, or was created with size 0
    if ((tab = table) == null || (n = tab.length) == 0)
        // initialize the table
        n = (tab = resize()).length;
    // i = (n - 1) & hash computes the storage position with bit operations;
    // if that position is empty, store the element directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // a hash collision occurred
    else {
        // the element already stored at that position
        Node<K,V> e; 
        K k;
        // p is the stored element;
        // check whether p's key equals the key being stored;
        // if the keys are equal, this is an update
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // keys differ and p is a tree node
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // keys differ and p is a linked-list node
        else {
            for (int binCount = 0; ; ++binCount) {
                // reached the last node
                if ((e = p.next) == null) {
                    // append the new node
                    p.next = newNode(hash, key, value, null);
                    // if the list is long enough (TREEIFY_THRESHOLD = 8), convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                // keep checking whether the current node's key equals the key being stored
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // an existing mapping for the key was found
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // if the number of elements exceeds the threshold, expand
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
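The add-vs-update behavior described above shows up directly in put's return value:

```java
import java.util.HashMap;
import java.util.Map;

// put returns null when the key was absent (a new mapping was added)
// and the old value when the key existed (the mapping was updated).
public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(map.put("k", 1));         // null: key was absent, element added
        System.out.println(map.put("k", 2));         // 1: key existed, value updated
        System.out.println(map.get("k"));            // 2
        System.out.println(map.putIfAbsent("k", 3)); // 2: the onlyIfAbsent path, no update
        System.out.println(map.get("k"));            // 2
    }
}
```

`putIfAbsent` routes through the same `putVal` with `onlyIfAbsent = true`, which is why it leaves an existing non-null value untouched.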

How HashMap retrieves values

  1. If the table is null or empty, return null.
  2. If the position hash & (n - 1) holds nothing, return null.
  3. If the first element at that position matches, return it.
  4. On a hash collision, search the other nodes at that position; return the element if found, null otherwise.
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; 
    Node<K,V> first, e; 
    int n; 
    K k;
    // when the table is not empty and the hash position holds an element
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // check the first element at that position
        if (first.hash == hash &&
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // if the first element does not match,
        // a hash collision may have occurred
        if ((e = first.next) != null) {
            // if the first element is a tree node
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // the first element is a list node: traverse the list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

How HashMap removes elements

  1. If the table is null or empty, do nothing and return null.
  2. Find the matching element by key, using the same process as getNode(hash, key).
  3. If the element is found, delete it and return it.
public boolean remove(Object key, Object value) {
    return removeNode(hash(key), key, value, true, true) != null;
}

final Node<K,V> removeNode(int hash, Object key, Object value, 
                            boolean matchValue, boolean movable) {
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, index;
    // check that the table is not empty and the hash position holds an element
    if ((tab = table) != null && (n = tab.length) > 0 && 
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; 
        K k; 
        V v;
        // the following finds the matching element by key, as in getNode(hash, key)
        // first element
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // hash collision: check the other nodes at this position
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                            (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // a matching node was found.
        // matchValue indicates whether the value must also match;
        // false means the element is removed even if the given value is wrong
        if (node != null && (!matchValue || (v = node.value) == value ||
                                (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
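The two public remove variants map directly onto the `matchValue` flag: `remove(key, value)` passes `matchValue = true`, while `remove(key)` passes `false`. A quick demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// remove(key, value) only deletes when the value also matches;
// remove(key) ignores the value entirely.
public class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 1);

        System.out.println(map.remove("k", 99)); // false: value does not match, nothing removed
        System.out.println(map.remove("k", 1));  // true: value matches, mapping removed

        map.put("k", 1);
        System.out.println(map.remove("k"));     // 1: the removed value, no value check
    }
}
```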

Why is HashMap not thread-safe?

Because none of its methods are synchronized.

What problems can occur when HashMap is used concurrently?

Data loss, duplicated data, and infinite loops.

From the source-code analysis above we can see why Java 7 could lose data: if two threads execute table[i] = entry at the same time, both having seen a null slot, one thread's write overwrites the other's, so stored data is lost.

If two threads both find that their key is absent, and the two keys are in fact the same, then the first thread links its own entry into the list, and when the second thread reaches e.next it still obtains the last node and inserts its own data into the list as well, producing a duplicate entry.

The put source also shows that data is first written into the map, and only afterwards does the element count decide whether to resize.
An even trickier problem is an infinite loop during resize.

The root cause is that HashMap's resize in Java 7 processed each linked list in reverse order.
Suppose two threads resize simultaneously: the first thread is still part-way through processing A -> B while the second thread has already finished reversing it into B -> A; a cycle B -> A -> B then appears, and CPU usage spikes.

PS: the infinite loop was a consequence of this reverse-order list processing; Java 8 no longer reverses the list during resize, so the infinite-loop problem is greatly improved (though HashMap is still not thread-safe).
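When concurrent access is actually needed, the usual fix is not to harden HashMap but to use a thread-safe map. A minimal sketch, assuming two writer threads updating one counter (the key name and counts are made up):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Two standard thread-safe alternatives to a bare HashMap.
public class ThreadSafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        // Option 1: wrap a HashMap; every method call is synchronized on the wrapper.
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());

        // Option 2: ConcurrentHashMap, with atomic compound operations like merge.
        Map<String, Integer> concMap = new ConcurrentHashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                concMap.merge("count", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(writer), t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(concMap.get("count")); // 2000: no lost updates
    }
}
```

With a plain HashMap the same two-writer loop could lose updates (or worse, in Java 7, loop forever during a concurrent resize), which is exactly the failure mode described above.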

Understanding HashMap further through debugging

Environment

  • IntelliJ IDEA 2018 Professional Edition
  • Deepin Linux 15.9 Desktop

Procedure

Test code

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

/**
 * Inspect the structure of a HashMap while it stores data, via the debugger
 *
 * @author lingsh
 * @version 1.0
 * @date 19-9-25 16:29
 */

public class TestHashMap {
    public static void main(String[] args) {
        // size: number of entries to store
        // cap: initial HashMap capacity, 10 --> rounded up to 16
        int size = 10000, cap = 10;
        Map<THMString, Integer> map = new HashMap<>(cap);
        // store the data
        for (int i = 0; i < size; i++) {
            map.put(THMString.getRandomString(), i);
        }
        // breakpoint anchor (merely the end of main; useful for pausing to inspect the map structure)
        System.out.println(map.size());
    }
}

/**
 * TestHashMapString:
 * a custom class whose hashCodes are extremely concentrated.
 * Backed by a string of length LEN.
 */
class THMString {
    /**
     * the underlying data
     */
    private String str;
    /**
     * SIZE is used to increase the chance of collisions between instances
     */
    private final static int SIZE = 1024;
    /**
     * size of the underlying data
     */
    private final static int LEN = 5;
    /**
     * random source used to generate the underlying data
     */
    private static Random random = new Random();

    public THMString(String str) {
        this.str = str;
    }

    @Override
    public int hashCode() {
        // concentrate the hashCodes: taking the remainder forces collisions
        return str.hashCode() % SIZE % SIZE % SIZE % SIZE % SIZE;
    }

    @Override
    public boolean equals(Object obj) {
        THMString thms = (THMString) obj;
        return Objects.equals(this.str, thms.str);
    }

    @Override
    public String toString() {
        return "[String:" + str + "\thashCode:" + hashCode() + "]";
    }

    /**
     * Gets a random instance to use as a HashMap key
     *
     * @return a random instance
     */
    static THMString getRandomString() {
        char[] chars = new char[LEN];
        for (int i = 0; i < chars.length; i++) {
            int word = random.nextInt('z' - 'a');
            chars[i] = (char) (word + 'a');
        }
        return new THMString(new String(chars));
    }
}

Goals

Underlying data structure
  • Node
  • TreeNode
Expansion process
  • State at the start of expansion

  • Moving nodes to the new table during expansion

Debug tips

Turn off IDEA's class-structure view optimization while debugging

When that option is enabled, IDEA hides fields such as next, newThr, etc.

Breakpoints may catch the system's own use of HashMap

While debugging the source, our own program is not the only user of HashMap; the runtime itself also uses it, so a breakpoint inside the source may catch HashMap accesses made by system code during initialization.
Therefore it is recommended to:

  • Set a breakpoint in your own program first, make sure your own program has definitely started, and only then set the breakpoint inside the HashMap source.


Origin www.cnblogs.com/slowbirdoflsh/p/11585463.html