How Java HashMap Works and How It Is Implemented

I. HashMap Overview

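HashMap is an unsynchronized, hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.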

II. HashMap Data Structure

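Internally, HashMap is built on an array, linked lists, and red-black trees. Lookups are fast because the map computes a hash code to jump straight to the storage location. The hash value is derived from the key's hashCode, so keys with equal hashCode values always produce the same hash value. As more objects are stored, different objects may end up with the same hash value, or with different hash values that map to the same array slot; this is a hash collision. There are many ways to resolve hash collisions; HashMap resolves them with linked lists and, since JDK 1.8, red-black trees.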

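When a bucket holds few colliding entries (8 or fewer in JDK 1.8), HashMap chains them in a linked list, and looking up a key within that bucket takes O(n) time. The bucket index is computed as (number of buckets - 1) & hash, so keys share a bucket whenever this index is equal, even if their hash values differ.

[Figure: buckets holding short linked lists; image from "Java HashMap工作原理"]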

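When a bucket holds many colliding entries (more than 8 in JDK 1.8), the O(n) list lookup gets slower as the bucket grows. JDK 1.8 therefore introduced red-black trees as an optimization, bringing the lookup cost within a bucket down to O(log n). Once a bucket grows past 8 entries to 9, its linked list is converted into a red-black tree (TreeNode is a node of that tree), while the other buckets, which have not reached that size, remain linked lists.

[Figure: the first bucket converted to a red-black tree, the other buckets still linked lists]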
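The index arithmetic described above can be tried out in isolation. The following is only a sketch: the class BucketIndexSketch and its method names are made up for illustration, and the spread step simply copies the hash function shown later in this article.

```java
// Illustrative sketch, not JDK code: reproduces the bucket-index arithmetic only.
class BucketIndexSketch {
    // Same bit-spreading step as HashMap.hash() in JDK 1.8.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are distinct strings with the same hashCode, so they always collide.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(indexFor("Aa", 16) + " " + indexFor("BB", 16));
        // 1 and 17 have different hash values but land in the same bucket of a 16-slot table.
        System.out.println(indexFor(1, 16) + " " + indexFor(17, 16));
    }
}
```

The second pair shows that sharing a bucket does not require equal hash values; it only requires equal low bits under the current table size.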

III. HashMap Source Code Analysis

1. Key fields

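transient Node<K,V>[] table; // the bucket array that stores the entries (Entry[] before JDK 1.8)

transient int size; // number of key-value mappings stored

int threshold; // resize threshold: the table is resized once size exceeds it; threshold = load factor * capacity

final float loadFactor; // load factor

transient int modCount; // number of structural modifications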

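The load factor loadFactor expresses how full the hash table is allowed to get.

A larger load factor lets more entries fill the table, so space is used more efficiently, but collisions become more likely: linked lists grow longer and red-black trees grow taller, and lookups slow down. A smaller load factor keeps fewer entries in the table, so collisions are less likely, but more space is wasted: the data becomes sparse, and the table is resized while much of it is still unused. The more likely collisions are, the more a lookup costs.

A balance therefore has to be struck between the chance of collision and space utilization, which is essentially the classic time-space trade-off in data structures. If memory is plentiful and lookup speed matters, the load factor can be set lower; if memory is tight and lookup speed is not a concern, it can be set higher.

In most cases there is no need to set it at all; the default of 0.75 is fine.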
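To make the trade-off concrete, the threshold formula from the field comments (threshold = load factor * capacity) can be evaluated for a few settings. This is a small sketch; ThresholdSketch is a hypothetical helper class, not part of the JDK.

```java
// Illustrative sketch: evaluates threshold = capacity * loadFactor for sample values.
class ThresholdSketch {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float[] factors = {0.5f, 0.75f, 1.0f};
        for (float lf : factors) {
            // With capacity 16, a lower load factor triggers the first resize earlier.
            System.out.println("loadFactor=" + lf + " -> threshold=" + threshold(16, lf));
        }
    }
}
```

With capacity 16, a load factor of 0.5 resizes after 8 entries, 0.75 after 12, and 1.0 only after 16.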

2. Initialization

HashMap has two commonly used constructors:

The first is the no-argument constructor:

    /**
     * Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
     */
    public HashMap() {
        // The table array, the capacity, and the resize threshold are all set up later in resize()
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

The second takes an initial capacity and a load factor:

    /**
     * Constructs an empty HashMap with the given initial capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
       if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
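        // tableSizeFor returns the smallest power of two >= initialCapacity; it is parked in
        // threshold for now and becomes the table capacity on the first resize()
        this.threshold = tableSizeFor(initialCapacity);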
    }
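The constructor relies on tableSizeFor to round the requested capacity up to a power of two. The sketch below follows the bit-smearing approach used in JDK 8; TableSizeSketch is a name invented here, and later JDK releases implement the same rounding differently, so treat it as an illustration rather than the current source.

```java
// Illustrative sketch of power-of-two rounding in the style of JDK 8's tableSizeFor.
class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;      // so that an exact power of two maps to itself
        n |= n >>> 1;         // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 10, 16, 17, 1000};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For example, a requested capacity of 10 becomes 16, while 17 becomes 32.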

The code that adjusts HashMap's capacity is as follows:

/**
 * Initializes or doubles the size of the table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // the doubled capacity is still below the maximum: double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        // the constructor parked the initial capacity in threshold; use it as the capacity
        newCap = oldThr;
    else {               // first use with no initial capacity: fall back to the defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
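    threshold = newThr;
    // allocate the new table (without this allocation newTab below would be undefined)
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // relocate all nodes of this bucket
                oldTab[j] = null;
                if (e.next == null)
                    // a single node: place it directly in its new bucket
                    newTab[e.hash & (newCap - 1)] = e;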
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // head and tail of the list of nodes that stay at the same index
                    Node<K,V> loHead = null, loTail = null;
                    // head and tail of the list of nodes that move to a new index
                    Node<K,V> hiHead = null, hiTail = null;

                    Node<K,V> next;
                    // walk the bucket and assign every node to one of the two lists
                    do {
                        next = e.next;
                        // same index (the node stays in its bucket)
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                // first node that stays at the original index
                                loHead = e;
                            else
                                // link the current tail to the new node
                                loTail.next = e;
                            // the new node becomes the tail
                            loTail = e;
                        }
                        // old index + oldCap (the node changes bucket)
                        else {
                            if (hiTail == null)
                                // first node that moves to the new index
                                hiHead = e;
                            else
                                // link the current tail to the new node
                                hiTail.next = e;
                            // the new node becomes the tail
                            hiTail = e;
                        }
                    } while ((e = next) != null);

                    // install the list that keeps its index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // install the list that moves to index j + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;    
}

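When the capacity is adjusted and a bucket is still a linked list, the way nodes are moved is quite clever.

1) If the node key's hash ANDed (&) with the old capacity is 0, the node stays at the same bucket index.

2) If the node key's hash ANDed (&) with the old capacity is non-zero, the node moves to a new bucket whose index is the old index plus oldCap.

For example, when the capacity grows from 16 to 32:

oldCap (16): 00000000 00000000 00000000 00010000

hash of "aa" (3104): 00000000 00000000 00001100 00100000; & oldCap => 00000000 00000000 00000000 00000000, which is 0, so the bucket does not change

hash of "王三" (936912): 00000000 00001110 01001011 11010000; & oldCap => 00000000 00000000 00000000 00010000, which is non-zero, so the bucket index grows by the old capacity, from 0 to 16

With this design, a resize does not have to recompute hash & (table.length - 1) for every node; the new bucket follows directly from the old index and a single bit test.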
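The two sample keys above can be checked with a few lines of code. This is a sketch under the assumption that the hash values quoted are the results of HashMap's hash function applied to String.hashCode(); ResizeSplitSketch and its methods are hypothetical names.

```java
// Illustrative sketch: checks the lo/hi split rule against the two sample keys.
class ResizeSplitSketch {
    // Same bit-spreading step as HashMap.hash() in JDK 1.8.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // New index after doubling, derived from the old index and one bit test.
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        // The second key is the two-character sample key from the text, written with Unicode escapes.
        String[] keys = {"aa", "\u738b\u4e09"};
        int oldCap = 16;
        for (String key : keys) {
            int hash = spread(key);
            int oldIndex = hash & (oldCap - 1);
            int moved = newIndex(hash, oldIndex, oldCap);
            int recomputed = hash & (2 * oldCap - 1); // the full recomputation that the trick avoids
            System.out.println(hash + ": " + oldIndex + " -> " + moved + " (recomputed " + recomputed + ")");
        }
    }
}
```

Both routes give the same index, which is why resize() can split a bucket into a "lo" list and a "hi" list without touching the mask again.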

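3. The put operation

The put function works roughly as follows:

Hash the key's hashCode(), then compute the index;
If there is no collision, put the node straight into the bucket;
If there is a collision, append the node to the bucket's linked list;
If collisions make the list too long (longer than TREEIFY_THRESHOLD), convert the list into a red-black tree;
If the key already exists, replace the old value (this keeps keys unique);
If the map is too full (size exceeds load factor * current capacity), resize.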

public V put(K key, V value) {
  return putVal(hash(key), key, value, false, true);
}

/**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;  // first use: allocate the table
        if ((p = tab[i = (n - 1) & hash]) == null)
            // the target bucket is empty: add the node directly
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // the key matches the first node of the bucket: remember it in e so its value is updated below
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // the first node is a red-black tree node: insert into (or find the key in) the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // the bucket at hash & (table.length - 1) is a linked list: append to its tail, or find the node to update
           else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // append the new node to the tail of the list
                        p.next = newNode(hash, key, value, null);
                        // if the list has reached the threshold, convert it into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }

                    // the key is already in the list: leave the loop
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // advance p to the next node for the next iteration
                    p = e;
                }
            }
            if (e != null) { // e != null means the key already exists: update its value
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // resize if the number of entries now exceeds the threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
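From the caller's side, the "previous value, or null if none" contract of putVal shows up in the return values of put and putIfAbsent. A small usage sketch follows; PutReturnSketch is a hypothetical class name, while the Map methods used are standard.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative usage sketch: observes what put and putIfAbsent return.
class PutReturnSketch {
    static String describe() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping
        Integer second = map.put("a", 2);         // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3);  // onlyIfAbsent: keeps 2 and returns it
        return first + "," + second + "," + third + "," + map.get("a") + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "null,1,2,2,1"
    }
}
```

The key stays unique throughout: three calls with the same key leave exactly one entry in the map.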

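To sum up the capacity-related behavior seen so far:

1) With the no-argument constructor, the initial capacity is 16, the initial threshold is 12, and the load factor is 0.75. When the number of entries exceeds the threshold, the table is resized and the capacity doubles; once the capacity reaches MAXIMUM_CAPACITY, the threshold is set to Integer.MAX_VALUE and no further resizing takes place.

2) With the constructor that takes an initial capacity and a load factor, the threshold initially holds the smallest power of two that is not less than the requested capacity. On the first put, the capacity is set to that value and the threshold is recomputed as capacity * load factor; the load factor is the value passed in. When the number of entries exceeds the threshold, the table is resized and the capacity doubles; once the capacity reaches MAXIMUM_CAPACITY, the threshold is set to Integer.MAX_VALUE and no further resizing takes place.

3) Resizing can also happen while the number of entries is still below the threshold: if a bucket's list grows past TREEIFY_THRESHOLD while the table is still smaller than MIN_TREEIFY_CAPACITY (64), treeifyBin resizes the table instead of converting the list into a tree.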

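4. The get operation

The get function works roughly as follows:

Hash the key's hashCode(), then compute the index (note: hash & (length - 1) keeps the index within the bounds of the array);
If the bucket has nodes, check whether the first node matches the key; if so, return it directly;
If the first node does not match and it has a successor, keep searching;
If the bucket's first node is a red-black tree node, search the tree by hash and key;
Otherwise the bucket is a linked list, so walk the list to find the key.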

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // search only if the table is non-empty and the bucket for this hash holds a node
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // the first node of the bucket matches: return it
            if (first.hash == hash &&
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // the first node has a successor: keep searching
            if ((e = first.next) != null) {
                // the bucket is a red-black tree: search the tree by hash and key
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // the bucket is a linked list: walk it node by node
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
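Because get returns null both when getNode finds nothing and when the stored value is itself null, callers sometimes need containsKey to tell the two apart. The following usage sketch illustrates this; GetNullSketch is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative usage sketch: a null result from get does not prove the key is absent.
class GetNullSketch {
    static String describe() {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", null); // null values are allowed
        map.put(null, 7);         // so is the null key (it hashes to 0)
        return map.get("present") + "," + map.get("absent") + ","
                + map.containsKey("present") + "," + map.containsKey("absent") + ","
                + map.get(null);
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "null,null,true,false,7"
    }
}
```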

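The lookup inside a red-black tree works roughly as follows:

1. Starting from the root, compare the current node's hash with the target hash: if the current node's hash is greater than the target, continue in the left subtree; if it is smaller, continue in the right subtree.

2. If the two hashes are equal and the keys match, the node has been found. Otherwise, if the left subtree is empty, move to the right child; if the left subtree is non-empty and the right subtree is empty, move to the left child. If both subtrees are non-empty, the key's Comparable ordering picks a side when it is available; failing that, the right subtree is searched recursively and then the left, until a node with the same hash and key is found or the tree is exhausted.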

/**
 * Finds the node starting at root p with the given hash and key.
 * The kc argument caches comparableClassFor(key) upon first use
 * comparing keys.
 */
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
	TreeNode<K,V> p = this;
	do {
		int ph, dir; K pk;
		TreeNode<K,V> pl = p.left, pr = p.right, q;
		// target hash is smaller than this node's hash: go left
		if ((ph = p.hash) > h)
			p = pl;
		// target hash is larger than this node's hash: go right
		else if (ph < h)
			p = pr;
		// hashes are equal and the keys match: found
		else if ((pk = p.key) == k || (k != null && k.equals(pk)))
			return p;
		// hashes are equal and the left subtree is empty: continue right
		else if (pl == null)
			p = pr;
		// hashes are equal, left subtree non-empty, right subtree empty: continue left
		else if (pr == null)
			p = pl;
		// both subtrees exist: use the key's Comparable ordering, if any, to pick a side
		else if ((kc != null ||
				  (kc = comparableClassFor(k)) != null) &&
				 (dir = compareComparables(kc, k, pk)) != 0)
			p = (dir < 0) ? pl : pr;
		// otherwise search the right subtree recursively and return a hit
		else if ((q = pr.find(h, k, kc)) != null)
			return q;
		// nothing in the right subtree: continue with the left child
		else
			p = pl;
	} while (p != null);
	return null;
}

/**
 * Calls find for root node.
 */
final TreeNode<K,V> getTreeNode(int h, Object k) {
	return ((parent != null) ? root() : this).find(h, k, null);
}
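The tree path is hard to observe from outside, but it can be exercised by forcing every key into one bucket. In the sketch below, Key is a deliberately bad, hypothetical key type whose hashCode is constant; it implements Comparable so that the compareComparables branch of find has an ordering to work with. Whether a given bucket is actually treeified is an internal detail that this code does not inspect.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: all keys collide, yet every lookup still succeeds.
class CollidingKeySketch {
    // Hypothetical key type with a constant hashCode, so all instances share one bucket.
    static final class Key implements Comparable<Key> {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int compareTo(Key other) { return Integer.compare(id, other.id); }
    }

    // Inserts n colliding keys and counts how many are found again with the right value.
    static int countHits(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), i);
        }
        int hits = 0;
        for (int i = 0; i < n; i++) {
            Integer v = map.get(new Key(i));
            if (v != null && v == i) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(1000));
    }
}
```

Correctness never depends on hash quality; a poor hashCode only costs time, and the tree bins introduced in JDK 1.8 limit how much.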

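5. The remove operation

The remove function works roughly as follows:

Hash the key's hashCode(), then compute the index (note: hash & (length - 1) keeps the index within the bounds of the array);
If the bucket has nodes, check whether the first node matches the key; if so, record it as the node to remove;
If the first node does not match and it has a successor, keep searching;
If the bucket's first node is a red-black tree node, search the tree by hash and key;
Otherwise the bucket is a linked list, so walk the list to find the key;
If a matching node was found, unlink it from the tree or the list, update modCount and size, and return it.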

/**
 * Removes the mapping for the specified key from this map if present.
 */
public V remove(Object key) {
	Node<K,V> e;
	return (e = removeNode(hash(key), key, null, false, true)) == null ?
		null : e.value;
}

/**
 * Implements Map.remove and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
						   boolean matchValue, boolean movable) {
	Node<K,V>[] tab; Node<K,V> p; int n, index;
	// search only if the bucket for this hash holds a node
	if ((tab = table) != null && (n = tab.length) > 0 &&
		(p = tab[index = (n - 1) & hash]) != null) {
		Node<K,V> node = null, e; K k; V v;
		// the first node of the bucket matches the key: record it
		if (p.hash == hash &&
			((k = p.key) == key || (key != null && key.equals(k))))
			node = p;
		// otherwise, if the first node has a successor, search from there
		else if ((e = p.next) != null) {
			// the bucket is a red-black tree: search the tree
			if (p instanceof TreeNode)
				node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
			// the bucket is a linked list: walk it
			else {
				do {
					if (e.hash == hash &&
						((k = e.key) == key ||
						 (key != null && key.equals(k)))) {
						node = e;
						break;
					}
					p = e;
				} while ((e = e.next) != null);
			}
		}
		// remove the node and return it
		if (node != null && (!matchValue || (v = node.value) == value ||
							 (value != null && value.equals(v)))) {
			// the node is a red-black tree node: remove it from the tree
			if (node instanceof TreeNode)
				((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
			// the node is the first node of the list: the bucket now points to its successor
			else if (node == p)
				tab[index] = node.next;
			// otherwise p is the node's predecessor: link it to the node's successor
			else
				p.next = node.next;
			++modCount;
			--size;
			afterNodeRemoval(node);
			return node;
		}
	}
	return null;
}

Note: while iterating over a map, do not remove elements by calling map.remove(key) directly; doing so will typically cause a java.util.ConcurrentModificationException, as in the following:

Iterator<Map.Entry<String, Integer>> iter = map.entrySet().iterator();
Map.Entry<String, Integer> entry;
while(iter.hasNext()) {
	entry = iter.next();
	System.out.println(entry.getKey() + " : " + entry.getValue());
	map.remove(entry.getKey()); // modifies the map behind the iterator's back
}

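Looking at the source, when the iterator is created it stores the map's modCount in the field expectedModCount. When the first element is removed directly through the map, modCount is incremented and no longer equals expectedModCount. On the second call to iter.next() in the loop, the iterator sees that modCount and expectedModCount differ, concludes that the map was modified elsewhere, and throws ConcurrentModificationException.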

abstract class HashIterator {
	Node<K,V> next;        // next entry to return
	Node<K,V> current;     // current entry
	int expectedModCount;  // for fast-fail
	int index;             // current slot

	HashIterator() {
		expectedModCount = modCount;
		Node<K,V>[] t = table;
		current = next = null;
		index = 0;
		if (t != null && size > 0) { // advance to first entry
			do {} while (index < t.length && (next = t[index++]) == null);
		}
	}

	public final boolean hasNext() {
		return next != null;
	}

	final Node<K,V> nextNode() {
		Node<K,V>[] t;
		Node<K,V> e = next;
		if (modCount != expectedModCount)
			throw new ConcurrentModificationException();
		if (e == null)
			throw new NoSuchElementException();
		if ((next = (current = e).next) == null && (t = table) != null) {
			do {} while (index < t.length && (next = t[index++]) == null);
		}
		return e;
	}

	public final void remove() {
		Node<K,V> p = current;
		if (p == null)
			throw new IllegalStateException();
		if (modCount != expectedModCount)
			throw new ConcurrentModificationException();
		current = null;
		K key = p.key;
		removeNode(hash(key), key, null, false, false);
		expectedModCount = modCount;
	}
}

Use the iterator itself to remove elements, as follows:

Iterator<Map.Entry<String, Integer>> iter = map.entrySet().iterator();
Map.Entry<String, Integer> entry;
while(iter.hasNext()) {
	entry = iter.next();
	System.out.println(entry.getKey() + " : " + entry.getValue());
	iter.remove();
}
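Both behaviors can be reproduced in a self-contained form. The sketch below first triggers the exception with a direct map.remove inside a for-each loop, then removes entries safely with Collection.removeIf (available since Java 8), which is a compact alternative to calling iter.remove() by hand. RemoveDuringIterationSketch is a hypothetical class name.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: unsafe versus safe removal while iterating.
class RemoveDuringIterationSketch {
    // Removes through the map while a for-each loop is iterating its key set.
    static boolean directRemoveThrows() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount without updating expectedModCount
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes every entry with an even value via removeIf and returns the remaining size.
    static int sizeAfterRemoveIf() {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 1; i <= 6; i++) {
            map.put("k" + i, i);
        }
        map.entrySet().removeIf(e -> e.getValue() % 2 == 0);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(directRemoveThrows()); // prints "true"
        System.out.println(sizeAfterRemoveIf());  // prints "3"
    }
}
```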

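6. HashMap's hash implementation

In both get and put, the index is computed in two steps: the hashCode is first passed through the hash function, and the resulting hash value is then used to compute the index.

[Figure: from hashCode to hash to bucket index; image from "Java HashMap工作原理"]

The hash of hashCode() is computed like this: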

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
	int h;
	return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This XORs the high 16 bits of the key's hashCode into its low 16 bits; the high 16 bits themselves are left unchanged.
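The effect is easiest to see with hash codes that differ only in their high bits, the case the Javadoc above warns about. The sketch below uses two made-up hash codes, 0x10000 and 0x20000, and a 16-slot table; HashSpreadSketch and its method names are hypothetical.

```java
// Illustrative sketch: bucket indexes with and without the XOR spreading step.
class HashSpreadSketch {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000; // identical low 16 bits, different high bits
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(index(h1, 16) + " " + index(h2, 16));
        // With spreading, the high bits influence the index: buckets 1 and 2.
        System.out.println(index(spread(h1), 16) + " " + index(spread(h2), 16));
    }
}
```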

References:

Java HashMap工作原理及实现

Java集合---HashMap源码剖析

HashMap的工作原理