Map Source Code Analysis: ConcurrentHashMap (JDK 1.7)

Copyright notice: Thank you for reading. Discussion and corrections are welcome. This article may be freely reproduced, but please credit the author and source: https://blog.csdn.net/qq_39470742/article/details/89135691

Map Source Code Analysis: HashMap
Map Source Code Analysis: HashMap's Red-Black Tree
Map Source Code Analysis: HashMap Supplement: Collection Views, Iterators, compute, merge, replace
Map Source Code Analysis: LinkedHashMap
Map Source Code Analysis: TreeMap
Map Source Code Analysis: HashTable
Map Source Code Analysis: ConcurrentHashMap (JDK 1.8), Part 1
Map Source Code Analysis: ConcurrentHashMap (JDK 1.8), Part 2

The source of ConcurrentHashMap changed substantially between JDK 1.7 and JDK 1.8. This article analyzes the JDK 1.7 implementation.

I. Overview

In JDK 1.7, ConcurrentHashMap guarantees thread safety through segment locking: it supports fully concurrent reads and a configurable level of concurrency for writes.
ConcurrentHashMap is weakly consistent: a read reflects some recent state, not necessarily the state at the moment the read began. While a write is in flight, other threads may observe an inconsistent view, but that view is always either a past or a future consistent state, never a corrupt one (no infinite loops, no lost data, and so on). ConcurrentHashMap cannot return the values of multiple keys as of one absolute instant, but it does guarantee that writes are applied correctly and that reads never observe a corrupt state, which satisfies most concurrent scenarios.
ConcurrentHashMap maintains a segments array, each Segment maintains a HashEntry array, and each HashEntry can head a linked list of nodes. The lock granularity is a single segment, hence the name segment locking: nodes within the same segment cannot be written concurrently, but nodes in different segments can.
ConcurrentHashMap performs its low-level reads and writes through Unsafe.
A simplified view of its data structure is shown below: it consists of the segments array + table arrays + linked lists. The segments array and the table arrays all have power-of-two lengths. To locate a node, the high bits of the key's hash determine its index in the segments array, the low bits determine its index in the table array, and linked lists resolve hash collisions.
(Diagram: segments array, per-segment table arrays, and linked lists)
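The two-level routing described above can be sketched as follows. The shift and mask values assume the default 16 segments (segmentShift = 28, segmentMask = 15) and a 4-slot per-segment table; they are illustrative values, not read from a real map instance.

```java
// Sketch (not the JDK source): how a key's hash is routed to a segment
// and then to a bucket inside that segment's table.
public class IndexDemo {
    static final int SEGMENT_SHIFT = 28; // 32 - log2(16 segments)
    static final int SEGMENT_MASK = 15;  // 16 - 1

    // The segment index comes from the HIGH bits of the hash.
    static int segmentIndex(int hash) {
        return (hash >>> SEGMENT_SHIFT) & SEGMENT_MASK;
    }

    // The bucket index within a segment's table comes from the LOW bits.
    static int tableIndex(int hash, int tableLength) {
        return hash & (tableLength - 1); // tableLength is a power of two
    }

    public static void main(String[] args) {
        int h = 0xABCD1234;
        System.out.println(segmentIndex(h));  // top 4 bits 0xA -> 10
        System.out.println(tableIndex(h, 4)); // low 2 bits of 0x34 -> 0
    }
}
```

Because both lengths are powers of two, `& (length - 1)` is equivalent to a modulo, which is why the class insists on power-of-two sizing.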

II. Fields

1. Constants

/**
 * The default initial capacity for this table,
 * used when not otherwise specified in a constructor.
 */
//default table capacity
static final int DEFAULT_INITIAL_CAPACITY = 16;

/**
 * The default load factor for this table, used when not
 * otherwise specified in a constructor.
 */
 //default load factor for the table
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The default concurrency level for this table, used when not
 * otherwise specified in a constructor.
 */
 //default concurrency level, i.e. the length of the segments array
static final int DEFAULT_CONCURRENCY_LEVEL = 16;

/**
 * The maximum capacity, used if a higher value is implicitly
 * specified by either of the constructors with arguments.  MUST
 * be a power of two <= 1<<30 to ensure that entries are indexable
 * using ints.
 */
 //maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The minimum capacity for per-segment tables.  Must be a power
 * of two, at least two to avoid immediate resizing on next use
 * after lazy construction.
 */
 //minimum length of each segment's table
static final int MIN_SEGMENT_TABLE_CAPACITY = 2;

/**
 * The maximum number of segments to allow; used to bound
 * constructor arguments. Must be power of two less than 1 << 24.
 */
 //maximum number of segments
static final int MAX_SEGMENTS = 1 << 16; // slightly conservative

/**
 * Number of unsynchronized retries in size and containsValue
 * methods before resorting to locking. This is used to avoid
 * unbounded retries if tables undergo continuous modification
 * which would make it impossible to obtain an accurate result.
 */
 //maximum number of unlocked retries in size() and containsValue(): both methods first try to compute a result without locking, retry on failure, and fall back to locking only after repeated failures
static final int RETRIES_BEFORE_LOCK = 2;

2. Instance fields

/**
 * Mask value for indexing into segments. The upper bits of a
 * key's hash code are used to choose the segment.
 */
 //segment mask: segments array length - 1
final int segmentMask;

/**
 * Shift value for indexing within segments.
 */
 //segment shift: the high (32 - segmentShift) bits of a key's hash, ANDed with segmentMask, select the node's index in the segments array
final int segmentShift;

/**
 * The segments, each of which is a specialized hash table.
 */
final Segment<K,V>[] segments;

transient Set<K> keySet;
transient Set<Map.Entry<K,V>> entrySet;
transient Collection<V> values;

III. Inner Classes

1. HashEntry

final int hash;
final K key;
volatile V value;
volatile HashEntry<K,V> next;

HashEntry represents a node of the ConcurrentHashMap and is essentially the same as the node classes of the other Map implementations.

2. Segment

/**
 * The maximum number of times to tryLock in a prescan before
 * possibly blocking on acquire in preparation for a locked
 * segment operation. On multiprocessors, using a bounded
 * number of retries maintains cache acquired while locating
 * nodes.
 */
 //maximum number of tryLock retries during the pre-scan
static final int MAX_SCAN_RETRIES =
    Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

/**
 * The per-segment table. Elements are accessed via
 * entryAt/setEntryAt providing volatile semantics.
 */
 //the table array
transient volatile HashEntry<K,V>[] table;

/**
 * The number of elements. Accessed only either within locks
 * or among other volatile reads that maintain visibility.
 */
 //element count
transient int count;

/**
 * The total number of mutative operations in this segment.
 * Even though this may overflows 32 bits, it provides
 * sufficient accuracy for stability checks in CHM isEmpty()
 * and size() methods.  Accessed only either within locks or
 * among other volatile reads that maintain visibility.
 */
 //modification count
transient int modCount;

/**
 * The table is rehashed when its size exceeds this threshold.
 * (The value of this field is always <tt>(int)(capacity *
 * loadFactor)</tt>.)
 */
 //resize threshold
transient int threshold;

/**
 * The load factor for the hash table.  Even though this value
 * is same for all segments, it is replicated to avoid needing
 * links to outer object.
 * @serial
 */
 //load factor
final float loadFactor;

Note that Segment extends ReentrantLock, so every segment is itself a lock.

IV. Constructors

ConcurrentHashMap has five constructors. We analyze ConcurrentHashMap#ConcurrentHashMap(int, float, int); the other four simply delegate to it with default arguments.

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    //determine the segment count and shift: the smallest power of two >= concurrencyLevel
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    //determine each segment's table length: the smallest power of two >= ceil(initialCapacity / ssize)
    while (cap < c)
        cap <<= 1;
    // create segments and segments[0]
    //initialize segments and segments[0]
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}
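To make the sizing arithmetic concrete, the sketch below replays the constructor's computation for sample arguments initialCapacity = 33 and concurrencyLevel = 10. The class and method names are invented for illustration; only the arithmetic mirrors the constructor.

```java
// Sketch: replays the constructor's power-of-two sizing for sample arguments.
public class SizingDemo {
    // Returns { ssize, segmentShift, segmentMask, per-segment cap }.
    static int[] size(int initialCapacity, int concurrencyLevel) {
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity) ++c;  // round up
        int cap = 2;                           // MIN_SEGMENT_TABLE_CAPACITY
        while (cap < c) cap <<= 1;
        return new int[] { ssize, 32 - sshift, ssize - 1, cap };
    }

    public static void main(String[] args) {
        int[] r = size(33, 10);
        // 16 segments, segmentShift = 28, segmentMask = 15, table length 4
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```

So a request for capacity 33 with concurrency level 10 is rounded up to 16 segments of 4 slots each (total 64 slots), the smallest power-of-two layout that satisfies both arguments.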

V. ConcurrentHashMap#get(Object)

public V get(Object key) {
   Segment<K,V> s; // manually integrate access methods to reduce overhead
   HashEntry<K,V>[] tab;
   int h = hash(key);
   long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
   if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
       (tab = s.table) != null) {
       for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
            e != null; e = e.next) {
           K k;
           if ((k = e.key) == key || (e.hash == h && key.equals(k)))
               return e.value;
       }
   }
   return null;
}

The logic of get is simple: locate the index in the segments array, then the index in the table array, then traverse the linked list.

VI. ConcurrentHashMap#put(K, V)

1. ConcurrentHashMap#put(K, V)

public V put(K key, V value) {
   Segment<K,V> s;
   if (value == null)
       throw new NullPointerException();
   int hash = hash(key);
   int j = (hash >>> segmentShift) & segmentMask;
   if ((s = (Segment<K,V>)UNSAFE.getObject          // nonvolatile; recheck
        (segments, (j << SSHIFT) + SBASE)) == null) //  in ensureSegment
       s = ensureSegment(j);
   return s.put(key, hash, value, false);
}

First locate the segment, creating it with ensureSegment if it does not exist, then delegate to Segment's put method.

2. ConcurrentHashMap#ensureSegment(int)

private Segment<K,V> ensureSegment(int k) {
   final Segment<K,V>[] ss = this.segments;
   long u = (k << SSHIFT) + SBASE; // raw offset
   Segment<K,V> seg;
   if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
       Segment<K,V> proto = ss[0]; // use segment 0 as prototype
       int cap = proto.table.length;
       float lf = proto.loadFactor;
       int threshold = (int)(cap * lf);
       HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
       if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
           == null) { // recheck
           Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
           while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
                  == null) {
               if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
                   break;
           }
       }
   }
   return seg;
}

This method creates and returns the segment at the given index. The new segment's table length and load factor are copied from segments[0], which is why the constructor creates segments[0] together with the segments array: later code never has to handle creating the very first segment from scratch.
The new segment is published into the array with CAS; the method returns once the CAS succeeds or another thread has already installed a segment at that index.
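The create-then-CAS publication pattern above can be sketched with AtomicReferenceArray standing in for the Unsafe volatile reads and compareAndSwapObject calls. This is an illustrative model, not the JDK code; the class and method names are invented.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of lazy slot initialization: build a candidate, then publish it
// with CAS; if another thread wins the race, discard ours and use theirs.
public class EnsureDemo {
    static <T> T ensure(AtomicReferenceArray<T> slots, int i, T fresh) {
        T cur = slots.get(i);                  // volatile read (recheck)
        while (cur == null) {
            if (slots.compareAndSet(i, null, fresh))
                return fresh;                  // this thread published the slot
            cur = slots.get(i);                // another thread won the race
        }
        return cur;                            // existing value wins
    }

    public static void main(String[] args) {
        AtomicReferenceArray<String> segments = new AtomicReferenceArray<>(16);
        System.out.println(ensure(segments, 3, "first"));  // publishes "first"
        System.out.println(ensure(segments, 3, "second")); // still "first"
    }
}
```

The key property: every caller returns the same segment instance for a given index, with no lock taken on this path.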

3. Segment#put(K, int, V, boolean)

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                //a node with this key already exists; onlyIfAbsent decides whether to overwrite
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            //no node with this key exists
            else {
                //node was already built (by scanAndLockForPut): link and insert it
                if (node != null)
                    node.setNext(first);
                //node not built yet: create and insert it
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

As the code shows, all work inside a segment happens under its lock. Once the lock is held, the logic is no different from HashMap's: locate the bucket index, insert the node, and check whether a resize is needed.
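At the public API level, the onlyIfAbsent flag is what distinguishes put from putIfAbsent, as this small usage example shows:

```java
import java.util.concurrent.ConcurrentHashMap;

// Usage: put overwrites an existing mapping (onlyIfAbsent = false),
// putIfAbsent keeps the existing value (onlyIfAbsent = true).
public class PutDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> m = new ConcurrentHashMap<>();
        m.put("a", 1);
        m.putIfAbsent("a", 2);          // no effect: "a" is already present
        m.put("a", 3);                  // overwrites
        System.out.println(m.get("a")); // 3
    }
}
```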

4. Segment#scanAndLockForPut(K, int, V)

private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    HashEntry<K,V> first = entryForHash(this, hash);
    HashEntry<K,V> e = first;
    HashEntry<K,V> node = null;
    int retries = -1; // negative while locating node
    while (!tryLock()) {
        HashEntry<K,V> f; // to recheck first below
        if (retries < 0) {
            if (e == null) {
                if (node == null) // speculatively create node
                    node = new HashEntry<K,V>(hash, key, value, null);
                retries = 0;
            }
            else if (key.equals(e.key))
                retries = 0;
            else
                e = e.next;
        }
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 &&
                 (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}

This method does useful pre-processing while waiting for the lock: it speculatively creates the node and watches the bucket's list for changes.
(1) Traverse the list looking for the key's node; if none is found, create one. Either way, set the retry count to 0.
(2) Once the retry count exceeds MAX_SCAN_RETRIES, fall back to the blocking lock() and exit the loop.
(3) On every even retry, check whether the head of the list has changed; if it has, go back to (1) and re-traverse.
Note that although the method returns a newly created node when it did not find the key, that node may be stale: the method does not monitor the list continuously, and it does not reset node to null when it detects a change.
So put cannot treat a non-null node as proof that the key is absent; it still has to traverse the list to decide.
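The bounded-spin acquisition pattern at the heart of this method can be sketched as follows; the pre-scan work done between attempts is omitted, and the class and method names are invented for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: try the lock a bounded number of times, then block on lock().
public class SpinLockDemo {
    static final int MAX_SCAN_RETRIES =
        Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

    static void acquire(ReentrantLock lock) {
        int retries = 0;
        while (!lock.tryLock()) {
            // a real implementation pre-builds the node / re-checks the
            // list head here, so the spinning does useful work
            if (++retries > MAX_SCAN_RETRIES) {
                lock.lock(); // give up spinning, block until available
                return;
            }
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        acquire(lock);
        System.out.println(lock.isHeldByCurrentThread()); // true
        lock.unlock();
    }
}
```

On a multiprocessor the bounded spin keeps the cache lines touched during the pre-scan warm, which is exactly the rationale the JDK comment on MAX_SCAN_RETRIES gives.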

5. Segment#rehash(HashEntry)

private void rehash(HashEntry<K,V> node) {
    /*
     * Reclassify nodes in each list to new table.  Because we
     * are using power-of-two expansion, the elements from
     * each bin must either stay at same index, or move with a
     * power of two offset. We eliminate unnecessary node
     * creation by catching cases where old nodes can be
     * reused because their next fields won't change.
     * Statistically, at the default threshold, only about
     * one-sixth of them need cloning when a table
     * doubles. The nodes they replace will be garbage
     * collectable as soon as they are no longer referenced by
     * any reader thread that may be in the midst of
     * concurrently traversing table. Entry accesses use plain
     * array indexing because they are followed by volatile
     * table write.
     */
    HashEntry<K,V>[] oldTable = table;
    int oldCapacity = oldTable.length;
    int newCapacity = oldCapacity << 1;
    threshold = (int)(newCapacity * loadFactor);
    HashEntry<K,V>[] newTable =
        (HashEntry<K,V>[]) new HashEntry[newCapacity];
    int sizeMask = newCapacity - 1;
    for (int i = 0; i < oldCapacity ; i++) {
        HashEntry<K,V> e = oldTable[i];
        if (e != null) {
            HashEntry<K,V> next = e.next;
            int idx = e.hash & sizeMask;
            if (next == null)   //  Single node on list
                newTable[idx] = e;
            else { // Reuse consecutive sequence at same slot
                HashEntry<K,V> lastRun = e;
                int lastIdx = idx;
                for (HashEntry<K,V> last = next;
                     last != null;
                     last = last.next) {
                    int k = last.hash & sizeMask;
                    if (k != lastIdx) {
                        lastIdx = k;
                        lastRun = last;
                    }
                }
                newTable[lastIdx] = lastRun;
                // Clone remaining nodes
                for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
                    V v = p.value;
                    int h = p.hash;
                    int k = h & sizeMask;
                    HashEntry<K,V> n = newTable[k];
                    newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
                }
            }
        }
    }
    int nodeIndex = node.hash & sizeMask; // add the new node
    node.setNext(newTable[nodeIndex]);
    newTable[nodeIndex] = node;
    table = newTable;
}

This method resizes the table and then inserts the new node.
The new capacity is double the old capacity.
When copying a bucket's nodes into the new table, head insertion is used, so node order is reversed; the exception is the trailing run of nodes that all map to the same new slot, which is moved with a single assignment rather than node by node, so its order is preserved.
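The "lastRun" reuse described above can be sketched on plain hash values: find the earliest position from which every later node maps to the same new index; that suffix is reused as-is and only the nodes before it are cloned. The hashes below are sample values and the class name is invented.

```java
// Sketch of the lastRun optimization in Segment#rehash.
public class LastRunDemo {
    // Returns the index of the first node of the reusable suffix.
    static int lastRunStart(int[] hashes, int sizeMask) {
        int lastIdx = hashes[0] & sizeMask;
        int lastRun = 0;
        for (int i = 1; i < hashes.length; i++) {
            int k = hashes[i] & sizeMask;
            if (k != lastIdx) { lastIdx = k; lastRun = i; }
        }
        return lastRun; // nodes from here on keep their existing next links
    }

    public static void main(String[] args) {
        // new table length 8 (mask 7); hashes map to slots 1, 1, 5, 5, 5:
        // the suffix starting at index 2 all lands in slot 5 and is reused.
        System.out.println(lastRunStart(new int[]{1, 9, 5, 13, 13}, 7)); // 2
    }
}
```

Statistically (per the JDK comment), at the default load factor only about one sixth of the nodes need cloning when the table doubles.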

VII. ConcurrentHashMap#remove

There are two remove methods; we take ConcurrentHashMap#remove(Object) as the example.

1. ConcurrentHashMap#remove(Object)

public V remove(Object key) {
   int hash = hash(key);
    Segment<K,V> s = segmentForHash(hash);
    return s == null ? null : s.remove(key, hash, null);
}

2. Segment#remove(Object, int, Object)

final V remove(Object key, int hash, Object value) {
    if (!tryLock())
        scanAndLock(key, hash);
    V oldValue = null;
    try {
        HashEntry<K,V>[] tab = table;
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> e = entryAt(tab, index);
        HashEntry<K,V> pred = null;
        while (e != null) {
            K k;
            HashEntry<K,V> next = e.next;
            if ((k = e.key) == key ||
                (e.hash == hash && key.equals(k))) {
                V v = e.value;
                if (value == null || value == v || value.equals(v)) {
                    if (pred == null)
                        setEntryAt(tab, index, next);
                    else
                        pred.setNext(next);
                    ++modCount;
                    --count;
                    oldValue = v;
                }
                break;
            }
            pred = e;
            e = next;
        }
    } finally {
        unlock();
    }
    return oldValue;
}

3. Segment#scanAndLock(Object, int)

private void scanAndLock(Object key, int hash) {
    // similar to but simpler than scanAndLockForPut
    HashEntry<K,V> first = entryForHash(this, hash);
    HashEntry<K,V> e = first;
    int retries = -1;
    while (!tryLock()) {
        HashEntry<K,V> f;
        if (retries < 0) {
            if (e == null || key.equals(e.key))
                retries = 0;
            else
                e = e.next;
        }
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 &&
                 (f = entryForHash(this, hash)) != first) {
            e = first = f;
            retries = -1;
        }
    }
}

This is similar to the scanAndLockForPut method used for insertion, except that it does not need to create and return a node.

VIII. ConcurrentHashMap#size()

public int size() {
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
   final Segment<K,V>[] segments = this.segments;
   int size;
   boolean overflow; // true if size overflows 32 bits
   long sum;         // sum of modCounts
   long last = 0L;   // previous sum
   int retries = -1; // first iteration isn't retry
   try {
       for (;;) {
           if (retries++ == RETRIES_BEFORE_LOCK) {
               for (int j = 0; j < segments.length; ++j)
                   ensureSegment(j).lock(); // force creation
           }
           sum = 0L;
           size = 0;
           overflow = false;
           for (int j = 0; j < segments.length; ++j) {
               Segment<K,V> seg = segmentAt(segments, j);
               if (seg != null) {
                   sum += seg.modCount;
                   int c = seg.count;
                   if (c < 0 || (size += c) < 0)
                       overflow = true;
               }
           }
           if (sum == last)
               break;
           last = sum;
       }
   } finally {
       if (retries > RETRIES_BEFORE_LOCK) {
           for (int j = 0; j < segments.length; ++j)
               segmentAt(segments, j).unlock();
       }
   }
   return overflow ? Integer.MAX_VALUE : size;
}

First try without locking: sum the per-segment count and modCount values; if two consecutive passes produce the same modCount sum, return the count from that pass.
If the sums still disagree after RETRIES_BEFORE_LOCK attempts, lock every segment and compute the count under the locks.
containsValue uses the same strategy.
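The lock-free phase of this algorithm can be sketched as follows; the counts and modCounts arrays stand in for the per-segment fields, and the class and method names are invented for illustration.

```java
// Sketch of size()'s stability check: if the modCount sum is unchanged
// between two passes, the element counts read in between form a snapshot.
public class SizeDemo {
    // Returns the size, or -1 if the caller should lock all segments.
    static int lockFreeSize(int[] counts, long[] modCounts, int retriesBeforeLock) {
        long last = -1L;
        for (int attempt = 0; attempt <= retriesBeforeLock; attempt++) {
            long sum = 0;
            int size = 0;
            for (int i = 0; i < counts.length; i++) {
                sum += modCounts[i];   // detects concurrent modification
                size += counts[i];
            }
            if (sum == last)
                return size;           // two consecutive stable sums
            last = sum;
        }
        return -1;                     // unstable: fall back to locking
    }

    public static void main(String[] args) {
        // two quiescent segments holding 2 and 3 elements
        System.out.println(lockFreeSize(new int[]{2, 3}, new long[]{5, 7}, 2)); // 5
    }
}
```

The real method also tracks 32-bit overflow of the size and returns Integer.MAX_VALUE in that case, which the sketch omits.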
