HashMap Principles (Part 1): Concepts and Underlying Architecture

HashMap<String, Integer> mapData = new HashMap<>();

Starting from this line, we will work through the source code and gradually uncover how HashMap works.

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

Two variables appear here: loadFactor and DEFAULT_LOAD_FACTOR. Judging by the names, this no-argument constructor receives no loadFactor parameter and therefore assigns the default value. But what exactly do these two variables mean, and are there others?

In fact, HashMap defines quite a few static constants and member variables. Let's take a look:

// static constants
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

static final int MAXIMUM_CAPACITY = 1 << 30;

static final float DEFAULT_LOAD_FACTOR = 0.75f;

static final int TREEIFY_THRESHOLD = 8;

static final int UNTREEIFY_THRESHOLD = 6;

static final int MIN_TREEIFY_CAPACITY = 64;
// instance fields
transient Node<K,V>[] table;

transient Set<Map.Entry<K,V>> entrySet;

transient int size;

transient int modCount;

int threshold;

final float loadFactor;

There are six static variables in total, each given an initial value and marked final, so it is more accurate to call them constants. As their names suggest, they serve as default values for the member variables and as thresholds in condition checks inside various methods.

There are also six member variables. Besides these, there is one more important concept: capacity. Below we focus on table, entrySet, capacity, size, threshold, and loadFactor, and briefly explain what each one does.

1. table variable

The table variable is HashMap's underlying data structure: it stores the key-value pairs added to the map. It is an array of Node objects, where Node is a static inner class, so the overall layout is a composite of an array and linked lists. Let's look at the Node class:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
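The chaining described above can be sketched with a minimal standalone copy of Node (the real one is a package-private nested class we cannot reference directly; the class name NodeDemo and the helper chainLength are my own illustrative names, not part of the JDK):

```java
// A minimal standalone copy of HashMap.Node, just to show how entries
// that collide in one bucket chain together through the `next` field.
public class NodeDemo {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walk a bucket's chain the same way a lookup would.
    static int chainLength(Node<?, ?> head) {
        int count = 0;
        for (Node<?, ?> n = head; n != null; n = n.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two entries whose keys hashed to the same bucket:
        Node<String, Integer> second = new Node<>(42, "21B", 25, null);
        Node<String, Integer> first = new Node<>(42, "21A", 23, second);
        System.out.println("chain length = " + chainLength(first)); // 2
    }
}
```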

To use boarding a plane as a metaphor: the ID number you bought the ticket with is the key; the seat number generated from it by the hashing algorithm is the hash; you yourself are the value; and the person sitting next to you is the next Node. One complete seat, together with its occupant, ID number, and seat number, is a Node; many such Nodes make up the Node[] table, which holds everyone on the flight.

So why use a data structure that combines an array with linked lists?

As we know, arrays and linked lists each have their strengths and weaknesses. Arrays use contiguous storage, so addressing is easy but insertion and deletion are relatively hard; linked lists use scattered storage, so addressing is relatively hard but insertion and deletion are easy. By combining the two, HashMap keeps the advantages of each and compensates for their weaknesses. Of course, when a list grows too long, JDK 8 converts it into a red-black tree; red-black trees will be explained in the later section on TreeMap.

The structure of a HashMap is illustrated in the figure below:

How should we interpret this structure?

Again, an example helps. Suppose the ticketing system is more considerate and drops check-in entirely, seating passengers by age group to make conversation easier during the trip: passengers under 20 form a group of six in seats 20A–20F, those aged 20–30 form a group of six in 21A–21F, those aged 30–40 form a group of six in 22A–22F, and those aged 40–50 form a group of six in 23A–23F.

Then if we are looking for a girl in her twenties, we know to go straight to row 21 and search from 21A onward; we should find her quickly.

Viewed as data: passengers are grouped by age (the hash value produced by the hash algorithm; different ages give different hash values, and the same age gives the same hash value), and each group occupies one slot of the table array. When another passenger with the same hash arrives, they are linked into that slot's chain through the next pointer.
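Mapping a hash value to a slot of the table is done with two bit tricks from JDK 8's HashMap source: the hash() method XORs the high 16 bits of hashCode() into the low 16 bits, and, because the capacity is a power of two, (capacity - 1) & hash picks the bucket index. Here is a sketch (the class and method names BucketIndexDemo and indexFor are mine; the bit logic matches the JDK):

```java
public class BucketIndexDemo {
    // Same spreading step as JDK 8's HashMap.hash(): fold the high 16
    // bits into the low 16 so small tables still feel the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // With a power-of-two capacity n, (n - 1) & hash is equivalent to a
    // nonnegative hash % n, but much cheaper than division.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        int capacity = 16; // default table length
        for (String key : new String[] {"20A", "21B", "22C"}) {
            System.out.println(key + " -> bucket " + indexFor(hash(key), capacity));
        }
    }
}
```

Keys whose hashes share the same low bits land in the same bucket, which is exactly when chaining kicks in.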

2. entrySet variable

The entrySet variable caches an EntrySet instance; defining it as a field ensures the set is not created anew on every call. It is a set of Map.Entry elements. Map.Entry<K, V> is an interface, and the Node class implements it, so operating on the data through entrySet actually manipulates the HashMap's Node objects.

public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es;
    return (es = entrySet) == null ? (entrySet = new EntrySet()) : es;
}
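A quick sketch of what this means in practice: the entries handed out by entrySet() are the map's own Node objects, so setValue() on an entry writes straight through to the map (EntrySetDemo is my own example class name):

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> mapData = new HashMap<>();
        mapData.put("a", 1);
        mapData.put("b", 2);

        // Each element here is one of the map's internal Node objects,
        // since Node implements Map.Entry.
        for (Map.Entry<String, Integer> e : mapData.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }

        // setValue() writes straight through to the underlying Node.
        for (Map.Entry<String, Integer> e : mapData.entrySet()) {
            e.setValue(e.getValue() + 10);
        }
        System.out.println(mapData.get("a")); // 11
    }
}
```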

3. capacity

capacity is not an actual member variable, but the concept comes up in many places in HashMap: it is the length of the table. The two constants mentioned earlier relate to it:

/**
 * The default initial capacity - MUST be a power of two.
 * (16 by default; in our metaphor, the number of regular seats on the plane.)
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1 << 30 (i.e. at most 1,073,741,824).
 * In our metaphor: in an emergency, such as evacuating people during
 * disaster relief, standing passengers are allowed (safety permitting),
 * so the plane can carry far more than its normal seat count.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

HashMap also has a resizing (expansion) mechanism, and the rule is that the capacity is always a power of 2: it may be 16, 32, and so on. How is this rule enforced?

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

This method takes a target capacity and returns the nearest power of 2 greater than or equal to it, i.e.:

cap = 2, return 2;

cap = 3, return 4;

cap = 9, return 16;

...

You can trace it with your own values. The first step is cap - 1; without it, passing in a cap that is already a power of 2 (say 4) would return twice that value instead of cap itself.

Let's compute tableSizeFor(11) line by line; by the rule above, the result should be 16.

// Step 1: n = cap - 1 = 10 (binary 00001010)
int n = cap - 1;
// Step 2: shift n right by 1 to get 5 (binary 00000101), OR it into n
// (a result bit is 1 if either operand's bit is 1): n = 15 (binary 00001111)
n |= n >>> 1;
// Step 3: shift n right by 2 to get 3 (binary 00000011), OR into n: n = 15 (00001111)
n |= n >>> 2;
// Step 4: shift n right by 4 to get 0 (binary 00000000), OR into n: n = 15 (00001111)
n |= n >>> 4;
// Step 5: shift n right by 8 to get 0, OR into n: n = 15 (00001111)
n |= n >>> 8;
// Step 6: shift n right by 16 to get 0, OR into n: n = 15 (00001111)
n |= n >>> 16;
// Step 7: return 15 + 1 = 16
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;

The final result is 16, exactly as expected. These bit operations map straight onto fast hardware instructions; code like this is easy enough to read, but not something most of us would come up with on our own.
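To double-check the mapping, here is a runnable standalone copy of tableSizeFor (the method body is taken verbatim from the JDK 8 source shown above; the wrapper class name TableSizeDemo is mine):

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 8's HashMap.tableSizeFor(): rounds cap up to the
    // nearest power of two by smearing the highest set bit of cap - 1
    // into every lower position, then adding 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(2));  // 2
        System.out.println(tableSizeFor(3));  // 4
        System.out.println(tableSizeFor(9));  // 16
        System.out.println(tableSizeFor(11)); // 16
    }
}
```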

4. size variable

The size variable records the number of key-value pairs in the map. Calls to putVal() and removeNode() cause it to change. Be careful to distinguish it from capacity.
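A short sketch of that distinction (SizeDemo is my own example class name): even with a large capacity, size only counts entries, an overwrite leaves it unchanged, and a removal decrements it.

```java
import java.util.HashMap;

public class SizeDemo {
    public static void main(String[] args) {
        // Request a capacity of 64; size is still 0 because no entries exist.
        HashMap<String, Integer> map = new HashMap<>(64);
        System.out.println(map.size()); // 0

        map.put("a", 1);
        map.put("b", 2);
        map.put("a", 3);                // overwrite: size unchanged
        System.out.println(map.size()); // 2

        map.remove("b");
        System.out.println(map.size()); // 1
    }
}
```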

5. The threshold and loadFactor variables

threshold is the critical value: as the name suggests, crossing a threshold means something has to happen. In HashMap the critical value is threshold = capacity * loadFactor, and once the number of entries exceeds it, the HashMap resizes (expands).

loadFactor is the load factor, used to measure how full the HashMap is. Its default is DEFAULT_LOAD_FACTOR, namely 0.75f, and it can be adjusted through a constructor parameter (0.75f is very reasonable, and basically nobody changes it). An example makes it easy to understand:

An exam is worth 100 points, and your parents promise to buy the computer you want if you score at least 75: the load factor is 0.75, and 75 points is the critical value. A few years later the exam is rescored out of 200 points, and now you need 150 points to get your favorite computer.
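Translating the metaphor back into the map's own numbers, the threshold arithmetic looks like this (a sketch; ThresholdDemo is my own example class name, and the constants mirror DEFAULT_INITIAL_CAPACITY and DEFAULT_LOAD_FACTOR):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR

        // threshold = capacity * loadFactor; exceeding it triggers a resize.
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12

        // After a resize the capacity doubles, and so does the threshold.
        int newCapacity = capacity * 2;
        System.out.println((int) (newCapacity * loadFactor)); // 24
    }
}
```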

Summary

This article explained the main concepts of HashMap and analyzed its underlying data structure from the source code: table is a composite structure of an array and linked lists; size records the number of key-value pairs; capacity is the HashMap's capacity, which is always a power of 2; loadFactor is the load factor that measures how full the map is; and threshold is the critical value, beyond which the map resizes.


Origin www.linuxidc.com/Linux/2019-08/160021.htm