Talking about the basic principles of HashMap

    Most Java developers are familiar with HashMap; it is one of the most frequently used collection types. I have recently read several articles about it, and although there is plenty of material online, I am recording my own understanding here to share it and to learn along the way.

     First, let's look at HashMap itself (based on JDK 1.8). We need to know what HashMap is (its data structure) and what it is used for (its features and advantages).

   First, note that in JDK 1.7 and earlier the underlying structure of HashMap is array + linked list; from JDK 1.8 onward it is array + linked list + red-black tree (added to improve query efficiency).

In Java, the two most basic data structures are the array and the linked list, and I personally feel that most other data structures are built from these two. Anyone who knows HashMap knows that its underlying implementation is array + linked list, sometimes called a chained hash table.

 

Each element of the array is a Node. A Node holds a next reference to the following node, and together these nodes form a singly linked list.
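A minimal sketch of such a node may help here. The class name SimpleNode and the helper chainLength are hypothetical names chosen for illustration; the fields loosely mirror what a bucket node needs, but this is not the JDK source.

```java
// Simplified sketch of a bucket node (illustration only, not the JDK class).
final class SimpleNode<K, V> {
    final int hash;          // cached hash of the key
    final K key;
    V value;
    SimpleNode<K, V> next;   // next node in the same bucket, or null at the tail

    SimpleNode(int hash, K key, V value, SimpleNode<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    // Counts the nodes reachable from head by following the next references.
    static int chainLength(SimpleNode<?, ?> head) {
        int n = 0;
        for (SimpleNode<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}
```

The array then simply holds the head node of each chain, and an empty slot holds null.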

   When we put an element, its index (subscript) in the array is computed from the hash value of the key, and the element is placed at that position. What if that position already holds a value, and why would it? Different keys can map to the same index (a hash collision); we will talk about this in the next section. If the position already holds a value, the new element is linked into the list at that position. JDK 1.7 inserts it at the head of the chain, so the earliest element ends up at the tail; JDK 1.8 appends it at the tail instead.

   When we get an element, we first compute the hashCode of the key, find the corresponding index in the array, and then walk the chain at that index, using the key's equals method to find the required value. When each linked list holds only one element this is very efficient, but when the lists hold many values every lookup degrades toward O(n). This is one of the optimizations made in JDK 1.8: when a list reaches a certain length (8 nodes, provided the table capacity is at least 64), it is converted into a red-black tree with O(log n) lookups.
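The put and get flow described above can be sketched as follows. This is a deliberately reduced illustration: TinyChainedMap is a hypothetical class with a fixed table of 16 slots, tail insertion, and no resizing or tree conversion, so it is not how java.util.HashMap is actually written.

```java
import java.util.Objects;

// A tiny fixed-size chained hash map (illustration only: no resizing, no trees).
final class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Mixes the high 16 bits of hashCode into the low 16 bits.
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was absent.
    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;      // slot index
        Node<K, V> e = table[i];
        if (e == null) {                        // empty slot: the node becomes the head
            table[i] = new Node<>(hash, key, value);
            return null;
        }
        while (true) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;                // same key: replace the value
                e.value = value;
                return old;
            }
            if (e.next == null) {               // end of chain: append at the tail
                e.next = new Node<>(hash, key, value);
                return null;
            }
            e = e.next;
        }
    }

    // Walks the chain in the key's slot and compares with equals.
    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With Integer keys, 1 and 17 both map to slot 1 of a 16-slot table, so they share one chain and get has to use equals to tell them apart.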

 Next, let's talk about the hash algorithm. The index is derived from the hash value of the key, so how is it computed? Since the index is calculated from a hash, different keys can end up at the same index, so there will be conflicts. What we need to do is spread the data across the array as evenly as possible. This keeps each linked list short and improves the efficiency of HashMap.

The index is computed as (n - 1) & hash, where n is the array length. It helps to look at n - 1 in binary, for example 15 -> 1111: a bit that is 1 in the mask lets the corresponding bit of the hash through, so every position can receive a value. Some people may ask why 1 has to be subtracted, and this is the key point. For a length of 16 the subscript range is 0~15, which is exactly what masking with 1111 produces. Without the subtraction the mask would be 0001 0000, so the result could only be 0 or 16: positions such as 0001, 0010, 0011, 0101, 1001, 1011 and 1101 would never hold a value, and 16 is outside the array. A lot of space would be wasted, and the few usable linked lists would hold a lot of data, lowering query efficiency.

^ Bitwise XOR: the result bit is 1 where the two bits differ, otherwise it is 0.
>>> Unsigned right shift: the vacated high (left) bits are filled with 0.

The main purpose of XOR is to better retain the characteristics of each part. If the & operation were used instead, the result bits would lean toward 0, and with the | operation they would lean toward 1.
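To make the two operators concrete, here is a small sketch of how a hash code can be spread and then reduced to a slot index. JDK 1.8 combines them as h ^ (h >>> 16); the class name HashSpread and the method names spread and indexFor are hypothetical names used only for this example.

```java
// Sketch of hash spreading and slot selection (names are illustrative).
final class HashSpread {
    // XOR the high 16 bits into the low 16 bits so they influence the index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Slot index for a table of length n, where n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000;   // differs from b only in bits above the low 16
        int b = 0x20000;

        // Without spreading, both hash codes fall into slot 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));

        // With spreading, the high bits reach the index and the keys separate.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));

        // >>> fills the high bits with 0, while >> copies the sign bit.
        System.out.println((-1 >>> 28) + " " + (-1 >> 28));
    }
}
```

Because a small table only looks at the lowest bits of the hash, keys whose hash codes differ only in the upper bits would otherwise always collide.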

Why the slot must be 2^n

Reasons: 1. It makes the hashed result more uniform. 2. The index can be calculated with a bit operation, e.hash & (newCap - 1). For a non-negative a, a % (2^n) is equivalent to a & (2^n - 1), and the bit operation is more efficient than the arithmetic one, because arithmetic operations are themselves ultimately carried out as bit operations while & is a single step.
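The equivalence can be checked with a short sketch. The class and method names below are hypothetical. One detail worth noting: Java's % operator returns a negative result for a negative left operand, so the comparison uses Math.floorMod, which matches the mask for negative hashes as well.

```java
// Compares mask-based and modulo-based slot selection (illustrative names).
final class PowerOfTwoIndex {
    // True when n is a positive power of two (exactly one bit set).
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Slot index via bit mask; only meaningful when n is a power of two.
    static int byMask(int hash, int n) {
        return hash & (n - 1);
    }

    // Slot index via non-negative remainder.
    static int byMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    // Checks that both methods agree for every hash in [from, to].
    static boolean agree(int n, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (byMask(h, n) != byMod(h, n)) {
                return false;
            }
        }
        return true;
    }
}
```

For n = 16 the two methods agree on every hash, while for n = 17 they already disagree at hash 1, since 1 & 16 is 0.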

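To illustrate the first reason, let's continue with the example above.

If the number of slots were 17 instead of 16, the slot calculation formula would become (17 - 1) & hash. Since 16 is 0001 0000 in binary, the result can only be 0 or 16, so the other 15 slots would never be used.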
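A quick way to see this is to enumerate which slots the formula can actually produce. The sketch below uses a hypothetical helper, reachable, that applies (n - 1) & hash to the hashes 0 to 1023 and collects the distinct results.

```java
import java.util.TreeSet;

// Lists the slots that (n - 1) & hash can reach (illustrative helper).
final class ReachableSlots {
    static TreeSet<Integer> reachable(int n) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1024; hash++) {
            seen.add((n - 1) & hash);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(16)); // all slots from 0 to 15
        System.out.println(reachable(17)); // only slots 0 and 16
    }
}
```

With 16 slots every index is reachable; with 17 slots all entries pile up in just two chains.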

After calculating the slot (subscript), let's talk about how expansion works.

Expansion happens when the critical value (threshold) is reached, and its purpose is to keep HashMap queries efficient. As the amount of data grows, hash collisions increase: each slot accumulates more nodes, and every lookup has to traverse them. The capacity therefore has to be expanded to keep the chains short.

1. When to expand?

JDK 1.7:   a. When a new value is stored, the current size must be greater than or equal to the threshold (capacity n [default 16] * load factor [default 0.75] = 12), and the slot the new value maps to must already be occupied (a hash collision).

             b. Because of the second condition, it is possible to store more than 16 values (up to 26) without any expansion. Suppose the first 11 values all collide in one slot, and the next 15 values each land in one of the 15 remaining empty slots. That is 11 + 15 = 26 values with no expansion. When the 27th value comes in it must collide with an occupied slot, so expansion occurs.

JDK 1.8:   a. If the size after the insertion is greater than the threshold, the capacity is expanded. Note: if the put does not add a new entry but replaces the value of an existing key, no expansion occurs.
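The two rules can be written down as small predicates and the 26-value case replayed with them. This is a sketch under assumptions: the checks are taken to have the shape "size >= threshold and slot occupied" for JDK 1.7 and "size after insert > threshold" for JDK 1.8, and the class and method names are hypothetical rather than JDK API.

```java
// Replays the resize conditions described above (illustrative, not JDK code).
final class ResizeRule {
    // Assumed JDK 1.7 shape: checked before the new entry is linked in.
    static boolean jdk7ShouldResize(int size, int threshold, boolean slotOccupied) {
        return size >= threshold && slotOccupied;
    }

    // Assumed JDK 1.8 shape: checked after the new entry has been linked in.
    static boolean jdk8ShouldResize(int sizeAfterInsert, int threshold) {
        return sizeAfterInsert > threshold;
    }

    // Returns how many entries fit before the first JDK 1.7 resize in the
    // worst case: 11 colliding entries, then one entry per remaining slot.
    static int jdk7MaxBeforeResize() {
        int capacity = 16;
        int threshold = 12;                 // 16 * 0.75
        int size = 0;
        int[] slotCount = new int[capacity];

        for (int k = 0; k < 11; k++) {      // 11 entries, all in slot 0
            if (jdk7ShouldResize(size, threshold, slotCount[0] > 0)) {
                return size;
            }
            slotCount[0]++;
            size++;
        }
        for (int s = 1; s < capacity; s++) { // one entry in each empty slot
            if (jdk7ShouldResize(size, threshold, slotCount[s] > 0)) {
                return size;
            }
            slotCount[s]++;
            size++;
        }
        // Every slot is now occupied, so the next insert must trigger a resize.
        return jdk7ShouldResize(size, threshold, slotCount[0] > 0) ? size : -1;
    }
}
```

Under the JDK 1.8 rule the same table would already expand when the 13th entry is added, regardless of which slot it lands in.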

 

2. How to expand?

 Steps: 1. Create a new, empty Entry array (a Node array in JDK 1.8) whose length is twice that of the original array.

           2. Rehash: traverse the original array and rehash all the entries into the new array. Why rehash? Because the index is (length - 1) & hash, so once the length changes, the same hash can map to a different index.

When rehashing in JDK 1.7, entries are migrated from the old linked list to the new one by head insertion, so elements that land in the same slot of the new table end up in reverse order. JDK 1.8 does not reverse them. In both versions, elements on the same chain in the old array may be placed at different positions in the new array after the index is recalculated; in JDK 1.8 each element either stays at its old index or moves to old index + old capacity.
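Because the capacity doubles, the new index differs from the old one in at most one bit, the bit whose value equals the old capacity. The sketch below shows that decision with a hypothetical helper, newIndex; it illustrates the idea and does not reproduce the JDK's resize method.

```java
// Sketch of where an entry goes when the table doubles (illustrative names).
final class ResizeSplit {
    // If the bit worth oldCap is 0 the entry keeps its index,
    // otherwise it moves up by exactly oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        // Hashes 5 and 21 share slot 5 in a 16-slot table.
        System.out.println((5 & 15) + " " + (21 & 15));
        // After doubling to 32 slots, 5 stays at 5 and 21 moves to 5 + 16.
        System.out.println(newIndex(5, 16) + " " + newIndex(21, 16));
    }
}
```

The result always equals hash & (2 * oldCap - 1), so a chain can be split into a "stay" list and a "move" list in one pass, which keeps the relative order of the elements.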

I recommend this blog post by an expert: https://blog.csdn.net/pange1991/article/details/82347284 . I personally think its examples are written very well.

 


Origin blog.csdn.net/u010200793/article/details/105161493