关于hashmap的散列程度分析

背景

先从查找算法说起,我们知道常用的查找算法有顺序查找(时间复杂度为O(n)),二分查找(时间复杂度为O(log2n)),二叉查找树查找(O(longN)),有基于Hash表的算法时间复杂度只有O(1)
但是基于hash表的算法要额外维护一个hash表,这也是用空间换时间的例子吧.

什么是HashMap

hashMap可以拆分为hash和map,hash是一个函数,或者说算法,用来计算hashCode;
可以定义为:hashCode = Hash(key)
map用来存放hashCode和key,以及value,根据key取出某个值,只需要调用函数,计算出hashCode即可拿到value,所以时间复杂度为O(1);

构造散列表常用的方法

1.数字分析法
2.平方取中法
3.折叠法
4.除留余数法

处理冲突的方法:

1.开放地址法
1.线性探测法
2.二次探测法
3.伪随机探测
2.链地址法(Java集合框架中hashmap处理冲突采用的方法)

Java集合框架中Hashmap的分析

在此不拿源码作为分析,从源码中截取比较有意思的几个片段作为分析
hashMap中为什么要求table的长度为2的n次幂?
在源码中有这样说:

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

这是保证初始化的hashMap的长度为2的次幂;
而在hashmap中通过定位某一个元素,是根据它的key和key的hashcode找出value的

final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {//注意这句
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

hashMap中找table中的下标值就是用 (n - 1) & hash来确定,如果n不是2的幂-1,那么在插入的元素中,不能将其均匀的散列

测试代码:

package com.tangbaobao.baidu;

import org.junit.jupiter.api.Test;

import java.util.HashMap;
import java.util.Map;

/**
 * 哈希冲突计算
 *
 * @author 唐学俊
 * @create 2018/04/04
 **/

public class TestHashCode {
    private final int DEFAULT_CAPACITY = 64;

    /**
     * 用于统计重复
     */

    Map<Integer, Integer> map = new HashMap<>();

    @Test
    public void fun1() {
           String[] strings = {"hello", "world", "word", "nihao", "test", "main", "static","fdf","fdsfwrewr","reqq","hhh",
           "fsoopt","4424hfhg","53grtyr","5re53trt"
           };

        for (int i = 0; i < strings.length; i++) {
            this.getIndex(strings[i]);
        }
        map.forEach((x, y) -> System.out.println("hash_index:" + Integer.toBinaryString(x) + "   " + "count:" + y));
        System.out.println(map.size());
    }

    /**
     * 输出
     * @param string
     */
    public void getIndex(String string) {
        //1.获得hashCode
        int hashCode = string.hashCode();
        System.out.println("hashCode:" +Integer.toBinaryString(hashCode));

        //2.获得length
        int length_code = DEFAULT_CAPACITY - 1;
        System.out.println("length  :" + Integer.toBinaryString(length_code));
        //扰动函数
        int hash = hashCode^(hashCode>>>16);
        System.out.println("扰动函数"+Integer.toBinaryString(hash));
        int index =hash& length_code;
        //3.获得index
        System.out.println("index   :" + Integer.toBinaryString(index));
        int count = 1;
        if (map.containsKey(index)) {
            Integer integer = map.get(index);
            count = ++integer;
        }
        map.put(index, count);
        System.out.println("--------------------------------");
    }
}

测试用例:
将随机字符串插入散列表,统计其散列程度:
hashMap的大小:64,32,16
加载因子用的是0.75

hashMap大小:16,调用用扰动函数

hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011

index :1011

hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011

index :11

hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101

index :1101

hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100

index :100

hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100

index :100

hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010

index :1010

hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011

index :11

hashCode:11000101101101000
length :1111
扰动函数11000101101101001

index :1001

hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010

index :10

hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110

index :110

hashCode:11001001101101000
length :1111
扰动函数11001001101101001

index :1001

hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110

index :110

hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110

index :1110

hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110

index :110

hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001

index :1

hash_index:1 count:1
hash_index:10 count:1
hash_index:11 count:2
hash_index:100 count:2
hash_index:110 count:3
hash_index:1001 count:2
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1101 count:1
hash_index:1110 count:1
10

hashMap大小:16,不调用用扰动函数

hashCode:101111010010001100011010010
length :1111
扰动函数101111010010001110100111011

index :10

hashCode:110110000010001101110010010
length :1111
扰动函数110110000010001110101010011

index :10

hashCode:1101111100011100001010
length :1111
扰动函数1101111100011100111101

index :1010

hashCode:110001111110110011011111011
length :1111
扰动函数110001111110110000011000100

index :1011

hashCode:1101100100010010010010
length :1111
扰动函数1101100100010010100100

index :10

hashCode:1100110000010110111001
length :1111
扰动函数1100110000010110001010

index :1001

hashCode:11001010110011011100111001101110
length :1111
扰动函数11001010110011010000010010100011

index :1110

hashCode:11000101101101000
length :1111
扰动函数11000101101101001

index :1000

hashCode:11111011101101111100001010100
length :1111
扰动函数11111011101101110011100100010

index :100

hashCode:1101010101101110010011
length :1111
扰动函数1101010101101110100110

index :11

hashCode:11001001101101000
length :1111
扰动函数11001001101101001

index :1000

hashCode:10110100100101110000001000110001
length :1111
扰动函数10110100100101111011011010100110

index :1

hashCode:110000111000010000111000111111
length :1111
扰动函数110000111000010011111011011110

index :1111

hashCode:1010000100100100111010111100100
length :1111
扰动函数1010000100100100010010101110110

index :100

hashCode:10111111100000011111001000110000
length :1111
扰动函数10111111100000010100110110110001

index :0

hash_index:0 count:1
hash_index:1 count:1
hash_index:10 count:3
hash_index:11 count:1
hash_index:100 count:2
hash_index:1000 count:2
hash_index:1001 count:1
hash_index:1010 count:1
hash_index:1011 count:1
hash_index:1110 count:1
hash_index:1111 count:1
11

hashMap大小:64,调用用扰动函数

hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011

index :111011

hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011

index :10011

hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101

index :111101

hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100

index :100

hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100

index :100100

hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010

index :1010

hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011

index :100011

hashCode:11000101101101000
length :111111
扰动函数11000101101101001

index :101001

hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010

index :100010

hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110

index :100110

hashCode:11001001101101000
length :111111
扰动函数11001001101101001

index :101001

hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110

index :100110

hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110

index :11110

hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110

index :110110

hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001

index :110001

hash_index:100010 count:1
hash_index:100011 count:1
hash_index:100 count:1
hash_index:100100 count:1
hash_index:100110 count:2
hash_index:101001 count:2
hash_index:1010 count:1
hash_index:110001 count:1
hash_index:10011 count:1
hash_index:110110 count:1
hash_index:111011 count:1
hash_index:111101 count:1
hash_index:11110 count:1
13

hashMap大小:64,不调用用扰动函数

hashCode:101111010010001100011010010
length :111111
扰动函数101111010010001110100111011

index :10010

hashCode:110110000010001101110010010
length :111111
扰动函数110110000010001110101010011

index :10010

hashCode:1101111100011100001010
length :111111
扰动函数1101111100011100111101

index :1010

hashCode:110001111110110011011111011
length :111111
扰动函数110001111110110000011000100

index :111011

hashCode:1101100100010010010010
length :111111
扰动函数1101100100010010100100

index :10010

hashCode:1100110000010110111001
length :111111
扰动函数1100110000010110001010

index :111001

hashCode:11001010110011011100111001101110
length :111111
扰动函数11001010110011010000010010100011

index :101110

hashCode:11000101101101000
length :111111
扰动函数11000101101101001

index :101000

hashCode:11111011101101111100001010100
length :111111
扰动函数11111011101101110011100100010

index :10100

hashCode:1101010101101110010011
length :111111
扰动函数1101010101101110100110

index :10011

hashCode:11001001101101000
length :111111
扰动函数11001001101101001

index :101000

hashCode:10110100100101110000001000110001
length :111111
扰动函数10110100100101111011011010100110

index :110001

hashCode:110000111000010000111000111111
length :111111
扰动函数110000111000010011111011011110

index :111111

hashCode:1010000100100100111010111100100
length :111111
扰动函数1010000100100100010010101110110

index :100100

hashCode:10111111100000011111001000110000
length :111111
扰动函数10111111100000010100110110110001

index :110000

hash_index:110000 count:1
hash_index:110001 count:1
hash_index:10010 count:3
hash_index:10011 count:1
hash_index:10100 count:1
hash_index:100100 count:1
hash_index:101000 count:2
hash_index:111001 count:1
hash_index:1010 count:1
hash_index:111011 count:1
hash_index:101110 count:1
hash_index:111111 count:1
12

结论

加上扰动函数之后,散列更加均匀

猜你喜欢

转载自blog.csdn.net/tangyaya8/article/details/79817229
今日推荐