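Understanding Hashing in Depth

Why use a hash function?

A hash function converts a complex object into a relatively simple index, which makes many problems easier to solve. A hash table, for example, avoids linear search. If a set of numbers is stored as a plain sequence, finding one means checking the elements one by one. If hashing is used to split the numbers into groups instead, we can quickly narrow the search to a single group, which makes lookups much faster.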
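Here is a minimal sketch of the "grouping" idea. The class name `BucketSearch` and the bucket count 10 are arbitrary choices for illustration, and keys are assumed to be non-negative. Each number goes into bucket `value % 10`, so a lookup scans one bucket instead of the whole list:

```java
import java.util.ArrayList;
import java.util.List;

// Groups non-negative integers into M buckets by value % M (illustrative sketch).
class BucketSearch {
    static final int M = 10; // arbitrary bucket count for the example

    private final List<List<Integer>> buckets = new ArrayList<>();

    BucketSearch(int[] values) {
        for (int i = 0; i < M; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int v : values) {
            buckets.get(v % M).add(v); // the "hash" of v picks its bucket
        }
    }

    // Only the one bucket that could hold target is scanned.
    boolean contains(int target) {
        for (int v : buckets.get(target % M)) {
            if (v == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BucketSearch s = new BucketSearch(new int[] {12, 37, 45, 89, 102, 7});
        System.out.println(s.contains(45)); // true  (scans bucket 5 only)
        System.out.println(s.contains(46)); // false (bucket 6 is empty)
    }
}
```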

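Hashing involves two main problems:

1. Designing the hash function, i.e. converting what we care about (the key) into an index.

2. Handling hash collisions, i.e. different keys that map to the same index, while keeping the resulting indices evenly distributed.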

Designing a hash function

Goal: convert the key into an index so that the resulting indices are evenly distributed.

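Hash functions for large integers

Taking the key modulo 1000000, however, keeps only its last six decimal digits and ignores the rest of its information, which raises the probability of collisions. The fix is to take the modulus by a prime number.

For guidance on which prime to use, see this site: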

http://planetmath.org/goodhashtableprimes
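A small illustration, assuming non-negative `long` keys. The prime 1000003 is picked only because it is close to 10^6; it is not taken from the table above, and `ModPrimeDemo` is a hypothetical name. Two keys that differ only above their last six digits collide under `% 1000000`, while `% 1000003` separates this particular pair. A prime modulus does not rule out collisions; it only tends to spread keys more evenly when they share patterns such as common trailing digits.

```java
// Compares "mod 1000000" with "mod a prime" for two sample keys (sketch).
class ModPrimeDemo {
    static long modMillion(long key) {
        return key % 1_000_000L;  // keeps only the last six decimal digits
    }

    static long modPrime(long key) {
        return key % 1_000_003L;  // 1000003 is a prime near 10^6, chosen for illustration
    }

    public static void main(String[] args) {
        long a = 1_000_123L;
        long b = 2_000_123L;      // same last six digits as a
        System.out.println(modMillion(a) + " " + modMillion(b)); // 123 123 (collision)
        System.out.println(modPrime(a) + " " + modPrime(b));     // 120 117
    }
}
```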

Hash functions for floating-point numbers
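One common approach, shown as a sketch rather than as the original author's method: a floating-point number is stored as a fixed-width bit pattern, so it can be reinterpreted as an integer and then hashed like any other integer. `DoubleBitsHash` and the modulus `M = 1000003` are hypothetical.

```java
// Sketch: hash a double by reinterpreting its 64-bit IEEE 754 pattern as a long,
// then reducing modulo a prime M (M is a hypothetical choice).
class DoubleBitsHash {
    static final long M = 1_000_003L;

    static int hash(double x) {
        long bits = Double.doubleToLongBits(x); // bit pattern of x as a long
        return (int) Math.floorMod(bits, M);    // non-negative result in [0, M)
    }

    public static void main(String[] args) {
        System.out.println(hash(3.1314451));
        System.out.println(hash(-3.1314451));
    }
}
```

`Math.floorMod` is used instead of `%` because the bit pattern of a negative double is a negative `long`, and `%` would then return a negative remainder.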

Hash functions for strings

The hash computation above can be simplified as follows:
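The original formulas are not preserved here, so the following is an assumption about what is being simplified. A string c0 c1 ... c(n-1) is treated as a number in base B, hash = (c0*B^(n-1) + ... + c(n-1)) mod M, and Horner's rule rewrites this as ((c0*B + c1)*B + c2)*B + ..., taking mod M after every step so the intermediate value never overflows. B = 31, M = 1000003 and the class name `HornerHash` are hypothetical; `naive` evaluates the unsimplified sum with `BigInteger` only to check that both forms agree.

```java
import java.math.BigInteger;

class HornerHash {
    static final long B = 31;          // hypothetical base
    static final long M = 1_000_003L;  // hypothetical prime modulus

    // Horner's rule with a modulus at every step: (((c0*B + c1)*B + c2)*B + ...) mod M
    static long hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * B + s.charAt(i)) % M; // h < M, so h * B + c fits easily in a long
        }
        return h;
    }

    // Unsimplified sum c_i * B^(n-1-i), computed exactly, then reduced mod M.
    static long naive(String s) {
        BigInteger sum = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(B);
        int n = s.length();
        for (int i = 0; i < n; i++) {
            BigInteger term = BigInteger.valueOf(s.charAt(i)).multiply(base.pow(n - 1 - i));
            sum = sum.add(term);
        }
        return sum.mod(BigInteger.valueOf(M)).longValue();
    }

    public static void main(String[] args) {
        System.out.println(hash("isea"));  // 241789
        System.out.println(naive("isea")); // 241789
    }
}
```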

Compound types
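A sketch for compound types, assuming they are hashed the same way as strings: each field plays the role of one "digit", and the fields are combined with Horner's rule modulo M. `SimpleDate`, `B = 31` and `M = 1000003` are hypothetical, and the fields are assumed to be non-negative. The Student class later in this article applies the same idea to four fields.

```java
// Hypothetical compound key: each field is one "digit" in base B, reduced mod M.
class SimpleDate {
    static final long B = 31;
    static final long M = 1_000_003L;

    final int year, month, day; // assumed non-negative

    SimpleDate(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    // ((year * B + month) * B + day) mod M, reducing after every step
    long hash() {
        long h = 0;
        h = (h * B + year) % M;
        h = (h * B + month) % M;
        h = (h * B + day) % M;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(new SimpleDate(2019, 5, 20).hash()); // 940431
    }
}
```

For example, ((2019*31 + 5)*31 + 20) = 1940434, and 1940434 mod 1000003 = 940431. Field order matters: swapping month and day gives a different hash.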

The hashCode method in Java

Integer

package com.isea.hash;

public class Main {
    public static void main(String[] args) {
        int a = 42;
        System.out.println(((Integer)a).hashCode());//42
        int b = -42;
        System.out.println(((Integer) b).hashCode());//-42
    }
}

In Java, hashCode comes from the root class Object, and Integer overrides it:

@Override
public int hashCode() {
     return Integer.hashCode(value);
}

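The method above delegates to the static hashCode method below, which simply returns the int value stored in the Integer object. That value can be positive or negative: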

    public static int hashCode(int value) {
        return value;
    }

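Integer's hashCode therefore cannot be used directly as an array index: it can be negative, or larger than the array. To map the hash value to a valid index, the hash table has to process it further, for example by applying an offset or a modulus.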
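One possible way to do that mapping, as a sketch (`IndexMapping.indexFor` is a hypothetical helper): clear the sign bit with `& 0x7fffffff`, then take the remainder modulo the table length. OpenJDK's `java.util.Hashtable` computes its bucket index with this same mask-then-mod pattern.

```java
class IndexMapping {
    // Clear the sign bit so the value is non-negative, then reduce into [0, tableLength).
    static int indexFor(int hashCode, int tableLength) {
        return (hashCode & 0x7fffffff) % tableLength;
    }

    public static void main(String[] args) {
        int n = 7; // hypothetical table length
        System.out.println(indexFor(Integer.valueOf(42).hashCode(), n));  // 0
        System.out.println(indexFor(Integer.valueOf(-42).hashCode(), n)); // 2
    }
}
```

`Math.abs(hashCode) % n` looks simpler, but it can yield a negative index for `Integer.MIN_VALUE`, because `Math.abs(Integer.MIN_VALUE)` is still `Integer.MIN_VALUE`.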

Double

package com.isea.hash;

public class Main {
    public static void main(String[] args) {
        double c = 3.1314451;
        System.out.println(((Double) c).hashCode());//1453640796

        double d = -3.1314451;
        System.out.println(((Double) d).hashCode());//-693842852
    }
}

@Override
public int hashCode() {
    return Double.hashCode(value);
}

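The instance method above calls the static hashCode below. It takes the 64-bit IEEE 754 representation of the double and XORs the high 32 bits into the low 32 bits, returning an int that can be positive or negative. doubleToLongBits first collapses every NaN into one canonical bit pattern: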

    public static int hashCode(double value) {
        long bits = doubleToLongBits(value);
        return (int)(bits ^ (bits >>> 32));
    }

    public static long doubleToLongBits(double value) {
        long result = doubleToRawLongBits(value);
        // Check for NaN based on values of bit fields, maximum
        // exponent and nonzero significand.
        if ( ((result & DoubleConsts.EXP_BIT_MASK) ==
              DoubleConsts.EXP_BIT_MASK) &&
             (result & DoubleConsts.SIGNIF_BIT_MASK) != 0L)
            result = 0x7ff8000000000000L;
        return result;
    }
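To check the description above, this sketch re-implements the XOR fold and compares it with the JDK's `Double.hashCode(double)` (available since Java 8). `DoubleHashCheck` is a hypothetical class name. It also shows two consequences of hashing the bit pattern: `0.0` and `-0.0` hash differently, and every NaN hashes the same because `doubleToLongBits` canonicalizes them.

```java
class DoubleHashCheck {
    // Same steps as Double.hashCode(double): take the canonical 64-bit pattern
    // and XOR the high 32 bits onto the low 32 bits.
    static int manualHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        double[] samples = {3.1314451, -3.1314451, 0.0, -0.0, Double.NaN};
        for (double d : samples) {
            System.out.println(d + ": " + manualHash(d) + " vs " + Double.hashCode(d));
        }
        // A NaN with a different raw bit pattern still hashes like Double.NaN.
        double otherNaN = Double.longBitsToDouble(0x7ff0000000000001L);
        System.out.println(manualHash(otherNaN) == manualHash(Double.NaN)); // true
    }
}
```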

String

package com.isea.hash;

public class Main {
    public static void main(String[] args) {
        String s = "isea";
        System.out.println(s.hashCode());//3241798

        String ss = "isea you";
        System.out.println(ss.hashCode());//277398597
    }
}

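String's hashCode method (JDK 8 source). It evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic (overflow simply wraps) using Horner's rule, and caches the result in the hash field: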

/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
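A quick check of the recurrence `h = 31 * h + c`, as a sketch (`StringHashCheck` is a hypothetical name). Because the formula is specified in the `String.hashCode` Javadoc, the manual loop should match `String.hashCode()` on any JDK, including later versions that store strings as bytes internally. By hand for "isea": 105, then 105*31 + 115 = 3370, then 3370*31 + 101 = 104571, then 104571*31 + 97 = 3241798.

```java
class StringHashCheck {
    // Same recurrence as String.hashCode: h = 31 * h + c, with int overflow wrapping.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("isea"));     // 3241798
        System.out.println(manualHash("isea you")); // 277398597 (overflow wraps along the way)
    }
}
```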

Custom data types

package com.isea.hash;

import java.util.Objects;

public class Student {
    private int grade;
    private int cls;
    private String firstName;
    private String lastName;

    public Student(int grade, int cls, String firstName, String lastName) {
        this.grade = grade;
        this.cls = cls;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        return grade == student.grade &&
                cls == student.cls &&
                Objects.equals(firstName, student.firstName) &&
                Objects.equals(lastName, student.lastName);
    }

/*    @Override
    public int hashCode() {
        return Objects.hash(grade, cls, firstName, lastName);
    }*/

    @Override
    public int hashCode() {
        int B = 31;
        int hash = 0;
        hash = hash * B + grade;
        hash = hash * B + cls;
        hash = hash * B + firstName.toLowerCase().hashCode();
        hash = hash * B + lastName.toLowerCase().hashCode();

        return hash;
    }
}

Main method:

package com.isea.hash;

public class Main {
    public static void main(String[] args) {
        Student isea = new Student(3, 2, "isea", "andy");
        System.out.println(isea.hashCode());//103585691
    }
}
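The two hashCode variants in Student produce different values for the same fields, because `Objects.hash` (which delegates to `Arrays.hashCode`) starts its accumulator at 1 while the hand-written version starts at 0. This sketch recomputes both for (3, 2, "isea", "andy") without the Student class; `StudentHashCompare` is a hypothetical name.

```java
import java.util.Objects;

class StudentHashCompare {
    // Mirrors the active Student.hashCode: start at 0, base 31, lower-cased names.
    static int custom(int grade, int cls, String first, String last) {
        int hash = 0;
        hash = hash * 31 + grade;
        hash = hash * 31 + cls;
        hash = hash * 31 + first.toLowerCase().hashCode();
        hash = hash * 31 + last.toLowerCase().hashCode();
        return hash;
    }

    // Mirrors the commented-out variant: Objects.hash starts at 1 instead of 0.
    static int viaObjectsHash(int grade, int cls, String first, String last) {
        return Objects.hash(grade, cls, first, last);
    }

    public static void main(String[] args) {
        System.out.println(custom(3, 2, "isea", "andy"));         // 103585691
        System.out.println(viaObjectsHash(3, 2, "isea", "andy")); // 104509212
    }
}
```

The two results differ by exactly 31^4 = 923521: the extra starting value 1 in `Objects.hash` is multiplied by 31 once for each of the four fields.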

Test 2:

package com.isea.hash;

import java.util.HashSet;

public class Main {
    public static void main(String[] args) {
        Student isea = new Student(3, 2, "isea", "andy");
        System.out.println(isea.hashCode());//103585691

        HashSet<Student> hashSet = new HashSet<>();
        hashSet.add(isea);
    }
}


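Tracing HashSet's add method step by step:
1. HashSet.add delegates to the put method of the HashMap it wraps:

    private transient HashMap<E,Object> map;
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

2. HashMap's put method is defined like this:

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

putVal stores the key's hash value together with the key and the value. The hash function used here is HashMap's own static hash method, which XORs the high 16 bits of key.hashCode() into the low 16 bits:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

Finally, key.hashCode() is dispatched to the key's own class: because Student overrides hashCode, Student.hashCode() runs here. Only a class that does not override it falls back to Object's native method:

    public native int hashCode();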
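The spreading step can be tried in isolation. This sketch copies the expression from `HashMap.hash` and the `(n - 1) & hash` index calculation used in `HashMap.putVal`, assuming a power-of-two table length `n` (HashMap sizes its table that way). `SpreadDemo` is a hypothetical class name. Two hash codes that differ only in their high bits, 0x10000 and 0x20000, land in the same bucket of a 16-slot table unless the high bits are folded down first.

```java
class SpreadDemo {
    // Same expression as HashMap.hash(Object) for a non-null key: XOR the
    // high 16 bits of the hash code into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length n: keeps only the low bits.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000, h2 = 0x20000; // differ only in the high 16 bits
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2
    }
}
```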

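Things to watch when using HashSet and HashMap

Override hashCode so that the hash is computed from the fields that matter to your own business logic;

override equals so that keys whose hash values collide can still be told apart.

If hashCode is not overridden, Object's hashCode is used. It is identity-based (commonly described as derived from the object's address), so two distinct objects with the same field values usually get different hash values, and a HashSet will hold both of them as duplicates (they are, technically, different objects). If hashCode is overridden, two different objects can still produce the same hash value, i.e. a collision. HashMap therefore never treats equal hashes as equal keys: after finding a matching hash it calls equals to check whether the two keys really are the same. If it compared hashes only, a new key whose hash merely collides with an old key would overwrite the old entry and data would be lost. If equals is not overridden, Object's identity-based equals is used, so two equal-valued objects are still treated as different keys. hashCode and equals must therefore be overridden together and kept consistent: objects that are equal must have equal hash codes.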
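The following sketch (hypothetical classes `PlainPoint`, `HashOnlyPoint` and `ValuePoint`) shows the three cases with a `HashSet`, plus a collision between two real `String` keys: "Aa" and "BB" both have hash code 2112, yet a `HashMap` keeps both entries because it confirms with `equals` before replacing a value.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Neither method overridden: Object's identity-based equals/hashCode are used.
    static final class PlainPoint {
        final int x, y;
        PlainPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Only hashCode overridden: equal fields give equal hashes,
    // but Object.equals still compares identity.
    static final class HashOnlyPoint {
        final int x, y;
        HashOnlyPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Both overridden consistently from the same fields.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint p = (ValuePoint) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    static int plainSetSize() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1, 2));
        set.add(new PlainPoint(1, 2));
        return set.size(); // 2: identity equality sees two different objects
    }

    static int hashOnlySetSize() {
        Set<HashOnlyPoint> set = new HashSet<>();
        set.add(new HashOnlyPoint(1, 2));
        set.add(new HashOnlyPoint(1, 2));
        return set.size(); // 2: same bucket, but equals still says "different"
    }

    static int valueSetSize() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2));
        return set.size(); // 1: recognized as a duplicate
    }

    // "Aa" and "BB" share the String hash code 2112, but equals tells them apart.
    static int collidingKeysMapSize() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map.size(); // 2: the second put does not overwrite the first
    }

    public static void main(String[] args) {
        System.out.println(plainSetSize());         // 2
        System.out.println(hashOnlySetSize());      // 2
        System.out.println(valueSetSize());         // 1
        System.out.println(collidingKeysMapSize()); // 2
    }
}
```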

package com.isea.hash;

import java.util.HashMap;
import java.util.HashSet;

public class Main {
    public static void main(String[] args) {
        Student isea = new Student(3, 2, "isea", "andy");
        System.out.println(isea.hashCode());//103585691

        HashSet<Student> hashSet = new HashSet<>();
        hashSet.add(isea);

        HashMap<Student,Integer> hashMap = new HashMap<>();
        hashMap.put(isea,100);
    }
}