A First Look at Hashing Algorithms

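In Java, every object's hashCode() method by default returns an integer derived from the object's identity. This is commonly described as converting the object's internal address to an integer and computing the hash code from it, although the exact mechanism depends on the JVM. For more complex abstract data types, you can override hashCode() so that a custom key is turned into a hash code.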
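
As an illustration of a custom key, here is a minimal sketch of a two-field key class that overrides equals() and hashCode() together. The class name PhoneNumber and its fields are hypothetical and chosen only for this example; Objects.hash is the standard java.util helper for combining fields.

```java
import java.util.Objects;

// Hypothetical key type made of two fields; name and fields are illustrative only.
final class PhoneNumber {
    private final int areaCode;
    private final int number;

    PhoneNumber(int areaCode, int number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    // Two keys are equal when all of their fields are equal ...
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber that = (PhoneNumber) other;
        return areaCode == that.areaCode && number == that.number;
    }

    // ... and equal keys must then return the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(areaCode, number);
    }
}
```

Without the override, two PhoneNumber objects holding the same fields would normally get different default hash codes and would be treated as different keys by a hash table.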

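Computing the hash code is the job of the hash function. Ideally a hash function has three properties: (1) it is deterministic, (2) it is cheap to compute, and (3) it spreads keys uniformly. In practice perfect uniformity is unattainable, so hash collisions have to be handled. Building a hash table therefore requires both a hash function and a collision-resolution scheme.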

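A symbol table that uses hashing relies on the assumption that keys map uniformly onto the integers 0 to M-1 (these integers are the array indices). The two usual collision-resolution schemes for hash-based symbol tables are separate chaining and linear probing, introduced in turn below.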
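
The mapping from a hash code to an index in 0 to M-1 can be sketched as follows. The class and method names (HashIndexDemo, indexFor) are made up for this example; the masking expression is the same one used by the hash() methods in the listings of this post.

```java
final class HashIndexDemo {
    // Map any hashCode() (which may be negative) to an index in [0, M-1].
    // Masking with 0x7fffffff clears the sign bit before taking the remainder.
    static int indexFor(Object key, int M) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public static void main(String[] args) {
        int M = 7;
        for (String key : new String[] {"a", "b", "h"}) {
            System.out.println(key + " -> " + indexFor(key, M));
        }
        // prints:
        // a -> 6
        // b -> 0
        // h -> 6
    }
}
```

Here "a" and "h" land on the same index, which is exactly the kind of collision the two schemes below have to resolve.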

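Separate chaining picks a sufficiently large array in which each entry points to a linked list. A lookup takes two steps: use the hash code to find the array index of the list, then search that list for the element. Concretely: use an array of size M (M is often prime; indices fall in the range 0 to M-1), where each entry points to a symbol table implemented as a linked list. A key (a primitive type, an abstract data type, or a combination of several fields, as long as the key is unique) is converted to its hash code, which selects the corresponding symbol table in the array; that table is then traversed. If the key already exists, its value is updated; otherwise the pair is inserted. The code is as follows: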

import java.util.LinkedList;

public class SeparatedChainingHashST<Key,Value> {

	private int N;	//the number of all keys
	private int M;	//the number of separated chains
	private SequentialSearchST<Key, Value>[] st;//the array of separated chains
	
	
	public SeparatedChainingHashST()
	{
		this(997);
	}

	@SuppressWarnings("unchecked")
	public SeparatedChainingHashST(int M)
	{
		this.M=M;
		this.N=0;
		this.st=(SequentialSearchST<Key, Value>[])new SequentialSearchST[M]; 
		//Java forbids creating a generic array directly, but it allows declaring a generic array reference and casting a raw array to it
		for(int i=0;i<this.M;i++)
			this.st[i]=new SequentialSearchST<Key, Value>();
	}

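	public void resize(int M2)
	{
		//rehash every key into a table of size M2: the index depends on M, so chains cannot be copied by position
		SeparatedChainingHashST<Key,Value> new_ST=new SeparatedChainingHashST<Key,Value>(M2);
		for(int i=0;i<this.M;i++)
		{
			for(Key key:this.st[i].keys())
			{
				new_ST.put(key,this.st[i].get(key));
			}
		}
		this.M=M2;
		this.st=new_ST.st;
		//N is unchanged: the same keys are stored, only in different chains
	}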
	
	public int hash(Key key)
	{
		return (key.hashCode() & 0x7fffffff)%M;
	}
	
	public int size()
	{
		return this.N;
	}
	
	
	public boolean isEmpty()
	{
		return this.size()==0;
	}
	
	public boolean contains(Key key)
	{
		return this.get(key)!=null;
	}
	
	//public, so that clients can look values up
	public Value get(Key key)
	{
		return this.st[this.hash(key)].get(key);
	}
	
	public void put(Key key, Value val) 
	{
		int hash=this.hash(key);
		//count the key only if it is new; the chain itself handles insert vs. update
		if(!this.contains(key))
			this.N++;
		this.st[hash].put(key, val);
	}
	
	public void delete(Key key) 
	{
		int hash=this.hash(key);
		if(this.contains(key))
		{
			this.st[hash].delete(key);
			this.N--;
		}
	}
	
	public Iterable<Key> keys() 
	{
		LinkedList<Key> list=new LinkedList<Key>();
		for(int i=0;i<this.M;i++)
		{
			for(Key key:this.st[i].keys())
			{
				list.add(key);
			}
		}
		return list;
	}

}
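
The class above depends on SequentialSearchST, which is not shown in this post. Below is a minimal sketch of such a linked-list symbol table, assuming only the four methods the hash table calls (put, get, delete, keys). The Node helper, the field names, and size() are illustrative additions, not the book's exact code.

```java
import java.util.LinkedList;

// Minimal unordered linked-list symbol table (a sketch for illustration).
class SequentialSearchST<Key, Value> {
    private Node first;  // head of the list
    private int n;       // number of key-value pairs

    private class Node {
        Key key;
        Value val;
        Node next;

        Node(Key key, Value val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    public int size() {
        return n;
    }

    // Scan the list; return the value for key, or null if absent.
    public Value get(Key key) {
        for (Node x = first; x != null; x = x.next)
            if (key.equals(x.key)) return x.val;
        return null;
    }

    // Update the value if the key exists, otherwise prepend a new node.
    public void put(Key key, Value val) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) {
                x.val = val;
                return;
            }
        }
        first = new Node(key, val, first);
        n++;
    }

    // Unlink the node holding key, if any.
    public void delete(Key key) {
        first = delete(first, key);
    }

    private Node delete(Node x, Key key) {
        if (x == null) return null;
        if (key.equals(x.key)) {
            n--;
            return x.next;
        }
        x.next = delete(x.next, key);
        return x;
    }

    public Iterable<Key> keys() {
        LinkedList<Key> list = new LinkedList<Key>();
        for (Node x = first; x != null; x = x.next)
            list.add(x.key);
        return list;
    }
}
```

Every operation scans one chain, so the cost per operation is proportional to the chain length, roughly N/M when keys are spread uniformly.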

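Linear probing uses a key array and a value array; the entries at the same index form a key-value pair. The array size M must be larger than the expected number of keys. A key is converted to its hash code, which gives a position in the key array. If that position holds the same key, its value is updated; if the position is empty, the key-value pair is inserted there; if it holds a different key, the search moves forward from that position (wrapping around to the start when it reaches the end of the array) until it finds the key or the first empty position, where the pair is inserted. The code is as follows: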

import java.util.LinkedList;

public class LinearProbingHashST<Key,Value> {

	private static final int INIT_CAPACITY = 8;
	
	
	private Key[] keys;
	private Value[] vals;
	private int N;	//the number of all keys:N<<M
	private int M;	//the size of keys&vals array
	
	public LinearProbingHashST()
	{
		this(INIT_CAPACITY);
	}
	
	@SuppressWarnings("unchecked")
	public LinearProbingHashST(int M)
	{
		this.keys=(Key[])new Object[M];
		this.vals=(Value[])new Object[M];
		this.M=M;
	}
	
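	public void resize(int M2)
	{
		//rehash every key: the slot depends on M, so entries cannot be copied by index
		LinearProbingHashST<Key,Value> ST_new=new LinearProbingHashST<Key,Value>(M2);
		for(int i=0;i<this.M;i++)
		{
			if(this.keys[i]==null)
				continue;
			int j=ST_new.hash(this.keys[i]);
			while(ST_new.keys[j]!=null)
				j=(j+1)%M2;
			ST_new.keys[j]=this.keys[i];
			ST_new.vals[j]=this.vals[i];
		}
		this.keys=ST_new.keys;
		this.vals=ST_new.vals;
		this.M=M2;
	}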
	
	public int hash(Key key)
	{
		return (key.hashCode() & 0x7fffffff)%M;
	}
	
	public int size()
	{
		return this.N;
	}
	
	public boolean isEmpty()
	{
		return this.size()==0;
	}
	
	public boolean contains(Key key)
	{
		return this.get(key)!=null;
	}
	
	public Value get(Key key)
	{
		//probe from the hashed slot until the key or an empty slot is found
		for(int i=this.hash(key);this.keys[i]!=null;i=(i+1)%this.M)
		{
			if(this.keys[i].equals(key))
				return this.vals[i];
		}
		return null;
	}
	
	public void put(Key key, Value val) 
	{
		if (key == null) throw new IllegalArgumentException("first argument to put() is null");
		
		if(val==null)
		{
			//a null value means "remove the key"; nothing is inserted afterwards
			this.delete(key);
			return;
		}
		
		if(this.M<=2*this.N)//keep the table at most half full
			this.resize(this.M*2);
		
		//probe from the hashed slot: update on a match, insert at the first empty slot
		int i=this.hash(key);
		while(this.keys[i]!=null)
		{
			if(this.keys[i].equals(key))
			{
				this.vals[i]=val;
				return;
			}
			i=(i+1)%this.M;
		}
		this.keys[i]=key;
		this.vals[i]=val;
		this.N++;
	}
	
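	public void delete(Key key)
	{
		//locate the key by probing; stop at an empty slot if it is absent
		int i=this.hash(key);
		while(this.keys[i]!=null && !this.keys[i].equals(key))
			i=(i+1)%this.M;
		if(this.keys[i]==null)
			return;
		
		this.keys[i]=null;
		this.vals[i]=null;
		this.N--;
		
		//re-insert the rest of the cluster so later probes do not stop early at the new hole
		i=(i+1)%this.M;
		while(this.keys[i]!=null)
		{
			Key k=this.keys[i];
			Value v=this.vals[i];
			this.keys[i]=null;
			this.vals[i]=null;
			int j=this.hash(k);
			while(this.keys[j]!=null)
				j=(j+1)%this.M;
			this.keys[j]=k;
			this.vals[j]=v;
			i=(i+1)%this.M;
		}
	}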
	
	public Iterable<Key> keys() 
	{
		LinkedList<Key> list=new LinkedList<Key>();
		//visit every slot and skip the empty ones; stopping at the first null would miss later keys
		for(int i=0;i<this.M;i++)
		{
			if(this.keys[i]!=null)
				list.add(this.keys[i]);
		}
		return list;
	}
}
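
To make the probing order concrete, here is a small self-contained trace that inserts three string keys into a table of size 7 and shows where each one ends up. The class and method names (LinearProbingTrace, insertAll) are hypothetical, and the sketch assumes fewer keys than slots so that an empty slot always exists.

```java
import java.util.Arrays;

final class LinearProbingTrace {
    // Insert the keys into a table of size M using linear probing
    // and return the resulting slot array. Assumes input.length < M.
    static String[] insertAll(String[] input, int M) {
        String[] slots = new String[M];
        for (String key : input) {
            int i = (key.hashCode() & 0x7fffffff) % M;
            // Step forward (wrapping at the end) past slots held by other keys.
            while (slots[i] != null && !slots[i].equals(key))
                i = (i + 1) % M;
            slots[i] = key;
        }
        return slots;
    }

    public static void main(String[] args) {
        // "a" hashes to 6, "h" also hashes to 6, "b" hashes to 0.
        String[] slots = insertAll(new String[] {"a", "h", "b"}, 7);
        System.out.println(Arrays.toString(slots));
        // prints: [h, b, null, null, null, null, a]
    }
}
```

"h" collides with "a" at index 6 and wraps around to index 0; "b" then finds index 0 taken and moves to index 1. This is also why deleting a key cannot simply clear its slot: clearing index 0 would cut "b" off from its home position.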

The above is based on the Princeton book Algorithms; parts of the code differ, but the logic is the same.
