Writing a Hash Table from Scratch: Appendix

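Appendix: alternative collision handling

There are two common methods for handling collisions in hash tables:

Separate chaining
Open addressing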

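Separate chaining

Under separate chaining, each bucket contains a linked list. When items collide, they are added to that list. Methods:

Insert:
Hash the key to get the bucket index. If there is nothing in that bucket, store the item there. If there is already an item there, append the new item to the linked list.
Search:
Hash the key to get the bucket index. Traverse the linked list, comparing each item's key to the search key. If the key is found, return the value, else return NULL.
Delete:
Hash the key to get the bucket index. Traverse the linked list, comparing each item's key to the delete key. If the key is found, remove the item from the linked list. If it was the only item in the list, place a NULL pointer in the bucket to indicate that the bucket is empty.

This has the advantage of being simple to implement, but it is space inefficient. Each item also has to store a pointer to the next item in the linked list, or a NULL pointer if no item comes after it. That space is spent on bookkeeping and could otherwise be used to store more items.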
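
The three methods above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names `Node` and `ChainedHashTable` are hypothetical, Python's built-in `hash` with a modulo stands in for the hash function, and overwriting the value when the key already exists is an assumption the text does not state.

```python
class Node:
    """One item in a bucket's linked list."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next  # None plays the role of the NULL pointer


class ChainedHashTable:
    def __init__(self, num_buckets=8):
        # None marks an empty bucket.
        self.buckets = [None] * num_buckets

    def _index(self, key):
        # Assumption: built-in hash() reduced modulo the bucket count.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        if node is None:
            self.buckets[i] = Node(key, value)
            return
        while True:
            if node.key == key:
                node.value = value  # assumption: overwrite an existing key
                return
            if node.next is None:
                break
            node = node.next
        node.next = Node(key, value)  # append to the end of the list

    def search(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        i = self._index(key)
        prev, node = None, self.buckets[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    # Removing the head; the bucket becomes None
                    # if this was the only item.
                    self.buckets[i] = node.next
                else:
                    prev.next = node.next
                return
            prev, node = node, node.next
```

With 8 buckets, the integer keys 1, 9 and 17 all land in bucket 1, which makes the chain easy to observe. The `next` field on every `Node` is the per-item pointer overhead described above.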

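Open addressing

Open addressing aims to solve the space inefficiency of separate chaining. When a collision happens, the collided item is placed in some other bucket in the table. The bucket is chosen according to a predetermined rule, so the same rule can be repeated when searching for the item. There are three common methods for choosing the bucket to insert a collided item into.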

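Linear probing

When a collision occurs, the index is incremented and the item is put in the next available bucket in the array. Methods:

Insert:
Hash the key to find the bucket index. If the bucket is empty, insert the item there. If it is not empty, repeatedly increment the index until an empty bucket is found, and insert the item there.
Search:
Hash the key to find the bucket index. Repeatedly increment the index, comparing each item's key to the search key, until an empty bucket is found. If an item with a matching key is found, return the value, else return NULL.
Delete:
Hash the key to find the bucket index. Repeatedly increment the index, comparing each item's key to the delete key, until an empty bucket is found. If an item with a matching key is found, delete it. Deleting this item breaks the probe chain, so we have no choice but to reinsert all the items that follow the deleted item in the chain.

Linear probing offers good cache performance, but suffers from clustering. Putting collided items in the next available bucket can lead to long contiguous stretches of filled buckets, which have to be iterated over when inserting, searching or deleting.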
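
A rough sketch of linear probing, including the reinsertion step on delete, might look like the following. The class name `LinearProbingTable` is hypothetical, and several details are assumptions not stated in the text: the index wraps around at the end of the array, an existing key is overwritten on insert, and one bucket is always kept empty so that every probe loop terminates.

```python
class LinearProbingTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # None marks an empty bucket
        self.count = 0

    def _index(self, key):
        # Assumption: built-in hash() reduced modulo the capacity.
        return hash(key) % len(self.slots)

    def _next(self, i):
        # Assumption: probing wraps around to the start of the array.
        return (i + 1) % len(self.slots)

    def insert(self, key, value):
        i = self._index(key)
        # Step forward until an empty bucket or the same key is found.
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = self._next(i)
        if self.slots[i] is None:
            # Keep at least one bucket empty so probing always stops.
            if self.count == len(self.slots) - 1:
                raise RuntimeError("table is full")
            self.count += 1
        self.slots[i] = (key, value)

    def search(self, key):
        i = self._index(key)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = self._next(i)
        return None

    def delete(self, key):
        i = self._index(key)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = self._next(i)
        if self.slots[i] is None:
            return  # key not present
        self.slots[i] = None
        self.count -= 1
        # Removing the item breaks the chain, so every item that follows
        # it (up to the next empty bucket) is taken out and reinserted.
        i = self._next(i)
        while self.slots[i] is not None:
            k, v = self.slots[i]
            self.slots[i] = None
            self.count -= 1
            self.insert(k, v)
            i = self._next(i)
```

With a capacity of 8, the integer keys 1, 9 and 17 occupy buckets 1, 2 and 3. Deleting 9 empties bucket 2, and the reinsertion loop moves 17 back into it so that later searches still find it.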

Quadratic probing

Similar to linear probing, but instead of putting the collided item in the next available bucket, we try to put it in the buckets whose indexes follow the sequence: i, i + 1, i + 4, i + 9, i + 16, …, where i is the original hash of the key. Methods:

Insert:
Hash the key to find the bucket index. Follow the probing sequence until an empty or deleted bucket is found, and insert the item there.
Search:
Hash the key to find the bucket index. Follow the probing sequence, comparing each item's key to the search key until an empty bucket is found. Buckets marked as deleted do not stop the search. If a matching key is found, return the value, else return NULL.
Delete:
We can't tell if the item we're deleting is part of a collision chain, so we can't delete the item outright. Instead, we just mark it as deleted.

Quadratic probing reduces, but does not remove, clustering, and still offers decent cache performance.
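
The probing sequence and the "mark as deleted" idea can be sketched as below. The names `QuadraticProbingTable` and `DELETED` are hypothetical. The sketch also makes assumptions the text does not state: indexes wrap around modulo the capacity, the number of probes is capped at the capacity (a quadratic sequence is not guaranteed to visit every bucket), and inserting an existing key overwrites its value.

```python
DELETED = object()  # sentinel marking a bucket whose item was deleted


class QuadraticProbingTable:
    def __init__(self, capacity=11):
        self.slots = [None] * capacity  # None marks a never-used bucket

    def _probe(self, key):
        """Yield indexes i, i + 1, i + 4, i + 9, ... modulo the capacity."""
        n = len(self.slots)
        start = hash(key) % n  # assumption: built-in hash() as the hash
        for j in range(n):     # assumption: give up after n probes
            yield (start + j * j) % n

    def insert(self, key, value):
        target = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                if target is None:
                    target = i
                break
            if slot is DELETED:
                if target is None:
                    target = i  # remember the first reusable bucket
            elif slot[0] == key:
                target = i      # assumption: overwrite an existing key
                break
        if target is None:
            raise RuntimeError("no free bucket along the probe sequence")
        self.slots[target] = (key, value)

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return None  # an empty bucket ends the chain
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED  # mark, do not empty the bucket
                return
```

With a capacity of 11, the integer keys 0, 11 and 22 land in buckets 0, 1 and 4. After deleting 11, bucket 1 holds the `DELETED` marker, so a search for 22 still walks past it, and a later insert can reuse that bucket.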

Double hashing

Double hashing aims to solve the clustering problem. To do so, we use a second hash function to choose a new index for the item. Using a hash function gives us a new bucket, the index of which should be evenly distributed across all buckets. This removes clustering, but also removes any boosted cache performance from locality of reference. Double hashing is a common method of collision management in production hash tables, and is the method we implement in this tutorial.
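
As a hedged illustration of how a second hash function can drive the probe sequence, the sketch below computes the bucket index for a given attempt number. Everything named here is an assumption for illustration only: the helper names `string_hash` and `double_hash_index`, the polynomial string hash, and the constants 151 and 163. The step is forced into the range 1 to num_buckets - 1 so it is never zero, and with a prime bucket count the sequence then visits every bucket once.

```python
def string_hash(s, prime, modulus):
    """Polynomial hash of a string, reduced modulo `modulus`.

    Hypothetical helper: `prime` should be larger than the alphabet size.
    """
    h = 0
    for ch in s:
        h = (h * prime + ord(ch)) % modulus
    return h


def double_hash_index(key, num_buckets, attempt):
    """Bucket index for the given attempt (0 is the first try).

    Assumes num_buckets >= 2; a prime num_buckets makes every step
    coprime with the table size.
    """
    h1 = string_hash(key, 151, num_buckets)
    # The second hash picks the step size; adding 1 keeps it in
    # [1, num_buckets - 1], so the probe always moves.
    step = 1 + string_hash(key, 163, num_buckets - 1)
    return (h1 + attempt * step) % num_buckets


def probe_sequence(key, num_buckets):
    """All indexes tried for `key`, in order."""
    return [double_hash_index(key, num_buckets, a) for a in range(num_buckets)]
```

For example, `probe_sequence("cat", 7)` lists the seven buckets that would be tried for the key "cat", in order. Because two keys that collide on the first hash usually get different step sizes, their probe sequences diverge instead of piling up in the same run of buckets.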