HashMap hash function - Binary operator

Question

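I was going through the source code of HashMap, but the binary (bitwise) operators confuse me a lot.

I do understand the general purpose of the code below: distribute keys fairly and bring the hashCode within the bucket limit.

Can someone explain the comment here, and what the benefit is of doing it the way it is done now?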

/**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

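It would be a great help if someone could help me understand it.

This is not a duplicate, because the other questions are about the hash implementations before Java 8.

Thanks in advance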

Answer 1

hashCode() returns an int, which is 32 bits wide.

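Internally, HashMap keeps the objects in pow(2, n) buckets, or bins. The value of n can vary; the details are not important here. What matters is that n is typically much smaller than 32 (the number of bits in the hash).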

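Each object is placed into one of the buckets. To achieve good performance, it is desirable to spread the objects evenly across the buckets. This is where object hashes come in: the simplest way to choose a bucket is to take the lowest n bits of the object's hash code (using a simple bitwise AND). However, this uses only the lowest n bits and ignores the rest of the hash.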
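As a minimal sketch of that bucket selection (the class name, method name, and sample hash values are made up for illustration and are not taken from the HashMap source), masking with the table size minus one keeps only the lowest n bits:

```java
class BucketIndexSketch {
    // Hypothetical helper: picks a bucket by keeping only the lowest n bits.
    // tableSize is assumed to be a power of two, so (tableSize - 1) is a
    // mask made of n one-bits.
    static int indexFor(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int tableSize = 16;   // 2^4 buckets, so n = 4
        int a = 0x00010005;   // two hashes that differ only in their upper bits
        int b = 0x7FFF0005;
        System.out.println(indexFor(a, tableSize)); // 5
        System.out.println(indexFor(b, tableSize)); // 5
    }
}
```

Both sample values land in bucket 5 because the mask discards every bit in which they differ.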

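In the comment, the authors argue that this is undesirable. They cite known use cases in which object hashes differ systematically only in bits other than the lowest n. That leads to systematic collisions, and systematic collisions are bad news.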
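The Float case mentioned in the comment can be reproduced with a rough sketch like the one below. The table size of 256 and the helper names are assumptions chosen for illustration; the only library fact relied on is that Float.hashCode() returns the IEEE 754 bit pattern of the value, whose low bits are all zero for small whole numbers.

```java
import java.util.HashSet;
import java.util.Set;

class FloatCollisionSketch {
    // Hypothetical helper: the XOR-shift transform from the quoted hash() method.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many distinct buckets the keys 1.0f .. 8.0f occupy in a
    // table of the given power-of-two size, with or without spreading.
    static int distinctBuckets(int tableSize, boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 1; i <= 8; i++) {
            int h = Float.valueOf((float) i).hashCode();
            if (useSpread) {
                h = spread(h);
            }
            buckets.add(h & (tableSize - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(256, false)); // 1: every key hits bucket 0
        System.out.println(distinctBuckets(256, true));  // 6
    }
}
```

The improvement is only partial: in a 16-bucket table these same eight keys still share one bucket after spreading, because bits 16 to 19 of their hash codes are also zero.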

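To partially address this, they implemented the current heuristic, which:

keeps the top 16 bits of the hash unchanged;
replaces the bottom 16 bits with the XOR of the top 16 bits and the bottom 16 bits.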
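The two steps above can be checked on a concrete value. This is only an illustrative sketch: the class name and the sample value 0xABCD1234 are arbitrary, and spread() simply repeats the expression from the hash() method quoted in the question.

```java
class SpreadSketch {
    // Same expression as in the quoted hash() method, applied to a raw int.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 0xABCD1234;
        int s = spread(h);

        // The top half is unchanged: h >>> 16 has zeros in its upper 16 bits,
        // so the XOR leaves those bits of h alone.
        System.out.println(Integer.toHexString(s >>> 16));    // abcd

        // The bottom half becomes top XOR bottom: 0xABCD ^ 0x1234 = 0xB9F9.
        System.out.println(Integer.toHexString(s & 0xFFFF));  // b9f9
    }
}
```

A single shift and a single XOR is about as cheap as such a transform can be, which matches the speed-versus-quality tradeoff described in the source comment.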

Tags: java, hash, hashmap, hash-collision, java-8