java 中的 hash() 实现

hash() implementation in java

我正在 Java 中查看 HashMap class。我的理解是 hash table 的容量是桶数的 2 次方(容量 16 表示四个桶)。当调用 put(key,value) 时,key.hashCode() 输出一个整数,这个新添加的 (key,value) 对基于 key.hashCode()% 的桶数放置。但是下面是HashMap.class

中的实际实现
 static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

从上面的代码中,我无法弄清楚如何将 key.hashCode() 值拟合到桶中。

该代码不会 "fit" 将 hashCode 存储到桶中,它 "just" 传播 hashcode 使高位更重要。这是该方法的 javadoc。

Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

桶的实际拟合是在 getNode(int, Object) 方法中完成的:

first = tab[(n - 1) & hash]

其中 hashhash(Object) 的结果,n 是哈希表的大小。

这可能有点帮助。如果我们要用从 1 到 10 的浮点键添加十个项目。

        Map<Float, String> map = new HashMap<>();
        int numOfBuckets = 64; // HashMap will have 64 bins after inserting 10 items

        String format = "|%1$-5s|%2$-35s|%3$-25s|%4$-35s|%5$-25s|%6$-25s|\n";
        System.out.format(format, "i", "raw binary", "right-shifted 16 bits", "rehashed", "bucket before rehashed",
                "bucket after rehashed");

        for (int i = 1; i <= 10; i++) {
            float key = i;
            int rawInt = Float.floatToRawIntBits(key);
            String binaryString = Long.toBinaryString(rawInt);
            String shifted16BitsString = Long.toBinaryString(rawInt >>> 16);

            int rehashed = rawInt ^ rawInt >>> 16;
            String rehashedString = Long.toBinaryString(rehashed);

            // HashMap calculates bin index with (n - 1) & hash
            String bucketBeforeRehashed = Long.toString((numOfBuckets - 1) & Objects.hashCode(key));
            String bucketAfterRehashed = Long.toString((numOfBuckets - 1) & rehashed);

            System.out.format(format, i, binaryString, shifted16BitsString, rehashedString,
                    bucketBeforeRehashed, bucketAfterRehashed);
            map.put(key, Integer.toString(i));
        }

产生:

|i    |raw binary                         |right-shifted 16 bits    |rehashed                           |bucket before rehashed   |bucket after rehashed    |
|1    |111111100000000000000000000000     |11111110000000           |111111100000000011111110000000     |0                        |0                        |
|2    |1000000000000000000000000000000    |100000000000000          |1000000000000000100000000000000    |0                        |0                        |
|3    |1000000010000000000000000000000    |100000001000000          |1000000010000000100000001000000    |0                        |0                        |
|4    |1000000100000000000000000000000    |100000010000000          |1000000100000000100000010000000    |0                        |0                        |
|5    |1000000101000000000000000000000    |100000010100000          |1000000101000000100000010100000    |0                        |32                       |
|6    |1000000110000000000000000000000    |100000011000000          |1000000110000000100000011000000    |0                        |0                        |
|7    |1000000111000000000000000000000    |100000011100000          |1000000111000000100000011100000    |0                        |32                       |
|8    |1000001000000000000000000000000    |100000100000000          |1000001000000000100000100000000    |0                        |0                        |
|9    |1000001000100000000000000000000    |100000100010000          |1000001000100000100000100010000    |0                        |16                       |
|10   |1000001001000000000000000000000    |100000100100000          |1000001001000000100000100100000    |0                        |32                       |

从输出中我们可以发现,键的低位全为0,这导致所有项目都被分配到同一个bin。但是执行右移和异或后分布变得更好。我认为HashMap的hash()方法中的源代码注释中描述的就是这种情况。