为什么 java 哈希码实现 31 * x + y 比 x + y 更好？

Question

我对 java 关于哪种哈希码实现更好的面试问题感到困惑。我们有一个 class Point {int x, y; }.为什么这个 class 的 hashcode 31 * x + y 的实现比 x + y 更好？正确答案是 "The multiplier creates a dependence of the hashcode value on the order of processing the fields, which ultimately gives rise to a better hash function"。但是我不明白为什么处理的顺序是这里的重点，因为我执行point1.equals(point2)时整个表达式31 * x + y都在计算;而且它发生的顺序无关紧要。我错了吗？

Answer 1

如果用x+y那么点(3,4)和(4,3)怎么区分呢？两者将具有相同的哈希码...

现在虽然31 * x + y不会很完美，但是在同样的情况下，会好很多。

注意：根据散列的定义，没有完美的散列。唯一的事情就是分析给定的哈希函数发生了什么样的冲突。在几何情况下，第一个为非常简单且通常的对称性引入碰撞属性。因此，在非常常见的情况下，可能会发生过多的碰撞。

Answer 2

假设您有两个字符串属性 prop1 和 prop2，以及两个对象：

A: {prop1="foo", prop2="bar"}
B: {prop1="bar", prop2="foo"}

这些值明显不同，设置哈希码以区分它们很有用。如果您只是将属性的哈希码加在一起，您将获得 A 和 B 的相同值。相反，通过乘法和加法，哈希码将根据属性序列而不同。

看来您可能稍微误解了建议：乘加的目的是创建对对象内属性语义顺序的依赖，而不是计算的执行顺序。

Answer 3

Jean-Baptiste Yunès 的回答是正确的，但我将添加以下示例来说明（请记住它在 javascript 中，只是因为我快速实现了这个示例）：

class Point {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }
}

function getHashCollisions(collection, hashFunction) {
    const collisionMap = new Map();
    let count = 1;
    let total = collection.length;
    for (const point of collection) {
        console.log(`calculating ${count++}/${total}`);
        const currentHash = hashFunction(point);
        const hashCount = collisionMap.has(currentHash) ? collisionMap.get(currentHash) +1 : 1;
        collisionMap.set(currentHash, hashCount);
    }
    return collisionMap;
}

function generateDataset(rangeX, rangeY) {
    const points = [];
    let count = 1;
    for (let x = 0; x < rangeX; x++) {
        for (let y = 0; y < rangeY; y++) {
            console.log(`generating ${count++} Point(${x};${y})`);
            points.push(new Point(x, y));
        }
    }
    return points;
}

function calculateAndGenerateReport(dataset, hashFunction, hashFunctionName) {
    const hashes = getHashCollisions(dataset, hashFunction);
    const totalCollisions = Array.from(hashes.values()).filter(currentCollisionCount => currentCollisionCount > 1).length;
    const highestCollisionCount = Array.from(hashes.values()).reduce((currentHighest, current) => current > currentHighest ? current : currentHighest) - 1;
    return `${hashFunctionName}: ${totalCollisions} collisions, highest collision count: ${highestCollisionCount}`;
}

const dataset = generateDataset(100, 100);

const literalHashesReport = calculateAndGenerateReport(dataset, point => point.x + point.y, "literal hash function:");
const onePrimeHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + point.y, "one prime multiplication hash function:");
const twoPrimesHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + 37 * point.y, "two primes multiplication hash function:");
const twoLargePrimesHashesReport = calculateAndGenerateReport(dataset, point => 8191 * point.x + 131071 * point.y, "two large primes multiplication hash function:");

console.log(literalHashesReport);
console.log(onePrimeHashesReport);
console.log(twoPrimesHashesReport);
console.log(twoLargePrimesHashesReport)

结果：

literal hash function: 197 collisions, highest collision count: 99
one prime multiplication hash function: 3107 collisions, highest collision count: 3
two primes multiplication hash function: 3359 collisions, highest collision count: 2
two large primes multiplication hash function: 0 collisions, highest collision count: 0

这表明我们选择 "calculate" 哈希的（质数）数大大降低了冲突的可能性。

为什么 java 哈希码实现 31 * x + y 比 x + y 更好？

Why java hashcode implementation 31 * x + y is better than x + y?

java

hashcode