为什么 java 哈希码实现 31 * x + y 比 x + y 更好?

Why java hashcode implementation 31 * x + y is better than x + y?

我对 java 关于哪种哈希码实现更好的面试问题感到困惑。我们有一个 class Point {int x, y; }.为什么这个 class 的 hashcode 31 * x + y 的实现比 x + y 更好?正确答案是 "The multiplier creates a dependence of the hashcode value on the order of processing the fields, which ultimately gives rise to a better hash function"。但是我不明白为什么处理的顺序是这里的重点,因为我执行point1.equals(point2)时整个表达式31 * x + y都在计算;而且它发生的顺序无关紧要。我错了吗?

如果用x+y那么点(3,4)和(4,3)怎么区分呢?两者将具有相同的哈希码...

现在虽然31 * x + y不会很完美,但是在同样的情况下,会好很多。

注意:根据散列的定义,没有完美的散列。唯一的事情就是分析给定的哈希函数发生了什么样的冲突。在几何情况下,第一个为非常简单且通常的对称性引入碰撞 属性。因此,在非常常见的情况下,可能会发生过多的碰撞。

假设您有两个字符串属性 prop1prop2,以及两个对象:

A: {prop1="foo", prop2="bar"}
B: {prop1="bar", prop2="foo"}

这些值明显不同,设置哈希码以区分它们很有用。如果您只是将属性的哈希码加在一起,您将获得 AB 的相同值。相反,通过乘法和加法,哈希码将根据 属性 序列而不同。

看来您可能稍微误解了建议:乘加的目的是创建对对象内属性语义顺序的依赖,而不是计算的执行顺序

Jean-Baptiste Yunès 的回答是正确的,但我将添加以下示例来说明(请记住它在 javascript 中,只是因为我快速实现了这个示例):

class Point {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }
}

function getHashCollisions(collection, hashFunction) {
    const collisionMap = new Map();
    let count = 1;
    let total = collection.length;
    for (const point of collection) {
        console.log(`calculating ${count++}/${total}`);
        const currentHash = hashFunction(point);
        const hashCount = collisionMap.has(currentHash) ? collisionMap.get(currentHash) +1 : 1;
        collisionMap.set(currentHash, hashCount);
    }
    return collisionMap;
}

function generateDataset(rangeX, rangeY) {
    const points = [];
    let count = 1;
    for (let x = 0; x < rangeX; x++) {
        for (let y = 0; y < rangeY; y++) {
            console.log(`generating ${count++} Point(${x};${y})`);
            points.push(new Point(x, y));
        }
    }
    return points;
}

function calculateAndGenerateReport(dataset, hashFunction, hashFunctionName) {
    const hashes = getHashCollisions(dataset, hashFunction);
    const totalCollisions = Array.from(hashes.values()).filter(currentCollisionCount => currentCollisionCount > 1).length;
    const highestCollisionCount = Array.from(hashes.values()).reduce((currentHighest, current) => current > currentHighest ? current : currentHighest) - 1;
    return `${hashFunctionName}: ${totalCollisions} collisions, highest collision count: ${highestCollisionCount}`;
}

const dataset = generateDataset(100, 100);

const literalHashesReport = calculateAndGenerateReport(dataset, point => point.x + point.y, "literal hash function:");
const onePrimeHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + point.y, "one prime multiplication hash function:");
const twoPrimesHashesReport = calculateAndGenerateReport(dataset, point => 31 * point.x + 37 * point.y, "two primes multiplication hash function:");
const twoLargePrimesHashesReport = calculateAndGenerateReport(dataset, point => 8191 * point.x + 131071 * point.y, "two large primes multiplication hash function:");

console.log(literalHashesReport);
console.log(onePrimeHashesReport);
console.log(twoPrimesHashesReport);
console.log(twoLargePrimesHashesReport)

结果:

literal hash function: 197 collisions, highest collision count: 99
one prime multiplication hash function: 3107 collisions, highest collision count: 3
two primes multiplication hash function: 3359 collisions, highest collision count: 2
two large primes multiplication hash function: 0 collisions, highest collision count: 0

这表明我们选择 "calculate" 哈希的(质数)数大大降低了冲突的可能性。