Java 对于相同的对象,哈希码在一种情况下会发生冲突,而在另一种情况下不会发生冲突,为什么? (下面的代码)

Java hashcodes collide in one case and not the other for the same objects, why? (Code Below)

我尝试编写一个小程序来演示 java 中仅重写 equals 而不是 hashcode() 方法时的哈希冲突。这是为了证明两个不相等的对象可以具有相同哈希码的理论。这是针对询问行为的面试问题。

我创建了 200,000 个对象,将它们存储在一个数组中,然后比较它们以查看哪些是重复的。 (为此,我在对象创建阶段后使用嵌套的 for 循环迭代对象数组。)对于大约 200,000 个对象,我得到 9 次碰撞。第一个是索引 196 和 121949 处的对象。然后我继续打印这些哈希码以显示两个值相同。

但是我遇到了一些非常令人惊讶的行为。如果我迭代嵌套的 for 循环并打印哈希码的第一次冲突,我得到相同的哈希码值

1867750575
1867750575 

对于索引为 196 和 121949 的对象。

但是如果我注释掉用于检测所有冲突的嵌套 for 循环并直接打印索引 196 和 121949 处元素的哈希码,我会得到

1829164700
366712642

请注意,我没有评论这些元素的创建,只是我检查碰撞的部分。

为什么会这样,即使我不遍历它们,哈希码不应该是一致的吗?

附录1:据我所知,这背后是否有来源,按照生日原则,如果我创建200,000个对象,我必须发生碰撞,如何遍历每个hascode或不改变任何东西?

附录 2:我尝试添加另一个 200000 大小的数组只是为了查看冲突索引是否发生变化,但它们没有发生变化,因此显然在未提交循环的情况下对二进制文件进行更改不会进行任何更改。所以改变二进制改变哈希​​码的假设不成立。

这是我的代码

import java.util.HashMap;

public class EmployeeFactory {

    private static int counter = 0;
    public int id;
    public String empName;

    EmployeeFactory() {
        id = counter;
        empName = "employee_" + id;
        counter++;
    }

    @Override
    public boolean equals(Object o) {

        // If the object is compared with itself then return true
        if (o == this) {
            return true;
        }

        if (o == null || o.getClass() != this.getClass()) {
            return false;
        }

        EmployeeFactory emp = (EmployeeFactory) o;

        // Compare the data members and return accordingly
        return this.id == emp.id;
    }

    public static void main(String[] args) {

        int Obj_Count = 200000;

        EmployeeFactory objs[] = new EmployeeFactory[Obj_Count];
        for (int i = 0; i < Obj_Count; ++i) {
            EmployeeFactory temp = new EmployeeFactory();
            objs[i] = temp;
        }


//Please try code once un commenting the loop below and once while keeping it commented.
 /*   
        for (int i = 0; i < Obj_Count; ++i)
        {
            for (int j = i + 1; j < Obj_Count; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                                     + " and " + objs[j].id + " collided.");
                    System.out.println("Object Is " + i + "and Obj ID is "+ objs[i].id + " Has Hashcode " + objs[i].hashCode());
                    System.out.println("Object Is " + j + "and Obj ID is "+ objs[j].id + " Has Hashcode " + objs[j].hashCode());
                    System.out.println("");
                }
            }
        }
        */

        HashMap<EmployeeFactory, EmployeeFactory> hm = new HashMap<EmployeeFactory, EmployeeFactory>();
        objs[121949].id = objs[196].id;
        hm.put(objs[196], objs[196]);
        hm.put(objs[121949], objs[121949]);
        System.out.println(hm.get(objs[121949]).empName);
        System.out.println(hm.get(objs[196]).empName);

        // checking the hashmap
        System.out.println(hm.get(objs[121949]).hashCode());
        System.out.println(hm.get(objs[196]).hashCode());

        // Checking the array
        System.out.println(objs[121949].hashCode());
        System.out.println(objs[196].hashCode());

    }

}

注释输出:

employee_121949
employee_196
1829164700
366712642
1829164700
366712642

未注释的循环输出

Objects with IDs 196 and 121949 collided.
Object Is 196and Obj ID is 196 Has Hashcode 1867750575
Object Is 121949and Obj ID is 121949 Has Hashcode 1867750575

Objects with IDs 62082 and 145472 collided.
Object Is 62082and Obj ID is 62082 Has Hashcode 2038112324
Object Is 145472and Obj ID is 145472 Has Hashcode 2038112324

Objects with IDs 62354 and 105841 collided.
Object Is 62354and Obj ID is 62354 Has Hashcode 2134400190
Object Is 105841and Obj ID is 105841 Has Hashcode 2134400190

Objects with IDs 68579 and 186938 collided.
Object Is 68579and Obj ID is 68579 Has Hashcode 1872358815
Object Is 186938and Obj ID is 186938 Has Hashcode 1872358815

Objects with IDs 105219 and 111288 collided.
Object Is 105219and Obj ID is 105219 Has Hashcode 651156501
Object Is 111288and Obj ID is 111288 Has Hashcode 651156501

Objects with IDs 107634 and 152385 collided.
Object Is 107634and Obj ID is 107634 Has Hashcode 273791087
Object Is 152385and Obj ID is 152385 Has Hashcode 273791087

Objects with IDs 108007 and 146405 collided.
Object Is 108007and Obj ID is 108007 Has Hashcode 1164664992
Object Is 146405and Obj ID is 146405 Has Hashcode 1164664992

Objects with IDs 135275 and 180997 collided.
Object Is 135275and Obj ID is 135275 Has Hashcode 996371445
Object Is 180997and Obj ID is 180997 Has Hashcode 996371445

Objects with IDs 153749 and 184310 collided.
Object Is 153749and Obj ID is 153749 Has Hashcode 254720071
Object Is 184310and Obj ID is 184310 Has Hashcode 254720071

employee_121949
employee_121949
1867750575
1867750575
1867750575
1867750575

当你不重写hashCode()时,你得到的是继承自class Object的身份哈希码函数。

身份哈希码取决于你看不到的东西,理论上每次你 运行 你的程序都会改变,比如对象在内存中的最终位置,在你之前创建的对象的数量等. 您只是不能指望在您的程序或方法的不同 运行 之间的身份哈希值有任何一致性。

但是,如果您 运行 完全 同一个程序两次,而且不是太大,那么您最终得到相同哈希值的可能性很大两次。但是,如果您更改程序,您会更改加载和编译 class 时将消耗的内存量,这很可能会通过更改对象在内存中的位置来更改标识哈希。

Matt Timmermans 的回答很好地涵盖了基本问题,尤其是 "You can't expect to have any consistency...between different runs." (+1)

默认的 Object.hashCode() 实现(也称为 身份哈希码 因为它与 System.identityHashCode(obj) 相同)目前在 Hotspot 中只是一个 pseudo-random 带有 thread-local 种子的数字。很长一段时间以来,对象的内存地址都没有任何依赖性。如果您的程序执行是完全确定的,则哈希很可能是可重复的。

另请注意,身份哈希码是在第一次调用 Object.hashCode()System.identityHashCode() 时延迟生成的,并且该值存储在对象中,因此对该对象的后续调用将 return 相同的值。如果你 运行 你的碰撞检测器在另一个线程中循环,你会得到完全不同的哈希值,因此会有不同的碰撞。