Best way to derive / compute a unique hash from a given String in Java

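I'm looking for a way to compute a unique hash for a given String in Java. It looks like I cannot use MD5 or SHA1, because people claim they are broken and do not always guarantee uniqueness.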

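I should get the same hash (preferably a 32-character string, like an MD5 sum) for two String objects that are equal according to equals(). And no other String should generate this hash - that's the tricky part.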

Is there a way to achieve this in Java?

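If you need hash codes that are guaranteed to be unique, then it is not possible (possible in theory, but not in practice). Hashes and hash codes are non-unique.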

A Java String of length N has 65536 ^ N possible states, and requires an integer with 16 * N bits to represent all possible values. If you write a hash function that produces an integer with a smaller range (i.e. fewer than 16 * N bits), you will eventually find cases where more than one String hashes to the same integer; i.e. the hash codes cannot be unique. This is called the Pigeonhole Principle, and there is a straightforward mathematical proof. (You can't fight math and win!)
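To make the pigeonhole argument concrete, here is a small sketch using the built-in 32-bit String.hashCode(). The class name HashCollisionDemo and the helper collide are made up for illustration; the two sample strings are a well-known colliding pair.

```java
public class HashCollisionDemo {

    // True when the two strings differ but still share a 32-bit hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());    // 2112
        System.out.println("BB".hashCode());    // 2112
        System.out.println(collide("Aa", "BB")); // true
    }
}
```

A wider hash makes such pairs far harder to find, but as long as the hash has fewer bits than the input, they must exist.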

But if "probably unique" with a very small chance of non-uniqueness is acceptable, then crypto hashes are a good answer. The math will tell you how big (i.e. how many bits) the hash has to be to achieve a given (low enough) probability of non-uniqueness.
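As a rough sketch of the "probably unique" approach, the following uses the JDK's MessageDigest with SHA-256 and hex-encodes the result. The class name StringDigest and both method names are hypothetical, and encoding the String as UTF-8 is an assumption that suits well-formed text. The second method is the usual birthday approximation for the chance of any collision among n random hashes of a given bit width; it is only meaningful while the result is small.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StringDigest {

    // Returns the SHA-256 digest of s as 64 lowercase hex characters.
    // Strings that are equal by equals() always produce the same digest.
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Assumption: UTF-8 encoding of well-formed text.
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Birthday approximation: p is roughly n^2 / 2^(bits + 1).
    static double collisionProbability(double n, int bits) {
        return n * n / Math.pow(2.0, bits + 1);
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
        System.out.println(collisionProbability(1e9, 256));
    }
}
```

If a 32-character result is a hard requirement, one option is to keep only the first 32 hex characters (128 bits) and re-run collisionProbability with bits = 128 to see whether the odds are still acceptable for the expected number of strings.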

UPDATE: Check this other good answer: What is a good 64bit hash function in Java for textual strings?