If GetHashCode() for a string or integer is not guaranteed to be unique, why use it?

Question

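As I wrote in the title.

If it is not safe to use GetHashCode() in your application, why use it (for strings and integers)? I want to use it with the Intersect and Except methods in LINQ, or to create my own IEqualityComparer class. It feels like taking a chance if it is not 100% safe.

Or am I missing something?

Quoting the String.GetHashCode method documentation at https://docs.microsoft.com/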

Important

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.

As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

Finally, don't use the hash code instead of a value returned by a cryptographic hashing function if you need a cryptographically strong hash. For cryptographic hashes, use a class derived from the System.Security.Cryptography.HashAlgorithm or System.Security.Cryptography.KeyedHashAlgorithm class.

For more information about hash codes, see Object.GetHashCode.

Answer 1

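I think what confuses you is that you assume a hash code maps to the address of a value, but that is not the case.

Think of it as bookshelves: the hash code maps to the address of a shelf, not of a single book. Two items with the same hash code are placed on the same shelf. If a shelf holds 3 books, the dictionary goes to that shelf and checks only those three books with Equals, not every book in the library. So the more distinct the hash codes are, the faster dictionary lookups become.

When you create an IEqualityComparer, if you can make GetHashCode() return unique values, a Dictionary or HashSet that uses it will perform faster than when there are many duplicates.

Consider this example: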
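To make the shelf analogy concrete, here is a minimal sketch in which every string is deliberately given the same hash code, so all items land on one shelf. `SingleShelfComparer` and `CollisionDemo` are hypothetical names invented for this illustration. The point is only that results stay correct, because the final decision is made by Equals; the cost of the collisions is speed, not safety.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for illustration: every string gets the same hash code,
// so every item ends up on the same "shelf". Equality is still decided by Equals.
public sealed class SingleShelfComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj) => 0; // worst case: all keys collide
}

public static class CollisionDemo
{
    // Returns the items of 'left' that also appear in 'right'.
    public static List<string> Common(IEnumerable<string> left, IEnumerable<string> right)
    {
        // The hash code only narrows the search; Equals confirms a real match.
        return left.Intersect(right, new SingleShelfComparer()).ToList();
    }

    public static void Main()
    {
        var result = Common(new[] { "apple", "pear", "plum" },
                            new[] { "plum", "fig", "apple" });
        Console.WriteLine(string.Join(", ", result)); // apple, plum
    }
}
```

With a comparer like this, lookups degrade toward a linear scan of one shelf, which is why well-distributed hash codes matter for performance.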

public int GetHashCode(string obj)
{
    // Cheap to compute, but all strings of the same length collide.
    return obj.Length;
}

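Although this is much faster than walking the whole string, the values are not very distinct (it is still valid, though).

This example is also valid, but it is an even worse choice: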

public int GetHashCode(string obj)
{
    // All strings with the same first character collide; throws on an empty string.
    return (int)obj[0];
}

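Depending on what you know about the contents of the string, you can produce a better hash code. For example, if you know it is a Social Security number in the format "XXX-XX-XXXX", where each X is a digit, this would be a good choice: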

public int GetHashCode(string obj)
{
    // Nine digits always fit in an int, so well-formed numbers never collide.
    return int.Parse(obj.Replace("-", ""));
}
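For context, the method above would normally live inside a full IEqualityComparer<string> implementation. The following is a sketch under the assumption that every key is a well-formed "XXX-XX-XXXX" string; `SsnComparer` and `SsnDemo` are hypothetical names, and int.Parse throws on malformed input, which a real comparer would need to handle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer, assuming every key is a well-formed "XXX-XX-XXXX" string.
public sealed class SsnComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj)
    {
        // Nine digits are at most 999,999,999, which fits in an int, so distinct
        // well-formed numbers get distinct hash codes.
        // int.Parse throws FormatException if the input is not in this format.
        return int.Parse(obj.Replace("-", ""));
    }
}

public static class SsnDemo
{
    public static void Main()
    {
        var comparer = new SsnComparer();
        var seen = new HashSet<string>(comparer);

        Console.WriteLine(seen.Add("123-45-6789"));             // True
        Console.WriteLine(seen.Add("123-45-6789"));             // False (already present)
        Console.WriteLine(comparer.GetHashCode("123-45-6789")); // 123456789
    }
}
```

Even here the set does not rely on the hash code alone: it still calls Equals, so the comparer stays correct if the uniqueness assumption ever fails.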

Answer 2

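If it's not safe to use GetHashCode() in your application, why use it?

GetHashCode has a different purpose. If you need an equality test for strings, you should probably use String.Equals or the == operator; these are guaranteed to work correctly.

A hash code is not a way to generate a unique number for every possible string; that is impossible. This is the definition of a hash function: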

A hash function is any function that can be used to map data of arbitrary size to fixed-size values.

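It simply maps a practically infinite set of strings onto a (relatively) very limited set of integers. If you need to distribute a large number of strings evenly into a smaller number of "buckets", a hash code may be what you need. Hash codes are widely used in hash-based collections such as HashSet.

The documentation for GetHashCode mentions several issues with this method:

The method can produce different results for the same string across different application domains, machines, and versions of .NET. This means that storing the hash externally as some kind of unique identifier for later use is not a good idea;
The result is not cryptographically strong, so you should not use it if you need, for example, an unbreakable password salt.

Sure, it looks scary, but GetHashCode is good enough for in-memory collections such as HashSet or Dictionary.

Also, see this question: Why is it important to override GetHashCode when Equals method is overridden?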
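As a rough illustration of the "buckets" idea, the sketch below maps strings to one of a few buckets using their hash code. `BucketDemo` and `BucketIndex` are hypothetical helpers written for this answer; the real HashSet and Dictionary use their own internal scheme, so treat this only as a picture of the principle.

```csharp
using System;

public static class BucketDemo
{
    // Maps a key to one of 'bucketCount' buckets (hypothetical helper).
    public static int BucketIndex(string key, int bucketCount)
    {
        // Clear the sign bit so the remainder is never negative.
        int nonNegative = key.GetHashCode() & 0x7FFFFFFF;
        return nonNegative % bucketCount;
    }

    public static void Main()
    {
        string[] words = { "alpha", "beta", "gamma", "delta",
                           "epsilon", "zeta", "eta", "theta" };
        int[] counts = new int[4];

        foreach (string word in words)
        {
            counts[BucketIndex(word, counts.Length)]++;
        }

        // Eight words in four buckets: at least one bucket must hold two or more.
        // The exact counts can differ between runs, because string hash codes
        // are not guaranteed to be stable.
        Console.WriteLine(string.Join(" ", counts));
    }
}
```

Sharing a bucket is expected and harmless: the collection compares the few candidates in that bucket with Equals.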
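If you do need a value that is identical across runs, machines, and .NET versions, the quoted documentation points to the classes derived from System.Security.Cryptography.HashAlgorithm. A minimal sketch using SHA-256 follows; `StableHash` and `Sha256Hex` are hypothetical names, and the choice of UTF-8 encoding and lowercase hex output is an assumption made for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Returns the SHA-256 digest of the UTF-8 bytes of 'text' as lowercase hex.
    public static string Sha256Hex(string text)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(text));
            // BitConverter.ToString yields "BA-78-16-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Sha256Hex("abc"));
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```

Unlike String.GetHashCode, this digest can be persisted and compared later. Collisions are still possible in theory, just astronomically unlikely, and a plain digest like this is meant for stable identifiers, not for storing passwords.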

c#

linq

hashcode

iequalitycomparer