关于 std::hash 实现的公平假设

Fair assumptions about std::hash implementations

我们在研究数据库项目中使用多种形式的散列。例如，对于基数聚类，我们使用 n 最低有效位来确定聚类 ID。我们使用std::hash进行散列，这对我们来说已经足够了。

然而，虽然我们知道大多数实现使用身份来散列整数，但我们偶然发现浮点散列（是否有意义是另一个讨论）在不同平台上的实现方式不同。

我们可以对 std::hash 做出任何合理的假设吗？

MacOS: clang version 6.0.1 (tags/RELEASE_601/final) std::hash<float>{}(1.0f): 0000000000000000000000000000000000111111100000000000000000000000 std::hash<double>{}(1.0): 0011111111110000000000000000000000000000000000000000000000000000

Ubuntu: clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) std::hash<float>{}(1.0f): 0101001111100101011001010000100100010100111101010010111101001101 std::hash<double>{}(1.0): 0111010001100001101001000101000001001110110011100111101110011011

您唯一可以假设的是标准定义的（参见 cppreference）。

这意味着：

In particular, they define an operator() const that:

Accepts a single parameter of type Key.

Returns a value of type size_t that represents the hash value of the parameter.

Does not throw exceptions when called.

For two parameters k1 and k2 that are equal, std::hash()(k1) == std::hash()(k2).

For two different parameters k1 and k2 that are not equal, the probability that std::hash()(k1) == std::hash()(k2) should be very small, approaching 1.0/std::numeric_limits::max().

因此，您可以在不同平台上、在具有不同编译器版本的同一平台上，甚至从一个运行另一个平台上获得不同的值。在您的情况下，似乎在一种情况下您可能正在使用 libc++，而在另一种情况下您可能正在使用 libstdc++。

关于 std::hash 实现的公平假设

Fair assumptions about std::hash implementations

c++

hash

std

floating