7 位整数的哈希函数

Question

我是散列 table 和函数的新手，所以如果有任何错误，我提前道歉。

我正在尝试在 C++ 中创建一个散列 table，用于包含大约 10 万个条目的列表，这些条目由一个 7 位数字组成。问题是，我在尝试找出要使用的哈希函数时遇到了困难。

当使用 %100000 时，我得到了大约 65k 个唯一键，而有大约 90k 个唯一条目。这意味着大约 1/3 的数据会发生冲突。

这个哈希函数好用吗？或者在这种情况下是否有更好的功能可以使用以减少碰撞？我的 table 尺寸应该是多少？

再次感谢！

编辑- 条目是 1 到 2 百万之间的数字。是否可以使用数字本身作为密钥？或者散列 table 的键应该始终从 0 开始吗？

Answer 1

标准库在内部使用散列 table 提供了两种类型：std::unordered_map 和 std::unordered_set。

由于您的密钥类型是整数类型，您可以通过 std::unordered_map<YourIdType, YourDataType 非常方便地获得散列 table。您可以通过 theMap[someId] 轻松访问数据，但请注意，如果未找到密钥，则会创建一个新对象！如果这不是你想要的，你宁愿使用 theMap.find(someId)，其中 returns 一个迭代器。

不过，缺点是您会存储 id 两次（在内部作为 std::pair<YourIdType, YourDataType>）。您可以使用 std::unordered_set 来避免这种情况。但是，为了能够这样做，您需要针对您的类型专门化 std::hash 和 std::equal_to：

namespace std // you are not allowed to add to – with exception of specialisations
{
template<>
struct hash<YourDataType>
{
    size_t operator()(YourDataType const& object) const
    {
        return hash<YourIdType>()(object.id);
    }
};

// analogously equal_to with appropriate comparisons, possibly by
// comparing the object's addresses

或者，您可以提供自定义哈希类型（使用 C++20，甚至可以将 lambda 打包到 decltype 中）作为第二个模板参数设置，然后只为您的代码实现 operator==对象类型，或者如果您需要它与运算符进行不同的比较，则提供自定义相等比较器类型，例如：

// C++20 required:
using YourMapType = std::set
<
    YourDataType,
    decltype
    (
        [](YourDataType const& object)
        { return std::hash<YourIdType>()(object.id); }
    ),
    decltype
    (
        [](YourDataType const& o1, YourDataType const& o2)
        { return &o1 == &o2; } // TODO: comparisons as you need! 
    )
>;
// alternatively create custom types with appropriate operator() implementations

这里的缺点是——除了专业化的额外复杂性——你不能只通过 id 查找对象，而是你需要一个数据类型的完整对象。

那么哪个更appropriate/suitable？这取决于您的具体要求...

7 位整数的哈希函数

Hash Function for a 7 digits int

c++

hash-function

hashtable