C ++：获取范围内整数的最快方法

C++: Quickest way to get integer within a range

我需要为大约 N=1 亿个密钥生成散列密钥。根据我的研究，murmur3（MurmurHash3_x86_32，参见 murmur3 hash）似乎是最快的哈希函数，具有最佳延迟和足够小的冲突率。我面临的问题是函数 returns 键为 void *。更具体地说，模板是：

void MurmurHash3_x86_32 (const void *key, int len, uint32_t seed, void *out);

由于我的散列 table 大小小于它可以生成的最大散列，因此我需要将其放入 table 范围 [0, N-1]。最简单的解决方案似乎是使用 % 运算符。但是因为它被认为是一个缓慢的操作员，我想知道是否有更快的方法来解决这个问题。

我发现一个有趣的建议是 Is there an alternative to using % (modulus) in C/C++? 在 Whosebug 上给出的。它建议 'a power of two, the following works (assuming twos complement representation)':

return i & (n-1);

我的问题是，在较新的 CPU 上，有时（或者大多数时候？），由于多路高速缓存行，性能在大约 2^n，IIRC 时下降。（此 link 提供了有关插入的说明 Big Memory, Part 3.5: Google sparsehash!）。

目前，murmur3 的优势似乎被硬件相关问题和已知的 % 运算符的低效率所抵消。由于性能是一个限制因素，我要求低延迟和更快的解决方案来满足我的要求，即使它不是 MurmurHash3_x86_32。

The problem I am facing is that the function returns key as void *.

没有。它 returns 什么都没有 (void)。哈希结果记录在您通过最后一个参数指定的缓冲区（指向）中。对于 MurmurHash3_x86_32()，它是指向 uint32_t.

的指针最有意义

As my hash table size would be smaller than the largest hash it can generate, I need to get it into the table range [0, N-1]. The easiest solution seems to be to use % operator. But as it is known to be a slow operator, I wonder if there is a quicker way to solve the problem.

% 不仅是最简单的解决方案，而且是最常用的解决方案。 "Slow" 是相对的——% 比 + 慢，但是 比 MurmurHash3_x86_32().[=] 快得多 28=]

One interesting suggestion I found [...] suggests [using a power-of-two table size, and computing the modulus via the & operator]

请注意，与 SO 答案中的断言相反，实际上这完全不依赖于二进制补码表示。

My issue with this is that on newer CPUs, it is sometimes (or is it most of the times?), the performance degrades at around sizes 2^n, IIRC, due to multiway cache lines. (This link provides an illustration regarding insertions Big Memory, Part 3.5: Google sparsehash!).

您链接的报告中描述的性能下降归因于重新散列，这似乎很合理。这与您询问的操作无关。可以想象缓存（缺乏）关联性可能会影响大哈希 tables 的性能，但可能不会比大哈希 tables 通常影响的影响更大。使用散列 table 所固有的内存访问模式自然会产生较差的缓存局部性。这实际上就是点.

At the moment, the advantages of murmur3 seems to be nullified by hard-ware related issues and known inefficiency of % operator. As performance is a constraint, I request for low latency and faster solutions to my requirement even if it is not MurmurHash3_x86_32.

你想多了。无法有效使用 CPU 缓存只是您为使用大哈希 table 付出的代价。它与散列函数无关（只要散列函数能正常工作）。单个算术运算的成本，无论是 % 还是 &，与计算散列以在上进行运算的成本相比并不明显，因此你选择哪一个并不重要。如果您希望该操作获得微小优势，请使用大小为 table 和 & 的二次幂运算符。另一方面，这会丢弃一些您费了很大力气才能计算出的散列位。考虑改为选择 prime 散列 table 大小和 % 运算符——然后所有散列位都将有助于选择桶，这可能会改善您的传播。

C ++：获取范围内整数的最快方法

C++: Quickest way to get integer within a range

c

c++

hash

modulo

low-latency