为什么 std::unordered_map 很慢，我可以更有效地使用它来缓解这种情况吗？

Question

最近发现一件奇怪的事。似乎用 no caching at all is over 2 times faster than using std::unordered_map to cache all elements.

计算 Collatz 序列长度

请注意，我确实从问题 Is gcc std::unordered_map implementation slow? If so - why? 中得到了提示，并且我尝试使用这些知识来使 std::unordered_map 尽可能地执行（我使用了 g++ 4.6，它确实比最新版本执行得更好g++，并且我试图指定一个合理的初始桶数，我让它恰好等于地图必须容纳的最大元素数）。

相比之下，using std::vector to cache a few elements 比完全没有缓存快近 17 倍，比使用 std::unordered_map 快近 40 倍。

是我做错了什么，还是这个容器太慢了，为什么？可以让它执行得更快吗？或者 hashmap 本质上是无效的，应该在高性能代码中尽可能避免使用？

有问题的基准是：

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    static std::unordered_map <std::uint_fast64_t, std::uint_fast16_t> cache ({{1,1}}, 2168611);

    if(cache.count(val) == 0) {
        if(val%2 == 0)
            cache[val] = getCollatzLength(val/2) + 1;
        else
            cache[val] = getCollatzLength(3*val+1) + 1;
    }

    return cache[val];
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

它输出：Time taken: 0.761717

而根本没有缓存的基准测试：

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    std::uint_fast16_t length = 1;
    while(val != 1) {
        if(val%2 == 0)
            val /= 2;
        else
            val = 3*val + 1;
        ++length;
    }
    return length;
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

输出Time taken: 0.324586

Answer 1

标准库的映射确实本质上很慢（std::map 尤其是 std::unoredered_map）。 Google 的 Chandler Carruth 在他的 CppCon 2014 talk 中解释了这一点；简而言之：std::unordered_map 对缓存不友好，因为它使用链表作为桶。

This SO question 提到了一些高效的哈希映射实现 - 请改用其中之一。

为什么 std::unordered_map 很慢，我可以更有效地使用它来缓解这种情况吗？

Why is std::unordered_map slow, and can I use it more effectively to alleviate that?

c++

performance

caching

unordered-map

c++11