Optimization of a C++ code (that uses UnorderedMap and Vector)

Question

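I am trying to optimize part of a C++ program that takes a long time. The part shown below takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data, based on some benchmarks that I have. I have a function "add", whose code I have copied here. I will try to explain as much as I think is needed to understand the code. Please let me know if I have missed something.

The following function add is called X times, once for each of the X data entries.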

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data

   // hashTableMap is an unordered_map<int, std::vector<PointObject>>

   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            std::vector<PointObject> pointVectorList = it->second;
            pointVectorList.push_back(vector);
            it->second = pointVectorList;
        }
   }
}

Answer 1

Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), just insert the new element and check what insert() returned:

In versions (1) and (2), the function returns a pair object whose first element is an iterator pointing either to the newly inserted element in the container or to the element whose key is equivalent, and a bool value indicating whether the element was successfully inserted or not.

Also, do not pass objects by value where you don't have to. It is better to pass them by pointer or by reference. This:

std::vector<PointObject> pointVectorList = it->second;

is inefficient because it creates an unnecessary copy of the vector.
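A minimal sketch of both suggestions, assuming a stand-in PointObject with two integer coordinates and a toy hash function named hashPoint (both hypothetical; the question does not show the real ones). The point is passed by const reference, and a single insert() call replaces the count()/find() pair:

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's type; the real PointObject is not shown.
struct PointObject {
    int x;
    int y;
};

// Hypothetical hash; the real hash() is not shown in the question.
inline int hashPoint(const PointObject& p) {
    return p.x * 31 + p.y;
}

using PointMap = std::unordered_map<int, std::vector<PointObject>>;

// Returns true if a new entry was created, false if the key already existed.
inline bool add(PointMap& hashTableMap, const PointObject& point) {
    // insert() leaves the map unchanged if the key exists; either way,
    // result.first points at the element stored under this key.
    auto result = hashTableMap.insert(
        std::make_pair(hashPoint(point), std::vector<PointObject>()));
    result.first->second.push_back(point);  // append in place, no vector copy
    return result.second;
}
```

The bool in result.second is only returned here for illustration; the add in the question does not need it.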

Answer 2

Without the if, try to insert an empty entry into the hash table:

auto ret = hashTableMap.insert(
   std::make_pair(combinedHash, std::vector<PointObject>()));

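This will either add a new blank entry or retrieve the one that is already there. In your case you don't need to check which of the two happened; you just need to take the returned iterator and add the new element: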

auto &pointVectorList = ret.first->second;   // ret.first points at a (key, vector) pair
pointVectorList.push_back(vector);
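For comparison, C++17 added try_emplace, which has the same "insert or find" behaviour without building a temporary pair or vector. This is a different call from the insert() used in this answer, shown only as a sketch; it assumes a C++17 compiler, and the helper name addWithTryEmplace is made up for the illustration:

```cpp
#include <unordered_map>
#include <vector>

// Illustrative helper (not from the question): appends `value` to the vector
// stored under `key`, creating an empty vector first if the key is new.
// Requires C++17 for try_emplace.
template <typename T>
void addWithTryEmplace(std::unordered_map<int, std::vector<T>>& table,
                       int key, const T& value) {
    // try_emplace constructs the mapped vector only if `key` is absent.
    auto ret = table.try_emplace(key);
    ret.first->second.push_back(value);
}
```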

Answer 3

Your biggest problem is that you are copying the entire vector (and every element in that vector) twice in the else part:

std::vector<PointObject> pointVectorList = it->second;  // first copy
pointVectorList.push_back(vector);
it->second = pointVectorList;                           // second copy

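This means that every time you add an element to an existing vector, you copy that whole vector.

If you used a reference to that vector, you would do much better: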

std::vector<PointObject> &pointVectorList = it->second;
pointVectorList.push_back(vector);
//it->second = pointVectorList; // don't need this anymore.
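To make the cost visible, here is a small sketch with a hypothetical CountedPoint type that counts how often it is copied. It compares appending through a copy of the vector with appending through a reference; the type and function names are invented for the illustration and are not part of the question:

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical element type that counts copy constructions and copy assignments.
struct CountedPoint {
    static int copies;
    int value = 0;
    CountedPoint() = default;
    explicit CountedPoint(int v) : value(v) {}
    CountedPoint(const CountedPoint& other) : value(other.value) { ++copies; }
    CountedPoint& operator=(const CountedPoint& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
int CountedPoint::copies = 0;

using CountedMap = std::unordered_map<int, std::vector<CountedPoint>>;

// Pattern from the question: copy the vector out, modify it, copy it back.
void appendViaCopy(CountedMap& table, int key, const CountedPoint& p) {
    std::vector<CountedPoint> list = table[key];  // copies every element
    list.push_back(p);
    table[key] = list;                            // copies every element again
}

// Pattern from this answer: modify the stored vector through a reference.
void appendViaReference(CountedMap& table, int key, const CountedPoint& p) {
    std::vector<CountedPoint>& list = table[key];
    list.push_back(p);  // copies the new element (plus occasional regrowth)
}
```

Calling each function in a loop and reading CountedPoint::copies afterwards shows the copy-based version growing roughly quadratically with the number of appends, while the reference-based version stays roughly linear.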

As a side note, in your unordered_map you are hashing your value to use as your key. You could just use an unordered_set with your hash function.

Answer 4

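Using std::unordered_map does not seem appropriate here: the int you use as the key is (presumably) the hash of the PointObject rather than the PointObject itself, so you are essentially hashing twice. Also, if you need a PointObject to compute the map key, then it is not really a key at all! Perhaps std::unordered_multiset would be a better choice?

First define the hash function for PointObject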

namespace std
{
    template<>
    struct hash<PointObject> {
        size_t operator()(const PointObject& p) const {
            return ::hash(p);
        }
    };
}

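Then something like the following (note that std::unordered_multiset also needs operator== to be defined for PointObject):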

#include <unordered_set>

using HashTable = std::unordered_multiset<PointObject>;

int main()
{
    HashTable table {};

    PointObject a {};
    table.insert(a);

    table.emplace(/* whatever */);

    return 0;
}
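A slightly fuller sketch of the same idea, assuming a hypothetical PointObject with two integer coordinates (the real type is not shown in the question). Instead of specializing std::hash it passes a separate hash functor, here called PointHash, and it uses equal_range to fetch all stored points equal to a given one:

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <unordered_set>

// Hypothetical stand-in for the asker's type.
struct PointObject {
    int x;
    int y;
};

// Equality is required so the container can tell colliding elements apart.
inline bool operator==(const PointObject& a, const PointObject& b) {
    return a.x == b.x && a.y == b.y;
}

// Hypothetical hash functor; the body would be replaced by the real hash().
struct PointHash {
    std::size_t operator()(const PointObject& p) const {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

using PointTable = std::unordered_multiset<PointObject, PointHash>;

// Counts how many stored points compare equal to `p`.
inline std::size_t countEqual(const PointTable& table, const PointObject& p) {
    auto range = table.equal_range(p);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

Note that this groups points by equality rather than by hash value, which may or may not match what the original table was meant to do.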

Answer 5

You are doing a lot of useless operations... if I understand correctly, a simplified form could simply be:

void HashTable::add(const PointObject& vector) {
   hashTableMap[hash(vector)].push_back(vector);    
}

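This works because:

A map accessed with operator[] creates a default-constructed value if the key is not already present in the map.
That value (a std::vector) is returned by reference, so you can push_back the incoming point onto it directly. The std::vector is either newly inserted or the previously existing one, if the key was already in the map.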

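Also note that, depending on the size of PointObject and other factors, passing vector by value may be more efficient than passing it by const PointObject&. This is the kind of micro-optimization that needs profiling to be done sensibly.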
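One rough way to compare the two signatures is a small timing harness. The sketch below assumes a hypothetical two-int SmallPoint and uses p.x % 1024 as a stand-in hash; neither comes from the question, and the numbers it produces say nothing about the real PointObject:

```cpp
#include <chrono>
#include <unordered_map>
#include <vector>

// Hypothetical small point type used only for this measurement sketch.
struct SmallPoint {
    int x;
    int y;
};

using SmallPointMap = std::unordered_map<int, std::vector<SmallPoint>>;

// Stand-in hash: p.x % 1024 (assumes non-negative x).
inline void addByRef(SmallPointMap& table, const SmallPoint& p) {
    table[p.x % 1024].push_back(p);
}

inline void addByValue(SmallPointMap& table, SmallPoint p) {
    table[p.x % 1024].push_back(p);
}

// Runs `add` n times on a fresh table and returns the elapsed
// wall-clock time in microseconds.
template <typename AddFn>
long long timeAdds(AddFn add, int n) {
    SmallPointMap table;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        add(table, SmallPoint{i, i});
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Call it as timeAdds(addByRef, n) and timeAdds(addByValue, n), build with optimizations enabled, and repeat the measurement several times; a single run on a small n is mostly noise.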

Answer 6

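Assuming PointObject is big and copying it is expensive, std::move is your friend. You will need to make sure PointObject is move-aware (either don't define a destructor or copy operations, or provide a move constructor and a move assignment operator yourself).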

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data

   // hashTableMap is an unordered_map<int, std::vector<PointObject>>

   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(std::move(vector));
        hashTableMap.insert(std::make_pair(combinedHash, std::move(pointVectorList)));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            // Append in place through the iterator; copying the stored vector out and back is unnecessary.
            it->second.push_back(std::move(vector));
        }
   }
}
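A small sketch of what the moves buy, assuming a hypothetical BigPoint that owns a heap buffer and counts its copies and moves (none of these names come from the question). Taking the parameter by value and moving it into the stored vector lets a caller who no longer needs the object avoid copying it entirely:

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical expensive-to-copy type that records copy and move constructions.
struct BigPoint {
    static int copies;
    static int moves;
    std::vector<double> coords;

    explicit BigPoint(std::size_t n) : coords(n, 0.0) {}
    BigPoint(const BigPoint& o) : coords(o.coords) { ++copies; }
    BigPoint(BigPoint&& o) noexcept : coords(std::move(o.coords)) { ++moves; }
    BigPoint& operator=(const BigPoint&) = default;
    BigPoint& operator=(BigPoint&&) = default;
};
int BigPoint::copies = 0;
int BigPoint::moves = 0;

using BigPointMap = std::unordered_map<int, std::vector<BigPoint>>;

// Sink parameter: the caller decides whether to copy or move into `point`.
void addMoved(BigPointMap& table, int key, BigPoint point) {
    table[key].push_back(std::move(point));  // moves into the stored vector
}
```

With addMoved(table, key, std::move(p)) no copy constructor runs at all; with addMoved(table, key, p) exactly one copy is made, into the parameter.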

Answer 7

The .count() call is totally unnecessary; you can simplify your function to:

void HashTable::add(PointObject vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        std::vector<PointObject> pointVectorList = it->second;
        pointVectorList.push_back(vector);
        it->second = pointVectorList;
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

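You are also performing copy operations all over the place. Copying an object is time-consuming, so avoid doing it. Use references and pointers whenever possible: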

void HashTable::add(const PointObject& vector)   // const reference: no copy, and the argument cannot be modified
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        it->second.push_back(vector);
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

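This code can probably be optimized further, but that would require knowing hash(), knowing how hashTableMap works (by the way, why is it not a std::map?) and some experimentation.

If hashTableMap were a std::map<int, std::vector<PointObject>>, you could simplify your function to this (the same operator[] call also works on std::unordered_map):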

void HashTable::add(const PointObject& vector)
{
    hashTableMap[hash(vector)].push_back(vector);
}

And if it were a std::map<int, std::vector<PointObject*>> (pointers), you could even avoid that last copy operation.
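A minimal sketch of the pointer variant, assuming the PointObject instances are owned elsewhere and outlive the table (if they do not, the stored pointers dangle). PointObject, hashPoint, and addPointer below are hypothetical stand-ins, not the asker's code:

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for the asker's type.
struct PointObject {
    int x;
    int y;
};

// Hypothetical hash; the real hash() is not shown in the question.
inline int hashPoint(const PointObject& p) {
    return p.x * 31 + p.y;
}

using PointerMap = std::map<int, std::vector<const PointObject*>>;

// Stores only the address; the caller must keep `point` alive
// for as long as the table is used.
inline void addPointer(PointerMap& table, const PointObject& point) {
    table[hashPoint(point)].push_back(&point);
}

// Reject temporaries at compile time, since their address would dangle.
void addPointer(PointerMap& table, PointObject&& point) = delete;
```

The trade-off is that the table no longer owns its points, so this only helps when some other container already keeps them alive and does not relocate them.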

Tags: c++, optimization, unordered-map, vector