通过共享对象线程进行迭代是否安全？

Question

我有一个看起来有点像这样的并行 C++ 算法：

void recursiveFunction(int p, std::unordered_set<int> localSet) {
    vector<someStruct> toCall;
    if (p > SOMEINT) {
        return;
    }
    #pragma omp parallel for schedule(dynamic, 1) shared(localSet, someGlobalVector, someOtherGlobalVector)
    for (int i = 0; i < someGlobalVector[p].size(); ++i) {
        auto someLocalValue = someGlobalVector[p][i];
        auto someOtherLocal = someOtherGlobalVector[p][i];
        std::unordered_set<int> filtered;
        for (auto elem : localSet) {
            if (someLocalValue.count(elem) == 0) {
                filtered.insert(elem);
            }
        }
        if (checkSomeCondition(p+1, filtered)) {
            #pragma omp critical
            {
                tocall.push_back({someOtherLocal, filtered});
            }
        }
    }
    for (auto [elem1, elem2] : toCall) {
        recursiveFunction(p+1, filtered);
    }
}

通过共享 unordered_set 线程进行迭代是否安全？

Answer 1

Is iteration through a shared unordered_set thread safe?

不，unordered_set 根据定义不是线程安全的。但是，这并不一定意味着在使用此类结构时代码具有 竞争条件 ；例如，如果一个 only 读取数据结构，即使是并发的，那么一个是安全的。这就是为什么真正不可变的数据结构根据定义是线程安全的，因为不能修改这些结构，因此得名。

另一方面，如果 unordered_set 被多个线程同时修改，那么就会出现 竞争条件 。尽管如此，只要在执行写入时确保互斥，仍然可以以线程安全的方式写入unordered_set。

非线程安全的数据结构，例如 unordered_set 依赖于访问它们的代码来确保当前由多个线程修改这些结构时互斥。

为了避免竞争条件，可以使用 OpenMP 的 critical 构造函数在 insert 操作期间同步对共享结构的访问：

#pragma omp critical
{
  // the code to insure mutual exclusion.
}

但是，由于这种同步机制的开销，这会减慢并行化。或者，可以创建只能由每个线程单独访问的数据结构，插入到这些结构中，然后让 main 线程将这些结构合并为一个（在并行区域之外）。

让我们分解一下您的代码：

void recursiveFunction(int p, std::unordered_set<int> localSet) {
    #pragma omp parallel for schedule(dynamic, 1) shared(localSet, someGlobalVector, someOtherGlobalVector)
    for (int i = 0; i < someGlobalVector[p].size(); ++i) {
        auto someLocalValue = someGlobalVector[p][i];
        auto someOtherLocal = someOtherGlobalVector[p][i];
        std::unordered_set<int> filtered
        for (auto elem : localSet) {
            if (someLocalValue.count(elem) == 0) {
                filtered.insert(elem)
            }
        }
    }
}

localSet、someGlobalVector、someOtherGlobalVector已共享，但仅供阅读。因此，只要此方法之外没有其他线程（同时运行）更改这些结构，就可以了。 vector<someStruct> toCall 是共享和修改的，但修改发生在关键构造函数中，因此那里没有 竞争条件 。

总而言之，假设没有其他线程修改上述共享结构，则您的代码中没有 race-condition。尽管如此，您可以分析您的代码以验证该 critical 区域的开销。如果开销太大，可以尝试以下方法：

void recursiveFunction(int p, std::unordered_set<int> localSet, int total_threads) {
    vector<someStruct> thread_tocall[total_threads];
    vector<someStruct> toCall;
    if (p > SOMEINT) {
        return;
    }
    #pragma omp parallel shared(localSet, someGlobalVector, someOtherGlobalVector)
    {
        int thread_id = omp_get_thread_num();
        for (int i = 0; i < someGlobalVector[p].size(); ++i) {
             auto someLocalValue = someGlobalVector[p][i];
             auto someOtherLocal = someOtherGlobalVector[p][i];
             std::unordered_set<int> filtered;
             for (auto elem : localSet) {
                 if (someLocalValue.count(elem) == 0) {
                    filtered.insert(elem);
                 }
             }
           if (checkSomeCondition(p+1, filtered)) {
                thread_tocall[thread_id].push_back({someOtherLocal, filtered});
            }
         }
    }
    for(int i = 0; i < total_threads; i++)
       for (auto [elem1, elem2] : thread_tocall[i]) {
          recursiveFunction(p+1, filtered);
       }
}

Answer 2

如果全局对象是预先填充的并且只在那时读取 - 那么是的，它通常是线程安全的。但是，如果可以在有人读取共享对象时对其进行修改 - 那么读取也变得不是线程安全的。

通过共享对象线程进行迭代是否安全？

Is iteration through a shared object thread safe?

c++

parallel-processing

multithreading

openmp

thread-safety