OpenMP executes only one thread at a time

Question

This is my code:

template <unsigned int DIM>
MyVector<DIM> MyVector<DIM>::operator+(MyVector& other) {
    MyVector ans = MyVector<DIM>();
    #pragma omp parallel for
    for (unsigned int i = 0; i < DIM; ++i)
    {
        std::cout << omp_get_thread_num() << std::endl;
        ans.values_[i] = values_[i] + other.values_[i];
    }
    return ans;
}

Here values_ is a std::vector of double, and DIM is something like 1024.

I compile with 'g++ -std=c++14 -fopenmp -g'.

Even though I have multiple threads, I see almost no performance difference compared with running without OpenMP.

Indeed, the line:

std::cout << omp_get_thread_num() << std::endl;

shows that only one thread is executing at a time...

The output is clean, something like 11111..., 22222..., 00000..., 33333..., and htop shows only one core at 100% throughout the whole execution.

I have tried several distributions on several machines, and the result is the same everywhere.

Answer 1

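You may want to rewrite the code like this to avoid the huge overhead of the I/O (which also more or less serializes the program's execution):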

template <unsigned int DIM>
MyVector<DIM> MyVector<DIM>::operator+(MyVector& other) {
    MyVector ans = MyVector<DIM>();
    #pragma omp parallel
    {
        #pragma omp critical(console_io)
        {
            // The following are actually two function calls and a critical
            // region is needed in order to ensure I/O atomicity
            std::cout << omp_get_thread_num() << std::endl;
        }
        #pragma omp for schedule(static)
        for (unsigned int i = 0; i < DIM; ++i)
        {
            ans.values_[i] = values_[i] + other.values_[i];
        }
    }
    return ans;
}
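
If you still want to see which thread handled which iteration, one option is to record the thread IDs in memory and print them only after the parallel loop has finished. The following is a minimal sketch, not the asker's class: add_and_trace and the owner vector are hypothetical names, the inputs are plain std::vector<double> of equal length, and the code falls back to a single "thread 0" when compiled without -fopenmp.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: adds a and b element-wise and records, for every
// index, the ID of the thread that handled it. No I/O happens inside the
// parallel loop. Assumes a and b have the same length.
std::vector<double> add_and_trace(const std::vector<double>& a,
                                  const std::vector<double>& b,
                                  std::vector<int>& owner) {
    const long n = static_cast<long>(a.size());
    std::vector<double> out(a.size());
    owner.assign(a.size(), -1);  // -1 marks "not processed yet"

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;  // built without OpenMP: everything runs on one thread
#endif
        out[i] = a[i] + b[i];
    }
    return out;
}
```

After the call, owner can be printed from a single thread. With schedule(static) each thread is typically assigned one contiguous block of indices, so seeing long runs of the same ID does not by itself mean the threads ran one after another.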

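Make sure DIM is large enough that the OpenMP overhead is small compared to the work being done, but also small enough that the vectors fit in the CPU's last-level cache. Once the latter no longer holds, your loop becomes memory-bound, and adding more threads will not make the computation faster.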
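
To check where a given machine stands, the addition can be timed for several sizes with all I/O kept out of the measured region. This is a rough sketch under stated assumptions: seconds_per_add is a hypothetical helper, it works on plain std::vector<double> rather than MyVector, and it uses std::chrono so that it also builds without -fopenmp (in which case the pragma is ignored and the loop runs serially).

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the average wall-clock time in seconds of one
// element-wise addition of two vectors of length n, averaged over `repeats`
// runs. `repeats` must be greater than zero.
double seconds_per_add(std::size_t n, int repeats) {
    std::vector<double> a(n, 1.0), b(n, 2.0), out(n, 0.0);
    const long len = static_cast<long>(n);

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < len; ++i) {
            out[i] = a[i] + b[i];
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    // Read one result through a volatile so the optimizer is less likely
    // to discard the loop as dead code.
    volatile double sink = out.empty() ? 0.0 : out[n - 1];
    (void)sink;

    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

Calling it for, say, n = 1024 and for a few much larger values, once built with -fopenmp and once without, gives a direct comparison of the two regimes described above. The thread count can be varied with the OMP_NUM_THREADS environment variable.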

c++

multithreading

stl

openmp