OpenMP and `#pragma omp for` in C: how it works and how to check whether it is doing its job

I am writing a magic-square program in C with OpenMP to try to make it faster. However, my times are higher than those of the sequential version.

I don't know whether I am using `omp for` correctly: I can't tell whether the loop is actually being split across threads, how it is supposed to become faster, or whether I should be using something else. Can anyone help me?

My sample code:

#pragma omp parallel private(i,j)
{
    // sum of the main diagonal
    #pragma omp for
    for (i = 0; i < size; i++)
        sum += matrix[i][i];

    #pragma omp for
    for (i = 0; i < size; i++)
        sumAux += matrix[i][size-i-1];
    //printf("\nSoma diagonal principal %i e secundária %i\n", sum, sumAux);

    //------------------------------------------ ROWS AND COLUMNS ------------------------------------------
    #pragma omp for
    for (int i = 0; i < size; i++) {
        sumRow = 0;
        sumCol = 0;
        for (int j = 0; j < size; j++) {
            sumRow += matrix[i][j];
            sumCol += matrix[j][i];
        }
        //printf("soma  Linhas %i\n",sumRow );
        //printf("soma Colunas %i\n",sumCol );
    }
}

    //------------------------------------------PRINTS-----------------------------------------------------------
    if (sum == sumCol && sum==sumRow && sum==sumAux  ) {
        printf("Quadrado magico com sum = %d \n", sum);
    } else {
        printf("Quadrado nao magico \n");
    }

    return 0;
}

The code has several race conditions, namely in the updates of the variables `sum`, `sumAux`, `sumRow`, and `sumCol`. Moreover, this:

for (int i = 0; i < size; i++) {
    sumRow = 0;
    sumCol = 0;
    for (int j = 0; j < size; j++) {
        sumRow += matrix[i][j];
        sumCol += matrix[j][i];
    }
}

is wrong, because:

A "magic square" is an arrangement of numbers (usually integers) in a square grid, where the numbers in each row, and in each column, and the numbers in the forward and backward main diagonals, all add up to the same number.

Therefore, you should check that the sum of the values of each row and the sum of the values of each column produce the same result as the diagonal sums (from the previous step). Moreover, in the sequential code you can optimize by exiting early as soon as a constraint is violated:

int sum = 0, sum2 = 0; 

for (int i = 0; i < size; i++){
    sum = sum + mat[i][i];
    sum2 = sum2 + mat[i][size-1-i]; 
}   

if(sum!=sum2) 
    return 0;

for (int i = 0; i < size; i++) {     
    int rowSum = 0;
    int colSum = 0;     
    for (int j = 0; j < size; j++){
        rowSum += mat[i][j];
        colSum += mat[j][i];
    }
      
    if (rowSum != sum || sum != colSum)
        return 0;
}
return 1;

To fix the race conditions mentioned above, you should use the OpenMP `reduction` clause:

int sum = 0, sum2 = 0; 

#pragma omp parallel for reduction(+:sum, sum2)
for (int i = 0; i < size; i++){
    sum = sum + mat[i][i];
    sum2 = sum2 + mat[i][size-1-i]; 
}   

if(sum!=sum2) 
    return 0;
  
for (int i = 0; i < size; i++) {     
    int rowSum = 0;
    int colSum = 0;     
    #pragma omp parallel for reduction(+:rowSum, colSum)
    for (int j = 0; j < size; j++){
        rowSum += mat[i][j];
        colSum += mat[j][i];
    }
      
    if (rowSum != sum || sum != colSum)
        return 0;
}
return 1;

But my times are higher than the sequential implementation's...

Introducing OpenMP (or parallelism in general) into code does not magically make it faster. TL;DR: the work done in parallel must be large enough to overcome the overhead of parallelism (e.g., thread creation and synchronization). To get there, you first need to increase the size of the parallel task, i.e., increase the input size to a value that justifies that overhead.