OpenMP and #pragma omp for in C: how it works and how to check whether it is doing its job
I'm writing a magic-square program with OpenMP in C to try to make it faster. However, my times are higher than those of the sequential execution.
I don't know whether I'm using omp for the right way, and I can't tell whether the loop is actually being spread across threads, how it is supposed to get faster, or whether I should use something else. Can someone help me?
My sample code:
#pragma omp parallel private(i,j)
{
    // main diagonal sum
    #pragma omp for
    for (i = 0; i < size; i++)
        sum += matrix[i][i];

    // secondary diagonal sum
    #pragma omp for
    for (i = 0; i < size; i++)
        sumAux += matrix[i][size-i-1];

    //printf("\nMain diagonal sum %i and secondary %i\n", sum, sumAux);

    //------------------------------------------ROWS AND COLUMNS-----------------------------------------------------------
    #pragma omp for
    for (int i = 0; i < size; i++) {
        sumRow = 0;
        sumCol = 0;
        for (int j = 0; j < size; j++) {
            sumRow += matrix[i][j];
            sumCol += matrix[j][i];
        }
        //printf("Row sum %i\n", sumRow);
        //printf("Column sum %i\n", sumCol);
    }
}

//------------------------------------------PRINTS-----------------------------------------------------------
if (sum == sumCol && sum == sumRow && sum == sumAux) {
    printf("Magic square with sum = %d \n", sum);
} else {
    printf("Not a magic square \n");
}
return 0;
}
The code has several race conditions, namely in the updates of the variables sum, sumAux, sumRow and sumCol. Moreover, this:
for (int i = 0; i < size; i++) {
    sumRow = 0;
    sumCol = 0;
    for (int j = 0; j < size; j++) {
        sumRow += matrix[i][j];
        sumCol += matrix[j][i];
    }
}
is wrong, because:
A "magic square" is an arrangement of numbers (usually integers) in a
square grid, where the numbers in each row, and in each column, and
the numbers in the forward and backward main diagonals, all add up to
the same number.
Therefore, you should check that the sum of the values of each row and the sum of the values of each column produce the same result as the diagonal sums (from the previous step). Moreover, you can optimize the sequential code by exiting early as soon as one of the constraints is not met:
int sum = 0, sum2 = 0;
for (int i = 0; i < size; i++) {
    sum  = sum  + mat[i][i];
    sum2 = sum2 + mat[i][size-1-i];
}
if (sum != sum2)
    return 0;
for (int i = 0; i < size; i++) {
    int rowSum = 0;
    int colSum = 0;
    for (int j = 0; j < size; j++) {
        rowSum += mat[i][j];
        colSum += mat[j][i];
    }
    if (rowSum != sum || sum != colSum)
        return 0;
}
return 1;
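As a quick way to check that the sequential version does what it should, here is a minimal self-contained driver. The function name is_magic_square and the 3x3 test matrix are illustrative additions, not part of the original post; the function body is just the check above:

#include <stdio.h>

/* Hypothetical wrapper around the sequential check shown above. */
int is_magic_square(int size, int mat[size][size]) {
    int sum = 0, sum2 = 0;
    for (int i = 0; i < size; i++) {
        sum  += mat[i][i];
        sum2 += mat[i][size-1-i];
    }
    if (sum != sum2)
        return 0;
    for (int i = 0; i < size; i++) {
        int rowSum = 0;
        int colSum = 0;
        for (int j = 0; j < size; j++) {
            rowSum += mat[i][j];
            colSum += mat[j][i];
        }
        if (rowSum != sum || sum != colSum)
            return 0;
    }
    return 1;
}

int main(void) {
    /* Classic 3x3 magic square: every row, column and diagonal sums to 15. */
    int mat[3][3] = { {2, 7, 6}, {9, 5, 1}, {4, 3, 8} };
    printf("%s\n", is_magic_square(3, mat) ? "magic" : "not magic");
    return 0;
}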
To fix the aforementioned race conditions you should use the OpenMP reduction clause:
int sum = 0, sum2 = 0;
#pragma omp parallel for reduction(+:sum, sum2)
for (int i = 0; i < size; i++) {
    sum  = sum  + mat[i][i];
    sum2 = sum2 + mat[i][size-1-i];
}
if (sum != sum2)
    return 0;
for (int i = 0; i < size; i++) {
    int rowSum = 0;
    int colSum = 0;
    #pragma omp parallel for reduction(+:rowSum, colSum)
    for (int j = 0; j < size; j++) {
        rowSum += mat[i][j];
        colSum += mat[j][i];
    }
    if (rowSum != sum || sum != colSum)
        return 0;
}
return 1;
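To see what the reduction clause is doing conceptually, the sketch below (the function name diagonal_sum is illustrative, not from the original) computes the main-diagonal sum the way a reduction does it by hand: each thread accumulates into a private partial sum, and the partial sums are merged once at the end, so no two threads ever update the shared variable at the same time. Prefer the reduction clause above; this is only for illustration:

/* Conceptual equivalent of reduction(+:sum) for the main diagonal. */
int diagonal_sum(int size, int mat[size][size]) {
    int sum = 0;
    #pragma omp parallel
    {
        int localSum = 0;                 /* private partial sum of this thread */
        #pragma omp for
        for (int i = 0; i < size; i++)
            localSum += mat[i][i];
        #pragma omp critical              /* merge partials without a data race */
        sum += localSum;
    }
    return sum;
}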
But my times are higher than the sequential implementation...
Introducing OpenMP, or parallelism in general, into your code does not magically make it faster. TL;DR: the work done in parallel has to be large enough to overcome the overhead of parallelism (e.g., thread creation, synchronization, and so on). To get there, you first need to increase the size of the parallel tasks, i.e., increase the input size to a value that justifies the aforementioned parallelization overhead.
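One concrete way to check both points, that the parallel loops still compute the right result and that they only pay off beyond a certain input size, is to time the sequential and the parallel version with omp_get_wtime() for growing values of size. The harness below is a minimal sketch: the 1-filled matrix and the row-summing loops are illustrative choices, not the original program.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal timing harness (illustrative). Compile with: gcc -fopenmp bench.c -o bench
 * and run it with different sizes, e.g. ./bench 500 and ./bench 5000. */
int main(int argc, char **argv) {
    int size = (argc > 1) ? atoi(argv[1]) : 2000;
    int *mat = malloc((size_t)size * size * sizeof *mat);
    for (long k = 0; k < (long)size * size; k++)
        mat[k] = 1;

    printf("max threads: %d\n", omp_get_max_threads());

    /* Sequential: sum every element, row by row. */
    double t0 = omp_get_wtime();
    long totalSeq = 0;
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            totalSeq += mat[(long)i * size + j];
    double t1 = omp_get_wtime();

    /* Parallel: same work, one reduction over the outer loop. */
    long totalPar = 0;
    #pragma omp parallel for reduction(+:totalPar)
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            totalPar += mat[(long)i * size + j];
    double t2 = omp_get_wtime();

    /* The two totals must agree (correctness check); the parallel time should
     * only beat the sequential one once size is large enough to amortize the
     * thread-creation and synchronization overhead. */
    printf("sequential: %f s (total %ld)\n", t1 - t0, totalSeq);
    printf("parallel:   %f s (total %ld)\n", t2 - t1, totalPar);
    free(mat);
    return 0;
}

For small matrices the sequential version will keep winning; the crossover point depends on the machine and on the number of threads.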