Understanding #pragma acc kernels
I'm trying to optimize an n-body algorithm. When I add #pragma acc kernels to the loop below, the compiler prints some messages I don't understand:
#pragma acc kernels
for (i = 0; i < n; i++)
{
    real fx, fy, fz;
    fx = fy = fz = 0;
    real iPosx = in[i].x;
    real iPosy = in[i].y;
    real iPosz = in[i].z;
    for (j = 0; j < n; j++)
    {
        real rx, ry, rz;
        rx = in[j].x - iPosx;
        ry = in[j].y - iPosy;
        rz = in[j].z - iPosz;
        real distSqr = rx*rx+ry*ry+rz*rz;
        distSqr += SOFTENING_SQUARED;
        real s = in[j].w / POW(distSqr,1.5);
        real3 ff;
        ff.x = rx * s;
        ff.y = ry * s;
        ff.z = rz * s;
        fx += ff.x;
        fy += ff.y;
        fz += ff.z;
    }
    force[i].x = fx;
    force[i].y = fy;
    force[i].z = fz;
}
What do these messages mean?
"generating implicit reduction(+:fx)"
"generating implicit reduction(+:fy)"
"generating implicit reduction(+:fz)"
Thanks
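To parallelize the inner "j" loop, the three variables fx, fy and fz have to be part of a sum reduction: every j iteration adds into them, so updating the same variables from parallel iterations would otherwise be a race. The compiler detected this automatically and implicitly added the reduction for you. It is the same as if you had declared it explicitly, e.g.: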
#pragma acc loop reduction(+:fx,fy,fz)
for (j = 0; j < n; j++)
{
    real rx, ry, rz;