OpenACC: How to apply the reduction clause to a struct variable

I just need to apply the reduction(max: ) clause to Dts->t, but nothing seems to work. I have tried reduction(max:Dts.t), reduction(max:Dts->t), reduction(max:Dts), and reduction(max:t).

  #pragma acc parallel loop collapse(3) reduction(max:t) present(Dts) 
        for (k = KBEG; k <= KEND; k++){
        for (j = JBEG; j <= JEND; j++){
        for (i = IBEG; i <= IEND; i++){
          Dts->t = MAX(Dts->t, C_dt[k][j][i]);
         
        }}}

I get errors like these:

PGC-S-0035-Syntax error: Recovery attempted by replacing '.' by ',' (update_stage.c: 450)
PGC-S-0035-Syntax error: Recovery attempted by replacing identifier present by accparallel (update_stage.c: 450)
PGC-S-0040-Illegal use of symbol, invDt_hyp (update_stage.c: 450)
PGC-S-0036-Syntax error: Recovery attempted by inserting <nl> before keyword for (update_stage.c: 451)
PGC-S-0978-The clause parallel is deprecated; use clause gang instead (update_stage.c: 451)
PGC-S-0374-Clause gang(value) not allowed in #pragma acc parallel loop (update_stage.c: 451)

Dts is a variable of type Step.

typedef struct Step_{
  double *cmax; 
  double t;  
  .
  .
  .
} Step;

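The loop I am trying to accelerate is in a routine called from the main function. Dts is defined in the main function, and after that I wrote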

#pragma acc enter data create(Dts)

#pragma acc enter data copyin(Dts.t[:1])

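According to the OpenACC standard, a reduction variable cannot be a member of a composite variable. The simplest way to work around this restriction is to reduce into a local scalar variable and then assign the result back to the struct member.

Something like: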

  double tmax;
  ...
  tmax = Dts->t;
  #pragma acc parallel loop collapse(3) reduction(max:tmax) present(Dts) 
        for (k = KBEG; k <= KEND; k++){
        for (j = JBEG; j <= JEND; j++){
        for (i = IBEG; i <= IEND; i++){
          tmax = MAX(tmax, C_dt[k][j][i]);
         
        }}}
  Dts->t = tmax;
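
For readers who want something they can compile on its own, here is a minimal sketch of the same local-scalar pattern. It rests on several assumptions that are not in the question: a flat array c of length n stands in for C_dt, Step is trimmed to the two members shown, MAX is defined as a simple macro, and step_max_reduce is a hypothetical helper name. When OpenACC is not enabled, the pragma is ignored and the loop simply runs serially on the host.

```c
#include <stddef.h>

/* Trimmed-down stand-in for the Step struct in the question
   (the elided members are omitted; this is an assumption). */
typedef struct Step_ {
  double *cmax;
  double t;
} Step;

/* Assumed definition; the question does not show how MAX is defined. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: folds the maximum of c[0..n-1] into s->t. */
void step_max_reduce(Step *s, const double *c, int n)
{
  double tmax = s->t;  /* plain scalar, so it may appear in a reduction clause */
  int i;

  #pragma acc parallel loop reduction(max:tmax) copyin(c[0:n])
  for (i = 0; i < n; i++) {
    tmax = MAX(tmax, c[i]);
  }

  s->t = tmax;         /* assign the reduced value back on the host */
}
```

Because tmax starts from s->t, the existing value of the member takes part in the maximum, matching the original loop body.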

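If you need the value of Dts->t on the device, either add it to an update device directive after the assignment, or put "tmax" in a data region and perform the assignments in serial regions.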

 // best if you need the value of Dts->t on both the host and device
  double tmax;
  ...
  tmax = Dts->t;
  #pragma acc parallel loop collapse(3) reduction(max:tmax) present(Dts) 
        for (k = KBEG; k <= KEND; k++){
        for (j = JBEG; j <= JEND; j++){
        for (i = IBEG; i <= IEND; i++){
          tmax = MAX(tmax, C_dt[k][j][i]);
         
        }}}
  Dts->t = tmax;
  #pragma acc update device(Dts->t)

  // best if you only need the value of Dts->t on the device
  double tmax;
  ...
  #pragma acc data create(tmax) 
  {
  #pragma acc serial present(Dts)
  {
  tmax = Dts->t;
  }
  #pragma acc parallel loop collapse(3) reduction(max:tmax) present(Dts) 
        for (k = KBEG; k <= KEND; k++){
        for (j = JBEG; j <= JEND; j++){
        for (i = IBEG; i <= IEND; i++){
          tmax = MAX(tmax, C_dt[k][j][i]);
         
        }}}
  #pragma acc serial present(Dts)
  {
  Dts->t = tmax;
  }
  }
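
If the device-only variant is used and the host later needs the result after all, the member can be pulled back with an update self directive. The following is only a sketch under stated assumptions: a flat array c of length n replaces C_dt, Step is trimmed to two members, MAX is an assumed macro, step_max_device_only is a hypothetical name, and the structured data region that places the struct on the device is illustrative (in the question the struct is placed there with enter data in main).

```c
#include <stddef.h>

/* Trimmed-down stand-in for the Step struct (assumption). */
typedef struct Step_ {
  double *cmax;
  double t;
} Step;

/* Assumed definition; the question does not show how MAX is defined. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: keeps the reduction and both assignments on the device. */
void step_max_device_only(Step *s, const double *c, int n)
{
  double tmax;
  int i;

  /* Illustrative data region: shallow copy of *s, the input array,
     and device storage for the scalar tmax. */
  #pragma acc data copyin(s[0:1], c[0:n]) create(tmax)
  {
    #pragma acc serial present(s[0:1])
    {
      tmax = s->t;     /* seed the reduction from the device copy */
    }

    #pragma acc parallel loop reduction(max:tmax) present(c[0:n])
    for (i = 0; i < n; i++) {
      tmax = MAX(tmax, c[i]);
    }

    #pragma acc serial present(s[0:1])
    {
      s->t = tmax;     /* store the result in the device copy */
    }

    /* Only needed if the host reads s->t afterwards. */
    #pragma acc update self(s->t)
  }
}
```

Dropping the final update keeps the value on the device only, as in the variant above.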