How can a Fortran OpenACC contained subroutine access data from its parent subroutine?

I am currently accelerating a Fortran code in which a contained subroutine (subsub) accesses and modifies a variable declared in its parent subroutine (sub):

module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram

When I compile with the PGI compiler, version 20.9-0, it reports that subsub cannot refer to var, a variable of its host subprogram:

sub:
      8, Generating implicit copy(.S0000) [if not already present]
      9, Loop is parallelizable
         Generating Tesla code
          9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for subsub

That makes sense. I tried creating var on the device using acc data create(var) and acc declare create(var), but neither changes the result.

Can this pattern be accelerated?

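No, this pattern won't work. For a contained routine, the compiler passes a hidden argument holding the parent's stack pointer, which is how the contained routine reaches the parent's local variables. In this case that stack pointer refers to host memory, which causes problems when the routine tries to access it from the device.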

The workaround is to pass the variables to the subroutine as arguments. For example:

% cat test2.f90
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub(var,i)
    enddo
    print *, var
  contains
    subroutine subsub(var,i)
      !$acc routine
      integer :: var(10)
      integer, value :: i
      var(i) = i
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram
% nvfortran test2.f90 -acc -Minfo=accel ; a.out
sub:
      8, Generating implicit copy(.S0000,var(:)) [if not already present]
      9, Loop is parallelizable
         Generating Tesla code
          9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
subsub:
     14, Generating acc routine seq
         Generating Tesla code
            1            2            3            4            5            6
            7            8            9           10
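
As a variation on the example above, here is a minimal sketch that states the data movement explicitly instead of relying on the implicit copy the compiler reported. It assumes, as the nvfortran output above suggests, that acc routine is accepted on a contained subprogram as long as it only touches its own dummy arguments. The names mod_explicit, test_explicit, v, idx and k are illustrative and not part of the original code, and the copyout(var) clause is my addition. Without OpenACC enabled the directives are plain comments, so the self-check at the end can also be run serially.

```fortran
module mod_explicit
  implicit none
contains
  subroutine sub(var)
    integer, intent(out) :: var(10)
    integer :: i

    ! var is only written on the device, so copyout is enough (assumption:
    ! every element is assigned inside the loop, which holds here).
    !$acc kernels loop copyout(var)
    do i = 1, 10
      call subsub(var, i)
    enddo
  contains
    subroutine subsub(v, idx)
      !$acc routine seq
      ! Dummy names differ from the host's var and i on purpose, so it is
      ! clear that nothing is reached through host association.
      integer, intent(inout) :: v(10)
      integer, value :: idx
      v(idx) = idx
    endsubroutine
  endsubroutine
endmodule

program test_explicit
  use mod_explicit
  implicit none
  integer :: var(10), k

  call sub(var)
  ! Stop with an error if any element differs from its index.
  if (any(var /= [(k, k = 1, 10)])) error stop "unexpected result"
  print *, var
endprogram
```

Building it the same way as test2.f90 (nvfortran with -acc -Minfo=accel) should show whether the compiler now reports an explicit copyout for var rather than an implicit copy.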