How can a Fortran OpenACC contained subroutine access data from its parent subroutine?
I am currently accelerating a Fortran code in which a contained subroutine (subsub) accesses and modifies variables declared in the parent subroutine (sub):
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule
program test
  use mod
  call sub
endprogram
When I compile with the PGI compiler, version 20.9-0, it complains that subsub cannot refer to the host subprogram variable var:
sub:
8, Generating implicit copy(.S0000) [if not already present]
9, Loop is parallelizable
Generating Tesla code
9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
0 inform, 0 warnings, 1 severes, 0 fatal for subsub
This makes sense.
I tried using acc data create(var) or acc declare create(var) to create var on the device, but that does not change the result.
Can this pattern be accelerated?
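No, this pattern won't work. For a contained routine, the compiler passes a hidden argument that holds the parent's stack pointer. Here that pointer refers to the host stack, so dereferencing it from the device causes problems.
The workaround is to pass the variables to the subroutine as arguments. For example: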
% cat test2.f90
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    !$acc kernels loop
    do i = 1, 10
      call subsub(var,i)
    enddo
    print *, var
  contains
    subroutine subsub(var,i)
      !$acc routine
      integer :: var(10)
      integer, value :: i
      var(i) = i
    endsubroutine
  endsubroutine
endmodule
program test
  use mod
  call sub
endprogram
% nvfortran test2.f90 -acc -Minfo=accel ; a.out
sub:
8, Generating implicit copy(.S0000,var(:)) [if not already present]
9, Loop is parallelizable
Generating Tesla code
9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
subsub:
14, Generating acc routine seq
Generating Tesla code
1 2 3 4 5 6
7 8 9 10
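If the routine does not need to be contained at all, a variant of the same workaround is to make it an ordinary module procedure, so there is no host-association frame to pass in the first place. The following is only a sketch under that assumption: the names mod_alt, set_elem and test_alt are hypothetical, and the explicit seq and copy(var) clauses simply spell out what the -Minfo output above reports the compiler choosing implicitly.

```fortran
module mod_alt
  implicit none
contains
  ! Module-level device routine (hypothetical name). It touches only its
  ! own dummy arguments, so it needs no hidden pointer to a parent frame.
  subroutine set_elem(var, i)
    !$acc routine seq
    integer, intent(inout) :: var(10)
    integer, value :: i
    var(i) = i
  end subroutine set_elem

  subroutine sub
    integer :: var(10)
    integer :: i
    ! copy(var) makes explicit the data movement that the compiler
    ! reported as an implicit copy in the output above.
    !$acc kernels loop copy(var)
    do i = 1, 10
      call set_elem(var, i)
    end do
    print *, var
  end subroutine sub
end module mod_alt

program test_alt
  use mod_alt
  call sub
end program test_alt
```

Built the same way as test2.f90 (nvfortran with -acc -Minfo=accel), this should fill var with the values 1 through 10, as in the run above. Without -acc the directives are treated as comments and the code runs on the host.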