OpenACC Declare Construct
I used the PGI compiler to try out the features supported in OpenACC 2.6 and ran into a memory-management problem between the CPU and the GPU.
The following Fortran code is a modified version of the example in the official documentation:
module data
integer, parameter :: maxl = 100000
real, dimension(maxl) :: xstat
real, dimension(:), allocatable :: yalloc
!$acc declare create(xstat,yalloc)
end module
module useit
use data
contains
subroutine compute(n)
integer :: n
integer :: i
!$acc parallel loop present(yalloc)
do i = 1, n
yalloc(i) = iprocess(i)
enddo
end subroutine
real function iprocess(i)
!$acc routine seq
integer :: i
iprocess = yalloc(i) + 2*xstat(i)
end function
end module
program main
use data
use useit
implicit none
integer :: nSize = 100
!---------------------------------------------------------------------------
call allocit(nSize)
call initialize
call compute(nSize)
!$acc update self(yalloc)
write(*,*) "yalloc(10)=",yalloc(10) ! should be 3
call finalize
contains
subroutine allocit(n)
integer :: n
allocate(yalloc(n))
end subroutine allocit
subroutine initialize
xstat = 1.0
yalloc = 1.0
!$acc enter data copyin(xstat,yalloc)
end subroutine initialize
subroutine finalize
deallocate(yalloc)
end subroutine finalize
end program main
This code compiles with nvfortran:
nvfortran -Minfo test.f90
On the CPU it prints the expected value:
yalloc(10)= 3.000000
However, when compiled with OpenACC:
nvfortran -acc -Minfo test.f90
the code does not print the correct output:
upload CUDA data device=0 threadid=1 variable=descriptor bytes=128
upload CUDA data device=0 threadid=1 variable=.attach. bytes=8
upload CUDA data file=/home/yang/GPU-Collection/openacc/basics/globalArray.f90 function=initialize line=55 device=0 threadid=1 variable=.attach. bytes=8
launch CUDA kernel file=/home/yang/GPU-Collection/openacc/basics/globalArray.f90 function=compute line=14 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=128 grid=1 block=128
download CUDA data file=/home/yang/GPU-Collection/openacc/basics/globalArray.f90 function=main line=41 device=0 threadid=1 variable=yalloc bytes=400
yalloc(10)= 0.000000
I have tried adding explicit data movement in several places, to no avail. This really confuses me.
The problem is in your initialize routine:
subroutine initialize
xstat = 1.0
yalloc = 1.0
!acc enter data copyin(xstat,yalloc)
!$acc update device(xstat,yalloc)
end subroutine initialize
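Since xstat and yalloc are already in a data region (the declare directive), the second data region (the "enter data copyin") is essentially ignored: the reference counter is incremented, but no data is copied, so the device copies never receive the values set on the host. Instead, you need to use the update directive to synchronize the data.
With this change, the code produces the correct answer: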
% nvfortran test.f90 -acc -Minfo=accel; a.out
compute:
14, Generating Tesla code
15, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
iprocess:
19, Generating acc routine seq
Generating Tesla code
main:
41, Generating update self(yalloc(:))
initialize:
56, Generating update device(yalloc(:),xstat(:))
yalloc(10)= 3.000000
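The pattern described in the answer (declare create, fill on the host, then update device rather than enter data copyin) can be put together in one compact, self-contained sketch. This is an illustration, not the original poster's code: the module name `demo` is hypothetical, `iprocess` is inlined into the loop, and it assumes an OpenACC 2.5 or later compiler such as nvfortran with `-acc`. Without `-acc` the `!$acc` lines are ordinary comments and everything runs serially on the host.

```fortran
! Minimal sketch, assuming nvfortran -acc (OpenACC 2.5+ reference-count semantics).
! Without -acc, the !$acc lines are plain comments and the code runs on the host.
module data
  implicit none
  integer, parameter :: maxl = 100000
  real, dimension(maxl) :: xstat
  real, dimension(:), allocatable :: yalloc
  ! Device copies are created here; for the allocatable, the device copy
  ! is allocated when the host ALLOCATE statement executes.
  !$acc declare create(xstat, yalloc)
end module data

module demo   ! hypothetical module name, for illustration only
  use data
  implicit none
contains
  subroutine initialize(n)
    integer, intent(in) :: n
    allocate(yalloc(n))
    xstat  = 1.0
    yalloc = 1.0
    ! Both arrays are already present on the device (declare create), so an
    ! "enter data copyin" here would only bump the reference count and copy
    ! nothing. An explicit update pushes the host values to the device.
    !$acc update device(xstat, yalloc)
  end subroutine initialize

  subroutine compute(n)
    integer, intent(in) :: n
    integer :: i
    !$acc parallel loop present(xstat, yalloc)
    do i = 1, n
      yalloc(i) = yalloc(i) + 2.0*xstat(i)
    end do
    ! Copy the device results back so the host sees them.
    !$acc update self(yalloc)
  end subroutine compute
end module demo
```

A driver calls `initialize(n)` and then `compute(n)`; afterwards `yalloc(10)` on the host should be 3.0, matching the corrected output in the answer. Keeping the `update self` inside `compute` is only a convenience of this sketch; the original program does it in the main program instead.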