A value sent by the host is not returned correctly by the device using CUDA Fortran

Question

I was working through an example of data transfer between host and device in CUDA Fortran and found this:

Host code:

program incTest  
    use cudafor
    use simpleOps_m
    implicit none
    integer, parameter :: n = 256
    integer :: a(n), b, i
    integer, device :: a_d(n)
    a = 1
    b = 3
    a_d = a
    call inc<<<1,n>>>(a_d, b)
    a = a_d
    if (all(a == 4)) then
        write(*,*) 'Success'
    endif
end program incTest

Device code:

module simpleOps_m
contains
    attributes(global) subroutine inc(a, b)
        implicit none
        integer :: a(:)
        integer, value :: b
        integer :: i
        i = threadIdx%x
        a(i) = a(i)+b
    end subroutine inc
end module simpleOps_m

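The expected result is that the console displays "Success", but that does not happen. Nothing appears on the screen, no errors and no messages. This happens because the if branch is never entered: a_d still holds the same values it had before the call to the inc subroutine.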

I'm using:

OS: Linux - Ubuntu 16

CUDA 8

PGI as the compiler

Commands used to compile:

pgf90 -Mcuda -c Device.cuf
pgf90 -Mcuda -c Host.cuf
pgf90 -Mcuda -o HostDevice Device.o Host.o
./HostDevice

I tried other examples, and they don't work either.

I tried compiling plain Fortran (.f90) code with the same commands, and it works!

How can I fix this problem?

Answer 1

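What type of device are you using? (If you don't know, post the output of the "pgaccelinfo" utility.)

My best guess is that you have a Pascal-based device, in which case you need to compile with "-Mcuda=cc60".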
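If pgaccelinfo is not at hand, the compute capability can also be read from within CUDA Fortran itself. The following is a minimal sketch, assuming the device query routines from the cudafor module (cudaGetDeviceCount and cudaGetDeviceProperties); the program name deviceCC is just an illustrative choice. A reported major version of 6 indicates a Pascal device.

```fortran
program deviceCC
    use cudafor
    implicit none
    type(cudaDeviceProp) :: prop
    integer :: istat, nDevices, i

    ! Ask the runtime how many CUDA devices are visible
    istat = cudaGetDeviceCount(nDevices)
    if (istat /= cudaSuccess) then
        write(*,*) 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
        stop
    endif

    ! Device numbering starts at 0
    do i = 0, nDevices - 1
        istat = cudaGetDeviceProperties(prop, i)
        if (istat /= cudaSuccess) then
            write(*,*) 'cudaGetDeviceProperties failed: ', trim(cudaGetErrorString(istat))
            stop
        endif
        write(*,'(a,i0,a,a)') 'Device ', i, ': ', trim(prop%name)
        write(*,'(a,i0,a,i0)') '  compute capability: ', prop%major, '.', prop%minor
    enddo
end program deviceCC
```

This program launches no kernels, so it should report the device properties even when built with plain "-Mcuda" and no compute capability option.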

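For example, if I add error checking to the example code, we see that when the binary is built without "cc60" and run on a Pascal device, the kernel launch fails with an "invalid device function" error.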

% cat test.cuf 
 module simpleOps_m 
      contains 
          attributes(global) subroutine inc(a, b) 
              implicit none 
              integer :: a(:) 
              integer, value :: b 
              integer :: i 
              i = threadIdx%x 
              a(i) = a(i)+b 
          end subroutine inc 
  end module simpleOps_m 

 program incTest 
          use cudafor 
          use simpleOps_m 
          implicit none 
          integer, parameter :: n = 256 
          integer :: a(n), b, i, istat 
          integer, device :: a_d(n) 
          a = 1 
          b = 3 
          a_d = a 
          call inc<<<1,n>>>(a_d, b) 
          istat=cudaDeviceSynchronize() 
          istat=cudaGetLastError() 
          a = a_d 
          if (all(a == 4)) then 
              write(*,*) 'Success' 
          else 
              write(*,*) 'Error code:', cudaGetErrorString(istat) 
          endif 
  end program incTest 
 % pgf90 test.cuf -Mcuda 
 % a.out 
  Error code: 
  invalid device function                                                        
 % pgf90 test.cuf -Mcuda=cc60 
 % a.out 
  Success
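The same check can be written so that launch failures and execution failures are reported separately, which makes a silent failure like the one in the question easier to diagnose. This is only a sketch of a common pattern, not part of the original answer: the program name incCheck and the message strings are illustrative, and it assumes the cudaSuccess constant from the cudafor module.

```fortran
module simpleOps_m
contains
    attributes(global) subroutine inc(a, b)
        implicit none
        integer :: a(:)
        integer, value :: b
        integer :: i
        i = threadIdx%x
        a(i) = a(i) + b
    end subroutine inc
end module simpleOps_m

program incCheck
    use cudafor
    use simpleOps_m
    implicit none
    integer, parameter :: n = 256
    integer :: a(n), b, istat
    integer, device :: a_d(n)

    a = 1
    b = 3
    a_d = a
    call inc<<<1,n>>>(a_d, b)

    ! Errors detected at launch time, such as a binary that
    ! contains no device code for the installed GPU
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
        write(*,*) 'Launch error: ', trim(cudaGetErrorString(istat))
        stop
    endif

    ! Errors that occur while the kernel is running
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
        write(*,*) 'Execution error: ', trim(cudaGetErrorString(istat))
        stop
    endif

    a = a_d
    if (all(a == 4)) then
        write(*,*) 'Success'
    else
        write(*,*) 'Unexpected result'
    endif
end program incCheck
```

Checking cudaGetLastError() immediately after the launch, before synchronizing, keeps the launch status from being confused with the status returned by later calls.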


parallel-processing

fortran

cuda

gpgpu

pgi