Subroutine inside OpenMP PARALLEL DO - Program Crash

I have read both "Calling an internal subroutine inside OpenMP region" and "Global Variables in Fortran OpenMP". My understanding (from here) is:

Below is a simplified version of my code:

!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(j,dummy1,dummy2,dummy3,dummy4)
DO j=1,ntotal
  dummy1 = 0.0d0
  dummy2 = foo(j)
  CALL kernel(dummy1,dummy1,dummy2,dummy3,dummy4)  
  Variable(j) = dummy3 + dummy4
END DO 
!$OMP END PARALLEL DO 

The subroutine kernel takes dummy1 and dummy2 as input and returns dummy3 and dummy4 as INTENT(OUT) arguments. I compile with:

 -fopenmp -fno-automatic -fcheck=all

and I get:

Fortran runtime error: Recursive call to nonrecursive procedure 'kernel'

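Based on here, my understanding is that this is expected. When I compile without -fcheck, the code sometimes gets through the subroutine call without trouble, but most of the time it crashes without any error message. My guess is that this happens because my subroutine is not thread safe. All arguments passed to the subroutine should be private and independent for each thread. The trimmed-down subroutine is as follows: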

SUBROUTINE kernel(r,dx,hsml,w,dwdx)   

  USE Initial_Parameters
  IMPLICIT NONE 

  ! DATA DICTIONARY: DECLARE CALLING PARAMETER TYPES AND DEFINITIONS
  REAL(KIND=dp), INTENT(IN)                 ::  r           
  REAL(KIND=dp), DIMENSION(dim), INTENT(IN) ::  dx          
  REAL(KIND=dp), INTENT(IN)                 ::  hsml        
  REAL(KIND=dp), INTENT(OUT)                ::  w           
  REAL(KIND=dp), DIMENSION(dim), INTENT(OUT)::  dwdx        
  ! DATA DICTIONARY: DECLARE LOCAL VARIABLE TYPES AND DEFINITIONS
  INTEGER                                   ::  i, d  
  REAL(KIND=dp)                             ::  q, dw
  REAL(KIND=dp)                             ::  factor      


  ! Kernel functions are functions of q, the distance between particles
  ! divided by the smoothing length  
  q = r/hsml 
  ! Preset the kernel to zero
  w = 0.e0
  ! Preset the derivative of the kernel to zero
  DO d=1,dim         
    dwdx(d) = 0.e0
  END DO   

  IF (skf == 1) THEN     

    ! If the problem is one dimensional then,
    IF (dim == 1) THEN
      ! The coefficient, alpha = factor is given by:
      factor = 1.e0/hsml
    ! If the problem is two dimensional then,
    ELSE IF (dim == 2) THEN
      ! The coefficient, alpha = factor is given by:
      factor = 15.e0/(7.e0*pi*hsml*hsml)
    ! If the problem is three dimensional then,
    ELSE IF (dim == 3) THEN
      ! The coefficient, alpha = factor is given by:
      factor = 3.e0/(2.e0*pi*hsml*hsml*hsml)
    ! If the dimension value is not 1, 2 or 3 then there is a problem.
    ELSE
       WRITE(*,*)' >>> Error <<< : Wrong dimension: Dim =',dim
       STOP
    END IF

    ! Smoothing function for 1st range of q.                                         
    IF (q >= 0 .AND. q <= 1.e0) THEN
      ! The smoothing function is given by:
      w = factor * (2./3. - q*q + q*q*q / 2.)
      ! For each dimension work out the gradient of the smoothing function
      DO d = 1, dim
        dwdx(d) = factor * (-2.+3./2.*q)/hsml**2 * dx(d)       
      END DO   

    ! Smoothing function for 2nd range of q.  
    ELSE IF (q > 1.e0 .AND. q <= 2) THEN  
      ! Smoothing function is equal to:        
      w = factor * 1.e0/6.e0 * (2.-q)**3 
      ! Gradient of the smoothing function in each dimension.
      DO d = 1, dim
        dwdx(d) =-factor * 1.e0/6.e0 * 3.*(2.-q)**2/hsml * (dx(d)/r)        
      END DO   

    ! Smoothing function and gradient for all other values of q is zero.
    ELSE
      ! Smoothing function is equal to: 
      w=0.
      ! Gradient of the smoothing function in each dimension.
      DO d= 1, dim
        dwdx(d) = 0.
      END DO             
    END IF

  END IF

END SUBROUTINE kernel

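The local variables should be private, and all the arguments passed in are private. The module parameters are shared, but that is fine. Can you explain why this crashes?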

With -fno-automatic, the local variables in kernel are implicitly SAVEd. The description here notes:

Local variables with the SAVE attribute declared in procedures called from a parallel region are implicitly SHARED.

Therefore kernel is indeed not thread safe (as far as I can tell).
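
A minimal sketch of that point, assuming gfortran and a build without -fno-automatic. The module demo_kernel, the routine scale_sum, and the size n are hypothetical names made up for illustration; they are not part of the original code. The locals q, acc, and i carry no SAVE attribute, so each thread should get its own copy and the parallel result should match the serial one.

```fortran
! Hypothetical example: demo_kernel, scale_sum, and n are illustrative names.
! Assumed build line: gfortran -fopenmp -fcheck=all demo.f90  (no -fno-automatic)
MODULE demo_kernel
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0d0)
CONTAINS
  SUBROUTINE scale_sum(r, hsml, w)
    REAL(KIND=dp), INTENT(IN)  :: r, hsml
    REAL(KIND=dp), INTENT(OUT) :: w
    ! Locals without SAVE: automatic storage, so one copy per thread
    REAL(KIND=dp) :: q, acc
    INTEGER       :: i

    q   = r/hsml
    acc = 0.0_dp
    DO i = 1, 100
      acc = acc + q*REAL(i, dp)
    END DO
    w = acc
  END SUBROUTINE scale_sum
END MODULE demo_kernel

PROGRAM demo
  USE demo_kernel
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 10000
  REAL(KIND=dp) :: par(n), ser(n), w
  INTEGER :: j

  ! Serial reference values
  DO j = 1, n
    CALL scale_sum(REAL(j, dp), 2.0_dp, ser(j))
  END DO

  ! Same work in parallel; w is private, par is shared but indexed by j
  !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(j, w)
  DO j = 1, n
    CALL scale_sum(REAL(j, dp), 2.0_dp, w)
    par(j) = w
  END DO
  !$OMP END PARALLEL DO

  IF (ANY(par /= ser)) ERROR STOP 'parallel and serial results differ'
  WRITE(*,*) 'parallel and serial results agree'
END PROGRAM demo
```

Adding -fno-automatic to the build line would make q, acc, and i static, and therefore shared between threads, which is the race described above.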

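Also note that in your example you pass dummy1 as both the first and the second argument to kernel, but your definition of that routine declares the first argument (r) as a scalar and the second (dx) as an array of length dim. I am not sure whether this is just an artifact of your minimal example or is also present in your real code, but it could cause problems. Are you declaring kernel inside a module and then using that module? That would generate an explicit interface, which should help catch this kind of thing.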
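
A sketch of the module suggestion, under stated assumptions: kernel_mod and kernel_stub are hypothetical names, the body is a placeholder rather than your smoothing function, and dim is fixed at 2 purely for illustration. Only the argument shapes are copied from your kernel. Because the routine lives in a module, the compiler sees an explicit interface and can check each call against it.

```fortran
! Hypothetical stand-in for kernel: same argument shapes, placeholder body.
MODULE kernel_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp  = KIND(1.0d0)
  INTEGER, PARAMETER :: dim = 2          ! assumed value for illustration
CONTAINS
  SUBROUTINE kernel_stub(r, dx, hsml, w, dwdx)
    REAL(KIND=dp), INTENT(IN)                  :: r
    REAL(KIND=dp), DIMENSION(dim), INTENT(IN)  :: dx
    REAL(KIND=dp), INTENT(IN)                  :: hsml
    REAL(KIND=dp), INTENT(OUT)                 :: w
    REAL(KIND=dp), DIMENSION(dim), INTENT(OUT) :: dwdx

    ! Placeholder arithmetic, not the real smoothing function
    w    = r/hsml
    dwdx = dx/hsml
  END SUBROUTINE kernel_stub
END MODULE kernel_mod

PROGRAM check_interface
  USE kernel_mod
  IMPLICIT NONE
  REAL(KIND=dp) :: r, hsml, w
  REAL(KIND=dp) :: dx(dim), dwdx(dim)

  r    = 1.0_dp
  hsml = 2.0_dp
  dx   = (/ 0.6_dp, 0.8_dp /)

  ! Shapes match the interface: scalar, array, scalar, scalar, array
  CALL kernel_stub(r, dx, hsml, w, dwdx)
  WRITE(*,*) w, dwdx

  ! Uncommenting the next line should be rejected at compile time with a
  ! rank mismatch, because r is a scalar and dx is declared DIMENSION(dim):
  ! CALL kernel_stub(r, r, hsml, w, dwdx)
END PROGRAM check_interface
```

With kernel as an external routine and no interface in scope, the same mismatched call would typically compile silently, which is how a scalar such as dummy1 can end up passed where an array is expected.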