Segmentation fault - invalid memory reference (Conditional jump or move depends on uninitialised value(s))

This is the code I want to run:

  SUBROUTINE GRAD(tasklist_GRAD,ww,pas,cpt ,nb_element,cpt1,dt,dx,p_element,u_prime,u_prime_moins,u_prime_plus,&
        &taux,grad_x_u,grad_t_u,grad_x_f,grad_t_f,ax_plus,ax_moins,ux_plus,ux_moins,sm,flux,tab0,tab)
    INTEGER ::i,j,k,ff,pas
    INTEGER,intent(inout)::cpt,cpt1,nb_element,ww
    real*8  :: dt,dx
    integer ,allocatable, dimension(:),intent(inout) ::p_element
    REAL*8 ,allocatable, dimension(:),intent(inout) :: u_prime,u_prime_moins, u_prime_plus,taux,grad_x_u,&
         &grad_t_u,grad_t_f,grad_x_f,flux,sm
    real*8,allocatable,dimension(:),intent(inout) :: ax_plus,ax_moins,ux_moins,ux_plus
    REAL*8 ,allocatable, dimension(:,:),intent(inout) ::tab0,tab
    integer::num_thread,nthreads
    integer, external :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS
    type(tcb),dimension(20)::tasklist_GRAD,tasks_ready_master
    integer,allocatable,dimension(:)::threads_list
    integer,dimension(30)::threads_list_all
    integer,dimension(3)::threads_list_part1
    integer::threads_list_part2    
    integer,dimension(16)::threads_list_part3
    type(tcb)::self

    !-----------Calcul des gradients de x

    
    Choisircese: select case (ww)

    case(0)  !  Old CESE
       tasklist_GRAD(1)%f_ptr => u_prime_1 !1
       tasklist_GRAD(2)%f_ptr => u_prime_droite_1 !2
       tasklist_GRAD(3)%f_ptr => u_prime_gauche_1 !3

       !$OMP PARALLEL PRIVATE(num_thread,nthreads) &
       !$OMP SHARED(tasklist_GRAD,threads_list,threads_list_all,tasks_ready_master) &
       !$OMP SHARED(threads_list_part1,threads_list_part2,threads_list_part3)

       num_thread=OMP_GET_THREAD_NUM()
       nthreads=OMP_GET_NUM_THREADS()
       
       !Thread Application Master
       !$OMP SINGLE
       if (num_thread==1) then 
          do ff=1,3
             if (associated(tasklist_GRAD(ff)%f_ptr) .eqv. .true. ) then
                tasks_ready_master(ff) = tasklist_GRAD(ff) 
             end if
          end do

          do ff=1,3
             if (associated(tasks_ready_master(ff)%f_ptr) .eqv. .true.) then
                tasks_ready_master(ff)%state=STATE_READY
             end if
          end do          
       end if
       !$OMP END SINGLE

       !Thread Master
       !$OMP SINGLE
       if (num_thread==0) then          
          allocate(threads_list(nthreads-2))
          do ff=1,nthreads-2
             threads_list(ff)=ff+1
          end do

          do ff=1,3,nthreads-2
             if (tasks_ready_master(ff)%state==STATE_READY) then
                threads_list_all(ff:ff+nthreads-3)=threads_list(:)
             end if
          end do
          threads_list_part1=threads_list_all(1:3) 
       end if
       !$OMP END SINGLE 
       !Threads workers
       do ff=1,3
          if (num_thread==threads_list_part1(ff)) then
             !$OMP TASK
             call tasks_ready_master(ff)%f_ptr(self,ww,pas,cpt ,nb_element,cpt1,dt,dx,p_element,u_prime,u_prime_moins,&
                  &u_prime_plus,taux,grad_x_u,grad_t_u,grad_x_f,grad_t_f,ax_plus,ax_moins,ux_plus,ux_moins,sm,flux,tab0,tab)
             !$OMP END TASK
          end if
       end do

       !$OMP END PARALLEL


       if(pas.eq.2)then
          u_prime(2) = tab0(2,2)+dt/2.0d0*grad_x_u(2)!d_t_u(2)
          u_prime(cpt-1) = tab0(cpt-1,2)+dt/2.0d0*grad_t_u(cpt-1)
          u_prime(1) = tab0(1,2)+dt/2.0d0*grad_t_u(1)
          u_prime(cpt) = tab0(cpt,2)+dt/2.0d0*grad_t_u(cpt)

          u_prime_plus(1)= (u_prime(2)-tab(1,2))/(dx/2.0d0)
          u_prime_moins(1)=-(u_prime(1)-tab(1,2))/(dx/2.0d0)
          u_prime_plus(cpt)= (u_prime(cpt)-tab(cpt,2))/(dx/2.0d0)
          u_prime_moins(cpt)= -(u_prime(cpt-1)-tab(cpt,2))/(dx/2.0d0)
       end if
end select choisircese
  END SUBROUTINE GRAD

The code is long, so I have only posted case 0, which is enough to understand the whole subroutine.

To compile, I run:

gfortran -fopenmp -O3 -g HECESE_openmp.f90

To run it (I set the number of threads beforehand):

./a.out

The error I get is:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f0eae2d6700 in ???
#0  0x7f0eae2d6700 in ???
#0  0x7f0eae2d6700 in ???
#1  0x7f0eae2d58a5 in ???
#1  0x7f0eae2d58a5 in ???
#2  0x7f0eadf7520f in ???
#3  0x0 in ???
Erreur de segmentation (core dumped)

I then decided to use valgrind and ran:

valgrind --track-origins=yes ./a.out

The output I get is:

==10923== Thread 6:
==10923== Conditional jump or move depends on uninitialised value(s)
==10923==    at 0x1153DE: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:702)
==10923==    by 0x4C81A85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4F13608: start_thread (pthread_create.c:477)
==10923==    by 0x4DCD292: clone (clone.S:95)
==10923==  Uninitialised value was created by a stack allocation
==10923==    at 0x10B22A: __procedures_MOD_grad (HECESE_openmp.f90:482)
==10923== 
==10923== Thread 1:
==10923== Conditional jump or move depends on uninitialised value(s)
==10923==    at 0x1153DE: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:702)
==10923==    by 0x4C78E75: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x10BF35: __procedures_MOD_grad (HECESE_openmp.f90:659)
==10923==    by 0x1123CA: MAIN__ (HECESE_openmp.f90:1065)
==10923==    by 0x113E45: main (HECESE_openmp.f90:723)
==10923==  Uninitialised value was created by a stack allocation
==10923==    at 0x10B22A: __procedures_MOD_grad (HECESE_openmp.f90:482)
==10923== 
==10923== Thread 8:
==10923== Jump to the invalid address stated on the next line
==10923==    at 0x0: ???
==10923==    by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x1153A4: __procedures_MOD_grad._omp_fn.4 (HECESE_openmp.f90:674)
==10923==    by 0x4C81A85: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4F13608: start_thread (pthread_create.c:477)
==10923==    by 0x4DCD292: clone (clone.S:95)
==10923==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923== 

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
==10923== Thread 5:
==10923== Jump to the invalid address stated on the next line
==10923==    at 0x0: ???
==10923==    by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4C81A91: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4F13608: start_thread (pthread_create.c:477)
==10923==    by 0x4DCD292: clone (clone.S:95)
==10923==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923== 
==10923== Thread 1:
==10923== Jump to the invalid address stated on the next line
==10923==    at 0x0: ???
==10923==    by 0x4C7BD7A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4C846A7: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x4C8304C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==10923==    by 0x10BF35: __procedures_MOD_grad (HECESE_openmp.f90:659)
==10923==    by 0x1123CA: MAIN__ (HECESE_openmp.f90:1065)
==10923==    by 0x113E45: main (HECESE_openmp.f90:723)
==10923==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10923== 
#0  0x487e700 in ???
#1  0x487d8a5 in ???
#2  0x4cf120f in ???
#3  0x0 in ???
==10923== 
==10923== Process terminating with default action of signal 11 (SIGSEGV)
==10923==    at 0x4CF1169: raise (raise.c:46)
==10923==    by 0x4CF120F: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)
==10923== 
==10923== HEAP SUMMARY:
==10923==     in use at exit: 263,919 bytes in 122 blocks
==10923==   total heap usage: 195 allocs, 73 frees, 330,511 bytes allocated
==10923== 
==10923== LEAK SUMMARY:
==10923==    definitely lost: 0 bytes in 0 blocks
==10923==    indirectly lost: 0 bytes in 0 blocks
==10923==      possibly lost: 3,344 bytes in 11 blocks
==10923==    still reachable: 260,575 bytes in 111 blocks
==10923==         suppressed: 0 bytes in 0 blocks
==10923== Rerun with --leak-check=full to see details of leaked memory
==10923== 
==10923== For lists of detected and suppressed errors, rerun with: -s
==10923== ERROR SUMMARY: 165 errors from 5 contexts (suppressed: 0 from 0)
Erreur de segmentation (core dumped)

Can you help me find where all these errors come from? Everything worked until I added !$OMP SINGLE and disabled !$OMP BARRIER.

Consider

       !$OMP SINGLE
       if (num_thread==0) then          
...
          threads_list_part1=threads_list_all(1:3) 
       end if
       !$OMP END SINGLE 
       !Threads workers
       do ff=1,3
          if (num_thread==threads_list_part1(ff)) then

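The first thread to reach this code enters the single block. All other threads skip it and wait at the implicit barrier at its end until the thread that entered the block has finished its work. The array threads_list_part1 is initialised if, and only if, the thread that entered the block is thread number 0. If any other thread enters the block, the array is not initialised. You have no guarantee about which thread enters the block, so what you are seeing is a thread with a non-zero number being the first to reach the single block. Possible solution: just get rid of the if (num_thread==0) then, and likewise remove the corresponding test in the other block before it.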
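
As a minimal sketch of that fix (not the original subroutine), the program below fills threads_list_part1 inside a single block with no test on the thread number. The names executor and the sentinel value -1 are illustrative assumptions, as is the use of the omp_lib module in place of the external declarations. The same reasoning would apply to the earlier block guarded by num_thread==1: if it is skipped, tasks_ready_master(ff)%f_ptr is presumably never associated, which would be consistent with the jump to address 0x0 in the valgrind output.

```fortran
program single_without_thread_guard
  use omp_lib
  implicit none
  integer :: threads_list_part1(3)
  integer :: num_thread, ff
  integer :: executor            ! hypothetical: records who ran the single block

  threads_list_part1 = -1        ! sentinel so an unfilled array is detectable
  executor = -1

  !$omp parallel default(none) private(num_thread, ff) &
  !$omp shared(threads_list_part1, executor)
  num_thread = omp_get_thread_num()

  ! No "if (num_thread==0)" here: whichever thread arrives first does the work.
  !$omp single
  executor = num_thread
  do ff = 1, 3
     threads_list_part1(ff) = ff + 1
  end do
  !$omp end single
  ! Implicit barrier at END SINGLE: every thread now sees the filled array.

  do ff = 1, 3
     if (num_thread == threads_list_part1(ff)) then
        !$omp critical
        print '(a,i0,a,i0)', 'thread ', num_thread, ' takes slot ', ff
        !$omp end critical
     end if
  end do
  !$omp end parallel

  print '(a,i0)', 'single block was executed by thread ', executor
  if (any(threads_list_part1 /= [2, 3, 4])) error stop 'array not initialised'
end program single_without_thread_guard
```

Compiled with gfortran -fopenmp, the thread reported on the last line can differ from run to run, but the array is filled every time because the work no longer depends on which thread wins the single block.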

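That said, having seen what you are doing, a more OpenMP-like approach might be to use parallel sections. This is the first time I have seen a case where that might be the sensible thing to do.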
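
A rough sketch of what parallel sections could look like here, under stated assumptions: the subroutine fill is a hypothetical stand-in for u_prime_1, u_prime_droite_1 and u_prime_gauche_1, and the three calls are assumed to write to disjoint data so that they can safely run concurrently. Whether that holds for the real routines cannot be judged from the posted code.

```fortran
program sections_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 8
  real(kind=8) :: a(n), b(n), c(n)

  ! Each section is executed exactly once, by some thread of the team.
  ! No thread lists or thread-number tests are needed.
  !$omp parallel sections default(none) shared(a, b, c)
  !$omp section
  call fill(a, 1.0d0)    ! stands in for u_prime_1
  !$omp section
  call fill(b, 2.0d0)    ! stands in for u_prime_droite_1
  !$omp section
  call fill(c, 3.0d0)    ! stands in for u_prime_gauche_1
  !$omp end parallel sections
  ! Implicit barrier here: all three results are complete before they are read.

  print '(3f6.1)', sum(a)/n, sum(b)/n, sum(c)/n

contains

  ! Hypothetical placeholder for one independent unit of work.
  subroutine fill(x, v)
    real(kind=8), intent(out) :: x(:)
    real(kind=8), intent(in)  :: v
    x = v
  end subroutine fill

end program sections_sketch
```

The runtime decides which thread runs which section, so the hand-built scheduling through tasklist_GRAD, threads_list and threads_list_part1 would no longer be required for this case.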