omp sections using private (num_threads) clause vs default (without clauses)
I ran the following code in two variants:
"!$omp sections" and "!$omp sections private(thread_num)"
In both cases, is each section executed by a different thread?
program main
  use omp_lib
  implicit none
  integer, parameter :: ma=100, n=10000, mb=100
  real, dimension(ma,n)  :: a
  real, dimension(n,mb)  :: b
  real, dimension(ma,mb) :: c = 0.
  integer :: i, j, k, threads=2, ppt, thread_num
  integer :: toc, tic, rate
  real :: time_parallel, time

  call random_number(a)
  call random_number(b)

  !/////////////////////// PARALLEL PRIVATE ///////////////////////
  c = 0
  call system_clock(count_rate=rate)
  call system_clock(tic)
  ppt = ma/threads
  !$ call omp_set_num_threads(threads)
  !$omp parallel
  !$omp sections private(thread_num) !(HERE IS THE QUESTION TOPIC)
  ! EXAMPLE PROCESS 1 (it is only an example to test 'omp sections')
  !$omp section
  !$ thread_num = omp_get_thread_num()
  !$ print*, "Section 1 started by thread number:", thread_num
  do i = 1,50
    do j = 1,mb
      do k = 1,n
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do
  !$ print*, "Section 1 finished by thread number:", thread_num
  ! EXAMPLE PROCESS 2
  !$omp section
  !$ thread_num = omp_get_thread_num()
  !$ print*, "Section 2 started by thread number:", thread_num
  do i = 51,100
    do j = 1,mb
      do k = 1,n
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do
  !$ print*, "Section 2 finished by thread number:", thread_num
  !$omp end sections
  !$omp end parallel
  print*, '//////////////////////////////////////////////////////////////'
  print*, 'Result in Parallel'
  !$ print*, c(85:90,40)
  call system_clock(toc)
  time_parallel = real(toc-tic)/real(rate)

  !/////////////////////// normal execution ///////////////////////
  c = 0
  call system_clock(count_rate=rate)
  call system_clock(tic)
  do i = 1,ma
    do j = 1,mb
      do k = 1,n
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do
  call system_clock(toc)
  time = real(toc-tic)/real(rate)
  print*, 'Result in serial mode'
  print*, c(85:90,40)
  print*, '------------------------------------------------'
  print*, 'Threads: ', threads, '| Time Parallel ', time_parallel, 's '
  print*, ' Time Normal ', time, 's'
  !----------------------------------------------------------------
end program main
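These are the results for "!$omp sections" and "!$omp sections private(thread_num)", respectively.

With "!$omp sections":
Section 1 started by thread number: 1
Section 2 started by thread number: 1
Section 1 finished by thread number: 1
Section 2 finished by thread number: 1
//////////////////////////////////////////////////////////////
Result in Parallel
2507.23853 2494.16162 2496.83960 2503.58960 2509.34448
2518.64160
Result in serial mode
2507.23853 2494.16162 2496.83960 2503.58960 2509.34448
2518.64160
Threads: 2 | Time Parallel 0.428116574 s
Time Normal 0.605000019 s

With "!$omp sections private(thread_num)":
Section 1 started by thread number: 0
Section 2 started by thread number: 1
Section 1 finished by thread number: 0
Section 2 finished by thread number: 1
//////////////////////////////////////////////////////////////
Result in Parallel
2523.38281 2501.28369 2517.81860 2502.66235 2503.13940
2532.35791
Result in serial mode
2523.38281 2501.28369 2517.81860 2502.66235 2503.13940
2532.35791
Threads: 2 | Time Parallel 0.432999998 s
Time Normal 0.610204018 s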
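Compiled and run with:
gfortran -Wall -fopenmp -O2 -Wall -o prog.exe prueba.f90
./prog.exe
My laptop's CPU:
AMD A6-6310 (4 cores, one thread per core)
P.S.: The main goal is to test the parallel clauses, not to speed up the matrix computation.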
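
thread_num should definitely be a private variable. Otherwise both threads use the same variable, which is why you got the value 1 from both threads. Writing to the same variable from two threads is a race condition.
You can make it private to the whole parallel region and call omp_get_thread_num() only once, at the start of the region.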
!$omp parallel private(thread_num)
!$ thread_num = omp_get_thread_num()
!$omp sections
!$omp section
!$ print*, "Section 1 started by thread number:", thread_num
...
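
For reference, here is a minimal, self-contained sketch of that suggestion, with the matrix work left out so only the thread bookkeeping remains. The program name sections_thread_id and the hard-coded thread count of 2 are assumptions made for illustration; they are not part of the original code.

```fortran
program sections_thread_id
  use omp_lib
  implicit none
  integer :: thread_num

  ! Assumption for this sketch: request two threads, as in the question.
  call omp_set_num_threads(2)

  ! thread_num is private for the whole parallel region, so every thread
  ! keeps its own copy and the sections cannot overwrite each other's value.
  !$omp parallel private(thread_num)
  thread_num = omp_get_thread_num()   ! called once per thread

  !$omp sections
  !$omp section
  print *, "Section 1 run by thread number:", thread_num

  !$omp section
  print *, "Section 2 run by thread number:", thread_num
  !$omp end sections
  !$omp end parallel
end program sections_thread_id
```

Build it with -fopenmp, as in the question. Note that OpenMP only removes the race here; it does not promise that the two sections land on different threads. How sections are assigned to threads is up to the implementation, so with sections this short one thread may well run both.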