No speedup with OpenMP when using Matlab MEX in Linux
I am using OpenMP to speed up Fortran code in a Matlab MEX file. However, OpenMP does not seem to work on Linux, although it does work on Windows. My code is below:
1) Matlab MEX file:
clc; clear all; close all; tic
FLAG_SYS = 0; % 0 for Windows; 1 for Linux
%--------------------------------------------------------------------------
% Mex Fortran code
%--------------------------------------------------------------------------
if FLAG_SYS == 0
mex COMPFLAGS="-Qopenmp $COMPFLAGS"...
LINKFLAGS="/Qopenmp $LINKFLAGS"...
OPTIMFLAGS="/Qopenmp $OPTIMFLAGS"...
'-IC:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.5.267\windows\mkl\include'...
'-LC:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.5.267\windows\mkl\lib\intel64'...
-lmkl_intel_ilp64.lib -lmkl_intel_thread.lib -lmkl_core.lib libiomp5md.lib...
Test_OpenMP_Mex.f90...
-output Test_OpenMP_Mex
elseif FLAG_SYS == 1
mex COMPFLAGS="-fopenmp $COMPFLAGS"...
LINKFLAGS="-fopenmp $LINKFLAGS"...
FFLAGS='$FFLAGS -fdec-math -cpp' ...
'-I${MKLROOT}/include'...
'-L${MKLROOT}/lib'...
-lmkl_avx2 -lmkl_gf_ilp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl...
Test_OpenMP_Mex.f90...
-output Test_OpenMP_Mex
end
Test_OpenMP_Mex;
2) Fortran code:
#include "fintrf.h"
!GATEWAY ROUTINE
SUBROUTINE MEXFUNCTION(NLHS, PLHS, NRHS, PRHS)
!DECLARATIONS
IMPLICIT NONE
!MEXFUNCTION ARGUMENTS:
MWPOINTER PLHS(*), PRHS(*)
INTEGER NLHS, NRHS
!FUNCTION DECLARATIONS:
MWPOINTER MXCREATEDOUBLEMATRIX
MWPOINTER MXGETM, MXGETN
INTEGER MXISNUMERIC
!POINTERS TO INPUT MXARRAYS:
MWPOINTER MIV1, MIV2
!POINTERS TO OUTPUT MXARRAYS:
MWPOINTER MOV1, MOV2
!CALL FORTRAN CODE
CALL TEST_OPENMP
RETURN
END
!-----------------------------------------------------------------------
SUBROUTINE TEST_OPENMP
USE OMP_LIB
IMPLICIT NONE
INTEGER I, J, K, STEP
REAL*8 STARTTIME, ENDTIME,Y
OPEN(1,FILE='1.TXT')
!COUNT ELAPSED TIME START
STARTTIME = OMP_GET_WTIME()
DO I = 1,1000000
DO J = 1,50000
DO K = 1,1000
Y=(I+10)*J-SQRT(789.1)+SQRT(789.1)-(I+10)*J
END DO
END DO
END DO
ENDTIME = OMP_GET_WTIME()
WRITE(1,*) ENDTIME-STARTTIME
!COUNT ELAPSED TIME START
STARTTIME = OMP_GET_WTIME()
!$OMP PARALLEL
!$OMP DO PRIVATE(I,J)
DO I = 1,1000000
DO J = 1,50000
DO K = 1,1000
Y=(I+10)*J-SQRT(789.1)+SQRT(789.1)-(I+10)*J
END DO
END DO
END DO
!$OMP END DO
!$OMP END PARALLEL
ENDTIME = OMP_GET_WTIME()
WRITE(1,*) ENDTIME-STARTTIME
!$OMP PARALLEL
! GET THE NUMBER OF THREADS
WRITE(1,*) OMP_GET_THREAD_NUM(), OMP_GET_NUM_THREADS()
!$OMP END PARALLEL
CLOSE(1)
RETURN
END SUBROUTINE TEST_OPENMP
The output on Windows is:
1.09620520001044
4.50355500000296
0 6
1 6
3 6
5 6
2 6
4 6
The output on Linux is:
0.0000
0.0000
0 1
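OpenMP is clearly working on Windows: the computation time drops from 4.5 s to 1.0 s, and I can see that 6 threads are used for the computation. On Linux, however, it seems that no computation is performed at all, and only 2 threads are used (the Linux machine has 36 threads, but only 2 are used).
Any suggestions are welcome!
You can download the code directly from this link:
https://www.dropbox.com/sh/crkuwhu22407sjs/AAAQrtzAvTmFOmAxv_jpTCBaa?dl=0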
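When compiling MEX files under Linux (and macOS), the COMPFLAGS variable is ignored. It is a Windows-specific variable. You need to use CFLAGS for C, CXXFLAGS for C++, FFLAGS for Fortran, and LDFLAGS for the linker. These are the standard Unix environment variables that control compilation.
Your compile command would look like this: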
mex LDFLAGS='-fopenmp $LDFLAGS'...
FFLAGS='-fopenmp -fdec-math -cpp $FFLAGS' ...
'-I${MKLROOT}/include'...
'-L${MKLROOT}/lib'...
-lmkl_avx2 -lmkl_gf_ilp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl...
Test_OpenMP_Mex.f90...
-output Test_OpenMP_Mex
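To confirm that the OpenMP flag is actually reaching the compiler, a small standalone check can help before going back to the MEX build. The following is only a sketch, not part of the original code: the program name `check_openmp` and the loop body are made up for illustration, and it assumes gfortran (build with `gfortran -fopenmp check_openmp.f90`, then again without `-fopenmp` to compare). Lines starting with `!$ ` are compiled only when OpenMP is enabled, so the reported thread count stays at 1 when the flag is missing. The reduction keeps the loop result in use, so the compiler cannot drop the loop as dead code.

```fortran
program check_openmp
  !$ use omp_lib            ! only compiled when OpenMP is enabled
  implicit none
  integer :: i, nthreads
  double precision :: total

  nthreads = 1              ! value kept when OpenMP is disabled
  total = 0.0d0

  ! Ask the runtime how many threads a parallel region really gets
  !$omp parallel
  !$omp single
  !$ nthreads = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  ! Parallel loop whose result is used afterwards (via the reduction)
  !$omp parallel do reduction(+:total)
  do i = 1, 1000000
    total = total + sqrt(dble(i))
  end do
  !$omp end parallel do

  print '(a,i0)', 'threads used: ', nthreads
  print '(a,es12.5)', 'checksum: ', total
end program check_openmp
```

If this reports more than one thread when built with `-fopenmp` but the MEX file still reports one, the flag is most likely not being passed through by the `mex` command.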
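When linking against the ILP64 version of the Intel MKL libraries, there is one caveat you should not miss: you need to add the compiler option that makes default integers 8 bytes wide (-i8 with ifort, -fdefault-integer-8 with gfortran). Otherwise you may see unexpected segmentation faults. See the MKL Link Line Advisor for more details: https://software.intel.com/content/www/us/en/develop/articles/intel-mkl-link-line-advisor.html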
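MKL's ILP64 interface takes 64-bit integer arguments, so the default INTEGER kind in the calling code has to be 64 bits wide. A quick way to see what a given set of compiler flags produces is to print the size of a default integer. This is a minimal sketch under assumptions: the program name `check_default_integer` is hypothetical, and the flags mentioned in the comments are the gfortran and ifort options for 8-byte default integers.

```fortran
program check_default_integer
  implicit none
  integer :: n              ! default-kind integer; its size depends on compiler flags

  n = 0
  ! bit_size returns the number of bits in the integer kind of its argument
  print '(a,i0,a)', 'Default INTEGER is ', bit_size(n), ' bits'

  if (bit_size(n) /= 64) then
    ! Expected without -fdefault-integer-8 (gfortran) or -i8 (ifort)
    print '(a)', 'Not 64-bit: MKL ILP64 routines expect 64-bit integer arguments'
  end if
end program check_default_integer
```

Building it once with and once without the 8-byte integer option shows whether the option is taking effect for your compiler.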