Intel compiler flags for single core usage

Question

I have noticed what looks to me like surprising behavior in a Fortran code that consists mainly of matrix/matrix and matrix/vector multiplications.

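Initially, the code was compiled with gfortran, and the multiplications were performed with double "DO" loops over the rows and columns of the matrices. I compiled the code using: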

gfortran -c -g -O3 ...

When executed, the code used a single core of an 8-core i7 processor.

I then compiled my code with the Intel compiler:

ifort -c -g -O3 ...

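The code ran noticeably faster, still on a single core. I then decided to optimize it by using the well-known dgemm and dgemv routines for the matrix/matrix and matrix/vector multiplications, respectively.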

I then compiled using:

ifort -c -g -O3 ...

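The resulting code works correctly, but it uses all 8 cores of my i7 processor without any significant performance improvement. Is there a way to control the number of cores my code uses through the compile command?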

Answer 1

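The compiler itself does not generate any parallel code. However, the Intel Math Kernel Library (MKL), where DGEMM and friends live, performs automatic parallelization and CPU dispatching.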

The MKL documentation says:

Use the following techniques to specify the number of OpenMP threads to use in Intel MKL:

Set one of the OpenMP or Intel MKL environment variables: OMP_NUM_THREADS, MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS

Call one of the OpenMP or Intel MKL functions: omp_set_num_threads(), mkl_set_num_threads(), mkl_domain_set_num_threads(), mkl_set_num_threads_local()
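
As a minimal sketch of the environment-variable route quoted above, the following shell snippet restricts MKL to one thread before launching the program. The executable name ./my_program is a hypothetical placeholder, not something taken from the question; setting both variables is a belt-and-braces assumption, since either one alone should be enough for MKL.

```shell
#!/bin/sh
# Ask MKL (and any other OpenMP code in the process) to use a single thread.
# Inside MKL, MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS.
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1

# Show the settings that child processes will inherit.
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS} OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# ./my_program is a hypothetical name for the executable linked against MKL.
if [ -x ./my_program ]; then
    ./my_program
fi
```

No recompilation is needed for this to take effect, because MKL reads the variables at run time. If a build-time choice is preferred, the Intel compilers also accept -mkl=sequential at link time (spelled -qmkl=sequential in newer releases), which selects the non-threaded MKL.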

fortran

intel

compiler-optimization

intel-fortran