Is `mkl_set_num_threads` capped at the number of CPU threads?

Question

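In OpenBLAS, if you call openblas_set_num_threads with a thread count higher than the number of CPU threads you have, the number of threads actually used is capped at your number of CPU threads.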

I would like to know whether MKL behaves the same way. The docs do not mention it explicitly, but they do say:

The number specified is a hint, and Intel® MKL may actually use a smaller number.

Answer 1

It appears to be capped at the number of cores (not threads). The code below was run on a 6-core Intel Core i7:

julia> using MKL_jll

julia> get_max_threads() = ccall((:mkl_get_max_threads, libmkl_rt), Int32, ());

julia> set_max_threads(n) = ccall((:mkl_set_num_threads, libmkl_rt), Cvoid, (Ptr{Int32},), Ref(Int32(n)));

julia> get_max_threads()
6

julia> set_max_threads(4)

julia> get_max_threads()
4

julia> set_max_threads(8)

julia> get_max_threads() # maxed out at 6
6

julia> set_max_threads(24)

julia> get_max_threads() # maxed out at 6
6

julia> set_max_threads(1)

julia> get_max_threads()
1

Answer 2

MKL behaves differently: you can in fact have more threads than cores.

The reason @Kristoffer did not see this in his answer is that dynamic adjustment is enabled by default:

By default, Intel® MKL can adjust the specified number of threads dynamically. [...] If dynamic adjustment of the number of threads is disabled, Intel® MKL attempts to use the specified number of threads in internal parallel regions (for more information, see the Intel® MKL Developer Guide). Use the mkl_set_dynamic function to control dynamic adjustment of the number of threads.

So if we turn off dynamic adjustment with mkl_set_dynamic(0), we see the following:

>>> set_max_threads(44)
>>> get_max_threads()  
6
>>> mkl_set_dynamic(0)
>>> get_max_threads()
44
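The observations from both answers can be summarised in a small toy model. This is only a sketch of the behaviour seen in the transcripts above, not MKL's actual logic: the class `ThreadSettings` and its `physical_cores` parameter are hypothetical names, and the assumption that the cap equals the number of physical cores comes from the 6-core experiment alone.

```python
from dataclasses import dataclass


@dataclass
class ThreadSettings:
    """Toy model of the observed behaviour; NOT MKL's real implementation."""

    physical_cores: int   # assumption: the cap seen while dynamic adjustment is on
    requested: int = 0    # last value given to set_num_threads (0 means unset)
    dynamic: bool = True  # dynamic adjustment is enabled by default

    def set_num_threads(self, n: int) -> None:
        # Like mkl_set_num_threads: the value is only a hint.
        self.requested = n

    def set_dynamic(self, enabled: bool) -> None:
        # Like mkl_set_dynamic(1) / mkl_set_dynamic(0).
        self.dynamic = enabled

    def get_max_threads(self) -> int:
        wanted = self.requested if self.requested > 0 else self.physical_cores
        if self.dynamic:
            # With dynamic adjustment the reported maximum never exceeds the cores.
            return min(wanted, self.physical_cores)
        # Without it, the requested value is reported unchanged.
        return wanted


settings = ThreadSettings(physical_cores=6)
settings.set_num_threads(44)
with_dynamic = settings.get_max_threads()     # capped at the core count
settings.set_dynamic(False)
without_dynamic = settings.get_max_threads()  # the requested value
print(with_dynamic, without_dynamic)
```

With 6 cores and a request for 44 threads, the model reports 6 while dynamic adjustment is on and 44 once it is off, matching the session above.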

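So we see that without dynamic adjustment, MKL can use 44 threads. Whether it really does is another question. The help for mkl_get_dynamic explains (even though this information seems a bit outdated to me, since the dynamic setting is evidently already taken into account by get_max_threads):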

Suppose that the mkl_get_max_threads function returns the number of threads equal to N. [...] If dynamic adjustment is disabled, Intel® MKL requests exactly N threads for internal parallel regions ([...]). However, the OpenMP* run-time library may be configured to supply fewer threads than Intel® MKL requests, depending on the OpenMP* setting of dynamic adjustment.

OpenMP's approach is given in Algorithm 2.1 of the OpenMP-5.0 specification (which I don't pretend to understand).

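On my machine, the relevant values are omp_get_thread_limit()=2147483647 and omp_get_dynamic()=0, so with MKL_DYNAMIC disabled and the maximum number of threads set higher, I really can see performance degrade because of the additional overhead.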
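For readers who want to see how those two values feed into the OpenMP rule, here is a simplified paraphrase of Algorithm 2.1 as a Python function. It is a sketch from my reading of the specification rather than an authoritative transcription: the function name and parameter names are hypothetical, and the defaults for `threads_busy`, `active_levels` and `max_active_levels` are assumptions for a single non-nested parallel region.

```python
from typing import Optional, Tuple


def omp_thread_count_bounds(
    threads_requested: int,      # num_threads clause, else nthreads-var
    thread_limit: int,           # thread-limit-var (omp_get_thread_limit())
    dyn: bool,                   # dyn-var (omp_get_dynamic())
    threads_busy: int = 1,       # assumption: only the encountering thread is busy
    if_clause: bool = True,      # value of the if clause, true when absent
    active_levels: int = 0,      # assumption: not inside another active region
    max_active_levels: int = 1,  # assumption: one active level allowed
) -> Optional[Tuple[int, int]]:
    """Return (low, high) bounds on the team size, or None where the
    specification leaves the behaviour implementation defined."""
    threads_available = thread_limit - threads_busy + 1
    if not if_clause:
        return (1, 1)
    if active_levels >= max_active_levels:
        return (1, 1)
    if dyn:
        # The runtime may pick anything from 1 up to the smaller of the two.
        return (1, min(threads_requested, threads_available))
    if threads_requested <= threads_available:
        # No dynamic adjustment and enough headroom: exactly what was requested.
        return (threads_requested, threads_requested)
    return None  # implementation defined


# Values reported above: thread limit 2147483647, dynamic adjustment off.
print(omp_thread_count_bounds(44, 2147483647, dyn=False))
```

Under these assumptions a request for 44 threads yields exactly 44, which is consistent with the oversubscription and slowdown described above. With `dyn=True` the runtime would be free to choose any team size from 1 to 44.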

multithreading

blas

intel-mkl