NumPy dot operation is not using all CPU cores

This happens when b has between 2 and 15 rows (that is, for shapes from (2, 10000) to (15, 10000)): np.dot then runs on only one CPU core.

Example:

import numpy as np

a = np.random.rand(10**4, 10**4)

def dot(a, b_row_size):
    b = np.random.rand(b_row_size, 10**4)

    for i in range(10):
        # dot operation
        x = np.dot(a, b.T)

# Using all CPU cores
dot(a, 1)

# Using only one CPU core
dot(a, 2)

# Using only one CPU core
dot(a, 5)

# Using only one CPU core
dot(a, 15)

# Using all CPU cores
dot(a, 16)

# Using all CPU cores
dot(a, 50)
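Instead of watching a CPU monitor, one rough way to check how many cores a call keeps busy is to compare process CPU time (summed over all threads) with wall-clock time around the same np.dot call. The sketch below does that. effective_parallelism is a hypothetical helper name, the 4000-by-4000 size is an illustrative choice, and the ratio is only an estimate: BLAS worker threads that spin while waiting can push it above the number of cores doing useful work.

```python
import time

import numpy as np


def effective_parallelism(a, b_row_size, repeats=3):
    """Rough estimate of how many cores np.dot(a, b.T) kept busy.

    time.process_time() sums CPU time over all threads of this process,
    while time.perf_counter() measures wall-clock time, so their ratio
    approximates the number of cores in use.
    """
    b = np.random.rand(b_row_size, a.shape[1])
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        np.dot(a, b.T)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else float("nan")


if __name__ == "__main__":
    # Smaller than the question's 10**4 to keep memory use modest;
    # use 10**4 to reproduce the question exactly.
    a = np.random.rand(4000, 4000)
    for rows in (1, 2, 5, 15, 16, 50):
        print(f"b rows = {rows:3d}: ~{effective_parallelism(a, rows):.1f} cores busy")
```

A value near 1.0 for 2 to 15 rows and a larger value for 1, 16 and 50 rows would match the behavior described in the question.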

np.show_config()

openblas_lapack_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
lapack_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
blas_mkl_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
blis_info:
  NOT AVAILABLE
openblas_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c


numpy.show_config() clearly shows that NumPy uses OpenBLAS underneath.

So it is OpenBLAS that actually does the parallel computation.
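One way to confirm that OpenBLAS, rather than NumPy itself, owns the threads is to rerun the same product with the OPENBLAS_NUM_THREADS environment variable set. OpenBLAS reads it when the library is loaded, so it has to be set before numpy is imported; the sketch therefore runs the product in a fresh interpreter. This assumes NumPy is linked against OpenBLAS, as in the config above (with another BLAS the variable has no effect). time_with_threads and the matrix sizes are illustrative choices, not anything from the original answer.

```python
import os
import subprocess
import sys

# Code run in a fresh interpreter so OPENBLAS_NUM_THREADS is in place
# before numpy (and therefore OpenBLAS) is loaded.
CHILD = """
import time
import numpy as np
a = np.random.rand(3000, 3000)
b = np.random.rand(50, 3000)
start = time.perf_counter()
for _ in range(5):
    np.dot(a, b.T)
print(time.perf_counter() - start)
"""


def time_with_threads(num_threads):
    """Wall-clock seconds for CHILD with OPENBLAS_NUM_THREADS=num_threads."""
    env = dict(os.environ, OPENBLAS_NUM_THREADS=str(num_threads))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print("1 thread  :", time_with_threads(1))
    print(f"{cores} threads:", time_with_threads(cores))
```

If the two timings differ noticeably for 50 rows but not for 2 to 15 rows, that is consistent with OpenBLAS deciding how many threads each call gets.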

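However, in sgemm OpenBLAS does not parallelize the computation below a certain threshold (in your case, when b has 2 to 15 rows). For the float64 arrays that np.random.rand produces, the corresponding double-precision routine is dgemm.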

As a workaround, you can change the threshold (GEMM_MULTITHREAD_THRESHOLD) in the sgemm file, then compile OpenBLAS and use that build with NumPy.

Change the GEMM_MULTITHREAD_THRESHOLD value from 4 to 0 to parallelize all sgemm computations.
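If rebuilding OpenBLAS is not an option, a different workaround (not the one proposed in this answer) is to split a into row blocks yourself and run the blocks in a Python thread pool. NumPy releases the GIL while the BLAS call runs, so the blocks can execute on separate cores even if OpenBLAS keeps each small-n call on one thread. A sketch only: chunked_dot is a hypothetical helper, and if OpenBLAS also threads each block the CPU may be oversubscribed, in which case limiting OPENBLAS_NUM_THREADS helps. Check the result against a plain np.dot on your build first.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_dot(a, bt, n_workers=None):
    """Compute np.dot(a, bt) by splitting a's rows across Python threads.

    Each thread runs an ordinary np.dot on one row block; NumPy releases
    the GIL during the BLAS call, so the blocks can run in parallel.
    """
    n_workers = n_workers or os.cpu_count() or 1
    # array_split tolerates row counts that don't divide evenly
    # (and yields empty blocks if n_workers exceeds the row count).
    chunks = np.array_split(a, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda chunk: np.dot(chunk, bt), chunks))
    # Stack the per-block results back in their original row order.
    return np.vstack(parts)
```

In the example from the question, x = chunked_dot(a, b.T) would replace x = np.dot(a, b.T).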