NumPy dot operation is not using all CPU cores

Question

I am computing a NumPy dot product of two matrices (call them a and b).
When a has shape (10000, 10000) and b has shape (1, 10000), numpy.dot(a, b.T) uses all CPU cores.
But when a has shape (10000, 10000) and b has shape (2, 10000), numpy.dot(a, b.T) does not use all CPU cores (it uses only one).

This happens whenever b has between 2 and 15 rows, i.e. for shapes from (2, 10000) to (15, 10000).

Example:

import numpy as np

a = np.random.rand(10**4, 10**4)

def dot(a, b_row_size):
    b = np.random.rand(b_row_size, 10**4)

    for i in range(10):
        # dot operation
        x = np.dot(a, b.T)

# Using all CPU cores
dot(a, 1)

# Using only one CPU core
dot(a, 2)

# Using only one CPU core
dot(a, 5)

# Using only one CPU core
dot(a, 15)

# Using all CPU cores
dot(a, 16)

# Using all CPU cores
dot(a, 50)

np.show_config()

openblas_lapack_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
lapack_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
blas_mkl_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
blis_info:
  NOT AVAILABLE
openblas_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c

Answer 1

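numpy.show_config() clearly shows that NumPy is using OpenBLAS under the hood.

So OpenBLAS is what is actually responsible for parallelizing the computation.

But in sgemm, OpenBLAS does not parallelize the computation below a certain threshold (in your case, when b has 2 to 15 rows).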
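
To see this behaviour without watching a system monitor, one rough option is to compare CPU time with wall-clock time for the same call. The sketch below is only illustrative: cpu_to_wall_ratio is a hypothetical helper name, the matrix size is an arbitrary choice, and the ratio is an approximation (idle BLAS worker threads may still burn some CPU). A value near 1 suggests a single busy core; larger values suggest several.

```python
import time

import numpy as np


def cpu_to_wall_ratio(n, b_rows, repeats=5):
    """Return CPU time / wall time for repeated np.dot(a, b.T) calls.

    Hypothetical helper for illustration: a is (n, n), b is (b_rows, n).
    """
    a = np.random.rand(n, n)
    b = np.random.rand(b_rows, n)

    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process
    for _ in range(repeats):
        np.dot(a, b.T)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start

    return cpu / wall


if __name__ == "__main__":
    # Row counts taken from the question; n=2000 is an assumed, smaller size
    # chosen so the check runs quickly.
    for rows in (1, 2, 15, 16):
        print(rows, round(cpu_to_wall_ratio(2000, rows), 2))
```

The exact numbers depend on the machine, the OpenBLAS build, and settings such as OPENBLAS_NUM_THREADS, so treat them as a hint rather than a measurement.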

As a workaround, you can change the threshold (GEMM_MULTITHREAD_THRESHOLD) value in the sgemm file and compile OpenBLAS with NumPy.

Change the GEMM_MULTITHREAD_THRESHOLD value from 4 to 0 to parallelize all sgemm computations.
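
If rebuilding OpenBLAS is not practical, a different approach (not the one proposed above) is to split the work at the Python level. The sketch below assumes that np.dot releases the GIL while it runs inside BLAS, so that row blocks of a can be multiplied concurrently in threads; blocked_dot and n_workers are hypothetical names introduced for illustration, and whether this is actually faster depends on the OpenBLAS build and thread settings.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def blocked_dot(a, b, n_workers=None):
    """Compute np.dot(a, b.T) by splitting the rows of `a` across threads.

    Hypothetical helper: each worker multiplies one row block of `a` by b.T,
    and the partial results are stacked back together in order.
    """
    n_workers = n_workers or os.cpu_count() or 1
    bt = b.T
    # Split `a` into n_workers row blocks (sizes may differ by one row).
    blocks = np.array_split(a, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so the blocks stack correctly.
        parts = list(pool.map(lambda blk: np.dot(blk, bt), blocks))
    return np.vstack(parts)


if __name__ == "__main__":
    a = np.random.rand(2000, 2000)
    b = np.random.rand(5, 2000)
    x = blocked_dot(a, b)
    print(x.shape, np.allclose(x, np.dot(a, b.T)))
```

The result should match np.dot(a, b.T) up to floating-point rounding; it is worth timing both versions on the target machine before adopting this.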

Tags: python, numpy, openblas, python-3.5