Optimizing numpy array multiplication: * faster than numpy.dot?
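Questions:
1) When BLAS is in use, why is numpy.dot() slower than * in the example code below?
2) Is there a way to use numpy.dot() instead of * to get faster array multiplication in this case? I think I am missing a key piece of information that would answer question 1 and would mean numpy.dot() is at least as fast as *, if not faster.
Details are below. Thanks in advance for any answers and help.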
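Details:
I am writing a program to solve coupled PDEs using Python 2.7 (64-bit), numpy 1.11.2 and Anaconda2 on Windows 7. To improve the accuracy of the program's output, I need to use large arrays (shape (2, 2^14) and larger) and small integration steps. This results in a very large number of array multiplications per simulation, which I need to optimize for speed.
Having looked around, it seems that numpy.dot() should be used for faster array multiplication relative to *, as long as BLAS is installed and working with numpy. This is frequently recommended. However, when I use the timer script below, * is faster than numpy.dot() by at least a factor of 7. In some cases this increases to a factor of >1000: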
    from __future__ import division
    import numpy as np
    import timeit

    def dotter(a, b):
        return np.dot(a, b)

    def timeser(a, b):
        return a*b

    def wrapper(func, a, b):
        def wrapped():
            return func(a, b)
        return wrapped

    size = 100
    num = int(3e5)
    a = np.random.random_sample((size, size))
    b = np.random.random_sample((size, size))

    wrapped = wrapper(dotter, a, b)
    dotTime = timeit.timeit(wrapped, number=num)/num
    print "\nTime for np.dot: ", dotTime

    wrapped = wrapper(timeser, a, b)
    starTime = timeit.timeit(wrapped, number=num)/num
    print "\nTime for *: ", starTime
    print "dotTime / starTime: ", dotTime/starTime
This outputs:
Time for np.dot: 8.58201189949e-05
Time for *: 1.07564737429e-05
dotTime / starTime: 7.97846218436
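Both numpy.dot() and * are distributed across multiple cores, which I think suggests that BLAS is working, at least to some extent.
Looking at numpy.__config__.show(), it appears I am using BLAS and LAPACK (although not openblas_lapack?):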
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_lapack_info:
    NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
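np.dot performs matrix-matrix multiplication, whereas * is element-wise multiplication. The two compute different results and do different amounts of work, so their timings are not directly comparable. The operator for matrix multiplication is @ in Python 3.5+.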
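To make the distinction concrete, here is a minimal sketch using two small made-up 2 x 2 arrays (the values are illustrative and not taken from the question). It also includes a rough operation count for the 100 x 100 case, assuming the textbook matrix-product algorithm; an optimized BLAS will not show exactly this ratio in wall-clock time.

```python
import numpy as np

# Small illustrative arrays (hypothetical values, not from the question).
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Element-wise product: multiplies matching entries, n**2 multiplications.
elementwise = a * b

# Matrix product: rows of a times columns of b, about n**3 multiply-adds.
matrix_prod = np.dot(a, b)

print(elementwise)
print(matrix_prod)

# For 2-D arrays np.matmul gives the same result as np.dot;
# on Python 3.5+ it can be written as a @ b.
same_as_dot = np.matmul(a, b)

# Rough work estimate for the 100 x 100 arrays timed in the question,
# assuming the textbook algorithm: the matrix product does about n times
# more arithmetic than the element-wise product.
n = 100
ratio = n ** 3 // n ** 2
```

If the simulation only needs the product of matching entries, as the (2, 2^14) array shape suggests, then * is already the right operation and np.dot would compute something else entirely.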