Why does Matlab seem so much slower than Python in this simple case?
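I recently asked a question on the optimization of a mask function in Matlab. I got two answers that helped me a lot, but according to my timings, all the Matlab solutions seem to be much slower than one of the Numpy solutions. The code for the different functions can be found in my previous question, but to give an idea of what I am doing, here is the Numpy "loop" solution, which is certainly not the fastest but is probably the easiest to read: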
def dealiasing2d(where, data):
    nk, n0, n1 = data.shape
    for i0 in xrange(n0):
        for i1 in xrange(n1):
            if where[i0, i1]:
                data[:, i0, i1] = 0.
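For comparison, here is a minimal sketch of how the same masking could be written without explicit loops, using boolean indexing. The name `dealiasing2d_vectorized` is hypothetical, and this is not necessarily the "Numpy" function timed in the tables; it only illustrates the in-place operation that the loop performs. The loop is repeated with `range` so that the block runs on Python 3.

```python
import numpy as np

def dealiasing2d_loop(where, data):
    """Loop version from the question (range instead of xrange)."""
    nk, n0, n1 = data.shape
    for i0 in range(n0):
        for i1 in range(n1):
            if where[i0, i1]:
                data[:, i0, i1] = 0.
    return data

def dealiasing2d_vectorized(where, data):
    """Hypothetical vectorized equivalent: zero data[:, i0, i1] in place
    wherever where[i0, i1] is nonzero."""
    mask = np.asarray(where, dtype=bool)  # uint8 mask -> boolean mask
    data[:, mask] = 0.                    # mask spans the last two axes
    return data

# Small arrays, built like the benchmark setup but much smaller
nk, n0, n1 = 3, 6, 5
where = np.zeros((n0, n1), dtype=np.uint8)
where[0:2, 0:3] = 1
data_loop = (5. + 1j) * np.ones((nk, n0, n1), dtype=np.complex128)
data_vec = data_loop.copy()

dealiasing2d_loop(where, data_loop)
dealiasing2d_vectorized(where, data_vec)
print(np.array_equal(data_loop, data_vec))
```

Both versions modify `data` in place, so no copy of the large array is made on the Python side.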
I obtain (with Matlab R2014b and a "basic" Numpy 1.9.1 linked with Blas and Lapack) (n0 = n1 = N):
N = 500 ; nk = 500:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.05 | 1.0
Numpy loop | 0.05 | 1.0
Matlab find | 0.74 | 14.8
Matlab bsxfun2 | 0.76 | 15.2
Matlab bsxfun | 0.78 | 15.6
Matlab loop | 0.78 | 15.6
Matlab repmat | 0.89 | 17.8
N = 500 ; nk = 100:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.01 | 1.0
Numpy loop | 0.03 | 3.0
Matlab find | 0.15 | 13.6
Matlab bsxfun2 | 0.15 | 13.6
Matlab bsxfun | 0.16 | 14.5
Matlab loop | 0.16 | 14.5
Matlab repmat | 0.18 | 16.4
N = 2000 ; nk = 10:
Method | time (s) | normalized
----------------|----------|------------
Numpy | 0.02 | 1.0
Matlab find | 0.23 | 13.8
Matlab bsxfun2 | 0.23 | 13.8
Matlab bsxfun | 0.26 | 15.6
Matlab repmat | 0.28 | 16.8
Matlab loop | 0.34 | 20.4
Numpy loop | 0.42 | 25.1
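These results seem strange to me. To my mind, Numpy and Matlab are very similar for scientific computing, so their performance should be similar, whereas here there is a factor of more than 10! So my first guess is that there is something wrong in the way I compare the two languages. Another possibility is that something is wrong with my Matlab setup, but I don't see why. Or is there a real, deep difference between Matlab and Numpy?
Can anyone time these functions to verify these results? Do you have any idea why Matlab seems so much slower than Python in this simple case?
To time the Matlab functions, I use a file containing: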
N = 500;
n0 = N;
n1 = N;
nk = 500;
disp(['N = ', num2str(N), ' ; nk = ', num2str(nk)])
where = false([n1, n0]);
where(1:100, 1:100) = 1;
data = (5.+1i)*ones([n1, n0, nk]);
disp('time dealiasing2d_loops:')
time = timeit(@() dealiasing2d_loops(where, data));
disp([' ', num2str(time), ' s'])
disp('time dealiasing2d_find:')
time = timeit(@() dealiasing2d_find(where, data));
disp([' ', num2str(time), ' s'])
disp('time dealiasing2d_bsxfun:')
time = timeit(@() dealiasing2d_bsxfun(where, data));
disp([' ', num2str(time), ' s'])
disp('time dealiasing2d_bsxfun2:')
time = timeit(@() dealiasing2d_bsxfun2(where, data));
disp([' ', num2str(time), ' s'])
disp('time dealiasing2d_repmat:')
time = timeit(@() dealiasing2d_repmat(where, data));
disp([' ', num2str(time), ' s'])
I measure the performance of the Python functions with:
from __future__ import print_function
import numpy as np
from timeit import timeit, repeat
import_lines = {
    'numpy_bad': 'from dealiasing_numpy_bad import dealiasing2d as dealiasing',
    'numpy': 'from dealiasing_numpy import dealiasing'}
# list() so that tools can be indexed on Python 3 as well as Python 2
tools = list(import_lines.keys())
time_approx_one_bench = 5.
setup = """
import numpy as np
N = 500
n0, n1 = N, N
nk = 500
where = np.zeros((n0, n1), dtype=np.uint8)
where[0:100, 0:100] = 1
data = (5.+1j)*np.ones((nk, n0, n1), dtype=np.complex128)
"""
exec(setup)
print('n0 = n1 = {}, nk = {}'.format(N, nk))
print(13*' ' + 'min mean')
durations = np.empty([len(tools)])
for it, tool in enumerate(tools):
    setup_tool = import_lines[tool] + setup
    # one trial run to estimate how many repeats fit in the time budget
    duration = timeit(setup_tool + 'dealiasing(where, data)', number=1)
    nb_repeat = int(round((time_approx_one_bench - duration)/duration))
    nb_repeat = max(1, nb_repeat)
    ds = np.array(repeat('dealiasing(where, data)',
                         setup=setup_tool, number=1, repeat=nb_repeat))
    duration = ds.min()
    print('{:11s} {:8.2e} s ; {:8.2e} s'.format(
        tool.capitalize() + ':', duration, ds.mean()))
    durations[it] = duration
fastest = tools[durations.argmin()].capitalize()
durations = durations / durations.min()
print('Durations normalized by the fastest method (' + fastest + '):')
for it, tool in enumerate(tools):
    print('{:11s} {:8.3f}'.format(tool.capitalize() + ':', durations[it]))
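I think this is mainly to do with copying the data variable. If you arrange things so that MATLAB's copy-on-write behaviour works in your favour, you can get pretty decent times. I wrote a simple version using linear indexing: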
function data = dealiasing2d2(where_dealiased, data)
[n1, n2, nk] = size(data);
where_li = find(where_dealiased);
for idx = 1:nk
    offset = n1 * n2 * (idx-1);
    data(where_li + offset) = 0;
end
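To make the offset arithmetic concrete, here is a small NumPy sketch of the same linear-indexing idea. It assumes a column-major (Fortran-order) array to mimic MATLAB's memory layout and uses 0-based indices, so the offset for page `k` is `n1 * n2 * k` rather than `n1 * n2 * (idx-1)`. The array sizes are illustrative only.

```python
import numpy as np

n1, n2, nk = 4, 3, 2
where = np.zeros((n1, n2), dtype=bool)
where[0:2, 0:2] = True

# Column-major storage, as in MATLAB: the first index varies fastest
data = np.full((n1, n2, nk), 5. + 1j, dtype=np.complex128, order='F')
flat = data.reshape(-1, order='F')   # flat view in column-major order
assert np.shares_memory(flat, data)  # writes to flat also modify data

# Analogue of find(where_dealiased): linear indices within one 2D page
where_li = np.flatnonzero(where.ravel(order='F'))

for k in range(nk):
    offset = n1 * n2 * k             # start of page k in the flat view
    flat[where_li + offset] = 0.

# Reference result obtained with boolean indexing on the first two axes
expected = np.full((n1, n2, nk), 5. + 1j, dtype=np.complex128)
expected[where] = 0.
print(np.array_equal(data, expected))
```

Each 2D page holds `n1 * n2` elements, so adding the page offset to the per-page indices reaches the same (row, column) positions on every page.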
I ran it like this (note that it is important that timed is a function rather than a script, so that the data variable can be re-used):
function timed
N = 2000;
nk = 10;
where = false([N, N]);
where(1:100, 1:100) = 1;
data = (5.+1j)*ones([N, N, nk]);
tic, data = dealiasing2d2(where,data); toc
This runs in 0.00437 seconds on my GLNXA64 machine running R2014b.
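As a rough analogy on the Python side, the sketch below compares masking an array in place with masking a fresh copy of it. NumPy has no copy-on-write, so the explicit `data.copy()` only stands in for the copy that the answer suspects MATLAB makes. The helper names and array sizes are assumptions for illustration, and the timings depend on the machine, so none are quoted here.

```python
import numpy as np
from timeit import timeit

def mask_in_place(where, data):
    """Zero the masked entries directly in the caller's array."""
    data[:, where] = 0.
    return data

def mask_with_copy(where, data):
    """Copy the whole array first, then zero the masked entries."""
    out = data.copy()
    out[:, where] = 0.
    return out

nk, n0, n1 = 20, 200, 200
where = np.zeros((n0, n1), dtype=bool)
where[0:40, 0:40] = True
data = np.full((nk, n0, n1), 5. + 1j, dtype=np.complex128)

number = 20
# Time the copying version first, while data is still unmodified
t_copy = timeit(lambda: mask_with_copy(where, data), number=number)
t_in_place = timeit(lambda: mask_in_place(where, data), number=number)
print('with copy: {:.2e} s ; in place: {:.2e} s'.format(
    t_copy / number, t_in_place / number))
```

If the copying variant is much slower on a given machine, that is at least consistent with the idea that the copy of `data`, rather than the masking itself, accounts for most of the time.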