二维numpy.take快吗?
Is 2-dimensional numpy.take fast?
numpy.take 可以应用于 2 维 和
np.take(np.take(T,ix,axis=0), iy,axis=1 )
我测试了离散二维拉普拉斯算子的模板
ΔT = T[ix-1,iy] + T[ix+1, iy] + T[ix,iy-1] + T[ix,iy+1] - 4 * T[ix,iy]
有 2 个 take-schemes 和通常的 numpy.array scheme。引入函数 p 和 q 是为了简化代码编写,并以不同的顺序处理轴 0 和 1。这是代码:
nx = 300; ny= 300
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.linspace(1,nx-2,nx-2,dtype=int)
iy = np.linspace(1,ny-2,ny-2,dtype=int)
#------------------------------------------------------------
def p(Φ,kx,ky):
return np.take(np.take(Φ,ky,axis=1), kx,axis=0 )
#------------------------------------------------------------
def q(Φ,kx,ky):
return np.take(np.take(Φ,kx,axis=0), ky,axis=1 )
#------------------------------------------------------------
%timeit ΔT_n = T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_t = p(T,ix-1,iy) + p(T,ix+1,iy) + p(T,ix,iy-1) + p(T,ix,iy+1) - 4.0 * p(T,ix,iy)
%timeit ΔT_t = q(T,ix-1,iy) + q(T,ix+1,iy) + q(T,ix,iy-1) + q(T,ix,iy+1) - 4.0 * q(T,ix,iy)
.
1000 loops, best of 3: 944 µs per loop
100 loops, best of 3: 3.11 ms per loop
100 loops, best of 3: 2.02 ms per loop
结果似乎很明显:
- 通常的 numpy 索引 arithmeitk 是最快的
- take-scheme q 需要 100% 的时间(= C 排序?)
- take-scheme p 需要 200% 的时间(= Fortran 顺序?)
甚至一维
example of the scipy manual表示numpy.take快:
a = np.array([4, 3, 5, 7, 6, 8])
indices = [0, 1, 4]
%timeit np.take(a, indices)
%timeit a[indices]
.
The slowest run took 6.58 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.32 µs per loop
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.87 µs per loop
有没有人有过如何使 numpy.take 变快的经验?对于编码速度快且
的精益代码编写,这将是一种灵活且有吸引力的方式
is told to be fast in execution 还有。感谢您提供一些改进方法的提示!
索引版本可能会像这样使用切片对象进行清理:
T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
sy1 = slice(1,ny-1)
sx1 = slice(1,nx-1)
sy2 = slice(2,ny)
sy_2 = slice(0,ny-2)
T[0:nx-2,sy1] + T[2:nx,sy1] + T[sx1,xy_2] + T[sx1,sy2] - 4.0 * T[sx1,sy1]
感谢@Divakar 和@hpaulj!是的,使用 slice
也是可行的。比较所有 4 种方法得出:
- 最快的 ex aequo:t(
usual np
) 和 t(slice
)
- t(
take
) = 2 * t(slice
)
- t(
ix_
) = 3 * t(slice
)
这里是代码和结果:
import numpy as np
from numpy import ix_ as r
nx = 500; ny = 500
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.arange(1,nx-1);
iy = np.arange(1,ny-1);
jx = slice(1,nx-1); jxm = slice(0,nx-2); jxp = slice(2,nx)
jy = slice(1,ny-1); jym = slice(0,ny-2); jyp = slice(2,ny)
#------------------------------------------------------------
def p(U,kx,ky):
return np.take(np.take(U,kx, axis=0), ky,axis=1)
#------------------------------------------------------------
%timeit ΔT_slice= -T[jxm,jy] + T[jxp,jy] - T[jx,jym] + T[jx,jyp] - 0.0 * T[jx,jy]
%timeit ΔT_npy = -T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] - T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 0.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_take = -p(T,ix-1,iy) + p(T,ix+1,iy) - p(T,ix,iy-1) + p(T,ix,iy+1) - 0.0 * p(T,ix,iy)
%timeit ΔT_ix_ = -T[r(ix-1,iy)] + T[r(ix+1,iy)] - T[r(ix,iy-1)] + T[r(ix,iy+1)] - 0.0 * T[r(ix,iy)]
.
100 loops, best of 3: 3.14 ms per loop
100 loops, best of 3: 3.13 ms per loop
100 loops, best of 3: 7.03 ms per loop
100 loops, best of 3: 9.58 ms per loop
关于查看和复制的讨论,以下内容可能具有指导意义:
print("if False --> a view ; if True --> a copy" )
print("_slice_ :", T[jx,jy].base is None)
print("_npy_ :", T[1:nx-1,1:ny-1].base is None)
print("_take_ :", p(T,ix,iy).base is None)
print("_ix_ :", T[r(ix,iy)].base is None)
.
if False --> a view ; if True --> a copy
_slice_ : False
_npy_ : False
_take_ : True
_ix_ : True
numpy.take 可以应用于 2 维 和
np.take(np.take(T,ix,axis=0), iy,axis=1 )
我测试了离散二维拉普拉斯算子的模板
ΔT = T[ix-1,iy] + T[ix+1, iy] + T[ix,iy-1] + T[ix,iy+1] - 4 * T[ix,iy]
有 2 个 take-schemes 和通常的 numpy.array scheme。引入函数 p 和 q 是为了简化代码编写,并以不同的顺序处理轴 0 和 1。这是代码:
nx = 300; ny= 300
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.linspace(1,nx-2,nx-2,dtype=int)
iy = np.linspace(1,ny-2,ny-2,dtype=int)
#------------------------------------------------------------
def p(Φ,kx,ky):
return np.take(np.take(Φ,ky,axis=1), kx,axis=0 )
#------------------------------------------------------------
def q(Φ,kx,ky):
return np.take(np.take(Φ,kx,axis=0), ky,axis=1 )
#------------------------------------------------------------
%timeit ΔT_n = T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_t = p(T,ix-1,iy) + p(T,ix+1,iy) + p(T,ix,iy-1) + p(T,ix,iy+1) - 4.0 * p(T,ix,iy)
%timeit ΔT_t = q(T,ix-1,iy) + q(T,ix+1,iy) + q(T,ix,iy-1) + q(T,ix,iy+1) - 4.0 * q(T,ix,iy)
.
1000 loops, best of 3: 944 µs per loop
100 loops, best of 3: 3.11 ms per loop
100 loops, best of 3: 2.02 ms per loop
结果似乎很明显:
- 通常的 numpy 索引 arithmeitk 是最快的
- take-scheme q 需要 100% 的时间(= C 排序?)
- take-scheme p 需要 200% 的时间(= Fortran 顺序?)
甚至一维 example of the scipy manual表示numpy.take快:
a = np.array([4, 3, 5, 7, 6, 8])
indices = [0, 1, 4]
%timeit np.take(a, indices)
%timeit a[indices]
.
The slowest run took 6.58 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.32 µs per loop
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.87 µs per loop
有没有人有过如何使 numpy.take 变快的经验?对于编码速度快且
的精益代码编写,这将是一种灵活且有吸引力的方式
is told to be fast in execution 还有。感谢您提供一些改进方法的提示!
索引版本可能会像这样使用切片对象进行清理:
T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
sy1 = slice(1,ny-1)
sx1 = slice(1,nx-1)
sy2 = slice(2,ny)
sy_2 = slice(0,ny-2)
T[0:nx-2,sy1] + T[2:nx,sy1] + T[sx1,xy_2] + T[sx1,sy2] - 4.0 * T[sx1,sy1]
感谢@Divakar 和@hpaulj!是的,使用 slice
也是可行的。比较所有 4 种方法得出:
- 最快的 ex aequo:t(
usual np
) 和 t(slice
) - t(
take
) = 2 * t(slice
) - t(
ix_
) = 3 * t(slice
)
这里是代码和结果:
import numpy as np
from numpy import ix_ as r
nx = 500; ny = 500
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.arange(1,nx-1);
iy = np.arange(1,ny-1);
jx = slice(1,nx-1); jxm = slice(0,nx-2); jxp = slice(2,nx)
jy = slice(1,ny-1); jym = slice(0,ny-2); jyp = slice(2,ny)
#------------------------------------------------------------
def p(U,kx,ky):
return np.take(np.take(U,kx, axis=0), ky,axis=1)
#------------------------------------------------------------
%timeit ΔT_slice= -T[jxm,jy] + T[jxp,jy] - T[jx,jym] + T[jx,jyp] - 0.0 * T[jx,jy]
%timeit ΔT_npy = -T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] - T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 0.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_take = -p(T,ix-1,iy) + p(T,ix+1,iy) - p(T,ix,iy-1) + p(T,ix,iy+1) - 0.0 * p(T,ix,iy)
%timeit ΔT_ix_ = -T[r(ix-1,iy)] + T[r(ix+1,iy)] - T[r(ix,iy-1)] + T[r(ix,iy+1)] - 0.0 * T[r(ix,iy)]
.
100 loops, best of 3: 3.14 ms per loop
100 loops, best of 3: 3.13 ms per loop
100 loops, best of 3: 7.03 ms per loop
100 loops, best of 3: 9.58 ms per loop
关于查看和复制的讨论,以下内容可能具有指导意义:
print("if False --> a view ; if True --> a copy" )
print("_slice_ :", T[jx,jy].base is None)
print("_npy_ :", T[1:nx-1,1:ny-1].base is None)
print("_take_ :", p(T,ix,iy).base is None)
print("_ix_ :", T[r(ix,iy)].base is None)
.
if False --> a view ; if True --> a copy
_slice_ : False
_npy_ : False
_take_ : True
_ix_ : True