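How can I optimize the calculation over this function in numpy?
I want to implement the following problem in numpy, and here is my code.
I have tried the following numpy code for this problem, using one for loop. Is there a more efficient way to do this calculation? Many thanks!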
k, d = X.shape
m = Y.shape[0]
c1 = 2.0*sigma**2
c2 = 0.5*np.log(np.pi*c1)
c3 = np.log(1.0/k)
L_B = np.zeros((m,))
for i in range(m):
    # Progress indicator
    if i % 100 == 0:
        print(i)
    L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
                np.power(X-Y[i,:],2), c1)-c2,1)+c3)))
print(np.mean(L_B))
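I have thought of np.expand_dims(X, 2).repeat(Y.shape[0], 2)-Y, which creates a 3D tensor so that the following calculation can be done by broadcasting, but that would waste a lot of memory when m is large.
I also believe that np.einsum() uses nothing but a for loop, so it might not be that efficient; correct me if I am wrong.
Any thoughts?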
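Optimization Stage #1
My first level of optimization is a direct translation of the loopy code into broadcasting-based code, by introducing a new axis. It is therefore not very memory efficient, as listed below -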
p1 = (-((X[:,None] - Y)**2)/c1)-c2
p11 = p1.sum(2)
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()
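To put a number on "not very memory efficient": X[:,None] - Y materializes a temporary of shape (k, m, d). The following is a rough sketch that estimates its size for the input shapes used in the timing section of this answer. The helper name broadcast_temp_bytes is made up for illustration, and 8 bytes per element (float64 or int64) is an assumption.

```python
import numpy as np

def broadcast_temp_bytes(k, m, d, itemsize=8):
    """Bytes needed by the (k, m, d) temporary created by X[:, None] - Y."""
    return k * m * d * itemsize

# Shapes from the timing runs: X is (100, 10), Y is (10000, 10) or (100000, 10).
small = broadcast_temp_bytes(100, 10000, 10)
large = broadcast_temp_bytes(100, 100000, 10)
print(small / 1e6, "MB")
print(large / 1e6, "MB")

# Sanity check on a tiny input: the temporary really has shape (k, m, d).
X = np.zeros((3, 2))
Y = np.zeros((5, 2))
assert (X[:, None] - Y).shape == (3, 5, 2)
```

The temporary grows linearly in m, which is the concern raised in the question.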
Optimization Stage #2
With a few optimizations that separate out the operations on constants, I ended up with the following -
c10 = -c1
c20 = X.shape[1]*c2
subs = (X[:,None] - Y)**2
p00 = subs.sum(2)
p10 = p00/c10
p11 = p10-c20
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()
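For reference, here is why the constants can be pulled out of the inner sum. The notation is mine rather than the question's: x_j is row j of X, y_i is row i of Y, and c_1, c_2, c_3 are the constants c1, c2, c3 from the code. Since c_2 does not depend on the feature index l, summing it over the d features simply gives d c_2, which is the c20 above.

```latex
\begin{aligned}
L_B[i] &= \log \sum_{j=1}^{k} \exp\!\Big( \sum_{l=1}^{d} \Big[ -\frac{(x_{jl} - y_{il})^2}{c_1} - c_2 \Big] + c_3 \Big) \\
       &= \log \sum_{j=1}^{k} \exp\!\Big( -\frac{\lVert x_j - y_i \rVert^2}{c_1} - d\,c_2 + c_3 \Big)
\end{aligned}
```

The only term that still depends on both i and j is the squared Euclidean distance between x_j and y_i.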
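Optimization Stage #3
Going further and looking for places where the operations could be optimized, I ended up using Scipy's cdist to replace the heavy work of the squaring and sum-reduction. This should be quite memory efficient, since it avoids the 3D intermediate array, and gives us the final implementation, as shown below -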
from scipy.spatial.distance import cdist
# Setup constants
c10 = -c1
c20 = X.shape[1]*c2
c30 = c20-c3
c40 = np.exp(c30)
c50 = np.log(c40)
# Get stagewise operations corresponding to loopy ones
p1 = cdist(X, Y, 'sqeuclidean')
p2 = np.exp(p1/c10).sum(0)
out = np.log(p2).mean() - c50
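One caveat with taking np.exp and then np.log: when the squared distances are large relative to sigma, the exponentials can underflow to zero and the log returns -inf. A possible variant, not part of the implementation above, is to let scipy.special.logsumexp do the log-sum-exp step. The function name logsumexp_app is made up for this sketch.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def logsumexp_app(X, Y, sigma):
    # Same constants as in the implementation above
    k, d = X.shape
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)

    # (k, m) matrix of squared Euclidean distances
    sq = cdist(X, Y, 'sqeuclidean')

    # log(sum_j exp(-sq/c1)) per column, computed without underflow,
    # then the constant offset (d*c2 - c3) is subtracted once
    return logsumexp(-sq / c1, axis=0).mean() - (d * c2 - c3)
```

For well-scaled inputs this should agree with the cdist version up to floating-point error; it only behaves differently when the plain exponentials would underflow.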
Runtime tests
Approaches -
import numpy as np
from scipy.spatial.distance import cdist

def loopy_app(X, Y, sigma):
    k, d = X.shape
    m = Y.shape[0]
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    L_B = np.zeros((m,))
    for i in range(m):
        L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
                    np.power(X-Y[i,:],2), c1)-c2,1)+c3)))
    return np.mean(L_B)

def vectorized_app(X, Y, sigma):
    # Setup constants
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)
    c10 = -c1
    c20 = X.shape[1]*c2
    c30 = c20-c3
    c40 = np.exp(c30)
    c50 = np.log(c40)
    # Get stagewise operations corresponding to loopy ones
    p1 = cdist(X, Y, 'sqeuclidean')
    p2 = np.exp(p1/c10).sum(0)
    out = np.log(p2).mean() - c50
    return out
Timings and verification -
In [294]: # Setup inputs with m(=Y.shape[0]) being a large number
     ...: X = np.random.randint(0,9,(100,10))
     ...: Y = np.random.randint(0,9,(10000,10))
     ...: sigma = 2.34
     ...:
In [295]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[295]: True
In [296]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 225 ms per loop
In [297]: %timeit vectorized_app(X, Y, sigma)
10 loops, best of 3: 23.6 ms per loop
In [298]: # Setup inputs with m(=Y.shape[0]) being a much larger number
     ...: X = np.random.randint(0,9,(100,10))
     ...: Y = np.random.randint(0,9,(100000,10))
     ...: sigma = 2.34
     ...:
In [299]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[299]: True
In [300]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 2.27 s per loop
In [301]: %timeit vectorized_app(X, Y, sigma)
1 loops, best of 3: 243 ms per loop
Around a 10x speedup in both cases (225 ms vs 23.6 ms, and 2.27 s vs 243 ms)!
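If the (k, m) distance matrix returned by cdist is itself too large for memory, one option is to process Y in blocks and accumulate the per-row logs. This is only a sketch under that assumption and was not timed here. The function name chunked_app and the chunk parameter are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def chunked_app(X, Y, sigma, chunk=10000):
    # Same constants as in vectorized_app
    k, d = X.shape
    m = Y.shape[0]
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)

    total = 0.0
    for start in range(0, m, chunk):
        # Only a (k, chunk) block of squared distances is held at a time
        sq = cdist(X, Y[start:start + chunk], 'sqeuclidean')
        total += np.log(np.exp(-sq / c1).sum(0)).sum()

    # Mean over all m rows of Y, minus the constant offset (d*c2 - c3)
    return total / m - (d * c2 - c3)
```

A larger chunk means fewer Python-level iterations at the cost of a larger temporary, so the value is a trade-off to tune for the machine at hand.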