Optimize mean outer products
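I'm currently writing a short program to analyze the eigenvalue distribution of random matrices, but the parameter choices my analysis requires make the whole process very slow. Essentially, I need to run the code below in a loop, ideally about 5000 times, and collect the full list of eigenvalues at the end.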
C = np.zeros((N,N))
time_series = np.random.normal(mu,sigma, (N + B*(M-1)) )
for k in range(int(M)):
    C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
C = C/M
eg_v = np.linalg.eigvalsh(C)
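I need N = 1000, B around 10, and M = 100.
However, with these parameters, the program takes about 4-5 hours to run on my fairly powerful laptop.
Hardware limitations aside, is there anything I can do to the code to speed up the whole process?
Thanks in advance!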
You can replace the loop with a vectorized solution using np.tensordot.
So, the following -
C = np.zeros((N,N))
for k in range(int(M)):
    C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
could be replaced by -
# Build a 2D index array: row k holds the indices k*B, ..., k*B + N - 1
idx = (np.arange(M)*B)[:,None] + np.arange(N)
# Index time_series with it, so that row k is the equivalent of
# "time_series[k*B : (N) + k*B]"
time_idx = time_series[idx]
# Sum the outer products over all iterations by contracting the first axis
C = np.tensordot(time_idx,time_idx,axes=([0],[0]))
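To see what the index array looks like, here is a minimal sketch with tiny parameters. The values N = 4, M = 3, B = 2 and the name `windows` are assumptions chosen only for illustration and are not from the question.

```python
import numpy as np

# Tiny, hypothetical parameters chosen only for illustration
N, M, B = 4, 3, 2
time_series = np.arange(N + B * (M - 1), dtype=float)  # 0.0, 1.0, ..., 7.0

# Row k of idx holds the indices k*B, k*B + 1, ..., k*B + N - 1:
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]]
idx = (np.arange(M) * B)[:, None] + np.arange(N)

# Fancy indexing gives an (M, N) array; row k equals time_series[k*B : N + k*B]
windows = time_series[idx]

# Contracting over axis 0 sums the M outer products in a single call
C_vec = np.tensordot(windows, windows, axes=([0], [0]))

# Reference: the explicit loop from the question
C_loop = np.zeros((N, N))
for k in range(M):
    w = time_series[k * B : N + k * B]
    C_loop += np.outer(w, w)

print(np.allclose(C_vec, C_loop))  # True
```

Each row of `windows` is one of the overlapping slices from the loop, which is why contracting the first axis reproduces the accumulated sum.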
The tensordot call can be replaced by a simple dot product:
C = time_idx.T.dot(time_idx)
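Putting this together with the rest of the code from the question (the division by M and the call to np.linalg.eigvalsh), one draw could be sketched roughly as below. The function names `eigenvalues_one_draw` and `collect_eigenvalues`, the `seed` argument, and the use of np.random.default_rng are assumptions made for this sketch and are not part of the question.

```python
import numpy as np

def eigenvalues_one_draw(rng, mu, sigma, N, M, B):
    """Hypothetical helper: draw one series and return the eigenvalues of C/M."""
    time_series = rng.normal(mu, sigma, N + B * (M - 1))
    idx = (np.arange(M) * B)[:, None] + np.arange(N)
    windows = time_series[idx]          # shape (M, N)
    C = windows.T.dot(windows) / M      # mean of the M outer products
    return np.linalg.eigvalsh(C)        # N eigenvalues in ascending order

def collect_eigenvalues(n_draws, mu, sigma, N, M, B, seed=0):
    """Hypothetical helper: repeat the draw and gather all eigenvalues."""
    rng = np.random.default_rng(seed)
    draws = [eigenvalues_one_draw(rng, mu, sigma, N, M, B) for _ in range(n_draws)]
    return np.concatenate(draws)

# Small sizes so the example runs quickly; the question uses N=1000, M=100, B=10
eigs = collect_eigenvalues(n_draws=3, mu=1.2, sigma=0.5, N=50, M=10, B=2)
print(eigs.shape)  # (150,)
```

Only the construction of C changes relative to the question. The eigenvalue computation is untouched, so its cost is not addressed by this sketch.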
Runtime test
Functions:
def original_app(time_series,B,N,M):
    C = np.zeros((N,N))
    for k in range(int(M)):
        C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
    return C

def vectorized_app(time_series,B,N,M):
    idx = (np.arange(M)*B)[:,None] + np.arange(N)
    time_idx = time_series[idx]
    return np.tensordot(time_idx,time_idx,axes=([0],[0]))
Inputs:
In [115]: # Inputs
...: mu = 1.2
...: sigma = 0.5
...: N = 1000
...: M = 100
...: B = 10
...: time_series = np.random.normal(mu,sigma, (N + B*(M-1)) )
...:
Timings:
In [116]: out1 = original_app(time_series,B,N,M)
In [117]: out2 = vectorized_app(time_series,B,N,M)
In [118]: np.allclose(out1,out2)
Out[118]: True
In [119]: %timeit original_app(time_series,B,N,M)
1 loops, best of 3: 1.56 s per loop
In [120]: %timeit vectorized_app(time_series,B,N,M)
10 loops, best of 3: 26.2 ms per loop
So, for the inputs listed in the question, we are seeing a speedup of about 60x!
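As a variation that is not part of the timings above, the window array could also be built as a strided view instead of by fancy indexing. This is a sketch under the assumption that NumPy 1.20 or later is available, since it relies on np.lib.stride_tricks.sliding_window_view. The name `strided_app` is hypothetical, and its speed was not measured here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def strided_app(time_series, B, N, M):
    """Hypothetical variant: same result as vectorized_app, windows built as a view."""
    # All length-N windows as a read-only view, shape (len(time_series) - N + 1, N)
    all_windows = sliding_window_view(time_series, N)
    # Keep every B-th window, M of them in total; slicing keeps this a view
    windows = all_windows[::B][:M]
    # Sum of the M outer products, as in the dot-product form above
    return windows.T.dot(windows)
```

It can be checked against original_app with np.allclose in the same way as vectorized_app.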