How to speed up/vectorize a multilevel iteration calculating rolling covariance matrix?

Question

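Because for loops perform poorly in Python, I need to speed up the following code.

Things I have tried:

1. apply -- I haven't figured out how to use it on a multilevel DataFrame.

2. Numba -- it seems that neither Numba nor Bodo supports pandas rolling.

The code is as follows: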

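import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(9,3),columns=['A','B','C'])
df_result = pd.DataFrame()
shape = np.full(df.shape[1],1)  # vector of ones, one entry per column

def func_cov(df):
    # Rolling 3-row covariance: a (row, column)-indexed frame with one 3x3 matrix per row
    df_cov = df.rolling(3,min_periods=3).cov()
    for i in df.index:
        # Quadratic form shape' * cov_i * shape (with ones: the sum of all entries of cov_i)
        df_result.loc[i,'result'] = np.dot(shape.T,np.dot(df_cov.loc[i], shape))
    return df_result

func_cov(df)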


df:
    A   B   C
0   0.191484    0.765756    -1.288696
1   -0.111369   1.276903    1.567775
2   -0.209460   2.920247    0.142898
3   0.169375    1.096265    -0.646460
4   3.847551    0.936200    -1.221572
5   -1.783127   0.426784    1.311940
6   -0.417902   0.253048    0.097059
7   -1.176098   -0.975650   1.481306
8   -1.429595   0.257955    -0.832083


desired df_result:
    result
0   NaN
1   NaN
2   3.258732
3   1.579507
4   2.359369
5   3.684835
6   4.364114
7   0.125943
8   0.981440

Answer 1

You can convert the DataFrame to a NumPy array and then do all the work with Numba and a basic loop:

import numba as nb
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(9,3),columns=['A','B','C'])
df_result = pd.DataFrame()
shape = np.full(df.shape[1],1)

# Explicit signature: compiled eagerly for a C-contiguous 2D float64 array and a 1D float64 array
@nb.njit('(float64[:,::1], float64[:])')
def fast_func_cov(values, shape):
    result = np.empty(len(values))
    # The first two rows have no complete 3-row window
    result[0] = result[1] = np.nan
    for i in range(2, len(values)):
        # Covariance of the current 3-row window (np.cov expects variables in rows)
        cov_mat = np.cov(values[i-2:i+1,:].T)
        result[i] = np.dot(shape.T,np.dot(cov_mat, shape))
    return result

# The signature requires contiguous float64 inputs, so convert before calling
values = np.ascontiguousarray(df.values)
df_result['result'] = fast_func_cov(values, shape.astype(np.float64))

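On my machine, the computation takes 0.016 ms, while the initial function takes 7 ms. That is about 440 times faster. That said, the pandas assignment takes another 0.032 ms, which brings the overall speedup down to about 150 times.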
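
As a side note that is not part of the answer above: if Numba is not an option, the loop can probably be avoided altogether. For any weight vector w, the quadratic form w' * Cov(X) * w equals the variance of the projected series X @ w, so the rolling result should be obtainable as a rolling variance of a single column. The sketch below assumes the same window of 3 and the same `shape` vector as in the question; the helper name `rolling_quadratic_form` is made up for illustration, and the check at the end compares it against the explicit loop on random data.

```python
import numpy as np
import pandas as pd

def rolling_quadratic_form(df, weights, window=3):
    """Rolling w' * Cov * w, computed as the rolling variance of df @ w."""
    # Project every row onto the weight vector: one scalar per row
    projected = df.dot(weights)
    # Rolling sample variance (ddof=1), the same normalisation rolling().cov() uses
    return projected.rolling(window, min_periods=window).var()

# Illustrative check against the explicit loop from the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((9, 3)), columns=['A', 'B', 'C'])
shape = np.full(df.shape[1], 1)

fast = rolling_quadratic_form(df, shape)

# Reference: build every 3x3 covariance matrix and evaluate the quadratic form
df_cov = df.rolling(3, min_periods=3).cov()
slow = pd.Series({i: shape @ df_cov.loc[i].to_numpy() @ shape for i in df.index})

print(np.allclose(fast.to_numpy(), slow.to_numpy(), equal_nan=True))
```

This only works when the scalar w' * Cov * w is all that is needed. If the full covariance matrices are required for something else, the Numba loop above remains the more general route.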

python

performance

for-loop

vectorization

pandas