Understanding rolling correlation in pandas

Question

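I am trying to understand how pandas.rolling_corr actually calculates rolling correlations. So far I have always done this with numpy. I would prefer to use pandas for its speed and ease of use, but I cannot get the rolling correlation the way I used to.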

I start with two numpy arrays:

import numpy as np

c = np.array([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
d = np.array([8,9,8])

Now I want to calculate the cross-correlation of the length-3 windows of array c. I define a rolling window function:

def rolling_window(a, window):
    # Strided view of `a` with one row per window; no data is copied
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

and compute the correlation between each of the generated windows and the second original data set. This approach works perfectly:

for win in rolling_window(c, len(d)):
    print(np.correlate(win, d))

Output:

[50]
[75]
[100]
[125]
[150]
[175]
[200]
[209]
[200]
[175]
[150]
[125]
[100]
[75]
[50]
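Because `np.correlate` in `'valid'` mode already slides `d` along `c`, the loop above can be replaced by a single call on the whole array. A minimal sketch; the `sliding_window_view` cross-check assumes NumPy 1.20 or later, where it can stand in for the hand-built `as_strided` view:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

c = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])
d = np.array([8, 9, 8])

# 'valid' mode returns one value per position where d fits entirely
# inside c: r[k] = c[k]*d[0] + c[k+1]*d[1] + c[k+2]*d[2]
r = np.correlate(c, d, mode="valid")
print(r.tolist())
# [50, 75, 100, 125, 150, 175, 200, 209, 200, 175, 150, 125, 100, 75, 50]

# Same numbers as a matrix-vector product over a read-only window view
windows = sliding_window_view(c, len(d))  # shape (15, 3)
print(np.array_equal(windows @ d, r))  # True
```

Note that this is a sliding dot product (cross-correlation in the signal-processing sense), which is a different quantity from a Pearson correlation coefficient.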

If I try to solve it with pandas:

import pandas as pd

a = pd.DataFrame([1,2,3,4,5,6,7,8,9,8,7,6,5,4,3,2,1])
b = pd.DataFrame([8,9,8])

Whether I use DataFrame rolling_corr:

a.rolling(window=3, center=True).corr(b)

or pandas rolling_corr:

pd.rolling_corr(a, b, window=1, center=True)

I just get a bunch of NaN:

      0
0   NaN
1   0.0
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN
15  NaN
16  NaN

Can anyone help me? I was able to solve the problem with numpy by flattening the numpy array obtained from converting the pandas DataFrame:

a.values.ravel()

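However, I would like to do the calculation entirely in pandas. I have searched the documentation but have not found the answer I am looking for. What am I missing or not understanding?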

Thank you very much.

D.

Answer 1

The calculation you are trying to do can be thought of as operating on the following DataFrame:

pd.concat([a, b], axis=1)

    0   0
0   1   8
1   2   9
2   3   8
3   4 NaN
4   5 NaN
5   6 NaN
6   7 NaN
7   8 NaN
8   9 NaN
9   8 NaN
10  7 NaN
11  6 NaN
12  5 NaN
13  4 NaN
14  3 NaN
15  2 NaN
16  1 NaN

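If you use window=3, it correlates the first three values in b with the first three values in a, correlates the rest with NaN, and places the value at the center of the window (center=True).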
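The underlying point is that `rolling(...).corr` computes a Pearson correlation coefficient over aligned windows, not the sliding dot product that `np.correlate` returns. The sketch below reproduces the NaN output from the question, assuming a recent pandas with the `.rolling` API and using Series instead of one-column DataFrames for simplicity. Only one window has three complete (a, b) pairs, and for `[1, 2, 3]` against `[8, 9, 8]` the Pearson coefficient happens to be 0, which is the lone `0.0` in the question's output:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])
b = pd.Series([8, 9, 8])

# b is aligned to a's index 0..16, so b is NaN from index 3 onward
print(pd.concat([a, b], axis=1).head(5))

# Pearson correlation per 3-row window; by default a window needs
# 3 complete pairs, otherwise the result is NaN
r = a.rolling(window=3).corr(b)
print(r)  # only index 2 is non-NaN (about 0.0)

# With center=True the same value is labelled at the window's middle row
rc = a.rolling(window=3, center=True).corr(b)
print(rc)  # only index 1 is non-NaN, as in the question

# The same coefficient computed directly from the one complete window
print(np.corrcoef([1, 2, 3], [8, 9, 8])[0, 1])  # about 0.0
```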

You could try:

pd.rolling_apply(a, window=3, func=lambda x: np.correlate(x, b[0]))

Output:

You can also add center=True here if you like.

(I am using pandas 0.17.0)
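Module-level functions such as `pd.rolling_apply` and `pd.rolling_corr` have since been removed from pandas. A minimal sketch of the same sliding dot product with the method-based API, assuming pandas 1.0 or later (for the `raw` argument); since `rolling(...).apply` expects a scalar per window, it takes the single element that `np.correlate` returns:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 5, 4, 3, 2, 1])
b = pd.DataFrame([8, 9, 8])

kernel = b[0].to_numpy()

# raw=True passes each window as a plain ndarray instead of a Series
sliding = a[0].rolling(window=3).apply(
    lambda x: np.correlate(x, kernel)[0], raw=True
)
print(sliding.tolist())  # two NaN, then 50.0, 75.0, ..., 209.0, ..., 50.0

# center=True labels each value at the middle row of its window
centered = a[0].rolling(window=3, center=True).apply(
    lambda x: np.correlate(x, kernel)[0], raw=True
)
print(centered.tolist())  # NaN, 50.0, ..., 50.0, NaN
```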

Tags: python, correlation, pandas, rolling-computation