Correlation between two dataframe columns using a lookback

index Date Col_A Col_B Detection
0   1 Jan      0    1   0
1   2 Jan      0    0   0
2   3 Jan      1    0   0
3   4 Jan      0    1   1
4   5 Jan      0    0   0
5   6 Jan      1    0   0
6   7 Jan      0    0   0
7   8 Jan      0    0   1
8   9 Jan      0    0   0
9   10 Jan     0    0   0
10  11 Jan     0    0   1

I have the dataframe above. I want to find the correlation between the "Detection" column and the Col_A and Col_B columns, as follows:

Loop over the Detection column. Wherever df1["Detection"] == 1, compare it with Col_A at the same index: if df1["Col_A"] == 1, report a correlation ("Yes"); otherwise, look back at the two earlier positions (say shift(-2)), and if either holds a value == 1, report "Yes"; otherwise report "no".

Below is my attempt:

df1["Corr_with_A"] = np.where((df1['Col_A'] == 1 or df1['Col_A'].shif(-1) == 1 or df1['Col_A'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
df1["Corr_with_B"] = np.where((df1['Col_B'] == 1 or df1['Col_B'].shif(-1) == 1 or df1['Col_B'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")

My expected output:

index Date  Col_A   Col_B   Detection   Corr_with_A Corr_with_B
0   1 Jan      0    1      0           no          no
1   2 Jan      0    0      0           no          no
2   3 Jan      1    0      0           no          no
3   4 Jan      0    1      1          Yes          Yes
4   5 Jan      0    0      0           no          no
5   6 Jan      1    0      0           no          no
6   7 Jan      0    0      0           no          no
7   8 Jan      0    0      1           Yes         no
8   9 Jan      0    0      0           no          no
9   10 Jan     0    0      0           no          no
10  11 Jan     0    0      1           no          no

Can anyone come up with a better way to achieve this? My code gives me errors. Thanks.

This is a good use case for rolling.max:

import numpy as np

N = 3  # number of rows to consider: the current row plus the two before it
m0 = df['Detection'].eq(1)
# for 0/1 columns, the rolling max is 1 if any of the last N rows holds a 1
m1 = df['Col_A'].rolling(window=N, min_periods=1).max().eq(1)
m2 = df['Col_B'].rolling(window=N, min_periods=1).max().eq(1)

df['Corr_with_A'] = np.where(m0&m1, 'yes', 'no')
df['Corr_with_B'] = np.where(m0&m2, 'yes', 'no')

Output:

    index Date  Col_A  Col_B  Detection Corr_with_A Corr_with_B
0       1  Jan      0      1          0          no          no
1       2  Jan      0      0          0          no          no
2       3  Jan      1      0          0          no          no
3       4  Jan      0      1          1         yes         yes
4       5  Jan      0      0          0          no          no
5       6  Jan      1      0          0          no          no
6       7  Jan      0      0          0          no          no
7       8  Jan      0      0          1         yes          no
8       9  Jan      0      0          0          no          no
9      10  Jan      0      0          0          no          no
10     11  Jan      0      0          1          no          no
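
To make the rolling step easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and keeps the intermediate rolling maximum in its own variable. The name `seen_a` is only illustrative and does not appear in the original answer; the Date column is left out because it plays no part in the calculation.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Date column omitted).
df = pd.DataFrame({
    "Col_A":     [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "Col_B":     [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "Detection": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
})

N = 3  # current row plus the two rows before it

# For a 0/1 column, the max over the last N rows is 1 exactly when
# at least one of those rows holds a 1. min_periods=1 lets the first
# rows use a shorter window instead of producing NaN.
seen_a = df["Col_A"].rolling(window=N, min_periods=1).max()

m0 = df["Detection"].eq(1)
df["Corr_with_A"] = np.where(m0 & seen_a.eq(1), "yes", "no")

# Show the intermediate column next to the result.
print(df.assign(seen_A=seen_a))
```

Row 7 is the instructive case: Col_A is 0 there, but row 5 holds a 1 and falls inside the three-row window, so the result is "yes". Row 10 has no 1 in rows 8 to 10, so it stays "no".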

As a loop:

N = 3 # number of rows to consider
m0 = df['Detection'].eq(1)

for col in ['A', 'B']:
    m_rol = df[f'Col_{col}'].rolling(window=N, min_periods=1).max().eq(1)
    df[f'Corr_with_{col}'] = np.where(m0&m_rol, 'yes', 'no')
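
For comparison, the shift-based idea from the question can also be made to work; the following is a sketch of one way to do it, not part of the original answer. Three things differ from the attempt in the question: the method is spelled `shift`, Series are combined with `|` rather than `or` (using `or` on a Series raises a ValueError because its truth value is ambiguous), and the shifts are positive, since `shift(1)` gives each row the value from the row before it while `shift(-1)` would look ahead. The variable name `hit` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Date column omitted).
df = pd.DataFrame({
    "Col_A":     [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "Col_B":     [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "Detection": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
})

m0 = df["Detection"].eq(1)

for col in ["A", "B"]:
    s = df[f"Col_{col}"]
    # shift(k) with positive k moves values down, so row i sees row i-k.
    # The first k rows become NaN, and NaN.eq(1) is False.
    hit = s.eq(1) | s.shift(1).eq(1) | s.shift(2).eq(1)
    df[f"Corr_with_{col}"] = np.where(m0 & hit, "yes", "no")

print(df)
```

This gives the same result as the rolling version for a two-row lookback, but the rolling form scales more easily if the lookback length changes.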