Correlation between two dataframe columns using a lookback
index Date Col_A Col_B Detection
0 1 Jan 0 1 0
1 2 Jan 0 0 0
2 3 Jan 1 0 0
3 4 Jan 0 1 1
4 5 Jan 0 0 0
5 6 Jan 1 0 0
6 7 Jan 0 0 0
7 8 Jan 0 0 1
8 9 Jan 0 0 0
9 10 Jan 0 0 0
10 11 Jan 0 0 1
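I have the dataframe above. I want to find the correlation between the "Detection" column and Col_A or Col_B, as follows:
Loop through the Detection column. Wherever df1["Detection"]==1, compare it with Col_A at the same index. If df1["Col_A"]==1, report that there is a correlation (yes). Otherwise, look back at the two earlier positions (say shift(-2)); if either of them holds an item with value ==1, report yes, else report no.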
Below is my trial code:
df1["Corr_with_A"] = np.where((df1['Col_A'] == 1 or df1['Col_A'].shif(-1) == 1 or df1['Col_A'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
df1["Corr_with_B"] = np.where((df1['Col_B'] == 1 or df1['Col_B'].shif(-1) == 1 or df1['Col_B'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
My expected output:
index Date Col_A Col_B Detection Corr_with_A Corr_with_B
0 1 Jan 0 1 0 no no
1 2 Jan 0 0 0 no no
2 3 Jan 1 0 0 no no
3 4 Jan 0 1 1 Yes Yes
4 5 Jan 0 0 0 no no
5 6 Jan 1 0 0 no no
6 7 Jan 0 0 0 no no
7 8 Jan 0 0 1 Yes no
8 9 Jan 0 0 0 no no
9 10 Jan 0 0 0 no no
10 11 Jan 0 0 1 no no
Can anyone come up with a better way to achieve this? My code gives me errors. Thanks.
This is a good use case for rolling.max:
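import numpy as np

N = 3  # number of rows to consider (the current row plus the two before it)
m0 = df['Detection'].eq(1)  # rows where a detection occurred
# the rolling max equals 1 if any of the last N rows holds a 1
m1 = df['Col_A'].rolling(window=N, min_periods=1).max().eq(1)
m2 = df['Col_B'].rolling(window=N, min_periods=1).max().eq(1)
df['Corr_with_A'] = np.where(m0 & m1, 'yes', 'no')
df['Corr_with_B'] = np.where(m0 & m2, 'yes', 'no')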
Output:
index Date Col_A Col_B Detection Corr_with_A Corr_with_B
0 1 Jan 0 1 0 no no
1 2 Jan 0 0 0 no no
2 3 Jan 1 0 0 no no
3 4 Jan 0 1 1 yes yes
4 5 Jan 0 0 0 no no
5 6 Jan 1 0 0 no no
6 7 Jan 0 0 0 no no
7 8 Jan 0 0 1 yes no
8 9 Jan 0 0 0 no no
9 10 Jan 0 0 0 no no
10 11 Jan 0 0 1 no no
As a loop:
N = 3  # number of rows to consider
m0 = df['Detection'].eq(1)
for col in ['A', 'B']:
    m_rol = df[f'Col_{col}'].rolling(window=N, min_periods=1).max().eq(1)
    df[f'Corr_with_{col}'] = np.where(m0 & m_rol, 'yes', 'no')
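For comparison, here is a minimal sketch of a shift-based version that stays close to the trial code in the question. That code most likely fails for three reasons: Python's `or` cannot combine two Series (it raises "The truth value of a Series is ambiguous"; the element-wise operator is `|`), `.shif` is a typo for `.shift`, and a negative shift pulls values from later rows, so a lookback needs `shift(1)` and `shift(2)`. The dataframe below is rebuilt from the question's table without the Date column, and the names `detected` and `recent` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (Date column omitted)
df = pd.DataFrame({
    "Col_A":     [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "Col_B":     [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "Detection": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
})

detected = df["Detection"].eq(1)
for col in ["A", "B"]:
    s = df[f"Col_{col}"]
    # True if the current row or either of the two previous rows holds a 1.
    # shift(k) with positive k moves earlier values down to the current row;
    # the NaN values it leaves at the top compare as False in eq(1).
    recent = s.eq(1) | s.shift(1).eq(1) | s.shift(2).eq(1)
    df[f"Corr_with_{col}"] = np.where(detected & recent, "yes", "no")

print(df)
```

On this data the result should match the rolling.max approach with N = 3. The rolling version is easier to extend to a longer lookback, since only N changes, whereas the shift version needs one extra term per additional row.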