Outer Join Pandas Dataframe
I am trying to outer join (on df1) two pandas dataframes. Here are the sample dataframes:
df1:
Index  Team 1  Team 2  Team1_Score  Team2_Score
0      A       B       25           56
1      B       C       30           55
2      D       E       35           75
df2:
Index  Team 1  Team 2  Team1_Avg  Team2_Avg
0      A       B       5          15
1      G       F       10         25
2      C       B       15         35
dfcombined:
Index  Team 1  Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A       B       25           56           5          15
1      B       C       30           55           35         15
2      D       E       35           75
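I am trying to use the pandasql module, but I am not sure how to handle the case where index 1 in df1 has to join with index 2 in df2, because the team order is reversed. With pandasql, I am not sure how to swap the team averages in the combined dataframe when the team order is reversed.
Any help with this would be greatly appreciated.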
Setup -
df1
       Team 1  Team 2  Team1_Score  Team2_Score
Index
0      A       B       25           56
1      B       C       30           55
2      D       E       35           75
df2
       Team 1  Team 2  Team1_Avg  Team2_Avg
Index
0      A       B       5          15
1      F       G       25         10
2      B       C       35         15
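To reproduce the setup, the two frames can be built by hand. This is a minimal sketch that simply copies the values from the tables above; naming the index "Index" via rename_axis is an assumption about how the frames were originally created.

```python
import pandas as pd

# Values copied from the setup tables; "Index" is assumed to be the index name.
df1 = pd.DataFrame(
    {
        "Team 1": ["A", "B", "D"],
        "Team 2": ["B", "C", "E"],
        "Team1_Score": [25, 30, 35],
        "Team2_Score": [56, 55, 75],
    }
).rename_axis("Index")

df2 = pd.DataFrame(
    {
        "Team 1": ["A", "F", "B"],
        "Team 2": ["B", "G", "C"],
        "Team1_Avg": [5, 25, 35],
        "Team2_Avg": [15, 10, 15],
    }
).rename_axis("Index")

print(df1)
print(df2)
```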
First, we need to sort the Team * columns within each row, and reorder the Team*_Score columns in the same way. We'll use argsort to do this.
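import numpy as np

# i: column vector of row positions; j: per-row order that sorts each team pair
i = np.arange(len(df1))[:, None]
j = np.argsort(df1[['Team 1', 'Team 2']].values, axis=1)
df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].values[i, j]
# apply the same per-row order to the scores so each score stays with its team
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].values[i, j]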
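The indexing trick is easier to follow on bare NumPy arrays. The sketch below uses small made-up arrays (teams and scores are illustrative names, not part of the original answer) to show how the (n, 1) row index i broadcasts against the (n, 2) argsort result j, so that every row is reordered by its own permutation.

```python
import numpy as np

# Illustrative data: the first two rows have their team pair in reverse order.
teams = np.array([["B", "A"], ["C", "B"], ["D", "E"]])
scores = np.array([[56, 25], [55, 30], [35, 75]])

i = np.arange(len(teams))[:, None]  # shape (3, 1): row positions
j = np.argsort(teams, axis=1)       # shape (3, 2): per-row sort order

# i broadcasts against j, so element [r, c] of the result is array[r, j[r, c]].
sorted_teams = teams[i, j]
sorted_scores = scores[i, j]

print(j)
print(sorted_teams)
print(sorted_scores)
```

Because the same j is applied to both arrays, each score moves together with the team it belongs to.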
Now, repeat the same process for df2, with the Team * and Team*_Avg columns.
# recompute i in case df2 has a different number of rows than df1
i = np.arange(len(df2))[:, None]
j = np.argsort(df2[['Team 1', 'Team 2']].values, axis=1)
df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].values[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].values[i, j]
Now, perform a left outer merge -
df1.merge(df2, on=['Team 1', 'Team 2'], how='left')
   Team 1  Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0  A       B       25           56           5          15
1  B       C       30           55           35         15
2  D       E       35           75
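Putting the steps together, here is a self-contained sketch of the sort-then-merge approach. It starts from the df2 given in the question (with the reversed "G F" and "C B" rows). The helper sort_team_pairs is a hypothetical name introduced only for this example, and it works on copies rather than modifying the frames in place.

```python
import numpy as np
import pandas as pd


def sort_team_pairs(df, team_cols, value_cols):
    """Return a copy in which each row's team pair is sorted and
    value_cols are reordered with the same per-row permutation."""
    out = df.copy()
    i = np.arange(len(out))[:, None]
    j = np.argsort(out[team_cols].to_numpy(), axis=1)
    out[team_cols] = out[team_cols].to_numpy()[i, j]
    out[value_cols] = out[value_cols].to_numpy()[i, j]
    return out


df1 = pd.DataFrame({
    "Team 1": ["A", "B", "D"],
    "Team 2": ["B", "C", "E"],
    "Team1_Score": [25, 30, 35],
    "Team2_Score": [56, 55, 75],
})
# df2 as posted in the question, before any sorting.
df2 = pd.DataFrame({
    "Team 1": ["A", "G", "C"],
    "Team 2": ["B", "F", "B"],
    "Team1_Avg": [5, 10, 15],
    "Team2_Avg": [15, 25, 35],
})

teams = ["Team 1", "Team 2"]
left = sort_team_pairs(df1, teams, ["Team1_Score", "Team2_Score"])
right = sort_team_pairs(df2, teams, ["Team1_Avg", "Team2_Avg"])

combined = left.merge(right, on=teams, how="left")
print(combined)
```

One thing to keep in mind: this approach also normalizes the team order in df1, so a df1 row stored as "C B" would come out as "B C" in the result.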
What you can do is make a copy of df2 with the column names flipped and pd.concat() it with the original. You can create the flipped copy with rename:
df3 = df2.rename(columns={'Team 1':'Team 2','Team 2':'Team 1',
'Team1_Avg':'Team2_Avg','Team2_Avg':'Team1_Avg'})
Now we can concat df2 with the newly created df3 and do a left merge:
df1.merge(pd.concat([df2,df3]),how='left',on=['Team 1','Team 2'])
This gives you the desired DataFrame:
   Team 1  Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0  A       B       25           56           5.0        15.0
1  B       C       30           55           35.0       15.0
2  D       E       35           75           NaN        NaN
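For completeness, the rename-and-concat approach can be run end to end as below. This is a sketch that again assumes the df2 from the question (reversed "G F" and "C B" rows); the names swap and both_orders are introduced only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team 1": ["A", "B", "D"],
    "Team 2": ["B", "C", "E"],
    "Team1_Score": [25, 30, 35],
    "Team2_Score": [56, 55, 75],
})
# df2 as posted in the question.
df2 = pd.DataFrame({
    "Team 1": ["A", "G", "C"],
    "Team 2": ["B", "F", "B"],
    "Team1_Avg": [5, 10, 15],
    "Team2_Avg": [15, 25, 35],
})

# rename applies the whole mapping at once, so the labels are swapped, not chained.
swap = {
    "Team 1": "Team 2", "Team 2": "Team 1",
    "Team1_Avg": "Team2_Avg", "Team2_Avg": "Team1_Avg",
}
df3 = df2.rename(columns=swap)

# concat aligns on column names, so both_orders holds every pair in both orders.
both_orders = pd.concat([df2, df3], ignore_index=True)
combined = df1.merge(both_orders, how="left", on=["Team 1", "Team 2"])
print(combined)
```

Unlike the sorting approach, this leaves df1 untouched. If df2 could contain the same pair in both orders, the lookup table would hold duplicate keys and the merge would return extra rows, so dropping duplicates on the team columns first may be needed.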