Outer Join Pandas DataFrame

I am trying to left outer join two pandas DataFrames, keeping every row of df1. Here are the sample DataFrames:

df1:
Index   Team 1   Team 2   Team1_Score    Team2_Score
 0       A        B        25              56
 1       B        C        30              55
 2       D        E        35              75

df2:
Index   Team 1   Team 2   Team1_Avg     Team2_Avg
 0       A        B        5              15
 1       G        F        10             25
 2       C        B        15             35

dfcombined:
Index   Team 1   Team 2   Team1_Score    Team2_Score    Team1_Avg     Team2_Avg
 0       A        B        25              56             5             15
 1       B        C        30              55             35            15
 2       D        E        35              75             NaN           NaN

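I am trying to use the pandasql module, but I am not sure how to handle the case where index 1 in df1 has to be joined with index 2 in df2, because the teams are listed in the opposite order. With pandasql, I don't know how to swap the team averages in the combined DataFrame when the team order is reversed.

I would appreciate any help with this.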

Setup -

df1

      Team 1 Team 2  Team1_Score  Team2_Score
Index                                        
0          A      B           25           56
1          B      C           30           55
2          D      E           35           75

df2

      Team 1 Team 2  Team1_Avg  Team2_Avg
Index                                    
0          A      B          5         15
1          G      F         10         25
2          C      B         15         35
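For anyone who wants to reproduce this, here is one way the two frames could be built. It is only a sketch: the values are copied from the tables in the question, and naming the index "Index" is my assumption based on how the frames are printed.

```python
import pandas as pd

# Values copied from the question's df1 table.
df1 = pd.DataFrame(
    {
        "Team 1": ["A", "B", "D"],
        "Team 2": ["B", "C", "E"],
        "Team1_Score": [25, 30, 35],
        "Team2_Score": [56, 55, 75],
    }
).rename_axis("Index")  # index name is an assumption taken from the printed output

# Values copied from the question's df2 table (team pairs not yet sorted).
df2 = pd.DataFrame(
    {
        "Team 1": ["A", "G", "C"],
        "Team 2": ["B", "F", "B"],
        "Team1_Avg": [5, 10, 15],
        "Team2_Avg": [15, 25, 35],
    }
).rename_axis("Index")
```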

First, we need to sort the Team * columns within each row, and reorder the Team*_Score columns in exactly the same way. We'll use argsort to do this.

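import numpy as np

# i: column vector of row positions; j: per-row column order that sorts the team names
i = np.arange(len(df1))[:, None]
j = np.argsort(df1[['Team 1', 'Team 2']].to_numpy(), axis=1)

# Apply the same per-row permutation to the team names and to their scores
df1[['Team 1', 'Team 2']] = df1[['Team 1', 'Team 2']].to_numpy()[i, j]
df1[['Team1_Score', 'Team2_Score']] = df1[['Team1_Score', 'Team2_Score']].to_numpy()[i, j]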
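If the `[i, j]` indexing looks opaque, this small standalone sketch may help. The arrays are made up for illustration and are not from the question; the point is only that `i` picks the row and `j` picks the column order within that row.

```python
import numpy as np

# Illustrative data: the first row is out of alphabetical order, the second is not.
teams = np.array([["B", "A"], ["C", "D"]])
scores = np.array([[10, 20], [30, 40]])

i = np.arange(len(teams))[:, None]  # shape (2, 1): row positions
j = np.argsort(teams, axis=1)       # shape (2, 2): column order that sorts each row

sorted_teams = teams[i, j]          # [["A", "B"], ["C", "D"]]
sorted_scores = scores[i, j]        # [[20, 10], [30, 40]]
```

`np.take_along_axis(scores, j, axis=1)` should give the same result without needing `i` at all.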

Now, repeat the same process for df2's Team * and Team*_Avg columns.

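# Recompute the row positions in case df2 has a different length than df1
i = np.arange(len(df2))[:, None]
j = np.argsort(df2[['Team 1', 'Team 2']].to_numpy(), axis=1)

df2[['Team 1', 'Team 2']] = df2[['Team 1', 'Team 2']].to_numpy()[i, j]
df2[['Team1_Avg', 'Team2_Avg']] = df2[['Team1_Avg', 'Team2_Avg']].to_numpy()[i, j]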

Now, perform a left outer merge -

df1.merge(df2, on=['Team 1', 'Team 2'], how='left')

  Team 1 Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A      B           25           56        5.0       15.0
1      B      C           30           55       35.0       15.0
2      D      E           35           75        NaN        NaN
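Putting the steps together, the whole approach could be wrapped up roughly like this. `sort_team_pair` is a hypothetical helper name I made up for the sketch, and it uses `np.take_along_axis` in place of the explicit row index; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

def sort_team_pair(df, value_cols, team_cols=("Team 1", "Team 2")):
    """Hypothetical helper: return a copy in which each row's team pair is in
    alphabetical order and value_cols have been swapped the same way."""
    out = df.copy()
    team_cols, value_cols = list(team_cols), list(value_cols)
    # Per-row column order that sorts the two team names
    order = np.argsort(out[team_cols].to_numpy(), axis=1)
    out[team_cols] = np.take_along_axis(out[team_cols].to_numpy(), order, axis=1)
    out[value_cols] = np.take_along_axis(out[value_cols].to_numpy(), order, axis=1)
    return out

# Sample data from the question
df1 = pd.DataFrame({"Team 1": ["A", "B", "D"], "Team 2": ["B", "C", "E"],
                    "Team1_Score": [25, 30, 35], "Team2_Score": [56, 55, 75]})
df2 = pd.DataFrame({"Team 1": ["A", "G", "C"], "Team 2": ["B", "F", "B"],
                    "Team1_Avg": [5, 10, 15], "Team2_Avg": [15, 25, 35]})

combined = sort_team_pair(df1, ["Team1_Score", "Team2_Score"]).merge(
    sort_team_pair(df2, ["Team1_Avg", "Team2_Avg"]),
    on=["Team 1", "Team 2"],
    how="left",
)
```

One thing to keep in mind: this also alphabetizes the pairs in df1, so if df1 had a row such as (C, B) it would come out as (B, C) with its scores swapped to match.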

What you can do is make a copy of df2 with the column names flipped and pd.concat() the two. You can create the flipped copy with rename:

df3 = df2.rename(columns={'Team 1':'Team 2','Team 2':'Team 1', 
        'Team1_Avg':'Team2_Avg','Team2_Avg':'Team1_Avg'})

Now we can concat df2 with the newly created df3 and left merge the result onto df1:

df1.merge(pd.concat([df2,df3]),how='left',on=['Team 1','Team 2'])

This gives you the desired DataFrame:

  Team 1 Team 2  Team1_Score  Team2_Score  Team1_Avg  Team2_Avg
0      A      B           25           56        5.0       15.0
1      B      C           30           55       35.0       15.0
2      D      E           35           75        NaN        NaN
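For completeness, here is the rename-and-concat approach as one self-contained sketch on the question's sample data. The names `swap` and `both_orders` and the `validate` argument are my additions, not part of the answer above. `validate="many_to_one"` asks merge to raise if a team pair appears more than once on the right side, which could happen if df2 listed the same matchup in both orders.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Team 1": ["A", "B", "D"], "Team 2": ["B", "C", "E"],
                    "Team1_Score": [25, 30, 35], "Team2_Score": [56, 55, 75]})
df2 = pd.DataFrame({"Team 1": ["A", "G", "C"], "Team 2": ["B", "F", "B"],
                    "Team1_Avg": [5, 10, 15], "Team2_Avg": [15, 25, 35]})

# Flip both the team labels and the matching average labels
swap = {"Team 1": "Team 2", "Team 2": "Team 1",
        "Team1_Avg": "Team2_Avg", "Team2_Avg": "Team1_Avg"}

# Every matchup now appears in both orders, so either orientation in df1 can match
both_orders = pd.concat([df2, df2.rename(columns=swap)], ignore_index=True)

combined = df1.merge(
    both_orders,
    how="left",
    on=["Team 1", "Team 2"],
    validate="many_to_one",  # assumption: each pair should match at most one df2 row
)
```

Unlike the argsort approach, this leaves the team order in df1 untouched, at the cost of doubling the size of the right-hand frame.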