Concat two dataframes and exclude overlapping rows

Question

I am trying to concatenate two dataframes, df1 and df2:

Input

        name   age   hobby   married
index
0       jack   20    hockey  yes
1       ben    19    chess   no
2       lisa   30    golf    no

        name   age    hobby      job
index
0       jack   20     hockey     student
1       anna   34     football   finance
2       dan    26     golf       retail

I want to match on multiple columns, say ['name', 'age'], to get df:

Output

        name   age   hobby     married   job
index
0       jack   20    hockey    yes       student
1       ben    19    chess     no        /
2       lisa   30    golf      no        /
3       anna   34    football  /         finance
4       dan    26    golf      /         retail

Is it possible to do this with concat? I can't find how to match on a list of key columns so that overlapping rows are not duplicated...

Answer 1

You can do an outer merge on the key columns, then collapse the two resulting hobby columns into one:

In [1077]: res = df1.merge(df2, on=['name', 'age'], how='outer')
In [1079]: res['hobby'] = res.hobby_x.combine_first(res.hobby_y)

In [1081]: res.drop(['hobby_x', 'hobby_y'], axis=1, inplace=True)

In [1082]: res
Out[1082]: 
   name  age married      job     hobby
0  jack   20     yes  student    hockey
1   ben   19      no      NaN     chess
2  lisa   30      no      NaN      golf
3  anna   34     NaN  finance  football
4   dan   26     NaN   retail      golf
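
For reference, the same steps can be written as a self-contained script. This is a minimal sketch: the df1 and df2 constructors are reconstructed from the question's input tables and are an assumption about how the original frames were built.

```python
import pandas as pd

# Sample frames reconstructed from the question's input tables (assumed).
df1 = pd.DataFrame({
    "name": ["jack", "ben", "lisa"],
    "age": [20, 19, 30],
    "hobby": ["hockey", "chess", "golf"],
    "married": ["yes", "no", "no"],
})
df2 = pd.DataFrame({
    "name": ["jack", "anna", "dan"],
    "age": [20, 34, 26],
    "hobby": ["hockey", "football", "golf"],
    "job": ["student", "finance", "retail"],
})

# Outer merge on the key columns. The shared non-key column 'hobby'
# comes back twice: hobby_x (from df1) and hobby_y (from df2).
res = df1.merge(df2, on=["name", "age"], how="outer")

# Take df1's hobby where present, otherwise fall back to df2's.
res["hobby"] = res["hobby_x"].combine_first(res["hobby_y"])

# Drop the two suffixed helper columns without mutating in place.
res = res.drop(columns=["hobby_x", "hobby_y"])

print(res)
```

The row order of an outer merge may differ between pandas versions, so it is safer not to rely on it; sort explicitly if the order matters.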

Answer 2

Here is another way:

df1.set_index(['name', 'age'])\
   .combine_first(df2.set_index(['name', 'age']))\
   .reset_index()\
   .fillna('/')

Output:

   name  age     hobby      job married
0  anna   34  football  finance       /
1   ben   19     chess        /      no
2   dan   26      golf   retail       /
3  jack   20    hockey  student     yes
4  lisa   30      golf        /      no

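This relies on pandas' intrinsic data alignment: set the index of each dataframe to the columns you want to "join" on, then call combine_first, which keeps df1's values and fills the gaps from df2.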
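
A runnable version of this approach might look like the following. It is a sketch under the assumption that df1 and df2 match the question's input tables; the names keys and out are introduced here for illustration only.

```python
import pandas as pd

# Sample frames reconstructed from the question's input tables (assumed).
df1 = pd.DataFrame({
    "name": ["jack", "ben", "lisa"],
    "age": [20, 19, 30],
    "hobby": ["hockey", "chess", "golf"],
    "married": ["yes", "no", "no"],
})
df2 = pd.DataFrame({
    "name": ["jack", "anna", "dan"],
    "age": [20, 34, 26],
    "hobby": ["hockey", "football", "golf"],
    "job": ["student", "finance", "retail"],
})

keys = ["name", "age"]  # hypothetical helper name for the join columns

out = (
    df1.set_index(keys)                  # align rows on (name, age)
       .combine_first(df2.set_index(keys))  # df1 wins; df2 fills missing cells and rows
       .reset_index()                    # turn the keys back into columns
       .fillna("/")                      # mark remaining gaps as in the question
)

print(out)
```

Because the keys become the index, a row that appears in both frames (jack, 20) is aligned rather than duplicated, and no suffixed columns such as hobby_x need cleaning up afterwards.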
