Concat two dataframes and exclude overlapping rows
I am trying to concatenate two dataframes, df1 and df2:
Input
       name  age   hobby married
index
0      jack   20  hockey     yes
1       ben   19   chess      no
2      lisa   30    golf      no

       name  age     hobby      job
index
0      jack   20    hockey  student
1      anna   34  football  finance
2       dan   26      golf   retail
I want to match on multiple columns, say ['name', 'age'], and get df:
Output
       name  age     hobby married      job
index
0      jack   20    hockey     yes  student
1       ben   19     chess      no        /
2      lisa   30      golf      no        /
3      anna   34  football       /  finance
4       dan   26      golf       /   retail
Is it possible to do this with concat? I can't find how to match on a list of keys so that overlapping rows are not repeated...
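For reference, one concat-based route is to stack the two frames and then collapse rows that share the same keys. This is only a minimal sketch, assuming the frames look like the input shown above; the variable names `keys` and `stacked` are made up for illustration. It relies on `GroupBy.first()` returning the first non-null value per column within each group.

```python
import pandas as pd

# Frames rebuilt from the question's input tables.
df1 = pd.DataFrame({
    "name": ["jack", "ben", "lisa"],
    "age": [20, 19, 30],
    "hobby": ["hockey", "chess", "golf"],
    "married": ["yes", "no", "no"],
})
df2 = pd.DataFrame({
    "name": ["jack", "anna", "dan"],
    "age": [20, 34, 26],
    "hobby": ["hockey", "football", "golf"],
    "job": ["student", "finance", "retail"],
})

keys = ["name", "age"]  # columns that identify a row

# Stack the frames; a column missing from one frame is filled with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# Collapse rows with identical keys, keeping the first non-null value
# per column, then mark the remaining gaps with "/".
df = (
    stacked.groupby(keys, as_index=False, sort=False)
    .first()
    .fillna("/")
)
print(df)
```

Note that if both frames hold different non-null values for the same key, this keeps the value from df1, because its rows come first in the stacked frame.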
You can do it this way:
In [1077]: res = df1.merge(df2, on=['name', 'age'], how='outer')
In [1079]: res['hobby'] = res.hobby_x.combine_first(res.hobby_y)
In [1081]: res.drop(['hobby_x', 'hobby_y'], axis=1, inplace=True)
In [1082]: res
Out[1082]:
   name  age married      job     hobby
0  jack   20     yes  student    hockey
1   ben   19      no      NaN     chess
2  lisa   30      no      NaN      golf
3  anna   34     NaN  finance  football
4   dan   26     NaN   retail      golf
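The same steps can be wrapped so that every overlapping non-key column is coalesced, not just hobby. This is a rough sketch rather than a definitive implementation: `merge_coalesce` is a hypothetical helper name, and it assumes the default `_x` / `_y` suffixes that `merge` applies to overlapping columns.

```python
import pandas as pd

def merge_coalesce(left, right, keys):
    """Outer-merge on `keys`, then coalesce every other shared column."""
    # Non-key columns present in both frames get _x/_y suffixes after merge.
    shared = [c for c in left.columns if c in right.columns and c not in keys]
    res = left.merge(right, on=keys, how="outer", suffixes=("_x", "_y"))
    for col in shared:
        # Prefer the left value; fall back to the right one where left is missing.
        res[col] = res[f"{col}_x"].combine_first(res[f"{col}_y"])
        res = res.drop(columns=[f"{col}_x", f"{col}_y"])
    return res

# Frames rebuilt from the question's input tables.
df1 = pd.DataFrame({
    "name": ["jack", "ben", "lisa"],
    "age": [20, 19, 30],
    "hobby": ["hockey", "chess", "golf"],
    "married": ["yes", "no", "no"],
})
df2 = pd.DataFrame({
    "name": ["jack", "anna", "dan"],
    "age": [20, 34, 26],
    "hobby": ["hockey", "football", "golf"],
    "job": ["student", "finance", "retail"],
})

# fillna("/") reproduces the placeholder used in the desired output.
res = merge_coalesce(df1, df2, ["name", "age"]).fillna("/")
print(res)
```

The row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.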
Here is another way:
df1.set_index(['name', 'age'])\
.combine_first(df2.set_index(['name', 'age']))\
.reset_index()\
.fillna('/')
Output:
   name  age     hobby      job married
0  anna   34  football  finance       /
1   ben   19     chess        /      no
2   dan   26      golf   retail       /
3  jack   20    hockey  student     yes
4  lisa   30      golf        /      no
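This uses pandas' intrinsic data alignment: set the index of each dataframe to the columns you want to "join" on, then call combine_first on the dataframes. Rows with the same index are merged, and rows that exist in only one frame are kept.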
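Since combine_first returns rows ordered by the index rather than in the original order, here is a sketch of how the row and column order from the question could be restored afterwards. The names `left`, `right` and `order` are illustrative only, and the approach assumes the key combinations are unique within each frame.

```python
import pandas as pd

# Frames rebuilt from the question's input tables.
df1 = pd.DataFrame({
    "name": ["jack", "ben", "lisa"],
    "age": [20, 19, 30],
    "hobby": ["hockey", "chess", "golf"],
    "married": ["yes", "no", "no"],
})
df2 = pd.DataFrame({
    "name": ["jack", "anna", "dan"],
    "age": [20, 34, 26],
    "hobby": ["hockey", "football", "golf"],
    "job": ["student", "finance", "retail"],
})

keys = ["name", "age"]
left = df1.set_index(keys)
right = df2.set_index(keys)

# Aligns on the union of both indexes and both column sets;
# values from `left` win, gaps are filled from `right`.
combined = left.combine_first(right)

# Desired row order: all keys of df1 first, then keys found only in df2.
order = left.index.append(right.index.difference(left.index, sort=False))

out = (
    combined.reindex(order)
    .reset_index()
    .fillna("/")
    [["name", "age", "hobby", "married", "job"]]  # column order from the question
)
print(out)
```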