Compare 2 DataFrames and drop rows that do not contain corresponding ID variables
I need to compare 2 DataFrames and drop the rows whose ID does not appear in the other DataFrame. For example, consider df1 and df2:
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Food': ['Ham', 'Cheese', 'Egg', 'Bacon'],
                    'Amount': [5, 2, 10, 4],
                    })
df2 = pd.DataFrame({'ID': [1, 2, 4, 5],
                    'Food': ['Ham', 'Cheese', 'Bacon', 'Chocolate Salty Balls'],
                    'Amount': [6, 7, 15, 5000],
                    })
Pseudocode:
if df1['ID'] not in df2['ID']:
    df2['ID'].drop()
...and vice versa. The result in this example would be:
df1:
   ID    Food  Amount
0   1     Ham       5
1   2  Cheese       2
2   4   Bacon       4
df2:
   ID    Food  Amount
0   1     Ham       6
1   2  Cheese       7
2   4   Bacon      15
You can use boolean masks:
msk1 = df1['ID'].isin(df2['ID'])
msk2 = df2['ID'].isin(df1['ID'])
df1 = df1[msk1]
df2 = df2[msk2]
Or use set.intersection together with boolean indexing:
common = set(df1['ID']).intersection(df2['ID'])
df1 = df1[df1['ID'].isin(common)]
df2 = df2[df2['ID'].isin(common)]
Output:
df1:
   ID    Food  Amount
0   1     Ham       5
1   2  Cheese       2
3   4   Bacon       4
df2:
   ID    Food  Amount
0   1     Ham       6
1   2  Cheese       7
2   4   Bacon      15
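Note that the masked df1 keeps its original row labels (0, 1, 3), while the result requested in the question is numbered 0, 1, 2. Here is a minimal sketch of one way to get that, assuming a fresh index is actually wanted. The helper name keep_common_ids is hypothetical and not part of pandas; it wraps the isin masks shown above and adds reset_index(drop=True) to renumber the rows.

```python
import pandas as pd


def keep_common_ids(left, right, key="ID"):
    """Return both frames restricted to key values that occur in both.

    Hypothetical helper: same isin-based masks as above, followed by
    reset_index(drop=True) so each result is numbered 0..n-1.
    """
    left_kept = left[left[key].isin(right[key])].reset_index(drop=True)
    right_kept = right[right[key].isin(left[key])].reset_index(drop=True)
    return left_kept, right_kept


df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Food': ['Ham', 'Cheese', 'Egg', 'Bacon'],
                    'Amount': [5, 2, 10, 4]})
df2 = pd.DataFrame({'ID': [1, 2, 4, 5],
                    'Food': ['Ham', 'Cheese', 'Bacon', 'Chocolate Salty Balls'],
                    'Amount': [6, 7, 15, 5000]})

df1_common, df2_common = keep_common_ids(df1, df2)
print(df1_common)
print(df2_common)
```

Both masks are computed from the original inputs before anything is reassigned, so the order in which the two frames are filtered does not matter.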
How about inner_join() from datar:
>>> from datar.all import f, inner_join, select
>>> import pandas as pd
>>> df1 = pd.DataFrame({'ID':[1,2,3,4],
...                     'Food':['Ham','Cheese','Egg','Bacon',],
...                     'Amount':[5,2,10,4,],
...                     })
>>>
>>> df2 = pd.DataFrame({'ID':[1,2,4,5],
...                     'Food':['Ham','Cheese','Bacon','Chocolate Salty Balls'],
...                     'Amount':[6,7,15,5000],
...                     })
>>> inner_join(df1, df2 >> select(f.ID))
       ID     Food  Amount
  <int64> <object> <int64>
0       1      Ham       5
1       2   Cheese       2
2       4    Bacon       4
>>> inner_join(df2, df1 >> select(f.ID))
       ID     Food  Amount
  <int64> <object> <int64>
0       1      Ham       6
1       2   Cheese       7
2       4    Bacon      15
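If an extra dependency is not desirable, a similar inner join can likely be written with plain pandas. The sketch below assumes ID is the only join key. The drop_duplicates() call is an added precaution, not something the answer above does, so that repeated IDs in the other frame do not multiply rows.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Food': ['Ham', 'Cheese', 'Egg', 'Bacon'],
                    'Amount': [5, 2, 10, 4]})
df2 = pd.DataFrame({'ID': [1, 2, 4, 5],
                    'Food': ['Ham', 'Cheese', 'Bacon', 'Chocolate Salty Balls'],
                    'Amount': [6, 7, 15, 5000]})

# Join each frame against only the de-duplicated ID column of the other one,
# so no extra columns are pulled in and only rows with a shared ID survive.
df1_common = df1.merge(df2[['ID']].drop_duplicates(), on='ID', how='inner')
df2_common = df2.merge(df1[['ID']].drop_duplicates(), on='ID', how='inner')

print(df1_common)
print(df2_common)
```

Unlike the boolean-mask approach, merge returns a new default index, so the rows of each result are numbered from 0.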