Exclude rows in a dataframe based on matching values in rows from another dataframe
I have two dataframes (A and B). I want to remove every row from B whose values in the Month, Year, Type, and Name columns exactly match a row in A.
Dataframe A
Name Type Month Year country Amount Expiration Paid
0 EXTRON GOLD March 2019 CA 20000 2019-09-07 yes
0 LEAF SILVER March 2019 PL 4893 2019-02-02 yes
0 JMC GOLD March 2019 IN 7000 2020-01-16 no
Dataframe B
Name Type Month Year country Amount Expiration Paid
0 JONS GOLD March 2018 PL 500 2019-10-17 yes
0 ABBY BRONZE March 2019 AU 60000 2019-02-02 yes
0 BUYT GOLD March 2018 BR 50 2018-03-22 no
0 EXTRON GOLD March 2019 CA 90000 2019-09-07 yes
0 JAYB PURPLE March 2019 PL 9.90 2018-04-20 yes
0 JMC GOLD March 2019 IN 6000 2020-01-16 no
0 JMC GOLD April 2019 IN 1000 2020-01-16 no
Expected output:
Dataframe B
Name Type Month Year country Amount Expiration Paid
0 JONS GOLD March 2018 PL 500 2019-10-17 yes
0 ABBY BRONZE March 2019 AU 60000 2019-02-02 yes
0 BUYT GOLD March 2018 BR 50 2018-03-22 no
0 JAYB PURPLE March 2019 PL 9.90 2018-04-20 yes
0 JMC GOLD April 2019 IN 1000 2020-01-16 no
We can use merge here: pass indicator=True and keep only the rows flagged as left_only.
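keys = ['Month', 'Year', 'Type', 'Name']
# indicator=True adds a '_merge' column; 'left_only' marks rows of B with no match in A
B = B.merge(A[keys], on=keys, indicator=True, how='outer').loc[lambda x: x['_merge'] == 'left_only'].copy()
# optionally drop the helper column afterwards: B = B.drop(columns='_merge')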
Name Type Month Year country Amount Expiration Paid _merge
0 JONS GOLD March 2018 PL 500.0 2019-10-17 yes left_only
1 ABBY BRONZE March 2019 AU 60000.0 2019-02-02 yes left_only
2 BUYT GOLD March 2018 BR 50.0 2018-03-22 no left_only
4 JAYB PURPLE March 2019 PL 9.9 2018-04-20 yes left_only
6 JMC GOLD April 2019 IN 1000.0 2020-01-16 no left_only
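For anyone who wants to run this end to end, here is a minimal sketch that rebuilds a trimmed version of the sample frames from the question and applies the same merge idea. The helper name `exclude_matching` is my own, not part of pandas. It also makes two assumptions beyond the answer above: it uses how='left' rather than 'outer' (the right_only rows are thrown away anyway), and it de-duplicates A's key columns so a repeated key in A cannot duplicate rows of B.

```python
import pandas as pd

KEYS = ["Month", "Year", "Type", "Name"]


def exclude_matching(b, a, keys=KEYS):
    """Return rows of b whose key columns have no match in a (hypothetical helper)."""
    merged = b.merge(
        a[keys].drop_duplicates(),  # unique keys, so rows of b are never multiplied
        on=keys,
        how="left",
        indicator=True,  # adds the '_merge' column
    )
    # 'left_only' means the key combination exists in b but not in a
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


# Trimmed copies of the sample data (country, Expiration, Paid omitted)
A = pd.DataFrame({
    "Name": ["EXTRON", "LEAF", "JMC"],
    "Type": ["GOLD", "SILVER", "GOLD"],
    "Month": ["March", "March", "March"],
    "Year": [2019, 2019, 2019],
    "Amount": [20000, 4893, 7000],
})
B = pd.DataFrame({
    "Name": ["JONS", "ABBY", "BUYT", "EXTRON", "JAYB", "JMC", "JMC"],
    "Type": ["GOLD", "BRONZE", "GOLD", "GOLD", "PURPLE", "GOLD", "GOLD"],
    "Month": ["March", "March", "March", "March", "March", "March", "April"],
    "Year": [2018, 2019, 2018, 2019, 2019, 2019, 2019],
    "Amount": [500, 60000, 50, 90000, 9.90, 6000, 1000],
})

result = exclude_matching(B, A)
print(result)
```

Keep in mind that merge returns a fresh 0..n-1 index, so the original index of B is not preserved with this approach.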
I tried to achieve the same result with a MultiIndex.
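import pandas as pd
cols = ['Month', 'Year', 'Type', 'Name']
# df1 plays the role of A (rows to exclude), df2 the role of B (frame to filter)
index1 = pd.MultiIndex.from_arrays([df1[col] for col in cols])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in cols])
# keep only the rows of df2 whose key tuple does not occur in df1
df2 = df2.loc[~index2.isin(index1)]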
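A runnable sketch of the MultiIndex variant, on a cut-down version of the sample data. The custom index values (10 to 13) are made up for illustration, just to show that this approach keeps B's original index, unlike merge. I use `pd.MultiIndex.from_frame` here as a shorthand for the `from_arrays` list comprehension; the variable names are my own.

```python
import pandas as pd

cols = ["Month", "Year", "Type", "Name"]

A = pd.DataFrame({
    "Name": ["EXTRON", "LEAF", "JMC"],
    "Type": ["GOLD", "SILVER", "GOLD"],
    "Month": ["March", "March", "March"],
    "Year": [2019, 2019, 2019],
})
B = pd.DataFrame(
    {
        "Name": ["JONS", "EXTRON", "JMC", "JMC"],
        "Type": ["GOLD", "GOLD", "GOLD", "GOLD"],
        "Month": ["March", "March", "March", "April"],
        "Year": [2018, 2019, 2019, 2019],
    },
    index=[10, 11, 12, 13],  # illustrative index, not from the question
)

# Build one key tuple per row from the matching columns
keys_a = pd.MultiIndex.from_frame(A[cols])
keys_b = pd.MultiIndex.from_frame(B[cols])

# Boolean array: True where B's key tuple does not occur in A
mask = ~keys_b.isin(keys_a)
filtered = B.loc[mask]
print(filtered)
```

Because the mask is a plain boolean array, this also works when B's index contains duplicates, as in the question where every row is labelled 0.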