Avoid Duplicates in a Pandas Merge Between a DataFrame and Its Copy

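I need to find matches within a column of the same DataFrame. What I'm doing is making a copy of the DataFrame and merging the DataFrame with its copy. Is there a way to drop the rows where the two Id columns are equal, and also the rows that repeat an earlier result? For example: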

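import pandas as pd

df1 = pd.DataFrame()
df1['Id'] = ['001','002','003','004','005','006']
df1['Tel'] = ['123','456','789','123','852','123']

df2 = df1.copy()  # an independent copy; plain df2 = df1 only binds a second name to the same frame

df3 = pd.merge(df1,df2,on='Tel',how='inner')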

The result is:

    Id_x Tel Id_y
0   001 123 001
1   001 123 004
2   001 123 006
3   004 123 001
4   004 123 004
5   004 123 006
6   006 123 001
7   006 123 004
8   006 123 006
9   002 456 002
10  003 789 003
11  005 852 005

But I want the following result:

    Id_x Tel Id_y
0   001 123 004
1   001 123 006
2   004 123 006

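As you can see, I need to ignore the rows where Id_x == Id_y, but I also need to ignore results that already appeared earlier in a different order. For example, in the first result index 1 is the same pair as index 3, index 2 is the same as index 6, and index 5 is the same as index 7. So in the final result I only want indices 1, 2 and 5.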

Is there a way to do this?

Thanks a lot!
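An alternative that avoids the self-merge entirely (a different technique, not one of the merge-based answers in this thread): group the single DataFrame by Tel and let itertools.combinations generate each unordered pair of Ids exactly once. This sketch assumes there is only one frame to pair against and that no (Id, Tel) row is repeated; it does not cover the follow-up case where the second frame has extra rows.

```python
from itertools import combinations

import pandas as pd

df1 = pd.DataFrame({'Id': ['001', '002', '003', '004', '005', '006'],
                    'Tel': ['123', '456', '789', '123', '852', '123']})

# For each Tel, combinations picks every pair of positions once, so no row is
# paired with itself and (b, a) is never produced after (a, b).
# sort=False keeps the Tel groups in order of first appearance.
rows = [
    (id_x, tel, id_y)
    for tel, ids in df1.groupby('Tel', sort=False)['Id']
    for id_x, id_y in combinations(ids, 2)
]
result = pd.DataFrame(rows, columns=['Id_x', 'Tel', 'Id_y'])
print(result)
```

Groups with a single Id (456, 789, 852 here) yield no pairs, so they drop out without any extra filtering.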

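A somewhat roundabout solution, but this removes the self-matches (the rows where Id_x == Id_y). Note that mirrored pairs such as 001/004 and 004/001 are both still present afterwards.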

import pandas as pd

df1 = pd.DataFrame()
df1['Id'] = ['001','002','003','004','005','006']
df1['Tel'] = ['123','456','789','123','852','123']

df1 = df1.drop_duplicates()  # drop fully repeated (Id, Tel) rows first
df2 = df1
df3 = pd.merge(df1,df2,on='Tel',how='inner')
df3 = df3[df3['Id_x'] != df3['Id_y']]  # remove the self-matches
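The != filter above keeps both orientations of each pair. A minimal sketch that removes self-matches and mirrored pairs in one step replaces it with a strict Id_x < Id_y comparison: for two distinct Ids exactly one orientation satisfies it. This assumes a true self-merge (both sides contain the same Ids); if an Id exists on only one side, as in the follow-up below, a pair can appear in only one orientation and would be dropped whenever that orientation has the larger Id on the left.

```python
import pandas as pd

df1 = pd.DataFrame({'Id': ['001', '002', '003', '004', '005', '006'],
                    'Tel': ['123', '456', '789', '123', '852', '123']})

df3 = pd.merge(df1, df1.copy(), on='Tel', how='inner')
# String comparison gives a total order on the Ids, so each unordered pair
# survives in exactly one orientation and Id_x == Id_y rows are excluded
df3 = df3[df3['Id_x'] < df3['Id_y']].reset_index(drop=True)
print(df3)
```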

Id_xId_y 创建元组,然后对它们进行排序并删除重复项:

>>> df3[df3[['Id_x', 'Id_y']].apply(lambda x: sorted(tuple(x)), axis=1) 
                             .duplicated(keep='last')]

  Id_x  Tel Id_y
1  001  123  004
2  001  123  006
5  004  123  006
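The row-wise apply calls a Python function once per row, which can be slow on large merges. A vectorized sketch of the same sorted-pair idea, assuming NumPy is available: sort each (Id_x, Id_y) pair with np.sort(..., axis=1), then keep the first occurrence of each sorted pair after removing self-matches.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Id': ['001', '002', '003', '004', '005', '006'],
                    'Tel': ['123', '456', '789', '123', '852', '123']})
df3 = pd.merge(df1, df1.copy(), on='Tel', how='inner').query('Id_x != Id_y')

# Sort each pair row-wise so (004, 001) and (001, 004) become identical rows
pairs = np.sort(df3[['Id_x', 'Id_y']].to_numpy(), axis=1)
# Keep only the first occurrence of each unordered pair (index aligned with df3)
mask = ~pd.DataFrame(pairs, index=df3.index).duplicated(keep='first')
result = df3[mask]
print(result)
```

Using ~duplicated(keep='first') instead of selecting duplicated(keep='last') also keeps pairs that occur only once, which matters for the follow-up case below.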

Update

Can you help me with the cases where an Id is not in both columns? For example, suppose df2 has an additional row with Id 007 and Tel 852. This Id merges with Id 005, but when I apply the tuple approach to df3, the row with Id 007 is deleted.

# Drop the self-matches, then keep the first occurrence of each unordered pair
df3 = pd.merge(df1,df2,on='Tel',how='inner').query('Id_x != Id_y')
df3 = df3[~df3[['Id_x', 'Id_y']].apply(lambda x: tuple(sorted(x)), axis=1)
                                .duplicated(keep='first')]
print(df3)

# Output:
   Id_x  Tel Id_y
1   001  123  004
2   001  123  006
5   004  123  006
12  005  852  007

Setup:

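import pandas as pd

df1 = pd.DataFrame()
df1['Id'] = ['001','002','003','004','005','006']
df1['Tel'] = ['123','456','789','123','852','123']

df2 = df1.copy()
# DataFrame.append was removed in pandas 2.0; pd.concat adds the extra row instead
df2 = pd.concat([df2, pd.DataFrame({'Id': ['007'], 'Tel': ['852']})], ignore_index=True)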
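To see why selecting duplicated(keep='last') loses 005/007 while ~duplicated(keep='first') keeps it, here is a minimal comparison on hypothetical sorted pair keys that mirror the follow-up merge after removing self-matches: the 123 pairs each appear twice (once per orientation), but 005/007 appears only once because 007 exists only in df2.

```python
import pandas as pd

# Hypothetical sorted (Id_x, Id_y) keys in merge order, self-matches removed
keys = pd.Series([('001', '004'), ('001', '006'), ('001', '004'), ('004', '006'),
                  ('001', '006'), ('004', '006'), ('005', '007')])

# keep='last' flags every occurrence except the last one, so selecting the
# flagged rows can never return a pair that occurs only once
last_selection = keys[keys.duplicated(keep='last')]

# ~duplicated(keep='first') keeps the first occurrence of every pair,
# including pairs that occur only once
first_selection = keys[~keys.duplicated(keep='first')]

print(last_selection.tolist())
print(first_selection.tolist())
```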