Pandas: remove duplicates based on substring

I have the following 2 columns from a Pandas DataFrame:

antecedents        consequents
  apple               orange
  orange              apple

  apple               water
  apple               pineapple

  water               lemon
  lemon               water

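I want to remove rows that repeat an earlier pair with antecedents and consequents swapped, keeping only the first occurrence, to obtain: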

antecedents        consequents
  apple               orange

  apple               water
  apple               pineapple

  water               lemon

How can I achieve this with Pandas?

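Convert each row of the two columns to a frozenset, so that order within the pair no longer matters, and test for duplicates with Series.duplicated: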

df2 = df[~df[['antecedents', 'consequents']].apply(frozenset, axis=1).duplicated()]
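
To make the one-liner easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the expression into steps. The intermediate names `keys` and `mask` are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# Each row becomes a frozenset, so (apple, orange) and (orange, apple)
# produce the same hashable key
keys = df[["antecedents", "consequents"]].apply(frozenset, axis=1)

# True for every row whose key was already seen in an earlier row
mask = keys.duplicated()

# Keep only the first occurrence of each unordered pair
df2 = df[~mask]
print(df2)
```

Note that a frozenset also collapses repeated values inside a single row, so a row such as (apple, apple) would get the key {apple}. That does not matter for the data shown here.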

Or sort the values in each row with numpy.sort:

# Sort each row so (a, b) and (b, a) become identical (assumes numpy as np, pandas as pd)
df1 = pd.DataFrame(np.sort(df[['antecedents', 'consequents']], axis=1), index=df.index)
df2 = df[~df1.duplicated()]

print(df2)
  antecedents consequents
0       apple      orange
2       apple       water
3       apple   pineapple
4       water       lemon
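
For the sorting variant, a runnable sketch under the same assumptions (sample data rebuilt from the question; `sorted_pairs` is an illustrative name for what the answer calls `df1`) shows the intermediate frame that makes swapped pairs identical. I pass `.to_numpy()` explicitly here, which is a small deviation from the snippet above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "antecedents": ["apple", "orange", "apple", "apple", "water", "lemon"],
    "consequents": ["orange", "apple", "water", "pineapple", "lemon", "water"],
})

# Sort the two values within each row; the original index is kept so the
# boolean mask lines up with df
pairs = df[["antecedents", "consequents"]].to_numpy()
sorted_pairs = pd.DataFrame(np.sort(pairs, axis=1), index=df.index)

# Rows 0 and 1 are now both (apple, orange); rows 4 and 5 both (lemon, water)
print(sorted_pairs)

# DataFrame.duplicated marks the later copy of each sorted pair
df2 = df[~sorted_pairs.duplicated()]
print(df2)
```

This approach relies on the values in the two columns being comparable with each other (all strings here), since they have to be sorted within each row.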