Removing duplicate keys/values from tuple-list

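I have a list of tuples, i.e. (key, value) pairs. I need to remove every element whose key or value duplicates one already kept. The order of the list may change, but the first occurrence of each key or value must remain in the tuple list: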

Example:

input: [("r","w"),("n","j"),("a","j"),("d","i"),("s","g"),("r","a")]
output: [("r","w"),("n","j"),("d","i"),("s","g")]

What I did:

removeDuplicates  _   []  = []
removeDuplicates seen (x:xs) 
                        | elem (head $ fst x) (fst seen) = [] ++ removeDuplicates seen xs
                        | elem (head $ snd x) (snd seen) = [] ++ removeDuplicates seen xs
                        | otherwise  = x:removeDuplicates ((fst seen)++(fst x),(snd seen)++(snd x)) xs

But this has to be called as removeDuplicates ("","") something, which is ugly.

You can use the nubBy function from the Data.List module with an appropriate comparator:

import Data.List (nubBy)

removeDuplicates xs = nubBy cmpKeyAndVal xs
  where
    cmpKeyAndVal (x, y) (x', y') = x == x' || y == y'

Used as:

> removeDuplicates [("r","w"),("n","j"),("a","j"),("d","i"),("s","g"),("r","a")]
[("r","w"),("n","j"),("d","i"),("s","g")]

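Also note that calling your implementation with ("", "") yields incorrect results when a key or value is "". The only way to choose a correct first argument is to pass something that does not appear in the input, which is a bit annoying.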


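Note that the above implementation takes O(n^2) time, which is optimal when only Eq instances are available. If an Ord constraint is acceptable, you can use the sortBy function, which implements a stable sorting algorithm, and then use groupBy to remove consecutive duplicates: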

import Data.List(sortBy, groupBy)
import Data.Ord(comparing)
import Data.Function(on)

removeDuplicates xs = sortAndGroupBy snd (sortAndGroupBy fst xs)
  where
    sortAndGroupBy f = map head . groupBy ((==) `on` f). sortBy (comparing f)

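This takes O(n log n) time, but obviously requires an Ord constraint. Note that the result comes back sorted by value, and because the second pass runs on key-sorted data, the pair kept for a repeated value is not necessarily the one that appeared first in the input.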
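If first occurrences have to be kept in input order while still staying at O(n log n), one option is to track the keys and values kept so far in sets. The following is only a sketch under stated assumptions: it uses Data.Set from the containers package (not part of base), and the names removeDuplicatesOrd and go are made up for illustration.

```haskell
import qualified Data.Set as Set

-- Hypothetical name. Keeps a pair only if neither its key nor its value
-- belongs to a pair that was kept earlier. Requires Ord on both components.
removeDuplicatesOrd :: (Ord k, Ord v) => [(k, v)] -> [(k, v)]
removeDuplicatesOrd = go Set.empty Set.empty
  where
    -- 'keys' and 'vals' hold the keys and values of the pairs kept so far.
    go _ _ [] = []
    go keys vals ((k, v) : rest)
      | k `Set.member` keys || v `Set.member` vals = go keys vals rest
      | otherwise = (k, v) : go (Set.insert k keys) (Set.insert v vals) rest
```

On the example input from the question this should return the same list as the nubBy version, in the original order.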

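First of all, get into the habit of adding type signatures when you write functions. It keeps you sane and honest, it captures what you want to do, and it is best written down before you implement the function.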

removeDuplicates :: (Eq a, Eq a1) => ([a], [a1]) -> [([a], [a1])] -> [([a], [a1])]

If you want to call it without the additional argument, I suggest something like this:

remove :: (Eq a, Eq a1) => [([a], [a1])] -> [([a], [a1])]
remove = removeDuplicates ([], [])

Another, more general version looks like this:

removeX :: (Eq t, Eq s) => [(t, s)] -> [(t, s)]
removeX [] = []
removeX (xx@(x,y):xs) = let xs' = filter (\(a,b) -> not (a == x || b == y)) xs
                        in xx:removeX xs'

If you want to stick with standard functions, @Bakuriu has the right answer for you.

Put the accumulator in a helper function.

removeDuplicates lst = rd [] lst
                       where rd _ [] = []
                             rd seen (x:xs) = ...
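One possible way to fill in that pattern, as a sketch rather than the answerer's actual code: the helper below (called go here, with the illustrative accumulator names seenKeys and seenVals) carries the keys and values kept so far, so the caller never has to supply a seed. It assumes whole keys and values are compared with Eq, not just their first characters as in the question's code.

```haskell
-- The accumulators live inside the helper, so callers pass only the list.
removeDuplicates :: (Eq k, Eq v) => [(k, v)] -> [(k, v)]
removeDuplicates = go [] []
  where
    -- seenKeys / seenVals: keys and values of the pairs kept so far.
    go _ _ [] = []
    go seenKeys seenVals ((k, v) : rest)
      | k `elem` seenKeys || v `elem` seenVals = go seenKeys seenVals rest
      | otherwise = (k, v) : go (k : seenKeys) (v : seenVals) rest
```

Because the accumulators start out as empty lists, keys or values equal to "" need no special treatment.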