Python: intersection of 2 lists keeping duplicates from both lists

Question

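I want to efficiently find the intersection of two lists, keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2]

I know a similar question has been asked before (Python intersection of two lists keeping duplicates), but it does not help me because it only keeps the duplicates from one list.

The following works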

def intersect(A,B):
    C=[]
    for a in A:
        for b in B:
            if a==b:
                C.append(a)
    return C

but it is not efficient enough for what I am doing! To speed things up, I tried sorting the lists

def intersect(A,B):
    A.sort()
    B.sort()
    C=[]
    i=0
    j=0
    while i<len(A) and j<len(B):
        if A[i]<=B[j]:
            if A[i]==B[j]: 
                C.append(A[i])
            i+=1
        else:
            j=j+1
    return C

but this only keeps the duplicates from list B. Any suggestions?

Answer 1

How about this:

a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]

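Two list comprehensions, concatenated. A bit of extra time and memory goes into creating sets from A and B, but that is offset by how much faster membership checks are against a set than against a list.

You could also spruce it up a bit: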

set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]

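Cast both lists to sets, take their intersection, then use a list comprehension to collect every element of lists A and B that appears in the set intersection.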
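To make the behaviour of the two variants concrete, here is a small sketch that runs both on the lists from the question. Note that each common value appears once per occurrence in A plus once per occurrence in B, so the result is not the [1,1,1,1,2] the question asks for. The names variant_1, variant_2 and common are only for illustration.

```python
A = [1, 1, 2, 3]
B = [1, 1, 2, 4]

# Variant 1: filter each list against the other list's set, then concatenate.
a_set = set(A)
b_set = set(B)
variant_1 = [i for i in A if i in b_set] + [j for j in B if j in a_set]

# Variant 2: filter the concatenated list against the set intersection.
common = set(A) & set(B)
variant_2 = [ele for ele in A + B if ele in common]

print(variant_1)  # [1, 1, 2, 1, 1, 2]
print(variant_2)  # [1, 1, 2, 1, 1, 2]
```

Both variants keep the elements in their original order, A first and then B, so sort the result if you need it grouped by value.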

Answer 2

Here is an answer to the question as you asked it:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output

And here is an answer to the question as I and two other people originally interpreted it:

import collections
A=[1,1,2,3]
B=[1,1,2,4]
expected_output = [1,1,1,1,2,2]
cntA = collections.Counter(A)
cntB = collections.Counter(B)
cnt_sum = cntA + cntB
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output

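See the documentation for collections.Counter(). collections is a great module, and I highly recommend reading the documentation for the whole module.

I realized that you don't actually need to find the intersection of the sets, because according to the documentation the "count of a missing element is zero":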

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output
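The reason the set intersection can be dropped is easy to check in isolation. A minimal sketch, using an illustrative counter named cnt_b:

```python
from collections import Counter

cnt_b = Counter([1, 1, 2, 4])

# Looking up a key that was never counted returns 0 instead of raising KeyError.
missing = cnt_b[3]
# The lookup does not add the key to the counter.
still_absent = 3 not in cnt_b

print(missing, still_absent)  # 0 True
```

So for any x that is in A but not in B, cntA[x]*cntB[x] is 0 and range(0) produces no copies of x.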

Answer 3

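It's hard for me to speed up your code because I don't know what you are running it on. Whether you run it on small or large lists, and how many distinct elements there are, makes a big difference. Anyway, here are some suggestions: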

1.

from collections import Counter

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = []
    for i in count_a:
        # A value missing from b has count 0, so it contributes nothing
        count_mul.extend([i] * (count_a[i] * count_b[i]))
    return count_mul

2.

This one returns an iterator, which you can turn into a list with list(iterator)

from collections import Counter

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = Counter()
    for i in count_a:
        count_mul[i] += count_a[i] * count_b[i]
    # elements() lazily repeats each value as many times as its count
    return count_mul.elements()

3.

Very similar to your way, but without changing the size of the list, which takes time.

def intersect(A, B):
    return [a for a in A for b in B if a == b]

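I'm not sure whether this is any improvement over your original approach; it really depends on the input. Your approach is O(n*m), and version 3 is too, whereas the Counter-based versions (1 and 2) are O(n+m) plus the size of the output.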
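For comparison, the sorting approach from the question can also be made to keep duplicates from both lists. The sketch below is not from the question or the answers above; intersect_sorted is a hypothetical name. It counts the run of equal values in each sorted list and emits the value once for every pairing, which costs the two sorts plus the size of the output.

```python
def intersect_sorted(A, B):
    """Sort-and-merge sketch: emit each common value count_A * count_B times."""
    a = sorted(A)  # sorted copies, so the caller's lists are not mutated
    b = sorted(B)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            value = a[i]
            # Find where the run of equal values ends in each list.
            i_end = i
            while i_end < len(a) and a[i_end] == value:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end] == value:
                j_end += 1
            # One copy per (occurrence in A, occurrence in B) pair.
            result.extend([value] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return result

print(intersect_sorted([1, 1, 2, 3], [1, 1, 2, 4]))  # [1, 1, 1, 1, 2]
```

Unlike the Counter versions, this needs the elements to be orderable, but it does not need them to be hashable.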

You can use the timeit module to check how fast it runs on your input:

from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')
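If you would rather not put the function in a separate test module, timeit also accepts a callable. Here is a self-contained sketch that compares the nested-loop version with the Counter version. The function names and the input lists are made up for illustration, so substitute your real data before drawing conclusions.

```python
from collections import Counter
from timeit import timeit

def intersect_loops(A, B):
    # Nested comparison of every pair, as in version 3.
    return [a for a in A for b in B if a == b]

def intersect_counter(A, B):
    # Counter-based version, as in version 1.
    count_a = Counter(A)
    count_b = Counter(B)
    result = []
    for value in count_a:
        result.extend([value] * (count_a[value] * count_b[value]))
    return result

# Hypothetical test data: many distinct values, so the output stays small.
A = list(range(500))
B = list(range(250, 750))

# Both versions should produce the same multiset of values.
assert sorted(intersect_loops(A, B)) == sorted(intersect_counter(A, B))

t_loops = timeit(lambda: intersect_loops(A, B), number=5)
t_counter = timeit(lambda: intersect_counter(A, B), number=5)
print(f"nested loops: {t_loops:.4f}s, Counter: {t_counter:.4f}s")
```

The timings depend heavily on list length and on how many values repeat, so try inputs shaped like your own.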
