一个简单列表和一个集合列表的后续匹配?

Subsequence match for a simple list and a list of sets?

这是一个涉及subsequences的问题。 普通子序列问题有一个列表l1,问另一个列表l2是否是一个子序列。例如:

l1 = [3, 1, 4, 1, 5, 9]

l2 = [3, 1, 5]            => True (remove 4, 1, 9)

l2 = [4, 3, 5]            => False

my 问题中,l2sets 个值。例如:

l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
     {2, 3} matches 3
     {1, 5, 2} matches 1 (or 5)
     {9} matches 9

在这里,l2可以被认为通过从每个集合中取一个元素在概念上扩展到不同的可能性:

l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]

这意味着只要 l2 表示的六个可能列表之一匹配 l1,我们就成功匹配。由于 [3, 1, 9] 匹配 l1 ,因此整个 l2 匹配。

所以一个解决方案可能是首先像上面那样将 l2 扁平化为新的 l2exp,然后对于 l2exp 中的每个 sublist_l2ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2)可以用

如何在不显式将 l2 扩展为列表列表的情况下进行匹配?

解决方案

不是检查 l1l2 元素的 equality,而是检查 membership l1 个元素在 l2 个元素中。

一些可能的实现(Try it online!):

def subseq(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

基准

测试

l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]

次数:

5254.183 ms  subseq_product
   0.020 ms  subseq_all_any
   0.006 ms  subseq_all_not_disjoint
   0.005 ms  subseq_not_any_disjoint
   0.005 ms  subseq_map_disjoint
   0.003 ms  subseq_loops
1279.553 ms  subseq_Alain
   0.018 ms  subseq_Alain_faster
  • subseq_product 从问题中得出想法,遍历 l2 的所有 221 种可能性,对每种可能性进行 ordinary subsequence check
  • subseq_Alain 尝试 220 (?) 递归路径。 _faster 版本贪婪地获取每个 l2 元素的第一个匹配项并且不再尝试。

尝试使用较长输入的快速解决方案:

l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]

次数:

   0.285 ms  subseq_all_any
   0.058 ms  subseq_all_not_disjoint
   0.057 ms  subseq_not_any_disjoint
   0.031 ms  subseq_map_disjoint
   0.044 ms  subseq_loops
   1.488 ms  subseq_Alain_faster

完整基准代码(Try it online!):

from timeit import timeit
from itertools import product, repeat

def subseq_product(l1, l2):

    def is_subseq(x, y):
        # Ordinary subsequence test,
        # see 
        it = iter(y)
        return all(c in it for c in x)

    return any(is_subseq(p2, l1)
               for p2 in product(*l2))

def subseq_all_any(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq_all_not_disjoint(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq_not_any_disjoint(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq_map_disjoint(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq_loops(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

def subseq_Alain(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

def subseq_Alain_faster(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0]:
            return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
    return False

def benchmark(funcs, args, number):
    for _ in range(3):
        for func in funcs:
            t = timeit(lambda: func(*args), number=number) / number
            print('%8.3f ms ' % (t * 1e3), func.__name__)
        print()

l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
    subseq_product,
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)

l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)

如果你的集合列表可以有两个以上的项目,也许递归函数是最好的:

def subMatch(A,S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

输出:

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 4, 3]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # False

见证版本

而不是只知道是否 匹配,人们可能还想知道什么 匹配是什么。也许是为了真正使用它,或者也许只是为了能够验证正确性。这里有几个版本,返回实际匹配元素的列表,如果没有匹配则返回 None。前两个可能是最好的。

也许是最简单的,假设来自 l2 的每个池在 l1 中都有一个匹配项,并捕获异常:

def subseq1(l1, l2):
    it1 = iter(l1)
    try:
        return [next(x for x in it1 if x in pool)
                for pool in l2]
    except StopIteration:
        pass

基本循环:

def subseq2(l1, l2):
    witness = []
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                witness.append(x)
                break
        else:
            return
    return witness

稍微修改我原来的 if(all(any(... 解决方案以附加每个见证元素:

def subseq3(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(x in pool and not witness.append(x)
               for x in it1)
           for pool in l2):
        return witness

为下一个见证元素预先附加一个位置:

def subseq4(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(witness[-1] in pool
               for witness[-1] in it1)
           for pool in l2
           if not witness.append(None)):
        return witness

所有个见证元素预分配位置:

def subseq5(l1, l2):
    witness = [None] * len(l2)
    it1 = iter(l1)
    if all(any(witness[i] in pool
               for witness[i] in it1)
           for i, pool in enumerate(l2)):
        return witness

最后检查假证人:

def subseq6(l1, l2):
    it1 = iter(l1)
    false = object()
    witness = [next((x for x in it1 if x in pool), false)
               for pool in l2]
    if false not in witness:
        return witness

测试代码(案例取自Alain的回答):

funcs = [
    subseq1,
    subseq2,
    subseq3,
    subseq4,
    subseq5,
    subseq6,
]

for func in funcs:
    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 4, 3]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # False

    print()