一个简单列表和一个集合列表的后续匹配?
Subsequence match for a simple list and a list of sets?
这是一个涉及subsequences的问题。 普通子序列问题有一个列表l1
,问另一个列表l2
是否是一个子序列。例如:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [3, 1, 5] => True (remove 4, 1, 9)
l2 = [4, 3, 5] => False
在 my 问题中,l2
有 sets 个值。例如:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
{2, 3} matches 3
{1, 5, 2} matches 1 (or 5)
{9} matches 9
在这里,l2
可以被认为通过从每个集合中取一个元素在概念上扩展到不同的可能性:
l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]
这意味着只要 l2
表示的六个可能列表之一匹配 l1
,我们就成功匹配。由于 [3, 1, 9]
匹配 l1
,因此整个 l2
匹配。
所以一个解决方案可能是首先像上面那样将 l2
扁平化为新的 l2exp
,然后对于 l2exp
中的每个 sublist_l2
,ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2)
可以用
如何在不显式将 l2
扩展为列表列表的情况下进行匹配?
解决方案
不是检查 l1
和 l2
元素的 equality,而是检查 membership l1
个元素在 l2
个元素中。
一些可能的实现(Try it online!):
def subseq(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
基准
测试
l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]
次数:
5254.183 ms subseq_product
0.020 ms subseq_all_any
0.006 ms subseq_all_not_disjoint
0.005 ms subseq_not_any_disjoint
0.005 ms subseq_map_disjoint
0.003 ms subseq_loops
1279.553 ms subseq_Alain
0.018 ms subseq_Alain_faster
subseq_product
从问题中得出想法,遍历 l2
的所有 221 种可能性,对每种可能性进行 ordinary subsequence check。
subseq_Alain
尝试 220 (?) 递归路径。 _faster
版本贪婪地获取每个 l2
元素的第一个匹配项并且不再尝试。
尝试使用较长输入的快速解决方案:
l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]
次数:
0.285 ms subseq_all_any
0.058 ms subseq_all_not_disjoint
0.057 ms subseq_not_any_disjoint
0.031 ms subseq_map_disjoint
0.044 ms subseq_loops
1.488 ms subseq_Alain_faster
完整基准代码(Try it online!):
from timeit import timeit
from itertools import product, repeat
def subseq_product(l1, l2):
def is_subseq(x, y):
# Ordinary subsequence test,
# see
it = iter(y)
return all(c in it for c in x)
return any(is_subseq(p2, l1)
for p2 in product(*l2))
def subseq_all_any(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq_all_not_disjoint(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq_not_any_disjoint(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq_map_disjoint(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq_loops(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
def subseq_Alain(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
def subseq_Alain_faster(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0]:
return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
return False
def benchmark(funcs, args, number):
for _ in range(3):
for func in funcs:
t = timeit(lambda: func(*args), number=number) / number
print('%8.3f ms ' % (t * 1e3), func.__name__)
print()
l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
subseq_product,
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)
l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)
如果你的集合列表可以有两个以上的项目,也许递归函数是最好的:
def subMatch(A,S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
输出:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # False
见证版本
而不是只知道是否 匹配,人们可能还想知道什么 匹配是什么。也许是为了真正使用它,或者也许只是为了能够验证正确性。这里有几个版本,返回实际匹配元素的列表,如果没有匹配则返回 None
。前两个可能是最好的。
也许是最简单的,假设来自 l2
的每个池在 l1
中都有一个匹配项,并捕获异常:
def subseq1(l1, l2):
it1 = iter(l1)
try:
return [next(x for x in it1 if x in pool)
for pool in l2]
except StopIteration:
pass
基本循环:
def subseq2(l1, l2):
witness = []
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
witness.append(x)
break
else:
return
return witness
稍微修改我原来的 if(all(any(...
解决方案以附加每个见证元素:
def subseq3(l1, l2):
witness = []
it1 = iter(l1)
if all(any(x in pool and not witness.append(x)
for x in it1)
for pool in l2):
return witness
为下一个见证元素预先附加一个位置:
def subseq4(l1, l2):
witness = []
it1 = iter(l1)
if all(any(witness[-1] in pool
for witness[-1] in it1)
for pool in l2
if not witness.append(None)):
return witness
为所有个见证元素预分配位置:
def subseq5(l1, l2):
witness = [None] * len(l2)
it1 = iter(l1)
if all(any(witness[i] in pool
for witness[i] in it1)
for i, pool in enumerate(l2)):
return witness
最后检查假证人:
def subseq6(l1, l2):
it1 = iter(l1)
false = object()
witness = [next((x for x in it1 if x in pool), false)
for pool in l2]
if false not in witness:
return witness
测试代码(案例取自Alain的回答):
funcs = [
subseq1,
subseq2,
subseq3,
subseq4,
subseq5,
subseq6,
]
for func in funcs:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(func(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # False
print()
这是一个涉及subsequences的问题。 普通子序列问题有一个列表l1
,问另一个列表l2
是否是一个子序列。例如:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [3, 1, 5] => True (remove 4, 1, 9)
l2 = [4, 3, 5] => False
在 my 问题中,l2
有 sets 个值。例如:
l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
{2, 3} matches 3
{1, 5, 2} matches 1 (or 5)
{9} matches 9
在这里,l2
可以被认为通过从每个集合中取一个元素在概念上扩展到不同的可能性:
l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]
这意味着只要 l2
表示的六个可能列表之一匹配 l1
,我们就成功匹配。由于 [3, 1, 9]
匹配 l1
,因此整个 l2
匹配。
所以一个解决方案可能是首先像上面那样将 l2
扁平化为新的 l2exp
,然后对于 l2exp
中的每个 sublist_l2
,ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2)
可以用
如何在不显式将 l2
扩展为列表列表的情况下进行匹配?
解决方案
不是检查 l1
和 l2
元素的 equality,而是检查 membership l1
个元素在 l2
个元素中。
一些可能的实现(Try it online!):
def subseq(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
基准
测试
l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]
次数:
5254.183 ms subseq_product
0.020 ms subseq_all_any
0.006 ms subseq_all_not_disjoint
0.005 ms subseq_not_any_disjoint
0.005 ms subseq_map_disjoint
0.003 ms subseq_loops
1279.553 ms subseq_Alain
0.018 ms subseq_Alain_faster
subseq_product
从问题中得出想法,遍历l2
的所有 221 种可能性,对每种可能性进行 ordinary subsequence check。subseq_Alain
尝试 220 (?) 递归路径。_faster
版本贪婪地获取每个l2
元素的第一个匹配项并且不再尝试。
尝试使用较长输入的快速解决方案:
l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]
次数:
0.285 ms subseq_all_any
0.058 ms subseq_all_not_disjoint
0.057 ms subseq_not_any_disjoint
0.031 ms subseq_map_disjoint
0.044 ms subseq_loops
1.488 ms subseq_Alain_faster
完整基准代码(Try it online!):
from timeit import timeit
from itertools import product, repeat
def subseq_product(l1, l2):
def is_subseq(x, y):
# Ordinary subsequence test,
# see
it = iter(y)
return all(c in it for c in x)
return any(is_subseq(p2, l1)
for p2 in product(*l2))
def subseq_all_any(l1, l2):
it1 = iter(l1)
return all(any(x in pool for x in it1)
for pool in l2)
def subseq_all_not_disjoint(l1, l2):
it1 = iter(l1)
return all(not pool.isdisjoint(it1)
for pool in l2)
def subseq_not_any_disjoint(l1, l2):
it1 = iter(l1)
return not any(pool.isdisjoint(it1)
for pool in l2)
def subseq_map_disjoint(l1, l2):
return not any(map(set.isdisjoint, l2, repeat(iter(l1))))
def subseq_loops(l1, l2):
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
break
else:
return
return True
def subseq_Alain(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
def subseq_Alain_faster(A, S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0]:
return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
return False
def benchmark(funcs, args, number):
for _ in range(3):
for func in funcs:
t = timeit(lambda: func(*args), number=number) / number
print('%8.3f ms ' % (t * 1e3), func.__name__)
print()
l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
subseq_product,
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)
l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
subseq_all_any,
subseq_all_not_disjoint,
subseq_not_any_disjoint,
subseq_map_disjoint,
subseq_loops,
subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)
如果你的集合列表可以有两个以上的项目,也许递归函数是最好的:
def subMatch(A,S):
if not S: return True # all sets matched
for i,n in enumerate(A): # check for membership in 1st set
if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
return True # found full match
return False
输出:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(subMatch(l1,l2)) # False
见证版本
而不是只知道是否 匹配,人们可能还想知道什么 匹配是什么。也许是为了真正使用它,或者也许只是为了能够验证正确性。这里有几个版本,返回实际匹配元素的列表,如果没有匹配则返回 None
。前两个可能是最好的。
也许是最简单的,假设来自 l2
的每个池在 l1
中都有一个匹配项,并捕获异常:
def subseq1(l1, l2):
it1 = iter(l1)
try:
return [next(x for x in it1 if x in pool)
for pool in l2]
except StopIteration:
pass
基本循环:
def subseq2(l1, l2):
witness = []
it1 = iter(l1)
for pool in l2:
for x in it1:
if x in pool:
witness.append(x)
break
else:
return
return witness
稍微修改我原来的 if(all(any(...
解决方案以附加每个见证元素:
def subseq3(l1, l2):
witness = []
it1 = iter(l1)
if all(any(x in pool and not witness.append(x)
for x in it1)
for pool in l2):
return witness
为下一个见证元素预先附加一个位置:
def subseq4(l1, l2):
witness = []
it1 = iter(l1)
if all(any(witness[-1] in pool
for witness[-1] in it1)
for pool in l2
if not witness.append(None)):
return witness
为所有个见证元素预分配位置:
def subseq5(l1, l2):
witness = [None] * len(l2)
it1 = iter(l1)
if all(any(witness[i] in pool
for witness[i] in it1)
for i, pool in enumerate(l2)):
return witness
最后检查假证人:
def subseq6(l1, l2):
it1 = iter(l1)
false = object()
witness = [next((x for x in it1 if x in pool), false)
for pool in l2]
if false not in witness:
return witness
测试代码(案例取自Alain的回答):
funcs = [
subseq1,
subseq2,
subseq3,
subseq4,
subseq5,
subseq6,
]
for func in funcs:
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}]
print(func(l1,l2)) # True
l1 = [1, 2, 3, 4]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # True
l1 = [1, 2, 4, 3]
l2 = [{2, 1}, {1, 3}, {3, 4}]
print(func(l1,l2)) # False
print()