Efficient enumeration of ordered subsets in Python
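I'm not sure of the appropriate mathematical terminology for the code I'm trying to write. I'd like to generate combinations of unique integers, where the "ordered subsets" of each combination are used to exclude certain later combinations.
Hopefully an example will make this clear: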
from itertools import chain, combinations
mylist = range(4)
max_depth = 3
# Enumerate combinations from the longest (max_depth) down to length 1.
rev = chain.from_iterable(combinations(mylist, i) for i in range(max_depth, 0, -1))
for el in rev:
    print(el)
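This code produces output that contains all of the subsets I want, but also some extra ones that I don't. I have manually inserted comments to indicate which elements I want to keep and which I want to exclude.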
(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 1) # Exclude: (0, 1, _) occurs as part of (0, 1, 2) above
(0, 2) # Exclude: (0, 2, _) occurs above
(0, 3) # Keep
(1, 2) # Exclude: (1, 2, _) occurs above
(1, 3) # Keep: (_, 1, 3) occurs above, but (1, 3, _) does not
(2, 3) # Keep
(0,) # Exclude: (0, _, _) occurs above
(1,) # Exclude: (1, _, _) occurs above
(2,) # Exclude: (2, _) occurs above
(3,) # Keep
So the desired output of my generator or iterator would be:
(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 3)
(1, 3)
(2, 3)
(3,)
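I know that I could build a list of all (wanted and unwanted) combinations and then filter out the ones I don't want, but I was wondering whether there is a more efficient generator- or iterator-based way.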
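For reference, the filter-everything approach mentioned above might look like the following minimal sketch. It assumes Python 3, and the name `filtered_combinations` is made up for illustration. It remembers every proper prefix of the tuples returned so far and skips any later tuple found among them.

```python
from itertools import chain, combinations

def filtered_combinations(items, max_depth):
    """Brute-force baseline: enumerate everything, drop prefixes of earlier output."""
    items = list(items)
    seen_prefixes = set()
    candidates = chain.from_iterable(
        combinations(items, size) for size in range(max_depth, 0, -1)
    )
    for comb in candidates:
        if comb in seen_prefixes:
            # comb is a prefix of a tuple that was already returned
            continue
        # Record every proper, non-empty prefix of the tuple being returned.
        for cut in range(1, len(comb)):
            seen_prefixes.add(comb[:cut])
        yield comb

for comb in filtered_combinations(range(4), 3):
    print(comb)
```

This still visits every combination and keeps a growing set of prefixes in memory, which is the overhead the question hopes to avoid.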
I noticed an interesting pattern in your desired output, and I have a generator that produces it. Does this work for all of your cases?
from itertools import combinations
def orderedSetCombination(iterable, r):
    # Get the last element of the iterable as a 1-tuple
    last = (iterable[-1],)
    # Yield all the length-r combinations of the iterable
    # without the last element
    for comb in combinations(iterable[:-1], r):
        yield comb
    # While r > 1, reduce r by 1 and yield each shorter combination
    # with the last element appended
    while r > 1:
        r -= 1
        for comb in combinations(iterable[:-1], r):
            yield comb + last
    # Yield the last item on its own
    yield last
items = [0, 1, 2, 3]
for el in orderedSetCombination(items, 3):
    print(el)
Here is my explanation of the logic:
# All combinations that do not include the last element of the iterable,
# taking r = max_depth items at a time
(0,1,2)
# From here on, each tuple is a combination of the elements other than
# the last one, with the last element appended.
# So here we take r = r - 1 items at a time and append the last element:
# combinations([0,1,2], r=2)
(0,1,3)
(0,2,3)
(1,2,3)
# the only possible value right now at index r = 2 is the last element (3)
# since all possible values of (0,1,_) (0,2,_) (1,2,_) are already listed
# So reduce r by 1 again and continue: combinations([0,1,2], r=1)
(0, 3)
(1, 3)
(2, 3)
# continue until r == 0 and then yield the last element
(3,)
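You are trying to exclude any combination that is a prefix of a previously returned combination. Doing so is straightforward.
- If a tuple `t` has length `max_depth`, it can't be a prefix of a previously returned tuple, since any tuple it is a prefix of would have to be longer.
- If a tuple `t` ends with `mylist[-1]`, it can't be a prefix of a previously returned tuple, since no element can legally be appended to the end of `t` to extend it.
- If a tuple `t` has length less than `max_depth` and does not end with `mylist[-1]`, then `t` is a prefix of the previously returned tuple `t + (mylist[-1],)`, and `t` should not be returned.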
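The three cases above can be condensed into a single predicate. The following is only a sketch, assuming Python 3, unique elements, and the question's longest-first enumeration order; `is_wanted` is a hypothetical helper name, not part of the original answer.

```python
from itertools import chain, combinations

def is_wanted(t, mylist, max_depth):
    """True if t is full-length or ends with the last element of mylist."""
    return len(t) == max_depth or t[-1] == mylist[-1]

mylist = list(range(4))
max_depth = 3
# Same enumeration as in the question: longest combinations first.
everything = chain.from_iterable(
    combinations(mylist, size) for size in range(max_depth, 0, -1)
)
wanted = [t for t in everything if is_wanted(t, mylist, max_depth)]
print(wanted)
```

Filtering with this predicate needs no record of earlier output, but it still enumerates every combination, so it mainly serves to check the reasoning.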
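Thus, the combinations you should generate are exactly the ones of length `max_depth` and the shorter ones that end with `mylist[-1]`. The following code does so, in exactly the same order as your original code, and correctly handles cases such as `max_depth > len(mylist)`: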
from itertools import combinations
def nonprefix_combinations(iterable, maxlen):
    iterable = list(iterable)
    if not (iterable and maxlen):
        return
    # All full-length combinations come first.
    for comb in combinations(iterable, maxlen):
        yield comb
    # Then the shorter ones, each ending with the last element.
    for length in range(maxlen - 2, -1, -1):
        for comb in combinations(iterable[:-1], length):
            yield comb + (iterable[-1],)
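(I've assumed here that in the case where `max_depth == 0`, you still don't want to include the empty tuple in your output, even though for `max_depth == 0` it isn't a prefix of a previously returned tuple. If you do want the empty tuple in this case, you can change `if not (iterable and maxlen)` to `if not iterable`.)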
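To illustrate the edge cases mentioned above, here is a small sketch that restates the generator for Python 3 (`range` in place of `xrange`, `yield from` for the first loop) and runs it on a few inputs. The sample inputs are chosen purely for illustration.

```python
from itertools import combinations

def nonprefix_combinations(iterable, maxlen):
    # Python 3 restatement of the generator from the answer above.
    iterable = list(iterable)
    if not (iterable and maxlen):
        return
    yield from combinations(iterable, maxlen)
    for length in range(maxlen - 2, -1, -1):
        for comb in combinations(iterable[:-1], length):
            yield comb + (iterable[-1],)

# The question's example.
print(list(nonprefix_combinations(range(4), 3)))
# maxlen larger than the input: no full-length tuples exist,
# so only tuples ending with the last element remain.
print(list(nonprefix_combinations([0, 1], 3)))
# Empty input, or maxlen == 0, yields nothing.
print(list(nonprefix_combinations([], 3)))
print(list(nonprefix_combinations(range(4), 0)))
```

The second call prints `[(0, 1), (1,)]`, which matches what filtering the full enumeration `(0, 1)`, `(0,)`, `(1,)` would give.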