Efficient enumeration of ordered subsets in Python

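I'm not sure of the proper mathematical term for what I'm trying to code. I'd like to generate combinations of unique integers, where the "ordered subsets" of each combination are used to exclude certain later combinations.

Hopefully an example will make this clear: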

from itertools import chain, combinations

mylist = range(4)
max_depth = 3

# Longest combinations first, then progressively shorter ones
rev = chain.from_iterable(combinations(mylist, i) for i in range(max_depth, 0, -1))
for el in rev:
    print(el)

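This code produces output that contains all the subsets I want, but also some extra ones that I don't. I have manually added comments to indicate which elements I don't want.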

(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 1)  # Exclude: (0, 1, _) occurs as part of (0, 1, 2) above
(0, 2)  # Exclude: (0, 2, _) occurs above
(0, 3)  # Keep
(1, 2)  # Exclude: (1, 2, _) occurs above
(1, 3)  # Keep: (_, 1, 3) occurs above, but (1, 3, _) does not
(2, 3)  # Keep
(0,)    # Exclude: (0, _, _) occurs above
(1,)    # Exclude: (1, _, _) occurs above
(2,)    # Exclude: (2, _) occurs above
(3,)    # Keep

So the desired output of my generator or iterator would be:

(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
(0, 3)
(1, 3)
(2, 3)
(3,)  

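I know I could list all the combinations (wanted and unwanted) and then filter out the ones I don't want, but I was wondering whether there is a more efficient generator- or iterator-based way.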
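For reference, the list-then-filter baseline mentioned above might look like the following sketch. The name `filtered_combinations` is hypothetical, and the code assumes the input elements are unique and hashable. It remembers every proper prefix of each emitted tuple, so its memory use grows with the output.

```python
from itertools import chain, combinations


def filtered_combinations(items, max_depth):
    """Baseline sketch: enumerate everything, then skip prefixes of earlier output."""
    items = list(items)
    seen_prefixes = set()
    all_combs = chain.from_iterable(
        combinations(items, size) for size in range(max_depth, 0, -1)
    )
    for comb in all_combs:
        if comb in seen_prefixes:
            continue  # comb is a prefix of a tuple that was already emitted
        yield comb
        # Record every proper prefix of the tuple just emitted.
        for end in range(1, len(comb)):
            seen_prefixes.add(comb[:end])


for el in filtered_combinations(range(4), 3):
    print(el)
```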

I noticed an interesting pattern in your desired output, and I have a generator that produces it. Does this work for all of your cases?

from itertools import combinations

def orderedSetCombination(iterable, r):
    # One-element tuple holding the last element of the iterable
    last = (iterable[-1],)
    # Yield all r-length combinations of the iterable without the
    # last element
    for comb in combinations(iterable[:-1], r):
        yield comb
    # While r > 1, reduce r by 1 and yield each shorter combination
    # with the last element appended
    while r > 1:
        r -= 1
        for comb in combinations(iterable[:-1], r):
            yield comb + last
    # Finally, yield the last item on its own
    yield last

mylist = [0, 1, 2, 3]

for el in orderedSetCombination(mylist, 3):
    print(el)

Here is my explanation of the logic:

# All combinations that do not include the last element of the iterable,
# taking r = max_depth items at a time

(0,1,2)

# From here on, it's the combinations of all the elements except
# the last element, with the last element appended to each one.
# So here we take r = r - 1 items at a time and add the last element:
# combinations([0,1,2], r=2)

(0,1,3)
(0,2,3)
(1,2,3)

# the only possible value right now at index r = 2 is the last element (3)
# since all possible values of (0,1,_) (0,2,_) (1,2,_) are already listed
# So reduce r by 1 again and continue: combinations([0,1,2], r=1)

(0, 3)
(1, 3)
(2, 3)

# continue until r has been reduced to 1, then yield the last element on its own

(3,)

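You are trying to exclude any combination that is a prefix of a previously returned combination. Doing so is straightforward.

  • If a tuple t has length max_depth, it can't be a prefix of a previously returned tuple, since any tuple it is a prefix of would have to be longer.
  • If a tuple t ends with mylist[-1], it can't be a prefix of a previously returned tuple, since there are no elements that could legally be appended to the end of t to extend it.
  • If a tuple t has length less than max_depth and does not end with mylist[-1], then t is a prefix of the previously returned tuple t + (mylist[-1],), so t should not be returned.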
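
The three rules above collapse into a single test on each tuple, which can be sketched as a predicate. The helper name `is_kept` is hypothetical, and the sketch assumes non-empty tuples drawn from a list of unique elements. It still enumerates every candidate, so it only illustrates the rule rather than the efficient generator.

```python
from itertools import chain, combinations


def is_kept(t, items, max_depth):
    """Keep t if it has full length or ends with the last element of items."""
    return len(t) == max_depth or t[-1] == items[-1]


items = list(range(4))
max_depth = 3
candidates = chain.from_iterable(
    combinations(items, size) for size in range(max_depth, 0, -1)
)
kept = [t for t in candidates if is_kept(t, items, max_depth)]
for t in kept:
    print(t)
```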

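Thus, the combinations you should generate are exactly the ones of length max_depth and the shorter ones that end with mylist[-1]. The following code does so, in exactly the same order as your original code, and correctly handles cases such as maxdepth > len(mylist):
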
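from itertools import combinations

def nonprefix_combinations(iterable, maxlen):
    iterable = list(iterable)
    # Nothing to yield for an empty input or maxlen == 0
    if not (iterable and maxlen):
        return
    # Every combination of full length is kept
    for comb in combinations(iterable, maxlen):
        yield comb
    # Shorter combinations are kept only if they end with the last element
    for length in range(maxlen-2, -1, -1):
        for comb in combinations(iterable[:-1], length):
            yield comb + (iterable[-1],)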

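(I'm assuming here that in the maxdepth == 0 case you still don't want to include the empty tuple in the output, even though for maxdepth == 0 it isn't a prefix of any previously returned tuple. If you do want the empty tuple in this case, you can change if not (iterable and maxlen) to if not iterable.)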
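
As a rough illustration of that choice, here is a Python 3 sketch of the same generator with the empty-tuple behaviour exposed as a keyword argument. The name `nonprefix_combinations_v2` and the `include_empty` parameter are hypothetical additions, not part of the code above.

```python
from itertools import combinations


def nonprefix_combinations_v2(iterable, maxlen, include_empty=False):
    """Variant sketch: optionally yield the empty tuple when maxlen == 0."""
    items = list(iterable)
    if not items:
        return
    if maxlen == 0 and not include_empty:
        return
    # Full-length combinations (just the empty tuple when maxlen == 0).
    yield from combinations(items, maxlen)
    # Shorter combinations, each forced to end with the last element.
    for length in range(maxlen - 2, -1, -1):
        for comb in combinations(items[:-1], length):
            yield comb + (items[-1],)


print(list(nonprefix_combinations_v2(range(3), 0)))
print(list(nonprefix_combinations_v2(range(3), 0, include_empty=True)))
print(list(nonprefix_combinations_v2(range(3), 5)))
```

The last call covers the case where maxlen exceeds the input length: no full-length combination exists, so only tuples ending in the last element are produced.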