Itertools chain behaves differently in Cython

Question

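I have two sets of items, A and B, where A is expected to be the larger one. I want all unordered tuples of a given size drawn from A ∪ B that contain at least one element from B.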

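My approach is to take each element of B, form its product with all (k-1)-tuple combinations of A, and then add that element to A so that it is included in the combinations for the remaining members of B. I then chain these products together.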
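For illustration, the approach described above could be sketched for a general tuple size k roughly as follows. This is plain Python only; the name tuples_with_b and its parameters are hypothetical, and lists are used instead of sets so that the order is reproducible.

```python
from itertools import chain, combinations, product

def tuples_with_b(a_items, b_items, k):
    """Yield k-tuples from A ∪ B that contain at least one element of B."""
    pool = list(a_items)  # copy, so the caller's collection is not modified
    parts = []
    for b in b_items:
        # combinations() and product() copy their inputs when constructed,
        # so appending to `pool` afterwards does not affect this part.
        pairs = product([b], combinations(pool, k - 1))
        parts.append((x, *rest) for x, rest in pairs)
        pool.append(b)  # b may now be combined with the remaining B items
    return chain.from_iterable(parts)

A = ["hat", "shoes", "shirt", "socks"]
B = ["pants", "jacket"]
print(len(list(tuples_with_b(A, B, 2))))  # 9
```

With 4 items in A and 2 in B there are C(6,2) - C(4,2) = 9 such pairs, which matches the count of the desired output.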

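I have this working in Python, but when I put it into Cython the behavior changes. (In this example I am only building pairs, but I want to generalize to 5-tuples. My example sets have 4 and 2 items, but I expect to have hundreds, which is why I use generators rather than expanding all the tuples up front.)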

Python version (desired behavior):

from itertools import combinations, chain, product

def get_colder_python():
    inventory = {"hat","shoes","shirt","socks"}
    add_items = {"pants","jacket"}
    combo_chain = []
    for a in add_items:
        next_iterator = product([a],combinations(inventory,1))
        combo_chain.append((x,*y) for x,y in next_iterator)
        inventory.add(a)    
    combos = chain.from_iterable(combo_chain)
    return list(combos)

print(get_colder_python())

Result:

[('jacket', 'shoes'), ('jacket', 'shirt'), ('jacket', 'hat'), ('jacket', 'socks'), ('pants', 'shirt'), ('pants', 'jacket'), ('pants', 'hat'), ('pants', 'shoes'), ('pants', 'socks')]

Cython version:

%%cython

from itertools import chain,product,combinations

cdef get_colder_cython():
    inventory = {"hat","shoes","shirt","socks"}
    add_items = {"pants","jacket"}
    combo_chain = []
    for a in add_items:
        next_iterator = product([a],combinations(inventory,1))
        combo_chain.append((x,*y) for x,y in next_iterator)
        inventory.add(a)
    combos = chain.from_iterable(combo_chain)
    return list(combos)

print(get_colder_cython())

Result:

[('pants', 'shirt'), ('pants', 'jacket'), ('pants', 'hat'), ('pants', 'shoes'), ('pants', 'socks')]

It only yields the items from the second iterator in the chain.

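My current workaround is "don't use Cython for this". I know itertools is already optimized, so Cython would not bring a big speedup, but I would like to understand why it behaves differently.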

Answer 1

In a bit more detail: generator expression variable scoping is a long-standing bug in Cython.

The line that behaves differently is

((x,*y) for x,y in next_iterator)

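In both cases the expression is executed lazily. In Python, however, next_iterator is looked up when the generator expression is created; a reference to it is stored and used to initialize the generator expression.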

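In Cython, almost nothing happens when the generator expression is created; instead, next_iterator is only looked up when the expression is executed. By that point it has already been reassigned several times.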
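The difference can be imitated in plain Python. This is only a sketch of the late lookup described above, not how Cython is implemented: a generator function that closes over the loop variable (the hypothetical gen below) reads the name only when it first runs, whereas a generator expression evaluates its outermost iterable as soon as it is created.

```python
def early_binding():
    parts = []
    for label in ("first", "second"):
        source = iter([label])
        # Generator expression: `source` is evaluated here, at creation time.
        parts.append(x for x in source)
    return [list(p) for p in parts]

def late_binding():
    parts = []
    for label in ("first", "second"):
        source = iter([label])
        # Closure: `source` is looked up only when the generator first runs,
        # by which time the loop has rebound it to the last iterator.
        def gen():
            yield from source
        parts.append(gen())
    return [list(p) for p in parts]

print(early_binding())  # [['first'], ['second']]
print(late_binding())   # [['second'], []]
```

In the late-binding case both generators end up sharing the last iterator: the first one to be consumed drains it and the other yields nothing. That is consistent with the Cython output in the question, where only the items built from the final next_iterator appear.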

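My suggestion is to use a list comprehension, since those are executed immediately when they are created. That obviously loses the benefit of laziness, though. A nested generator function might also work: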

def gen(next_iterator):
    yield from ((x,*y) for x,y in next_iterator)
# call gen so that the current next_iterator is bound as its argument
combo_chain.append(gen(next_iterator))

Creating a function is not cheap, though, so you may find that this hurts performance.
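One way to avoid creating a function on every loop iteration might be to define the generator function once, outside the loop, and pass the iterator in as an argument. The sketch below is plain Python and I have not verified it under Cython; it assumes that binding the iterator as a function argument sidesteps the late lookup. The names prepend_each and get_colder are hypothetical, and lists replace the sets so the output order is reproducible.

```python
from itertools import chain, combinations, product

def prepend_each(pairs):
    # `pairs` is a parameter, so it is bound when prepend_each() is called,
    # not when the resulting generator is first consumed.
    for x, y in pairs:
        yield (x, *y)

def get_colder():
    inventory = ["hat", "shoes", "shirt", "socks"]
    add_items = ["pants", "jacket"]
    combo_chain = []
    for a in add_items:
        next_iterator = product([a], combinations(inventory, 1))
        combo_chain.append(prepend_each(next_iterator))
        inventory.append(a)
    return list(chain.from_iterable(combo_chain))

print(len(get_colder()))  # 9
```

The helper still produces its tuples lazily; only the binding of the iterator happens up front.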

