Elements missing when iterator used in a list comprehension

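The example below behaves differently depending on whether the rightmost generator in the list comprehension is a list or an iterator. Specifically, fewer results are produced when an iterator is used, which I find very surprising.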

I'm new to Python, so I assume I'm missing something obvious, but I would appreciate an explanation.

>>> import itertools
>>> xs = range(0, 5)
>>> ys = range(0, 3)
>>> zs_l = [x for x in xs if not x in ys]
>>> zs_l
[3, 4]

# Validate the contents of the iterator, and create it again
>>> zs_i = itertools.ifilterfalse(lambda x: x in ys, xs)
>>> list(zs_i)
[3, 4]
>>> list(zs_i)
[]
>>> zs_i = itertools.ifilterfalse(lambda x: x in ys, xs)

>>> [(i,z) for i in [1,2] for z in zs_l]
[(1, 3), (1, 4), (2, 3), (2, 4)]
>>> [(i,z) for i in [1,2] for z in zs_i]
[(1, 3), (1, 4)]

itertools.ifilterfalse returns an iterator. Once you have consumed everything it yields, for example by calling list on it, it produces nothing more.

[(i,z) for i in [1,2] for z in zs_i]

zs_i is used up while i = 1. When i = 2, zs_i yields nothing.
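To make the exhaustion visible, here is a minimal sketch that unrolls the comprehension into explicit loops. It assumes Python 3, where `itertools.ifilterfalse` is named `itertools.filterfalse`; the name `pairs` is introduced only for illustration.

```python
from itertools import filterfalse  # Python 3 name for Python 2's ifilterfalse

xs = range(0, 5)
ys = range(0, 3)
zs_i = filterfalse(lambda x: x in ys, xs)

pairs = []
for i in [1, 2]:
    # The inner loop reuses the same iterator object on every outer pass;
    # nothing rewinds it, so only the first pass sees any items.
    for z in zs_i:
        pairs.append((i, z))

print(pairs)                    # [(1, 3), (1, 4)]
print(next(zs_i, "exhausted"))  # exhausted
```

The second argument to `next` is a default returned instead of raising StopIteration, which is a convenient way to check whether an iterator has anything left.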

Quoting the itertools.ifilterfalse documentation,

Make an iterator that filters elements ...

Quoting the Python glossary entry for iterator,

An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.

The part of the quote above about multiple iteration passes answers your question.

When you do this

>>> [(i,z) for i in [1,2] for z in zs_i]
[(1, 3), (1, 4)]

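the iterator zs_i is exhausted during the first pass of the outer for loop. When the inner for loop uses it again, StopIteration is raised immediately, as the documentation above describes, so the inner loop ends at once and produces no pairs for i = 2.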

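The same comprehension works fine with a list, such as the one returned by range, because, according to the documentation above,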

A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop.

So each time the list is handed to the inner for loop, a new iterator is created, which is why it works as expected.
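The difference can be checked directly with `iter()`. This is a small sketch using a plain list and an iterator built from it; the variable names other than `zs_l` and `zs_i` are made up for the example.

```python
zs_l = [3, 4]
zs_i = iter(zs_l)

# A list hands out a new, independent iterator on every iter() call.
fresh_a = iter(zs_l)
fresh_b = iter(zs_l)
list_gives_new = fresh_a is not fresh_b

# An iterator's __iter__() returns the iterator itself, so a for loop
# over it always continues from wherever it currently stands.
iterator_returns_self = iter(zs_i) is zs_i

first_pass = list(zs_i)   # consumes everything
second_pass = list(zs_i)  # nothing left

print(list_gives_new, iterator_returns_self)  # True True
print(first_pass, second_pass)                # [3, 4] []
```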

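This answer supplements the other answers, which explain the underlying mechanism in more detail. If you want this to work, the iterator has to be re-created on each pass inside the comprehension.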

One way is to create a new iterator for each pass of the nested for loop:

>>> [(i,z) for i in [1,2] for z in itertools.ifilterfalse(lambda x: x in ys, xs)]
[(1, 3), (1, 4), (2, 3), (2, 4)]
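Another option, in line with the question's own observation that the list version works, is to materialize the filtered values once and iterate over the resulting list. The sketch below assumes Python 3 (`itertools.filterfalse`) and that the filtered sequence is small enough to hold in memory; `zs` and `pairs` are illustrative names.

```python
from itertools import filterfalse  # Python 3 name for Python 2's ifilterfalse

xs = range(0, 5)
ys = range(0, 3)

# Run the filter once and keep the results in a list, which can be
# iterated any number of times.
zs = list(filterfalse(lambda x: x in ys, xs))

pairs = [(i, z) for i in [1, 2] for z in zs]
print(pairs)  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```

Compared with re-creating the iterator inside the comprehension, this evaluates the filter only once, at the cost of storing every filtered value.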