How does 'zip' with different iterators on the same generator (using itertools.tee) avoid running the generator again?

Question

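I don't understand how zip works here. Why isn't the generator "calculated" twice? How does it "know" that both iterators are on the same iteration?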

In [1]: import itertools

In [2]: def calc():
   ...:     for i in range(5):
   ...:         print(i)
   ...:         yield i
   ...:

In [3]: i1, i2 = itertools.tee(calc())

In [4]: z = zip(i1, i2)

In [5]: for i in z:
   ...:     print(i)
   ...:
0
(0, 0)
1
(1, 1)
2
(2, 2)
3
(3, 3)
4
(4, 4)

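Apparently I was not clear enough. I know what yield does; that is not the question. On each iteration we should be going through calc twice, but if you look at the printed output you can see that it happens only once. The same thing happens when I call next(i1) before the 'zip', which makes it even stranger.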

Answer 1

"""
calc() returns a generator. By design, a generator keeps its state between next() calls.
"""

import itertools


def calc():
    for i in range(5):
        print(i)
        yield i


i1, i2 = itertools.tee(calc())
print(f'{type(i1)=}')
print(f'{type(i2)=}')
# itertools.tee returns n independent iterators over a single iterable, here calc().
next(i1)  # Advances calc() once: prints 0 and returns the first value.

z = zip(i1, i2)  # zip is lazy; nothing is consumed here and calc() is not called again.
# The loop below pulls the remaining 4 values from calc(); i2 replays buffered values.

for i in z:
    print(i)
# calc() is now exhausted and i1 has consumed every value, so:
next(i1)  # Returns a Traceback

Output

type(i1)=<class 'itertools._tee'>
type(i2)=<class 'itertools._tee'>
0
1
(1, 0)
2
(2, 1)
3
(3, 2)
4
(4, 3)
Traceback (most recent call last):
  File "C:\Users\ctynd\OneDrive\CodeBase\WhosebugActivity\OldScratches\scratch_2.py", line 26, in <module>
    next(i1)  # Returns a Traceback
StopIteration

Answer 2

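In your example, because you called next, you consumed one item from the first iterator (i1), which was zero, so zip starts from 1 in i1 and from 0 in i2.

See the similar test cases below, with and without calling next, which show why in your case you end up zipping [1,2,3,4] with [0,1,2,3]: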

#
# With running next
#
>>> import itertools
>>> a = [1,2,3,4]
>>> b, c = itertools.tee(a)
>>> b
<itertools._tee object at 0x000001D5A60D7C80>
>>> next(b)
1
>>> next(b)
2
>>> zip(b,c)
<zip object at 0x000001D5A60D7D00>
>>> for i in zip(b,c):
...  print(i)
...
(3, 1)
(4, 2)


#
# Without running next
#
>>> a = [1,2,3,4]
>>> b, c = itertools.tee(a)
>>> for i in zip(b,c):
...  print(i)
... 
(1, 1)
(2, 2)
(3, 3)
(4, 4)
>>>
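
The offset behaviour shown above is what the well-known "pairwise" idiom relies on. Here is a minimal sketch, assuming a helper name (pairwise_sketch) that is made up for illustration and not part of the original answer: one tee copy is advanced by a single item, and zip then pairs each element with its successor.

```python
import itertools


def pairwise_sketch(iterable):
    """Return an iterator of overlapping pairs (s0, s1), (s1, s2), ..."""
    lag, lead = itertools.tee(iterable)
    # Advance only one copy; the default argument avoids StopIteration on empty input.
    next(lead, None)
    # zip stops as soon as the shorter (advanced) iterator is exhausted.
    return zip(lag, lead)


pairs = list(pairwise_sketch([1, 2, 3, 4]))
print(pairs)
```

The underlying list is traversed only once; the tee buffer holds the single item by which lead is ahead of lag.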

Answer 3

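It looks as if zip can tell that the different "itertools.tee" objects come from the same tee call and therefore decides that calc() does not need to be called twice. In fact the sharing is done by tee, not by zip: both objects from one tee call read from the same underlying generator. If instead you call tee twice, each on its own calc(), as in the code below, the two inputs no longer share anything, so calc() runs twice.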

>>> i1, i2 = itertools.tee(calc())
>>> i3,i4=itertools.tee(calc())
>>> z = zip(i1,i3)
>>> for i in z:
...  print(i)
... 
0
0
(0, 0)
1
1
(1, 1)
2
2
(2, 2)
3
3
(3, 3)
4
4
(4, 4)
>>>
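
To check whether zip plays any role in this, the two tee iterators can be advanced by hand without zip at all. This is a small sketch; the list named produced is a hypothetical helper used only to record how often the generator body actually runs.

```python
import itertools

produced = []  # records every value the generator body actually computes


def calc():
    for i in range(3):
        produced.append(i)
        yield i


i1, i2 = itertools.tee(calc())

# No zip here: advance the two tee iterators manually.
first = [next(i1), next(i1)]    # i1 is ahead, so these two calls drive calc()
second = [next(i2), next(i2)]   # i2 replays buffered values; calc() is not resumed

print(first, second, produced)
```

Both iterators return the same values while the generator body runs only once per value, which suggests the sharing lives in tee rather than in zip.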

Answer 4

The 'magic' happens in itertools.tee. tee does not return "regular" iterators:

>>> type(i1)
<class 'itertools._tee'>

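The iterators i1 and i2 remain linked, and they do not call the function again on every iteration. When the leading iterator advances, the value returned from 'calc' is saved and later handed to the matching iteration of the other iterators returned by tee (there can be more than 2 iterators; see itertools.tee).

Example: with the help of print we can see that calc is advanced only once, even though I iterated once with each iterator. Note the difference between the output of the first and the second call: "I calculated" is printed only once, which means the generator produced the value only once.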

>>> import itertools
>>> def calc():
...     for i in range(5):
...         print("I calculated: " + str(i))
...         yield i
...
>>> i1, i2 = itertools.tee(calc())
>>> next(i1)
I calculated: 0
0
>>> next(i2)
0

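If you want to see how this is implemented, here is an implementation in Python with an explanation, and here is the real implementation of itertools (including tee). That module is written in C; try looking at tee_fromiterable and itertools__tee_impl.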
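
The buffering described in this answer can be sketched in pure Python. The code below is adapted from the "roughly equivalent" recipe that the itertools documentation has shown for tee; the name tee_sketch is made up here, and the real C implementation is organized differently, so treat this as an illustration of the idea rather than the actual code.

```python
import collections


def tee_sketch(iterable, n=2):
    """Illustrative pure-Python stand-in for itertools.tee (not the real implementation)."""
    it = iter(iterable)                       # the single shared source
    deques = [collections.deque() for _ in range(n)]

    def gen(mydeque):
        while True:
            if not mydeque:                   # this copy has nothing buffered
                try:
                    newval = next(it)         # advance the shared source once
                except StopIteration:
                    return
                for d in deques:              # hand the value to every copy
                    d.append(newval)
            yield mydeque.popleft()

    return tuple(gen(d) for d in deques)


def calc():
    for i in range(3):
        print("I calculated:", i)
        yield i


i1, i2 = tee_sketch(calc())
result = list(zip(i1, i2))
print(result)
```

Whichever copy asks first triggers one step of the source; the other copies find the value already waiting in their own deque, so "I calculated" appears once per value.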
