Why does a generator function not use the idle time to prepare the next yield?

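In today's world of multi-core, multi-threaded CPUs (the CPU in my notebook has two cores with two threads each), it makes more and more sense to write code that can exploit the capabilities of the hardware. Languages like Go were created to make it easier for programmers to speed up an application by spawning multiple 'independent' processes and synchronizing them again later.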

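In this context, when I first came across generator functions in Python, I expected such functions to use the idle time that passes between requests for subsequent items to prepare the next yield for immediate delivery. That does not seem to be the case; at least that is how I interpret the results I got from running the code provided below.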

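What confuses me even more is that the caller of the generator function has to wait until the function has finished processing all of its remaining instructions, even when the generator has already delivered all of its items.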

Are there any clear reasons, which I currently can't see, why a generator function does not use the idle time between yield requests to run the code past the requested yield until it reaches the next yield instruction, and why it even makes the caller wait when all the items have already been delivered?

Here is the code I used:

import time
startTime = time.time()
time.sleep(1)
def generatorFunctionF():
    print("# here: generatorFunctionF() lineNo #1", time.time()-startTime)
    for i in range(1,4):
        print("# now: time.sleep(1)", time.time()-startTime)
        time.sleep(1)
        print("# before yield", i, time.time()-startTime)
        yield i # yield i
        print("# after  yield", i, time.time()-startTime)
    print("# now: time.sleep(5)", time.time()-startTime)
    time.sleep(5)
    print("# end followed by 'return'", time.time()-startTime)
    return
#:def

def standardFunctionF():
    print("*** before: 'gFF = generatorFunctionF()'", time.time()-startTime) 
    gFF = generatorFunctionF()
    print("*** after:  'gFF = generatorFunctionF()'", time.time()-startTime) 
    print("*** before print(next(gFF)", time.time()-startTime)
    print(next(gFF))
    print("*** after  print(next(gFF)", time.time()-startTime)
    print("*** before time.sleep(3)", time.time()-startTime)
    time.sleep(3)
    print("*** after  time.sleep(3)", time.time()-startTime)
    print("*** before print(next(gFF)", time.time()-startTime)
    print(next(gFF))
    print("*** after  print(next(gFF)", time.time()-startTime)
    print("*** before list(gFF)", time.time()-startTime)
    print("*** list(gFF): ", list(gFF), time.time()-startTime)
    print("*** after:  list(gFF)", time.time()-startTime)
    print("*** before time.sleep(3)", time.time()-startTime)
    time.sleep(3)
    print("*** after  time.sleep(3)", time.time()-startTime)
    return "*** endOf standardFunctionF"

print()
print(standardFunctionF)
print(standardFunctionF())

It gives:

>python3.6 -u "aboutIteratorsAndGenerators.py"

<function standardFunctionF at 0x7f97800361e0>
*** before: 'gFF = generatorFunctionF()' 1.001169204711914
*** after:  'gFF = generatorFunctionF()' 1.0011975765228271
*** before print(next(gFF) 1.0012099742889404
# here: generatorFunctionF() lineNo #1 1.0012233257293701
# now: time.sleep(1) 1.0012412071228027
# before yield 1 2.0023491382598877
1
*** after  print(next(gFF) 2.002397298812866
*** before time.sleep(3) 2.0024073123931885
*** after  time.sleep(3) 5.005511283874512
*** before print(next(gFF) 5.005547761917114
# after  yield 1 5.005556106567383
# now: time.sleep(1) 5.005565881729126
# before yield 2 6.006666898727417
2
*** after  print(next(gFF) 6.006711006164551
*** before list(gFF) 6.0067174434661865
# after  yield 2 6.006726026535034
# now: time.sleep(1) 6.006732702255249
# before yield 3 7.0077736377716064
# after  yield 3 7.0078125
# now: time.sleep(5) 7.007838010787964
# end followed by 'return' 12.011908054351807
*** list(gFF):  [3] 12.011950254440308
*** after:  list(gFF) 12.011966466903687
*** before time.sleep(3) 12.011971473693848
*** after  time.sleep(3) 15.015069007873535
*** endOf standardFunctionF
>Exit code: 0

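Because the code between yields might have side effects. You advance a generator not only when you "want the next value", but also when you want to advance the generator by continuing to run its code.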
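
A minimal sketch of that point, assuming a made-up generator `numbered_steps` and a made-up shared list `audit_log` (neither appears in the question): the code after each `yield` has a visible side effect, so running it ahead of time would change what the caller observes between two `next()` calls.

```python
audit_log = []  # hypothetical shared state that the caller can inspect


def numbered_steps():
    for i in range(1, 4):
        yield i
        # Side effect: runs only once the caller resumes the generator.
        audit_log.append(f"step {i} confirmed")


gen = numbered_steps()

first = next(gen)
log_after_first = list(audit_log)   # nothing past the first yield has run yet

second = next(gen)
log_after_second = list(audit_log)  # resuming ran the code after the first yield

print(first, log_after_first)    # 1 []
print(second, log_after_second)  # 2 ['step 1 confirmed']
```

If the generator ran ahead on its own, `audit_log` would already contain an entry before the caller asked for the second item, which is not what the caller requested.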

The question about the expected features of generator functions in Python should be seen from the perspective of the broader topic of implicit parallelism.

Here is an excerpt from Wikipedia: "In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's constructs."

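At its core, the question "Is there some important reason why a generator function doesn't prefetch the next item in the idle time between yields?" actually asks: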

"Does Python as programming language support implicit parallelism?"

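Despite the fact that (to quote the opinion expressed by the author of the question) "there isn't any meaningful reason why a generator function shouldn't provide this kind of 'intelligent' behavior", in the context of Python as a programming language the actual correct answer to the question (already given in the comments, but without exposing the core of the problem so clearly) is: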

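The important reason why Python generator functions shouldn't intelligently prefetch the next item in the background for immediate delivery later is that Python as a programming language does not support implicit parallelism.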


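That said, it is certainly interesting to explore whether the expected functionality can be provided in Python in an explicit way. Yes, it is possible. Let's demonstrate a generator function that, from the caller's point of view, implicitly prefetches the next item in the background, because this functionality has been explicitly programmed into it: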

from multiprocessing import Process
import time

def generatorFetchingItemsOnDemand():
    for i in range(1, 4):
        time.sleep(2)
        print("# ...ItemsOnDemand spends 2 seconds for delivery of item")
        yield i

def generatorPrefetchingItemsForImmediateDelivery():
    with open('tmpFile','w') as tmpFile:
        tmpFile.write('')
        tmpFile.flush()

    def itemPrefetcher():
        for i in range(1, 4):
            time.sleep(2)
            print("### itemPrefetcher spends 2 seconds for prefetching an item")
            with open('tmpFile','a') as tmpFile:
                tmpFile.write(str(i)+'\n')
                tmpFile.flush()

    p = Process(target=itemPrefetcher)
    p.start()

    for i in range(1, 4):
        with open('tmpFile','r') as tmpFile:
            lstFileLines = tmpFile.readlines()
        # Poll the file until the prefetcher has written the i-th item.
        while len(lstFileLines) < i:
            time.sleep(0.1)
            with open('tmpFile','r') as tmpFile:
                lstFileLines = tmpFile.readlines()

        yield int(lstFileLines[i-1])
#:def

def workOnAllItems(intValue):
    startTime = time.time()
    time.sleep(2)
    print("workOn(", intValue, "): took", (time.time()-startTime), "seconds")
    return intValue

print("===============================")        
genPrefetch = generatorPrefetchingItemsForImmediateDelivery()
startTime = time.time()
for item in genPrefetch:
    workOnAllItems(item)
print("using genPrefetch workOnAllItems took", (time.time()-startTime), "seconds")
print("-------------------------------")        
print()
print("===============================")        
genOnDemand = generatorFetchingItemsOnDemand()
startTime = time.time()
for item in genOnDemand:
    workOnAllItems(item)
print("using genOnDemand workOnAllItems took", (time.time()-startTime), "seconds")
print("-------------------------------")        

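The provided code uses the file system for inter-process communication, so if you want to reuse this concept in your own programs, feel free to replace it with one of the other, faster inter-process communication mechanisms available. A generator function implemented in the way demonstrated here does what the author of the question expected a generator function to do, and helps to speed up the application (here from 12 seconds down to 8 seconds):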

>python3.6 -u "generatorPrefetchingItemsForImmediateDelivery.py"
===============================
### itemPrefetcher spends 2 seconds for prefetching an item
### itemPrefetcher spends 2 seconds for prefetching an item
workOn( 1 ): took 2.0009119510650635 seconds
### itemPrefetcher spends 2 seconds for prefetching an item
workOn( 2 ): took 2.0010197162628174 seconds
workOn( 3 ): took 2.00161075592041 seconds
using genPrefetch workOnAllItems took 8.013896942138672 seconds
-------------------------------

===============================
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 1 ): took 2.0011563301086426 seconds
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 2 ): took 2.001920461654663 seconds
# ...ItemsOnDemand spends 2 seconds for delivery of item
workOn( 3 ): took 2.0002224445343018 seconds
using genOnDemand workOnAllItems took 12.007976293563843 seconds
-------------------------------
>Exit code: 0

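Generators were designed as a simpler, shorter, easier-to-understand syntax for writing iterators. That was their use case. People who want to make iterators shorter and easier to understand do not want to introduce the headaches of thread synchronization into every iterator they write. That would be the opposite of the design goal.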
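
To illustrate the "shorter syntax for writing iterators" point, here is a small sketch with two made-up names, `CountUpTo` and `count_up_to`, that implement the same iteration once as a hand-written iterator class and once as a generator function.

```python
class CountUpTo:
    """Hand-written iterator: the state lives in instance attributes."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


def count_up_to(limit):
    """Generator version: the state lives in ordinary local variables."""
    current = 0
    while current < limit:
        current += 1
        yield current


print(list(CountUpTo(3)))    # [1, 2, 3]
print(list(count_up_to(3)))  # [1, 2, 3]
```

Both produce the same items; the generator just lets Python keep track of where the function was suspended.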

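As such, generators are based on the concepts of coroutines and cooperative multitasking, not threads. The design tradeoffs are different; generators sacrifice parallel execution in exchange for semantics that are much easier to reason about.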
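
As a rough sketch of what cooperative multitasking means here, the following toy round-robin scheduler interleaves two generators. The names `task` and `run_round_robin` are invented for this example. Each task runs only until it voluntarily yields, and nothing ever runs in parallel.

```python
from collections import deque


def task(name, steps, trace):
    for step in range(1, steps + 1):
        trace.append(f"{name}:{step}")
        yield  # hand control back to the scheduler voluntarily


def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished, do not reschedule it
        ready.append(current)      # otherwise put it at the back of the line


trace = []
run_round_robin([task("a", 2, trace), task("b", 3, trace)])
print(trace)  # ['a:1', 'b:1', 'a:2', 'b:2', 'b:3']
```

The order of the entries in `trace` is fully deterministic, which is the "easier to reason about" part of the tradeoff.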

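Also, using a separate thread for every generator would be really inefficient, and figuring out when to parallelize is a hard problem. Most generators aren't actually worth executing in another thread. Heck, they wouldn't be worth executing in another thread even in GIL-less implementations of Python, like Jython or Grumpy.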

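If you want something that runs in parallel, that is already handled by starting a thread or process and communicating with it through queues.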
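
A minimal sketch of that approach, using only the standard `threading` and `queue` modules. The wrapper name `prefetch`, the sentinel `_DONE` and the demo generator `slow_items` are made up for illustration. It assumes the producer spends its time waiting (here simulated with `time.sleep`); CPU-bound work in a CPython thread would gain little because of the GIL. Exceptions raised in the producer are not forwarded to the consumer in this sketch.

```python
import queue
import threading
import time

_DONE = object()  # hypothetical sentinel marking the end of the stream


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable`, producing them in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)   # blocks while the buffer is full
        buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()    # blocks until the next item is ready
        if item is _DONE:
            return
        yield item


def slow_items():
    for i in range(1, 4):
        time.sleep(0.2)        # stands in for slow I/O
        yield i


start = time.time()
results = []
for item in prefetch(slow_items()):
    time.sleep(0.2)            # stands in for the consumer's own work
    results.append(item)
elapsed = time.time() - start
print(results, "in", round(elapsed, 2), "seconds")
```

Because the next item is produced while the consumer is busy, the loop should finish sooner than the 1.2 seconds that strictly alternating production and consumption would need; the exact timing depends on the machine.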