multiprocessing Pool.map: each worker runs the code outside the __main__ block
import multiprocessing
import threading

counter = 1
print("Code outside __main__",counter)
lock = threading.Lock()
counter += 1

def foo(i):
    #print("Inside foo ",i)
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    pool.map(foo, range(100))
If you run this code from a terminal with python run.py, it prints:
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
If you uncomment the print in foo(), you will sometimes see Code outside __main__ 1 in between the foo() calls.
Why does it do this?
import multiprocessing
import threading

counter = 1
print("Code outside __main__",counter)
counter += 1

def foo(i):
    global lock
    with lock:
        print("Inside foo ",i)

if __name__ == '__main__':
    lock = threading.Lock()
    pool = multiprocessing.Pool(processes=10)
    pool.map(foo, range(100))
If I declare the lock inside the __main__ block, it is undefined inside foo(), even if I use global lock.
This is the output:
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\David\Documents\test.py", line 10, in foo
with lock:
NameError: name 'lock' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 16, in <module>
pool.map(foo, range(100))
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
NameError: name 'lock' is not defined
Code outside __main__ 1
Code outside __main__ 1
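This code is just a simplification. I am actually trying to scrape a website and write to a file, but I want to understand what is going on here.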
Part 1
sometimes the Code outside __main__ 1
is in between the foo() calls.
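Each of the 10 child processes you create with multiprocessing.Pool effectively imports the "main" file, which basically means executing it from top to bottom. This is how the spawn start method works, and it is the only start method available on Windows, which is what the paths in your traceback show. So you get the print from the main process, followed by up to 10 more from the children. The children do not all start at the same moment: some early birds may already be processing input from the pool.map call before the others have finished initializing, which is why the two kinds of output can interleave, and more so as the number of children grows. During this import, each process also gets its own copy of the counter variable, so the value printed is always 1.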
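To make the "own copy of counter" point concrete, here is a minimal sketch. The names bump and main are hypothetical and not part of the question's code. Each worker increments the counter that lives in its own process, while the parent's copy is never touched, regardless of the start method in use.

```python
import multiprocessing as mp
import os

counter = 1  # module-level state: every process ends up with its own copy


def bump(_):
    """Increment this process's copy of counter and report it with the pid."""
    global counter
    counter += 1
    return os.getpid(), counter


def main():
    with mp.Pool(processes=2) as pool:
        results = pool.map(bump, range(6))
    for pid, value in results:
        # Values depend on how the tasks were split between the workers.
        print(f"worker {pid} saw counter={value}")
    # The workers only changed their own copies, not the parent's.
    print("parent counter is still", counter)


if __name__ == "__main__":
    main()
```

The pids and per-worker values vary from run to run, but the last line reports the parent's original value because bump never runs in the parent process.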
Part 2
If I declare the lock inside the __main__
block, it's undefined inside foo() even if I use global lock
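foo is executing in a completely separate memory space. global cannot automatically send objects into another process's memory, and lock will not exist in the children's memory because they do not execute anything inside the if block (and they should not). The children need to receive the lock as an argument and assign it to a name in their own memory space. When using a Pool, certain things such as locks and queues cannot be sent as ordinary task arguments; they can only be passed as arguments to the initializer function (plain Process objects do not have as many restrictions). You can then use the initializer to receive the lock and save it in the global namespace of each child.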
import multiprocessing as mp
from time import sleep

# mp.Lock has the same interface as threading.Lock, but it can be shared between processes
print("Code outside __main__")

def foo(i):
    global my_lock
    with my_lock:
        sleep(1)  # output appearing one line per second means the lock is limiting access
        print("code inside foo ", i)

def init_worker(l):
    # runs once in each worker process and stores the shared lock in a global
    global my_lock
    my_lock = l

if __name__ == '__main__':
    l = mp.Lock()
    with mp.Pool(processes=10, initializer=init_worker, initargs=(l,)) as pool:
        pool.map(foo, range(10))
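Since the real goal is to write scraped results to a file, the same initializer pattern could be applied to file writes. The following is only a sketch under assumptions: the names append_line and main, the file name results.txt, and the "result N" strings are all hypothetical stand-ins for whatever the scraper actually produces.

```python
import multiprocessing as mp
import os
import tempfile

_lock = None  # set in every worker by init_worker


def init_worker(lock):
    """Runs once in each worker process; stores the shared lock in a global."""
    global _lock
    _lock = lock


def append_line(job):
    """Append one line to the file while holding the shared lock."""
    path, text = job
    with _lock:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return text


def main():
    # A temporary directory keeps the demo from leaving files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.txt")  # hypothetical output file
        jobs = [(path, f"result {i}") for i in range(10)]
        lock = mp.Lock()
        with mp.Pool(processes=4, initializer=init_worker, initargs=(lock,)) as pool:
            pool.map(append_line, jobs)
        with open(path, encoding="utf-8") as fh:
            print(len(fh.readlines()), "lines written")


if __name__ == "__main__":
    main()
```

The order of the lines in the file is not guaranteed, because the workers take the lock in whatever order they reach it. The lock only ensures that one worker writes at a time.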