Executing python code in new process is much slower than on main process
I am starting to learn multiprocessing in python, and I noticed that the same code executes much faster on the main process than in a process created with the multiprocessing module.

Here is a simplified example of my code, where I first execute the code on the main process and print the time for the first 10 computations plus the total computation time. Then the same code is executed on a new process (which is a long-running process to which I can send a new_pattern at any time).
import multiprocessing
import random
import time

old_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 2000)]
new_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 100)]
new_pattern_for_processing = multiprocessing.Array('d', 10)
there_is_new_pattern = multiprocessing.Value('i', 0)
queue = multiprocessing.Queue()

def iterate_and_add(old_patterns, new_pattern):
    for each_pattern in old_patterns:
        sum = 0
        for count in range(0, 10):
            sum += each_pattern[count] + new_pattern[count]

print_count_main_process = 0

def patt_recognition_main_process(new_pattern):
    global print_count_main_process
    # START of same code on main process
    start_main_process_one_patt = time.time()
    iterate_and_add(old_patterns, new_pattern)
    if print_count_main_process < 10:
        print_count_main_process += 1
        print("Time on main process one pattern:", time.time() - start_main_process_one_patt)
    # END of same code on main process

def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            #START of same code on new process
            start_new_process_one_patt = time.time()
            iterate_and_add(old_patterns, new_pattern_on_new_proc)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            queue.put("DONE")
            there_is_new_pattern.value = 0

if __name__ == "__main__":
    start_main_process = time.time()
    for new_pattern in new_patterns:
        patt_recognition_main_process(new_pattern)
    print(".\n.\n.")
    print("Total Time on main process:", time.time() - start_main_process)
    print("\n###########################################################################\n")

    start_new_process = time.time()
    p1 = multiprocessing.Process(target=patt_recognition_new_process, args=(old_patterns, new_pattern_for_processing, there_is_new_pattern, queue))
    p1.start()
    for new_pattern in new_patterns:
        for idx, n in enumerate(new_pattern):
            new_pattern_for_processing[idx] = n
        there_is_new_pattern.value = 1
        while True:
            msg = queue.get()
            if msg == "DONE":
                break
    print(".\n.\n.")
    print("Total Time on new process:", time.time()-start_new_process)
These are my results:
Time on main process one pattern: 0.0025289058685302734
Time on main process one pattern: 0.0020127296447753906
Time on main process one pattern: 0.002008199691772461
Time on main process one pattern: 0.002511262893676758
Time on main process one pattern: 0.0020067691802978516
Time on main process one pattern: 0.0020036697387695312
Time on main process one pattern: 0.0020072460174560547
Time on main process one pattern: 0.0019974708557128906
Time on main process one pattern: 0.001997232437133789
Time on main process one pattern: 0.0030074119567871094
.
.
.
Total Time on main process: 0.22810864448547363
###########################################################################
Time on new process one pattern: 0.03462791442871094
Time on new process one pattern: 0.03308463096618652
Time on new process one pattern: 0.034590721130371094
Time on new process one pattern: 0.033623456954956055
Time on new process one pattern: 0.03407788276672363
Time on new process one pattern: 0.03308820724487305
Time on new process one pattern: 0.03408670425415039
Time on new process one pattern: 0.0345921516418457
Time on new process one pattern: 0.03710794448852539
Time on new process one pattern: 0.03358912467956543
.
.
.
Total Time on new process: 4.0528037548065186
Why is there such a big difference in execution time?
In this particular case you seem to be running sequential execution in another process rather than parallelizing your algorithm. That adds some overhead.

Process creation itself takes time, but that is not all of it. You are also transferring data through queues and using Manager proxies. These are all effectively queues, in fact two queues and another process. Queues are very, very slow compared with using an in-memory copy of the data.

If you take your code, execute it in another process and use queues to move the data in and out, it will always be slower. From a performance standpoint that makes it pointless. There may still be other reasons to do it, though, for example if your main program needs to do something else, such as wait on IO.
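As a rough illustration of that queue cost (not part of the original answer; the echo worker and the iteration count are invented here), a minimal round-trip timing sketch looks like this:

import multiprocessing
import time

def echo(inq, outq):
    # Worker: bounce every item straight back until told to stop.
    while True:
        item = inq.get()
        if item is None:
            break
        outq.put(item)

if __name__ == "__main__":
    inq, outq = multiprocessing.Queue(), multiprocessing.Queue()
    p = multiprocessing.Process(target=echo, args=(inq, outq))
    p.start()
    start = time.time()
    for _ in range(1000):
        inq.put([0.0] * 10)   # roughly the size of one pattern
        outq.get()
    print("Avg queue round trip:", (time.time() - start) / 1000)
    inq.put(None)             # sentinel: tell the worker to exit
    p.join()

Each round trip pays for pickling, a pipe write, a pipe read and unpickling in both directions, which is why it dwarfs a plain in-memory access.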
If you want a performance boost, you should create several processes and split your algorithm so that parts of the range are processed in different processes, working in parallel. You could also consider multiprocessing.Pool if you want to keep a group of worker processes standing by for more work. That reduces the process-creation overhead, since you only pay it once. In Python 3 you can also use ProcessPoolExecutor.
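For example, a minimal sketch of that approach (assuming the question's iterate_and_add, old_patterns and new_patterns are defined in the same module; init_worker and process_pattern are names invented here) could ship old_patterns to each worker once and then farm the patterns out in parallel:

import multiprocessing

_old_patterns = None

def init_worker(patterns):
    # Runs once in each worker process; keeps the data local to the worker
    # so only the small new_pattern has to cross a queue per task.
    global _old_patterns
    _old_patterns = patterns

def process_pattern(new_pattern):
    iterate_and_add(_old_patterns, new_pattern)

if __name__ == "__main__":
    with multiprocessing.Pool(4, initializer=init_worker, initargs=(old_patterns,)) as pool:
        pool.map(process_pattern, new_patterns)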
Parallel processing is useful, but it is no snake oil that solves all your problems with little effort. To get the most out of it, you need to redesign your program to maximize the parallel processing and minimize the data transferred through queues.
It is a bit subtle, but the problem lies in

new_pattern_for_processing = multiprocessing.Array('d', 10)

That does not contain python float objects; it contains raw bytes, in this case enough to hold 10 machine-level 8-byte doubles. Whenever you read from or write to this array, python has to convert a float to a double or the other way round. That is not a big deal if you read or write once, but your code does it again and again in a loop, and those conversions dominate.
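A quick way to see that per-access cost in isolation (not from the answer; timings vary by machine) is to index the shared array and a plain list the same number of times:

import multiprocessing
import timeit

# lock=False returns the raw ctypes array, so the timing measures the
# float<->double conversion rather than lock acquisition.
shared = multiprocessing.Array('d', 10, lock=False)
plain = list(shared)  # one-time copy into plain python floats

print("Array:", timeit.timeit(lambda: sum(shared[i] for i in range(10)), number=100000))
print("List :", timeit.timeit(lambda: sum(plain[i] for i in range(10)), number=100000))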
To confirm this, I copied the machine-level array into a list of python floats once and let the process work on that. Now it runs at the same speed as the parent. My change is in one function only:
def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            local_pattern = new_pattern_on_new_proc[:]
            #START of same code on new process
            start_new_process_one_patt = time.time()
            #iterate_and_add(old_patterns, new_pattern_on_new_proc)
            iterate_and_add(old_patterns, local_pattern)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            there_is_new_pattern.value = 0
            queue.put("DONE")
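The single slice local_pattern = new_pattern_on_new_proc[:] pays the ten double-to-float conversions once per new pattern; the inner loop, which indexes the pattern 2000 × 10 times, then touches only plain python floats.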