Negative process execution time in Python. How do I correctly measure this?
So I'm working on an assignment where I need to write two programs that use threading/multiprocessing. The threading part went fine, but I'm having trouble measuring the execution time of the processes. Here's the code:
import multiprocessing as mp
import time

start = time.perf_counter()
txtsmpl = open('C:\dev\OS\dummy.txt', 'r').read()
processes_num = 5
to_replace = 'az'
replace_with = '{[]}'

def process_txt(start_time, inp_text):
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.perf_counter() - start_time, 4)) + ' second(s)...\n')

def main():
    processes = []
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args=[time.perf_counter(), txtsmpl])
        p.start()
        processes.append(p)
    for process in processes:
        process.join()
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()
This was the recommended way to measure (pass the start time as an argument to the thread/process). But for the processes I'm getting negative execution times. I suspect it has something to do with them all running in parallel, but I don't know how to prevent it. Thanks for your help!
First, you noticed that when running under Windows you must guard the code that creates sub-processes by ensuring it runs in a block controlled by if __name__ == '__main__':, as you have done. The reason is that when a sub-process starts, the entire file is re-executed, and without the guard you would enter an infinite recursive loop creating new sub-processes. But the point here is that you also have code not controlled by the if __name__ == '__main__': block that you do not need or want executed by every sub-process, namely the opening and reading of the file. That code should be moved.
Now to your mystery. According to the manual entry for time.perf_counter():
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.
Note the last sentence above. You are making calls from multiple processes in parallel. You are treating the calls that assign the start and end times used to compute elapsed time as consecutive calls of the same counter, but in practice each process can get its own reference point, so a perf_counter() value captured in the parent is not comparable to one captured in a child. You really just need to use time.time():
import multiprocessing as mp
import time

to_replace = 'az'
replace_with = '{[]}'

def process_txt(start_time, inp_text):
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.time() - start_time, 4)) + ' second(s)...\n')

def main():
    start = time.time()
    txtsmpl = open('C:\dev\OS\dummy.txt', 'r').read()
    processes_num = 5
    processes = []
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args=[time.time(), txtsmpl])
        p.start()
        processes.append(p)
    for process in processes:
        process.join()
    finish = time.time()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()
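To see why perf_counter() values are not comparable across processes, here is a minimal sketch (the names report and compare_clocks are illustrative, not from the code above). It captures both clocks in the parent and in a child: the time.time() difference is always a real elapsed time because both processes share the wall-clock epoch, while the perf_counter() difference can come out negative (as in the question) on platforms where each process gets its own reference point.

```python
import multiprocessing as mp
import time

def report(q):
    # Runs in the child: capture both clocks as seen by that process.
    q.put((time.perf_counter(), time.time()))

def compare_clocks():
    q = mp.Queue()
    parent_pc = time.perf_counter()
    parent_wall = time.time()
    p = mp.Process(target=report, args=(q,))
    p.start()
    child_pc, child_wall = q.get()
    p.join()
    # time.time() shares the wall-clock epoch, so this is real elapsed time.
    wall_delta = child_wall - parent_wall
    # perf_counter()'s reference point is undefined and may be per-process,
    # so this difference is meaningless and can even be negative.
    pc_delta = child_pc - parent_pc
    return wall_delta, pc_delta

if __name__ == "__main__":
    wall_delta, pc_delta = compare_clocks()
    print(f'wall-clock delta: {wall_delta:.4f}s, perf_counter delta: {pc_delta:.4f}s')
```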
UPDATE

To get a better idea of how much time each sub-process spends processing the text, process_txt should initialize start_time by calling time.time() itself. And to get a better picture of how much time is saved by using sub-processes, move the assignment start = time.time() in the main process to just before the creation of the sub-processes:
import multiprocessing as mp
import time

to_replace = 'az'
replace_with = '{[]}'

def process_txt(inp_text):
    start_time = time.time()
    txt = list(inp_text)
    for i, letter in enumerate(txt):
        if letter in to_replace:
            txt[i] = replace_with
    txt = ''.join(txt)
    print(txt + '\n')
    print('Ran for ' + str(round(time.time() - start_time, 4)) + ' second(s)...\n')

def main():
    txtsmpl = open('C:\dev\OS\dummy.txt', 'r').read()
    processes_num = 5
    processes = []
    start = time.time()
    for _ in range(processes_num):
        p = mp.Process(target=process_txt, args=[txtsmpl])
        p.start()
        processes.append(p)
    for process in processes:
        process.join()
    finish = time.time()
    print(f'Finished in {round(finish-start, 4)} second(s)')

if __name__ == "__main__":
    main()
You may well find that the overhead of creating sub-processes for such short-running tasks far outweighs any benefit gained from parallelism. This code would run much faster by calling process_txt 5 times in a loop without creating any sub-processes.
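As a point of comparison, the serial version suggested above can be sketched like this (using a hard-coded sample string rather than the file, so it runs anywhere; run_serial is an illustrative name):

```python
import time

to_replace = 'az'
replace_with = '{[]}'

def process_txt(inp_text):
    # Same replacement logic as above, returned instead of printed.
    return ''.join(replace_with if c in to_replace else c for c in inp_text)

def run_serial(text, repeats=5):
    # Call process_txt repeatedly in-process: no fork/spawn overhead.
    start = time.time()
    for _ in range(repeats):
        result = process_txt(text)
    elapsed = time.time() - start
    return result, elapsed

if __name__ == "__main__":
    result, elapsed = run_serial('banana')
    print(result)  # b{[]}n{[]}n{[]}
    print(f'Finished in {round(elapsed, 4)} second(s)')
```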
If you want to profile Python code, you can use cProfile.
import cProfile

def _a():
    a = 1
    while a < 1000:
        a += 1

def _b():
    a = 1
    while a < 100:
        a += 1

cProfile.run("_a()")
cProfile.run("_b()")
Output:
4 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 __init__.py:4(_a)
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
4 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 __init__.py:9(_b)
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Process finished with exit code 0
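cProfile results can also be collected and sorted programmatically with the standard-library pstats module, which is handy once the default listing gets long. A small sketch (the function busy is just an illustrative workload):

```python
import cProfile
import io
import pstats

def busy():
    # A small workload worth profiling.
    total = 0
    for i in range(10000):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Render the stats into a string, sorted by cumulative time,
# showing only the top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
print(stream.getvalue())
```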