Timing short-circuit evaluation in Python gives unexpected results

import time as dt 

success = True
can_test = True 


time = 0

for i in range(10000000):
  start = dt.time()
  if success and can_test:
    stop = dt.time()
    time+= stop-start


print(f'"and" operation took: {time} seconds')


time = 0

for i in range(10000000):
  start = dt.time()
  if success or can_test:
    stop = dt.time()
    time += stop-start


print(f'"or" operation took: {time} seconds')

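When I ran the Python program above, I expected the and operation to be slower than the or operation (because I learned that short-circuiting reduces execution time). However, the result is not only the exact opposite, it also fluctuates. I can understand the fluctuation (background processes). But why is the result reversed? What is going on?

Here is a sample result.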


"and" operation took: 5.200342893600464 seconds
"or" operation took: 5.3243467807769775 seconds

This is an interesting question, so I decided to look more deeply into your main concern.

# required modules: line_profiler, matplotlib, seaborn and scipy
import time as dt 
from line_profiler import LineProfiler
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

success = True
can_test = True 
def and_op():
    for x in range(2000):
        s = success and can_test
def or_op():
    for x in range(2000):
        s = success or can_test
or_op_list = []
for x in range(0,1000):
    lp = LineProfiler()
    lp_wrapper = lp(or_op)
    lp_wrapper()
    lstats = lp.get_stats()
    total_time = 0
    for v in lstats.timings.values():
        for op in v:
            total_time += op[-1]
            final = op[-1]
        operator = final/total_time
    or_op_list.append(operator)

and_op_list = []
for x in range(0,1000):
    lp = LineProfiler()
    lp_wrapper = lp(and_op)
    lp_wrapper()
    lstats = lp.get_stats()
    total_time = 0
    for v in lstats.timings.values():
        for op in v:
            total_time += op[-1]
            final = op[-1]
        operator = final/total_time
    and_op_list.append(operator)
sns.kdeplot(and_op_list, label = 'AND')
sns.kdeplot(or_op_list, label = 'OR')
plt.show()
print(stats.ttest_ind(and_op_list,or_op_list, equal_var = False))

p-value = 1.8293386245013954e-103

So the timings of the "or" and "and" operations do differ, and the difference is statistically significant.
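For readers unfamiliar with the test used above: passing equal_var=False to stats.ttest_ind requests Welch's t-test, which does not assume that the two samples have equal variances. The statistic itself is simple enough to sketch with the standard library. The function welch_t below is an illustrative helper, not SciPy's implementation, and the sample values are made up.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

sample_a = [0.52, 0.50, 0.51, 0.53, 0.49]  # made-up values
sample_b = [0.47, 0.46, 0.48, 0.45, 0.47]  # made-up values
t, df = welch_t(sample_a, sample_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A large absolute value of t relative to the degrees of freedom corresponds to a small p-value, which is what the tiny p-value above reflects.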

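When I run your code on my machine, it also sometimes reports that True and True is faster than True or True.

The reason is that dt.time() in your code measures time on a "microsecond" scale (i.e. steps of 1000 nanoseconds), and that scale is too coarse to measure the time taken by a single execution of "if success and can_test:" or "if success or can_test:". In most cases, each of those statements takes less than 1 microsecond.

So in the following parts of your code: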
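As a rough check of the resolution claim, the standard library can report what each clock advertises, and a short loop can observe the smallest step a clock actually takes. This is a minimal sketch: the helper name smallest_tick is made up for illustration, and the numbers it reports depend on the operating system and the Python build.

```python
import time

def smallest_tick(clock_ns, samples=200_000):
    """Return the smallest nonzero difference seen between consecutive reads."""
    best = None
    prev = clock_ns()
    for _ in range(samples):
        now = clock_ns()
        step = now - prev
        if step > 0 and (best is None or step < best):
            best = step
        prev = now
    return best  # None if the clock never advanced during sampling

for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name}: advertised resolution = {info.resolution} s, "
          f"implementation = {info.implementation}")

print("observed time_ns tick:", smallest_tick(time.time_ns), "ns")
print("observed perf_counter_ns tick:", smallest_tick(time.perf_counter_ns), "ns")
```

If the observed tick of time.time_ns() is much larger than the duration of the statement being timed, most individual measurements will read as zero.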

for i in range(10000000):
    start = dt.time()
    if success and can_test:  # a dust particle
        stop = dt.time()
        time += stop - start  # measured by a normal scale ruler
for i in range(10000000):
    start = dt.time()
    if success or can_test:  # a dust particle
        stop = dt.time()
        time += stop - start  # measured by a normal scale ruler

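What your code does is like measuring each dust particle with a normal-scale ruler and then adding up the measurements. Because the measurement error is huge compared with the thing being measured, the result is distorted.

To investigate further, we can run the code below (d records each measured duration and how often it occurred):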

import time as dt
from pprint import pprint

success = True
can_test = True

time = 0
d = {}
for i in range(10000000):
    start = dt.time_ns()
    if success and can_test:  # a dust particle
        stop = dt.time_ns()
        diff_time = stop - start  # measurement by a normal scale ruler
        d[diff_time] = d.get(diff_time, 0) + 1
        time += diff_time
print(f'"and" operation took: {time} ns')
print('"and" operation time distribution:')
pprint(d)
print()

time = 0
d = {}
for i in range(10000000):
    start = dt.time_ns()
    if success or can_test:  # a dust particle
        stop = dt.time_ns()
        diff_time = stop - start  # measurement by a normal scale ruler
        d[diff_time] = d.get(diff_time, 0) + 1
        time += diff_time
print(f'"or" operation took: {time} ns')
print('"or" operation time distribution:')
pprint(d)

It prints something like the following:

"and" operation took: 1467442000 ns
"and" operation time distribution:
{0: 8565832,
 1000: 1432066,
 2000: 136,
 3000: 24,
 4000: 12,
 5000: 15,
 6000: 10,
 7000: 12,
 8000: 6,
 9000: 7,
 10000: 6,
 11000: 3,
 12000: 191,
 13000: 722,
 14000: 170,
 15000: 462,
 16000: 23,
 17000: 30,
 18000: 27,
 19000: 10,
 20000: 12,
 21000: 11,
 22000: 61,
 23000: 65,
 24000: 9,
 25000: 2,
 26000: 2,
 27000: 3,
 28000: 1,
 29000: 4,
 30000: 4,
 31000: 2,
 32000: 2,
 33000: 2,
 34000: 3,
 35000: 3,
 36000: 5,
 37000: 4,
 40000: 2,
 41000: 1,
 42000: 2,
 43000: 2,
 44000: 2,
 48000: 2,
 50000: 3,
 51000: 3,
 52000: 1,
 53000: 3,
 54000: 1,
 55000: 4,
 58000: 1,
 59000: 2,
 61000: 1,
 62000: 4,
 63000: 1,
 84000: 1,
 98000: 1,
 1035000: 1,
 1043000: 1,
 1608000: 1,
 1642000: 1}

"or" operation took: 1455555000 ns
"or" operation time distribution:
{0: 8569860,
 1000: 1428228,
 2000: 131,
 3000: 31,
 4000: 22,
 5000: 8,
 6000: 8,
 7000: 6,
 8000: 3,
 9000: 6,
 10000: 3,
 11000: 4,
 12000: 173,
 13000: 623,
 14000: 174,
 15000: 446,
 16000: 28,
 17000: 22,
 18000: 31,
 19000: 9,
 20000: 11,
 21000: 8,
 22000: 42,
 23000: 72,
 24000: 7,
 25000: 3,
 26000: 1,
 27000: 5,
 28000: 2,
 29000: 2,
 31000: 1,
 33000: 1,
 34000: 2,
 35000: 4,
 36000: 1,
 37000: 1,
 38000: 2,
 41000: 1,
 44000: 1,
 45000: 2,
 46000: 2,
 47000: 2,
 48000: 2,
 49000: 1,
 50000: 1,
 51000: 2,
 53000: 1,
 61000: 1,
 64000: 1,
 65000: 1,
 942000: 1}

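We can see that about 85.7% of the attempts to measure the time (8565832 / 10000000 = 0.8565832 and 8569860 / 10000000 = 0.8569860) failed, because they simply measured 0 nanoseconds. About 14.3% of the attempts (1432066 / 10000000 = 0.1432066 and 1428228 / 10000000 = 0.1428228) measured 1000 nanoseconds. Needless to say, the remaining attempts (under 0.1%) also landed on multiples of 1000 nanoseconds. The microsecond scale is clearly too coarse to measure the time taken by each execution.

But we can still use the normal-scale ruler: gather the dust particles together and measure the resulting dust ball. So we can try the code below: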
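The percentages quoted above can be reproduced directly from the two histograms. The sketch below copies only the two largest buckets from each run and lumps everything else together; summarize is a hypothetical helper, not part of the original code.

```python
def summarize(dist, total):
    """Return the share of samples that read 0 ns, 1000 ns, or anything else."""
    zero = dist.get(0, 0) / total
    one_us = dist.get(1000, 0) / total
    return {"0 ns": zero, "1000 ns": one_us, "other": 1 - zero - one_us}

TOTAL = 10_000_000
and_dist = {0: 8565832, 1000: 1432066}  # first two buckets of the "and" run
or_dist = {0: 8569860, 1000: 1428228}   # first two buckets of the "or" run

for label, dist in (("and", and_dist), ("or", or_dist)):
    shares = summarize(dist, TOTAL)
    print(label, {k: f"{v:.4%}" for k, v in shares.items()})
```

In both runs the "other" share comes out well below 0.1%, so nearly every sample is either 0 ns or exactly one 1000 ns tick.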

import time as dt

success = True
can_test = True

start = dt.time()
for i in range(10000000):  # getting together the dust particles
    if success and can_test:  # a dust particle
        pass
stop = dt.time()
time = stop - start  # measure the size of the dustball
print(f'"and" operation took: {time} seconds')

start = dt.time()
for i in range(10000000):  # getting together the dust particles
    if success or can_test:  # a dust particle
        pass
stop = dt.time()
time = stop - start  # measure the size of the dustball
print(f'"or" operation took: {time} seconds')

It prints something like the following:

"and" operation took: 0.6261420249938965 seconds
"or" operation took: 0.48876094818115234 seconds
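The standard library's timeit module automates this "measure the whole dust ball" approach: it runs a statement many times inside one timed loop and reports the total. The sketch below is one possible setup, not the original poster's method. The loop count and repeat count are arbitrary choices, and taking the minimum of several repeats is a common way to reduce noise from background processes.

```python
import timeit

SETUP = "success = True; can_test = True"
NUMBER = 1_000_000  # executions per timed run (arbitrary choice)

def best_time(stmt, repeat=5):
    """Return the fastest total time, in seconds, over `repeat` timed runs."""
    return min(timeit.repeat(stmt, setup=SETUP, repeat=repeat, number=NUMBER))

and_time = best_time("if success and can_test: pass")
or_time = best_time("if success or can_test: pass")

print(f'"and": {and_time / NUMBER * 1e9:.1f} ns per execution')
print(f'"or":  {or_time / NUMBER * 1e9:.1f} ns per execution')
```

Note that timeit places the setup code inside the timed function, so success and can_test are local variables here rather than module globals as in the code above. Absolute numbers will therefore differ from the earlier runs, and the gap between the two statements can be small enough to disappear into noise on some machines.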

Alternatively, we can use a fine-scale ruler, dt.perf_counter(), which can measure the size of each dust particle much more precisely, as shown below:

import time as dt

success = True
can_test = True

time = 0
for i in range(10000000):
    start = dt.perf_counter()
    if success and can_test:  # a dust particle
        stop = dt.perf_counter()
        time += stop - start  # measured by a fine-scale ruler
print(f'"and" operation took: {time} seconds')

time = 0
for i in range(10000000):
    start = dt.perf_counter()
    if success or can_test:  # a dust particle
        stop = dt.perf_counter()
        time += stop - start  # measured by a fine-scale ruler
print(f'"or" operation took: {time} seconds')

It prints something like the following:

"and" operation took: 1.6929048989996773 seconds
"or" operation took: 1.3965214280016083 seconds

Of course, True or True is faster than True and True!
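The reason or can win here is short-circuit evaluation: when the left operand is truthy, or returns it without evaluating the right operand, whereas and has to go on and evaluate the right one. A small sketch makes that visible by recording which operands are actually evaluated. The helper operand and the list evaluated are illustrative names only.

```python
evaluated = []

def operand(name, value):
    """Record that this operand was evaluated, then return its value."""
    evaluated.append(name)
    return value

evaluated.clear()
and_result = operand("left", True) and operand("right", True)
and_trace = list(evaluated)

evaluated.clear()
or_result = operand("left", True) or operand("right", True)
or_trace = list(evaluated)

print("and evaluated:", and_trace)  # ['left', 'right']
print("or evaluated: ", or_trace)   # ['left']
```

With both variables set to True, as in the question, the or test does strictly less work per iteration than the and test, which matches the aggregate and perf_counter() timings.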