Timing the short-circuit in Python gives unexpected results

Question

import time as dt 

success = True
can_test = True 


time = 0

for i in range(10000000):
  start = dt.time()
  if success and can_test:
    stop = dt.time()
    time+= stop-start


print(f'"and" operation took: {time} seconds')


time = 0

for i in range(10000000):
  start = dt.time()
  if success or can_test:
    stop = dt.time()
    time += stop-start


print(f'"or" operation took: {time} seconds')

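When I run the Python program above, I expect the "and" operation to be slower than the "or" operation (because, as I understand it, short-circuiting reduces execution time). However, the result is not only exactly the opposite, it also fluctuates. I can understand the fluctuation (background processes), but why is the result reversed? What is happening?

Here is a sample result.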


"and" operation took: 5.200342893600464 seconds
"or" operation took: 5.3243467807769775 seconds

Answer 1

This is an interesting question, so I decided to dig deeper into your main concern.

# required modules: line_profiler, matplotlib, seaborn and scipy
import time as dt 
from line_profiler import LineProfiler
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

success = True
can_test = True 
def and_op():
    for x in range(2000):
        s = success and can_test
def or_op():
    for x in range(2000):
        s = success or can_test
or_op_list = []
for x in range(0,1000):
    lp = LineProfiler()
    lp_wrapper = lp(or_op)
    lp_wrapper()
    lstats = lp.get_stats()
    total_time = 0
    for v in lstats.timings.values():
        for op in v:
            total_time += op[-1]
            final = op[-1]
        operator = final/total_time
    or_op_list.append(operator)

and_op_list = []
for x in range(0,1000):
    lp = LineProfiler()
    lp_wrapper = lp(and_op)
    lp_wrapper()
    lstats = lp.get_stats()
    total_time = 0
    for v in lstats.timings.values():
        for op in v:
            total_time += op[-1]
            final = op[-1]
        operator = final/total_time
    and_op_list.append(operator)
sns.kdeplot(and_op_list, label = 'AND')
sns.kdeplot(or_op_list, label = 'OR')
plt.show()
print(stats.ttest_ind(and_op_list,or_op_list, equal_var = False))

p-value = 1.8293386245013954e-103

Indeed, the "or" operation differs from the "and" operation, and the difference is statistically significant.
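
For readers without scipy: passing equal_var=False to stats.ttest_ind selects Welch's t-test, which does not assume equal variances. Below is a minimal standard-library sketch of the statistic itself. welch_t is a hypothetical helper, and the sample values are invented for illustration, not measured data. It does not compute the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical timing fractions, for illustration only (not measured data).
and_sample = [0.52, 0.50, 0.53, 0.51, 0.54]
or_sample = [0.47, 0.49, 0.46, 0.48, 0.45]

t_statistic = welch_t(and_sample, or_sample)
print(f"Welch t statistic: {t_statistic:.3f}")
```

A large absolute value of the statistic means the two sample means are far apart relative to their spread.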

Answer 2

When I run your code on my machine, it also sometimes prints that True and True is faster than True or True.
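
For context, the expectation that "or" should win rests on short-circuit evaluation: with both operands true, "and" has to evaluate both of them, while "or" can stop after the first. A small sketch that records which operands are actually evaluated; probe and evaluated are hypothetical helpers introduced only for this illustration:

```python
calls = []


def probe(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value


def evaluated(expression):
    """Run a zero-argument callable and return the operands it evaluated."""
    calls.clear()
    expression()
    return list(calls)


and_calls = evaluated(lambda: probe("success", True) and probe("can_test", True))
or_calls = evaluated(lambda: probe("success", True) or probe("can_test", True))

print(and_calls)  # ['success', 'can_test']
print(or_calls)   # ['success']
```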

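The reason this happens is that dt.time() in your code ticks on a microsecond scale (i.e. 1000 nanoseconds), and that scale is too coarse to measure the time taken by each individual execution of if success and can_test: or if success or can_test:. In most cases, one execution of either statement takes less than 1 microsecond.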
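
Clock resolution depends on the platform, so it is worth checking on your own machine. A minimal sketch using the standard library's time.get_clock_info(); smallest_step is a hypothetical helper added here to observe the smallest step each clock actually takes:

```python
import time

# Resolution and properties as reported by the platform.
clock_info = {name: time.get_clock_info(name) for name in ("time", "perf_counter")}
for name, info in clock_info.items():
    print(f"{name}: resolution={info.resolution} s, monotonic={info.monotonic}")


def smallest_step(clock_ns, samples=100_000):
    """Return the smallest positive difference between consecutive readings.

    Returns None if the clock never advanced during the sampling window.
    """
    smallest = None
    previous = clock_ns()
    for _ in range(samples):
        current = clock_ns()
        step = current - previous
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        previous = current
    return smallest


print("time_ns smallest step:", smallest_step(time.time_ns), "ns")
print("perf_counter_ns smallest step:", smallest_step(time.perf_counter_ns), "ns")
```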

So, in the following part of your code:

for i in range(10000000):
    start = dt.time()
    if success and can_test:  # a dust particle
        stop = dt.time()
        time += stop - start  # measured by a normal scale ruler

for i in range(10000000):
    start = dt.time()
    if success or can_test:  # a dust particle
        stop = dt.time()
        time += stop - start  # measured by a normal scale ruler

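What your code does is like measuring each dust particle with an ordinary ruler and adding up the measurements. Because the error of each measurement is large, the result is distorted.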

To investigate further, suppose we run the code below (d records each measured time and how often it occurred):

import time as dt
from pprint import pprint

success = True
can_test = True

time = 0
d = {}
for i in range(10000000):
    start = dt.time_ns()
    if success and can_test:  # a dust particle
        stop = dt.time_ns()
        diff_time = stop - start  # measurement by a normal scale ruler
        d[diff_time] = d.get(diff_time, 0) + 1
        time += diff_time
print(f'"and" operation took: {time} ns')
print('"and" operation time distribution:')
pprint(d)
print()

time = 0
d = {}
for i in range(10000000):
    start = dt.time_ns()
    if success or can_test:  # a dust particle
        stop = dt.time_ns()
        diff_time = stop - start  # measurement by a normal scale ruler
        d[diff_time] = d.get(diff_time, 0) + 1
        time += diff_time
print(f'"or" operation took: {time} ns')
print('"or" operation time distribution:')
pprint(d)

It prints the following:

"and" operation took: 1467442000 ns
"and" operation time distribution:
{0: 8565832,
 1000: 1432066,
 2000: 136,
 3000: 24,
 4000: 12,
 5000: 15,
 6000: 10,
 7000: 12,
 8000: 6,
 9000: 7,
 10000: 6,
 11000: 3,
 12000: 191,
 13000: 722,
 14000: 170,
 15000: 462,
 16000: 23,
 17000: 30,
 18000: 27,
 19000: 10,
 20000: 12,
 21000: 11,
 22000: 61,
 23000: 65,
 24000: 9,
 25000: 2,
 26000: 2,
 27000: 3,
 28000: 1,
 29000: 4,
 30000: 4,
 31000: 2,
 32000: 2,
 33000: 2,
 34000: 3,
 35000: 3,
 36000: 5,
 37000: 4,
 40000: 2,
 41000: 1,
 42000: 2,
 43000: 2,
 44000: 2,
 48000: 2,
 50000: 3,
 51000: 3,
 52000: 1,
 53000: 3,
 54000: 1,
 55000: 4,
 58000: 1,
 59000: 2,
 61000: 1,
 62000: 4,
 63000: 1,
 84000: 1,
 98000: 1,
 1035000: 1,
 1043000: 1,
 1608000: 1,
 1642000: 1}

"or" operation took: 1455555000 ns
"or" operation time distribution:
{0: 8569860,
 1000: 1428228,
 2000: 131,
 3000: 31,
 4000: 22,
 5000: 8,
 6000: 8,
 7000: 6,
 8000: 3,
 9000: 6,
 10000: 3,
 11000: 4,
 12000: 173,
 13000: 623,
 14000: 174,
 15000: 446,
 16000: 28,
 17000: 22,
 18000: 31,
 19000: 9,
 20000: 11,
 21000: 8,
 22000: 42,
 23000: 72,
 24000: 7,
 25000: 3,
 26000: 1,
 27000: 5,
 28000: 2,
 29000: 2,
 31000: 1,
 33000: 1,
 34000: 2,
 35000: 4,
 36000: 1,
 37000: 1,
 38000: 2,
 41000: 1,
 44000: 1,
 45000: 2,
 46000: 2,
 47000: 2,
 48000: 2,
 49000: 1,
 50000: 1,
 51000: 2,
 53000: 1,
 61000: 1,
 64000: 1,
 65000: 1,
 942000: 1}

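We can see that about 85.7% of the measurement attempts (8565832 / 10000000 = 0.8565832 and 8569860 / 10000000 = 0.8569860) failed, because they measured just 0 nanoseconds. About 14.3% of the attempts (1432066 / 10000000 = 0.1432066 and 1428228 / 10000000 = 0.1428228) measured 1000 nanoseconds. And, needless to say, the remaining attempts (less than 0.1%) also landed on multiples of 1000 nanoseconds. The microsecond scale is too coarse to measure the time taken by each execution.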
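
The 0-or-1000 pattern can be reproduced without any real clock. The sketch below is a simulation under assumed numbers (a 1000 ns tick, a 150 ns true duration, random gaps between measurements), none of which come from the measurements above. coarse_clock and simulate are hypothetical names.

```python
import random

TICK_NS = 1000  # assumed clock tick: 1 microsecond


def coarse_clock(true_ns):
    """Round a true timestamp down to the last tick, like a coarse clock."""
    return (true_ns // TICK_NS) * TICK_NS


def simulate(true_duration_ns, n, seed=0):
    """Time n operations one by one with the coarse clock."""
    rng = random.Random(seed)
    now = 0
    readings = {}
    total = 0
    for _ in range(n):
        now += rng.randrange(500)  # assumed random gap between measurements
        start = coarse_clock(now)
        now += true_duration_ns  # the operation itself
        stop = coarse_clock(now)
        diff = stop - start
        readings[diff] = readings.get(diff, 0) + 1
        total += diff
    return readings, total


readings, total = simulate(true_duration_ns=150, n=100_000)
print(readings)
print(f"summed readings: {total} ns, true total: {150 * 100_000} ns")
```

Every individual reading is either 0 or one full tick, so no single reading says anything about the assumed 150 ns duration.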

But we can still use the ordinary ruler: gather the dust particles together and measure the resulting dustball. So we can try the code below:

import time as dt

success = True
can_test = True

start = dt.time()
for i in range(10000000):  # getting together the dust particles
    if success and can_test:  # a dust particle
        pass
stop = dt.time()
time = stop - start  # measure the size of the dustball
print(f'"and" operation took: {time} seconds')

start = dt.time()
for i in range(10000000):  # getting together the dust particles
    if success or can_test:  # a dust particle
        pass
stop = dt.time()
time = stop - start  # measure the size of the dustball
print(f'"or" operation took: {time} seconds')

It prints the following:

"and" operation took: 0.6261420249938965 seconds
"or" operation took: 0.48876094818115234 seconds

Alternatively, we can use a fine-scale ruler, dt.perf_counter(), which can measure the size of each dust particle precisely, as shown below:

import time as dt

success = True
can_test = True

time = 0
for i in range(10000000):
    start = dt.perf_counter()
    if success and can_test:  # a dust particle
        stop = dt.perf_counter()
        time += stop - start  # measured by a fine-scale ruler
print(f'"and" operation took: {time} seconds')

time = 0
for i in range(10000000):
    start = dt.perf_counter()
    if success or can_test:  # a dust particle
        stop = dt.perf_counter()
        time += stop - start  # measured by a fine-scale ruler
print(f'"or" operation took: {time} seconds')

It prints the following:

"and" operation took: 1.6929048989996773 seconds
"or" operation took: 1.3965214280016083 seconds

Of course, True or True is faster than True and True!
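
The dustball approach is what the standard library's timeit module automates: it runs the statement many times inside one timed loop and uses time.perf_counter() as its default timer. A minimal sketch follows. The iteration counts are arbitrary choices made here, and taking the minimum over several repeats is a common way to reduce noise from background processes.

```python
import timeit

success = True
can_test = True

# Names the timed statements are allowed to see.
namespace = {"success": success, "can_test": can_test}
NUMBER = 200_000  # evaluations per timed loop (arbitrary)
REPEAT = 5        # number of timed loops (arbitrary)

and_times = timeit.repeat(
    "if success and can_test: pass", globals=namespace, number=NUMBER, repeat=REPEAT
)
or_times = timeit.repeat(
    "if success or can_test: pass", globals=namespace, number=NUMBER, repeat=REPEAT
)

# Report the best loop, converted to nanoseconds per evaluation.
print(f'"and": {min(and_times) / NUMBER * 1e9:.1f} ns per evaluation')
print(f'"or":  {min(or_times) / NUMBER * 1e9:.1f} ns per evaluation')
```

The difference between the two is small, so the figures will vary between machines and Python versions.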

python

time

short-circuiting