numpy.cos works significantly longer on certain numbers

Question

TLDR:

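numpy.cos() takes about 30% longer on particular numbers (exactly 24000.0, for example). Adding a small delta (+0.01) makes numpy.cos() take the usual time again.

I have no idea why.

I stumbled upon a strange problem while working with numpy. I was checking how caching works and accidentally plotted the wrong graph: how the time of numpy.cos(X) depends on X. Here is my modified code (copied from my Jupyter notebook):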

import numpy as np
import timeit
st = 'import numpy as np'
cmp = []
cmp_list = []
left = 0
right = 50000
step = 1000
# Loop for additional average smoothing
for _ in range(10):
    cmp_list = []
    # Calculate np.cos depending on its argument
    for i in range(left, right, step):
        s=(timeit.timeit('np.cos({})'.format(i), number=15000, setup=st))
        cmp_list.append(int(s*1000)/1000)
    cmp.append(cmp_list)

# Calculate average times
av=[np.average([cmp[i][j] for i in range(len(cmp))]) for j in range(len(cmp[0]))]

# Draw the graph
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
plt.plot(range(left, right, step), av, marker='.')
plt.show()

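The graph looks like this:

At first I thought it was just a random glitch. I re-ran my cells, but the result was nearly the same. So I started playing with the step parameter, the number of calculations, and the length of the averaging list. None of that had any effect on this number:

A closer look:

After that, range was no longer useful (it cannot step by floats), so I timed np.cos manually: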

print(timeit.timeit('np.cos({})'.format(24000.01),number=5000000,setup=st))
print(timeit.timeit('np.cos({})'.format(24000.00),number=5000000,setup=st))
print(timeit.timeit('np.cos({})'.format(23999.99),number=5000000,setup=st))

The result is:

3.4297256958670914
4.337243619374931
3.4064380447380245

np.cos() takes roughly 30% longer (4.34 s vs. 3.43 s here) to compute exactly 24000.00 than 24000.01!
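A lower-noise way to compare a few suspect arguments is to take the minimum over several timeit.repeat runs instead of a single measurement. The sketch below is only an illustration: it uses math.cos from the standard library as a stand-in for np.cos (an assumption, so that it runs without NumPy; the two may not behave identically), and the helper name best_time is hypothetical.

```python
import timeit


def best_time(x, number=100_000, repeat=5):
    """Return the fastest of `repeat` timings of cos(x), in seconds.

    Taking the minimum filters out most scheduling and cache noise.
    math.cos is an assumed stand-in for np.cos.
    """
    stmt = "cos({!r})".format(x)
    timings = timeit.repeat(stmt, setup="from math import cos",
                            number=number, repeat=repeat)
    return min(timings)


if __name__ == "__main__":
    # Compare the suspect value with its close neighbours.
    for x in (23999.99, 24000.0, 24000.01):
        print(x, best_time(x))
```

If 24000.0 stays slower than both neighbours across repeated runs, the difference is unlikely to be measurement noise.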

There is another strange number like this (somewhere around 500000, I don't remember exactly).

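I looked through the numpy documentation and its source code, and found nothing about this effect. I know that trigonometric functions use several algorithms depending on the magnitude of the value, the precision, and so on, but it puzzles me that exact numbers can take longer to compute.

Why does np.cos() have this strange effect? Is it some kind of processor side effect (since numpy.cos uses C functions that depend on the processor)? I have an Intel Core i5 and Ubuntu installed, if that helps anyone.

Edit 1: I tried to reproduce it on another machine with an AMD Ryzen 5. The results are just unstable. Here are the graphs for three sequential runs of the same code: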

import numpy as np
import timeit

s = 'import numpy as np'
times = []
x_ranges = np.arange(23999, 24001, 0.01)
for x in x_ranges:
    times.append(timeit.timeit('np.cos({})'.format(x), number=100000, setup=s))

# ---------------

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x_ranges, times)
plt.show()

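Well, there are some patterns (like a mostly consistent left part and an inconsistent right part), but it differs a lot from the runs on the Intel processor. It looks like this really is just a peculiarity of particular processors, and AMD behaves much more predictably in its indeterminacy :)

P.S. @WarrenWeckesser, thanks for the `np.arange` function. It is really useful, but as expected it changed nothing in the results.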

Answer 1

The slowness in computing the result for those special numbers may be related to exact rounding and the table maker's dilemma.

To illustrate, suppose you are making a table of the exponential function to 4 places. Then exp(1.626) = 5.0835. Should this be rounded to 5.083 or 5.084? If exp(1.626) is computed more carefully, it becomes 5.08350. And then 5.083500. And then 5.0835000. Since exp is transcendental, this could go on arbitrarily long before distinguishing whether exp(1.626) is 5.083500...0ddd or 5.0834999...9ddd.

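For this reason the IEEE standard does NOT require transcendental functions to be exactly rounded. Even so, it is possible that the implementation of the math.cos function is hit by this problem: it does its best to compute the most accurate result and only then figures out that the effect is not worth the effort.

To demonstrate that this is the case for some number X, one has to compute the value of math.cos(X) with high precision and check its binary representation. The representable part of the mantissa must be followed by one of the following patterns:

1 and a long run of 0's
0 and a long run of 1's (this case appears as the first one when the value is computed with lower precision than is required to accommodate all the 1's in the run)

Therefore, the probability that a number is a slow argument to a transcendental function is 1/2ⁿ, where n is the maximum length of the above pattern that the algorithm sees before it gives up trying to arrive at the exactly rounded result.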
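The 1/2ⁿ figure can be checked by brute force under two assumptions that are mine, not part of the statement above: the bits after the representable mantissa are uniformly random and independent, and n counts the run that follows the first extra bit. The sketch below enumerates every possible tail of n + 1 bits and counts the two hard patterns; the function name hard_fraction is hypothetical.

```python
from fractions import Fraction
from itertools import product


def hard_fraction(n):
    """Fraction of (n + 1)-bit tails that match a hard pattern.

    Hard patterns: a 1 followed by n zeros, or a 0 followed by n ones.
    Assumes all tails are equally likely.
    """
    patterns = {"1" + "0" * n, "0" + "1" * n}
    total = 0
    hits = 0
    for bits in product("01", repeat=n + 1):
        total += 1
        if "".join(bits) in patterns:
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, hard_fraction(n))
```

Two matching tails out of 2ⁿ⁺¹ give exactly 1/2ⁿ, so a run of 8 bits would be expected for about one argument in 256.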

Demonstration highlighting the representable part of the mantissa for the IEEE 754 double precision case (where the mantissa has 53 bits):

In [1]: from mpmath import mp

In [2]: import math

In [3]: def show_mantissa_bits(x, n, k):
   ...:     print(bin(int(mp.floor(abs(x) * 2**n)))[2:])
   ...:     print('^'*k)
   ...:     

In [4]: mp.prec = 100

In [5]: show_mantissa_bits(mp.cos(108), 64, 53)
110000000100001011001011010000110111110010100011000011000000000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [6]: show_mantissa_bits(mp.cos(108.01), 64, 53)
101110111000000110001101110001000010100111000010101100000100110
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [7]: show_mantissa_bits(mp.cos(448), 64, 53)
101000101000100111000010111100001011111000001111110001000000000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [8]: show_mantissa_bits(mp.cos(448.01), 64, 53)
101001110110001010010100100000110001111100000001101110111010111
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [9]: show_mantissa_bits(mp.cos(495), 64, 53)
11001010100101110110001100110101010011110010000000000011111111
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [10]: show_mantissa_bits(mp.cos(495.01), 64, 53)
11010100100111100110000000011000110000001001101100010000001010
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [11]: show_mantissa_bits(mp.cos(24000), 64, 53)
11001000100000001100110111011101001101101101000000110011111111
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [12]: show_mantissa_bits(mp.cos(24000.01), 64, 53)
10111110011100111001010101100101110001011010101011001010110011
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
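The two patterns can also be checked mechanically on the bit strings printed above, instead of by eye. The following is a minimal sketch: the helper name tail_run_length is hypothetical, the strings are copied from the session output, and because that output is truncated at 2**64 the real runs may be longer than what is counted here.

```python
def tail_run_length(bits, k=53):
    """Length of the run that follows the first bit after the mantissa.

    bits[:k] is the representable part. The next bit decides the rounding
    direction; the run of opposite bits after it is what makes the
    rounding decision hard.
    """
    tail = bits[k:]
    if len(tail) < 2:
        return 0
    opposite = "1" if tail[0] == "0" else "0"
    run = 0
    for b in tail[1:]:
        if b != opposite:
            break
        run += 1
    return run


# Bit strings copied from the mpmath session output.
COS_108 = "110000000100001011001011010000110111110010100011000011000000000"
COS_24000 = "11001000100000001100110111011101001101101101000000110011111111"
COS_24000_01 = "10111110011100111001010101100101110001011010101011001010110011"

if __name__ == "__main__":
    print(tail_run_length(COS_108))       # 9: a 1 followed by nine 0's
    print(tail_run_length(COS_24000))     # 8: a 0 followed by eight 1's
    print(tail_run_length(COS_24000_01))  # 1: no long run
```

The slow arguments 108 and 24000 show long runs, while the fast neighbour 24000.01 does not.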
