使用 numpy 拟合多项式随 dtype 变化,即使实际数据值保持不变

Fitting a polynomial with numpy changes with dtype, even though actual data values remain the same

我有一个由 xdataydata 组成的数据集,我想对其进行多项式拟合,但由于某种原因,拟合结果取决于数据集的 dtype ,即使数据的实际值保持 不变。我知道如果您更改 dtype 例如从 floatint,可能会有一些信息丢失,但在这种情况下,我从 'f4' 转换到 'f8',因此没有信息丢失,这是为什么我不知所措。这是怎么回事?

import numpy as np
from numpy.polynomial import polynomial

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype='f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype='f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True)

c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True)

print(residuals1, residuals2, residuals3, residuals4)  # [] [195.86309188] [] [195.86309157]
print(a)        # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)        # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c[::-1])  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(d[::-1])  # [-8.7083541e-03   7.5099051e-02   3.1552398e+04 ]

我也只注意到这个问题,因为我也对残差值感兴趣,结果发现它们是空的,这导致我的程序崩溃。

这种不同的行为是由于 polynomial 中的 rcond,它依赖于精度:

    rcond : float, optional
        Relative condition number of the fit. Singular values smaller than
        this relative to the largest singular value will be ignored. The
        default value is len(x)*eps, where eps is the relative precision of
        the float type, about 2e-16 in most cases.

...

    # set rcond
    if rcond is None:
        rcond = len(x)*finfo(x.dtype).eps

将 32 位示例的 rcond 设置为适当较小的值将产生与 64 位示例相同的结果(例如 rcond=1e-7 或更小)。

差异的发生是由于 polyfit()rcond 隐藏参数对于 float32 和 float64 不同。这是近似的相对误差。对于 float32,其默认值约为 2e-7,对于 float64,其默认值约为 2e-16。如果您自己指定相同的 rcond 参数,那么您将得到相同的结果。

下面的代码使用 rcond 参数并使用 np.polyval 绘制图表以显示几乎相同的视觉结果。

Try it online!

import numpy as np
from numpy.polynomial import polynomial
import matplotlib.pyplot as plt

x32 = np.array([
    1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
    1895.7642, 1896.4286, 1896.5693, 1897.313,  1898.4648
], dtype = 'f4')

y32 = np.array([
    510.83655, 489.91592, 486.4508,  469.21814, 465.7902,  388.65576,
    385.37637, 369.07236, 365.8301,  349.7118,  327.4062
], dtype = 'f4')

x64 = x32.astype('f8')
y64 = y32.astype('f8')

rcond = 2e-7

a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True, rcond = rcond)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True, rcond = rcond)

c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True, rcond = rcond)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True, rcond = rcond)

print(residuals1, residuals2, residuals3, residuals4)  
# [] [195.86309188] [] [195.86309157]
print(a)  # [ 3.54575804e+00 -1.34738721e+04  1.28004924e+07]
print(b)  # [-8.70836523e-03  7.50419309e-02  3.15525483e+04]
print(c)  # [ 1.28004924e+07 -1.34738721e+04  3.54575804e+00]
print(d)  # [ 3.1552398e+04  7.5099051e-02 -8.7083541e-03]

plt.plot(x64, y64, label = 'orig')
plt.plot(x32, np.polyval(a, x32), label = 'x32_v0')
plt.plot(x64, np.polyval(b, x64), label = 'x64_v0')
plt.plot(x32, np.polyval(c[::-1], x32), label = 'x32_v1')
plt.plot(x64, np.polyval(d[::-1], x64), label = 'x64_v1')
plt.legend()
plt.show()