使用 numpy 拟合多项式随 dtype 变化,即使实际数据值保持不变
Fitting a polynomial with numpy changes with dtype, even though actual data values remain the same
我有一个由 xdata
和 ydata
组成的数据集,我想对其进行多项式拟合,但由于某种原因,拟合结果取决于数据集的 dtype
,即使数据的实际值保持 不变。我知道如果您更改 dtype
例如从 float
到 int
,可能会有一些信息丢失,但在这种情况下,我从 'f4'
转换到 'f8'
,因此没有信息丢失,这是为什么我不知所措。这是怎么回事?
import numpy as np
from numpy.polynomial import polynomial
x32 = np.array([
1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
1895.7642, 1896.4286, 1896.5693, 1897.313, 1898.4648
], dtype='f4')
y32 = np.array([
510.83655, 489.91592, 486.4508, 469.21814, 465.7902, 388.65576,
385.37637, 369.07236, 365.8301, 349.7118, 327.4062
], dtype='f4')
x64 = x32.astype('f8')
y64 = y32.astype('f8')
a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True)
c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True)
print(residuals1, residuals2, residuals3, residuals4) # [] [195.86309188] [] [195.86309157]
print(a) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(b) # [-8.70836523e-03 7.50419309e-02 3.15525483e+04]
print(c[::-1]) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(d[::-1]) # [-8.7083541e-03 7.5099051e-02 3.1552398e+04 ]
我也只注意到这个问题,因为我也对残差值感兴趣,结果发现它们是空的,这导致我的程序崩溃。
这种不同的行为是由于 polynomial
中的 rcond
,它依赖于精度:
rcond : float, optional
Relative condition number of the fit. Singular values smaller than
this relative to the largest singular value will be ignored. The
default value is len(x)*eps, where eps is the relative precision of
the float type, about 2e-16 in most cases.
...
# set rcond
if rcond is None:
rcond = len(x)*finfo(x.dtype).eps
将 32 位示例的 rcond
设置为适当较小的值将产生与 64 位示例相同的结果(例如 rcond=1e-7
或更小)。
差异的发生是由于 polyfit() 的 rcond
隐藏参数对于 float32 和 float64 不同。这是近似的相对误差。对于 float32,其默认值约为 2e-7,对于 float64,其默认值约为 2e-16。如果您自己指定相同的 rcond 参数,那么您将得到相同的结果。
下面的代码使用 rcond
参数并使用 np.polyval
绘制图表以显示几乎相同的视觉结果。
import numpy as np
from numpy.polynomial import polynomial
import matplotlib.pyplot as plt
x32 = np.array([
1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
1895.7642, 1896.4286, 1896.5693, 1897.313, 1898.4648
], dtype = 'f4')
y32 = np.array([
510.83655, 489.91592, 486.4508, 469.21814, 465.7902, 388.65576,
385.37637, 369.07236, 365.8301, 349.7118, 327.4062
], dtype = 'f4')
x64 = x32.astype('f8')
y64 = y32.astype('f8')
rcond = 2e-7
a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True, rcond = rcond)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True, rcond = rcond)
c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True, rcond = rcond)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True, rcond = rcond)
print(residuals1, residuals2, residuals3, residuals4)
# [] [195.86309188] [] [195.86309157]
print(a) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(b) # [-8.70836523e-03 7.50419309e-02 3.15525483e+04]
print(c) # [ 1.28004924e+07 -1.34738721e+04 3.54575804e+00]
print(d) # [ 3.1552398e+04 7.5099051e-02 -8.7083541e-03]
plt.plot(x64, y64, label = 'orig')
plt.plot(x32, np.polyval(a, x32), label = 'x32_v0')
plt.plot(x64, np.polyval(b, x64), label = 'x64_v0')
plt.plot(x32, np.polyval(c[::-1], x32), label = 'x32_v1')
plt.plot(x64, np.polyval(d[::-1], x64), label = 'x64_v1')
plt.legend()
plt.show()
我有一个由 xdata
和 ydata
组成的数据集,我想对其进行多项式拟合,但由于某种原因,拟合结果取决于数据集的 dtype
,即使数据的实际值保持 不变。我知道如果您更改 dtype
例如从 float
到 int
,可能会有一些信息丢失,但在这种情况下,我从 'f4'
转换到 'f8'
,因此没有信息丢失,这是为什么我不知所措。这是怎么回事?
import numpy as np
from numpy.polynomial import polynomial
x32 = np.array([
1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
1895.7642, 1896.4286, 1896.5693, 1897.313, 1898.4648
], dtype='f4')
y32 = np.array([
510.83655, 489.91592, 486.4508, 469.21814, 465.7902, 388.65576,
385.37637, 369.07236, 365.8301, 349.7118, 327.4062
], dtype='f4')
x64 = x32.astype('f8')
y64 = y32.astype('f8')
a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True)
c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True)
print(residuals1, residuals2, residuals3, residuals4) # [] [195.86309188] [] [195.86309157]
print(a) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(b) # [-8.70836523e-03 7.50419309e-02 3.15525483e+04]
print(c[::-1]) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(d[::-1]) # [-8.7083541e-03 7.5099051e-02 3.1552398e+04 ]
我也只注意到这个问题,因为我也对残差值感兴趣,结果发现它们是空的,这导致我的程序崩溃。
这种不同的行为是由于 polynomial
中的 rcond
,它依赖于精度:
rcond : float, optional
Relative condition number of the fit. Singular values smaller than
this relative to the largest singular value will be ignored. The
default value is len(x)*eps, where eps is the relative precision of
the float type, about 2e-16 in most cases.
...
# set rcond
if rcond is None:
rcond = len(x)*finfo(x.dtype).eps
将 32 位示例的 rcond
设置为适当较小的值将产生与 64 位示例相同的结果(例如 rcond=1e-7
或更小)。
差异的发生是由于 polyfit() 的 rcond
隐藏参数对于 float32 和 float64 不同。这是近似的相对误差。对于 float32,其默认值约为 2e-7,对于 float64,其默认值约为 2e-16。如果您自己指定相同的 rcond 参数,那么您将得到相同的结果。
下面的代码使用 rcond
参数并使用 np.polyval
绘制图表以显示几乎相同的视觉结果。
import numpy as np
from numpy.polynomial import polynomial
import matplotlib.pyplot as plt
x32 = np.array([
1892.8972, 1893.1168, 1893.1626, 1893.4313, 1893.4929, 1895.6392,
1895.7642, 1896.4286, 1896.5693, 1897.313, 1898.4648
], dtype = 'f4')
y32 = np.array([
510.83655, 489.91592, 486.4508, 469.21814, 465.7902, 388.65576,
385.37637, 369.07236, 365.8301, 349.7118, 327.4062
], dtype = 'f4')
x64 = x32.astype('f8')
y64 = y32.astype('f8')
rcond = 2e-7
a, residuals1, _, _, _ = np.polyfit(x32, y32, 2, full=True, rcond = rcond)
b, residuals2, _, _, _ = np.polyfit(x64, y64, 2, full=True, rcond = rcond)
c, (residuals3, _, _, _) = polynomial.polyfit(x32, y32, 2, full=True, rcond = rcond)
d, (residuals4, _, _, _) = polynomial.polyfit(x64, y64, 2, full=True, rcond = rcond)
print(residuals1, residuals2, residuals3, residuals4)
# [] [195.86309188] [] [195.86309157]
print(a) # [ 3.54575804e+00 -1.34738721e+04 1.28004924e+07]
print(b) # [-8.70836523e-03 7.50419309e-02 3.15525483e+04]
print(c) # [ 1.28004924e+07 -1.34738721e+04 3.54575804e+00]
print(d) # [ 3.1552398e+04 7.5099051e-02 -8.7083541e-03]
plt.plot(x64, y64, label = 'orig')
plt.plot(x32, np.polyval(a, x32), label = 'x32_v0')
plt.plot(x64, np.polyval(b, x64), label = 'x64_v0')
plt.plot(x32, np.polyval(c[::-1], x32), label = 'x32_v1')
plt.plot(x64, np.polyval(d[::-1], x64), label = 'x64_v1')
plt.legend()
plt.show()