When and how to use Polynomial.fit() as opposed to polyfit()?

Using Python 3.10.0 and NumPy 1.21.4.

I'm trying to understand why Polynomial.fit() calculates wildly different coefficient values from polyfit().

In the following code:

import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2).coef

    print(c1)
    print(c2)

if __name__ == '__main__':
    main()

c1 contains:

[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]

Plugged into a + bx + cx^2, these coefficients produce the line of best fit that I predicted. c2 contains:

[8443.4986422  2529.67242075 -872.88411679]

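Plugged into the same formula, these produce a very different line.

The documentation seems to imply that Polynomial.fit() is the new preferred way of calculating the line, but it keeps outputting the wrong coefficients (unless my understanding of polynomial regression is completely wrong).

If I am not using the functions correctly, what is the correct way of using them?

If I am using both functions correctly, why would I use Polynomial.fit() over polyfit(), as the documentation seems to imply I should?

According to the Polynomial.fit() documentation, it returns: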

A series that represents the least squares fit to the data and has the domain and window specified in the call. If the coefficients for the unscaled and unshifted basis polynomials are of interest, do new_series.convert().coef.

You can read at https://numpy.org/doc/stable/reference/routines.polynomials.html#transitioning-from-numpy-poly1d-to-numpy-polynomial that

coefficients are given in the scaled domain defined by the linear mapping between the window and domain. convert can be used to get the coefficients in the unscaled data domain.
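To make the quoted statement concrete, here is a minimal sketch of the domain-to-window mapping. It assumes a small synthetic data set (an exact quadratic, not the data from the question) and uses mapparms(), which returns the offset and scale of the linear map u = off + scl * x. The variable names are illustrative only.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as P

# Synthetic data, assumed for illustration: an exact quadratic in x.
x = np.linspace(3000.0, 7000.0, 21)
y = 2.0 + 0.5 * x - 1e-5 * x**2

p = Polynomial.fit(x, y, 2)

# mapparms() gives (off, scl) for the linear map u = off + scl * x
# that sends p.domain (here the x range) onto p.window (here [-1, 1]).
off, scl = p.mapparms()
u = off + scl * x

# p.coef holds coefficients in the scaled variable u, not in x.
scaled_eval = P.polyval(u, p.coef)

# convert() re-expresses the same polynomial in the unscaled variable x.
unscaled = p.convert().coef

print("domain:", p.domain, "window:", p.window)
print("coef in u:", p.coef)
print("coef in x:", unscaled)
print("matches p(x):", np.allclose(scaled_eval, p(x)))
print("matches polyfit:", np.allclose(unscaled, P.polyfit(x, y, 2)))
```

So the coefficients from .coef are not wrong; they describe the same curve, just written in terms of u rather than x.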

You can check this with:

import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2).convert().coef
    c3 = np.polynomial.polynomial.Polynomial.fit(x, y, 2, window=(x.min(), x.max())).coef

    print(c1)
    print(c2)
    print(c3)

if __name__ == '__main__':
    main()

# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]

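Probably the most important reason to use Polynomial.fit() is that it belongs to the numpy.polynomial API that current versions of NumPy support and recommend, while the older top-level np.polyfit is treated as legacy.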
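In practice you often do not need the raw coefficients at all. As a rough sketch (again on synthetic data that is assumed for illustration, not taken from the question), the object returned by Polynomial.fit() can be called directly on unscaled x values:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data, assumed for illustration: an exact quadratic in x.
x = np.linspace(3000.0, 7000.0, 21)
y = 2.0 + 0.5 * x - 1e-5 * x**2

p = Polynomial.fit(x, y, 2)

# The fitted series is callable on raw x values; the domain-to-window
# scaling is applied internally before the polynomial is evaluated.
y_hat = p(x)
y_at_5000 = p(5000.0)

print("max abs residual:", np.max(np.abs(y_hat - y)))
print("prediction at x = 5000:", y_at_5000)
```

Only call convert() when you actually need coefficients for a + bx + cx^2 in the original x.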

import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2, domain=[]).coef

    print(c1)
    print(c2)
main()

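You can also get the coefficients by passing an empty list to the domain keyword, as in the code above. This forces the class to use its default domain [-1, 1] and gives this output: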

[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
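The domain=[] and window=(x.min(), x.max()) variants both work for the same reason: they make the domain and the window equal, so the mapping between them is the identity. A small sketch, assuming synthetic data and illustrative variable names, that inspects the mapping for each variant:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data, assumed for illustration: an exact quadratic in x.
x = np.linspace(3000.0, 7000.0, 21)
y = 2.0 + 0.5 * x - 1e-5 * x**2

fits = {
    # Default: domain is the x range, window is [-1, 1], so x is rescaled.
    "default": Polynomial.fit(x, y, 2),
    # Class domain [-1, 1] with class window [-1, 1]: identity mapping.
    "domain=[]": Polynomial.fit(x, y, 2, domain=[]),
    # Window set to the x range, which is also the domain: identity mapping.
    "window=x range": Polynomial.fit(x, y, 2, window=[x.min(), x.max()]),
}

for name, p in fits.items():
    off, scl = p.mapparms()  # u = off + scl * x
    print(name, "domain:", p.domain, "window:", p.window,
          "off:", off, "scl:", scl)

reference = fits["default"].convert().coef
same_a = np.allclose(fits["domain=[]"].coef, reference)
same_b = np.allclose(fits["window=x range"].coef, reference)
print("unscaled coefficients agree:", same_a and same_b)
```

With an identity mapping the fit is done on the raw x values, so the numerical benefit of rescaling is lost; for this degree and range that appears harmless.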