具有相等数量的数据点和系数的多项式拟合

Question

我目前正在尝试使用 jupyter 进行多项式拟合。下面的函数 returns 给定 xs 中具有相应 ys 的数据点的 m 次最小二乘多项式。

from numpy import *
from matplotlib import pyplot as plt

def findas(m,xs,ys):
    A = array([[0]*(m+1)]*(m+1))
    b = array([0]*(m+1))
    for k in range(m+1):
        b[k] = sum(ys*xs**k)
        for i in range(m+1):
            A[k,i] = sum(xs**(k+i))
    coefs = linalg.solve(A,b)
    print(coefs)
    def fit(x):
        return sum(coefs*(x**array(range(len(coefs)))))
    return fit

假设我有以下六个数据点并拟合一个 5 次多项式：

xs = array([1,2,3,4,5,6])
ys = array([-5.21659 ,2.53152 ,2.05687 ,14.1135 ,20.9673 ,33.5652])
ft = findas(5,xs,ys)

根据我的理解，生成的曲线应该准确地通过每个数据点（事实上，拉格朗日多项式应该是结果）。

xdense = arange(1,6.1,0.1)
ydense = [ft(x) for x in xdense]   

plt.plot(xdense,ydense)
plt.plot(xs,ys,'rx')
plt.show()

示例输出：

然而，事实并非如此。曲线还差得很远！这里发生了什么？这与舍入误差有关吗？提前致谢！

Answer 1

您的代码看起来是正确的；你 re-discovered 试图用 finite-precision 算法反转 nearly-singular 矩阵的问题。矩阵 A 看起来像这样

[[       6       21       91      441     2275    12201]
 [      21       91      441     2275    12201    67171]
 [      91      441     2275    12201    67171   376761]
 [     441     2275    12201    67171   376761  2142595]
 [    2275    12201    67171   376761  2142595 12313161]
 [   12201    67171   376761  2142595 12313161 71340451]]

请注意最大值和最小值之间的差异有多大。它的特征值由

给出

[7.35326515e+07 1.98781177e+04 5.75757797e+01 1.74921341e+00
 5.89932892e-02 1.37532926e-04]

请注意，最大与最小之比约为 10^11。所以这个矩阵在理论上不是奇异的，但对于数值计算来说几乎是奇异的。它的反转会导致非常大的 round-off 错误，就像除以非常小的数字一样，最终结果的精度会大大降低。

更多详细信息here和相关链接

Answer 2

好像是截断错误！代码块

A = array([[0]*(m+1)]*(m+1))
b = array([0]*(m+1))
for k in range(m+1):
...

应改为：

A = array([[0.]*(m+1)]*(m+1))
b = array([0.]*(m+1))
for k in range(m+1):
...

即我们必须将零指定为浮点数。

此外，round-off 误差会在矩阵求逆过程中放大。当我们要求逆的矩阵的特征值在数量级上存在显着差异时，情况尤其如此。

具有相等数量的数据点和系数的多项式拟合

Polynomial fitting with equal number of data points and coefficients

python

regression

curve-fitting

data-fitting

polynomials