Fitting a curve in gnuplot: repeated fitting gives different results

I have been trying to fit the equation y = Ax^2 + Bx + C to the following data set (data.txt):

-9.39398e+09 1.52819e-19
-9.07008e+09 1.50337e-19
-8.74617e+09 1.44628e-19
-8.42227e+09 1.37837e-19
-8.09817e+09 1.31042e-19
-7.77427e+09 1.24624e-19
-7.45037e+09 1.18873e-19
-7.12646e+09 1.11213e-19
-6.80256e+09 1.00253e-19
-6.47865e+09 8.95713e-20
-6.15475e+09 7.92741e-20
-5.83066e+09 6.94736e-20
-5.50675e+09 6.02803e-20
-5.18285e+09 5.1667e-20
-4.85894e+09 4.37442e-20
-4.53504e+09 3.65056e-20
-4.21113e+09 3.00328e-20
-3.88723e+09 2.4281e-20
-3.56332e+09 1.93223e-20
-3.23923e+09 1.49467e-20
-2.91533e+09 1.13226e-20
-2.59142e+09 8.33933e-21
-2.26752e+09 5.93767e-21
-1.94361e+09 4.05992e-21
-1.61971e+09 2.64039e-21
-1.29581e+09 1.623e-21
-9.71713e+08 9.0523e-22
-6.47809e+08 4.05351e-22
-3.23904e+08 8.97219e-23
0 0
3.23904e+08 8.97219e-23
6.47809e+08 4.05351e-22
9.71713e+08 9.0523e-22
1.29581e+09 1.623e-21
1.61971e+09 2.64039e-21
1.94361e+09 4.05992e-21
2.26752e+09 5.93767e-21
2.59142e+09 8.33933e-21
2.91533e+09 1.13226e-20
3.23923e+09 1.49467e-20
3.56332e+09 1.93223e-20
3.88723e+09 2.4281e-20
4.21113e+09 3.00328e-20
4.53504e+09 3.65056e-20
4.85894e+09 4.37442e-20
5.18285e+09 5.1667e-20
5.50675e+09 6.02803e-20
5.83066e+09 6.94736e-20
6.15475e+09 7.92741e-20
6.47865e+09 8.95713e-20
6.80256e+09 1.00253e-19
7.12646e+09 1.11213e-19
7.45037e+09 1.18873e-19
7.77427e+09 1.24624e-19
8.09817e+09 1.31042e-19
8.42227e+09 1.37837e-19
8.74617e+09 1.44628e-19
9.07008e+09 1.50337e-19
9.39398e+09 1.52819e-19

In gnuplot I entered the command:

fit a*x**2 + b*x + c 'data.txt' via a, b, c

The result it printed was:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = -1.73185e-20     +/- 2.658e-11    (1.535e+11%)
b               = 1                +/- 0.1325       (13.25%)
c               = 1                +/- 1.076e+09    (1.076e+11%)

correlation matrix of the fit parameters:
                a      b      c      
a               1.000 
b              -0.000  1.000 
c              -0.739  0.000  1.000 

When I plotted the curve using these values of a, b, and c, it did not match the data at all.

So I issued the command again:

fit a*x**2 + b*x + c 'data.txt' via a, b, c

and gnuplot printed the following output:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = -1.73185e-20     +/- 3.278e-21    (18.93%)
b               = -2.24969e-22     +/- 1.645e-11    (7.313e+12%)
c               = 1                +/- 0.1327       (13.27%)

correlation matrix of the fit parameters:
                a      b      c      
a               1.000 
b               0.000  1.000 
c              -0.739 -0.001  1.000

The values of a, b, and c changed, but the fit to the data was still unsatisfactory. So I ran the command again:

fit a*x**2 + b*x + c 'data.txt' via a, b, c

This time it printed the following result:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 1.96019e-39      +/- 5.98e-33     (3.051e+08%)
b               = -2.24969e-22     +/- 2.98e-23     (13.25%)
c               = -1.11942e-21     +/- 2.421e-13    (2.162e+10%)

correlation matrix of the fit parameters:
                a      b      c      
a               1.000 
b               0.000  1.000 
c              -0.739 -0.000  1.000 

This did not give a satisfactory fit either.

I then repeated the fit command once more and got the following:

After 4 iterations the fit converged.
final sum of squares of residuals : 2.4063e-39
abs. change during last iteration : -2.64182e-48


Hmmmm.... Sum of squared residuals is zero. Can't compute errors.

Final set of parameters 
======================= 

a               = 1.96019e-39    
b               = 2.01689e-41    
c               = -1.11942e-21

Now the values of a, b, and c fit the data very well.

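My questions are:

  1. Why did the values of a, b, and c from the first, second, and third attempts fit the data so poorly?

  2. Can the values of a, b, and c computed in the last attempt be used?

  3. In the output of the last attempt, should I be worried about the message "Hmmmm.... Sum of squared residuals is zero. Can't compute errors."?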

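I think the numerical range of your values is too wide, and I guess that some of the computations (the residuals, for example) go wrong because of truncation errors.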
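A rough way to see how wide the range is: the sketch below evaluates the model at the largest x in the data with a = b = c = 1. Those starting values are an assumption; 1.0 is what gnuplot assigns to fit variables that have not been defined yet, and the b = 1 and c = 1 left over after the first run are consistent with it. The names `f`, `x_max`, and `y_max` are introduced here only for illustration.

```gnuplot
# Sketch: compare the model at the assumed starting values with the data.
a = 1.0; b = 1.0; c = 1.0          # assumed starting point (gnuplot default)
f(x) = a*x**2 + b*x + c
x_max = 9.39398e+09                # largest x in data.txt
y_max = 1.52819e-19                # y at that x in data.txt
print "model at x_max: ", f(x_max)
print "data  at x_max: ", y_max
print "ratio:          ", f(x_max)/y_max
```

The model value is dominated by x_max squared (order 1e19), while the data value is of order 1e-19, so the two are almost 39 orders of magnitude apart at the start of the fit.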

If you normalize your data by multiplying the first column by 10^-9 and the second column by 10^18:

-9.39398 0.152819
-9.07008 0.150337
-8.74617 0.144628
-8.42227 0.137837
-8.09817 0.131042
-7.77427 0.124624
-7.45037 0.118873
-7.12646 0.111213
-6.80256 0.100253
-6.47865 0.0895713
-6.15475 0.0792741
-5.83066 0.0694736
-5.50675 0.0602803
-5.18285 0.051667
-4.85894 0.0437442
-4.53504 0.0365056
-4.21113 0.0300328
-3.88723 0.024281
-3.56332 0.0193223
-3.23923 0.0149467
-2.91533 0.0113226
-2.59142 0.00833933
-2.26752 0.00593767
-1.94361 0.00405992
-1.61971 0.00264039
-1.295810 0.001623
-0.971713 0.00090523
-0.647809 0.000405351
-0.323904 0.0000897219
0 0
0.323904 0.0000897219
0.647809 0.000405351
0.971713 0.00090523
1.295810 0.001623
1.61971 0.00264039
1.94361 0.00405992
2.26752 0.00593767
2.59142 0.00833933
2.91533 0.0113226
3.23923 0.0149467
3.56332 0.0193223
3.88723 0.024281
4.21113 0.0300328
4.53504 0.0365056
4.85894 0.0437442
5.18285 0.051667
5.50675 0.0602803
5.83066 0.0694736
6.15475 0.0792741
6.47865 0.0895713
6.80256 0.100253
7.12646 0.111213
7.45037 0.118873
7.77427 0.124624
8.09817 0.131042
8.42227 0.137837
8.74617 0.144628
9.07008 0.150337
9.39398 0.152819

then

gnuplot> fit a*x**2 + b*x + c 'data.txt' via a, b, c
iter      chisq       delta/lim  lambda   a             b             c            
   0 2.4049966612e-03   0.00e+00  4.62e-02    1.960970e-03   6.941376e-15  -1.162140e-03
   1 2.4049966612e-03  -1.79e-09  4.62e-03    1.960970e-03   6.941376e-15  -1.162140e-03
iter      chisq       delta/lim  lambda   a             b             c            

After 1 iterations the fit converged.
final sum of squares of residuals : 0.002405
rel. change during last iteration : -1.78522e-14

degrees of freedom    (FIT_NDF)                        : 56
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.00655335
variance of residuals (reduced chisquare) = WSSR/ndf   : 4.29464e-05

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.00196097       +/- 3.136e-05    (1.599%)
b               = 6.94138e-15      +/- 0.0001416    (2.04e+12%)
c               = -0.00116214      +/- 0.00128      (110.1%)

correlation matrix of the fit parameters:
                a      b      c      
a               1.000 
b               0.001  1.000 
c              -0.745 -0.001  1.000 

the fit is stable.
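The same rescaling can probably be done without editing the file, by scaling the columns inside a `using` expression. This is a minimal sketch, assuming 'data.txt' here is the original, unscaled file; the function name `F` and the starting values are illustrative choices, not part of the original answer.

```gnuplot
# Sketch: fit the rescaled data on the fly.
# X = 1e-9 * x and Y = 1e18 * y, the same normalization as above.
F(X) = A*X**2 + B*X + C
A = 1.0; B = 1.0; C = 1.0          # assumed starting values; adjust if needed
fit F(x) 'data.txt' using ($1*1e-9):($2*1e18) via A, B, C
```

The fitted A, B, C then refer to the scaled variables X and Y, not to the original x and y.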


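If A, B, C are the fitted parameters for the normalized data, you can recover the initial a, b, c parameters (those corresponding to the original data) as follows: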

a = A / (10^18 * 10^18) = A * 10^-36
b = B / (10^9 * 10^18) = B * 10^-27
c = C / 10^18 = C * 10^-18

Explanation/update2:

Let (xi, yi), i = 1, 2, ..., n be your initial data. You want to fit:

y ≈ a x^2 + b x + c

However, for your computer it is better to work with:

Y ≈ A X^2 + B X + C

where X = 10^-9 x and Y = 10^18 y.

To find a, b, c from A, B, C, substitute and identify the coefficients term by term:

Y ≈ A X^2 + B X + C => 10^18 y ≈ A 10^-18 x^2 + B 10^-9 x + C
                    =>       y ≈ A 10^-36 x^2 + B 10^-27 x + C 10^-18
      to be compared to      y ≈ a x^2        + b x        + c 

Therefore you have:

a = A * 10^-36
b = B * 10^-27
c = C * 10^-18
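As a sketch of this last step in gnuplot, the conversion can be applied to the A, B, C values reported in the fit output above and the result overlaid on the original data. It assumes 'data.txt' is the original, unscaled file; the plot titles are arbitrary.

```gnuplot
# Sketch: convert the scaled fit parameters back to the original units.
A = 0.00196097; B = 6.94138e-15; C = -0.00116214   # from the fit output above
a = A * 1e-36
b = B * 1e-27
c = C * 1e-18
print a, b, c
# Overlay the converted curve on the original (unscaled) data.
plot 'data.txt' using 1:2 with points title 'data', \
     a*x**2 + b*x + c with lines title 'rescaled fit'
```

With these numbers a comes out at 1.96097e-39 and c at -1.16214e-21, close to the values from the fourth run in the question.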

Is it OK now?