为什么 scipy.optimize.curve_fit 反复评估初始猜测(而且可能代价高昂)?
Why is scipy.optimize.curve_fit evaluating the initial guess repeatedly (and probably costly)?
当使用 curve_fit
时,模型函数被重复且毫无用处(并且可能代价高昂)评估 而没有 更改参数。为什么会这样?
考虑以下用于识别二次函数参数的示例:
from scipy.optimize import curve_fit
i=0
def f(x, a, b):
global i; i += 1
print('run: {:2}, p: {:<11.10}, {:<11.10}'.format(i, a, b))
return(x**a+b)
popt, pcov = curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5])
print('\npopt:', popt)
输出:
run: 1, p: 1.5 , 0.5
run: 2, p: 1.5 , 0.5
run: 3, p: 1.5 , 0.5
run: 4, p: 1.500000022, 0.5
run: 5, p: 1.5 , 0.5000000075
run: 6, p: 2.166073038, 0.9668143807
run: 7, p: 2.166073071, 0.9668143807
run: 8, p: 2.166073038, 0.9668143951
run: 9, p: 2.014374939, 0.9956632744
run: 10, p: 2.014374969, 0.9956632744
run: 11, p: 2.014374939, 0.9956632892
run: 12, p: 2.000113621, 0.9999686478
run: 13, p: 2.000113651, 0.9999686478
run: 14, p: 2.000113621, 0.9999686627
run: 15, p: 2.000000007, 0.999999998
run: 16, p: 2.000000037, 0.999999998
run: 17, p: 2.000000007, 1.000000013
run: 18, p: 2.0 , 1.0
popt: [2. 1.]
第一次评估计算初始猜测的值,但随后在第二次和第三次评估中再次计算 运行。只有在第四次和第五次评估中,优化才通过计算导数开始。如果函数评估成本高昂并且可以接受一些较大的容差,则这些冗余评估可能会占用大量时间。
WIth full_output(参见 leastsq
文档):
In [125]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],full_output=True)
run: 1, p: 1.5 , 0.5
run: 2, p: 1.5 , 0.5
run: 3, p: 1.5 , 0.5
run: 4, p: 1.500000022, 0.5
run: 5, p: 1.5 , 0.5000000075
run: 6, p: 2.166073038, 0.9668143807
run: 7, p: 2.166073071, 0.9668143807
run: 8, p: 2.166073038, 0.9668143951
run: 9, p: 2.014374939, 0.9956632744
run: 10, p: 2.014374969, 0.9956632744
run: 11, p: 2.014374939, 0.9956632892
run: 12, p: 2.000113621, 0.9999686478
run: 13, p: 2.000113651, 0.9999686478
run: 14, p: 2.000113621, 0.9999686627
run: 15, p: 2.000000007, 0.999999998
run: 16, p: 2.000000037, 0.999999998
run: 17, p: 2.000000007, 1.000000013
run: 18, p: 2.0 , 1.0
Out[125]:
(array([2., 1.]), array([[ 0., -0.],
[-0., 0.]]), {'fvec': array([0., 0., 0., 0.]),
'nfev': 16,
'fjac': array([[-10.2688908 , 0. , 0.26999885, 0.96286064],
[ -1.2328595 , -1.5748198 , 0.2521752 , -0.73019944]]),
'ipvt': array([1, 2], dtype=int32),
'qtf': array([-7.08708997e-08, 3.07498129e-09])}, 'The relative error between two consecutive iterates is at most 0.000000', 2)
其中返回信息为:
infodict : dict
a dictionary of optional outputs with the keys:
``nfev``
The number of function calls
``fvec``
The function evaluated at the output
``fjac``
A permutation of the R matrix of a QR
factorization of the final approximate
Jacobian matrix, stored column wise.
Together with ipvt, the covariance of the
estimate can be approximated.
``ipvt``
An integer array of length N which defines
a permutation matrix, p, such that
fjac*p = q*r, where r is upper triangular
with diagonal elements of nonincreasing
magnitude. Column j of p is column ipvt(j)
of the identity matrix.
``qtf``
The vector (transpose(q) * fvec).
所以它声称评估了 16 次,而不是你的 18 次。所以一个或多个初始评估可能来自 curve_fit
参数检查(或类似的东西)。
使用其他方法:
In [134]: i=0
In [135]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='trf')
run: 1, p: 1.5 , 0.5
run: 2, p: 1.500000022, 0.5
run: 3, p: 1.5 , 0.5000000149
run: 4, p: 2.166073038, 0.9668143807
run: 5, p: 2.166073071, 0.9668143807
run: 6, p: 2.166073038, 0.9668143956
run: 7, p: 2.01437494 , 0.9956632758
run: 8, p: 2.01437497 , 0.9956632758
run: 9, p: 2.01437494 , 0.9956632907
run: 10, p: 2.000113621, 0.9999686478
run: 11, p: 2.000113651, 0.9999686478
run: 12, p: 2.000113621, 0.9999686627
run: 13, p: 2.000000007, 0.999999998
run: 14, p: 2.000000037, 0.999999998
run: 15, p: 2.000000007, 1.000000013
run: 16, p: 2.0 , 1.0
run: 17, p: 2.00000003 , 1.0
run: 18, p: 2.0 , 1.000000015
Out[135]:
(array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
[-1.19338002e-33, 9.94005329e-33]]))
和
In [136]: i=0
In [137]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='dogbox')
run: 1, p: 1.5 , 0.5
run: 2, p: 1.500000022, 0.5
run: 3, p: 1.5 , 0.5000000149
run: 4, p: 2.166073038, 0.9668143807
run: 5, p: 2.166073071, 0.9668143807
run: 6, p: 2.166073038, 0.9668143956
run: 7, p: 2.01437494 , 0.9956632758
run: 8, p: 2.01437497 , 0.9956632758
run: 9, p: 2.01437494 , 0.9956632907
run: 10, p: 2.000113621, 0.9999686478
run: 11, p: 2.000113651, 0.9999686478
run: 12, p: 2.000113621, 0.9999686627
run: 13, p: 2.000000007, 0.999999998
run: 14, p: 2.000000037, 0.999999998
run: 15, p: 2.000000007, 1.000000013
run: 16, p: 2.0 , 1.0
run: 17, p: 2.00000003 , 1.0
run: 18, p: 2.0 , 1.000000015
Out[137]:
(array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
[-1.19338002e-33, 9.94005329e-33]]))
当使用 curve_fit
时,模型函数被重复且毫无用处(并且可能代价高昂)评估 而没有 更改参数。为什么会这样?
考虑以下用于识别二次函数参数的示例:
from scipy.optimize import curve_fit
i=0
def f(x, a, b):
global i; i += 1
print('run: {:2}, p: {:<11.10}, {:<11.10}'.format(i, a, b))
return(x**a+b)
popt, pcov = curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5])
print('\npopt:', popt)
输出:
run: 1, p: 1.5 , 0.5
run: 2, p: 1.5 , 0.5
run: 3, p: 1.5 , 0.5
run: 4, p: 1.500000022, 0.5
run: 5, p: 1.5 , 0.5000000075
run: 6, p: 2.166073038, 0.9668143807
run: 7, p: 2.166073071, 0.9668143807
run: 8, p: 2.166073038, 0.9668143951
run: 9, p: 2.014374939, 0.9956632744
run: 10, p: 2.014374969, 0.9956632744
run: 11, p: 2.014374939, 0.9956632892
run: 12, p: 2.000113621, 0.9999686478
run: 13, p: 2.000113651, 0.9999686478
run: 14, p: 2.000113621, 0.9999686627
run: 15, p: 2.000000007, 0.999999998
run: 16, p: 2.000000037, 0.999999998
run: 17, p: 2.000000007, 1.000000013
run: 18, p: 2.0 , 1.0
popt: [2. 1.]
第一次评估计算初始猜测的值,但随后在第二次和第三次评估中再次计算 运行。只有在第四次和第五次评估中,优化才通过计算导数开始。如果函数评估成本高昂并且可以接受一些较大的容差,则这些冗余评估可能会占用大量时间。
WIth full_output(参见 leastsq
文档):
In [125]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],full_output=True)
run: 1, p: 1.5 , 0.5
run: 2, p: 1.5 , 0.5
run: 3, p: 1.5 , 0.5
run: 4, p: 1.500000022, 0.5
run: 5, p: 1.5 , 0.5000000075
run: 6, p: 2.166073038, 0.9668143807
run: 7, p: 2.166073071, 0.9668143807
run: 8, p: 2.166073038, 0.9668143951
run: 9, p: 2.014374939, 0.9956632744
run: 10, p: 2.014374969, 0.9956632744
run: 11, p: 2.014374939, 0.9956632892
run: 12, p: 2.000113621, 0.9999686478
run: 13, p: 2.000113651, 0.9999686478
run: 14, p: 2.000113621, 0.9999686627
run: 15, p: 2.000000007, 0.999999998
run: 16, p: 2.000000037, 0.999999998
run: 17, p: 2.000000007, 1.000000013
run: 18, p: 2.0 , 1.0
Out[125]:
(array([2., 1.]), array([[ 0., -0.],
[-0., 0.]]), {'fvec': array([0., 0., 0., 0.]),
'nfev': 16,
'fjac': array([[-10.2688908 , 0. , 0.26999885, 0.96286064],
[ -1.2328595 , -1.5748198 , 0.2521752 , -0.73019944]]),
'ipvt': array([1, 2], dtype=int32),
'qtf': array([-7.08708997e-08, 3.07498129e-09])}, 'The relative error between two consecutive iterates is at most 0.000000', 2)
其中返回信息为:
infodict : dict
a dictionary of optional outputs with the keys:
``nfev``
The number of function calls
``fvec``
The function evaluated at the output
``fjac``
A permutation of the R matrix of a QR
factorization of the final approximate
Jacobian matrix, stored column wise.
Together with ipvt, the covariance of the
estimate can be approximated.
``ipvt``
An integer array of length N which defines
a permutation matrix, p, such that
fjac*p = q*r, where r is upper triangular
with diagonal elements of nonincreasing
magnitude. Column j of p is column ipvt(j)
of the identity matrix.
``qtf``
The vector (transpose(q) * fvec).
所以它声称评估了 16 次,而不是你的 18 次。所以一个或多个初始评估可能来自 curve_fit
参数检查(或类似的东西)。
使用其他方法:
In [134]: i=0
In [135]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='trf')
run: 1, p: 1.5 , 0.5
run: 2, p: 1.500000022, 0.5
run: 3, p: 1.5 , 0.5000000149
run: 4, p: 2.166073038, 0.9668143807
run: 5, p: 2.166073071, 0.9668143807
run: 6, p: 2.166073038, 0.9668143956
run: 7, p: 2.01437494 , 0.9956632758
run: 8, p: 2.01437497 , 0.9956632758
run: 9, p: 2.01437494 , 0.9956632907
run: 10, p: 2.000113621, 0.9999686478
run: 11, p: 2.000113651, 0.9999686478
run: 12, p: 2.000113621, 0.9999686627
run: 13, p: 2.000000007, 0.999999998
run: 14, p: 2.000000037, 0.999999998
run: 15, p: 2.000000007, 1.000000013
run: 16, p: 2.0 , 1.0
run: 17, p: 2.00000003 , 1.0
run: 18, p: 2.0 , 1.000000015
Out[135]:
(array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
[-1.19338002e-33, 9.94005329e-33]]))
和
In [136]: i=0
In [137]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='dogbox')
run: 1, p: 1.5 , 0.5
run: 2, p: 1.500000022, 0.5
run: 3, p: 1.5 , 0.5000000149
run: 4, p: 2.166073038, 0.9668143807
run: 5, p: 2.166073071, 0.9668143807
run: 6, p: 2.166073038, 0.9668143956
run: 7, p: 2.01437494 , 0.9956632758
run: 8, p: 2.01437497 , 0.9956632758
run: 9, p: 2.01437494 , 0.9956632907
run: 10, p: 2.000113621, 0.9999686478
run: 11, p: 2.000113651, 0.9999686478
run: 12, p: 2.000113621, 0.9999686627
run: 13, p: 2.000000007, 0.999999998
run: 14, p: 2.000000037, 0.999999998
run: 15, p: 2.000000007, 1.000000013
run: 16, p: 2.0 , 1.0
run: 17, p: 2.00000003 , 1.0
run: 18, p: 2.0 , 1.000000015
Out[137]:
(array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
[-1.19338002e-33, 9.94005329e-33]]))