Wrong Exponential Power Plot - How to improve curve fit

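Unfortunately, the power fit with scipy does not return a good fit. I tried passing p0 as an input argument with values close to the expected ones, but that did not help.

I would be glad if someone could point out my problem.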

# Imports 
from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]

# Fill lists to store x and y value
x_data,y_data = [], []
for i in data:
    x_data.append(i[0])
    y_data.append(i[1])

# Power-law function (not an exponential): y = c * x**m
def func(x,m,c):
    return x**m * c

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1]

# Plot data and fitted curve
# Start the grid at the smallest x value: x**m is infinite at x = 0 when m < 0
x_function = np.linspace(min(x_data), 1.5, 100)
yfunction = func(x_function, m, c)
plt.scatter(x_data, y_data, s=30, marker="v")
plt.plot(x_function, yfunction, '-')
plt.show()

Another data set for which the fit is very poor is:

data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

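I might be missing something, but I think curve_fit works fine. When I compare the residuals obtained with curve_fit against the residuals obtained with the Excel parameters you provided in the comments, the Python results always give the lower residuals (code is provided below). You say "Unfortunately the power fit with scipy does not return a good fit." but what is your measure for a "good fit"? In terms of residuals, the Python fit seems to be consistently better than the Excel fit.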
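
To make the "measure of a good fit" question concrete, here is a minimal sketch that scores both parameter sets for the second data set in two ways: squared residuals on the original scale, and squared residuals after taking logs. The helper names `sse_linear` and `sse_log` and the starting guess `p0` are my own choices, not anything from the question. The Excel values are the ones quoted in this thread; they look like the result of a straight-line fit to log-transformed data, which I am assuming here rather than having checked in Excel.

```python
import numpy as np
from scipy.optimize import curve_fit

# Second data set from the question
data = np.array([[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]])
x, y = data[:, 0], data[:, 1]

def power_law(x, m, c):
    return c * np.power(x, m)

def sse_linear(params):
    """Sum of squared residuals on the original scale (hypothetical helper)."""
    return float(np.sum((y - power_law(x, *params)) ** 2))

def sse_log(params):
    """Sum of squared residuals after taking logs of data and model (hypothetical helper)."""
    return float(np.sum((np.log(y) - np.log(power_law(x, *params))) ** 2))

# p0 is an assumed starting guess, not taken from the question
scipy_params, _ = curve_fit(power_law, x, y, p0=(-1.0, 1.0))
excel_params = (-0.841, 1.0823)  # (m, c) as quoted from Excel in this thread

for name, params in [("curve_fit", scipy_params), ("excel", excel_params)]:
    print(name, "linear SSE:", sse_linear(params), "log SSE:", sse_log(params))
```

Each parameter set should come out ahead on the metric it was effectively optimized for, so which fit is "better" depends on whether absolute or relative deviations matter for your application.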

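I am not sure whether it has to be exactly this function, but if not, you could also consider adding a third parameter to your function (named "d" below), which would lead to better results.

Here is the modified code. I changed your "func" and increased the resolution of the plot. The residuals are then printed as well. For the first data set, I obtain about 79.35 for Excel and about 34.29 for Python. For the second data set, it is 15220.79 for Excel and 601.08 for Python (assuming I did not mess anything up).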

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]
#data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]
# Fill lists to store x and y value
x_data,y_data = [], []
for i in data:
    x_data.append(i[0])
    y_data.append(i[1])

# Power-law function (not an exponential): y = c * x**m
def func(x,m,c):
    #slightly rewritten; you could also consider using a third parameter d
    return c*np.power(x,m) #  + d

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1] #, coeff[2]
print(m, c) #, d

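# Plot function
a = plt.scatter(x_data, y_data, s=30, marker = "v")
# Start the grid at the smallest x value: x**m is infinite at x = 0 when m < 0
x_function = np.linspace(min(x_data), 1.5, 1000)
yfunction = c*np.power(x_function,m) # + d
plt.plot(x_function, yfunction, '-')
plt.show()
# Sum of squared residuals for the curve_fit parameters
print("residuals python:", ((y_data - func(x_data, *coeff))**2).sum())
# compare to excel, first data set (only meaningful while the first data set is active)
print("residuals excel:", ((y_data - func(x_data, -0.425,7.027))**2).sum())
# compare to excel, second data set (only meaningful while the second data set is active)
print("residuals excel:", ((y_data - func(x_data, -0.841,1.0823))**2).sum())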
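
The third parameter "d" mentioned above can be sketched as follows for the first data set. This is only an illustration: the names `func2` and `func3`, the starting guess for the two-parameter fit, and the `maxfev` value are my own assumptions. The three-parameter fit is started from the two-parameter solution with d = 0, so it begins at the same residual and can only improve on it.

```python
import numpy as np
from scipy.optimize import curve_fit

# First data set from the question
data = np.array([[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]])
x, y = data[:, 0], data[:, 1]

def func2(x, m, c):
    """Pure power law: c * x**m."""
    return c * np.power(x, m)

def func3(x, m, c, d):
    """Power law with a constant offset d."""
    return c * np.power(x, m) + d

# Two-parameter fit; p0 is an assumed starting guess
p2, _ = curve_fit(func2, x, y, p0=(-0.5, 1.0))

# Three-parameter fit, started from the two-parameter result with d = 0
p3, _ = curve_fit(func3, x, y, p0=(p2[0], p2[1], 0.0), maxfev=10000)

sse2 = float(np.sum((y - func2(x, *p2)) ** 2))
sse3 = float(np.sum((y - func3(x, *p3)) ** 2))
print("two parameters:  ", p2, "residuals:", sse2)
print("three parameters:", p3, "residuals:", sse3)
```

A lower residual with the extra parameter is expected simply because the model is more flexible, so it does not by itself show that the offset is physically meaningful.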

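Take your second data set as an example: if you plot the raw data, one difficulty becomes obvious: your data are very unevenly distributed. Now, since your function has a pure power-law form, it is easiest to do the fit on a log scale: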

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: plt.ion()

In [4]: data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

In [5]: data = np.asarray(data)   # just for convenience

In [6]: data.shape
Out[6]: (10, 2)

In [7]: x, y = data[:, 0], data[:, 1]

In [8]: lx, ly = np.log(x), np.log(y)

In [9]: plt.plot(lx, ly, 'ro')
Out[9]: [<matplotlib.lines.Line2D at 0x323a250>]

In [10]: def lfunc(x, a, b):
   ....:     return a*x + b
   ....: 

In [11]: from scipy.optimize import curve_fit

In [12]: opt, cov = curve_fit(lfunc, lx, ly)

In [13]: opt
Out[13]: array([-0.84071518,  0.07906558])

In [14]: plt.plot(lx, lfunc(lx, *opt), 'b-')
Out[14]: [<matplotlib.lines.Line2D at 0x3be0f90>]
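
To go from the straight-line fit back to the power-law parameters: since log(y) = a*log(x) + b, the model on the original scale is y = exp(b) * x**a, so m = a and c = exp(b). A minimal sketch of that conversion follows. It uses np.polyfit instead of curve_fit for the line fit (my substitution; for a straight line both solve the same least-squares problem), and the variable names are my own.

```python
import numpy as np

# Second data set from the question
data = np.array([[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]])
x, y = data[:, 0], data[:, 1]
lx, ly = np.log(x), np.log(y)

# Degree-1 polynomial fit in log space; polyfit returns [slope, intercept]
a, b = np.polyfit(lx, ly, 1)

# Back-transform to the power law y = c * x**m
m = a
c = np.exp(b)
print("m =", m, "c =", c)

# Fitted curve evaluated on the original scale
y_fit = c * np.power(x, m)
```

The resulting m and c should be close to the slope in Out[13] and to the Excel values quoted earlier in the thread.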

Whether this is the right model for the data is a separate question.