Exponential Fit: optimize.curve_fit and stats.expon.fit produce different results

Question

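I am trying to fit a histogram to an exponential distribution using two different methods, based on answers I have read here. I am interested in obtaining the inverse of the distribution's scale parameter.

Following the answer given here, I use the fit method of the scipy.stats.expon distribution.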

import glob
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(5, 1, sharex = True)
j = 0

for files in glob.glob("data_*"):

    time = []
    H = []

    # Each line holds two space-separated columns: time and the measured value
    with open(files, 'r') as f:
        for line in f:
            line = line.split(' ')
            time.append(float(line[0]))
            H.append(float(line[1]))

    # Maximum-likelihood fit with the location fixed at 0; P = (loc, scale)
    P  = ss.expon.fit(H, floc = 0)
    T  = np.linspace(0,200, 1000)
    rP = ss.expon.pdf(T, *P)

    # lambda is the inverse of the fitted scale
    ax[j].plot(T, rP, lw = 3.0)
    ax[j].hist(H,bins = 30, alpha = 0.6, label = r"$\lambda = $" + str(1/P[1]), density = True, stacked = True)
    ax[j].set_yticks([])
    ax[j].legend()

    j = j + 1

sns.despine(top = True, left = True, right = True)
plt.xlabel("Time")
plt.show()

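Doing this, I get the following plot:

The fit looks good, but I would like to know the uncertainty/error on the lambda value. The stats.expon documentation has no information on how to obtain it.

This question has already been asked here. The accepted answer suggested using curve_fit to fit the histogram instead. Therefore, following the tutorial here (https://riptutorial.com/scipy/example/31081/fitting-a-function-to-data-from-a-histogram), I tried using curve_fit. Here is the modified code (I inserted these lines in place of the scipy.stats.expon calls):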


    # Requires `from scipy.optimize import curve_fit` at the top of the script
    def func(x, a):
        return a*np.exp(-a*x)

    bins = np.linspace(0, 200, 201)
    data_entries, bins = np.histogram(np.array(H), bins = bins)
    binscenters = 0.5 * (bins[:-1] + bins[1:])
    popt, pcov = curve_fit(func, xdata = binscenters, ydata = data_entries)

    ax[j].plot(T, func(T, *popt))
    ax[j].hist(H, bins = 30, alpha = 0.6, label = r"$\lambda = $" + str(popt[0]), density = True, stacked = True)

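This fit produces results that are very different from stats.expon.fit, and it seems (at least qualitatively) to fit the data worse.

Am I using curve_fit incorrectly? I believe that, to some extent, curve_fit and expon.fit should produce the same result. Is there a way to get the error on the estimated lambda from expon.fit? I was thinking of computing the relative error between the mean of the data and the lambda estimated from the initial fit, but I don't know whether that is correct. Any hints would be greatly appreciated.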

Answer 1

I managed to solve my problem. It turns out I was missing density = True in the call to numpy.histogram.

The function

def func(x, a):
    return a*np.exp(-a*x)

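is the exponential PDF. Because my histogram data were not normalized (and therefore not a PDF), the fit using curve_fit was incorrect. With this modification, ss.expon.fit and curve_fit both yield the same lambda value.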
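For anyone who wants to check this, here is a minimal sketch of the comparison. It uses synthetic exponential samples in place of the data files, so the true rate of 0.05, the sample size, the bin layout, and the starting guess p0 are all assumptions made for illustration. It also prints two rough uncertainty estimates for lambda: the usual asymptotic standard error of the maximum-likelihood rate, lambda / sqrt(n), and the square root of the curve_fit covariance. The second one treats every bin as equally noisy, so it should be read as a rough figure only.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def expon_pdf(x, lam):
    # Exponential PDF with rate lam (scale = 1 / lam)
    return lam * np.exp(-lam * x)


# Synthetic stand-in for the measured data (assumption: true rate 0.05)
rng = np.random.default_rng(0)
true_lam = 0.05
samples = rng.exponential(scale=1 / true_lam, size=5000)

# Maximum-likelihood fit with the location fixed at 0
loc, scale = stats.expon.fit(samples, floc=0)
lam_mle = 1 / scale
# Asymptotic standard error of the MLE rate for i.i.d. exponential samples
lam_mle_err = lam_mle / np.sqrt(samples.size)

# Least-squares fit to a normalized histogram; density=True is the key part
bins = np.linspace(0, 200, 41)
density, edges = np.histogram(samples, bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
popt, pcov = curve_fit(expon_pdf, centers, density, p0=[0.01])
lam_ls = popt[0]
# Rough uncertainty from the fit covariance (assumes equal noise in every bin)
lam_ls_err = np.sqrt(pcov[0, 0])

print(f"expon.fit : lambda = {lam_mle:.4f} +/- {lam_mle_err:.4f}")
print(f"curve_fit : lambda = {lam_ls:.4f} +/- {lam_ls_err:.4f}")
```

The two estimates should agree to within their uncertainties but not digit for digit, since one maximizes the likelihood of the raw samples and the other minimizes squared residuals on binned data.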

python

curve-fitting

scipy

exponential