如何在 python 中使用 GEV 估计最大似然

Question

我正在尝试将广义极值 (GEV) 分布的概率密度函数 (pdf) 与数据的 pdf 相匹配。该直方图是 bin 的函数。随着这个bin的调整，函数拟合的结果也会发生变化。而curve_fit(func, x, y)恰好扮演了这个角色。但是这个函数使用了“最小二乘估计”。我想要的是使用最大似然估计（MLE）。而且配合 stats.genextreme.fit(data) 功能也有很好的效果。但是这个函数并不代表直方图形状根据bin变化。使用原始数据即可。

我正在尝试使用 MLE。我成功地使用 MLE 估计了标准正态分布的参数。但是，它是基于原始数据的，不会根据bin改变。连GEV的参数都无法用原始数据估算出来

我查看了genextreme_gen、rv_continuous等的源代码，但是，这段代码太复杂了。以我的Python技能无法接受源代码。

我想通过 MLE 估计 GEV 分布的参数。而我想得到的结果是estimate是根据bin变化的。

我该怎么办？

对不起我的英语不好，谢谢你的帮助。

+)

h = 0.5  # bin width
dat = h105[1]  # data
b = np.arange(min(dat)-h/2, max(dat), h)  # bin range
n, bins = np.histogram(dat, bins=b, density=True)  # histogram
x = 0.5*(bins[1:]+bins[:-1])  # x-value of histogram

popt,_ = curve_fit(fg, x, n)  # curve_fit(GEV's pdf, x-value of histogram, pdf value)
popt = -popt[0], popt[1], popt[2]  # estimated paramter (Least squares estimation, LSE)
x1 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a1 = stats.genextreme.pdf(x1, *popt)  # pdf

popt = stats.genextreme.fit(dat) # estimated parameter (Maximum likelihood estimation, MLE)
x2 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a2 = stats.genextreme.pdf(x2, *popt)

bin 宽度 = 2

bin 宽度 = 0.5

Answer 1

执行此操作的一种方法是将 bin 转换为数据。您可以通过计算每个 bin 中的数据点数量，然后重复 bin 的中心次数来做到这一点。

我也尝试从每个 bin 中采样统一的值，但是使用 bin 的中心然后重复它似乎提供了更高可能性的参数。

import scipy.stats as stats
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

ground_truth_params = (0.001, 0.5, 0.999)
count = 50

h = 0.2  # bin width
dat = stats.genextreme.rvs(*ground_truth_params, count)  # data
b = np.arange(np.min(dat)-h/2, np.max(dat), h)  # bin range
n, bins = np.histogram(dat, bins=b, density=True)  # histogram
bin_counts, _ = np.histogram(dat, bins=b, density=False)  # histogram
x = 0.5*(bins[1:]+bins[:-1])  # x-value of histogram
    
def flatten(l):
    return [item for sublist in l for item in sublist]

popt,_ = curve_fit(stats.genextreme.pdf, x, n, p0=[0,1,1])  # curve_fit(GEV's pdf, x-value of histogram, pdf value)
popt_lse = -popt[0], popt[1], popt[2]  # estimated paramter (Least squares estimation, LSE)
popt_mle = stats.genextreme.fit(dat) # estimated parameter (Maximum likelihood estimation, MLE)

uniform_dat_from_bins = flatten((np.linspace(x - h/2, x + h/2, n) for n, x in zip(bin_counts, x)))
popt_uniform_mle = stats.genextreme.fit(uniform_dat_from_bins) # estimated parameter (Maximum likelihood estimation, MLE)

centered_dat_from_bins = flatten(([x] * n for n, x in zip(bin_counts, x)))
popt_centered_mle = stats.genextreme.fit(centered_dat_from_bins) # estimated parameter (Maximum likelihood estimation, MLE)

plot_params = {
    ground_truth_params: 'tab:green',
    popt_lse: 'tab:red',
    popt_mle: 'tab:orange',
    popt_centered_mle: 'tab:blue',
    popt_uniform_mle: 'tab:purple'
}
param_names = ['GT', 'LSE', 'MLE', 'bin centered MLE', 'bin uniform MLE']

plt.figure(figsize=(10,5))

plt.bar(x, n, width=h, color='lightgrey')
plt.ylim(0, 0.5)
plt.xlim(-2,10)

for params, color in plot_params.items():
    x_pdf = np.linspace(-2, 10, 1000)
    y_pdf = stats.genextreme.pdf(x_pdf, *params) # the normal pdf
    plt.plot(x_pdf, y_pdf, label='pdf', color=color)
plt.legend(param_names)

plt.figure(figsize=(10,5))
for params, color in plot_params.items():
    plt.plot(np.sum(stats.genextreme.logpdf(dat, *params)), 'o', color=color)

此图显示了使用不同方法估计的 PDF 以及真实 PDF

下图显示了给定原始数据的估计参数的可能性。

MLE 对原始数据估计的 PDF 具有预期的最大值。然后跟随使用直方图 bin（居中和均匀）估计的 PDF。在他们之后是地面实况 PDF。最后是概率最低的 PDF，这是使用最小二乘法估计的。

如何在 python 中使用 GEV 估计最大似然

How to estimate maximum likelihood with GEV in python

python

data-fitting

probability-density

mle