Modelling and fitting bi-modal lognormal distributions in a loop using lmfit

I have spent far too long trying to solve this, so it is time to ask for help. I am trying to use lmfit to fit two lognormal distributions (a and c), plus the sum of the two (a+c), to a size distribution. Mode a is centred around x=0.2, y=1 and mode c is centred around x=1.2, y ≪ 1. There are many size distributions (>200), all slightly different, which are passed into the code below from an outer loop. For this example I have provided one real distribution and left out the loop. Hopefully my code is commented well enough to follow what I am trying to achieve.

I must be missing some fundamental understanding of lmfit (spoiler alert: I am also not great at maths), because I have two problems:

  1. The fits (a, c and a+c) do not represent the data accurately. Note how the fit (solid red line) deviates from the data (solid blue line). I assume this is related to the initial guess parameters; I have tried many, but have not been able to find a suitable set.
  2. Re-running the model with the "new" best-fit values (result2, result3) does not seem to improve the fit significantly at all. Why?
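To show what I mean in 2., here is a minimal toy sketch of the behaviour (plain NumPy with made-up data, not my real distributions or the lmfit model): a least-squares minimiser restarted from its own converged best-fit value simply stays there.

```python
import numpy as np

# Toy data (made up for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def newton_fit(a0, steps=20):
    """Minimise f(a) = sum((y - a*x)**2) with Newton steps from a0."""
    a = a0
    for _ in range(steps):
        grad = -2.0 * np.sum(x * (y - a * x))  # df/da
        hess = 2.0 * np.sum(x * x)             # d2f/da2 (constant here)
        a = a - grad / hess
    return a

a1 = newton_fit(0.0)   # first "fit"
a2 = newton_fit(a1)    # "re-fit" starting from the converged value
print(a1, a2)          # identical: restarting at the minimum stays put
```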

Example result using the x and y data provided:

Here is one I made earlier, showing the kind of fit I am after (generated with the older mpfit module, using different data from that provided below and with unique initial best-guess parameters, i.e. not in a loop). Apologies for the legend formatting; I had to remove certain information:

Any help is much appreciated. Here is the code with an example distribution:

from lmfit import models
import matplotlib.pyplot as plt
import numpy as np

# real life data example
y = np.array([1.000000, 0.754712, 0.610303, 0.527856, 0.412125, 0.329689, 0.255756, 0.184424, 0.136819,
              0.102316, 0.078763, 0.060896, 0.047118, 0.020297, 0.007714, 0.010202, 0.008710, 0.005579,
              0.004644, 0.004043, 0.002618, 0.001194, 0.001263, 0.001043, 0.000584, 0.000330, 0.000179,
              0.000117, 0.000050, 0.000035, 0.000017, 0.000007])

x = np.array([0.124980, 0.130042, 0.135712, 0.141490, 0.147659, 0.154711, 0.162421, 0.170855, 0.180262,
              0.191324, 0.203064, 0.215738, 0.232411, 0.261810, 0.320252, 0.360761, 0.448802, 0.482528,
              0.525526, 0.581518, 0.658988, 0.870114, 1.001815, 1.238899, 1.341285, 1.535134, 1.691963,
              1.973359, 2.285620, 2.572177, 2.900414, 3.342739])

# create the joint model using prefixes for each mode
model = (models.LognormalModel(prefix='p1_') +
         models.LognormalModel(prefix='p2_'))

# add some best guesses for the model parameters
params = model.make_params(p1_center=0.1, p1_sigma=2, p1_amplitude=1,
                           p2_center=1, p2_sigma=2, p2_amplitude=0.000000000000001)

# bound those best guesses
# params['p1_amplitude'].min = 0.0
# params['p1_amplitude'].max = 1e5
# params['p1_sigma'].min = 1.01
# params['p1_sigma'].max = 5
# params['p1_center'].min = 0.01
# params['p1_center'].max = 1.0
# 
# params['p2_amplitude'].min = 0.0
# params['p2_amplitude'].max = 1
# params['p2_sigma'].min = 1.01
# params['p2_sigma'].max = 10
# params['p2_center'].min = 1.0
# params['p2_center'].max = 3

# actually fit the model
result = model.fit(y, params, x=x)

# ====================================
# ================================
# re-run using the best-fit params derived above
params2 = model.make_params(p1_center=result.best_values['p1_center'], p1_sigma=result.best_values['p1_sigma'],
                            p1_amplitude=result.best_values['p1_amplitude'],
                            p2_center=result.best_values['p2_center'], p2_sigma=result.best_values['p2_sigma'],
                            p2_amplitude=result.best_values['p2_amplitude'], )
# re-fit the model
result2 = model.fit(y, params2, x=x)

# ================================
# re-run again using the best-fit params derived above
params3 = model.make_params(p1_center=result2.best_values['p1_center'], p1_sigma=result2.best_values['p1_sigma'],
                            p1_amplitude=result2.best_values['p1_amplitude'],
                            p2_center=result2.best_values['p2_center'], p2_sigma=result2.best_values['p2_sigma'],
                            p2_amplitude=result2.best_values['p2_amplitude'], )
# re-fit the model
result3 = model.fit(y, params3, x=x)

# ================================
# add individual fine and coarse modes using the revised fit parameters
model_a = models.LognormalModel()
params_a = model_a.make_params(center=result3.best_values['p1_center'], sigma=result3.best_values['p1_sigma'],
                               amplitude=result3.best_values['p1_amplitude'])
result_a = model_a.fit(y, params_a, x=x)

model_c = models.LognormalModel()
params_c = model_c.make_params(center=result3.best_values['p2_center'], sigma=result3.best_values['p2_sigma'],
                               amplitude=result3.best_values['p2_amplitude'])
result_c = model_c.fit(y, params_c, x=x)

# ====================================
plt.plot(x, y, 'b-', label='data')
plt.plot(x, result.best_fit, 'r-', label='best_fit_1')
plt.plot(x, result.init_fit, 'lightgrey', ls=':', label='ini_fit_1')
plt.plot(x, result2.best_fit, 'r--', label='best_fit_2')
plt.plot(x, result2.init_fit, 'lightgrey', ls='--', label='ini_fit_2')
plt.plot(x, result3.best_fit, 'r.-', label='best_fit_3')
plt.plot(x, result3.init_fit, 'lightgrey', ls='--', label='ini_fit_3')

plt.plot(x, result_a.best_fit, 'grey', ls=':', label='best_fit_a')
plt.plot(x, result_c.best_fit, 'grey', ls='--', label='best_fit_c')
plt.xscale("log")
plt.yscale("log")
plt.legend()
plt.show()

I can give three main pieces of advice:

  1. Initial values matter and should not be so far off that parts of the model are completely insensitive to the parameter values. Your initial model is off by several orders of magnitude.
  2. Always look at the fit results. These are the primary outcome; the plot of the fit is a representation of the actual numerical results. Not showing that you printed out the fit report is a good indication that you did not look at the actual results. Really, always look at the results.
  3. If you are judging the quality of the fit from a plot of the data and model, let how you choose to plot the data guide how you fit the data. Specifically in your case, if you are plotting on a log scale, then fit the log of the data to the log of the model: fit in "log space".

Such a fit might look like this:

from lmfit import models, Model
from lmfit.lineshapes import lognormal
import matplotlib.pyplot as plt
import numpy as np


y = np.array([1.000000, 0.754712, 0.610303, 0.527856, 0.412125, 0.329689, 0.255756, 0.184424, 0.136819,
              0.102316, 0.078763, 0.060896, 0.047118, 0.020297, 0.007714, 0.010202, 0.008710, 0.005579,
              0.004644, 0.004043, 0.002618, 0.001194, 0.001263, 0.001043, 0.000584, 0.000330, 0.000179,
              0.000117, 0.000050, 0.000035, 0.000017, 0.000007])

x = np.array([0.124980, 0.130042, 0.135712, 0.141490, 0.147659, 0.154711, 0.162421, 0.170855, 0.180262,
              0.191324, 0.203064, 0.215738, 0.232411, 0.261810, 0.320252, 0.360761, 0.448802, 0.482528,
              0.525526, 0.581518, 0.658988, 0.870114, 1.001815, 1.238899, 1.341285, 1.535134, 1.691963,
              1.973359, 2.285620, 2.572177, 2.900414, 3.342739])

# use a model that is the log of the sum of two log-normal functions
# note to be careful about log(x) for x < 0.
def log_lognormal(x, amp1, cen1, sig1, amp2, cen2, sig2):
    comp1 = lognormal(x, amp1, cen1, sig1)
    comp2 = lognormal(x, amp2, cen2, sig2)
    total = comp1 + comp2
    total[np.where(total < 1.e-99)] = 1.e-99
    return np.log(total)

model = Model(log_lognormal)
params = model.make_params(amp1=5.0, cen1=-4, sig1=1,
                           amp2=0.1, cen2=-1, sig2=1)

# part of making sure that the lognormals are strictly positive 
params['amp1'].min = 0
params['amp2'].min = 0

result = model.fit(np.log(y), params, x=x)
print(result.fit_report())      # <-- HERE IS WHERE THE RESULTS ARE!!

# also, make a plot of data and fit
plt.plot(x, y, 'b-', label='data')
plt.plot(x, np.exp(result.best_fit), 'r-', label='best_fit')
plt.plot(x, np.exp(result.init_fit), 'grey',  label='ini_fit')
plt.xscale("log")
plt.yscale("log")
plt.legend()
plt.show()

This will print out:

[[Model]]
    Model(log_lognormal)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 211
    # data points      = 32
    # variables        = 6
    chi-square         = 0.91190970
    reduced chi-square = 0.03507345
    Akaike info crit   = -101.854407
    Bayesian info crit = -93.0599914
[[Variables]]
    amp1:  21.3565856 +/- 193.951379 (908.16%) (init = 5)
    cen1: -4.40637490 +/- 3.81299642 (86.53%) (init = -4)
    sig1:  0.77286862 +/- 0.55925566 (72.36%) (init = 1)
    amp2:  0.00401804 +/- 7.5833e-04 (18.87%) (init = 0.1)
    cen2: -0.74055538 +/- 0.13043827 (17.61%) (init = -1)
    sig2:  0.64346873 +/- 0.04102122 (6.38%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp1, cen1) = -0.999
    C(cen1, sig1) = -0.999
    C(amp1, sig1) = 0.997
    C(cen2, sig2) = -0.964
    C(amp2, cen2) = -0.940
    C(amp2, sig2) = 0.849
    C(sig1, amp2) = -0.758
    C(cen1, amp2) = 0.740
    C(amp1, amp2) = -0.726
    C(sig1, cen2) = 0.687
    C(cen1, cen2) = -0.669
    C(amp1, cen2) = 0.655
    C(sig1, sig2) = -0.598
    C(cen1, sig2) = 0.581
    C(amp1, sig2) = -0.567

and produces a plot of the data, the best fit, and the initial fit on log-log axes.
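Finally, a note on the individual fine and coarse modes: instead of re-fitting single-mode models to the full data (as `result_a`/`result_c` do in the original code), the components can simply be evaluated at the best-fit parameters. A minimal sketch, re-implementing the lognormal lineshape with the same formula `lmfit.lineshapes.lognormal` uses and plugging in the values from the report above:

```python
import numpy as np

def lognormal(x, amplitude, center, sigma):
    # same definition as lmfit.lineshapes.lognormal
    return (amplitude / (x * sigma * np.sqrt(2.0 * np.pi))
            * np.exp(-(np.log(x) - center) ** 2 / (2.0 * sigma ** 2)))

# x grid spanning roughly the range of the data
x = np.logspace(np.log10(0.125), np.log10(3.35), 200)

# best-fit values copied from the fit report above
mode_a = lognormal(x, 21.3565856, -4.40637490, 0.77286862)  # fine mode
mode_c = lognormal(x, 0.00401804, -0.74055538, 0.64346873)  # coarse mode
total = mode_a + mode_c   # the bi-modal model itself
```

Plotting `mode_a`, `mode_c` and `total` on the same log-log axes then gives the separate fine/coarse curves without running the two extra single-mode fits.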