Python 拟合任意高斯函数 - 拟合不佳?
Python Fit Arbitrary Gaussian Functions - Bad Fits?
我正在尝试概括一些代码,以便能够在单个数据集中适应多个(n 从 1 到 >10)高斯 curves/peaks。
使用 Scipy 优化 Curve_fit 当我对 1-3 个高斯函数进行硬编码时,我可以得到非常好的拟合,并且我已经成功地生成了 运行 没有错误的函数对于泛化,任意数量的高斯。但是,输出拟合非常差。尽管给出了与用于生成 'raw' 数据的输入参数相同的输入参数,但这是最佳情况。
此外,在某些时候可能需要从简单的高斯修改特定函数的可能性不为零,但目前应该没问题。
下面是我的代码示例,输出图如下。
import numpy as np
import pandas as pd
import scipy
import scipy.optimize
import matplotlib.pyplot as plt
from matplotlib import gridspec
amp1 = 1
cen1 = 1
sigma1 = 0.05
df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])
def _ngaussian(x, amps,cens,sigmas):
fn = 0
if len(amps)== len(cens)== len(sigmas):
for i in range(len(amps)):
fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
(np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
else:
print('Your inputs have unequal lengths')
return fn
amps = [1,1.1,0.9]
cens = [1,2,1.7]
sigmas=[0.05]*3
popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)
# Optionally adding noise to the raw data
#noise = np.random.normal(0,0.1,len(df['peaks']))
#df['peaks'] = df['peaks']+noise
def wrapper_fit_func(x, *args):
N = len(args)
a, b, c = list(args[0][:N]),list(args[0][N:N*2]),list(args[0][2*N:3*N])
return _ngaussian(x, a, b, c)
def unwrapper_fit_func(x, *args):
N = int(len(args)/3)
a, b, c = list(args[:N]),list(args[N:N*2]),list(args[2*N:3*N])
return _ngaussian(x, a, b, c)
popt_fitpeaks, pcov_fitpeaks = scipy.optimize.curve_fit(lambda x, *popt_peaks: wrapper_fit_func(x, popt_peaks),
df.index, df['peaks'], p0=popt_peaks,
method='lm')
df['peaks_fit'] = unwrapper_fit_func(df.index, *popt_fitpeaks)
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(1,1)
ax1 = fig.add_subplot(gs[0])
ax1.set_xlim(0,3)
ax1.plot(df.index, df['peaks'], "b",label='ideal data')
ax1.plot(df.index, df['peaks_fit'], "g",label='fit data')
ax1.legend(loc='upper right')
如果您感兴趣,上下文是分析化学、核磁共振 (NMR) 和傅里叶变换离子回旋共振质谱 (FTICR MS) 信号处理。
您可能会发现 lmfit
(https://lmfit.github.io/lmfit-py/,披露:我是主要作者)对此很有用。它提供了一个易于使用的模型 class 用于建模数据,包括高斯、Voigt 和类似线形的内置模型,可以轻松比较模型函数。
可以添加(或叠加)Lmfit 模型来制作复合模型,从而轻松支持 1、2、3 等高斯分布,并包含不同的基线函数。上面的 link 中有文档和几个示例。对您的示例进行少量重写(包括添加一些噪音)可能如下所示:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
amp1 = 1
cen1 = 1
sigma1 = 0.05
df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])
def _ngaussian(x, amps,cens,sigmas):
fn = 0
if len(amps)== len(cens)== len(sigmas):
for i in range(len(amps)):
fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
(np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
fn = fn+np.random.normal(size=len(x), scale=0.05)
else:
print('Your inputs have unequal lengths')
return fn
amps = [1.30, 0.92, 2.11]
cens = [1.10, 1.73, 2.06]
sigmas=[0.05, 0.09, 0.07]
popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)
# create a model with 3 Gaussians: pretty easy to generalize
# to a loop to make N peaks
model = (GaussianModel(prefix='p1_') +
GaussianModel(prefix='p2_') +
GaussianModel(prefix='p3_') )
# create Parameters (named from function arguments). For
# Gaussian, Lorentzian, Voigt, etc these are "center", "amplitude", "sigma"
params = model.make_params(p1_center=1.0, p1_amplitude=2, p1_sigma=0.1,
p2_center=1.5, p2_amplitude=2, p2_sigma=0.1,
p3_center=2.0, p3_amplitude=2, p3_sigma=0.1)
# Parameters can have min/max bounds, be fixed (`.vary = False`)
# or constrained to a mathematical expression of other Parameter values
params['p1_center'].min = 0.8
params['p1_center'].max = 1.5
params['p2_center'].min = 1.1
params['p2_center'].max = 1.9
params['p3_center'].min = 1.88
params['p3_center'].max = 3.00
# run the fit
result = model.fit(df['peaks'], params, x=df.index)
# print out the fit results
print(result.fit_report())
# plot results
plt.plot(df.index, df['peaks'], 'o', label='data')
plt.plot(df.index, result.best_fit, '-', label='fit')
plt.legend()
plt.gca().set_xlim(0, 3)
plt.show()
这将生成如下拟合图:
并打印出
的报告
[[Model]]
((Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_')) + Model(gaussian, prefix='p3_'))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 102
# data points = 1000
# variables = 9
chi-square = 6.88439024
reduced chi-square = 0.00694691
Akaike info crit = -4960.49871
Bayesian info crit = -4916.32892
[[Variables]]
p1_amplitude: 1.29432022 +/- 0.00428720 (0.33%) (init = 2)
p1_center: 1.09993745 +/- 1.9012e-04 (0.02%) (init = 1)
p1_sigma: 0.04970776 +/- 1.9012e-04 (0.38%) (init = 0.1)
p2_amplitude: 0.91875183 +/- 0.00604913 (0.66%) (init = 2)
p2_center: 1.73039597 +/- 6.7594e-04 (0.04%) (init = 1.5)
p2_sigma: 0.09054027 +/- 7.0994e-04 (0.78%) (init = 0.1)
p3_amplitude: 2.10077395 +/- 0.00533617 (0.25%) (init = 2)
p3_center: 2.06019332 +/- 2.0105e-04 (0.01%) (init = 2)
p3_sigma: 0.06970239 +/- 2.0752e-04 (0.30%) (init = 0.1)
p1_fwhm: 0.11705282 +/- 4.4770e-04 (0.38%) == '2.3548200*p1_sigma'
p1_height: 10.3878975 +/- 0.03440799 (0.33%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
p2_fwhm: 0.21320604 +/- 0.00167179 (0.78%) == '2.3548200*p2_sigma'
p2_height: 4.04824243 +/- 0.02582408 (0.64%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
p3_fwhm: 0.16413657 +/- 4.8866e-04 (0.30%) == '2.3548200*p3_sigma'
p3_height: 12.0238006 +/- 0.02922330 (0.24%) == '0.3989423*p3_amplitude/max(2.220446049250313e-16, p3_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(p3_amplitude, p3_sigma) = 0.622
C(p2_amplitude, p2_sigma) = 0.621
C(p1_amplitude, p1_sigma) = 0.577
C(p2_sigma, p3_sigma) = -0.299
C(p2_sigma, p3_amplitude) = -0.271
C(p2_amplitude, p3_sigma) = -0.239
C(p2_sigma, p3_center) = 0.226
C(p2_amplitude, p3_amplitude) = -0.210
C(p2_center, p3_sigma) = -0.192
C(p2_amplitude, p3_center) = 0.171
C(p2_center, p3_amplitude) = -0.160
C(p2_center, p3_center) = 0.126
我正在尝试概括一些代码,以便能够在单个数据集中适应多个(n 从 1 到 >10)高斯 curves/peaks。
使用 Scipy 优化 Curve_fit 当我对 1-3 个高斯函数进行硬编码时,我可以得到非常好的拟合,并且我已经成功地生成了 运行 没有错误的函数对于泛化,任意数量的高斯。但是,输出拟合非常差。尽管给出了与用于生成 'raw' 数据的输入参数相同的输入参数,但这是最佳情况。
此外,在某些时候可能需要从简单的高斯修改特定函数的可能性不为零,但目前应该没问题。
下面是我的代码示例,输出图如下。
import numpy as np
import pandas as pd
import scipy
import scipy.optimize
import matplotlib.pyplot as plt
from matplotlib import gridspec
amp1 = 1
cen1 = 1
sigma1 = 0.05
df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])
def _ngaussian(x, amps,cens,sigmas):
fn = 0
if len(amps)== len(cens)== len(sigmas):
for i in range(len(amps)):
fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
(np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
else:
print('Your inputs have unequal lengths')
return fn
amps = [1,1.1,0.9]
cens = [1,2,1.7]
sigmas=[0.05]*3
popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)
# Optionally adding noise to the raw data
#noise = np.random.normal(0,0.1,len(df['peaks']))
#df['peaks'] = df['peaks']+noise
def wrapper_fit_func(x, *args):
N = len(args)
a, b, c = list(args[0][:N]),list(args[0][N:N*2]),list(args[0][2*N:3*N])
return _ngaussian(x, a, b, c)
def unwrapper_fit_func(x, *args):
N = int(len(args)/3)
a, b, c = list(args[:N]),list(args[N:N*2]),list(args[2*N:3*N])
return _ngaussian(x, a, b, c)
popt_fitpeaks, pcov_fitpeaks = scipy.optimize.curve_fit(lambda x, *popt_peaks: wrapper_fit_func(x, popt_peaks),
df.index, df['peaks'], p0=popt_peaks,
method='lm')
df['peaks_fit'] = unwrapper_fit_func(df.index, *popt_fitpeaks)
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(1,1)
ax1 = fig.add_subplot(gs[0])
ax1.set_xlim(0,3)
ax1.plot(df.index, df['peaks'], "b",label='ideal data')
ax1.plot(df.index, df['peaks_fit'], "g",label='fit data')
ax1.legend(loc='upper right')
如果您感兴趣,上下文是分析化学、核磁共振 (NMR) 和傅里叶变换离子回旋共振质谱 (FTICR MS) 信号处理。
您可能会发现 lmfit
(https://lmfit.github.io/lmfit-py/,披露:我是主要作者)对此很有用。它提供了一个易于使用的模型 class 用于建模数据,包括高斯、Voigt 和类似线形的内置模型,可以轻松比较模型函数。
可以添加(或叠加)Lmfit 模型来制作复合模型,从而轻松支持 1、2、3 等高斯分布,并包含不同的基线函数。上面的 link 中有文档和几个示例。对您的示例进行少量重写(包括添加一些噪音)可能如下所示:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
amp1 = 1
cen1 = 1
sigma1 = 0.05
df=pd.DataFrame(index=np.linspace(0,10,num=1000),columns=['int'])
def _ngaussian(x, amps,cens,sigmas):
fn = 0
if len(amps)== len(cens)== len(sigmas):
for i in range(len(amps)):
fn = fn+amps[i]*(1/(sigmas[i]*(np.sqrt(2*np.pi))))*\
(np.exp((-1.0/2.0)*(((x-cens[i])/sigmas[i])**2)))
fn = fn+np.random.normal(size=len(x), scale=0.05)
else:
print('Your inputs have unequal lengths')
return fn
amps = [1.30, 0.92, 2.11]
cens = [1.10, 1.73, 2.06]
sigmas=[0.05, 0.09, 0.07]
popt_peaks = [amps,cens,sigmas]
df['peaks'] = _ngaussian(df.index, *popt_peaks)
# create a model with 3 Gaussians: pretty easy to generalize
# to a loop to make N peaks
model = (GaussianModel(prefix='p1_') +
GaussianModel(prefix='p2_') +
GaussianModel(prefix='p3_') )
# create Parameters (named from function arguments). For
# Gaussian, Lorentzian, Voigt, etc these are "center", "amplitude", "sigma"
params = model.make_params(p1_center=1.0, p1_amplitude=2, p1_sigma=0.1,
p2_center=1.5, p2_amplitude=2, p2_sigma=0.1,
p3_center=2.0, p3_amplitude=2, p3_sigma=0.1)
# Parameters can have min/max bounds, be fixed (`.vary = False`)
# or constrained to a mathematical expression of other Parameter values
params['p1_center'].min = 0.8
params['p1_center'].max = 1.5
params['p2_center'].min = 1.1
params['p2_center'].max = 1.9
params['p3_center'].min = 1.88
params['p3_center'].max = 3.00
# run the fit
result = model.fit(df['peaks'], params, x=df.index)
# print out the fit results
print(result.fit_report())
# plot results
plt.plot(df.index, df['peaks'], 'o', label='data')
plt.plot(df.index, result.best_fit, '-', label='fit')
plt.legend()
plt.gca().set_xlim(0, 3)
plt.show()
这将生成如下拟合图:
并打印出
的报告[[Model]]
((Model(gaussian, prefix='p1_') + Model(gaussian, prefix='p2_')) + Model(gaussian, prefix='p3_'))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 102
# data points = 1000
# variables = 9
chi-square = 6.88439024
reduced chi-square = 0.00694691
Akaike info crit = -4960.49871
Bayesian info crit = -4916.32892
[[Variables]]
p1_amplitude: 1.29432022 +/- 0.00428720 (0.33%) (init = 2)
p1_center: 1.09993745 +/- 1.9012e-04 (0.02%) (init = 1)
p1_sigma: 0.04970776 +/- 1.9012e-04 (0.38%) (init = 0.1)
p2_amplitude: 0.91875183 +/- 0.00604913 (0.66%) (init = 2)
p2_center: 1.73039597 +/- 6.7594e-04 (0.04%) (init = 1.5)
p2_sigma: 0.09054027 +/- 7.0994e-04 (0.78%) (init = 0.1)
p3_amplitude: 2.10077395 +/- 0.00533617 (0.25%) (init = 2)
p3_center: 2.06019332 +/- 2.0105e-04 (0.01%) (init = 2)
p3_sigma: 0.06970239 +/- 2.0752e-04 (0.30%) (init = 0.1)
p1_fwhm: 0.11705282 +/- 4.4770e-04 (0.38%) == '2.3548200*p1_sigma'
p1_height: 10.3878975 +/- 0.03440799 (0.33%) == '0.3989423*p1_amplitude/max(2.220446049250313e-16, p1_sigma)'
p2_fwhm: 0.21320604 +/- 0.00167179 (0.78%) == '2.3548200*p2_sigma'
p2_height: 4.04824243 +/- 0.02582408 (0.64%) == '0.3989423*p2_amplitude/max(2.220446049250313e-16, p2_sigma)'
p3_fwhm: 0.16413657 +/- 4.8866e-04 (0.30%) == '2.3548200*p3_sigma'
p3_height: 12.0238006 +/- 0.02922330 (0.24%) == '0.3989423*p3_amplitude/max(2.220446049250313e-16, p3_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(p3_amplitude, p3_sigma) = 0.622
C(p2_amplitude, p2_sigma) = 0.621
C(p1_amplitude, p1_sigma) = 0.577
C(p2_sigma, p3_sigma) = -0.299
C(p2_sigma, p3_amplitude) = -0.271
C(p2_amplitude, p3_sigma) = -0.239
C(p2_sigma, p3_center) = 0.226
C(p2_amplitude, p3_amplitude) = -0.210
C(p2_center, p3_sigma) = -0.192
C(p2_amplitude, p3_center) = 0.171
C(p2_center, p3_amplitude) = -0.160
C(p2_center, p3_center) = 0.126