高斯拟合返回负西格玛

Gaussian fit returning negative sigma

我的一种算法基于高斯函数执行自动峰值检测,然后根据 sigma 的乘数(用户设置)或 'full width at half maximum' 确定边缘。在用户指定 he/she 希望峰值限制在 2 Sigma 的情况下,算法从峰值中心 (mu) 取 -/+ 2*sigma。但是,我注意到curve_fit返回的sigma可以是负数,这是之前已经注意到的事情here。但是,当我通过执行 -/+ 确定边界时,这会导致算法 'failing'(由于 - - 场景),如以下代码所示。

MVCE

#! /usr/bin/env python
from scipy.optimize import curve_fit
import bisect
import numpy as np

X = [16.4697402328,16.4701402404,16.4705402481,16.4709402557,16.4713402633,16.4717402709,16.4721402785,16.4725402862,16.4729402938,16.4733403014,16.473740309,16.4741403166,16.4745403243,16.4749403319,16.4753403395,16.4757403471,16.4761403547,16.4765403623,16.47694037,16.4773403776,16.4777403852,16.4781403928,16.4785404004,16.4789404081,16.4793404157,16.4797404233,16.4801404309,16.4805404385,16.4809404462,16.4813404538,16.4817404614,16.482140469,16.4825404766,16.4829404843,16.4833404919,16.4837404995,16.4841405071,16.4845405147,16.4849405224,16.48534053,16.4857405376,16.4861405452,16.4865405528,16.4869405604,16.4873405681,16.4877405757,16.4881405833,16.4885405909,16.4889405985,16.4893406062,16.4897406138,16.4901406214,16.490540629,16.4909406366,16.4913406443,16.4917406519,16.4921406595,16.4925406671,16.4929406747,16.4933406824,16.49374069,16.4941406976,16.4945407052,16.4949407128,16.4953407205,16.4957407281,16.4961407357,16.4965407433,16.4969407509,16.4973407585,16.4977407662,16.4981407738,16.4985407814,16.498940789,16.4993407966,16.4997408043,16.5001408119,16.5005408195,16.5009408271,16.5013408347,16.5017408424,16.50214085,16.5025408576,16.5029408652,16.5033408728,16.5037408805,16.5041408881,16.5045408957,16.5049409033,16.5053409109,16.5057409186,16.5061409262,16.5065409338,16.5069409414,16.507340949,16.5077409566,16.5081409643,16.5085409719,16.5089409795,16.5093409871,16.5097409947,16.5101410024,16.51054101,16.5109410176,16.5113410252,16.5117410328,16.5121410405,16.5125410481,16.5129410557,16.5133410633,16.5137410709,16.5141410786,16.5145410862,16.5149410938,16.5153411014,16.515741109,16.5161411166,16.5165411243,16.5169411319,16.5173411395,16.5177411471,16.5181411547,16.5185411624,16.51894117,16.5193411776,16.5197411852,16.5201411928,16.5205412005,16.5209412081,16.5213412157,16.5217412233,16.5221412309,16.5225412386,16.5229412462,16.5233412538,16.5237412614,16.524141269,16.5245412767,16.5249412843,16.5253412919,16.5257412995,16.5261413071,16.5265413147,16.5269413224,16.52734133,16.5277413376,16.5281413452,16.5285413528,16.5289413605,16.5293413681,16.5297413757,16.5301413833,16.5305413909,16.5309413986,16.5313414062,16.5317414138,16.5321414214,16.532541429,16.5329414367,16.5333414443,16.5337414519,16.5341414595,16.5345414671,16.5349414748,16.5353414824,16.53574149,16.5361414976,16.5365415052,16.5369415128,16.5373415205,16.5377415281,16.5381415357,16.5385415433,16.5389415509,16.5393415586,16.5397415662,16.5401415738,16.5405415814,16.540941589,16.5413415967,16.5417416043,16.5421416119,16.5425416195,16.5429416271,16.5433416348,16.5437416424,16.54414165,16.5445416576,16.5449416652,16.5453416729,16.5457416805,16.5461416881,16.5465416957,16.5469417033,16.5473417109,16.5477417186,16.5481417262,16.5485417338,16.5489417414,16.549341749,16.5497417567,16.5501417643,16.5505417719,16.5509417795,16.5513417871,16.5517417948,16.5521418024,16.55254181,16.5529418176,16.5533418252,16.5537418329,16.5541418405,16.5545418481,16.5549418557,16.5553418633,16.5557418709,16.5561418786,16.5565418862,16.5569418938,16.5573419014,16.557741909,16.5581419167,16.5585419243,16.5589419319,16.5593419395,16.5597419471,16.5601419548,16.5605419624,16.56094197,16.5613419776,16.5617419852,16.5621419929,16.5625420005,16.5629420081,16.5633420157,16.5637420233,16.564142031]
Y = [11579127.8554,11671781.7263,11764419.0191,11857026.0444,11949589.1124,12042094.5338,12134528.6188,12226877.6781,12319128.0219,12411265.9609,12503277.8053,12595149.8657,12686868.4525,12778419.8762,12869790.334,12960965.209,13051929.5278,13142668.3154,13233166.5969,13323409.3973,13413381.7417,13503068.6552,13592455.1627,13681526.2894,13770267.0602,13858662.5004,13946697.6348,14034357.4886,14121627.0868,14208491.4544,14294935.6166,14380944.5984,14466503.4248,14551597.1208,14636210.7116,14720329.3102,14803938.4081,14887023.5981,14969570.4732,15051564.6263,15132991.6503,15213837.1383,15294086.683,15373725.8775,15452740.3147,15531115.5875,15608837.2888,15685891.0116,15762262.3488,15837936.8934,15912900.2382,15987137.9762,16060635.7004,16133379.0036,16205353.4789,16276544.72,16346938.7731,16416522.8674,16485284.4226,16553210.8587,16620289.5956,16686508.0531,16751853.6511,16816313.8096,16879875.9485,16942527.4876,17004255.8468,17065048.446,17124892.7052,17183776.0442,17241685.8829,17298609.6412,17354534.739,17409448.5962,17463338.6327,17516192.2683,17567996.9463,17618741.7702,17668418.588,17717019.5043,17764536.6238,17810962.0514,17856287.8916,17900506.2493,17943609.2292,17985588.936,18026437.4744,18066146.9493,18104709.4653,18142117.1271,18178362.0396,18213436.3074,18247332.0352,18280041.3279,18311556.2901,18341869.0265,18370971.642,18398856.332,18425517.6188,18450952.493,18475158.064,18498131.4412,18519869.7341,18540370.0523,18559629.505,18577645.202,18594414.2525,18609933.7661,18624200.8523,18637212.6205,18648966.1802,18659458.6408,18668687.1119,18676648.7029,18683340.5233,18688759.6825,18692903.29,18695768.4553,18697352.5327,18697655.9558,18696681.2608,18694431.0245,18690907.8241,18686114.2363,18680052.838,18672726.2063,18664136.918,18654287.5501,18643180.6795,18630818.883,18617204.7377,18602340.8204,18586229.7081,18568873.9777,18550276.2061,18530438.9703,18509364.8471,18487056.4135,18463516.2464,18438747.4526,18412756.9228,18385553.1936,18357144.808,18327540.3094,18296748.2409,18264777.1456,18231635.5669,18197332.0479,18161875.1318,18125273.3619,18087535.2812,18048669.4331,18008684.3606,17967588.6071,17925390.7158,17882099.2297,17837722.6922,17792269.6464,17745748.6355,17698168.2027,17649537.512,17599868.3744,17549173.3069,17497464.8262,17444755.4492,17391057.6927,17336384.0736,17280747.1087,17224159.3148,17166633.2088,17108181.3075,17048816.1277,16988550.1864,16927396.0002,16865366.0862,16802472.961,16738729.1416,16674147.1447,16608739.4873,16542518.6861,16475497.2591,16407688.2541,16339106.0951,16269765.4262,16199680.8916,16128867.1358,16057338.8029,15985110.5372,15912196.9829,15838612.7844,15764372.5859,15689491.0316,15613982.7659,15537862.4329,15461144.6771,15383844.1425,15305975.4735,15227553.3143,15148592.3093,15069107.1026,14989112.3386,14908622.6595,14827652.5673,14746216.3337,14664328.209,14582002.4435,14499253.2874,14416094.9911,14332541.8049,14248607.9791,14164307.764,14079655.4098,13994665.1668,13909351.2855,13823728.016,13737809.6086,13651610.3137,13565144.3816,13478426.0625,13391469.6068,13304289.2646,13216899.2865,13129313.8865,13041546.3657,12953609.0623,12865514.2686,12777274.277,12688901.3798,12600407.8693,12511806.0378,12423108.1777,12334326.5812,12245473.5407,12156561.3486,12067602.297,11978608.6785,11889592.7852]

def gaussFunction(x, *p):
    """Define and return a Gaussian function.

    This function returns the value of a Gaussian function, using the
    A, mu and sigma value that is provided as *p.

    Keyword arguments:
    x -- number
    p -- A, mu and sigma numbers
    """
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))

newGaussX = np.linspace(10, 25, 2500*(X[-1]-X[0]))
p0 = [np.max(Y), X[np.argmax(Y)],0.1]

coeff, var_matrix = curve_fit(gaussFunction, X, Y, p0)
newGaussY = gaussFunction(newGaussX, *coeff)

print "Sigma is "+str(coeff[2])

# Original
low = bisect.bisect_left(newGaussX,coeff[1]-2*coeff[2])
high = bisect.bisect_right(newGaussX,coeff[1]+2*coeff[2])
print newGaussX[low], newGaussX[high]

# Absolute
low = bisect.bisect_left(newGaussX,coeff[1]-2*abs(coeff[2]))
high = bisect.bisect_right(newGaussX,coeff[1]+2*abs(coeff[2]))
print newGaussX[low], newGaussX[high]

底线,是取 sigma 的 abs() 'correct' 还是应该以不同的方式解决这个问题?

您正在拟合一个不关心 sigma 是正数还是负数的函数 gaussFunction。所以无论你得到正面还是负面结果主要是运气问题,取返回的 sigma 的绝对值就可以了。还要考虑其他可能性:

  1. (Thomas Kühn 建议):修改模型函数,使其关心 sigma 的符号。使其更接近标准化的高斯形式就足够了:公式 A/np.sqrt(sigma)*np.exp(-(x-mu)**2/(2.*sigma**2)) 将确保您仅获得正西格玛。一个可能的、轻微的缺点是该函数需要更长的时间来计算。

  2. 使用方差 sigma_squared 作为参数:

    A, mu, sigma_squared = p
    return A*np.exp(-(x-mu)**2/(2.*sigma_squared))
    

就保持模型方程的简单性而言,这可能是最简单的。您需要对该参数的初始猜测进行平方,并在需要 sigma 本身时取平方根。


旁白:您将 0.1 硬编码为对标准偏差的猜测。这可能应该基于数据,像这样:

peak = X[Y > np.exp(-0.5)*Y.max()]
guess_sigma = 0.5*(peak.max() - peak.min())

这个想法是,在均值的一个标准偏差内,高斯值大于最大值的 np.exp(-0.5) 倍。所以第一行定位这个 "peak",第二行取其宽度的一半作为 sigma 的猜测值。

要使上述工作正常,X 和 Y 应该已经转换为 NumPy 数组,例如 X = np.array([16.4697402328,16.4701402404,....。总的来说这是一个好主意:否则,您将让每个接收 X 或 Y 的 NumPy 方法再次进行此转换。

您可能会发现 lmfit (http://lmfit.github.io/lmfit-py/) 对此很有用。它包括一个用于曲线拟合的高斯模型,该模型确实对高斯进行归一化,并且还使用比 abs(sigma) 更温和的参数转换将 sigma 限制为正值。你的例子看起来像这样

from lmfit.models import GaussianModel
xdat = np.array(X)
ydat = np.array(Y)

model = GaussianModel()
params = model.guess(ydat, x=xdat)
result = model.fit(ydat, params, x=xdat)

print(result.fit_report())

这将打印一份报告,其中包含所有参数的最佳拟合值和估计的不确定性,并包括 FWHM。

[[Model]]
    Model(gaussian)
[[Fit Statistics]]
    # function evals   = 31
    # data points      = 237
    # variables        = 3
    chi-square         = 95927408861.607
    reduced chi-square = 409946191.716
    Akaike info crit   = 4703.055
    Bayesian info crit = 4713.459
[[Variables]]
    sigma:       0.04880178 +/- 1.57e-05 (0.03%) (init= 0.0314006)
    center:      16.5174203 +/- 8.01e-06 (0.00%) (init= 16.51754)
    amplitude:   2.2859e+06 +/- 586.4103 (0.03%) (init= 670578.1)
    fwhm:        0.11491942 +/- 3.51e-05 (0.03%)  == '2.3548200*sigma'
    height:      1.8687e+07 +/- 910.0152 (0.00%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
[[Correlations]] (unreported correlations are <  0.100)
    C(sigma, amplitude)          =  0.949

center +/- 2*sigma 的值将通过

找到
xlo = result.params['center'].value - 2 * result.params['sigma'].value
xhi = result.params['center'].value + 2 * result.params['sigma'].value

您可以使用 result 来评估具有拟合参数和不同 X 值的模型:

newGaussX = np.linspace(10, 25, 2500*(X[-1]-X[0]))
newGaussY = result.eval(x=newGaussX)

我还建议使用 numpy.where 来查找 center+/-2*sigma 的位置而不是对分:

low  = np.where(newGaussX > xlo)[0][0]     # replace bisect_left
high = np.where(newGaussX <= xhi)[0][-1] + 1 # replace bisect_right

我遇到了同样的问题,我想出了一个简单但有效的解决方案,基本上是使用高斯函数定义中的方差而不是标准差,因为方差总是正的。然后,通过对方差求平方根得到 std_dev,得到正值,即 std_dev 始终为正。所以,问题很容易解决 ;)

我的意思是,以这种方式创建函数:

def gaussian(x, Heigh, Mean, Variance):
   return Heigh * np.exp(- (x-Mean)**2 / (2 * Variance))

而不是:

def gaussian(x, Heigh, Mean, Std_dev):
   return Heigh * np.exp(- (x-Mean)**2 / (2 * Std_dev**2))

然后像往常一样做合身。