Can I fit a spectrum of multiple Gaussians whose centroids and peak heights are randomly distributed?

Question

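Newbie here, but I have tried to do my due diligence before posting. Apologies for any unintended faux pas.

I am acquiring data from an oscilloscope as voltage versus time series. The time bins are 0.8 ns wide. I run multiple 'data capture' cycles. A single capture has a fixed number of samples and between 5 and 15 Gaussian peaks; the exact number of peaks is unknown. The peaks have a relatively constrained FWHM (between 2 and 3 ns), varying peak heights, and random arrival times (i.e. the centroid positions are not periodic).

I have been using Python to fit Gaussians to these data and have had some success with the scipy.optimize and astropy libraries. Code using scipy.optimize is included below. I can fit multiple Gaussians, but a key step in my code is providing a 'guess' for the number of peaks, along with an estimate of the centroid position, height, and width of each peak. Is there a way to generalise this code so that I don't have to provide a 'guess'? If I relax the conditions in the 'guess', the fit quality degrades. I know the peaks will be Gaussians with constrained widths, but I would like to generalise the code to handle whatever peak centroids and heights appear in any given data capture.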

import ctypes
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

#Get data from file
with open('test3.txt') as f:
    w, h = [float(x) for x in next(f).split()]
    print(w, h)
    array = [[float(x) for x in line.split()] for line in f]

#Separate    
x, z = zip(*array)
#Change sign since fitting routine seems to
#prefer positive numbers
y=[ -p for p in z]

def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y

#Guess the peak positions, heights, and widths
guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]

#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)

#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r-')
plt.show()

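The results look like this: [Plot of Data and Fits]

The data file is here: https://spaces.hightail.com/receive/5MY7Vc7r9R

This is similar to "How can I fit multiple Gaussian curved to mass spectrometry data in Python?" and "fit multiple gaussians to the data in python", but both of those rely on fitting either a periodic data set or a data set with known peak positions, widths, and heights. They were useful in getting me this far, but I am now stuck. Are there any ideas or suggestions I could follow up on?

Thanks, Adi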

Answer 1

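My idea is to compare the values of the curve with its mean.
The multiplier variable sets how many times larger than the mean a value must be before we treat it as part of a peak. The first point of a peak that exceeds this threshold is taken as the starting estimate of that peak's centre.
I also replaced the x and y lists with arrays.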

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

#Get data from file
with open('test3.txt') as f:
    w, h = [float(x) for x in next(f).split()]
    array = [[float(x) for x in line.split()] for line in f]

#Separate    
x, z = zip(*array)
x = np.array(x)
#Change sign since fitting routine seems to
#prefer positive numbers
y = np.array([ -p for p in z])


def func(x, *params):
    y = np.zeros_like(x)
    for i in range(0, len(params), 3):
        ctr = params[i]
        amp = params[i+1]
        wid = params[i+2]
        y = y + amp * np.exp( -((x - ctr)/wid)**2)
    return y

#Guess the peak positions, heights, and widths
# guess = [16, 5, 2, 75, 5, 2, 105, 5, 2, 139, 5, 2, 225, 5, 2, 315, 5, 2, 330, 5, 2]

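def getPeaks(x, y, multiplier):
    #Return the x value at which each run of samples first rises above the threshold
    threshold = y.mean() * multiplier
    x_peaks = []
    isPeak = False
    for i, j in zip(x, y):
        if j > threshold:
            if not isPeak:
                isPeak = True
                x_peaks.append(i)
        else:
            isPeak = False
    return x_peaks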

multiplier = 3
x_peaks = getPeaks(x, y, multiplier)

guess = []
for i in range(len(x_peaks)):
    guess.append(x_peaks[i])
    guess.append(5)
    guess.append(2)
    

#Fit and print parameters to screen
popt, pcov = curve_fit(func, x, y, p0=guess)
print(popt)
fit = func(x, *popt)

#Plot stuff
plt.plot(x, y)
plt.plot(x, fit , 'r--')
# plt.plot(popt[::3], np.ones_like(popt[::3]) * multiplier, 'ko')
plt.show()
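
A possible variation on the same idea, as a minimal sketch rather than a drop-in replacement: scipy.signal.find_peaks can locate local maxima directly, which gives an estimate of both the centre and the height of each peak, and curve_fit accepts bounds that keep each Gaussian near its starting point. The helper names (multi_gauss, guess_from_peaks, fit_peaks), the default values of wid0, ctr_slack and wid_bounds, and the synthetic data are all assumptions made for illustration; the noise threshold would need tuning on real captures.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks


def multi_gauss(x, *params):
    """Sum of Gaussians; params is a flat (ctr, amp, wid) list, as in func above."""
    y = np.zeros_like(x, dtype=float)
    for i in range(0, len(params), 3):
        ctr, amp, wid = params[i:i + 3]
        y = y + amp * np.exp(-((x - ctr) / wid) ** 2)
    return y


def guess_from_peaks(x, y, min_height, wid0=2.0):
    """Build a flat (ctr, amp, wid) guess from local maxima above min_height."""
    idx, props = find_peaks(y, height=min_height, prominence=min_height / 2)
    guess = []
    for i, h in zip(idx, props["peak_heights"]):
        guess.extend([x[i], h, wid0])
    return guess


def fit_peaks(x, y, min_height, ctr_slack=5.0, wid_bounds=(0.5, 5.0)):
    """Fit one Gaussian per detected maximum; returns rows of (ctr, amp, wid)."""
    guess = guess_from_peaks(x, y, min_height)
    if not guess:
        return np.empty((0, 3))
    lower, upper = [], []
    for ctr in guess[::3]:
        # Hypothetical loose bounds: centre stays near the detected maximum,
        # amplitude is non-negative, width stays in a plausible range.
        lower.extend([ctr - ctr_slack, 0.0, wid_bounds[0]])
        upper.extend([ctr + ctr_slack, np.inf, wid_bounds[1]])
    popt, _ = curve_fit(multi_gauss, x, y, p0=guess, bounds=(lower, upper))
    return popt.reshape(-1, 3)


# Synthetic check (made-up data, not the asker's file).
rng = np.random.default_rng(0)
x_demo = np.arange(400, dtype=float)
true_params = [50.3, 5.0, 2.0, 120.7, 3.0, 1.8, 200.2, 8.0, 2.2, 310.4, 4.0, 2.0]
y_demo = multi_gauss(x_demo, *true_params) + rng.normal(0.0, 0.1, x_demo.size)

# Robust noise estimate from the median absolute deviation.
noise = 1.4826 * np.median(np.abs(y_demo - np.median(y_demo)))
peaks_demo = fit_peaks(x_demo, y_demo, min_height=5 * noise)
print(peaks_demo)
```

For the model used here, exp(-((x - ctr)/wid)**2), the FWHM equals 2*sqrt(ln 2)*wid, roughly 1.665*wid. If x is in nanoseconds, the 2 to 3 ns FWHM from the question corresponds to wid between about 1.2 and 1.8, so wid_bounds could be tightened accordingly.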

Answer 2

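As mentioned in the comments, every iterative estimation algorithm needs some hyperparameters to start from. In the problem you describe, these are the initial Gaussian parameters and the number of Gaussians. For estimating Gaussian distributions, the EM algorithm is proven to converge (to a local optimum, so the result still depends on the starting point). I suggest using it with random initial Gaussian parameters and a grid search over the number of distributions to find the best solution. Try each count from 5 to 15, compute the distance for each solution, and take the solution with the smallest distance. (https://en.m.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)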
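
The grid search over the number of peaks could look roughly like the sketch below. Note that it swaps in a different technique from the one named above: it uses least-squares fitting with curve_fit rather than EM, seeds each fit at the most prominent local maxima rather than at random, and scores each candidate count with a BIC-style penalised residual rather than a raw distance, since the raw residual tends to keep shrinking as more components are added. The function names, bounds, and synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks


def multi_gauss(x, *params):
    """Sum of Gaussians; params is a flat (ctr, amp, wid) list."""
    y = np.zeros_like(x, dtype=float)
    for i in range(0, len(params), 3):
        ctr, amp, wid = params[i:i + 3]
        y = y + amp * np.exp(-((x - ctr) / wid) ** 2)
    return y


def fit_n_peaks(x, y, n, wid0=2.0, ctr_slack=5.0, wid_bounds=(0.5, 5.0)):
    """Fit exactly n Gaussians, seeded at the n most prominent local maxima."""
    idx, props = find_peaks(y, prominence=0)
    if len(idx) < n:
        raise ValueError("fewer local maxima than requested peaks")
    top = np.sort(idx[np.argsort(props["prominences"])[::-1][:n]])
    guess, lower, upper = [], [], []
    for i in top:
        guess.extend([x[i], max(y[i], 1e-6), wid0])
        lower.extend([x[i] - ctr_slack, 0.0, wid_bounds[0]])
        upper.extend([x[i] + ctr_slack, np.inf, wid_bounds[1]])
    popt, _ = curve_fit(multi_gauss, x, y, p0=guess, bounds=(lower, upper))
    return popt


def select_n_peaks(x, y, n_range=range(5, 16)):
    """Try each peak count and keep the one with the lowest BIC-style score."""
    scores, fits = {}, {}
    for n in n_range:
        try:
            popt = fit_n_peaks(x, y, n)
        except (RuntimeError, ValueError):
            continue  # fit did not converge or not enough candidates
        rss = float(np.sum((y - multi_gauss(x, *popt)) ** 2))
        # Penalise the 3 free parameters per Gaussian.
        scores[n] = x.size * np.log(rss / x.size) + 3 * n * np.log(x.size)
        fits[n] = popt
    best_n = min(scores, key=scores.get)
    return best_n, fits[best_n], scores


# Synthetic check with four peaks (made-up data, not the asker's file).
rng = np.random.default_rng(0)
x_demo = np.arange(400, dtype=float)
true_params = [50.3, 5.0, 2.0, 120.7, 3.0, 1.8, 200.2, 8.0, 2.2, 310.4, 4.0, 2.0]
y_demo = multi_gauss(x_demo, *true_params) + rng.normal(0.0, 0.1, x_demo.size)

best_n, best_popt, scores = select_n_peaks(x_demo, y_demo, n_range=range(1, 9))
print(best_n)
print(best_popt.reshape(-1, 3))
```

The demo scans 1 to 8 peaks because the synthetic trace only has four; for the captures described in the question the default range of 5 to 15 would apply. Whether a BIC-style penalty is the right trade-off for real oscilloscope noise is something to check against captures with a known number of peaks.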

python

curve-fitting