Recognize separate normal distributions in one data set

Question

The model I built produces output whose histogram has the shape of three normal distributions.

import numpy as np
d1 = [np.random.normal(2,.1) for _ in range(100)]
d2 = [np.random.normal(2.5,.1) for _ in range(100)]
d3 = [np.random.normal(3,.1) for _ in range(100)]
sudo_model_output = d1 + d2 + d3
np.random.shuffle(sudo_model_output)

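What is the Pythonic way to find the mean and standard deviation of each of these normal distributions? I can't hard-code estimates of where each distribution starts and ends (here roughly 2.25 and 2.75), because those values change with every iteration of the simulation.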
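One common approach, not taken from the answers in this thread, is to fit a Gaussian mixture model with expectation-maximization (EM). EM estimates each component's weight, mean, and standard deviation directly from the raw samples, so no cut points such as 2.25 or 2.75 are needed. The sketch below is a minimal NumPy-only version that assumes the number of components (k=3) is known in advance; the function name fit_gmm_1d is hypothetical. scikit-learn's sklearn.mixture.GaussianMixture implements the same idea with more safeguards.

```python
import numpy as np

def fit_gmm_1d(x, k=3, n_iter=500, tol=1e-8):
    """Hypothetical helper: fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, stds), sorted by mean.
    """
    x = np.asarray(x, dtype=float)
    # Initialise means at evenly spaced quantiles, so no boundaries are hard-coded.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    stds = np.full(k, x.std() / k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted density of every sample under every component.
        dens = weights * np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) \
            / (stds * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total              # responsibilities P(component j | x_i)
        ll = np.log(total).sum()         # log-likelihood of the current parameters
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        stds = np.maximum(stds, 1e-6)    # guard against a collapsing component
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(means)
    return weights[order], means[order], stds[order]

# Usage on data generated like the question's (seeded for reproducibility).
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(m, 0.1, 100) for m in (2.0, 2.5, 3.0)])
rng.shuffle(data)
w, mu, sd = fit_gmm_1d(data, k=3)
print(mu, sd)
```

Because the initial means come from quantiles of the data, the same call keeps working when the peak locations shift between simulation runs, as long as the number of peaks stays fixed.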

Answer 1

Adapted from: Fitting a histogram with python

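from scipy.optimize import leastsq
import numpy as np
import matplotlib.pyplot as plt  # renamed from `p` so it isn't shadowed by the lambda parameter
# %matplotlib inline  # Jupyter-only magic; uncomment when running in a notebook

d1 = [np.random.normal(2, .1) for _ in range(1000)]
d2 = [np.random.normal(2.5, .1) for _ in range(1000)]
d3 = [np.random.normal(3, .1) for _ in range(1000)]
sum1 = d1 + d2 + d3
bins = np.arange(0, 4, 0.01)
a = np.histogram(sum1, bins=bins)

# Sum of three Gaussians; p = [amp1, mu1, sigma1, amp2, mu2, sigma2, amp3, mu3, sigma3]
fitfunc = lambda p, x: p[0]*np.exp(-0.5*((x-p[1])/p[2])**2) + \
        p[3]*np.exp(-0.5*((x-p[4])/p[5])**2) + \
        p[6]*np.exp(-0.5*((x-p[7])/p[8])**2)

# Residuals that leastsq minimises
errfunc = lambda p, x, y: (y - fitfunc(p, x))

# Use bin centres rather than left edges so the fitted means are not shifted by half a bin
xdata, ydata = 0.5 * (bins[:-1] + bins[1:]), a[0]
plt.plot(xdata, ydata)

# Initial guesses: amplitude, centre, width for each of the three peaks
init = [40, 2.1, 0.1, 40, 2.4, 0.1, 40, 3.1, 0.1]

out = leastsq(errfunc, init, args=(xdata, ydata))
c = out[0]  # fitted parameters, in the same order as init
print(c)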

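The fit now looks quite good, but my initial guesses for the nine parameters (the amplitude, center, and width of each peak; see init) were already very close to the true values. If you know that all the peaks have the same height or the same width, you can reduce the number of variables, which will help the fit.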
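A sketch of that suggestion, assuming all three peaks share a single width: the model below has seven parameters instead of nine. It uses scipy.optimize.curve_fit in place of leastsq and derives the initial guesses from quantiles of the data instead of hard-coding them. The function name three_gauss_shared_width and the quantile-based guesses are illustrative assumptions, not part of the answer above.

```python
import numpy as np
from scipy.optimize import curve_fit

def three_gauss_shared_width(x, a1, a2, a3, m1, m2, m3, s):
    """Hypothetical model: three Gaussians sharing one width s (7 parameters)."""
    return (a1 * np.exp(-0.5 * ((x - m1) / s) ** 2)
            + a2 * np.exp(-0.5 * ((x - m2) / s) ** 2)
            + a3 * np.exp(-0.5 * ((x - m3) / s) ** 2))

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(m, 0.1, 1000) for m in (2.0, 2.5, 3.0)])

counts, edges = np.histogram(data, bins=np.arange(0, 4, 0.01))
centers = 0.5 * (edges[:-1] + edges[1:])  # bin centres as x values

# Data-driven initial guesses (assumption): centres at the 1/6, 1/2 and 5/6
# quantiles, a shared width from the overall spread, amplitudes from the peak count.
m0 = np.quantile(data, [1 / 6, 1 / 2, 5 / 6])
p0 = [counts.max()] * 3 + list(m0) + [data.std() / 6]

popt, pcov = curve_fit(three_gauss_shared_width, centers, counts, p0=p0)
amps, means, width = popt[:3], popt[3:6], abs(popt[6])  # s enters squared, so take |s|
print(means, width)
```

Tying the widths together removes two degrees of freedom, which usually makes the least-squares problem less sensitive to the starting point; a shared amplitude could be imposed the same way if the peaks are known to have equal heights.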

