Estimate the similarity of a curve to a Gaussian distribution (in Python)

Question

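I would like to quantify how similar a curve of measured values is to a Gaussian distribution, using Python.

Two arrays of values are given:

H=(0,5,10,15,20,25,30,35,40,50,70) is the height in metres

C(H)=(0,1,1,2,4,6,7,5,3,1,0) is the measured quantity (e.g. a concentration)

Is there a way in Python to

a) fit a Gaussian curve to the values of C(H)?

b) get some kind of similarity coefficient that describes how similar the curve is to a Gaussian curve?

Thanks in advance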

Answer 1

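For the first question, what you are asking is whether you can use Python to estimate the parameters of a normal population that describes your data. There are countless estimators to choose from, but if what you are looking for are the maximum likelihood estimates, then these are simply the sample mean and the sample standard deviation, which you can easily find with vanilla Python or with something like NumPy: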

In [21]: import numpy as np

In [22]: H = [0,5,10,15,20,25,30,35,40,50,70]

In [23]: C = [0,1,1,2,4,6,7,5,3,1,0]

In [24]: a = np.repeat(H, C)

In [25]: a
Out[25]:
array([ 5, 10, 15, 15, 20, 20, 20, 20, 25, 25, 25, 25, 25, 25, 30, 30, 30,
       30, 30, 30, 30, 35, 35, 35, 35, 35, 40, 40, 40, 50])

In [26]: a.mean(), a.std()
Out[26]: (27.666666666666668, 9.46337971105226)

Parameter estimation for many common distributions is available in SciPy and can be used here as well (after `import scipy.stats`):

In [27]: scipy.stats.norm.fit(a)
Out[27]: (27.666666666666668, 9.46337971105226)
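If C holds non-integer values (concentrations rather than counts), np.repeat no longer applies. A minimal sketch, assuming C can be treated as weights on H, computes the same mean and standard deviation directly. The names mu and sigma are my own and do not come from the question:

```python
import numpy as np

H = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 70], dtype=float)
C = np.array([0, 1, 1, 2, 4, 6, 7, 5, 3, 1, 0], dtype=float)

# Weighted mean of the heights, using the measured quantity as weights
mu = np.average(H, weights=C)

# Weighted (population) standard deviation around that mean
sigma = np.sqrt(np.average((H - mu) ** 2, weights=C))

print(mu, sigma)
```

For the integer data above this should agree with a.mean() and a.std() up to floating-point rounding, since both compute the same quantities.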

The second question is rather vague, but to make it concrete, the answer lies in determining the "goodness of fit" of the normal model, or, somewhat more generally, finding an appropriate "normality test" for your data. The Wikipedia articles list statistical tests that apply once you know what you want to check, but without further assumptions, there's no silver bullet. Chances are that a qualitative tool like a Q–Q plot will tell you what you want to know; for the sample you have given it is a bit hard to say, but I assume your actual data differs from what you have provided here.

import matplotlib.pyplot as plt
import scipy.stats as st
st.probplot(a, dist=st.norm, plot=plt)
plt.show()
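To put a number on what the Q-Q plot shows, one option is the correlation coefficient that probplot returns for its straight-line fit, alongside a normality test such as Shapiro-Wilk. The sketch below is only one possible choice of test, and it assumes, as above, that C can be read as counts so that the expanded sample a is meaningful:

```python
import numpy as np
import scipy.stats as st

H = [0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 70]
C = [0, 1, 1, 2, 4, 6, 7, 5, 3, 1, 0]
a = np.repeat(H, C)  # expand counts into individual observations

# Without plot=..., probplot only returns the numbers behind the Q-Q plot;
# r is the correlation of the ordered sample with the normal quantiles.
(osm, osr), (slope, intercept, r) = st.probplot(a, dist=st.norm)

# Shapiro-Wilk test: a small p-value is evidence against normality
stat, p_value = st.shapiro(a)

print("Q-Q correlation r:", r)
print("Shapiro-Wilk statistic:", stat, "p-value:", p_value)
```

A value of r close to 1 means the Q-Q points lie close to a straight line. Because the sample here is built from binned heights, the p-value should be read with caution.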

Answer 2

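Since you specifically asked for Python code, here is a graphical Python curve fitter that uses your data and fits it to a Gaussian peak equation. The RMSE and R-squared values should be useful measures of similarity, since together they describe the quality of the Gaussian fit to the data.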

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

H=(0,5,10,15,20,25,30,35,40,50,70) 
C=(0,1,1,2,4,6,7,5,3,1,0)

xData = numpy.array(H, dtype=float)
yData = numpy.array(C, dtype=float)


def func(x, a, b, c): # Gaussian peak
    return  a * numpy.exp(-0.5 * numpy.power((x-b) / c, 2.0))


# estimate initial parameters from the data
a_est = max(C)
b_est = (max(H) + min(H)) / 2
c_est = max(C) # crude width guess; curve_fit only needs a rough starting value
initialParameters = numpy.array([a_est, b_est, c_est], dtype=float)

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData # signed residuals (model minus data), despite the name

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData)) # 1 - var(residuals) / var(data)

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
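As a complement, the covariance matrix returned by curve_fit can be turned into rough standard errors for the fitted parameters. This is a sketch rather than part of the answer above: the starting values are taken from the weighted mean and standard deviation of the data (my assumption), the names gaussian_peak, popt and perr are illustrative, and R-squared is computed as 1 - SS_res/SS_tot:

```python
import numpy as np
from scipy.optimize import curve_fit

H = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 50, 70], dtype=float)
C = np.array([0, 1, 1, 2, 4, 6, 7, 5, 3, 1, 0], dtype=float)


def gaussian_peak(x, a, b, c):
    """Gaussian peak with height a, centre b and width c."""
    return a * np.exp(-0.5 * ((x - b) / c) ** 2)


# Starting values from the data: peak height, weighted mean, weighted std
b0 = np.average(H, weights=C)
c0 = np.sqrt(np.average((H - b0) ** 2, weights=C))
p0 = [C.max(), b0, c0]

popt, pcov = curve_fit(gaussian_peak, H, C, p0=p0)

# One-sigma parameter uncertainties from the diagonal of the covariance
perr = np.sqrt(np.diag(pcov))

residuals = C - gaussian_peak(H, *popt)
ss_res = np.sum(residuals ** 2)          # residual sum of squares
ss_tot = np.sum((C - C.mean()) ** 2)     # total sum of squares
r_squared = 1.0 - ss_res / ss_tot

for name, value, err in zip(("a (height)", "b (centre)", "c (width)"), popt, perr):
    print(f"{name}: {value:.3f} +/- {err:.3f}")
print("R-squared:", r_squared)
```

Large uncertainties relative to the parameter values would suggest the Gaussian shape is poorly constrained by the data, even when R-squared looks high.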
