Fitting a dictionary into a normal distribution curve

Here is the dictionary:

l= {31.2: 1,35.1: 4,39.0: 13,42.9: 33,46.8: 115,50.7: 271,54.6: 363,58.5:381,62.4:379,66.3:370,70.2:256,74.1: 47,78.0: 2}

This means that 31.2 occurred 1 time, 35.1 occurred 4 times, and so on. I tried:

fig, ax = plt.subplots(1, 1)

ax.scatter(l.keys(), l.values())
ax.set_xlabel('Key')
ax.set_ylabel('Length of value')

I also computed the mean and standard deviation with:

np.mean([k for k in l.keys()])
np.std([k for k in l.keys()])

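Is this the right way to find the mean and standard deviation of this data? I doubt it, because it does not take the number of occurrences of each value into account. I would like to see the normal curve for this data. I would also like a way to find out how often a given value occurs. For example, if I extend the curve down to 0 on the x axis, how many data points (or what probability) would correspond to the value 0?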

Here is a way to get the mean and standard deviation that accounts for the counts:

l= {31.2: 1,35.1: 4,39.0: 13,42.9: 33,46.8: 115,50.7: 271,54.6: 363,58.5:381,62.4:379,66.3:370,70.2:256,74.1: 47,78.0: 2}
ll=[[i]*j for i,j in zip(l.keys(),l.values())]
flat_list = [item for sublist in ll for item in sublist]
np.mean(flat_list), np.std(flat_list)

This prints (59.559194630872476, 7.528353520785996).
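The same numbers can be obtained without expanding the dictionary into a long list, by treating the counts as weights. The following is a minimal sketch of that alternative, not part of the original answer; the names `values` and `counts` are introduced here for illustration. Like `np.std`, it computes the population standard deviation (dividing by the total count, not by the count minus one).

```python
import numpy as np

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

# Illustrative names: the observed values and how often each one occurred
values = np.array(list(l.keys()))
counts = np.array(list(l.values()))

# Weighted mean: each value contributes in proportion to its count
mu = np.average(values, weights=counts)

# Weighted population standard deviation around that mean
sigma = np.sqrt(np.average((values - mu) ** 2, weights=counts))

print(mu, sigma)
```

This should agree with the flattened-list result above up to floating-point rounding, and it avoids building a list with one entry per observation.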

You can use np.histogram(flat_list) to build a histogram and see how often each value occurs.
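One caveat: by default `np.histogram` uses 10 equal-width bins, while this data has 13 distinct values, so some values would be merged. The sketch below passes explicit bin edges instead. It assumes the keys are evenly spaced 3.9 apart (which holds for this dictionary) and centers one bin on each key; the names `edges`, `hist` and `rel_freq` are illustrative.

```python
import numpy as np

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

values = np.array(list(l.keys()))
counts = np.array(list(l.values()))

# Repeat each value by its count (same content as flat_list above)
flat = np.repeat(values, counts)

# Assumption: keys are 3.9 apart, so 14 edges give 13 bins, one centered on each key
edges = np.linspace(31.2 - 1.95, 78.0 + 1.95, 14)

hist, _ = np.histogram(flat, bins=edges)

# Relative frequency of each value (fraction of all observations)
rel_freq = hist / hist.sum()

for v, c, f in zip(values, hist, rel_freq):
    print(v, c, round(f, 4))
```

With these edges the histogram counts simply reproduce the dictionary values, so for data that is already tabulated like this, dividing each count by the total gives the observed frequency directly.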

Here is a way to draw a normal (Gaussian) curve fitted to the data:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}
# expand the dictionary into a flat list, repeating each key as many times as its count
l_list = [k for k, v in l.items() for _ in range(v)]

fig, ax = plt.subplots(1, 1)

ax.scatter(l.keys(), l.values())
ax.set_xlabel('Key')
ax.set_ylabel('Length of value')

mu = np.mean(l_list)
sigma = np.std(l_list)

u = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 100)
ax2 = ax.twinx()
ax2.plot(u, stats.norm.pdf(u, mu, sigma), color='crimson')
ax2.set_ylabel('normal curve')

plt.show()
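The curve above is a probability density, which is why it needs its own y axis. To read it in the same units as the counts, and to estimate how many points the fitted normal would predict at an arbitrary x such as 0, the density can be multiplied by the total number of observations and by the bin width. The sketch below is one way to do that; `bin_width = 3.9` is an assumption taken from the spacing of the keys, and `expected_count` is a hypothetical helper name, not part of the original answer.

```python
import numpy as np
from scipy import stats

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

values = np.array(list(l.keys()))
counts = np.array(list(l.values()))

n_total = counts.sum()
mu = np.average(values, weights=counts)
sigma = np.sqrt(np.average((values - mu) ** 2, weights=counts))

# Assumption: each key stands for a bin 3.9 wide (the spacing between keys)
bin_width = 3.9

def expected_count(x):
    """Approximate count the fitted normal predicts for a bin centered at x."""
    return n_total * bin_width * stats.norm.pdf(x, mu, sigma)

# Predicted count near the peak, to compare with the observed 381 at 58.5
print(expected_count(58.5))

# Predicted count for a bin centered at 0
print(expected_count(0.0))

# Probability, under the fitted normal, of a value at or below 0
print(stats.norm.cdf(0.0, mu, sigma))
```

Since 0 lies almost eight standard deviations below the mean, both the predicted count and the probability at 0 are effectively zero under this fit. Note also that the observed counts are visibly flatter and more skewed than a normal curve, so the fit is only a rough description of this data.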