Fitting a dictionary to a normal distribution curve
Here is the dictionary:
l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}
This means that 31.2 occurred 1 time, 35.1 occurred 4 times, and so on.
I tried:
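import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
# values must be called as a method; lists are passed so scatter gets plain sequences
ax.scatter(list(l.keys()), list(l.values()))
ax.set_xlabel('Key')
ax.set_ylabel('Length of value')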
I also computed the mean and standard deviation with:
np.mean([k for k in l.keys()])
np.std([k for k in l.keys()])
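Is this the right way to find the mean and standard deviation of this data? I doubt it, because it does not take into account how many times each value occurs. I would like to see the normal curve for this data. Is there also a way to find out how often a given value occurs? For example, if I extend the curve to 0 on the x-axis, I would like to know how many data points (or what probability) correspond to 0.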
Here is how to get the mean and standard deviation:
import numpy as np
l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}
# repeat each key as many times as it occurs, then flatten into one list
ll = [[i] * j for i, j in l.items()]
flat_list = [item for sublist in ll for item in sublist]
np.mean(flat_list), np.std(flat_list)
This prints (59.559194630872476, 7.528353520785996).
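As an alternative to expanding the dictionary into a flat list, the same two numbers can be computed directly from the keys and counts as a weighted mean and a weighted (population) standard deviation. This is only a sketch; the names `values` and `counts` are introduced here for illustration and are not part of the original answer.

```python
import numpy as np

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

values = np.array(list(l.keys()), dtype=float)   # observed values
counts = np.array(list(l.values()), dtype=float)  # how often each value occurred

# weighted mean: each value contributes in proportion to its count
mu = np.average(values, weights=counts)
# weighted population standard deviation (same convention as np.std with ddof=0)
sigma = np.sqrt(np.average((values - mu) ** 2, weights=counts))

print(mu, sigma)
```

This avoids building a list with one entry per observation, which matters mainly when the counts are large.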
You can use np.histogram(flat_list) to build a histogram and evaluate how frequently each value occurs.
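Since the dictionary is already a frequency table, one possible way to look at frequencies is to divide each count by the total, or to pass the counts to `np.histogram` as weights. In the sketch below, the bin edges are an assumption: they are placed halfway between consecutive keys, which relies on the keys being evenly spaced by 3.9.

```python
import numpy as np

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

values = np.array(list(l.keys()), dtype=float)
counts = np.array(list(l.values()), dtype=float)
total = counts.sum()

# relative frequency (empirical probability) of each observed value
rel_freq = {k: v / total for k, v in l.items()}

# assumed bin edges: 14 edges giving 13 bins of width 3.9, one centered on each key
edges = np.linspace(values.min() - 1.95, values.max() + 1.95, len(values) + 1)
hist, _ = np.histogram(values, bins=edges, weights=counts)

print(rel_freq[58.5])
print(hist)
```

With these edges, `hist` simply reproduces the original counts, and passing `density=True` to `np.histogram` would rescale them so they can be compared with a probability density.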
Here is how to draw a normal (Gaussian) curve fitted to the data:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363, 58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}
# convert the dictionary to a list with one entry per observation
l_list = [k for k, v in l.items() for _ in range(v)]
fig, ax = plt.subplots(1, 1)
# scatter plot of the raw counts on the left axis
ax.scatter(list(l.keys()), list(l.values()))
ax.set_xlabel('Key')
ax.set_ylabel('Length of value')
# parameters of the fitted normal distribution
mu = np.mean(l_list)
sigma = np.std(l_list)
u = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 100)
# draw the density on a second y-axis, since its scale differs from the counts
ax2 = ax.twinx()
ax2.plot(u, stats.norm.pdf(u, mu, sigma), color='crimson')
ax2.set_ylabel('normal curve')
plt.show()
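For the part of the question about a value such as 0, one possible approach is to ask the fitted normal distribution how much probability falls in a bin around that value and multiply by the total number of observations. The helper `expected_count` below is hypothetical, and the default bin width of 3.9 is an assumption taken from the spacing of the keys.

```python
import numpy as np
from scipy import stats

l = {31.2: 1, 35.1: 4, 39.0: 13, 42.9: 33, 46.8: 115, 50.7: 271, 54.6: 363,
     58.5: 381, 62.4: 379, 66.3: 370, 70.2: 256, 74.1: 47, 78.0: 2}

values = np.array(list(l.keys()), dtype=float)
counts = np.array(list(l.values()), dtype=float)
total = counts.sum()
mu = np.average(values, weights=counts)
sigma = np.sqrt(np.average((values - mu) ** 2, weights=counts))

bin_width = 3.9  # assumed: spacing between consecutive keys

def expected_count(x, width=bin_width):
    """Hypothetical helper: expected number of points in a bin of `width` centered at x."""
    upper = stats.norm.cdf(x + width / 2, mu, sigma)
    lower = stats.norm.cdf(x - width / 2, mu, sigma)
    return total * (upper - lower)

print(expected_count(0.0))   # far in the left tail of the fitted curve
print(expected_count(58.5))  # near the center of the data
```

Under the fitted curve, 0 lies almost eight standard deviations below the mean, so the expected count there is effectively zero. Such an extrapolation is only as reliable as the assumption that the data really are normally distributed.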