Differences in frequency count: Stats.relfreq vs Seaborn

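I'm plotting a relative-frequency histogram with Seaborn. I haven't found a way to save the value associated with the highest peak, so I use stats.relfreq to get it. However, the relative frequencies don't seem to match.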

I'm using Python in a Jupyter Notebook.

My data:

my_data = [0.9995, 0.9995, -0.0803, -0.7736, 0.9418, 0.3612, 0.5023, 0.9686, 0.5574, 0.8629, 0.5226, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, -0.8391, -0.4767, 0.3612, 0.4215, 0.8176, 0.5106, -0.0772, 0.0865, -0.6739, -0.5574, -0.6776, 0.4588, -0.2263, 0.8224, 0.3804, 0.3804, -0.0516, -0.3818, 0.0325, 0.6341, 0.0516, -0.5859, -0.5106, -0.0258, 0.128, 0.8126, -0.4201, -0.2449, -0.4215, -0.3506, 0.3612, -0.872, -0.872, 0.7506, -0.5719, 0.7003, -0.235, 0.1747, 0.5994, 0.5423, -0.25, 0.8834, 0.1761, -0.7691, 0.6249, 0.7819, -0.34700000000000003, -0.6486, 0.2955, 0.6486, 0.1734, -0.2732, -0.6486, -0.6049, -0.6049, -0.8622, -0.8622, -0.8622, 0.5423, 0.4404, 0.25, 0.25, 0.5106, 0.4404, 0.4404, 0.5519, 0.5519, 0.5583, -0.1027, -0.2732, -0.1027, 0.5423, 0.4939, -0.2144, 0.25, 0.2247, 0.9079, 0.128, -0.7273, -0.4329, 0.8126, 0.2263, -0.5423, 0.5106, -0.7362, 0.34, -0.6115, -0.5994, -0.6697, 0.9201, 0.1027, 0.5922, 0.5922, 0.3822, 0.5667, 0.8316, 0.9679, 0.29600000000000004, 0.3612, 0.5574, 0.3169, 0.3612, -0.9413, -0.9413, 0.5994, 0.6478, 0.4404, 0.29600000000000004]

My code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns

points = my_data  # the data listed above

# Calculate relative frequency of values, using 10 bins.
res = stats.relfreq(points, numbins=10)
relative_frequency = res.frequency
print(relative_frequency)

# Find the index of the highest relative frequency
highest_index = int(np.argmax(relative_frequency))
print(highest_index)

# Ordered list with possible scores associated with each frequency bin
# (the bin centers, assuming 10 bins of width 0.2 spanning [-1, 1])
possible_scores = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
averaged_relative_frequency_score = possible_scores[highest_index]
print(averaged_relative_frequency_score)

# Plot histogram with Seaborn (binwidth overrides bins; 0.2 over [-1, 1] gives 10 bins)
ax = sns.histplot(data=points, stat='probability', bins=10, binwidth=0.2, binrange=[-1, 1])

plt.xlim(-1, 1)
plt.show()

Here are the outputs I get:

print(relative_frequency)
#relative_frequency [0.0610687  0.06870229 0.09923664 0.07633588 0.04580153 0.08396947
 0.16793893 0.17557252 0.07633588 0.14503817]

print(highest_index)
# highest index = 7

print(averaged_relative_frequency_score)
# averaged_relative_frequency_score = 0.5

And the Seaborn plot:

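As you can see, the highest bar in the Seaborn plot is the last one, so if both methods agreed, the peak in the frequencies computed with the stats module would be at index 9, not 7. Does stats.relfreq use different bins from Seaborn?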

Am I missing something obvious? I can't figure out why the two methods give different peaks.

Cheers!

Right after writing this, I realized what was wrong.

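By default, stats.relfreq does not bin over [-1, 1]. When defaultreallimits is not given, it takes the data's minimum and maximum and widens that range on each side by half a bin width, so the first and last bins are centered on the smallest and largest values. For this data that gives 10 bins of about 0.216 starting near -1.049, instead of Seaborn's bins of 0.2 starting at -1, so the two histograms group the values differently.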
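A minimal sketch that recomputes those default limits by hand and compares them with what stats.relfreq reports. The formula mirrors, as I understand it, what SciPy does when defaultreallimits is omitted: widen [min, max] by s = (max - min) / (2 * (numbins - 1)) on each side. The four-value `sample` is hypothetical; it only shares the minimum (-0.9413) and maximum (0.9995) of my_data, which are the only values that determine the default limits.

```python
import numpy as np
from scipy import stats

# Hypothetical 4-value sample: only its min (-0.9413) and max (0.9995)
# matter here, and they are the same as in my_data from the question.
sample = np.array([-0.9413, -0.25, 0.4404, 0.9995])
numbins = 10

# Limits relfreq picks when defaultreallimits is not given: the data range
# widened by s on each side, where s works out to half a bin width.
s = (sample.max() - sample.min()) / (2 * (numbins - 1))
lower, upper = sample.min() - s, sample.max() + s

res = stats.relfreq(sample, numbins=numbins)
print(res.lowerlimit, lower)  # both about -1.0491, not -1
print(res.binsize)            # about 0.2156, not Seaborn's 0.2

# Bin edges actually used by relfreq
edges = res.lowerlimit + res.binsize * np.arange(numbins + 1)
print(np.round(edges, 4))
```

Because every edge is shifted relative to -1, -0.8, ..., 1, values such as 0.8126 or 0.8834 land in a different bin than they do in the Seaborn plot.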

To get the same result, you have to set the histogram limits explicitly with the defaultreallimits parameter.

In code:

res = stats.relfreq(points, numbins = 10, defaultreallimits = [-1, 1])
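A minimal sketch that checks the fix on the full data set. Instead of the hard-coded possible_scores list, it derives the bin centers from the lowerlimit and binsize fields of the relfreq result, so the score stays correct if the limits or the number of bins change. The final np.histogram call with range=(-1, 1) is an assumption about how to mirror Seaborn's bins: it uses the edges -1, -0.8, ..., 1, which is what binwidth=0.2 with binrange=[-1, 1] should produce.

```python
import numpy as np
from scipy import stats

# Data from the question (131 values)
my_data = [
    0.9995, 0.9995, -0.0803, -0.7736, 0.9418, 0.3612, 0.5023, 0.9686, 0.5574, 0.8629,
    0.5226, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947, 0.9947,
    0.9947, 0.9947, 0.9947, -0.8391, -0.4767, 0.3612, 0.4215, 0.8176, 0.5106, -0.0772,
    0.0865, -0.6739, -0.5574, -0.6776, 0.4588, -0.2263, 0.8224, 0.3804, 0.3804, -0.0516,
    -0.3818, 0.0325, 0.6341, 0.0516, -0.5859, -0.5106, -0.0258, 0.128, 0.8126, -0.4201,
    -0.2449, -0.4215, -0.3506, 0.3612, -0.872, -0.872, 0.7506, -0.5719, 0.7003, -0.235,
    0.1747, 0.5994, 0.5423, -0.25, 0.8834, 0.1761, -0.7691, 0.6249, 0.7819, -0.34700000000000003,
    -0.6486, 0.2955, 0.6486, 0.1734, -0.2732, -0.6486, -0.6049, -0.6049, -0.8622, -0.8622,
    -0.8622, 0.5423, 0.4404, 0.25, 0.25, 0.5106, 0.4404, 0.4404, 0.5519, 0.5519,
    0.5583, -0.1027, -0.2732, -0.1027, 0.5423, 0.4939, -0.2144, 0.25, 0.2247, 0.9079,
    0.128, -0.7273, -0.4329, 0.8126, 0.2263, -0.5423, 0.5106, -0.7362, 0.34, -0.6115,
    -0.5994, -0.6697, 0.9201, 0.1027, 0.5922, 0.5922, 0.3822, 0.5667, 0.8316, 0.9679,
    0.29600000000000004, 0.3612, 0.5574, 0.3169, 0.3612, -0.9413, -0.9413, 0.5994, 0.6478, 0.4404,
    0.29600000000000004,
]

numbins = 10
# Fix the histogram range to [-1, 1], the same range the Seaborn plot uses
res = stats.relfreq(my_data, numbins=numbins, defaultreallimits=[-1, 1])

# Bin centers derived from the result instead of a hard-coded list;
# with limits [-1, 1] and 10 bins they are -0.9, -0.7, ..., 0.9
centers = res.lowerlimit + res.binsize * (np.arange(numbins) + 0.5)

highest_index = int(np.argmax(res.frequency))
peak_score = float(centers[highest_index])
print(highest_index, round(peak_score, 1))  # 9 0.9

# Cross-check with plain NumPy using the same fixed edges (-1, -0.8, ..., 1)
counts, edges = np.histogram(my_data, bins=numbins, range=(-1, 1))
print(np.allclose(res.frequency, counts / len(my_data)))  # True
```

With the fixed limits the peak is the last bin (26 of 131 values, about 0.198), matching the Seaborn plot; the bin [0.4, 0.6) is a close second with 25 values.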