How to plot a histogram and distribution from a frequency table?

Question

I have a frequency table:

[image: frequency table]

I have been trying to plot this data as something like this:

[image: histogram with distribution curve]

So I tried this:

to_plot = compare_df[['counts', 'theoritical counts']]
bins=[0,2500,5000,7500,10000,12500,15000,17500,20000]
sns.displot(to_plot,bins=bins)

However, the result looks like this:

[image: plot]

Any idea what I am doing wrong? Please help.

Answer 1

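Two things:

When you pass a DataFrame to sns.displot, you also need to specify which column to use for the distribution via the x keyword argument.
That leads to the second issue: I don't know of a way to get multiple distributions with sns.displot, but you can use sns.histplot roughly like this: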

import matplotlib.pyplot as plt
import seaborn as sns 

titanic = sns.load_dataset('titanic')

ax = sns.histplot(data=titanic,x='age',bins=30,color='r',alpha=.25,
                  label='age')
sns.histplot(data=titanic,x='fare',ax=ax,bins=30,color='b',alpha=.25,
             label='fare')         
ax.legend()
plt.show()

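This draws two semi-transparent histograms overlaid on the same axes. Note that I just used an example dataset to give you a rough picture as quickly as possible.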

Answer 2

First, note that important information is lost when a kde plot is created from frequencies alone.
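To make that loss concrete, here is a small sketch with two made-up raw samples (sample_low and sample_high are hypothetical names). Both collapse to the same frequency table, so no kde computed from the table can tell them apart, even though the underlying values differ.

```python
import numpy as np

bins = np.array([0, 2500, 5000, 7500, 10000])

# Two hypothetical raw samples whose values sit at different places inside each bin.
sample_low = np.array([100, 200, 2600, 2700, 2800, 5100, 7600])
sample_high = np.array([2400, 2450, 4900, 4950, 4990, 7400, 9900])

counts_low, _ = np.histogram(sample_low, bins=bins)
counts_high, _ = np.histogram(sample_high, bins=bins)

# Both samples produce the same frequency table ...
print(counts_low, counts_high)
# ... although their means are clearly different.
print(sample_low.mean(), sample_high.mean())
```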

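sns.histplot() has a weights= parameter that can handle frequencies. I didn't see a way to make this work with a long-form dataframe and hue, but you can call histplot separately for each column. Here is an example that starts from generated data: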

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

sns.set()
# Bin edges of the frequency table; the index below holds the matching intervals.
bins = np.array([0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
# Generated example frequencies standing in for the real table.
df = pd.DataFrame({'counts': np.random.randint(2, 30, 8),
                   'theoretical counts': np.random.randint(2, 30, 8)},
                  index=pd.interval_range(0, 20000, freq=2500))
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4
fig, ax = plt.subplots()
for column, color in zip(['counts', 'theoretical counts'], ['cornflowerblue', 'crimson']):
    # One point per bin (its midpoint), weighted by that bin's frequency.
    sns.histplot(x=(bins[:-1] + bins[1:]) / 2, weights=df[column], bins=8, binrange=(0, 20000),
                 kde=True, kde_kws={'cut': .3},
                 color=color, alpha=0.5, label=column, ax=ax)
ax.legend()
ax.set_xticks(range(0, 20001, 2500))
plt.show()
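The midpoint-plus-weights idea used in the loop above can be checked without any plotting. The sketch below uses NumPy's own histogram function and invented frequencies; it only illustrates the weighting principle and is not a statement about seaborn's internals.

```python
import numpy as np

bins = np.array([0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
counts = np.array([5, 12, 20, 17, 9, 6, 3, 1])  # made-up frequencies

# Represent every bin by a single point at its midpoint ...
midpoints = (bins[:-1] + bins[1:]) / 2
# ... and let that point count `counts[i]` times via the weights.
heights, _ = np.histogram(midpoints, bins=bins, weights=counts)

print(heights)  # the bar heights reproduce the frequency table
```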

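Because the bin widths vary so much, there is not enough information to determine a suitable kde curve. A bar plot also seems more appropriate than a histogram. Here is an example: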

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

sns.set()
bins = [0, 250, 500, 1000, 1500, 2500, 5000, 10000, 50000, np.inf]
# Text labels such as '0-250' for the categorical x-axis.
bin_labels = [f'{b0}-{b1}' for b0, b1 in zip(bins[:-1], bins[1:])]
# Generated example frequencies standing in for the real table.
df = pd.DataFrame({'counts': np.random.randint(2, 30, 9),
                   'theoretical counts': np.random.randint(2, 30, 9)})
df['theoretical counts'] = (3 * df['counts'] + df['theoretical counts']) // 4
fig, ax = plt.subplots(figsize=(10, 4))
# melt() stacks both columns into long form, so the labels are repeated once per column.
sns.barplot(data=df.melt(), x=np.tile(bin_labels, 2), y='value',
            hue='variable', palette=['cornflowerblue', 'crimson'], ax=ax)
plt.tight_layout()
plt.show()

sns.barplot() has some options, such as dodge=False and alpha=0.5, to draw the bars of both columns at the same position.
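A minimal sketch of that overlay variant, assuming a small made-up table with a hypothetical bin column that holds the labels: with dodge=False the two hue levels share one x position per bin, and alpha=0.5 keeps the bar underneath visible.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented frequencies; 'bin' is a hypothetical column holding the bin labels.
df = pd.DataFrame({'bin': ['0-250', '250-500', '500-1000', '1000-1500'],
                   'counts': [12, 25, 18, 7],
                   'theoretical counts': [14, 22, 17, 9]})
# Long form: one row per (bin, column) pair, with columns 'bin', 'variable', 'value'.
long_df = df.melt(id_vars='bin')

fig, ax = plt.subplots(figsize=(8, 4))
# dodge=False stacks both hue levels at the same x position; alpha makes them see-through.
sns.barplot(data=long_df, x='bin', y='value', hue='variable',
            palette=['cornflowerblue', 'crimson'],
            dodge=False, alpha=0.5, ax=ax)
plt.tight_layout()
```

Call plt.show() afterwards to display the figure.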

python

plot

distribution

histogram

seaborn