Plot HIST of a pandas DataframeGroupbySeries

Question

I have a small dataframe, and I am trying to plot its data for a small analysis using the following code:

data = pd.read_csv('insurance.csv')

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520

data.groupby('sex').region.hist()

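The code returns a pandas Series in which the first element is subplot 1 and the second is subplot 2.

The code plots them on the same figure, and I am unable to plot them separately.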

Answer 1

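To produce a histogram for each column, grouped by gender:
- 'children' and 'smoker' look different because their values are discrete, with only 6 and 2 unique values, respectively.
- data.groupby('sex').hist(layout=(1, 4), figsize=(12, 4), ec='k', grid=False) produces the plots on its own, but offers no easy way to add a title.
Producing the correct visualization often involves reshaping the data to fit the plotting API.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2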

import pandas as pd

# load data
data = pd.read_csv('insurance.csv')

# convert smoker from a string to int value; hist doesn't work on object type columns
data.smoker = data.smoker.map({'no': 0, 'yes': 1})

# group each column by sex; data.groupby(['sex', 'region']) is also an option
for gender, df in data.groupby('sex'):

    # plot a hist for each column
    axes = df.hist(layout=(1, 5), figsize=(15, 4), ec='k', grid=False)

    # extract the figure object from the array of axes
    fig = axes[0][0].get_figure()

    # add the gender as the title
    fig.suptitle(gender)

Regarding data.groupby('sex').region.hist() in the OP: this is a count plot, which shows the count of each gender for each region; it is not a histogram.
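Because region is categorical, one way to get a separate subplot per gender is to create the axes explicitly and draw each group's counts as bars. This is only a minimal sketch: it rebuilds the five sample rows from the question inline so it runs without insurance.csv, and the figure size and shared y-axis are assumptions, not part of the original post.

```python
import pandas as pd
import matplotlib.pyplot as plt

# the five sample rows shown in the question (sex and region only);
# swap in the full insurance data when it is available
data = pd.DataFrame({
    'sex': ['female', 'male', 'male', 'male', 'male'],
    'region': ['southwest', 'southeast', 'southeast', 'northwest', 'northwest'],
})

groups = data.groupby('sex')

# one Axes per group, side by side; squeeze=False always returns a 2D array
fig, axes = plt.subplots(1, groups.ngroups, figsize=(8, 3), sharey=True, squeeze=False)

for ax, (gender, df) in zip(axes.flat, groups):
    # count the rows per region within this group and draw them as bars
    df.region.value_counts().sort_index().plot(kind='bar', rot=0, ax=ax, title=gender)
```

Each group now has its own Axes, titled with the group key, instead of both groups being drawn on one plot.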
pandas.crosstab computes a frequency table of the factors by default.

ax = pd.crosstab(data.region, data.sex).plot(kind='bar', rot=0)
ax.legend(title='gender', bbox_to_anchor=(1, 1.02), loc='upper left')
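To see the table that such a bar plot is drawn from, it can help to inspect the crosstab result directly. The sketch below again uses only the five sample rows from the question, so the counts are illustrative rather than those of the full dataset.

```python
import pandas as pd

# the five sample rows shown in the question (sex and region only)
sample = pd.DataFrame({
    'sex': ['female', 'male', 'male', 'male', 'male'],
    'region': ['southwest', 'southeast', 'southeast', 'northwest', 'northwest'],
})

# rows are regions, columns are sexes, each cell is the number of matching rows
counts = pd.crosstab(sample.region, sample.sex)
print(counts)
```

Every column of this table becomes one set of bars in the plot, which is why the result is a count plot rather than a histogram.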

Use `seaborn.displot`

This requires converting the data from wide to long format, which is done with pandas.DataFrame.melt.

import pandas as pd
import seaborn as sns

data = pd.read_csv('insurance.csv')
data.smoker = data.smoker.map({'no': 0, 'yes': 1})

# convert the dataframe from a wide to long form
df = data.melt(id_vars=['sex', 'region'])

# plot
p = sns.displot(data=df, kind='hist', x='value', col='variable', row='region', hue='sex',
                multiple='dodge', common_bins=False, facet_kws={'sharey': False, 'sharex': False})
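As a rough illustration of what the wide-to-long conversion does, the sketch below melts a tiny two-row frame. The frame is made up from two of the sample rows and only two value columns, so it is an assumption for demonstration and not the full dataset.

```python
import pandas as pd

# a tiny wide-format frame: one row per person, one column per measurement
wide = pd.DataFrame({
    'sex': ['female', 'male'],
    'region': ['southwest', 'southeast'],
    'age': [19, 18],
    'bmi': [27.9, 33.77],
})

# id_vars are kept as they are; every other column is stacked into
# a 'variable' column (the old column name) and a 'value' column
long_df = wide.melt(id_vars=['sex', 'region'])
print(long_df)
```

The long form has one row per person and measurement, which is the shape that lets displot facet on 'variable' with col and on 'region' with row.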
