Stacked bar using group by in Python dataframe

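I am trying to create a stacked bar chart that replicates the image. I have read my data from a CSV and tried grouping it and plotting stacked bars, but I am not getting the desired output.

I grouped the data like this: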

modified_df1 = modified_df.groupby(["business_postal_code","risk_category"]).size().reset_index(name='counts')
modified_df1 = modified_df1.loc[modified_df1['counts'] > 1100]
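
For reference, the group-then-filter step can be reproduced on a small frame. This is only a sketch: the `raw` data and the threshold of 1 are made up for illustration (the question uses 1100 on the real CSV), and the filter is applied to the aggregated frame, since that is the one that has the `counts` column.

```python
import pandas as pd

# Hypothetical raw data: one row per record, as it might come from the CSV.
raw = pd.DataFrame({
    "business_postal_code": [94102.0, 94102.0, 94102.0, 94103.0, 94103.0, 94110.0],
    "risk_category": ["Low Risk", "Low Risk", "UnKnown",
                      "Low Risk", "Low Risk", "Moderate Risk"],
})

# Count rows per (postal code, risk category) pair.
counted = (
    raw.groupby(["business_postal_code", "risk_category"])
       .size()
       .reset_index(name="counts")
)

# Filter the aggregated frame. The threshold is 1 here only because the
# toy data is tiny; the question uses 1100.
threshold = 1
filtered = counted.loc[counted["counts"] > threshold]
print(filtered)
```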

After grouping and filtering, the data looks like this:

    business_postal_code    risk_category   counts
20  94102.0                 Low Risk        1334
22  94102.0                 UnKnown         1106
24  94103.0                 Low Risk        1472
25  94103.0                 Moderate Risk   1474
26  94103.0                 UnKnown         1329
44  94109.0                 Low Risk        1415
48  94110.0                 Low Risk        2189
49  94110.0                 Moderate Risk   1731
50  94110.0                 UnKnown         1331
117 94133.0                 Low Risk        1412

Then the stacked bar:

df2 = modified_df1.groupby(['business_postal_code','risk_category'])['business_postal_code'].count().unstack('risk_category')
df2[['Moderate Risk','Low Risk']].plot(kind='bar', stacked=True)

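Please suggest how to achieve the desired output. The catch is that I have to group the data by two columns, then apply a filter (counts > 1100), and then plot the stacked bars.

IIUC, you can try: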

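# df.pivot(*df) passes the three column names positionally; pandas 2.0+ only accepts keywords here
df.pivot(index='business_postal_code', columns='risk_category', values='counts').plot(kind='bar', stacked=True)

Or: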

df.pivot_table(index = 'business_postal_code', columns = 'risk_category' , values = 'counts').plot(kind = 'bar', stacked = True)
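
To see what actually gets plotted, it can help to inspect the wide table before calling `.plot`. The sketch below uses a subset of the rows from the question. Each postal code becomes one row (one bar) and each risk category one column (one stacked segment); combinations that were filtered out show up as NaN, which should simply leave that segment out of the bar.

```python
import pandas as pd

# A subset of the filtered rows shown in the question.
df = pd.DataFrame({
    "business_postal_code": [94102.0, 94102.0, 94109.0, 94110.0, 94110.0],
    "risk_category": ["Low Risk", "UnKnown", "Low Risk", "Low Risk", "Moderate Risk"],
    "counts": [1334, 1106, 1415, 2189, 1731],
})

# Long -> wide: rows are postal codes, columns are risk categories.
wide = df.pivot_table(index="business_postal_code",
                      columns="risk_category",
                      values="counts")
print(wide)
```

Calling `wide.plot(kind='bar', stacked=True)` on this frame should then draw one bar per postal code. Note that `pivot_table` aggregates duplicates (mean by default), whereas `pivot` raises an error if an index/column pair occurs more than once.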

Output (plot image not preserved):

Complete example:

import pandas as pd

df = pd.DataFrame({'business_postal_code': {20: 94102.0,
  22: 94102.0,
  24: 94103.0,
  25: 94103.0,
  26: 94103.0,
  44: 94109.0,
  48: 94110.0,
  49: 94110.0,
  50: 94110.0,
  117: 94133.0},
 'risk_category': {20: 'Low Risk',
  22: 'UnKnown',
  24: 'Low Risk',
  25: 'Moderate Risk',
  26: 'UnKnown',
  44: 'Low Risk',
  48: 'Low Risk',
  49: 'Moderate Risk',
  50: 'UnKnown',
  117: 'Low Risk'},
 'counts': {20: 1334,
  22: 1106,
  24: 1472,
  25: 1474,
  26: 1329,
  44: 1415,
  48: 2189,
  49: 1731,
  50: 1331,
  117: 1412}})
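# Keyword form of df.pivot(*df); pandas 2.0+ no longer accepts positional arguments here
df.pivot(index='business_postal_code', columns='risk_category', values='counts').plot(kind='bar', stacked=True)

Using sum() instead of count() with the group by also gives the expected output.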

df2 = modified_df1.groupby(['business_postal_code','risk_category'])['counts'].sum().unstack('risk_category')

df2[['Moderate Risk','Low Risk','High Risk','SAFE']].plot(kind='bar', stacked=True, figsize= (12,8))
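
The difference between the two aggregations can be checked directly. In this sketch (a subset of the question's rows), the frame is already aggregated, so every (postal code, risk category) pair has exactly one row: `count()` therefore returns 1 everywhere, while `sum()` recovers the stored counts. The `wanted` list and the membership check are an assumption added here to avoid a `KeyError` when a category such as 'High Risk' did not survive the filter.

```python
import pandas as pd

# Already-aggregated rows, as shown in the question (subset).
agg = pd.DataFrame({
    "business_postal_code": [94102.0, 94102.0, 94110.0, 94110.0],
    "risk_category": ["Low Risk", "UnKnown", "Low Risk", "Moderate Risk"],
    "counts": [1334, 1106, 2189, 1731],
})
keys = ["business_postal_code", "risk_category"]

# count() counts rows per group: one row per pair, so every cell is 1.
by_count = agg.groupby(keys)["counts"].count().unstack("risk_category")

# sum() adds up the stored counts, which is what the bars should show.
by_sum = agg.groupby(keys)["counts"].sum().unstack("risk_category")

# Keep only the requested categories that actually exist as columns.
wanted = ["Moderate Risk", "Low Risk", "High Risk"]
present = [c for c in wanted if c in by_sum.columns]

print(by_count)
print(by_sum[present])
```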

However, the approach suggested by Nk03 also works and is more concise.