Stacked bar chart using groupby in a Python DataFrame
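I am trying to create a stacked bar chart that replicates an image. I have read my data from a CSV and tried to group it and plot stacked bars, but I am not getting the desired output.
I group the data like this: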
modified_df1 = modified_df.groupby(["business_postal_code","risk_category"]).size().reset_index(name='counts')
modified_df1 = modified_df1.loc[modified_df1['counts'] > 1100]
After grouping and filtering, the data looks like this:
business_postal_code risk_category counts
20 94102.0 Low Risk 1334
22 94102.0 UnKnown 1106
24 94103.0 Low Risk 1472
25 94103.0 Moderate Risk 1474
26 94103.0 UnKnown 1329
44 94109.0 Low Risk 1415
48 94110.0 Low Risk 2189
49 94110.0 Moderate Risk 1731
50 94110.0 UnKnown 1331
117 94133.0 Low Risk 1412
Then the stacked bar:
df2 = modified_df1.groupby(['business_postal_code','risk_category'])['business_postal_code'].count().unstack('risk_category')
df2[['Moderate Risk','Low Risk']].plot(kind='bar', stacked=True)
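Please suggest how to achieve the desired output. The problem is that I have to group the data by two columns, then apply a filter (counts > 1100), and then plot the stacked bars.
IIUC, you can try:
# df.pivot(*df) passed the three column names positionally; pandas 2.0+ requires keywords
df.pivot(index = 'business_postal_code', columns = 'risk_category', values = 'counts').plot(kind = 'bar', stacked = True)
Or: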
df.pivot_table(index = 'business_postal_code', columns = 'risk_category' , values = 'counts').plot(kind = 'bar', stacked = True)
Output: [stacked bar chart, one bar per business_postal_code, stacked by risk_category]
Complete example:
import pandas as pd

df = pd.DataFrame({'business_postal_code': {20: 94102.0,
22: 94102.0,
24: 94103.0,
25: 94103.0,
26: 94103.0,
44: 94109.0,
48: 94110.0,
49: 94110.0,
50: 94110.0,
117: 94133.0},
'risk_category': {20: 'Low Risk',
22: 'UnKnown',
24: 'Low Risk',
25: 'Moderate Risk',
26: 'UnKnown',
44: 'Low Risk',
48: 'Low Risk',
49: 'Moderate Risk',
50: 'UnKnown',
117: 'Low Risk'},
'counts': {20: 1334,
22: 1106,
24: 1472,
25: 1474,
26: 1329,
44: 1415,
48: 2189,
49: 1731,
50: 1331,
117: 1412}})
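df.pivot(index = 'business_postal_code', columns = 'risk_category', values = 'counts').plot(kind = 'bar', stacked = True)
Using sum() instead of count() with the groupby also gives the expected output.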
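For the whole pipeline described in the question (group by two columns, filter on the count, then reshape for plotting), a minimal sketch might look like the following. The `raw` frame, the `threshold` value of 1, and the `plot_stacked` helper are all made up for illustration; the asker's real data comes from a CSV and uses a threshold of 1100.

```python
import pandas as pd

# Hypothetical raw rows, one per inspection (stand-in for the asker's CSV).
raw = pd.DataFrame({
    'business_postal_code': [94102.0] * 3 + [94103.0] * 5 + [94110.0],
    'risk_category': ['Low Risk', 'Low Risk', 'Moderate Risk',
                      'Low Risk', 'Low Risk', 'Low Risk',
                      'Moderate Risk', 'Moderate Risk',
                      'Low Risk'],
})

threshold = 1  # assumption: stands in for the 1100 used in the question

# Step 1: count rows per (postal code, risk category) pair.
counts = (raw.groupby(['business_postal_code', 'risk_category'])
             .size()
             .reset_index(name='counts'))

# Step 2: keep only the pairs whose count exceeds the threshold.
filtered = counts.loc[counts['counts'] > threshold]

# Step 3: reshape to one row per postal code, one column per risk category.
wide = filtered.pivot(index='business_postal_code',
                      columns='risk_category',
                      values='counts')

def plot_stacked(frame):
    """Hypothetical helper: draw the stacked bars (needs matplotlib installed)."""
    return frame.plot(kind='bar', stacked=True)
```

Calling `plot_stacked(wide)` would then draw the chart. Pairs removed by the filter show up as NaN in `wide` and simply contribute no segment to their bar.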
df2 = modified_df1.groupby(['business_postal_code','risk_category'])['counts'].sum().unstack('risk_category')
df2[['Moderate Risk','Low Risk','High Risk','SAFE']].plot(kind='bar', stacked=True, figsize= (12,8))
However, the approach suggested by Nk03 also works and is more concise.
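To see why count() did not give the desired chart, here is a small sketch on a three-row subset of the sample data. The names `agg`, `keys`, `by_count`, and `by_sum` are illustrative only. Because the data is already aggregated, each (postal code, risk category) pair appears once, so counting rows yields 1 for every pair, while summing the `counts` column recovers the real values.

```python
import pandas as pd

# Three rows taken from the already-aggregated sample in the question.
agg = pd.DataFrame({
    'business_postal_code': [94102.0, 94102.0, 94103.0],
    'risk_category': ['Low Risk', 'UnKnown', 'Low Risk'],
    'counts': [1334, 1106, 1472],
})
keys = ['business_postal_code', 'risk_category']

# count() tallies rows per group: every pair occurs once, so every cell is 1.
by_count = agg.groupby(keys)['business_postal_code'].count().unstack('risk_category')

# sum() adds up the precomputed counts, which is what the bars should show.
by_sum = agg.groupby(keys)['counts'].sum().unstack('risk_category')

print(by_count)
print(by_sum)
```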