Histogram of a DataFrame grouped by a column

Suppose I have the following df:

df
   user                start                  end   mode
0    82  2009-04-19 05:49:50  2009-04-19 06:17:40  metro
1    82  2009-04-19 06:18:05  2009-04-19 06:22:44   foot
2    10  2007-06-26 11:32:29  2007-06-26 11:40:29    bus
3    10  2008-03-28 14:52:54  2008-03-28 15:59:59  metro
4    20  2011-08-27 06:13:01  2011-08-27 08:01:37   foot
5    20  2012-02-20 14:10:33  2012-02-20 14:29:59    bus
6    20  2012-02-21 01:22:05  2012-02-21 01:55:47    bus

I want to group my df by the mode column and plot a histogram over (year, month) that shows the count of each mode in that month of that year.

I tried:

df[['mode']].groupby([df['start'].dt.year, df['start'].dt.month]).count().plot(kind='barh')

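The result is the combined count of all modes for each month.

But I want to see the count of each mode (where available) for each month of each year.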

You can use seaborn and pass the hue parameter.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({'user': [82, 82, 10, 10, 20, 20, 20],
 'start': ['2009-04-19 05:49:50',
  '2009-04-19 06:18:05',
  '2007-06-26 11:32:29',
  '2008-03-28 14:52:54',
  '2011-08-27 06:13:01',
  '2012-02-20 14:10:33',
  '2012-02-21 01:22:05'],
 'end': ['2009-04-19 06:17:40',
  '2009-04-19 06:22:44',
  '2007-06-26 11:40:29',
  '2008-03-28 15:59:59',
  '2011-08-27 08:01:37',
  '2012-02-20 14:29:59',
  '2012-02-21 01:55:47'],
 'mode': ['metro', 'foot', 'bus', 'metro', 'foot', 'bus', 'bus']})


df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])


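# Count rows per (year, month, mode); note that this overwrites df with the summary table
df['year'] = df.start.dt.year
df['month'] = df.start.dt.month
df = df.groupby(['year','month','mode']).size().reset_index(name='count')
# hue='mode' draws one horizontal bar per mode within each "year,month" label
g = sns.barplot(data=df, y=df['year'].astype(str)+','+df['month'].astype(str), x='count', hue='mode', orient='h')
# Reverse the y-axis limits to flip the vertical order of the bars
plt.ylim(reversed(plt.ylim()))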
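
If you would rather stay in pandas, a rough alternative is to count per (year, month, mode) and then move mode into the columns, so that each mode gets its own bar. This is only a sketch under my own assumptions: the helper name count_modes_by_month is made up for illustration, and the frame below keeps only the start and mode columns from the question.

```python
import pandas as pd

# Same 'start' and 'mode' values as in the question; other columns omitted
df = pd.DataFrame({
    'start': pd.to_datetime(['2009-04-19 05:49:50', '2009-04-19 06:18:05',
                             '2007-06-26 11:32:29', '2008-03-28 14:52:54',
                             '2011-08-27 06:13:01', '2012-02-20 14:10:33',
                             '2012-02-21 01:22:05']),
    'mode': ['metro', 'foot', 'bus', 'metro', 'foot', 'bus', 'bus'],
})


def count_modes_by_month(frame):
    """Hypothetical helper: one row per (year, month), one column per mode."""
    year = frame['start'].dt.year.rename('year')
    month = frame['start'].dt.month.rename('month')
    # size() counts rows per group; unstack() turns the 'mode' index level
    # into columns, filling combinations that never occur with 0
    return (frame.groupby([year, month, 'mode'])
                 .size()
                 .unstack('mode', fill_value=0))


counts = count_modes_by_month(df)
print(counts)
```

Calling counts.plot(kind='barh') on that table should then draw a separate bar for each mode within every (year, month) group, instead of the single combined bar from the original attempt.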