Pandas DataFrame --> GroupBy --> MultiIndex Process

Question

I'm trying to restructure a large DataFrame of the following form into a MultiIndex:

        date  store_nbr  item_nbr  units  snowfall  preciptotal  event
0 2012-01-01          1         1      0       0.0          0.0    0.0
1 2012-01-01          1         2      0       0.0          0.0    0.0
2 2012-01-01          1         3      0       0.0          0.0    0.0
3 2012-01-01          1         4      0       0.0          0.0    0.0
4 2012-01-01          1         5      0       0.0          0.0    0.0

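I want to group by store_nbr (1-45), then by item_nbr (1-111) within each store_nbr group, and then, for each resulting index pair (e.g., store_nbr=12, item_nbr=109), display the rows in chronological order, so that the ordered rows look like this: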

store_nbr=12, item_nbr=109:   date=2014-02-06, units=0, snowfall=...
                              date=2014-02-07, units=0, snowfall=...
                              date=2014-02-08, units=0, snowfall=...
...                           ...
store_nbr=12, item_nbr=110:   date=2014-02-06, units=0, snowfall=...
                              date=2014-02-07, units=1, snowfall=...
                              date=2014-02-08, units=1, snowfall=...
...

It seems some combination of groupby and set_index could be useful here, but I get stuck after the following line:

grouped = stores.set_index(['store_nbr', 'item_nbr'])
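One possible way to continue from the line above, sketched on a small hypothetical sample (not the asker's data) and assuming `date` is a datetime column: make `date` a third index level and call `sort_index()`, which orders rows by store_nbr, then item_nbr, then date.

```python
import pandas as pd

# Small hypothetical sample with the same kind of columns as the question's DataFrame
stores = pd.DataFrame({
    "date": pd.to_datetime(["2014-02-07", "2014-02-06", "2014-02-06", "2014-02-08"]),
    "store_nbr": [12, 12, 12, 12],
    "item_nbr": [110, 110, 109, 109],
    "units": [1, 0, 0, 0],
    "snowfall": [0.0, 0.0, 0.0, 0.0],
})

# Add date as a third index level, then sort the index lexicographically:
# store_nbr first, then item_nbr, then date within each (store, item) pair
indexed = stores.set_index(["store_nbr", "item_nbr", "date"]).sort_index()
print(indexed)
```

Keeping `date` in the index means each (store_nbr, item_nbr) block can still be selected with `indexed.loc[(12, 109)]`.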

This produces the following MultiIndex:

                         date  units  snowfall  preciptotal  event
store_nbr item_nbr                                                
1         1        2012-01-01      0       0.0          0.0    0.0
          2        2012-01-01      0       0.0          0.0    0.0
          3        2012-01-01      0       0.0          0.0    0.0
          4        2012-01-01      0       0.0          0.0    0.0
          5        2012-01-01      0       0.0          0.0    0.0

Does anyone have suggestions? Is there a simple way to do this by manipulating the groupby object?
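For the groupby route asked about above, a minimal sketch on hypothetical sample data: iterate over `stores.groupby(["store_nbr", "item_nbr"])` (which, with the default `sort=True`, yields groups in key order) and sort each group by `date`. The dictionary `ordered_groups` is an illustrative name, not part of the question.

```python
import pandas as pd

# Small hypothetical sample with the same kind of columns as the question's DataFrame
stores = pd.DataFrame({
    "date": pd.to_datetime(["2014-02-07", "2014-02-06", "2014-02-06", "2014-02-08"]),
    "store_nbr": [12, 12, 12, 12],
    "item_nbr": [110, 110, 109, 109],
    "units": [1, 0, 0, 0],
    "snowfall": [0.0, 0.0, 0.0, 0.0],
})

ordered_groups = {}
# groupby sorts group keys by default, so pairs come out ordered by store_nbr, then item_nbr
for (store, item), group in stores.groupby(["store_nbr", "item_nbr"]):
    # Within each (store, item) pair, order the rows chronologically
    ordered = group.sort_values("date")
    ordered_groups[(store, item)] = ordered
    print(f"store_nbr={store}, item_nbr={item}")
    print(ordered[["date", "units", "snowfall"]].to_string(index=False))
```

Iterating is convenient for display; for a single combined frame, sorting the whole DataFrame once is usually simpler.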

Answer 1

You can sort the rows with:

df.sort_values(by='date')
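Sorting by `date` alone orders rows chronologically across all stores and items, which interleaves the (store_nbr, item_nbr) blocks. A sketch under that observation, using hypothetical sample data: sort by all three keys first, then build the two-level MultiIndex from the question.

```python
import pandas as pd

# Small hypothetical sample with the same kind of columns as the question's DataFrame
stores = pd.DataFrame({
    "date": pd.to_datetime(["2014-02-07", "2014-02-06", "2014-02-06", "2014-02-08"]),
    "store_nbr": [12, 12, 12, 12],
    "item_nbr": [110, 110, 109, 109],
    "units": [1, 0, 0, 0],
    "snowfall": [0.0, 0.0, 0.0, 0.0],
})

# Sort by store, then item, then date so each (store, item) block is contiguous
# and its rows are in chronological order; then move store/item into the index
result = (
    stores.sort_values(["store_nbr", "item_nbr", "date"])
          .set_index(["store_nbr", "item_nbr"])
)
print(result)
```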

Tags: python, hierarchical-data, multi-index, pandas, pandas-groupby