Resample by date in Pandas: messes up a date in the index

Question

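I have a multi-index dataframe in Pandas in which the data are indexed by building and then by date. The different columns represent different kinds of energy, and the values represent how much energy was used in a given month. Image of the dataframe's head is here. I would like to turn this into yearly data. I currently have the line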

df.unstack(level=0).resample('BAS-JUL').sum()

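This works almost perfectly. The problem is that all of the dates are given as the first of the month, but for some reason, when it resamples, it picks July 2 as the cutoff for 2012. So the figure for July 1, 2012 ends up being counted in the 2011 data. It ends up looking like this. You can see that the second value in the "Usage Month" column is July 2. Other than that, the resample seems to work fine.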

If I run df.index.get_level_values(1)[:20], the output is:

DatetimeIndex(['2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01',
           '2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
           '2012-03-01', '2012-04-01', '2012-05-01', '2012-06-01',
           '2012-07-01', '2012-08-01', '2012-09-01', '2012-10-01',
           '2012-11-01', '2012-12-01', '2013-01-01', '2013-02-01'],
          dtype='datetime64[ns]', name='Usage Month', freq=None)

So the index in the original dataframe does contain July 1, 2012.

Any ideas on how to fix this little bug would be much appreciated!

Answer 1

Use 'AS-JUL':

df.unstack(level=0).resample('AS-JUL').sum()

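The B in 'BAS-JUL' stands for business year start, so each yearly bin begins on the first business day of July rather than on July 1. July 1, 2012 was a Sunday, so that year's bin began on Monday, July 2, and the July 1 reading fell into the previous year's bin. 'AS-JUL' anchors every bin on July 1 regardless of the weekday.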
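To see the difference for yourself, here is a minimal sketch on made-up data. The frame below is a hypothetical stand-in for yours (the building labels "A" and "B", the "Electricity" column, and the all-ones values are my assumptions, chosen so the sums simply count months). I pass offset objects instead of alias strings because the alias spelling has changed between pandas versions; pd.offsets.BYearBegin(month=7) and pd.offsets.YearBegin(month=7) should correspond to 'BAS-JUL' and 'AS-JUL'.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: two buildings, monthly readings
# from 2011-07-01 through 2013-06-01, every value 1.0 so sums count months.
months = pd.date_range("2011-07-01", periods=24, freq="MS")
index = pd.MultiIndex.from_product(
    [["A", "B"], months], names=["Building", "Usage Month"]
)
df = pd.DataFrame({"Electricity": np.ones(len(index))}, index=index)

# Dates become the row index, buildings move into the columns.
wide = df.unstack(level=0)

# Offset objects standing in for the 'BAS-JUL' and 'AS-JUL' aliases.
business_year = pd.offsets.BYearBegin(month=7)
calendar_year = pd.offsets.YearBegin(month=7)

# 2012-07-01 is a Sunday, so the business-year anchor rolls to Monday.
sunday = pd.Timestamp("2012-07-01")
print(sunday.day_name())                  # Sunday
print(business_year.rollforward(sunday))  # 2012-07-02 00:00:00
print(calendar_year.rollforward(sunday))  # 2012-07-01 00:00:00

# Business-year bins: the 2012-07-01 row lands in the first bin (13 vs 11 months).
by_business_year = wide.resample(business_year).sum()
# Calendar-year bins anchored on July 1: 12 months in each bin.
by_calendar_year = wide.resample(calendar_year).sum()

print(by_business_year)
print(by_calendar_year)
```

With the business-year offset the first bin picks up 13 monthly rows and the second only 11, which matches what you describe. With the plain year-start offset each bin holds 12.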

python

multi-index

dataframe

python-datetime

pandas