Sum hourly values between 2 dates in pandas

I have a df like this:

                    DATE   PP
0     2011-12-20 07:00:00  0.0
1     2011-12-20 08:00:00  0.0
2     2011-12-20 09:00:00  2.0
3     2011-12-20 10:00:00  0.0
4     2011-12-20 11:00:00  0.0
5     2011-12-20 12:00:00  0.0
6     2011-12-20 13:00:00  0.0
7     2011-12-20 14:00:00  5.0
8     2011-12-20 15:00:00  0.0
9     2011-12-20 16:00:00  0.0
10    2011-12-20 17:00:00  2.0
11    2011-12-20 18:00:00  0.0
12    2011-12-20 19:00:00  0.0
13    2011-12-20 20:00:00  1.0
14    2011-12-20 21:00:00  0.0
15    2011-12-20 22:00:00  0.0
16    2011-12-20 23:00:00  0.0
17    2011-12-21 00:00:00  0.0
18    2011-12-21 01:00:00  3.0
19    2011-12-21 02:00:00  0.0
20    2011-12-21 03:00:00  0.0
21    2011-12-21 04:00:00  0.0
22    2011-12-21 05:00:00  0.0
23    2011-12-21 06:00:00  5.0
24    2011-12-21 07:00:00  0.0
...   ....       ...       ...
75609 2020-08-05 16:00:00  0.0
75610 2020-08-05 19:00:00  0.0

[75614 rows x 2 columns]

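I want the cumulative value of the PP column between 2 specific hourly timestamps on different days: for every day, the sum from 07:00:00 on that day to 07:00:00 on the next day. For example, I want the cumulative value of PP from 2011-12-20 07:00:00 to 2011-12-21 07:00:00.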

Expected result:

   DATE        CUMULATIVE VALUES PP
0  2011-12-20    18
1  2011-12-21    5
2  2011-12-22    10
etc... etc...    ...

I tried this:

df['DAY'] = df['DATE'].dt.strftime('%d')
cumulatives=pd.DataFrame(df.groupby(['DAY'])['PP'].sum())

But this only sums over each whole calendar day, not between 07:00:00 on one day and 07:00:00 on the next.
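The problem with the attempt can be reproduced with a minimal sketch on four rows from the question's sample data (the variable names are illustrative). Grouping by the calendar date, like grouping by strftime('%d'), cuts at midnight, so readings between 00:00 and 06:59 are counted toward the following day. Note also that '%d' is only the day of the month, so on the full data set it would merge, for example, the 20th of every month into a single group.

```python
import pandas as pd

# Four rows taken from the sample data in the question
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2011-12-20 14:00:00', '2011-12-21 01:00:00',
                            '2011-12-21 06:00:00', '2011-12-21 07:00:00']),
    'PP': [5.0, 3.0, 5.0, 0.0],
})

# Calendar-day grouping: 01:00 and 06:00 on the 21st go to the 21st,
# not to the 07:00 window that started on the 20th
by_calendar_day = df.groupby(df['DATE'].dt.date)['PP'].sum()
print(by_calendar_day)  # 2011-12-20 -> 5.0, 2011-12-21 -> 8.0
```

In the 07:00-to-07:00 view the 2011-12-20 window would instead hold 5.0 + 3.0 + 5.0 = 13.0 from these rows.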

Data:

{'DATE': ['2011-12-20 07:00:00', '2011-12-20 08:00:00', '2011-12-20 09:00:00', 
          '2011-12-20 10:00:00', '2011-12-20 11:00:00', '2011-12-20 12:00:00', 
          '2011-12-20 13:00:00', '2011-12-20 14:00:00', '2011-12-20 15:00:00', 
          '2011-12-20 16:00:00', '2011-12-20 17:00:00', '2011-12-20 18:00:00', 
          '2011-12-20 19:00:00', '2011-12-20 20:00:00', '2011-12-20 21:00:00', 
          '2011-12-20 22:00:00', '2011-12-20 23:00:00', '2011-12-21 00:00:00', 
          '2011-12-21 01:00:00', '2011-12-21 02:00:00', '2011-12-21 03:00:00',
          '2011-12-21 04:00:00', '2011-12-21 05:00:00', '2011-12-21 06:00:00', 
          '2011-12-21 07:00:00', '2020-08-05 16:00:00', '2020-08-05 19:00:00'], 
 'PP': [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 
        0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]}

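One way is to subtract 7 hours from the dates, so that 07:00 becomes midnight and each window from 07:00 to 06:00 of the following day falls on a single calendar date (the 07:00:00 reading that ends a window is counted in the next day's group); then groupby.sum gives the desired output: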

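import pandas as pd
df['DATE'] = pd.to_datetime(df['DATE'])
# shift back 7 hours so each 07:00 -> 06:00 (next day) window lands on one calendar date
out = df.groupby(df['DATE'].sub(pd.to_timedelta('7h')).dt.date)['PP'].sum().reset_index(name='SUM')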

Output:

         DATE   SUM
0  2011-12-20  18.0
1  2011-12-21   0.0
2  2020-08-05   0.0
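A possible alternative, assuming pandas 1.1 or later (where resample accepts an offset parameter), is to let resample build the 07:00-to-07:00 bins directly. Unlike the groupby above, resample emits a bin for every day in the range, including the years without rows between 2011 and 2020; sum(min_count=1) turns those empty bins into NaN so they can be dropped. The bin labels are the left edges (07:00), so normalize() strips the time to match the DATE column above. The sketch keeps only the non-zero readings from the sample plus the boundary and 2020 rows; the omitted rows are all 0.0, so the sums are unchanged.

```python
import pandas as pd

# Non-zero readings from the question's sample, plus the 07:00 row on 2011-12-21
# and the two 2020 rows (the omitted rows are all 0.0, so sums are unchanged)
df = pd.DataFrame({
    'DATE': pd.to_datetime([
        '2011-12-20 09:00:00', '2011-12-20 14:00:00', '2011-12-20 17:00:00',
        '2011-12-20 20:00:00', '2011-12-21 01:00:00', '2011-12-21 06:00:00',
        '2011-12-21 07:00:00', '2020-08-05 16:00:00', '2020-08-05 19:00:00',
    ]),
    'PP': [2.0, 5.0, 2.0, 1.0, 3.0, 5.0, 0.0, 0.0, 0.0],
})

# Daily bins shifted to start at 07:00; each bin is labelled by its left edge
daily = (df.set_index('DATE')['PP']
           .resample('D', offset=pd.Timedelta(hours=7))
           .sum(min_count=1)   # empty bins become NaN instead of 0.0
           .dropna())          # keep only days that actually have readings

# Turn labels like 2011-12-20 07:00:00 into plain dates, like the groupby output
daily.index = daily.index.normalize()
out = daily.rename_axis('DATE').reset_index(name='SUM')
print(out)
```

The groupby version is simpler when only days with data matter; resample is handy if the empty days should be kept (drop min_count and dropna to get them as 0.0).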