Sum hourly values between 2 dates in pandas
I have a df like this:
DATE PP
0 2011-12-20 07:00:00 0.0
1 2011-12-20 08:00:00 0.0
2 2011-12-20 09:00:00 2.0
3 2011-12-20 10:00:00 0.0
4 2011-12-20 11:00:00 0.0
5 2011-12-20 12:00:00 0.0
6 2011-12-20 13:00:00 0.0
7 2011-12-20 14:00:00 5.0
8 2011-12-20 15:00:00 0.0
9 2011-12-20 16:00:00 0.0
10 2011-12-20 17:00:00 2.0
11 2011-12-20 18:00:00 0.0
12 2011-12-20 19:00:00 0.0
13 2011-12-20 20:00:00 1.0
14 2011-12-20 21:00:00 0.0
15 2011-12-20 22:00:00 0.0
16 2011-12-20 23:00:00 0.0
17 2011-12-21 00:00:00 0.0
18 2011-12-21 01:00:00 3.0
19 2011-12-21 02:00:00 0.0
20 2011-12-21 03:00:00 0.0
21 2011-12-21 04:00:00 0.0
22 2011-12-21 05:00:00 0.0
23 2011-12-21 06:00:00 5.0
24 2011-12-21 07:00:00 0.0
... .... ... ...
75609 2020-08-05 16:00:00 0.0
75610 2020-08-05 19:00:00 0.0
[75614 rows x 2 columns]
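I want the cumulative value of the PP column between two specific hours on consecutive dates: for every day, the sum from 07:00:00 on that day to 07:00:00 on the next day. For example, I want the cumulative value of PP from 2011-12-20 07:00:00 to 2011-12-21 07:00:00.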
Expected result:
DATE CUMULATIVE VALUES PP
0 2011-12-20 18
1 2011-12-21 5
2 2011-12-22 10
etc... etc... ...
I tried this:
df['DAY'] = df['DATE'].dt.strftime('%d')
cumulatives=pd.DataFrame(df.groupby(['DAY'])['PP'].sum())
But this only sums over each whole calendar day, not over the window from 7:00:00 on one day to 7:00:00 on the next.
Data:
{'DATE': ['2011-12-20 07:00:00', '2011-12-20 08:00:00', '2011-12-20 09:00:00',
'2011-12-20 10:00:00', '2011-12-20 11:00:00', '2011-12-20 12:00:00',
'2011-12-20 13:00:00', '2011-12-20 14:00:00', '2011-12-20 15:00:00',
'2011-12-20 16:00:00', '2011-12-20 17:00:00', '2011-12-20 18:00:00',
'2011-12-20 19:00:00', '2011-12-20 20:00:00', '2011-12-20 21:00:00',
'2011-12-20 22:00:00', '2011-12-20 23:00:00', '2011-12-21 00:00:00',
'2011-12-21 01:00:00', '2011-12-21 02:00:00', '2011-12-21 03:00:00',
'2011-12-21 04:00:00', '2011-12-21 05:00:00', '2011-12-21 06:00:00',
'2011-12-21 07:00:00', '2020-08-05 16:00:00', '2020-08-05 19:00:00'],
'PP': [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0,
0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]}
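One way is to subtract 7 hours from the dates, so that 07:00 maps to 00:00 and each 07:00-to-07:00 window falls on a single calendar date; then use groupby.sum to get the desired output: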
import pandas as pd
df['DATE'] = pd.to_datetime(df['DATE'])
# Shift timestamps back 7 hours, then group by the shifted calendar date
out = df.groupby(df['DATE'].sub(pd.to_timedelta('7h')).dt.date)['PP'].sum().reset_index(name='SUM')
Output:
DATE SUM
0 2011-12-20 18.0
1 2011-12-21 0.0
2 2020-08-05 0.0
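For anyone who wants to try the shift-and-group idea end to end, here is a minimal self-contained sketch. It rebuilds only the first 25 hourly rows from the question (the two 2020 rows are left out), and the helper name sum_by_shifted_day and its start_hour parameter are hypothetical names introduced just for this illustration, not part of pandas or of the answer above.

```python
import pandas as pd

# First 25 hourly rows from the question: 2011-12-20 07:00 through 2011-12-21 07:00.
df = pd.DataFrame({
    "DATE": pd.date_range("2011-12-20 07:00:00", periods=25,
                          freq=pd.Timedelta(hours=1)),
    "PP": [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0,
           1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0],
})


def sum_by_shifted_day(frame, start_hour=7):
    """Sum PP over windows that run from start_hour to start_hour the next day.

    Hypothetical helper: shifting every timestamp back by start_hour makes
    each window coincide with one calendar date, which is then the group key.
    """
    shifted = frame["DATE"] - pd.Timedelta(hours=start_hour)
    return frame.groupby(shifted.dt.date)["PP"].sum().reset_index(name="SUM")


out = sum_by_shifted_day(df)
print(out)
```

Each label D here covers the half-open window from D 07:00 up to, but not including, the next day's 07:00, so the 2011-12-21 07:00:00 row is counted under 2011-12-21 rather than 2011-12-20.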